(CCs Intel folks)

On Tue, 2013-08-06 at 09:29 +0200, Mike Galbraith wrote: 
> On Tue, 2013-07-30 at 11:35 +0200, Peter Zijlstra wrote:
> 
> > It would be good if you could do what Thomas suggested and look at which
> > timer is actually active during your workload.
> 
> Rebuilding regression test trees, some pipe-test results...
> 
> I'm missing mwait_idle() rather a lot on Q6600, and at 3.8, E5620 took a
> severe NOHZ drubbing from the menu governor. 
> 
> pipe-test, scheduling cross core
> 
> NOTE: nohz is throttled here (patchlet below), as to not eat horrible
> microidle cost, see E5620 v3.7.10-nothrottle below.
> 
> Q6600
> v3.8.13                  500.6 KHz     1.000
> v3.9.11                  422.4 KHz      .843
> v3.10.4                  420.2 KHz      .839
> v3.11-rc3-4-g36f571e     404.7 KHz      .808
> 
> Q6600 3.9 regression:
> guilty party is 69fb3676 x86 idle: remove mwait_idle() and "idle=mwait" 
> cmdline param
> halt sucks, HTH does one activate mwait_idle_with_hints() [processor_idle()] 
> for core2 boxen?
> 
> E5620                                            +write 0 -> 
> /dev/cpu_dma_latency, hold open
> v3.7.10                  578.5 KHz     1.000     675.4 KHz     1.000
> v3.7.10-nothrottle       366.7 KHz      .633     395.0 KHz      .584
> v3.8.13                  468.3 KHz      .809     690.0 KHz     1.021
> v3.8.13 idle=mwait       595.1 KHz     1.028     NA
> v3.9.11                  462.0 KHz      .798     691.1 KHz     1.023
> v3.10.4                  419.4 KHz      .724     570.8 KHz      .845
> v3.11-rc3-4-g36f571e     400.1 KHz      .691     538.5 KHz      .797
> 
> E5620 3.8 regression:
> guilty party: 69a37bea cpuidle: Quickly notice prediction failure for repeat 
> mode
> 
> 
> Q6600 (2.4 GHz core2 quad)
>      v3.11-rc3-4-g36f571e                       v3.8.13
>      7.97%  [k] reschedule_interrupt            8.63%  [k] __schedule
>      6.27%  [k] __schedule                      6.07%  [k] native_sched_clock
>      4.74%  [k] native_sched_clock              4.96%  [k] system_call
>      4.23%  [k] _raw_spin_lock_irqsave          4.30%  [k] 
> _raw_spin_lock_irqsave
>      3.39%  [k] system_call                     4.06%  [k] resched_task
>      2.89%  [k] sched_clock_local               3.44%  [k] sched_clock_local
>      2.79%  [k] mutex_lock                      3.39%  [k] pipe_read
>      2.57%  [k] pipe_read                       3.21%  [k] mutex_lock
>      2.55%  [k] __switch_to                     2.98%  [k] read_tsc
>      2.24%  [k] read_tsc                        2.87%  [k] __switch_to
> 
> 
> E5620 (2.4 GHz Westmere quad)
>     v3.7.10                                     v3.7.10-nothrottle            
>            v3.7.10-nothrottle
>     8.01%  [k] __schedule                      25.80%  [k] 
> _raw_spin_unlock_irqrestore   21.80%  [k] _raw_spin_unlock_irqrestore
>     4.49%  [k] resched_tas                      4.64%  [k] 
> __hrtimer_start_range_ns      - _raw_spin_unlock_irqrestore
>     3.94%  [k] mutex_lock                       4.62%  [k] timerqueue_add     
>               + 37.94% __hrtimer_start_range_ns
>     3.44%  [k] __switch_to                      4.54%  [k] __schedule         
>                 19.69% hrtimer_cancel
>     3.18%  [k] menu_select                      2.84%  [k] enqueue_hrtimer    
>                    tick_nohz_restart
>     3.05%  [k] copy_user_generic_string         2.64%  [k] resched_task       
>                    tick_nohz_idle_exit
>     3.02%  [k] task_waking_fair                 2.29%  [k] 
> _raw_spin_lock_irqsave                cpu_idle
>     2.91%  [k] mutex_unlock                     2.28%  [k] mutex_lock         
>                    start_secondary
>     2.82%  [k] pipe_read                        1.96%  [k] __switch_to        
>               + 16.05% hrtimer_start_range_ns
>     2.32%  [k] ktime_get_real                   1.73%  [k] menu_select        
>                 15.46% hrtimer_start
>                                                                               
>                    tick_nohz_stop_sched_tick
>                                                                               
>                    __tick_nohz_idle_enter
>                                                                               
>                    tick_nohz_idle_enter
>                                                                               
>                    cpu_idle
>                                                                               
>                    start_secondary
>                                                                               
>                 6.37% hrtimer_try_to_cancel
>                                                                               
>                    hrtimer_cancel
>                                                                               
>                    tick_nohz_restart
>                                                                               
>                    tick_nohz_idle_exit
>                                                                               
>                    cpu_idle
>                                                                               
>                    start_secondary
> 
>     v3.8.13                                    v3.8.13 idle=mwait             
>            v3.8.13 (throttled, but menu gov bites.. HARD)
>     23.16%  [k] _raw_spin_unlock_irqrestore    8.35%  [k] __schedule          
>            -  22.91%  [k] _raw_spin_unlock_irqrestore
>      4.93%  [k] __schedule                     6.49%  [k] __switch_to         
>               - _raw_spin_unlock_irqrestore
>      3.42%  [k] resched_task                   5.71%  [k] resched_task        
>                  - 47.26% hrtimer_try_to_cancel
>      3.27%  [k] __switch_to                    4.64%  [k] mutex_lock          
>                       hrtimer_cancel
>      3.05%  [k] mutex_lock                     3.48%  [k] 
> copy_user_generic_string                  menu_hrtimer_cancel
>      2.32%  [k] copy_user_generic_string       3.15%  [k] task_waking_fair    
>                       tick_nohz_idle_exit
>      2.30%  [k] _raw_spin_lock_irqsave         3.13%  [k] pipe_read           
>                       cpu_idle
>      2.15%  [k] pipe_read                      2.61%  [k] mutex_unlock        
>                       start_secondary
>      2.15%  [k] task_waking_fair               2.54%  [k] finish_task_switch  
>                  - 40.01% __hrtimer_start_range_ns
>      2.08%  [k] ktime_get                      2.29%  [k] 
> _raw_spin_lock_irqsave                    hrtimer_start
>      1.87%  [k] mutex_unlock                   1.91%  [k] idle_cpu            
>                       menu_select
>      1.76%  [k] finish_task_switch             1.84%  [k] __wake_up_common    
>                       cpuidle_idle_call
>                                                                               
>                       cpu_idle
>                                                                               
>                       start_secondary
> 
>     v3.9.11
>     18.67%  [k] _raw_spin_unlock_irqrestore
>      4.36%  [k] __schedule
>      3.66%  [k] __switch_to
>      3.13%  [k] mutex_lock
>      2.97%  [k] __hrtimer_start_range_ns
>      2.69%  [k] _raw_spin_lock_irqsave
>      2.38%  [k] copy_user_generic_string
>      2.34%  [k] hrtimer_reprogram.isra.32
>      2.34%  [k] task_waking_fair
>      2.25%  [k] ktime_get
>      2.14%  [k] pipe_read
>      1.98%  [k] menu_select
> 
>     v3.10.4
>     20.42%  [k] _raw_spin_unlock_irqrestore
>      4.75%  [k] __schedule
>      4.42%  [k] reschedule_interrupt  <== appears in 3.10, guilty party as 
> yet unknown
>      3.52%  [k] __switch_to
>      3.27%  [k] resched_task
>      2.64%  [k] cpuidle_enter_state
>      2.63%  [k] _raw_spin_lock_irqsave
>      2.04%  [k] copy_user_generic_string
>      2.00%  [k] cpu_idle_loop
>      1.97%  [k] mutex_lock
>      1.90%  [k] ktime_get
>      1.75%  [k] task_waking_fair
> 
>    v3.11-rc3-4-g36f571e
>    18.96%  [k] _raw_spin_unlock_irqrestore
>     4.84%  [k] __schedule
>     4.69%  [k] reschedule_interrupt
>     3.75%  [k] __switch_to
>     2.62%  [k] _raw_spin_lock_irqsave
>     2.43%  [k] cpuidle_enter_state
>     2.28%  [k] resched_task
>     2.20%  [k] cpu_idle_loop
>     1.97%  [k] copy_user_generic_string
>     1.88%  [k] ktime_get
>     1.81%  [k] task_waking_fair
>     1.75%  [k] mutex_lock
> 
> sched: ratelimit nohz
> 
> Entering nohz code on every micro-idle is too expensive to bear.
> 
> Signed-off-by: Mike Galbraith <efa...@gmx.de>
> 
> ---
>  include/linux/sched.h    |    5 +++++
>  kernel/sched/core.c      |    5 +++++
>  kernel/time/tick-sched.c |    2 +-
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -235,9 +235,14 @@ extern int runqueue_is_locked(int cpu);
>  extern void nohz_balance_enter_idle(int cpu);
>  extern void set_cpu_sd_state_idle(void);
>  extern int get_nohz_timer_target(void);
> +extern int sched_needs_cpu(int cpu);
>  #else
>  static inline void nohz_balance_enter_idle(int cpu) { }
>  static inline void set_cpu_sd_state_idle(void) { }
> +static inline int sched_needs_cpu(int cpu)
> +{
> +     return 0;
> +}
>  #endif
>  
>  /*
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -650,6 +650,11 @@ static inline bool got_nohz_idle_kick(vo
>       return false;
>  }
>  
> +int sched_needs_cpu(int cpu)
> +{
> +     return  cpu_rq(cpu)->avg_idle < sysctl_sched_migration_cost;
> +}
> +
>  #else /* CONFIG_NO_HZ_COMMON */
>  
>  static inline bool got_nohz_idle_kick(void)
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -548,7 +548,7 @@ static ktime_t tick_nohz_stop_sched_tick
>               time_delta = timekeeping_max_deferment();
>       } while (read_seqretry(&jiffies_lock, seq));
>  
> -     if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
> +     if (sched_needs_cpu(cpu) || rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
>           arch_needs_cpu(cpu) || irq_work_needs_cpu()) {
>               next_jiffies = last_jiffies + 1;
>               delta_jiffies = 1;
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to