Re: [PATCH v2 0/4] perf: Fix perf_event_attr::exclusive rotation

Peter Zijlstra Mon, 09 Nov 2020 03:49:20 -0800

On Mon, Nov 02, 2020 at 06:41:43PM -0800, Andi Kleen wrote:
> On Mon, Nov 02, 2020 at 03:16:25PM +0100, Peter Zijlstra wrote:
> > On Sun, Nov 01, 2020 at 07:52:38PM -0800, Andi Kleen wrote:
> > > The main motivation is actually that the "multiple groups" algorithm
> > > in perf doesn't work all that great: it has quite a few cases where it
> > > starves groups or makes the wrong decisions. That is because it is a very
> > > difficult (likely NP-complete) problem and the kernel takes a lot
> > > of short cuts to avoid spending too much time on it.
> > 
> > The event scheduling should be starvation free, except in the presence
> > of pinned events.
> > 
> > If you can show starvation without pinned events, it's a bug.
> > 
> > It will also always do equal or better than exclusive mode wrt PMU
> > utilization. Again, if it doesn't it's a bug.
> 
> Simple example (I think we've shown that one before):
> 
> (on skylake)
> $ cat /proc/sys/kernel/nmi_watchdog
> 0
> $ perf stat -e instructions,cycles,frontend_retired.latency_ge_2,frontend_retired.latency_ge_16 -a sleep 2
> 
>  Performance counter stats for 'system wide':
> 
>        654,514,990      instructions              #    0.34  insn per cycle     (50.67%)
>      1,924,297,028      cycles                                                  (74.28%)
>         21,708,935      frontend_retired.latency_ge_2                           (75.01%)
>          1,769,952      frontend_retired.latency_ge_16                          (24.99%)
> 
>        2.002426541 seconds time elapsed
> 
> The second frontend_retired should be both getting 50% and the fixed events
> should be getting 100%. So several events are starved.


*should* how? Also, nothing is 0% so nothing is getting starved.
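
For reference, the percentage perf stat prints in parentheses is the share of
the enabled time during which the event was actually on a counter
(time_running / time_enabled), and the displayed count is scaled up by the
inverse of that ratio. A minimal sketch of the arithmetic, assuming
hypothetical helper names and nanosecond inputs that are not taken from the
runs above:

```python
def running_share(time_enabled: int, time_running: int) -> float:
    """Percentage of the enabled time the event spent on a counter."""
    if time_enabled == 0:
        return 0.0
    return 100.0 * time_running / time_enabled


def scaled_count(raw: int, time_enabled: int, time_running: int) -> float:
    """Extrapolate a multiplexed count to the full enabled window."""
    if time_running == 0:
        return 0.0  # never scheduled: nothing to extrapolate from
    return raw * time_enabled / time_running


def is_starved(time_enabled: int, time_running: int) -> bool:
    """Starved here means: enabled, but never given a counter (0%)."""
    return time_enabled > 0 and time_running == 0
```

By this definition an event shown at 24.99% is multiplexed, not starved; only
a 0% event would count as starved.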

> Another similar example is trying to schedule the topdown events on Icelake
> in parallel to other groups. It works with one extra group, but breaks with two.
> 
> (on icelake)
> $ cat /proc/sys/kernel/nmi_watchdog
> 0
> $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring},{branches,branches,branches,branches,branches,branches,branches,branches},{branches,branches,branches,branches,branches,branches,branches,branches}' -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>         71,229,087      slots                                                   (60.65%)
>          5,066,320      topdown-bad-spec          #      7.1% bad speculation   (60.65%)
>         35,080,387      topdown-be-bound          #     49.2% backend bound     (60.65%)
>         22,769,750      topdown-fe-bound          #     32.0% frontend bound    (60.65%)
>          8,336,760      topdown-retiring          #     11.7% retiring          (60.65%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
> 
>        1.001312511 seconds time elapsed
> 
> A tool using exclusive hopefully will be able to do better than this.

I don't see how, exclusive will always result in equal or worse PMU
utilization, never better.
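
To illustrate the "equal or worse" claim, here is a toy model only, not the
kernel's scheduler: every event is assumed to need exactly one generic
counter, groups rotate round-robin, and packing stops at the first group that
does not fit. The function names and the group sizes in the usage note are
hypothetical.

```python
def slice_utilization(groups, num_counters, head, exclusive):
    """Fraction of counters in use during one time slice.

    groups: list of group sizes (counters each group needs).
    head: index of the group at the front of the rotation for this slice.
    exclusive: if True, the head group gets the PMU to itself.
    """
    used = 0
    for i in range(len(groups)):
        size = groups[(head + i) % len(groups)]
        if used + size > num_counters:
            break  # toy rule: stop at the first group that does not fit
        used += size
        if exclusive:
            break  # an exclusive group shares the PMU with nobody
    return used / num_counters


def average_utilization(groups, num_counters, exclusive):
    """Average over one full rotation, with each group leading once."""
    n = len(groups)
    total = sum(slice_utilization(groups, num_counters, h, exclusive)
                for h in range(n))
    return total / n
```

With groups of sizes 2, 2 and 4 on 8 counters, the shared case fills the PMU
in every slice while the exclusive case averages one third. With sizes 5, 8
and 8 the two cases come out equal. In this model exclusive never comes out
ahead, because the shared case always schedules the head group too.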
