perf numbers (was: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue)

Borislav Petkov Sun, 26 Apr 2015 04:22:56 -0700

On Sat, Apr 25, 2015 at 11:12:06PM +0200, Borislav Petkov wrote:
> I've prepended the perf stat output with markers A:, B: or C: for easier
> comparison. The markers mean:
> 
> A: Linus' master from a couple of days ago + tip/master + tip/x86/asm
> B: With Andy's SYSRET patch on top
> C: Without RCX canonicalness check (see patch at the end).
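Some context on what marker C removes: SYSRET returns to the user RIP held in RCX, and on Intel CPUs a non-canonical RCX makes SYSRET raise #GP while the CPU is still in ring 0. That is why the syscall exit path checks RCX for canonicalness at all. The sketch below only illustrates the arithmetic of such a check. It assumes 48-bit virtual addresses (4-level paging), the helper name is hypothetical, and it is not the kernel's code, which does this in assembly in the entry path.

```python
VA_BITS = 48          # assumption: 4-level paging, 48-bit virtual addresses
MASK64 = (1 << 64) - 1


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """Hypothetical helper: True if addr is canonical, i.e. bits
    63..(va_bits - 1) are all copies of bit (va_bits - 1)."""
    addr &= MASK64                      # treat the value like an unsigned 64-bit register
    top = addr >> (va_bits - 1)         # the (65 - va_bits) high bits: 17 bits for va_bits=48
    # Canonical means those high bits are either all zeros or all ones.
    return top == 0 or top == (1 << (65 - va_bits)) - 1


if __name__ == "__main__":
    # Highest canonical user address, and the first non-canonical one above it.
    for rcx in (0x00007FFFFFFFFFFF, 0x0000800000000000):
        print(hex(rcx), is_canonical(rcx))
```

An address like 0x0000800000000000 sits in the non-canonical hole between the user and kernel halves, so it fails the check.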


What was missing were the D = B + C results (both patches applied); here they are:

A:    2835570.145246      cpu-clock (msec)                                      
        ( +-  0.02% ) [100.00%]
B:    2833364.074970      cpu-clock (msec)                                      
        ( +-  0.04% ) [100.00%]
C:    2834708.335431      cpu-clock (msec)                                      
        ( +-  0.02% ) [100.00%]
D:    2835055.118431      cpu-clock (msec)                                      
        ( +-  0.01% ) [100.00%]

A:    2835570.099981      task-clock (msec)         #    3.996 CPUs utilized    
        ( +-  0.02% ) [100.00%]
B:    2833364.073633      task-clock (msec)         #    3.996 CPUs utilized    
        ( +-  0.04% ) [100.00%]
C:    2834708.350387      task-clock (msec)         #    3.996 CPUs utilized    
        ( +-  0.02% ) [100.00%]
D:    2835055.094383      task-clock (msec)         #    3.996 CPUs utilized    
        ( +-  0.01% ) [100.00%]

A: 5,591,213,166,613      cycles                    #    1.972 GHz              
        ( +-  0.03% ) [75.00%]
B: 5,585,023,802,888      cycles                    #    1.971 GHz              
        ( +-  0.03% ) [75.00%]
C: 5,587,983,212,758      cycles                    #    1.971 GHz              
        ( +-  0.02% ) [75.00%]
D: 5,584,838,532,936      cycles                    #    1.970 GHz              
        ( +-  0.03% ) [75.00%]

A: 3,106,707,101,530      instructions              #    0.56  insns per cycle  
        ( +-  0.01% ) [75.00%]
B: 3,106,632,251,528      instructions              #    0.56  insns per cycle  
        ( +-  0.00% ) [75.00%]
C: 3,106,265,958,142      instructions              #    0.56  insns per cycle  
        ( +-  0.00% ) [75.00%]
D: 3,106,294,801,185      instructions              #    0.56  insns per cycle  
        ( +-  0.00% ) [75.00%]

A:   683,676,044,429      branches                  #  241.107 M/sec            
        ( +-  0.01% ) [75.00%]
B:   683,670,899,595      branches                  #  241.293 M/sec            
        ( +-  0.01% ) [75.00%]
C:   683,675,772,858      branches                  #  241.180 M/sec            
        ( +-  0.01% ) [75.00%]
D:   683,683,533,664      branches                  #  241.154 M/sec            
        ( +-  0.00% ) [75.00%]

A:    43,829,535,008      branch-misses             #    6.41% of all branches  
        ( +-  0.02% ) [75.00%]
B:    43,844,118,416      branch-misses             #    6.41% of all branches  
        ( +-  0.03% ) [75.00%]
C:    43,819,871,086      branch-misses             #    6.41% of all branches  
        ( +-  0.02% ) [75.00%]
D:    43,795,107,998      branch-misses             #    6.41% of all branches  
        ( +-  0.02% ) [75.00%]

A:         2,030,357      context-switches          #    0.716 K/sec            
        ( +-  0.06% ) [100.00%]
B:         2,029,313      context-switches          #    0.716 K/sec            
        ( +-  0.05% ) [100.00%]
C:         2,028,566      context-switches          #    0.716 K/sec            
        ( +-  0.06% ) [100.00%]
D:         2,028,895      context-switches          #    0.716 K/sec            
        ( +-  0.06% ) [100.00%]

A:            52,421      migrations                #    0.018 K/sec            
        ( +-  1.13% )
B:            52,049      migrations                #    0.018 K/sec            
        ( +-  1.02% )
C:            51,365      migrations                #    0.018 K/sec            
        ( +-  0.92% )
D:            51,766      migrations                #    0.018 K/sec            
        ( +-  1.11% )

A:     709.528485252 seconds time elapsed                                       
   ( +-  0.02% )
B:     708.976557288 seconds time elapsed                                       
   ( +-  0.04% )
C:     709.312844791 seconds time elapsed                                       
   ( +-  0.02% )
D:     709.400050112 seconds time elapsed                                       
   ( +-  0.01% )

So in all events except "branches" (which is understandable, btw), we
have a minimal net win when comparing the numbers in A with those in D,
i.e. *with* *both* patches applied.
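To put a rough size on "minimal", one can compute the relative A-to-D change per event and compare it with the ± percentage that perf stat prints for the repeated runs. This is a crude heuristic of my own choosing (flag a change only if it exceeds the larger of the two reported ± values), not a proper significance test; the numbers are copied verbatim from the output above.

```python
# (A value, A +- %, D value, D +- %) copied from the perf stat output above.
RESULTS = {
    "cpu-clock":        (2835570.145246, 0.02, 2835055.118431, 0.01),
    "task-clock":       (2835570.099981, 0.02, 2835055.094383, 0.01),
    "cycles":           (5591213166613, 0.03, 5584838532936, 0.03),
    "instructions":     (3106707101530, 0.01, 3106294801185, 0.00),
    "branches":         (683676044429, 0.01, 683683533664, 0.00),
    "branch-misses":    (43829535008, 0.02, 43795107998, 0.02),
    "context-switches": (2030357, 0.06, 2028895, 0.06),
    "migrations":       (52421, 1.13, 51766, 1.11),
    "time-elapsed":     (709.528485252, 0.02, 709.400050112, 0.01),
}


def compare(a, a_noise, d, d_noise):
    """Return (relative change A -> D in percent, True if the magnitude of
    the change exceeds the larger of the two reported +- percentages)."""
    delta_pct = (d - a) / a * 100.0
    return delta_pct, abs(delta_pct) > max(a_noise, d_noise)


if __name__ == "__main__":
    for event, (a, a_noise, d, d_noise) in RESULTS.items():
        delta, beyond = compare(a, a_noise, d, d_noise)
        verdict = "beyond noise" if beyond else "within noise"
        print(f"{event:17s} {delta:+8.4f}%  ({verdict})")
```

With this criterion, "branches" is the only event that goes up, elapsed time and cpu-clock move by about 0.02% (roughly the size of their reported noise), while cycles drop by roughly 0.11%.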

And those numbers are pretty nice IMO: even if the net win is a
measurement artefact and not really a win, we certainly don't have a net
loss. So unless anyone objects, I'm going to apply both patches, but I'll
wait for Andy's v2 with better comments and the ss_sel test changed as per
Denys' suggestion.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.