* Linus Torvalds <torva...@linux-foundation.org> wrote:

> On Fri, Mar 27, 2015 at 1:53 PM, Brian Gerst <brge...@gmail.com> wrote:
> >> <-- IRQ.  Boom
> >
> > The sti will delay interrupts for one instruction, and that should include 
> > NMIs.
> 
> Nope. Intel explicitly documents the NMI case only for mov->ss and popss.

Interestingly, I still see a STI 'NMI shadow' even on Intel CPUs.

Try something like this as root on a system with Intel CPUs (running 
recent tools/perf), with high-freq NMI sampling:

   perf top -F 10000

Then execute a tight syscall loop on all CPUs (a getppid() loop, for 
example), and you'll see something like this in the profile:

Samples: 1M of event 'cycles', Event count (approx.): 377899840545
Overhead  Shared Object                      Symbol
  27.67%  libc-2.19.so                       [.] __GI___getppid
  21.34%  [kernel]                           [k] system_call
  17.42%  [kernel]                           [k] system_call_after_swapgs
  12.00%  [kernel]                           [k] pid_vnr
   7.49%  [kernel]                           [k] sys_getppid
   5.49%  [kernel]                           [k] sysret_check
   5.34%  loop-getppid                       [.] main
   1.56%  [kernel]                           [k] system_call_fastpath
   0.36%  loop-getppid                       [.] getppid@plt

Note the very high sample count (due to sampling at 10 kHz).
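
For reference, a minimal sketch of the kind of tight getppid() loop 
meant above. This is not the exact loop-getppid test program that 
produced the profile; the function name getppid_loop() and its 
iteration-count parameter are made up for illustration:

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test program): issue
 * `iterations` back-to-back getppid() calls and return how many were
 * made. The loop does nothing else, so almost all of the time goes
 * into the syscall entry/exit path that is being profiled.
 */
unsigned long getppid_loop(unsigned long iterations)
{
	unsigned long i;

	for (i = 0; i < iterations; i++)
		(void)getppid();	/* result is not needed, only the syscall */

	return i;
}
```

Call it from main() with a large iteration count and start one 
instance per CPU, so that every CPU sits in the syscall path while 
perf is sampling.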

Now if you hit '<Enter>' twice to annotate system_call_after_swapgs 
you should see something like this (the live kernel image disassembly, 
annotated):

  system_call_after_swapgs  /proc/kcore
       │
       │              Disassembly of section load0:
       │
       │              ffffffff8178b3f3 <load0>:      
  9.72 │ffffffff8178b3f3:   mov    %rsp,%gs:0xb040
 44.24 │ffffffff8178b3fc:   mov    %gs:0xb888,%rsp
  0.02 │ffffffff8178b405:   sti                      
       │ffffffff8178b406:   nopl   0x0(%rax)         
 16.04 │ffffffff8178b40d:   sub    $0x50,%rsp
       │ffffffff8178b411:   mov    %rdi,0x40(%rsp)   
  6.51 │ffffffff8178b416:   mov    %rsi,0x38(%rsp)
  5.81 │ffffffff8178b41b:   mov    %rdx,0x30(%rsp)
  2.22 │ffffffff8178b420:   mov    %rax,0x20(%rsp)
  2.16 │ffffffff8178b425:   mov    %r8,0x18(%rsp)
  0.93 │ffffffff8178b42a:   mov    %r9,0x10(%rsp)
  1.57 │ffffffff8178b42f:   mov    %r10,0x8(%rsp)
  3.70 │ffffffff8178b434:   mov    %r11,(%rsp)
  2.27 │ffffffff8178b438:   mov    %rax,0x48(%rsp)
                                                
Note how the 7-byte NOP after the STI did not get a single profiler 
hit.

This is with the default '-e cycles', not '-e cycles:pp', so what we 
see as profiler hits should be the raw NMI entry RIPs.

Arguably this could just be the decoder hiding the NOP efficiently. 
I'll try to run some more experiments ...

Thanks,

        Ingo