Jan Kiszka wrote:
> Avi Kivity wrote:
>   
>> Jan Kiszka wrote:
>>     
>>> Avi,
>>>
>>> [somehow your mails do not get through to my private account, so I'm
>>> switching]
>>>
>>> Avi Kivity wrote:
>>>   
>>>       
>>>> Jan Kiszka wrote:
>>>>     
>>>>         
>>>>> Clarification: I can't precisely tell what code is executed in VM mode,
>>>>> as I don't have qemu or that guest instrumented. I just see the kernel
>>>>> entering VM mode and leaving it again more than 300 us later. So I
>>>>> wonder why this is allowed while some external IRQ is pending.
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>> How do you know an external interrupt is pending?
>>>>     
>>>>         
>>> It's the host timer IRQ, programmed to fire in certain intervals (100 us
>>> here). Test case is some latency measurement tool like tglx's cyclictest
>>> or similar programs we use in Xenomai.
>>>
>>>   
>>>       
>>>> kvm programs the hardware to exit when an external interrupt arrives.
>>>>
>>>>     
>>>>         
>>> Here is a latency trace I just managed to capture over 2.6.23.1-rt1 with 
>>> latest kvm from git hacked into (kvm generally seems to work fine this way):
>>>
>>> ...
>>> qemu-sys-7543  0...1 13897us : vmcs_write16+0xb/0x20 
>>> (vmx_save_host_state+0x1a7/0x1c0)
>>> qemu-sys-7543  0...1 13897us : vmcs_writel+0xb/0x30 (vmcs_write16+0x1e/0x20)
>>> qemu-sys-7543  0...1 13898us : segment_base+0xc/0x70 
>>> (vmx_save_host_state+0xa0/0x1c0)
>>> qemu-sys-7543  0...1 13898us : vmcs_writel+0xb/0x30 
>>> (vmx_save_host_state+0xb0/0x1c0)
>>> qemu-sys-7543  0...1 13898us : segment_base+0xc/0x70 
>>> (vmx_save_host_state+0xbf/0x1c0)
>>> qemu-sys-7543  0...1 13898us : vmcs_writel+0xb/0x30 
>>> (vmx_save_host_state+0xcf/0x1c0)
>>> qemu-sys-7543  0...1 13898us : load_msrs+0xb/0x40 
>>> (vmx_save_host_state+0xe7/0x1c0)
>>> qemu-sys-7543  0...1 13898us : kvm_load_guest_fpu+0x8/0x40 
>>> (kvm_vcpu_ioctl_run+0xbf/0x570)
>>> qemu-sys-7543  0D..1 13899us : vmx_vcpu_run+0xc/0x110 
>>> (kvm_vcpu_ioctl_run+0x120/0x570)
>>> qemu-sys-7543  0D..1 13899us!: vmcs_writel+0xb/0x30 
>>> (vmx_vcpu_run+0x22/0x110)
>>> qemu-sys-7543  0D..1 14344us : vmcs_read32+0xb/0x20 
>>> (vmx_vcpu_run+0xc7/0x110)
>>> qemu-sys-7543  0D..1 14345us : vmcs_readl+0x8/0x10 (vmcs_read32+0x16/0x20)
>>> qemu-sys-7543  0D..1 14345us : vmcs_read32+0xb/0x20 
>>> (vmx_vcpu_run+0xf4/0x110)
>>> qemu-sys-7543  0D..1 14345us+: vmcs_readl+0x8/0x10 (vmcs_read32+0x16/0x20)
>>> qemu-sys-7543  0D..1 14349us : irq_enter+0xb/0x30 (do_IRQ+0x45/0xc0)
>>> qemu-sys-7543  0D.h1 14350us : do_IRQ+0x73/0xc0 (f8caae24     0     0)
>>> qemu-sys-7543  0D.h1 14351us : handle_level_irq+0xe/0x120 (do_IRQ+0x7d/0xc0)
>>> qemu-sys-7543  0D.h1 14351us : __spin_lock+0xc/0x30 
>>> (handle_level_irq+0x24/0x120)
>>> qemu-sys-7543  0D.h2 14352us : mask_and_ack_8259A+0x14/0x120 
>>> (handle_level_irq+0x37/0x120)
>>> qemu-sys-7543  0D.h2 14352us+: __spin_lock_irqsave+0x11/0x60 
>>> (mask_and_ack_8259A+0x2a/0x120)
>>> qemu-sys-7543  0D.h3 14357us : __spin_unlock_irqrestore+0xc/0x60 
>>> (mask_and_ack_8259A+0x7a/0x120)
>>> qemu-sys-7543  0D.h2 14358us : redirect_hardirq+0x8/0x70 
>>> (handle_level_irq+0x72/0x120)
>>> qemu-sys-7543  0D.h2 14358us : __spin_unlock+0xb/0x40 
>>> (handle_level_irq+0x8e/0x120)
>>> qemu-sys-7543  0D.h1 14358us : handle_IRQ_event+0xe/0x110 
>>> (handle_level_irq+0x9a/0x120)
>>> qemu-sys-7543  0D.h1 14359us : timer_interrupt+0xb/0x60 
>>> (handle_IRQ_event+0x67/0x110)
>>> qemu-sys-7543  0D.h1 14359us : hrtimer_interrupt+0xe/0x1f0 
>>> (timer_interrupt+0x20/0x60)
>>> ...
>>>
>>> One can see 345 us latency between vm-enter and vm-exit in vmx_vcpu_run -
>>> and this while cyclictest runs at a period of 100 us!
>>>
>>> I got the same results over Adeos/I-pipe & Xenomai with the function
>>> tracer there, also pointing to the period while the CPU is in VM mode.
>>>
>>> Anyone any ideas? Greg, I put you on CC as you said you once saw "decent
>>> latencies" with your patches. Are there still magic bits missing in
>>> official kvm?
>>>   
>>>       
>> No bits missing as far as I know.  It should just work.
>>
>> Can you explain some more about the latenct tracer?  How does it work? 
>>     
>
> Ah, sorry: The latency tracers in both -rt and I-pipe use gcc's -pg to
> put a call to a function called mcount at the beginning of each compiled
> function. mcount is provided by the tracers and stores the caller
> address, its parent, the current time, and more in a log. An API is
> provided to start and stop the trace, e.g. after someone (kernel or user
> space) detected large wakeup latencies.
>
>   

Ok.  So it is not interrupt driven, and that's how you get traces in
functions that run with interrupts disabled.

>> Seeing vmx_vcpu_run() in there confuses me, as it always runs with
>> interrupts disabled (it does dispatch NMIs, so we could be seeing an NMI).
>>     
>
> The point is that the cyclictest does not find large latencies when kvm
> is not happening to start or stop right now. And if you are thinking
> about NMIs triggered by the kvm code on vm-exit: I also instrumented
> that code path, and it is not taken in case of the long delay.
>
>   

Right.  With your explanation it all makes sense, and indeed it looks
like the guest is not exiting.

>> Please post a disassembly of your vmx_vcpu_run so we can interpret the
>> offsets.
>>     
>
> Here it comes:
>
> 00002df0 <vmx_vcpu_run>:
>     2df0:       55                      push   %ebp
>     2df1:       89 e5                   mov    %esp,%ebp
>     2df3:       53                      push   %ebx
>     2df4:       83 ec 08                sub    $0x8,%esp
>     2df7:       e8 fc ff ff ff          call   2df8 <vmx_vcpu_run+0x8>
>     2dfc:       8b 5d 08                mov    0x8(%ebp),%ebx
>     2dff:       0f 20 c0                mov    %cr0,%eax
>     2e02:       89 44 24 04             mov    %eax,0x4(%esp)
>     2e06:       c7 04 24 00 6c 00 00    movl   $0x6c00,(%esp)
>     2e0d:       e8 be d8 ff ff          call   6d0 <vmcs_writel>
>   

first trace


>     2e12:       8b 83 80 0d 00 00       mov    0xd80(%ebx),%eax
>     2e18:       ba 14 6c 00 00          mov    $0x6c14,%edx
>     2e1d:       89 d9                   mov    %ebx,%ecx
>     2e1f:       60                      pusha
>     2e20:       51                      push   %ecx
>     2e21:       0f 79 d4                vmwrite %esp,%edx
>     2e24:       83 f8 00                cmp    $0x0,%eax
>     2e27:       8b 81 78 01 00 00       mov    0x178(%ecx),%eax
>     2e2d:       0f 22 d0                mov    %eax,%cr2
>     2e30:       8b 81 50 01 00 00       mov    0x150(%ecx),%eax
>     2e36:       8b 99 5c 01 00 00       mov    0x15c(%ecx),%ebx
>     2e3c:       8b 91 58 01 00 00       mov    0x158(%ecx),%edx
>     2e42:       8b b1 68 01 00 00       mov    0x168(%ecx),%esi
>     2e48:       8b b9 6c 01 00 00       mov    0x16c(%ecx),%edi
>     2e4e:       8b a9 64 01 00 00       mov    0x164(%ecx),%ebp
>     2e54:       8b 89 54 01 00 00       mov    0x154(%ecx),%ecx
>     2e5a:       75 05                   jne    2e61 <vmx_vcpu_run+0x71>
>     2e5c:       0f 01 c2                vmlaunch
>     2e5f:       eb 03                   jmp    2e64 <vmx_vcpu_run+0x74>
>     2e61:       0f 01 c3                vmresume
>     2e64:       87 0c 24                xchg   %ecx,(%esp)
>     2e67:       89 81 50 01 00 00       mov    %eax,0x150(%ecx)
>     2e6d:       89 99 5c 01 00 00       mov    %ebx,0x15c(%ecx)
>     2e73:       ff 34 24                pushl  (%esp)
>     2e76:       8f 81 54 01 00 00       popl   0x154(%ecx)
>     2e7c:       89 91 58 01 00 00       mov    %edx,0x158(%ecx)
>     2e82:       89 b1 68 01 00 00       mov    %esi,0x168(%ecx)
>     2e88:       89 b9 6c 01 00 00       mov    %edi,0x16c(%ecx)
>     2e8e:       89 a9 64 01 00 00       mov    %ebp,0x164(%ecx)
>     2e94:       0f 20 d0                mov    %cr2,%eax
>     2e97:       89 81 78 01 00 00       mov    %eax,0x178(%ecx)
>     2e9d:       8b 0c 24                mov    (%esp),%ecx
>     2ea0:       59                      pop    %ecx
>     2ea1:       61                      popa
>     2ea2:       0f 96 c0                setbe  %al
>     2ea5:       88 83 84 0d 00 00       mov    %al,0xd84(%ebx)
>     2eab:       c7 04 24 24 48 00 00    movl   $0x4824,(%esp)
>     2eb2:       e8 49 d2 ff ff          call   100 <vmcs_read32>
>   

second trace

>     2eb7:       a8 03                   test   $0x3,%al
>     2eb9:       0f 94 c0                sete   %al
>     2ebc:       0f b6 c0                movzbl %al,%eax
>     2ebf:       89 83 28 01 00 00       mov    %eax,0x128(%ebx)
>     2ec5:       b8 7b 00 00 00          mov    $0x7b,%eax
>     2eca:       8e d8                   mov    %eax,%ds
>     2ecc:       8e c0                   mov    %eax,%es
>     2ece:       c7 83 80 0d 00 00 01    movl   $0x1,0xd80(%ebx)
>     2ed5:       00 00 00
>     2ed8:       c7 04 24 04 44 00 00    movl   $0x4404,(%esp)
>     2edf:       e8 1c d2 ff ff          call   100 <vmcs_read32>
>     2ee4:       25 00 07 00 00          and    $0x700,%eax
>     2ee9:       3d 00 02 00 00          cmp    $0x200,%eax
>     2eee:       75 02                   jne    2ef2 <vmx_vcpu_run+0x102>
>     2ef0:       cd 02                   int    $0x2
>     2ef2:       83 c4 08                add    $0x8,%esp
>     2ef5:       5b                      pop    %ebx
>     2ef6:       5d                      pop    %ebp
>     2ef7:       c3                      ret
>     2ef8:       90                      nop
>     2ef9:       8d b4 26 00 00 00 00    lea    0x0(%esi),%esi
>
> Note that the first, unresolved call here goes to mcount().
>
>   

(the -r option to objdump is handy)

Exiting on a pending interrupt is controlled by the vmcs word
PIN_BASED_EXEC_CONTROL, bit PIN_BASED_EXT_INTR_MASK.  Can you check (via
vmcs_read32()) that the bit is indeed set?

[if not, a guest can just enter a busy loop and kill a processor]



-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to