Jan Kiszka wrote: > Avi Kivity wrote: > >> Jan Kiszka wrote: >> >>> Avi, >>> >>> [somehow your mails do not get through to my private account, so I'm >>> switching] >>> >>> Avi Kivity wrote: >>> >>> >>>> Jan Kiszka wrote: >>>> >>>> >>>>> Clarification: I can't precisely tell what code is executed in VM mode, >>>>> as I don't have qemu or that guest instrumented. I just see the kernel >>>>> entering VM mode and leaving it again more than 300 us later. So I >>>>> wonder why this is allowed while some external IRQ is pending. >>>>> >>>>> >>>>> >>>>> >>>> How do you know an external interrupt is pending? >>>> >>>> >>> It's the host timer IRQ, programmed to fire in certain intervals (100 us >>> here). Test case is some latency measurement tool like tglx's cyclictest >>> or similar programs we use in Xenomai. >>> >>> >>> >>>> kvm programs the hardware to exit when an external interrupt arrives. >>>> >>>> >>>> >>> Here is a latency trace I just managed to capture over 2.6.23.1-rt1 with >>> latest kvm from git hacked into (kvm generally seems to work fine this way): >>> >>> ... >>> qemu-sys-7543 0...1 13897us : vmcs_write16+0xb/0x20 >>> (vmx_save_host_state+0x1a7/0x1c0) >>> qemu-sys-7543 0...1 13897us : vmcs_writel+0xb/0x30 (vmcs_write16+0x1e/0x20) >>> qemu-sys-7543 0...1 13898us : segment_base+0xc/0x70 >>> (vmx_save_host_state+0xa0/0x1c0) >>> qemu-sys-7543 0...1 13898us : vmcs_writel+0xb/0x30 >>> (vmx_save_host_state+0xb0/0x1c0) >>> qemu-sys-7543 0...1 13898us : segment_base+0xc/0x70 >>> (vmx_save_host_state+0xbf/0x1c0) >>> qemu-sys-7543 0...1 13898us : vmcs_writel+0xb/0x30 >>> (vmx_save_host_state+0xcf/0x1c0) >>> qemu-sys-7543 0...1 13898us : load_msrs+0xb/0x40 >>> (vmx_save_host_state+0xe7/0x1c0) >>> qemu-sys-7543 0...1 13898us : kvm_load_guest_fpu+0x8/0x40 >>> (kvm_vcpu_ioctl_run+0xbf/0x570) >>> qemu-sys-7543 0D..1 13899us : vmx_vcpu_run+0xc/0x110 >>> (kvm_vcpu_ioctl_run+0x120/0x570) >>> qemu-sys-7543 0D..1 13899us!: vmcs_writel+0xb/0x30 >>> (vmx_vcpu_run+0x22/0x110) >>> qemu-sys-7543 0D..1 14344us : vmcs_read32+0xb/0x20 >>> (vmx_vcpu_run+0xc7/0x110) >>> qemu-sys-7543 0D..1 14345us : vmcs_readl+0x8/0x10 (vmcs_read32+0x16/0x20) >>> qemu-sys-7543 0D..1 14345us : vmcs_read32+0xb/0x20 >>> (vmx_vcpu_run+0xf4/0x110) >>> qemu-sys-7543 0D..1 14345us+: vmcs_readl+0x8/0x10 (vmcs_read32+0x16/0x20) >>> qemu-sys-7543 0D..1 14349us : irq_enter+0xb/0x30 (do_IRQ+0x45/0xc0) >>> qemu-sys-7543 0D.h1 14350us : do_IRQ+0x73/0xc0 (f8caae24 0 0) >>> qemu-sys-7543 0D.h1 14351us : handle_level_irq+0xe/0x120 (do_IRQ+0x7d/0xc0) >>> qemu-sys-7543 0D.h1 14351us : __spin_lock+0xc/0x30 >>> (handle_level_irq+0x24/0x120) >>> qemu-sys-7543 0D.h2 14352us : mask_and_ack_8259A+0x14/0x120 >>> (handle_level_irq+0x37/0x120) >>> qemu-sys-7543 0D.h2 14352us+: __spin_lock_irqsave+0x11/0x60 >>> (mask_and_ack_8259A+0x2a/0x120) >>> qemu-sys-7543 0D.h3 14357us : __spin_unlock_irqrestore+0xc/0x60 >>> (mask_and_ack_8259A+0x7a/0x120) >>> qemu-sys-7543 0D.h2 14358us : redirect_hardirq+0x8/0x70 >>> (handle_level_irq+0x72/0x120) >>> qemu-sys-7543 0D.h2 14358us : __spin_unlock+0xb/0x40 >>> (handle_level_irq+0x8e/0x120) >>> qemu-sys-7543 0D.h1 14358us : handle_IRQ_event+0xe/0x110 >>> (handle_level_irq+0x9a/0x120) >>> qemu-sys-7543 0D.h1 14359us : timer_interrupt+0xb/0x60 >>> (handle_IRQ_event+0x67/0x110) >>> qemu-sys-7543 0D.h1 14359us : hrtimer_interrupt+0xe/0x1f0 >>> (timer_interrupt+0x20/0x60) >>> ... >>> >>> One can see 345 us latency between vm-enter and vm-exit in vmx_vcpu_run - >>> and this while cyclictest runs at a period of 100 us! >>> >>> I got the same results over Adeos/I-pipe & Xenomai with the function >>> tracer there, also pointing to the period while the CPU is in VM mode. >>> >>> Anyone any ideas? Greg, I put you on CC as you said you once saw "decent >>> latencies" with your patches. Are there still magic bits missing in >>> official kvm? >>> >>> >> No bits missing as far as I know. It should just work. >> >> Can you explain some more about the latenct tracer? How does it work? >> > > Ah, sorry: The latency tracers in both -rt and I-pipe use gcc's -pg to > put a call to a function called mcount at the beginning of each compiled > function. mcount is provided by the tracers and stores the caller > address, its parent, the current time, and more in a log. An API is > provided to start and stop the trace, e.g. after someone (kernel or user > space) detected large wakeup latencies. > >
Ok. So it is not interrupt driven, and that's how you get traces in functions that run with interrupts disabled. >> Seeing vmx_vcpu_run() in there confuses me, as it always runs with >> interrupts disabled (it does dispatch NMIs, so we could be seeing an NMI). >> > > The point is that the cyclictest does not find large latencies when kvm > is not happening to start or stop right now. And if you are thinking > about NMIs triggered by the kvm code on vm-exit: I also instrumented > that code path, and it is not taken in case of the long delay. > > Right. With your explanation it all makes sense, and indeed it looks like the guest is not exiting. >> Please post a disassembly of your vmx_vcpu_run so we can interpret the >> offsets. >> > > Here it comes: > > 00002df0 <vmx_vcpu_run>: > 2df0: 55 push %ebp > 2df1: 89 e5 mov %esp,%ebp > 2df3: 53 push %ebx > 2df4: 83 ec 08 sub $0x8,%esp > 2df7: e8 fc ff ff ff call 2df8 <vmx_vcpu_run+0x8> > 2dfc: 8b 5d 08 mov 0x8(%ebp),%ebx > 2dff: 0f 20 c0 mov %cr0,%eax > 2e02: 89 44 24 04 mov %eax,0x4(%esp) > 2e06: c7 04 24 00 6c 00 00 movl $0x6c00,(%esp) > 2e0d: e8 be d8 ff ff call 6d0 <vmcs_writel> > first trace > 2e12: 8b 83 80 0d 00 00 mov 0xd80(%ebx),%eax > 2e18: ba 14 6c 00 00 mov $0x6c14,%edx > 2e1d: 89 d9 mov %ebx,%ecx > 2e1f: 60 pusha > 2e20: 51 push %ecx > 2e21: 0f 79 d4 vmwrite %esp,%edx > 2e24: 83 f8 00 cmp $0x0,%eax > 2e27: 8b 81 78 01 00 00 mov 0x178(%ecx),%eax > 2e2d: 0f 22 d0 mov %eax,%cr2 > 2e30: 8b 81 50 01 00 00 mov 0x150(%ecx),%eax > 2e36: 8b 99 5c 01 00 00 mov 0x15c(%ecx),%ebx > 2e3c: 8b 91 58 01 00 00 mov 0x158(%ecx),%edx > 2e42: 8b b1 68 01 00 00 mov 0x168(%ecx),%esi > 2e48: 8b b9 6c 01 00 00 mov 0x16c(%ecx),%edi > 2e4e: 8b a9 64 01 00 00 mov 0x164(%ecx),%ebp > 2e54: 8b 89 54 01 00 00 mov 0x154(%ecx),%ecx > 2e5a: 75 05 jne 2e61 <vmx_vcpu_run+0x71> > 2e5c: 0f 01 c2 vmlaunch > 2e5f: eb 03 jmp 2e64 <vmx_vcpu_run+0x74> > 2e61: 0f 01 c3 vmresume > 2e64: 87 0c 24 xchg %ecx,(%esp) > 2e67: 89 81 50 01 00 00 mov %eax,0x150(%ecx) > 2e6d: 89 99 5c 01 00 00 mov %ebx,0x15c(%ecx) > 2e73: ff 34 24 pushl (%esp) > 2e76: 8f 81 54 01 00 00 popl 0x154(%ecx) > 2e7c: 89 91 58 01 00 00 mov %edx,0x158(%ecx) > 2e82: 89 b1 68 01 00 00 mov %esi,0x168(%ecx) > 2e88: 89 b9 6c 01 00 00 mov %edi,0x16c(%ecx) > 2e8e: 89 a9 64 01 00 00 mov %ebp,0x164(%ecx) > 2e94: 0f 20 d0 mov %cr2,%eax > 2e97: 89 81 78 01 00 00 mov %eax,0x178(%ecx) > 2e9d: 8b 0c 24 mov (%esp),%ecx > 2ea0: 59 pop %ecx > 2ea1: 61 popa > 2ea2: 0f 96 c0 setbe %al > 2ea5: 88 83 84 0d 00 00 mov %al,0xd84(%ebx) > 2eab: c7 04 24 24 48 00 00 movl $0x4824,(%esp) > 2eb2: e8 49 d2 ff ff call 100 <vmcs_read32> > second trace > 2eb7: a8 03 test $0x3,%al > 2eb9: 0f 94 c0 sete %al > 2ebc: 0f b6 c0 movzbl %al,%eax > 2ebf: 89 83 28 01 00 00 mov %eax,0x128(%ebx) > 2ec5: b8 7b 00 00 00 mov $0x7b,%eax > 2eca: 8e d8 mov %eax,%ds > 2ecc: 8e c0 mov %eax,%es > 2ece: c7 83 80 0d 00 00 01 movl $0x1,0xd80(%ebx) > 2ed5: 00 00 00 > 2ed8: c7 04 24 04 44 00 00 movl $0x4404,(%esp) > 2edf: e8 1c d2 ff ff call 100 <vmcs_read32> > 2ee4: 25 00 07 00 00 and $0x700,%eax > 2ee9: 3d 00 02 00 00 cmp $0x200,%eax > 2eee: 75 02 jne 2ef2 <vmx_vcpu_run+0x102> > 2ef0: cd 02 int $0x2 > 2ef2: 83 c4 08 add $0x8,%esp > 2ef5: 5b pop %ebx > 2ef6: 5d pop %ebp > 2ef7: c3 ret > 2ef8: 90 nop > 2ef9: 8d b4 26 00 00 00 00 lea 0x0(%esi),%esi > > Note that the first, unresolved call here goes to mcount(). > > (the -r option to objdump is handy) Exiting on a pending interrupt is controlled by the vmcs word PIN_BASED_EXEC_CONTROL, bit PIN_BASED_EXT_INTR_MASK. Can you check (via vmcs_read32()) that the bit is indeed set? [if not, a guest can just enter a busy loop and kill a processor] -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel