Re: Runtime-modified DIMMs and live migration issue
Fixed with a cherry-pick of 7a72f7a140bfd3a5dae73088947010bfdbcf6a40 and its predecessor 7103f60de8bed21a0ad5d15d2ad5b7a333dda201. Of course this is not a real fix, as the race precondition is merely shifted away/hidden by a simplifying assumption rather than removed. Though there are not too many hotplug users around, I hope this information will be useful for those who hit the same issue in the next year or so, until 3.18+ is stable enough for the hypervisor kernel role. Any suggestions on further debugging or on re-exposing the race are of course very welcome. CCing kvm@ as it looks like a hypervisor subsystem issue. The entire discussion can be found at https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg03117.html . -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008
But you are very appositely mistaken: copy_huge_page() used to make the same mistake, and Dave Hansen fixed it back in v3.13, but the fix never went to the stable trees. commit 30b0a105d9f7141e4cbf72ae5511832457d89788 Author: Dave Hansen dave.han...@linux.intel.com Date: Thu Nov 21 14:31:58 2013 -0800 mm: thp: give transparent hugepage code a separate copy_page Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages: if (PageHuge(page) || PageTransHuge(page)) copy_huge_page(newpage, page); So, yay for code reuse. But: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); and a non-hugetlbfs page has no page_hstate(). This works 99% of the time because page_hstate() determines the hstate from the page order alone. Since the page order of a THP page matches the default hugetlbfs page order, it works. But, if you change the default huge page size on the boot command-line (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate so page_hstate() returns null and copy_huge_page() oopses pretty fast since copy_huge_page() dereferences the hstate: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) { ... Mel noticed that the migration code is really the only user of these functions. This moves all the copy code over to migrate.c and makes copy_huge_page() work for THP by checking for it explicitly.
I believe the bug was introduced in commit b32967ff101a (mm: numa: Add THP migration for the NUMA working set scanning fault case) [a...@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi] Signed-off-by: Dave Hansen dave.han...@linux.intel.com Acked-by: Mel Gorman mgor...@suse.de Reviewed-by: Naoya Horiguchi n-horigu...@ah.jp.nec.com Cc: Hillf Danton dhi...@gmail.com Cc: Andrea Arcangeli aarca...@redhat.com Tested-by: Dave Jiang dave.ji...@intel.com Signed-off-by: Andrew Morton a...@linux-foundation.org Signed-off-by: Linus Torvalds torva...@linux-foundation.org Thanks, the issue is fixed on 3.10 with trivial patch modification. Ping? 3.10 still misses that..
Re: [Qemu-devel] E5-2620v2 - emulation stop error
A small update: the behavior is caused by setting the unrestricted_guest feature to N. I had this feature disabled everywhere from approximately three years ago, when enabling it was one of the suspects in host crashes with the then-contemporary KVM module. Also, nVMX is likely not to work at all and produces the same traces as in https://lkml.org/lkml/2014/7/17/12 without unrestricted_guest=1. I think this fact actually explains all the real-mode weirdness we've seen before, and this should probably be concluded either by putting appropriate notes in a README or in the module information, by making a strict dependency between apicv/unrestricted_guest and nested/unrestricted_guest, or by fixing the issue at its root, if that is a possible and appropriate solution. Thanks everyone for keeping up with ideas through this thread!
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Wed, Apr 1, 2015 at 2:49 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-31 21:23+0300, Andrey Korolyov: On Tue, Mar 31, 2015 at 9:04 PM, Bandan Das b...@redhat.com wrote: Bandan Das b...@redhat.com writes: Andrey Korolyov and...@xdel.ru writes: ... http://xdel.ru/downloads/kvm-e5v2-issue/another-tracepoint-fail-with-apicv.dat.gz Something a bit more interesting, but the mess is happening just *after* NMI firing. What happens if NMI is turned off on the host ? Sorry, I meant the watchdog.. Thanks, everything goes well (as it probably should go there): http://xdel.ru/downloads/kvm-e5v2-issue/apicv-enabled-nmi-disabled.dat.gz Nice revelation! KVM doesn't expect host's NMIs to look like this so it doesn't pass them to the host. What was the watchdog that casually sent NMIs? (It worked after nmi_watchdog=0 on the host?) (Guest's NMI should have a different result as well. NMI_EXCEPTION is an expected exit reason for guest's hard exceptions, they are then differentiated by intr_info and nothing hinted that this was a NMI.) Yes, I disabled the host watchdog during runtime. Indeed, guest-induced NMIs would look different, and there was no reason for them to be fired at this stage inside the guest. I'd suspect hypervisor hardware misbehavior there, but I have very little idea of how APICv behavior (which is completely microcode- and CPU-dependent but decoupled from peripheral hardware) may vary at this point; I am using the 1.20140913.1 ucode version from Debian, if that matters. I will send the trace suggested by Paolo in the next couple of hours. Also it would be awesome to ask hardware folks from Intel who could prove or disprove the statement above (as I was unable to catch the problem on a 2603v2 so far, this hypothesis has some chance of being real).
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Wed, Apr 1, 2015 at 4:19 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 01/04/2015 14:26, Andrey Korolyov wrote: Yes, I disabled host watchdog during runtime. Indeed guest-induced NMI would look different and they had no reasons to be fired at this stage inside guest. I`d suspect a hypervisor hardware misbehavior there but have a very little idea on how APICv behavior (which is completely microcode-dependent and CPU-dependent but decoupled from peripheral hardware) may vary at this point, I am using 1.20140913.1 ucode version from debian if this can matter. Will send trace suggested by Paolo in a next couple of hours. Also it would be awesome to ask hardware folks from Intel who can prove or disprove my abovementioned statement (as I was unable to catch the problem on 2603v2 so far, this hypothesis has some chance to be real). Yes, the interaction with the NMI watchdog is unexpected and makes a processor erratum somewhat more likely. Paolo http://xdel.ru/downloads/kvm-e5v2-issue/trace-nmi-apicv-fail-at-reboot.dat.gz err, no NMI entries near the failure event, though the capture should be correct: /sys/kernel/debug/tracing/events/kvm*/filter /sys/kernel/debug/tracing/events/*/kvm*/filter /sys/kernel/debug/tracing/events/nmi*/filter /sys/kernel/debug/tracing/events/*/nmi*/filter
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Wed, Apr 1, 2015 at 6:37 PM, Andrey Korolyov and...@xdel.ru wrote: On Wed, Apr 1, 2015 at 4:19 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 01/04/2015 14:26, Andrey Korolyov wrote: Yes, I disabled host watchdog during runtime. Indeed guest-induced NMI would look different and they had no reasons to be fired at this stage inside guest. I`d suspect a hypervisor hardware misbehavior there but have a very little idea on how APICv behavior (which is completely microcode-dependent and CPU-dependent but decoupled from peripheral hardware) may vary at this point, I am using 1.20140913.1 ucode version from debian if this can matter. Will send trace suggested by Paolo in a next couple of hours. Also it would be awesome to ask hardware folks from Intel who can prove or disprove my abovementioned statement (as I was unable to catch the problem on 2603v2 so far, this hypothesis has some chance to be real). Yes, the interaction with the NMI watchdog is unexpected and makes a processor erratum somewhat more likely. Paolo http://xdel.ru/downloads/kvm-e5v2-issue/trace-nmi-apicv-fail-at-reboot.dat.gz err, no NMI entries nearby failure event, though capture should be correct: /sys/kernel/debug/tracing/events/kvm*/filter /sys/kernel/debug/tracing/events/*/kvm*/filter /sys/kernel/debug/tracing/events/nmi*/filter /sys/kernel/debug/tracing/events/*/nmi*/filter Moved the 2603v2s back and the issue is still here. I used the wrong pattern for the issue in a previous series of tests on those CPUs in the middle of the month, continuously respawning VMs, while the real issue hides in the *first* reboot events after a hypervisor reboot (or module load). So either it should be reproducible anywhere, or this is not a hardware issue (or it is related to the mainboard instead of the CPU itself :) ).
Re: [Qemu-devel] E5-2620v2 - emulation stop error
*putting my tinfoil hat on* After thinking a little bit more: the observable behavior is quite a good match for a BIOS-level hypervisor (a hardware trojan, in modern terminology), as it is likely sensitive to timing[1], does not appear more than once per VM during a boot cycle, seemingly does not depend on whether kvm-intel was reloaded once or twice (or more), and is not reproducible outside the domain of a single board model. If nobody has better suggestions to try, I'll take a couple of steps in the next few days: - extract the BIOS with an SPI programmer and compare it to the vendor's image, - extract the BMC image and compare it with the public version (should be easy as well), - try to analyze switch timings by writing sample code for bare hardware (the hint being that an L2 Linux guest may expose a larger execution-time difference against L1 on a host running a top-level hypervisor than on a supposedly 'non-infected' one), - try to analyze the binary BIOS code itself, though that can be VERY problematic; I am not even talking about the same possibility for the BMC. Sorry for posting such naive and stupid stuff on the public ml, but I am really out of clues as to what's happening there and why it is not reproducible anywhere else. 1. https://xakep.ru/2011/12/26/58104/ (Russian text, but can be read through g-translate without loss of detail)
Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008
On Sun, Mar 29, 2015 at 3:25 AM, Hugh Dickins hu...@google.com wrote: On Sat, 28 Mar 2015, Andrey Korolyov wrote: On Tue, Feb 24, 2015 at 3:12 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Feb 04, 2015 at 08:34:04PM +0400, Andrey Korolyov wrote: Hi, I've seen the problem quite a few times. Before spending more time on it, I'd like to have a quick check here to see if anyone ever saw the same problem? Hope it is a relevant question with this mail list. Jul 2 11:08:21 arno-3 kernel: [ 2165.078623] BUG: unable to handle kernel NULL pointer dereference at 0008 Jul 2 11:08:21 arno-3 kernel: [ 2165.078916] IP: [8118d0fa] copy_huge_page+0x8a/0x2a0 Jul 2 11:08:21 arno-3 kernel: [ 2165.079128] PGD 0 Jul 2 11:08:21 arno-3 kernel: [ 2165.079198] Oops: [#1] SMP Jul 2 11:08:21 arno-3 kernel: [ 2165.079319] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bridge stp llc ast ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea lp mei_me ioatdma ext2 parport mei shpchp dcdbas joydev mac_hid lpc_ich acpi_pad wmi hid_generic usbhid hid ixgbe igb dca i2c_algo_bit ahci ptp libahci mdio pps_core Jul 2 11:08:21 arno-3 kernel: [ 2165.081090] CPU: 19 PID: 3494 Comm: qemu-system-x86 Not tainted 3.11.0-15-generic #25~precise1-Ubuntu Jul 2 11:08:21 arno-3 kernel: [ 2165.081424] Hardware name: Dell Inc. PowerEdge C6220 II/09N44V, BIOS 2.0.3 07/03/2013 Jul 2 11:08:21 arno-3 kernel: [ 2165.081705] task: 88102675 ti: 881026056000 task.ti: 881026056000 Jul 2 11:08:21 arno-3 kernel: [ 2165.081973] RIP: 0010:[8118d0fa] [8118d0fa] copy_huge_page+0x8a/0x2a0 Hello, sorry for possible top-posting, the same issue appears on at least 3.10 LTS series. The original thread is at http://marc.info/?l=kvmm=14043742300901. Andrey, I am unable to access the URL above? 
The necessary components for the failure to reappear are a single running kvm guest and mounted large thp: hugepagesz=1G (seemingly the same as in the initial report). With default 2M pages everything is working well, the same for 3.18 with 1G THP. Are there any obvious clues for the issue? Thanks! Hello, Marcelo, sorry, I've missed your reply in time. The working link, for example, is http://www.spinics.net/lists/linux-mm/msg75658.html. The reproducer is very simple: you need 1G THP and a mounted hugetlbfs. What is interesting, if the guest is backed by THP like '-object memory-backend-file,id=mem,size=1G,mem-path=/hugepages,share=on', the failure is less likely to occur. I think you're mistaken when you write of 1G THP: although hugetlbfs can support 1G hugepages, we don't support that size with Transparent Huge Pages. But you are very appositely mistaken: copy_huge_page() used to make the same mistake, and Dave Hansen fixed it back in v3.13, but the fix never went to the stable trees. Your report was on an Ubuntu 3.11.0-15 kernel: I think Ubuntu have discontinued their 3.11-stable kernel series, but 3.10-longterm and 3.12-longterm would benefit from including this fix. I haven't tried patching and building and testing it there, but it looks reasonable. Hugh commit 30b0a105d9f7141e4cbf72ae5511832457d89788 Author: Dave Hansen dave.han...@linux.intel.com Date: Thu Nov 21 14:31:58 2013 -0800 mm: thp: give transparent hugepage code a separate copy_page Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages: if (PageHuge(page) || PageTransHuge(page)) copy_huge_page(newpage, page); So, yay for code reuse. But: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); and a non-hugetlbfs page has no page_hstate(). This works 99% of the time because page_hstate() determines the hstate from the page order alone. Since the page order of a THP page matches the default hugetlbfs page order, it works.
But, if you change the default huge page size on the boot command-line (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate so page_hstate() returns null and copy_huge_page() oopses pretty fast since copy_huge_page() dereferences the hstate: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) { ... Mel noticed that the migration code is really the only user of these functions. This moves all the copy code over to migrate.c and makes copy_huge_page() work for THP by checking for it explicitly
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Tue, Mar 31, 2015 at 4:45 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-30 22:32+0300, Andrey Korolyov: On Mon, Mar 30, 2015 at 9:56 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-27 13:16+0300, Andrey Korolyov: On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das b...@redhat.com wrote: Radim Krčmář rkrc...@redhat.com writes: I second Bandan -- checking that it reproduces on other machine would be great for sanity :) (Although a bug in our APICv is far more likely.) If it's APICv related, a run without apicv enabled could give more hints. Your devices not getting reset hypothesis makes the most sense to me, maybe the timer vector in the error message is just one part of the whole story. Another misbehaving interrupt from the dark comes in at the same time and leads to a double fault. Default trace (APICv enabled, first reboot introduced the issue): http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz The relevant part is here, prefixed with qemu-system-x86-4180 [002] 697.111550: kvm_exit: reason CR_ACCESS rip 0xd272 info 0 0 kvm_cr: cr_write 0 = 0x10 kvm_mmu_get_page: existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync kvm_entry:vcpu 0 kvm_emulate_insn: f:d275: ea 7a d2 00 f0 kvm_emulate_insn: f:d27a: 2e 0f 01 1e f0 6c kvm_emulate_insn: f:d280: 31 c0 kvm_emulate_insn: f:d282: 8e e0 kvm_emulate_insn: f:d284: 8e e8 kvm_emulate_insn: f:d286: 8e c0 kvm_emulate_insn: f:d288: 8e d8 kvm_emulate_insn: f:d28a: 8e d0 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xd28f info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0x8dd0 info 184 0 kvm_page_fault: address f8dd0 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8dd0 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0x76d6 info 184 0 kvm_page_fault: address f76d6 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0x76d6 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason PENDING_INTERRUPT rip 0xd331 info 0 0 
kvm_inj_virq: irq 8 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0xfea5 info 184 0 kvm_page_fault: address ffea5 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0xe990 info 184 0 kvm_page_fault: address fe990 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xe990 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EXCEPTION_NMI rip 0xd334 info 0 8b0d kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17) Trace without APICv (three reboots, just to make sure to hit the problematic condition of supposed DF, as it still have not one hundred percent reproducibility): http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz The trace here contains a well matching excerpt, just instead of the EXCEPTION_NMI, it does 169.905098: kvm_exit: reason EPT_VIOLATION rip 0xd334 info 181 0 169.905102: kvm_page_fault: address feffd066 error_code 181 and works. Page fault says we tried to read 0xfeffd066 -- probably IOPB of TSS. (I guess it is pre-fetch for following IO instruction.) Nothing strikes me when looking at it, but some APICv boots don't fail, so it would be interesting to compare them ... hosts's 0xf6 interrupt (IRQ_WORK_VECTOR) is a possible source of races. (We could look more closely. It is fired too often for my liking as well.) Thanks Radim, http://xdel.ru/downloads/kvm-e5v2-issue/no-fail-with-apicv.dat.gz The related bits looks the same as with enable_apicv=0 for me. 
Yeah, qemu-system-x86-4201 [007] 159.297337: kvm_exit: reason CR_ACCESS rip 0xd272 info 0 0 kvm_cr: cr_write 0 = 0x10 kvm_mmu_get_page: existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync kvm_entry:vcpu 0 kvm_emulate_insn: f:d275: ea 7a d2 00 f0 kvm_emulate_insn: f:d27a: 2e 0f 01 1e f0 6c kvm_emulate_insn: f:d280: 31 c0 kvm_emulate_insn: f:d282: 8e e0 kvm_emulate_insn: f:d284: 8e e8 kvm_emulate_insn: f:d286: 8e c0 kvm_emulate_insn: f:d288: 8e d8 kvm_emulate_insn: f:d28a: 8e d0
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Tue, Mar 31, 2015 at 7:45 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-31 17:56+0300, Andrey Korolyov: Chasing the culprit this way could take a long time, so a new tracepoint that shows if 0xef is set on entry would let us guess the bug faster ... Please provide a failing trace with the following patch: Thanks, please see below: http://xdel.ru/downloads/kvm-e5v2-issue/new-tracepoint-fail-with-apicv.dat.gz qemu-system-x86-4022 [006] 255.915978: kvm_entry:vcpu 0 kvm_emulate_insn: f:d275: ea 7a d2 00 f0 kvm_emulate_insn: f:d27a: 2e 0f 01 1e f0 6c kvm_emulate_insn: f:d280: 31 c0 kvm_emulate_insn: f:d282: 8e e0 kvm_emulate_insn: f:d284: 8e e8 kvm_emulate_insn: f:d286: 8e c0 kvm_emulate_insn: f:d288: 8e d8 kvm_emulate_insn: f:d28a: 8e d0 kvm_entry:vcpu 0 kvm_0xef: irr clear, isr clear, vmcs 0x0 kvm_exit: reason EPT_VIOLATION rip 0x8dd0 info 184 0 kvm_page_fault: address f8dd0 error_code 184 kvm_entry:vcpu 0 kvm_0xef: irr clear, isr clear, vmcs 0x0 kvm_exit: reason EPT_VIOLATION rip 0x76d6 info 184 0 kvm_page_fault: address f76d6 error_code 184 kvm_entry:vcpu 0 kvm_0xef: irr clear, isr clear, vmcs 0x0 kvm_exit: reason EXCEPTION_NMI rip 0xd331 info 0 8b0d kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17) Ok, nothing obvious here either ... I've desperately added all information I know about. Please run it again, thanks. (The patch has to be applied instead of the previous one.) 
---

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 7c7bc8bef21f..f986636ad9d0 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -742,6 +742,41 @@ TRACE_EVENT(kvm_emulate_insn,
 #define trace_kvm_emulate_insn_start(vcpu) trace_kvm_emulate_insn(vcpu, 0)
 #define trace_kvm_emulate_insn_failed(vcpu) trace_kvm_emulate_insn(vcpu, 1)
 
+TRACE_EVENT(kvm_0xef,
+	TP_PROTO(bool irr, bool isr, u32 info, bool on, bool pir, u16 status),
+	TP_ARGS(irr, isr, info, on, pir, status),
+
+	TP_STRUCT__entry(
+		__field(bool, irr )
+		__field(bool, isr )
+		__field(u32,  info)
+		__field(bool, on  )
+		__field(bool, pir )
+		__field(u8,   rvi )
+		__field(u8,   svi )
+	),
+
+	TP_fast_assign(
+		__entry->irr = irr;
+		__entry->isr = isr;
+		__entry->info = info;
+		__entry->on = on;
+		__entry->pir = pir;
+		__entry->rvi = status & 0xff;
+		__entry->svi = status >> 8;
+	),
+
+	TP_printk("irr %s, isr %s, info 0x%x, on %s, pir %s, rvi 0x%x, svi 0x%x",
+		__entry->irr ? "set" : "clear",
+		__entry->isr ? "set" : "clear",
+		__entry->info,
+		__entry->on ? "set" : "clear",
+		__entry->pir ? "set" : "clear",
+		__entry->rvi,
+		__entry->svi)
+	);
+
 TRACE_EVENT(
	vcpu_match_mmio,
	TP_PROTO(gva_t gva, gpa_t gpa, bool write, bool gpa_match),
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index eee63dc33d89..b461edc93d53 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5047,6 +5047,25 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+#define VEC_POS(v) ((v) & (32 - 1))
+#define REG_POS(v) (((v) >> 5) << 4)
+static inline int apic_test_vector(int vec, void *bitmap)
+{
+	return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
+}
+
+static inline void random_trace(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	trace_kvm_0xef(apic_test_vector(0xef, vcpu->arch.apic->regs + APIC_IRR),
+		       apic_test_vector(0xef, vcpu->arch.apic->regs + APIC_ISR),
+		       vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
+		       test_bit(POSTED_INTR_ON, (unsigned long *)&vmx->pi_desc.control),
+		       test_bit(0xef, (unsigned long *)vmx->pi_desc.pir),
+		       vmcs_read16(GUEST_INTR_STATUS));
+}
+
 static int handle_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -5077,6 +5096,8 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
+	random_trace(vcpu);
+
 	error_code = 0;
 	if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
 		error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
@@ -8143,6 +8164,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vmx->emulation_required)
 		return;
 
+	random_trace(vcpu);
+
 	if (vmx
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Tue, Mar 31, 2015 at 9:04 PM, Bandan Das b...@redhat.com wrote: Bandan Das b...@redhat.com writes: Andrey Korolyov and...@xdel.ru writes: ... http://xdel.ru/downloads/kvm-e5v2-issue/another-tracepoint-fail-with-apicv.dat.gz Something a bit more interesting, but the mess is happening just *after* NMI firing. What happens if NMI is turned off on the host ? Sorry, I meant the watchdog.. Thanks, everything goes well (as it probably should go there): http://xdel.ru/downloads/kvm-e5v2-issue/apicv-enabled-nmi-disabled.dat.gz
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Mon, Mar 30, 2015 at 9:56 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-27 13:16+0300, Andrey Korolyov: On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das b...@redhat.com wrote: Radim Krčmář rkrc...@redhat.com writes: I second Bandan -- checking that it reproduces on other machine would be great for sanity :) (Although a bug in our APICv is far more likely.) If it's APICv related, a run without apicv enabled could give more hints. Your devices not getting reset hypothesis makes the most sense to me, maybe the timer vector in the error message is just one part of the whole story. Another misbehaving interrupt from the dark comes in at the same time and leads to a double fault. Default trace (APICv enabled, first reboot introduced the issue): http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz The relevant part is here, prefixed with qemu-system-x86-4180 [002] 697.111550: kvm_exit: reason CR_ACCESS rip 0xd272 info 0 0 kvm_cr: cr_write 0 = 0x10 kvm_mmu_get_page: existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync kvm_entry:vcpu 0 kvm_emulate_insn: f:d275: ea 7a d2 00 f0 kvm_emulate_insn: f:d27a: 2e 0f 01 1e f0 6c kvm_emulate_insn: f:d280: 31 c0 kvm_emulate_insn: f:d282: 8e e0 kvm_emulate_insn: f:d284: 8e e8 kvm_emulate_insn: f:d286: 8e c0 kvm_emulate_insn: f:d288: 8e d8 kvm_emulate_insn: f:d28a: 8e d0 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xd28f info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0x8dd0 info 184 0 kvm_page_fault: address f8dd0 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8dd0 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0x76d6 info 184 0 kvm_page_fault: address f76d6 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0x76d6 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason PENDING_INTERRUPT rip 0xd331 info 0 0 kvm_inj_virq: irq 8 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: 
reason EPT_VIOLATION rip 0xfea5 info 184 0 kvm_page_fault: address ffea5 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0xe990 info 184 0 kvm_page_fault: address fe990 error_code 184 kvm_entry:vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xe990 info 0 80f6 kvm_entry:vcpu 0 kvm_exit: reason EXCEPTION_NMI rip 0xd334 info 0 8b0d kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17) Trace without APICv (three reboots, just to make sure to hit the problematic condition of supposed DF, as it still have not one hundred percent reproducibility): http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz The trace here contains a well matching excerpt, just instead of the EXCEPTION_NMI, it does 169.905098: kvm_exit: reason EPT_VIOLATION rip 0xd334 info 181 0 169.905102: kvm_page_fault: address feffd066 error_code 181 and works. Page fault says we tried to read 0xfeffd066 -- probably IOPB of TSS. (I guess it is pre-fetch for following IO instruction.) Nothing strikes me when looking at it, but some APICv boots don't fail, so it would be interesting to compare them ... hosts's 0xf6 interrupt (IRQ_WORK_VECTOR) is a possible source of races. (We could look more closely. It is fired too often for my liking as well.) Thanks Radim, http://xdel.ru/downloads/kvm-e5v2-issue/no-fail-with-apicv.dat.gz (missed right button in mailer previously) The related bits looks the same as with enable_apicv=0 for me.
Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008
On Tue, Feb 24, 2015 at 3:12 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Feb 04, 2015 at 08:34:04PM +0400, Andrey Korolyov wrote: Hi, I've seen the problem quite a few times. Before spending more time on it, I'd like to have a quick check here to see if anyone ever saw the same problem? Hope it is a relevant question with this mail list. Jul 2 11:08:21 arno-3 kernel: [ 2165.078623] BUG: unable to handle kernel NULL pointer dereference at 0008 Jul 2 11:08:21 arno-3 kernel: [ 2165.078916] IP: [8118d0fa] copy_huge_page+0x8a/0x2a0 Jul 2 11:08:21 arno-3 kernel: [ 2165.079128] PGD 0 Jul 2 11:08:21 arno-3 kernel: [ 2165.079198] Oops: [#1] SMP Jul 2 11:08:21 arno-3 kernel: [ 2165.079319] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bridge stp llc ast ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea lp mei_me ioatdma ext2 parport mei shpchp dcdbas joydev mac_hid lpc_ich acpi_pad wmi hid_generic usbhid hid ixgbe igb dca i2c_algo_bit ahci ptp libahci mdio pps_core Jul 2 11:08:21 arno-3 kernel: [ 2165.081090] CPU: 19 PID: 3494 Comm: qemu-system-x86 Not tainted 3.11.0-15-generic #25~precise1-Ubuntu Jul 2 11:08:21 arno-3 kernel: [ 2165.081424] Hardware name: Dell Inc. PowerEdge C6220 II/09N44V, BIOS 2.0.3 07/03/2013 Jul 2 11:08:21 arno-3 kernel: [ 2165.081705] task: 88102675 ti: 881026056000 task.ti: 881026056000 Jul 2 11:08:21 arno-3 kernel: [ 2165.081973] RIP: 0010:[8118d0fa] [8118d0fa] copy_huge_page+0x8a/0x2a0 Hello, sorry for possible top-posting, the same issue appears on at least 3.10 LTS series. The original thread is at http://marc.info/?l=kvm&m=14043742300901. Andrey, I am unable to access the URL above?
The necessary components for the failure to reappear are a single running kvm guest and mounted large thp: hugepagesz=1G (seemingly the same as in the initial report). With default 2M pages everything is working well, the same for 3.18 with 1G THP. Are there any obvious clues for the issue? Thanks! Hello, Marcelo, sorry, I've missed your reply in time. The working link, for example, is http://www.spinics.net/lists/linux-mm/msg75658.html. The reproducer is very simple: you need 1G THP and a mounted hugetlbfs. What is interesting, if the guest is backed by THP like '-object memory-backend-file,id=mem,size=1G,mem-path=/hugepages,share=on', the failure is less likely to occur.
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das b...@redhat.com wrote: Radim Krčmář rkrc...@redhat.com writes: 2015-03-26 21:24+0300, Andrey Korolyov: On Thu, Mar 26, 2015 at 8:40 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-26 20:08+0300, Andrey Korolyov: KVM internal error. Suberror: 2 extra data[0]: 80ef extra data[1]: 8b0d Btw. does this part ever change? I see that first report had: KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d Was that a Windows guest by any chance? Yes, exactly, different extra data output was from a Windows VMs. Windows uses vector 0xd1 for timer interrupts. I second Bandan -- checking that it reproduces on other machine would be great for sanity :) (Although a bug in our APICv is far more likely.) If it's APICv related, a run without apicv enabled could give more hints. Your devices not getting reset hypothesis makes the most sense to me, maybe the timer vector in the error message is just one part of the whole story. Another misbehaving interrupt from the dark comes in at the same time and leads to a double fault. Default trace (APICv enabled, first reboot introduced the issue): http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz Trace without APICv (three reboots, just to make sure to hit the problematic condition of the supposed DF, as it still does not have one-hundred-percent reproducibility): http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz It would be great of course to reproduce this somewhere else, otherwise this whole thread may end in fixing a bug which exists only on my particular platform. Right now I have no hardware except a lot of well-known (in terms of existing issues) Supermicro boards of one model.
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 26, 2015 at 11:40 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-26 21:24+0300, Andrey Korolyov: On Thu, Mar 26, 2015 at 8:40 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-26 20:08+0300, Andrey Korolyov: KVM internal error. Suberror: 2 extra data[0]: 80ef extra data[1]: 8b0d Btw. does this part ever change? I see that first report had: KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d Was that a Windows guest by any chance? Yes, exactly, the different extra data output was from Windows VMs. Windows uses vector 0xd1 for timer interrupts. I second Bandan -- checking that it reproduces on another machine would be great for sanity :) (Although a bug in our APICv is far more likely.) Trace with new bits: KVM internal error. Suberror: 2 extra data[0]: 80ef extra data[1]: 8b0d extra data[2]: 77b EAX= EBX= ECX= EDX= ESI= EDI= EBP= ESP=6d24 EIP=d331 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6cb0 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f ba 2d d4 fe fb 3f
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 26, 2015 at 5:47 AM, Bandan Das b...@redhat.com wrote: Hi Andrey, Andrey Korolyov and...@xdel.ru writes: On Mon, Mar 16, 2015 at 10:17 PM, Andrey Korolyov and...@xdel.ru wrote: For now, it looks like the bug has a mixed Murphy-Heisenberg nature, as its appearance is very rare (compared to the number of actual launches) and most probably bound to the physical characteristics of my production nodes. As soon as I reach any reproducible path for a regular workstation environment, I`ll let everyone know. Also I am starting to think that the issue may belong to the particular motherboard firmware revision, despite the fact that the CPU microcode is the same everywhere. I will take the risk and say this - could it be a processor bug ? :) Hello everyone, I`ve managed to reproduce this issue *deterministically* with the latest seabios with the smp fix and 3.18.3. The error occurs just *once* per VM until the hypervisor reboots, at least in my setup; this is definitely crazy... - launch two VMs (Centos 7 in my case), - wait a little while they are booting, - attach serial console (I am using virsh list for this exact purpose), - issue acpi reboot or reset, does not matter, - VM always hangs at boot, most times with sgabios initialization string printed out [1], but sometimes it hangs a bit later [2], - no matter how many times I try to relaunch the QEMU afterwards, the issue does not appear on a VM which experienced the problem once; - trace and sample args can be seen in [3] and [4] respectively. My system is a Dell R720 dual socket which has 2620v2s. I tried your setup but couldn't reproduce (my qemu cmdline isn't exactly the same as yours), although, if you could simplify your command line a bit, I can try again. Bandan 1) Google, Inc. Serial Graphics Adapter 06/11/14 SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (pbuilder@zorak) Wed Jun 11 05:57:34 UTC 2014 Term: 211x62 4 0 2) Google, Inc.
Serial Graphics Adapter 06/11/14 SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (pbuilder@zorak) Wed Jun 11 05:57:34 UTC 2014 Term: 211x62 4 0 [...empty screen...] SeaBIOS (version 1.8.1-20150325_230423-testnode) Machine UUID 3c78721f-7317-4f85-bcbe-f5ad46d293a1 iPXE (http://ipxe.org) 00:02.0 C100 PCI2.10 PnP PMM+3FF95BA0+3FEF5BA0 C10 3) KVM internal error. Suberror: 2 extra data[0]: 80ef extra data[1]: 8b0d EAX= EBX= ECX= EDX= ESI= EDI= EBP= ESP=6d2c EIP=d331 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6cb0 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f ba 2d d4 fe fb 3f 4) /usr/bin/qemu-system-x86_64 -name centos71 -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi -bios /usr/share/seabios/bios.bin -m 1024 -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 -uuid 3c78721f-7317-4f85-bcbe-f5ad46d293a1 -nographic -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos71.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=rbd:dev-rack2/centos7-1.raw:id=qemukvm:key=XX:auth_supported=cephx\;none:mon_host=10.6.0.1\:6789\;10.6.0.3\:6789\;10.6.0.4\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/centos71.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1 -msg timestamp=on Hehe, 2.2 works just perfectly but 2.1 doesn`t. I`ll bisect the issue in the next couple of days and post the right commit (though as far as I can remember, none of the commits between 2.1 and 2.2 fixes a similar issue on purpose). I`ve attached a reference xml to simplify playing with libvirt if anyone is willing to do so. domain type='kvm
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor ke...@koconnor.net wrote: On Thu, Mar 26, 2015 at 04:58:07PM +0100, Radim Krčmář wrote: 2015-03-25 20:05-0400, Kevin O'Connor: On Thu, Mar 26, 2015 at 02:35:58AM +0300, Andrey Korolyov wrote: Thanks, strangely the reboot is always failing now and always reaching the seabios greeting. Maybe the prints straightened up a race (e.g. it is not an int19 problem really). object file part: d331 irq_trampoline_0x19: irq_trampoline_0x19(): /root/seabios-1.8.1/src/romlayout.S:195 d331: cd 19 int$0x19 d333: cb lretw [...] Jump to int19 (vector=f000e6f2) Thanks. So, it dies on the int $0x19 instruction itself. The vector looks correct and I don't see anything in the cpu register state that looks wrong. Maybe one of the kvm developers will have an idea what could cause a fault there. The place agrees with the cd 19 cb part of KVM error output. Suberror 2 means that we were interrupted while delivering a vector; here it is dissected: (delivering 'vect_info') vect_info (extra data[0]: 80ef) - vector 0xef - INTR_TYPE_EXT_INTR (0x000) - no error code (0x000) - valid (0x8000) intr_info (extra data[1]: 8b0d) - #GP (0x0d) - INTR_TYPE_HARD_EXCEPTION (0x300) - error code on stack (0x800) [Hunk at the bottom exposes it.] - valid (0x8000) Thanks for the background info. Notice the 0xef. My best hypothesis so far is that we fail at resetting devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted. (The bug happens at the first place that enables interrupts.) FYI, the int $0x19 isn't the first place SeaBIOS will enable interrupts. Each screen print (every character in the seabios banner and uuid string) will call the vga bios (int $0x10) with irqs enabled (see output.c:screenc). Also, SeaBIOS loads a default vector (f000:ff53) at 0xef which does a simple iretw. Things that are unusual about the int $0x19 call: - it is likely the first place that the cpu is transitioned into 16bit real mode as opposed to big real mode.
(That is, the first place interrupts are enabled with the segment limits set to 0xffff.) - it's right after the fw/shadow.c:make_bios_readonly() call, which attempts to configure the memory at 0xf0000-0x100000 as read-only. That code also issues a wbinvd() call. I'm not sure if the crash always happens at the int $0x19 location though. Andrey, does the crash always happen with EIP=d331 and/or with Code=... cd 19? -Kevin There are also rare occurrences for d3f9 (in the middle of ep) and d334 ep (less than one tenth of events for both). I`ll post a sample event capture with and without Radim`s proposed patch maybe today or tomorrow. /root/seabios-1.8.1/src/romlayout.S:289 d3eb: 66 50 pushl %eax d3ed: 66 51 pushl %ecx d3ef: 66 52 pushl %edx d3f1: 66 53 pushl %ebx d3f3: 66 55 pushl %ebp d3f5: 66 56 pushl %esi d3f7: 66 57 pushl %edi d3f9: 06 pushw %es d3fa: 1e pushw %ds d334 irq_trampoline_0x1c: irq_trampoline_0x1c(): /root/seabios-1.8.1/src/romlayout.S:196 d334: cd 1c int$0x1c d336: cb lretw
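Radim's dissection of the error words can be reproduced mechanically. A small sketch (not from the thread) that decodes a VMX interruption-information field; the extra data values are written out with their zeros restored, i.e. 0x800000ef and 0x80000b0d, on the assumption that the dump stripped them:

```shell
# Decode a VMX interruption-information field (Intel SDM layout):
# bits 0-7 vector, bits 8-10 type, bit 11 error-code-valid, bit 31 valid.
decode_intr_info() {
    local info=$(( $1 ))
    printf 'vector=0x%02x type=%d errcode=%d valid=%d\n' \
        $(( info & 0xff )) \
        $(( (info >> 8) & 0x7 )) \
        $(( (info >> 11) & 0x1 )) \
        $(( (info >> 31) & 0x1 ))
}

decode_intr_info 0x800000ef  # vect_info: vector 0xef, type 0 = external interrupt
decode_intr_info 0x80000b0d  # intr_info: vector 0x0d = #GP, type 3 = hard exception
```

Type 0 corresponds to INTR_TYPE_EXT_INTR and type 3 to INTR_TYPE_HARD_EXCEPTION, matching the dissection above.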
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor ke...@koconnor.net wrote: On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote: On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor ke...@koconnor.net wrote: I'm not sure if the crash always happens at the int $0x19 location though. Andrey, does the crash always happen with EIP=d331 and/or with Code=... cd 19? There are also rare occurrences for d3f9 (in the middle of ep) and d334 ep (less than one tenth of events for both). I`ll post a sample event capture with and without Radim`s proposed patch maybe today or tomorrow. /root/seabios-1.8.1/src/romlayout.S:289 d3eb: 66 50 pushl %eax d3ed: 66 51 pushl %ecx d3ef: 66 52 pushl %edx d3f1: 66 53 pushl %ebx d3f3: 66 55 pushl %ebp d3f5: 66 56 pushl %esi d3f7: 66 57 pushl %edi d3f9: 06 pushw %es d3fa: 1e pushw %ds d334 irq_trampoline_0x1c: irq_trampoline_0x1c(): /root/seabios-1.8.1/src/romlayout.S:196 d334: cd 1c int$0x1c d336: cb lretw Thanks. The d334 looks very similar to the d331 report (code=cd 1c). That path could happen during post (big real mode) or immediately after post (real mode). The d3f9 report does not look like the others - interrupts are disabled there. If you still have the error logs, can you post the full kvm crash report for d3f9? Here you go: KVM internal error. Suberror: 2 extra data[0]: 80ef extra data[1]: 8b0d EAX=0003 EBX= ECX= EDX= ESI= EDI= EBP= ESP=6cd4 EIP=d3f9 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6e98 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 26, 2015 at 8:18 PM, Kevin O'Connor ke...@koconnor.net wrote: On Thu, Mar 26, 2015 at 08:08:52PM +0300, Andrey Korolyov wrote: On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor ke...@koconnor.net wrote: On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote: On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor ke...@koconnor.net wrote: I'm not sure if the crash always happens at the int $0x19 location though. Andrey, does the crash always happen with EIP=d331 and/or with Code=... cd 19? There are also rare occurrences for d3f9 (in the middle of ep) and d334 ep (less than one tenth of events for both). I`ll post a sample event capture with and without Radim`s proposed patch maybe today or tomorrow. /root/seabios-1.8.1/src/romlayout.S:289 d3eb: 66 50 pushl %eax d3ed: 66 51 pushl %ecx d3ef: 66 52 pushl %edx d3f1: 66 53 pushl %ebx d3f3: 66 55 pushl %ebp d3f5: 66 56 pushl %esi d3f7: 66 57 pushl %edi d3f9: 06 pushw %es d3fa: 1e pushw %ds d334 irq_trampoline_0x1c: irq_trampoline_0x1c(): /root/seabios-1.8.1/src/romlayout.S:196 d334: cd 1c int$0x1c d336: cb lretw Thanks. The d334 looks very similar to the d331 report (code=cd 1c). That path could happen during post (big real mode) or immediately after post (real mode). The d3f9 report does not look like the others - interrupts are disabled there. If you still have the error logs, can you post the full kvm crash report for d3f9? Here you go: Thanks. While we're at it, can you verify if all your reports are showing the cpu in real mode. That is, do they all have in the third column of the segment registers - as in: ES = 9300 That`s positive. [...] Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e KVM reports the code as int $0x10 here. Was it possible this report was from a different build of seabios (that had a different code layout)? Yep, sorry, I`ve mixed in logs from before the transition out of 1.7.5.
Interestingly, this int $0x10 is also in real-mode and not big real mode, so I think it would have occurred after post completed. -Kevin
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 26, 2015 at 8:40 PM, Radim Krčmář rkrc...@redhat.com wrote: 2015-03-26 20:08+0300, Andrey Korolyov: KVM internal error. Suberror: 2 extra data[0]: 80ef extra data[1]: 8b0d Btw. does this part ever change? I see that first report had: KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d Was that a Windows guest by any chance? Yes, exactly, the different extra data output was from Windows VMs. Thanks for clarifying things for your patch; I hadn`t looked at the vmx code yet and thought that it changed things.
Re: [Qemu-devel] E5-2620v2 - emulation stop error
- attach serial console (I am using virsh list for this exact purpose), virsh console of course, sorry
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Wed, Mar 25, 2015 at 11:54 PM, Kevin O'Connor ke...@koconnor.net wrote: On Wed, Mar 25, 2015 at 11:43:31PM +0300, Andrey Korolyov wrote: On Mon, Mar 16, 2015 at 10:17 PM, Andrey Korolyov and...@xdel.ru wrote: For now, it looks like bug have a mixed Murphy-Heisenberg nature, as it appearance is very rare (compared to the number of actual launches) and most probably bounded to the physical characteristics of my production nodes. As soon as I reach any reproducible path for a regular workstation environment, I`ll let everyone know. Also I am starting to think that issue can belong to the particular motherboard firmware revision, despite fact that the CPU microcode is the same everywhere. Hello everyone, I`ve managed to reproduce this issue *deterministically* with latest seabios with smp fix and 3.18.3. The error occuring just *once* per vm until hypervisor reboots, at least in my setup, this is definitely crazy... - launch two VMs (Centos 7 in my case), - wait a little while they are booting, - attach serial console (I am using virsh list for this exact purpose), - issue acpi reboot or reset, does not matter, - VM always hangs at boot, most times with sgabios initialization string printed out [1], but sometimes it hangs a bit later [2], - no matter how many times I try to relaunch the QEMU afterwards, the issue does not appear on VM which experienced problem once; - trace and sample args can be seen in [3] and [4] respectively. Can you add something like: -chardev file,path=seabioslog.`date +%s`,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios to the qemu command line and forward the resulting log from both a succesful boot and a failed one? -Kevin Of course, logs are attached. reboot.failed Description: Binary data reboot.succeeded Description: Binary data
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 26, 2015 at 2:02 AM, Kevin O'Connor ke...@koconnor.net wrote: On Thu, Mar 26, 2015 at 01:31:11AM +0300, Andrey Korolyov wrote: On Wed, Mar 25, 2015 at 11:54 PM, Kevin O'Connor ke...@koconnor.net wrote: Can you add something like: -chardev file,path=seabioslog.`date +%s`,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios to the qemu command line and forward the resulting log from both a successful boot and a failed one? -Kevin Of course, logs are attached. Thanks. From a diff of the two logs: 4: 3ffe - 4000 = 2 RESERVED 5: feffc000 - ff00 = 2 RESERVED 6: fffc - 0001 = 2 RESERVED -enter handle_19: - NULL -Booting from Hard Disk... -Booting from :7c00 So, it got most of the way through the reboot - there's only a few function calls between the e820 map being dumped and the handle_19 call. The fault also seems to show it stopped in the BIOS in 16bit mode: EIP=d331 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 Can you add the patch below, force the fault, and forward the log. Also, if you recreate the failure can you take the EIP from the fault (eg, d331) and search for the corresponding function in the output of: objdump -m i386 -M i8086 -M suffix -ldr out/rom16.o | less (That is, search for "d331:".) If that's too much of a pain, just send me a direct email with the seabios out/rom16.o file and the new EIP of the fault. (I need the out/rom16.o that was used to build the version of SeaBIOS that faulted.) -Kevin diff --git a/src/post.c b/src/post.c index 9ea5620..bbd19c0 100644 --- a/src/post.c +++ b/src/post.c @@ -185,21 +185,24 @@ prepareboot(void) pmm_prepboot(); malloc_prepboot(); memmap_prepboot(); +dprintf(1, "a\n"); HaveRunPost = 2; // Setup bios checksum. BiosChecksum -= checksum((u8*)BUILD_BIOS_ADDR, BUILD_BIOS_SIZE); +dprintf(1, "b\n"); } // Begin the boot process by invoking an int0x19 in 16bit mode. void VISIBLE32FLAT startBoot(void) { +dprintf(1, "e\n"); // Clear low-memory allocations (required by PMM spec).
memset((void*)BUILD_STACK_ADDR, 0, BUILD_EBDA_MINIMUM - BUILD_STACK_ADDR); -dprintf(3, "Jump to int19\n"); +dprintf(1, "Jump to int19 (vector=%x)\n", GET_IVT(0x19).segoff); struct bregs br; memset(&br, 0, sizeof(br)); br.flags = F_IF; @@ -239,9 +242,11 @@ maininit(void) // Prepare for boot. prepareboot(); +dprintf(1, "c\n"); // Write protect bios memory. make_bios_readonly(); +dprintf(1, "d\n"); // Invoke int 19 to start boot process. startBoot(); } Thanks, strangely the reboot is always failing now and always reaching the seabios greeting. Maybe the prints straightened up a race (e.g. it is not an int19 problem really). object file part: d331 irq_trampoline_0x19: irq_trampoline_0x19(): /root/seabios-1.8.1/src/romlayout.S:195 d331: cd 19 int$0x19 d333: cb lretw reboot.failed Description: Binary data
Re: [Qemu-devel] E5-2620v2 - emulation stop error
For now, it looks like the bug has a mixed Murphy-Heisenberg nature, as its appearance is very rare (compared to the number of actual launches) and most probably bound to the physical characteristics of my production nodes. As soon as I reach any reproducible path for a regular workstation environment, I`ll let everyone know. Also I am starting to think that the issue may belong to the particular motherboard firmware revision, despite the fact that the CPU microcode is the same everywhere.
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Wed, Mar 11, 2015 at 10:59 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Andrey Korolyov (and...@xdel.ru) wrote: On Wed, Mar 11, 2015 at 10:33 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Kevin O'Connor (ke...@koconnor.net) wrote: On Wed, Mar 11, 2015 at 02:45:31PM -0400, Kevin O'Connor wrote: On Wed, Mar 11, 2015 at 02:40:39PM -0400, Kevin O'Connor wrote: For what it's worth, I can't seem to trigger the problem if I move the cmos read above the SIPI/LAPIC code (see patch below). Ugh! That's a seabios bug. Main processor modifies the rtc index (rtc_read()) while APs try to clear the NMI bit by modifying the rtc index (romlayout.S:transition32). I'll put together a fix. The seabios patch below resolves the issue for me. Thanks! Looks good here. Andrey, Paolo, Bandan: Does it fix it for you as well? Thanks Kevin, Dave, I`m afraid that I`m hitting something different not only because different suberror code but also because of mine version of seabios - I am using 1.7.5 and corresponding code in the proposed patch looks different - there is no smp-related code patch is about of. Those mentioned devices went to production successfully and I`m afraid I cannot afford playing on them anymore, even if I re-trigger the issue with patched 1.8.1-rc, there is no way to switch to a different kernel and retest due to specific conditions of this production suite. I`ve ordered a pair of new shoes^W 2620v2-s which should arrive to me next Well I was testing on a pair of 'E5-2620 v2'; but as you saw my test case was pretty simple. If you can suggest any flags I should add etc to the test I'd be happy to give it a go. 
Dave Here is my launch string: qemu-system-x86_64 -enable-kvm -name vmtest -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi -m 512 -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 -numa node,nodeid=0,cpus=0-11,mem=512 -nographic -no-user-config -nodefaults -device sga -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -m 512,slots=31,maxmem=16384M -object memory-backend-ram,id=mem0,size=512M -device pc-dimm,id=dimm0,node=0,memdev=mem0 I omitted the disk backend in this example, but there is a chance that my problem is not reproducible without some calls made explicitly by a bootloader (not sure what to say for mid-runtime failures). Monday, so I`ll be able to test a) against 1.8.0-release, b) against patched bios code, c) reproduce the initial error on master/3.19 (maybe I`ll take them before the weekend by going into this computer shop in person). Until then, I have a very deep feeling that my issue is not there :) Also I became very curious about how a lack of the IDT feature may completely eliminate the issue for me; the only possible explanation is a clock-related race, which is a kinda stupid suggestion and unlikely to exist in nature. Thanks again to everyone for the thorough testing and ideas! -Kevin --- a/src/romlayout.S +++ b/src/romlayout.S @@ -22,7 +22,8 @@ // %edx = return location (in 32bit mode) // Clobbers: ecx, flags, segment registers, cr0, idt/gdt DECLFUNC transition32 -transition32_for_smi: +transition32_nmi_off: +// transition32 when NMI and A20 are already initialized movl %eax, %ecx jmp 1f transition32: @@ -205,7 +206,7 @@ __farcall16: entry_smi: // Transition to 32bit mode.
movl $1f + BUILD_BIOS_ADDR, %edx -jmp transition32_for_smi +jmp transition32_nmi_off .code32 1: movl $BUILD_SMM_ADDR + 0x8000, %esp calll _cfunc32flat_handle_smi - BUILD_BIOS_ADDR @@ -216,8 +217,10 @@ entry_smi: DECLFUNC entry_smp entry_smp: // Transition to 32bit mode. +cli +cld movl $2f + BUILD_BIOS_ADDR, %edx -jmp transition32 +jmp transition32_nmi_off .code32 // Acquire lock and take ownership of shared stack 1: rep ; nop -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
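To make the race Kevin describes easier to follow, here is a minimal Python model (not SeaBIOS code; the register numbers and contents are invented for illustration) of why sharing the CMOS/RTC index port 0x70 between the BSP's rtc_read() and the APs' NMI-disable write corrupts reads:

```python
# Illustrative model: port 0x70 holds the CMOS/RTC register index in
# bits 0-6 and the NMI-disable flag in bit 7. Both the BSP's rtc_read()
# and the APs' 32-bit transition code write this single register.

NMI_DISABLE = 0x80

class CmosState:
    def __init__(self):
        self.index = 0                        # last index written to port 0x70
        self.regs = {0x0B: 0x02, 0x0F: 0x00}  # a few made-up CMOS registers

    def outb_70(self, val):
        self.index = val & 0x7F               # hardware latches the index bits

    def inb_71(self):
        return self.regs.get(self.index, 0xFF)

def rtc_read(cmos, reg):
    # Uninterleaved read: select the register, then fetch its value.
    cmos.outb_70(reg | NMI_DISABLE)
    return cmos.inb_71()

# The race: the AP's NMI-off write lands between the BSP's index write
# and its data read, so the BSP reads whichever register the AP selected.
cmos = CmosState()
cmos.outb_70(0x0B | NMI_DISABLE)   # BSP: select register 0x0B
cmos.outb_70(0x0F | NMI_DISABLE)   # AP: re-disable NMI, clobbering the index
val = cmos.inb_71()                # BSP: now reads register 0x0F, not 0x0B
print(hex(val))                    # 0x0 instead of the expected 0x2
```

The fix above avoids the interleaving entirely by having entry_smp jump to transition32_nmi_off, which skips the port-0x70 write on the APs.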
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Wed, Mar 11, 2015 at 10:33 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Kevin O'Connor (ke...@koconnor.net) wrote: On Wed, Mar 11, 2015 at 02:45:31PM -0400, Kevin O'Connor wrote: On Wed, Mar 11, 2015 at 02:40:39PM -0400, Kevin O'Connor wrote: For what it's worth, I can't seem to trigger the problem if I move the cmos read above the SIPI/LAPIC code (see patch below). Ugh! That's a seabios bug. Main processor modifies the rtc index (rtc_read()) while APs try to clear the NMI bit by modifying the rtc index (romlayout.S:transition32). I'll put together a fix. The seabios patch below resolves the issue for me. Thanks! Looks good here. Andrey, Paolo, Bandan: Does it fix it for you as well? Thanks Kevin, Dave, I`m afraid that I`m hitting something different not only because different suberror code but also because of mine version of seabios - I am using 1.7.5 and corresponding code in the proposed patch looks different - there is no smp-related code patch is about of. Those mentioned devices went to production successfully and I`m afraid I cannot afford playing on them anymore, even if I re-trigger the issue with patched 1.8.1-rc, there is no way to switch to a different kernel and retest due to specific conditions of this production suite. I`ve ordered a pair of new shoes^W 2620v2-s which should arrive to me next Monday, so I`ll be able to test a) against 1.8.0-release, b) against patched bios code, c) reproduce initial error on master/3.19 (may be I`ll take them before weekend by going into this computer shop in person). Until then, I have a very deep feeling that mine issue is not there :) Also I became very curious on how a lack of IDT feature may completely eliminate the issue appearance for me, the only possible explanation is a clock-related race which is kinda stupid suggestion and unlikely to exist in nature. Thanks again for everyone for throughout testing and ideas! 
-Kevin --- a/src/romlayout.S +++ b/src/romlayout.S @@ -22,7 +22,8 @@ // %edx = return location (in 32bit mode) // Clobbers: ecx, flags, segment registers, cr0, idt/gdt DECLFUNC transition32 -transition32_for_smi: +transition32_nmi_off: +// transition32 when NMI and A20 are already initialized movl %eax, %ecx jmp 1f transition32: @@ -205,7 +206,7 @@ __farcall16: entry_smi: // Transition to 32bit mode. movl $1f + BUILD_BIOS_ADDR, %edx -jmp transition32_for_smi +jmp transition32_nmi_off .code32 1: movl $BUILD_SMM_ADDR + 0x8000, %esp calll _cfunc32flat_handle_smi - BUILD_BIOS_ADDR @@ -216,8 +217,10 @@ entry_smi: DECLFUNC entry_smp entry_smp: // Transition to 32bit mode. +cli +cld movl $2f + BUILD_BIOS_ADDR, %edx -jmp transition32 +jmp transition32_nmi_off .code32 // Acquire lock and take ownership of shared stack 1: rep ; nop -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Thu, Mar 12, 2015 at 12:59 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Andrey Korolyov (and...@xdel.ru) wrote: On Wed, Mar 11, 2015 at 10:59 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Andrey Korolyov (and...@xdel.ru) wrote: On Wed, Mar 11, 2015 at 10:33 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Kevin O'Connor (ke...@koconnor.net) wrote: On Wed, Mar 11, 2015 at 02:45:31PM -0400, Kevin O'Connor wrote: On Wed, Mar 11, 2015 at 02:40:39PM -0400, Kevin O'Connor wrote: For what it's worth, I can't seem to trigger the problem if I move the cmos read above the SIPI/LAPIC code (see patch below). Ugh! That's a seabios bug. Main processor modifies the rtc index (rtc_read()) while APs try to clear the NMI bit by modifying the rtc index (romlayout.S:transition32). I'll put together a fix. The seabios patch below resolves the issue for me. Thanks! Looks good here. Andrey, Paolo, Bandan: Does it fix it for you as well? Thanks Kevin, Dave, I`m afraid that I`m hitting something different not only because different suberror code but also because of mine version of seabios - I am using 1.7.5 and corresponding code in the proposed patch looks different - there is no smp-related code patch is about of. Those mentioned devices went to production successfully and I`m afraid I cannot afford playing on them anymore, even if I re-trigger the issue with patched 1.8.1-rc, there is no way to switch to a different kernel and retest due to specific conditions of this production suite. I`ve ordered a pair of new shoes^W 2620v2-s which should arrive to me next Well I was testing on a pair of 'E5-2620 v2'; but as you saw my test case was pretty simple. If you can suggest any flags I should add etc to the test I'd be happy to give it a go. 
Dave Here is mine launch string: qemu-system-x86_64 -enable-kvm -name vmtest -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi -m 512 -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 -numa node,nodeid=0,cpus=0-11,mem=512 -nographic -no-user-config -nodefaults -device sga -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -m 512,slots=31,maxmem=16384M -object memory-backend-ram,id=mem0,size=512M -device pc-dimm,id=dimm0,node=0,memdev=mem0 I omitted disk backend in this example, but there is a chance that my problem is not reproducible without some calls made explicitly by a bootloader (not sure what to say for mid-runtime failures). It seems to survive OK: Thanks David, I`ll go through test sequence and report. Unfortunately my orchestration does not have even a hundred millisecond precision for libvirt events, so I can`t tell if the immediate start-up failures happened before bootloader execution or during it, all I have for those is a less-than-two-second interval between actual pass of a launch command and paused state event. QEMU logging also does not give me timestamps for an emulation errors even with appropriate timestamp arg. 
while true; do (sleep 1; echo -e '\001cc\n'; sleep 5; echo -e 'q\n')|/opt/qemu-try-world3/bin/qemu-system-x86_64 -enable-kvm -name vmtest -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi -m 512 -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 -numa node,nodeid=0,cpus=0-11,mem=512 -nographic -no-user-config -device sga -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -m 512,slots=31,maxmem=16384M -object memory-backend-ram,id=mem0,size=512M -device pc-dimm,id=dimm0,node=0,memdev=mem0 ~/pi.vfd 21 | tee /tmp/qemu.op; grep internal error /tmp/qemu.op -q break; done Dave Monday, so I`ll be able to test a) against 1.8.0-release, b) against patched bios code, c) reproduce initial error on master/3.19 (may be I`ll take them before weekend by going into this computer shop in person). Until then, I have a very deep feeling that mine issue is not there :) Also I became very curious on how a lack of IDT feature may completely eliminate the issue appearance for me, the only possible explanation is a clock-related race which is kinda stupid suggestion and unlikely to exist in nature. Thanks again for everyone for throughout testing and ideas! -Kevin --- a/src/romlayout.S +++ b/src/romlayout.S @@ -22,7 +22,8 @@ // %edx = return location (in 32bit mode) // Clobbers: ecx, flags, segment registers, cr0, idt/gdt
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Sat, Mar 7, 2015 at 3:00 AM, Andrey Korolyov and...@xdel.ru wrote: On Fri, Mar 6, 2015 at 7:57 PM, Bandan Das b...@redhat.com wrote: Andrey Korolyov and...@xdel.ru writes: On Fri, Mar 6, 2015 at 1:14 AM, Andrey Korolyov and...@xdel.ru wrote: Hello, recently I`ve got a couple of shiny new Intel 2620v2s for future replacement of the E5-2620v1, but I experienced relatively many events with emulation errors, all traces looks simular to the one below. I am running qemu-2.1 on x86 on top of 3.10 branch for testing purposes but can switch to some other versions if necessary. Most of crashes happened during reboot cycle or at the end of ACPI-based shutdown action, if this can help. I have zero clues of what can introduce such a mess inside same processor family using identical software, as 2620v1 has no simular problem ever. Please let me know if there can be some side measures for making entire story more clear. Thanks! KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d EAX=0003 EBX= ECX= EDX= ESI= EDI= EBP= ESP=6cd4 EIP=d3f9 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6e98 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e It turns out that those errors are introduced by APICv, which gets enabled due to different feature set. If anyone is interested in reproducing/fixing this exactly on 3.10, it takes about one hundred of migrations/power state changes for an issue to appear, guest OS can be Linux or Win. Are you able to reproduce this on a more recent upstream kernel as well ? Bandan I`ll go through test cycle with 3.18 and 2603v2 around tomorrow and follow up with any reproduceable results. Heh.. 
issue is not triggered on 2603v2 at all, at least I am not able to hit this. The only difference with 2620v2 except lower frequency is an Intel Dynamic Acceleration feature. I`d appreciate any testing with higher CPU models with same or richer feature set. The testing itself can be done on both generic 3.10 or RH7 kernels, as both of them are experiencing this issue. I conducted all tests with disabled cstates so I advise to do the same for a first reproduction step. Thanks! model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping: 4 microcode : 0x416 cpu MHz : 2100.039 cache size : 15360 KB siblings: 12 apicid : 43 initial apicid : 43 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Tue, Mar 10, 2015 at 7:57 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Andrey Korolyov (and...@xdel.ru) wrote: On Sat, Mar 7, 2015 at 3:00 AM, Andrey Korolyov and...@xdel.ru wrote: On Fri, Mar 6, 2015 at 7:57 PM, Bandan Das b...@redhat.com wrote: Andrey Korolyov and...@xdel.ru writes: On Fri, Mar 6, 2015 at 1:14 AM, Andrey Korolyov and...@xdel.ru wrote: Hello, recently I`ve got a couple of shiny new Intel 2620v2s for future replacement of the E5-2620v1, but I experienced relatively many events with emulation errors, all traces looks simular to the one below. I am running qemu-2.1 on x86 on top of 3.10 branch for testing purposes but can switch to some other versions if necessary. Most of crashes happened during reboot cycle or at the end of ACPI-based shutdown action, if this can help. I have zero clues of what can introduce such a mess inside same processor family using identical software, as 2620v1 has no simular problem ever. Please let me know if there can be some side measures for making entire story more clear. Thanks! KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d EAX=0003 EBX= ECX= EDX= ESI= EDI= EBP= ESP=6cd4 EIP=d3f9 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6e98 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e It turns out that those errors are introduced by APICv, which gets enabled due to different feature set. If anyone is interested in reproducing/fixing this exactly on 3.10, it takes about one hundred of migrations/power state changes for an issue to appear, guest OS can be Linux or Win. Are you able to reproduce this on a more recent upstream kernel as well ? 
Bandan I`ll go through test cycle with 3.18 and 2603v2 around tomorrow and follow up with any reproduceable results. Heh.. issue is not triggered on 2603v2 at all, at least I am not able to hit this. The only difference with 2620v2 except lower frequency is an Intel Dynamic Acceleration feature. I`d appreciate any testing with higher CPU models with same or richer feature set. The testing itself can be done on both generic 3.10 or RH7 kernels, as both of them are experiencing this issue. I conducted all tests with disabled cstates so I advise to do the same for a first reproduction step. Thanks! model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping: 4 microcode : 0x416 cpu MHz : 2100.039 cache size : 15360 KB siblings: 12 apicid : 43 initial apicid : 43 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms I'm seeing something similar; it's very intermittent and generally happening right at boot of the guest; I'm running this on qemu head+my postcopy world (but it's happening right at boot before postcopy gets a chance), and I'm using a 3.19ish kernel. Xeon E5-2407 in my case but hey maybe I'm seeing a different bug. Dave Yep, looks like we are hitting same bug - two thirds of mine failure events shot during boot/reboot cycle and approx. one third of events happened in the middle of runtime. What CPU, v0 or v2 are you using (in other words, is APICv enabled)? 
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Tue, Mar 10, 2015 at 9:16 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Andrey Korolyov (and...@xdel.ru) wrote: On Tue, Mar 10, 2015 at 7:57 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Andrey Korolyov (and...@xdel.ru) wrote: On Sat, Mar 7, 2015 at 3:00 AM, Andrey Korolyov and...@xdel.ru wrote: On Fri, Mar 6, 2015 at 7:57 PM, Bandan Das b...@redhat.com wrote: Andrey Korolyov and...@xdel.ru writes: On Fri, Mar 6, 2015 at 1:14 AM, Andrey Korolyov and...@xdel.ru wrote: Hello, recently I`ve got a couple of shiny new Intel 2620v2s for future replacement of the E5-2620v1, but I experienced relatively many events with emulation errors, all traces looks simular to the one below. I am running qemu-2.1 on x86 on top of 3.10 branch for testing purposes but can switch to some other versions if necessary. Most of crashes happened during reboot cycle or at the end of ACPI-based shutdown action, if this can help. I have zero clues of what can introduce such a mess inside same processor family using identical software, as 2620v1 has no simular problem ever. Please let me know if there can be some side measures for making entire story more clear. Thanks! KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d EAX=0003 EBX= ECX= EDX= ESI= EDI= EBP= ESP=6cd4 EIP=d3f9 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6e98 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e It turns out that those errors are introduced by APICv, which gets enabled due to different feature set. If anyone is interested in reproducing/fixing this exactly on 3.10, it takes about one hundred of migrations/power state changes for an issue to appear, guest OS can be Linux or Win. 
Are you able to reproduce this on a more recent upstream kernel as well ? Bandan I`ll go through test cycle with 3.18 and 2603v2 around tomorrow and follow up with any reproduceable results. Heh.. issue is not triggered on 2603v2 at all, at least I am not able to hit this. The only difference with 2620v2 except lower frequency is an Intel Dynamic Acceleration feature. I`d appreciate any testing with higher CPU models with same or richer feature set. The testing itself can be done on both generic 3.10 or RH7 kernels, as both of them are experiencing this issue. I conducted all tests with disabled cstates so I advise to do the same for a first reproduction step. Thanks! model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping: 4 microcode : 0x416 cpu MHz : 2100.039 cache size : 15360 KB siblings: 12 apicid : 43 initial apicid : 43 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms I'm seeing something similar; it's very intermittent and generally happening right at boot of the guest; I'm running this on qemu head+my postcopy world (but it's happening right at boot before postcopy gets a chance), and I'm using a 3.19ish kernel. Xeon E5-2407 in my case but hey maybe I'm seeing a different bug. Dave Yep, looks like we are hitting same bug - two thirds of mine failure events shot during boot/reboot cycle and approx. one third of events happened in the middle of runtime. What CPU, v0 or v2 are you using (in other words, is APICv enabled)? 
processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 45 model name : Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz stepping: 7 microcode : 0x70d cpu MHz
Re: [Qemu-devel] E5-2620v2 - emulation stop error
On Fri, Mar 6, 2015 at 7:57 PM, Bandan Das b...@redhat.com wrote: Andrey Korolyov and...@xdel.ru writes: On Fri, Mar 6, 2015 at 1:14 AM, Andrey Korolyov and...@xdel.ru wrote: Hello, recently I`ve got a couple of shiny new Intel 2620v2s for future replacement of the E5-2620v1, but I experienced relatively many events with emulation errors, all traces looks simular to the one below. I am running qemu-2.1 on x86 on top of 3.10 branch for testing purposes but can switch to some other versions if necessary. Most of crashes happened during reboot cycle or at the end of ACPI-based shutdown action, if this can help. I have zero clues of what can introduce such a mess inside same processor family using identical software, as 2620v1 has no simular problem ever. Please let me know if there can be some side measures for making entire story more clear. Thanks! KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d EAX=0003 EBX= ECX= EDX= ESI= EDI= EBP= ESP=6cd4 EIP=d3f9 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6e98 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e It turns out that those errors are introduced by APICv, which gets enabled due to different feature set. If anyone is interested in reproducing/fixing this exactly on 3.10, it takes about one hundred of migrations/power state changes for an issue to appear, guest OS can be Linux or Win. Are you able to reproduce this on a more recent upstream kernel as well ? Bandan I`ll go through test cycle with 3.18 and 2603v2 around tomorrow and follow up with any reproduceable results. 
E5-2620v2 - emulation stop error
Hello, recently I've got a couple of shiny new Intel 2620v2s as a future replacement for the E5-2620v1, but I am seeing relatively many emulation errors; all traces look similar to the one below. I am running qemu-2.1 on x86 on top of the 3.10 branch for testing purposes but can switch to some other versions if necessary. Most of the crashes happened during the reboot cycle or at the end of an ACPI-based shutdown, if this helps. I have zero clue what can introduce such a mess within the same processor family using identical software, as the 2620v1 has never shown a similar problem. Please let me know if there are any side measures that would make the picture clearer. Thanks! KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d EAX=0003 EBX= ECX= EDX= ESI= EDI= EBP= ESP=6cd4 EIP=d3f9 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6e98 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e
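For readers decoding the trace above: the suberror values come from the kernel's include/uapi/linux/kvm.h (worth double-checking against your exact tree, but stable across the 3.10-era kernels discussed here). Suberror 2 means the vCPU hit unexpected simultaneous exceptions:

```python
# KVM_EXIT_INTERNAL_ERROR suberror codes, as defined in
# include/uapi/linux/kvm.h (verify against your kernel version).
KVM_INTERNAL_ERROR = {
    1: "EMULATION (instruction emulation failed)",
    2: "SIMUL_EX (unexpected simultaneous exceptions)",
    3: "DELIVERY_EV (unexpected vm-exit during event delivery)",
    4: "UNEXPECTED_EXIT_REASON",
}

print(KVM_INTERNAL_ERROR[2])   # the suberror seen in this report
```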
Re: E5-2620v2 - emulation stop error
On Fri, Mar 6, 2015 at 1:14 AM, Andrey Korolyov and...@xdel.ru wrote: Hello, recently I`ve got a couple of shiny new Intel 2620v2s for future replacement of the E5-2620v1, but I experienced relatively many events with emulation errors, all traces looks simular to the one below. I am running qemu-2.1 on x86 on top of 3.10 branch for testing purposes but can switch to some other versions if necessary. Most of crashes happened during reboot cycle or at the end of ACPI-based shutdown action, if this can help. I have zero clues of what can introduce such a mess inside same processor family using identical software, as 2620v1 has no simular problem ever. Please let me know if there can be some side measures for making entire story more clear. Thanks! KVM internal error. Suberror: 2 extra data[0]: 80d1 extra data[1]: 8b0d EAX=0003 EBX= ECX= EDX= ESI= EDI= EBP= ESP=6cd4 EIP=d3f9 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= 000f6e98 0037 IDT= 03ff CR0=0010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e It turns out that those errors are introduced by APICv, which gets enabled due to different feature set. If anyone is interested in reproducing/fixing this exactly on 3.10, it takes about one hundred of migrations/power state changes for an issue to appear, guest OS can be Linux or Win. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008
Hi, I've seen the problem quite a few times. Before spending more time on it, I'd like to have a quick check here to see if anyone ever saw the same problem? Hope it is a relevant question with this mail list. Jul 2 11:08:21 arno-3 kernel: [ 2165.078623] BUG: unable to handle kernel NULL pointer dereference at 0008 Jul 2 11:08:21 arno-3 kernel: [ 2165.078916] IP: [8118d0fa] copy_huge_page+0x8a/0x2a0 Jul 2 11:08:21 arno-3 kernel: [ 2165.079128] PGD 0 Jul 2 11:08:21 arno-3 kernel: [ 2165.079198] Oops: [#1] SMP Jul 2 11:08:21 arno-3 kernel: [ 2165.079319] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bridge stp llc ast ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea lp mei_me ioatdma ext2 parport mei shpchp dcdbas joydev mac_hid lpc_ich acpi_pad wmi hid_generic usbhid hid ixgbe igb dca i2c_algo_bit ahci ptp libahci mdio pps_core Jul 2 11:08:21 arno-3 kernel: [ 2165.081090] CPU: 19 PID: 3494 Comm: qemu-system-x86 Not tainted 3.11.0-15-generic #25~precise1-Ubuntu Jul 2 11:08:21 arno-3 kernel: [ 2165.081424] Hardware name: Dell Inc. PowerEdge C6220 II/09N44V, BIOS 2.0.3 07/03/2013 Jul 2 11:08:21 arno-3 kernel: [ 2165.081705] task: 88102675 ti: 881026056000 task.ti: 881026056000 Jul 2 11:08:21 arno-3 kernel: [ 2165.081973] RIP: 0010:[8118d0fa] [8118d0fa] copy_huge_page+0x8a/0x2a0 Hello, sorry for possible top-posting, the same issue appears on at least 3.10 LTS series. The original thread is at http://marc.info/?l=kvmm=14043742300901. The necessary components for failure to reappear are a single running kvm guest and mounted large thp: hugepagesz=1G (seemingly the same as in initial report). With default 2M pages everything is working well, the same for 3.18 with 1G THP. Are there any obvious clues for the issue? Thanks! 
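A toy Python model of the pre-v3.13 bug described in Dave Hansen's commit message (quoted in the follow-up): page_hstate() resolves the hstate purely by page order, so a 2MB THP page only resolves when a 2MB hstate exists; with default_hugepagesz=1G and only a 1GB hstate registered, the lookup returns NULL and the old copy path dereferences it. The orders and the shape of the fix follow the commit text; everything else here is illustrative, not kernel code:

```python
HPAGE_PMD_ORDER = 9            # 2MB THP on x86-64 with 4K base pages

class Hstate:
    def __init__(self, order):
        self.order = order

def page_hstate(page_order, hstates):
    # Lookup by order alone -- this is the fragile part.
    return next((h for h in hstates if h.order == page_order), None)

def old_copy_huge_page(page_order, hstates):
    h = page_hstate(page_order, hstates)
    return h.order             # NULL (None) dereference when no hstate matches

def fixed_copy_huge_page(page_order, hstates, page_is_hugetlb):
    if not page_is_hugetlb:    # THP: don't consult hugetlbfs hstates at all
        return page_order
    return page_hstate(page_order, hstates).order

hstates_1g_only = [Hstate(18)]     # only a 1GB hstate registered
try:
    old_copy_huge_page(HPAGE_PMD_ORDER, hstates_1g_only)
except AttributeError:
    print("oops: NULL hstate dereferenced")
print(fixed_copy_huge_page(HPAGE_PMD_ORDER, hstates_1g_only, False))  # 9
```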
Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008
Sorry for all the previous mess, my Claws-mailer went nuts for no reason.
Re: cpu frequency
On Wed, Feb 4, 2015 at 3:06 AM, Nerijus Baliunas neri...@users.sourceforge.net wrote: On Tue, 3 Feb 2015 18:07:57 +0400 Andrey Korolyov and...@xdel.ru wrote: Have you tried disabling turbo mode (assuming you have a new enough CPU model) and fixing the frequency via the frequency governor settings? If it helps, it can be an ugly hack with pre-up/post-up libvirt actions, though you'd probably want to keep the frequency the same to maximize performance. I tried to alter the frequency governor settings, but unsuccessfully. They seem to change but then revert back after a short time. CentOS 7, Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz. I remember that the floating frequency resulted in incorrect guest CPU information, so this may be exactly what is happening in your situation. Those frequency values could be altered by a running service; unfortunately I don't know the CentOS 7 packages well enough to name it.
Re: cpu frequency
It did not help. Today that commercial application detects 2400, although Control Panel - System shows 2.20 GHz. So my question again - is it possible to patch qemu-kvm so that it shows some constant frequency to the guest? But the answer is probably not, because I don't know how the application computes the frequency... Have you tried disabling turbo mode (assuming you have a new enough CPU model) and fixing the frequency via the frequency governor settings? If it helps, it can be an ugly hack with pre-up/post-up libvirt actions, though you'd probably want to keep the frequency the same to maximize performance.
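One guess (an assumption, not established in this thread) at where the application's 2400 comes from: the "cpu MHz" field in /proc/cpuinfo tracks the current governor- and turbo-dependent frequency rather than the nominal one, so anything reading it inside the guest sees the host's momentary clock. A parsing sketch with made-up sample text:

```python
import re

def cpu_mhz(cpuinfo_text):
    # Extract every "cpu MHz" value, one per logical CPU.
    return [float(m) for m in re.findall(r"^cpu MHz\s*:\s*([0-9.]+)",
                                         cpuinfo_text, re.MULTILINE)]

# Illustrative /proc/cpuinfo fragment, not real output from this system.
sample = """\
model name : Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
cpu MHz : 2400.012
model name : Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
cpu MHz : 2200.000
"""
print(cpu_mhz(sample))   # [2400.012, 2200.0] -- turbo vs nominal clock
```

If the application instead times a calibration loop against the TSC, pinning the governor (the suggestion above) is the only host-side knob that would steady the result.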
Possible approaches to limit csw overhead
Hello, I have a rather practical question: is it possible to limit the number of VM-initiated events for a single VM? As an example, a VM which has experienced OOM and is effectively stuck dead generates a lot of unnecessary context switches, triggering do_raw_spin_lock very often and therefore increasing the overall compute workload. This can possibly be done via reactive limitation of the CPU quota via cgroups, but such a method is quite impractical because every orchestration solution would need to implement its own piece of code to detect such VM states and act properly. I wonder if there may be a proposal which does this job better than a userspace-implemented perf statistics loop. Thanks!
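The userspace fallback dismissed above would look roughly like this: sample a VM process's context-switch counters from /proc/&lt;pid&gt;/status and shrink its cgroup CPU quota when the switch rate explodes. The thresholds, quota numbers, and the sample text are all illustrative, not a real agent:

```python
def ctxt_switches(status_text):
    # Sum voluntary + nonvoluntary counters from a /proc/<pid>/status dump.
    total = 0
    for line in status_text.splitlines():
        if line.startswith(("voluntary_ctxt_switches",
                            "nonvoluntary_ctxt_switches")):
            total += int(line.split(":")[1])
    return total

def new_quota_us(rate_per_s, full_quota_us=1_200_000, limit_rate=50_000):
    # Halve the cpu.cfs_quota_us (12 vCPUs' worth at a 100ms period by
    # default here) once the VM crosses an arbitrary switch-rate threshold.
    return full_quota_us // 2 if rate_per_s > limit_rate else full_quota_us

sample = ("voluntary_ctxt_switches:\t90000\n"
          "nonvoluntary_ctxt_switches:\t30000\n")
rate = ctxt_switches(sample)      # treated as a per-second delta here
print(new_quota_us(rate))         # 600000: throttled
```

This illustrates why a kernel-side mechanism would be preferable: every orchestrator would otherwise carry its own copy of this polling loop and its own guesses at the thresholds.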
Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
On Tue, Sep 2, 2014 at 10:36 AM, Amit Shah amit.s...@redhat.com wrote: On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote: Hi, all I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%. without virtio-serial: 4k-read-random 1186 IOPS with virtio-serial: 4k-read-random 871 IOPS but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%. And, ide performance degradation does not happen with virtio-serial. Pretty sure it's related to MSI vectors in use. It's possible that the virtio-serial device takes up all the avl vectors in the guests, leaving old-style irqs for the virtio-blk device. I don't think so, I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager-virtio-serial = disable, then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager-virtio-serial = enable, the performance got back again, very obvious. add comments: Although the virtio-serial is enabled, I don't use it at all, the degradation still happened. Using the vectors= option as mentioned below, you can restrict the number of MSI vectors the virtio-serial device gets. You can then confirm whether it's MSI that's related to these issues. So, I think it has no business with legacy interrupt mode, right? I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest, and the difference of perf top data on guest when disable/enable virtio-serial in guest, any ideas? Thanks, Zhang Haoyu If you restrict the number of vectors the virtio-serial device gets (using the -device virtio-serial-pci,vectors= param), does that make things better for you? 
Amit Can confirm serious degradation compared to 1.1 with regular serial output - I am able to hang a VM forever after some tens of seconds of continuously printing dmesg to ttyS0. The VM just ate all available CPU quota during the test and hung after some tens of seconds, not even responding to regular pings and progressively raising CPU consumption up to the limit.
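For anyone wanting to try Amit's suggestion, the relevant fragment of a QEMU invocation would look roughly like this. This is a sketch only: the device IDs and the vector count of 4 are illustrative values, not taken from the original report.

```shell
# Restrict the virtio-serial controller to a small number of MSI-X
# vectors so it cannot exhaust the guest's supply and push virtio-blk
# onto legacy interrupts. vectors=4 is an arbitrary test value.
SERIAL_DEV="virtio-serial-pci,id=virtio-serial0,vectors=4"
BLK_DEV="virtio-blk-pci,drive=drive0,id=blk0"

# The '...' stands for the rest of a normal invocation (memory, drives, etc.).
echo "qemu-system-x86_64 ... -device $SERIAL_DEV -device $BLK_DEV"
```

Comparing IOPS with and without the vectors= restriction would confirm or rule out the MSI-exhaustion theory discussed in the thread.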
Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
On Tue, Sep 2, 2014 at 10:11 PM, Amit Shah amit.s...@redhat.com wrote: On (Tue) 02 Sep 2014 [22:05:45], Andrey Korolyov wrote: Can confirm serious degradation compared to 1.1 with regular serial output - I am able to hang a VM forever after some tens of seconds of continuously printing dmesg to ttyS0. The VM just ate all available CPU quota during the test and hung after some tens of seconds, not even responding to regular pings and progressively raising CPU consumption up to the limit. Entirely different to what's being discussed here. You're observing slowdown with ttyS0 in the guest -- the isa-serial device. This thread is discussing virtio-blk and virtio-serial. Amit Sorry for thread hijacking; the problem is definitely not related to the interrupt rework, will start a new thread.
Bug: No irq handler for vector (irq -1) on C602
Hello, ran into this error for the first time over a very large hardware span/uptime (the server which experienced the error is identical to the others, and I have had absolutely none of the MSI-related problems with this hardware ever). Running 3.10 on the host, I had one VM (of many) on it which produced an enormous count of context switches due to the mess inside (hundreds of active apache-itk workers). All VM threads are pinned to the first sibling of every core on a two-head system, e.g. having 24 HT cores, the second half is just HT siblings and a cpuset cg limits threads only to the first. The error itself was produced a second after a reset event for this VM (through libvirt, if the exact call matters): [7696746.523478] do_IRQ: 11.233 No irq handler for vector (irq -1) Since there are no recent hints for this exact error, and it is triggered by a critical part of the kernel code, I think it may be interesting to re-raise the issue (or, at least, put a better bound on the error source).
Re: Verifying Execution Integrity in Untrusted hypervisors
On Sat, Jul 26, 2014 at 2:06 AM, Paolo Bonzini pbonz...@redhat.com wrote: Thanks a lot Paolo. Is there a way to at least detect that the hypervisor has done something malicious, so that the client will be able to refer to some kind of logs to prove it? If you want a theoretical, perfect solution, no. I wouldn't be surprised if this is equivalent to the halting problem. If you want a practical solution, you have to define a threat model. What kind of attacks are you worried about? Which parts of the environment can you control? Can you place something trusted between the vulnerable VM and its clients? And so on. Paolo Here are some bits I read before: https://www.cs.purdue.edu/homes/bb/cs590/papers/secure_vm.pdf. It's all about timing measurement after all: if you are able to measure timings, or derive methods from, say, cache correlation attacks to rule out the possibility of a continuous hijack via knowledge of the amount of computation/timings (which will not be possible with constant Eve measurements), you can complete the task halfway. Complete execution without continuous checking against a locally placed trusted blackbox equivalent (a hardware token, trusted execution replaying or so) is hardly possible by my understanding. Anyway, any of the imaginable cases relies on a finite amount of computing power available to a single thread, so I can hardly say that a real-world implementation *is secure*, though we can establish a high probability of it. I believe that homomorphic encryption can pave the way for at least some kinds of services by the next decade, due to the tendency of total cloudization, and this is definitely better than the sticks-and-mud approach with timings. 
No-downtime ROM upgrades
Hello, As it turns out, upgrading any system ROMs loaded into the emulator at start is not possible without a complete restart of the emulator itself, as live migration refuses to complete with different payloads at both ends. Assuming that the guest-side payload for vga/ethernet can actually be re-read by powering the corresponding PCI devices off and on at runtime, it is hard to say what to do with the BIOS itself. Does anyone have an idea whether such kinds of upgrades are discussed/implemented somewhere? Though ROMs are very unlikely to be buggy, some additional features may come in over a couple of releases, adding the necessity of relaunching the virtual machine (like the upcoming PNP080 update, for example). And, of course, there are business-driven cases where such a thing must be avoided. :) Thanks!
Re: 3.10.X kernel/jump_label kvm
On 02/28/2014 11:47 PM, Stefan Priebe wrote: Hello, i got this stack trace multiple times while using a vanilla 3.10.32 kernel and already sent it to the list in december but got no replies. Hi, What kind of workload the host system is experiencing at same time? Does this event correlate with high memory pressure? [78136.551061] WARNING: at kernel/jump_label.c:80 __static_key_slow_dec+0xb6/0xc0() [78136.551062] jump label: negative count! [78136.551063] Modules linked in: sch_htb act_police cls_u32 sch_ingress vhost_net tun macvtap macvlan netconsole ipt_REJECT dlm sctp iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding ext2 8021q garp fuse mperf coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel microcode i2c_i801 button dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci ahci ehci_hcd igb libahci i2c_algo_bit isci usbcore i2c_core usb_common libsas ptp ixgbe(O) scsi_transport_sas pps_core [78136.551080] CPU: 21 PID: 47183 Comm: kvm Tainted: GW O 3.10.32+68-ph #1 [78136.551081] Hardware name: Supermicro X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 3.00 07/05/2013 [78136.551081] 0009 882f4a669be8 81524606 882f4a669c28 [78136.551085] 8104853b 4a669c08 a045cc40 00fa [78136.551088] a045cc60 882f51460160 882f74ab8110 882f4a669c88 [78136.551091] Call Trace: [78136.551093] [81524606] dump_stack+0x19/0x1b [78136.551095] [8104853b] warn_slowpath_common+0x6b/0xa0 [78136.551098] [81048611] warn_slowpath_fmt+0x41/0x50 [78136.551100] [810e05f6] __static_key_slow_dec+0xb6/0xc0 [78136.551102] [810e0631] static_key_slow_dec_deferred+0x11/0x20 [78136.551110] [a043ff60] kvm_free_lapic+0x90/0xa0 [kvm] [78136.551116] [a0429ef3] kvm_arch_vcpu_uninit+0x23/0x90 [kvm] [78136.551122] [a0410a20] kvm_vcpu_uninit+0x20/0x40 [kvm] [78136.551125] [a021fc12] vmx_free_vcpu+0x52/0x70 [kvm_intel] [78136.551132] [a04295ef] kvm_arch_vcpu_free+0x4f/0x60 [kvm] [78136.551138] [a042a112] 
kvm_arch_destroy_vm+0xf2/0x1f0 [kvm] [78136.551141] [81071048] ? synchronize_srcu+0x18/0x20 [78136.551143] [8112677a] ? mmu_notifier_unregister+0xaa/0xe0 [78136.551149] [a041380e] kvm_put_kvm+0x10e/0x1b0 [kvm] [78136.551155] [a0413a33] kvm_vcpu_release+0x13/0x20 [kvm] [78136.551157] [811452d1] __fput+0xe1/0x230 [78136.551160] [81145429] fput+0x9/0x10 [78136.551162] [81068de5] task_work_run+0xb5/0xd0 [78136.551164] [8104da1c] do_exit+0x2ac/0xa30 [78136.551166] [8107a89b] ? wake_up_state+0xb/0x10 [78136.551169] [81059fad] ? signal_wake_up_state+0x1d/0x30 [78136.551171] [8105b1c3] ? zap_other_threads+0x83/0xa0 [78136.551173] [8104e21a] do_group_exit+0x3a/0xa0 [78136.551175] [8104e292] SyS_exit_group+0x12/0x20 [78136.551177] [81529fd2] system_call_fastpath+0x16/0x1b [78136.551178] ---[ end trace b9ebb6de9753ef4c ]--- Thanks! Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFH] Qemu main thread is blocked in g_poll in windows guest
On 10/15/2013 04:18 PM, Xiexiangyou wrote: Thanks for your reply :-) The QEMU version is 1.5.1,and the KVM version is 3.6 QEMU command: /usr/bin/qemu-kvm -name win2008_dc_5 -S -machine pc-i440fx-1.5,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 4,maxcpus=64,sockets=16,cores=4,threads=1 -uuid 13e08e3e-cd23-4450-8bd3-60e7c220316d -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win2008_dc_5.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -no-hpet -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/vmdisk/win2008_dc_5,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:16:49:23,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/run/libvirt/qe m u/win2008_dc_5.extend,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1 -chardev socket,id=charchannel1,path=/var/run/libvirt/qemu/win2008_dc_5.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -vnc 0.0.0.0:4 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 (gdb) bt #0 0x7f9ba661a423 in poll () from /lib64/libc.so.6 #1 0x0059460f in os_host_main_loop_wait (timeout=4294967295) at main-loop.c:226 #2 0x005946a4 in main_loop_wait (nonblocking=0) at main-loop.c:464 #3 0x00619309 in main_loop () at vl.c:2182 #4 0x0061fb5e in main (argc=54, argv=0x7fff879830c8, envp=0x7fff87983280) at vl.c:4611 Main thread's strace message: 
# strace -p 6386 Process 6386 attached - interrupt to quit restart_syscall(... resuming interrupted call ... cpu thread's strace message: # strace -p 6389 Process 6389 attached - interrupt to quit rt_sigtimedwait([BUS USR1], 0x7f9ba36fbc00) = -1 EAGAIN (Resource temporarily unavailable) rt_sigpending([]) = 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ioctl(17, 0xae80, 0)= 0 ... Thanks! --xie -Original Message- From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini Sent: Tuesday, October 15, 2013 7:52 PM To: Xiexiangyou Cc: qemu-de...@nongnu.org; qemu-devel-requ...@nongnu.org; kvm@vger.kernel.org; Huangpeng (Peter); Luonengjun Subject: Re: [RFH] Qemu main thread is blocked in g_poll in windows guest Il 15/10/2013 12:21, Xiexiangyou ha scritto: Hi all: Windows2008 Guest run without pressure for long time. Sometimes, it stop and looks like hanging. But when I connect to it with VNC, It resume to run, but VM's time is delayed . When the vm is hanging, I check the main thread of QEMU. I find that the thread is blocked in g_poll function. it is waiting for a SIG, However, there is no SIG . I tried the clock with hpet and no hpet, but came out the same problem. Then I upgrade the glibc to newer, it didn't work too. I'm confused. Is the reason that VM in sleep state and doesn't emit the signal. I set the windows 's power option, enable/disable the allow the wake timers, I didn't work. Is anybody have met the same problem before, or know the reason. Your reply will be very helpful. This post is missing a few pieces of information: * What version of QEMU is this? * What is the command line? * How do you know g_poll is waiting for a signal and not for a file descriptor? 
* What is the backtrace of the main thread? What is the backtrace of the VCPU thread? etc. Paolo Hello, To revive this thread - I have exactly the same problem on freshly migrated virtual machines. The guest operating system is almost always Linux; the bug impact ratio is very low, about one per tens of migrations. VM 'uptime',
Re: QEMU P2P migration speed
On 02/07/2014 07:32 PM, Paolo Bonzini wrote: Il 07/02/2014 14:07, Andrey Korolyov ha scritto: Ok, I will do, but it looks like the libvirt version (1.0.2) is not relevant - it meets the criteria set by the debian packagers Then Debian's qemu packaging is wrong; QEMU 1.6 or newer should conflict with libvirt older than 1.2.0. and again, the 'broken state' is not related to the libvirt state history; it is more likely to be a qemu/kvm problem. It is relevant: qemu introduced a new migration status before active (setup) and libvirt doesn't recognize it. That's why you need at least 1.2.0. Paolo Thanks, both issues - with the reverted CPU dependency and with the migration itself - went away.
Re: QEMU P2P migration speed
On 02/07/2014 12:14 PM, Paolo Bonzini wrote: Il 06/02/2014 14:40, Andrey Korolyov ha scritto: Took and built 1.6.2 and faced a problem - after a couple of bounce iterations of migration (1-2-1-2) the VM is not able to migrate back anymore, in a probabilistic manner, with an error 'internal error unexpected migration status in setup'. The error may disappear over time, or may not disappear at all, and it may take a lot of tries in a row to succeed. There are no obvious hints at the default logging level in the libvirt/qemu logs, and seemingly libvirt is not the cause because the accumulated error state is preserved over service restarts. Also every VM is affected, not only the ones which have experienced multiple migration actions. The error happens on the 3rd-5th second of the migration procedure, if it may help. You need to update libvirt too. Paolo Ok, I will do, but it looks like the libvirt version (1.0.2) is not relevant - it meets the criteria set by the debian packagers, and again, the 'broken state' is not related to the libvirt state history; it is more likely to be a qemu/kvm problem.
Re: QEMU P2P migration speed
On 02/05/2014 07:15 PM, Paolo Bonzini wrote: Il 05/02/2014 11:46, Andrey Korolyov ha scritto: On 02/05/2014 11:27 AM, Paolo Bonzini wrote: Il 04/02/2014 18:06, Andrey Korolyov ha scritto: Migration time is almost independent of VM RSS (varies by ten percent at maximum): when the VM is active on the target host, it takes about 85 seconds to migrate 8G between hosts, and when it is turned off, migration time *increases* to 120s. For curious ones, frequency management is completely inactive on both nodes, as is the C-states mechanism. The interconnect is relatively fast (20+ Gbit/s via IPoIB). What version of QEMU? Paolo Ancient... ehm, stable - 1.1.2 from wheezy. Should I try 1.6/1.7? Yeah, you can check out the release notes on wiki.qemu.org to find out which versions had good improvements. You can also try compiling straight from git; there are more speedups there. Paolo Took and built 1.6.2 and faced a problem - after a couple of bounce iterations of migration (1-2-1-2) the VM is not able to migrate back anymore, in a probabilistic manner, with an error 'internal error unexpected migration status in setup'. The error may disappear over time, or may not disappear at all, and it may take a lot of tries in a row to succeed. There are no obvious hints at the default logging level in the libvirt/qemu logs, and seemingly libvirt is not the cause because the accumulated error state is preserved over service restarts. Also every VM is affected, not only the ones which have experienced multiple migration actions. The error happens on the 3rd-5th second of the migration procedure, if it may help. What is more interesting, the original counter-intuitive behavior has not disappeared but increased its relative span: 25 vs 70 seconds for a fully committed 8G VM. I suspect some mechanism falling back to idle and dropping overall performance, but cannot imagine one beyond the standard freq/C-states, which are definitely turned off. 
Re: QEMU P2P migration speed
On 02/05/2014 11:27 AM, Paolo Bonzini wrote: Il 04/02/2014 18:06, Andrey Korolyov ha scritto: Migration time is almost independent of VM RSS (varies by ten percent at maximum): when the VM is active on the target host, it takes about 85 seconds to migrate 8G between hosts, and when it is turned off, migration time *increases* to 120s. For curious ones, frequency management is completely inactive on both nodes, as is the C-states mechanism. The interconnect is relatively fast (20+ Gbit/s via IPoIB). What version of QEMU? Paolo Ancient... ehm, stable - 1.1.2 from wheezy. Should I try 1.6/1.7?
QEMU P2P migration speed
Hello, I've got strange results while benchmarking migration speed for different kinds of loads on the source/target host: when the source host is 'empty', migration takes approx. 30 percent longer than the same migration to a host already occupied by one VM with CPU overcommit ratio=1. [src host, three equal VMs, each with the ability to eat all cores at once] [tgt host, one VM, with the same appetite and limitations] All VMs were put into cgroups with the same cpu ceiling and cpu shares values. Migration time is almost independent of VM RSS (varies by ten percent at maximum): when the VM is active on the target host, it takes about 85 seconds to migrate 8G between hosts, and when it is turned off, migration time *increases* to 120s. For curious ones, frequency management is completely inactive on both nodes, as is the C-states mechanism. The interconnect is relatively fast (20+ Gbit/s via IPoIB). Anyone have suggestions on how to explain/fix this?
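To put the numbers above in perspective, the effective transfer rates work out as follows. This is a back-of-the-envelope sketch assuming a flat 8 GiB transferred with no re-dirtied pages:

```shell
# Effective migration throughput in MB/s from RAM size (MiB) and
# wall-clock seconds; integer arithmetic is enough for a rough figure.
rate_mbs() { echo $(( $1 / $2 )); }

rate_mbs 8192 85    # busy target host -> 96 MB/s
rate_mbs 8192 120   # idle target host -> 68 MB/s
```

Either figure is a small fraction of what a 20+ Gbit/s IPoIB link can carry, which supports the suspicion that the bottleneck is CPU-side (e.g. a core dropping into a low-power or idle state) rather than the interconnect.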
Re: Equivalent of vmware SIOC (Storage IO Control) in KVM
Hello, By the way, are there plans to enhance qemu I/O throttling to be able to swallow peaks or to apply various disciplines? The current one-second flat discipline is seemingly not enough for uneven workloads, especially when there is no alternative like cgroups for nbd usage. Thanks! On Sun, Oct 13, 2013 at 5:26 PM, Paolo Bonzini pbonz...@redhat.com wrote: 2) enable I/O throttling in QEMU, to apply limits at the level of the guest disk. If you're using libvirt, add the iotune element within the disk element in the definition of the virtual machine.
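For context, the flat per-second discipline being discussed is configured per drive on the QEMU command line. A sketch of the knobs, with an illustrative image path; note that later QEMU releases added burst ("max") variants of these options, which address exactly the peak-swallowing asked about, so the available spellings should be checked against your QEMU version:

```shell
# Per-drive throttling via -drive suboptions (flat, per-second window).
DRIVE_OPTS="file=/var/lib/img/vm.qcow2,if=none,id=drive0,cache=none"
DRIVE_OPTS="$DRIVE_OPTS,iops=200,bps=$(( 50 * 1024 * 1024 ))"  # 200 IOPS, 50 MB/s ceiling

echo "qemu-system-x86_64 ... -drive $DRIVE_OPTS -device virtio-blk-pci,drive=drive0"
```

The libvirt iotune element Paolo mentions maps onto these same per-drive limits.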
Lockups using per-thread cgs and kvm
Hi, We (a cloud hosting provider) have recently observed a couple of strange lockups when a physical node runs a significant number of Win2008R2 kvm appliances; one may see a collection of those lockups at the link below. After checking a lot of ideas without any valuable result, I suggested that the nested per-thread cgroup placement created by libvirt may lead to this problem (libvirt puts the emulator and each of the vcpu threads into a separate sub-cgroup). Disabling such behavior, e.g. having only one cgroup per kvm process per cgroup type, solved this problem; at least it didn't happen in the most stressful tests we're able to do. Since it is generally unusual for a well-known kernel mechanism such as cgroups to break in a way like this, I hope we've found a quite rare kind of bug. Just for the record, the bug also may happen with a linux guest, but much more rarely, by one or two orders of magnitude. We stayed on the default scheduler granularity value in these tests, if it matters. For anyone who wants to see the entire timeline of this bug, please see [1]. [1]. http://www.spinics.net/lists/kvm/msg85956.html
Re: windows 2008 guest causing rcu_sched to emit NMI
On Thu, Jan 31, 2013 at 12:11 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Wed, Jan 30, 2013 at 11:21:08AM +0300, Andrey Korolyov wrote: On Wed, Jan 30, 2013 at 3:15 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 29, 2013 at 02:35:02AM +0300, Andrey Korolyov wrote: On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov and...@xdel.ru wrote: On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote: On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote: Thank you Marcelo, Host node locking up sometimes later than yesterday, bur problem still here, please see attached dmesg. Stuck process looks like root 19251 0.0 0.0 228476 12488 ?D14:42 0:00 /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device virtio-blk-pci,? -device on fourth vm by count. Should I try upstream kernel instead of applying patch to the latest 3.4 or it is useless? If you can upgrade to an upstream kernel, please do that. With vanilla 3.7.4 there is almost no changes, and NMI started firing again. External symptoms looks like following: starting from some count, may be third or sixth vm, qemu-kvm process allocating its memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to kill stuck kvm processes and node returned back to the normal, when on 3.2 sending SIGKILL to the process causing zombies and hanged ``ps'' output (problem and workaround when no scheduler involved described here http://www.spinics.net/lists/kvm/msg84799.html). Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter. 
Hi Marcelo, thanks, this parameter helped to increase number of working VMs in a half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10 to 15 percents, persists on such numbers for a long time, where linux guests in same configuration do not jump over one percent even under stress bench. After I disabled HT, crash happens only in long runs and now it is kernel panic :) Stair-like memory allocation behaviour disappeared, but other symptom leading to the crash which I have not counted previously, persists: if VM count is ``enough'' for crash, some qemu processes starting to eat one core, and they`ll panic system after run in tens of minutes in such state or if I try to attach debugger to one of them. If needed, I can log entire crash output via netconsole, now I have some tail, almost the same every time: http://xdel.ru/downloads/btwin.png Yes, please log entire crash output, thanks. Here please, 3.7.4-vanilla, 16 vms, ple_gap=0: http://xdel.ru/downloads/oops-default-kvmintel.txt Just an update: I was able to reproduce that on pure linux VMs using qemu-1.3.0 and ``stress'' benchmark running on them - panic occurs at start of vm(with count ten working machines at the moment). Qemu-1.1.2 generally is not able to reproduce that, but host node with older version crashing on less amount of Windows VMs(three to six instead ten to fifteen) than with 1.3, please see trace below: http://xdel.ru/downloads/oops-old-qemu.txt Single bit memory error, apparently. Try: 1. memtest86. 2. Boot with slub_debug=ZFPU kernel parameter. 3. 
Reproduce on different machine Hi Marcelo, I always follow the rule - if some weird bug exists, check it on ECC-enabled machine and check IPMI logs too before start complaining :) I have finally managed to ``fix'' the problem, but my solution seems a bit strange: - I have noticed that if virtual machines started without any cgroup setting they will not cause this bug under any conditions, - I have thought, very wrong in my mind, that the CONFIG_SCHED_AUTOGROUP should regroup the tasks without any cgroup and should not touch tasks already inside any existing cpu cgroup. First sight on the 200-line patch shows that the autogrouping always applies to all tasks, so I tried to disable it, - wild magic appears - VMs didn`t crashed host any more, even in count 30+ they work fine. I still don`t know what exactly triggered that and will I face it again under different conditions, so my solution more likely to be a patch of mud in wall of the dam, instead of proper fixing. There seems to be two possible origins of such error - a very very hideous race condition involving cgroups and processes like qemu-kvm causing frequent context switches and simple
Re: windows 2008 guest causing rcu_sched to emit NMI
On Wed, Jan 30, 2013 at 3:15 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 29, 2013 at 02:35:02AM +0300, Andrey Korolyov wrote: On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov and...@xdel.ru wrote: On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote: On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote: Thank you Marcelo, Host node locking up sometimes later than yesterday, bur problem still here, please see attached dmesg. Stuck process looks like root 19251 0.0 0.0 228476 12488 ?D14:42 0:00 /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device virtio-blk-pci,? -device on fourth vm by count. Should I try upstream kernel instead of applying patch to the latest 3.4 or it is useless? If you can upgrade to an upstream kernel, please do that. With vanilla 3.7.4 there is almost no changes, and NMI started firing again. External symptoms looks like following: starting from some count, may be third or sixth vm, qemu-kvm process allocating its memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to kill stuck kvm processes and node returned back to the normal, when on 3.2 sending SIGKILL to the process causing zombies and hanged ``ps'' output (problem and workaround when no scheduler involved described here http://www.spinics.net/lists/kvm/msg84799.html). Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter. Hi Marcelo, thanks, this parameter helped to increase number of working VMs in a half of order of magnitude, from 3-4 to 10-15. 
Very high SY load, 10 to 15 percents, persists on such numbers for a long time, where linux guests in same configuration do not jump over one percent even under stress bench. After I disabled HT, crash happens only in long runs and now it is kernel panic :) Stair-like memory allocation behaviour disappeared, but other symptom leading to the crash which I have not counted previously, persists: if VM count is ``enough'' for crash, some qemu processes starting to eat one core, and they`ll panic system after run in tens of minutes in such state or if I try to attach debugger to one of them. If needed, I can log entire crash output via netconsole, now I have some tail, almost the same every time: http://xdel.ru/downloads/btwin.png Yes, please log entire crash output, thanks. Here please, 3.7.4-vanilla, 16 vms, ple_gap=0: http://xdel.ru/downloads/oops-default-kvmintel.txt Just an update: I was able to reproduce that on pure linux VMs using qemu-1.3.0 and ``stress'' benchmark running on them - panic occurs at start of vm(with count ten working machines at the moment). Qemu-1.1.2 generally is not able to reproduce that, but host node with older version crashing on less amount of Windows VMs(three to six instead ten to fifteen) than with 1.3, please see trace below: http://xdel.ru/downloads/oops-old-qemu.txt Single bit memory error, apparently. Try: 1. memtest86. 2. Boot with slub_debug=ZFPU kernel parameter. 3. 
Reproduce on different machine Hi Marcelo, I always follow the rule - if some weird bug exists, check it on ECC-enabled machine and check IPMI logs too before start complaining :) I have finally managed to ``fix'' the problem, but my solution seems a bit strange: - I have noticed that if virtual machines started without any cgroup setting they will not cause this bug under any conditions, - I have thought, very wrong in my mind, that the CONFIG_SCHED_AUTOGROUP should regroup the tasks without any cgroup and should not touch tasks already inside any existing cpu cgroup. First sight on the 200-line patch shows that the autogrouping always applies to all tasks, so I tried to disable it, - wild magic appears - VMs didn`t crashed host any more, even in count 30+ they work fine. I still don`t know what exactly triggered that and will I face it again under different conditions, so my solution more likely to be a patch of mud in wall of the dam, instead of proper fixing. There seems to be two possible origins of such error - a very very hideous race condition involving cgroups and processes like qemu-kvm causing frequent context switches and simple incompatibility between NUMA, logic of CONFIG_SCHED_AUTOGROUP and qemu VMs already doing work in the cgroup, since I have not observed this errors on single numa node(mean, desktop) on relatively heavier condition. -- To unsubscribe from this list: send the line unsubscribe kvm
Re: windows 2008 guest causing rcu_sched to emit NMI
On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote: On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote: Thank you Marcelo, Host node locking up sometimes later than yesterday, bur problem still here, please see attached dmesg. Stuck process looks like root 19251 0.0 0.0 228476 12488 ?D14:42 0:00 /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device virtio-blk-pci,? -device on fourth vm by count. Should I try upstream kernel instead of applying patch to the latest 3.4 or it is useless? If you can upgrade to an upstream kernel, please do that. With vanilla 3.7.4 there is almost no changes, and NMI started firing again. External symptoms looks like following: starting from some count, may be third or sixth vm, qemu-kvm process allocating its memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to kill stuck kvm processes and node returned back to the normal, when on 3.2 sending SIGKILL to the process causing zombies and hanged ``ps'' output (problem and workaround when no scheduler involved described here http://www.spinics.net/lists/kvm/msg84799.html). Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter. Hi Marcelo, thanks, this parameter helped to increase number of working VMs in a half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10 to 15 percents, persists on such numbers for a long time, where linux guests in same configuration do not jump over one percent even under stress bench. 
After I disabled HT, the crash happens only in long runs, and now it is a kernel panic :) The stair-like memory allocation behaviour disappeared, but another symptom leading to the crash, which I had not noticed previously, persists: if the VM count is ``enough'' for a crash, some qemu processes start to eat one core each, and they will panic the system after running for tens of minutes in that state, or if I try to attach a debugger to one of them. If needed, I can log the entire crash output via netconsole; for now I have a tail, almost the same every time: http://xdel.ru/downloads/btwin.png Yes, please log the entire crash output, thanks. Here you are, 3.7.4-vanilla, 16 VMs, ple_gap=0: http://xdel.ru/downloads/oops-default-kvmintel.txt
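For reference, disabling PLE (pause-loop exiting) as suggested above means reloading the kvm-intel module with the parameter set; a sketch, assuming no VMs are running at the time (the module cannot be unloaded while in use):

```shell
#!/bin/sh
# Sketch: disable pause-loop exiting on an Intel host by reloading
# kvm-intel with ple_gap=0. Shut down all VMs first.

modprobe -r kvm_intel
modprobe kvm_intel ple_gap=0

# Verify the parameter took effect:
cat /sys/module/kvm_intel/parameters/ple_gap
```

To make the setting persistent across reboots, `options kvm_intel ple_gap=0` can be placed in a modprobe.d configuration file.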
Re: windows 2008 guest causing rcu_shed to emit NMI
On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov and...@xdel.ru wrote: [earlier messages in the thread trimmed - quoted verbatim in the post above] Just an update: I was able to reproduce this on pure Linux VMs using qemu-1.3.0 with the ``stress'' benchmark running on them - the panic occurs at the start of a VM (with ten machines already working at that moment). Qemu-1.1.2 generally cannot reproduce it, but a host node with the older version crashes with fewer Windows VMs (three to six instead of ten to fifteen) than with 1.3; please see the trace below: http://xdel.ru/downloads/oops-old-qemu.txt
Re: windows 2008 guest causing rcu_shed to emit NMI
On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti mtosa...@redhat.com wrote: [earlier messages in the thread trimmed - quoted verbatim in the posts above] Try disabling pause loop exiting with the ple_gap=0 kvm-intel.ko module parameter. Hi Marcelo, thanks - this parameter helped to increase the number of working VMs by half an order of magnitude, from 3-4 to 10-15. A very high SY load, 10 to 15 percent, persists at such counts for a long time, whereas Linux guests in the same configuration do not go over one percent even under a stress benchmark.
After I disabled HT, the crash happens only in long runs, and now it is a kernel panic :) The stair-like memory allocation behaviour disappeared, but another symptom leading to the crash, which I had not noticed previously, persists: if the VM count is ``enough'' for a crash, some qemu processes start to eat one core each, and they will panic the system after running for tens of minutes in that state, or if I try to attach a debugger to one of them. If needed, I can log the entire crash output via netconsole; for now I have a tail, almost the same every time: http://xdel.ru/downloads/btwin.png
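Capturing the full panic over the network, as offered above, is typically done with the netconsole module; a sketch in which every address, port, and interface name is a placeholder:

```shell
#!/bin/sh
# Sketch: stream kernel console output (including panic traces) to a
# remote listener via netconsole. Placeholder values throughout.
# Format: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]

modprobe netconsole \
  netconsole=6665@10.0.0.2/eth0,6666@10.0.0.1/00:11:22:33:44:55

# On the receiving host (10.0.0.1 here), capture with e.g.:
#   nc -u -l 6666 > panic.log
```

Unlike a serial console, this survives most panics long enough to get the complete trace off the dying host.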
Re: windows 2008 guest causing rcu_shed to emit NMI
Thank you Marcelo. The host node is locking up somewhat later than yesterday, but the problem is still here; please see the attached dmesg. The stuck process looks like root 19251 0.0 0.0 228476 12488 ?D14:42 0:00 /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device virtio-blk-pci,? -device on the fourth VM by count. Should I try an upstream kernel instead of applying the patch to the latest 3.4, or is that useless? On Thu, Jan 24, 2013 at 4:52 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 22, 2013 at 09:00:25PM +0300, Andrey Korolyov wrote: Hi, the problem described in the title happens under heavy I/O pressure on the host. Without idle=poll the trace is almost always the same, involving mwait; with poll and nohz=off the RIP varies from time to time - at the previous hang it was in tg_throttle_down rather than in test_ti_thread_flag as in the attached one. Both possible clocksource drivers, hpet and tsc, are able to reproduce this with equal probability. The VMs are pinned over one of the two NUMA sets on a two-headed machine, meaning the emulator thread and each of the vcpu threads has its own cpuset cgroup with '0-5,12-17' or '6-11,18-23'. I'll appreciate any suggestions to try. Andrey, can you reproduce with an upstream kernel? Commit 5cfc2aabcb282f fixes a livelock.
d2 75 c3 eb 03 41 89 c6 48 83 c4 18 44 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 31 c0 c3 48 63 ff 48 c7 c2 80 37 01 00 48 8b 0c fd e0 d6 68 81
[12738.508644] Call Trace:
[12738.508648] [81035a66] ? walk_tg_tree_from+0x70/0x99
[12738.508652] [81014c03] ? __switch_to_xtra+0x14c/0x160
[12738.508656] [8103bcce] ? throttle_cfs_rq+0x4d/0x109
[12738.508660] [8103be70] ? put_prev_task_fair+0x3f/0x65
[12738.508663] [8134c8ae] ? __schedule+0x32e/0x5c3
[12738.508666] [8134ceee] ? yield_to+0xfa/0x10c
[12738.508669] [8105d5af] ? atomic_inc+0x3/0x4
[12738.508678] [a03a8fc4] ? kvm_vcpu_on_spin+0x8c/0xf7 [kvm]
[12738.508684] [a030602f] ? handle_pause+0x11/0x18
dmesg.txt.gz Description: GNU Zip compressed data
Re: windows 2008 guest causing rcu_shed to emit NMI
On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti mtosa...@redhat.com wrote: [earlier messages in the thread trimmed - quoted verbatim in the posts above] If you can upgrade to an upstream kernel, please do that. With vanilla 3.7.4 there is almost no change, and the NMIs started firing again. The external symptoms look like the following: starting from some count, maybe the third or sixth VM, the qemu-kvm process allocates its memory very slowly and in jumps, 20M-200M-700M-1.6G over minutes. The patch helps, of course - on both patched 3.4 and vanilla 3.7 I am able to kill stuck kvm processes and the node returns back to normal, whereas on 3.2 sending SIGKILL to the process leaves zombies and a hung ``ps'' output (the problem, and a workaround for the case with no scheduler involved, are described here: http://www.spinics.net/lists/kvm/msg84799.html). dmesg-3.7.4.txt.gz Description: GNU Zip compressed data
windows 2008 guest causing rcu_shed to emit NMI
Hi, the problem described in the title happens under heavy I/O pressure on the host. Without idle=poll the trace is almost always the same, involving mwait; with poll and nohz=off the RIP varies from time to time - at the previous hang it was in tg_throttle_down rather than in test_ti_thread_flag as in the attached one. Both possible clocksource drivers, hpet and tsc, are able to reproduce this with equal probability. The VMs are pinned over one of the two NUMA sets on a two-headed machine, meaning the emulator thread and each of the vcpu threads has its own cpuset cgroup with '0-5,12-17' or '6-11,18-23'. I'll appreciate any suggestions to try. dmesg2.txt.gz Description: GNU Zip compressed data
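The pinning scheme described above - each VM confined to one NUMA node's CPUs via the cgroup-v1 cpuset controller - can be sketched as follows. The mount point, the cgroup name "vm0", and the process name are illustrative, not taken from the original setup:

```shell
#!/bin/sh
# Sketch: pin all threads of a qemu-kvm process to one NUMA node
# using a cgroup-v1 cpuset. Names and CPU lists are illustrative.

cg=/sys/fs/cgroup/cpuset/vm0
mkdir -p "$cg"

echo '0-5,12-17' > "$cg/cpuset.cpus"   # CPUs of node 0 plus their HT siblings
echo 0           > "$cg/cpuset.mems"   # allocate memory only from node 0

# Move every thread (emulator and vcpus) into the cpuset; writing a
# tid to the v1 "tasks" file moves that single thread.
pid=$(pidof qemu-kvm | awk '{print $1}')
for tid in /proc/"$pid"/task/*; do
    echo "${tid##*/}" > "$cg/tasks"
done
```

The original setup goes further and gives the emulator thread and each vcpu thread its own child cpuset; the sketch above shows only the per-VM variant for brevity.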
Proper taming of oom-killer with kvm
Hi, I have recently run into the following issue: under certain conditions, if the emulator process exceeds its own memory limit in the cgroup and the oom killer shoots it, the /proc entry may stay around indefinitely. There are two possible side effects. First, if one tries to read cmdline from such an entry, the request will hang indefinitely too; e.g. issuing ``ps aux'' once per minute will exhaust the default PID limit in less than half a day with ps processes stuck in D state. The second effect may appear only on a heavily loaded node - the scheduler will eat 100% of selected cores (almost always just one), with the system becoming unresponsive in a couple of minutes. This should be easy to reproduce:
- start a kvm process,
- put it into a memory cgroup and set the limits,
- disable the oom killer via oom_control,
- simply put the process into an oom condition (using the balloon, this should be very simple),
- check the kvm process state.
If it is stuck in D state, all should be okay, since you are able to catch the oom condition - simply send a TERM signal, raise the memory limit by a nonsignificant amount, and the process will exit normally. If instead you observe the kvm process with the under_oom flag triggered but in a _sleep_ state, a TERM signal will kill it, with the nice lock described above. I have solved the problem with a quite stupid workaround - after being informed of the oom event (with oom disabled in the cgroup), I freeze the kvm process via the freezer cgroup, thereby moving it to D state, then send it TERM, then raise the memory limits and finally unfreeze it. It is very ugly, but at least I have gotten rid of the problem.
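The workaround above can be expressed as a small script to run when the memory cgroup's oom event fires. It follows the cgroup-v1 layout the post implies (a memory cgroup with oom_control disabled, plus a freezer cgroup); the cgroup naming and the 2G example limit are assumptions:

```shell
#!/bin/sh
# Sketch of the described freeze-TERM-raise-thaw workaround.
# $1 = pid of the oom'd kvm process; cgroup names are illustrative.

pid=$1
memcg=/sys/fs/cgroup/memory/vm-$pid
frzcg=/sys/fs/cgroup/freezer/vm-$pid

# 1. Freeze: parks the process so it cannot run (the post's D-state trick).
echo FROZEN > "$frzcg/freezer.state"

# 2. Queue termination; the signal stays pending while frozen.
kill -TERM "$pid"

# 3. Raise the memory limit so exit paths can allocate (example: 2G).
echo $((2 * 1024 * 1024 * 1024)) > "$memcg/memory.limit_in_bytes"

# 4. Thaw: the pending TERM is delivered and the process exits cleanly.
echo THAWED > "$frzcg/freezer.state"
```

The ordering is the point of the trick: freezing before signalling ensures the process is not scheduled between the signal and the limit increase, which is the window where the described lockup occurs.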