On 11/02/15 19:53, Alex Williamson wrote:
> On Mon, 2015-11-02 at 19:31 +0100, Laszlo Ersek wrote:
>> On 10/31/15 18:50, Laszlo Ersek wrote:
>>
>>> I'm very sorry, but I don't think I can spend time on this, unless
>>> someone gives me ssh and/or console access to a host that readily
>>> reproduces the bug, with the latest kvm/master, qemu, and edk2
>>> builds.
>>
>> We just got lucky, the problem reproduces on my oldie workstation (HP
>> Z400, Intel(R) Xeon(R) CPU W3550 @ 3.07GHz,
>> family/model/stepping/microcode = 6/26/5/0x19).
>>
>> OVMF works fine with the most recent Fedora 21 host kernel,
>> "kernel-4.1.10-100.fc21.x86_64", and it breaks on the crispy new
>> upstream Linux 4.3 release.
>>
>> The OVMF log contains a number of failed ASSERT()s, printed by the
>> APs, in a way that is very characteristic of multi-threading bugs:
>>
>> --*--
>>
>> Detect CPU count: 2
>> AASSSSEERRTT
>> //hhoomem/el/alcaocso/ss/rscr/cu/pusptsrteraema/me/dekd2k-2g-igti-ts-vsnv/nM/dMedPekPgk/gL/iLbirbarrayr/yB/aBsaesSeySnycnhcrhornoiznatiizoantLiiobn/LSiybn/cShyrnocnhirzoantiizoantGicocn.Gcc(c1.9c8()1:9
>> 8L)o:c kLVoaclkuVea l=u=e (=(=U I(N(TUNI)N T2N)) |2|) L|o|c kLVoaclkuVea
>> l=u=e (=(=U I(N(TUNI)N T1N))^M
>> 1)
>> ASSERT
>> /home/lacos/src/upstream/edk2-git-svn/MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(198):
>> LockValue == ((UINTN) 2) || LockValue == ((UINTN) 1)
>>
>> --*--
>>
>> (Note that the above is for an 8 VCPU guest. Cf. "Detect CPU count:
>> 2".)
>>
>> Maybe important, maybe not: my workstation's IOMMU does *not* have
>> snoop control:
>>
>> [ 3.043297] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap
>> c90780106f0462 ecap f02076
>>
>> Where 0xf02076 & 0x80 == 0.
>>
>> Whereas my laptop *does* have Snoop Control:
>>
>> [ 0.030486] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap
>> c0000020660462 ecap f0101a
>> [ 0.030491] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap
>> d2008020660462 ecap f010da
>>
>> The first IOMMU doesn't have Snoop Control, but its scope only
>> extends to the integrated Intel Graphics Device. The second IOMMU,
>> which covers the SD card reader I used for testing (and all other
>> PCI devices), *does* have Snoop Control. 0xf010da & 0x80 == 0x80.
>>
>> However, SC support should not matter, due to kernel commit
>> fb279950ba02e3210a16b11ecfa8871f3ee0ca49, so I think we might be
>> looking at an independent issue. (Possibly in CpuDxe, only exposed
>> by the new host kernel.)
>
> Snoop Control is what I immediately jumped to as well, but I've never
> seen the problem on my desktop, which reports the following ecaps:
> f0101a & f0105a.  Thanks,
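(As a side note on the Snoop Control checks above: SC is simply bit 7
of the VT-d extended capability register, which is what the "& 0x80"
tests express. Here's a minimal standalone sketch of that test, using
only the ecap values quoted in this thread; it is not taken from any
of the code under discussion:)

  /* Check the Snoop Control (SC) bit -- bit 7 -- of the VT-d "ecap"
     values reported in the DMAR dmesg lines quoted above. */
  #include <inttypes.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* ecap values quoted in this thread (workstation, laptop, desktop) */
    uint64_t ecaps[] = { 0xf02076, 0xf0101a, 0xf010da, 0xf0105a };
    size_t   idx;

    for (idx = 0; idx < sizeof ecaps / sizeof ecaps[0]; ++idx) {
      printf ("ecap %06" PRIx64 ": Snoop Control %s\n", ecaps[idx],
              (ecaps[idx] & 0x80) ? "present" : "absent");
    }
    return 0;
  }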
This issue is actually very easy to reproduce on my workstation. It has
1 socket, 4 cores, 2 threads/core. If I start a guest with 32 VCPUs
(i.e., heavily over-subscribing the host), then the issue hits
immediately. Device assignment is not necessary. But it also reproduces
fairly regularly with just 8 VCPUs (even with VCPUs pinned to separate
logical processors on the host).

Judging purely from the KVM trace captured during such a 32-VCPU
startup, I'm suspecting a general thread scheduling (or signaling)
change sometime in the v4.1..v4.2 interval. It takes *extremely* long
after the second SIPI (which should wake the AP threads) until the
first "kvm_entry" message shows up for the last nonzero VCPU.

Below is an excerpt. The first line shows the 2nd SIPI being delivered
to the last VCPU. The lines after that show the first time that each
nonzero VCPU reports in with "kvm_entry".

 qemu-system-x86-19114 [005]  6668.931183: kvm_apic_accept_irq: apicid 1f vec 159 (SIPI|edge)
 ...
 qemu-system-x86-19136 [006]  6668.931953: kvm_entry: vcpu 22
 qemu-system-x86-19133 [003]  6668.931953: kvm_entry: vcpu 19
 qemu-system-x86-19138 [007]  6668.931953: kvm_entry: vcpu 24
 qemu-system-x86-19118 [004]  6668.934967: kvm_entry: vcpu 4
 qemu-system-x86-19119 [006]  6668.941502: kvm_entry: vcpu 5
 qemu-system-x86-19121 [004]  6668.944957: kvm_entry: vcpu 7
 qemu-system-x86-19124 [001]  6668.944957: kvm_entry: vcpu 10
 qemu-system-x86-19137 [000]  6668.948561: kvm_entry: vcpu 23
 qemu-system-x86-19140 [002]  6668.948561: kvm_entry: vcpu 26
 qemu-system-x86-19131 [003]  6668.948561: kvm_entry: vcpu 17
 qemu-system-x86-19122 [007]  6668.948561: kvm_entry: vcpu 8
 qemu-system-x86-19120 [001]  6668.964832: kvm_entry: vcpu 6
 qemu-system-x86-19142 [002]  6668.964962: kvm_entry: vcpu 28
 qemu-system-x86-19125 [000]  6668.964964: kvm_entry: vcpu 11
 qemu-system-x86-19139 [007]  6668.964965: kvm_entry: vcpu 25
 qemu-system-x86-19123 [002]  6668.968954: kvm_entry: vcpu 9
 qemu-system-x86-19130 [000]  6668.968955: kvm_entry: vcpu 16
 qemu-system-x86-19144 [000]  6668.974965: kvm_entry: vcpu 30
 qemu-system-x86-19132 [003]  6668.974967: kvm_entry: vcpu 18
 qemu-system-x86-19126 [002]  6668.981415: kvm_entry: vcpu 12
 qemu-system-x86-19143 [003]  6668.981416: kvm_entry: vcpu 29
 qemu-system-x86-19134 [000]  6668.981418: kvm_entry: vcpu 20
 qemu-system-x86-19127 [003]  6669.002546: kvm_entry: vcpu 13
 qemu-system-x86-19117 [000]  6669.004972: kvm_entry: vcpu 3
 qemu-system-x86-19135 [003]  6669.004973: kvm_entry: vcpu 21
 qemu-system-x86-19145 [002]  6669.004974: kvm_entry: vcpu 31
 qemu-system-x86-19116 [005]  6669.005002: kvm_entry: vcpu 2
 qemu-system-x86-19128 [000]  6669.014976: kvm_entry: vcpu 14
 qemu-system-x86-19129 [002]  6669.015058: kvm_entry: vcpu 15
 qemu-system-x86-19115 [002]  6669.029626: kvm_entry: vcpu 1
 qemu-system-x86-19141 [005]  6669.034981: kvm_entry: vcpu 27

The last VCPU (#27) reports in 6669.034981 - 6668.931183 = 103.798 ms
after the second SIPI's delivery is completed. That is about twice as
long as the virtual BSP is willing to wait for the virtual AP detection
to complete. (The default timeout is 50 ms.)

For comparison, within that 50 ms, the virtual BSP manages to read the
emulated ACPI PM timer a whopping 6443 times! That is the same number
of userspace exits, with QEMU doing the emulation, etc.

I haven't compared this to v4.1 numbers (yet?), but the inter-thread
latency seems quite bad. (I'm sure it boosts throughput though.)
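(The same arithmetic, spelled out in a trivial standalone sketch; the
two timestamps come from the trace excerpt above, and the 50 ms figure
is the AP detection timeout mentioned above -- nothing below is taken
from the actual firmware or kernel code:)

  #include <stdio.h>

  int
  main (void)
  {
    double sipi_done  = 6668.931183; /* kvm_apic_accept_irq, apicid 1f (2nd SIPI) */
    double last_entry = 6669.034981; /* kvm_entry: vcpu 27 (the last AP to run)   */
    double delta_ms   = (last_entry - sipi_done) * 1000.0;
    double timeout_ms = 50.0;        /* virtual BSP's AP detection timeout        */

    printf ("last AP entered guest mode %.3f ms after the 2nd SIPI\n", delta_ms);
    printf ("that is %.2f times the BSP's %.0f ms detection window\n",
            delta_ms / timeout_ms, timeout_ms);
    return 0;
  }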
I'll try to bisect this: the v4.1..b18d5431acc range. Hopefully this
problem was introduced *before* b18d5431acc; otherwise that commit
would mask it with the very long LZMA decompression time.

Thanks
Laszlo