On 2015-11-02 12:16:12, Laszlo Ersek wrote:
> On 11/02/15 19:53, Alex Williamson wrote:
> > On Mon, 2015-11-02 at 19:31 +0100, Laszlo Ersek wrote:
> >> On 10/31/15 18:50, Laszlo Ersek wrote:
> >>
> >>> I'm very sorry, but I don't think I can spend time on this, unless someone gives me ssh and/or console access to a host that readily reproduces the bug, with the latest kvm/master, qemu, and edk2 builds.
> >>
> >> We just got lucky: the problem reproduces on my oldie workstation (HP Z400, Intel(R) Xeon(R) CPU W3550 @ 3.07GHz, family/model/stepping/microcode = 6/26/5/0x19).
> >>
> >> OVMF works fine with the most recent Fedora 21 host kernel, "kernel-4.1.10-100.fc21.x86_64", and it breaks on the crispy new upstream Linux 4.3 release.
> >>
> >> The OVMF log contains a number of failed ASSERT()s, printed by the APs, in a way that is very characteristic of multi-threading bugs:
> >>
> >> --*--
> >>
> >> Detect CPU count: 2
> >> AASSSSEERRTT
> >> //hhoomem/el/alcaocso/ss/rscr/cu/pusptsrteraema/me/dekd2k-2g-igti-ts-vsnv/nM/dMedPekPgk/gL/iLbirbarrayr/yB/aBsaesSeySnycnhcrhornoiznatiizoantLiiobn/LSiybn/cShyrnocnhirzoantiizoantGicocn.Gcc(c1.9c8()1:9
> >> 8L)o:c kLVoaclkuVea l=u=e (=(=U I(N(TUNI)N T2N)) |2|) L|o|c
> >> kLVoaclkuVea l=u=e (=(=U I(N(TUNI)N T1N))^M
> >> 1)
> >> ASSERT
> >> /home/lacos/src/upstream/edk2-git-svn/MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(198):
> >> LockValue == ((UINTN) 2) || LockValue == ((UINTN) 1)
> >>
> >> --*--
> >>
> >> (Note that the above is for an 8 VCPU guest. Cf. "Detect CPU count: 2".)
> >>
> >> Maybe important, maybe not: my workstation's IOMMU does *not* have Snoop Control:
> >>
> >> [ 3.043297] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f02076
> >>
> >> Where 0xf02076 & 0x80 == 0.
> >>
> >> Whereas my laptop *does* have Snoop Control:
> >>
> >> [ 0.030486] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
> >> [ 0.030491] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
> >>
> >> The first IOMMU doesn't have Snoop Control, but its scope only extends to the integrated Intel Graphics Device. The second IOMMU, which covers the SD card reader I used for testing (and all other PCI devices), *does* have Snoop Control: 0xf010da & 0x80 == 0x80.
> >>
> >> However, SC support should not matter, due to kernel commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49, so I think we might be looking at an independent issue. (Possibly in CpuDxe, only exposed by the new host kernel.)
> >
> > Snoop Control is what I immediately jumped to as well, but I've never seen the problem on my desktop, which reports the following ecaps: f0101a & f0105a. Thanks,
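For reference, the Snoop Control check quoted above boils down to testing bit 7 of the IOMMU's extended capability register, i.e. the 0x80 mask already used in these mails. A minimal Python sketch that only decodes that bit for the three ecap values quoted from dmesg (the labels are just descriptive):

  # Decode the VT-d Snoop Control (SC) bit, bit 7 of the extended
  # capability (ecap) register, for the values printed by dmesg above.
  ECAP_SC = 1 << 7  # the 0x80 mask used in the quoted mails

  for unit, ecap in [("workstation dmar0", 0xf02076),
                     ("laptop dmar0",      0xf0101a),
                     ("laptop dmar1",      0xf010da)]:
      print("%-17s ecap %06x -> %s"
            % (unit, ecap, "has SC" if ecap & ECAP_SC else "no SC"))
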
> This issue is actually very easy to reproduce on my workstation. It has 1 socket, 4 cores, 2 threads/core. If I start a guest with 32 VCPUs (i.e., heavily over-subscribing the host), then the issue hits immediately. Device assignment is not necessary.
>
> But, it also reproduces fairly regularly with just 8 VCPUs (even with VCPUs pinned to separate logical processors on the host).
>
> Judging purely from the KVM trace captured during such a 32-VCPU startup, I'm suspecting a general thread scheduling (or signaling) change sometime in the v4.1..v4.2 interval. It takes *extremely* long after the second SIPI (which should wake the AP threads) until the first "kvm_entry" message shows up for the last nonzero VCPU.
>
> Below is an excerpt. The first line shows the 2nd SIPI being delivered to the last VCPU. The lines after that show the first time that each nonzero VCPU reports in with "kvm_entry".
>
> qemu-system-x86-19114 [005] 6668.931183: kvm_apic_accept_irq: apicid 1f vec 159 (SIPI|edge)
> ...
> qemu-system-x86-19136 [006] 6668.931953: kvm_entry: vcpu 22
> qemu-system-x86-19133 [003] 6668.931953: kvm_entry: vcpu 19
> qemu-system-x86-19138 [007] 6668.931953: kvm_entry: vcpu 24
> qemu-system-x86-19118 [004] 6668.934967: kvm_entry: vcpu 4
> qemu-system-x86-19119 [006] 6668.941502: kvm_entry: vcpu 5
> qemu-system-x86-19121 [004] 6668.944957: kvm_entry: vcpu 7
> qemu-system-x86-19124 [001] 6668.944957: kvm_entry: vcpu 10
> qemu-system-x86-19137 [000] 6668.948561: kvm_entry: vcpu 23
> qemu-system-x86-19140 [002] 6668.948561: kvm_entry: vcpu 26
> qemu-system-x86-19131 [003] 6668.948561: kvm_entry: vcpu 17
> qemu-system-x86-19122 [007] 6668.948561: kvm_entry: vcpu 8
> qemu-system-x86-19120 [001] 6668.964832: kvm_entry: vcpu 6
> qemu-system-x86-19142 [002] 6668.964962: kvm_entry: vcpu 28
> qemu-system-x86-19125 [000] 6668.964964: kvm_entry: vcpu 11
> qemu-system-x86-19139 [007] 6668.964965: kvm_entry: vcpu 25
> qemu-system-x86-19123 [002] 6668.968954: kvm_entry: vcpu 9
> qemu-system-x86-19130 [000] 6668.968955: kvm_entry: vcpu 16
> qemu-system-x86-19144 [000] 6668.974965: kvm_entry: vcpu 30
> qemu-system-x86-19132 [003] 6668.974967: kvm_entry: vcpu 18
> qemu-system-x86-19126 [002] 6668.981415: kvm_entry: vcpu 12
> qemu-system-x86-19143 [003] 6668.981416: kvm_entry: vcpu 29
> qemu-system-x86-19134 [000] 6668.981418: kvm_entry: vcpu 20
> qemu-system-x86-19127 [003] 6669.002546: kvm_entry: vcpu 13
> qemu-system-x86-19117 [000] 6669.004972: kvm_entry: vcpu 3
> qemu-system-x86-19135 [003] 6669.004973: kvm_entry: vcpu 21
> qemu-system-x86-19145 [002] 6669.004974: kvm_entry: vcpu 31
> qemu-system-x86-19116 [005] 6669.005002: kvm_entry: vcpu 2
> qemu-system-x86-19128 [000] 6669.014976: kvm_entry: vcpu 14
> qemu-system-x86-19129 [002] 6669.015058: kvm_entry: vcpu 15
> qemu-system-x86-19115 [002] 6669.029626: kvm_entry: vcpu 1
> qemu-system-x86-19141 [005] 6669.034981: kvm_entry: vcpu 27
>
> The last VCPU (#27) reports in 6669.034981 - 6668.931183 = 103.798 ms after the second SIPI's delivery is completed.
>
> That is about twice as long as the virtual BSP is willing to wait for the virtual AP detection to complete. (The default timeout is 50 ms.)
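The 103.798 ms figure can be re-derived mechanically from such a trace. Below is a rough Python sketch, assuming the excerpt has been saved to a file (the file name "kvm-trace.txt" is made up here); it takes the timestamp of the last SIPI seen and prints each VCPU's first kvm_entry relative to it:

  import re

  sipi_time = None   # timestamp of the most recent SIPI delivery
  first_entry = {}   # VCPU number -> timestamp of its first kvm_entry

  with open("kvm-trace.txt") as trace:
      for line in trace:
          m = re.search(r"\s(\d+\.\d+): kvm_apic_accept_irq: .*SIPI", line)
          if m:
              sipi_time = float(m.group(1))
          m = re.search(r"\s(\d+\.\d+): kvm_entry: vcpu (\d+)", line)
          if m and sipi_time is not None and int(m.group(2)) not in first_entry:
              first_entry[int(m.group(2))] = float(m.group(1))

  for vcpu, ts in sorted(first_entry.items(), key=lambda kv: kv[1]):
      print("vcpu %2d first entered %8.3f ms after the SIPI"
            % (vcpu, (ts - sipi_time) * 1000))

For the excerpt above, the last line printed is VCPU 27 at 103.798 ms, matching the number quoted.
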
This doesn't appear to be reproducing the same issue that Janusz is seeing. In that case even a *10 second* timeout was not enough for 8 processors to start up.

-Jordan

> For comparison, within that 50 ms, the virtual BSP manages to read the emulated ACPI PM timer a whopping 6443 times! That is the same number of userspace exits, with QEMU doing the emulation, etc.
>
> I haven't compared this to v4.1 numbers (yet?), but the inter-thread latency seems quite bad. (I'm sure it boosts throughput, though.)
>
> I'll try to bisect this: the v4.1..b18d5431acc range. Hopefully this problem was introduced *before* b18d5431acc, which would mask it with the very long LZMA decompression time.
>
> Thanks
> Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
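Putting the figures from the quoted analysis side by side (the constants below are simply the numbers given in the mail; nothing new is measured here):

  pm_timer_reads = 6443      # PM timer reads by the BSP within the timeout
  timeout_ms     = 50.0      # default AP detection timeout
  last_ap_ms     = 103.798   # last VCPU's first kvm_entry after the 2nd SIPI

  # Each PM timer read is a full userspace exit round trip to QEMU:
  print("per-read round trip: ~%.1f us" % (timeout_ms * 1000 / pm_timer_reads))
  # The slowest AP shows up roughly twice as late as the BSP is willing to wait:
  print("last AP vs. timeout: %.1fx" % (last_ap_ms / timeout_ms))

So each BSP poll of the PM timer completes in under 8 microseconds, while the slowest AP thread needs more than two full timeout periods just to enter guest mode for the first time.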