On 11/02/15 23:29, Jordan Justen wrote:
> On 2015-11-02 12:16:12, Laszlo Ersek wrote:
>> On 11/02/15 19:53, Alex Williamson wrote:
>>> On Mon, 2015-11-02 at 19:31 +0100, Laszlo Ersek wrote:
>>>> On 10/31/15 18:50, Laszlo Ersek wrote:
>>>>
>>>>> I'm very sorry, but I don't think I can spend time on this, unless
>>>>> someone gives me ssh and/or console access to a host that readily
>>>>> reproduces the bug, with the latest kvm/master, qemu, and edk2
>>>>> builds.
>>>>
>>>> We just got lucky, the problem reproduces on my oldie workstation
>>>> (HP Z400, Intel(R) Xeon(R) CPU W3550 @ 3.07GHz,
>>>> family/model/stepping/microcode = 6/26/5/0x19).
>>>>
>>>> OVMF works fine with the most recent Fedora 21 host kernel,
>>>> "kernel-4.1.10-100.fc21.x86_64", and it breaks on the crispy new
>>>> upstream Linux 4.3 release.
>>>>
>>>> The OVMF log contains a number of failed ASSERT()s, printed by the
>>>> APs, in a way that is very characteristic of multi-threading bugs:
>>>>
>>>> --*--
>>>>
>>>> Detect CPU count: 2
>>>> AASSSSEERRTT
>>>> //hhoomem/el/alcaocso/ss/rscr/cu/pusptsrteraema/me/dekd2k-2g-igti-ts-vsnv/nM/dMedPekPgk/gL/iLbirbarrayr/yB/aBsaesSeySnycnhcrhornoiznatiizoantLiiobn/LSiybn/cShyrnocnhirzoantiizoantGicocn.Gcc(c1.9c8()1:9
>>>> 8L)o:c kLVoaclkuVea l=u=e (=(=U I(N(TUNI)N T2N)) |2|) L|o|c
>>>> kLVoaclkuVea l=u=e (=(=U I(N(TUNI)N T1N))^M
>>>> 1)
>>>> ASSERT
>>>> /home/lacos/src/upstream/edk2-git-svn/MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(198):
>>>> LockValue == ((UINTN) 2) || LockValue == ((UINTN) 1)
>>>>
>>>> --*--
>>>>
>>>> (Note that the above is for an 8 VCPU guest. Cf. "Detect CPU
>>>> count: 2".)
>>>>
>>>> Maybe important, maybe not: my workstation's IOMMU does *not* have
>>>> snoop control:
>>>>
>>>> [    3.043297] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap
>>>> c90780106f0462 ecap f02076
>>>>
>>>> Where 0xf02076 & 0x80 == 0.
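(Side note for readers of the archive: the mangled "AASSSSEERRTT" lines above are what two APs racing on the same serial port look like, each printing the same ASSERT message with their characters roughly interleaved. In the simplest case of strict alternation, de-interleaving recovers both copies; a minimal sketch:)

```python
# Two APs wrote "ASSERT" to the shared serial port at the same time,
# and their characters interleaved. Assuming strict alternation (the
# simplest case; real scheduling is messier, as the longer path line
# above shows), every other character belongs to one writer.
garbled = "AASSSSEERRTT"

first_ap = garbled[0::2]   # characters at even offsets
second_ap = garbled[1::2]  # characters at odd offsets

print(first_ap, second_ap)  # -> ASSERT ASSERT
```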
>>>>
>>>> Whereas my laptop *does* have Snoop Control:
>>>>
>>>> [    0.030486] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap
>>>> c0000020660462 ecap f0101a
>>>> [    0.030491] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap
>>>> d2008020660462 ecap f010da
>>>>
>>>> The first IOMMU doesn't have Snoop Control, but its scope only
>>>> extends to the integrated Intel Graphics Device. The second IOMMU,
>>>> which covers the SD card reader I used for testing (and all other
>>>> PCI devices), *does* have Snoop Control. 0xf010da & 0x80 == 0x80.
>>>>
>>>> However, SC support should not matter, due to kernel commit
>>>> fb279950ba02e3210a16b11ecfa8871f3ee0ca49, so I think we might be
>>>> looking at an independent issue. (Possibly in CpuDxe, only exposed
>>>> by the new host kernel.)
>>>
>>> Snoop Control is what I immediately jumped to as well, but I've
>>> never seen the problem on my desktop, which reports the following
>>> ecaps: f0101a & f0105a. Thanks,
>>
>> This issue is actually very easy to reproduce on my workstation. It
>> has 1 socket, 4 cores, 2 threads/core. If I start a guest with 32
>> VCPUs (i.e., heavily over-subscribing the host), then the issue hits
>> immediately. Device assignment is not necessary.
>>
>> But, it also reproduces fairly regularly with just 8 VCPUs (even
>> with VCPUs pinned to separate logical processors on the host).
>>
>> Judging purely from the KVM trace captured during such a 32-VCPU
>> startup, I suspect a general thread scheduling (or signaling) change
>> sometime in the v4.1..v4.2 interval. It takes *extremely* long after
>> the second SIPI (which should wake the AP threads) until the first
>> "kvm_entry" message shows up for the last nonzero VCPU.
>>
>> Below is an excerpt. The first line shows the 2nd SIPI being
>> delivered to the last VCPU. The lines after that show the first time
>> that each nonzero VCPU reports in with "kvm_entry".
>>
>> qemu-system-x86-19114 [005] 6668.931183: kvm_apic_accept_irq: apicid 1f vec 159 (SIPI|edge)
>> ...
>> qemu-system-x86-19136 [006] 6668.931953: kvm_entry: vcpu 22
>> qemu-system-x86-19133 [003] 6668.931953: kvm_entry: vcpu 19
>> qemu-system-x86-19138 [007] 6668.931953: kvm_entry: vcpu 24
>> qemu-system-x86-19118 [004] 6668.934967: kvm_entry: vcpu 4
>> qemu-system-x86-19119 [006] 6668.941502: kvm_entry: vcpu 5
>> qemu-system-x86-19121 [004] 6668.944957: kvm_entry: vcpu 7
>> qemu-system-x86-19124 [001] 6668.944957: kvm_entry: vcpu 10
>> qemu-system-x86-19137 [000] 6668.948561: kvm_entry: vcpu 23
>> qemu-system-x86-19140 [002] 6668.948561: kvm_entry: vcpu 26
>> qemu-system-x86-19131 [003] 6668.948561: kvm_entry: vcpu 17
>> qemu-system-x86-19122 [007] 6668.948561: kvm_entry: vcpu 8
>> qemu-system-x86-19120 [001] 6668.964832: kvm_entry: vcpu 6
>> qemu-system-x86-19142 [002] 6668.964962: kvm_entry: vcpu 28
>> qemu-system-x86-19125 [000] 6668.964964: kvm_entry: vcpu 11
>> qemu-system-x86-19139 [007] 6668.964965: kvm_entry: vcpu 25
>> qemu-system-x86-19123 [002] 6668.968954: kvm_entry: vcpu 9
>> qemu-system-x86-19130 [000] 6668.968955: kvm_entry: vcpu 16
>> qemu-system-x86-19144 [000] 6668.974965: kvm_entry: vcpu 30
>> qemu-system-x86-19132 [003] 6668.974967: kvm_entry: vcpu 18
>> qemu-system-x86-19126 [002] 6668.981415: kvm_entry: vcpu 12
>> qemu-system-x86-19143 [003] 6668.981416: kvm_entry: vcpu 29
>> qemu-system-x86-19134 [000] 6668.981418: kvm_entry: vcpu 20
>> qemu-system-x86-19127 [003] 6669.002546: kvm_entry: vcpu 13
>> qemu-system-x86-19117 [000] 6669.004972: kvm_entry: vcpu 3
>> qemu-system-x86-19135 [003] 6669.004973: kvm_entry: vcpu 21
>> qemu-system-x86-19145 [002] 6669.004974: kvm_entry: vcpu 31
>> qemu-system-x86-19116 [005] 6669.005002: kvm_entry: vcpu 2
>> qemu-system-x86-19128 [000] 6669.014976: kvm_entry: vcpu 14
>> qemu-system-x86-19129 [002] 6669.015058: kvm_entry: vcpu 15
>> qemu-system-x86-19115 [002] 6669.029626: kvm_entry: vcpu 1
>> qemu-system-x86-19141 [005] 6669.034981: kvm_entry: vcpu 27
>>
>> The last VCPU (#27) reports in 6669.034981 - 6668.931183 = 103.798
>> ms after the second SIPI's delivery is completed.
>>
>> That is about twice as long as the virtual BSP is willing to wait
>> for the virtual AP detection to complete. (The default timeout is
>> 50 ms.)
>
> This doesn't appear to be reproducing the same issue that Janusz is
> seeing.
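(For reference, the 103.798 ms figure above follows directly from the two trace timestamps, which are in seconds; a quick sketch of the arithmetic:)

```python
# Recompute the AP wake-up latency from the trace timestamps quoted
# above (both in seconds).
sipi_done = 6668.931183   # 2nd SIPI delivery completed (apicid 1f)
last_entry = 6669.034981  # first kvm_entry for the last VCPU (#27)

latency_ms = (last_entry - sipi_done) * 1000.0
print(f"{latency_ms:.3f} ms")  # -> 103.798 ms

# The default AP detection timeout mentioned in this thread is 50 ms;
# the observed latency overshoots even twice that budget.
BSP_TIMEOUT_MS = 50.0
print(latency_ms > 2 * BSP_TIMEOUT_MS)  # -> True
```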
You might be correct, but I can only work with what I can reproduce.

> In that case even a *10 second* timeout was not enough for 8
> processors to start up.

Note that I didn't try to wait until the AP enumeration *completed* in
the problematic cases. The only thing I "measured" in the above was the
(nearly) *first* sign of life for each VCPU, after the INIT-SIPI-SIPI.
Your timeout depends on the *last* relevant act of the "slowest" CPU.

In any case, even if this issue is independent, the success vs. failure
question is pretty clear on my workstation, wherever I am in the
bisection. Assuming I end up with a relevant-looking culprit commit, we
could do something about it, and ask Janusz to test it too.

Thanks
Laszlo

>
> -Jordan
>
>> For comparison, within that 50 ms, the virtual BSP manages to read
>> the emulated ACPI PM timer a whopping 6443 times! That is the same
>> number of userspace exits, QEMU doing the emulation, etc.
>>
>> I haven't compared this to v4.1 numbers (yet?), but the inter-thread
>> latency seems quite bad. (I'm sure it boosts throughput though.)
>>
>> I'll try to bisect this: the v4.1..b18d5431acc range. Hopefully this
>> problem was introduced *before* b18d5431acc would mask it with the
>> very long LZMA decompression time.
>>
>> Thanks
>> Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel