On 2015-11-02 12:16:12, Laszlo Ersek wrote:
> On 11/02/15 19:53, Alex Williamson wrote:
> > On Mon, 2015-11-02 at 19:31 +0100, Laszlo Ersek wrote:
> >> On 10/31/15 18:50, Laszlo Ersek wrote:
> >>
> >>> I'm very sorry, but I don't think I can spend time on this, unless
> >>> someone gives me ssh and/or console access to a host that readily
> >>> reproduces the bug, with the latest kvm/master, qemu, and edk2
> >>> builds.
> >>
> >> We just got lucky: the problem reproduces on my oldie workstation (HP 
> >> Z400, Intel(R) Xeon(R) CPU W3550 @ 3.07GHz, family/model/stepping/microcode = 
> >> 6/26/5/0x19).
> >>
> >> OVMF works fine with the most recent Fedora 21 host kernel, 
> >> "kernel-4.1.10-100.fc21.x86_64", and it breaks on the crispy new upstream 
> >> Linux 4.3 release.
> >>
> >> The OVMF log contains a number of failed ASSERT()s, printed by the APs, in 
> >> a way that is very characteristic of multi-threading bugs:
> >>
> >> --*--
> >>
> >> Detect CPU count: 2
> >> AASSSSEERRTT  
> >> //hhoomem/el/alcaocso/ss/rscr/cu/pusptsrteraema/me/dekd2k-2g-igti-ts-vsnv/nM/dMedPekPgk/gL/iLbirbarrayr/yB/aBsaesSeySnycnhcrhornoiznatiizoantLiiobn/LSiybn/cShyrnocnhirzoantiizoantGicocn.Gcc(c1.9c8()1:9
> >>  8L)o:c kLVoaclkuVea l=u=e  (=(=U I(N(TUNI)N T2N))  |2|)  L|o|c 
> >> kLVoaclkuVea l=u=e  (=(=U I(N(TUNI)N T1N))^M 
> >> 1)
> >> ASSERT 
> >> /home/lacos/src/upstream/edk2-git-svn/MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(198):
> >>  LockValue == ((UINTN) 2) || LockValue == ((UINTN) 1)
> >>
> >> --*--
> >>
> >> (Note that the above is for an 8-VCPU guest; cf. "Detect CPU count: 2".)
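
(For context on the failed ASSERT(): it comes from the spin lock release
path in BaseSynchronizationLib, which requires the lock word to hold one
of exactly two legal values. Below is a minimal standalone C sketch of
that invariant; the constants mirror MdePkg's SPIN_LOCK_RELEASED (1) and
SPIN_LOCK_ACQUIRED (2), while the function and variable names are
illustrative, not the actual edk2 code.)

#include <assert.h>
#include <stdint.h>

typedef volatile uintptr_t SPIN_LOCK;

#define SPIN_LOCK_RELEASED ((uintptr_t)1)
#define SPIN_LOCK_ACQUIRED ((uintptr_t)2)

/* Sketch of the release-side check that the APs trip in the log above:
 * the lock word must be in one of its two legal states. Any other value
 * means the lock was clobbered, e.g. by APs racing through code that
 * assumed only one processor was running. */
static void release_spin_lock_sketch(SPIN_LOCK *lock)
{
    uintptr_t lock_value = *lock;

    assert(lock_value == SPIN_LOCK_ACQUIRED ||
           lock_value == SPIN_LOCK_RELEASED);
    *lock = SPIN_LOCK_RELEASED;
}

int main(void)
{
    SPIN_LOCK lock = SPIN_LOCK_ACQUIRED;

    release_spin_lock_sketch(&lock); /* legal transition: no assert */
    return 0;
}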
> >>
> >>
> >> Maybe important, maybe not: my workstation's IOMMU does *not* have snoop 
> >> control:
> >>
> >> [    3.043297] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 
> >> c90780106f0462 ecap f02076
> >>
> >> Where 0xf02076 & 0x80 == 0.
> >>
> >> Whereas my laptop *does* have Snoop Control:
> >>
> >> [    0.030486] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 
> >> c0000020660462 ecap f0101a
> >> [    0.030491] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap 
> >> d2008020660462 ecap f010da
> >>
> >> The first IOMMU doesn't have Snoop Control, but its scope only extends to 
> >> the integrated Intel Graphics Device. The second IOMMU, which covers the 
> >> SD card reader I used for testing (and all other PCI devices), *does* have 
> >> Snoop Control. 0xf010da & 0x80 == 0x80.
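
(For reference, Snoop Control is advertised in bit 7 of the VT-d
extended capability register, which is exactly what the 0x80 masks above
test. A minimal C sketch of the check, fed the ecap values from the
dmesg lines quoted above:)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 7 of the VT-d extended capability register (ECAP) advertises
 * Snoop Control. */
#define ECAP_SC_BIT (UINT64_C(1) << 7)

static bool has_snoop_control(uint64_t ecap)
{
    return (ecap & ECAP_SC_BIT) != 0;
}

int main(void)
{
    /* ECAP values from the dmesg output quoted above. */
    printf("workstation dmar0: %d\n", has_snoop_control(0xf02076)); /* 0 */
    printf("laptop dmar0:      %d\n", has_snoop_control(0xf0101a)); /* 0 */
    printf("laptop dmar1:      %d\n", has_snoop_control(0xf010da)); /* 1 */
    return 0;
}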
> >>
> >> However, SC support should not matter, due to kernel commit 
> >> fb279950ba02e3210a16b11ecfa8871f3ee0ca49, so I think we might be looking 
> >> at an independent issue. (Possibly in CpuDxe, only exposed by the new host 
> >> kernel.)
> > 
> > Snoop Control is what I immediately jumped to as well, but I've never
> > seen the problem on my desktop, which reports the following ecaps:
> > f0101a and f0105a.  Thanks,
> 
> This issue is actually very easy to reproduce on my workstation. It
> has 1 socket, 4 cores, 2 threads/core. If I start a guest with 32
> VCPUs (i.e., heavily over-subscribing the host), then the issue hits
> immediately. Device assignment is not necessary.
> 
> But, it also reproduces fairly regularly with just 8 VCPUs (even
> with VCPUs pinned to separate logical processors on the host).
> 
> Judging purely from the KVM trace captured during such a 32-VCPU
> startup, I'm suspecting a general thread scheduling (or signaling)
> change sometime in the v4.1..v4.2 interval. It takes *extremely*
> long after the second SIPI (which should wake the AP threads) until
> the first "kvm_entry" message shows up for the last nonzero VCPU.
> 
> Below is an excerpt. The first line shows the 2nd SIPI being
> delivered to the last VCPU. The lines after that show the first time
> that each nonzero VCPU reports in with "kvm_entry".
> 
>  qemu-system-x86-19114 [005]  6668.931183: kvm_apic_accept_irq: apicid 1f vec 159 (SIPI|edge)
> ...
>  qemu-system-x86-19136 [006]  6668.931953: kvm_entry: vcpu 22
>  qemu-system-x86-19133 [003]  6668.931953: kvm_entry: vcpu 19
>  qemu-system-x86-19138 [007]  6668.931953: kvm_entry: vcpu 24
>  qemu-system-x86-19118 [004]  6668.934967: kvm_entry: vcpu 4
>  qemu-system-x86-19119 [006]  6668.941502: kvm_entry: vcpu 5
>  qemu-system-x86-19121 [004]  6668.944957: kvm_entry: vcpu 7
>  qemu-system-x86-19124 [001]  6668.944957: kvm_entry: vcpu 10
>  qemu-system-x86-19137 [000]  6668.948561: kvm_entry: vcpu 23
>  qemu-system-x86-19140 [002]  6668.948561: kvm_entry: vcpu 26
>  qemu-system-x86-19131 [003]  6668.948561: kvm_entry: vcpu 17
>  qemu-system-x86-19122 [007]  6668.948561: kvm_entry: vcpu 8
>  qemu-system-x86-19120 [001]  6668.964832: kvm_entry: vcpu 6
>  qemu-system-x86-19142 [002]  6668.964962: kvm_entry: vcpu 28
>  qemu-system-x86-19125 [000]  6668.964964: kvm_entry: vcpu 11
>  qemu-system-x86-19139 [007]  6668.964965: kvm_entry: vcpu 25
>  qemu-system-x86-19123 [002]  6668.968954: kvm_entry: vcpu 9
>  qemu-system-x86-19130 [000]  6668.968955: kvm_entry: vcpu 16
>  qemu-system-x86-19144 [000]  6668.974965: kvm_entry: vcpu 30
>  qemu-system-x86-19132 [003]  6668.974967: kvm_entry: vcpu 18
>  qemu-system-x86-19126 [002]  6668.981415: kvm_entry: vcpu 12
>  qemu-system-x86-19143 [003]  6668.981416: kvm_entry: vcpu 29
>  qemu-system-x86-19134 [000]  6668.981418: kvm_entry: vcpu 20
>  qemu-system-x86-19127 [003]  6669.002546: kvm_entry: vcpu 13
>  qemu-system-x86-19117 [000]  6669.004972: kvm_entry: vcpu 3
>  qemu-system-x86-19135 [003]  6669.004973: kvm_entry: vcpu 21
>  qemu-system-x86-19145 [002]  6669.004974: kvm_entry: vcpu 31
>  qemu-system-x86-19116 [005]  6669.005002: kvm_entry: vcpu 2
>  qemu-system-x86-19128 [000]  6669.014976: kvm_entry: vcpu 14
>  qemu-system-x86-19129 [002]  6669.015058: kvm_entry: vcpu 15
>  qemu-system-x86-19115 [002]  6669.029626: kvm_entry: vcpu 1
>  qemu-system-x86-19141 [005]  6669.034981: kvm_entry: vcpu 27
> 
> The last VCPU (#27) reports in 6669.034981 - 6668.931183 = 103.798
> ms after the second SIPI's delivery is completed.
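
(The first-entry timestamps above can be pulled out mechanically; here
is a minimal sketch that reads a trace dump like the excerpt on stdin
and prints the first kvm_entry timestamp per VCPU. The field positions
are assumptions based on the excerpt's layout, not a general trace
parser.)

#include <stdio.h>
#include <string.h>

#define MAX_VCPUS 64

int main(void)
{
    char line[512];
    double first[MAX_VCPUS] = { 0 };

    while (fgets(line, sizeof line, stdin) != NULL) {
        const char *p = strstr(line, "kvm_entry:");
        double ts;
        int vcpu;

        if (p == NULL || sscanf(p, "kvm_entry: vcpu %d", &vcpu) != 1) {
            continue; /* not a kvm_entry line */
        }
        /* The timestamp is the number before the first colon, e.g.
         * "... [005]  6668.931183: kvm_entry: ..." */
        if (sscanf(line, "%*s [%*d] %lf:", &ts) != 1) {
            continue;
        }
        if (vcpu >= 0 && vcpu < MAX_VCPUS && first[vcpu] == 0.0) {
            first[vcpu] = ts;
            printf("vcpu %2d: first kvm_entry at %.6f\n", vcpu, ts);
        }
    }
    return 0;
}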
> 
> That is about twice as long as the virtual BSP is willing to wait
> for the virtual AP detection to complete. (The default timeout is 50
> ms.)
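
(To make the race concrete: per the description above, the BSP-side wait
is a polled countdown. The sketch below is a standalone approximation;
the helper name, the 100-microsecond polling step, and the use of
usleep() are made up for illustration and stand in for the firmware's
ACPI-PM-timer-based delay, so this is not the actual CpuDxe code.)

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Counter that each AP would increment once it starts running; it stays
 * at zero here since this sketch has no APs. */
static volatile uint32_t detected_ap_count;

/* Poll the check-in counter, giving up after timeout_us (50,000 us,
 * i.e. 50 ms, by default per the discussion above). APs that need
 * ~100 ms to get scheduled, as in the trace, miss this window. */
static uint32_t wait_for_aps(uint32_t expected_aps, uint32_t timeout_us)
{
    uint32_t waited;

    for (waited = 0; waited < timeout_us; waited += 100) {
        if (detected_ap_count >= expected_aps) {
            break;
        }
        usleep(100); /* firmware would poll the ACPI PM timer instead */
    }
    return detected_ap_count;
}

int main(void)
{
    uint32_t found = wait_for_aps(8, 50000);

    printf("detected %u of 8 APs\n", found); /* prints 0 here */
    return 0;
}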

This doesn't appear to be reproducing the same issue that Janusz is
seeing. In that case even a *10 second* timeout was not enough for 8
processors to start up.

-Jordan

> For comparison, within that 50 ms window, the virtual BSP manages to read
> the emulated ACPI PM timer a whopping 6443 times! Each such read is a
> userspace exit, with QEMU doing the emulation, and so on.
> 
> I haven't compared this to v4.1 numbers (yet?), but the inter-thread
> latency seems quite bad. (I'm sure it boosts throughput though.)
> 
> I'll try to bisect this over the v4.1..b18d5431acc range. Hopefully the
> problem was introduced *before* b18d5431acc, which would otherwise mask
> it with the very long LZMA decompression time.
> 
> Thanks
> Laszlo
> 