On 11/02/15 23:29, Jordan Justen wrote:
> On 2015-11-02 12:16:12, Laszlo Ersek wrote:
>> On 11/02/15 19:53, Alex Williamson wrote:
>>> On Mon, 2015-11-02 at 19:31 +0100, Laszlo Ersek wrote:
>>>> On 10/31/15 18:50, Laszlo Ersek wrote:
>>>>
>>>>> I'm very sorry, but I don't think I can spend time on this, unless
>>>>> someone gives me ssh and/or console access to a host that readily
>>>>> reproduces the bug, with the latest kvm/master, qemu, and edk2
>>>>> builds.
>>>>
>>>> We just got lucky, the problem reproduces on my oldie workstation (HP 
>>>> Z400, Intel(R) Xeon(R) CPU W3550 @ 3.07GHz, family/model/stepping/microcode = 
>>>> 6/26/5/0x19).
>>>>
>>>> OVMF works fine with the most recent Fedora 21 host kernel, 
>>>> "kernel-4.1.10-100.fc21.x86_64", and it breaks on the crispy new upstream 
>>>> Linux 4.3 release.
>>>>
>>>> The OVMF log contains a number of failed ASSERT()s, printed by the APs, in 
>>>> a way that is very characteristic of multi-threading bugs:
>>>>
>>>> --*--
>>>>
>>>> Detect CPU count: 2
>>>> AASSSSEERRTT  
>>>> //hhoomem/el/alcaocso/ss/rscr/cu/pusptsrteraema/me/dekd2k-2g-igti-ts-vsnv/nM/dMedPekPgk/gL/iLbirbarrayr/yB/aBsaesSeySnycnhcrhornoiznatiizoantLiiobn/LSiybn/cShyrnocnhirzoantiizoantGicocn.Gcc(c1.9c8()1:9
>>>>  8L)o:c kLVoaclkuVea l=u=e  (=(=U I(N(TUNI)N T2N))  |2|)  L|o|c 
>>>> kLVoaclkuVea l=u=e  (=(=U I(N(TUNI)N T1N))^M 
>>>> 1)
>>>> ASSERT 
>>>> /home/lacos/src/upstream/edk2-git-svn/MdePkg/Library/BaseSynchronizationLib/SynchronizationGcc.c(198):
>>>>  LockValue == ((UINTN) 2) || LockValue == ((UINTN) 1)
>>>>
>>>> --*--
>>>>
>>>> (Note that the above is from an 8-VCPU guest; cf. "Detect CPU count: 2".)
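
For context, the condition that the APs trip is the spin-lock sanity check
visible in the ASSERT text itself: a SPIN_LOCK may only ever hold the
"released" (1) or "acquired" (2) value, so any other value means the lock
memory was clobbered. A minimal C sketch of that condition follows; the
macro names mirror edk2's convention, but treat this as an illustration,
not the actual MdePkg BaseSynchronizationLib code:

```c
#include <stdint.h>

/* Sketch of the spin-lock value check from the ASSERT above:
 *   LockValue == ((UINTN) 2) || LockValue == ((UINTN) 1)
 * Names follow the edk2 convention; this is an illustration only. */
typedef uintptr_t UINTN;

#define SPIN_LOCK_RELEASED ((UINTN) 1)
#define SPIN_LOCK_ACQUIRED ((UINTN) 2)

/* Returns nonzero when the lock value is one of the two legal states;
 * any other value indicates the lock memory was corrupted. */
int SpinLockValueIsValid(UINTN LockValue)
{
    return LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED;
}
```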
>>>>
>>>>
>>>> Maybe important, maybe not: my workstation's IOMMU does *not* have snoop 
>>>> control:
>>>>
>>>> [    3.043297] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 
>>>> c90780106f0462 ecap f02076
>>>>
>>>> Where 0xf02076 & 0x80 == 0.
>>>>
>>>> Whereas my laptop *does* have Snoop Control:
>>>>
>>>> [    0.030486] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 
>>>> c0000020660462 ecap f0101a
>>>> [    0.030491] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap 
>>>> d2008020660462 ecap f010da
>>>>
>>>> The first IOMMU doesn't have Snoop Control, but its scope only extends to 
>>>> the integrated Intel Graphics Device. The second IOMMU, which covers the 
>>>> SD card reader I used for testing (and all other PCI devices), *does* have 
>>>> Snoop Control. 0xf010da & 0x80 == 0x80.
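
The bit being tested in both cases is bit 7 of the VT-d Extended
Capability register, which the VT-d spec defines as Snoop Control; the
check reduces to a one-liner (hypothetical helper name, just to make the
mask explicit):

```c
/* VT-d Extended Capability register: bit 7 advertises Snoop Control.
 * Hypothetical helper mirroring the "ecap & 0x80" tests above. */
int HasSnoopControl(unsigned long long Ecap)
{
    return (Ecap & 0x80ULL) != 0;
}
```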
>>>>
>>>> However, SC support should not matter, due to kernel commit 
>>>> fb279950ba02e3210a16b11ecfa8871f3ee0ca49, so I think we might be looking 
>>>> at an independent issue. (Possibly in CpuDxe, only exposed by the new host 
>>>> kernel.)
>>>
>>> Snoop Control is what I immediately jumped to as well, but I've never
>>> seen the problem on my desktop, which reports the following ecaps:
>>> f0101a & f0105a.  Thanks,
>>
>> This issue is actually very easy to reproduce on my workstation. It
>> has 1 socket, 4 cores, 2 threads/core. If I start a guest with 32
>> VCPUs (i.e., heavily over-subscribing the host), then the issue hits
>> immediately. Device assignment is not necessary.
>>
>> But, it also reproduces fairly regularly with just 8 VCPUs (even
>> with VCPUs pinned to separate logical processors on the host).
>>
>> Judging purely from the KVM trace captured during such a 32-VCPU
>> startup, I'm suspecting a general thread scheduling (or signaling)
>> change sometime in the v4.1..v4.2 interval. It takes *extremely*
>> long after the second SIPI (which should wake the AP threads) until
>> the first "kvm_entry" message shows up for the last nonzero VCPU.
>>
>> Below is an excerpt. The first line shows the 2nd SIPI being
>> delivered to the last VCPU. The lines after that show the first time
>> that each nonzero VCPU reports in with "kvm_entry".
>>
>>  qemu-system-x86-19114 [005]  6668.931183: kvm_apic_accept_irq:  apicid 1f 
>> vec 159 (SIPI|edge)
>> ...
>>  qemu-system-x86-19136  [006]  6668.931953:  kvm_entry:  vcpu  22
>>  qemu-system-x86-19133  [003]  6668.931953:  kvm_entry:  vcpu  19
>>  qemu-system-x86-19138  [007]  6668.931953:  kvm_entry:  vcpu  24
>>  qemu-system-x86-19118  [004]  6668.934967:  kvm_entry:  vcpu  4
>>  qemu-system-x86-19119  [006]  6668.941502:  kvm_entry:  vcpu  5
>>  qemu-system-x86-19121  [004]  6668.944957:  kvm_entry:  vcpu  7
>>  qemu-system-x86-19124  [001]  6668.944957:  kvm_entry:  vcpu  10
>>  qemu-system-x86-19137  [000]  6668.948561:  kvm_entry:  vcpu  23
>>  qemu-system-x86-19140  [002]  6668.948561:  kvm_entry:  vcpu  26
>>  qemu-system-x86-19131  [003]  6668.948561:  kvm_entry:  vcpu  17
>>  qemu-system-x86-19122  [007]  6668.948561:  kvm_entry:  vcpu  8
>>  qemu-system-x86-19120  [001]  6668.964832:  kvm_entry:  vcpu  6
>>  qemu-system-x86-19142  [002]  6668.964962:  kvm_entry:  vcpu  28
>>  qemu-system-x86-19125  [000]  6668.964964:  kvm_entry:  vcpu  11
>>  qemu-system-x86-19139  [007]  6668.964965:  kvm_entry:  vcpu  25
>>  qemu-system-x86-19123  [002]  6668.968954:  kvm_entry:  vcpu  9
>>  qemu-system-x86-19130  [000]  6668.968955:  kvm_entry:  vcpu  16
>>  qemu-system-x86-19144  [000]  6668.974965:  kvm_entry:  vcpu  30
>>  qemu-system-x86-19132  [003]  6668.974967:  kvm_entry:  vcpu  18
>>  qemu-system-x86-19126  [002]  6668.981415:  kvm_entry:  vcpu  12
>>  qemu-system-x86-19143  [003]  6668.981416:  kvm_entry:  vcpu  29
>>  qemu-system-x86-19134  [000]  6668.981418:  kvm_entry:  vcpu  20
>>  qemu-system-x86-19127  [003]  6669.002546:  kvm_entry:  vcpu  13
>>  qemu-system-x86-19117  [000]  6669.004972:  kvm_entry:  vcpu  3
>>  qemu-system-x86-19135  [003]  6669.004973:  kvm_entry:  vcpu  21
>>  qemu-system-x86-19145  [002]  6669.004974:  kvm_entry:  vcpu  31
>>  qemu-system-x86-19116  [005]  6669.005002:  kvm_entry:  vcpu  2
>>  qemu-system-x86-19128  [000]  6669.014976:  kvm_entry:  vcpu  14
>>  qemu-system-x86-19129  [002]  6669.015058:  kvm_entry:  vcpu  15
>>  qemu-system-x86-19115  [002]  6669.029626:  kvm_entry:  vcpu  1
>>  qemu-system-x86-19141  [005]  6669.034981:  kvm_entry:  vcpu  27
>>
>> The last VCPU (#27) reports in 6669.034981 - 6668.931183 = 103.798
>> ms after the second SIPI's delivery is completed.
>>
>> That is about twice as long as the virtual BSP is willing to wait
>> for the virtual AP detection to complete. (The default timeout is 50
>> ms.)
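
Spelling out the arithmetic above (trace timestamps are in seconds; the
helper below is hypothetical, just for the conversion):

```c
/* Convert a trace timestamp delta (seconds) to milliseconds, as in
 * 6669.034981 - 6668.931183 = 0.103798 s = 103.798 ms, roughly twice
 * the BSP's default 50 ms AP detection timeout. */
double TraceDeltaMs(double StartSec, double EndSec)
{
    return (EndSec - StartSec) * 1000.0;
}
```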
> 
> This doesn't appear to be reproducing the same issue that Janusz is
> seeing.

You might be correct, but I can only work with what I can reproduce.

> In that case even a *10 second* timeout was not enough for 8
> processors to start up.

Note that I didn't try to wait until the AP enumeration *completed* in
the problematic cases. The only thing I "measured" in the above was the
(nearly) *first* sign of life for each VCPU, after the INIT-SIPI-SIPI.
Your timeout depends on the *last* relevant act of the "slowest" CPU.

In any case, even if this issue is independent, the success vs. failure
question is pretty clear on my workstation, wherever I am in the
bisection. Assuming I end up with a relevant-looking culprit commit, we
could do something about it, and ask Janusz to test it too.

Thanks
Laszlo

> 
> -Jordan
> 
>> For comparison, within that 50 ms, the virtual BSP manages to read
>> the emulated ACPI PM timer a whopping 6443 times! That implies the same
>> number of userspace exits, with QEMU doing the emulation, and so on.
>>
>> I haven't compared this to v4.1 numbers (yet?), but the inter-thread
>> latency seems quite bad. (I'm sure it boosts throughput though.)
>>
>> I'll try to bisect this: the v4.1..b18d5431acc range. Hopefully this
>> problem was introduced *before* b18d5431acc would mask it with the
>> very long LZMA decompression time.
>>
>> Thanks
>> Laszlo
>>

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
