CC'ing Xiao and Alex again.

On 10/29/15 19:39, Jordan Justen wrote:
> On 2015-10-29 04:45:37, Laszlo Ersek wrote:
>> On 10/29/15 02:32, Jordan Justen wrote:
>>> +    ASSERT (MaxProcessors > 0);
>>> +    PcdSet32 (PcdCpuMaxLogicalProcessorNumber, MaxProcessors);
>>
>> I think that when this branch is active, then
>> PcdCpuApInitTimeOutInMicroSeconds should *also* be set, namely to
>> MAX_UINT32 (~71 minutes, the closest we can get to "infinity"). When
>> this hint is available from QEMU, then we should practically disable
>> the timeout option in CpuDxe's AP counting.
>
> I think this is a good idea, but I don't think 71 minutes is useful.
> Perhaps 30 seconds? This seems more than adequate for hundreds of
> processors to startup. Or perhaps some timeout based on the number of
> processors?
>
> Janusz and I were discussing
> https://github.com/tianocore/edk2/issues/21 on irc. We increased the
> timeout to 10 seconds, and with only 8 processors it was still timing
> out.
>
> Obviously we are somehow failing to start the processors correctly, or
> QEMU/KVM is doing something wrong.
>
> Have you been able to reproduce this issue? It seems like we need to
> set the timeout to 71 minutes, and then debug QEMU/KVM to see what
> state the APs are in...
>
> Unfortunately I haven't yet been able to reproduce the bug on my
> system. :(

I've been staring at the following things for a few tens of minutes now:

(1) Kernel commit b18d5431acc7. Note that the commit changes the return
    value of the vmx_get_mt_mask() function *exactly* in the following
    case:

      kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
      (kvm_read_cr0(vcpu) & X86_CR0_CD)

    The first sub-condition is satisfied by GPU passthrough / device
    assignment, I think; the second part depends on the VCPU having
    turned on (or having *left* on) CR0.CD.

(2) Consult the vmx_vcpu_reset() function in "arch/x86/kvm/vmx.c"
    (current upstream). You will find:

        cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
        vmx_set_cr0(vcpu, cr0); /* enter rmode */

    Meaning a VCPU will start with CD and NW set, in real mode, after
    re-set.

    This setting dates back to the birth of KVM:

      commit 6aa8b732ca01c3d7a54e93f4d701b8aabbe60fb7
      Author: Avi Kivity <a...@qumranet.com>
      Date: Sun Dec 10 02:21:36 2006 -0800

          [PATCH] kvm: userspace interface

    Search that commit for "0x60000010" (the second hit, although the
    comment that contains the first hit is quite telling as well).

(3) Consult the Intel SDM, Table 11-5. "Cache Operating Modes".

    The (CD, NW) == (1, 1) setting in CR0 is documented as:
    - "Memory coherency is not maintained."
    - "(P6 family and Pentium processors.) State of the processor after
      a power up or reset. "
    - [in footnote 2] "The Pentium 4 and more recent processor families
      do not support this mode; setting the CD and NW bits to 1 selects
      the no-fill cache mode."

    In other words, the settings implemented by vmx_vcpu_reset()
    actually invoke the behavior of the "no-fill cache mode" (which is
    (CD, NW) == (1, 0)) for all practical purposes.

(4) Same reference.

    The (CD, NW) == (1, 0) setting in CR0 is documented as:
    - "No-fill Cache Mode. Memory coherency is maintained."
    - "(Pentium 4 and later processor families.) State of processor
      after a power up or reset. "

(5) The AsmEnableCache() function in
    "MdePkg/Library/BaseLib/Ia32/EnableCache.c". It clears both CD and
    NW in CR0.

(6) This setting ((CD, NW) == (0, 0))is documented in the Intel SDM as:
    - "Normal Cache Mode. Highest performance cache operation."

(7) The AsmEnableCache() function is invoked by MtrrLib
    [UefiCpuPkg/Library/MtrrLib/MtrrLib.c] after any and all MTRR
    changes. Consider:

    PostMtrrChange() | MtrrSetAllMtrrs()
      PostMtrrChangeEnableCache()
        AsmEnableCache()

    Where MtrrSetAllMtrrs() is a public function of the library; plus
    PostMtrrChange() is invoked by all of the following public
    functions:

    - MtrrSetMemoryAttribute()
    - MtrrSetVariableMtrr()
    - MtrrSetFixedMtrr()

(8) Because we call MtrrLib in PlatformPei first, there are two
    consequences:

    (a) The boot VCPU has CR0.CD *set* in all parts of OVMF that run
        earlier than that.

        This caused a widely reported boot perf regression in SEC (the
        LZMA decompression). Ultimately another MTRR change in KVM was
        reverted, so (as far as I know) this symptom has not been seen
        recently. (In any case, we should probably fix this sometime...)

    (b) The other consequence is that the boot VCPU's CR0.CD is clear in
        the rest of OVMF. Which is what makes its speed acceptable, I
        guess (as long as no APs are started up).

(9) Our AP startup code massages CR0, but only for mode switches. CR0.CD
    and CR0.NW are never touched.

    Now, I guess this could be easily added to the assembly encoded as a
    C array ("mStartupCodeTemplate" in "UefiCpuPkg/CpuDxe/ApStartup.c")
    -- when cr0 is massaged anyway, just clear bits 29 and 30 too; same
    as in AsmEnableCache().

    However, for testing the idea, perhaps the following one-liner
    suffices too -- this is the earliest an AP executes C code:

> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c
> index 3f56faa..e7f5b41 100644
> --- a/UefiCpuPkg/CpuDxe/CpuMp.c
> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c
> @@ -1451,6 +1451,8 @@ ApEntryPointInC (
>    VOID*           TopOfApStack;
>    UINTN           ProcessorNumber;
>
> +  AsmEnableCache ();
> +
>    if (!mAPsAlreadyInitFinished) {
>      FillInProcessorInformation (FALSE, mMpSystemData.NumberOfProcessors);
>      TopOfApStack  = (UINT8*)mApStackStart + gApStackSize;

    This should clear CR0.CD, and "undo" kernel commit b18d5431acc7 for
    the AP (by falsifying the second subcondition seen in (1)).

Janusz, can you please test this one-liner (with no other out-of-tree
patch applied)?

Thanks!
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to