This is going to be another long email... Jump to bullet (6) if you only care about my summary.
On 11/03/15 08:22, Jordan Justen wrote: > On 2015-11-02 17:14:12, Laszlo Ersek wrote: >> Anyway, with the following host kernel change, the AP startup problem >> goes away (tested on top of v4.3): >> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index 9a9a198..4f978ad 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -622,7 +622,8 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long >>> cr0) >>> if ((cr0 ^ old_cr0) & update_bits) >>> kvm_mmu_reset_context(vcpu); >>> >>> - if ((cr0 ^ old_cr0) & X86_CR0_CD) >>> + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED) && >>> + (cr0 ^ old_cr0) & X86_CR0_CD) >>> kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL); >>> >>> return 0; >> >> (Honestly I just imitated fb279950ba here; I'm not making any better >> argument for this diff. But, independently, I wonder why this hunk >> didn't have the noncoherent DMA check either, originally.) >> >> Janusz, could you rebuild your host kernel with this patch, and share >> the results? (I'm also attaching the same as a formatted patch, so >> you can apply it with "git am" easily.) > > One note related to decompression caching. I also created this patch: > > https://github.com/jljusten/edk2/commit/7a1d3799 > > In my https://github.com/jljusten/edk2/tree/decompress-mtrr branch. > > Infernix tried it and it didn't seem to totally fix the issue. (See > comments on https://github.com/tianocore/edk2/issues/21) I don't know > if Janusz got a chance to try it. > > I didn't clean it up and send it out, since I didn't have any evidence > that it helped... (1) There is now evidence that the modifications that kernel commit b18d5431acc7 KVM: x86: fix CR0.CD virtualization did to two separate functions: - vmx_get_mt_mask() - kvm_set_cr0() cause separate symptoms with OVMF. The vmx_get_mt_mask() change causes the slowdown with the LZMA decompression in SEC, and the kvm_set_cr0() change causes the delay with the AP startup in CpuDxe. Isolating the latter (i.e. the kvm_set_cr0() change's effect on the AP startup) was the topic of my previous email: with fixup fb279950ba KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED in the host kernel (which takes vmx_get_mt_mask() out of the picture), flipping the kvm_set_cr0() change on and off can be directly correlated with the AP startup problem. And now I checked the other symptom. I reverted the fb279950ba fixup KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED from vmx_get_mt_mask(), on top of v4.3, but I *also* applied my patch (see last email) entitled KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0(). This makes the AP startup work again without problems, but *should* bring back the LZMA slowdown. And it does. (Tested it on my HP Z400 workstation, with an assigned GPU.) (2) Using the awesome "ts" (timestamp) utility from the "moreutils" package, one can redirect the QEMU debug console to a *named pipe* (--> mkfifo), then read that pipe with: $ ts -i '%H:%M:%.S' /tmp/debugcon.pipe This will prefix each line with a relative / incremental timestamp, with sub-second resolution. (Of course you can write the debug console to /dev/stdout as well, and then just use the pipe operator "|" with "ts".) Now, this is what I get from an upstream OVMF build, with the kernel-side reproducer described in (1): > 00:00:00.063287 SEC: Normal boot > 00:01:33.306886 Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE The LZMA decompression takes about one and half minute. (3) Using your patch "OvmfPkg/Sec: Program MTRRs before decompression" (with a fixup, see it later), I get: > 00:00:00.013923 SEC: Normal boot > 00:00:26.657586 Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE Better (approx 26 seconds), but still quite slow. In addition, the DXE core loading / startup takes conspicuously long as well: > 00:00:00.001067 Loading DXE CORE at 0x000BFF60000 EntryPoint=0x000BFF60240 > 00:00:05.680312 Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6 More than five seconds. (4) Whereas, if I only apply my host kernel patch KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0(). on top of v4.3, and do *not* revert the fb279950ba fixup KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED then both the AP startup works, and the SEC decompression is fast. Measured without your OVMF patch: > 00:00:00.000596 SEC: Normal boot > 00:00:00.208118 Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE About one fifth of a second... And, > 00:00:00.000216 Loading DXE CORE at 0x000BFF60000 EntryPoint=0x000BFF60240 > 00:00:00.015877 Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6 Which is unnoticeable. (5) Review comments on your patch "OvmfPkg/Sec: Program MTRRs before decompression" follow. (Although I don't think this avenue should be pursued for now -- I'll elaborate in the next bullet --, I also believe it's basic politeness to review a patch when asked to, assuming time allows for it.) The commit hash is different because I applied the patch to my tree (for testing it, see above) and then formatted it out of there. > From 0fa30019127773d547f147bc987df6fd2f2714d5 Mon Sep 17 00:00:00 2001 > From: Jordan Justen <jordan.l.jus...@intel.com> > Date: Tue, 20 Oct 2015 00:01:55 -0700 > Subject: [PATCH] OvmfPkg/Sec: Program MTRRs before decompression > > KVM will perform slowly in some cases if the MTRRs are not programmed > correctly. Since the OVMF decompression uses memory extensively, we > need to program the MTRRs before decompressing the PEI/DXE code. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > OvmfPkg/Sec/SecMain.inf | 3 ++ > OvmfPkg/PlatformPei/MemDetect.c | 6 ++-- > OvmfPkg/Sec/SecMain.c | 62 > +++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 68 insertions(+), 3 deletions(-) > > diff --git a/OvmfPkg/Sec/SecMain.inf b/OvmfPkg/Sec/SecMain.inf > index 2f78f3c..4c5392a 100644 > --- a/OvmfPkg/Sec/SecMain.inf > +++ b/OvmfPkg/Sec/SecMain.inf > @@ -56,11 +56,14 @@ [LibraryClasses] > PeCoffExtraActionLib > ExtractGuidedSectionLib > LocalApicLib > + MtrrLib > > [Ppis] > gEfiTemporaryRamSupportPpiGuid # PPI ALWAYS_PRODUCED > > [Pcd] > + gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress > + gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPeiMemFvBase > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPeiMemFvSize > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfDxeMemFvBase > diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c > index 612bb4a..bdc16f5 100644 > --- a/OvmfPkg/PlatformPei/MemDetect.c > +++ b/OvmfPkg/PlatformPei/MemDetect.c > @@ -291,9 +291,9 @@ QemuInitializeRam ( > // > // MTRRs disabled, fixed MTRRs disabled, default type is uncached > // > - ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0); > - ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0); > - ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0); > + //ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0); > + //ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0); > + //ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0); > > // > // flip default type to writeback > diff --git a/OvmfPkg/Sec/SecMain.c b/OvmfPkg/Sec/SecMain.c > index 4f87059..ee3761a 100644 > --- a/OvmfPkg/Sec/SecMain.c > +++ b/OvmfPkg/Sec/SecMain.c > @@ -29,6 +29,7 @@ > #include <Library/PeCoffExtraActionLib.h> > #include <Library/ExtractGuidedSectionLib.h> > #include <Library/LocalApicLib.h> > +#include <Library/MtrrLib.h> > > #include <Ppi/TemporaryRamSupport.h> > > @@ -687,6 +688,59 @@ FindAndReportEntryPoints ( > return; > } > > + > +/* > + Initialize the MTRRs for the SEC phase. > + > + Most importantly this will enable caching for decompression of the FVs. > + > +**/ > +VOID > +InitializeMtrrsForSec ( > + VOID > + ) > +{ > + MTRR_SETTINGS Settings; > + MTRR_VARIABLE_SETTING *Mtrr; > + > + ZeroMem ((VOID *) &Settings, sizeof(Settings)); (i) Please drop the space character from within the cast. > + > + Settings.MtrrDefType = MTRR_LIB_CACHE_MTRR_ENABLED | > + MTRR_LIB_CACHE_FIXED_MTRR_ENABLED | MTRR_CACHE_UNCACHEABLE; Seems correct, PlatformPei has the same (BIT11 | BIT10 | 6). > + > + // > + // Address 0 - 0xa0000: Enable Write Back Cache > + // > + SetMem (&Settings.Fixed.Mtrr[0], 2 * sizeof(Settings.Fixed.Mtrr[0]), > + MTRR_CACHE_WRITE_BACK); Looks good, yes. > + > + // > + // Set first 256MB as write-back cache > + // > + ASSERT ((PcdGet32 (PcdOvmfPeiMemFvBase) + PcdGet32 (PcdOvmfPeiMemFvSize)) < > + SIZE_256MB); > + ASSERT ((PcdGet32 (PcdOvmfDxeMemFvBase) + PcdGet32 (PcdOvmfDxeMemFvSize)) < > + SIZE_256MB); (ii) With my SMM patches in mind, this should probably cover up to PcdOvmfDecompressionScratchEnd instead. No change in behavior though, 256MB is plenty. > + Mtrr = &Settings.Variables.Mtrr[0]; > + Mtrr->Base = 0LL | MTRR_CACHE_WRITE_BACK; > + Mtrr->Mask = ((~(UINT64) (SIZE_256MB - 1) & > + MTRR_LIB_CACHE_VALID_ADDRESS) | > + MTRR_LIB_CACHE_MTRR_ENABLED); This took quite some time to review against the Intel SDM :), but it looks correct. (iii) Except, please, drop the space from within the cast. > + > + // > + // Set ROM/Flash as write-protected cache > + // > + Mtrr = &Settings.Variables.Mtrr[0]; (iv) This is where I needed to fix up the patch: the subscript should be one, not zero. (You probably copied the assignment from above, and forgot to update it.) > + Mtrr->Base = PcdGet32 (PcdOvmfFdBaseAddress) | MTRR_CACHE_WRITE_PROTECTED; > + Mtrr->Mask = > + ((~(UINT64) (PcdGet32 (PcdOvmfFirmwareFdSize) - 1) & > + MTRR_LIB_CACHE_VALID_ADDRESS) | > + MTRR_LIB_CACHE_MTRR_ENABLED); (v) Please drop the space from whitin the cast. (vi) It should be asserted that PcdOvmfFirmwareFdSize is an integral power of two. (vii) It should be asserted that PcdOvmfFdBaseAddress has all bits clear when masked with (PcdOvmfFirmwareFdSize - 1). > + > + MtrrSetAllMtrrs (&Settings); > +} > + > + > VOID > EFIAPI > SecCoreStartupWithStack ( > @@ -714,6 +768,14 @@ SecCoreStartupWithStack ( > InitializeFloatingPointUnits (); > > // > + // Initialize the MTRRs for the SEC phase. This will be redone more > + // thoroughly at a later stage. > + // > + if (IsMtrrSupported ()) { > + InitializeMtrrsForSec (); > + } > + > + // > // Initialize IDT > // > IdtTableInStack.PeiService = NULL; > -- > 1.8.3.1 > (6) In closing: - The LZMA decompression slowdown has been quirked in the upstream kernel. A KVM ioctl() has been introduced that allows QEMU userspace to *disable* any quirks, but as far as I can see in QEMU, this is not being done automatically, nor is it exposed to the user. Therefore we can conclude that this issue has been solved for the mid term at least. - I have proved that the AP startup slowdown is a separate issue, one that is caused (or "exposed") by the "other" part of host kernel commit b18d5431acc7 -- the part that has lived on to v4.3 un-quirked. I'll await Janusz's test results, and then I'll submit my proposed fix (= extending the quirk) to the KVM mailing list. With that we should consider this issue also resolved for OVMF, for the mid term. - Rather than auditing the MP services implementation in UefiCpuPkg/CpuDxe *right now*, can we *PLEASE* focus on the SMM series? I've been working on it (with some intermittent delays due to external pressure) since *April*. We are so close now. This is what remains to be done: - Agreement between Paolo, Jordan and Mike about implementing broadcast SMIs. I am willing to code up whatever design is agreed upon. Can everyone involved please prioritize this discussion a little? - Solving the MP-related ExitBootServices() handler bug in CpuDxe. Patches have been on the list for almost a week. While I've been breaking my back testing and reviewing patches for others (not just on edk2-devel, mind you), nobody has batted an eye about that series. [edk2] [PATCH v2 0/4] UefiCpuPkg/CpuDxe: Fix ExitBootServices() callback in the presence of SMIs http://thread.gmane.org/gmane.comp.bios.edk2.devel/3518 Please review it, so that I can include it at the front of my upcoming v4 SMM series. - Thirty (30) patches from the SMM series still need reviews. Once all of the above is covered, I'll update the OvmfPkg/README patch about the status, and post version 4. I wouldn't mind if we could commit the series still in 2015, but I'm getting less and less hopeful. Thanks Laszlo _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel