On 11/14/16 09:50, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 09:17, Laszlo Ersek wrote:
>> On 11/13/16 13:51, Fan, Jeff wrote:
>>> Laszlo,
>>>
>>> Thanks for your testing. It seems that some unknown issue still
>>> remains.
>>>
>>> I suggest pushing this series of patches first, because they make
>>> significant progress toward solving the AP crash issue in
>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216.
>>
>> Sounds good to me.
>>
>>> I could submit another bug to handle the "AP lost" issue.
>>
>> I hope that Paolo can continue to help us with the KVM trace analysis.
> 
> I will, but it will take a few days.  In the meantime it would be nice
> if you could take a look at using SendSmiIpiAllExcludingSelf() to bridge
> the difference between 0xb2 on QEMU and on real hardware.

You've tried that:

https://www.mail-archive.com/edk2-devel@lists.01.org/msg02840.html
https://www.mail-archive.com/edk2-devel@lists.01.org/msg02923.html

Are you suggesting that we make the LocalApicLib instances usable at runtime?

For that I think we'll need to cover the LAPIC address range with a
runtime-marked EfiMemoryMappedIO area. This can be done in
"OvmfPkg/SmmControl2Dxe".

Also, we'll need a LocalApicLib instance that registers a callback for
SetVirtualAddressMap() and converts the LAPIC base address pointer.
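
To make the two steps above concrete, here is a minimal plain-C model of
what the conversion amounts to. It is not EDK2 code: every type and function
name below is hypothetical, and the descriptor is a simplified stand-in for a
UEFI memory map entry. It only shows why the LAPIC page has to appear in the
map as a runtime region before a cached base pointer can be rebased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/* Hypothetical, simplified stand-in for a UEFI memory map entry. */
typedef struct {
  uint64_t physical_start;
  uint64_t virtual_start;   /* chosen by the OS for its virtual map */
  uint64_t number_of_pages; /* counted in 4 KiB pages */
  bool     runtime;         /* models the runtime attribute of the region */
} model_mem_desc;

/* Cached LAPIC base, as a library instance might keep it (assumption). */
static uint64_t m_lapic_base = 0xFEE00000ULL;

/*
 * Rebase *address using the first runtime descriptor that contains it.
 * Returns false and leaves *address untouched if no runtime descriptor
 * covers it; this is the case the runtime-marked MMIO area would prevent.
 */
static bool model_convert_pointer(const model_mem_desc *map, size_t count,
                                  uint64_t *address)
{
  assert(address != NULL);
  for (size_t i = 0; i < count; i++) {
    uint64_t size = map[i].number_of_pages * MODEL_PAGE_SIZE;
    if (map[i].runtime &&
        *address >= map[i].physical_start &&
        *address - map[i].physical_start < size) {
      *address = map[i].virtual_start + (*address - map[i].physical_start);
      return true;
    }
  }
  return false;
}

/* Models the callback the library would run on the address-map change. */
static bool model_on_virtual_address_change(const model_mem_desc *map,
                                            size_t count)
{
  return model_convert_pointer(map, count, &m_lapic_base);
}
```

In this model, a map without a runtime entry for 0xFEE00000 leaves
m_lapic_base at its physical value, so any later access through it at
runtime would go to the wrong place.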

Currently BaseXApicX2ApicLib.c's GetLocalApicBaseAddress() function uses
the MSR_IA32_APIC_BASE register if it's available -- based on CPUID --,
and falls back to PcdCpuLocalApicBaseAddress otherwise. And only
PcdCpuLocalApicBaseAddress is what we could replace with the virtual
pointer. We can't accommodate a guest OS that reprograms the LAPIC base
address.
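
The selection logic described above can be modeled in a few lines of plain
C. This is a sketch under stated assumptions, not the library's source: the
struct and function names are hypothetical, the MSR contents are passed in
as a plain value instead of being read from hardware, and the address mask
assumes a 52-bit physical address width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Base-address field of IA32_APIC_BASE: bits 12 and up. The upper bound
   used here (bit 51) is an assumption; the real limit is MAXPHYADDR. */
#define MODEL_APIC_BASE_ADDR_MASK 0x000FFFFFFFFFF000ULL

typedef struct {
  bool     msr_supported; /* outcome of the CPUID-based check */
  uint64_t msr_value;     /* raw IA32_APIC_BASE contents */
  uint64_t pcd_base;      /* stands in for PcdCpuLocalApicBaseAddress */
} model_lapic_inputs;

/* Prefer the MSR when it exists; otherwise fall back to the PCD value. */
static uint64_t model_get_lapic_base(const model_lapic_inputs *in)
{
  assert(in != NULL);
  if (in->msr_supported) {
    /* Drop the flag bits (BSP, enable bits) below bit 12. */
    return in->msr_value & MODEL_APIC_BASE_ADDR_MASK;
  }
  return in->pcd_base;
}
```

In the model, the MSR branch always yields a physical address taken from
hardware state, so a virtual pointer stored in place of the PCD value only
takes effect on the fallback branch. A guest that wrote 0xFEC00900 to the
MSR would get 0xFEC00000 back regardless of what the PCD holds.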

Jeff, what do you think?

Anyway, I believe KVM doesn't support moving the LAPIC window; is that
right? (Independently, I seem to recall an attack that stole SMRAM
accesses by hiding SMRAM with the LAPIC window.)

Thanks
Laszlo


>>> Thus, Jiewen's
>>> or others' patches could be pushed as long as they have no additional
>>> issues except for "AP lost".
>>
>> I haven't gotten around to testing Jiewen's v3 series yet. I think it would
>> be best if I could test Jiewen's v3 after this v2 series of yours is
>> committed. I'll report back with results.
>>
>> Thanks
>> Laszlo
>>
>>>
>>> I could follow up to fix the "AP lost" issue.
>>>
>>> Thanks!
>>> Jeff
>>>
>>>
>>> -----Original Message-----
>>> From: Laszlo Ersek [mailto:ler...@redhat.com] 
>>> Sent: Saturday, November 12, 2016 3:49 AM
>>> To: Fan, Jeff
>>> Cc: edk2-de...@ml01.01.org; Yao, Jiewen; Paolo Bonzini
>>> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
>>>
>>> On 11/11/16 06:45, Jeff Fan wrote:
>>>> On the S3 path, we wake up APs to restore CPU context in the
>>>> PiSmmCpuDxeSmm driver. If an NMI or SMI happens, APs may exit
>>>> from the hlt state and execute the instruction after the HLT
>>>> instruction.
>>>>
>>>> But the APs are not running safe code at that point, which makes
>>>> OVMF S3 boot unstable.
>>>>
>>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>>>>
>>>> I tested on a real platform with 64-bit DXE.
>>>>
>>>> v2:
>>>>   1. Make stack alignment per Laszlo's comment.
>>>>   2. Trim whitespace at end of line per Laszlo's comment.
>>>>   3. Update year mark in file header.
>>>>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
>>>>
>>>> Jeff Fan (3):
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
>>>>
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 ++++++++++++++++++++++++++-
>>>>  4 files changed, 136 insertions(+), 4 deletions(-)
>>>>
>>>
>>> Applied this locally to master (ffd6b0b1b65e) for testing. I tested the 
>>> series with a suspend-resume loop -- not a busy loop, just manually. (So 
>>> there was always one second or so between adjacent steps.)
>>>
>>> No crashes or emulation failures, but the "AP going lost" issue remains 
>>> present -- sometimes Linux cannot bring up one of the four VCPUs after 
>>> resume.
>>>
>>> In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.
>>>
>>> In the Ia32X64 case, I experienced the symptom after the 89th resume.
>>>
>>> Thanks
>>> Laszlo
>>>
