On 11/03/15 16:40, Xiao Guangrong wrote:
> 
> 
> On 11/03/2015 11:22 PM, Paolo Bonzini wrote:
>>
>>
>> On 03/11/2015 15:35, Xiao Guangrong wrote:
>>>>>>>
>>>>>>> -    if ((cr0 ^ old_cr0) & X86_CR0_CD)
>>>>>>> +    if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED) &&
>>>>>>> +        (cr0 ^ old_cr0) & X86_CR0_CD)
>>>>>>>            kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
>>>>>>>
>>>>>>>        return 0;
>>>>> (Honestly I just imitated fb279950ba here; I'm not making any better
>>>>> argument for this diff. But, independently, I wonder why this hunk
>>>>> didn't have the noncoherent DMA check either, originally.)
>>>>
>>>> Great job.  I look forward to the testing results.
>>>>
>>>> It should also have the noncoherent DMA check, in fact, though that's
>>>> just an optimization and it would have masked the bug on your system.
>>>
>>> Hmm... but kvm_zap_gfn_range and other shadow page zapping operations
>>> are really normal behaviour on the host - they depend on how the host
>>> handles memory overcommit and memory layout changes.
>>>
>>> I doubt this is really the right way to go. kvm_zap_gfn_range() is just
>>> one delay factor that triggers the OVMF boot issue, but there must be
>>> other factors that introduce such delays too - for example, more vCPUs
>>> in OVMF, high load on the host, etc.
>>
>> But it's pointless if the quirk is enabled.  Also, bringing up APs will
>> cause heavy contention on mmu_lock as Laszlo pointed out.
> 
> Yes, I agree the quirk is a good solution for this. However, it is a
> challenge for us to handle page zapping / shadow page zapping in future
> development; we do not know whether it will hurt a vulnerable OVMF...

Don't worry, GPU assignment users will tell you about it quickly. :)

> And I am not sure this quirk can help us completely avoid the issue;
> assume a guest with 255 vCPUs on a CPU-overloaded host - the same issue
> can easily be triggered.
>
> After all, using a fixed delay to wait for CPUs to boot is not a good
> approach, as I said in my previous suggestion in the thread where I
> identified the root cause:
>
> | And a generic factor is: if the guest has more vCPUs, then more time is
> | needed. That is why the bug is hardly ever triggered on guests with few
> | vCPUs. I guess we need a self-adapting way to handle the case...

Yes, and we plan to document, audit, and clean up the MP services
implementation. Later.

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel