-----Original Message-----
From: Laszlo Ersek [mailto:ler...@redhat.com]
Sent: June 15, 2015 22:08
To: Maoming
Cc: edk2-devel@lists.sourceforge.net; Huangpeng (Peter); Wei Liu; Paolo Bonzini
Subject: Re: RE: [edk2] [RFC 4/4] OvmfPkg: PlatformPei: invert MTRR setup in
QemuInitializeRam()

On 06/15/15 15:25, Maoming wrote:
> Hi :
> Sorry for the late reply.
> I tested the patch series using 64G and 80G.
> Both of them are OK in XEN.
> 
> Here is what it looks like inside the VM (the memory is 80G):
>               total       used       free     shared    buffers     cached
>  Mem:      81956412     654708   81301704          0      10528      42256
>  -/+ buffers/cache:     601924   81354488
>  Swap:      4186108          0    4186108
>  
>  Thanks a lot for your nice work!
>  Maoming

Thanks for reporting back!

Since you mentioned earlier that you encountered the problem on qemu/KVM
too -- can you please give that a whirl as well, with this patch series
in place?

Thank you
Laszlo


 The patch series works well in KVM too.
 My environment is:
 version:        kvm-kmod-3.6
 QEMU emulator version 2.1.0
 
 Here is what it looks like inside the VM (the memory is 90G):
              total       used       free     shared    buffers     cached
 Mem:      92862616    1155156   91707460          0      13552      77952
 -/+ buffers/cache:    1063652   91798964
 Swap:      4063224          0    4063224

Thanks!
Maoming


> -----Original Message-----
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: June 10, 2015 21:03
> To: Maoming
> Cc: edk2-devel@lists.sourceforge.net; Huangpeng (Peter); Wei Liu; Paolo Bonzini
> Subject: Re: [edk2] [RFC 4/4] OvmfPkg: PlatformPei: invert MTRR setup in
> QemuInitializeRam()
> 
> On 06/09/15 04:15, Laszlo Ersek wrote:
>> On 06/08/15 23:46, Laszlo Ersek wrote:
>>> At the moment we work with a UC default MTRR type, and set three 
>>> memory ranges to WB:
>>> - [0, 640 KB),
>>> - [1 MB, LowerMemorySize),
>>> - [4 GB, 4 GB + UpperMemorySize).
>>>
>>> Unfortunately, coverage for the third range can fail with a high 
>>> likelihood. If the alignment of the base (ie. 4 GB) and the alignment 
>>> of the size (UpperMemorySize) differ, then MtrrLib creates a series 
>>> of variable MTRR entries, with power-of-two sized MTRR masks. And, 
>>> it's really easy to run out of variable MTRR entries, dependent on 
>>> the alignment difference.
>>>
>>> This is a problem because a Linux guest will loudly reject any high 
>>> memory that is not covered by MTRR.
>>>
>>> So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
>>> - flip the MTRR default type to WB,
>>> - set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
>>>   type and variable MTRRs, so we can't avoid this,
>>> - set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
>>> - set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
>>>   more likely than the other scheme (due to less chaotic alignment
>>>   differences).
>>>
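As a rough illustration of the alignment problem described above, the following C sketch counts how many naturally aligned, power-of-two sized blocks a greedy left-to-right split needs to cover a range exactly. This is not MtrrLib's actual algorithm (the real library may combine or overlay ranges differently); the helper names are hypothetical, and the 3 GB low-memory split used in the examples is an assumption taken from the discussion later in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1ULL << 20)
#define GB (1ULL << 30)

/* Largest power of two that is <= x; x must be nonzero. */
static uint64_t floor_pow2(uint64_t x)
{
    uint64_t p = 1;

    assert(x != 0);
    while (p <= x / 2) {
        p <<= 1;
    }
    return p;
}

/*
 * Hypothetical helper: number of power-of-two sized, naturally aligned
 * blocks that a greedy split uses to cover [base, base + size) exactly.
 * In this simplified model each block costs one variable MTRR.
 */
static unsigned count_pow2_blocks(uint64_t base, uint64_t size)
{
    unsigned count = 0;

    while (size > 0) {
        uint64_t block = floor_pow2(size);

        if (base != 0) {
            /* The lowest set bit of base is its natural alignment. */
            uint64_t align = base & (~base + 1);

            if (align < block) {
                block = align;
            }
        }
        base += block;
        size -= block;
        count++;
    }
    return count;
}
```

In this model, count_pow2_blocks(4 * GB, 60 * GB) is 4, but shrinking the size by a single megabyte, count_pow2_blocks(4 * GB, 60 * GB - MB), gives 18, which is more than the small, CPU-specific number of variable MTRRs. The UC hole of the inverted scheme, count_pow2_blocks(3 * GB, 1 * GB), needs only one block.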
>>> Effects of this patch can be observed by setting DEBUG_CACHE 
>>> (0x00200000) in PcdDebugPrintErrorLevel.
>>>
>>> BUG: Although the MTRRs look good to me in the OVMF debug log, I 
>>> still can't boot >= 64 GB guests with this. Instead of the complaints 
>>> mentioned above, the Linux guest apparently spirals into an infinite 
>>> loop (on KVM), or hangs with no CPU load (on TCG).
>>
>> No, actually there is no bug in this patch (so s/RFC/PATCH/). I did 
>> more testing and these are the findings:
>> - I can reproduce the same issue on KVM with SeaBIOS guests.
>> - The exact symptoms are that as soon as the highest guest-phys address
>>   is >= 64 GB, then the guest kernel doesn't boot. It gets stuck
>>   somewhere after hitting Enter in grub.
>> - Normally 3 GB of the guest RAM is mapped under 4 GB in guest-phys
>>   address space, then there's a 1 GB PCI hole, and the rest is above
>>   4 GB. This means that a 63 GB guest can be started (because 63 - 3 + 4
>>   == 64), but if you add just 1 MB more, it won't boot.
>> - (This was the big discovery:) I flipped the "ept" parameter of the
>>   kvm_intel module on my host to N, and then things started to work. I
>>   just booted a 128 GB Linux guest with this patchset. (I have 4 GB
>>   RAM in my host, plus approx 250 GB swap.) The guest could see it all.
>> - The TCG boot didn't hang either; I just couldn't wait earlier for
>>   network initialization to complete.
>>
>> I'm CC'ing Paolo for help with the EPT question. Other than that, this 
>> series is functional. (For QEMU/KVM at least; Xen will likely need 
>> more fixes from others.)
> 
> We have a root cause, it seems. The issue is that the processor in my laptop, 
> on which I tested, has only 36 bits for physical addresses:
> 
>   $ grep 'address sizes' /proc/cpuinfo
>   address sizes   : 36 bits physical, 48 bits virtual
>   ...
> 
> Which matches where the problem surfaces (64 GB guest-phys address
> space) with hw-supported nested paging (EPT) enabled on the host.
> 
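The arithmetic behind the 64 GB limit can be sketched in C as follows. The helper names are hypothetical, and the layout (low RAM below 4 GB, the remainder mapped from 4 GB upward) is the simplified model described earlier in this thread, not a statement about how QEMU lays out every machine type.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB (1ULL << 20)
#define GB (1ULL << 30)

/*
 * Hypothetical helper: exclusive end of guest-physical RAM, assuming up to
 * low_ram bytes are mapped below 4 GB and the rest starts at 4 GB.
 */
static uint64_t guest_phys_end(uint64_t ram, uint64_t low_ram)
{
    if (ram <= low_ram) {
        return ram;
    }
    return 4 * GB + (ram - low_ram);
}

/* True if every address below 'end' is representable in phys_bits bits. */
static bool fits_phys_bits(uint64_t end, unsigned phys_bits)
{
    return phys_bits >= 64 || end <= (1ULL << phys_bits);
}
```

With a 3 GB low-memory split, a 63 GB guest ends exactly at 64 GB and still fits in 36 physical address bits; adding 1 MB pushes it over. A 72 GB guest fits comfortably in 46 bits.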
> In order to confirm this, a colleague of mine gave me access to a server with 
> 96 GB of RAM, and:
> 
>   address sizes       : 46 bits physical, 48 bits virtual
> 
> On this host I booted a 72 GB OVMF guest on QEMU/KVM, with EPT enabled, and 
> according to the guest dmesg, the guest saw it all.
> 
>   Memory: 74160924K/75493820K available (7735K kernel code, 1149K
>   rwdata, 3340K rodata, 1500K init, 1524K bss, 1332896K reserved, 0K
>   cma-reserved)
> 
> Maoming: since you reported this issue, please confirm that the patch series 
> resolves it for you as well. In that case, I'll repost the series with 
> "PATCH" as subject-prefix instead of "RFC", and I'll drop the BUG note from 
> the last commit message.
> 
> Thanks
> Laszlo
> 
>>> Cc: Maoming <maoming.maom...@huawei.com>
>>> Cc: Huangpeng (Peter) <peter.huangp...@huawei.com>
>>> Cc: Wei Liu <wei.l...@citrix.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Laszlo Ersek <ler...@redhat.com>
>>> ---
>>>  OvmfPkg/PlatformPei/MemDetect.c | 43 +++++++++++++++++++++++++++++++++++++----
>>>  1 file changed, 39 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
>>> index 3ceb142..cceab22 100644
>>> --- a/OvmfPkg/PlatformPei/MemDetect.c
>>> +++ b/OvmfPkg/PlatformPei/MemDetect.c
>>> @@ -194,6 +194,8 @@ QemuInitializeRam (
>>>  {
>>>    UINT64                      LowerMemorySize;
>>>    UINT64                      UpperMemorySize;
>>> +  MTRR_SETTINGS               MtrrSettings;
>>> +  EFI_STATUS                  Status;
>>>  
>>>    DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
>>>  
>>> @@ -214,12 +216,45 @@ QemuInitializeRam (
>>>      }
>>>    }
>>>  
>>> -  MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
>>> +  //
>>> +  // We'd like to keep the following ranges uncached:
>>> +  // - [640 KB, 1 MB)
>>> +  // - [LowerMemorySize, 4 GB)
>>> +  //
>>> +  // Everything else should be WB. Unfortunately, programming the inverse (ie.
>>> +  // keeping the default UC, and configuring the complement set of the above as
>>> +  // WB) is not reliable in general, because the end of the upper RAM can have
>>> +  // practically any alignment, and we may not have enough variable MTRRs to
>>> +  // cover it exactly.
>>> +  //
>>> +  if (IsMtrrSupported ()) {
>>> +    MtrrGetAllMtrrs (&MtrrSettings);
>>>  
>>> -  MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
>>> +    //
>>> +    // MTRRs disabled, fixed MTRRs disabled, default type is uncached
>>> +    //
>>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
>>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
>>> +    ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
>>>  
>>> -  if (UpperMemorySize != 0) {
>>> -    MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
>>> +    //
>>> +    // flip default type to writeback
>>> +    //
>>> +    SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
>>> +    ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
>>> +    MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
>>> +    MtrrSetAllMtrrs (&MtrrSettings);
>>> +
>>> +    //
>>> +    // punch holes
>>> +    //
>>> +    Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
>>> +               SIZE_256KB + SIZE_128KB, CacheUncacheable);
>>> +    ASSERT_EFI_ERROR (Status);
>>> +
>>> +    Status = MtrrSetMemoryAttribute (LowerMemorySize,
>>> +               SIZE_4GB - LowerMemorySize, CacheUncacheable);
>>> +    ASSERT_EFI_ERROR (Status);
>>>    }
>>>  }
>>>  
>>>
>>
> 
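For readers unfamiliar with the bit constants in the patch above, here is a small C sketch of the IA32_MTRR_DEF_TYPE layout as documented in the Intel SDM: bits 7:0 hold the default memory type, bit 10 enables the fixed-range MTRRs, and bit 11 enables MTRRs as a whole. The macro and function names are hypothetical and are not MtrrLib identifiers. Unlike the patch, which relies on its ASSERTs to guarantee a zero type field before OR-ing in 6, this sketch clears the field first.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MTRR_DEF_TYPE fields (names are illustrative, not from MtrrLib). */
#define MTRR_DEF_TYPE_MASK  0xFFULL       /* bits 7:0, default memory type */
#define MTRR_FIXED_ENABLE   (1ULL << 10)  /* FE, BIT10 in the patch */
#define MTRR_ENABLE         (1ULL << 11)  /* E, BIT11 in the patch */

#define MTRR_TYPE_UC  0x00ULL  /* uncacheable */
#define MTRR_TYPE_WB  0x06ULL  /* write-back */

/* Default memory type encoded in a DEF_TYPE register value. */
static unsigned mtrr_default_type(uint64_t def_type)
{
    return (unsigned)(def_type & MTRR_DEF_TYPE_MASK);
}

/* True if both MTRRs and the fixed-range MTRRs are enabled. */
static bool mtrr_fully_enabled(uint64_t def_type)
{
    uint64_t both = MTRR_ENABLE | MTRR_FIXED_ENABLE;

    return (def_type & both) == both;
}

/* Enable MTRRs and fixed MTRRs, and set the default type to write-back. */
static uint64_t mtrr_flip_default_to_wb(uint64_t def_type)
{
    def_type &= ~MTRR_DEF_TYPE_MASK;
    return def_type | MTRR_ENABLE | MTRR_FIXED_ENABLE | MTRR_TYPE_WB;
}
```

Starting from the all-zero state that the patch asserts, mtrr_flip_default_to_wb(0) yields 0xC06, the same value that `MtrrDefType |= BIT11 | BIT10 | 6` produces.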

------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
