> On 08-Sep-2023, at 9:32 PM, David Hildenbrand <da...@redhat.com> wrote:
>
> On 08.09.23 17:13, Ani Sinha wrote:
>>> On 08-Sep-2023, at 7:46 PM, David Hildenbrand <da...@redhat.com> wrote:
>>>
>>> On 08.09.23 16:12, Ani Sinha wrote:
>>>>> On 08-Sep-2023, at 3:58 PM, David Hildenbrand <da...@redhat.com> wrote:
>>>>>
>>>>> On 08.09.23 11:50, Ani Sinha wrote:
>>>>>> Depending on the number of available address bits of the current
>>>>>> processor, a
>>>>>> VM can only use a certain maximum amount of memory and no more. This
>>>>>> change
>>>>>> makes sure that a VM is not configured to have more memory than what it
>>>>>> can use
>>>>>> with the current processor settings when started. Additionally, the
>>>>>> change adds
>>>>>> checks during memory hotplug to ensure that the VM does not end up
>>>>>> getting more
>>>>>> memory than what it can actually use after hotplug.
>>>>>> Currently, both the above checks are only for pc (x86) platform.
>>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1235403
>>>>>> CC: imamm...@redhat.com
>>>>>> Signed-off-by: Ani Sinha <anisi...@redhat.com>
>>>>>> ---
>>>>>> hw/i386/pc.c | 45 ++++++++++++++++++++++++++++++++++++++++++
>>>>>> hw/mem/memory-device.c | 6 ++++++
>>>>>> include/hw/boards.h | 9 +++++++++
>>>>>> 3 files changed, 60 insertions(+)
>>>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>>>> index 54838c0c41..f84e4c4916 100644
>>>>>> --- a/hw/i386/pc.c
>>>>>> +++ b/hw/i386/pc.c
>>>>>> @@ -31,6 +31,7 @@
>>>>>> #include "hw/i386/topology.h"
>>>>>> #include "hw/i386/fw_cfg.h"
>>>>>> #include "hw/i386/vmport.h"
>>>>>> +#include "hw/mem/memory-device.h"
>>>>>> #include "sysemu/cpus.h"
>>>>>> #include "hw/block/fdc.h"
>>>>>> #include "hw/ide/internal.h"
>>>>>> @@ -1006,6 +1007,17 @@ void pc_memory_init(PCMachineState *pcms,
>>>>>> exit(EXIT_FAILURE);
>>>>>> }
>>>>>> + /*
>>>>>> + * check if the VM started with more ram configured than max
>>>>>> physical
>>>>>> + * address available with the current processor.
>>>>>> + */
>>>>>> + if (machine->ram_size > maxphysaddr + 1) {
>>>>>> + error_report("Address space limit 0x%"PRIx64" < 0x%"PRIx64
>>>>>> + " (max configured memory), phys-bits too low (%u)",
>>>>>> + maxphysaddr, machine->ram_size, cpu->phys_bits);
>>>>>> + exit(EXIT_FAILURE);
>>>>>> + }
>>>>>
>>>>> ... I know that this used to be a problem in the past, but nowadays we
>>>>> already do have similar checks in place?
>>>>>
>>>>> $ ./build/qemu-system-x86_64 -m 4T -machine q35,memory-backend=mem0
>>>>> -object memory-backend-ram,id=mem0,size=4T,reserve=off
>>>>> qemu-system-x86_64: Address space limit 0xffffffffff < 0x5077fffffff
>>>>> phys-bits too low (40)
>>>> So you are saying that this is OK and should be allowed? On a 32 bit
>>>> processor that can access only 4G memory, I am spinning up a 10G VM.
>>>
>>> Would that 32bit process have PAE (Physical Address Extension) and still be
>>> able to access that memory?
>> You are sidestepping my point. Sure, we can improve the condition check by
>> checking for PAE CPUID etc but that is not the issue I am trying too point
>> out. What if the processor did not have PAE? Would we allow a VM to have
>> memory size which the processor can’t access? There is no such check today
>> it would seem.
>
> Indeed, because the implementation for 32bit in pc_max_used_gpa() is wrong.
>
> Note that for 64bit it does the right thing, even with memory hotplug,
> because the PCI64 hole is placed above the memory device region.
>
> So I think we should tackle that via pc_max_used_gpa().
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 54838c0c41..d187890675 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -908,9 +908,12 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms,
> uint64_t pci_hole64_size)
> {
> X86CPU *cpu = X86_CPU(first_cpu);
>
> - /* 32-bit systems don't have hole64 thus return max CPU address */
> - if (cpu->phys_bits <= 32) {
> - return ((hwaddr)1 << cpu->phys_bits) - 1;
> + /*
> + * 32-bit systems don't have hole64, but we might have a region for
> + * memory hotplug.
> + */
> + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
> + return pc_pci_hole64_start() - 1;
Ok this is very confusing! I am looking at pc_pci_hole64_start() function. I
have a few questions …
(a) pc_get_device_memory_range() returns the size of the device memory as the
difference between ram_size and maxram_size. But from what I understand,
ram_size is the actual size of the ram present and maxram_size is the max size
of ram *after* hot plugging additional memory. How can we assume that the
additional available space is already occupied by hot plugged memory?
(b) Another question is, in pc_pci_hole64_start(), why are we adding this size
to the start address?
} else if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) {
pc_get_device_memory_range(pcms, &hole64_start, &size);
if (!pcmc->broken_reserved_end) {
hole64_start += size;
I think this is trying to put the hole after the device memory. But if the ram
size is <=maxram_size then the hole is after the above_4G memory? Why?
(c) in your above change, what does long mode have anything to do with all of
this?
> }
>
> return pc_pci_hole64_start() + pci_hole64_size - 1;
>
>
> That implies:
>
> ./build/qemu-system-x86_64 -cpu pentium -m size=4G -nodefaults -nographic
> qemu-system-x86_64: Address space limit 0xffffffff < 0x13fffffff phys-bits
> too low (32)
>
> As we have memory over 4G (due to PCI hole), that would now correctly fail.
>
> However, what works is:
>
> ./build/qemu-system-x86_64 -cpu pentium -m size=3G -nodefaults -nographic
>
>
> Weirdly enough, when setting cpu->phys_bits, we take care of PSE36 and allow
> for 36bits in the address space.
Hmm, I see this
if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
cpu->phys_bits = 36;
} else {
cpu->phys_bits = 32;
}
I will send a small patch to add PAE as well to this.
>
> So what works:
>
> ./build/qemu-system-x86_64 -cpu pentium,pse36=on -m size=32G -nodefaults
> -nographic
>
> And what doesn't:
>
> ./build/qemu-system-x86_64 -cpu pentium,pse36=on -m size=64G -nodefaults
> -nographic -S
> qemu-system-x86_64: Address space limit 0xfffffffff < 0x103fffffff phys-bits
> too low (36)
>
>
> However, we don't seem to have such handling in place for PAE (do we have to
> extend that handling in x86_cpu_realizefn()?). Maybe pae should always imply
> pse36, not sure ...
>
> --
> Cheers,
>
> David / dhildenb
>