Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Laszlo Ersek
On 09/27/18 09:03, Igor Mammedov wrote:
> On Thu, 27 Sep 2018 00:31:07 +0200
> Laszlo Ersek  wrote:

>> - I guess I could ascertain the mis-alignment by using small guest RAM
>> (128MB), a single DIMM hotplug slot so that reserved-memory-end is
>> rounded up to 5GB (through the 1GB alignment), and a single ivshmem
>> device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
>> Is this the best compromise?
>>
>> (The last option is basically an optimization of my previous reproducer,
>> namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
>> would end at an *odd* GB number -- we can optimize that by using an
>> empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
>> optimized to the smallest *even* (positive) number of GBs, namely 2GB.)
> that probably would do the job

There's some black magic going on in the sizing of the reserved memory
area, due to at least commit 085f8e88ba73 ("pc: count in 1Gb hugepage
alignment when sizing hotplug-memory container", 2014-11-24). At this
point I'm basically trying to find numbers that (a) don't take up many
resources and (b) reproduce the issue. :)
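
A back-of-the-envelope sketch of that geometry (purely illustrative C; the
1GB rounding mirrors what the commit above describes, it is not QEMU's or
SeaBIOS's actual allocation code):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define GiB (1ULL << 30)
#define ROUND_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
    /* 128MB of guest RAM -> the hotplug (device memory) area starts at
     * 4GiB; one DIMM slot, padded to 1GiB alignment, ends it at 5GiB. */
    uint64_t reserved_memory_end = 4 * GiB + ROUND_UP(1 * GiB, 1 * GiB);

    /* A single 2GiB BAR is naturally aligned, so the firmware would put
     * it at the next 2GiB boundary: 6GiB, not 5GiB. */
    uint64_t bar_size = 2 * GiB;
    uint64_t fw_bar_base = ROUND_UP(reserved_memory_end, bar_size);

    printf("reserved-memory-end: %" PRIu64 " GiB, lowest 64-bit BAR: %" PRIu64 " GiB\n",
           reserved_memory_end / GiB, fw_bar_base / GiB);
    return 0;
}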

Thanks
Laszlo



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Laszlo Ersek
On 09/27/18 17:24, Eric Blake wrote:
> On 9/26/18 3:35 PM, Laszlo Ersek wrote:
>> (+Eric)
> 
>> This too sounds useful. AIUI, ftruncate() is neither forbidden, nor
>> required, to allocate filesystem extents when increasing the size of a
>> file. Using one smaller regular temporary file as the common foundation
>> for multiple "memory-backend-file" objects will save space on the fs if
>> ftruncate() happens to allocate extents.
> 
> Correct. Most likely, ftruncate() extends by creating a hole if the file
> system supports sparse files, but POSIX has no way to guarantee holes
> (it merely states that both ftruncate() and write() beyond EOF are the
> two POSIX interfaces that may create holes - but that it is up to the
> implementation whether holes actually get created, then fstat() and
> lseek(SEEK_HOLE) can inform you whether holes were actually created [the
> former when st_blocks < st_size]). posix_fallocate() exists to force
> allocation, and Linux fallocate() offers even more fine-tuning when you
> need specific hole or non-hole behaviors.
> 

Thank you for the answers!
Laszlo



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Eric Blake

On 9/26/18 3:35 PM, Laszlo Ersek wrote:

(+Eric)



This too sounds useful. AIUI, ftruncate() is neither forbidden, nor
required, to allocate filesystem extents when increasing the size of a
file. Using one smaller regular temporary file as the common foundation
for multiple "memory-backend-file" objects will save space on the fs if
ftruncate() happens to allocate extents.


Correct. Most likely, ftruncate() extends by creating a hole if the file 
system supports sparse files, but POSIX has no way to guarantee holes 
(it merely states that both ftruncate() and write() beyond EOF are the 
two POSIX interfaces that may create holes - but that it is up to the 
implementation whether holes actually get created, then fstat() and 
lseek(SEEK_HOLE) can inform you whether holes were actually created [the 
former when st_blocks < st_size]). posix_fallocate() exists to force 
allocation, and Linux fallocate() offers even more fine-tuning when you 
need specific hole or non-hole behaviors.
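
For what it's worth, a minimal sketch of those checks (Linux-flavored:
SEEK_HOLE is optional in POSIX; error handling omitted, file name made up):

#define _GNU_SOURCE             /* for SEEK_HOLE on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("backing.img", O_RDWR | O_CREAT | O_TRUNC, 0600);
    struct stat st;
    off_t hole;

    /* extend to 1 GiB; POSIX allows, but does not require, allocation */
    ftruncate(fd, 1024 * 1024 * 1024);

    fstat(fd, &st);
    if ((off_t)st.st_blocks * 512 < st.st_size) {
        printf("sparse: st_blocks accounts for less than st_size\n");
    }

    hole = lseek(fd, 0, SEEK_HOLE);
    if (hole >= 0 && hole < st.st_size) {
        printf("SEEK_HOLE found a hole before EOF\n");
    }

    /* force allocation when holes are not acceptable */
    posix_fallocate(fd, 0, 1024 * 1024 * 1024);

    close(fd);
    return 0;
}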


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Eric Blake

On 9/26/18 3:26 PM, Laszlo Ersek wrote:

(+Eric)




I see shm_open() is used heavily in ivshmem-related tests. I haven't
looked much at shm_open() before. (I've always known it existed in
POSIX, but I've never cared.)


I've never actually played with shm_open() myself, but understand the 
theory of it enough to reply.




So now I first checked what shm_open() would give me over a regular
temporary file created with open(); after all, the file descriptor
returned by either would have to be mmap()'d. From the rationale in POSIX,
it seems like the {shm_open(), mmap()} combo has two significant
guarantees over {open(), mmap()}:

- the namespace may be distinct (there need not be a writeable
   filesystem at all),

- the shared object will *always* be locked in RAM ("Shared memory is
   not just simply providing common access to data, it is providing the
   fastest possible communication between the processes").

The rationale seems to permit, on purpose, an shm_open() implementation
that is actually based on open(), using a special file system -- and
AIUI, /dev/shm is just that, on Linux.

Eric, does the above sound more or less correct?


You're right about it being permitted to be a distinct namespace; on the 
other hand, it doesn't even have to be a special file system.  An 
implementation could even use a compile-time fixed directory name 
visible to the rest of the system (although of course you shouldn't rely 
on being able to use the file system to poke at the shmem objects, nor 
should you manipulate the file system underneath the reserved directory 
behind shmem's back if that is what the implementation is using).  So 
I'm less certain of whether you are guaranteed that the shared memory 
has to be locked in place (where it can never be paged out), since an 
implementation on top of the filesystem does not have to do such locking 
- but you are also right that a high quality-of-implementation will 
strive to keep the memory live rather than paging it out precisely 
because it is used for interprocess communication that would be 
penalized if it can be paged out.
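
For reference, the bare-bones POSIX pattern being discussed looks something
like this (sketch only; the object name and the 256 MiB size are made-up
illustration values, error handling omitted):

#include <fcntl.h>      /* O_* flags */
#include <sys/mman.h>   /* shm_open, shm_unlink, mmap */
#include <unistd.h>     /* ftruncate, close */

int main(void)
{
    const size_t size = 256u << 20;                 /* illustration value */
    int fd = shm_open("/ivshmem-demo", O_CREAT | O_RDWR, 0600);
    void *p;

    ftruncate(fd, size);                            /* size the shared object */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* ... hand the region (or the fd) to the peer processes ... */

    munmap(p, size);
    close(fd);
    shm_unlink("/ivshmem-demo");                    /* drop the name */
    return 0;
}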




If it is correct, then I think shm_open() is exactly what I *don't* want
for this use case. Because, while I do need a pathname for an
mmap()-able object (regular file, or otherwise), just so I can do:

   -object memory-backend-file,id=mem-obj,...,mem-path=... \
   -device ivshmem-plain,memdev=mem-obj,...

, I want the underlying object to put as little pressure on the system
that runs the test suite as possible.

This means I should specifically ask for a regular file, to be mmap()'d
(with MAP_SHARED). Then the kernel knows in advance that it can always
page out the dirty stuff, and the mapping shouldn't clash with cgroups,
or disabled memory overcommit.


Indeed, shmem CAN be a thin veneer on top of the file system, and 
support being paged out; but since an implementation that pins the 
memory such that it cannot page is permitted (and in fact maybe 
desirable), you are right that using shmem can indeed put pressure on 
different resources in relation to what you can accomplish by using the 
file system yourself.




Now, in order to make that actually safe, I should in theory ask for
preallocation on the filesystem (otherwise, if the filesystem runs out
of space, while the kernel is allocating fs extents in order to flush
the dirty pages to them, the process gets a SIGBUS, IIRC). However,
because I know that nothing will be in fact dirtied, I can minimize the
footprint on the filesystem as well, and forego preallocation too.

This suggests that, in my test case,
- I call g_file_open_tmp() for creating the temporary file,
- pass the returned fd to ftruncate() for resizing the temporary file,
- pass the returned pathname to the "memory-backend-file" objects, in
   the "mem-path" property,
- set "share=on",
- set "prealloc=off",
- "discard-data" is irrelevant (there won't be any dirty pages).

Thanks
Laszlo



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Eric Blake

On 9/27/18 4:21 AM, Laszlo Ersek wrote:

On 09/27/18 07:48, Gerd Hoffmann wrote:

   Hi,


Maybe using memdev file backend with manually created sparse file
might actually work (with preallocate disabled)


Thanks, this sounds like a good idea.

I see shm_open() is used heavily in ivshmem-related tests. I haven't
looked much at shm_open() before. (I've always known it existed in
POSIX, but I've never cared.)


How about improving the lovely pci-testdev we have a bit?  Then we can
have huge pci bars without needing backing storage for them ...


Earlier I checked

   qemu-system-x86_64 -device pci-testdev,\?


Side note: While that works, we are trying to shift the documentation 
over to recommending -device pci-testdev,help instead, as then you don't 
need to remember to shell-quote the ? to prevent unintended globbing.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Gerd Hoffmann
  Hi,

> I realize you pasted a prototype patch below (thanks for that!), but it
> doesn't update the documentation. I wouldn't like to dig down another
> design rabbit hole here.
> 
> If you can pursue upstreaming this change, I can definitely delay my
> series, and use the enhanced pci testdev as a reproducer.

Patch sent.

> But, I'm not explicitly asking you to pursue it -- none of this
> classifies as a "serious bug", so these patches might not be the best
> use of our time.

Well, jumping through hoops to get the tests going with ivshmem + memory
overcommit isn't a great use of our time either ...

cheers,
  Gerd




Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Laszlo Ersek
On 09/27/18 07:48, Gerd Hoffmann wrote:
>   Hi,
> 
>>> Maybe using memdev file backend with manually created sparse file
>>> might actually work (with preallocate disabled)
>>
>> Thanks, this sounds like a good idea.
>>
>> I see shm_open() is used heavily in ivshmem-related tests. I haven't
>> looked much at shm_open() before. (I've always known it existed in
>> POSIX, but I've never cared.)
> 
> How about improving the lovely pci-testdev we have a bit?  Then we can
> have huge pci bars without needing backing storage for them ...

Earlier I checked

  qemu-system-x86_64 -device pci-testdev,\?

briefly. I didn't see anything relevant and figured it was out of scope.

(
Actually, I've thought of yet another thing: the resource reservation
capability for PCIe Root Ports. The 64-bit reservation field in that cap
structure should be perfectly suitable for this testing.

Except, of course, SeaBIOS only handles the bus number reservation
field, at this time. :/
)

Looking at "docs/specs/pci-testdev.txt" now, extending the PCI testdev
looks like a separate project to me. We're getting quite far from the
original goal. I totally agree that if the PCI testdev already had this
ability, it would be a perfect fit, but I don't think I can personally
take that on now.

Thanks
Laszlo

> === cut here ===
> From 05cfced149b0b5c953391666c3151034bc7fe88b Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann 
> Date: Thu, 27 Sep 2018 07:43:10 +0200
> Subject: [PATCH] pci-testdev: add optional memory bar
> 
> Add memory bar to pci-testdev.  Size is configurable using the membar
> property.  Setting the size to zero (default) turns it off.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/misc/pci-testdev.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
> index 32041f535f..af4d678ee4 100644
> --- a/hw/misc/pci-testdev.c
> +++ b/hw/misc/pci-testdev.c
> @@ -85,6 +85,9 @@ typedef struct PCITestDevState {
>  MemoryRegion portio;
>  IOTest *tests;
>  int current;
> +
> +size_t membar_size;
> +MemoryRegion membar;
>  } PCITestDevState;
>  
>  #define TYPE_PCI_TEST_DEV "pci-testdev"
> @@ -253,6 +256,15 @@ static void pci_testdev_realize(PCIDevice *pci_dev, Error **errp)
>  pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
>  pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->portio);
>  
> +if (d->membar_size) {
> +memory_region_init(&d->membar, OBJECT(d), "membar", d->membar_size);
> +pci_register_bar(pci_dev, 2,
> + PCI_BASE_ADDRESS_SPACE_MEMORY |
> + PCI_BASE_ADDRESS_MEM_PREFETCH |
> + PCI_BASE_ADDRESS_MEM_TYPE_64,
> + &d->membar);
> +}
> +
>  d->current = -1;
>  d->tests = g_malloc0(IOTEST_MAX * sizeof *d->tests);
>  for (i = 0; i < IOTEST_MAX; ++i) {
> @@ -305,6 +317,11 @@ static void qdev_pci_testdev_reset(DeviceState *dev)
>  pci_testdev_reset(d);
>  }
>  
> +static Property pci_testdev_properties[] = {
> +DEFINE_PROP_SIZE("membar", PCITestDevState, membar_size, 0),
> +DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static void pci_testdev_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -319,6 +336,7 @@ static void pci_testdev_class_init(ObjectClass *klass, void *data)
>  dc->desc = "PCI Test Device";
>  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  dc->reset = qdev_pci_testdev_reset;
> +dc->props = pci_testdev_properties;
>  }
>  
>  static const TypeInfo pci_testdev_info = {
> 




Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Laszlo Ersek
On 09/27/18 11:21, Laszlo Ersek wrote:
> On 09/27/18 07:48, Gerd Hoffmann wrote:
>>   Hi,
>>
 Maybe using memdev file backend with manually created sparse file
 might actually work (with preallocate disabled)
>>>
>>> Thanks, this sounds like a good idea.
>>>
>>> I see shm_open() is used heavily in ivshmem-related tests. I haven't
>>> looked much at shm_open() before. (I've always known it existed in
>>> POSIX, but I've never cared.)
>>
>> How about improving the lovely pci-testdev we have a bit?  Then we can
>> have huge pci bars without needing backing storage for them ...
> 
> Earlier I checked
> 
>   qemu-system-x86_64 -device pci-testdev,\?
> 
> briefly. I didn't see anything relevant and figured it was out of scope.
> 
> (
> Actually, I've thought of yet another thing: the resource reservation
> capability for PCIe Root Ports. The 64-bit reservation field in that cap
> structure should be perfectly suitable for this testing.
> 
> Except, of course, SeaBIOS only handles the bus number reservation
> field, at this time. :/
> )
> 
> Looking at "docs/specs/pci-testdev.txt" now, extending the PCI testdev
> looks like a separate project to me. We're getting quite far from the
> original goal. I totally agree that if the PCI testdev already had this
> ability, it would be a perfect fit, but I don't think I can personally
> take that on now.

I realize you pasted a prototype patch below (thanks for that!), but it
doesn't update the documentation. I wouldn't like to dig down another
design rabbit hole here.

If you can pursue upstreaming this change, I can definitely delay my
series, and use the enhanced pci testdev as a reproducer. But, I'm not
explicitly asking you to pursue it -- none of this classifies as a
"serious bug", so these patches might not be the best uses of our times.

Laszlo

>> === cut here ===
>> From 05cfced149b0b5c953391666c3151034bc7fe88b Mon Sep 17 00:00:00 2001
>> From: Gerd Hoffmann 
>> Date: Thu, 27 Sep 2018 07:43:10 +0200
>> Subject: [PATCH] pci-testdev: add optional memory bar
>>
>> Add memory bar to pci-testdev.  Size is configurable using the membar
>> property.  Setting the size to zero (default) turns it off.
>>
>> Signed-off-by: Gerd Hoffmann 
>> ---
>>  hw/misc/pci-testdev.c | 18 ++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
>> index 32041f535f..af4d678ee4 100644
>> --- a/hw/misc/pci-testdev.c
>> +++ b/hw/misc/pci-testdev.c
>> @@ -85,6 +85,9 @@ typedef struct PCITestDevState {
>>  MemoryRegion portio;
>>  IOTest *tests;
>>  int current;
>> +
>> +size_t membar_size;
>> +MemoryRegion membar;
>>  } PCITestDevState;
>>  
>>  #define TYPE_PCI_TEST_DEV "pci-testdev"
>> @@ -253,6 +256,15 @@ static void pci_testdev_realize(PCIDevice *pci_dev, Error **errp)
>>  pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
>>  pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->portio);
>>  
>> +if (d->membar_size) {
>> +memory_region_init(&d->membar, OBJECT(d), "membar", d->membar_size);
>> +pci_register_bar(pci_dev, 2,
>> + PCI_BASE_ADDRESS_SPACE_MEMORY |
>> + PCI_BASE_ADDRESS_MEM_PREFETCH |
>> + PCI_BASE_ADDRESS_MEM_TYPE_64,
>> + &d->membar);
>> +}
>> +
>>  d->current = -1;
>>  d->tests = g_malloc0(IOTEST_MAX * sizeof *d->tests);
>>  for (i = 0; i < IOTEST_MAX; ++i) {
>> @@ -305,6 +317,11 @@ static void qdev_pci_testdev_reset(DeviceState *dev)
>>  pci_testdev_reset(d);
>>  }
>>  
>> +static Property pci_testdev_properties[] = {
>> +DEFINE_PROP_SIZE("membar", PCITestDevState, membar_size, 0),
>> +DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>>  static void pci_testdev_class_init(ObjectClass *klass, void *data)
>>  {
>>  DeviceClass *dc = DEVICE_CLASS(klass);
>> @@ -319,6 +336,7 @@ static void pci_testdev_class_init(ObjectClass *klass, void *data)
>>  dc->desc = "PCI Test Device";
>>  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>  dc->reset = qdev_pci_testdev_reset;
>> +dc->props = pci_testdev_properties;
>>  }
>>  
>>  static const TypeInfo pci_testdev_info = {
>>
> 




Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Laszlo Ersek
On 09/27/18 09:03, Igor Mammedov wrote:
> On Thu, 27 Sep 2018 00:31:07 +0200
> Laszlo Ersek  wrote:
> 
>> On 09/26/18 18:26, Alex Williamson wrote:
>>> On Wed, 26 Sep 2018 13:12:47 +0200
>>> Laszlo Ersek  wrote:
>>>   
 On 09/25/18 22:36, Alex Williamson wrote:  
> On Tue, 25 Sep 2018 00:13:46 +0200
> Laszlo Ersek  wrote:
> 
>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>> the ACPI DSDT that would be at least as large as the new 
>> "pci-hole64-size"
>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>
>> In that commit, we added or modified five functions:
>>
>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a 
>> default
>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>>   the DIMM hotplug area too (if any).
>>
>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>   board-specific 64-bit base property getters called abstractly by the
>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>   honor the firmware's BAR assignments (i.e., they treat the lowest 
>> 64-bit
>>   GPA programmed by the firmware as the base address for the aperture).
>>
>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>   these intended to extend the aperture to our size recommendation,
>>   calculated relative to the base of the aperture.
>>
>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>> to the default base (pc_pci_hole64_start()), ignoring any programming 
>> done
>> by the firmware. This means that our size recommendation may not be met.
>> Fix it by honoring the firmware's address assignments.
>>
>> The strange extension sizes were spotted by Alex, in the log of a guest
>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>
>> This change only affects DSDT generation, therefore no new compat 
>> property
>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.
>
> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> only if it cannot satisfy all the BARs from 32-bit MMIO, see
> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> devices could do the same.  Thanks,

 The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
 when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.

 However, using SeaBIOS+i440fx, I can't show the difference. I've been
 experimenting with various ivshmem devices (even multiple at the same
 time, with different sizes). The "all or nothing" nature of SeaBIOS's
 high allocation of the 64-bit BARs, combined with hugepage alignment
 inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
 for i440fx, seem to make it surprisingly difficult to trigger the issue.

 I figure I should:

 (1) remove the sentence "the patch makes no difference to SeaBIOS
 guests" from the commit message,

 (2) include the DSDT diff on SeaBIOS/q35 in the commit message,

 (3) remain silent on SeaBIOS/i440fx, in the commit message,

 (4) append a new patch, for "bios-tables-test", so that the ACPI gen
 change is validated as part of the test suite, on SeaBIOS/q35.

 Regarding (4):

 - is it OK if I add the test only for Q35?

 - what guest RAM size am I allowed to use in the test suite? In my own
 SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
 acceptable for the test suite.  
>>>
>>> Seems like you've done due diligence, the plan looks ok to me.
>>> Regarding the test memory allocation, is it possible and reasonable to
>>> perhaps create a 256MB shared memory area and re-use it for multiple
>>> ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
>>> devices, all with the same backing.  Thanks,  
>>
>> In order to show that the patch makes a difference, on SeaBIOS+Q35, I
>> must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
>> / end-of-DIMM-hotplug-area (i.e., strictly past the return value of
>> pc_pci_hole64_start()). That's not easy, assuming test suite 

Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-27 Thread Igor Mammedov
On Thu, 27 Sep 2018 00:31:07 +0200
Laszlo Ersek  wrote:

> On 09/26/18 18:26, Alex Williamson wrote:
> > On Wed, 26 Sep 2018 13:12:47 +0200
> > Laszlo Ersek  wrote:
> >   
> >> On 09/25/18 22:36, Alex Williamson wrote:  
> >>> On Tue, 25 Sep 2018 00:13:46 +0200
> >>> Laszlo Ersek  wrote:
> >>> 
>  In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>  hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>  the ACPI DSDT that would be at least as large as the new 
>  "pci-hole64-size"
>  property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>  64-bit MMIO aperture to the guest OS for hotplug purposes.
> 
>  In that commit, we added or modified five functions:
> 
>  - pc_pci_hole64_start(): shared between i440fx and q35. Provides a 
>  default
>    64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>    the DIMM hotplug area too (if any).
> 
>  - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>    board-specific 64-bit base property getters called abstractly by the
>    ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>    firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>    assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>    honor the firmware's BAR assignments (i.e., they treat the lowest 
>  64-bit
>    GPA programmed by the firmware as the base address for the aperture).
> 
>  - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>    these intended to extend the aperture to our size recommendation,
>    calculated relative to the base of the aperture.
> 
>  Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>  q35_host_get_pci_hole64_end() currently only extend the aperture relative
>  to the default base (pc_pci_hole64_start()), ignoring any programming 
>  done
>  by the firmware. This means that our size recommendation may not be met.
>  Fix it by honoring the firmware's address assignments.
> 
>  The strange extension sizes were spotted by Alex, in the log of a guest
>  kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
> 
>  This change only affects DSDT generation, therefore no new compat 
>  property
>  is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>  64-bit BARs, the patch makes no difference to SeaBIOS guests.
> >>>
> >>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> >>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
> >>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> >>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> >>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> >>> devices could do the same.  Thanks,
> >>
> >> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
> >> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
> >>
> >> However, using SeaBIOS+i440fx, I can't show the difference. I've been
> >> experimenting with various ivshmem devices (even multiple at the same
> >> time, with different sizes). The "all or nothing" nature of SeaBIOS's
> >> high allocation of the 64-bit BARs, combined with hugepage alignment
> >> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
> >> for i440fx, seem to make it surprisingly difficult to trigger the issue.
> >>
> >> I figure I should:
> >>
> >> (1) remove the sentence "the patch makes no difference to SeaBIOS
> >> guests" from the commit message,
> >>
> >> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
> >>
> >> (3) remain silent on SeaBIOS/i440fx, in the commit message,
> >>
> >> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> >> change is validated as part of the test suite, on SeaBIOS/q35.
> >>
> >> Regarding (4):
> >>
> >> - is it OK if I add the test only for Q35?
> >>
> >> - what guest RAM size am I allowed to use in the test suite? In my own
> >> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> >> acceptable for the test suite.  
> > 
> > Seems like you've done due diligence, the plan looks ok to me.
> > Regarding the test memory allocation, is it possible and reasonable to
> > perhaps create a 256MB shared memory area and re-use it for multiple
> > ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
> > devices, all with the same backing.  Thanks,  
> 
> In order to show that the patch makes a difference, on SeaBIOS+Q35, I
> must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
> / end-of-DIMM-hotplug-area (i.e., strictly past the return value of
> pc_pci_hole64_start()). That's not easy, assuming test suite constraints:
> 
> - With a small guest RAM size (and no 

Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Gerd Hoffmann
  Hi,

> > Maybe using memdev file backend with manually created sparse file
> > might actually work (with preallocate disabled)
> 
> Thanks, this sounds like a good idea.
> 
> I see shm_open() is used heavily in ivshmem-related tests. I haven't
> looked much at shm_open() before. (I've always known it existed in
> POSIX, but I've never cared.)

How about improving the lovely pci-testdev we have a bit?  Then we can
have huge pci bars without needing backing storage for them ...

HTH,
  Gerd

=== cut here ===
From 05cfced149b0b5c953391666c3151034bc7fe88b Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann 
Date: Thu, 27 Sep 2018 07:43:10 +0200
Subject: [PATCH] pci-testdev: add optional memory bar

Add memory bar to pci-testdev.  Size is configurable using the membar
property.  Setting the size to zero (default) turns it off.

Signed-off-by: Gerd Hoffmann 
---
 hw/misc/pci-testdev.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
index 32041f535f..af4d678ee4 100644
--- a/hw/misc/pci-testdev.c
+++ b/hw/misc/pci-testdev.c
@@ -85,6 +85,9 @@ typedef struct PCITestDevState {
 MemoryRegion portio;
 IOTest *tests;
 int current;
+
+size_t membar_size;
+MemoryRegion membar;
 } PCITestDevState;
 
 #define TYPE_PCI_TEST_DEV "pci-testdev"
@@ -253,6 +256,15 @@ static void pci_testdev_realize(PCIDevice *pci_dev, Error **errp)
 pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
 pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->portio);
 
+if (d->membar_size) {
+memory_region_init(&d->membar, OBJECT(d), "membar", d->membar_size);
+pci_register_bar(pci_dev, 2,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &d->membar);
+}
+
 d->current = -1;
 d->tests = g_malloc0(IOTEST_MAX * sizeof *d->tests);
 for (i = 0; i < IOTEST_MAX; ++i) {
@@ -305,6 +317,11 @@ static void qdev_pci_testdev_reset(DeviceState *dev)
 pci_testdev_reset(d);
 }
 
+static Property pci_testdev_properties[] = {
+DEFINE_PROP_SIZE("membar", PCITestDevState, membar_size, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void pci_testdev_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -319,6 +336,7 @@ static void pci_testdev_class_init(ObjectClass *klass, void *data)
 dc->desc = "PCI Test Device";
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 dc->reset = qdev_pci_testdev_reset;
+dc->props = pci_testdev_properties;
 }
 
 static const TypeInfo pci_testdev_info = {
-- 
2.9.3




Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Laszlo Ersek
On 09/26/18 18:26, Alex Williamson wrote:
> On Wed, 26 Sep 2018 13:12:47 +0200
> Laszlo Ersek  wrote:
> 
>> On 09/25/18 22:36, Alex Williamson wrote:
>>> On Tue, 25 Sep 2018 00:13:46 +0200
>>> Laszlo Ersek  wrote:
>>>   
 In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
 hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
 the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
 property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
 64-bit MMIO aperture to the guest OS for hotplug purposes.

 In that commit, we added or modified five functions:

 - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
   the DIMM hotplug area too (if any).

 - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
   board-specific 64-bit base property getters called abstractly by the
   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
   GPA programmed by the firmware as the base address for the aperture).

 - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
   these intended to extend the aperture to our size recommendation,
   calculated relative to the base of the aperture.

 Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
 q35_host_get_pci_hole64_end() currently only extend the aperture relative
 to the default base (pc_pci_hole64_start()), ignoring any programming done
 by the firmware. This means that our size recommendation may not be met.
 Fix it by honoring the firmware's address assignments.

 The strange extension sizes were spotted by Alex, in the log of a guest
 kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).

 This change only affects DSDT generation, therefore no new compat property
 is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
 64-bit BARs, the patch makes no difference to SeaBIOS guests.  
>>>
>>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
>>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
>>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
>>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
>>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
>>> devices could do the same.  Thanks,  
>>
>> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
>> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
>>
>> However, using SeaBIOS+i440fx, I can't show the difference. I've been
>> experimenting with various ivshmem devices (even multiple at the same
>> time, with different sizes). The "all or nothing" nature of SeaBIOS's
>> high allocation of the 64-bit BARs, combined with hugepage alignment
>> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
>> for i440fx, seem to make it surprisingly difficult to trigger the issue.
>>
>> I figure I should:
>>
>> (1) remove the sentence "the patch makes no difference to SeaBIOS
>> guests" from the commit message,
>>
>> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
>>
>> (3) remain silent on SeaBIOS/i440fx, in the commit message,
>>
>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>> change is validated as part of the test suite, on SeaBIOS/q35.
>>
>> Regarding (4):
>>
>> - is it OK if I add the test only for Q35?
>>
>> - what guest RAM size am I allowed to use in the test suite? In my own
>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>> acceptable for the test suite.
> 
> Seems like you've done due diligence, the plan looks ok to me.
> Regarding the test memory allocation, is it possible and reasonable to
> perhaps create a 256MB shared memory area and re-use it for multiple
> ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
> devices, all with the same backing.  Thanks,

In order to show that the patch makes a difference, on SeaBIOS+Q35, I
must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
/ end-of-DIMM-hotplug-area (i.e., strictly past the return value of
pc_pci_hole64_start()). That's not easy, assuming test suite constraints:

- With a small guest RAM size (and no DIMM hotplug area),
pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned
for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR
too. Therefore the patch makes no difference.

- If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by
adding a DIMM 

Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Laszlo Ersek
(+Eric)

On 09/26/18 18:26, Alex Williamson wrote:
> On Wed, 26 Sep 2018 13:12:47 +0200
> Laszlo Ersek  wrote:
> 
>> On 09/25/18 22:36, Alex Williamson wrote:
>>> On Tue, 25 Sep 2018 00:13:46 +0200
>>> Laszlo Ersek  wrote:
>>>   
 In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
 hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
 the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
 property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
 64-bit MMIO aperture to the guest OS for hotplug purposes.

 In that commit, we added or modified five functions:

 - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
   the DIMM hotplug area too (if any).

 - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
   board-specific 64-bit base property getters called abstractly by the
   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
   GPA programmed by the firmware as the base address for the aperture).

 - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
   these intended to extend the aperture to our size recommendation,
   calculated relative to the base of the aperture.

 Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
 q35_host_get_pci_hole64_end() currently only extend the aperture relative
 to the default base (pc_pci_hole64_start()), ignoring any programming done
 by the firmware. This means that our size recommendation may not be met.
 Fix it by honoring the firmware's address assignments.

 The strange extension sizes were spotted by Alex, in the log of a guest
 kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).

 This change only affects DSDT generation, therefore no new compat property
 is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
 64-bit BARs, the patch makes no difference to SeaBIOS guests.  
>>>
>>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> >>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
>>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
>>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
>>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
>>> devices could do the same.  Thanks,  
>>
>> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
>> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
>>
>> However, using SeaBIOS+i440fx, I can't show the difference. I've been
>> experimenting with various ivshmem devices (even multiple at the same
>> time, with different sizes). The "all or nothing" nature of SeaBIOS's
>> high allocation of the 64-bit BARs, combined with hugepage alignment
>> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
>> for i440fx, seem to make it surprisingly difficult to trigger the issue.
>>
>> I figure I should:
>>
>> (1) remove the sentence "the patch makes no difference to SeaBIOS
>> guests" from the commit message,
>>
>> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
>>
>> (3) remain silent on SeaBIOS/i440fx, in the commit message,
>>
>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>> change is validated as part of the test suite, on SeaBIOS/q35.
>>
>> Regarding (4):
>>
>> - is it OK if I add the test only for Q35?
>>
>> - what guest RAM size am I allowed to use in the test suite? In my own
>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>> acceptable for the test suite.
> 
> Seems like you've done due diligence, the plan looks ok to me.
> Regarding the test memory allocation, is it possible and reasonable to
> perhaps create a 256MB shared memory area and re-use it for multiple
> ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
> devices, all with the same backing.  Thanks,

This too sounds useful. AIUI, ftruncate() is neither forbidden, nor
required, to allocate filesystem extents when increasing the size of a
file. Using one smaller regular temporary file as the common foundation
for multiple "memory-backend-file" objects will save space on the fs if
ftruncate() happens to allocate extents.

(I've also thought of passing the same "memory-backend-file" object to
multiple ivshmem-plain devices, but ivshmem_plain_realize()
[hw/misc/ivshmem.c] checks whether the HostMemoryBackend is already mapped.)

Thanks!
Laszlo



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Laszlo Ersek
(+Eric)

On 09/26/18 14:10, Igor Mammedov wrote:
> On Wed, 26 Sep 2018 13:35:14 +0200
> Laszlo Ersek  wrote:
> 
>> On 09/26/18 13:12, Laszlo Ersek wrote:
>>
>>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>>> change is validated as part of the test suite, on SeaBIOS/q35.
>>>
>>> Regarding (4):
>>>
>>> - is it OK if I add the test only for Q35?
>>>
>>> - what guest RAM size am I allowed to use in the test suite? In my own
>>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>>> acceptable for the test suite.  
>>
>> And, even if the patch's effect can be shown with little guest DRAM, the
>> test case still requires a multi-gig ivshmem-plain device. In
>> "tests/ivshmem-test.c", I see how it is set up -- the backend is set up
>> with shm_open(). The file created under /dev/shm (on Linux) might
>> require host RAM just the same as normal guest DRAM (especially with
>> memory overcommit disabled on the host), correct?
> with overcommit disabled or cgroup limits enforced (which I'd expect in an
> automated testing env, i.e. travis or something else), allocating such an
> amount of RAM would probably fail like crazy.
> 
> Maybe using memdev file backend with manually created sparse file
> might actually work (with preallocate disabled)

Thanks, this sounds like a good idea.

I see shm_open() is used heavily in ivshmem-related tests. I haven't
looked much at shm_open() before. (I've always known it existed in
POSIX, but I've never cared.)

So now I first checked what shm_open() would give me over a regular
temporary file created with open(); after all, the file descriptor
returned by either would have to be mmap()'d. From the rationale in POSIX,
it seems like the {shm_open(), mmap()} combo has two significant
guarantees over {open(), mmap()}:

- the namespace may be distinct (there need not be a writeable
  filesystem at all),

- the shared object will *always* be locked in RAM ("Shared memory is
  not just simply providing common access to data, it is providing the
  fastest possible communication between the processes").

The rationale seems to permit, on purpose, an shm_open() implementation
that is actually based on open(), using a special file system -- and
AIUI, /dev/shm is just that, on Linux.

Eric, does the above sound more or less correct?

If it is correct, then I think shm_open() is exactly what I *don't* want
for this use case. Because, while I do need a pathname for an
mmap()-able object (regular file, or otherwise), just so I can do:

  -object memory-backend-file,id=mem-obj,...,mem-path=... \
  -device ivshmem-plain,memdev=mem-obj,...

, I want the underlying object to put as little pressure on the system
that runs the test suite as possible.

This means I should specifically ask for a regular file, to be mmap()'d
(with MAP_SHARED). Then the kernel knows in advance that it can always
page out the dirty stuff, and the mapping shouldn't clash with cgroups,
or disabled memory overcommit.

Now, in order to make that actually safe, I should in theory ask for
preallocation on the filesystem (otherwise, if the filesystem runs out
of space, while the kernel is allocating fs extents in order to flush
the dirty pages to them, the process gets a SIGBUS, IIRC). However,
because I know that nothing will be in fact dirtied, I can minimize the
footprint on the filesystem as well, and forego preallocation too.

This suggests that, in my test case,
- I call g_file_open_tmp() for creating the temporary file,
- pass the returned fd to ftruncate() for resizing the temporary file,
- pass the returned pathname to the "memory-backend-file" objects, in
  the "mem-path" property,
- set "share=on",
- set "prealloc=off",
- "discard-data" is irrelevant (there won't be any dirty pages).

Thanks
Laszlo



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Alex Williamson
On Wed, 26 Sep 2018 13:12:47 +0200
Laszlo Ersek  wrote:

> On 09/25/18 22:36, Alex Williamson wrote:
> > On Tue, 25 Sep 2018 00:13:46 +0200
> > Laszlo Ersek  wrote:
> >   
> >> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> >> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> >> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> >> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> >> 64-bit MMIO aperture to the guest OS for hotplug purposes.
> >>
> >> In that commit, we added or modified five functions:
> >>
> >> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
> >>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
> >>   the DIMM hotplug area too (if any).
> >>
> >> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
> >>   board-specific 64-bit base property getters called abstractly by the
> >>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
> >>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
> >>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
> >>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
> >>   GPA programmed by the firmware as the base address for the aperture).
> >>
> >> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
> >>   these intended to extend the aperture to our size recommendation,
> >>   calculated relative to the base of the aperture.
> >>
> >> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
> >> q35_host_get_pci_hole64_end() currently only extend the aperture relative
> >> to the default base (pc_pci_hole64_start()), ignoring any programming done
> >> by the firmware. This means that our size recommendation may not be met.
> >> Fix it by honoring the firmware's address assignments.
> >>
> >> The strange extension sizes were spotted by Alex, in the log of a guest
> >> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
> >>
> >> This change only affects DSDT generation, therefore no new compat property
> >> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
> >> 64-bit BARs, the patch makes no difference to SeaBIOS guests.  
> > 
> > This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> > only if it cannot satisfy all the BARs from 32-bit MMIO, see
> > src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> > assigned GPUs and you'll eventually cross that threshold and all 64-bit
> > BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> > devices could do the same.  Thanks,  
> 
> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
> 
> However, using SeaBIOS+i440fx, I can't show the difference. I've been
> experimenting with various ivshmem devices (even multiple at the same
> time, with different sizes). The "all or nothing" nature of SeaBIOS's
> high allocation of the 64-bit BARs, combined with hugepage alignment
> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
> for i440fx, seem to make it surprisingly difficult to trigger the issue.
> 
> I figure I should:
> 
> (1) remove the sentence "the patch makes no difference to SeaBIOS
> guests" from the commit message,
> 
> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
> 
> (3) remain silent on SeaBIOS/i440fx, in the commit message,
> 
> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> change is validated as part of the test suite, on SeaBIOS/q35.
> 
> Regarding (4):
> 
> - is it OK if I add the test only for Q35?
> 
> - what guest RAM size am I allowed to use in the test suite? In my own
> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> acceptable for the test suite.

Seems like you've done due diligence, the plan looks ok to me.
Regarding the test memory allocation, is it possible and reasonable to
perhaps create a 256MB shared memory area and re-use it for multiple
ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
devices, all with the same backing.  Thanks,

Alex



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Igor Mammedov
On Wed, 26 Sep 2018 13:35:14 +0200
Laszlo Ersek  wrote:

> On 09/26/18 13:12, Laszlo Ersek wrote:
> 
> > (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> > change is validated as part of the test suite, on SeaBIOS/q35.
> > 
> > Regarding (4):
> > 
> > - is it OK if I add the test only for Q35?
> > 
> > - what guest RAM size am I allowed to use in the test suite? In my own
> > SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> > acceptable for the test suite.  
> 
> And, even if the patch's effect can be shown with little guest DRAM, the
> test case still requires a multi-gig ivshmem-plain device. In
> "tests/ivshmem-test.c", I see how it is set up -- the backend is set up
> with shm_open(). The file created under /dev/shm (on Linux) might
> require host RAM just the same as normal guest DRAM (especially with
> memory overcommit disabled on the host), correct?
with overcommit disabled or cgroup limits enforced (which I'd expect in an
automated testing env, i.e. travis or something else), allocating such an
amount of RAM would probably fail like crazy.

Maybe using memdev file backend with manually created sparse file
might actually work (with preallocate disabled)

> 
> Thanks
> Laszlo
> 




Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Laszlo Ersek
On 09/26/18 13:12, Laszlo Ersek wrote:

> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> change is validated as part of the test suite, on SeaBIOS/q35.
> 
> Regarding (4):
> 
> - is it OK if I add the test only for Q35?
> 
> - what guest RAM size am I allowed to use in the test suite? In my own
> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> acceptable for the test suite.

And, even if the patch's effect can be shown with little guest DRAM, the
test case still requires a multi-gig ivshmem-plain device. In
"tests/ivshmem-test.c", I see how it is set up -- the backend is set up
with shm_open(). The file created under /dev/shm (on Linux) might
require host RAM just the same as normal guest DRAM (especially with
memory overcommit disabled on the host), correct?

Thanks
Laszlo



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-26 Thread Laszlo Ersek
On 09/25/18 22:36, Alex Williamson wrote:
> On Tue, 25 Sep 2018 00:13:46 +0200
> Laszlo Ersek  wrote:
> 
>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>
>> In that commit, we added or modified five functions:
>>
>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>>   the DIMM hotplug area too (if any).
>>
>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>   board-specific 64-bit base property getters called abstractly by the
>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
>>   GPA programmed by the firmware as the base address for the aperture).
>>
>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>   these intended to extend the aperture to our size recommendation,
>>   calculated relative to the base of the aperture.
>>
>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>> to the default base (pc_pci_hole64_start()), ignoring any programming done
>> by the firmware. This means that our size recommendation may not be met.
>> Fix it by honoring the firmware's address assignments.
>>
>> The strange extension sizes were spotted by Alex, in the log of a guest
>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>
>> This change only affects DSDT generation, therefore no new compat property
>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.
> 
> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> only if it cannot satisfy all the BARs from 32-bit MMIO, see
> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> devices could do the same.  Thanks,

The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.

However, using SeaBIOS+i440fx, I can't show the difference. I've been
experimenting with various ivshmem devices (even multiple at the same
time, with different sizes). The "all or nothing" nature of SeaBIOS's
high allocation of the 64-bit BARs, combined with hugepage alignment
inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
for i440fx, seem to make it surprisingly difficult to trigger the issue.

I figure I should:

(1) remove the sentence "the patch makes no difference to SeaBIOS
guests" from the commit message,

(2) include the DSDT diff on SeaBIOS/q35 in the commit message,

(3) remain silent on SeaBIOS/i440fx, in the commit message,

(4) append a new patch, for "bios-tables-test", so that the ACPI gen
change is validated as part of the test suite, on SeaBIOS/q35.

Regarding (4):

- is it OK if I add the test only for Q35?

- what guest RAM size am I allowed to use in the test suite? In my own
SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
acceptable for the test suite.

Thanks!
Laszlo



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-25 Thread Laszlo Ersek
On 09/25/18 22:36, Alex Williamson wrote:
> On Tue, 25 Sep 2018 00:13:46 +0200
> Laszlo Ersek  wrote:
> 
>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>
>> In that commit, we added or modified five functions:
>>
>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>>   the DIMM hotplug area too (if any).
>>
>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>   board-specific 64-bit base property getters called abstractly by the
>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
>>   GPA programmed by the firmware as the base address for the aperture).
>>
>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>   these intended to extend the aperture to our size recommendation,
>>   calculated relative to the base of the aperture.
>>
>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>> to the default base (pc_pci_hole64_start()), ignoring any programming done
>> by the firmware. This means that our size recommendation may not be met.
>> Fix it by honoring the firmware's address assignments.
>>
>> The strange extension sizes were spotted by Alex, in the log of a guest
>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>
>> This change only affects DSDT generation, therefore no new compat property
>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.
> 
> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> only if it cannot satisfy all the BARs from 32-bit MMIO, see
> src/fw/pciinit.c:pci_bios_map_devices.

Indeed, this mistake in the commit message briefly crossed my mind, some
time after posting the series. Then I promptly forgot about it again. :/

(Because, if SeaBIOS really never picked 64-bit addresses, then commit
9fa99d2519cb would have been mostly untestable, in the first place.)

Thank you for pointing this out!

> Create a VM with several
> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> devices could do the same.  Thanks,

OK, I think I'll have to do those ivshmem tests, and rework the commit
message a bit.

Thanks!
Laszlo

> 
> Alex
> 
>> (Which is in
>> turn why ACPI test data for the "bios-tables-test" need not be refreshed.)
>>
>> Using an i440fx OVMF guest with 5GB RAM, an example _CRS change is:
>>
>>> @@ -881,9 +881,9 @@
>>>  QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, 
>>> Cacheable, ReadWrite,
>>>  0x0000000000000000, // Granularity
>>>  0x0000000800000000, // Range Minimum
>>> -0x000000080001C0FF, // Range Maximum
>>> +0x000000087FFFFFFF, // Range Maximum
>>>  0x0000000000000000, // Translation Offset
>>> -0x000000000001C100, // Length
>>> +0x0000000080000000, // Length
>>>  ,, , AddressRangeMemory, TypeStatic)
>>>  })
>>>  Device (GPE0)  
>>
>> (On i440fx, the low RAM split is at 3GB, in this case. Therefore, with 5GB
>> guest RAM and no DIMM hotplug range, pc_pci_hole64_start() returns 4 +
>> (5-3) = 6 GB. Adding the 2GB extension to that yields 8GB, which is below
>> the firmware-programmed base of 32GB, before the patch. Therefore, before
>> the patch, the extension is ineffective. After the patch, we add the 2GB
>> extension to the firmware-programmed base, namely 32GB.)
>>
>> Using a q35 OVMF guest with 5GB RAM, an example _CRS change is:
>>
>>> @@ -3162,9 +3162,9 @@
>>>  QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, 
>>> Cacheable, ReadWrite,
>>>  0x0000000000000000, // Granularity
>>>  0x0000000800000000, // Range Minimum
>>> -0x00000009BFFFFFFF, // Range Maximum
>>> +0x0000000FFFFFFFFF, // Range Maximum
>>>  0x0000000000000000, // Translation Offset
>>> -0x00000001C0000000, // Length
>>> +0x0000000800000000, // Length
>>>  ,, , AddressRangeMemory, TypeStatic)
>>>  })
>>>  Device (GPE0)  
>>
>> (On Q35, the low RAM split is at 2GB. Therefore, with 5GB guest RAM and no
>> DIMM hotplug range, pc_pci_hole64_start() returns 4 + (5-2) = 7 GB. Adding
>> the 32GB extension to that yields 39GB (0x0000_0009_BFFF_FFFF + 1), before
>> the patch. After the patch, we add the 32GB extension to the
>> firmware-programmed base, namely 32GB.)
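
As a sanity check on the i440fx example quoted above, the before/after "Range
Maximum" values can be reproduced with a small standalone program. This is not
QEMU code: it ignores the 1GiB rounding that the real end-getters apply (which
does not change these particular numbers), and the 0x1C100-byte
firmware-assigned window size is simply read off the quoted before-patch
Length:

#include <inttypes.h>
#include <stdio.h>

#define GiB (1ULL << 30)

int main(void)
{
    /* i440fx/OVMF example: 5GB RAM, low RAM split at 3GB, 2GB hole64 size */
    uint64_t default_start = 4 * GiB + (5 * GiB - 3 * GiB); /* 6 GiB        */
    uint64_t fw_start      = 32 * GiB;    /* lowest fw-assigned 64-bit GPA  */
    uint64_t fw_end        = fw_start + 0x1C100; /* end of fw-assigned BARs */
    uint64_t hole64_size   = 2 * GiB;     /* "pci-hole64-size" on i440fx    */

    /* Before the patch, the 2GB extension is measured from the default
     * base: 6 GiB + 2 GiB = 8 GiB, below the firmware window, so it is
     * effectively a no-op. */
    uint64_t end_before = default_start + hole64_size;
    if (end_before < fw_end) {
        end_before = fw_end;
    }

    /* After the patch, the 2GB extension is measured from the fw base. */
    uint64_t end_after = fw_start + hole64_size;
    if (end_after < fw_end) {
        end_after = fw_end;
    }

    printf("Range Maximum before: 0x%016" PRIX64 "\n", end_before - 1);
    printf("Range Maximum after:  0x%016" PRIX64 "\n", end_after - 1);
    return 0;
}

It prints 0x000000080001C0FF and 0x000000087FFFFFFF, matching the _CRS hunk
quoted above.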

Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-25 Thread Michael S. Tsirkin
On Tue, Sep 25, 2018 at 09:07:45PM +0300, Marcel Apfelbaum wrote:
> Hi Laszlo,
> 
> On 9/25/18 1:13 AM, Laszlo Ersek wrote:
> > In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> > hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> > the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> > property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> > 64-bit MMIO aperture to the guest OS for hotplug purposes.
> > 
> 
> [...]
> 
> > Also, because SeaBIOS never assigns 64-bit GPAs to
> > 64-bit BARs, the patch makes no difference to SeaBIOS guests.
> 
> And this is why we didn't catch that earlier... we need to include
> OVMF in our basic tests.

For that we need to distribute the binary with QEMU; we don't
want basic tests to have another external dependency.

> >   (Which is in
> > turn why ACPI test data for the "bios-tables-test" need not be refreshed.)
> > 
> > Using an i440fx OVMF guest with 5GB RAM, an example _CRS change is:
> > 
> > > @@ -881,9 +881,9 @@
> > >   QWordMemory (ResourceProducer, PosDecode, MinFixed, 
> > > MaxFixed, Cacheable, ReadWrite,
> > >   0x0000000000000000, // Granularity
> > >   0x0000000800000000, // Range Minimum
> > > -0x000000080001C0FF, // Range Maximum
> > > +0x000000087FFFFFFF, // Range Maximum
> > >   0x0000000000000000, // Translation Offset
> > > -0x000000000001C100, // Length
> > > +0x0000000080000000, // Length
> > >   ,, , AddressRangeMemory, TypeStatic)
> > >   })
> > >   Device (GPE0)
> > (On i440fx, the low RAM split is at 3GB, in this case. Therefore, with 5GB
> > guest RAM and no DIMM hotplug range, pc_pci_hole64_start() returns 4 +
> > (5-3) = 6 GB. Adding the 2GB extension to that yields 8GB, which is below
> > the firmware-programmed base of 32GB, before the patch. Therefore, before
> > the patch, the extension is ineffective. After the patch, we add the 2GB
> > extension to the firmware-programmed base, namely 32GB.)
> > 
> > Using a q35 OVMF guest with 5GB RAM, an example _CRS change is:
> > 
> > > @@ -3162,9 +3162,9 @@
> > >   QWordMemory (ResourceProducer, PosDecode, MinFixed, 
> > > MaxFixed, Cacheable, ReadWrite,
> > >   0x0000000000000000, // Granularity
> > >   0x0000000800000000, // Range Minimum
> > > -0x00000009BFFFFFFF, // Range Maximum
> > > +0x0000000FFFFFFFFF, // Range Maximum
> > >   0x0000000000000000, // Translation Offset
> > > -0x00000001C0000000, // Length
> > > +0x0000000800000000, // Length
> > >   ,, , AddressRangeMemory, TypeStatic)
> > >   })
> > >   Device (GPE0)
> > (On Q35, the low RAM split is at 2GB. Therefore, with 5GB guest RAM and no
> > DIMM hotplug range, pc_pci_hole64_start() returns 4 + (5-2) = 7 GB. Adding
> > the 32GB extension to that yields 39GB (0x0000_0009_BFFF_FFFF + 1), before
> > the patch. After the patch, we add the 32GB extension to the
> > firmware-programmed base, namely 32GB.)
> > 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Alex Williamson 
> > Cc: Marcel Apfelbaum 
> > Link: 
> > http://mid.mail-archive.com/a56b3710-9c2d-9ad0-5590-efe30b6d7bd9@redhat.com
> > Fixes: 9fa99d2519cbf71f871e46871df12cb446dc1c3e
> > Signed-off-by: Laszlo Ersek 
> > ---
> >   hw/pci-host/piix.c | 2 +-
> >   hw/pci-host/q35.c  | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
> > index 0df91e002076..fb4b0669ac9f 100644
> > --- a/hw/pci-host/piix.c
> > +++ b/hw/pci-host/piix.c
> > @@ -285,7 +285,7 @@ static void i440fx_pcihost_get_pci_hole64_end(Object 
> > *obj, Visitor *v,
> >   {
> >   PCIHostState *h = PCI_HOST_BRIDGE(obj);
> >   I440FXState *s = I440FX_PCI_HOST_BRIDGE(obj);
> > -uint64_t hole64_start = pc_pci_hole64_start();
> > +uint64_t hole64_start = i440fx_pcihost_get_pci_hole64_start_value(obj);
> >   Range w64;
> >   uint64_t value, hole64_end;
> > diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> > index 8acf942b5e65..1c5433161f14 100644
> > --- a/hw/pci-host/q35.c
> > +++ b/hw/pci-host/q35.c
> > @@ -145,7 +145,7 @@ static void q35_host_get_pci_hole64_end(Object *obj, 
> > Visitor *v,
> >   {
> >   PCIHostState *h = PCI_HOST_BRIDGE(obj);
> >   Q35PCIHost *s = Q35_HOST_DEVICE(obj);
> > -uint64_t hole64_start = pc_pci_hole64_start();
> > +uint64_t hole64_start = q35_host_get_pci_hole64_start_value(obj);
> >   Range w64;
> >   uint64_t value, hole64_end;
> 
> 
> Reviewed-by: Marcel Apfelbaum
> 
> 
> Thanks,
> Marcel
> 



Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-25 Thread Alex Williamson
On Tue, 25 Sep 2018 00:13:46 +0200
Laszlo Ersek  wrote:

> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> 64-bit MMIO aperture to the guest OS for hotplug purposes.
> 
> In that commit, we added or modified five functions:
> 
> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>   the DIMM hotplug area too (if any).
> 
> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>   board-specific 64-bit base property getters called abstractly by the
>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
>   GPA programmed by the firmware as the base address for the aperture).
> 
> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>   these intended to extend the aperture to our size recommendation,
>   calculated relative to the base of the aperture.
> 
> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
> q35_host_get_pci_hole64_end() currently only extend the aperture relative
> to the default base (pc_pci_hole64_start()), ignoring any programming done
> by the firmware. This means that our size recommendation may not be met.
> Fix it by honoring the firmware's address assignments.
> 
> The strange extension sizes were spotted by Alex, in the log of a guest
> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
> 
> This change only affects DSDT generation, therefore no new compat property
> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
> 64-bit BARs, the patch makes no difference to SeaBIOS guests.

This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
only if it cannot satisfy all the BARs from 32-bit MMIO, see
src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
assigned GPUs and you'll eventually cross that threshold and all 64-bit
BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
devices could do the same.  Thanks,

Alex

> (Which is in
> turn why ACPI test data for the "bios-tables-test" need not be refreshed.)
> 
> Using an i440fx OVMF guest with 5GB RAM, an example _CRS change is:
> 
> > @@ -881,9 +881,9 @@
> >  QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, 
> > Cacheable, ReadWrite,
> >  0x0000000000000000, // Granularity
> >  0x0000000800000000, // Range Minimum
> > -0x000000080001C0FF, // Range Maximum
> > +0x000000087FFFFFFF, // Range Maximum
> >  0x0000000000000000, // Translation Offset
> > -0x000000000001C100, // Length
> > +0x0000000080000000, // Length
> >  ,, , AddressRangeMemory, TypeStatic)
> >  })
> >  Device (GPE0)  
> 
> (On i440fx, the low RAM split is at 3GB, in this case. Therefore, with 5GB
> guest RAM and no DIMM hotplug range, pc_pci_hole64_start() returns 4 +
> (5-3) = 6 GB. Adding the 2GB extension to that yields 8GB, which is below
> the firmware-programmed base of 32GB, before the patch. Therefore, before
> the patch, the extension is ineffective. After the patch, we add the 2GB
> extension to the firmware-programmed base, namely 32GB.)
> 
> Using a q35 OVMF guest with 5GB RAM, an example _CRS change is:
> 
> > @@ -3162,9 +3162,9 @@
> >  QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, 
> > Cacheable, ReadWrite,
> >  0x0000000000000000, // Granularity
> >  0x0000000800000000, // Range Minimum
> > -0x00000009BFFFFFFF, // Range Maximum
> > +0x0000000FFFFFFFFF, // Range Maximum
> >  0x0000000000000000, // Translation Offset
> > -0x00000001C0000000, // Length
> > +0x0000000800000000, // Length
> >  ,, , AddressRangeMemory, TypeStatic)
> >  })
> >  Device (GPE0)  
> 
> (On Q35, the low RAM split is at 2GB. Therefore, with 5GB guest RAM and no
> DIMM hotplug range, pc_pci_hole64_start() returns 4 + (5-2) = 7 GB. Adding
> the 32GB extension to that yields 39GB (0x0000_0009_BFFF_FFFF + 1), before
> the patch. After the patch, we add the 32GB extension to the
> firmware-programmed base, namely 32GB.)
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Marcel Apfelbaum 
> Link: 
> http://mid.mail-archive.com/a56b3710-9c2d-9ad0-5590-efe30b6d7bd9@redhat.com
> Fixes: 

Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

2018-09-25 Thread Marcel Apfelbaum

Hi Laszlo,

On 9/25/18 1:13 AM, Laszlo Ersek wrote:

In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
64-bit MMIO aperture to the guest OS for hotplug purposes.



[...]


Also, because SeaBIOS never assigns 64-bit GPAs to
64-bit BARs, the patch makes no difference to SeaBIOS guests.


And this is why we didn't catch that earlier... we need to include
OVMF in our basic tests.


  (Which is in
turn why ACPI test data for the "bios-tables-test" need not be refreshed.)

Using an i440fx OVMF guest with 5GB RAM, an example _CRS change is:


@@ -881,9 +881,9 @@
  QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, 
Cacheable, ReadWrite,
  0x0000000000000000, // Granularity
  0x0000000800000000, // Range Minimum
-0x000000080001C0FF, // Range Maximum
+0x000000087FFFFFFF, // Range Maximum
  0x0000000000000000, // Translation Offset
-0x000000000001C100, // Length
+0x0000000080000000, // Length
  ,, , AddressRangeMemory, TypeStatic)
  })
  Device (GPE0)

(On i440fx, the low RAM split is at 3GB, in this case. Therefore, with 5GB
guest RAM and no DIMM hotplug range, pc_pci_hole64_start() returns 4 +
(5-3) = 6 GB. Adding the 2GB extension to that yields 8GB, which is below
the firmware-programmed base of 32GB, before the patch. Therefore, before
the patch, the extension is ineffective. After the patch, we add the 2GB
extension to the firmware-programmed base, namely 32GB.)

Using a q35 OVMF guest with 5GB RAM, an example _CRS change is:


@@ -3162,9 +3162,9 @@
  QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, 
Cacheable, ReadWrite,
  0x0000000000000000, // Granularity
  0x0000000800000000, // Range Minimum
-0x00000009BFFFFFFF, // Range Maximum
+0x0000000FFFFFFFFF, // Range Maximum
  0x0000000000000000, // Translation Offset
-0x00000001C0000000, // Length
+0x0000000800000000, // Length
  ,, , AddressRangeMemory, TypeStatic)
  })
  Device (GPE0)

(On Q35, the low RAM split is at 2GB. Therefore, with 5GB guest RAM and no
DIMM hotplug range, pc_pci_hole64_start() returns 4 + (5-2) = 7 GB. Adding
the 32GB extension to that yields 39GB (0x0000_0009_BFFF_FFFF + 1), before
the patch. After the patch, we add the 32GB extension to the
firmware-programmed base, namely 32GB.)

Cc: "Michael S. Tsirkin" 
Cc: Alex Williamson 
Cc: Marcel Apfelbaum 
Link: 
http://mid.mail-archive.com/a56b3710-9c2d-9ad0-5590-efe30b6d7bd9@redhat.com
Fixes: 9fa99d2519cbf71f871e46871df12cb446dc1c3e
Signed-off-by: Laszlo Ersek 
---
  hw/pci-host/piix.c | 2 +-
  hw/pci-host/q35.c  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 0df91e002076..fb4b0669ac9f 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -285,7 +285,7 @@ static void i440fx_pcihost_get_pci_hole64_end(Object *obj, 
Visitor *v,
  {
  PCIHostState *h = PCI_HOST_BRIDGE(obj);
  I440FXState *s = I440FX_PCI_HOST_BRIDGE(obj);
-uint64_t hole64_start = pc_pci_hole64_start();
+uint64_t hole64_start = i440fx_pcihost_get_pci_hole64_start_value(obj);
  Range w64;
  uint64_t value, hole64_end;
  
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c

index 8acf942b5e65..1c5433161f14 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -145,7 +145,7 @@ static void q35_host_get_pci_hole64_end(Object *obj, 
Visitor *v,
  {
  PCIHostState *h = PCI_HOST_BRIDGE(obj);
  Q35PCIHost *s = Q35_HOST_DEVICE(obj);
-uint64_t hole64_start = pc_pci_hole64_start();
+uint64_t hole64_start = q35_host_get_pci_hole64_start_value(obj);
  Range w64;
  uint64_t value, hole64_end;
  



Reviewed-by: Marcel Apfelbaum


Thanks,
Marcel
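

For readers without the QEMU source at hand, what the two end-getters
effectively compute after the two-line change quoted above is roughly the
following. This is a paraphrase, not the literal QEMU code; the 1GiB rounding
and the hole64_fix compat switch are assumed to be the ones introduced by
commit 9fa99d2519cb:

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* align up to a power-of-two boundary; stands in for QEMU's ROUND_UP() */
#define ALIGN_UP_POW2(n, a) (((n) + (a) - 1) & ~((uint64_t)(a) - 1))

/*
 * hole64_start: the base from which the recommended hole size is measured;
 *               after this patch, the firmware-programmed base when one
 *               exists, otherwise the pc_pci_hole64_start() default
 * hole64_size:  "pci-hole64-size" (2GB on i440fx, 32GB on q35)
 * w64_end:      one past the highest firmware-assigned 64-bit BAR byte,
 *               0 if the firmware assigned no 64-bit BARs
 * hole64_fix:   false only on old machine types
 */
static uint64_t pci_hole64_end_sketch(uint64_t hole64_start,
                                      uint64_t hole64_size,
                                      uint64_t w64_end, bool hole64_fix)
{
    uint64_t value = w64_end;
    uint64_t hole64_end = ALIGN_UP_POW2(hole64_start + hole64_size,
                                        1ULL << 30);

    if (hole64_fix && value < hole64_end) {
        value = hole64_end; /* grow the hole to the recommended size */
    }
    return value;           /* exclusive end; _CRS Range Maximum is value - 1 */
}

int main(void)
{
    uint64_t GiB = 1ULL << 30;
    /* q35/OVMF example: default base 7 GiB, firmware base 32 GiB, 32 GiB
     * hole size. The exact firmware window size does not matter as long as
     * it ends below 39 GiB; 0x1C100 bytes is just a placeholder. */
    uint64_t w64_end = 32 * GiB + 0x1C100;

    printf("q35 Range Maximum before: 0x%016" PRIX64 "\n",
           pci_hole64_end_sketch(7 * GiB, 32 * GiB, w64_end, true) - 1);
    printf("q35 Range Maximum after:  0x%016" PRIX64 "\n",
           pci_hole64_end_sketch(32 * GiB, 32 * GiB, w64_end, true) - 1);
    return 0;
}

This prints 0x00000009BFFFFFFF and 0x0000000FFFFFFFFF, i.e. the q35 _CRS hunk
from the commit message; plugging in the i440fx numbers (6 GiB default base,
32 GiB firmware base, 2 GiB size) reproduces the other hunk the same way.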