[Qemu-devel] virtio_gpu testing
Hi,

I am testing virtio_gpu/virgl and have followed the instructions at:

https://www.kraxel.org/slides/qemu-gfx/#resources
https://docs.google.com/document/d/1CNiN0rHdfh7cp9tQ3coebNEJtHJzm4OCWvF3qL4nucc/pub

Host runs 3.13 with drm.rnodes=1.

My fc20 guest boots fine with virgl and sdl2.0 support. However, glxinfo and piglit sanity seem to have some issues, most likely because something is wrong with my setup. glxinfo says:

libGL error: failed to create dri screen
libGL error: failed to load driver: virtio_gpu

although /usr/lib64/dri/virtio_gpu_dri.so is present in the guest (along with a symlink to it named /usr/lib64/dri/virgl_dri.so).

guest dmesg: http://pastebin.com/0xtQKqvU
full glxinfo output: http://pastebin.com/jJibN4gp
xorg loads xf86-video-virgl, log: http://pastebin.com/mgdbB8CY
piglit sanity log: http://pastebin.com/hqte8mHs

The only crucial difference from the setup described, afaict, is that my host mesa is the default system mesa, which does not support EGL_EXT_image_dma_buf_import, as this branch does:
http://cgit.freedesktop.org/~airlied/mesa/log/?h=egl-mesa-drm-buf-export

Does correct functionality depend on EGL_EXT_image_dma_buf_import? (btw the egl-mesa-drm-buf-export branch seems old, is there an updated one?) I tried to use this branch as well, but I get no gnome session display / login. Host mesa is using radeonsi_dri.

Also, input/cursor is not entirely working in the sdl guest display (the cursor works, but is not visible). I guess this may be expected at the moment? Is the display supposed to work with either an X11 or a wayland session in the guest?

thanks for any hints,

- Vasilis
Re: [Qemu-devel] [PATCH v2] qtest/bios-tables: Add DMAR unit test on intel_iommu for q35
On Mon, Nov 23, 2014 at 04:25:05PM +0200, Marcel Apfelbaum wrote:
> On Mon, 2014-11-24 at 14:37 +0100, Vasilis Liaskovitis wrote:
> > The test enables intel_iommu on q35, looks for and reads the DMAR table as well
> > as its only DRHC structure (for now), checking the header and checksums.
>
> Hi Vaisilis,
> I had a deeper look to your patch and the code already checks
> header and checksum for DMAR, all you had to do is to add your latest chunk:
> > @@ -779,7 +823,7 @@ static void test_acpi_tcg(void)
> >
> > memset(&data, 0, sizeof(data));
> > data.machine = MACHINE_Q35;
> > -test_acpi_one("-machine q35,accel=tcg", &data);
> > +test_acpi_one("-machine q35,accel=tcg,iommu=on", &data);
> > free_test_data(&data);
>
> You can check that it is automatically done by test_dst_table function.
> You can add there a print to convince yourself.
>
> However what is missing is a DMAR binary table to compare the content with an
> expected one.
> You can create it by running:
> tests/acpi-test-data/rebuild-expected-aml.sh
>
> Then add the newly created file to tests/acpi-test-data/q35/DMAR

sorry for the delay. thanks, I missed this. I sent v3 simply with the addition of the DMAR aml file and just the "iommu=on" chunk, as you suggested.

- Vasilis
[Qemu-devel] [PATCH v3] qtest/bios-tables: Add DMAR aml file and enable intel_iommu for q35
We generate the expected DMAR table and enable intel_iommu on q35 to test the table. Signed-off-by: Vasilis Liaskovitis --- tests/acpi-test-data/q35/DMAR | Bin 0 -> 64 bytes tests/bios-tables-test.c | 2 +- 2 files changed, 1 insertion(+), 1 deletion(-) create mode 100644 tests/acpi-test-data/q35/DMAR diff --git a/tests/acpi-test-data/q35/DMAR b/tests/acpi-test-data/q35/DMAR new file mode 100644 index ..6def4553381f4e48b80ead11af2adf6ce09c8c7e GIT binary patch literal 64 ycmZ?qbqsP~U|?XZbMklg2v%^42yk`*iZKGkKx`1L2E+&;zyK0sV7U1YL;?U|dI
Re: [Qemu-devel] [PATCH] qtest/bios-tables: Add DMAR unit test on intel_iommu for q35
Hi, On Mon, Nov 24, 2014 at 11:37:05AM +0200, Marcel Apfelbaum wrote: > On Sat, 2014-11-22 at 20:05 +0100, Vasilis Liaskovitis wrote: > > The test enables intel_iommu on q35 and reads the DMAR table and its only > > DRHC structure (for now), checking only the header and checksums. > > > > Signed-off-by: Vasilis Liaskovitis > > --- > > tests/bios-tables-test.c | 34 +- > > 1 file changed, 33 insertions(+), 1 deletion(-) > > > > diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c > > index 9e4d205..f09b0cb 100644 > > --- a/tests/bios-tables-test.c > > +++ b/tests/bios-tables-test.c > > @@ -45,6 +45,8 @@ typedef struct { > > AcpiRsdtDescriptorRev1 rsdt_table; > > AcpiFadtDescriptorRev1 fadt_table; > > AcpiFacsDescriptorRev1 facs_table; > > +AcpiTableDmar dmar_table; > > +AcpiDmarHardwareUnit drhd; > > uint32_t *rsdt_tables_addr; > > int rsdt_tables_nr; > > GArray *tables; > > @@ -371,6 +373,33 @@ static void test_acpi_dsdt_table(test_data *data) > > g_array_append_val(data->tables, dsdt_table); > > } > > > > +static void test_acpi_dmar_table(test_data *data) > > +{ > > +AcpiTableDmar *dmar_table = &data->dmar_table; > > +AcpiDmarHardwareUnit *drhd = &data->drhd; > > +struct AcpiTableHeader *header = (struct AcpiTableHeader *) dmar_table; > > +int tables_nr = data->rsdt_tables_nr - 1; > > +uint32_t addr = data->rsdt_tables_addr[tables_nr]; /* dmar is last */ > Hi, > > The DMAR table is always last? If not, it will break when we add another > table test. > I suggest going over the tables and looking for the DMAR signature. agreed, it would also break if hw/i386/acpi-build.c adds other tables after DMAR. I posted v2 following your suggestion. thanks, - Vasilis
[Qemu-devel] [PATCH v2] qtest/bios-tables: Add DMAR unit test on intel_iommu for q35
The test enables intel_iommu on q35, looks for and reads the DMAR table as well
as its only DRHD structure (for now), checking the header and checksums.

Signed-off-by: Vasilis Liaskovitis
---
 tests/bios-tables-test.c | 46 +-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 9e4d205..93b4b3f 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -45,6 +45,8 @@ typedef struct {
     AcpiRsdtDescriptorRev1 rsdt_table;
     AcpiFadtDescriptorRev1 fadt_table;
     AcpiFacsDescriptorRev1 facs_table;
+    AcpiTableDmar dmar_table;
+    AcpiDmarHardwareUnit drhd;
     uint32_t *rsdt_tables_addr;
     int rsdt_tables_nr;
     GArray *tables;
@@ -371,6 +373,45 @@ static void test_acpi_dsdt_table(test_data *data)
     g_array_append_val(data->tables, dsdt_table);
 }
 
+static void test_acpi_dmar_table(test_data *data)
+{
+    AcpiTableDmar *dmar_table = &data->dmar_table;
+    AcpiDmarHardwareUnit *drhd = &data->drhd;
+    struct AcpiTableHeader *header = (struct AcpiTableHeader *) dmar_table;
+    uint32_t addr, signature_le;
+    bool found_dmar = false;
+    char signature_str[5] = {};
+    int i;
+
+    memset(dmar_table, 0, sizeof(*dmar_table));
+
+    /* look for DMAR signature in the tables, FADT comes first so we skip it */
+    for (i = 1; i < data->rsdt_tables_nr; i++) {
+        addr = data->rsdt_tables_addr[i];
+        ACPI_READ_TABLE_HEADER(dmar_table, addr);
+        signature_le = cpu_to_le32(header->signature);
+        memcpy(signature_str, &signature_le, 4);
+        if (!g_strcmp0(signature_str, "DMAR")) {
+            found_dmar = true;
+            break;
+        }
+    }
+    g_assert(found_dmar);
+    ACPI_READ_FIELD(dmar_table->host_address_width, addr);
+    ACPI_READ_FIELD(dmar_table->flags, addr);
+    ACPI_READ_ARRAY_PTR(dmar_table->reserved, 10, addr);
+
+    memset(drhd, 0, sizeof(*drhd));
+    ACPI_READ_FIELD(drhd->type, addr);
+    ACPI_READ_FIELD(drhd->length, addr);
+    ACPI_READ_FIELD(drhd->flags, addr);
+    ACPI_READ_FIELD(drhd->pci_segment, addr);
+    ACPI_READ_FIELD(drhd->address, addr);
+    g_assert(!acpi_checksum((uint8_t *)dmar_table, sizeof(AcpiTableDmar) +
+                            drhd->length));
+
+}
+
 static void test_acpi_tables(test_data *data)
 {
     int tables_nr = data->rsdt_tables_nr - 1; /* fadt is first */
@@ -747,6 +788,9 @@ static void test_acpi_one(const char *params, test_data *data)
     test_acpi_fadt_table(data);
     test_acpi_facs_table(data);
     test_acpi_dsdt_table(data);
+    if (strstr(params, "iommu=on")) {
+        test_acpi_dmar_table(data);
+    }
     test_acpi_tables(data);
 
     if (iasl) {
@@ -779,7 +823,7 @@ static void test_acpi_tcg(void)
 
     memset(&data, 0, sizeof(data));
     data.machine = MACHINE_Q35;
-    test_acpi_one("-machine q35,accel=tcg", &data);
+    test_acpi_one("-machine q35,accel=tcg,iommu=on", &data);
     free_test_data(&data);
 }
 
--
1.9.1
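For readers skimming the v2 change: the loop above stops assuming that DMAR is the last table and instead walks the RSDT entries looking for the "DMAR" signature. A minimal standalone sketch of that lookup follows. It is not the qtest code; SketchTableHeader and sketch_find_table are hypothetical names, and the tables live in an ordinary array rather than in guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an ACPI table header: only the
 * 4-byte signature matters for this sketch. */
typedef struct {
    char signature[4];   /* e.g. "FACP", "APIC", "DMAR"; not NUL-terminated */
} SketchTableHeader;

/*
 * Return the index of the first table at or after `first` whose signature
 * equals `sig`, or -1 if there is none. Starting at index 1 mirrors the
 * patch, which skips the FADT at index 0.
 */
static int sketch_find_table(const SketchTableHeader *tables, size_t count,
                             size_t first, const char *sig)
{
    assert(tables != NULL && sig != NULL && strlen(sig) == 4);
    for (size_t i = first; i < count; i++) {
        /* Compare exactly four bytes; the signature has no terminator. */
        if (memcmp(tables[i].signature, sig, 4) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Comparing the four raw bytes with memcmp sidesteps the temporary NUL-terminated copy that the patch builds for g_strcmp0. Whether that is preferable inside the test harness is a matter of taste.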
[Qemu-devel] [PATCH] qtest/bios-tables: Add DMAR unit test on intel_iommu for q35
The test enables intel_iommu on q35 and reads the DMAR table and its only
DRHD structure (for now), checking only the header and checksums.

Signed-off-by: Vasilis Liaskovitis
---
 tests/bios-tables-test.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 9e4d205..f09b0cb 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -45,6 +45,8 @@ typedef struct {
     AcpiRsdtDescriptorRev1 rsdt_table;
     AcpiFadtDescriptorRev1 fadt_table;
     AcpiFacsDescriptorRev1 facs_table;
+    AcpiTableDmar dmar_table;
+    AcpiDmarHardwareUnit drhd;
     uint32_t *rsdt_tables_addr;
     int rsdt_tables_nr;
     GArray *tables;
@@ -371,6 +373,33 @@ static void test_acpi_dsdt_table(test_data *data)
     g_array_append_val(data->tables, dsdt_table);
 }
 
+static void test_acpi_dmar_table(test_data *data)
+{
+    AcpiTableDmar *dmar_table = &data->dmar_table;
+    AcpiDmarHardwareUnit *drhd = &data->drhd;
+    struct AcpiTableHeader *header = (struct AcpiTableHeader *) dmar_table;
+    int tables_nr = data->rsdt_tables_nr - 1;
+    uint32_t addr = data->rsdt_tables_addr[tables_nr]; /* dmar is last */
+
+    memset(dmar_table, 0, sizeof(*dmar_table));
+    ACPI_READ_TABLE_HEADER(dmar_table, addr);
+    ACPI_ASSERT_CMP(header->signature, "DMAR");
+
+    ACPI_READ_FIELD(dmar_table->host_address_width, addr);
+    ACPI_READ_FIELD(dmar_table->flags, addr);
+    ACPI_READ_ARRAY_PTR(dmar_table->reserved, 10, addr);
+
+    memset(drhd, 0, sizeof(*drhd));
+    ACPI_READ_FIELD(drhd->type, addr);
+    ACPI_READ_FIELD(drhd->length, addr);
+    ACPI_READ_FIELD(drhd->flags, addr);
+    ACPI_READ_FIELD(drhd->pci_segment, addr);
+    ACPI_READ_FIELD(drhd->address, addr);
+    g_assert(!acpi_checksum((uint8_t *)dmar_table, sizeof(AcpiTableDmar) +
+                            drhd->length));
+
+}
+
 static void test_acpi_tables(test_data *data)
 {
     int tables_nr = data->rsdt_tables_nr - 1; /* fadt is first */
@@ -747,6 +776,9 @@ static void test_acpi_one(const char *params, test_data *data)
     test_acpi_fadt_table(data);
     test_acpi_facs_table(data);
     test_acpi_dsdt_table(data);
+    if (strstr(params, "iommu=on")) {
+        test_acpi_dmar_table(data);
+    }
     test_acpi_tables(data);
 
     if (iasl) {
@@ -779,7 +811,7 @@ static void test_acpi_tcg(void)
 
     memset(&data, 0, sizeof(data));
     data.machine = MACHINE_Q35;
-    test_acpi_one("-machine q35,accel=tcg", &data);
+    test_acpi_one("-machine q35,accel=tcg,iommu=on", &data);
     free_test_data(&data);
 }
 
--
1.9.1
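The g_assert on acpi_checksum() in the patch relies on the usual ACPI rule that all bytes of a valid table add up to zero modulo 256. A minimal sketch of that rule, assuming a plain byte buffer, is below. The sketch_ names are hypothetical and this is not the helper from the test file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256; a table with a valid checksum sums to 0. */
static uint8_t sketch_acpi_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Store in data[checksum_off] the value that makes the buffer sum to 0. */
static void sketch_acpi_fix_checksum(uint8_t *data, size_t len,
                                     size_t checksum_off)
{
    assert(data != NULL && checksum_off < len);
    data[checksum_off] = 0;
    data[checksum_off] = (uint8_t)(0u - sketch_acpi_checksum(data, len));
}
```

A test would call sketch_acpi_checksum() over the full table length and expect 0. Any single corrupted byte then shows up as a non-zero result.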
Re: [Qemu-devel] [PATCH 0/2] pc: memory hotplug fixes
On Mon, Jun 09, 2014 at 07:27:59PM +0200, Igor Mammedov wrote:
> Series is build on mst/pci tree that includes memory hotplug bits.
>
> patch 2/2, fixes incorrect address auto-allocation caused by wrong
> sorting order due to overflow in pc_dimm_addr_sort() comparator.
>

thanks for the fixes. For future hotplug related fixes, please cc Mohammed Gamal and Anshul Makkar (I added them in cc above) instead of me.

thanks,

- Vasilis
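As background on the comparator overflow mentioned in the quoted cover letter: a sort callback that returns a 64-bit address difference squeezed into an int can report the wrong order. The sketch below shows the general pitfall and a safe alternative. It is an illustration only and makes no claim about what pc_dimm_addr_sort() looked like before or after the fix; SketchDimm and the sketch_ functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DIMM: only the guest-physical base address. */
typedef struct {
    uint64_t addr;
} SketchDimm;

/*
 * Pitfall: truncating a 64-bit difference to 32 bits. Addresses that
 * differ by a multiple of 4 GiB have a zero low word, so they compare
 * as "equal"; other differences can come out with the wrong sign.
 */
static int sketch_cmp_truncating(const void *a, const void *b)
{
    const SketchDimm *x = a, *y = b;

    return (int)(uint32_t)(x->addr - y->addr);
}

/* Safe variant: compare explicitly and return -1, 0 or 1. */
static int sketch_cmp_safe(const void *a, const void *b)
{
    const SketchDimm *x = a, *y = b;

    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Sort DIMMs by ascending base address using the safe comparator. */
static void sketch_sort_dimms(SketchDimm *dimms, size_t n)
{
    qsort(dimms, n, sizeof(dimms[0]), sketch_cmp_safe);
}
```

With DIMMs at 4 GiB and 8 GiB, the truncating comparator returns 0 while the safe one returns a negative value, which is enough to break any address auto-allocation that depends on a sorted list.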
Re: [Qemu-devel] [PATCH 33/35] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
On Thu, May 29, 2014 at 11:12:37AM +0200, Igor Mammedov wrote: > On Wed, 28 May 2014 18:38:13 +0200 > Vasilis Liaskovitis wrote: > > > On Wed, May 28, 2014 at 03:26:42PM +0200, Igor Mammedov wrote: > > > On Wed, 28 May 2014 14:23:13 +0200 > > > Vasilis Liaskovitis wrote: > > > > > > > On Wed, May 28, 2014 at 10:07:22AM +0200, Igor Mammedov wrote: > > > > > On Tue, 27 May 2014 17:57:31 +0200 > > > > > Anshul Makkar wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I tested the hot unplug patch and doesn't seem to work properly > > > > > > with Debian > > > > > > 6 and Ubuntu host. > > > > > > > > > > > > Scenario: > > > > > > I added 3 dimm devices of 1G each: > > > > > > > > > > > > object_add memory-ram,id=ram0,size=1G, device_add > > > > > > dimm,id=dimm1,memdev=ram0 > > > > > > > > > > > > object_add memory-ram,id=ram1,size=1G, device_add > > > > > > dimm,id=dimm2,memdev=ram1 > > > > > > > > > > > > object_add memory-ram,id=ram2,size=1G, device_add > > > > > > dimm,id=dimm3,memdev=ram2 > > > > > > > > > > > > device_del dimm3: I get the OST EVENT EJECT 0x3 and OST STATUS as > > > > > > 0x84(IN > > > > > > PROGRESS) If I check on the guest, the device has been successfully > > > > > > removed. But no OST EJECT SUCCESS event was received. > > > > > I think there should be a SUCCESS event, > > > > > it should be investigated from the guest side first, OST support in > > > > > kernel > > > > > is relatively new. > > > > > > > > When testing older guest kernel (3.11), _OST success events are not sent > > > > from the guest. I haven't tried newer versions yet. > > > > > > > > In terms of OSPM _OST behaviour, I am not sure if returning OST success > > > > status > > > > on succcesful removal is *required*. Figure 6-37, page 306 of ACPI > > > > spec5.0 > > > > shows that on succcesfull OS ejection ejection, _EJ0 is evaluated. > > > > Evaluating > > > > _OST does not seem to be a requirement, is it? 
(cc'ing linux-acpi for
> > > > input)
> > > >
> > > > In linux guests, on successful removal, _EJ0 is always evaluated. I believe we
> > > > should be handling _EJ0 and doing the dimm removal (object_unparent) there.
> > > > Currently OST successes are never received and dimm devices remain in QEMU even
> > > > when successfully ejected from guest.
> > > > E.g. a quick patch for _EJ0 handling, on top of Hu Tao's series:
> > > >
> The same register [0x14] is control register when writing there, so we can use-reuse
> the same bit position as MRMV for signaling QEMU to perform eject on writing 1 there,
> similarly like it's done for insert event.

thanks, on a closer look that makes sense. MRMV is sufficient, example patch below.

With the new "query-acpi-ospm-status" we can read OST status from QMP. However, as I mentioned, on successful removal the guest kernel does not send the OST success status (0x0). This leads to the scenario where the memory device is successfully ejected, _EJ0 is evaluated in the guest and the device is deleted in qemu (with the patch below). The dimm will no longer show up in "query-memory-devices", but the ospm status of the dimm slot will still remain at 0x84 (ejection in progress). This can be confusing to management layers.

Do you have any suggestion for how to report the successful _EJ0? We could simply write a successful status:

mdev->ost_status = 0x0

when MRMV is written to, but it is a bit hacky. On the other hand, having another command only for _EJ0 notification sounds like overkill.

In terms of linux kernel ACPI conformity, Figure 6-37 in the spec implies that _OST evaluation is not required after successful _EJ0 evaluation. Also see the relevant comment in drivers/acpi/scan.c: acpi_scan_hot_remove() line 378 (3.15-rc8):

/*
 * Verify if eject was indeed successful. If not, log an error
 * message. No need to call _OST since _EJ0 call was made
 * OK.
 */

acpi, memory-hotplug: Add _EJ0 handling
---
 docs/specs/acpi_mem_hotplug.txt |3 ++-
 hw/acpi/memory_hotplug.c|9 +++--
 2 files changed, 5 insertions(+), 7 deleti
Re: [Qemu-devel] [PATCH 33/35] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
On Wed, May 28, 2014 at 03:26:42PM +0200, Igor Mammedov wrote: > On Wed, 28 May 2014 14:23:13 +0200 > Vasilis Liaskovitis wrote: > > > On Wed, May 28, 2014 at 10:07:22AM +0200, Igor Mammedov wrote: > > > On Tue, 27 May 2014 17:57:31 +0200 > > > Anshul Makkar wrote: > > > > > > > Hi, > > > > > > > > I tested the hot unplug patch and doesn't seem to work properly with > > > > Debian > > > > 6 and Ubuntu host. > > > > > > > > Scenario: > > > > I added 3 dimm devices of 1G each: > > > > > > > > object_add memory-ram,id=ram0,size=1G, device_add > > > > dimm,id=dimm1,memdev=ram0 > > > > > > > > object_add memory-ram,id=ram1,size=1G, device_add > > > > dimm,id=dimm2,memdev=ram1 > > > > > > > > object_add memory-ram,id=ram2,size=1G, device_add > > > > dimm,id=dimm3,memdev=ram2 > > > > > > > > device_del dimm3: I get the OST EVENT EJECT 0x3 and OST STATUS as > > > > 0x84(IN > > > > PROGRESS) If I check on the guest, the device has been successfully > > > > removed. But no OST EJECT SUCCESS event was received. > > > I think there should be a SUCCESS event, > > > it should be investigated from the guest side first, OST support in kernel > > > is relatively new. > > > > When testing older guest kernel (3.11), _OST success events are not sent > > from the guest. I haven't tried newer versions yet. > > > > In terms of OSPM _OST behaviour, I am not sure if returning OST success > > status > > on succcesful removal is *required*. Figure 6-37, page 306 of ACPI spec5.0 > > shows that on succcesfull OS ejection ejection, _EJ0 is evaluated. > > Evaluating > > _OST does not seem to be a requirement, is it? (cc'ing linux-acpi for input) > > > > In linux guests, on successful removal, _EJ0 is always evaluated. I believe > > we > > should be handling _EJ0 and doing the dimm removal (object_unparent) there. > > Currently OST successes are never received and dimm devices remain in QEMU > > even > > when successfully ejected from guest. > > E.g. 
a quick patch for _EJ0 handling, on top of Hu Tao's series: > > > > acpi, memory-hotplug: Add _EJ0 handling > > > > --- > > docs/specs/acpi_mem_hotplug.txt |3 ++- > > hw/acpi/memory_hotplug.c | 13 +++-- > > hw/i386/ssdt-misc.dsl|3 ++- > > include/hw/acpi/memory_hotplug.h |1 + > > 4 files changed, 12 insertions(+), 8 deletions(-) > > > > diff --git a/docs/specs/acpi_mem_hotplug.txt > > b/docs/specs/acpi_mem_hotplug.txt > > index 1290994..1352962 100644 > > --- a/docs/specs/acpi_mem_hotplug.txt > > +++ b/docs/specs/acpi_mem_hotplug.txt > > @@ -28,7 +28,8 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte > > access): > > region will read/store data from/to selected memory device. > >[0x4-0x7] OST event code reported by OSPM > >[0x8-0xb] OST status code reported by OSPM > > - [0xc-0x13] reserved, writes into it are ignored > > + [0xc]EJ device if written to > > + [0xd-0x13] reserved, writes into it are ignored > >[0x14] Memory device control fields > >bits: > >0: reserved, OSPM must clear it before writing to register > > diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c > > index 8aa829d..d3edd28 100644 > > --- a/hw/acpi/memory_hotplug.c > > +++ b/hw/acpi/memory_hotplug.c > > @@ -93,9 +93,6 @@ static void acpi_memory_hotplug_write(void *opaque, > > hwaddr addr, uint64_t data, > > case 0x03: /* EJECT */ > > switch (mdev->ost_status) { > > case 0x0: /* SUCCESS */ > > -object_unparent(OBJECT(mdev->dimm)); > > -mdev->is_removing = false; > > -mdev->dimm = NULL; > > break; > > case 0x1: /* FAILURE */ > > case 0x2: /* UNRECOGNIZED NOTIFY */ > > @@ -115,9 +112,6 @@ static void acpi_memory_hotplug_write(void *opaque, > > hwaddr addr, uint64_t data, > > case 0x103: /* OSPM EJECT */ > > switch (mdev->ost_status) { > > case 0x0: /* SUCCESS */ > > -object_unparent(OBJECT(mdev->dimm)); > > -mdev->is_removing = false; > > -mdev->dimm = NUL
Re: [Qemu-devel] [PATCH 33/35] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
On Wed, May 28, 2014 at 10:07:22AM +0200, Igor Mammedov wrote:
> On Tue, 27 May 2014 17:57:31 +0200
> Anshul Makkar wrote:
>
> > Hi,
> >
> > I tested the hot unplug patch and doesn't seem to work properly with Debian
> > 6 and Ubuntu host.
> >
> > Scenario:
> > I added 3 dimm devices of 1G each:
> >
> > object_add memory-ram,id=ram0,size=1G, device_add dimm,id=dimm1,memdev=ram0
> >
> > object_add memory-ram,id=ram1,size=1G, device_add dimm,id=dimm2,memdev=ram1
> >
> > object_add memory-ram,id=ram2,size=1G, device_add dimm,id=dimm3,memdev=ram2
> >
> > device_del dimm3: I get the OST EVENT EJECT 0x3 and OST STATUS as 0x84(IN
> > PROGRESS) If I check on the guest, the device has been successfully
> > removed. But no OST EJECT SUCCESS event was received.
> I think there should be a SUCCESS event,
> it should be investigated from the guest side first, OST support in kernel
> is relatively new.

When testing an older guest kernel (3.11), _OST success events are not sent from the guest. I haven't tried newer versions yet.

In terms of OSPM _OST behaviour, I am not sure if returning OST success status on successful removal is *required*. Figure 6-37, page 306 of ACPI spec 5.0 shows that on successful OS ejection, _EJ0 is evaluated. Evaluating _OST does not seem to be a requirement, is it? (cc'ing linux-acpi for input)

In linux guests, on successful removal, _EJ0 is always evaluated. I believe we should be handling _EJ0 and doing the dimm removal (object_unparent) there. Currently OST successes are never received and dimm devices remain in QEMU even when successfully ejected from guest.

E.g.
a quick patch for _EJ0 handling, on top of Hu Tao's series: acpi, memory-hotplug: Add _EJ0 handling --- docs/specs/acpi_mem_hotplug.txt |3 ++- hw/acpi/memory_hotplug.c | 13 +++-- hw/i386/ssdt-misc.dsl|3 ++- include/hw/acpi/memory_hotplug.h |1 + 4 files changed, 12 insertions(+), 8 deletions(-) diff --git a/docs/specs/acpi_mem_hotplug.txt b/docs/specs/acpi_mem_hotplug.txt index 1290994..1352962 100644 --- a/docs/specs/acpi_mem_hotplug.txt +++ b/docs/specs/acpi_mem_hotplug.txt @@ -28,7 +28,8 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access): region will read/store data from/to selected memory device. [0x4-0x7] OST event code reported by OSPM [0x8-0xb] OST status code reported by OSPM - [0xc-0x13] reserved, writes into it are ignored + [0xc]EJ device if written to + [0xd-0x13] reserved, writes into it are ignored [0x14] Memory device control fields bits: 0: reserved, OSPM must clear it before writing to register diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c index 8aa829d..d3edd28 100644 --- a/hw/acpi/memory_hotplug.c +++ b/hw/acpi/memory_hotplug.c @@ -93,9 +93,6 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data, case 0x03: /* EJECT */ switch (mdev->ost_status) { case 0x0: /* SUCCESS */ -object_unparent(OBJECT(mdev->dimm)); -mdev->is_removing = false; -mdev->dimm = NULL; break; case 0x1: /* FAILURE */ case 0x2: /* UNRECOGNIZED NOTIFY */ @@ -115,9 +112,6 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data, case 0x103: /* OSPM EJECT */ switch (mdev->ost_status) { case 0x0: /* SUCCESS */ -object_unparent(OBJECT(mdev->dimm)); -mdev->is_removing = false; -mdev->dimm = NULL; break; case 0x84: /* EJECTION IN PROGRESS */ mdev->is_enabled = false; @@ -137,6 +131,12 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data, mdev->is_enabled = false; } break; +case 0x0c: +mdev = &mem_st->devs[mem_st->selector]; +object_unparent(OBJECT(mdev->dimm)); 
+mdev->is_removing = false; +mdev->dimm = NULL; +break; } } @@ -238,6 +238,7 @@ static const VMStateDescription vmstate_memhp_sts = { VMSTATE_BOOL(is_inserting, MemStatus), VMSTATE_UINT32(ost_event, MemStatus), VMSTATE_UINT32(ost_status, MemStatus), +VMSTATE_UINT32(ej_status, MemStatus), VMSTATE_END_OF_LIST() } }; diff --git a/hw/i386/ssdt-misc.dsl b/hw/i386/ssdt-misc.dsl index 927e503..d20c6f0 100644 --- a/hw/i386/ssdt-misc.dsl +++ b/hw/i386/ssdt-misc.dsl @@ -163,6 +163,7 @@ DefinitionBlock ("ssdt-misc.aml", "SSDT", 0x01, "BXPC", "BXSSDTSUSP", 0x1) MSEL, 32, // DIMM selector, write only MOEV, 32, // _OST event code, write only MOSC, 32, // _OST status code, write only +MEJE, 8, // if written to, eject DIMM } Method(MESC, 0, Serialized) { @@ -283,7
Re: [Qemu-devel] [PATCH 33/35] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
On Tue, May 06, 2014 at 09:52:39AM +0800, Hu Tao wrote:
> On Mon, May 05, 2014 at 05:59:15PM +0200, Vasilis Liaskovitis wrote:
> > Hi,
> >
> > On Mon, Apr 14, 2014 at 06:44:42PM +0200, Igor Mammedov wrote:
> > > On Mon, 14 Apr 2014 15:25:01 +0800
> > > Hu Tao wrote:
> > >
> > > > On Fri, Apr 04, 2014 at 03:36:58PM +0200, Igor Mammedov wrote:
> > > Could you be more specific, what and how doesn't work and why there is
> > > need for SRAT entries per DIMM?
> > > I've briefly tested with your unplug patches and linux seemed be ok with unplug,
> > > i.e. device node was removed from /sys after receiving remove notification.
> >
> > Just a heads-up, is this the unplug patch that you are using for testing:
> > https://github.com/taohu/qemu/commit/55c9540919e189b0ad2e6a759af742080f8f5dc4
> >
> > or is there a newer version based on Igor's patchseries?
>
> Yeah. There is a new version. I pushed it up to
> https://github.com/taohu/qemu/commits/memhp for you to check out.

cool, thanks.

- Vasilis
Re: [Qemu-devel] [PATCH 20/35] acpi: memory hotplug ACPI hardware implementation
On Tue, May 06, 2014 at 09:13:13AM +0200, Igor Mammedov wrote:
> On Mon, 5 May 2014 14:20:25 +0200
> Vasilis Liaskovitis wrote:
> > On Fri, Apr 04, 2014 at 03:36:45PM +0200, Igor Mammedov wrote:
> > > +if (data == 1) {
> > > +/* TODO: handle device insert OST event */
> > > +} else if (data == 3) {
> > > +/* TODO: handle device remove OST event */
> > > +}
> >
> > Are there any patches planned to report _OST notifications to upper management
> > layers? E.g. some older patchseries implemented a queue for these notifications
> > that could be queried with an "info memhp" command.
> I don't recall seeing patches "info memhp", could you point them to me, please?
> But I have in mind to add corresponding commands for get OST and sending
> corresponding QMP events.

The old patches for "info memhp" (or "info memory-hotplug") are here:
http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03539.html

Not sure if we want a separate command though, or to have everything in "info dimm".

>
> >
> > As a more general question for the patchseries: How do we query status/presence
> > of dimms present? Some posssible options could be:
> >
> > - info qtree: If links<> are implemented between dimms and an acpi machine
> > adapter, would the dimms show up in the general device tree? Currently I believe
> > they don't.
> >
> > - info dimm: We could have a new "info dimm" command that shows information for
> > present DIMMs: start-end guest physical address, last _OST notification received
> > for this DIMM, as well as backing memdev object for this dimm.
> I'd prefer this one for hmp and from QMP side qom-get could be used to enumerate/get
> properties.

I also prefer "info dimm" for hmp. For qmp, how flexible is qom-get? Can we use a single qom-get command to e.g. receive the properties of all dimm devices? I think a command that lists all dimm devices and their properties could be useful.

thanks,

- Vasilis
Re: [Qemu-devel] [PATCH 33/35] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
Hi,

On Mon, Apr 14, 2014 at 06:44:42PM +0200, Igor Mammedov wrote:
> On Mon, 14 Apr 2014 15:25:01 +0800
> Hu Tao wrote:
>
> > On Fri, Apr 04, 2014 at 03:36:58PM +0200, Igor Mammedov wrote:
> Could you be more specific, what and how doesn't work and why there is
> need for SRAT entries per DIMM?
> I've briefly tested with your unplug patches and linux seemed be ok with unplug,
> i.e. device node was removed from /sys after receiving remove notification.

Just a heads-up, is this the unplug patch that you are using for testing:
https://github.com/taohu/qemu/commit/55c9540919e189b0ad2e6a759af742080f8f5dc4

or is there a newer version based on Igor's patchseries?

thanks,

- Vasilis
Re: [Qemu-devel] [PATCH 20/35] acpi: memory hotplug ACPI hardware implementation
Hi,

On Fri, Apr 04, 2014 at 03:36:45PM +0200, Igor Mammedov wrote:
> - implements QEMU hardware part of memory hotplug protocol
> described at "docs/specs/acpi_mem_hotplug.txt"
> - handles only memory add notification event for now
> [...]
> + [0x4-0x7] OST event code reported by OSPM
> + [0x8-0xb] OST status code reported by OSPM
> +case 0x4: /* _OST event */
> +mdev = &mem_st->devs[mem_st->selector];
> +if (data == 1) {
> +/* TODO: handle device insert OST event */
> +} else if (data == 3) {
> +/* TODO: handle device remove OST event */
> +}

Are there any patches planned to report _OST notifications to upper management layers? E.g. some older patchseries implemented a queue for these notifications that could be queried with an "info memhp" command.

As a more general question for the patchseries: how do we query the status/presence of dimms? Some possible options could be:

- info qtree: If links<> are implemented between dimms and an acpi machine adapter, would the dimms show up in the general device tree? Currently I believe they don't.

- info dimm: We could have a new "info dimm" command that shows information for present DIMMs: start-end guest physical address, last _OST notification received for this DIMM, as well as the backing memdev object for this dimm.

(qemu) info dimm
dimm0: range="start_address - end_address" memdev="obj0" _OST="last_OST message"
dimm1: range="start_address - end_address" memdev="obj1" _OST="last_OST message"

where last_OST message could be "hot-add successful", "hot-add failed", "hot-remove failed". Not sure how "hot-remove successful" would be reported though, as the dimm device would be removed (or soon to be removed) from the machine. Unless we have a separate command for OST messages received/queued, as mentioned above.

If the guest does not support _OST, the OST entries would remain empty, at least giving the management layer a hint that the guest may not have successfully completed the requested hot-operation.

The examples are all in hmp, but there should obviously be qmp support. Thoughts?

thanks,

- Vasilis
Re: [Qemu-devel] status of cpu hotplug work for x86_64?
Hi,

On Mon, Apr 28, 2014 at 11:58:38AM -0600, Chris Friesen wrote:
> Hi,
>
> I'm trying to figure out what the current status is for cpu hotplug
> and hot-remove on x86_64.
>
> As far as I can tell, it seems like currently there is a QMP
> "cpu-add" command but no matching remove...is that correct?

correct. "cpu-add" is the way to hot-add CPUs. There is no support for cpu hot-remove at this point.

The latest patchset for cpu hot-remove that I know of is:
http://lists.gnu.org/archive/html/qemu-devel/2013-12/msg04266.html

If I understand correctly, the biggest hurdle is supporting vcpu destruction during the VM lifetime on the kvm host side. The corresponding kvm patches have not been accepted:
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/114347

So I have the same question as you: Is there any plan to support cpu hot-remove?

thanks,

- Vasilis
Re: [Qemu-devel] [RFC PATCH v2] i386: Add _PXM ACPI method to CPU objects
On Wed, Nov 27, 2013 at 04:58:51PM +0100, Paolo Bonzini wrote:
> On 27/11/2013 16:53, Igor Mammedov wrote:
> > Patch looks good,
> > Please add patch to update hw/i386/ssdt-proc.hex.generated for hosts without iasl
> > for completness
>
> Also please rename PXM to CPXM or CPPX for consistency.

I posted an updated version of the patch on the list, also here:
http://patchwork.ozlabs.org/patch/294636/

thanks,

- Vasilis
[Qemu-devel] [PATCH v3] i386: Add _PXM ACPI method to CPU objects
This patch adds a _PXM method to ACPI CPU objects for the pc machine. The _PXM value is derived from the passed in guest info, the same way as CPU SRAT entries.

Currently, CPU SRAT entries are only enabled for cpus that are already present in the system. The SRAT entries for hotpluggable processors are disabled (flags bit 0 set to 0 in hw/i386/acpi-build.c:build_srat).

Section 5.2.16.1 of ACPI spec mentions "If the Local APIC ID of a dynamically added processor is not present in the SRAT, a _PXM object must exist for the processor’s device or one of its ancestors in the ACPI Namespace." Since SRAT entries are not available for the hot-pluggable processors, a _PXM method must exist for them. Otherwise, the CPU is hot-added in the wrong NUMA node (default node 0).

Even if CPU SRAT entries are enabled, the _PXM method is what the linux kernel consults at hot-add time. Section 17.2.1 of ACPI spec mentions "OSPM will consume the SRAT only at boot time. OSPM should use _PXM for any devices that are hot-added into the system after boot up." To be more precise, if SRAT information is available to the guest kernel, it is used. However, parsed SRAT info is reset and lost after hot-remove operations, see kernel commit c4c60524. This means that in a hot-unplug / hot-replug scenario, and without a _PXM method, the kernel may put a CPU on a different node because SRAT info has been reset by a previous hot-remove operation.

The above hot-remove/hot-add scenario has been tested on master, plus cpu-del patches from:
https://lists.gnu.org/archive/html/qemu-devel/2013-10/msg01085.html

With the current _PXM patch, hot-added CPUs are always placed into the correct NUMA node, regardless of kernel behaviour.
v1->v2: Make method return a DWORD integer Tested on qemu master + cpu-del patches v2->v3: Add changed hw/i386/ssdt-proc.hex.generated file Change PXM constant name to CPXM Signed-off-by: Vasilis Liaskovitis Reviewed-by: Thilo Fromm --- hw/i386/acpi-build.c|5 hw/i386/ssdt-proc.dsl |5 hw/i386/ssdt-proc.hex.generated | 57 --- 3 files changed, 51 insertions(+), 16 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index b48c930..387a869 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -605,6 +605,7 @@ static inline char acpi_get_hex(uint32_t val) #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2) #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4) #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start) +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start) #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start) @@ -726,6 +727,10 @@ build_ssdt(GArray *table_data, GArray *linker, proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i); proc[ACPI_PROC_OFFSET_CPUID1] = i; proc[ACPI_PROC_OFFSET_CPUID2] = i; +proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info->node_cpu[i]; +proc[ACPI_PROC_OFFSET_CPUPXM + 1] = 0; +proc[ACPI_PROC_OFFSET_CPUPXM + 2] = 0; +proc[ACPI_PROC_OFFSET_CPUPXM + 3] = 0; } /* build this code: diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl index 8229bfd..52b44e3 100644 --- a/hw/i386/ssdt-proc.dsl +++ b/hw/i386/ssdt-proc.dsl @@ -47,6 +47,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1) * also updating the C code. 
*/ Name(_HID, "ACPI0007") +ACPI_EXTRACT_NAME_DWORD_CONST ssdt_proc_pxm +Name(CPXM, 0x) External(CPMA, MethodObj) External(CPST, MethodObj) External(CPEJ, MethodObj) @@ -59,5 +61,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1) Method(_EJ0, 1, NotSerialized) { CPEJ(ID, Arg0) } +Method(_PXM, 0) { +Return (CPXM) +} } } diff --git a/hw/i386/ssdt-proc.hex.generated b/hw/i386/ssdt-proc.hex.generated index bb9920d..8497866 100644 --- a/hw/i386/ssdt-proc.hex.generated +++ b/hw/i386/ssdt-proc.hex.generated @@ -1,17 +1,26 @@ +static unsigned char ssdt_proc_end[] = { +0x8e +}; static unsigned char ssdt_proc_name[] = { 0x28 }; +static unsigned char ssdt_proc_pxm[] = { +0x4e +}; +static unsigned char ssdt_proc_id[] = { +0x38 +}; static unsigned char ssdp_proc_aml[] = { 0x53, 0x53, 0x44, 0x54, -0x78, +0x8e, 0x0, 0x0, 0x0, 0x1, -0xb8, +0x19, 0x42, 0x58, 0x50, @@ -34,21 +43,21 @@ static unsigned char ssdp_proc_aml[] = { 0x4e, 0x54, 0x4c, -0x23, -0x8, -0x13, +0x28, +0x5, +0x10, 0x20, 0x5b, 0x83, -0x42, -0x5, +0x48, +0x6, 0x43, 0x50, 0x41, 0x41, 0xaa, -0x10, -0xb0, +0x0, +0x0, 0x0, 0x0, 0x0, @@ -74,6 +83,16 @@ static unsigned char
[Qemu-devel] [RFC PATCH v2] i386: Add _PXM ACPI method to CPU objects
This patch adds a _PXM method to ACPI CPU objects for the pc machine. The _PXM value is derived from the passed in guest info, same way as CPU SRAT entries. Currently, CPU SRAT entries are only enabled for cpus that are already present in the system. The SRAT entries for hotpluggable processors are disabled (flags bit 0 set to 0 in hw/i386/acpi-build.c:build_srat). Section 5.2.16.1 of ACPI spec mentions "If the Local APIC ID of a dynamically added processor is not present in the SRAT, a _PXM object must exist for the processor’s device or one of its ancestors in the ACPI Namespace." Since SRAT entries are not available for the hot-pluggable processors, a _PXM method must exist for them. Otherwise, the CPU is hot-added in the wrong NUMA node (default node 0). Even if CPU SRAT entries are enabled, the _PXM method is what the linux kernel consults at hot-add time. Section 17.2.1 of ACPI spec mentions "OSPM will consume the SRAT only at boot time. OSPM should use _PXM for any devices that are hot-added into the system after boot up." To be more precise, if SRAT information is available to the guest kernel, it is used. However, parsed SRAT info is reset and lost after hot-remove operations, see kernel commit c4c60524. This means that in a hot-unplug / hot-replug scenario, and without a _PXM method, the kernel may put a CPU on different nodes because SRAT info has been reset by a previous hot-remove operation. The above hot-remove/hot-add scenario has been tested on master, plus cpu-del patches from: https://lists.gnu.org/archive/html/qemu-devel/2013-10/msg01085.html With the current _PXM patch, hot-added CPUs are always placed into the correct NUMA node, regardless of kernel behaviour. 
v1->v2: Make method return a DWORD integer Tested on qemu master + cpu-del patches Signed-off-by: Vasilis Liaskovitis Reviewed-by: Thilo Fromm --- hw/i386/acpi-build.c |5 + hw/i386/ssdt-proc.dsl |5 + 2 files changed, 10 insertions(+) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index d089e1e..3c11ddc 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val) #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2) #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4) #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start) +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start) #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start) @@ -724,6 +725,10 @@ build_ssdt(GArray *table_data, GArray *linker, proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i); proc[ACPI_PROC_OFFSET_CPUID1] = i; proc[ACPI_PROC_OFFSET_CPUID2] = i; +proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info->node_cpu[i]; +proc[ACPI_PROC_OFFSET_CPUPXM + 1] = 0; +proc[ACPI_PROC_OFFSET_CPUPXM + 2] = 0; +proc[ACPI_PROC_OFFSET_CPUPXM + 3] = 0; } /* build this code: diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl index 8229bfd..8d4c5bf 100644 --- a/hw/i386/ssdt-proc.dsl +++ b/hw/i386/ssdt-proc.dsl @@ -47,6 +47,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1) * also updating the C code. */ Name(_HID, "ACPI0007") +ACPI_EXTRACT_NAME_DWORD_CONST ssdt_proc_pxm +Name(PXM, 0x) External(CPMA, MethodObj) External(CPST, MethodObj) External(CPEJ, MethodObj) @@ -59,5 +61,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1) Method(_EJ0, 1, NotSerialized) { CPEJ(ID, Arg0) } +Method(_PXM, 0) { +Return (PXM) +} } } -- 1.7.10.4
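The v1->v2 change above fills all four bytes of the patched constant, with the node id in the first byte and zeros in the other three. That is the same as storing a 32-bit value in little-endian order, which is how AML encodes a DWORD constant. A minimal sketch of that idea follows; the helper names store_dword_le and load_dword_le are hypothetical and are not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): store a 32-bit value at
 * buf[offset..offset+3] in little-endian byte order. */
static void store_dword_le(uint8_t *buf, size_t offset, uint32_t value)
{
    assert(buf != NULL);
    buf[offset]     = (uint8_t)(value & 0xff);
    buf[offset + 1] = (uint8_t)((value >> 8) & 0xff);
    buf[offset + 2] = (uint8_t)((value >> 16) & 0xff);
    buf[offset + 3] = (uint8_t)((value >> 24) & 0xff);
}

/* Hypothetical helper: read the same four bytes back as a 32-bit value. */
static uint32_t load_dword_le(const uint8_t *buf, size_t offset)
{
    assert(buf != NULL);
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}
```

For a node id below 256 this writes exactly the byte pattern the patch writes by hand: the id followed by three zero bytes.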
Re: [Qemu-devel] [RFC qom-cpu v4 09/10] piix4: implement function cpu_status_write() for vcpu ejection
On Fri, Nov 22, 2013 at 09:02:27AM +0100, Vasilis Liaskovitis wrote: > Hi, > > On Wed, Oct 09, 2013 at 05:43:17PM +0800, Chen Fan wrote: > > When OS eject a vcpu (like: echo 1 > /sys/bus/acpi/devices/LNXCPUXX/eject), > > it will call acpi EJ0 method, the firmware will write the new cpumap, QEMU > > will know which vcpu need to be ejected. > > I think that the _EJ0 callback (CPEJ method in > hw/i386/acpi-dsdt-cpu-hotplug.dsl) > currently does not write the new cpumap; it only sleeps. So cpu_status_write is > never called on ejection, and the cpu objects remain allocated in qemu. Is > there > an updated version of the patchseries with a CPEJ that writes the new cpumap? Oops, never mind. I missed your seabios patch mentioned in the head message, got it now. thanks, - Vasilis
Re: [Qemu-devel] [RFC qom-cpu v4 09/10] piix4: implement function cpu_status_write() for vcpu ejection
Hi, On Wed, Oct 09, 2013 at 05:43:17PM +0800, Chen Fan wrote: > When OS eject a vcpu (like: echo 1 > /sys/bus/acpi/devices/LNXCPUXX/eject), > it will call acpi EJ0 method, the firmware will write the new cpumap, QEMU > will know which vcpu need to be ejected. I think that the _EJ0 callback (CPEJ method in hw/i386/acpi-dsdt-cpu-hotplug.dsl) currently does not write the new cpumap; it only sleeps. So cpu_status_write is never called on ejection, and the cpu objects remain allocated in qemu. Is there an updated version of the patchseries with a CPEJ that writes the new cpumap? thanks, - Vasilis > > Signed-off-by: Chen Fan > --- > hw/acpi/piix4.c | 37 - > 1 file changed, 36 insertions(+), 1 deletion(-) > > diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c > index dc506bf..fd27001 100644 > --- a/hw/acpi/piix4.c > +++ b/hw/acpi/piix4.c > @@ -61,6 +61,7 @@ struct pci_status { > > typedef struct CPUStatus { > uint8_t sts[PIIX4_PROC_LEN]; > +uint8_t old_sts[PIIX4_PROC_LEN]; > } CPUStatus; > > typedef struct PIIX4PMState { > @@ -611,6 +612,12 @@ static const MemoryRegionOps piix4_pci_ops = { > }, > }; > > +static void acpi_piix_eject_vcpu(int64_t cpuid) > +{ > +/* TODO: eject a vcpu, release allocated vcpu and exit the vcpu pthread. 
> */ > +PIIX4_DPRINTF("vcpu: %" PRIu64 " need to be ejected.\n", cpuid); > +} > + > static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size) > { > PIIX4PMState *s = opaque; > @@ -623,7 +630,27 @@ static uint64_t cpu_status_read(void *opaque, hwaddr > addr, unsigned int size) > static void cpu_status_write(void *opaque, hwaddr addr, uint64_t data, > unsigned int size) > { > -/* TODO: implement VCPU removal on guest signal that CPU can be removed > */ > +PIIX4PMState *s = opaque; > +CPUStatus *cpus = &s->gpe_cpu; > +uint8_t val; > +int i; > +int64_t cpuid = 0; > + > +val = cpus->old_sts[addr] ^ data; > + > +if (val == 0) { > +return; > +} > + > +for (i = 0; i < 8; i++) { > +if (val & 1 << i) { > +cpuid = 8 * addr + i; > +} > +} > + > +if (cpuid != 0) { > +acpi_piix_eject_vcpu(cpuid); > +} > } > > static const MemoryRegionOps cpu_hotplug_ops = { > @@ -643,13 +670,20 @@ static void piix4_cpu_hotplug_req(PIIX4PMState *s, > CPUState *cpu, > ACPIGPE *gpe = &s->ar.gpe; > CPUClass *k = CPU_GET_CLASS(cpu); > int64_t cpu_id; > +int i; > > assert(s != NULL); > > *gpe->sts = *gpe->sts | PIIX4_CPU_HOTPLUG_STATUS; > cpu_id = k->get_arch_id(CPU(cpu)); > + > +for (i = 0; i < PIIX4_PROC_LEN; i++) { > +g->old_sts[i] = g->sts[i]; > +} > + > if (action == PLUG) { > g->sts[cpu_id / 8] |= (1 << (cpu_id % 8)); > +g->old_sts[cpu_id / 8] |= (1 << (cpu_id % 8)); > } else { > g->sts[cpu_id / 8] &= ~(1 << (cpu_id % 8)); > } > @@ -688,6 +722,7 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion > *parent, > > g_assert((id / 8) < PIIX4_PROC_LEN); > s->gpe_cpu.sts[id / 8] |= (1 << (id % 8)); > +s->gpe_cpu.old_sts[id / 8] |= (1 << (id % 8)); > } > memory_region_init_io(&s->io_cpu, OBJECT(s), &cpu_hotplug_ops, s, >"acpi-cpu-hotplug", PIIX4_PROC_LEN); > -- > 1.8.1.4 > >
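The quoted cpu_status_write() finds the ejected vcpu by XOR-ing the previous status byte with the newly written one and scanning for a changed bit. A minimal standalone sketch of that bitmap comparison is given below. It is an illustration rather than the patch's code: the name first_changed_cpu is hypothetical, it returns the lowest changed bit instead of the last one, and it uses -1 rather than 0 for "nothing changed" so that cpu 0 stays distinguishable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compare one byte of the old and new CPU status
 * bitmaps. 'addr' is the byte index inside the bitmap, so bit i of that
 * byte belongs to cpu id 8 * addr + i. Returns the lowest cpu id whose
 * bit changed, or -1 if the two bytes are identical. */
static int64_t first_changed_cpu(uint8_t old_sts, uint8_t new_sts,
                                 unsigned addr)
{
    uint8_t changed = old_sts ^ new_sts;
    int i;

    for (i = 0; i < 8; i++) {
        if (changed & (1u << i)) {
            return 8 * (int64_t)addr + i;
        }
    }
    return -1;
}
```

With old_sts = 0x0f and new_sts = 0x07 at byte index 1, only bit 3 differs, so the helper reports cpu id 11.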
Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects
Hi, On Thu, Nov 07, 2013 at 03:03:42PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote: > > This patch adds a _PXM method to ACPI CPU objects for the pc machine. The > > _PXM > > value is derived from the passed in guest info, same way as CPU SRAT > > entries. > > > > The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when > > using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The linux > > guest kernel parses the SRAT CPU entries at boot time and stores them in the > > array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel > > resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel > > commit > > c4c60524). When the removed cpu is hot-added again, the linux kernel looks > > up > > the hot-added cpu object's _PXM method instead of somehow re-discovering the > > SRAT entry info. With current qemu/seabios, the _PXM method is not found, > > and > > the CPU is thus hot-plugged in the default NUMA node 0. (The problem does > > not > > show up on initial hotplug of a cpu; the PXM method is still not found in > > this > > case, but the kernel still has the correct proximity value from the CPU's > > SRAT > > entry stored in __apicid_to_node) > > > > ACPI spec mentions that the _PXM method is the correct way to determine > > proximity information at hot-add time. > > Where does it say this? > I found this: > If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically > added processor is not present in the System Resource Affinity Table > (SRAT), a _PXM object must exist for the processor’s device or one of > its ancestors in the ACPI Namespace. > > Does this mean that linux is buggy, and should be fixed up to look up > the apic ID in SRAT? The quote above suggests that if SRAT is absent, _PXM should be present. Seabios/qemu provide SRAT entries, and no _PXM. 
The fact that the kernel resets the parsed SRAT info at hot-remove time looks like a kernel problem. But as Toshi Kani mentioned in the original thread, here is a quote from ACPI 5.0, stating that _PXM and only _PXM should be used at hot-plug time: === 17.2.1 System Resource Affinity Table Definition This optional System Resource Affinity Table (SRAT) provides the boot time description of the processor and memory ranges belonging to a system locality. OSPM will consume the SRAT only at boot time. OSPM should use _PXM for any devices that are hot-added into the system after boot up. === So in this sense, the kernel is correct (kernel only uses _PXM at hot-plug time), and qemu/Seabios should have _PXM methods for hot operations. > > > So far, qemu/seabios do not provide this > > method for CPUs. So regardless of kernel behaviour, it is a good idea to add > > this _PXM method. Since ACPI table generation has recently been moved from > > seabios to qemu, we do this in qemu. > > > > Note that the above hot-remove/hot-add scenario has been tested on an older > > qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu > > master (since cpu-del support is still not on master). The only testing done > > with qemu/seabios master and this patch, are successful boots of multi-node > > linux and windows8 guests. > > > > For the initial discussion on seabios and linux-acpi lists see > > http://www.spinics.net/lists/linux-acpi/msg47058.html > > > > Signed-off-by: Vasilis Liaskovitis > > Reviewed-by: Thilo Fromm > > Even if this is a linux bug, I have no issue with working around > it in qemu. > > But I think proper testing needs to be done with rebased support for cpu-del. Ok, I can try to rebase cpu-del support for testing. If there are cpu-del bits already somewhere (Igor?) and not merged yet, please point me to them. 
> > > --- > > hw/i386/acpi-build.c |2 ++ > > hw/i386/ssdt-proc.dsl |2 ++ > > 2 files changed, 4 insertions(+) > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > > index 6cfa044..9373f5e 100644 > > --- a/hw/i386/acpi-build.c > > +++ b/hw/i386/acpi-build.c > > @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val) > > #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2) > > #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4) > > #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start) > > +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start) > > #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) > > #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_p
[Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects
This patch adds a _PXM method to ACPI CPU objects for the pc machine. The _PXM value is derived from the passed in guest info, same way as CPU SRAT entries. The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The linux guest kernel parses the SRAT CPU entries at boot time and stores them in the array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel commit c4c60524). When the removed cpu is hot-added again, the linux kernel looks up the hot-added cpu object's _PXM method instead of somehow re-discovering the SRAT entry info. With current qemu/seabios, the _PXM method is not found, and the CPU is thus hot-plugged in the default NUMA node 0. (The problem does not show up on initial hotplug of a cpu; the PXM method is still not found in this case, but the kernel still has the correct proximity value from the CPU's SRAT entry stored in __apicid_to_node) ACPI spec mentions that the _PXM method is the correct way to determine proximity information at hot-add time. So far, qemu/seabios do not provide this method for CPUs. So regardless of kernel behaviour, it is a good idea to add this _PXM method. Since ACPI table generation has recently been moved from seabios to qemu, we do this in qemu. Note that the above hot-remove/hot-add scenario has been tested on an older qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu master (since cpu-del support is still not on master). The only testing done with qemu/seabios master and this patch, are successful boots of multi-node linux and windows8 guests. 
For the initial discussion on seabios and linux-acpi lists see http://www.spinics.net/lists/linux-acpi/msg47058.html Signed-off-by: Vasilis Liaskovitis Reviewed-by: Thilo Fromm --- hw/i386/acpi-build.c |2 ++ hw/i386/ssdt-proc.dsl |2 ++ 2 files changed, 4 insertions(+) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 6cfa044..9373f5e 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val) #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2) #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4) #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start) +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start) #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start) @@ -724,6 +725,7 @@ build_ssdt(GArray *table_data, GArray *linker, proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i); proc[ACPI_PROC_OFFSET_CPUID1] = i; proc[ACPI_PROC_OFFSET_CPUID2] = i; +proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info->node_cpu[i]; } /* build this code: diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl index 8229bfd..7eef8b2 100644 --- a/hw/i386/ssdt-proc.dsl +++ b/hw/i386/ssdt-proc.dsl @@ -47,6 +47,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1) * also updating the C code. */ Name(_HID, "ACPI0007") +ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_pxm +Name(_PXM, 0xAA) External(CPMA, MethodObj) External(CPST, MethodObj) External(CPEJ, MethodObj) -- 1.7.10.4
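The failure described in this message can be modelled in a few lines: boot-time SRAT data is cached per CPU, hot-remove clears the cached entry, and a later hot-add can only recover the node if a _PXM value exists. The sketch below is a toy model under those assumptions, not kernel or QEMU code; every name in it (toy_guest, toy_hot_remove, toy_hot_add_node, TOY_NO_NODE) is hypothetical, and the lookup order is simplified.

```c
#include <assert.h>

#define TOY_NO_NODE  (-1)  /* stand-in for the kernel's "no node" marker */
#define TOY_MAX_CPUS 8

/* Toy model of the guest-side state described in the message. */
struct toy_guest {
    int srat_node[TOY_MAX_CPUS]; /* cached at boot, like __apicid_to_node */
    int pxm_node[TOY_MAX_CPUS];  /* what _PXM returns, TOY_NO_NODE if absent */
};

/* Hot-remove drops the cached SRAT entry for that CPU. */
static void toy_hot_remove(struct toy_guest *g, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_MAX_CPUS);
    g->srat_node[cpu] = TOY_NO_NODE;
}

/* Node chosen at hot-add: cached SRAT entry if still present, else the
 * _PXM value, else the default node 0. */
static int toy_hot_add_node(const struct toy_guest *g, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_MAX_CPUS);
    if (g->srat_node[cpu] != TOY_NO_NODE) {
        return g->srat_node[cpu];
    }
    if (g->pxm_node[cpu] != TOY_NO_NODE) {
        return g->pxm_node[cpu];
    }
    return 0;
}
```

In this model a CPU that belongs to node 1 is placed correctly on its first hot-add, lands on node 0 after a remove and re-add when no _PXM value exists, and is placed correctly again once a _PXM value is supplied.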
Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
On Wed, Jul 24, 2013 at 12:02:46PM +0200, Igor Mammedov wrote: > On Wed, 24 Jul 2013 17:52:50 +0800 > Hu Tao wrote: > > > v6 doesn't work here, things are going fine until online hotplugged > > memory in guest. > > > > steps: > > > > 1. qemu cmd: > > > > ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 512,maxmem=2G,slots=1 \ > > -hda /mnt/data/libvirt-images/hut-rhel6.3.img -L ../pc-bios-memhp/ > > > > (bios is from MST's acpi tree) > > > > 2. hot-plug a dimm: > > > > device_add dimm,id=d0,size=1G > > > > 3. online hotplugged memory(in guest): > > > > echo 'online' > /sys/devices/system/memory/memory32/state > > > > then after several seconds the console prints error messages like: > > > > nommu_map_sg: overflow 107c15000+4096 of device mask > > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 > > ata1.00: cmd ca/00:10:d0:0d:a4/00:00:00:00:00/e0 tag 0 dma 8192 out > >res 50/00:00:08:09:e0/00:00:00:00:00/e0 Emask 0x40 (internal > > error) > > ata1.00: configured for MWDMA2 > > ata1: EH complete > > > > (repeat) > > > > and can't do any disk I/O. > Looks like a guest bug where it tries to use high memory but assumes low one. Yes. IIRC, booting the guest kernel with the "swiotlb=force" option could also work around this. > if you boot guest with initial memory 4Gb then it won't hit issue or use FC18 > which doesn't have this problem. thanks, - Vasilis
Re: [Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start
Hi, 2013/6/26 Hu Tao > From: Vasilis Liaskovitis > > Initialize the 32-bit and 64-bit pci starting offsets from values passed > in by > the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the > starting > offsets based on initial memory and hotplug-able dimms. > We should drop this patch and the corresponding seabios patch since Michael Tsirkin's pci-window patches are merged or will be soon. See "pc: pass PCI hole ranges to Guests" - already in qemu master "pci: load memory window setup from host" (should go into seabios) thanks, - Vasilis > It's possible to avoid the new paravirt interface, and calculate pci > ranges from > srat entries. But the code changes are ugly, see: > http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03548.html > > Signed-off-by: Vasilis Liaskovitis > --- > src/paravirt.c | 7 +++ > src/paravirt.h | 1 + > src/pciinit.c | 9 + > 3 files changed, 17 insertions(+) > > diff --git a/src/paravirt.c b/src/paravirt.c > index 5925c63..9c1e511 100644 > --- a/src/paravirt.c > +++ b/src/paravirt.c > @@ -134,6 +134,7 @@ qemu_platform_setup(void) > #define QEMU_CFG_BOOT_MENU 0x0e > #define QEMU_CFG_MAX_CPUS 0x0f > #define QEMU_CFG_FILE_DIR 0x19 > +#define QEMU_CFG_PCI_WINDOW 0x1a > #define QEMU_CFG_ARCH_LOCAL 0x8000 > #define QEMU_CFG_ACPI_TABLES(QEMU_CFG_ARCH_LOCAL + 0) > #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1) > @@ -339,3 +340,9 @@ void qemu_cfg_init(void) > , 0, be32_to_cpu(qfile.size)); > } > } > + > +void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start) > +{ > +qemu_cfg_read_entry(pcimem_start, QEMU_CFG_PCI_WINDOW, sizeof(u64)); > +qemu_cfg_read((u8*)(pcimem64_start), sizeof(u64)); > +} > diff --git a/src/paravirt.h b/src/paravirt.h > index fce5af9..2c37d0d 100644 > --- a/src/paravirt.h > +++ b/src/paravirt.h > @@ -27,5 +27,6 @@ static inline int runningOnKVM(void) { > void qemu_preinit(void); > void qemu_platform_setup(void); > void qemu_cfg_init(void); > +void qemu_cfg_get_pci_offsets(u64 
*pcimem_start, u64 *pcimem64_start); > > #endif > diff --git a/src/pciinit.c b/src/pciinit.c > index 8370b96..7e63c5e 100644 > --- a/src/pciinit.c > +++ b/src/pciinit.c > @@ -805,6 +805,7 @@ static void pci_bios_map_devices(struct pci_bus > *busses) > void > pci_setup(void) > { > +u64 pv_pcimem_start, pv_pcimem64_start; > if (!CONFIG_QEMU) > return; > > @@ -837,6 +838,14 @@ pci_setup(void) > > pci_bios_init_devices(); > > +/* if qemu gives us other pci window values, it means there are > hotplug-able > + * dimms. Adjust accordingly */ > +qemu_cfg_get_pci_offsets(&pv_pcimem_start, &pv_pcimem64_start); > +if (pv_pcimem_start > pcimem_start) > +pcimem_start = pv_pcimem_start; > +if (pv_pcimem64_start > pcimem64_start) > +pcimem64_start = pv_pcimem64_start; > + > free(busses); > > pci_enable_default_vga(); > -- > 1.8.3.1 > >
Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
On Mon, Jul 15, 2013 at 07:10:30PM +0200, Paolo Bonzini wrote: > On 15/07/2013 19:05, Vasilis Liaskovitis wrote: > > From what I understand, we are currently favoring this numa option? (I saw > > it > > mentioned in Gao's numa patchset series as well) > > The two patchsets have some overlap, so it's good to find a design that > fits both. > > > There is still the question of "how many hotpluggable dimm devices does this > > memory range describe?". With the dimm device that was clearly defined, but > > not > > so with this option. Do we choose a default granularity e.g. 1 GB? > > I think it's the same. One "-numa mem" option = one "-device dimm" > option; both define one range. Unused memory ranges may remain if you Ah ok, I get the proposal now. > stumble upon an unusable range such as the PCI window. For example two > "-numa mem,size=2G" options would allocate memory from 0 to 2G and from > 4 to 6G. Ok, that's done already. thanks, - Vasilis
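The placement rule quoted in this reply (two 2G ranges ending up at 0 to 2G and 4 to 6G) can be sketched as sequential allocation that jumps over an unusable window. The code below is only an illustration: toy_place and toy_range are hypothetical names, and the window boundaries are passed in as parameters because the thread does not state where the PCI window starts.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GIB (1ULL << 30)

/* Half-open address range [start, end). */
struct toy_range {
    uint64_t start;
    uint64_t end;
};

/* Place 'size' bytes at *next. If the range would overlap the unusable
 * window [hole_start, hole_end), move it to just above the window and
 * leave the space below the window unused. Advances *next past the
 * placed range. */
static struct toy_range toy_place(uint64_t *next, uint64_t size,
                                  uint64_t hole_start, uint64_t hole_end)
{
    struct toy_range r;
    uint64_t start;

    assert(next != 0 && hole_start <= hole_end);
    start = *next;
    if (start < hole_end && start + size > hole_start) {
        start = hole_end;
    }
    r.start = start;
    r.end = start + size;
    *next = r.end;
    return r;
}
```

Assuming, purely for illustration, a window from 3G to 4G, the first 2G range is placed at 0 to 2G and the second at 4G to 6G, with 2G to 3G left unused.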
Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
Hi, On Thu, Jun 27, 2013 at 08:55:25AM +0200, Paolo Bonzini wrote: > On 27/06/2013 07:08, Wanlong Gao wrote: > > Do we really need to specify the memory range? I suspect that we can > > follow current design of normal memory in hot-plug memory. > > I think we can do both. I'm afraid that the configuration of the VM > will not be perfectly reproducible without specifying the range, more so > if you allow hotplug. > > > Currently, > > we just specify the size of normal memory in each node, and the range > > in normal memory is node by node. Then I think we can just specify > > the memory size of hot-plug in each node, then the hot-plug memory > > range is also node by node, and the whole hot-plug memory block is > > just located after the normal memory block. If so, the option can > > come like: > > -numa > > node,nodeid=0,mem=2G,cpus=0-1,mem-hotplug=2G,mem-policy=membind,mem-hostnode=0-1,mem-hotplug-policy=interleave,mem-hotplug-hostnode=1 > > -numa > > node,nodeid=1,mem=2G,cpus=2-3,mem-hotplug=2G,mem-policy=preferred,mem-hostnode=1,mem-hotplug-policy=membind,mem-hotplug-hostnode=0-1 > > I think specifying different policies and bindings for normal and > hotplug memory is too much fine-grained. If you really want that, then > you would need something like > > -numa node,nodeid=0,cpus=0-1 \ > -numa mem,nodeid=0,size=2G,policy=membind,hostnode=0-1 \ > -numa mem,nodeid=0,size=2G,policy=interleave,hostnode=1,populated=no > > Hmm... this actually doesn't look too bad, and it is much more > future-proof. Eduardo, what do you think about it? Should Wanlong redo > his patches to support this "-numa mem" syntax? Parsing it should be > easy using the QemuOpts visitor, too. From what I understand, we are currently favoring this numa option? (I saw it mentioned in Gao's numa patchset series as well) There is still the question of "how many hotpluggable dimm devices does this memory range describe?". 
With the dimm device that was clearly defined, but not so with this option. Do we choose a default granularity e.g. 1 GB? Also, as you mentioned, without specifying the memory range, the VM configuration may be ambiguous. Currently, the VM memory map depends on the order of dimms defined on the command line. So: "-device dimm,id=dimm0,size=1G,node=0 -device dimm,id=dimm1,size=2G,node=0" and "-device dimm,id=dimm1,size=2G,node=0 -device dimm,id=dimm0,size=1G,node=0" assign different memory ranges to the dimms. On the other hand, IIRC explicit memory ranges were discussed with previous maintainers but were rejected: the user/management library may not want to know, or simply does not know, architectural details of the guest hardware. What happens if the user specifies memory on the PCI-hole? Do we bail out or adjust their arguments? Adjusting ranges might open another can of worms. In any case, it would be good to get a final consensus on this. thanks, - Vasilis
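The order dependence mentioned above can be shown with a small model in which each dimm gets the next free address in command-line order. This is a sketch only: toy_dimm, toy_assign_bases and toy_base_of are hypothetical names, and the starting address is an arbitrary assumption rather than the value QEMU uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_DIMM_GIB (1ULL << 30)

struct toy_dimm {
    const char *id;
    uint64_t size;
    uint64_t base; /* filled in by toy_assign_bases() */
};

/* Give each dimm the next free address, in the order listed. */
static void toy_assign_bases(struct toy_dimm *dimms, size_t n,
                             uint64_t first_base)
{
    uint64_t next = first_base;
    size_t i;

    for (i = 0; i < n; i++) {
        dimms[i].base = next;
        next += dimms[i].size;
    }
}

/* Look up the base address assigned to the dimm with the given id;
 * returns UINT64_MAX if no such dimm exists. */
static uint64_t toy_base_of(const struct toy_dimm *dimms, size_t n,
                            const char *id)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(dimms[i].id, id) == 0) {
            return dimms[i].base;
        }
    }
    return UINT64_MAX;
}
```

Listing dimm0 (1G) before dimm1 (2G) puts dimm0 at the first address, while the reverse order puts it 2G higher, so the same set of dimms yields two different guest memory maps.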
Re: [Qemu-devel] [PATCH v11 11/15] rdma: core logic
Hi, On Mon, Jun 24, 2013 at 09:58:01PM -0400, mrhi...@linux.vnet.ibm.com wrote: > From: "Michael R. Hines" > [...] > +/* > + * Put in the log file which RDMA device was opened and the details > + * associated with that device. > + */ > +static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs) > +{ > +printf("%s RDMA Device opened: kernel name %s " > + "uverbs device name %s, " > + "infiniband_verbs class device path %s," > + " infiniband class device path %s\n", > +who, > +verbs->device->name, > +verbs->device->dev_name, > +verbs->device->dev_path, > +verbs->device->ibdev_path); > +} see below > +static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp) > +{ > +int ret = -EINVAL, idx; > +struct sockaddr_in sin; > +struct rdma_cm_id *listen_id; > +char ip[40] = "unknown"; > + > +for (idx = 0; idx < RDMA_CONTROL_MAX_WR; idx++) { > +rdma->wr_data[idx].control_len = 0; > +rdma->wr_data[idx].control_curr = NULL; > +} > + > +if (rdma->host == NULL) { > +ERROR(errp, "RDMA host is not set!\n"); > +rdma->error_state = -EINVAL; > +return -1; > +} > +/* create CM channel */ > +rdma->channel = rdma_create_event_channel(); > +if (!rdma->channel) { > +ERROR(errp, "could not create rdma event channel\n"); > +rdma->error_state = -EINVAL; > +return -1; > +} > + > +/* create CM id */ > +ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP); > +if (ret) { > +ERROR(errp, "could not create cm_id!\n"); > +goto err_dest_init_create_listen_id; > +} > + > +memset(&sin, 0, sizeof(sin)); > +sin.sin_family = AF_INET; > +sin.sin_port = htons(rdma->port); > + > +if (rdma->host && strcmp("", rdma->host)) { > +struct hostent *dest_addr; > +dest_addr = gethostbyname(rdma->host); > +if (!dest_addr) { > +ERROR(errp, "migration could not gethostbyname!\n"); > +ret = -EINVAL; > +goto err_dest_init_bind_addr; > +} > +memcpy(&sin.sin_addr.s_addr, dest_addr->h_addr, > +dest_addr->h_length); > +inet_ntop(AF_INET, dest_addr->h_addr, ip, sizeof ip); > +} else { > 
+sin.sin_addr.s_addr = INADDR_ANY; > +} > + > +DPRINTF("%s => %s\n", rdma->host, ip); > + > +ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin); > +if (ret) { > +ERROR(errp, "Error: could not rdma_bind_addr!\n"); > +goto err_dest_init_bind_addr; > +} > + > +rdma->listen_id = listen_id; > +if (listen_id->verbs) { > +rdma->verbs = listen_id->verbs; > +} > +qemu_rdma_dump_id("dest_init", rdma->verbs); I wonder if you have ever hit the case where rdma_bind_addr() does not set the verbs structure in listen_id because we are binding to the loopback device (also see linux kernel commit 8523c048). I keep hitting this case on my destination VM ("-incoming x-rdma:host:port"). Then I think qemu_rdma_dump_id can segfault trying to dereference a null verbs structure. The dump_id function should check for a non-NULL verbs argument, or the dump should be made only in the (verbs != NULL) if clause. Disabling the dump_id above, I have rdma_resolve_addr() problems on the source VM side (getting RDMA_CM_EVENT_ADDR_ERROR instead of RDMA_CM_EVENT_ADDR_RESOLVED). I assume that is because of the null verbs structure problem on the destination described above. qemu_rdma_dest_prepare() will always fail with a NULL verbs argument: > + > +static int qemu_rdma_dest_prepare(RDMAContext *rdma, Error **errp) > +{ > +int ret; > +int idx; > + > +if (!rdma->verbs) { > +ERROR(errp, "no verbs context!\n"); > +return 0; > +} It is first called from rdma_start_incoming_migration() and will fail with the loopback binding case (rdma->verbs == NULL). However, later qemu_rdma_accept() will check against the incoming cm_event verbs structure and set the RDMAContext's verbs struct, calling qemu_rdma_dest_prepare with that struct: [...] > +static int qemu_rdma_accept(RDMAContext *rdma) [...] > +if (!rdma->verbs) { > +rdma->verbs = verbs; > +/* > + * Cannot propagate errp, as there is no error pointer > + * to be propagated. 
> + */ > +ret = qemu_rdma_dest_prepare(rdma, NULL); > +if (ret) { > +fprintf(stderr, "rdma migration: error preparing dest!\n"); > +goto err_rdma_dest_wait; > +} Are these two cases intentionally different? thanks, - Vasilis
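The NULL check suggested in this message could look roughly like the sketch below. It deliberately uses hypothetical stand-in types (toy_device, toy_verbs) instead of the real libibverbs structures so that it compiles on its own, and toy_dump_id is an illustrative name, not the function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the device and context structures. */
struct toy_device {
    const char *name;
};

struct toy_verbs {
    struct toy_device *device;
};

/* Print which device was opened, but only if a verbs context exists.
 * Returns 0 if a device was printed, -1 if there was none to print. */
static int toy_dump_id(FILE *out, const char *who,
                       const struct toy_verbs *verbs)
{
    assert(out != NULL && who != NULL);
    if (verbs == NULL || verbs->device == NULL) {
        fprintf(out, "%s RDMA Device: no verbs context yet\n", who);
        return -1;
    }
    fprintf(out, "%s RDMA Device opened: kernel name %s\n",
            who, verbs->device->name);
    return 0;
}
```

The alternative mentioned in the message, calling the dump only inside the existing if (listen_id->verbs) branch, avoids the dereference in the same way without changing the dump function itself.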
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
Hi, [...] >> > > >> > > I haven't updated the series for a while, but I can rework if there is a >> > > more >> > > clear direction for the community. >> > > >> > > Another open issue is reference counting of memoryregions in qemu memory >> > > model. In order to make memory hot-remove operations safe, we need to >> > > remove >> > > a memoryregion after all users (e.g. both guest and block layer) have >> > > stopped >> > > using it, >> > >> > it seems it mostly up to the user who want to hot-(un)plug, >> > if user want to un-plug a memory which is kernel's main memory, kernel >> > will always run on it(never stop) unless power off. >> > and if guest stops, all DIMMs should be safe to hot-remove, >> > or else we should do something to let user can unlock all reference. >> >> it's not only the guest-side that needs to stop using it, we need to make >> sure >> that the qemu block layer is also not using the memory region anymore. See >> the 2 >> links below for discussion: >> > > can't we simply track this(MemoryRegion) usage by ref-count? > e.g. > every time mr used, inc ref-count, then dec it when unused > even for cpu_physical_memory_map and other potential users. > Yes, that's the idea the patchset below tries to implement, but last time I checked it had not been upstreamed. I will take a closer look next week. >> > > see discussion at >> > > http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03986.html. >> > > There was a >> > > relevant ibm patchset >> > > https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02697.html >> > > but it was not merged. - Vasilis
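The ref-count idea discussed in this exchange can be sketched as follows: every user takes a reference while it uses the region, and a requested unplug only completes once the last reference is dropped. This is a toy model with hypothetical names (toy_region and its helpers); it is not the QEMU MemoryRegion API and not the patchset linked above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a hot-removable memory region. */
struct toy_region {
    unsigned refs;         /* users currently holding the region */
    bool unplug_requested; /* hot-remove has been asked for */
    bool removed;          /* region has actually been torn down */
};

/* A user (guest mapping, block layer I/O, ...) starts using the region. */
static void toy_region_ref(struct toy_region *mr)
{
    assert(!mr->removed);
    mr->refs++;
}

/* A user is done; the last one out completes a pending unplug. */
static void toy_region_unref(struct toy_region *mr)
{
    assert(mr->refs > 0);
    mr->refs--;
    if (mr->refs == 0 && mr->unplug_requested) {
        mr->removed = true;
    }
}

/* Ask for hot-remove; it happens at once only if nobody uses the region. */
static void toy_region_request_unplug(struct toy_region *mr)
{
    mr->unplug_requested = true;
    if (mr->refs == 0) {
        mr->removed = true;
    }
}
```

In this model an unplug requested while a reference is held stays pending and completes when that user calls toy_region_unref().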
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
Hi, On Tue, Mar 26, 2013 at 10:47:01AM -0400, Luiz Capitulino wrote: > On Tue, 18 Dec 2012 13:41:28 +0100 > Vasilis Liaskovitis wrote: > > > This is v4 of the ACPI memory hotplug functionality. Only x86_64 target is > > supported (both i440fx and q35). There are still several issues, but it's > > been a while since v3 and I wanted to get some more feedback on the current > > state of the patchseries. > > It seems this series doesn't apply anymore, do you plan to respin it? > I'll respin sometime in April; other work is taking priority at the moment. > Also, some months ago I saw patches flying on linux-mm fixing some > issues related to memory hotplug, so should this work with latest Linux > kernel? Hot-add is working, but hot-remove is broken in mainline and the fixes are still in progress. See discussion at: https://lkml.org/lkml/2013/3/25/490 thanks, - Vasilis
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
Hi, On Tue, Mar 19, 2013 at 02:30:25PM +0800, li guang wrote: > On Thu, 2013-01-10 at 19:57 +0100, Vasilis Liaskovitis wrote: > > > > > > > > IIRC q35 supports memory hotplug natively (picked up in some > > > > discussion). Is that correct? > > > > > > > From previous discussion I also understand that q35 supports native > > > hotplug. > > > Sections 5.1 and 5.2 of the spec describe the MCH registers but the native > > > memory hotplug specifics are not yet clear to me. Any pointers from the > > > spec are welcome. > > > > Ping. Could anyone who's familiar with the q35 spec provide some pointers on > > native memory hotplug details in the spec? I see pcie hotplug registers but > > can't > > find memory hotplug interface details. If I am not mistaken, the spec is > > here: > > http://www.intel.com/design/chipsets/datashts/316966.htm > > > > Is the q35 memory hotplug support supposed to be an shpc-like interface > > geared > > towards memory slots instead of pci slots? > > > > seems there's no so-called q35-native support That was also my first impression when scanning the specification. Weren't native memory hotplug capabilities one of the reasons that q35 got picked as the next pc chipset? thanks, - Vasilis
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
Hi, On Tue, Mar 19, 2013 at 03:28:38PM +0800, li guang wrote: [...] > > > > This is v4 of the ACPI memory hotplug functionality. Only x86_64 target > > > > is > > > > supported (both i440fx and q35). There are still several issues, but > > > > it's > > > > been a while since v3 and I wanted to get some more feedback on the > > > > current > > > > state of the patchseries. > > > > > > > > > > > We are working in memory hotplug functionality on pSeries machine. I'm > > > wondering whether and how we can better integrate things. Do you think the > > > DIMM abstraction is generic enough to be used in other machine types? > > > > I think the DimmDevice is generic enough but I am open to other > > suggestions. > > > > A related issue is that the patchseries uses a DimmBus to hot-add and > > hot-remove > > DimmDevice. Another approach that has been suggested is to use links<> > > between > > DimmDevices and the dram controller device (piix4 or mch for pc and q35-pc > > machines respectively). This would be more similar to the CPUState/qom > > patches - see Andreas Färber's earlier reply to this thread. > > > > I think we should get some consensus from the community/maintainers before > > we > > continue to integrate. > > > > I haven't updated the series for a while, but I can rework if there is a > > more > > clear direction for the community. > > > > Another open issue is reference counting of memoryregions in qemu memory > > model. In order to make memory hot-remove operations safe, we need to remove > > a memoryregion after all users (e.g. both guest and block layer) have > > stopped > > using it, > > it seems it mostly up to the user who want to hot-(un)plug, > if user want to un-plug a memory which is kernel's main memory, kernel > will always run on it(never stop) unless power off. > and if guest stops, all DIMMs should be safe to hot-remove, > or else we should do something to let user can unlock all reference. 
it's not only the guest-side that needs to stop using it, we need to make sure that the qemu block layer is also not using the memory region anymore. See the 2 links below for discussion: > > see discussion at > > http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03986.html. There > > was a > > relevant ibm patchset > > https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02697.html > > but it was not merged. > > thanks, - Vasilis
Re: [Qemu-devel] Some questions for PATCH: ACPI memory hotplug, TKS
Hi, On Mon, Mar 11, 2013 at 09:16:34AM +, Chijianchun wrote: > http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html > > > > one this patch, you say it does not support for windows, Does it support > now? > No, it still does not work for Windows. It is most likely some seabios bug in the new dimm/acpi code, but I didn't get any helpful hints on the mailing list when I reported the problem, and the bug is still there. thanks, - Vasilis > > > -- > - A main blocker issue is windows guest functionality. The patchset does not > work for windows currently. Testing on win2012 server RC or windows2008 > consumer prerelease, when adding a DIMM, there is a BSOD with ACPI_BIOS_ERROR > > message. After this, the VM keeps rebooting with ACPI_BIOS_ERROR.
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
Hi, sorry for the delay. On Tue, Feb 19, 2013 at 07:39:40PM -0300, Erlon Cruz wrote: > On Tue, Dec 18, 2012 at 10:41 AM, Vasilis Liaskovitis < > vasilis.liaskovi...@profitbricks.com> wrote: > > > This is v4 of the ACPI memory hotplug functionality. Only x86_64 target is > > supported (both i440fx and q35). There are still several issues, but it's > > been a while since v3 and I wanted to get some more feedback on the current > > state of the patchseries. > > > > > We are working in memory hotplug functionality on pSeries machine. I'm > wondering whether and how we can better integrate things. Do you think the > DIMM abstraction is generic enough to be used in other machine types? I think the DimmDevice is generic enough but I am open to other suggestions. A related issue is that the patchseries uses a DimmBus to hot-add and hot-remove DimmDevice. Another approach that has been suggested is to use links<> between DimmDevices and the dram controller device (piix4 or mch for pc and q35-pc machines respectively). This would be more similar to the CPUState/qom patches - see Andreas Färber's earlier reply to this thread. I think we should get some consensus from the community/maintainers before we continue to integrate. I haven't updated the series for a while, but I can rework if there is a more clear direction for the community. Another open issue is reference counting of memoryregions in qemu memory model. In order to make memory hot-remove operations safe, we need to remove a memoryregion after all users (e.g. both guest and block layer) have stopped using it, see discussion at http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03986.html. There was a relevant ibm patchset https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02697.html but it was not merged. 
> > > > Overview: > > > > Dimm device layout is modeled with a normal qemu device: > > > > "-device dimm,id=name,size=sz,node=pxm,populated=on|off,bus=membus.0" > > > > > How does this will handle the no-hotplugable memory for example the memory > passed in '-m' parameter? The non-hotpluggable initial memory (-m) is currently not modelled at all as a DimmDevice. We may want to model it though. thanks, - Vasilis
Re: [Qemu-devel] [RFC v1 3/3] make address_space_map safe
Hi, I am looking at this old ref/unref patchset for safely removing hot-plugged dimms/MemoryRegions. I am not sure if the set is still actively worked on or relevant for qemu-master, but I had a small comment below: On Fri, Nov 09, 2012 at 11:14:30AM +0800, Liu Ping Fan wrote: > From: Liu Ping Fan > > Signed-off-by: Liu Ping Fan > --- > cpu-common.h |8 ++-- > cputlb.c |4 ++-- > dma-helpers.c |4 +++- > dma.h |5 - > exec.c| 45 + > memory.h |4 +++- > target-i386/kvm.c |4 ++-- > 7 files changed, 57 insertions(+), 17 deletions(-) > [snip] > diff --git a/exec.c b/exec.c > index e5f1c0f..e9bd695 100644 > --- a/exec.c > +++ b/exec.c [snip] > @@ -3822,7 +3837,8 @@ void address_space_unmap(AddressSpace *as, void > *buffer, target_phys_addr_t len, > { > if (buffer != bounce.buffer) { > if (is_write) { > -ram_addr_t addr1 = qemu_ram_addr_from_host_nofail(buffer); > +/* Will release RAM refcnt */ > +ram_addr_t addr1 = qemu_ram_addr_from_host_nofail(buffer, true); > while (access_len) { > unsigned l; > l = TARGET_PAGE_SIZE; Since qemu_ram_addr_from_host_nofail(buffer, true) will decrease the reference counter for this memoryregion, I think it should be called regardless of read/write, i.e. outside of the "if (is_write)" clause. Otherwise references for reads are not decreased properly. thanks, - Vasilis
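To illustrate the point about the "if (is_write)" clause, here is a minimal sketch of a map/unmap pair where the reference is dropped for reads as well as writes. struct region and the region_*() helpers are hypothetical names invented for this example; they are not the MemoryRegion API or the code from the patchset under review.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory region with a usage counter. */
struct region {
    int refcnt;   /* number of live mappings */
    bool dirty;   /* set when a write mapping is released */
};

/* Mapping takes a reference for as long as the mapping is alive. */
static void *region_map(struct region *r, void *host)
{
    r->refcnt++;
    return host;
}

/*
 * Unmapping: dirty tracking is only needed for writes, but the reference
 * taken in region_map() is released for reads and writes alike.
 */
static void region_unmap(struct region *r, bool is_write)
{
    if (is_write) {
        r->dirty = true;
    }
    r->refcnt--;
}

/* A region may only be hot-removed once nobody holds a mapping. */
static bool region_can_remove(const struct region *r)
{
    return r->refcnt == 0;
}
```

If the decrement lived inside the is_write branch, a read-only mapping would leave refcnt at 1 forever and the region could never be removed.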
Re: [Qemu-devel] Want to be part of Memory Hotplug
Hi Senthil, On Mon, Feb 04, 2013 at 04:17:30PM +0530, kumaran wrote: > Hi, > > I am Senthil, doing post graduation in IIT Bombay,India. > > I am looking for some problems related to KVM as part of my research > work. I read about memory hotplug and other open issues in KVM. > > Can i be part of your work regarding memory hotplug. > > My Experience in KVM: > > - I have implemented Record and Replay feature in KVM > - I have good hands on experience with KVM and QEMU's code base > - Thorough knowledge about QEMU's architecture and PCI emulation > - I have implemented Record and replay for Intel Ee100Pro network > interface and for IDE disks (including DMA) > > If I get chance, I can spend around 5 dedicated months on this work. > Please let me know your interest. Thanks for your interest in the memory hotplug effort. You are welcome to contribute. Memory hotplug is still not in mainline qemu. It is a qemu-wide project rather than a kvm-specific one. Some possible directions are: - Review v4 and suggest improvements: http://lists.nongnu.org/archive/html/qemu-devel/2012-12/msg02693.html - Define the qom interface for Dimms. The current patches use a memory bus abstraction (DimmBus) where Dimms can plug in. That's definitely one choice. Some people have suggested using links<> from the i440fx/mch (in general the memory controller device of the emulated system) to the Dimm devices. That's similar to how CPUs are re-modelled in qom-cpu patchsets I think. The final dimm interface has not been agreed upon yet. - Add acpi native memory hotplug support for q35/ich9. The current code creates paravirtual Dimmbus + Dimms, without emulating real hotplug-capable memory controller hardware. - Make sure ejection is safe i.e. all users of a hotplugged MemoryRegion (not only guest/CPUs but also qemu block layer) have stopped using the memory, before actually freeing it. 
Please make sure to include the qemu list (cc'ed) in order to let everyone know what you are working on and to get feedback. thanks, - Vasilis
Re: [Qemu-devel] [RFC PATCH v4 13/30] piix_pci and pc_piix: refactor
Hi, On Wed, Jan 16, 2013 at 12:17:05PM +0100, Andreas Färber wrote: > Hi, > > Am 16.01.2013 10:36, schrieb Vasilis Liaskovitis: > > On Wed, Jan 16, 2013 at 03:20:40PM +0800, Hu Tao wrote: > >> On Tue, Dec 18, 2012 at 01:41:41PM +0100, Vasilis Liaskovitis wrote: > >>> Refactor code so that chipset initialization is similar to q35. This will > >>> allow memory map initialization at chipset qdev init time for both > >>> machines, as well as more similar code structure overall. > >>> > >>> Signed-off-by: Vasilis Liaskovitis > >>> --- > >>> hw/pc_piix.c | 57 --- > >>> hw/piix_pci.c | 225 > >>> ++--- > >>> 2 files changed, 100 insertions(+), 182 deletions(-) > >>> > >>> diff --git a/hw/pc_piix.c b/hw/pc_piix.c > >>> index 19e342a..6a9b508 100644 > >>> --- a/hw/pc_piix.c > >>> +++ b/hw/pc_piix.c > >>> @@ -47,6 +47,7 @@ > >>> #ifdef CONFIG_XEN > >>> # include > >>> #endif > >>> +#include "piix_pci.h" > >> > >> Can't find this file. Did you forget to add this file to git? > > > > sorry, you are right. Below is the corrected patch with the missing header > > Please take review comments on other similar series into account. You > can also check if the QOM Vadis slides from KVM Forum are online somewhere. thanks, I will take a look. > > You are aware that there were two people previously working on > QOM'ifying i440fx? I am aware of Anthony's i440fx-pmc patchset from about a year ago (a few months ago I asked if it would be respinned, but got no response iirc, so I am not sure what the status is) What's the second effort you mention and its status? Are prep_pci patchsets going to address this in the future? I don't mean to step on other people's work-in-progress, so I don't mind dropping this patch if one of the other efforts is still active. 
> > --- > > hw/pc_piix.c | 57 --- > > hw/piix_pci.c | 225 > > ++--- > > hw/piix_pci.h | 116 + > > 3 files changed, 216 insertions(+), 182 deletions(-) > > create mode 100644 hw/piix_pci.h > > > > diff --git a/hw/pc_piix.c b/hw/pc_piix.c > > index 19e342a..6a9b508 100644 > > --- a/hw/pc_piix.c > > +++ b/hw/pc_piix.c > [...] > > @@ -127,21 +130,53 @@ static void pc_init1(MemoryRegion *system_memory, > > } > > > > if (pci_enabled) { > > -pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi, > > - system_memory, system_io, ram_size, > > - below_4g_mem_size, > > - 0x1ULL - below_4g_mem_size, > > - 0x1ULL + above_4g_mem_size, > > - (sizeof(hwaddr) == 4 > > - ? 0 > > - : ((uint64_t)1 << 62)), > > - pci_memory, ram_memory); > > +i440fx_host = I440FX_HOST_DEVICE(qdev_create(NULL, > > +TYPE_I440FX_HOST_DEVICE)); > > Elsewhere it was requested to use _HOST_BRIDGE wording. ok > > > +i440fx_host->mch.ram_memory = ram_memory; > > +i440fx_host->mch.pci_address_space = pci_memory; > > +i440fx_host->mch.system_memory = get_system_memory(); > > +i440fx_host->mch.address_space_io = get_system_io();; > > +i440fx_host->mch.below_4g_mem_size = below_4g_mem_size; > > +i440fx_host->mch.above_4g_mem_size = above_4g_mem_size; > > + > > +qdev_init_nofail(DEVICE(i440fx_host)); > > +i440fx_state = &i440fx_host->mch; > > +pci_bus = i440fx_host->parent_obj.bus; > > Please don't access the parent field, in particular not "parent_obj". It > was specifically renamed after checking that no more users exist. ok > > PCIHostState *phb = PCI_HOST_BRIDGE(i440fx_host); > ... > pci_bus = phb->bus; > > > +/* Xen supports additional interrupt routes from the PCI devices to > > + * the IOAPIC: the four pins of each PCI device on the bus are also > > + * connected to the IOAPIC directly. > > + * These additional routes can be discovered through ACPI. */ > > +if (xen_enabled()) { > > +piix3 = DO_UPC
Re: [Qemu-devel] [RFC PATCH v4 13/30] piix_pci and pc_piix: refactor
Hi, On Wed, Jan 16, 2013 at 03:20:40PM +0800, Hu Tao wrote: > Hi Vasilis, > > On Tue, Dec 18, 2012 at 01:41:41PM +0100, Vasilis Liaskovitis wrote: > > Refactor code so that chipset initialization is similar to q35. This will > > allow memory map initialization at chipset qdev init time for both > > machines, as well as more similar code structure overall. > > > > Signed-off-by: Vasilis Liaskovitis > > --- > > hw/pc_piix.c | 57 --- > > hw/piix_pci.c | 225 > > ++--- > > 2 files changed, 100 insertions(+), 182 deletions(-) > > > > diff --git a/hw/pc_piix.c b/hw/pc_piix.c > > index 19e342a..6a9b508 100644 > > --- a/hw/pc_piix.c > > +++ b/hw/pc_piix.c > > @@ -47,6 +47,7 @@ > > #ifdef CONFIG_XEN > > # include > > #endif > > +#include "piix_pci.h" > > Can't find this file. Did you forget to add this file to git? sorry, you are right. Below is the corrected patch with the missing header Refactor code so that chipset initialization is similar to q35. This will allow memory map initialization at chipset qdev init time for both machines, as well as more similar code structure overall. 
Signed-off-by: Vasilis Liaskovitis --- hw/pc_piix.c | 57 --- hw/piix_pci.c | 225 ++--- hw/piix_pci.h | 116 + 3 files changed, 216 insertions(+), 182 deletions(-) create mode 100644 hw/piix_pci.h diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 19e342a..6a9b508 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -47,6 +47,7 @@ #ifdef CONFIG_XEN # include #endif +#include "piix_pci.h" #define MAX_IDE_BUS 2 @@ -85,6 +86,8 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *pci_memory; MemoryRegion *rom_memory; void *fw_cfg = NULL; +I440FXState *i440fx_host; +PIIX3State *piix3; pc_cpus_init(cpu_model); @@ -127,21 +130,53 @@ static void pc_init1(MemoryRegion *system_memory, } if (pci_enabled) { -pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi, - system_memory, system_io, ram_size, - below_4g_mem_size, - 0x1ULL - below_4g_mem_size, - 0x1ULL + above_4g_mem_size, - (sizeof(hwaddr) == 4 - ? 0 - : ((uint64_t)1 << 62)), - pci_memory, ram_memory); +i440fx_host = I440FX_HOST_DEVICE(qdev_create(NULL, +TYPE_I440FX_HOST_DEVICE)); +i440fx_host->mch.ram_memory = ram_memory; +i440fx_host->mch.pci_address_space = pci_memory; +i440fx_host->mch.system_memory = get_system_memory(); +i440fx_host->mch.address_space_io = get_system_io();; +i440fx_host->mch.below_4g_mem_size = below_4g_mem_size; +i440fx_host->mch.above_4g_mem_size = above_4g_mem_size; + +qdev_init_nofail(DEVICE(i440fx_host)); +i440fx_state = &i440fx_host->mch; +pci_bus = i440fx_host->parent_obj.bus; +/* Xen supports additional interrupt routes from the PCI devices to + * the IOAPIC: the four pins of each PCI device on the bus are also + * connected to the IOAPIC directly. + * These additional routes can be discovered through ACPI. 
*/ +if (xen_enabled()) { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(pci_bus, -1, true, +"PIIX3-xen")); +pci_bus_irqs(pci_bus, xen_piix3_set_irq, xen_pci_slot_get_pirq, +piix3, XEN_PIIX_NUM_PIRQS); +} else { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(pci_bus, -1, true, +"PIIX3")); +pci_bus_irqs(pci_bus, piix3_set_irq, pci_slot_get_pirq, piix3, +PIIX_NUM_PIRQS); +pci_bus_set_route_irq_fn(pci_bus, piix3_route_intx_pin_to_irq); +} +piix3->pic = gsi; +isa_bus = DO_UPCAST(ISABus, qbus, +qdev_get_child_bus(&piix3->dev.qdev, "isa.0")); + +piix3_devfn = piix3->dev.devfn; + +ram_size = ram_size / 8 / 1024 / 1024; +if (ram_size > 255) { +ram_size = 255; +} +i440fx_state->dev.config[0x57] = ram_size; } else { pci_bus = NULL; -i440fx_state = NULL; isa_bus = isa_bus_new(NULL, system_io); no_hpet = 1; } + isa_bus_irqs(isa_bus, gsi); if (kvm_irqchip_in_kernel()) { @@ -157,7 +192,7 @@ st
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
> > > > IIRC q35 supports memory hotplug natively (picked up in some > > discussion). Is that correct? > > > From previous discussion I also understand that q35 supports native hotplug. > Sections 5.1 and 5.2 of the spec describe the MCH registers but the native > memory hotplug specifics are not yet clear to me. Any pointers from the > spec are welcome. Ping. Could anyone who's familiar with the q35 spec provide some pointers on native memory hotplug details in the spec? I see pcie hotplug registers but can't find memory hotplug interface details. If I am not mistaken, the spec is here: http://www.intel.com/design/chipsets/datashts/316966.htm Is the q35 memory hotplug support supposed to be an shpc-like interface geared towards memory slots instead of pci slots? thanks, - Vasilis
Re: [Qemu-devel] [RFC PATCH v4 21/30] Implement dimm-info
On Tue, Jan 08, 2013 at 04:20:26PM -0700, Eric Blake wrote: > On 12/18/2012 05:41 AM, Vasilis Liaskovitis wrote: > > "query-dimm-info" and "info dimm" will give current state of all dimms in > > the > > system e.g. > > > > dimm0: on > > dimm1: off > > dimm2: off > > dimm3: on > > etc. > > > > Signed-off-by: Vasilis Liaskovitis > > --- > > > +++ b/qapi-schema.json > > @@ -2914,6 +2914,32 @@ > > { 'command': 'query-memory-total', 'returns': 'int' } > > > > ## > > +# @DimmInfo: > > +# > > +# Information about status of a memory hotplug command > > +# > > +# @dimm: the Dimm associated with the result > > +# > > +# @result: the result of the hotplug command > > Here you call it 'result', > > > +# > > +# Since: 1.4 > > +# > > +## > > +{ 'type': 'DimmInfo', > > + 'data': {'dimm': 'str', 'state': 'bool'} } > > but here you call it 'state'. Which is it? And does 'true' mean > plugged in, or that the last command succeeded (where the last command > may have been either a plug or an unplug)? My preference is that 'true' > means plugged in, so more documentation would help. "True" does mean "plugged in" as you suggest, and the name should be "state". I'll clarify the documentation. > > > + > > +## > > +# @query-dimm-info: > > +# > > +# Returns total memory in bytes, including hotplugged dimms > > Really? > > > +# > > +# Returns: int > > Copy-and-paste error? This doesn't return an 'int', but an array of > 'DimmInfo'. Both are copy-paste errors, will fix. thanks, - Vasilis
Re: [Qemu-devel] [RFC PATCH v4 19/30] Implement "info memory-total" and "query-memory-total"
On Fri, Jan 04, 2013 at 09:21:08AM -0700, Eric Blake wrote: > On 12/18/2012 05:41 AM, Vasilis Liaskovitis wrote: > > Returns total physical memory available to guest in bytes, including > > hotplugged > > memory. Note that the number reported here may be different from what the > > guest > > sees e.g. if the guest has not logically onlined hotplugged memory. > > > > This functionality is provided independently of a balloon device, since a > > guest can be using ACPI memory hotplug without using a balloon device. > > > > v3->v4: Moved qmp command implementation to vl.c. This prevents a circular > > header dependency problem. > > Generally, patch change history should occur... > > > > > Signed-off-by: Vasilis Liaskovitis > > --- > > ...here, after the --- divider. It's useful in the email chain, but > does not need to be part of the final git history. ok, thanks. > > > +++ b/qapi-schema.json > > @@ -2903,6 +2903,17 @@ > > { 'command': 'query-target', 'returns': 'TargetInfo' } > > > > ## > > +# @query-memory-total: > > +# > > +# Returns total memory in bytes, including hotplugged dimms > > +# > > +# Returns: int > > +# > > +# Since: 1.4 > > +## > > +{ 'command': 'query-memory-total', 'returns': 'int' } > > Any reason you can't name this just 'query-memory', and return a JSON > dictionary instead of a single int, so that in the future you can add > other memory parameters into the same call? For example, down the road > we may want to report some 'newstat' without adding a new QMP command: I am fine with a dictionary, if we see a need for extending the command in the future. Is it common practice to start off with dicts for simple commands? In any case, I'll update. thanks, - Vasilis
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
Hi, On Wed, Jan 09, 2013 at 01:08:52AM +0100, Andreas Färber wrote: > Am 18.12.2012 13:41, schrieb Vasilis Liaskovitis: > > Because dimm layout needs to be configured on machine-boot, all dimm devices > > need to be specified on startup command line (either with populated=on or > > with > > populated=off). The dimm information is stored in dimm configuration > > structures. > > > > After machine startup, dimms are hot-added or removed with normal device_add > > and device_del operations e.g.: > > Hot-add syntax: "device_add dimm,id=mydimm0,bus=membus.0" > > Hot-remove syntax: "device_del dimm,id=mydimm0" > > This sounds contradictory: Either all devices need to be specified on > the command line, or they can be hot-added via monitor. Due to the fixed layout requirement, all memory devices need to be specified at the command line. This was done with a separate "-dimm" argument in previous versions (see v3), but some reviewers didn't like the extra argument and suggested handling everything with the normal "-device" arg. So "-device dimm,..." saves the layout for *all* memory devices. However for "populated=off" dimms, the device is actually *not* created at startup. This is why the following combination is not contradictory: Dimm description at startup: -device dimm,id=mydimm0,bus=membus.0,size=1G,node=0,populated=off Hot-add with monitor command: device_add dimm,id=mydimm0,bus=membus.0 If on the other hand we specify: -device dimm,id=mydimm0,bus=membus.0,size=1G,node=0,populated=on the dimm device is indeed created at startup. Granted, it's confusing, but this is how v4 handles the fixed layout/device creation without adding a new command line argument for the layout. Better solutions are welcome. > > Assuming a fixed layout at startup, I wonder if there is another clever > way to model this... For CPU hotplug Anthony had suggested to have a > fixed set of link properties that get set to a CPU socket as > needed. Might a similar strategy work for memory, i.e. 
a > startup-configured amount of links on /machine/dimm[n] that point > to a QOM DIMM object or NULL if unpopulated? Hot(un)plug would then > simply work via QMP qom-set command. (CC'ing some people) This may work for a fixed number of PV dimms. On the other hand some other reviewers like the idea of modelling the memory bus (DimmBus), either for paravirtualized features (e.g. i440fx) or for emulated memory controllers in the future. I assume we either go with a bus or links<>, and not both. Btw, is the CPU link feature already implemented in a qom-cpu branch? I haven't tested qom-cpu for a long time, but I could take a look as a point of reference. thanks, - Vasilis
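The "fixed layout at startup, populate later" behaviour described in this mail can be sketched as below. This is only a toy model under stated assumptions: struct layout, struct dimm_cfg, MAX_SLOTS and the layout_*() helpers are hypothetical names for illustration and do not correspond to the dimm configuration structures in the v4 patches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 8   /* assumed limit, chosen only for the sketch */

/* One entry per "-device dimm,..." given on the command line. */
struct dimm_cfg {
    const char *id;
    unsigned long long size;
    int node;
    bool populated;   /* populated=on|off */
};

struct layout {
    struct dimm_cfg slots[MAX_SLOTS];
    size_t count;
};

/* Startup: record the layout entry, whether or not the dimm is populated. */
static int layout_describe(struct layout *l, const char *id,
                           unsigned long long size, int node, bool populated)
{
    if (l->count == MAX_SLOTS) {
        return -1;
    }
    l->slots[l->count++] = (struct dimm_cfg){ id, size, node, populated };
    return 0;
}

static struct dimm_cfg *layout_find(struct layout *l, const char *id)
{
    for (size_t i = 0; i < l->count; i++) {
        if (strcmp(l->slots[i].id, id) == 0) {
            return &l->slots[i];
        }
    }
    return NULL;
}

/* Hot-add only succeeds for ids described at startup and not yet populated. */
static int layout_hot_add(struct layout *l, const char *id)
{
    struct dimm_cfg *c = layout_find(l, id);
    if (c == NULL || c->populated) {
        return -1;
    }
    c->populated = true;
    return 0;
}

/* Hot-remove keeps the layout entry, so the same id can be re-added later. */
static int layout_hot_remove(struct layout *l, const char *id)
{
    struct dimm_cfg *c = layout_find(l, id);
    if (c == NULL || !c->populated) {
        return -1;
    }
    c->populated = false;
    return 0;
}
```

In this model the command line fixes the set of slots, and device_add/device_del only flip the populated state of an existing entry, which is why the two statements quoted above do not contradict each other.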
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
On Wed, Dec 19, 2012 at 12:45:46AM +0800, Zhi Yong Wu wrote: > HI, > > One stupid question, 'dimm' presents one guest memory, then why it is > called as "dimm"? what is its full name? it's a bad name coming from dram technology (dual in-line memory module). Memory-slot or memory-module is probably a better name, since we are not really modelling a specific memory technology. thanks, - Vasilis
Re: [Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
Hi, On Wed, Dec 19, 2012 at 08:27:36AM +0100, Gerd Hoffmann wrote: > Hi, > > > - multiple memory buses can be registered. Memory buses of the real > > hw/chipset > > or a paravirtual memory bus can be added. > > IIRC q35 supports memory hotplug natively (picked up in some > discussion). Is that correct? > > What does the code emulate? It doesn't look like it emulates q35 memory > hotplug ... correct, only the number of channels and ranks(dimms) per channel has been emulated so far (2 channels of 4 dimms each). So it is still paravirtual memory hotplug, not native. Native support still needs to be worked on. From previous discussion I also understand that q35 supports native hotplug. Sections 5.1 and 5.2 of the spec describe the MCH registers but the native acpi-memory hotplug specifics are not yet clear to me. Any pointers from the spec are welcome. > > I think the paravirtual memory hotplug controller should be a PCI device > (which we then can add as function to the chipset). Having some fixed > magic addresses is bad. ok, so in your opinion a pci-based hotplug controller sounds better than adding acpi ports to piix4 or ich9? Magic acpi_ich9 ports can be avoided if q35 native support is implemented. For i440fx/piix4 it was discussed and more or less decided we would only support a paravirtual way of memory hotplug. In the description, I meant "paravirtual memory bus" to describe a memory bus with unlimited number of dimm devices. But the "hotplug control" has always been acpi-based so far and not a pci device. thanks, - Vasilis
[Qemu-devel] [RFC PATCH v4 24/30] acpi_piix4: add hot-remove capability
--- docs/specs/acpi_hotplug.txt |8 hw/acpi_piix4.c | 29 - 2 files changed, 36 insertions(+), 1 deletions(-) diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt index 8391713..cf86242 100644 --- a/docs/specs/acpi_hotplug.txt +++ b/docs/specs/acpi_hotplug.txt @@ -12,3 +12,11 @@ Dimm hot-plug notification pending. One bit per slot. Read by ACPI BIOS GPE.3 handler to notify OS of memory hot-add or hot-remove events. Read-only. + +Memory Dimm ejection success notification (IO port 0xafa0, 1-byte access): +--- +Dimm hot-remove _EJ0 notification. Byte value indicates Dimm slot that was +ejected. + +Written by ACPI memory device _EJ0 method to notify qemu of successfull +hot-removal. Write-only. diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c index 879d8a0..6e4718e 100644 --- a/hw/acpi_piix4.c +++ b/hw/acpi_piix4.c @@ -50,6 +50,7 @@ #define PCI_EJ_BASE 0xae08 #define PCI_RMV_BASE 0xae0c #define MEM_BASE 0xaf80 +#define MEM_EJ_BASE 0xafa0 #define PIIX4_MEM_HOTPLUG_STATUS 8 #define PIIX4_PCI_HOTPLUG_STATUS 2 @@ -544,12 +545,29 @@ static uint32_t memhp_readb(void *opaque, uint32_t addr) return val; } +static void memhp_writeb(void *opaque, uint32_t addr, uint32_t val) +{ +switch (addr) { +case MEM_EJ_BASE - MEM_BASE: +dimm_notify(val, DIMM_REMOVE_SUCCESS); +break; +default: +PIIX4_DPRINTF("memhp write invalid %x <== %d\n", addr, val); +} +PIIX4_DPRINTF("memhp write %x <== %d\n", addr, val); +} + static const MemoryRegionOps piix4_memhp_ops = { .old_portio = (MemoryRegionPortio[]) { { .offset = 0, .len = DIMM_BITMAP_BYTES, .size = 1, .read = memhp_readb, }, +{ +.offset = MEM_EJ_BASE - MEM_BASE, .len = 1, +.size = 1, +.write = memhp_writeb, +}, PORTIO_END_OF_LIST() }, .endianness = DEVICE_LITTLE_ENDIAN, @@ -635,7 +653,7 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s) memory_region_add_subregion(get_system_io(), PCI_HOTPLUG_ADDR, &s->io_pci); memory_region_init_io(&s->io_memhp, &piix4_memhp_ops, s, "apci-memhp0", - 
DIMM_BITMAP_BYTES); + DIMM_BITMAP_BYTES + 1); memory_region_add_subregion(get_system_io(), MEM_BASE, &s->io_memhp); for (i = 0; i < DIMM_BITMAP_BYTES; i++) { @@ -665,6 +683,13 @@ static void enable_mem_device(PIIX4PMState *s, int memdevice) g->mems_sts[memdevice/8] |= (1 << (memdevice%8)); } +static void disable_mem_device(PIIX4PMState *s, int memdevice) +{ +struct gpe_regs *g = &s->gperegs; +s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS; +g->mems_sts[memdevice/8] &= ~(1 << (memdevice%8)); +} + static int piix4_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int add) { @@ -674,6 +699,8 @@ static int piix4_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int if (add) { enable_mem_device(s, slot->idx); +} else { +disable_mem_device(s, slot->idx); } pm_update_sci(s); return 0; -- 1.7.9
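The per-slot status bitmap manipulated by enable_mem_device() and disable_mem_device() in the patch above uses one bit per dimm slot (byte index slot/8, bit index slot%8). A standalone sketch of just that arithmetic follows; the helper names and the value chosen for DIMM_BITMAP_BYTES are assumptions made for the example, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIMM_BITMAP_BYTES 32   /* assumed size, for the sketch only */

/* Mark a slot as present: set bit (slot % 8) of byte (slot / 8). */
static void bitmap_set_slot(uint8_t *map, unsigned slot)
{
    map[slot / 8] |= (uint8_t)(1u << (slot % 8));
}

/* Mark a slot as removed: clear the same bit. */
static void bitmap_clear_slot(uint8_t *map, unsigned slot)
{
    map[slot / 8] &= (uint8_t)~(1u << (slot % 8));
}

/* Query a slot, as a guest-side reader of the bitmap would. */
static bool bitmap_test_slot(const uint8_t *map, unsigned slot)
{
    return ((map[slot / 8] >> (slot % 8)) & 1u) != 0;
}
```

For example, slot 10 lives in byte 1, bit 2, so setting it on an empty bitmap turns byte 1 into 0x04.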
[Qemu-devel] [RFC PATCH v4 22/30] [SeaBIOS] acpi: add _EJ0 operation and eject port for memory devices
This will allow hot-remove signalling from/to qemu and acpi-enabled guest. --- src/acpi-dsdt-mem-hotplug.dsl | 15 +++ src/ssdt-mem.dsl |3 +++ 2 files changed, 18 insertions(+), 0 deletions(-) diff --git a/src/acpi-dsdt-mem-hotplug.dsl b/src/acpi-dsdt-mem-hotplug.dsl index 0e7ced3..fd73ea7 100644 --- a/src/acpi-dsdt-mem-hotplug.dsl +++ b/src/acpi-dsdt-mem-hotplug.dsl @@ -21,6 +21,13 @@ Scope(\_SB) { MES, 256 } +/* Memory eject byte */ +OperationRegion(MEMJ, SystemIO, 0xafa0, 1) +Field (MEMJ, ByteAcc, NoLock, Preserve) +{ +MPE, 8 +} + Method(MESC, 0) { // Local5 = active memdevice bitmap Store (MES, Local5) @@ -47,6 +54,8 @@ Scope(\_SB) { // Do MEM notify If (LEqual(Local3, 1)) { MTFY(Local0, 1) +} Else { +MTFY(Local0, 3) } } Increment(Local0) @@ -54,4 +63,10 @@ Scope(\_SB) { Return(One) } +Method (MPEJ, 2, NotSerialized) { +// _EJ0 method - eject callback +Store(Arg0, MPE) +Sleep(200) +} + } diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl index dbac33f..eef84b6 100644 --- a/src/ssdt-mem.dsl +++ b/src/ssdt-mem.dsl @@ -57,6 +57,9 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1) Method (_STA, 0) { Return(CMST(ID)) } +Method (_EJ0, 1, NotSerialized) { +MPEJ(ID, Arg0) +} } } -- 1.7.9
[Qemu-devel] [RFC PATCH v4 26/30] Implement qmp and hmp commands for notification lists
Guest can respond to ACPI hotplug events e.g. with _EJ or _OST method. This patch implements a tail queue to store guest notifications for memory hot-add and hot-remove requests. Guest responses for memory hotplug command on a per-dimm basis can be detected with the new hmp command "info memory-hotplug" or the new qmp command "query-memory-hotplug" Examples: (qemu) device_add dimm,id=ram0 (qemu) info memory-hotplug dimm: ram0 hot-add success or dimm: ram0 hot-add failure (qemu) device_del ram3 (qemu) info memory-hotplug dimm: ram3 hot-remove success or dimm: ram3 hot-remove failure Results are removed from the queue once read. This patch only queues _EJ events that signal hot-remove success. For _OST event queuing, which cover the hot-remove failure and hot-add success/failure cases, the _OST patches in this series are are also needed. These notification items should probably be part of migration state (not yet implemented). Signed-off-by: Vasilis Liaskovitis --- hmp-commands.hx |2 + hmp.c| 17 +++ hmp.h|1 + hw/dimm.c| 61 ++ hw/dimm.h|1 + monitor.c|7 ++ qapi-schema.json | 26 +++ qmp-commands.hx | 37 8 files changed, 152 insertions(+), 0 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 65d799e..b94b7a2 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1574,6 +1574,8 @@ show roms show memory-total @item info dimm show dimm +@item info memory-hotplug +show memory-hotplug @end table ETEXI diff --git a/hmp.c b/hmp.c index f8456fd..727ed80 100644 --- a/hmp.c +++ b/hmp.c @@ -652,6 +652,23 @@ void hmp_info_dimm(Monitor *mon) qapi_free_DimmInfoList(info); } +void hmp_info_memory_hotplug(Monitor *mon) +{ +MemHpInfoList *info; +MemHpInfoList *item; +MemHpInfo *dimm; + +info = qmp_query_memory_hotplug(NULL); +for (item = info; item; item = item->next) { +dimm = item->value; +monitor_printf(mon, "dimm: %s %s %s\n", dimm->dimm, +dimm->request, dimm->result); +dimm->dimm = NULL; +} + +qapi_free_MemHpInfoList(info); +} + void hmp_quit(Monitor *mon, const QDict 
*qdict) { monitor_suspend(mon); diff --git a/hmp.h b/hmp.h index 74ac061..92095df 100644 --- a/hmp.h +++ b/hmp.h @@ -38,6 +38,7 @@ void hmp_info_pci(Monitor *mon); void hmp_info_block_jobs(Monitor *mon); void hmp_info_memory_total(Monitor *mon); void hmp_info_dimm(Monitor *mon); +void hmp_info_memory_hotplug(Monitor *mon); void hmp_quit(Monitor *mon, const QDict *qdict); void hmp_stop(Monitor *mon, const QDict *qdict); void hmp_system_reset(Monitor *mon, const QDict *qdict); diff --git a/hw/dimm.c b/hw/dimm.c index 0b4e22d..4670ae6 100644 --- a/hw/dimm.c +++ b/hw/dimm.c @@ -67,6 +67,7 @@ static void dimm_bus_initfn(Object *obj) DimmBus *bus = DIMM_BUS(obj); QTAILQ_INIT(&bus->dimmconfig_list); QTAILQ_INIT(&bus->dimmlist); +QTAILQ_INIT(&bus->dimm_hp_result_queue); } static const TypeInfo dimm_bus_info = { @@ -278,6 +279,58 @@ DimmInfoList *qmp_query_dimm_info(Error **errp) return head; } +MemHpInfoList *qmp_query_memory_hotplug(Error **errp) +{ +DimmBus *bus; +MemHpInfoList *head = NULL, *cur_item = NULL, *info; +struct dimm_hp_result *item, *nextitem; + +QLIST_FOREACH(bus, &memory_buses, next) { +QTAILQ_FOREACH_SAFE(item, &bus->dimm_hp_result_queue, next, nextitem) { + +info = g_malloc0(sizeof(*info)); +info->value = g_malloc0(sizeof(*info->value)); +info->value->dimm = g_malloc0(sizeof(char) * 32); +info->value->request = g_malloc0(sizeof(char) * 16); +info->value->result = g_malloc0(sizeof(char) * 16); +switch (item->ret) { +case DIMM_REMOVE_SUCCESS: +strcpy(info->value->request, "hot-remove"); +strcpy(info->value->result, "success"); +break; +case DIMM_REMOVE_FAIL: +strcpy(info->value->request, "hot-remove"); +strcpy(info->value->result, "failure"); +break; +case DIMM_ADD_SUCCESS: +strcpy(info->value->request, "hot-add"); +strcpy(info->value->result, "success"); +break; +case DIMM_ADD_FAIL: +strcpy(info->value->request, "hot-add"); +strcpy(info->value->result, "failure"); +break; +default: +break; +} +strcpy(info->value->dimm, item->dimmname); +/* XXX: waiting 
for the qa
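The "results are removed from the queue once read" behaviour described in the commit message can be illustrated with a small standalone sketch. This is not the code from the patch: it uses the BSD `<sys/queue.h>` TAILQ macros rather than QEMU's QTAILQ, and the names `hp_entry`, `hp_queue_push` and `hp_queue_drain` are hypothetical. Only the four result cases and the "dimm: NAME REQUEST RESULT" output format are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical result codes, mirroring the four cases handled in the patch. */
enum hp_result { HP_REMOVE_SUCCESS, HP_REMOVE_FAIL, HP_ADD_SUCCESS, HP_ADD_FAIL };

struct hp_entry {
    char name[32];
    enum hp_result ret;
    TAILQ_ENTRY(hp_entry) next;
};

TAILQ_HEAD(hp_queue, hp_entry);

/* Append one guest response at the tail, so results are read in arrival order. */
static void hp_queue_push(struct hp_queue *q, const char *name, enum hp_result ret)
{
    struct hp_entry *e = calloc(1, sizeof(*e));
    assert(e != NULL);
    snprintf(e->name, sizeof(e->name), "%s", name);   /* bounded copy */
    e->ret = ret;
    TAILQ_INSERT_TAIL(q, e, next);
}

/* Format every queued result into out and remove it from the queue.
 * Returns the number of entries consumed; entries that do not fit stay queued. */
static int hp_queue_drain(struct hp_queue *q, char *out, size_t outlen)
{
    static const char *const request[] = { "hot-remove", "hot-remove", "hot-add", "hot-add" };
    static const char *const result[]  = { "success", "failure", "success", "failure" };
    struct hp_entry *e;
    size_t used = 0;
    int n = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    while ((e = TAILQ_FIRST(q)) != NULL) {
        int w = snprintf(out + used, outlen - used, "dimm: %s %s %s\n",
                         e->name, request[e->ret], result[e->ret]);
        if (w < 0 || (size_t)w >= outlen - used) {
            out[used] = '\0';   /* drop the partial line, keep the entry queued */
            break;
        }
        used += (size_t)w;
        TAILQ_REMOVE(q, e, next);
        free(e);
        n++;
    }
    return n;
}
```

Draining twice in a row returns the queued lines the first time and nothing the second time, which matches the read-once semantics the commit message describes. The bounded `snprintf` copies are a choice of this sketch, not something the patch does.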
[Qemu-devel] [RFC PATCH v4 16/30] pc: Add dimm paravirt SRAT info
The numa_fw_cfg paravirt interface is extended to include SRAT information for
all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
denoting start address, size and node proximity. The new info is appended after
the existing numa info, so that the fw_cfg layout does not break. This
information is used by SeaBIOS to build hotplug memory device objects at
runtime. nb_numa_nodes is set to 1 by default (not 0), so that we always pass
srat info to SeaBIOS.

v3->v4: numa_fw_cfg needs to be initialized after the memory controller sets up
dimm ranges. Make changes for pc_piix and pc_q35 to set numa_fw_cfg after
i440fx initialization.

v2->v3: setting nb_numa_nodes to 1 is not needed

v1->v2: Dimm SRAT info (#dimms) is appended at the end of the existing numa
fw_cfg in order not to break the existing layout. Documentation of the new
fwcfg layout is included in docs/specs/fwcfg.txt

Signed-off-by: Vasilis Liaskovitis
---
 docs/specs/fwcfg.txt |   28
 hw/pc.c              |   28 +++-
 hw/pc.h              |    1 +
 hw/pc_piix.c         |    1 +
 hw/pc_q35.c          |    8 +---
 sysemu.h             |    1 +
 6 files changed, 59 insertions(+), 8 deletions(-)
 create mode 100644 docs/specs/fwcfg.txt

diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
new file mode 100644
index 000..e6fcd8f
--- /dev/null
+++ b/docs/specs/fwcfg.txt
@@ -0,0 +1,28 @@
+QEMU<->BIOS Paravirt Documentation
+--
+
+This document describes paravirt data structures passed from QEMU to BIOS.
+
+fw_cfg SRAT paravirt info
+
+The SRAT info passed from QEMU to BIOS has the following layout:
+
+---
+#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
+
+---
+#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
+
+Entry 0 contains the number of numa nodes (nb_numa_nodes).
+
+Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
+one of the vCPUs in the system.
+ +Entries max_cpus+1..max_cpus+nb_numa_nodes+1: The next nb_numa_nodes entries +describe the memory size for each one of the NUMA nodes in the system. + +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms) + +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains +the physical address offset, size (in bytes), and node proximity for the +respective dimm. diff --git a/hw/pc.c b/hw/pc.c index b11e7c4..025c356 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -51,6 +51,7 @@ #include "exec-memory.h" #include "arch_init.h" #include "bitmap.h" +#include "hw/dimm.h" /* debug PC/ISA interrupts */ //#define DEBUG_IRQ @@ -582,8 +583,6 @@ static void *bochs_bios_init(void) void *fw_cfg; uint8_t *smbios_table; size_t smbios_len; -uint64_t *numa_fw_cfg; -int i, j; PortioList *bochs_bios_port_list = g_new(PortioList, 1); portio_list_init(bochs_bios_port_list, bochs_bios_portio_list, @@ -607,11 +606,24 @@ static void *bochs_bios_init(void) fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, (uint8_t *)&hpet_cfg, sizeof(struct hpet_fw_config)); + +return fw_cfg; +} + +void bochs_meminfo_bios_init(void *fw_cfg) +{ +uint64_t *numa_fw_cfg; +uint64_t *hp_dimms_fw_cfg; +int i, j; + /* allocate memory for the NUMA channel: one (64bit) word for the number * of nodes, one word for each VCPU->node and one word for each node to * hold the amount of memory. + * Finally one word for the number of hotplug memory slots and three words + * for each hotplug memory slot (start address, size and node proximity). 
*/ -numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8); +numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) +* 8); numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes); for (i = 0; i < max_cpus; i++) { for (j = 0; j < nb_numa_nodes; j++) { @@ -624,10 +636,16 @@ static void *bochs_bios_init(void) for (i = 0; i < nb_numa_nodes; i++) { numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]); } + +numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms); + +hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes; +if (nb_hp_dimms) { +dimm_setup_fwcfg_layout(hp_dimms_fw_cfg); +} fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg, - (1 + max_cpus + nb_numa_nodes) * 8); + (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8); -return fw_cfg; } static long get_file_size(FILE *f) diff --git a/hw/pc.h b/hw/p
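The word offsets used by bochs_meminfo_bios_init() in the patch above can be written out as a small standalone sketch. The struct and function names here are hypothetical; only the arithmetic (1 word for the node count, max_cpus proximity words, nb_numa_nodes memory words, 1 word for the dimm count, then 3 words per dimm) is taken from the patch. Going by that arithmetic, the per-node memory entries end at index max_cpus + nb_numa_nodes, since the next index already holds the dimm count.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets, in 64-bit words, of each section of the FW_CFG_NUMA blob. */
struct numa_fw_cfg_layout {
    size_t nb_nodes_idx;   /* entry 0: number of NUMA nodes */
    size_t cpu_pxm_idx;    /* first of max_cpus per-vCPU proximity entries */
    size_t node_mem_idx;   /* first of nb_numa_nodes per-node memory sizes */
    size_t nb_dimms_idx;   /* number of hotplug-able dimms */
    size_t dimm_idx;       /* first (start, size, pxm) triplet */
    size_t total_words;    /* size of the whole blob in 64-bit words */
};

static struct numa_fw_cfg_layout
numa_fw_cfg_layout(size_t max_cpus, size_t nb_numa_nodes, size_t nb_hp_dimms)
{
    struct numa_fw_cfg_layout l;

    l.nb_nodes_idx = 0;
    l.cpu_pxm_idx  = 1;
    l.node_mem_idx = 1 + max_cpus;
    l.nb_dimms_idx = 1 + max_cpus + nb_numa_nodes;
    l.dimm_idx     = 2 + max_cpus + nb_numa_nodes;
    l.total_words  = 2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms;
    return l;
}

/* Index of one field of dimm n: field 0 = start, 1 = size, 2 = proximity. */
static size_t dimm_field_idx(const struct numa_fw_cfg_layout *l,
                             size_t n, size_t field)
{
    assert(l != NULL && field < 3);
    return l->dimm_idx + 3 * n + field;
}
```

With 4 vCPUs, 2 nodes and 3 dimms, the blob is 17 words long and the proximity word of the last dimm sits at index 16, the final word.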
[Qemu-devel] [RFC PATCH v4 13/30] piix_pci and pc_piix: refactor
Refactor code so that chipset initialization is similar to q35. This will allow memory map initialization at chipset qdev init time for both machines, as well as more similar code structure overall. Signed-off-by: Vasilis Liaskovitis --- hw/pc_piix.c | 57 --- hw/piix_pci.c | 225 ++--- 2 files changed, 100 insertions(+), 182 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 19e342a..6a9b508 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -47,6 +47,7 @@ #ifdef CONFIG_XEN # include #endif +#include "piix_pci.h" #define MAX_IDE_BUS 2 @@ -85,6 +86,8 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *pci_memory; MemoryRegion *rom_memory; void *fw_cfg = NULL; +I440FXState *i440fx_host; +PIIX3State *piix3; pc_cpus_init(cpu_model); @@ -127,21 +130,53 @@ static void pc_init1(MemoryRegion *system_memory, } if (pci_enabled) { -pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi, - system_memory, system_io, ram_size, - below_4g_mem_size, - 0x1ULL - below_4g_mem_size, - 0x1ULL + above_4g_mem_size, - (sizeof(hwaddr) == 4 - ? 0 - : ((uint64_t)1 << 62)), - pci_memory, ram_memory); +i440fx_host = I440FX_HOST_DEVICE(qdev_create(NULL, +TYPE_I440FX_HOST_DEVICE)); +i440fx_host->mch.ram_memory = ram_memory; +i440fx_host->mch.pci_address_space = pci_memory; +i440fx_host->mch.system_memory = get_system_memory(); +i440fx_host->mch.address_space_io = get_system_io();; +i440fx_host->mch.below_4g_mem_size = below_4g_mem_size; +i440fx_host->mch.above_4g_mem_size = above_4g_mem_size; + +qdev_init_nofail(DEVICE(i440fx_host)); +i440fx_state = &i440fx_host->mch; +pci_bus = i440fx_host->parent_obj.bus; +/* Xen supports additional interrupt routes from the PCI devices to + * the IOAPIC: the four pins of each PCI device on the bus are also + * connected to the IOAPIC directly. + * These additional routes can be discovered through ACPI. 
*/ +if (xen_enabled()) { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(pci_bus, -1, true, +"PIIX3-xen")); +pci_bus_irqs(pci_bus, xen_piix3_set_irq, xen_pci_slot_get_pirq, +piix3, XEN_PIIX_NUM_PIRQS); +} else { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(pci_bus, -1, true, +"PIIX3")); +pci_bus_irqs(pci_bus, piix3_set_irq, pci_slot_get_pirq, piix3, +PIIX_NUM_PIRQS); +pci_bus_set_route_irq_fn(pci_bus, piix3_route_intx_pin_to_irq); +} +piix3->pic = gsi; +isa_bus = DO_UPCAST(ISABus, qbus, +qdev_get_child_bus(&piix3->dev.qdev, "isa.0")); + +piix3_devfn = piix3->dev.devfn; + +ram_size = ram_size / 8 / 1024 / 1024; +if (ram_size > 255) { +ram_size = 255; +} +i440fx_state->dev.config[0x57] = ram_size; } else { pci_bus = NULL; -i440fx_state = NULL; isa_bus = isa_bus_new(NULL, system_io); no_hpet = 1; } + isa_bus_irqs(isa_bus, gsi); if (kvm_irqchip_in_kernel()) { @@ -157,7 +192,7 @@ static void pc_init1(MemoryRegion *system_memory, gsi_state->i8259_irq[i] = i8259[i]; } if (pci_enabled) { -ioapic_init_gsi(gsi_state, "i440fx"); +ioapic_init_gsi(gsi_state, NULL); } pc_register_ferr_irq(gsi[13]); diff --git a/hw/piix_pci.c b/hw/piix_pci.c index ba1b3de..7ca3c73 100644 --- a/hw/piix_pci.c +++ b/hw/piix_pci.c @@ -31,70 +31,15 @@ #include "range.h" #include "xen.h" #include "pam.h" +#include "piix_pci.h" -/* - * I440FX chipset data sheet. - * http://download.intel.com/design/chipsets/datashts/29054901.pdf - */ - -typedef struct I440FXState { -PCIHostState parent_obj; -} I440FXState; - -#define PIIX_NUM_PIC_IRQS 16 /* i8259 * 2 */ -#define PIIX_NUM_PIRQS 4ULL/* PIRQ[A-D] */ -#define XEN_PIIX_NUM_PIRQS 128ULL -#define PIIX_PIRQC 0x60 - -typedef struct PIIX3State { -PCIDevice dev; - -/* - * bitmap to track pic levels. - * The pic level is the logical OR of all the PCI irqs mapped to it - * So one PIC level is tracked by PIIX_NUM_PIRQS bits. - * - * PIRQ is mapped to PIC pins, we track it by - * PIIX_NUM_PI
[Qemu-devel] [RFC PATCH v4 12/30] acpi_ich9 : Implement memory device hotplug registers
This implements acpi dimm hot-add capability for q35 (ich9). The logic is the same as for the pc machine (piix4). TODO: Fix acpi irq delivery bug. Currently there is a flood of irqs when delivering an acpi interrupt (should be just one). Guest complains as follows: "irq 9: nobody cared [...] Disabling IRQ #9" where #9 is the acpi irq Signed-off-by: Vasilis Liaskovitis --- hw/acpi_ich9.c | 61 +-- hw/acpi_ich9.h |7 +- hw/lpc_ich9.c |2 +- 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/hw/acpi_ich9.c b/hw/acpi_ich9.c index c5978d3..abafbb5 100644 --- a/hw/acpi_ich9.c +++ b/hw/acpi_ich9.c @@ -48,11 +48,14 @@ static void pm_update_sci(ICH9LPCPMRegs *pm) pm1a_sts = acpi_pm1_evt_get_sts(&pm->acpi_regs); -sci_level = (((pm1a_sts & pm->acpi_regs.pm1.evt.en) & +sci_level = pm1a_sts & pm->acpi_regs.pm1.evt.en) & (ACPI_BITMASK_RT_CLOCK_ENABLE | ACPI_BITMASK_POWER_BUTTON_ENABLE | ACPI_BITMASK_GLOBAL_LOCK_ENABLE | - ACPI_BITMASK_TIMER_ENABLE)) != 0); + ACPI_BITMASK_TIMER_ENABLE)) != 0) || +(((pm->acpi_regs.gpe.sts[0] & pm->acpi_regs.gpe.en[0]) & + (ICH9_MEM_HOTPLUG_STATUS)) != 0)); + qemu_set_irq(pm->irq, sci_level); /* schedule a timer interruption if needed */ @@ -90,6 +93,29 @@ static const MemoryRegionOps ich9_gpe_ops = { .endianness = DEVICE_LITTLE_ENDIAN, }; +static uint32_t memhp_readb(void *opaque, uint32_t addr) +{ +ICH9LPCPMRegs *s = opaque; +uint32_t val = 0; +struct gpe_regs *g = &s->gperegs; +if (addr < DIMM_BITMAP_BYTES) { +val = (uint32_t) g->mems_sts[addr]; +} +ICH9_DEBUG("memhp read %x == %x\n", addr, val); +return val; +} + +static const MemoryRegionOps ich9_memhp_ops = { +.old_portio = (MemoryRegionPortio[]) { +{ +.offset = 0, .len = DIMM_BITMAP_BYTES, .size = 1, +.read = memhp_readb, +}, +PORTIO_END_OF_LIST() +}, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + static uint64_t ich9_smi_readl(void *opaque, hwaddr addr, unsigned width) { ICH9LPCPMRegs *pm = opaque; @@ -201,8 +227,31 @@ static void pm_powerdown_req(Notifier *n, void *opaque) 
acpi_pm1_evt_power_down(&pm->acpi_regs); } -void ich9_pm_init(ICH9LPCPMRegs *pm, qemu_irq sci_irq, qemu_irq cmos_s3) +static void enable_mem_device(ICH9LPCState *s, int memdevice) { +struct gpe_regs *g = &s->pm.gperegs; +s->pm.acpi_regs.gpe.sts[0] |= ICH9_MEM_HOTPLUG_STATUS; +g->mems_sts[memdevice/8] |= (1 << (memdevice%8)); +} + +static int ich9_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int +add) +{ +PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, qdev); +ICH9LPCState *s = DO_UPCAST(ICH9LPCState, d, pci_dev); +DimmDevice *slot = DIMM(dev); + +if (add) { +enable_mem_device(s, slot->idx); +} +pm_update_sci(&s->pm); +return 0; +} + +void ich9_pm_init(void *device, qemu_irq sci_irq, qemu_irq cmos_s3) +{ +ICH9LPCState *lpc = (ICH9LPCState *)device; +ICH9LPCPMRegs *pm = &lpc->pm; memory_region_init(&pm->io, "ich9-pm", ICH9_PMIO_SIZE); memory_region_set_enabled(&pm->io, false); memory_region_add_subregion(get_system_io(), 0, &pm->io); @@ -220,6 +269,12 @@ void ich9_pm_init(ICH9LPCPMRegs *pm, qemu_irq sci_irq, qemu_irq cmos_s3) 8); memory_region_add_subregion(&pm->io, ICH9_PMIO_SMI_EN, &pm->io_smi); +memory_region_init_io(&pm->io_memhp, &ich9_memhp_ops, pm, "apci-memhp0", + DIMM_BITMAP_BYTES); +memory_region_add_subregion(get_system_io(), ICH9_MEM_BASE, &pm->io_memhp); + +dimm_bus_hotplug(ich9_dimm_hotplug, &lpc->d.qdev); + pm->irq = sci_irq; qemu_register_reset(pm_reset, pm); pm->powerdown_notifier.notify = pm_powerdown_req; diff --git a/hw/acpi_ich9.h b/hw/acpi_ich9.h index bc221d3..4419247 100644 --- a/hw/acpi_ich9.h +++ b/hw/acpi_ich9.h @@ -23,6 +23,9 @@ #include "acpi.h" +#define ICH9_MEM_BASE0xaf80 +#define ICH9_MEM_HOTPLUG_STATUS 8 + typedef struct ICH9LPCPMRegs { /* * In ich9 spec says that pm1_cnt register is 32bit width and @@ -33,16 +36,18 @@ typedef struct ICH9LPCPMRegs { MemoryRegion io; MemoryRegion io_gpe; MemoryRegion io_smi; +MemoryRegion io_memhp; uint32_t smi_en; uint32_t smi_sts; qemu_irq irq; /* SCI */ +struct gpe_regs gperegs; uint32_t 
pm_io_base; Notifier powerdown_notifier; } ICH9LPCPMRegs; -void ich9_pm_init(ICH9LPCPMRegs *pm, +void ich9_pm_init(void *lpc, qemu_irq sci_irq, qemu_irq cmos_s3_resume); void ich9_pm_
[Qemu-devel] [RFC PATCH v4 19/30] Implement "info memory-total" and "query-memory-total"
Returns total physical memory available to guest in bytes, including hotplugged memory. Note that the number reported here may be different from what the guest sees e.g. if the guest has not logically onlined hotplugged memory. This functionality is provided independently of a balloon device, since a guest can be using ACPI memory hotplug without using a balloon device. v3->v4: Moved qmp command implementation to vl.c. This prevents a circular header dependency problem. Signed-off-by: Vasilis Liaskovitis --- hmp-commands.hx |2 ++ hmp.c|7 +++ hmp.h|1 + hw/dimm.c| 14 ++ hw/dimm.h|1 + monitor.c|7 +++ qapi-schema.json | 11 +++ qmp-commands.hx | 20 vl.c |9 + 9 files changed, 72 insertions(+), 0 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 010b8c9..3fbd975 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1570,6 +1570,8 @@ show device tree show qdev device model list @item info roms show roms +@item info memory-total +show memory-total @end table ETEXI diff --git a/hmp.c b/hmp.c index 180ba2b..fb39b0d 100644 --- a/hmp.c +++ b/hmp.c @@ -628,6 +628,13 @@ void hmp_info_block_jobs(Monitor *mon) } } +void hmp_info_memory_total(Monitor *mon) +{ +uint64_t ram_total; +ram_total = (uint64_t)qmp_query_memory_total(NULL); +monitor_printf(mon, "MemTotal: %lu\n", ram_total); +} + void hmp_quit(Monitor *mon, const QDict *qdict) { monitor_suspend(mon); diff --git a/hmp.h b/hmp.h index 0ab03be..25a3a70 100644 --- a/hmp.h +++ b/hmp.h @@ -36,6 +36,7 @@ void hmp_info_spice(Monitor *mon); void hmp_info_balloon(Monitor *mon); void hmp_info_pci(Monitor *mon); void hmp_info_block_jobs(Monitor *mon); +void hmp_info_memory_total(Monitor *mon); void hmp_quit(Monitor *mon, const QDict *qdict); void hmp_stop(Monitor *mon, const QDict *qdict); void hmp_system_reset(Monitor *mon, const QDict *qdict); diff --git a/hw/dimm.c b/hw/dimm.c index e384952..f181e54 100644 --- a/hw/dimm.c +++ b/hw/dimm.c @@ -189,6 +189,20 @@ void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots) } } 
+uint64_t get_hp_memory_total(void) +{ +DimmBus *bus; +DimmDevice *slot; +uint64_t info = 0; + +QLIST_FOREACH(bus, &memory_buses, next) { +QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) { +info += slot->size; +} +} +return info; +} + static int dimm_init(DeviceState *s) { DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(s)); diff --git a/hw/dimm.h b/hw/dimm.h index 75a6911..5130b2c 100644 --- a/hw/dimm.h +++ b/hw/dimm.h @@ -85,5 +85,6 @@ DimmBus *dimm_bus_create(Object *parent, const char *name, uint32_t max_dimms, dimm_calcoffset_fn pmc_set_offset); void dimm_config_create(char *id, uint64_t size, const char *bus, uint64_t node, uint32_t dimm_idx, uint32_t populated); +uint64_t get_hp_memory_total(void); #endif diff --git a/monitor.c b/monitor.c index c0e32d6..6e87d0d 100644 --- a/monitor.c +++ b/monitor.c @@ -2708,6 +2708,13 @@ static mon_cmd_t info_cmds[] = { .mhandler.info = hmp_info_balloon, }, { +.name = "memory-total", +.args_type = "", +.params = "", +.help = "show total memory size", +.mhandler.info = hmp_info_memory_total, +}, +{ .name = "qtree", .args_type = "", .params = "", diff --git a/qapi-schema.json b/qapi-schema.json index 5dfa052..33f88d6 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2903,6 +2903,17 @@ { 'command': 'query-target', 'returns': 'TargetInfo' } ## +# @query-memory-total: +# +# Returns total memory in bytes, including hotplugged dimms +# +# Returns: int +# +# Since: 1.4 +## +{ 'command': 'query-memory-total', 'returns': 'int' } + +## # @QKeyCode: # # An enumeration of key name. 
diff --git a/qmp-commands.hx b/qmp-commands.hx index 5c692d0..a99117a 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2654,3 +2654,23 @@ EQMP .args_type = "", .mhandler.cmd_new = qmp_marshal_input_query_target, }, + +{ +.name = "query-memory-total", +.args_type = "", +.mhandler.cmd_new = qmp_marshal_input_query_memory_total +}, +SQMP +query-memory-total +-- + +Return total memory in bytes, including hotplugged dimms + +Example: + +-> { "execute": "query-memory-total" } +<- { + "return": 1073741824 + } + +EQMP diff --git a/vl.c b/vl.c index 8406933..80803c5 100644 --- a/vl.c +++ b/vl.c @@ -126,6 +126,7 @@ int main(int argc, char **argv) #include "hw/xen.h" #include "hw/qdev.h"
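The total reported by "info memory-total" is the initial RAM size plus the sizes of all hotplugged dimms. A minimal sketch of that sum, with hypothetical helper names and a plain array standing in for the dimm list, could look like the following. The patch prints the uint64_t value with `%lu`, which only matches where unsigned long is 64 bits wide; the sketch uses PRIu64 from `<inttypes.h>` instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sum of initial RAM and every hotplugged dimm, in bytes. */
static uint64_t memory_total(uint64_t ram_size,
                             const uint64_t *dimm_sizes, size_t nb_dimms)
{
    uint64_t total = ram_size;

    for (size_t i = 0; i < nb_dimms; i++) {
        total += dimm_sizes[i];
    }
    return total;
}

/* Format the value like the hmp handler does, using a width-correct
 * conversion for uint64_t. Returns the snprintf() result. */
static int format_memory_total(char *buf, size_t len, uint64_t total)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "MemTotal: %" PRIu64 "\n", total);
}
```

As the commit message notes, this is the amount of memory made available to the guest, which can differ from what the guest reports if it has not onlined the hotplugged dimms.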
[Qemu-devel] [RFC PATCH v4 05/30] [SeaBIOS] q35: Add memory hotplug handler
--- src/q35-acpi-dsdt.dsl |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/src/q35-acpi-dsdt.dsl b/src/q35-acpi-dsdt.dsl index c031d83..5b28d72 100644 --- a/src/q35-acpi-dsdt.dsl +++ b/src/q35-acpi-dsdt.dsl @@ -403,7 +403,7 @@ DefinitionBlock ( } #include "acpi-dsdt-cpu-hotplug.dsl" - +#include "acpi-dsdt-mem-hotplug.dsl" / * General purpose events @@ -418,7 +418,9 @@ DefinitionBlock ( // CPU hotplug event \_SB.PRSC() } -Method(_L02) { +Method(_E02) { +// Memory hotplug event +\_SB.MESC() } Method(_L03) { } -- 1.7.9
[Qemu-devel] [RFC PATCH v4 08/30] qemu-option: export parse_option_number
Signed-off-by: Vasilis Liaskovitis
---
 qemu-option.c |    2 +-
 qemu-option.h |    2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/qemu-option.c b/qemu-option.c
index 38e0a11..88fd370 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -185,7 +185,7 @@ static void parse_option_bool(const char *name, const char *value, bool *ret,
     }
 }
 
-static void parse_option_number(const char *name, const char *value,
+void parse_option_number(const char *name, const char *value,
                                 uint64_t *ret, Error **errp)
 {
     char *postfix;
diff --git a/qemu-option.h b/qemu-option.h
index b8ee5b3..8b7235f 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -154,5 +154,7 @@ int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque,
                       int abort_on_failure);
 void parse_option_size(const char *name, const char *value,
                        uint64_t *ret, Error **errp);
+void parse_option_number(const char *name, const char *value,
+                         uint64_t *ret, Error **errp);
 
 #endif
-- 
1.7.9
[Qemu-devel] [RFC PATCH v4 03/30] [SeaBIOS] acpi-dsdt: Implement functions for memory hotplug
Extend the DSDT to include methods for handling memory hot-add and hot-remove notifications and memory device status requests. These functions are called from the memory device SSDT methods. --- src/acpi-dsdt-mem-hotplug.dsl | 57 + src/acpi-dsdt.dsl |5 +++- 2 files changed, 61 insertions(+), 1 deletions(-) create mode 100644 src/acpi-dsdt-mem-hotplug.dsl diff --git a/src/acpi-dsdt-mem-hotplug.dsl b/src/acpi-dsdt-mem-hotplug.dsl new file mode 100644 index 000..0e7ced3 --- /dev/null +++ b/src/acpi-dsdt-mem-hotplug.dsl @@ -0,0 +1,57 @@ +/ + * Memory hotplug + / + +Scope(\_SB) { +/* Objects filled in by run-time generated SSDT */ +External(MTFY, MethodObj) +External(MEON, PkgObj) + +Method (CMST, 1, NotSerialized) { +// _STA method - return ON status of memdevice +// Local0 = MEON flag for this cpu +Store(DerefOf(Index(MEON, Arg0)), Local0) +If (Local0) { Return(0xF) } Else { Return(0x0) } +} + +/* Memory hotplug notify array */ +OperationRegion(MEST, SystemIO, 0xaf80, 32) +Field (MEST, ByteAcc, NoLock, Preserve) +{ +MES, 256 +} + +Method(MESC, 0) { +// Local5 = active memdevice bitmap +Store (MES, Local5) +// Local2 = last read byte from bitmap +Store (Zero, Local2) +// Local0 = memory device iterator +Store (Zero, Local0) +While (LLess(Local0, SizeOf(MEON))) { +// Local1 = MEON flag for this memory device +Store(DerefOf(Index(MEON, Local0)), Local1) +If (And(Local0, 0x07)) { +// Shift down previously read bitmap byte +ShiftRight(Local2, 1, Local2) +} Else { +// Read next byte from memdevice bitmap +Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2) +} +// Local3 = active state for this memory device +Store(And(Local2, 1), Local3) + +If (LNotEqual(Local1, Local3)) { +// State change - update MEON with new state +Store(Local3, Index(MEON, Local0)) +// Do MEM notify +If (LEqual(Local3, 1)) { +MTFY(Local0, 1) +} +} +Increment(Local0) +} +Return(One) +} + +} diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 158f6b4..98c9413 100644 --- a/src/acpi-dsdt.dsl 
+++ b/src/acpi-dsdt.dsl @@ -294,6 +294,7 @@ DefinitionBlock ( } #include "acpi-dsdt-cpu-hotplug.dsl" +#include "acpi-dsdt-mem-hotplug.dsl" / @@ -313,7 +314,9 @@ DefinitionBlock ( // CPU hotplug event \_SB.PRSC() } -Method(_L03) { +Method(_E03) { +// Memory hotplug event +\_SB.MESC() } Method(_L04) { } -- 1.7.9
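For readers less familiar with ASL, the scan performed by the MESC method above can be paraphrased in C. This is only a sketch of the same loop, not SeaBIOS code: `scan_mem_status`, `notify_fn` and `log_notify` are hypothetical names, `bitmap` stands in for the MES field read from the I/O region, and `meon` stands in for the MEON package. As in this patch, a notification is sent only when a device becomes active.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*notify_fn)(size_t dev, int event, void *opaque);

/* Compare the hardware status bitmap with the cached per-device flags,
 * update the cache, and notify on every 0 -> 1 transition.
 * Returns the number of devices whose state changed. */
static size_t scan_mem_status(const uint8_t *bitmap, uint8_t *meon, size_t ndev,
                              notify_fn notify, void *opaque)
{
    size_t changed = 0;
    uint8_t byte = 0;

    assert(bitmap != NULL && meon != NULL);
    for (size_t i = 0; i < ndev; i++) {
        if (i & 0x07) {
            byte >>= 1;              /* shift down previously read byte */
        } else {
            byte = bitmap[i >> 3];   /* read next byte from the bitmap */
        }
        uint8_t active = byte & 1;
        if (meon[i] != active) {
            meon[i] = active;        /* state change: update cached flag */
            changed++;
            if (active == 1 && notify != NULL) {
                notify(i, 1, opaque);
            }
        }
    }
    return changed;
}

/* Example callback: counts notifications and remembers the last device. */
struct notify_log {
    size_t count;
    size_t last_dev;
};

static void log_notify(size_t dev, int event, void *opaque)
{
    struct notify_log *log = opaque;

    (void)event;
    log->count++;
    log->last_dev = dev;
}
```

Reading one byte per eight devices and shifting it down, rather than indexing the bitmap for every device, mirrors the structure of the ASL loop.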
[Qemu-devel] [RFC PATCH v4 25/30] acpi_ich9: add hot-remove capability
--- hw/acpi_ich9.c | 28 +++- hw/acpi_ich9.h |1 + 2 files changed, 28 insertions(+), 1 deletions(-) diff --git a/hw/acpi_ich9.c b/hw/acpi_ich9.c index abafbb5..f5dc1c9 100644 --- a/hw/acpi_ich9.c +++ b/hw/acpi_ich9.c @@ -105,12 +105,29 @@ static uint32_t memhp_readb(void *opaque, uint32_t addr) return val; } +static void memhp_writeb(void *opaque, uint32_t addr, uint32_t val) +{ +switch (addr) { +case ICH9_MEM_EJ_BASE - ICH9_MEM_BASE: +dimm_notify(val, DIMM_REMOVE_SUCCESS); +break; +default: +ICH9_DEBUG("memhp write invalid %x <== %d\n", addr, val); +} +ICH9_DEBUG("memhp write %x <== %d\n", addr, val); +} + static const MemoryRegionOps ich9_memhp_ops = { .old_portio = (MemoryRegionPortio[]) { { .offset = 0, .len = DIMM_BITMAP_BYTES, .size = 1, .read = memhp_readb, }, +{ +.offset = ICH9_MEM_EJ_BASE - ICH9_MEM_BASE, +.len = 1, .size = 1, +.write = memhp_writeb, +}, PORTIO_END_OF_LIST() }, .endianness = DEVICE_LITTLE_ENDIAN, @@ -234,6 +251,13 @@ static void enable_mem_device(ICH9LPCState *s, int memdevice) g->mems_sts[memdevice/8] |= (1 << (memdevice%8)); } +static void disable_mem_device(ICH9LPCState *s, int memdevice) +{ +struct gpe_regs *g = &s->pm.gperegs; +s->pm.acpi_regs.gpe.sts[0] |= ICH9_MEM_HOTPLUG_STATUS; +g->mems_sts[memdevice/8] &= ~(1 << (memdevice%8)); +} + static int ich9_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int add) { @@ -243,6 +267,8 @@ static int ich9_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int if (add) { enable_mem_device(s, slot->idx); +} else { +disable_mem_device(s, slot->idx); } pm_update_sci(&s->pm); return 0; @@ -270,7 +296,7 @@ void ich9_pm_init(void *device, qemu_irq sci_irq, qemu_irq cmos_s3) memory_region_add_subregion(&pm->io, ICH9_PMIO_SMI_EN, &pm->io_smi); memory_region_init_io(&pm->io_memhp, &ich9_memhp_ops, pm, "apci-memhp0", - DIMM_BITMAP_BYTES); + DIMM_BITMAP_BYTES + 1); memory_region_add_subregion(get_system_io(), ICH9_MEM_BASE, &pm->io_memhp); dimm_bus_hotplug(ich9_dimm_hotplug, &lpc->d.qdev); diff --git 
a/hw/acpi_ich9.h b/hw/acpi_ich9.h index 4419247..af61a2d 100644 --- a/hw/acpi_ich9.h +++ b/hw/acpi_ich9.h @@ -24,6 +24,7 @@ #include "acpi.h" #define ICH9_MEM_BASE0xaf80 +#define ICH9_MEM_EJ_BASE0xafa0 #define ICH9_MEM_HOTPLUG_STATUS 8 typedef struct ICH9LPCPMRegs { -- 1.7.9
[Qemu-devel] [RFC PATCH v4 10/30] vl: handle "-device dimm"
Signed-off-by: Vasilis Liaskovitis --- vl.c | 51 +++ 1 files changed, 51 insertions(+), 0 deletions(-) diff --git a/vl.c b/vl.c index a3ab384..8406933 100644 --- a/vl.c +++ b/vl.c @@ -169,6 +169,7 @@ int main(int argc, char **argv) #include "ui/qemu-spice.h" #include "qapi/string-input-visitor.h" +#include "hw/dimm.h" //#define DEBUG_NET //#define DEBUG_SLIRP @@ -249,6 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order = int nb_numa_nodes; uint64_t node_mem[MAX_NODES]; unsigned long *node_cpumask[MAX_NODES]; +int nb_hp_dimms; uint8_t qemu_uuid[16]; @@ -2065,6 +2067,50 @@ static int chardev_init_func(QemuOpts *opts, void *opaque) return 0; } +static int dimmcfg_init_func(QemuOpts *opts, void *opaque) +{ +const char *driver; +const char *id; +uint64_t node, size; +uint32_t populated; +const char *buf, *busbuf; + +/* DimmDevice configuration needs to be known in order to initialize chipset + * with correct memory and pci ranges. But all devices are created after + * chipset / machine initialization. In * order to avoid this problem, we + * parse dimm information earlier into dimmcfg structs. */ + +driver = qemu_opt_get(opts, "driver"); +if (!strcmp(driver, "dimm")) { + +id = qemu_opts_id(opts); +buf = qemu_opt_get(opts, "size"); +parse_option_size("size", buf, &size, NULL); +buf = qemu_opt_get(opts, "node"); +parse_option_number("node", buf, &node, NULL); +busbuf = qemu_opt_get(opts, "bus"); +buf = qemu_opt_get(opts, "populated"); +if (!buf) { +populated = 0; +} else { +populated = strcmp(buf, "on") ? 0 : 1; +} + +dimm_config_create((char *)id, size, busbuf ? busbuf : "membus.0", +node, nb_hp_dimms, populated); + +/* if !populated, we just keep the config. The real device + * will be created in the future with a normal device_add + * command. 
*/ +if (!populated) { +qemu_opts_del(opts); +} +nb_hp_dimms++; +} + +return 0; +} + #ifdef CONFIG_VIRTFS static int fsdev_init_func(QemuOpts *opts, void *opaque) { @@ -3859,6 +3905,11 @@ int main(int argc, char **argv, char **envp) } qemu_add_globals(); +/* init generic devices */ +if (qemu_opts_foreach(qemu_find_opts("device"), + dimmcfg_init_func, NULL, 1) != 0) { +exit(1); +} qdev_machine_init(); QEMUMachineInitArgs args = { .ram_size = ram_size, -- 1.7.9
[Qemu-devel] [RFC PATCH v4 18/30] Introduce paravirt interface QEMU_CFG_PCI_WINDOW
Qemu calculates the 32-bit and 64-bit PCI starting offsets based on initial memory and hotplug-able dimms. This info needs to be passed to Seabios for PCI initialization. Signed-off-by: Vasilis Liaskovitis --- hw/fw_cfg.h |1 + hw/pc_piix.c | 10 ++ hw/pc_q35.c |9 + 3 files changed, 20 insertions(+), 0 deletions(-) diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index 619a394..8b48493 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -27,6 +27,7 @@ #define FW_CFG_SETUP_SIZE 0x17 #define FW_CFG_SETUP_DATA 0x18 #define FW_CFG_FILE_DIR 0x19 +#define FW_CFG_PCI_WINDOW 0x1a #define FW_CFG_FILE_FIRST 0x20 #define FW_CFG_FILE_SLOTS 0x10 diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 1a99852..b6633e8 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -48,6 +48,7 @@ # include #endif #include "piix_pci.h" +#include "fw_cfg.h" #define MAX_IDE_BUS 2 @@ -86,6 +87,7 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *pci_memory; MemoryRegion *rom_memory; void *fw_cfg = NULL; +uint64_t *pci_window_fw_cfg; I440FXState *i440fx_host; PIIX3State *piix3; @@ -141,6 +143,14 @@ static void pc_init1(MemoryRegion *system_memory, qdev_init_nofail(DEVICE(i440fx_host)); bochs_meminfo_bios_init(fw_cfg); + +pci_window_fw_cfg = g_malloc0(2 * 8); +pci_window_fw_cfg[0] = cpu_to_le64(i440fx_host->mch.below_4g_mem_size); +pci_window_fw_cfg[1] = cpu_to_le64(0x1ULL + +i440fx_host->mch.above_4g_mem_size); +fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW, +(uint8_t *)pci_window_fw_cfg, 2 * 8); + i440fx_state = &i440fx_host->mch; pci_bus = i440fx_host->parent_obj.bus; /* Xen supports additional interrupt routes from the PCI devices to diff --git a/hw/pc_q35.c b/hw/pc_q35.c index 7ce0b53..e35814a 100644 --- a/hw/pc_q35.c +++ b/hw/pc_q35.c @@ -87,6 +87,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) PCIDevice *ahci; qemu_irq *cmos_s3; void *fw_cfg = NULL; +uint64_t *pci_window_fw_cfg; pc_cpus_init(cpu_model); @@ -139,6 +140,14 @@ static void pc_q35_init(QEMUMachineInitArgs *args) /* pci */ 
qdev_init_nofail(DEVICE(q35_host)); bochs_meminfo_bios_init(fw_cfg); + +pci_window_fw_cfg = g_malloc0(2 * 8); +pci_window_fw_cfg[0] = cpu_to_le64(MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT); +pci_window_fw_cfg[1] = cpu_to_le64(0x1ULL + +q35_host->mch.above_4g_mem_size); +fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW, +(uint8_t *)pci_window_fw_cfg, 2 * 8); + host_bus = q35_host->host.pci.bus; /* create ISA bus */ lpc = pci_create_simple_multifunction(host_bus, PCI_DEVFN(ICH9_LPC_DEV, -- 1.7.9
[Qemu-devel] [RFC PATCH v4 20/30] balloon: update with hotplugged memory
query-balloon and "info balloon" should report the total memory available to
the guest. Balloon inflate/deflate can also use all memory available to the
guest (initial + hotplugged memory).

The balloon driver has been minimally tested with the patch; please review and
test.

Caveat: if the guest does not online hotplugged memory, it's easy for a balloon
inflate command to OOM a guest.

Signed-off-by: Vasilis Liaskovitis
---
 hw/virtio-balloon.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index dd1a650..149e8ba 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -22,6 +22,7 @@
 #include "virtio-balloon.h"
 #include "kvm.h"
 #include "exec-memory.h"
+#include "dimm.h"
 
 #if defined(__linux__)
 #include
@@ -147,10 +148,11 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
     VirtIOBalloon *dev = to_virtio_balloon(vdev);
     struct virtio_balloon_config config;
     uint32_t oldactual = dev->actual;
+    uint64_t hotplugged_ram_size = get_hp_memory_total();
     memcpy(&config, config_data, 8);
     dev->actual = le32_to_cpu(config.actual);
     if (dev->actual != oldactual) {
-        qemu_balloon_changed(ram_size -
+        qemu_balloon_changed(ram_size + hotplugged_ram_size -
                        (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
     }
 }
@@ -188,17 +190,20 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
 
     info->actual = ram_size - ((uint64_t) dev->actual <<
                                VIRTIO_BALLOON_PFN_SHIFT);
+    info->actual += get_hp_memory_total();
 }
 
 static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
     VirtIOBalloon *dev = opaque;
+    uint64_t hotplugged_ram_size = get_hp_memory_total();
 
-    if (target > ram_size) {
-        target = ram_size;
+    if (target > ram_size + hotplugged_ram_size) {
+        target = ram_size + hotplugged_ram_size;
     }
     if (target) {
-        dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
+        dev->num_pages = (ram_size + hotplugged_ram_size - target) >>
+                         VIRTIO_BALLOON_PFN_SHIFT;
         virtio_notify_config(&dev->vdev);
     }
 }
-- 
1.7.9
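The arithmetic changed by the balloon patch can be checked in isolation. The following is a sketch with hypothetical function names, not the QEMU code: the balloon size in pages is the guest's total memory (initial plus hotplugged) minus the requested target, and the reported "actual" value is the same total minus the current balloon size. The page shift of 12 corresponds to VIRTIO_BALLOON_PFN_SHIFT (4 KiB pages).

```c
#include <assert.h>
#include <stdint.h>

#define BALLOON_PFN_SHIFT 12   /* 4 KiB balloon pages */

/* Number of pages the balloon must hold so the guest is left with target
 * bytes. Targets above the guest's total memory are clamped to the total. */
static uint32_t balloon_pages_for_target(uint64_t ram_size,
                                         uint64_t hotplugged_ram_size,
                                         uint64_t target)
{
    assert(hotplugged_ram_size <= UINT64_MAX - ram_size);
    uint64_t total = ram_size + hotplugged_ram_size;

    if (target > total) {
        target = total;
    }
    return (uint32_t)((total - target) >> BALLOON_PFN_SHIFT);
}

/* Memory left to the guest for a given balloon size, in bytes. */
static uint64_t balloon_actual(uint64_t ram_size,
                               uint64_t hotplugged_ram_size,
                               uint32_t balloon_pages)
{
    return ram_size + hotplugged_ram_size -
           ((uint64_t)balloon_pages << BALLOON_PFN_SHIFT);
}
```

For a guest with 1 GiB of initial RAM and 512 MiB hotplugged, a 1 GiB target inflates the balloon to 131072 pages. One difference from the patch: virtio_balloon_to_target() ignores a target of 0, while this sketch has no such special case.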
[Qemu-devel] [RFC PATCH v4 04/30] [SeaBIOS] acpi: generate hotplug memory devices
The memory device generation is guided by qemu paravirt info. Seabios first uses the info to setup SRAT entries for the hotplug-able memory slots. Afterwards, build_memssdt uses the created SRAT entries to generate appropriate memory device objects. One memory device (and corresponding SRAT entry) is generated for each hotplug-able qemu memslot. Currently no SSDT memory device is created for initial system memory. We only support up to 255 DIMMs for now (PackageOp used for the MEON array can only describe an array of at most 255 elements. VarPackageOp would be needed to support more than 255 devices) v1->v2: Seabios reads mems_sts from qemu to build e820_map SSDT size and some offsets are calculated with extraction macros. --- src/acpi.c | 158 +-- 1 files changed, 152 insertions(+), 6 deletions(-) diff --git a/src/acpi.c b/src/acpi.c index 6267d7b..82231da 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -14,6 +14,7 @@ #include "ioport.h" // inl #include "paravirt.h" // qemu_cfg_irq0_override #include "dev-q35.h" // qemu_cfg_irq0_override +#include "memmap.h" // /* ACPI tables init */ @@ -446,11 +447,26 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes) #define PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start) #define PCI_SLOTS 32 +/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */ +#define MEM_BASE 0xaf80 +#define MEM_AML (ssdm_mem_aml + *ssdt_mem_start) +#define MEM_SIZEOF (*ssdt_mem_end - *ssdt_mem_start) +#define MEM_OFFSET_HEX (*ssdt_mem_name - *ssdt_mem_start + 2) +#define MEM_OFFSET_ID (*ssdt_mem_id - *ssdt_mem_start) +#define MEM_OFFSET_PXM 31 +#define MEM_OFFSET_START 55 +#define MEM_OFFSET_END 63 +#define MEM_OFFSET_SIZE 79 + +u64 nb_hp_memslots = 0; +struct srat_memory_affinity *mem; + #define SSDT_SIGNATURE 0x54445353 // SSDT #define SSDT_HEADER_LENGTH 36 #include "ssdt-susp.hex" #include "ssdt-pcihp.hex" +#include "ssdt-mem.hex" #define PCI_RMV_BASE 0xae0c @@ -502,6 +518,111 @@ static void patch_pcihp(int slot, u8 *ssdt_ptr, u32 eject) } } +static void 
build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node) +{ +memcpy(ssdt_ptr, MEM_AML, MEM_SIZEOF); +ssdt_ptr[MEM_OFFSET_HEX] = getHex(i >> 4); +ssdt_ptr[MEM_OFFSET_HEX+1] = getHex(i); +ssdt_ptr[MEM_OFFSET_ID] = i; +ssdt_ptr[MEM_OFFSET_PXM] = node; +*(u64*)(ssdt_ptr + MEM_OFFSET_START) = mem_base; +*(u64*)(ssdt_ptr + MEM_OFFSET_END) = mem_base + mem_len; +*(u64*)(ssdt_ptr + MEM_OFFSET_SIZE) = mem_len; +} + +static void* +build_memssdt(void) +{ +u64 mem_base; +u64 mem_len; +u8 node; +int i; +struct srat_memory_affinity *entry = mem; +u64 nb_memdevs = nb_hp_memslots; +u8 memslot_status, enabled; + +int length = ((1+3+4) + + (nb_memdevs * MEM_SIZEOF) + + (1+2+5+(12*nb_memdevs)) + + (6+2+1+(1*nb_memdevs))); +u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length); +if (! ssdt) { +warn_noalloc(); +return NULL; +} +u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header); + +// build Scope(_SB_) header +*(ssdt_ptr++) = 0x10; // ScopeOp +ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3); +*(ssdt_ptr++) = '_'; +*(ssdt_ptr++) = 'S'; +*(ssdt_ptr++) = 'B'; +*(ssdt_ptr++) = '_'; + +for (i = 0; i < nb_memdevs; i++) { +mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low); +mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low); +node = entry->proximity[0]; +build_memdev(ssdt_ptr, i, mem_base, mem_len, node); +ssdt_ptr += MEM_SIZEOF; +entry++; +} + +// build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...}" +*(ssdt_ptr++) = 0x14; // MethodOp +ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2); +*(ssdt_ptr++) = 'M'; +*(ssdt_ptr++) = 'T'; +*(ssdt_ptr++) = 'F'; +*(ssdt_ptr++) = 'Y'; +*(ssdt_ptr++) = 0x02; +for (i=0; i> 4); +*(ssdt_ptr++) = getHex(i); +*(ssdt_ptr++) = 0x69; // Arg1Op +} + +// build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... 
})" +*(ssdt_ptr++) = 0x08; // NameOp +*(ssdt_ptr++) = 'M'; +*(ssdt_ptr++) = 'E'; +*(ssdt_ptr++) = 'O'; +*(ssdt_ptr++) = 'N'; +*(ssdt_ptr++) = 0x12; // PackageOp +ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2); +*(ssdt_ptr++) = nb_memdevs; + +entry = mem; +memslot_status = 0; + +for (i = 0; i < nb_memdevs; i++) { +enabled = 0; +if (i % 8 == 0) +memslot_status = inb(MEM_BASE + i/8); +enabled = memslot_status & 1; +mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low); +mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low); +*(ssdt_ptr++) = enabled ? 0x01 : 0x00; +if (enabled
[Qemu-devel] [RFC PATCH v4 09/30] Implement dimm device abstraction
Each hotplug-able memory slot is a DimmDevice. All DimmDevices are attached to a new bus called DimmBus. This bus is introduced so that we no longer depend on hotplug-capability of main system bus (the main bus does not allow hotplugging). The DimmBus should be attached to a chipset Device (i440fx in case of the pc) A hot-add operation for a particular dimm: - creates a new DimmDevice and attaches it to the DimmBus - creates a new MemoryRegion of the given physical address offset, size and node proximity, and attaches it to main system memory as a sub_region. Hotplug operations are done through normal device_add commands. Also add properties to DimmDevice. v3->v4: Removed hot-remove functions. Will be offered in separate patches. Signed-off-by: Vasilis Liaskovitis --- hw/Makefile.objs |2 +- hw/dimm.c| 245 ++ hw/dimm.h| 89 3 files changed, 335 insertions(+), 1 deletions(-) create mode 100644 hw/dimm.c create mode 100644 hw/dimm.h diff --git a/hw/Makefile.objs b/hw/Makefile.objs index d581d8d..51494c9 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -29,7 +29,7 @@ common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o common-obj-$(CONFIG_PCSPK) += pcspk.o common-obj-$(CONFIG_PCKBD) += pckbd.o common-obj-$(CONFIG_FDC) += fdc.o -common-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o acpi_ich9.o smbus_ich9.o +common-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o acpi_ich9.o smbus_ich9.o dimm.o common-obj-$(CONFIG_APM) += pm_smbus.o apm.o common-obj-$(CONFIG_DMA) += dma.o common-obj-$(CONFIG_I82374) += i82374.o diff --git a/hw/dimm.c b/hw/dimm.c new file mode 100644 index 000..e384952 --- /dev/null +++ b/hw/dimm.c @@ -0,0 +1,245 @@ +/* + * Dimm device for Memory Hotplug + * + * Copyright ProfitBricks GmbH 2012 + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see <http://www.gnu.org/licenses/> + */ + +#include "trace.h" +#include "qdev.h" +#include "dimm.h" +#include +#include "../exec-memory.h" +#include "qmp-commands.h" + +/* the following list is used to hold dimm config info before machine + * is initialized. After machine init, the list is not used anymore.*/ +static DimmConfiglist dimmconfig_list = + QTAILQ_HEAD_INITIALIZER(dimmconfig_list); + +/* the list of memory buses */ +static QLIST_HEAD(, DimmBus) memory_buses; + +static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent); +static char *dimmbus_get_fw_dev_path(DeviceState *dev); + +static Property dimm_properties[] = { +DEFINE_PROP_UINT64("start", DimmDevice, start, 0), +DEFINE_PROP_SIZE("size", DimmDevice, size, DEFAULT_DIMMSIZE), +DEFINE_PROP_UINT32("node", DimmDevice, node, 0), +DEFINE_PROP_BIT("populated", DimmDevice, populated, 0, false), +DEFINE_PROP_END_OF_LIST(), +}; + +static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent) +{ +} + +static char *dimmbus_get_fw_dev_path(DeviceState *dev) +{ +char path[40]; + +snprintf(path, sizeof(path), "%s", qdev_fw_name(dev)); +return strdup(path); +} + +static void dimm_bus_class_init(ObjectClass *klass, void *data) +{ +BusClass *k = BUS_CLASS(klass); + +k->print_dev = dimmbus_dev_print; +k->get_fw_dev_path = dimmbus_get_fw_dev_path; +} + +static void dimm_bus_initfn(Object *obj) +{ +DimmBus *bus = DIMM_BUS(obj); +QTAILQ_INIT(&bus->dimmconfig_list); +QTAILQ_INIT(&bus->dimmlist); +} + +static const TypeInfo dimm_bus_info = { +.name = TYPE_DIMM_BUS, +.parent = TYPE_BUS, +.instance_size = sizeof(DimmBus), 
+.instance_init = dimm_bus_initfn, +.class_init = dimm_bus_class_init, +}; + +DimmBus *dimm_bus_create(Object *parent, const char *name, uint32_t max_dimms, +dimm_calcoffset_fn pmc_set_offset) +{ +DimmBus *memory_bus; +DimmConfig *dimm_cfg, *next_cfg; +uint32_t num_dimms = 0; + +memory_bus = g_malloc0(dimm_bus_info.instance_size); +memory_bus->qbus.name = name ? g_strdup(name) : "membus.0"; +qbus_create_inplace(&memory_bus->qbus, TYPE_DIMM_BUS, DEVICE(parent), + name); + +QTAILQ_FOREACH_SAFE(dimm_cfg, &dimmconfig_list, nextdimmcfg, next_cfg) { +if (!strcmp(memory_bus->qbus.name, di
[Qemu-devel] [RFC PATCH v4 02/30] [SeaBIOS] Add SSDT memory device support
Define SSDT hotplug-able memory devices in _SB namespace. The dynamically generated SSDT includes per memory device hotplug methods. These methods just call methods defined in the DSDT. Also dynamically generate a MTFY method and a MEON array of the online/available memory devices. ACPI extraction macros are used to place the AML code in variables later used by src/acpi. The design is taken from SSDT cpu generation. v3->v4: EJ0 operation will be provided separately --- Makefile |2 +- src/ssdt-mem.dsl | 62 ++ 2 files changed, 63 insertions(+), 1 deletions(-) create mode 100644 src/ssdt-mem.dsl diff --git a/Makefile b/Makefile index f28d86c..c8fcc57 100644 --- a/Makefile +++ b/Makefile @@ -220,7 +220,7 @@ $(OUT)%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py ./tools/acpi_extract.p $(Q)$(PYTHON) ./tools/acpi_extract.py $(OUT)$*.lst > $(OUT)$*.off $(Q)cat $(OUT)$*.off > $@ -$(OUT)acpi.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-susp.hex $(OUT)q35-acpi-dsdt.hex +$(OUT)acpi.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-susp.hex $(OUT)q35-acpi-dsdt.hex $(OUT)ssdt-mem.hex Kconfig rules diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl new file mode 100644 index 000..dbac33f --- /dev/null +++ b/src/ssdt-mem.dsl @@ -0,0 +1,62 @@ +/* This file is the basis for the ssdt_mem[] variable in src/acpi.c. + * It is similar in design to the ssdt_proc variable. + * It defines the contents of the per-dimm QWordMemory() object. At + * runtime, a dynamically generated SSDT will contain one copy of this + * AML snippet for every possible memory device in the system. The + * objects will * be placed in the \_SB_ namespace. + * + * In addition to the aml code generated from this file, the + * src/acpi.c file creates a MTFY method with an entry for each memdevice: + * Method(MTFY, 2) { + * If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) } + * If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) } + * ... 
+ * } + * and a MEON array with the list of active and inactive memory devices: + * Name(MEON, Package() { One, One, ..., Zero, Zero, ... }) + */ +ACPI_EXTRACT_ALL_CODE ssdm_mem_aml + +DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1) +/* v-- DO NOT EDIT --v */ +{ +ACPI_EXTRACT_DEVICE_START ssdt_mem_start +ACPI_EXTRACT_DEVICE_END ssdt_mem_end +ACPI_EXTRACT_DEVICE_STRING ssdt_mem_name +Device(MPAA) { +ACPI_EXTRACT_NAME_BYTE_CONST ssdt_mem_id +Name(ID, 0xAA) +/* ^-- DO NOT EDIT --^ + * + * The src/acpi.c code requires the above layout so that it can update + * MPAA and 0xAA with the appropriate MEMDEVICE id (see + * SD_OFFSET_MEMHEX/MEMID1/MEMID2). Don't change the above without + * also updating the C code. + */ +Name(_HID, EISAID("PNP0C80")) +Name(_PXM, 0xAA) + +External(CMST, MethodObj) +External(MPEJ, MethodObj) + +Name(_CRS, ResourceTemplate() { +QwordMemory( + ResourceConsumer, + , + MinFixed, + MaxFixed, + Cacheable, + ReadWrite, + 0x0, + 0xDEADBEEF, + 0xE6ADBEEE, + 0x, + 0x0800, + ) +}) +Method (_STA, 0) { +Return(CMST(ID)) +} +} +} + -- 1.7.9
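As a rough illustration of how the "MPAA" placeholder becomes a per-slot device name, the sketch below mirrors the two-hex-digit patching that the C side applies to the template. hex_digit() and memdev_name() are hypothetical names used only here; in SeaBIOS the bytes are patched in place inside the copied AML, at offsets produced by the extraction macros, rather than in a separate string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: low nibble of val as an uppercase hex digit. */
char hex_digit(unsigned val)
{
    val &= 0x0f;
    return (char)(val <= 9 ? '0' + val : 'A' + val - 10);
}

/* Hypothetical helper: write the name for memory slot `slot` into
 * `name` (4 characters plus NUL), starting from the "MPAA" template. */
void memdev_name(uint8_t slot, char name[5])
{
    assert(name != NULL);
    memcpy(name, "MPAA", 5);            /* template, including the NUL */
    name[2] = hex_digit(slot >> 4);     /* high nibble replaces first 'A' */
    name[3] = hex_digit(slot);          /* low nibble replaces second 'A' */
}
```

Slot 0x1f would end up as MP1F under this scheme, which is also why two hex digits are enough for the 255-device limit mentioned in the series.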
[Qemu-devel] [RFC PATCH v4 11/30] acpi_piix4 : Implement memory device hotplug registers
A 32-byte register is used to present up to 256 hotplug-able memory devices to BIOS and OSPM, one bit per device. Hot-add and hot-remove functions trigger an ACPI hotplug event through these. Only reads are allowed from these registers.

An ACPI hot-remove event, however, needs to wait for OSPM to eject the device. We use a single-byte register to know when OSPM has called the _EJ function for a particular dimm. A write to this byte will depopulate the respective dimm. Only writes are allowed to this byte.

v1->v2: mems_sts address moved from 0xaf20 to 0xaf80 (to leave more space for cpu-hotplugging in the future). _EJ array is reduced to a single byte. Add documentation in docs/specs/acpi_hotplug.txt

v3->v4: Removed hot-remove functions, will be added separately. Updated for memory API.

Signed-off-by: Vasilis Liaskovitis
---
 docs/specs/acpi_hotplug.txt | 14 +
 hw/acpi.h |5 +++
 hw/acpi_piix4.c | 65 +-
 3 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
new file mode 100644
index 000..8391713
--- /dev/null
+++ b/docs/specs/acpi_hotplug.txt
@@ -0,0 +1,14 @@
+QEMU<->ACPI BIOS hotplug interface
+--
+This document describes the interface between QEMU and the ACPI BIOS for non-PCI
+space. For the PCI interface please look at docs/specs/acpi_pci_hotplug.txt
+
+QEMU<->ACPI BIOS memory hotplug interface
+--
+
+Memory Dimm status array (IO port 0xaf80-0xaf9f, 1-byte access):
+---
+Dimm hot-plug notification pending. One bit per slot.
+
+Read by ACPI BIOS GPE.3 handler to notify OS of memory hot-add or hot-remove
+events. Read-only.
diff --git a/hw/acpi.h b/hw/acpi.h index afda153..dc617d3 100644 --- a/hw/acpi.h +++ b/hw/acpi.h @@ -120,6 +120,11 @@ struct ACPIREGS { Notifier wakeup; }; +#include "dimm.h" +struct gpe_regs { +uint8_t mems_sts[DIMM_BITMAP_BYTES]; +}; + /* PM_TMR */ void acpi_pm_tmr_update(ACPIREGS *ar, bool enable); void acpi_pm_tmr_calc_overflow_time(ACPIREGS *ar); diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c index 0b5b0d3..879d8a0 100644 --- a/hw/acpi_piix4.c +++ b/hw/acpi_piix4.c @@ -29,6 +29,8 @@ #include "ioport.h" #include "fw_cfg.h" #include "exec-memory.h" +#include "sysbus.h" +#include "dimm.h" //#define DEBUG @@ -47,7 +49,9 @@ #define PCI_DOWN_BASE 0xae04 #define PCI_EJ_BASE 0xae08 #define PCI_RMV_BASE 0xae0c +#define MEM_BASE 0xaf80 +#define PIIX4_MEM_HOTPLUG_STATUS 8 #define PIIX4_PCI_HOTPLUG_STATUS 2 struct pci_status { @@ -60,6 +64,7 @@ typedef struct PIIX4PMState { MemoryRegion io; MemoryRegion io_gpe; MemoryRegion io_pci; +MemoryRegion io_memhp; ACPIREGS ar; APMState apm; @@ -74,6 +79,7 @@ typedef struct PIIX4PMState { Notifier powerdown_notifier; /* for pci hotplug */ +struct gpe_regs gperegs; struct pci_status pci0_status; uint32_t pci0_hotplug_enable; uint32_t pci0_slot_device_present; @@ -98,8 +104,8 @@ static void pm_update_sci(PIIX4PMState *s) ACPI_BITMASK_POWER_BUTTON_ENABLE | ACPI_BITMASK_GLOBAL_LOCK_ENABLE | ACPI_BITMASK_TIMER_ENABLE)) != 0) || -(((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) - & PIIX4_PCI_HOTPLUG_STATUS) != 0); +(((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) & + (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_MEM_HOTPLUG_STATUS)) != 0); qemu_set_irq(s->irq, sci_level); /* schedule a timer interruption if needed */ @@ -526,6 +532,29 @@ static const MemoryRegionOps piix4_gpe_ops = { .endianness = DEVICE_LITTLE_ENDIAN, }; +static uint32_t memhp_readb(void *opaque, uint32_t addr) +{ +PIIX4PMState *s = opaque; +uint32_t val = 0; +struct gpe_regs *g = &s->gperegs; +if (addr < DIMM_BITMAP_BYTES) { +val = (uint32_t) g->mems_sts[addr]; +} +PIIX4_DPRINTF(stderr, "memhp read 
%x == %x\n", addr, val); +return val; +} + +static const MemoryRegionOps piix4_memhp_ops = { +.old_portio = (MemoryRegionPortio[]) { +{ +.offset = 0, .len = DIMM_BITMAP_BYTES, .size = 1, +.read = memhp_readb, +}, +PORTIO_END_OF_LIST() +}, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + static uint32_t pci_up_read(void *opaque, uint32_t addr) { PIIX4PMState *s = opaque; @@ -592,9 +621,11 @@ static const MemoryRegionOps piix4_pci_ops = { static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, PCIHotplugState state); +static int piix4_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int add); static void piix4_acpi_syste
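The register layout described in this patch (one status bit per slot, read one byte at a time) can be sketched as follows. This is a minimal standalone model, not the QEMU code: the struct and function names are made up for illustration, and DIMM_BITMAP_BYTES is assumed to be 32, as implied by the 0xaf80-0xaf9f port range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32 bytes, i.e. 256 slots with one bit each. */
#define DIMM_BITMAP_BYTES 32

/* Hypothetical stand-in for the mems_sts part of struct gpe_regs. */
struct memhp_regs {
    uint8_t mems_sts[DIMM_BITMAP_BYTES];
};

/* Set or clear the status bit of slot `idx`. */
void memhp_set_slot(struct memhp_regs *g, unsigned idx, int present)
{
    assert(idx < DIMM_BITMAP_BYTES * 8);
    if (present) {
        g->mems_sts[idx / 8] |= (uint8_t)(1u << (idx % 8));
    } else {
        g->mems_sts[idx / 8] &= (uint8_t)~(1u << (idx % 8));
    }
}

/* Byte-wide read at `offset` from the base port; reads outside the
 * bitmap return 0, as memhp_readb() does for out-of-range addresses. */
uint8_t memhp_read_byte(const struct memhp_regs *g, uint32_t offset)
{
    return offset < DIMM_BITMAP_BYTES ? g->mems_sts[offset] : 0;
}
```

Slot 10, for example, lives in byte 1 (port 0xaf81 under the layout above) as bit 2, so the BIOS would read 0x04 there once that dimm is populated.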
[Qemu-devel] [RFC PATCH v4 00/30] ACPI memory hotplug
n state. series is based on: - qemu master (commit a8a826a3) + patch: https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02699.html - seabios master (commit a810e4e7) Can also be found at: http://github.com/vliaskov/qemu-kvm/commits/memhp-v4 http://github.com/vliaskov/seabios/commits/memhp-v4 Vasilis Liaskovitis (21): qapi: make visit_type_size fallback to type_int Add SIZE type to qdev properties qemu-option: export parse_option_number Implement dimm device abstraction vl: handle "-device dimm" acpi_piix4 : Implement memory device hotplug registers acpi_ich9 : Implement memory device hotplug registers piix_pci and pc_piix: refactor piix_pci: Add i440fx dram controller initialization q35: Add i440fx dram controller initialization pc: Add dimm paravirt SRAT info Introduce paravirt interface QEMU_CFG_PCI_WINDOW Implement "info memory-total" and "query-memory-total" balloon: update with hotplugged memory Implement dimm-info dimm: add hot-remove capability acpi_piix4: add hot-remove capability acpi_ich9: add hot-remove capability Implement qmp and hmp commands for notification lists Add _OST dimm support Implement _PS3 for dimm docs/specs/acpi_hotplug.txt | 54 ++ docs/specs/fwcfg.txt| 28 +++ hmp-commands.hx |6 + hmp.c | 41 hmp.h |3 + hw/Makefile.objs|2 +- hw/acpi.h |5 + hw/acpi_ich9.c | 115 +++- hw/acpi_ich9.h | 12 +- hw/acpi_piix4.c | 126 - hw/dimm.c | 444 +++ hw/dimm.h | 102 ++ hw/fw_cfg.h |1 + hw/lpc_ich9.c |2 +- hw/pc.c | 28 +++- hw/pc.h |1 + hw/pc_piix.c| 74 ++-- hw/pc_q35.c | 18 ++- hw/piix_pci.c | 249 - hw/q35.c| 27 +++ hw/q35.h|5 + hw/qdev-properties.c| 60 ++ hw/qdev-properties.h|3 + hw/virtio-balloon.c | 13 +- monitor.c | 21 ++ qapi-schema.json| 63 ++ qapi/qapi-visit-core.c | 11 +- qemu-option.c |4 +- qemu-option.h |4 + qmp-commands.hx | 57 ++ sysemu.h|1 + vl.c| 60 ++ 32 files changed, 1432 insertions(+), 208 deletions(-) create mode 100644 docs/specs/acpi_hotplug.txt create mode 100644 docs/specs/fwcfg.txt create mode 100644 hw/dimm.c create mode 100644 
hw/dimm.h Vasilis Liaskovitis (9): Add ACPI_EXTRACT_DEVICE* macros Add SSDT memory device support acpi-dsdt: Implement functions for memory hotplug acpi: generate hotplug memory devices q35: Add memory hotplug handler pci: Use paravirt interface for pcimem_start and pcimem64_start acpi: add _EJ0 operation and eject port for memory devices Add _OST dimm method Implement _PS3 method for memory device Makefile |2 +- src/acpi-dsdt-mem-hotplug.dsl | 136 +++ src/acpi-dsdt.dsl |5 +- src/acpi.c| 158 +++-- src/paravirt.c|6 ++ src/paravirt.h|2 + src/pciinit.c |9 +++ src/q35-acpi-dsdt.dsl |6 +- src/ssdt-mem.dsl | 73 +++ tools/acpi_extract.py | 28 +++ 10 files changed, 415 insertions(+), 10 deletions(-) create mode 100644 src/acpi-dsdt-mem-hotplug.dsl create mode 100644 src/ssdt-mem.dsl -- 1.7.9
[Qemu-devel] [RFC PATCH v4 23/30] dimm: add hot-remove capability
On a successful _EJ0 operation, unmap the device from the guest by using the new qdev function qdev_unplug_complete, see:
https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02699.html

The memory of the device should be freed when the last subsystem using it unmaps it, see the following two series:
https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00728.html
https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02697.html

Needs testing. Other subsystems (e.g. virtio-blk) may have to install new memory listeners to complete pending I/O before device memory can be freed.

Signed-off-by: Vasilis Liaskovitis
---
 hw/dimm.c | 51 +++
 hw/dimm.h |1 +
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/hw/dimm.c b/hw/dimm.c
index e79f23d..0b4e22d 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -120,6 +120,18 @@ static void dimm_populate(DimmDevice *s)
     s->mr = new;
 }
 
+static int dimm_depopulate(DeviceState *dev)
+{
+    DimmDevice *s = DIMM(dev);
+    assert(s);
+    vmstate_unregister_ram(s->mr, NULL);
+    memory_region_del_subregion(get_system_memory(), s->mr);
+    memory_region_destroy(s->mr);
+    s->populated = false;
+    s->mr = NULL;
+    return 0;
+}
+
 void dimm_config_create(char *id, uint64_t size, const char *bus, uint64_t node,
         uint32_t dimm_idx, uint32_t populated)
 {
@@ -159,6 +171,11 @@ static void dimm_plug_device(DimmDevice *slot)
 
 static int dimm_unplug_device(DeviceState *qdev)
 {
+    DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(qdev));
+
+    if (bus->dimm_hotplug) {
+        bus->dimm_hotplug(bus->dimm_hotplug_qdev, DIMM(qdev), 0);
+    }
     return 1;
 }
 
@@ -186,6 +203,21 @@ static DimmDevice *dimm_find_from_name(DimmBus *bus, const char *name)
     return NULL;
 }
 
+static DimmDevice *dimm_find_from_idx(uint32_t idx)
+{
+    DimmDevice *slot;
+    DimmBus *bus;
+
+    QLIST_FOREACH(bus, &memory_buses, next) {
+        QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) {
+            if (slot->idx == idx) {
+                return slot;
+            }
+        }
+    }
+    return NULL;
+}
+
 void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots)
 {
     DimmConfig *slot;
@@ -275,6 +307,24 @@ static int dimm_init(DeviceState *s)
     return 0;
 }
 
+void dimm_notify(uint32_t idx, uint32_t event)
+{
+    DimmBus *bus;
+    DimmDevice *slot;
+
+    slot = dimm_find_from_idx(idx);
+    assert(slot != NULL);
+    bus = DIMM_BUS(qdev_get_parent_bus(&slot->qdev));
+
+    switch (event) {
+    case DIMM_REMOVE_SUCCESS:
+        qdev_unplug_complete((DeviceState *)slot, NULL);
+        QTAILQ_REMOVE(&bus->dimmlist, slot, nextdimm);
+        break;
+    default:
+        break;
+    }
+}
 
 static void dimm_class_init(ObjectClass *klass, void *data)
 {
@@ -283,6 +333,7 @@ static void dimm_class_init(ObjectClass *klass, void *data)
     dc->props = dimm_properties;
     dc->unplug = dimm_unplug_device;
     dc->init = dimm_init;
+    dc->exit = dimm_depopulate;
     dc->bus_type = TYPE_DIMM_BUS;
 }
 
diff --git a/hw/dimm.h b/hw/dimm.h
index 5130b2c..86c7cd5 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -86,5 +86,6 @@ DimmBus *dimm_bus_create(Object *parent, const char *name, uint32_t max_dimms,
 void dimm_config_create(char *id, uint64_t size, const char *bus, uint64_t node,
         uint32_t dimm_idx, uint32_t populated);
 uint64_t get_hp_memory_total(void);
+void dimm_notify(uint32_t idx, uint32_t event);
 
 #endif
-- 
1.7.9
[Qemu-devel] [RFC PATCH v4 29/30] [SeaBIOS] Implement _PS3 method for memory device
---
 src/acpi-dsdt-mem-hotplug.dsl | 15 +++
 src/ssdt-mem.dsl |4
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt-mem-hotplug.dsl b/src/acpi-dsdt-mem-hotplug.dsl
index a648bee..7d7c078 100644
--- a/src/acpi-dsdt-mem-hotplug.dsl
+++ b/src/acpi-dsdt-mem-hotplug.dsl
@@ -49,6 +49,13 @@ Scope(\_SB) {
         MIF, 8
     }
 
+    /* Memory _PS3 byte */
+    OperationRegion(MPSB, SystemIO, 0xafa4, 1)
+    Field (MPSB, ByteAcc, NoLock, Preserve)
+    {
+        MPS, 8
+    }
+
     Method(MESC, 0) {
         // Local5 = active memdevice bitmap
         Store (MES, Local5)
@@ -90,6 +97,14 @@ Scope(\_SB) {
         Sleep(200)
     }
 
+
+    Method (MPS3, 1, NotSerialized) {
+        // _PS3 method - power-off method
+        Store(Arg0, MPS)
+        Store(Zero, Index(MEON, Arg0))
+        Sleep(200)
+    }
+
     Method (MOST, 3, Serialized) {
         // _OST method - OS status indication
         Switch (And(Arg0, 0xFF)) {
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
index 47a3b4f..9827a58 100644
--- a/src/ssdt-mem.dsl
+++ b/src/ssdt-mem.dsl
@@ -39,6 +39,7 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
         External(CMST, MethodObj)
         External(MPEJ, MethodObj)
         External(MOST, MethodObj)
+        External(MPS3, MethodObj)
 
         Name(_CRS, ResourceTemplate() {
             QwordMemory(
@@ -64,6 +65,9 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
         Method (_OST, 3) {
             MOST(Arg0, Arg1, ID)
         }
+        Method (_PS3, 0) {
+            MPS3(ID)
+        }
     }
 }
 
-- 
1.7.9
[Qemu-devel] [RFC PATCH v4 21/30] Implement dimm-info
"query-dimm-info" and "info dimm" will give current state of all dimms in the system e.g. dimm0: on dimm1: off dimm2: off dimm3: on etc. Signed-off-by: Vasilis Liaskovitis --- hmp-commands.hx |2 ++ hmp.c| 17 + hmp.h|1 + hw/dimm.c| 43 +++ monitor.c|7 +++ qapi-schema.json | 26 ++ 6 files changed, 96 insertions(+), 0 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 3fbd975..65d799e 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1572,6 +1572,8 @@ show qdev device model list show roms @item info memory-total show memory-total +@item info dimm +show dimm @end table ETEXI diff --git a/hmp.c b/hmp.c index fb39b0d..f8456fd 100644 --- a/hmp.c +++ b/hmp.c @@ -635,6 +635,23 @@ void hmp_info_memory_total(Monitor *mon) monitor_printf(mon, "MemTotal: %lu\n", ram_total); } +void hmp_info_dimm(Monitor *mon) +{ +DimmInfoList *info; +DimmInfoList *item; +DimmInfo *dimm; + +info = qmp_query_dimm_info(NULL); +for (item = info; item; item = item->next) { +dimm = item->value; +monitor_printf(mon, "dimm %s : %s\n", dimm->dimm, +dimm->state ? 
"on" : "off"); +dimm->dimm = NULL; +} + +qapi_free_DimmInfoList(info); +} + void hmp_quit(Monitor *mon, const QDict *qdict) { monitor_suspend(mon); diff --git a/hmp.h b/hmp.h index 25a3a70..74ac061 100644 --- a/hmp.h +++ b/hmp.h @@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon); void hmp_info_pci(Monitor *mon); void hmp_info_block_jobs(Monitor *mon); void hmp_info_memory_total(Monitor *mon); +void hmp_info_dimm(Monitor *mon); void hmp_quit(Monitor *mon, const QDict *qdict); void hmp_stop(Monitor *mon, const QDict *qdict); void hmp_system_reset(Monitor *mon, const QDict *qdict); diff --git a/hw/dimm.c b/hw/dimm.c index f181e54..e79f23d 100644 --- a/hw/dimm.c +++ b/hw/dimm.c @@ -174,6 +174,18 @@ static DimmConfig *dimmcfg_find_from_name(DimmBus *bus, const char *name) return NULL; } +static DimmDevice *dimm_find_from_name(DimmBus *bus, const char *name) +{ +DimmDevice *slot; + +QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) { +if (!strcmp(slot->qdev.id, name)) { +return slot; +} +} +return NULL; +} + void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots) { DimmConfig *slot; @@ -203,6 +215,37 @@ uint64_t get_hp_memory_total(void) return info; } +DimmInfoList *qmp_query_dimm_info(Error **errp) +{ +DimmBus *bus; +DimmConfig *slot; +DimmInfoList *head = NULL, *info, *cur_item = NULL; + +QLIST_FOREACH(bus, &memory_buses, next) { +QTAILQ_FOREACH(slot, &bus->dimmconfig_list, nextdimmcfg) { + +info = g_malloc0(sizeof(*info)); +info->value = g_malloc0(sizeof(*info->value)); +info->value->dimm = g_malloc0(sizeof(char) * 32); +strcpy(info->value->dimm, slot->name); +if (dimm_find_from_name(bus, slot->name)) { +info->value->state = 1; +} else { +info->value->state = 0; +} +/* XXX: waiting for the qapi to support GSList */ +if (!cur_item) { +head = cur_item = info; +} else { +cur_item->next = info; +cur_item = info; +} +} +} + +return head; +} + static int dimm_init(DeviceState *s) { DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(s)); diff --git a/monitor.c b/monitor.c index 
6e87d0d..de1dcf1 100644 --- a/monitor.c +++ b/monitor.c @@ -2743,6 +2743,13 @@ static mon_cmd_t info_cmds[] = { .mhandler.info = do_trace_print_events, }, { +.name = "dimm", +.args_type = "", +.params = "", +.help = "show active and non active dimms", +.mhandler.info = hmp_info_dimm, +}, +{ .name = NULL, }, }; diff --git a/qapi-schema.json b/qapi-schema.json index 33f88d6..5a20577 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2914,6 +2914,32 @@ { 'command': 'query-memory-total', 'returns': 'int' } ## +# @DimmInfo: +# +# Information about status of a memory hotplug command +# +# @dimm: the Dimm associated with the result +# +# @result: the result of the hotplug command +# +# Since: 1.4 +# +## +{ 'type': 'DimmInfo', + 'data': {'dimm': 'str', 'state': 'bool'} } + +## +# @query-dimm-info: +# +# Returns total memory in bytes, including hotplugged dimms +# +# Returns: int +# +# Since: 1.4 +## +{ 'command': 'query-dimm-info', 'returns': ['DimmInfo'] } + +## # @QKeyCode: # # An enumeration of key name. -- 1.7.9
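The head/cur_item bookkeeping in qmp_query_dimm_info() is a plain tail-append on a singly linked list. A standalone sketch of that pattern is below; the struct and function names are invented for the example and are not the generated QAPI types. It copies the name with a bounded snprintf() instead of strcpy(), which is one possible way to keep an over-long id from overrunning the 32-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define DIMM_NAME_MAX 32    /* mirrors the 32-byte name buffer in the patch */

/* Hypothetical list node, standing in for DimmInfoList/DimmInfo. */
struct dimm_state {
    char name[DIMM_NAME_MAX];
    int on;
    struct dimm_state *next;
};

/* Append one entry at the tail. *head and *tail must both start out
 * NULL. Returns the new node, or NULL if allocation fails. */
struct dimm_state *dimm_state_append(struct dimm_state **head,
                                     struct dimm_state **tail,
                                     const char *name, int on)
{
    struct dimm_state *node;

    assert(head && tail && name);
    node = calloc(1, sizeof(*node));
    if (!node) {
        return NULL;
    }
    /* Bounded copy: names longer than the buffer are truncated. */
    snprintf(node->name, sizeof(node->name), "%s", name);
    node->on = on;
    if (!*tail) {
        *head = node;           /* first element becomes the head */
    } else {
        (*tail)->next = node;   /* otherwise link after the current tail */
    }
    *tail = node;
    return node;
}

/* Release every node of the list. */
void dimm_state_free(struct dimm_state *head)
{
    while (head) {
        struct dimm_state *next = head->next;
        free(head);
        head = next;
    }
}
```

Keeping a tail pointer preserves the slot order in the output ("dimm0", "dimm1", ...) without walking the list on every append.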
[Qemu-devel] [RFC PATCH v4 28/30] Add _OST dimm support
This allows qemu to receive notifications from the guest OS on success or failure of a memory hotplug request. The guest OS needs to implement the _OST functionality for this to work (linux-next: http://lkml.org/lkml/2012/6/25/321) This patch also updates dimm bitmap state and hot-remove pending flag on hot-remove fail. This allows failed hot operations to be retried at anytime (only works for guests that use _OST notification). Also adds new _OST registers in docs/specs/acpi_hotplug.txt --- docs/specs/acpi_hotplug.txt | 25 + hw/acpi_ich9.c | 31 --- hw/acpi_ich9.h |3 +++ hw/acpi_piix4.c | 35 --- hw/dimm.c | 28 +++- hw/dimm.h | 11 ++- 6 files changed, 125 insertions(+), 8 deletions(-) diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt index cf86242..536da16 100644 --- a/docs/specs/acpi_hotplug.txt +++ b/docs/specs/acpi_hotplug.txt @@ -20,3 +20,28 @@ ejected. Written by ACPI memory device _EJ0 method to notify qemu of successfull hot-removal. Write-only. + +Memory Dimm ejection failure notification (IO port 0xafa1, 1-byte access): +--- +Dimm hot-remove _OST notification. Byte value indicates Dimm slot for which +ejection failed. + +Written by ACPI memory device _OST method to notify qemu of failed +hot-removal. Write-only. + +Memory Dimm insertion success notification (IO port 0xafa2, 1-byte access): +--- +Dimm hot-remove _OST notification. Byte value indicates Dimm slot for which +insertion succeeded. + +Written by ACPI memory device _OST method to notify qemu of failed +hot-add. Write-only. + +Memory Dimm insertion failure notification (IO port 0xafa3, 1-byte access): +--- +Dimm hot-remove _OST notification. Byte value indicates Dimm slot for which +insertion failed. + +Written by ACPI memory device _OST method to notify qemu of failed +hot-add. Write-only. 
+ diff --git a/hw/acpi_ich9.c b/hw/acpi_ich9.c index f5dc1c9..2705230 100644 --- a/hw/acpi_ich9.c +++ b/hw/acpi_ich9.c @@ -111,6 +111,15 @@ static void memhp_writeb(void *opaque, uint32_t addr, uint32_t val) case ICH9_MEM_EJ_BASE - ICH9_MEM_BASE: dimm_notify(val, DIMM_REMOVE_SUCCESS); break; +case ICH9_MEM_OST_REMOVE_FAIL - ICH9_MEM_BASE: +dimm_notify(val, DIMM_REMOVE_FAIL); +break; +case ICH9_MEM_OST_ADD_SUCCESS - ICH9_MEM_BASE: +dimm_notify(val, DIMM_ADD_SUCCESS); +break; +case ICH9_MEM_OST_ADD_FAIL - ICH9_MEM_BASE: +dimm_notify(val, DIMM_ADD_FAIL); +break; default: ICH9_DEBUG("memhp write invalid %x <== %d\n", addr, val); } @@ -125,7 +134,7 @@ static const MemoryRegionOps ich9_memhp_ops = { }, { .offset = ICH9_MEM_EJ_BASE - ICH9_MEM_BASE, -.len = 1, .size = 1, +.len = 4, .size = 1, .write = memhp_writeb, }, PORTIO_END_OF_LIST() @@ -274,6 +283,22 @@ static int ich9_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int return 0; } +static int ich9_dimm_revert(DeviceState *qdev, DimmDevice *dev, int add) +{ +PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, qdev); +ICH9LPCState *s = DO_UPCAST(ICH9LPCState, d, pci_dev); +struct gpe_regs *g = &s->pm.gperegs; +DimmDevice *slot = DIMM(dev); +int idx = slot->idx; + +if (add) { +g->mems_sts[idx/8] &= ~(1 << (idx%8)); +} else { +g->mems_sts[idx/8] |= (1 << (idx%8)); +} +return 0; +} + void ich9_pm_init(void *device, qemu_irq sci_irq, qemu_irq cmos_s3) { ICH9LPCState *lpc = (ICH9LPCState *)device; @@ -296,10 +321,10 @@ void ich9_pm_init(void *device, qemu_irq sci_irq, qemu_irq cmos_s3) memory_region_add_subregion(&pm->io, ICH9_PMIO_SMI_EN, &pm->io_smi); memory_region_init_io(&pm->io_memhp, &ich9_memhp_ops, pm, "apci-memhp0", - DIMM_BITMAP_BYTES + 1); + DIMM_BITMAP_BYTES + 4); memory_region_add_subregion(get_system_io(), ICH9_MEM_BASE, &pm->io_memhp); -dimm_bus_hotplug(ich9_dimm_hotplug, &lpc->d.qdev); +dimm_bus_hotplug(ich9_dimm_hotplug, ich9_dimm_revert, &lpc->d.qdev); pm->irq = sci_irq; qemu_register_reset(pm_reset, pm); 
diff --git a/hw/acpi_ich9.h b/hw/acpi_ich9.h index af61a2d..8f57cd8 100644 --- a/hw/acpi_ich9.h +++ b/hw/acpi_ich9.h @@ -26,6 +26,9 @@ #define ICH9_MEM_BASE0xaf80 #define ICH9_MEM_EJ_BASE0xafa0 #define ICH9_MEM_HOTPLUG_STATUS 8 +#define ICH9_MEM_OST_REMOVE_FAIL 0xafa1 +#define ICH9_MEM_OST_ADD_SUCCESS 0xafa2 +#define ICH9_MEM_OST_ADD_FAIL 0xafa3 typedef struct ICH9LPCPMRegs { /* diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4
[Qemu-devel] [RFC PATCH v4 17/30] [SeaBIOS] pci: Use paravirt interface for pcimem_start and pcimem64_start
Initialize the 32-bit and 64-bit pci starting offsets from values passed in by
the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the starting
offsets based on initial memory and hotplug-able dimms.

It's possible to avoid the new paravirt interface, and calculate pci ranges from
srat entries. But the code changes are ugly, see:
http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03548.html
---
 src/paravirt.c |    6 ++
 src/paravirt.h |    2 ++
 src/pciinit.c  |    9 +
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/src/paravirt.c b/src/paravirt.c
index 4b5c441..f7517b9 100644
--- a/src/paravirt.c
+++ b/src/paravirt.c
@@ -347,3 +347,9 @@ void qemu_cfg_romfile_setup(void)
         dprintf(3, "Found fw_cfg file: %s (size=%d)\n", file->name, file->size);
     }
 }
+
+void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start)
+{
+    qemu_cfg_read_entry(pcimem_start, QEMU_CFG_PCI_WINDOW, sizeof(u64));
+    qemu_cfg_read((u8*)(pcimem64_start), sizeof(u64));
+}
diff --git a/src/paravirt.h b/src/paravirt.h
index a284c41..b53ff88 100644
--- a/src/paravirt.h
+++ b/src/paravirt.h
@@ -35,6 +35,7 @@ static inline int kvm_para_available(void)
 #define QEMU_CFG_BOOT_MENU 0x0e
 #define QEMU_CFG_MAX_CPUS 0x0f
 #define QEMU_CFG_FILE_DIR 0x19
+#define QEMU_CFG_PCI_WINDOW 0x1a
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1)
@@ -65,5 +66,6 @@ struct e820_reservation {
 u32 qemu_cfg_e820_entries(void);
 void* qemu_cfg_e820_load_next(void *addr);
 void qemu_cfg_romfile_setup(void);
+void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start);
 
 #endif
diff --git a/src/pciinit.c b/src/pciinit.c
index a406bbd..4103d2d 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -734,6 +734,7 @@ static void pci_bios_map_devices(struct pci_bus *busses)
 void
 pci_setup(void)
 {
+    u64 pv_pcimem_start, pv_pcimem64_start;
     if (CONFIG_COREBOOT || usingXen()) {
         // PCI setup already done by coreboot or Xen - just do probe.
         pci_probe_devices();
@@ -769,5 +770,13 @@ pci_setup(void)
 
     pci_bios_init_devices();
 
+    /* if qemu gives us other pci window values, it means there are hotplug-able
+     * dimms. Adjust accordingly */
+    qemu_cfg_get_pci_offsets(&pv_pcimem_start, &pv_pcimem64_start);
+    if (pv_pcimem_start > pcimem_start)
+        pcimem_start = pv_pcimem_start;
+    if (pv_pcimem64_start > pcimem64_start)
+        pcimem64_start = pv_pcimem64_start;
+
     free(busses);
 }
-- 
1.7.9
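The adjustment at the end of pci_setup() only ever raises a window start and never lowers it. A minimal standalone sketch of that rule follows; `struct pci_window` and `adjust_pci_window` are hypothetical names used for illustration only (SeaBIOS keeps the two values in separate globals), and the values in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two window starts; SeaBIOS itself uses
 * the globals pcimem_start and pcimem64_start. */
struct pci_window {
    uint64_t start32;
    uint64_t start64;
};

/* Raise each window start to the value reported by the hypervisor,
 * but never move a window downwards. */
void adjust_pci_window(struct pci_window *w, uint64_t pv32, uint64_t pv64)
{
    assert(w != NULL);
    if (pv32 > w->start32)
        w->start32 = pv32;
    if (pv64 > w->start64)
        w->start64 = pv64;
}
```

With a window starting at 0xe0000000 and a reported 32-bit start of 0xf0000000, the 32-bit start moves up; a reported 64-bit start of 0 leaves the 64-bit start untouched.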
[Qemu-devel] [RFC PATCH v4 07/30] Add SIZE type to qdev properties
This patch adds a 'SIZE' type property to qdev. It will make dimm description more convenient by allowing sizes to be specified with K,M,G,T prefixes instead of number of bytes e.g.: -device dimm,id=mem0,size=2G,bus=membus.0 Credits go to Ian Molton for original patch. See: http://patchwork.ozlabs.org/patch/38835/ Signed-off-by: Vasilis Liaskovitis --- hw/qdev-properties.c | 60 ++ hw/qdev-properties.h |3 ++ qemu-option.c|2 +- qemu-option.h|2 + 4 files changed, 66 insertions(+), 1 deletions(-) diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c index 81d901c..a77f760 100644 --- a/hw/qdev-properties.c +++ b/hw/qdev-properties.c @@ -1279,3 +1279,63 @@ void qemu_add_globals(void) { qemu_opts_foreach(qemu_find_opts("global"), qdev_add_one_global, NULL, 0); } + +/* --- 64bit unsigned int 'size' type --- */ + +static void get_size(Object *obj, Visitor *v, void *opaque, + const char *name, Error **errp) +{ +DeviceState *dev = DEVICE(obj); +Property *prop = opaque; +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); + +visit_type_size(v, ptr, name, errp); +} + +static void set_size(Object *obj, Visitor *v, void *opaque, + const char *name, Error **errp) +{ +DeviceState *dev = DEVICE(obj); +Property *prop = opaque; +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); + +if (dev->state != DEV_STATE_CREATED) { +error_set(errp, QERR_PERMISSION_DENIED); +return; +} + +visit_type_size(v, ptr, name, errp); +} + +static int parse_size(DeviceState *dev, Property *prop, const char *str) +{ +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); +Error *errp = NULL; + +if (str != NULL) { +parse_option_size(prop->name, str, ptr, &errp); +} +assert_no_error(errp); +return 0; +} + +static int print_size(DeviceState *dev, Property *prop, char *dest, size_t len) +{ +uint64_t *ptr = qdev_get_prop_ptr(dev, prop); +char suffixes[] = {'T', 'G', 'M', 'K', 'B'}; +int i = 0; +uint64_t div; + +for (div = (long int)1 << 40; !(*ptr / div) ; div >>= 10) { +i++; +} +return snprintf(dest, len, "%0.03f%c", 
(double)*ptr/div, suffixes[i]); +} + +PropertyInfo qdev_prop_size = { +.name = "size", +.parse = parse_size, +.print = print_size, +.get = get_size, +.set = set_size, +}; diff --git a/hw/qdev-properties.h b/hw/qdev-properties.h index 5b046ab..0182bef 100644 --- a/hw/qdev-properties.h +++ b/hw/qdev-properties.h @@ -14,6 +14,7 @@ extern PropertyInfo qdev_prop_uint64; extern PropertyInfo qdev_prop_hex8; extern PropertyInfo qdev_prop_hex32; extern PropertyInfo qdev_prop_hex64; +extern PropertyInfo qdev_prop_size; extern PropertyInfo qdev_prop_string; extern PropertyInfo qdev_prop_chr; extern PropertyInfo qdev_prop_ptr; @@ -67,6 +68,8 @@ extern PropertyInfo qdev_prop_pci_host_devaddr; DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex32, uint32_t) #define DEFINE_PROP_HEX64(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex64, uint64_t) +#define DEFINE_PROP_SIZE(_n, _s, _f, _d) \ +DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_size, uint64_t) #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t) diff --git a/qemu-option.c b/qemu-option.c index 27891e7..38e0a11 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -203,7 +203,7 @@ static void parse_option_number(const char *name, const char *value, } } -static void parse_option_size(const char *name, const char *value, +void parse_option_size(const char *name, const char *value, uint64_t *ret, Error **errp) { char *postfix; diff --git a/qemu-option.h b/qemu-option.h index ca72986..b8ee5b3 100644 --- a/qemu-option.h +++ b/qemu-option.h @@ -152,5 +152,7 @@ typedef int (*qemu_opts_loopfunc)(QemuOpts *opts, void *opaque); int qemu_opts_print(QemuOpts *opts, void *dummy); int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque, int abort_on_failure); +void parse_option_size(const char *name, const char *value, + uint64_t *ret, Error **errp); #endif -- 1.7.9
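One thing to watch in print_size() as posted: for a value of 0 the loop keeps shifting div until it reaches 0, and the next division is then by zero. Below is a small sketch of the same formatting idea with the loop bounded at div == 1. `format_size` is a hypothetical standalone helper written for this note, not part of the qdev code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a byte count using the largest binary suffix that keeps the
 * integer part non-zero. Hypothetical helper, not the qdev print_size(). */
int format_size(uint64_t bytes, char *dest, size_t len)
{
    static const char suffixes[] = { 'T', 'G', 'M', 'K', 'B' };
    uint64_t div = (uint64_t)1 << 40;
    size_t i = 0;

    assert(dest != NULL);
    /* Stop at div == 1 so that a value of 0 never divides by zero and
     * the suffix index never runs past 'B'. */
    while (div > 1 && bytes / div == 0) {
        div >>= 10;
        i++;
    }
    return snprintf(dest, len, "%.3f%c",
                    (double)bytes / (double)div, suffixes[i]);
}
```

The bound on div is the only behavioural difference from the loop in the patch; the suffix table and the shift by 10 are the same.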
[Qemu-devel] [RFC PATCH v4 30/30] Implement _PS3 for dimm
This will allow us to update dimm state on OSPM-initiated eject operations e.g. with "echo 1 > /sys/bus/acpi/devices/PNP0C80\:00/eject" v3->v4: Add support for ich9 --- docs/specs/acpi_hotplug.txt |7 +++ hw/acpi_ich9.c |7 +-- hw/acpi_ich9.h |1 + hw/acpi_piix4.c |9 ++--- hw/dimm.c |4 hw/dimm.h |3 ++- 6 files changed, 25 insertions(+), 6 deletions(-) diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt index 536da16..69868fe 100644 --- a/docs/specs/acpi_hotplug.txt +++ b/docs/specs/acpi_hotplug.txt @@ -45,3 +45,10 @@ insertion failed. Written by ACPI memory device _OST method to notify qemu of failed hot-add. Write-only. +Memory Dimm _PS3 power-off initiated by OSPM (IO port 0xafa4, 1-byte access): +--- +Dimm hot-add _PS3 initiated by OSPM. Byte value indicates Dimm slot which +entered D3 state. + +Written by ACPI memory device _PS3 method to notify qemu of power-off state for +the dimm. Write-only. diff --git a/hw/acpi_ich9.c b/hw/acpi_ich9.c index 2705230..5e7fca6 100644 --- a/hw/acpi_ich9.c +++ b/hw/acpi_ich9.c @@ -120,6 +120,9 @@ static void memhp_writeb(void *opaque, uint32_t addr, uint32_t val) case ICH9_MEM_OST_ADD_FAIL - ICH9_MEM_BASE: dimm_notify(val, DIMM_ADD_FAIL); break; +case ICH9_MEM_PS3 - ICH9_MEM_BASE: + dimm_notify(val, DIMM_OSPM_POWEROFF); + break; default: ICH9_DEBUG("memhp write invalid %x <== %d\n", addr, val); } @@ -134,7 +137,7 @@ static const MemoryRegionOps ich9_memhp_ops = { }, { .offset = ICH9_MEM_EJ_BASE - ICH9_MEM_BASE, -.len = 4, .size = 1, +.len = 5, .size = 1, .write = memhp_writeb, }, PORTIO_END_OF_LIST() @@ -321,7 +324,7 @@ void ich9_pm_init(void *device, qemu_irq sci_irq, qemu_irq cmos_s3) memory_region_add_subregion(&pm->io, ICH9_PMIO_SMI_EN, &pm->io_smi); memory_region_init_io(&pm->io_memhp, &ich9_memhp_ops, pm, "apci-memhp0", - DIMM_BITMAP_BYTES + 4); + DIMM_BITMAP_BYTES + 5); memory_region_add_subregion(get_system_io(), ICH9_MEM_BASE, &pm->io_memhp); dimm_bus_hotplug(ich9_dimm_hotplug, ich9_dimm_revert, 
&lpc->d.qdev); diff --git a/hw/acpi_ich9.h b/hw/acpi_ich9.h index 8f57cd8..816d453 100644 --- a/hw/acpi_ich9.h +++ b/hw/acpi_ich9.h @@ -29,6 +29,7 @@ #define ICH9_MEM_OST_REMOVE_FAIL 0xafa1 #define ICH9_MEM_OST_ADD_SUCCESS 0xafa2 #define ICH9_MEM_OST_ADD_FAIL 0xafa3 +#define ICH9_MEM_PS3 0xafa4 typedef struct ICH9LPCPMRegs { /* diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c index 70aa480..6c953c2 100644 --- a/hw/acpi_piix4.c +++ b/hw/acpi_piix4.c @@ -54,6 +54,7 @@ #define MEM_OST_REMOVE_FAIL 0xafa1 #define MEM_OST_ADD_SUCCESS 0xafa2 #define MEM_OST_ADD_FAIL 0xafa3 +#define MEM_PS3 0xafa4 #define PIIX4_MEM_HOTPLUG_STATUS 8 #define PIIX4_PCI_HOTPLUG_STATUS 2 @@ -564,6 +565,9 @@ static void memhp_writeb(void *opaque, uint32_t addr, uint32_t val) case MEM_OST_ADD_FAIL - MEM_BASE: dimm_notify(val, DIMM_ADD_FAIL); break; +case MEM_PS3 - MEM_BASE: +dimm_notify(val, DIMM_OSPM_POWEROFF); +break; default: PIIX4_DPRINTF("memhp write invalid %x <== %d\n", addr, val); } @@ -577,7 +581,7 @@ static const MemoryRegionOps piix4_memhp_ops = { .read = memhp_readb, }, { -.offset = MEM_EJ_BASE - MEM_BASE, .len = 4, +.offset = MEM_EJ_BASE - MEM_BASE, .len = 5, .size = 1, .write = memhp_writeb, }, @@ -666,7 +670,7 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s) memory_region_add_subregion(get_system_io(), PCI_HOTPLUG_ADDR, &s->io_pci); memory_region_init_io(&s->io_memhp, &piix4_memhp_ops, s, "apci-memhp0", - DIMM_BITMAP_BYTES + 4); + DIMM_BITMAP_BYTES + 5); memory_region_add_subregion(get_system_io(), MEM_BASE, &s->io_memhp); for (i = 0; i < DIMM_BITMAP_BYTES; i++) { @@ -726,7 +730,6 @@ static int piix4_dimm_revert(DeviceState *qdev, DimmDevice *dev, int add) struct gpe_regs *g = &s->gperegs; DimmDevice *slot = DIMM(dev); int idx = slot->idx; - if (add) { g->mems_sts[idx/8] &= ~(1 << (idx%8)); } else { diff --git a/hw/dimm.c b/hw/dimm.c index 69b97b6..2454e38 100644 --- a/hw/dimm.c +++ b/hw/dimm.c @@ -407,6 +407,10 @@ void dimm_notify(uint32_t idx, uint32_t 
event) qdev_unplug_complete((DeviceState *)slot, NULL); QTAILQ_REMOVE(&bus->dimmlist, slot, nextdimm); QTAILQ_INSERT_TAIL(&bus->dimm_hp_result_queue, result, next); +case DIMM_OSPM_POWEROFF: +if (bus->dimm_revert) { +bus->dimm_revert
[Qemu-devel] [RFC PATCH v4 27/30] [SeaBIOS] Add _OST dimm method
Add support for _OST method. _OST method will write into the correct I/O byte to signal success / failure of hot-add or hot-remove to qemu. --- src/acpi-dsdt-mem-hotplug.dsl | 51 - src/ssdt-mem.dsl |4 +++ 2 files changed, 54 insertions(+), 1 deletions(-) diff --git a/src/acpi-dsdt-mem-hotplug.dsl b/src/acpi-dsdt-mem-hotplug.dsl index fd73ea7..a648bee 100644 --- a/src/acpi-dsdt-mem-hotplug.dsl +++ b/src/acpi-dsdt-mem-hotplug.dsl @@ -27,7 +27,28 @@ Scope(\_SB) { { MPE, 8 } - + +/* Memory hot-remove notify failure byte */ +OperationRegion(MEEF, SystemIO, 0xafa1, 1) +Field (MEEF, ByteAcc, NoLock, Preserve) +{ +MEF, 8 +} + +/* Memory hot-add notify success byte */ +OperationRegion(MPIS, SystemIO, 0xafa2, 1) +Field (MPIS, ByteAcc, NoLock, Preserve) +{ +MIS, 8 +} + +/* Memory hot-add notify failure byte */ +OperationRegion(MPIF, SystemIO, 0xafa3, 1) +Field (MPIF, ByteAcc, NoLock, Preserve) +{ +MIF, 8 +} + Method(MESC, 0) { // Local5 = active memdevice bitmap Store (MES, Local5) @@ -69,4 +90,32 @@ Scope(\_SB) { Sleep(200) } +Method (MOST, 3, Serialized) { +// _OST method - OS status indication +Switch (And(Arg0, 0xFF)) { +Case(0x3) +{ +Switch(And(Arg1, 0xFF)) { +Case(0x1) { +Store(Arg2, MEF) +// Revert MEON flag for this memory device to one +Store(One, Index(MEON, Arg2)) +} +} +} +Case(0x1) +{ +Switch(And(Arg1, 0xFF)) { +Case(0x0) { +Store(Arg2, MIS) +} +Case(0x1) { +Store(Arg2, MIF) +// Revert MEON flag for this memory device to zero +Store(Zero, Index(MEON, Arg2)) +} +} +} +} +} } diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl index eef84b6..47a3b4f 100644 --- a/src/ssdt-mem.dsl +++ b/src/ssdt-mem.dsl @@ -38,6 +38,7 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1) External(CMST, MethodObj) External(MPEJ, MethodObj) +External(MOST, MethodObj) Name(_CRS, ResourceTemplate() { QwordMemory( @@ -60,6 +61,9 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1) Method (_EJ0, 1, NotSerialized) { MPEJ(ID, Arg0) } +Method (_OST, 3) { 
+MOST(Arg0, Arg1, ID) +} } } -- 1.7.9
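For readers less used to ASL, the dispatch done by MOST can be restated in C. This is only a sketch of the mapping in the patch above: the function name `ost_port` and the enum names are hypothetical, the port numbers are the ones documented in docs/specs/acpi_hotplug.txt by the qemu side of this series, and the side effect on the MEON flags is left out.

```c
#include <stdint.h>

/* I/O ports written by the _OST method in this series; PORT_NONE means
 * "no notification is written for this event/status pair". */
enum {
    PORT_NONE        = 0,
    PORT_REMOVE_FAIL = 0xafa1,
    PORT_ADD_SUCCESS = 0xafa2,
    PORT_ADD_FAIL    = 0xafa3
};

/* Map (Arg0 = source event, Arg1 = status code) to the notification port,
 * following the Switch/Case structure of MOST. Only the low byte of each
 * argument is examined, as in the ASL. */
uint16_t ost_port(uint32_t event, uint32_t status)
{
    switch (event & 0xff) {
    case 0x3:                       /* branch the patch uses for hot-remove */
        return (status & 0xff) == 0x1 ? PORT_REMOVE_FAIL : PORT_NONE;
    case 0x1:                       /* branch the patch uses for hot-add */
        switch (status & 0xff) {
        case 0x0:
            return PORT_ADD_SUCCESS;
        case 0x1:
            return PORT_ADD_FAIL;
        default:
            return PORT_NONE;
        }
    default:
        return PORT_NONE;
    }
}
```

Note that a successful hot-remove writes nothing here; in this series that case is reported through the _EJ0 path instead.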
[Qemu-devel] [RFC PATCH v4 15/30] q35: Add i440fx dram controller initialization
Create memory buses and introduce function to adjust memory map for hotplug-able dimms. Signed-off-by: Vasilis Liaskovitis --- hw/pc_q35.c |1 + hw/q35.c| 27 +++ hw/q35.h|5 + 3 files changed, 33 insertions(+), 0 deletions(-) diff --git a/hw/pc_q35.c b/hw/pc_q35.c index 3429a9a..e6375bf 100644 --- a/hw/pc_q35.c +++ b/hw/pc_q35.c @@ -41,6 +41,7 @@ #include "hw/ide/pci.h" #include "hw/ide/ahci.h" #include "hw/usb.h" +#include "fw_cfg.h" /* ICH9 AHCI has 6 ports */ #define MAX_SATA_PORTS 6 diff --git a/hw/q35.c b/hw/q35.c index efebc27..cc27d72 100644 --- a/hw/q35.c +++ b/hw/q35.c @@ -236,12 +236,39 @@ static void mch_reset(DeviceState *qdev) mch_update(mch); } +static hwaddr mch_dimm_offset(DeviceState *dev, uint64_t size) +{ +MCHPCIState *d = MCH_PCI_DEVICE(dev); +hwaddr ret; + +/* if dimm fits before pci hole, append it normally */ +if (d->below_4g_mem_size + size <= MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT) { +ret = d->below_4g_mem_size; +d->below_4g_mem_size += size; +} +/* otherwise place it above 4GB */ +else { +ret = 0x1LL + d->above_4g_mem_size; +d->above_4g_mem_size += size; +} + +return ret; +} + static int mch_init(PCIDevice *d) { int i; hwaddr pci_hole64_size; MCHPCIState *mch = MCH_PCI_DEVICE(d); +/* Initialize 2 GMC DRAM channels x 4 DRAM ranks each */ +mch->dram_channel[0] = dimm_bus_create(OBJECT(d), "membus.0", 4, +mch_dimm_offset); +mch->dram_channel[1] = dimm_bus_create(OBJECT(d), "membus.1", 4, +mch_dimm_offset); +/* Initialize paravirtual memory bus */ +mch->pv_dram_channel = dimm_bus_create(OBJECT(d), "membus.pv", 0, +mch_dimm_offset); /* setup pci memory regions */ memory_region_init_alias(&mch->pci_hole, "pci-hole", mch->pci_address_space, diff --git a/hw/q35.h b/hw/q35.h index e34f7c1..bf76dc8 100644 --- a/hw/q35.h +++ b/hw/q35.h @@ -34,6 +34,7 @@ #include "acpi.h" #include "acpi_ich9.h" #include "pam.h" +#include "dimm.h" #define TYPE_Q35_HOST_DEVICE "q35-pcihost" #define Q35_HOST_DEVICE(obj) \ @@ -56,6 +57,10 @@ typedef struct MCHPCIState { uint8_t 
smm_enabled; ram_addr_t below_4g_mem_size; ram_addr_t above_4g_mem_size; +/* GMCH allows for 2 DRAM channels x 4 DRAM ranks each */ +DimmBus * dram_channel[2]; +/* paravirtual memory bus */ +DimmBus *pv_dram_channel; } MCHPCIState; typedef struct Q35PCIHost { -- 1.7.9
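The placement rule in mch_dimm_offset() can be tried out in isolation. The sketch below is a simplified model, not the qemu code: `struct mem_layout`, `dimm_offset` and both constants are assumptions made for illustration. In particular HOLE_START stands in for whatever boundary the chipset code uses, and FOUR_GIB assumes the 4GB base that the "place it above 4GB" comment implies.

```c
#include <stdint.h>

#define HOLE_START 0xe0000000ULL   /* assumed start of the 32-bit PCI hole */
#define FOUR_GIB   0x100000000ULL  /* assumed base for memory above 4GB */

/* Simplified stand-in for the below/above-4G bookkeeping in the chipset state. */
struct mem_layout {
    uint64_t below_4g;
    uint64_t above_4g;
};

/* Return the guest-physical start address for a new dimm of the given size
 * and account for it in the layout. */
uint64_t dimm_offset(struct mem_layout *m, uint64_t size)
{
    uint64_t ret;

    if (m->below_4g + size <= HOLE_START) {
        /* dimm fits before the pci hole: append it to low memory */
        ret = m->below_4g;
        m->below_4g += size;
    } else {
        /* otherwise place it above 4GB */
        ret = FOUR_GIB + m->above_4g;
        m->above_4g += size;
    }
    return ret;
}
```

One consequence of this first-fit rule is that a small dimm added after a large one can still land below the hole, so dimm addresses are not necessarily ordered by insertion.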
[Qemu-devel] [RFC PATCH v4 14/30] piix_pci: Add i440fx dram controller initialization
Also introduce function to adjust memory map for hotplug-able dimms. Signed-off-by: Vasilis Liaskovitis --- hw/pc_piix.c |6 +++--- hw/piix_pci.c | 30 -- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 6a9b508..fe995b9 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -95,9 +95,9 @@ static void pc_init1(MemoryRegion *system_memory, kvmclock_create(); } -if (ram_size >= 0xe000 ) { -above_4g_mem_size = ram_size - 0xe000; -below_4g_mem_size = 0xe000; +if (ram_size >= I440FX_PCI_HOLE_START) { +above_4g_mem_size = ram_size - I440FX_PCI_HOLE_START; +below_4g_mem_size = I440FX_PCI_HOLE_START; } else { above_4g_mem_size = 0; below_4g_mem_size = ram_size; diff --git a/hw/piix_pci.c b/hw/piix_pci.c index 7ca3c73..9866b1d 100644 --- a/hw/piix_pci.c +++ b/hw/piix_pci.c @@ -125,6 +125,25 @@ static const VMStateDescription vmstate_i440fx = { } }; +hwaddr i440fx_pmc_dimm_offset(DeviceState *dev, uint64_t size) +{ +PCII440FXState *d = I440FX_PCI_DEVICE(dev); +hwaddr ret; + +/* if dimm fits before pci hole, append it normally */ +if (d->below_4g_mem_size + size <= I440FX_PCI_HOLE_START) { +ret = d->below_4g_mem_size; +d->below_4g_mem_size += size; +} +/* otherwise place it above 4GB */ +else { +ret = 0x1LL + d->above_4g_mem_size; +d->above_4g_mem_size += size; +} + +return ret; +} + static void i440fx_pcihost_initfn(Object *obj) { I440FXState *s = I440FX_HOST_DEVICE(obj); @@ -148,8 +167,8 @@ static int i440fx_pcihost_init(SysBusDevice *dev) sysbus_add_io(dev, 0xcfc, &pci->data_mem); sysbus_init_ioports(&pci->busdev, 0xcfc, 4); -b = pci_bus_new(&s->parent_obj.busdev.qdev, NULL, s->mch.pci_address_space, -s->mch.address_space_io, 0); +b = pci_bus_new(&s->parent_obj.busdev.qdev, "pci.0", +s->mch.pci_address_space, s->mch.address_space_io, 0); s->parent_obj.bus = b; qdev_set_parent_bus(DEVICE(&s->mch), BUS(b)); qdev_init_nofail(DEVICE(&s->mch)); @@ -169,6 +188,13 @@ static int i440fx_initfn(PCIDevice *dev) pci_hole64_size = (sizeof(hwaddr) 
== 4 ? 0 : ((uint64_t)1 << 62)); + +/* Initialize i440fx's DRAM channel, it can hold up to 8 DRAM ranks */ +f->dram_channel0 = dimm_bus_create(OBJECT(f), "membus.0", 8, +i440fx_pmc_dimm_offset); +/* Initialize paravirtual memory bus */ +f->pv_dram_channel = dimm_bus_create(OBJECT(f), "membus.pv", 0, +i440fx_pmc_dimm_offset); memory_region_init_alias(&f->pci_hole, "pci-hole", f->pci_address_space, f->below_4g_mem_size, 0x1LL - f->below_4g_mem_size); -- 1.7.9
[Qemu-devel] [RFC PATCH v4 06/30] qapi: make visit_type_size fallback to type_int
Currently visit_type_size checks if the visitor's type_size function pointer is
NULL. If not, it calls it, otherwise it calls v->type_uint64(). But neither of
these pointers is ever set. Fall back to calling v->type_int() in this third
(default) case.

Signed-off-by: Vasilis Liaskovitis
---
 qapi/qapi-visit-core.c |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index 7a82b63..497e693 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -236,8 +236,17 @@ void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp)
 
 void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp)
 {
+    int64_t value;
     if (!error_is_set(errp)) {
-        (v->type_size ? v->type_size : v->type_uint64)(v, obj, name, errp);
+        if (v->type_size) {
+            v->type_size(v, obj, name, errp);
+        } else if (v->type_uint64) {
+            v->type_uint64(v, obj, name, errp);
+        } else {
+            value = *obj;
+            v->type_int(v, &value, name, errp);
+            *obj = value;
+        }
     }
 }
 
-- 
1.7.9
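The three-level fallback can be exercised outside of qemu with a cut-down visitor. Everything in this sketch is hypothetical scaffolding: `struct visitor` has only the three callbacks relevant here, the name and Error arguments are dropped, and `double_int` / `add_one_u64` are toy callbacks that exist only to show which path was taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down visitor with only the callbacks relevant to the fallback. */
struct visitor {
    void (*type_size)(struct visitor *v, uint64_t *obj);
    void (*type_uint64)(struct visitor *v, uint64_t *obj);
    void (*type_int)(struct visitor *v, int64_t *obj);  /* assumed always set */
};

/* Prefer type_size, then type_uint64, and finally go through type_int
 * using a temporary int64_t, as the patch does. */
void visit_size(struct visitor *v, uint64_t *obj)
{
    if (v->type_size) {
        v->type_size(v, obj);
    } else if (v->type_uint64) {
        v->type_uint64(v, obj);
    } else {
        int64_t value = (int64_t)*obj;
        v->type_int(v, &value);
        *obj = (uint64_t)value;
    }
}

/* Toy callbacks that make the chosen path observable. */
void double_int(struct visitor *v, int64_t *obj)
{
    (void)v;
    *obj *= 2;
}

void add_one_u64(struct visitor *v, uint64_t *obj)
{
    (void)v;
    *obj += 1;
}
```

The round trip through int64_t means sizes of 2^63 bytes and above cannot be represented on the fallback path, which is unlikely to matter for dimm sizes.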
[Qemu-devel] [RFC PATCH v4 01/30] [SeaBIOS] Add ACPI_EXTRACT_DEVICE* macros
This allows extracting the beginning, end and name of a Device object.
---
 tools/acpi_extract.py |   28 
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/tools/acpi_extract.py b/tools/acpi_extract.py
index 3295678..3191f53 100755
--- a/tools/acpi_extract.py
+++ b/tools/acpi_extract.py
@@ -217,6 +217,28 @@ def aml_package_start(offset):
     offset += 1
     return offset + aml_pkglen_bytes(offset) + 1
 
+def aml_device_start(offset):
+    #0x5B 0x82 DeviceOp PkgLength NameString
+    if ((aml[offset] != 0x5B) or (aml[offset + 1] != 0x82)):
+        die( "Name offset 0x%x: expected 0x5B 0x82 actual 0x%x 0x%x" %
+             (offset, aml[offset], aml[offset + 1]));
+    return offset
+
+def aml_device_string(offset):
+    #0x5B 0x82 DeviceOp PkgLength NameString
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    offset += pkglenbytes
+    return offset
+
+def aml_device_end(offset):
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    pkglen = aml_pkglen(offset)
+    return offset + pkglen
+
 lineno = 0
 for line in fileinput.input():
     # Strip trailing newline
@@ -307,6 +329,12 @@ for i in range(len(asl)):
         offset = aml_processor_end(offset)
     elif (directive == "ACPI_EXTRACT_PKG_START"):
         offset = aml_package_start(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_START"):
+        offset = aml_device_start(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_STRING"):
+        offset = aml_device_string(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_END"):
+        offset = aml_device_end(offset)
     else:
         die("Unsupported directive %s" % directive)
 
-- 
1.7.9
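The offsets returned by the three helpers follow from the AML encoding of a Device object: the two-byte DeviceOp (0x5B 0x82), then a PkgLength whose count starts at its own first byte, then the name. A C sketch of that arithmetic is given below for reference. The function names are hypothetical, the byte buffer in the usage note is hand-made rather than real iasl output, and the code does no bounds checking, so the caller must pass a complete object.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes occupied by the PkgLength encoding at aml[off]:
 * bits 7-6 of the lead byte give the count of following bytes. */
size_t pkglen_bytes(const uint8_t *aml, size_t off)
{
    return (size_t)(aml[off] >> 6) + 1;
}

/* Decode the PkgLength value at aml[off]. */
uint32_t pkglen_value(const uint8_t *aml, size_t off)
{
    uint8_t lead = aml[off];
    size_t extra = lead >> 6;
    uint32_t len;
    size_t i;

    if (extra == 0)
        return lead & 0x3f;          /* single byte: 6-bit length */
    len = lead & 0x0f;               /* multi byte: low nibble first */
    for (i = 0; i < extra; i++)
        len |= (uint32_t)aml[off + 1 + i] << (4 + 8 * i);
    return len;
}

/* Return 1 if aml[off] starts a Device object (0x5B 0x82), else 0. */
int is_device_op(const uint8_t *aml, size_t off)
{
    return aml[off] == 0x5B && aml[off + 1] == 0x82;
}

/* Offset of the device name: skip the opcode and the PkgLength bytes. */
size_t device_name_offset(const uint8_t *aml, size_t off)
{
    return off + 2 + pkglen_bytes(aml, off + 2);
}

/* Offset one past the Device object: PkgLength counts from its own
 * first byte, so only the two opcode bytes are added on top. */
size_t device_end(const uint8_t *aml, size_t off)
{
    return off + 2 + pkglen_value(aml, off + 2);
}
```

For the bytes 5B 82 08 'M' 'P' '0' '0' followed by three more body bytes, the name starts at offset 3 and the object ends at offset 10, which corresponds to what aml_device_string and aml_device_end compute.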
Re: [Qemu-devel] [PATCH] add bochs dispi interface framebuffer driver
On Thu, Nov 01, 2012 at 02:30:35PM +0100, Gerd Hoffmann wrote: > On 10/19/12 12:35, Vasilis Liaskovitis wrote: > > Hi, > > > > On Thu, Mar 08, 2012 at 11:13:46AM +0100, Gerd Hoffmann wrote: > >> This patchs adds a frame buffer driver for (virtual/emulated) vga cards > >> implementing the bochs dispi interface. Supported hardware are the > >> bochs vga card with vbe extension and the qemu standard vga. > >> > >> The driver uses a fixed depth of 32bpp. Otherwise it supports the full > >> (but small) feature set of the bochs dispi interface: Resolution > >> switching and display panning. It is tweaked to maximize fbcon speed, > >> so you'll get the comfort of the framebuffer console in kvm guests > >> without performance penalty. > > > > I am testing this driver with qemu-kvm-1.2 or qemu-kvm master (commit) > > and "-std vga". The driver works fine in general. > > > > When I test a guest that runs X (ubuntu-12.04 desktop amd64), sometimes > > parts of > > the screen and keyboard input is mixed between the X terminal and fbconsole > > terminals. This happens only on the initial X11 login (right after boot or > > reboot) and only sometimes. > > Only with bochsfb or with vesafb (+ fbdev xorg driver) too? vt-switching with vesafb/X11 works fine on a grml 64-bit image. However, xorg uses vesa driver in this case, not fbdev (fbdev / fbdevhw xorg modules are initially loaded but then unloaded). X11 uses 1280x768 and vesafb uses 1024x768 according to dmesg. But i haven't been able to test ubuntu+vesafb. Ubuntu kernels use efifb (CONFIG_FB_EFI=y) and fbconsoles don't work at all with this driver + qemu/seabios/vgastd. I have tried using a custom kernel (CONFIG_FB_EFI not set, CONFIG_FB_VESA=y) but for some reason I can't load vesafb on ubuntu desktop. No fb drivers are blacklisted, but no fb driver is loaded if I specify a vga text mode with "vga=" in the kernel command line. X11 still uses 1280x768 resolution here. 
Anyway, these are screenshots of the original problem (messed up output with
bochsfb + fbdev-xorg on ubuntu 12.04 startup):

vt7 http://picpaste.de/bochsfb-badstart-AirrXZuF.png
vt1 http://www.picpaste.de/bochsfb-badstart-f1-EO10MVdF.png

it still happens with the latest bochsfb driver (tested with 3.6.0 though, not
3.7.0-rc3 yet)

>
> > Xorg driver used is fbdev (i can send xorg log), not sure if another driver
> > should be used/implemented for the bochsfb.
>
> Yes, that one is fine.
>
> > CONFIG_FB_BOCHS=m
> > CONFIG_FB_VESA=y
> > # CONFIG_FB_EFI is not set
> >
> > Should FB_VESA be turned to "not set" for this test? (it's not tristate in
> > Kconfig)
> >
> > Btw (slightly off-topic) are other framebuffer drivers suitable for the
> > standard qemu vga-pci device? Would vesafb or uvesafb work?
>
> Never tried uvesafb. vesafb will work too, but run with a fixed
> resolution. bochsfb allows you to change the display resolution at
> runtime using fbset. fbcon is faster too because bochsfb supports
> display panning.

I assume bochsfb is the way we want to go. I can send more detailed info on the
uvesafb issue if needed.

thanks,

- Vasilis
Re: [Qemu-devel] [RFC PATCH v3 00/19] ACPI memory hotplug
On Wed, Oct 31, 2012 at 01:16:56PM +0200, Avi Kivity wrote:
> On 10/31/2012 12:58 PM, Stefan Hajnoczi wrote:
> > On Fri, Sep 21, 2012 at 1:17 PM, Vasilis Liaskovitis wrote:
> >> This is v3 of the ACPI memory hotplug functionality. Only x86_64 target is
> >> supported for now.
> >
> > Hi Vasilis,
> > Regarding the hot unplug issue we've been discussing, it's possible to
> > progress this patch series without fully solving that problem upfront.
> >
> > Karen Noel suggested that the series could be rolled without the hot
> > unplug command, so that it's not possible to hit the unsafe case.
> > This would allow users to hot plug additional memory. They would have
> > to use virtio-balloon to reduce the memory footprint again. Later,
> > when the memory region referencing issue has been solved the hot
> > unplug command can be added.
> >
> > Just wanted to mention Karen's idea in case you feel stuck right now.
>
> We could introduce hotunplug as an experimental feature so people can
> test and play with it, and later graduate it to a fully supported feature.

ok, I'll separate hotplug and hotunplug patches for next version of the
patchseries (maybe even offer hotunplug in a separate series)

thanks,

- Vasilis
Re: [Qemu-devel] [RFC PATCH v3 05/19] Implement dimm device abstraction
Hi, On Wed, Oct 24, 2012 at 12:15:17PM +0200, Stefan Hajnoczi wrote: > On Wed, Oct 24, 2012 at 10:06 AM, liu ping fan wrote: > > On Tue, Oct 23, 2012 at 8:25 PM, Stefan Hajnoczi wrote: > >> On Fri, Sep 21, 2012 at 01:17:21PM +0200, Vasilis Liaskovitis wrote: > >>> +static void dimm_populate(DimmDevice *s) > >>> +{ > >>> +DeviceState *dev= (DeviceState*)s; > >>> +MemoryRegion *new = NULL; > >>> + > >>> +new = g_malloc(sizeof(MemoryRegion)); > >>> +memory_region_init_ram(new, dev->id, s->size); > >>> +vmstate_register_ram_global(new); > >>> +memory_region_add_subregion(get_system_memory(), s->start, new); > >>> +s->mr = new; > >>> +} > >>> + > >>> +static void dimm_depopulate(DimmDevice *s) > >>> +{ > >>> +assert(s); > >>> +vmstate_unregister_ram(s->mr, NULL); > >>> +memory_region_del_subregion(get_system_memory(), s->mr); > >>> +memory_region_destroy(s->mr); > >>> +s->mr = NULL; > >>> +} > >> > >> How is dimm hot unplug protected against callers who currently have RAM > >> mapped (from cpu_physical_memory_map())? > >> > >> Emulated devices call cpu_physical_memory_map() directly or indirectly > >> through DMA emulation code. The RAM pointer may be held for arbitrary > >> lengths of time, across main loop iterations, etc. > >> > >> It's not clear to me that it is safe to unplug a DIMM that has network > >> or disk I/O buffers, for example. We also need to be robust against > >> malicious guests who abuse the hotplug lifecycle. QEMU should never be > >> left with dangling pointers. > >> > > Not sure about the block layer. But I think those thread are already > > out of big lock, so there should be a MemoryListener to catch the > > RAM-unplug event, and if needed, bdrv_flush. do we want bdrv_flush, or some kind of cancel request e.g. bdrv_aio_cancel? > > Here is the detailed scenario: > > 1. Emulated device does cpu_physical_memory_map() and gets a pointer > to guest RAM. > 2. Return to vcpu or iothread, continue processing... > 3. 
Hot unplug of RAM causes the guest RAM to disappear. > 4. Pending I/O completes and overwrites memory from dangling guest RAM > pointer. > > Any I/O device that does zero-copy I/O in QEMU faces this problem: > * The block layer is affected. > * The net layer is unaffected because it doesn't do zero-copy tx/rx > across returns to the main loop (#2 above). > * Not sure about other devices classes (e.g. USB). > > How should the MemoryListener callback work? For block I/O it may not > be possible to cancel pending I/O asynchronously - if you try to > cancel then your thread may block until the I/O completes. e.g. paio_cancel does this? is there already an API to asynchronously cancel all in flight operations in a BlockDriverState? Afaict block_job_cancel refers to streaming jobs only and doesn't help here. Can we make the RAM unplug initiate async I/O cancellations, prevent further I/Os, and only free the memory in a callback, after all DMA I/O to the associated memory region has been cancelled or completed? Also iiuc the MemoryListener should be registered from users of cpu_physical_memory_map e.g. hw/virtio.c By the way dimm_depopulate only frees the qemu memory on an ACPI _EJ request, which means that a well-behaved guest will have already offlined the memory and is not using it anymore. If the guest still uses the memory e.g. for a DMA buffer, the logical memory offlining will fail and the _EJ/qemu memory freeing will never happen. But in theory a malicious acpi guest driver could trigger _EJ requests to do step 3 above. Or perhaps the backing block driver can finish an I/O request for a zero-copy block device that the guest doesn't care for anymore? I 'll think about this a bit more. > Synchronous cancel behavior is not workable since it can lead to poor > latency or hangs in the guest. ok thanks, - Vasilis
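To make the "only free the memory in a callback" idea above concrete, here is a toy model of deferred release with a mapping count. It is a sketch under stated assumptions and does not use any qemu API: `struct ram_block`, `ram_map`, `ram_unmap` and `ram_request_unplug` are hypothetical names, the `freed` flag stands in for actually destroying the memory region, and locking is ignored.

```c
/* Toy model of a hot-pluggable RAM block with outstanding mappings. */
struct ram_block {
    int refs;            /* mappings still held, e.g. by in-flight DMA */
    int unplug_pending;  /* unplug was requested, new mappings refused */
    int freed;           /* stands in for releasing the backing memory */
};

/* Take a mapping. Returns 0 on success, -1 once unplug has been requested,
 * so no new I/O can start on memory that is going away. */
int ram_map(struct ram_block *b)
{
    if (b->unplug_pending || b->freed)
        return -1;
    b->refs++;
    return 0;
}

/* Drop a mapping. The last unmap after an unplug request does the free. */
void ram_unmap(struct ram_block *b)
{
    b->refs--;
    if (b->refs == 0 && b->unplug_pending)
        b->freed = 1;
}

/* Request unplug. The memory is released immediately only when nothing
 * holds a mapping; otherwise the release is deferred to ram_unmap(). */
void ram_request_unplug(struct ram_block *b)
{
    b->unplug_pending = 1;
    if (b->refs == 0)
        b->freed = 1;
}
```

This covers steps 1 to 4 of the scenario in the sense that the pending I/O in step 4 completes into memory that still exists; it says nothing about how long the unplug may be delayed by a slow or stuck request.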
Re: [Qemu-devel] [RFC PATCH v3 06/19] Implement "-dimm" command line option
Hi, On Thu, Oct 18, 2012 at 02:33:02PM +0200, Avi Kivity wrote: > On 10/18/2012 11:27 AM, Vasilis Liaskovitis wrote: > > On Wed, Oct 17, 2012 at 12:03:51PM +0200, Avi Kivity wrote: > >> On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote: > >> >> > >> >> I don't think so, but probably there's a limit of DIMMs that real > >> >> controllers have, something like 8 max. > >> > > >> > In the case of i440fx specifically, do you mean that we should model the > >> > DRB > >> > (Dram row boundary registers in section 3.2.19 of the i440fx spec) ? > >> > > >> > The i440fx DRB registers only supports up to 8 DRAM rows (let's say 1 row > >> > maps 1-1 to a DimmDevice for this discussion) and only supports up to > >> > 2GB of > >> > memory afaict (bit 31 and above is ignored). > >> > > >> > I 'd rather not model this part of the i440fx - having only 8 DIMMs > >> > seems too > >> > restrictive. The rest of the patchset supports up to 255 DIMMs so it > >> > would be a > >> > waste imho to model an old pc memory controller that only supports 8 > >> > DIMMs. > >> > > >> > There was also an old discussion about i440fx modeling here: > >> > https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html > >> > the general direction was that i440fx is too old and we don't want to > >> > precisely > >> > emulate the DRB registers, since they lack flexibility. > >> > > >> > Possible solutions: > >> > > >> > 1) is there a newer and more flexible chipset that we could model? > >> > >> Look for q35 on this list. > > > > thanks, I 'll take a look. It sounds like the other options below are more > > straightforward now, but let me know if you prefer q35 integration as a > > priority. > > At least validate that what you're doing fits with how q35 works. 
In terms of pmc modeling, the q35 page http://wiki.qemu.org/Features/Q35
mentions:

  Refactor i440fx to create i440fx-pmc class
  ich9: model ICH9 Super I/O chip
  ich9: make i440fx-pmc a generic PCNorthBridge class and add support for ich9
  northbridge

is this still the plan? There was an old patchset creating i440fx-pmc here:
http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03501.html
but I am not sure if it has been dropped or worked on. v3 of the q35 patchset
doesn't include a pmc I think.

It would be good to know what the current plan regarding pmc modeling (for both
q35 and i440fx) is.

thanks,

- Vasilis
Re: [Qemu-devel] [PATCH] add bochs dispi interface framebuffer driver
Hi, On Thu, Mar 08, 2012 at 11:13:46AM +0100, Gerd Hoffmann wrote: > This patchs adds a frame buffer driver for (virtual/emulated) vga cards > implementing the bochs dispi interface. Supported hardware are the > bochs vga card with vbe extension and the qemu standard vga. > > The driver uses a fixed depth of 32bpp. Otherwise it supports the full > (but small) feature set of the bochs dispi interface: Resolution > switching and display panning. It is tweaked to maximize fbcon speed, > so you'll get the comfort of the framebuffer console in kvm guests > without performance penalty. I am testing this driver with qemu-kvm-1.2 or qemu-kvm master (commit) and "-std vga". The driver works fine in general. When I test a guest that runs X (ubuntu-12.04 desktop amd64), sometimes parts of the screen and keyboard input is mixed between the X terminal and fbconsole terminals. This happens only on the initial X11 login (right after boot or reboot) and only sometimes. During this time, there is a second keyboard cursor at top of the screen on the X11 login. When switching to an fbconsole (ctrl+alt+f1), screen output of the X11 login screen gets mixed with fbconsole screen. And vice-versa when I go back to the X11 terminal(I can send you 2 screendumps if needed, I haven't attached them here due to size) If I try to login (pressing enter), the X11-login is redrawn and from then on vt switching works with no problems (I have to retype login, I am not sure where the original keyboard input goes to) Xorg driver used is fbdev (i can send xorg log), not sure if another driver should be used/implemented for the bochsfb. 
According to "xrandr -q" the same resolution as bochsfb is used:

Screen 0: minimum 1024 x 768, current 1024 x 768, maximum 1024 x 768
   1024x768      116.0*

"fbset -i" output is as expected:

mode "1024x768-116"
    # D: 100.000 MHz, H: 93.985 kHz, V: 116.318 Hz
    geometry 1024 768 1024 4096 32
    timings 10000 16 16 16 16 8 8
    rgba 8/16,8/8,8/0,8/24
endmode

Frame buffer device information:
    Name        : bochsfb
    Address     : 0xfd000000
    Size        : 16777216
    Type        : PACKED PIXELS
    Visual      : TRUECOLOR
    XPanStep    : 1
    YPanStep    : 1
    YWrapStep   : 0
    LineLength  : 4096
    Accelerator : No

Some framebuffer-relevant guest kernel options used:

CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_DEFERRED_IO=y
#
# Frame buffer hardware drivers
#
CONFIG_FB_BOCHS=m
CONFIG_FB_VESA=y
# CONFIG_FB_EFI is not set

Should FB_VESA be turned to "not set" for this test? (it's not tristate in Kconfig)

Btw (slightly off-topic), are other framebuffer drivers suitable for the standard qemu vga-pci device? Would vesafb or uvesafb work? I haven't been able to load uvesafb in a guest, because the userspace helper program v86d segfaults (maybe it tries to access vga ioports that are not implemented in qemu?)

>
> Signed-off-by: Gerd Hoffmann
> ---
>  drivers/video/Kconfig   |   18 +++
>  drivers/video/Makefile  |    1 +
>  drivers/video/bochsfb.c |  385 +++
>  3 files changed, 404 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/video/bochsfb.c
>
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index 6ca0c40..4d21f90 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -286,6 +286,24 @@ config FB_CIRRUS
>      Say N unless you have such a graphics board or plan to get one
>      before you next recompile the kernel.
>
> +config FB_BOCHS
> +    tristate "Bochs dispi interface support"
> +    depends on FB && PCI
> +    select FB_CFB_FILLRECT
> +    select FB_CFB_COPYAREA
> +    select FB_CFB_IMAGEBLIT
> +    ---help---
> +      This is the frame buffer driver for (virtual/emulated) vga
> +      cards implementing the bochs dispi interface. Supported
> +      hardware are the bochs vga card with vbe extension and the
> +      qemu standard vga.
> +
> +      The driver handles the PCI variants only. It uses a fixed
> +      depth of 32bpp, anything else doesn't make sense these days.
> +
> +      Say Y here if you plan to run the kernel in a virtual machine
> +      emulated by bochs or qemu.
> +
>  config FB_PM2
>      tristate "Permedia2 support"
>      depends on FB && ((AMIGA && BROKEN) || PCI)
> diff --git a/drivers/video/Makefile b/drivers/video/Makefile
> index 1426068..a065ad3 100644
> --- a/drivers/video/Makefile
> +++ b/drivers/video/Makefile
> @@ -99,6 +99,7 @@ obj-$(CONFIG_FB_ARMCLCD) += amba-clcd.o
>  obj-$(CONFIG_FB_68328)+= 68328fb.o
>  obj-$(CONFIG_FB_GBE) += gbefb.o
>  obj-$(CONFIG_FB_CIRRUS)+= cirrusfb.o
> +obj-$(CONFIG_FB_BOCHS) += bochsfb.o
>  obj-$(CONFIG_FB_ASILIANT) += asiliantfb.o
>  obj-$(CONFIG_FB_PXA) += pxafb.o
>  obj-$(CON
Re: [Qemu-devel] [RFC PATCH v3 06/19] Implement "-dimm" command line option
On Wed, Oct 17, 2012 at 12:03:51PM +0200, Avi Kivity wrote:
> On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote:
> >>
> >> I don't think so, but probably there's a limit of DIMMs that real
> >> controllers have, something like 8 max.
> >
> > In the case of i440fx specifically, do you mean that we should model the DRB
> > (Dram row boundary registers in section 3.2.19 of the i440fx spec) ?
> >
> > The i440fx DRB registers only support up to 8 DRAM rows (let's say 1 row
> > maps 1-1 to a DimmDevice for this discussion) and only support up to 2GB of
> > memory afaict (bit 31 and above is ignored).
> >
> > I'd rather not model this part of the i440fx - having only 8 DIMMs seems too
> > restrictive. The rest of the patchset supports up to 255 DIMMs so it would be a
> > waste imho to model an old pc memory controller that only supports 8 DIMMs.
> >
> > There was also an old discussion about i440fx modeling here:
> > https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
> > the general direction was that i440fx is too old and we don't want to precisely
> > emulate the DRB registers, since they lack flexibility.
> >
> > Possible solutions:
> >
> > 1) is there a newer and more flexible chipset that we could model?
>
> Look for q35 on this list.

thanks, I'll take a look. It sounds like the other options below are more straightforward now, but let me know if you prefer q35 integration as a priority.

> >
> > 2) model and document
>                ^--- the critical bit
>
> > a generic (non-existent) i440fx that would support more
> > and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description
> > similar to the i440fx DRB registers, the registers would take up a lot of space.
> > In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how
> > many 8MB chunks are contained in DIMMs 0...i. So, the register values are
> > cumulative (and total described memory cannot exceed 256x8MB = 2GB)
>
> Our i440fx has already been extended by support for pci and cpu hotplug,
> and I see no reason not to extend it for memory. We can allocate extra
> mmio space for registers if needed. Usually I'm against this sort of
> thing, but in this case we don't have much choice.

ok

> >
> > We could for example model:
> > - an 8-bit non-cumulative register for each DIMM, denoting how many
> > 128MB chunks it contains. This allows 32GB for each DIMM, and with 255 DIMMs we
> > describe a bit less than 8TB. These registers require 255 bytes.
> > - a 16-bit cumulative register for each DIMM again for 128MB chunks. This allows
> > us to describe 8TB of memory (but the registers take up double the space, because
> > they describe cumulative memory amounts)
>
> There is no reason to save space. Why not have two 64-bit registers per
> DIMM, one describing the size and the other the base address, both in
> bytes? Use a few low order bits for control.

Do we want this generic scheme above to be tied into the i440fx/pc machine? Or have it as a separate generic memory bus / pmc usable by others (e.g. in hw/dimm.c)? The 64-bit values you describe are already part of the DimmDevice properties, but they are not hardware registers described as part of a chipset.

In terms of control bits, did you want to mimic some other chipset registers? Any examples would be useful.

> >
> > 3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling
> > is not done (at least for i440fx, other machines could). This is the least precise
> > in terms of emulation. On the other hand, if we are not really trying to emulate
> > the real (too restrictive) hardware, does it matter?
>
> We could emulate base memory using the chipset, and extra memory using
> the scheme above. This allows guests that are tied to the chipset to
> work, and guests that have more awareness (seabios) to use the extra
> features.

But if we use the real i440fx pmc DRBs for base memory, this means base memory would be <= 2GB, right?

Sounds like we'd need to change the DRBs anyway to describe useful amounts of base memory (e.g. 512MB chunks and check against address lines [36:29] can describe base memory up to 64GB, though that's still limiting for very large VMs). But we'd be diverging from the real hardware again.

Then we can model base memory with "tweaked" i440fx pmc DRB registers - we could use only DRB[0] (one DIMM describing all of base memory) or more.

DIMMs would be allowed to be hotplugged in the generic mem-controller scheme only (unless it makes sense to allow hotplug in the remaining pmc DRBs and start using the generic scheme once we run out of emulated DRBs).

thanks,

- Vasilis
Re: [Qemu-devel] [RFC PATCH v3 06/19] Implement "-dimm" command line option
On Sat, Oct 13, 2012 at 08:57:19AM +0000, Blue Swirl wrote:
> On Tue, Oct 9, 2012 at 5:04 PM, Vasilis Liaskovitis wrote:
> >> snip
> >> Maybe even the dimmbus device shouldn't exist by itself after all, or
> >> it should be pretty much invisible to users. On real HW, the memory
> >> controller or south bridge handles the memory. For i440fx, it's part
> >> of the same chipset. So I think we should just add qdev properties to
> >> i440fx to specify the sizes, nodes etc. Then i440fx should create the
> >> dimmbus device unconditionally using the properties. The default
> >> properties should create a sane configuration, otherwise -global
> >> i440fx.dimm_size=512M etc. could be used. Then the bus would be
> >> populated as before or with device_add.
> >
> > hmm the problem with using only i440fx properties, is that size/nodes look
> > dimm specific to me, not chipset-memcontroller specific. Unless we only allow
> > uniform size dimms. Is it possible to have a dynamic list of sizes/nodes pairs as
> > properties of a qdev device?
>
> I don't think so, but probably there's a limit of DIMMs that real
> controllers have, something like 8 max.

In the case of i440fx specifically, do you mean that we should model the DRB (Dram row boundary registers in section 3.2.19 of the i440fx spec)?

The i440fx DRB registers only support up to 8 DRAM rows (let's say 1 row maps 1-1 to a DimmDevice for this discussion) and only support up to 2GB of memory afaict (bit 31 and above is ignored).

I'd rather not model this part of the i440fx - having only 8 DIMMs seems too restrictive. The rest of the patchset supports up to 255 DIMMs so it would be a waste imho to model an old pc memory controller that only supports 8 DIMMs.

There was also an old discussion about i440fx modeling here:
https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
the general direction was that i440fx is too old and we don't want to precisely emulate the DRB registers, since they lack flexibility.

Possible solutions:

1) is there a newer and more flexible chipset that we could model?

2) model and document a generic (non-existent) i440fx that would support more and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description similar to the i440fx DRB registers, the registers would take up a lot of space. In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how many 8MB chunks are contained in DIMMs 0...i. So, the register values are cumulative (and total described memory cannot exceed 256x8MB = 2GB)

We could for example model:
- an 8-bit non-cumulative register for each DIMM, denoting how many 128MB chunks it contains. This allows 32GB for each DIMM, and with 255 DIMMs we describe a bit less than 8TB. These registers require 255 bytes.
- a 16-bit cumulative register for each DIMM again for 128MB chunks. This allows us to describe 8TB of memory (but the registers take up double the space, because they describe cumulative memory amounts)

3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling is not done (at least for i440fx, other machines could). This is the least precise in terms of emulation. On the other hand, if we are not really trying to emulate the real (too restrictive) hardware, does it matter?

thanks,

- Vasilis
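The register arithmetic in the options above can be checked with a small standalone sketch. This is not code from the patchset: the function names, the MiB macro and the helper that fills cumulative registers are made up for illustration, and an n-bit register is assumed to hold at most 2^n - 1 chunks (so the 8-bit, 8MB DRB case tops out at 2040MB, just under the 256x8MB = 2GB bound quoted above).

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical helper: largest total memory (bytes) describable when each
 * DIMM has its own register counting only that DIMM's chunks. */
uint64_t noncumulative_capacity(unsigned reg_bits, uint64_t chunk_bytes,
                                unsigned ndimms)
{
    uint64_t max_chunks = (1ULL << reg_bits) - 1; /* assumes reg_bits < 64 */
    return max_chunks * chunk_bytes * ndimms;
}

/* Hypothetical helper: largest total memory (bytes) when registers are
 * cumulative, i.e. the last register holds the chunk count of all DIMMs. */
uint64_t cumulative_capacity(unsigned reg_bits, uint64_t chunk_bytes)
{
    uint64_t max_chunks = (1ULL << reg_bits) - 1;
    return max_chunks * chunk_bytes;
}

/* Fill DRB-style cumulative registers: regs[i] = chunks in DIMMs 0..i.
 * Returns 0 on success, -1 if a size is not a multiple of the chunk size
 * or the running total no longer fits in reg_bits bits. */
int fill_cumulative(const uint64_t *dimm_bytes, unsigned ndimms,
                    uint64_t chunk_bytes, unsigned reg_bits, uint64_t *regs)
{
    uint64_t max_chunks = (1ULL << reg_bits) - 1;
    uint64_t total = 0;

    for (unsigned i = 0; i < ndimms; i++) {
        if (dimm_bytes[i] % chunk_bytes != 0) {
            return -1;
        }
        total += dimm_bytes[i] / chunk_bytes;
        if (total > max_chunks) {
            return -1;
        }
        regs[i] = total;
    }
    return 0;
}
```

With these definitions, noncumulative_capacity(8, 128 * MiB, 255) comes out a bit below 8TB and cumulative_capacity(16, 128 * MiB) is 128MB short of 8TB, which matches the two proposals listed above.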
[Qemu-devel] slower live-migration with XBZRLE
Hi,

I am testing XBZRLE compression with qemu-1.2 for live migration of large VMs and/or memory-intensive workloads. I have a 4GB guest that runs the memory r/w load generator from the original patchset, see docs/xbzrle.txt or
http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01207.html

I have set xbzrle to "on" in both source and target, and left the default cache size in the source (I also tried a 1g cache size, set either during the test or for a new migration).

The migration starts, but the ram transfer rate is very slow and the total migration time is very large. Cache misses and overflows seem small as far as I can tell.

Here's example output from the source "info migrate" with xbzrle=on when it's done:

(qemu) info migrate
capabilities: xbzrle: on
Migration status: completed
total time: 6530177 milliseconds
transferred ram: 4887726 kbytes
remaining ram: 0 kbytes
total ram: 4211008 kbytes
duplicate: 3126234 pages
normal: 43587 pages
normal bytes: 174348 kbytes
cache size: 268435456 bytes
xbzrle transferred: 4710325 kbytes
xbzrle pages: 266649315 pages
xbzrle cache miss: 43440
xbzrle overflow : 147

The same guest+workload migrates much faster with xbzrle=off. I would have expected the opposite behaviour, i.e. with xbzrle=off this guest+workload combination would migrate very slowly or never end.

Here's example output from the source "info migrate" with xbzrle=off when it's done:

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
total time: 10791 milliseconds
transferred ram: 220735 kbytes
remaining ram: 0 kbytes
total ram: 4211008 kbytes
duplicate: 1007476 pages
normal: 54938 pages
normal bytes: 219752 kbytes

Have I missed setting some other migration parameter? I tried using migrate_set_speed to change the bandwidth limit to 10 bytes/sec but it didn't make any difference.

Are there any default parameters that would make xbzrle inefficient for this type of workload? Has anyone measured a point of diminishing returns where e.g. encoding/decoding cpu-overhead makes the feature ineffective?

This was a live migration performed on the same host, but I have seen the same behaviour between 2 hosts. The test host was idle apart from the VMs.

sample command line:

-enable-kvm -M pc -smp 2,maxcpus=64 -cpu host -m 4096
-drive file=/home/debian.img,if=none,id=drive-virtio-disk0,format=raw
-device virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-vga std
-netdev type=tap,id=guest0,vhost=on
-device virtio-net-pci,netdev=guest0

thanks,

- Vasilis
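To put the two "info migrate" reports side by side, here is a small sketch that turns the counters into average rates. It is only arithmetic on the numbers quoted above, not anything QEMU provides: the function names are made up, and "kbytes" is assumed to mean units of 1024 bytes.

```c
#include <stdint.h>

/* Average transfer rate in kbytes per second over the whole migration,
 * from the "transferred ram" and "total time" fields. */
double avg_rate_kbytes_per_sec(uint64_t transferred_kbytes, uint64_t total_ms)
{
    return (double)transferred_kbytes * 1000.0 / (double)total_ms;
}

/* Average encoded size of one XBZRLE page in bytes, from the
 * "xbzrle transferred" and "xbzrle pages" fields.
 * Assumption: 1 kbyte = 1024 bytes. */
double avg_bytes_per_xbzrle_page(uint64_t xbzrle_kbytes, uint64_t xbzrle_pages)
{
    return (double)xbzrle_kbytes * 1024.0 / (double)xbzrle_pages;
}
```

Feeding in the xbzrle=on figures (4887726 kbytes in 6530177 ms) gives an average a little under 750 kbytes/s, while the xbzrle=off run (220735 kbytes in 10791 ms) averages a bit over 20000 kbytes/s. The xbzrle=on run also sent roughly 22 times more data in total, spread over about 266 million small encoded pages of around 18 bytes each under the 1024-byte assumption.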
Re: [Qemu-devel] [RFC PATCH 0/9] qom: improve reference counting and hotplug
Hi, On Sun, Aug 26, 2012 at 10:51:29AM -0500, Anthony Liguori wrote: > Right now, you need to pair up object_new with object_delete. This is > impractical when using reference counting because we would like to ensure that > object_unref() also frees memory when needed. > > The first few patches fix this problem by introducing a release callback so > that objects that need special release behavior (i.e. g_free) can do that. > > Since link and child properties all hold references, in order to actually free > an object, we need to break those links. User created devices end up as > children of a container. But child properties cannot be removed which means > there's no obvious way to remove the reference and ultimately free the object. > > We introduce the concept of "nullable child" properties to solve this. This > is > a child property that can be broken by writing NULL to the child link. Today > we set all /peripheral* children to be nullable so that they can be deleted by > management tools. > > In terms of modeling hotplug, we represent unplug by removing the object from > the parent bus. We need to register a notifier for when this happens so that > we can also remove the parent's child property to ultimately release the > object. > > Putting it all together, we have: > > 1) qmp_device_del will issue a callback to a device. The default callback > will >do a forced eject (which means writing NULL to the parent_bus link). > > 2) PCI hotplug is a bit more sophisticated in that it waits for the guest to >do the ejection. > > 3) qmp_device_del will register an eject notifier such that the device gets >completely removed. > > There's a slightly change in behavior here. A device is not automatically > destroyed based on a guest initiated eject. A management tool must explicitly > break the parent's link to the child in order for the device to disappear > completely. device_del behaves exactly as it does today though. 
Is there a plan to respin this series, or has it been dropped? Afaict it hasn't landed upstream.

I am reworking an acpi memory hotplug RFC patchset, and noticed that this patchset changes guest-initiated eject semantics. Currently a successful guest-initiated eject in my mem-hotplug patchset destroys the qdev device, but the semantics of this patchset would require a new behaviour.

thanks,

- Vasilis
Re: [Qemu-devel] [RFC PATCH v3 06/19] Implement "-dimm" command line option
Hi,

sorry for the delayed answer.

On Sat, Sep 29, 2012 at 11:13:04AM +0000, Blue Swirl wrote:
> >
> > The "-dimm" option is supposed to specify the dimm/memory layout, and not create
> > any devices.
> >
> > If we don't want this new option, I have a question:
> >
> > A "-device/device_add" means we create a new qdev device at startup or as a
> > hotplug operation respectively. So, the semantics of
> > "-device dimm,id=dimm0,size=512M,node=0,populated=on" are clear to me.
> >
> > What does "-device dimm,populated=off" mean from a qdev perspective? There are 2
> > alternatives:
> >
> > - The device is created on the dimmbus, but is not used/populated yet. Then the
> > activation/acpi-hotplug of the dimm may require a separate command (we used to have
> > "dimm_add" in versions < 3). "device_add" handling always hotplugs a new qdev
> > device, so this wouldn't fit this usecase, because the device already exists. In
> > this case, the actual "acpi hotplug" operation is decoupled from qdev device
> > creation.
>
> The bus exists but the devices do not, device_add would add DIMMs to
> the bus. This matches PCI bus created by the host bridge and PCI
> device hotplug.
>
> A more complex setup would be dimm bus, dimm slot devices and DIMM
> devices. The intermediate slot device would contain one DIMM device if
> plugged.

Interesting, I hadn't thought about this alternative. It does sound overly complex, but a dimmslot / dimmdevice split could consolidate the hotplug semantic differences between populated=on/off. Something similar to the dimmslot device is already present in v3 (the dimmcfg structure), but it's not a qdev-visible device. I'd rather avoid the complication, but I might revisit this idea.

> >
> > - The dimmdevice is not created when "-device dimm,populated=off" (this would
> > require some ugly checking in normal -device argument handling). Only the dimm
> > layout is saved. The hotplug is triggered from a normal device_add later. So in
> > this case, the "acpi hotplug" happens at the same time as the qdev hotplug.
> >
> > Do you see a simpler alternative without introducing a new option?
> >
> > Using the "-dimm" option follows the second semantic and avoids changing the "-device"
> > semantics. Dimm layout description is decoupled from dimmdevice creation, and qdev
> > hotplug coincides with acpi hotplug.
>
> Maybe even the dimmbus device shouldn't exist by itself after all, or
> it should be pretty much invisible to users. On real HW, the memory
> controller or south bridge handles the memory. For i440fx, it's part
> of the same chipset. So I think we should just add qdev properties to
> i440fx to specify the sizes, nodes etc. Then i440fx should create the
> dimmbus device unconditionally using the properties. The default
> properties should create a sane configuration, otherwise -global
> i440fx.dimm_size=512M etc. could be used. Then the bus would be
> populated as before or with device_add.

hmm, the problem with using only i440fx properties is that size/nodes look dimm specific to me, not chipset-memcontroller specific. Unless we only allow uniform size dimms. Is it possible to have a dynamic list of sizes/nodes pairs as properties of a qdev device?

Also, if there is no dimmbus, and instead we have only links<> from i440fx to dimm-devices, would the current qdev hotplug API be enough?

I am currently leaning towards this: i440fx unconditionally creates the dimmbus. Users don't have to specify the bus (I assume this is what you mean by "dimmbus should be invisible to the users").

We only use "-device dimm" to describe dimms. With "-device dimm,populated=off", only the dimm config layout will be saved in the dimmbus. The hotplug is triggered from a normal device_add later (same as pci hotplug).

thanks,

- Vasilis
Re: [Qemu-devel] [RFC PATCH v3 08/19] pc: calculate dimm physical addresses and adjust memory map
On Sat, Sep 22, 2012 at 02:15:28PM +0000, Blue Swirl wrote:
> > +
> > +/* Function to configure memory offsets of hotpluggable dimms */
> > +
> > +target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
> > +{
> > +    target_phys_addr_t ret;
> > +
> > +    /* on first call, initialize ram_hp_offset */
> > +    if (!ram_hp_offset) {
> > +        if (ram_size >= PCI_HOLE_START ) {
> > +            ram_hp_offset = 0x100000000LL + (ram_size - PCI_HOLE_START);
> > +        } else {
> > +            ram_hp_offset = ram_size;
> > +        }
> > +    }
> > +
> > +    if (ram_hp_offset >= 0x100000000LL) {
> > +        ret = ram_hp_offset;
> > +        above_4g_hp_mem_size += size;
> > +        ram_hp_offset += size;
> > +    }
> > +    /* if dimm fits before pci hole, append it normally */
> > +    else if (ram_hp_offset + size <= PCI_HOLE_START) {
>
> } else if ...
>
> > +        ret = ram_hp_offset;
> > +        below_4g_hp_mem_size += size;
> > +        ram_hp_offset += size;
> > +    }
> > +    /* otherwise place it above 4GB */
> > +    else {
>
> } else {
>
> > +        ret = 0x100000000LL;
> > +        above_4g_hp_mem_size += size;
> > +        ram_hp_offset = 0x100000000LL + size;
> > +    }
> > +
> > +    return ret;
> > +}
>
> But the function and use of lots of global variables is ugly. The dimm
> devices should be just created in piix_pci.c (i440fx) directly with
> correct offsets and sizes, so all below_4g_mem_size etc. calculations
> should be moved there. That would implement the PMC part of i440fx.
>
> For ISA PC, probably the board should create the DIMMs since there may
> not be a memory controller. The >4G logic does not make sense there
> anyway.

What about moving the implementation to pc_piix.c? Initial RAM and pci windows are already calculated in pc_init1, and then passed to i440fx_init. The memory bus could be attached to i440fx for pci-enabled pc and to isabus-bridge for isa-pc (isa-pc not tested yet).

Something like the following:

---
 hw/pc.h      |    1 +
 hw/pc_piix.c |   57 +++--
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/hw/pc.h b/hw/pc.h
index e4db071..d6cc43b 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -10,6 +10,7 @@
 #include "memory.h"
 #include "ioapic.h"

+#define PCI_HOLE_START 0xe0000000
 /* PC-style peripherals (also used by other machines). */

 /* serial.c */
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 88ff041..17db95a 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -43,6 +43,7 @@
 #include "xen.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "dimm.h"
 #ifdef CONFIG_XEN
 # include
 #endif
@@ -52,6 +53,8 @@ static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 };
 static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };

+static ram_addr_t below_4g_hp_mem_size = 0;
+static ram_addr_t above_4g_hp_mem_size = 0;
 static void kvm_piix3_setup_irq_routing(bool pci_enabled)
 {
@@ -117,6 +120,41 @@ static void ioapic_init(GSIState *gsi_state)
     }
 }

+static target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
+{
+    target_phys_addr_t ret;
+    static ram_addr_t ram_hp_offset = 0;
+
+    /* on first call, initialize ram_hp_offset */
+    if (!ram_hp_offset) {
+        if (ram_size >= PCI_HOLE_START ) {
+            ram_hp_offset = 0x100000000LL + (ram_size - PCI_HOLE_START);
+        } else {
+            ram_hp_offset = ram_size;
+        }
+    }
+
+    if (ram_hp_offset >= 0x100000000LL) {
+        ret = ram_hp_offset;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* if dimm fits before pci hole, append it normally */
+    else if (ram_hp_offset + size <= PCI_HOLE_START) {
+        ret = ram_hp_offset;
+        below_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* otherwise place it above 4GB */
+    else {
+        ret = 0x100000000LL;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset = 0x100000000LL + size;
+    }
+
+    return ret;
+}
+
 /* PC hardware initialisation */
 static void pc_init1(MemoryRegion *system_memory,
                      MemoryRegion *system_io,
@@ -155,9 +193,9 @@ static void pc_init1(MemoryRegion *system_memory,
         kvmclock_create();
     }

-    if (ram_size >= 0xe0000000 ) {
-        above_4g_mem_size = ram_size - 0xe0000000;
-        below_4g_mem_size = 0xe0000000;
+    if (ram_size >= PCI_HOLE_START ) {
+        above_4g_mem_size = ram_size - PCI_HOLE_START;
+        below_4g_mem_size = PCI_HOLE_START;
     } else {
         above_4g_mem_size = 0;
         below_4g_mem_size = ram_size;
@@ -172,6 +210,9 @@ static void pc_init1(MemoryRegion *system_memory,
         rom_memory = system_memory;
     }

+    /* adjust memory map for hotplug dimms */
+    dimm_calc_offsets(pc_set_hp_memory_offset);
+
     /* allocate ram and load rom/bios */
     if (!xen_enabled()) {
         fw_cfg = pc_memory_init(syst
Re: [Qemu-devel] [RFC PATCH v3 11/19] Implement qmp and hmp commands for notification lists
Hi,

On Fri, Sep 21, 2012 at 04:03:26PM -0600, Eric Blake wrote:
> On 09/21/2012 05:17 AM, Vasilis Liaskovitis wrote:
> > Guest can respond to ACPI hotplug events e.g. with _EJ or _OST method.
> > This patch implements a tail queue to store guest notifications for memory
> > hot-add and hot-remove requests.
> >
> > Guest responses for memory hotplug command on a per-dimm basis can be detected
> > with the new hmp command "info memhp" or the new qmp command "query-memhp"
>
> Naming doesn't match the QMP code.

will fix.

> > Examples:
> >
> > (qemu) device_add dimm,id=ram0
>
> >
> > These notification items should probably be part of migration state (not yet
> > implemented).
>
> In the case of libvirt driving migration, you already said in 10/19 that
> libvirt has to start the destination with the populated=on|off fields
> correct for each dimm according to the state it was in at the time the

That patch actually alleviates this restriction for the off->on direction, i.e. it allows the target VM to not have its args updated for dimm hot-add. (E.g. let's say the source was started with a dimm, initially off. The dimm is hot-plugged, and then the VM is migrated. With patch 10/19, the populated arg doesn't have to be updated on the target.) The other direction (on->off) still needs a correct arg change.

If libvirt/management layers guarantee the dimm arguments are correctly changed, I don't see that we need the 10/19 patch eventually.

What I think is needed is another hmp/qmp command that will report which dimms are on/off at any given time, e.g.

(monitor) info memory-hotplug
dimm0: off
dimm1: on
...
dimmN: off

This can be used on the source by libvirt / other layers to find out the populated dimms, and construct the correct command line on the destination. Does this make sense to you?

The current patch only deals with success/failure event notifications (not the on-off state of dimms) and should probably be renamed to "query-memory-hotplug-events".

> host started the update. Can the host hot unplug memory after migration
> has started?

Good testcase. I would rather not allow any hotplug operations while the migration is happening.

What do we do with pci hotplug during migration currently? I found a discussion dating from a year ago, suggesting the same as the simplest solution, but I don't know what's currently implemented.
http://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg01204.html

> > +
> > +##
> > +# @MemHpInfo:
> > +#
> > +# Information about status of a memory hotplug command
> > +#
> > +# @dimm: the Dimm associated with the result
> > +#
> > +# @result: the result of the hotplug command
> > +#
> > +# Since: 1.3
> > +#
> > +##
> > +{ 'type': 'MemHpInfo',
> > +  'data': {'dimm': 'str', 'request': 'str', 'result': 'str'} }
>
> Should 'result' be a bool (true for success, false for still pending) or
> an enum, instead of a free-form string? Likewise, isn't 'request' going
> to be exactly one of two values (plug or unplug)?

Agreed on 'request'. 'result' is also effectively a boolean, but with the values 'success' and 'failure' (rather than 'pending'). Items are only queued when the guest has given us a definite _OST or _EJ result, which is either success or failure. If an operation is pending, nothing is queued here. Perhaps queueing pending operations also has a usecase, but this isn't addressed in this patch.

thanks,

- Vasilis
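As a rough illustration of the notification list discussed above, here is a minimal sketch with 'request' as a two-value enum and 'result' as a boolean, and entries queued only once a definite result is known. It is not the code from the patch: it uses the TAILQ macros from <sys/queue.h> rather than QEMU's own queue macros, and all type and function names (memhp_event, memhp_queue_push, memhp_queue_pop) are invented for the example.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical request type: exactly one of two values. */
enum memhp_request { MEMHP_PLUG, MEMHP_UNPLUG };

/* One queued guest response; only definite results are stored,
 * so 'success' is a plain boolean (no "pending" state). */
struct memhp_event {
    char dimm[32];                 /* dimm id, e.g. "dimm0" */
    enum memhp_request request;
    bool success;
    TAILQ_ENTRY(memhp_event) next;
};

TAILQ_HEAD(memhp_queue, memhp_event);

/* Append an event; the caller must have run TAILQ_INIT() on the queue.
 * Returns 0 on success, -1 if allocation fails. */
int memhp_queue_push(struct memhp_queue *q, const char *dimm,
                     enum memhp_request request, bool success)
{
    struct memhp_event *ev = calloc(1, sizeof(*ev));

    if (!ev) {
        return -1;
    }
    /* calloc zeroed the buffer, so the copy stays NUL-terminated */
    strncpy(ev->dimm, dimm, sizeof(ev->dimm) - 1);
    ev->request = request;
    ev->success = success;
    TAILQ_INSERT_TAIL(q, ev, next);
    return 0;
}

/* Remove the oldest event and copy it to *out (the link field in the
 * copy is stale and must not be used). Returns false if the queue is empty. */
bool memhp_queue_pop(struct memhp_queue *q, struct memhp_event *out)
{
    struct memhp_event *ev = TAILQ_FIRST(q);

    if (!ev) {
        return false;
    }
    TAILQ_REMOVE(q, ev, next);
    *out = *ev;
    free(ev);
    return true;
}
```

A monitor command along the lines of "info memhp" could then drain the queue with memhp_queue_pop() and print one line per event. Whether events should be consumed on read or kept until explicitly cleared is a design choice this sketch does not try to settle.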
Re: [Qemu-devel] [RFC PATCH v3 20/19][SeaBIOS] alternative: Use paravirt interface for pci windows
On Mon, Sep 24, 2012 at 02:35:30PM +0800, Wen Congyang wrote:
> At 09/21/2012 07:20 PM, Vasilis Liaskovitis Wrote:
> > Initialize the 32-bit and 64-bit pci starting offsets from values passed in by
> > the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the starting
> > offsets based on initial memory and hotplug-able dimms.
>
> This patch can't be applied if I apply the other patches for seabios. And I
> don't find this patch in your tree.

To test these alternative patches, please try these trees:

https://github.com/vliaskov/seabios/commits/memhp-v3-alt
https://github.com/vliaskov/qemu-kvm/commits/memhp-v3-alt

thanks,

- Vasilis
Re: [Qemu-devel] [RFC PATCH v3 06/19] Implement "-dimm" command line option
On Sat, Sep 22, 2012 at 01:46:57PM +0000, Blue Swirl wrote:
> On Fri, Sep 21, 2012 at 11:17 AM, Vasilis Liaskovitis wrote:
> > Example:
> > "-dimm id=dimm0,size=512M,node=0,populated=off"
>
> There should not be a need to introduce a new top level option,
> instead you should just use -device, like
> -device dimm,base=0,id=dimm0,size=512M,node=0,populated=off
>
> That would also specify the start address.

What is "base"? The start address? I think the start address should be calculated by the chipset / board, not by the user.

The "-dimm" option is supposed to specify the dimm/memory layout, and not create any devices.

If we don't want this new option, I have a question:

A "-device/device_add" means we create a new qdev device at startup or as a hotplug operation respectively. So, the semantics of "-device dimm,id=dimm0,size=512M,node=0,populated=on" are clear to me.

What does "-device dimm,populated=off" mean from a qdev perspective? There are 2 alternatives:

- The device is created on the dimmbus, but is not used/populated yet. Then the activation/acpi-hotplug of the dimm may require a separate command (we used to have "dimm_add" in versions < 3). "device_add" handling always hotplugs a new qdev device, so this wouldn't fit this usecase, because the device already exists. In this case, the actual "acpi hotplug" operation is decoupled from qdev device creation.

- The dimmdevice is not created when "-device dimm,populated=off" (this would require some ugly checking in normal -device argument handling). Only the dimm layout is saved. The hotplug is triggered from a normal device_add later. So in this case, the "acpi hotplug" happens at the same time as the qdev hotplug.

Do you see a simpler alternative without introducing a new option?

Using the "-dimm" option follows the second semantic and avoids changing the "-device" semantics. Dimm layout description is decoupled from dimmdevice creation, and qdev hotplug coincides with acpi hotplug.

thanks,

- Vasilis
[Qemu-devel] [RFC PATCH v3 00/19] ACPI memory hotplug
on uq/master for qemu-kvm, and master for seabios. Can be found also at:

http://github.com/vliaskov/qemu-kvm/commits/memhp-v3
http://github.com/vliaskov/seabios/commits/memhp-v3

Vasilis Liaskovitis (12):
  Implement dimm device abstraction
  Implement "-dimm" command line option
  acpi_piix4: Implement memory device hotplug registers
  pc: calculate dimm physical addresses and adjust memory map
  pc: Add dimm paravirt SRAT info
  fix live-migration when "populated=on" is missing
  Implement qmp and hmp commands for notification lists
  Implement "info memory-total" and "query-memory-total"
  balloon: update with hotplugged memory
  Add _OST dimm support
  Update dimm state on reset
  Implement _PS3 for dimm

 arch_init.c                 |   24 ++-
 docs/specs/acpi_hotplug.txt |   54 ++
 docs/specs/fwcfg.txt        |   28 +++
 hmp-commands.hx             |    4 +
 hmp.c                       |   24 +++
 hmp.h                       |    2 +
 hw/Makefile.objs            |    2 +-
 hw/acpi_piix4.c             |  114 +++-
 hw/dimm.c                   |  435 +++
 hw/dimm.h                   |  101 ++
 hw/pc.c                     |   55 ++-
 hw/pc.h                     |    6 +
 hw/pc_piix.c                |   20 ++-
 hw/virtio-balloon.c         |   13 +-
 monitor.c                   |   14 ++
 qapi-schema.json            |   37
 qemu-config.c               |   25 +++
 qemu-options.hx             |    5 +
 qmp-commands.hx             |   57 ++
 sysemu.h                    |    1 +
 vl.c                        |   51 +
 21 files changed, 1051 insertions(+), 21 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt
 create mode 100644 docs/specs/fwcfg.txt
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

Vasilis Liaskovitis (7):
  Add ACPI_EXTRACT_DEVICE* macros
  Subject: [PATCH 02/18] Add SSDT memory device support
  acpi-dsdt: Implement functions for memory hotplug
  acpi: generate hotplug memory devices
  Add _OST dimm method
  Implement _PS3 method for memory device
  Calculate pcimem_start and pcimem64_start from SRAT entries

 Makefile              |    2 +-
 src/acpi-dsdt.dsl     |  135 ++-
 src/acpi.c            |  216
 src/acpi.h            |    3 +
 src/pciinit.c         |    6 +-
 src/post.c            |    3 +
 src/smp.c             |    4 +
 src/ssdt-mem.dsl      |   73 +
 tools/acpi_extract.py |   28 +++
 9 files changed, 447 insertions(+), 23 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

--
1.7.9
[Qemu-devel] [RFC PATCH v3 08/19] pc: calculate dimm physical addresses and adjust memory map
Dimm physical address offsets are calculated automatically and memory map is adjusted accordingly. If a DIMM can fit before the PCI_HOLE_START (currently 0xe000), it will be added normally, otherwise its physical address will be above 4GB. Also create memory bus on i440fx-pcihost device. Signed-off-by: Vasilis Liaskovitis --- hw/pc.c | 41 + hw/pc.h |6 ++ hw/pc_piix.c | 20 ++-- vl.c |1 + 4 files changed, 62 insertions(+), 6 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 112739a..2c9664d 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -52,6 +52,7 @@ #include "arch_init.h" #include "bitmap.h" #include "vga-pci.h" +#include "dimm.h" /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -93,6 +94,9 @@ struct e820_table { static struct e820_table e820_table; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; +ram_addr_t below_4g_hp_mem_size = 0; +ram_addr_t above_4g_hp_mem_size = 0; +extern target_phys_addr_t ram_hp_offset; void gsi_handler(void *opaque, int n, int level) { GSIState *s = opaque; @@ -1160,3 +1164,40 @@ void pc_pci_device_init(PCIBus *pci_bus) pci_create_simple(pci_bus, -1, "lsi53c895a"); } } + + +/* Function to configure memory offsets of hotpluggable dimms */ + +target_phys_addr_t pc_set_hp_memory_offset(uint64_t size) +{ +target_phys_addr_t ret; + +/* on first call, initialize ram_hp_offset */ +if (!ram_hp_offset) { +if (ram_size >= PCI_HOLE_START ) { +ram_hp_offset = 0x1LL + (ram_size - PCI_HOLE_START); +} else { +ram_hp_offset = ram_size; +} +} + +if (ram_hp_offset >= 0x1LL) { +ret = ram_hp_offset; +above_4g_hp_mem_size += size; +ram_hp_offset += size; +} +/* if dimm fits before pci hole, append it normally */ +else if (ram_hp_offset + size <= PCI_HOLE_START) { +ret = ram_hp_offset; +below_4g_hp_mem_size += size; +ram_hp_offset += size; +} +/* otherwise place it above 4GB */ +else { +ret = 0x1LL; +above_4g_hp_mem_size += size; +ram_hp_offset = 0x1LL + size; +} + +return ret; +} diff --git a/hw/pc.h b/hw/pc.h index e4db071..f3304fc 100644 --- 
a/hw/pc.h +++ b/hw/pc.h @@ -10,6 +10,7 @@ #include "memory.h" #include "ioapic.h" +#define PCI_HOLE_START 0xe000 /* PC-style peripherals (also used by other machines). */ /* serial.c */ @@ -214,6 +215,11 @@ static inline bool isa_ne2000_init(ISABus *bus, int base, int irq, NICInfo *nd) /* pc_sysfw.c */ void pc_system_firmware_init(MemoryRegion *rom_memory); +/* memory hotplug */ +target_phys_addr_t pc_set_hp_memory_offset(uint64_t size); +extern ram_addr_t below_4g_hp_mem_size; +extern ram_addr_t above_4g_hp_mem_size; + /* e820 types */ #define E820_RAM1 #define E820_RESERVED 2 diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 88ff041..d1fd276 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -43,6 +43,7 @@ #include "xen.h" #include "memory.h" #include "exec-memory.h" +#include "dimm.h" #ifdef CONFIG_XEN # include #endif @@ -155,9 +156,9 @@ static void pc_init1(MemoryRegion *system_memory, kvmclock_create(); } -if (ram_size >= 0xe000 ) { -above_4g_mem_size = ram_size - 0xe000; -below_4g_mem_size = 0xe000; +if (ram_size >= PCI_HOLE_START ) { +above_4g_mem_size = ram_size - PCI_HOLE_START; +below_4g_mem_size = PCI_HOLE_START; } else { above_4g_mem_size = 0; below_4g_mem_size = ram_size; @@ -172,6 +173,9 @@ static void pc_init1(MemoryRegion *system_memory, rom_memory = system_memory; } +/* adjust memory map for hotplug dimms */ +dimm_calc_offsets(pc_set_hp_memory_offset); + /* allocate ram and load rom/bios */ if (!xen_enabled()) { fw_cfg = pc_memory_init(system_memory, @@ -192,9 +196,11 @@ static void pc_init1(MemoryRegion *system_memory, if (pci_enabled) { pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi, system_memory, system_io, ram_size, - below_4g_mem_size, - 0x1ULL - below_4g_mem_size, - 0x1ULL + above_4g_mem_size, + below_4g_mem_size + below_4g_hp_mem_size, + 0x1ULL - below_4g_mem_size +- below_4g_hp_mem_size, + 0x1ULL + above_4g_mem_size ++ above_4g_hp_mem_size, (sizeof(target_phys_addr_t) == 4 ? 0
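For readers following the placement rule in pc_set_hp_memory_offset(), here is a rough Python model of it. This is a sketch, not QEMU code: the class name is hypothetical, and it assumes that the constants shown as 0xe000 and 0x1LL in the archived patch stand for 0xe0000000 (start of the PCI hole) and 0x100000000 (4GB).

```python
PCI_HOLE_START = 0xE0000000  # assumption: the archive shows a shortened "0xe000"
FOUR_GB = 0x100000000        # assumption: the archive shows a shortened "0x1LL"


class HotplugLayout:
    """Hypothetical model of the state kept by pc_set_hp_memory_offset()."""

    def __init__(self, ram_size):
        self.ram_size = ram_size
        self.ram_hp_offset = 0
        self.below_4g_hp_mem_size = 0
        self.above_4g_hp_mem_size = 0

    def place(self, size):
        """Return the physical start address chosen for a dimm of `size` bytes."""
        # On the first call, start right after the initial RAM.
        if not self.ram_hp_offset:
            if self.ram_size >= PCI_HOLE_START:
                self.ram_hp_offset = FOUR_GB + (self.ram_size - PCI_HOLE_START)
            else:
                self.ram_hp_offset = self.ram_size

        if self.ram_hp_offset >= FOUR_GB:
            # Already above 4GB: keep appending there.
            start = self.ram_hp_offset
            self.above_4g_hp_mem_size += size
            self.ram_hp_offset += size
        elif self.ram_hp_offset + size <= PCI_HOLE_START:
            # The dimm fits before the PCI hole: append it normally.
            start = self.ram_hp_offset
            self.below_4g_hp_mem_size += size
            self.ram_hp_offset += size
        else:
            # Otherwise it becomes the first dimm above 4GB.
            start = FOUR_GB
            self.above_4g_hp_mem_size += size
            self.ram_hp_offset = FOUR_GB + size
        return start
```

With 2GB of initial RAM, a first 1GB dimm would land directly after RAM, while a second 1GB dimm no longer fits below the hole and would be placed at the 4GB boundary.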
[Qemu-devel] [RFC PATCH v3 20/19][SeaBIOS] alternative: Use paravirt interface for pci windows
Initialize the 32-bit and 64-bit pci starting offsets from values passed in by the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the starting offsets based on initial memory and hotplug-able dimms. Signed-off-by: Vasilis Liaskovitis --- src/paravirt.c |6 ++ src/paravirt.h |2 ++ src/pciinit.c |5 ++--- 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/src/paravirt.c b/src/paravirt.c index 2a98d53..390ef30 100644 --- a/src/paravirt.c +++ b/src/paravirt.c @@ -346,3 +346,9 @@ void qemu_cfg_romfile_setup(void) dprintf(3, "Found fw_cfg file: %s (size=%d)\n", file->name, file->size); } } + +void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start) +{ +qemu_cfg_read_entry(pcimem_start, QEMU_CFG_PCI_WINDOW, sizeof(u64)); +qemu_cfg_read((u8*)(pcimem64_start), sizeof(u64)); +} diff --git a/src/paravirt.h b/src/paravirt.h index a284c41..b53ff88 100644 --- a/src/paravirt.h +++ b/src/paravirt.h @@ -35,6 +35,7 @@ static inline int kvm_para_available(void) #define QEMU_CFG_BOOT_MENU 0x0e #define QEMU_CFG_MAX_CPUS 0x0f #define QEMU_CFG_FILE_DIR 0x19 +#define QEMU_CFG_PCI_WINDOW 0x1a #define QEMU_CFG_ARCH_LOCAL 0x8000 #define QEMU_CFG_ACPI_TABLES(QEMU_CFG_ARCH_LOCAL + 0) #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1) @@ -65,5 +66,6 @@ struct e820_reservation { u32 qemu_cfg_e820_entries(void); void* qemu_cfg_e820_load_next(void *addr); void qemu_cfg_romfile_setup(void); +void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start); #endif diff --git a/src/pciinit.c b/src/pciinit.c index 68f302a..64468a0 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -592,8 +592,7 @@ static void pci_region_map_entries(struct pci_bus *busses, struct pci_region *r) static void pci_bios_map_devices(struct pci_bus *busses) { -pcimem_start = RamSize; - +qemu_cfg_get_pci_offsets(&pcimem_start, &pcimem64_start); if (pci_bios_init_root_regions(busses)) { struct pci_region r64_mem, r64_pref; r64_mem.list = NULL; @@ -611,7 +610,7 @@ static void 
pci_bios_map_devices(struct pci_bus *busses) u64 align_mem = pci_region_align(&r64_mem); u64 align_pref = pci_region_align(&r64_pref); -r64_mem.base = ALIGN(0x1LL + RamSizeOver4G, align_mem); +r64_mem.base = ALIGN(pcimem64_start, align_mem); r64_pref.base = ALIGN(r64_mem.base + sum_mem, align_pref); pcimem64_start = r64_mem.base; pcimem64_end = r64_pref.base + sum_pref; -- 1.7.9
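As an illustration of what the firmware side does with the two values, the sketch below decodes a 16-byte blob into the 32-bit and 64-bit window starts and rounds the 64-bit base up to a region alignment. It is a Python model with hypothetical function names; it assumes the blob holds two little-endian 64-bit words and that the alignment is a power of two.

```python
import struct


def parse_pci_window(blob):
    """Decode two little-endian u64 words: (pcimem_start, pcimem64_start)."""
    return struct.unpack("<QQ", blob)


def align_up(value, alignment):
    """Round value up to a power-of-two alignment, as an ALIGN-style macro would."""
    return (value + alignment - 1) & ~(alignment - 1)
```

A base that is already aligned is returned unchanged, so the 64-bit window starts exactly at the value passed in whenever that value satisfies the largest BAR alignment.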
[Qemu-devel] [RFC PATCH v3 12/19] Implement "info memory-total" and "query-memory-total"
Returns total physical memory available to guest in bytes, including hotplugged memory. Note that the number reported here may be different from what the guest sees e.g. if the guest has not logically onlined hotplugged memory. This functionality is provided independently of a balloon device, since a guest can be using ACPI memory hotplug without using a balloon device. Signed-off-by: Vasilis Liaskovitis --- hmp-commands.hx |2 ++ hmp.c|7 +++ hmp.h|1 + hw/dimm.c| 21 + hw/dimm.h|1 + monitor.c|7 +++ qapi-schema.json | 11 +++ qmp-commands.hx | 20 8 files changed, 70 insertions(+), 0 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index cfb1b67..988d207 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1464,6 +1464,8 @@ show qdev device model list show roms @item info memory-hotplug show memory-hotplug +@item info memory-total +show memory-total @end table ETEXI diff --git a/hmp.c b/hmp.c index 4b3d63d..cc31ddc 100644 --- a/hmp.c +++ b/hmp.c @@ -1185,3 +1185,10 @@ void hmp_info_memory_hotplug(Monitor *mon) qapi_free_MemHpInfoList(info); } + +void hmp_info_memory_total(Monitor *mon) +{ +uint64_t ram_total; +ram_total = (uint64_t)qmp_query_memory_total(NULL); +monitor_printf(mon, "MemTotal: %lu \n", ram_total); +} diff --git a/hmp.h b/hmp.h index 986705a..ab96dba 100644 --- a/hmp.h +++ b/hmp.h @@ -74,5 +74,6 @@ void hmp_closefd(Monitor *mon, const QDict *qdict); void hmp_send_key(Monitor *mon, const QDict *qdict); void hmp_screen_dump(Monitor *mon, const QDict *qdict); void hmp_info_memory_hotplug(Monitor *mon); +void hmp_info_memory_total(Monitor *mon); #endif diff --git a/hw/dimm.c b/hw/dimm.c index fbd93a8..21626f6 100644 --- a/hw/dimm.c +++ b/hw/dimm.c @@ -28,6 +28,7 @@ static DimmBus *main_memory_bus; /* the following list is used to hold dimm config info before machine * initialization. 
After machine init, the list is emptied and not used anymore.*/ static DimmConfiglist dimmconfig_list = QTAILQ_HEAD_INITIALIZER(dimmconfig_list); +extern ram_addr_t ram_size; static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent); static char *dimmbus_get_fw_dev_path(DeviceState *dev); @@ -233,6 +234,26 @@ void setup_fwcfg_hp_dimms(uint64_t *fw_cfg_slots) } } +uint64_t get_hp_memory_total(void) +{ +DimmBus *bus = main_memory_bus; +DimmDevice *slot; +uint64_t info = 0; + +QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) { +info += slot->size; +} +return info; +} + +int64_t qmp_query_memory_total(Error **errp) +{ +uint64_t info; +info = ram_size + get_hp_memory_total(); + +return (int64_t)info; +} + void dimm_notify(uint32_t idx, uint32_t event) { DimmBus *bus = main_memory_bus; diff --git a/hw/dimm.h b/hw/dimm.h index 95251ba..21225be 100644 --- a/hw/dimm.h +++ b/hw/dimm.h @@ -86,5 +86,6 @@ int dimm_add(char *id); void main_memory_bus_create(Object *parent); void dimm_config_create(char *id, uint64_t size, uint64_t node, uint32_t dimm_idx, uint32_t populated); +uint64_t get_hp_memory_total(void); #endif diff --git a/monitor.c b/monitor.c index be9a1d9..4f5ea60 100644 --- a/monitor.c +++ b/monitor.c @@ -2747,6 +2747,13 @@ static mon_cmd_t info_cmds[] = { .mhandler.info = hmp_info_memory_hotplug, }, { +.name = "memory-total", +.args_type = "", +.params = "", +.help = "show total memory size", +.mhandler.info = hmp_info_memory_total, +}, +{ .name = NULL, }, }; diff --git a/qapi-schema.json b/qapi-schema.json index 3706a2a..c1d2571 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2581,3 +2581,14 @@ # Since: 1.3 ## { 'command': 'query-memory-hotplug', 'returns': ['MemHpInfo'] } + +## +# @query-memory-total: +# +# Returns total memory in bytes, including hotplugged dimms +# +# Returns: int +# +# Since: 1.3 +## +{ 'command': 'query-memory-total', 'returns': 'int' } diff --git a/qmp-commands.hx b/qmp-commands.hx index e50dcc2..20b7eea 100644 --- 
a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2576,3 +2576,23 @@ Example: } EQMP + +{ +.name = "query-memory-total", +.args_type = "", +.mhandler.cmd_new = qmp_marshal_input_query_memory_total +}, +SQMP +query-memory-total +-- + +Return total memory in bytes, including hotplugged dimms + +Example: + +-> { "execute": "query-memory-total" } +<- { + "return": 1073741824 + } + +EQMP -- 1.7.9
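A small client-side sketch of the command described above, in Python. The helper names are hypothetical, and "query-memory-total" is the command proposed by this RFC series, so it should not be assumed to exist in any released QEMU. The last helper mirrors the sum computed in qmp_query_memory_total(): initial RAM plus the size of every dimm on the bus.

```python
import json


def build_request():
    """Serialize the QMP request for the command proposed in this patch."""
    return json.dumps({"execute": "query-memory-total"})


def parse_reply(text):
    """Extract the total memory size in bytes from a QMP reply."""
    return json.loads(text)["return"]


def memory_total(ram_size, dimm_sizes):
    """Model of the reported value: initial RAM plus all hotplugged dimms."""
    return ram_size + sum(dimm_sizes)
```

Note that the value is a byte count that can exceed 32 bits, which is also why a fixed-width 64-bit format specifier is the safer choice when printing it on the C side.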
[Qemu-devel] [RFC PATCH v3 03/19][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug
Extend the DSDT to include methods for handling memory hot-add and hot-remove notifications and memory device status requests. These functions are called from the memory device SSDT methods. Signed-off-by: Vasilis Liaskovitis --- src/acpi-dsdt.dsl | 70 +++- 1 files changed, 68 insertions(+), 2 deletions(-) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 2060686..5d3e92b 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -737,6 +737,71 @@ DefinitionBlock ( } Return(One) } +/* Objects filled in by run-time generated SSDT */ +External(MTFY, MethodObj) +External(MEON, PkgObj) + +Method (CMST, 1, NotSerialized) { +// _STA method - return ON status of memdevice +// Local0 = MEON flag for this cpu +Store(DerefOf(Index(MEON, Arg0)), Local0) +If (Local0) { Return(0xF) } Else { Return(0x0) } +} + +/* Memory hotplug notify array */ +OperationRegion(MEST, SystemIO, 0xaf80, 32) +Field (MEST, ByteAcc, NoLock, Preserve) +{ +MES, 256 +} + +/* Memory eject byte */ +OperationRegion(MEMJ, SystemIO, 0xafa0, 1) +Field (MEMJ, ByteAcc, NoLock, Preserve) +{ +MPE, 8 +} + +Method(MESC, 0) { +// Local5 = active memdevice bitmap +Store (MES, Local5) +// Local2 = last read byte from bitmap +Store (Zero, Local2) +// Local0 = memory device iterator +Store (Zero, Local0) +While (LLess(Local0, SizeOf(MEON))) { +// Local1 = MEON flag for this memory device +Store(DerefOf(Index(MEON, Local0)), Local1) +If (And(Local0, 0x07)) { +// Shift down previously read bitmap byte +ShiftRight(Local2, 1, Local2) +} Else { +// Read next byte from memdevice bitmap +Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2) +} +// Local3 = active state for this memory device +Store(And(Local2, 1), Local3) + +If (LNotEqual(Local1, Local3)) { +// State change - update MEON with new state +Store(Local3, Index(MEON, Local0)) +// Do MEM notify +If (LEqual(Local3, 1)) { +MTFY(Local0, 1) +} Else { +MTFY(Local0, 3) +} +} +Increment(Local0) +} +Return(One) +} + +Method (MPEJ, 2, NotSerialized) { +// _EJ0 
method - eject callback +Store(Arg0, MPE) +Sleep(200) +} } @@ -759,8 +824,9 @@ DefinitionBlock ( // CPU hotplug event Return(\_SB.PRSC()) } -Method(_L03) { -Return(0x01) +Method(_E03) { +// Memory hotplug event +Return(\_SB.MESC()) } Method(_L04) { Return(0x01) -- 1.7.9
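The scan performed by MESC can be modelled in a few lines of Python. This is a sketch of the logic only, with a hypothetical function name: `meon` stands for the MEON package (one flag per memory device) and `mes` for the 32-byte status bitmap read from port 0xaf80. The notify codes 1 and 3 are the ones passed to MTFY in the patch (Device Check and Eject Request in ACPI terms).

```python
def scan_memory_devices(meon, mes):
    """Compare the cached MEON flags against the bitmap and report changes.

    Returns a list of (device_index, notify_code) pairs and updates meon in
    place, like the Store into Index(MEON, Local0) in the ASL method.
    """
    notifications = []
    for idx in range(len(meon)):
        # Bit (idx % 8) of byte (idx / 8) holds the state of device idx.
        active = (mes[idx >> 3] >> (idx & 7)) & 1
        if meon[idx] != active:
            meon[idx] = active
            notifications.append((idx, 1 if active else 3))
    return notifications
```

Because MEON is updated as part of the scan, a second pass over an unchanged bitmap produces no notifications.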
[Qemu-devel] [RFC PATCH v3 09/19] pc: Add dimm paravirt SRAT info
The numa_fw_cfg paravirt interface is extended to include SRAT information for all hotplug-able dimms. There are 3 words for each hotplug-able memory slot, denoting start address, size and node proximity. The new info is appended after existing numa info, so that the fw_cfg layout does not break. This information is used by Seabios to build hotplug memory device objects at runtime. nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info to SeaBIOS. v1->v2: Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not to break existing layout Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt Signed-off-by: Vasilis Liaskovitis --- docs/specs/fwcfg.txt | 28 hw/pc.c | 14 -- 2 files changed, 40 insertions(+), 2 deletions(-) create mode 100644 docs/specs/fwcfg.txt diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt new file mode 100644 index 000..55f96d9 --- /dev/null +++ b/docs/specs/fwcfg.txt @@ -0,0 +1,28 @@ +QEMU<->BIOS Paravirt Documentation +-- + +This document describes paravirt data structures passed from QEMU to BIOS. + +FW_CFG_NUMA paravirt info + +The SRAT info passed from QEMU to BIOS has the following layout: + +--- +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem + +--- +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm + +Entry 0 contains the number of numa nodes (nb_numa_nodes). + +Entries 1..max_cpus: The next max_cpus entries describe node proximity for each +one of the vCPUs in the system. + +Entries max_cpus+1..max_cpus+nb_numa_nodes+1: The next nb_numa_nodes entries +describe the memory size for each one of the NUMA nodes in the system. 
+ +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms) + +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains +the physical address offset, size (in bytes), and node proximity for the +respective dimm. diff --git a/hw/pc.c b/hw/pc.c index 2c9664d..f2604ae 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -598,6 +598,7 @@ static void *bochs_bios_init(void) uint8_t *smbios_table; size_t smbios_len; uint64_t *numa_fw_cfg; +uint64_t *hp_dimms_fw_cfg; int i, j; register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL); @@ -632,8 +633,10 @@ static void *bochs_bios_init(void) /* allocate memory for the NUMA channel: one (64bit) word for the number * of nodes, one word for each VCPU->node and one word for each node to * hold the amount of memory. + * Finally one word for the number of hotplug memory slots and three words + * for each hotplug memory slot (start address, size and node proximity). */ -numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8); +numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8); numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes); for (i = 0; i < max_cpus; i++) { for (j = 0; j < nb_numa_nodes; j++) { @@ -646,8 +649,15 @@ static void *bochs_bios_init(void) for (i = 0; i < nb_numa_nodes; i++) { numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]); } + +numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms); + +hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes; +if (nb_hp_dimms) +setup_fwcfg_hp_dimms(hp_dimms_fw_cfg); + fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg, - (1 + max_cpus + nb_numa_nodes) * 8); + (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8); return fw_cfg; } -- 1.7.9
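To make the extended FW_CFG_NUMA layout concrete, the following Python sketch builds the blob from its parts. The function name is hypothetical; the word order and the little-endian 64-bit encoding follow the patch (cpu_to_le64 on every entry).

```python
import struct


def build_numa_fw_cfg(cpu_pxm, node_mem, dimms):
    """Pack the NUMA fw_cfg words.

    cpu_pxm:  node proximity for each of the max_cpus vCPUs
    node_mem: memory size for each NUMA node
    dimms:    list of (start, size, pxm) triplets, one per hotplug dimm
    """
    words = [len(node_mem)]        # entry 0: nb_numa_nodes
    words += cpu_pxm               # entries 1..max_cpus
    words += node_mem              # next nb_numa_nodes entries
    words.append(len(dimms))       # entry max_cpus + nb_numa_nodes + 1
    for start, size, pxm in dimms:
        words += [start, size, pxm]
    return struct.pack("<%dQ" % len(words), *words)
```

The total length is (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8 bytes, matching the g_malloc0() and fw_cfg_add_bytes() sizes in the hunk above.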
[Qemu-devel] [RFC PATCH v3 01/19][SeaBIOS] Add ACPI_EXTRACT_DEVICE* macros
This allows extracting the beginning, end and name of a Device object.

Signed-off-by: Vasilis Liaskovitis
---
 tools/acpi_extract.py |   28 
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/tools/acpi_extract.py b/tools/acpi_extract.py
index 167a322..cb2540e 100755
--- a/tools/acpi_extract.py
+++ b/tools/acpi_extract.py
@@ -195,6 +195,28 @@ def aml_package_start(offset):
     offset += 1
     return offset + aml_pkglen_bytes(offset) + 1
 
+def aml_device_start(offset):
+    #0x5B 0x82 DeviceOp PkgLength NameString ProcID
+    if ((aml[offset] != 0x5B) or (aml[offset + 1] != 0x82)):
+        die( "Name offset 0x%x: expected 0x5B 0x82 actual 0x%x 0x%x" %
+             (offset, aml[offset], aml[offset + 1]));
+    return offset
+
+def aml_device_string(offset):
+    #0x5B 0x82 DeviceOp PkgLength NameString ProcID
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    offset += pkglenbytes
+    return offset
+
+def aml_device_end(offset):
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    pkglen = aml_pkglen(offset)
+    return offset + pkglen
+
 lineno = 0
 for line in fileinput.input():
     # Strip trailing newline
@@ -279,6 +301,12 @@ for i in range(len(asl)):
             offset = aml_processor_end(offset)
         elif (directive == "ACPI_EXTRACT_PKG_START"):
             offset = aml_package_start(offset)
+        elif (directive == "ACPI_EXTRACT_DEVICE_START"):
+            offset = aml_device_start(offset)
+        elif (directive == "ACPI_EXTRACT_DEVICE_STRING"):
+            offset = aml_device_string(offset)
+        elif (directive == "ACPI_EXTRACT_DEVICE_END"):
+            offset = aml_device_end(offset)
         else:
             die("Unsupported directive %s" % directive)
 
-- 
1.7.9
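The three offsets the new directives compute can be checked on a hand-built Device object. The sketch below is standalone Python with hypothetical helper names (it does not reuse the script's global `aml` array). It assumes the standard AML PkgLength encoding: the top two bits of the lead byte give the number of extra bytes, and in the multi-byte form only the low nibble of the lead byte carries length bits.

```python
def pkglen_bytes(aml, offset):
    """Number of bytes used by the PkgLength encoding at offset."""
    return (aml[offset] >> 6) + 1


def pkglen(aml, offset):
    """Decode the PkgLength value (it counts its own encoding bytes)."""
    nbytes = pkglen_bytes(aml, offset)
    if nbytes == 1:
        return aml[offset] & 0x3F
    length = aml[offset] & 0x0F
    for i in range(1, nbytes):
        length |= aml[offset + i] << (4 + 8 * (i - 1))
    return length


def device_extent(aml, offset):
    """Return (start, name_offset, end) for the Device object at offset."""
    if aml[offset] != 0x5B or aml[offset + 1] != 0x82:
        raise ValueError("expected DeviceOp 0x5B 0x82 at 0x%x" % offset)
    len_off = offset + 2
    name_off = len_off + pkglen_bytes(aml, len_off)
    end = len_off + pkglen(aml, len_off)
    return offset, name_off, end
```

For a device with a one-byte PkgLength of 8 and the name MP00, the name starts three bytes in and the object ends ten bytes after its start.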
[Qemu-devel] [RFC PATCH v3 11/19] Implement qmp and hmp commands for notification lists
Guest can respond to ACPI hotplug events e.g. with _EJ0 or _OST method. This patch implements a tail queue to store guest notifications for memory hot-add and hot-remove requests.

Guest responses for memory hotplug command on a per-dimm basis can be detected with the new hmp command "info memory-hotplug" or the new qmp command "query-memory-hotplug".

Examples:

(qemu) device_add dimm,id=ram0
(qemu) info memory-hotplug
dimm: ram0 hot-add success
or
dimm: ram0 hot-add failure

(qemu) device_del ram3
(qemu) info memory-hotplug
dimm: ram3 hot-remove success
or
dimm: ram3 hot-remove failure

Results are removed from the queue once read.

This patch only queues _EJ0 events that signal hot-remove success. For _OST event queuing, which covers the hot-remove failure and hot-add success/failure cases, the _OST patches in this series are also needed.

These notification items should probably be part of migration state (not yet implemented).

Signed-off-by: Vasilis Liaskovitis
--- hmp-commands.hx |2 + hmp.c| 17 ++ hmp.h|1 + hw/dimm.c| 62 +- hw/dimm.h|2 +- monitor.c|7 ++ qapi-schema.json | 26 ++ qmp-commands.hx | 37 8 files changed, 152 insertions(+), 2 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index ed67e99..cfb1b67 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1462,6 +1462,8 @@ show device tree show qdev device model list @item info roms show roms +@item info memory-hotplug +show memory-hotplug @end table ETEXI diff --git a/hmp.c b/hmp.c index ba6fbd3..4b3d63d 100644 --- a/hmp.c +++ b/hmp.c @@ -1168,3 +1168,20 @@ void hmp_screen_dump(Monitor *mon, const QDict *qdict) qmp_screendump(filename, &err); hmp_handle_error(mon, &err); } + +void hmp_info_memory_hotplug(Monitor *mon) +{ +MemHpInfoList *info; +MemHpInfoList *item; +MemHpInfo *dimm; + +info = qmp_query_memory_hotplug(NULL); +for (item = info; item; item = item->next) { +dimm = item->value; +monitor_printf(mon, "dimm: %s %s %s\n", dimm->dimm, +dimm->request, dimm->result); +dimm->dimm = NULL; +} + 
+qapi_free_MemHpInfoList(info); +} diff --git a/hmp.h b/hmp.h index 48b9c59..986705a 100644 --- a/hmp.h +++ b/hmp.h @@ -73,5 +73,6 @@ void hmp_getfd(Monitor *mon, const QDict *qdict); void hmp_closefd(Monitor *mon, const QDict *qdict); void hmp_send_key(Monitor *mon, const QDict *qdict); void hmp_screen_dump(Monitor *mon, const QDict *qdict); +void hmp_info_memory_hotplug(Monitor *mon); #endif diff --git a/hw/dimm.c b/hw/dimm.c index 288b997..fbd93a8 100644 --- a/hw/dimm.c +++ b/hw/dimm.c @@ -65,6 +65,7 @@ static void dimm_bus_initfn(Object *obj) DimmBus *bus = DIMM_BUS(obj); QTAILQ_INIT(&bus->dimmconfig_list); QTAILQ_INIT(&bus->dimmlist); +QTAILQ_INIT(&bus->dimm_hp_result_queue); QTAILQ_FOREACH_SAFE(dimm_cfg, &dimmconfig_list, nextdimmcfg, next_dimm_cfg) { QTAILQ_REMOVE(&dimmconfig_list, dimm_cfg, nextdimmcfg); @@ -236,20 +237,78 @@ void dimm_notify(uint32_t idx, uint32_t event) { DimmBus *bus = main_memory_bus; DimmDevice *s; +DimmConfig *slotcfg; +struct dimm_hp_result *result; + s = dimm_find_from_idx(idx); assert(s != NULL); +result = g_malloc0(sizeof(*result)); +slotcfg = dimmcfg_find_from_name(DEVICE(s)->id); +result->dimmname = slotcfg->name; switch(event) { case DIMM_REMOVE_SUCCESS: dimm_depopulate(s); -qdev_simple_unplug_cb((DeviceState*)s); QTAILQ_REMOVE(&bus->dimmlist, s, nextdimm); +qdev_simple_unplug_cb((DeviceState*)s); +QTAILQ_INSERT_TAIL(&bus->dimm_hp_result_queue, result, next); break; default: +g_free(result); break; } } +MemHpInfoList *qmp_query_memory_hotplug(Error **errp) +{ +DimmBus *bus = main_memory_bus; +MemHpInfoList *head = NULL, *cur_item = NULL, *info; +struct dimm_hp_result *item, *nextitem; + +QTAILQ_FOREACH_SAFE(item, &bus->dimm_hp_result_queue, next, nextitem) { + +info = g_malloc0(sizeof(*info)); +info->value = g_malloc0(sizeof(*info->value)); +info->value->dimm = g_malloc0(sizeof(char) * 32); +info->value->request = g_malloc0(sizeof(char) * 16); +info->value->result = g_malloc0(sizeof(char) * 16); +switch (item->ret) { +case 
DIMM_REMOVE_SUCCESS: +strcpy(info->value->request, "hot-remove"); +strcpy(info->value->result, "success"); +break; +case DIMM_REMOVE_FAIL: +strcpy(info->value->request, "
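The commit message states that results are removed from the queue once read. A minimal Python model of that read-once behaviour follows; the class and method names are hypothetical and the dictionary keys simply mirror the dimm/request/result fields printed by the hmp command.

```python
from collections import deque


class HotplugResults:
    """Hypothetical model of the per-bus hotplug result queue."""

    def __init__(self):
        self._queue = deque()

    def record(self, dimm, request, result):
        """Queue one guest response, e.g. ("ram3", "hot-remove", "success")."""
        self._queue.append({"dimm": dimm, "request": request, "result": result})

    def query(self):
        """Return all pending results in arrival order and empty the queue."""
        items = list(self._queue)
        self._queue.clear()
        return items
```

A consequence of this design is that two management clients polling the same VM would each see only part of the history, which is one reason the notification items may need to become part of the migrated state.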
[Qemu-devel] [RFC PATCH v3 18/19] Implement _PS3 for dimm
This will allow us to update dimm state on OSPM-initiated eject operations e.g. with "echo 1 > /sys/bus/acpi/devices/PNP0C80\:00/eject" Signed-off-by: Vasilis Liaskovitis --- docs/specs/acpi_hotplug.txt |7 +++ hw/acpi_piix4.c |5 + hw/dimm.c |3 +++ hw/dimm.h |3 ++- 4 files changed, 17 insertions(+), 1 deletions(-) diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt index 536da16..69868fe 100644 --- a/docs/specs/acpi_hotplug.txt +++ b/docs/specs/acpi_hotplug.txt @@ -45,3 +45,10 @@ insertion failed. Written by ACPI memory device _OST method to notify qemu of failed hot-add. Write-only. +Memory Dimm _PS3 power-off initiated by OSPM (IO port 0xafa4, 1-byte access): +--- +Dimm hot-add _PS3 initiated by OSPM. Byte value indicates Dimm slot which +entered D3 state. + +Written by ACPI memory device _PS3 method to notify qemu of power-off state for +the dimm. Write-only. diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c index 8bf58a6..aad78ca 100644 --- a/hw/acpi_piix4.c +++ b/hw/acpi_piix4.c @@ -52,6 +52,7 @@ #define MEM_OST_REMOVE_FAIL 0xafa1 #define MEM_OST_ADD_SUCCESS 0xafa2 #define MEM_OST_ADD_FAIL 0xafa3 +#define MEM_PS3 0xafa4 #define PIIX4_MEM_HOTPLUG_STATUS 8 #define PIIX4_PCI_HOTPLUG_STATUS 2 @@ -545,6 +546,9 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val) case MEM_OST_ADD_FAIL: dimm_notify(val, DIMM_ADD_FAIL); break; +case MEM_PS3: +dimm_notify(val, DIMM_OSPM_POWEROFF); +break; default: acpi_gpe_ioport_writeb(&s->ar, addr, val); } @@ -621,6 +625,7 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s) register_ioport_write(MEM_OST_REMOVE_FAIL, 1, 1, gpe_writeb, s); register_ioport_write(MEM_OST_ADD_SUCCESS, 1, 1, gpe_writeb, s); register_ioport_write(MEM_OST_ADD_FAIL, 1, 1, gpe_writeb, s); +register_ioport_write(MEM_PS3, 1, 1, gpe_writeb, s); for(i = 0; i < DIMM_BITMAP_BYTES; i++) { s->gperegs.mems_sts[i] = 0; diff --git a/hw/dimm.c b/hw/dimm.c index b993668..08f66d5 100644 --- a/hw/dimm.c +++ b/hw/dimm.c 
@@ -319,6 +319,9 @@ void dimm_notify(uint32_t idx, uint32_t event) qdev_simple_unplug_cb((DeviceState*)s); QTAILQ_INSERT_TAIL(&bus->dimm_hp_result_queue, result, next); break; +case DIMM_OSPM_POWEROFF: +if (bus->dimm_revert) +bus->dimm_revert(bus->dimm_hotplug_qdev, s, 1); default: g_free(result); break; diff --git a/hw/dimm.h b/hw/dimm.h index ce091fe..8d73b8f 100644 --- a/hw/dimm.h +++ b/hw/dimm.h @@ -15,7 +15,8 @@ typedef enum { DIMM_REMOVE_SUCCESS = 0, DIMM_REMOVE_FAIL = 1, DIMM_ADD_SUCCESS = 2, -DIMM_ADD_FAIL = 3 +DIMM_ADD_FAIL = 3, +DIMM_OSPM_POWEROFF = 4 } dimm_hp_result_code; typedef enum { -- 1.7.9
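The port-to-event mapping added to gpe_writeb() can be summarised as a lookup table. The Python sketch below models only the four ports defined in this hunk; the names DimmEvent and decode_write are hypothetical, while the port numbers and enum values are copied from the patch.

```python
from enum import IntEnum


class DimmEvent(IntEnum):
    REMOVE_SUCCESS = 0
    REMOVE_FAIL = 1
    ADD_SUCCESS = 2
    ADD_FAIL = 3
    OSPM_POWEROFF = 4


# I/O port written by the guest ACPI method -> event reported to dimm_notify()
PORT_EVENTS = {
    0xAFA1: DimmEvent.REMOVE_FAIL,    # MEM_OST_REMOVE_FAIL
    0xAFA2: DimmEvent.ADD_SUCCESS,    # MEM_OST_ADD_SUCCESS
    0xAFA3: DimmEvent.ADD_FAIL,       # MEM_OST_ADD_FAIL
    0xAFA4: DimmEvent.OSPM_POWEROFF,  # MEM_PS3
}


def decode_write(port, value):
    """Return (dimm_slot, event) for a one-byte write, or None if unhandled."""
    event = PORT_EVENTS.get(port)
    if event is None:
        return None
    return (value, event)
```

The byte written is the dimm slot index in every case, so a single handler can serve all of the notification ports.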
[Qemu-devel] [RFC PATCH v3 05/19] Implement dimm device abstraction
Each hotplug-able memory slot is a DimmDevice. All DimmDevices are attached to a new bus called DimmBus. This bus is introduced so that we no longer depend on hotplug-capability of main system bus (the main bus does not allow hotplugging). The DimmBus should be attached to a chipset Device (i440fx in case of the pc) A hot-add operation for a particular dimm: - creates a new DimmDevice and attaches it to the DimmBus - creates a new MemoryRegion of the given physical address offset, size and node proximity, and attaches it to main system memory as a sub_region. A successful hot-remove operation detaches and frees the MemoryRegion from system memory, and removes the DimmDevice from the DimmBus. Hotplug operations are done through normal device_add /device_del commands. Also add properties to DimmDevice. Signed-off-by: Vasilis Liaskovitis --- hw/dimm.c | 305 + hw/dimm.h | 90 ++ 2 files changed, 395 insertions(+), 0 deletions(-) create mode 100644 hw/dimm.c create mode 100644 hw/dimm.h diff --git a/hw/dimm.c b/hw/dimm.c new file mode 100644 index 000..288b997 --- /dev/null +++ b/hw/dimm.c @@ -0,0 +1,305 @@ +/* + * Dimm device for Memory Hotplug + * + * Copyright ProfitBricks GmbH 2012 + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see <http://www.gnu.org/licenses/> + */ + +#include "trace.h" +#include "qdev.h" +#include "dimm.h" +#include +#include "../exec-memory.h" +#include "qmp-commands.h" + +/* the system-wide memory bus. */ +static DimmBus *main_memory_bus; +/* the following list is used to hold dimm config info before machine + * initialization. After machine init, the list is emptied and not used anymore.*/ +static DimmConfiglist dimmconfig_list = QTAILQ_HEAD_INITIALIZER(dimmconfig_list); + +static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent); +static char *dimmbus_get_fw_dev_path(DeviceState *dev); + +static Property dimm_properties[] = { +DEFINE_PROP_UINT64("start", DimmDevice, start, 0), +DEFINE_PROP_UINT64("size", DimmDevice, size, DEFAULT_DIMMSIZE), +DEFINE_PROP_UINT32("node", DimmDevice, node, 0), +DEFINE_PROP_END_OF_LIST(), +}; + +static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent) +{ +} + +static char *dimmbus_get_fw_dev_path(DeviceState *dev) +{ +char path[40]; + +snprintf(path, sizeof(path), "%s", qdev_fw_name(dev)); +return strdup(path); +} + +static void dimm_bus_class_init(ObjectClass *klass, void *data) +{ +BusClass *k = BUS_CLASS(klass); + +k->print_dev = dimmbus_dev_print; +k->get_fw_dev_path = dimmbus_get_fw_dev_path; +} + +static void dimm_bus_initfn(Object *obj) +{ +DimmConfig *dimm_cfg, *next_dimm_cfg; +DimmBus *bus = DIMM_BUS(obj); +QTAILQ_INIT(&bus->dimmconfig_list); +QTAILQ_INIT(&bus->dimmlist); + +QTAILQ_FOREACH_SAFE(dimm_cfg, &dimmconfig_list, nextdimmcfg, next_dimm_cfg) { +QTAILQ_REMOVE(&dimmconfig_list, dimm_cfg, nextdimmcfg); +QTAILQ_INSERT_TAIL(&bus->dimmconfig_list, dimm_cfg, nextdimmcfg); +} +} + +static const TypeInfo dimm_bus_info = { +.name = TYPE_DIMM_BUS, +.parent = TYPE_BUS, +.instance_size = sizeof(DimmBus), +.instance_init = dimm_bus_initfn, +.class_init = dimm_bus_class_init, +}; + 
+void main_memory_bus_create(Object *parent) +{ +main_memory_bus = g_malloc0(dimm_bus_info.instance_size); +main_memory_bus->qbus.glib_allocated = true; +qbus_create_inplace(&main_memory_bus->qbus, TYPE_DIMM_BUS, DEVICE(parent), +"membus"); +} + +static void dimm_populate(DimmDevice *s) +{ +DeviceState *dev= (DeviceState*)s; +MemoryRegion *new = NULL; + +new = g_malloc(sizeof(MemoryRegion)); +memory_region_init_ram(new, dev->id, s->size); +vmstate_register_ram_global(new); +memory_region_add_subregion(get_system_memory(), s->start, new); +s->mr = new; +} + +static void dimm_depopulate(DimmDevice *s) +{ +assert(s); +vmstate_unregister_ram(s->mr, NULL); +memory_region_del_subregion(get_system_memory(), s->mr); +memory_region_destroy(s->mr); +s->mr = NULL; +} + +void dimm_config_create(char *id, uint64_t size, uin
[Qemu-devel] [RFC PATCH v3 19/19] alternative: Introduce paravirt interface QEMU_CFG_PCI_WINDOW
Qemu already calculates the 32-bit and 64-bit PCI starting offsets based on initial memory and hotplug-able dimms. This info needs to be passed to Seabios for PCI initialization. Signed-off-by: Vasilis Liaskovitis --- docs/specs/fwcfg.txt |9 + hw/fw_cfg.h |1 + hw/pc_piix.c | 10 ++ 3 files changed, 20 insertions(+), 0 deletions(-) diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt index 55f96d9..d9fa215 100644 --- a/docs/specs/fwcfg.txt +++ b/docs/specs/fwcfg.txt @@ -26,3 +26,12 @@ Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms) The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains the physical address offset, size (in bytes), and node proximity for the respective dimm. + +FW_CFG_PCI_WINDOW paravirt info + +QEMU passes the starting address for the 32-bit and 64-bit PCI windows to BIOS. +The following layouts are followed: + + +pcimem32_start | pcimem64_start | + diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index 856bf91..6c8c151 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -27,6 +27,7 @@ #define FW_CFG_SETUP_SIZE 0x17 #define FW_CFG_SETUP_DATA 0x18 #define FW_CFG_FILE_DIR 0x19 +#define FW_CFG_PCI_WINDOW 0x1a #define FW_CFG_FILE_FIRST 0x20 #define FW_CFG_FILE_SLOTS 0x10 diff --git a/hw/pc_piix.c b/hw/pc_piix.c index d1fd276..034761f 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -44,6 +44,7 @@ #include "memory.h" #include "exec-memory.h" #include "dimm.h" +#include "fw_cfg.h" #ifdef CONFIG_XEN # include #endif @@ -149,6 +150,7 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *pci_memory; MemoryRegion *rom_memory; void *fw_cfg = NULL; +uint64_t *pci_window_fw_cfg; pc_cpus_init(cpu_model); @@ -205,6 +207,14 @@ static void pc_init1(MemoryRegion *system_memory, ? 
0 : ((uint64_t)1 << 62)), pci_memory, ram_memory); + +pci_window_fw_cfg = g_malloc0(2 * 8); +pci_window_fw_cfg[0] = cpu_to_le64(below_4g_mem_size + +below_4g_hp_mem_size); +pci_window_fw_cfg[1] = cpu_to_le64(0x1ULL + above_4g_mem_size ++ above_4g_hp_mem_size); +fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW, +(uint8_t *)pci_window_fw_cfg, 2 * 8); } else { pci_bus = NULL; i440fx_state = NULL; -- 1.7.9
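The two words exported through FW_CFG_PCI_WINDOW can be reproduced with a short calculation. This Python sketch uses a hypothetical function name and assumes the constant shown as 0x1ULL in the archived hunk is 0x100000000 (4GB); the encoding is two little-endian 64-bit words, as produced by cpu_to_le64().

```python
import struct

FOUR_GB = 0x100000000  # assumption: the archive shows a shortened "0x1ULL"


def pci_window_blob(below_4g, below_4g_hp, above_4g, above_4g_hp):
    """Pack pcimem32_start and pcimem64_start as QEMU would export them."""
    pcimem32_start = below_4g + below_4g_hp
    pcimem64_start = FOUR_GB + above_4g + above_4g_hp
    return struct.pack("<QQ", pcimem32_start, pcimem64_start)
```

Both windows therefore begin immediately after the last byte that initial RAM or a hotplug dimm may occupy in the respective range.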
[Qemu-devel] [RFC PATCH v3 04/19][SeaBIOS] acpi: generate hotplug memory devices
The memory device generation is guided by QEMU paravirt info. SeaBIOS first
uses the info to set up SRAT entries for the hotplug-able memory slots.
Afterwards, build_memssdt uses the created SRAT entries to generate the
appropriate memory device objects. One memory device (and corresponding SRAT
entry) is generated for each hotplug-able QEMU memslot. Currently no SSDT
memory device is created for initial system memory.

We only support up to 255 DIMMs for now (the PackageOp used for the MEON array
can only describe an array of at most 255 elements; VarPackageOp would be
needed to support more than 255 devices).

v1->v2: SeaBIOS reads mems_sts from QEMU to build e820_map.
        SSDT size and some offsets are calculated with extraction macros.

v2->v3: Minor name changes

Signed-off-by: Vasilis Liaskovitis
---
 src/acpi.c |  158 +--
 1 files changed, 152 insertions(+), 6 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 6d239fa..1223b52 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -13,6 +13,7 @@
 #include "pci_regs.h" // PCI_INTERRUPT_LINE
 #include "ioport.h" // inl
 #include "paravirt.h" // qemu_cfg_irq0_override
+#include "memmap.h"
 
 //
 /* ACPI tables init */
@@ -416,11 +417,26 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
 #define PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start)
 #define PCI_SLOTS 32
 
+/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
+#define MEM_BASE 0xaf80
+#define MEM_AML (ssdm_mem_aml + *ssdt_mem_start)
+#define MEM_SIZEOF (*ssdt_mem_end - *ssdt_mem_start)
+#define MEM_OFFSET_HEX (*ssdt_mem_name - *ssdt_mem_start + 2)
+#define MEM_OFFSET_ID (*ssdt_mem_id - *ssdt_mem_start)
+#define MEM_OFFSET_PXM 31
+#define MEM_OFFSET_START 55
+#define MEM_OFFSET_END 63
+#define MEM_OFFSET_SIZE 79
+
+u64 nb_hp_memslots = 0;
+struct srat_memory_affinity *mem;
+
 #define SSDT_SIGNATURE 0x54445353 // SSDT
 #define SSDT_HEADER_LENGTH 36
 
 #include "ssdt-susp.hex"
 #include "ssdt-pcihp.hex"
+#include "ssdt-mem.hex"
 
 #define PCI_RMV_BASE 0xae0c
 
@@ -472,6 +488,111 @@ static void patch_pcihp(int slot, u8 *ssdt_ptr, u32 eject)
     }
 }
 
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
+{
+    memcpy(ssdt_ptr, MEM_AML, MEM_SIZEOF);
+    ssdt_ptr[MEM_OFFSET_HEX] = getHex(i >> 4);
+    ssdt_ptr[MEM_OFFSET_HEX+1] = getHex(i);
+    ssdt_ptr[MEM_OFFSET_ID] = i;
+    ssdt_ptr[MEM_OFFSET_PXM] = node;
+    *(u64*)(ssdt_ptr + MEM_OFFSET_START) = mem_base;
+    *(u64*)(ssdt_ptr + MEM_OFFSET_END) = mem_base + mem_len;
+    *(u64*)(ssdt_ptr + MEM_OFFSET_SIZE) = mem_len;
+}
+
+static void*
+build_memssdt(void)
+{
+    u64 mem_base;
+    u64 mem_len;
+    u8 node;
+    int i;
+    struct srat_memory_affinity *entry = mem;
+    u64 nb_memdevs = nb_hp_memslots;
+    u8 memslot_status, enabled;
+
+    int length = ((1+3+4)
+                  + (nb_memdevs * MEM_SIZEOF)
+                  + (1+2+5+(12*nb_memdevs))
+                  + (6+2+1+(1*nb_memdevs)));
+    u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
+    if (! ssdt) {
+        warn_noalloc();
+        return NULL;
+    }
+    u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
+
+    // build Scope(_SB_) header
+    *(ssdt_ptr++) = 0x10; // ScopeOp
+    ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
+    *(ssdt_ptr++) = '_';
+    *(ssdt_ptr++) = 'S';
+    *(ssdt_ptr++) = 'B';
+    *(ssdt_ptr++) = '_';
+
+    for (i = 0; i < nb_memdevs; i++) {
+        mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+        mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+        node = entry->proximity[0];
+        build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+        ssdt_ptr += MEM_SIZEOF;
+        entry++;
+    }
+
+    // build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...}"
+    *(ssdt_ptr++) = 0x14; // MethodOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'T';
+    *(ssdt_ptr++) = 'F';
+    *(ssdt_ptr++) = 'Y';
+    *(ssdt_ptr++) = 0x02;
+    for (i=0; i> 4);
+        *(ssdt_ptr++) = getHex(i);
+        *(ssdt_ptr++) = 0x69; // Arg1Op
+    }
+
+    // build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... })"
+    *(ssdt_ptr++) = 0x08; // NameOp
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'E';
+    *(ssdt_ptr++) = 'O';
+    *(ssdt_ptr++) = 'N';
+    *(ssdt_ptr++) = 0x12; // PackageOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+    *(ssdt_ptr++) = nb_memdevs;
+
+    entry = mem;
+    memslot_status = 0;
+
+    for (i = 0; i < nb_memdevs; i++) {
+        enabled = 0;
+        if (i