Re: [PATCH] yank: Avoid linking into executables that don't want it

2021-03-22 Thread Thomas Huth

On 16/03/2021 14.59, Markus Armbruster wrote:

util/yank.c and stubs/yank.c are both in libqemuutil.a, even though
their external symbols conflict.  The linker happens to pick the
former.  This links a bunch of unneeded code into the executables that
actually want the latter: qemu-io, qemu-img, qemu-nbd, and several
tests.  Amazingly, none of them fails to link.

To fix this, move the non-stub yank.c from sourceset util_ss to sourceset
qmp_ss.  This requires moving it from util/ to monitor/.


In another patch ("tests: Use the normal yank code instead of stubs in 
relevant tests"), Lukas now changed the tests to always explicitly link 
against the real yank.c code. That makes me wonder whether we need the yank 
stubs at all ... it's not that much code after all, and it's very much 
self-contained without references to other files, so I think it should also 
be ok if we simply always keep it in the utils library and ditch the stubs?


 Thomas




Re: [PATCH 1/5] tests: Use the normal yank code instead of stubs in relevant tests

2021-03-22 Thread Thomas Huth

On 22/03/2021 18.48, Lukas Straub wrote:

On Mon, 22 Mar 2021 17:00:23 +0100
Thomas Huth  wrote:


On 22/03/2021 08.35, Lukas Straub wrote:

On Mon, 22 Mar 2021 06:20:50 +0100
Thomas Huth  wrote:
   

On 22/03/2021 00.31, Lukas Straub wrote:

Use the normal yank code instead of stubs in relevant tests to
increase coverage and to ensure that registering and unregistering
of yank instances and functions is done correctly.

Signed-off-by: Lukas Straub 
---
tests/qtest/meson.build | 6 +++---
tests/unit/meson.build  | 4 ++--
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 66ee9fbf45..40e1f495f7 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -234,9 +234,9 @@ tpmemu_files = ['tpm-emu.c', 'tpm-util.c', 'tpm-tests.c']
qtests = {
  'bios-tables-test': [io, 'boot-sector.c', 'acpi-utils.c', 'tpm-emu.c'],
  'cdrom-test': files('boot-sector.c'),
-  'dbus-vmstate-test': files('migration-helpers.c') + dbus_vmstate1,
+  'dbus-vmstate-test': ['migration-helpers.c', dbus_vmstate1, '../../monitor/yank.c'],
  'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
-  'migration-test': files('migration-helpers.c'),
+  'migration-test': ['migration-helpers.c', io, '../../monitor/yank.c'],
  'pxe-test': files('boot-sector.c'),
  'qos-test': [chardev, io, qos_test_ss.apply(config_host, strict: false).sources()],
  'tpm-crb-swtpm-test': [io, tpmemu_files],


Is this really necessary for the qtests? I can understand the change for the
unit tests, but the qtests are separate programs where I could not imagine
that they use the yank functions in any way?


Yes, it is necessary. While the yank functions are not called in these tests,
the real yank code still checks that registering and unregistering of yank
instances and functions is done correctly, i.e. that no yank functions are
registered before the instance, that the yank instance is only unregistered
after all functions were unregistered, that the same instance is not
registered twice, and that the yank instance actually exists before it is
unregistered.
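
The four ordering rules listed above can be made concrete with a small stand-alone model. This is only a sketch under stated assumptions: the `toy_*` names are hypothetical and are not the API in yank.c, the table size is arbitrary, and the model returns false on a violation where real code would more likely assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of the ordering rules; not the real yank.c API. */
#define TOY_MAX_INSTANCES 8

struct toy_instance {
    bool in_use;
    char name[32];
    int nfuncs;             /* yank functions currently registered */
};

static struct toy_instance toy_table[TOY_MAX_INSTANCES];

struct toy_instance *toy_find(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < TOY_MAX_INSTANCES; i++) {
        if (toy_table[i].in_use && strcmp(toy_table[i].name, name) == 0) {
            return &toy_table[i];
        }
    }
    return NULL;
}

/* Rule: the same instance must not be registered twice. */
bool toy_register_instance(const char *name)
{
    if (strlen(name) >= sizeof(toy_table[0].name) || toy_find(name)) {
        return false;
    }
    for (int i = 0; i < TOY_MAX_INSTANCES; i++) {
        if (!toy_table[i].in_use) {
            toy_table[i].in_use = true;
            toy_table[i].nfuncs = 0;
            strcpy(toy_table[i].name, name);
            return true;
        }
    }
    return false;           /* table full */
}

/* Rule: no function may be registered before its instance exists. */
bool toy_register_function(const char *name)
{
    struct toy_instance *inst = toy_find(name);

    if (!inst) {
        return false;
    }
    inst->nfuncs++;
    return true;
}

bool toy_unregister_function(const char *name)
{
    struct toy_instance *inst = toy_find(name);

    if (!inst || inst->nfuncs == 0) {
        return false;
    }
    inst->nfuncs--;
    return true;
}

/* Rules: the instance must exist and must have no functions left. */
bool toy_unregister_instance(const char *name)
{
    struct toy_instance *inst = toy_find(name);

    if (!inst || inst->nfuncs != 0) {
        return false;
    }
    inst->in_use = false;
    return true;
}
```

With the stubs linked in, none of these checks would run, which is the coverage argument being made here.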


Now you've confused me even more. Could you elaborate a little bit? If none of
the functions are called by the test, which part of yank.c is exercised
here at all? Could you give a more detailed example? The only thing I could
imagine is yank_init(), but that does not look like something we need to
check in a qtest?


Oh, sorry. I meant yank's concept of a yank function here. It works this way:
The different subsystems first register a yank instance. So in this case
when starting migration in the test, the migration code first registers a
yank instance. Then, it registers _yank functions_ with this instance,
for example to shut down a socket.


But these are the qtests, separate stand-alone programs. The migration code 
of QEMU (i.e. the code in the main "migration" folder) is not linked into 
these binaries. Doing something like:


 grep -r yank tests/qtest/migration-test

should give you zero results. Thus it IMHO does not make sense to add the 
yank.c to these tests here.


Having said that, it seems like the qos-test is linking against the chardev 
code and thus might indirectly use the yank code there. So maybe you want 
to add it to the qos-test instead?


 Thomas




[PATCH v1 8/8] hw/arm/virt: add ITS support in virt GIC

2021-03-22 Thread Shashi Mallela
Included creation of ITS as part of virt platform GIC
initialization. This emulated ITS model now co-exists with the kvm
ITS and is enabled in the absence of kvm irq kernel support in a
platform.

Signed-off-by: Shashi Mallela 
---
 hw/arm/virt.c        | 10 ++++++++--
 target/arm/kvm_arm.h |  4 ++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index aa2bbd14e0..77cf2db90f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -622,7 +622,7 @@ static void create_v2m(VirtMachineState *vms)
 vms->msi_controller = VIRT_MSI_CTRL_GICV2M;
 }
 
-static void create_gic(VirtMachineState *vms)
+static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
 {
 MachineState *ms = MACHINE(vms);
 /* We create a standalone GIC */
@@ -656,6 +656,12 @@ static void create_gic(VirtMachineState *vms)
  nb_redist_regions);
 qdev_prop_set_uint32(vms->gic, "redist-region-count[0]", redist0_count);
 
+if (!kvm_irqchip_in_kernel()) {
+object_property_set_link(OBJECT(vms->gic), "sysmem", OBJECT(mem),
+ &error_fatal);
+qdev_prop_set_bit(vms->gic, "has-lpi", true);
+}
+
 if (nb_redist_regions == 2) {
 uint32_t redist1_capacity =
 vms->memmap[VIRT_HIGH_GIC_REDIST2].size / GICV3_REDIST_SIZE;
@@ -2039,7 +2045,7 @@ static void machvirt_init(MachineState *machine)
 
 virt_flash_fdt(vms, sysmem, secure_sysmem ?: sysmem);
 
-create_gic(vms);
+create_gic(vms, sysmem);
 
 virt_cpu_post_init(vms, sysmem);
 
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 34f8daa377..0613454975 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -525,8 +525,8 @@ static inline const char *its_class_name(void)
 /* KVM implementation requires this capability */
 return kvm_direct_msi_enabled() ? "arm-its-kvm" : NULL;
 } else {
-/* Software emulation is not implemented yet */
-return NULL;
+/* Software emulation based model */
+return "arm-gicv3-its";
 }
 }
 
-- 
2.27.0




[PATCH v1 3/8] hw/intc: GICv3 ITS command queue framework

2021-03-22 Thread Shashi Mallela
Added functionality to trigger ITS command queue processing on
write to the CWRITER register and process each command queue entry to
identify the command type and handle commands like MAPD, MAPC, SYNC.

Signed-off-by: Shashi Mallela 
---
 hw/intc/arm_gicv3_its.c | 362 
 1 file changed, 362 insertions(+)
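
The queue walk described in the commit message can be illustrated in isolation. The following is a minimal sketch, not the code in this patch: it assumes the queue is a ring of 32-byte commands, that the command type sits in bits [7:0] of the first doubleword, and that SYNC, MAPD and MAPC are encoded as 0x05, 0x08 and 0x09 as in the GICv3 architecture specification. Guest memory is modelled as a plain array instead of address_space_ldq_le(), and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CMD_SIZE  32u    /* one command = four 64-bit doublewords */
#define TOY_CMD_MASK  0xffu  /* command type lives in DW0 bits [7:0] */
#define TOY_CMD_SYNC  0x05u
#define TOY_CMD_MAPD  0x08u
#define TOY_CMD_MAPC  0x09u

struct toy_cmdq {
    const uint64_t *mem;     /* queue memory, 4 doublewords per entry */
    uint32_t max_entries;
    uint32_t rd;             /* read index, in entries */
    uint32_t wr;             /* write index, in entries */
    unsigned n_mapd, n_mapc, n_sync, n_unknown;
};

/* Consume every entry between the read and write index, wrapping around. */
void toy_process_cmdq(struct toy_cmdq *q)
{
    assert(q != NULL && q->mem != NULL);

    if (q->max_entries == 0 || q->rd >= q->max_entries ||
        q->wr >= q->max_entries) {
        return;              /* malformed queue description: do nothing */
    }

    while (q->rd != q->wr) {
        uint64_t dw0 = q->mem[(size_t)q->rd * (TOY_CMD_SIZE / 8)];

        switch (dw0 & TOY_CMD_MASK) {
        case TOY_CMD_MAPD:
            q->n_mapd++;
            break;
        case TOY_CMD_MAPC:
            q->n_mapc++;
            break;
        case TOY_CMD_SYNC:
            q->n_sync++;
            break;
        default:
            q->n_unknown++;
            break;
        }
        q->rd = (q->rd + 1) % q->max_entries;
    }
}
```

A real handler would dispatch to per-command functions where this sketch only counts; the point is the read-index/write-index loop and the low-byte decode.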

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 4895d32e67..9b094e1f0a 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -56,6 +56,362 @@ struct GICv3ITSClass {
 CmdQDesc  cq;
 };
 
+static MemTxResult process_sync(GICv3ITSState *s, uint32_t offset)
+{
+GICv3ITSClass *c = ARM_GICV3_ITS_GET_CLASS(s);
+AddressSpace *as = &s->gicv3->sysmem_as;
+uint64_t rdbase;
+uint64_t value;
+bool pta = false;
+MemTxResult res = MEMTX_OK;
+
+offset += NUM_BYTES_IN_DW;
+offset += NUM_BYTES_IN_DW;
+
+value = address_space_ldq_le(as, c->cq.base_addr + offset,
+ MEMTXATTRS_UNSPECIFIED, &res);
+
+if ((s->typer >> GITS_TYPER_PTA_OFFSET) & GITS_TYPER_PTA_MASK) {
+/*
+ * only bits[47:16] are considered instead of bits [51:16]
+ * since with a physical address the target address must be
+ * 64KB aligned
+ */
+rdbase = (value >> RDBASE_OFFSET) & RDBASE_MASK;
+pta = true;
+} else {
+rdbase = (value >> RDBASE_OFFSET) & RDBASE_PROCNUM_MASK;
+}
+
+if (!pta && (rdbase < (s->gicv3->num_cpu))) {
+/*
+ * Current implementation makes a blocking synchronous call
+ * for every command issued earlier,hence the internal state
+ * is already consistent by the time SYNC command is executed.
+ */
+}
+
+offset += NUM_BYTES_IN_DW;
+return res;
+}
+
+static void update_cte(GICv3ITSState *s, uint16_t icid, uint64_t cte)
+{
+GICv3ITSClass *c = ARM_GICV3_ITS_GET_CLASS(s);
+AddressSpace *as = &s->gicv3->sysmem_as;
+uint64_t value;
+uint8_t  page_sz_type;
+uint64_t l2t_addr;
+bool valid_l2t;
+uint32_t l2t_id;
+uint32_t page_sz = 0;
+uint32_t max_l2_entries;
+
+if (c->ct.indirect) {
+/* 2 level table */
+page_sz_type = (s->baser[0] >>
+GITS_BASER_PAGESIZE_OFFSET) &
+GITS_BASER_PAGESIZE_MASK;
+
+if (page_sz_type == 0) {
+page_sz = GITS_ITT_PAGE_SIZE_0;
+} else if (page_sz_type == 1) {
+page_sz = GITS_ITT_PAGE_SIZE_1;
+} else if (page_sz_type == 2) {
+page_sz = GITS_ITT_PAGE_SIZE_2;
+}
+
+l2t_id = icid / (page_sz / L1TABLE_ENTRY_SIZE);
+
+value = address_space_ldq_le(as,
+ c->ct.base_addr +
+ (l2t_id * L1TABLE_ENTRY_SIZE),
+ MEMTXATTRS_UNSPECIFIED, NULL);
+
+valid_l2t = (value >> VALID_SHIFT) & VALID_MASK;
+
+if (valid_l2t) {
+max_l2_entries = page_sz / c->ct.entry_sz;
+
+l2t_addr = (value >> page_sz_type) &
+((1ULL << (51 - page_sz_type)) - 1);
+
+address_space_write(as, l2t_addr +
+ ((icid % max_l2_entries) * GITS_CTE_SIZE),
+ MEMTXATTRS_UNSPECIFIED,
+ &cte, sizeof(cte));
+}
+} else {
+/* Flat level table */
+address_space_write(as, c->ct.base_addr + (icid * GITS_CTE_SIZE),
+MEMTXATTRS_UNSPECIFIED, &cte,
+sizeof(cte));
+}
+}
+
+static MemTxResult process_mapc(GICv3ITSState *s, uint32_t offset)
+{
+GICv3ITSClass *c = ARM_GICV3_ITS_GET_CLASS(s);
+AddressSpace *as = &s->gicv3->sysmem_as;
+uint16_t icid;
+uint64_t rdbase;
+bool valid;
+bool pta = false;
+MemTxResult res = MEMTX_OK;
+uint64_t cte_entry;
+uint64_t value;
+
+offset += NUM_BYTES_IN_DW;
+offset += NUM_BYTES_IN_DW;
+
+value = address_space_ldq_le(as, c->cq.base_addr + offset,
+ MEMTXATTRS_UNSPECIFIED, &res);
+
+icid = value & ICID_MASK;
+
+if ((s->typer >> GITS_TYPER_PTA_OFFSET) & GITS_TYPER_PTA_MASK) {
+/*
+ * only bits[47:16] are considered instead of bits [51:16]
+ * since with a physical address the target address must be
+ * 64KB aligned
+ */
+rdbase = (value >> RDBASE_OFFSET) & RDBASE_MASK;
+pta = true;
+} else {
+rdbase = (value >> RDBASE_OFFSET) & RDBASE_PROCNUM_MASK;
+}
+
+valid = (value >> VALID_SHIFT) & VALID_MASK;
+
+if (valid) {
+if ((icid > c->ct.max_collids) || (!pta &&
+(rdbase > s->gicv3->num_cpu))) {
+if ((s->typer >> GITS_TYPER_SEIS_OFFSET) &
+ GITS_TYPER_SEIS_MASK) {
+/* Generate System Error here if supported */
+}
+

Re: [PATCH 1/2] hw/intc: GICv3 ITS implementation

2021-03-22 Thread shashi . mallela
Hi Peter,

As per your suggestion, I have split the series into smaller pieces and
shared newer patch sets for review, including a cover letter.
I have also added virt board support and tested it for
functionality using kvm-unit-tests.

Please ignore this patch and consider the latest series for
review.

Thanks
Shashi

On Tue, 2021-03-16 at 16:15 +, Peter Maydell wrote:
> On Mon, 15 Mar 2021 at 16:49, Shashi Mallela <
> shashi.mall...@linaro.org> wrote:
> > Implementation of Interrupt Translation Service which allows eventid
> > from devices to be translated to physical LPI IntIds. Extended the
> > redistributor functionality to process LPI Interrupts as well.
> > 
> > Signed-off-by: Shashi Mallela 
> > ---
> >  hw/intc/arm_gicv3.c|6 +
> >  hw/intc/arm_gicv3_common.c |   20 +-
> >  hw/intc/arm_gicv3_cpuif.c  |   15 +-
> >  hw/intc/arm_gicv3_dist.c   |   22 +-
> >  hw/intc/arm_gicv3_its.c| 1386
> > 
> >  hw/intc/arm_gicv3_its_common.c |   17 +-
> >  hw/intc/arm_gicv3_its_kvm.c|2 +-
> >  hw/intc/arm_gicv3_redist.c |  163 ++-
> >  hw/intc/gicv3_internal.h   |  169 ++-
> >  hw/intc/meson.build|1 +
> >  include/hw/intc/arm_gicv3_common.h |   13 +
> >  include/hw/intc/arm_gicv3_its_common.h |   12 +-
> >  12 files changed, 1807 insertions(+), 19 deletions(-)
> >  create mode 100644 hw/intc/arm_gicv3_its.c
> 
> Hi; thanks for posting this. Unfortunately 1800 lines is much
> too large a patch to be reviewable. Could you split the series
> up into smaller pieces, please? One possible structure would be
> to have a patch with the framework of the device but no actual
> implementation of register behaviour or command handling,
> followed by patches which add the behaviour piece by piece,
> and then finally the patch adding it to the board.
> 
> I think it would also be useful to have the virt board
> support, as a demonstration that the emulated ITS and
> the KVM ITS have the same interface to the board code
> and are basically drop-in-replacements.
> 
> Finally, for multi-patch series, please always send a cover letter
> (the "0/5" email, which the other patch emails are followups to;
> git format-patch should do this for you).
> 
> thanks
> -- PMM




[PATCH v1 7/8] hw/arm/sbsa-ref: add ITS support in SBSA GIC

2021-03-22 Thread Shashi Mallela
Included creation of ITS as part of SBSA platform GIC
initialization.

Signed-off-by: Shashi Mallela 
---
 hw/arm/sbsa-ref.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 88dfb2284c..d05cbcae48 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -35,7 +35,7 @@
 #include "hw/boards.h"
 #include "hw/ide/internal.h"
 #include "hw/ide/ahci_internal.h"
-#include "hw/intc/arm_gicv3_common.h"
+#include "hw/intc/arm_gicv3_its_common.h"
 #include "hw/loader.h"
 #include "hw/pci-host/gpex.h"
 #include "hw/qdev-properties.h"
@@ -65,6 +65,7 @@ enum {
 SBSA_CPUPERIPHS,
 SBSA_GIC_DIST,
 SBSA_GIC_REDIST,
+SBSA_GIC_ITS,
 SBSA_SECURE_EC,
 SBSA_GWDT,
 SBSA_GWDT_REFRESH,
@@ -108,6 +109,7 @@ static const MemMapEntry sbsa_ref_memmap[] = {
 [SBSA_CPUPERIPHS] = { 0x4000, 0x0004 },
 [SBSA_GIC_DIST] =   { 0x4006, 0x0001 },
 [SBSA_GIC_REDIST] = { 0x4008, 0x0400 },
+[SBSA_GIC_ITS] ={ 0x4409, 0x0002 },
 [SBSA_SECURE_EC] =  { 0x5000, 0x1000 },
 [SBSA_GWDT_REFRESH] =   { 0x5001, 0x1000 },
 [SBSA_GWDT_CONTROL] =   { 0x50011000, 0x1000 },
@@ -378,7 +380,20 @@ static void create_secure_ram(SBSAMachineState *sms,
 memory_region_add_subregion(secure_sysmem, base, secram);
 }
 
-static void create_gic(SBSAMachineState *sms)
+static void create_its(SBSAMachineState *sms)
+{
+DeviceState *dev;
+
+dev = qdev_new(TYPE_ARM_GICV3_ITS);
+SysBusDevice *s = SYS_BUS_DEVICE(dev);
+
+object_property_set_link(OBJECT(dev), "parent-gicv3", OBJECT(sms->gic),
+ &error_abort);
+sysbus_realize_and_unref(s, &error_fatal);
+sysbus_mmio_map(s, 0, sbsa_ref_memmap[SBSA_GIC_ITS].base);
+}
+
+static void create_gic(SBSAMachineState *sms, MemoryRegion *mem)
 {
 unsigned int smp_cpus = MACHINE(sms)->smp.cpus;
 SysBusDevice *gicbusdev;
@@ -405,6 +420,10 @@ static void create_gic(SBSAMachineState *sms)
 qdev_prop_set_uint32(sms->gic, "len-redist-region-count", 1);
 qdev_prop_set_uint32(sms->gic, "redist-region-count[0]", redist0_count);
 
+object_property_set_link(OBJECT(sms->gic), "sysmem", OBJECT(mem),
+ &error_fatal);
+qdev_prop_set_bit(sms->gic, "has-lpi", true);
+
 gicbusdev = SYS_BUS_DEVICE(sms->gic);
 sysbus_realize_and_unref(gicbusdev, &error_fatal);
 sysbus_mmio_map(gicbusdev, 0, sbsa_ref_memmap[SBSA_GIC_DIST].base);
@@ -451,6 +470,7 @@ static void create_gic(SBSAMachineState *sms)
 sysbus_connect_irq(gicbusdev, i + 3 * smp_cpus,
qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
 }
+create_its(sms);
 }
 
 static void create_uart(const SBSAMachineState *sms, int uart,
@@ -763,7 +783,7 @@ static void sbsa_ref_init(MachineState *machine)
 
 create_secure_ram(sms, secure_sysmem);
 
-create_gic(sms);
+create_gic(sms, sysmem);
 
 create_uart(sms, SBSA_UART, sysmem, serial_hd(0));
 create_uart(sms, SBSA_SECURE_UART, secure_sysmem, serial_hd(1));
-- 
2.27.0




[PATCH v1 2/8] hw/intc: GICv3 ITS register definitions added

2021-03-22 Thread Shashi Mallela
Defined descriptors for the ITS device table, collection table and ITS
command queue entities. Implemented register read/write functions,
extracted ITS table parameters and command queue parameters, and extended
gicv3 common to capture the qemu address space (which hosts the ITS table
platform memories required for subsequent ITS processing) and
initialize the same in the ITS device.

Signed-off-by: Shashi Mallela 
---
 hw/intc/arm_gicv3_its.c| 356 
 include/hw/intc/arm_gicv3_common.h |   4 +
 2 files changed, 360 insertions(+)
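
The table sizing done in extract_table_params() can be checked on paper with a stand-alone helper. This is a sketch that mirrors the two max_entries formulas in the diff below, under the assumptions that a level-1 entry is 8 bytes (the value of L1TABLE_ENTRY_SIZE is not shown in this patch) and that the entry size is passed in bytes. `toy_table_capacity` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_L1_ENTRY_SIZE 8u  /* assumed: one 64-bit pointer per level-1 entry */

/*
 * size_field is the raw "number of pages minus one" value, page_sz and
 * entry_sz are in bytes. Returns how many entries the table can describe.
 */
uint64_t toy_table_capacity(uint32_t size_field, uint32_t page_sz,
                            uint32_t entry_sz, bool indirect)
{
    uint64_t bytes;

    assert(page_sz != 0 && entry_sz != 0);

    bytes = ((uint64_t)size_field + 1) * page_sz;

    if (!indirect) {
        /* Flat table: the allocated pages hold the entries directly. */
        return bytes / entry_sz;
    }

    /*
     * Two-level table: the allocated pages hold level-1 pointers, and each
     * pointer refers to one further page full of entries.
     */
    return (bytes / TOY_L1_ENTRY_SIZE) * (page_sz / entry_sz);
}
```

For one 4KB page and 8-byte entries this gives 512 entries flat and 512 * 512 entries when indirect, which shows why the indirect layout is attractive for sparse ID spaces.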

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 34e49b4d63..4895d32e67 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -23,11 +23,179 @@ typedef struct GICv3ITSClass GICv3ITSClass;
 DECLARE_OBJ_CHECKERS(GICv3ITSState, GICv3ITSClass,
  ARM_GICV3_ITS, TYPE_ARM_GICV3_ITS)
 
+typedef struct {
+bool valid;
+bool indirect;
+uint16_t entry_sz;
+uint32_t max_entries;
+uint32_t max_devids;
+uint64_t base_addr;
+} DevTableDesc;
+
+typedef struct {
+bool valid;
+bool indirect;
+uint16_t entry_sz;
+uint32_t max_entries;
+uint32_t max_collids;
+uint64_t base_addr;
+} CollTableDesc;
+
+typedef struct {
+bool valid;
+uint32_t max_entries;
+uint64_t base_addr;
+} CmdQDesc;
+
 struct GICv3ITSClass {
 GICv3ITSCommonClass parent_class;
 void (*parent_reset)(DeviceState *dev);
+
+DevTableDesc  dt;
+CollTableDesc ct;
+CmdQDesc  cq;
 };
 
+static bool extract_table_params(GICv3ITSState *s, int index)
+{
+GICv3ITSClass *c = ARM_GICV3_ITS_GET_CLASS(s);
+uint16_t num_pages = 0;
+uint8_t  page_sz_type;
+uint8_t type;
+uint32_t page_sz = 0;
+uint64_t value = s->baser[index];
+
+num_pages = (value & GITS_BASER_SIZE);
+page_sz_type = (value >> GITS_BASER_PAGESIZE_OFFSET) &
+GITS_BASER_PAGESIZE_MASK;
+
+if (page_sz_type == 0) {
+page_sz = GITS_ITT_PAGE_SIZE_0;
+} else if (page_sz_type == 1) {
+page_sz = GITS_ITT_PAGE_SIZE_1;
+} else if (page_sz_type == 2) {
+page_sz = GITS_ITT_PAGE_SIZE_2;
+} else {
+return false;
+}
+
+type = (value >> GITS_BASER_TYPE_OFFSET) &
+GITS_BASER_TYPE_MASK;
+
+if (type == GITS_ITT_TYPE_DEVICE) {
+c->dt.valid = (value >> GITS_BASER_VALID) & GITS_BASER_VALID_MASK;
+
+if (c->dt.valid) {
+c->dt.indirect = (value >> GITS_BASER_INDIRECT_OFFSET) &
+   GITS_BASER_INDIRECT_MASK;
+c->dt.entry_sz = (value >> GITS_BASER_ENTRYSIZE_OFFSET) &
+   GITS_BASER_ENTRYSIZE_MASK;
+
+if (!c->dt.indirect) {
+c->dt.max_entries = ((num_pages + 1) * page_sz) /
+   c->dt.entry_sz;
+} else {
+c->dt.max_entries = ((((num_pages + 1) * page_sz) /
+L1TABLE_ENTRY_SIZE) *
+(page_sz / c->dt.entry_sz));
+}
+
+c->dt.max_devids = (1UL << (((value >> GITS_TYPER_DEVBITS_OFFSET) &
+   GITS_TYPER_DEVBITS_MASK) + 1));
+
+if ((page_sz == GITS_ITT_PAGE_SIZE_0) ||
+(page_sz == GITS_ITT_PAGE_SIZE_1)) {
+c->dt.base_addr = (value >> GITS_BASER_PHYADDR_OFFSET) &
+GITS_BASER_PHYADDR_MASK;
+c->dt.base_addr <<= GITS_BASER_PHYADDR_OFFSET;
+} else if (page_sz == GITS_ITT_PAGE_SIZE_2) {
+c->dt.base_addr = ((value >> GITS_BASER_PHYADDR_OFFSETL_64K) &
+   GITS_BASER_PHYADDR_MASKL_64K) <<
+ GITS_BASER_PHYADDR_OFFSETL_64K;
+c->dt.base_addr |= ((value >> GITS_BASER_PHYADDR_OFFSET) &
+GITS_BASER_PHYADDR_MASKH_64K) <<
+ GITS_BASER_PHYADDR_OFFSETH_64K;
+}
+}
+} else if (type == GITS_ITT_TYPE_COLLECTION) {
+c->ct.valid = (value >> GITS_BASER_VALID) & GITS_BASER_VALID_MASK;
+
+/*
+ * GITS_TYPER.HCC is 0 for this implementation
+ * hence writes are discarded if ct.valid is 0
+ */
+if (c->ct.valid) {
+c->ct.indirect = (value >> GITS_BASER_INDIRECT_OFFSET) &
+   GITS_BASER_INDIRECT_MASK;
+c->ct.entry_sz = (value >> GITS_BASER_ENTRYSIZE_OFFSET) &
+GITS_BASER_ENTRYSIZE_MASK;
+
+if (!c->ct.indirect) {
+c->ct.max_entries = ((num_pages + 1) * page_sz) /
+  c->ct.entry_sz;
+} else {
+c->ct.max_entries = ((((num_pages + 1) * page_sz) /
+  

[PATCH v1 1/8] hw/intc: GICv3 ITS initial framework

2021-03-22 Thread Shashi Mallela
Added register definitions relevant to ITS, implemented the overall
ITS device framework with stubs for ITS control and translater
regions read/write, and extended ITS common to handle mmio init between
the existing kvm device and the newer qemu device.

Signed-off-by: Shashi Mallela 
---
 hw/intc/arm_gicv3_its.c| 323 
 hw/intc/arm_gicv3_its_common.c |  17 +-
 hw/intc/arm_gicv3_its_kvm.c|   2 +-
 hw/intc/gicv3_internal.h   | 173 ++-
 hw/intc/meson.build|   1 +
 include/hw/intc/arm_gicv3_its_common.h |  12 +-
 6 files changed, 520 insertions(+), 8 deletions(-)
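
The write path in this patch follows a common pattern: leaf handlers report an error for accesses they do not support, and the dispatcher logs the guest error but still reports success, so that reserved registers behave as RAZ/WI. A minimal stand-alone sketch of that pattern follows; it uses hypothetical `toy_*` names, a counter plus stderr in place of qemu_log_mask(), and supports only 4-byte writes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { TOY_OK = 0, TOY_ERROR = 1 } toy_result;

/* Leaf handler stub: accepts every 32-bit write, like the stubs in the patch. */
toy_result toy_leaf_write32(uint64_t offset, uint64_t value)
{
    (void)offset;
    (void)value;
    return TOY_OK;
}

/*
 * Dispatcher: unsupported sizes are logged and counted as guest errors,
 * but the caller still sees TOY_OK so the guest does not take an abort.
 */
toy_result toy_write(uint64_t offset, uint64_t value, unsigned size,
                     unsigned *guest_errors)
{
    toy_result result;

    assert(guest_errors != NULL);

    switch (size) {
    case 4:
        result = toy_leaf_write32(offset, value);
        break;
    default:
        result = TOY_ERROR;
        break;
    }

    if (result == TOY_ERROR) {
        fprintf(stderr, "toy_write: invalid guest write at offset 0x%"
                PRIx64 " size %u\n", offset, size);
        (*guest_errors)++;
        result = TOY_OK;   /* write ignored */
    }
    return result;
}
```

Keeping the error return inside the leaf functions means the logging policy lives in one place, which is the design choice the comment in gicv3_its_trans_write() describes.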

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
new file mode 100644
index 0000000000..34e49b4d63
--- /dev/null
+++ b/hw/intc/arm_gicv3_its.c
@@ -0,0 +1,323 @@
+/*
+ * ITS emulation for a GICv3-based system
+ *
+ * Copyright Linaro.org 2021
+ *
+ * Authors:
+ *  Shashi Mallela 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/intc/arm_gicv3_its_common.h"
+#include "gicv3_internal.h"
+#include "qom/object.h"
+
+typedef struct GICv3ITSClass GICv3ITSClass;
+/* This is reusing the GICv3ITSState typedef from ARM_GICV3_ITS_COMMON */
+DECLARE_OBJ_CHECKERS(GICv3ITSState, GICv3ITSClass,
+ ARM_GICV3_ITS, TYPE_ARM_GICV3_ITS)
+
+struct GICv3ITSClass {
+GICv3ITSCommonClass parent_class;
+void (*parent_reset)(DeviceState *dev);
+};
+
+static MemTxResult its_trans_writew(GICv3ITSState *s, hwaddr offset,
+   uint64_t value, MemTxAttrs attrs)
+{
+MemTxResult result = MEMTX_OK;
+
+return result;
+}
+
+static MemTxResult its_trans_writel(GICv3ITSState *s, hwaddr offset,
+   uint64_t value, MemTxAttrs attrs)
+{
+MemTxResult result = MEMTX_OK;
+
+return result;
+}
+
+static MemTxResult gicv3_its_trans_write(void *opaque, hwaddr offset,
+   uint64_t data, unsigned size, MemTxAttrs attrs)
+{
+GICv3ITSState *s = (GICv3ITSState *)opaque;
+MemTxResult result;
+
+switch (size) {
+case 2:
+result = its_trans_writew(s, offset, data, attrs);
+break;
+case 4:
+result = its_trans_writel(s, offset, data, attrs);
+break;
+default:
+result = MEMTX_ERROR;
+break;
+}
+
+if (result == MEMTX_ERROR) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: invalid guest write at offset " TARGET_FMT_plx
+  " size %u\n", __func__, offset, size);
+/*
+ * The spec requires that reserved registers are RAZ/WI;
+ * so use MEMTX_ERROR returns from leaf functions as a way to
+ * trigger the guest-error logging but don't return it to
+ * the caller, or we'll cause a spurious guest data abort.
+ */
+result = MEMTX_OK;
+}
+return result;
+}
+
+static MemTxResult gicv3_its_trans_read(void *opaque, hwaddr offset,
+  uint64_t *data, unsigned size, MemTxAttrs attrs)
+{
+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: Invalid read from transaction register area at offset "
+TARGET_FMT_plx "\n", __func__, offset);
+return MEMTX_ERROR;
+}
+
+static MemTxResult its_writeb(GICv3ITSState *s, hwaddr offset,
+   uint64_t value, MemTxAttrs attrs)
+{
+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: unsupported byte write to register at offset "
+TARGET_FMT_plx "\n", __func__, offset);
+return MEMTX_ERROR;
+}
+
+static MemTxResult its_readb(GICv3ITSState *s, hwaddr offset,
+   uint64_t *data, MemTxAttrs attrs)
+{
+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: unsupported byte read from register at offset "
+TARGET_FMT_plx "\n", __func__, offset);
+return MEMTX_ERROR;
+}
+
+static MemTxResult its_writew(GICv3ITSState *s, hwaddr offset,
+   uint64_t value, MemTxAttrs attrs)
+{
+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: unsupported word write to register at offset "
+TARGET_FMT_plx "\n", __func__, offset);
+return MEMTX_ERROR;
+}
+
+static MemTxResult its_readw(GICv3ITSState *s, hwaddr offset,
+   uint64_t *data, MemTxAttrs attrs)
+{
+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: unsupported word read from register at offset "
+TARGET_FMT_plx "\n", __func__, offset);
+return MEMTX_ERROR;
+}
+
+static MemTxResult its_writel(GICv3ITSState *s, hwaddr offset,
+   uint64_t value, MemTxAttrs attrs)
+{
+MemTxResult result = MEMTX_OK;
+
+return result;
+}
+
+static MemTxResult its_readl(GICv3ITSState *s, hwaddr offset,
+   

[PATCH v1 6/8] hw/intc: GICv3 redistributor ITS processing

2021-03-22 Thread Shashi Mallela
Implemented lpi processing at the redistributor to get lpi config info
from the lpi configuration table, determine priority, set pending state in
the lpi pending table and forward the lpi to cpuif. Added logic to invoke
redistributor lpi processing with the translated LPI, which sets/clears the
LPI from the ITS device as part of ITS INT, CLEAR, DISCARD command and
GITS_TRANSLATER processing.

Signed-off-by: Shashi Mallela 
---
 hw/intc/arm_gicv3.c|   6 +
 hw/intc/arm_gicv3_cpuif.c  |  15 ++-
 hw/intc/arm_gicv3_its.c|   9 +-
 hw/intc/arm_gicv3_redist.c | 126 
 hw/intc/gicv3_internal.h   |   3 +
 5 files changed, 154 insertions(+), 5 deletions(-)
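
The priority selection described in the commit message can be sketched without any QEMU types. The example below assumes the layout given by the GICv3 architecture specification: LPI INTIDs start at 8192, the configuration table holds one byte per LPI with the priority in bits [7:2] and the enable flag in bit 0, and the pending table holds one bit per INTID counted from INTID 0. Both tables are modelled as plain byte arrays instead of guest memory, and `toy_best_pending_lpi` is a hypothetical name, not a function from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LPI_START         8192u
#define TOY_LPI_CFG_ENABLE    0x01u
#define TOY_LPI_CFG_PRIO_MASK 0xfcu

/*
 * Scan num_lpis LPIs and report the enabled, pending one with the lowest
 * priority value (a lower value means a higher priority in the GIC).
 * Returns false if no LPI qualifies; the outputs are then left untouched.
 */
bool toy_best_pending_lpi(const uint8_t *cfg, const uint8_t *pend,
                          uint32_t num_lpis,
                          uint32_t *best_id, uint8_t *best_prio)
{
    bool found = false;

    assert(cfg != NULL && pend != NULL);
    assert(best_id != NULL && best_prio != NULL);

    for (uint32_t i = 0; i < num_lpis; i++) {
        uint32_t intid = TOY_LPI_START + i;
        bool pending = (pend[intid / 8] & (1u << (intid % 8))) != 0;
        uint8_t entry = cfg[i];
        uint8_t prio = entry & TOY_LPI_CFG_PRIO_MASK;

        if (!pending || !(entry & TOY_LPI_CFG_ENABLE)) {
            continue;
        }
        if (!found || prio < *best_prio) {
            found = true;
            *best_id = intid;
            *best_prio = prio;
        }
    }
    return found;
}
```

Note that the first 1KB of the pending table covers INTIDs below 8192, so the byte for the first LPI is at offset 1024.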

diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index 66eaa97198..618fa1af95 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -166,6 +166,12 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
 cs->hppi.grp = gicv3_irq_group(cs->gic, cs, cs->hppi.irq);
 }
 
+if (cs->gic->lpi_enable) {
+if (gicv3_redist_update_lpi(cs)) {
+seenbetter = true;
+}
+}
+
 /* If the best interrupt we just found would preempt whatever
  * was the previous best interrupt before this update, then
  * we know it's definitely the best one now.
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 43ef1d7a84..c225b80f66 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -899,9 +899,14 @@ static void icc_activate_irq(GICv3CPUState *cs, int irq)
 cs->gicr_ipendr0 = deposit32(cs->gicr_ipendr0, irq, 1, 0);
 gicv3_redist_update(cs);
 } else {
-gicv3_gicd_active_set(cs->gic, irq);
-gicv3_gicd_pending_clear(cs->gic, irq);
-gicv3_update(cs->gic, irq, 1);
+if (irq >= GICV3_LPI_INTID_START) {
+gicv3_redist_lpi_pending(cs, irq, 0);
+gicv3_redist_update(cs);
+} else {
+gicv3_gicd_active_set(cs->gic, irq);
+gicv3_gicd_pending_clear(cs->gic, irq);
+gicv3_update(cs->gic, irq, 1);
+}
 }
 }
 
@@ -1337,7 +1342,9 @@ static void icc_eoir_write(CPUARMState *env, const ARMCPRegInfo *ri,
  * valid interrupt value read from the Interrupt Acknowledge
  * register" and so this is UNPREDICTABLE. We choose to ignore it.
  */
-return;
+if (!(cs->gic->lpi_enable && (irq >= GICV3_LPI_INTID_START))) {
+return;
+}
 }
 
 if (icc_highest_active_group(cs) != grp) {
diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index de2d179b5e..bb46af92a3 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -262,6 +262,7 @@ static MemTxResult process_int(GICv3ITSState *s, uint64_t value,
 bool ite_valid = false;
 uint64_t cte = 0;
 bool cte_valid = false;
+uint64_t rdbase;
 uint8_t buff[GITS_TYPER_ITT_ENTRY_SIZE];
 uint64_t itt_addr;
 
@@ -315,12 +316,18 @@ static MemTxResult process_int(GICv3ITSState *s, uint64_t value,
  * since with a physical address the target address must be
  * 64KB aligned
  */
-
+rdbase = (cte >> 1U) & RDBASE_MASK;
 /*
  * Current implementation only supports rdbase == procnum
  * Hence rdbase physical address is ignored
  */
 } else {
+rdbase = (cte >> 1U) & RDBASE_PROCNUM_MASK;
+if ((cmd == CLEAR) || (cmd == DISCARD)) {
+gicv3_redist_process_lpi(&s->gicv3->cpu[rdbase], pIntid, 0);
+} else {
+gicv3_redist_process_lpi(&s->gicv3->cpu[rdbase], pIntid, 1);
+}
 
 if (cmd == DISCARD) {
 /* remove mapping from interrupt translation table */
diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index f4d14811ec..dc47ed42d2 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -549,6 +549,132 @@ MemTxResult gicv3_redist_write(void *opaque, hwaddr offset, uint64_t data,
 return r;
 }
 
+bool gicv3_redist_update_lpi(GICv3CPUState *cs)
+{
+AddressSpace *as = &cs->gic->sysmem_as;
+uint64_t lpict_baddr, lpipt_baddr;
+uint32_t pendt_size = 0;
+uint8_t lpite;
+uint8_t prio, pend;
+int i;
+bool seenbetter = false;
+
+if (!(cs->gicr_ctlr & GICR_CTLR_ENABLE_LPIS) || !cs->gicr_propbaser ||
+!cs->gicr_pendbaser || cs->lpi_outofrange) {
+return seenbetter;
+}
+
+lpict_baddr = (cs->gicr_propbaser >> GICR_PROPBASER_ADDR_OFFSET) &
+   GICR_PROPBASER_ADDR_MASK;
+lpict_baddr <<= GICR_PROPBASER_ADDR_OFFSET;
+
+lpipt_baddr = (cs->gicr_pendbaser >> GICR_PENDBASER_ADDR_OFFSET) &
+   GICR_PENDBASER_ADDR_MASK;
+lpipt_baddr <<= GICR_PENDBASER_ADDR_OFFSET;
+
+/* Determine the highest priority pending interrupt among LPIs */
+pendt_size = (1UL << ((cs->gicr_propbaser &
+

[PATCH v1 5/8] hw/intc: GICv3 ITS Feature enablement

2021-03-22 Thread Shashi Mallela
Added properties to enable the ITS feature and define the qemu system
address space memory in gicv3 common, and set up distributor and
redistributor registers to indicate LPI support.

Signed-off-by: Shashi Mallela 
---
 hw/intc/arm_gicv3_common.c | 16 +++
 hw/intc/arm_gicv3_dist.c   | 22 +--
 hw/intc/arm_gicv3_redist.c | 29 ++--
 include/hw/intc/arm_gicv3_common.h |  8 ++
 4 files changed, 70 insertions(+), 5 deletions(-)
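
The rule for when GICD_TYPER advertises LPI support can be written as a small pure function. This is a sketch of the decision made in the gicd_readl() hunk below, with bit positions taken from the GICv3 architecture specification as assumptions (GICD_CTLR.ARE or ARE_S at bit 4, ARE_NS at bit 5, DS at bit 6, GICD_TYPER.LPIS at bit 17). The `TOY_*` macros and `toy_typer_lpis` are hypothetical and stand in for the definitions in gicv3_internal.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CTLR_ARE     (1u << 4)   /* single security state view */
#define TOY_CTLR_ARE_NS  (1u << 5)   /* two security states, non-secure ARE */
#define TOY_CTLR_DS      (1u << 6)   /* disable security */
#define TOY_TYPER_LPIS   (1u << 17)

static_assert(TOY_TYPER_LPIS == 0x20000u, "LPIS is assumed to be bit 17");

/* Returns the LPIS bit to OR into GICD_TYPER, or 0 if LPIs are hidden. */
uint32_t toy_typer_lpis(uint32_t gicd_ctlr, bool lpi_enable)
{
    bool sec_extn = !(gicd_ctlr & TOY_CTLR_DS);
    bool supported;

    if (!lpi_enable) {
        return 0;
    }
    if (sec_extn) {
        /* Security enabled: non-secure affinity routing must be on. */
        supported = (gicd_ctlr & TOY_CTLR_ARE_NS) != 0;
    } else {
        /* Security disabled: the single ARE bit decides. */
        supported = (gicd_ctlr & TOY_CTLR_ARE) != 0;
    }
    return supported ? TOY_TYPER_LPIS : 0;
}
```

Comparing against zero before storing into a bool keeps the result independent of which bit position the mask happens to use.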

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 58ef65f589..3bfc52f7fa 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -156,6 +156,7 @@ static const VMStateDescription vmstate_gicv3_cpu = {
 VMSTATE_UINT32(gicr_waker, GICv3CPUState),
 VMSTATE_UINT64(gicr_propbaser, GICv3CPUState),
 VMSTATE_UINT64(gicr_pendbaser, GICv3CPUState),
+VMSTATE_BOOL(lpi_outofrange, GICv3CPUState),
 VMSTATE_UINT32(gicr_igroupr0, GICv3CPUState),
 VMSTATE_UINT32(gicr_ienabler0, GICv3CPUState),
 VMSTATE_UINT32(gicr_ipendr0, GICv3CPUState),
@@ -227,6 +228,7 @@ static const VMStateDescription vmstate_gicv3 = {
 .priority = MIG_PRI_GICV3,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(gicd_ctlr, GICv3State),
+VMSTATE_UINT32(gicd_typer, GICv3State),
 VMSTATE_UINT32_ARRAY(gicd_statusr, GICv3State, 2),
 VMSTATE_UINT32_ARRAY(group, GICv3State, GICV3_BMP_SIZE),
 VMSTATE_UINT32_ARRAY(grpmod, GICv3State, GICV3_BMP_SIZE),
@@ -381,6 +383,16 @@ static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
 (1 << 24) |
 (i << 8) |
 (last << 4);
+
+if (s->lpi_enable) {
+s->cpu[i].gicr_typer |= GICR_TYPER_PLPIS;
+
+if (!s->sysmem) {
+error_setg(errp,
+"Redist-ITS: Guest 'sysmem' reference link not set");
+return;
+}
+}
 }
 }
 
@@ -406,6 +418,7 @@ static void arm_gicv3_common_reset(DeviceState *dev)
 cs->gicr_waker = GICR_WAKER_ProcessorSleep | GICR_WAKER_ChildrenAsleep;
 cs->gicr_propbaser = 0;
 cs->gicr_pendbaser = 0;
+cs->lpi_outofrange = false;
 /* If we're resetting a TZ-aware GIC as if secure firmware
  * had set it up ready to start a kernel in non-secure, we
  * need to set interrupts to group 1 so the kernel can use them.
@@ -494,9 +507,12 @@ static Property arm_gicv3_common_properties[] = {
 DEFINE_PROP_UINT32("num-cpu", GICv3State, num_cpu, 1),
 DEFINE_PROP_UINT32("num-irq", GICv3State, num_irq, 32),
 DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
+DEFINE_PROP_BOOL("has-lpi", GICv3State, lpi_enable, 0),
 DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
 DEFINE_PROP_ARRAY("redist-region-count", GICv3State, nb_redist_regions,
   redist_region_count, qdev_prop_uint32, uint32_t),
+DEFINE_PROP_LINK("sysmem", GICv3State, sysmem, TYPE_MEMORY_REGION,
+ MemoryRegion *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index b65f56f903..96a317a8ef 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -366,12 +366,15 @@ static MemTxResult gicd_readl(GICv3State *s, hwaddr offset,
 return MEMTX_OK;
 case GICD_TYPER:
 {
+bool lpi_supported = false;
 /* For this implementation:
  * No1N == 1 (1-of-N SPI interrupts not supported)
  * A3V == 1 (non-zero values of Affinity level 3 supported)
  * IDbits == 0xf (we support 16-bit interrupt identifiers)
  * DVIS == 0 (Direct virtual LPI injection not supported)
- * LPIS == 0 (LPIs not supported)
+ * LPIS == 1 (LPIs are supported if affinity routing is enabled)
+ * num_LPIs == 0b0 (bits [15:11],Number of LPIs as indicated
+ *  by GICD_TYPER.IDbits)
  * MBIS == 0 (message-based SPIs not supported)
  * SecurityExtn == 1 if security extns supported
  * CPUNumber == 0 since for us ARE is always 1
@@ -385,8 +388,23 @@ static MemTxResult gicd_readl(GICv3State *s, hwaddr offset,
  */
 bool sec_extn = !(s->gicd_ctlr & GICD_CTLR_DS);
 
+/*
+ * With securityextn on,LPIs are supported when affinity routing
+ * is enabled for non-secure state and if off LPIs are supported
+ * when affinity routing is enabled.
+ */
+if (s->lpi_enable) {
+if (sec_extn) {
+lpi_supported = (s->gicd_ctlr & GICD_CTLR_ARE_NS);
+} else {
+lpi_supported = (s->gicd_ctlr & GICD_CTLR_ARE);
+}
+}
+
 *data = (1 << 25) | (1 << 24) | (sec_extn << 10) |
-(0xf << 19) | itlinesnumber;
+(lpi_supported << GICD_TYPER_LPIS_OFFSET) | (GICD_TYPER_IDBITS 

[PATCH v1 0/8] GICv3 LPI and ITS feature implementation

2021-03-22 Thread Shashi Mallela
This patchset implements a QEMU device model enabling physical
LPI support and ITS functionality in the GIC, as per the GICv3
specification. Both flat and two-level tables are implemented. The ITS
commands for adding/deleting ITS table entries and triggering LPI
interrupts are implemented. Translated LPI interrupt IDs are processed by
the redistributor to determine priority and set the pending state
appropriately before forwarding them to the CPU interface.
The ITS feature support has been added to the sbsa-ref platform as well as
the virt platform, wherein the emulated functionality co-exists with the
KVM kernel functionality.

Shashi Mallela (8):
  hw/intc: GICv3 ITS initial framework
  hw/intc: GICv3 ITS register definitions added
  hw/intc: GICv3 ITS command queue framework
  hw/intc: GICv3 ITS Command processing
  hw/intc: GICv3 ITS Feature enablement
  hw/intc: GICv3 redistributor ITS processing
  hw/arm/sbsa-ref: add ITS support in SBSA GIC
  hw/arm/virt: add ITS support in virt GIC

 hw/arm/sbsa-ref.c  |   26 +-
 hw/arm/virt.c  |   10 +-
 hw/intc/arm_gicv3.c|6 +
 hw/intc/arm_gicv3_common.c |   16 +
 hw/intc/arm_gicv3_cpuif.c  |   15 +-
 hw/intc/arm_gicv3_dist.c   |   22 +-
 hw/intc/arm_gicv3_its.c| 1417 
 hw/intc/arm_gicv3_its_common.c |   17 +-
 hw/intc/arm_gicv3_its_kvm.c|2 +-
 hw/intc/arm_gicv3_redist.c |  155 ++-
 hw/intc/gicv3_internal.h   |  176 ++-
 hw/intc/meson.build|1 +
 include/hw/intc/arm_gicv3_common.h |   14 +
 include/hw/intc/arm_gicv3_its_common.h |   12 +-
 target/arm/kvm_arm.h   |4 +-
 15 files changed, 1869 insertions(+), 24 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_its.c

-- 
2.27.0




[PATCH v1 4/8] hw/intc: GICv3 ITS Command processing

2021-03-22 Thread Shashi Mallela
Added ITS command queue handling for the MAPTI and MAPI commands, and
handled ITS translation, which triggers an LPI via the INT command as well
as via a write to the GITS_TRANSLATER register. Defined an enum to
differentiate between an ITS command interrupt trigger and a
GITS_TRANSLATER based interrupt trigger.
Each of these commands makes use of other functionality implemented to
get the device table entry, collection table entry or interrupt translation
table entry required for its processing.
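
As a rough illustration of the flat versus two-level lookup mentioned above,
the sketch below computes where the entry for a given ID would live. The
names (locate_entry, TableLoc) and the sample sizes are assumptions made for
this example, not code from the patch, and the split shown is the
conventional one where each level 1 entry covers one level 2 page; it may not
match the patch's arithmetic line for line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which level 1 slot to read, and the byte
 * offset of the entry inside the flat table or the level 2 page. */
typedef struct TableLoc {
    uint32_t l1_index;   /* stays 0 for a flat table */
    uint64_t offset;     /* byte offset of the entry */
} TableLoc;

static TableLoc locate_entry(uint32_t id, uint32_t page_sz,
                             uint32_t entry_sz, int indirect)
{
    TableLoc loc = { 0, 0 };

    if (indirect) {
        /* Each level 2 page holds page_sz / entry_sz entries. */
        uint32_t per_page = page_sz / entry_sz;

        loc.l1_index = id / per_page;
        loc.offset = (uint64_t)(id % per_page) * entry_sz;
    } else {
        /* Flat table: entries are simply laid out back to back. */
        loc.offset = (uint64_t)id * entry_sz;
    }
    return loc;
}
```

With 4 KiB pages and 8-byte entries, ID 1000 lands in level 1 slot 1 at byte
offset 3904 of that level 2 page, while a flat table puts it at offset 8000.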

Signed-off-by: Shashi Mallela 
---
 hw/intc/arm_gicv3_its.c| 371 +++-
 include/hw/intc/arm_gicv3_common.h |   2 +
 2 files changed, 372 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 9b094e1f0a..de2d179b5e 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -56,6 +56,158 @@ struct GICv3ITSClass {
 CmdQDesc  cq;
 };
 
+typedef enum ItsCmdType {
+NONE = 0, /* internal indication for GITS_TRANSLATER write */
+CLEAR = 1,
+DISCARD = 2,
+INT = 3,
+} ItsCmdType;
+
+static bool get_cte(GICv3ITSState *s, uint16_t icid, uint64_t *cte)
+{
+GICv3ITSClass *c = ARM_GICV3_ITS_GET_CLASS(s);
+AddressSpace *as = &s->gicv3->sysmem_as;
+uint8_t  page_sz_type;
+uint64_t l2t_addr;
+uint64_t value;
+bool valid_l2t;
+uint32_t l2t_id;
+uint32_t page_sz = 0;
+uint32_t max_l2_entries;
+bool status = false;
+
+if (c->ct.indirect) {
+/* 2 level table */
+page_sz_type = (s->baser[0] >>
+GITS_BASER_PAGESIZE_OFFSET) &
+GITS_BASER_PAGESIZE_MASK;
+
+if (page_sz_type == 0) {
+page_sz = GITS_ITT_PAGE_SIZE_0;
+} else if (page_sz_type == 1) {
+page_sz = GITS_ITT_PAGE_SIZE_1;
+} else if (page_sz_type == 2) {
+page_sz = GITS_ITT_PAGE_SIZE_2;
+}
+
+l2t_id = icid / (page_sz / L1TABLE_ENTRY_SIZE);
+
+value = address_space_ldq_le(as,
+ c->ct.base_addr +
+ (l2t_id * L1TABLE_ENTRY_SIZE),
+ MEMTXATTRS_UNSPECIFIED, NULL);
+
+valid_l2t = (value >> VALID_SHIFT) & VALID_MASK;
+
+if (valid_l2t) {
+max_l2_entries = page_sz / c->ct.entry_sz;
+
+l2t_addr = (value >> page_sz_type) &
+((1ULL << (51 - page_sz_type)) - 1);
+
+address_space_read(as, l2t_addr +
+ ((icid % max_l2_entries) * GITS_CTE_SIZE),
+ MEMTXATTRS_UNSPECIFIED,
+ cte, sizeof(*cte));
+   }
+} else {
+/* Flat level table */
+address_space_read(as, c->ct.base_addr + (icid * GITS_CTE_SIZE),
+MEMTXATTRS_UNSPECIFIED, cte,
+sizeof(*cte));
+}
+
+if (*cte & VALID_MASK) {
+status = true;
+}
+
+return status;
+}
+
+static bool get_ite(GICv3ITSState *s, uint32_t eventid, uint64_t dte,
+  uint16_t *icid, uint32_t *pIntid)
+{
+AddressSpace *as = &s->gicv3->sysmem_as;
+uint8_t buff[GITS_TYPER_ITT_ENTRY_SIZE];
+uint64_t itt_addr;
+bool status = false;
+
+itt_addr = (dte >> 6ULL) & ITTADDR_MASK;
+itt_addr <<= ITTADDR_OFFSET; /* 256 byte aligned */
+
+address_space_read(as, itt_addr + (eventid * sizeof(buff)),
+MEMTXATTRS_UNSPECIFIED, &buff,
+sizeof(buff));
+
+if (buff[0] & VALID_MASK) {
+if ((buff[0] >> 1U) & GITS_TYPER_PHYSICAL) {
+memcpy(pIntid, &buff[1], 3);
+memcpy(icid, &buff[7], sizeof(*icid));
+status = true;
+}
+}
+
+return status;
+}
+
+static uint64_t get_dte(GICv3ITSState *s, uint32_t devid)
+{
+GICv3ITSClass *c = ARM_GICV3_ITS_GET_CLASS(s);
+AddressSpace *as = &s->gicv3->sysmem_as;
+uint8_t  page_sz_type;
+uint64_t l2t_addr;
+uint64_t value;
+bool valid_l2t;
+uint32_t l2t_id;
+uint32_t page_sz = 0;
+uint32_t max_l2_entries;
+
+if (c->ct.indirect) {
+/* 2 level table */
+page_sz_type = (s->baser[0] >>
+GITS_BASER_PAGESIZE_OFFSET) &
+GITS_BASER_PAGESIZE_MASK;
+
+if (page_sz_type == 0) {
+page_sz = GITS_ITT_PAGE_SIZE_0;
+} else if (page_sz_type == 1) {
+page_sz = GITS_ITT_PAGE_SIZE_1;
+} else if (page_sz_type == 2) {
+page_sz = GITS_ITT_PAGE_SIZE_2;
+}
+
+l2t_id = devid / (page_sz / L1TABLE_ENTRY_SIZE);
+
+value = address_space_ldq_le(as,
+ c->dt.base_addr +
+ (l2t_id * L1TABLE_ENTRY_SIZE),
+ MEMTXATTRS_UNSPECIFIED, NULL);
+
+valid_l2t = (value >> VALID_SHIFT) & VALID_MASK;
+
+if (valid_l2t) {
+max_l2_entries = page_sz / 

Re: [PATCH] MAINTAINERS: replace Huawei's email to personal one

2021-03-22 Thread Dongjiu Geng






ping...sorry for the noise.

On 3/11/2021 19:29, Dongjiu Geng wrote:

In order to conveniently receive email, replace the Huawei
email address with my personal one.

Signed-off-by: Dongjiu Geng
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e04ae21..823b98b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1711,7 +1711,7 @@ F: tests/qtest/acpi-utils.[hc]
 F: tests/data/acpi/
 
 ACPI/HEST/GHES
-R: Dongjiu Geng
+R: Dongjiu Geng
 R: Xiang Zheng
 L: qemu-...@nongnu.org
 S: Maintained
-- 
2.7.4




Re: [RFC] adding a generic QAPI event for failed device hotunplug

2021-03-22 Thread David Gibson
On Mon, Mar 22, 2021 at 01:06:53PM +0100, Paolo Bonzini wrote:
> On 22/03/21 07:39, David Gibson wrote:
> > > QEMU doesn't really keep track of "in flight" unplug requests, and as
> > > long as that's the case, its timeout event will have the same issue.
> > Not generically, maybe.  In the PAPR code we effectively do, by means
> > of the 'unplug_requested' boolean in the DRC structure.  Maybe that's
> > a mistake, but at the time I couldn't see how else to handle things.
> 
> No, that's good.  x86 also tracks it in some registers that are accessible
> from the ACPI firmware.  See "PCI slot removal notification" in
> docs/specs/acpi_pci_hotplug.txt.
> 
> > Currently we will resolve all "in flight" requests at machine reset
> > time, effectively completing those requests.  Does that differ from
> > x86 behaviour?
> 
> IIRC on x86 the requests are instead cancelled, but I'm not 100%
> sure.

Ah... we'd better check that and try to make ppc consistent with
whatever it does.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH qemu v16] spapr: Implement Open Firmware client interface

2021-03-22 Thread Alexey Kardashevskiy
The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements the Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements device tree fetching and a
simple memory allocator, "claim" (an OF CI memory allocator), and updates
"/memory@0/available" to report available memory to the client.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].
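
To make the ihandle -> phandle mapping concrete, here is a minimal sketch.
It uses a small fixed-size table instead of a real hash map, and every name
in it (Instances, instance_open and so on) is an assumption made for the
example rather than an identifier from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INSTANCES 8

/* Hypothetical fixed-size table standing in for the hash map. */
typedef struct Instances {
    uint32_t phandle[MAX_INSTANCES];  /* 0 means the slot is free */
} Instances;

/* Open an instance of the node 'phandle'; returns an ihandle, or 0. */
static uint32_t instance_open(Instances *t, uint32_t phandle)
{
    for (uint32_t i = 0; i < MAX_INSTANCES; i++) {
        if (t->phandle[i] == 0) {
            t->phandle[i] = phandle;
            return i + 1;             /* ihandles start at 1 */
        }
    }
    return 0;                         /* table is full */
}

/* Map an ihandle back to the phandle it was opened on, 0 if unknown. */
static uint32_t instance_to_package(const Instances *t, uint32_t ihandle)
{
    if (ihandle == 0 || ihandle > MAX_INSTANCES) {
        return 0;
    }
    return t->phandle[ihandle - 1];
}

/* Closing an instance just frees its slot for reuse. */
static void instance_close(Instances *t, uint32_t ihandle)
{
    if (ihandle != 0 && ihandle <= MAX_INSTANCES) {
        t->phandle[ihandle - 1] = 0;
    }
}
```

A real implementation would more likely allocate ihandles from a counter and
keep them in a proper hash table; the fixed array only keeps the example
short.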

Before the guest starts, the used memory is:
0..e60 - the initial firmware
8000..1 - stack
40.. - kernel
3ea.. - initramdisk

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into a kernel directly.
However, this is just enough for the petitboot kernel and initramdisk to
boot from any possible source. Note this requires a reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is a much faster boot time, which is especially
crucial with fully emulated early CPU bring-up environments. Also, this
may come in handy when/if GRUB-in-the-userspace sees the light of day.

This separates VOF and sPAPR in the hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This is coded on the assumption that later on we might be adding support for
booting from QEMU backends (blockdev is the first candidate) without
devices/drivers in between, as OF1275 does not require that and
it is quite easy to do so.

Signed-off-by: Alexey Kardashevskiy 
---

The example command line is:

/home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 2G \
-machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
-kernel pbuild/kernel-le-guest/vmlinux \
-initrd pb/rootfs.cpio.xz \
-drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
-global spapr-nvram.drive=DRIVE0 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
-mon chardev=SOCKET0,mode=control

---
Changes:
v16:
* rebased on dwg/ppc-for-6.1
* s/SpaprVofInterface/VofMachineInterface/

v15:
* bugfix: claimed memory for the VOF itself
* ditched OF_STACK_ADDR and allocate one instead, now it starts from 0x8000
because it is aligned to its size (no particular reason though)
* coding style
* moved nvram.bin up one level
* ditched bool in the firmware
* made debugging code conditional using trace_event_get_state() + 
qemu_loglevel_mask()
* renamed the CAS interface to SpaprVofInterface
* added "write" which for now dumps the message and ihandle via
trace point for early debug assistance
* commented on when we allocate of_instances in vof_build_dt()
* store fw_size in SpaprMachine to let spapr_vof_reset() claim it
* many small fixes from v14's review

v14:
* check for truncates in readstr()
* ditched a separate vof_reset()
* spapr->vof is a pointer now, dropped the "on" field
* removed rtas_base from vof and updated comment why we allow setting it
* added myself to maintainers
* updated commit log about blockdev and other 

Re: [PATCH 1/9] tpm: Changed a qemu_mutex_lock to QEMU_LOCK_GUARD

2021-03-22 Thread Stefan Berger



On 3/10/21 10:15 PM, Mahmoud Mandour wrote:

Removed a qemu_mutex_lock() and its respective qemu_mutex_unlock()
and used QEMU_LOCK_GUARD instead. This simplifies the code by
eliminating gotos and removing the qemu_mutex_unlock() calls.

Signed-off-by: Mahmoud Mandour 
---
  backends/tpm/tpm_emulator.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index a012adc193..a3c041e402 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -126,7 +126,7 @@ static int tpm_emulator_ctrlcmd(TPMEmulator *tpm, unsigned long cmd, void *msg,
  uint8_t *buf = NULL;
  int ret = -1;



ret is not needed anymore




-qemu_mutex_lock(&tpm->mutex);
+QEMU_LOCK_GUARD(&tpm->mutex);

  buf = g_alloca(n);
  memcpy(buf, &cmd_no, sizeof(cmd_no));
@@ -134,20 +134,18 @@ static int tpm_emulator_ctrlcmd(TPMEmulator *tpm, unsigned long cmd, void *msg,

  n = qemu_chr_fe_write_all(dev, buf, n);
  if (n <= 0) {
-goto end;
+return ret;
  }

  if (msg_len_out != 0) {
  n = qemu_chr_fe_read_all(dev, msg, msg_len_out);
  if (n <= 0) {
-goto end;
+return ret;
  }
  }

  ret = 0;

-end:
-qemu_mutex_unlock(&tpm->mutex);
  return ret;



return 0;




  }
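
For readers unfamiliar with scope-based lock guards, the sketch below shows
the general idea behind replacing the goto/unlock pattern with early returns.
It relies on the GCC/Clang `cleanup` variable attribute; ToyLock,
TOY_LOCK_GUARD and send_cmd are made-up names for this example, and it is not
QEMU's actual QEMU_LOCK_GUARD implementation.

```c
#include <assert.h>

/* Toy lock standing in for QemuMutex; it only tracks whether it is held. */
typedef struct ToyLock { int held; } ToyLock;

static void toy_lock(ToyLock *l)   { assert(!l->held); l->held = 1; }
static void toy_unlock(ToyLock *l) { assert(l->held);  l->held = 0; }

/* Cleanup handler: the compiler passes a pointer to the guard variable. */
static void toy_guard_release(ToyLock **guard)
{
    if (*guard) {
        toy_unlock(*guard);
    }
}

/* Take the lock now; release it when the enclosing scope is left. */
#define TOY_LOCK_GUARD(l) \
    ToyLock *toy_guard_ __attribute__((cleanup(toy_guard_release), unused)) = \
        (toy_lock(l), (l))

static int send_cmd(ToyLock *lock, int fail_early)
{
    TOY_LOCK_GUARD(lock);

    if (fail_early) {
        return -1;      /* lock is released here, no goto needed */
    }
    return 0;           /* and here */
}
```

Because every return path releases the lock, the error paths can return a
constant directly, which is why the `ret` variable becomes unnecessary in the
review comments above.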





Re: [RFC PATCH 0/3] vfio/migration: Support manual clear vfio dirty log

2021-03-22 Thread Kunkun Jiang

On 2021/3/18 20:36, Tian, Kevin wrote:

From: Kunkun Jiang 
Sent: Thursday, March 18, 2021 8:29 PM

Hi Kevin,

On 2021/3/18 17:04, Tian, Kevin wrote:

From: Kunkun Jiang 
Sent: Thursday, March 18, 2021 3:59 PM

Hi Kevin,

On 2021/3/18 14:28, Tian, Kevin wrote:

From: Kunkun Jiang
Sent: Wednesday, March 10, 2021 5:41 PM

Hi all,

In the past, we clear dirty log immediately after sync dirty log to
userspace. This may cause redundant dirty handling if userspace
handles dirty log iteratively:

After vfio clears dirty log, new dirty log starts to generate. These
new dirty log will be reported to userspace even if they are generated
before userspace handles the same dirty page.

Since a new dirty log tracking method for vfio based on iommu hwdbm[1]
has been introduced in the kernel and added a new capability named
VFIO_DIRTY_LOG_MANUAL_CLEAR, we can eliminate some redundant dirty
handling by supporting it.

Is there any performance data showing the benefit of this new method?


Current dirty log tracking method for VFIO:
[1] All pages marked dirty if not all iommu_groups have pinned_scope
[2] pinned pages by various vendor drivers if all iommu_groups have
pinned scope

Both methods are coarse-grained and can not determine which pages are
really dirty. Each round may mark the pages that are not really dirty as
dirty and send them to the destination. ( It might be better if the range
of the pinned_scope was smaller. ) This will result in a waste of resources.

HWDBM is short for Hardware Dirty Bit Management.
(e.g. smmuv3 HTTU, Hardware Translation Table Update)

About SMMU HTTU:
HTTU is a feature of ARM SMMUv3, it can update access flag or/and dirty
state of the TTD (Translation Table Descriptor) by hardware.

With HTTU, stage1 TTD is classified into 3 types:
                     DBM bit   AP[2] (readonly bit)
1. writable_clean    1         1
2. writable_dirty    1         0
3. readonly          0         1

If HTTU_HD (manage dirty state) is enabled, smmu can change TTD from
writable_clean to writable_dirty. Then software can scan TTD to sync dirty
state into dirty bitmap. With this feature, we can track the dirty log of
DMA continuously and precisely.

The capability of VFIO_DIRTY_LOG_MANUAL_CLEAR is similar to that on
the KVM side. We add this new log_clear() interface only to split the old
log_sync() into two separated procedures:

- use log_sync() to collect the dirty bitmap only, and,
- use log_clear() to clear the dirty bitmap.
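
A toy model may help show why splitting the two steps matters. In the sketch
below a 64-bit word stands in for the dirty state kept by the IOMMU; the
type and function names are assumptions for the example and do not match the
VFIO or QEMU interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per page; 'hw' stands in for the dirty state kept by the IOMMU. */
typedef struct DirtyLog { uint64_t hw; } DirtyLog;

static void dma_write(DirtyLog *d, unsigned page) { d->hw |= 1ULL << page; }

/* Old behaviour: report and clear in a single step. */
static uint64_t sync_and_clear(DirtyLog *d)
{
    uint64_t bits = d->hw;

    d->hw = 0;
    return bits;
}

/* Manual clear: log_sync() only reads the dirty state ... */
static uint64_t log_sync(const DirtyLog *d) { return d->hw; }

/* ... and log_clear() resets the pages that are about to be sent. */
static void log_clear(DirtyLog *d, uint64_t pages) { d->hw &= ~pages; }

/* Page 3 is written before the sync and again before it is sent. */
static int resend_with_old_scheme(void)
{
    DirtyLog d = { 0 };
    uint64_t pending;

    dma_write(&d, 3);
    pending = sync_and_clear(&d);
    dma_write(&d, 3);                       /* re-dirtied before the send */
    (void)pending;                          /* page 3 is sent at this point */
    return (sync_and_clear(&d) >> 3) & 1;   /* is page 3 reported again? */
}

static int resend_with_manual_clear(void)
{
    DirtyLog d = { 0 };
    uint64_t pending;

    dma_write(&d, 3);
    pending = log_sync(&d);
    dma_write(&d, 3);                       /* re-dirtied before the send */
    log_clear(&d, pending);                 /* cleared just before the send */
    return (log_sync(&d) >> 3) & 1;         /* is page 3 reported again? */
}
```

In this model the old scheme reports page 3 a second time, while the manual
clear variant does not, because the second write lands before the clear and
is covered by the same transfer.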

If you're interested in this new method, you can take a look at our set of
patches.
[1] https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqi...@huawei.com/


I know what you are doing. Intel is also working on VT-d dirty bit support
based on the above link. What I'm curious about is the actual performance
gain with this optimization. KVM doing that is one good reference, but IOMMU
has different characteristics (e.g. longer invalidation latency) compared to
CPU MMU. It's always good to understand what a so-called optimization
can actually optimize in a context different from where it's originally
proved.

Thanks
Kevin

My understanding is that this is a new method, which is quite different
from the previous two. So can you explain in more detail what performance
data you want?

Thanks,
Kunkun Jiang

When you have HTTU enabled, compare the migration efficiency with and
without this manual clear interface.

Thanks
Kevin


Hi Kevin,

Sorry for the late reply.

I tested it on our FPGA in two scenarios.
[1] perform limited times of DMA on a fixed ram area
[2] perform infinite DMA on a fixed ram area

In scenario [1], we can clearly see that less data is transmitted
with this manual clear interface. For example, when a total of 10 DMAs are
performed, the amount of data transferred is the sum of only 6 of them. This
depends on whether the device performs a DMA on the dirty ram
area between log_sync() and log_clear().

In scenario [2], with or without this manual clear interface, it doesn't
make a big difference. Every time we call log_sync(), the fixed ram
area is dirty.

So, in general scenarios, it can reduce the amount of transmitted data.

In addition, regarding the difference between MMU and IOMMU (e.g.
longer invalidation latency) you mentioned last time, I think it has no
effect on this manual clear interface. Currently, the way we invalidate
the IOMMU TLB is iommu_flush_iotlb_all(). It takes less time than multiple
range-based invalidations.

Thanks,
Kunkun Jiang




Re: [PATCH 07/11] hw/gpio/avr_gpio: Add tracing for reads and writes

2021-03-22 Thread Niteesh G. S.
Hi Phil,

A gentle reminder to push these patches.

Thanks,
Niteesh.

On Sat, Mar 13, 2021 at 10:51 PM Niteesh G. S.  wrote:

> Reviewed-by: Niteesh G S 
>
> On Sat, Mar 13, 2021 at 10:25 PM Philippe Mathieu-Daudé 
> wrote:
>
>> From: G S Niteesh Babu 
>>
>> Added tracing for gpio read, write, and update output irq.
>>
>> 1) trace_avr_gpio_update_output_irq
>> 2) trace_avr_gpio_read
>> 3) trace_avr_gpio_write
>>
>> Signed-off-by: G S Niteesh Babu 
>> Reviewed-by: Michael Rolnik 
>> Message-Id: <20210311135539.10206-3-niteesh...@gmail.com>
>> [PMD: Added port_name(), display port name in trace events]
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/gpio/avr_gpio.c   | 26 +-
>>  hw/gpio/trace-events |  5 +
>>  2 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/gpio/avr_gpio.c b/hw/gpio/avr_gpio.c
>> index e4c7122e62c..29252d6ccfe 100644
>> --- a/hw/gpio/avr_gpio.c
>> +++ b/hw/gpio/avr_gpio.c
>> @@ -2,6 +2,7 @@
>>   * AVR processors GPIO registers emulation.
>>   *
>>   * Copyright (C) 2020 Heecheol Yang 
>> + * Copyright (C) 2021 Niteesh Babu G S 
>>   *
>>   * This program is free software; you can redistribute it and/or
>>   * modify it under the terms of the GNU General Public License as
>> @@ -26,6 +27,12 @@
>>  #include "hw/gpio/avr_gpio.h"
>>  #include "hw/qdev-properties.h"
>>  #include "migration/vmstate.h"
>> +#include "trace.h"
>> +
>> +static char port_name(AVRGPIOState *s)
>> +{
>> +return 'A' + s->id;
>> +}
>>
>>  static void avr_gpio_reset(DeviceState *dev)
>>  {
>> @@ -47,32 +54,41 @@ static void avr_gpio_write_port(AVRGPIOState *s,
>> uint64_t value)
>>
>>  if (cur_ddr_pin_val && (cur_port_pin_val != new_port_pin_val)) {
>>  qemu_set_irq(s->out[pin], new_port_pin_val);
>> +trace_avr_gpio_update_output_irq(port_name(s), pin,
>> new_port_pin_val);
>>  }
>>  }
>>  s->reg.port = value & s->reg.ddr;
>>  }
>>  static uint64_t avr_gpio_read(void *opaque, hwaddr offset, unsigned int
>> size)
>>  {
>> +uint8_t val = 0;
>>  AVRGPIOState *s = (AVRGPIOState *)opaque;
>>  switch (offset) {
>>  case GPIO_PIN:
>> -return s->reg.pin;
>> +val = s->reg.pin;
>> +break;
>>  case GPIO_DDR:
>> -return s->reg.ddr;
>> +val = s->reg.ddr;
>> +break;
>>  case GPIO_PORT:
>> -return s->reg.port;
>> +val = s->reg.port;
>> +break;
>>  default:
>>  g_assert_not_reached();
>>  break;
>>  }
>> -return 0;
>> +
>> +trace_avr_gpio_read(port_name(s), offset, val);
>> +return val;
>>  }
>>
>>  static void avr_gpio_write(void *opaque, hwaddr offset, uint64_t value,
>>  unsigned int size)
>>  {
>>  AVRGPIOState *s = (AVRGPIOState *)opaque;
>> -value = value & 0xF;
>> +value = value & 0xFF;
>> +
>> +trace_avr_gpio_write(port_name(s), offset, value);
>>  switch (offset) {
>>  case GPIO_PIN:
>>  s->reg.pin = value;
>> diff --git a/hw/gpio/trace-events b/hw/gpio/trace-events
>> index 46ab9323bd0..640834597a8 100644
>> --- a/hw/gpio/trace-events
>> +++ b/hw/gpio/trace-events
>> @@ -18,3 +18,8 @@ sifive_gpio_read(uint64_t offset, uint64_t r) "offset
>> 0x%" PRIx64 " value 0x%" P
>>  sifive_gpio_write(uint64_t offset, uint64_t value) "offset 0x%" PRIx64 "
>> value 0x%" PRIx64
>>  sifive_gpio_set(int64_t line, int64_t value) "line %" PRIi64 " value %"
>> PRIi64
>>  sifive_gpio_update_output_irq(int64_t line, int64_t value) "line %"
>> PRIi64 " value %" PRIi64
>> +
>> +# avr_gpio.c
>> +avr_gpio_read(unsigned id, uint64_t offset, uint64_t r) "port %c offset
>> 0x%" PRIx64 " value 0x%" PRIx64
>> +avr_gpio_write(unsigned id, uint64_t offset, uint64_t value) "port %c
>> offset 0x%" PRIx64 " value 0x%" PRIx64
>> +avr_gpio_update_output_irq(unsigned id, int64_t line, int64_t value)
>> "port %c pin %" PRIi64 " value %" PRIi64
>> --
>> 2.26.2
>>
>>


[PULL 15/16] docs/system: riscv: Add documentation for 'microchip-icicle-kit' machine

2021-03-22 Thread Alistair Francis
From: Bin Meng 

This adds the documentation to describe what is supported for the
'microchip-icicle-kit' machine, and how to boot the machine in QEMU.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
Message-id: 20210322075248.136255-2-bmeng...@gmail.com
Signed-off-by: Alistair Francis 
---
 docs/system/riscv/microchip-icicle-kit.rst | 89 ++
 docs/system/target-riscv.rst   |  1 +
 2 files changed, 90 insertions(+)
 create mode 100644 docs/system/riscv/microchip-icicle-kit.rst

diff --git a/docs/system/riscv/microchip-icicle-kit.rst b/docs/system/riscv/microchip-icicle-kit.rst
new file mode 100644
index 00..4fe97bce3f
--- /dev/null
+++ b/docs/system/riscv/microchip-icicle-kit.rst
@@ -0,0 +1,89 @@
+Microchip PolarFire SoC Icicle Kit (``microchip-icicle-kit``)
+=============================================================
+
+Microchip PolarFire SoC Icicle Kit integrates a PolarFire SoC, with one
+SiFive's E51 plus four U54 cores and many on-chip peripherals and an FPGA.
+
+For more details about Microchip PolarFire SoC, please see:
+https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga
+
+The Icicle Kit board information can be found here:
+https://www.microsemi.com/existing-parts/parts/152514
+
+Supported devices
+-----------------
+
+The ``microchip-icicle-kit`` machine supports the following devices:
+
+ * 1 E51 core
+ * 4 U54 cores
+ * Core Level Interruptor (CLINT)
+ * Platform-Level Interrupt Controller (PLIC)
+ * L2 Loosely Integrated Memory (L2-LIM)
+ * DDR memory controller
+ * 5 MMUARTs
+ * 1 DMA controller
+ * 2 GEM Ethernet controllers
+ * 1 SDHC storage controller
+
+Boot options
+------------
+
+The ``microchip-icicle-kit`` machine can start using the standard -bios
+functionality for loading its BIOS image, aka Hart Software Services (HSS_).
+HSS loads the second stage bootloader U-Boot from an SD card. It does not
+support direct kernel loading via the -kernel option. One has to load kernel
+from U-Boot.
+
+The memory is set to 1537 MiB by default which is the minimum required high
+memory size by HSS. A sanity check on ram size is performed in the machine
+init routine to prompt user to increase the RAM size to > 1537 MiB when less
+than 1537 MiB ram is detected.
+
+Boot the machine
+----------------
+
+HSS 2020.12 release is tested at the time of writing. To build an HSS image
+that can be booted by the ``microchip-icicle-kit`` machine, type the following
+in the HSS source tree:
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ cp boards/mpfs-icicle-kit-es/def_config .config
+  $ make BOARD=mpfs-icicle-kit-es
+
+Download the official SD card image released by Microchip and prepare it for
+QEMU usage:
+
+.. code-block:: bash
+
+  $ wget ftp://ftpsoc.microsemi.com/outgoing/core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz
+  $ gunzip core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz
+  $ qemu-img resize core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic 4G
+
+Then we can boot the machine by:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 \
+  -bios path/to/hss.bin -sd path/to/sdcard.img \
+  -nic user,model=cadence_gem \
+  -nic tap,ifname=tap,model=cadence_gem,script=no \
+  -display none -serial stdio \
+  -chardev socket,id=serial1,path=serial1.sock,server=on,wait=on \
+  -serial chardev:serial1
+
+With above command line, current terminal session will be used for the first
+serial port. Open another terminal window, and use `minicom` to connect the
+second serial port.
+
+.. code-block:: bash
+
+  $ minicom -D unix\#serial1.sock
+
+HSS output is on the first serial port (stdio) and U-Boot outputs on the
+second serial port. U-Boot will automatically load the Linux kernel from
+the SD card image.
+
+.. _HSS: https://github.com/polarfire-soc/hart-software-services
diff --git a/docs/system/target-riscv.rst b/docs/system/target-riscv.rst
index 94d99c4c82..8d5946fbbb 100644
--- a/docs/system/target-riscv.rst
+++ b/docs/system/target-riscv.rst
@@ -66,6 +66,7 @@ undocumented; you can get a complete list by running
 .. toctree::
:maxdepth: 1
 
+   riscv/microchip-icicle-kit
riscv/sifive_u
 
 RISC-V CPU features
-- 
2.30.1




[PULL 14/16] hw/riscv: microchip_pfsoc: Map EMMC/SD mux register

2021-03-22 Thread Alistair Francis
From: Bin Meng 

Since HSS commit c20a89f8dcac, the Icicle Kit reference design has
been updated to use a register mapped at 0x4f00 instead of a
GPIO to control whether eMMC or SD card is to be used. With this
support the same HSS image can be used for both eMMC and SD card
boot flow, while previously two different board configurations were
used. This is undocumented but one can take a look at the HSS code
HSS_MMCInit() in services/mmc/mmc_api.c.

With this commit, HSS image built from 2020.12 release boots again.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
Message-id: 20210322075248.136255-1-bmeng...@gmail.com
Signed-off-by: Alistair Francis 
---
 include/hw/riscv/microchip_pfsoc.h | 1 +
 hw/riscv/microchip_pfsoc.c | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/include/hw/riscv/microchip_pfsoc.h b/include/hw/riscv/microchip_pfsoc.h
index d0c666aae0..d30916f45d 100644
--- a/include/hw/riscv/microchip_pfsoc.h
+++ b/include/hw/riscv/microchip_pfsoc.h
@@ -109,6 +109,7 @@ enum {
 MICROCHIP_PFSOC_ENVM_DATA,
 MICROCHIP_PFSOC_QSPI_XIP,
 MICROCHIP_PFSOC_IOSCB,
+MICROCHIP_PFSOC_EMMC_SD_MUX,
 MICROCHIP_PFSOC_DRAM_LO,
 MICROCHIP_PFSOC_DRAM_LO_ALIAS,
 MICROCHIP_PFSOC_DRAM_HI,
diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index 266f1c3342..c4146b7a6b 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -122,6 +122,7 @@ static const MemMapEntry microchip_pfsoc_memmap[] = {
 [MICROCHIP_PFSOC_ENVM_DATA] =   { 0x2022,0x2 },
 [MICROCHIP_PFSOC_QSPI_XIP] ={ 0x2100,  0x100 },
 [MICROCHIP_PFSOC_IOSCB] =   { 0x3000, 0x1000 },
+[MICROCHIP_PFSOC_EMMC_SD_MUX] = { 0x4f00,0x4 },
 [MICROCHIP_PFSOC_DRAM_LO] = { 0x8000, 0x4000 },
 [MICROCHIP_PFSOC_DRAM_LO_ALIAS] =   { 0xc000, 0x4000 },
 [MICROCHIP_PFSOC_DRAM_HI] =   { 0x10,0x0 },
@@ -411,6 +412,11 @@ static void microchip_pfsoc_soc_realize(DeviceState *dev, Error **errp)
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->ioscb), 0,
 memmap[MICROCHIP_PFSOC_IOSCB].base);
 
+/* eMMC/SD mux */
+create_unimplemented_device("microchip.pfsoc.emmc_sd_mux",
+memmap[MICROCHIP_PFSOC_EMMC_SD_MUX].base,
+memmap[MICROCHIP_PFSOC_EMMC_SD_MUX].size);
+
 /* QSPI Flash */
 memory_region_init_rom(qspi_xip_mem, OBJECT(dev),
"microchip.pfsoc.qspi_xip",
-- 
2.30.1




[PULL 13/16] hw/block: m25p80: Support fast read for SST flashes

2021-03-22 Thread Alistair Francis
From: Bin Meng 

Per SST25VF016B datasheet [1], SST flash requires a dummy byte after
the address bytes. Note only SPI mode is supported by SST flashes.

[1] http://ww1.microchip.com/downloads/en/devicedoc/s71271_04.pdf
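
The effect of the change can be summarised with a small stand-alone sketch:
the number of bytes the model waits for after a fast-read opcode is the
address length plus a per-manufacturer number of dummy bytes. The enum and
function below are simplified stand-ins written for this example; only the
SST and Winbond cases come from the hunk in this patch, and MAN_OTHER is a
hypothetical placeholder for everything else.

```c
#include <assert.h>

/* Hypothetical subset of manufacturers; only the two cases visible in
 * the patch context are modelled. */
typedef enum { MAN_SST, MAN_WINBOND, MAN_OTHER } Manufacturer;

/* Bytes the model waits for after the fast-read opcode. */
static int fast_read_needed_bytes(Manufacturer man, int addr_len)
{
    int needed = addr_len;

    switch (man) {
    case MAN_SST:
        needed += 1;    /* one dummy byte after the address bytes */
        break;
    case MAN_WINBOND:
        needed += 8;    /* dummy cycles modelled as byte writes */
        break;
    default:
        break;          /* other vendors are not covered by this sketch */
    }
    return needed;
}
```

For a part with 3 address bytes this gives 4 bytes for SST and 11 for
Winbond.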

Signed-off-by: Bin Meng 
Acked-by: Alistair Francis 
Message-id: 20210306060152.7250-1-bmeng...@gmail.com
Signed-off-by: Alistair Francis 
---
 hw/block/m25p80.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 5f9471d83c..183d3f44c2 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -895,6 +895,9 @@ static void decode_fast_read_cmd(Flash *s)
 s->needed_bytes = get_addr_length(s);
 switch (get_man(s)) {
 /* Dummy cycles - modeled with bytes writes instead of bits */
+case MAN_SST:
+s->needed_bytes += 1;
+break;
 case MAN_WINBOND:
 s->needed_bytes += 8;
 break;
-- 
2.30.1




[PULL 09/16] hw/riscv: Add fw_cfg support to virt

2021-03-22 Thread Alistair Francis
From: Asherah Connor 

Provides fw_cfg for the virt machine on riscv.  This enables
using e.g.  ramfb later.
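
For reference, the register layout implied by the create_fw_cfg() call in
this patch can be sketched as follows: data register at the base, selector at
base + 8, DMA address register at base + 16, all inside a 0x18 byte region.
The constant and type names are invented for this example, the register
widths are my reading of the fw_cfg MMIO interface, and the base address
used in the usage note is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets as passed to fw_cfg_init_mem_wide() in this patch; the names
 * are hypothetical. */
enum {
    FWCFG_DATA_OFF = 0,   FWCFG_DATA_SIZE = 8,
    FWCFG_CTL_OFF  = 8,   FWCFG_CTL_SIZE  = 2,
    FWCFG_DMA_OFF  = 16,  FWCFG_DMA_SIZE  = 8,
    FWCFG_REGION_SIZE = 0x18,
};

typedef struct FwCfgAddrs {
    uint64_t data;   /* data register */
    uint64_t ctl;    /* selector register */
    uint64_t dma;    /* DMA address register */
} FwCfgAddrs;

/* Compute the absolute register addresses for a given MMIO base. */
static FwCfgAddrs fw_cfg_addrs(uint64_t base)
{
    FwCfgAddrs a = {
        base + FWCFG_DATA_OFF,
        base + FWCFG_CTL_OFF,
        base + FWCFG_DMA_OFF,
    };
    return a;
}
```

With a base of 0x1000, the selector sits at 0x1008 and the DMA register ends
exactly at base + 0x18, which matches the region size in the memory map.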

Signed-off-by: Asherah Connor 
Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Message-id: 20210318235041.17175-2-a...@kivikakk.ee
Signed-off-by: Alistair Francis 
---
 include/hw/riscv/virt.h |  2 ++
 hw/riscv/virt.c | 30 ++
 hw/riscv/Kconfig|  1 +
 3 files changed, 33 insertions(+)

diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index 632da52018..349fee1f89 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -40,6 +40,7 @@ struct RISCVVirtState {
 RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
 DeviceState *plic[VIRT_SOCKETS_MAX];
 PFlashCFI01 *flash[2];
+FWCfgState *fw_cfg;
 
 int fdt_size;
 };
@@ -53,6 +54,7 @@ enum {
 VIRT_PLIC,
 VIRT_UART0,
 VIRT_VIRTIO,
+VIRT_FW_CFG,
 VIRT_FLASH,
 VIRT_DRAM,
 VIRT_PCIE_MMIO,
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 0b39101a5e..e96ec4cbbc 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -53,6 +53,7 @@ static const MemMapEntry virt_memmap[] = {
 [VIRT_PLIC] ={  0xc00, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
 [VIRT_UART0] =   { 0x1000, 0x100 },
 [VIRT_VIRTIO] =  { 0x10001000,0x1000 },
+[VIRT_FW_CFG] =  { 0x1010,  0x18 },
 [VIRT_FLASH] =   { 0x2000, 0x400 },
 [VIRT_PCIE_ECAM] =   { 0x3000,0x1000 },
 [VIRT_PCIE_MMIO] =   { 0x4000,0x4000 },
@@ -507,6 +508,28 @@ static inline DeviceState *gpex_pcie_init(MemoryRegion 
*sys_mem,
 return dev;
 }
 
+static FWCfgState *create_fw_cfg(const MachineState *mc)
+{
+hwaddr base = virt_memmap[VIRT_FW_CFG].base;
+hwaddr size = virt_memmap[VIRT_FW_CFG].size;
+FWCfgState *fw_cfg;
+char *nodename;
+
+fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, base + 16,
+  &address_space_memory);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)mc->smp.cpus);
+
+nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
+qemu_fdt_add_subnode(mc->fdt, nodename);
+qemu_fdt_setprop_string(mc->fdt, nodename,
+"compatible", "qemu,fw-cfg-mmio");
+qemu_fdt_setprop_sized_cells(mc->fdt, nodename, "reg",
+ 2, base, 2, size);
+qemu_fdt_setprop(mc->fdt, nodename, "dma-coherent", NULL, 0);
+g_free(nodename);
+return fw_cfg;
+}
+
 static void virt_machine_init(MachineState *machine)
 {
 const MemMapEntry *memmap = virt_memmap;
@@ -688,6 +711,13 @@ static void virt_machine_init(MachineState *machine)
 start_addr = virt_memmap[VIRT_FLASH].base;
 }
 
+/*
+ * Init fw_cfg.  Must be done before riscv_load_fdt, otherwise the device
+ * tree cannot be altered and we get FDT_ERR_NOSPACE.
+ */
+s->fw_cfg = create_fw_cfg(machine);
+rom_set_fw(s->fw_cfg);
+
 /* Compute the fdt load address in dram */
 fdt_load_addr = riscv_load_fdt(memmap[VIRT_DRAM].base,
machine->ram_size, machine->fdt);
diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index d139074b02..1de18cdcf1 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -33,6 +33,7 @@ config RISCV_VIRT
 select SIFIVE_PLIC
 select SIFIVE_TEST
 select VIRTIO_MMIO
+select FW_CFG_DMA
 
 config SIFIVE_E
 bool
-- 
2.30.1




[PULL 07/16] target/riscv: Make VSTIP and VSEIP read-only in hip

2021-03-22 Thread Alistair Francis
From: Georg Kotheimer 

Signed-off-by: Georg Kotheimer 
Reviewed-by: Alistair Francis 
Message-id: 20210311094902.1377593-1-georg.kothei...@kernkonzept.com
Signed-off-by: Alistair Francis 
---
 target/riscv/csr.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d2ae73e4a0..a9dba7f736 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -420,7 +420,8 @@ static const target_ulong sstatus_v1_10_mask = SSTATUS_SIE 
| SSTATUS_SPIE |
 SSTATUS_UIE | SSTATUS_UPIE | SSTATUS_SPP | SSTATUS_FS | SSTATUS_XS |
 SSTATUS_SUM | SSTATUS_MXR | SSTATUS_SD;
 static const target_ulong sip_writable_mask = SIP_SSIP | MIP_USIP | MIP_UEIP;
-static const target_ulong hip_writable_mask = MIP_VSSIP | MIP_VSTIP | 
MIP_VSEIP;
+static const target_ulong hip_writable_mask = MIP_VSSIP;
+static const target_ulong hvip_writable_mask = MIP_VSSIP | MIP_VSTIP | 
MIP_VSEIP;
 static const target_ulong vsip_writable_mask = MIP_VSSIP;
 
 static const char valid_vm_1_10_32[16] = {
@@ -962,9 +963,9 @@ static int rmw_hvip(CPURISCVState *env, int csrno, 
target_ulong *ret_value,
target_ulong new_value, target_ulong write_mask)
 {
 int ret = rmw_mip(env, 0, ret_value, new_value,
-  write_mask & hip_writable_mask);
+  write_mask & hvip_writable_mask);
 
-*ret_value &= hip_writable_mask;
+*ret_value &= hvip_writable_mask;
 
 return ret;
 }
-- 
2.30.1




[PULL 11/16] target/riscv: Fix read and write accesses to vsip and vsie

2021-03-22 Thread Alistair Francis
From: Georg Kotheimer 

The previous implementation was broken in many ways:
 - Used mideleg instead of hideleg to mask accesses
 - Used MIP_VSSIP instead of VS_MODE_INTERRUPTS to mask writes to vsie
 - Did not shift between S bits and VS bits (VSEIP <-> SEIP, ...)
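
The three points above can be sketched in isolation. This is a minimal model, assuming the standard bit positions from the privileged specification (each VS-level bit sits one position above its S-level counterpart); vsie_read() and vsie_write() are hypothetical stand-alone helpers that mirror the masking in the patch, not the QEMU functions themselves.

```c
#include <assert.h>
#include <stdint.h>

#define SIP_SSIP   (UINT64_C(1) << 1)
#define MIP_VSSIP  (UINT64_C(1) << 2)
#define MIP_VSTIP  (UINT64_C(1) << 6)
#define MIP_VSEIP  (UINT64_C(1) << 10)
#define VS_MODE_INTERRUPTS (MIP_VSSIP | MIP_VSTIP | MIP_VSEIP)

/* The whole scheme relies on VS bits being S bits shifted left by one. */
static_assert((SIP_SSIP << 1) == MIP_VSSIP, "VS bit is S bit << 1");

/* Value the guest sees when reading vsie: delegated VS bits, moved down. */
static uint64_t vsie_read(uint64_t mie, uint64_t hideleg)
{
    return (mie & hideleg & VS_MODE_INTERRUPTS) >> 1;
}

/* New mie after the guest writes `val` to vsie: S bits moved up to VS. */
static uint64_t vsie_write(uint64_t mie, uint64_t hideleg, uint64_t val)
{
    return (mie & ~VS_MODE_INTERRUPTS) |
           ((val << 1) & hideleg & VS_MODE_INTERRUPTS);
}
```

Masking with hideleg rather than mideleg means a guest can only enable the VS interrupts that the hypervisor has actually delegated to it.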

Signed-off-by: Georg Kotheimer 
Reviewed-by: Alistair Francis 
Message-id: 20210311094738.1376795-1-georg.kothei...@kernkonzept.com
Signed-off-by: Alistair Francis 
---
 target/riscv/csr.c | 68 +++---
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index a9dba7f736..d2585395bf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -749,30 +749,42 @@ static int write_sstatus(CPURISCVState *env, int csrno, 
target_ulong val)
 return write_mstatus(env, CSR_MSTATUS, newval);
 }
 
+static int read_vsie(CPURISCVState *env, int csrno, target_ulong *val)
+{
+/* Shift the VS bits to their S bit location in vsie */
+*val = (env->mie & env->hideleg & VS_MODE_INTERRUPTS) >> 1;
+return 0;
+}
+
 static int read_sie(CPURISCVState *env, int csrno, target_ulong *val)
 {
 if (riscv_cpu_virt_enabled(env)) {
-/* Tell the guest the VS bits, shifted to the S bit locations */
-*val = (env->mie & env->mideleg & VS_MODE_INTERRUPTS) >> 1;
+read_vsie(env, CSR_VSIE, val);
 } else {
 *val = env->mie & env->mideleg;
 }
 return 0;
 }
 
-static int write_sie(CPURISCVState *env, int csrno, target_ulong val)
+static int write_vsie(CPURISCVState *env, int csrno, target_ulong val)
 {
-target_ulong newval;
+/* Shift the S bits to their VS bit location in mie */
+target_ulong newval = (env->mie & ~VS_MODE_INTERRUPTS) |
+  ((val << 1) & env->hideleg & VS_MODE_INTERRUPTS);
+return write_mie(env, CSR_MIE, newval);
+}
 
+static int write_sie(CPURISCVState *env, int csrno, target_ulong val)
+{
 if (riscv_cpu_virt_enabled(env)) {
-/* Shift the guests S bits to VS */
-newval = (env->mie & ~VS_MODE_INTERRUPTS) |
- ((val << 1) & VS_MODE_INTERRUPTS);
+write_vsie(env, CSR_VSIE, val);
 } else {
-newval = (env->mie & ~S_MODE_INTERRUPTS) | (val & S_MODE_INTERRUPTS);
+target_ulong newval = (env->mie & ~S_MODE_INTERRUPTS) |
+  (val & S_MODE_INTERRUPTS);
+write_mie(env, CSR_MIE, newval);
 }
 
-return write_mie(env, CSR_MIE, newval);
+return 0;
 }
 
 static int read_stvec(CPURISCVState *env, int csrno, target_ulong *val)
@@ -853,17 +865,25 @@ static int write_sbadaddr(CPURISCVState *env, int csrno, 
target_ulong val)
 return 0;
 }
 
+static int rmw_vsip(CPURISCVState *env, int csrno, target_ulong *ret_value,
+target_ulong new_value, target_ulong write_mask)
+{
+/* Shift the S bits to their VS bit location in mip */
+int ret = rmw_mip(env, 0, ret_value, new_value << 1,
+  (write_mask << 1) & vsip_writable_mask & env->hideleg);
+*ret_value &= VS_MODE_INTERRUPTS;
+/* Shift the VS bits to their S bit location in vsip */
+*ret_value >>= 1;
+return ret;
+}
+
 static int rmw_sip(CPURISCVState *env, int csrno, target_ulong *ret_value,
target_ulong new_value, target_ulong write_mask)
 {
 int ret;
 
 if (riscv_cpu_virt_enabled(env)) {
-/* Shift the new values to line up with the VS bits */
-ret = rmw_mip(env, CSR_MSTATUS, ret_value, new_value << 1,
-  (write_mask & sip_writable_mask) << 1 & env->mideleg);
-ret &= vsip_writable_mask;
-ret >>= 1;
+ret = rmw_vsip(env, CSR_VSIP, ret_value, new_value, write_mask);
 } else {
 ret = rmw_mip(env, CSR_MSTATUS, ret_value, new_value,
   write_mask & env->mideleg & sip_writable_mask);
@@ -1122,26 +1142,6 @@ static int write_vsstatus(CPURISCVState *env, int csrno, 
target_ulong val)
 return 0;
 }
 
-static int rmw_vsip(CPURISCVState *env, int csrno, target_ulong *ret_value,
-target_ulong new_value, target_ulong write_mask)
-{
-int ret = rmw_mip(env, 0, ret_value, new_value,
-  write_mask & env->mideleg & vsip_writable_mask);
-return ret;
-}
-
-static int read_vsie(CPURISCVState *env, int csrno, target_ulong *val)
-{
-*val = env->mie & env->mideleg & VS_MODE_INTERRUPTS;
-return 0;
-}
-
-static int write_vsie(CPURISCVState *env, int csrno, target_ulong val)
-{
-target_ulong newval = (env->mie & ~env->mideleg) | (val & env->mideleg & 
MIP_VSSIP);
-return write_mie(env, CSR_MIE, newval);
-}
-
 static int read_vstvec(CPURISCVState *env, int csrno, target_ulong *val)
 {
 *val = env->vstvec;
-- 
2.30.1




[PULL 03/16] target/riscv: propagate PMP permission to TLB page

2021-03-22 Thread Alistair Francis
From: Jim Shu 

Currently, PMP permission checking of a TLB page is bypassed on a TLB hit.
Fix it by propagating PMP permission to the TLB page permission.

PMP permission checking also uses an MMU-style API to change TLB permission
and size.
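
The idea can be sketched outside QEMU as follows. This is an illustrative model only: the flag values and the names pmp_to_page_prot(), tlb_entry_prot() and tlb_hit_allows() are assumptions made for the example, not the definitions from the QEMU headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values (assumed for this sketch). */
enum { PMP_R = 1 << 0, PMP_W = 1 << 1, PMP_X = 1 << 2 };
enum { PG_READ = 1 << 0, PG_WRITE = 1 << 1, PG_EXEC = 1 << 2 };

/* Translate the privileges granted by the matching PMP region. */
static int pmp_to_page_prot(int pmp_priv)
{
    int prot = 0;

    assert((pmp_priv & ~(PMP_R | PMP_W | PMP_X)) == 0);
    if (pmp_priv & PMP_R) {
        prot |= PG_READ;
    }
    if (pmp_priv & PMP_W) {
        prot |= PG_WRITE;
    }
    if (pmp_priv & PMP_X) {
        prot |= PG_EXEC;
    }
    return prot;
}

/* Permissions stored in the TLB entry: MMU result narrowed by PMP. */
static int tlb_entry_prot(int mmu_prot, int pmp_priv)
{
    return mmu_prot & pmp_to_page_prot(pmp_priv);
}

/* A later TLB hit is checked against the stored permissions only. */
static bool tlb_hit_allows(int cached_prot, int wanted)
{
    return (cached_prot & wanted) == wanted;
}
```

Because the cached permissions are already the intersection, a hit for an access type that PMP forbids fails the check and falls back to the slow path, where the PMP fault is raised.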

Signed-off-by: Jim Shu 
Reviewed-by: Alistair Francis 
Message-id: 1613916082-19528-2-git-send-email-cw...@andestech.com
Signed-off-by: Alistair Francis 
---
 target/riscv/pmp.h|  4 +-
 target/riscv/cpu_helper.c | 84 +--
 target/riscv/pmp.c| 80 +++--
 3 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/target/riscv/pmp.h b/target/riscv/pmp.h
index c8d5ef4a69..b82a30f0d5 100644
--- a/target/riscv/pmp.h
+++ b/target/riscv/pmp.h
@@ -59,11 +59,13 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 target_ulong val);
 target_ulong pmpaddr_csr_read(CPURISCVState *env, uint32_t addr_index);
 bool pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
-target_ulong size, pmp_priv_t priv, target_ulong mode);
+target_ulong size, pmp_priv_t privs, pmp_priv_t *allowed_privs,
+target_ulong mode);
 bool pmp_is_range_in_tlb(CPURISCVState *env, hwaddr tlb_sa,
  target_ulong *tlb_size);
 void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index);
 void pmp_update_rule_nums(CPURISCVState *env);
 uint32_t pmp_get_num_rules(CPURISCVState *env);
+int pmp_priv_to_page_prot(pmp_priv_t pmp_priv);
 
 #endif
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 83a6bcfad0..fa385594df 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -280,6 +280,49 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong 
newpriv)
 env->load_res = -1;
 }
 
+/*
+ * get_physical_address_pmp - check PMP permission for this physical address
+ *
+ * Match the PMP region and check permission for this physical address and it's
+ * TLB page. Returns 0 if the permission checking was successful
+ *
+ * @env: CPURISCVState
+ * @prot: The returned protection attributes
+ * @tlb_size: TLB page size containing addr. It could be modified after PMP
+ *permission checking. NULL if not set TLB page for addr.
+ * @addr: The physical address to be checked permission
+ * @access_type: The type of MMU access
+ * @mode: Indicates current privilege level.
+ */
+static int get_physical_address_pmp(CPURISCVState *env, int *prot,
+target_ulong *tlb_size, hwaddr addr,
+int size, MMUAccessType access_type,
+int mode)
+{
+pmp_priv_t pmp_priv;
+target_ulong tlb_size_pmp = 0;
+
+if (!riscv_feature(env, RISCV_FEATURE_PMP)) {
+*prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+return TRANSLATE_SUCCESS;
+}
+
+if (!pmp_hart_has_privs(env, addr, size, 1 << access_type, &pmp_priv,
+mode)) {
+*prot = 0;
+return TRANSLATE_PMP_FAIL;
+}
+
+*prot = pmp_priv_to_page_prot(pmp_priv);
+if (tlb_size != NULL) {
+if (pmp_is_range_in_tlb(env, addr & ~(*tlb_size - 1), &tlb_size_pmp)) {
+*tlb_size = tlb_size_pmp;
+}
+}
+
+return TRANSLATE_SUCCESS;
+}
+
 /* get_physical_address - get the physical address for this virtual address
  *
  * Do a page table walk to obtain the physical address corresponding to a
@@ -442,9 +485,11 @@ restart:
 pte_addr = base + idx * ptesize;
 }
 
-if (riscv_feature(env, RISCV_FEATURE_PMP) &&
-!pmp_hart_has_privs(env, pte_addr, sizeof(target_ulong),
-1 << MMU_DATA_LOAD, PRV_S)) {
+int pmp_prot;
+int pmp_ret = get_physical_address_pmp(env, &pmp_prot, NULL, pte_addr,
+   sizeof(target_ulong),
+   MMU_DATA_LOAD, PRV_S);
+if (pmp_ret != TRANSLATE_SUCCESS) {
 return TRANSLATE_PMP_FAIL;
 }
 
@@ -682,13 +727,14 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 #ifndef CONFIG_USER_ONLY
 vaddr im_address;
 hwaddr pa = 0;
-int prot, prot2;
+int prot, prot2, prot_pmp;
 bool pmp_violation = false;
 bool first_stage_error = true;
 bool two_stage_lookup = false;
 int ret = TRANSLATE_FAIL;
 int mode = mmu_idx;
-target_ulong tlb_size = 0;
+/* default TLB page size */
+target_ulong tlb_size = TARGET_PAGE_SIZE;
 
 env->guest_phys_fault_addr = 0;
 
@@ -745,10 +791,10 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 
 prot &= prot2;
 
-if (riscv_feature(env, RISCV_FEATURE_PMP) &&
-(ret == TRANSLATE_SUCCESS) &&
-!pmp_hart_has_privs(env, pa, size, 1 << access_type, mode)) {
-ret = TRANSLATE_PMP_FAIL;
+if (ret == TRANSLATE_SUCCESS) {
+ret = 

[PULL 10/16] hw/riscv: allow ramfb on virt

2021-03-22 Thread Alistair Francis
From: Asherah Connor 

Allow ramfb on virt.  This lets `-device ramfb' work.

Signed-off-by: Asherah Connor 
Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Message-id: 20210318235041.17175-3-a...@kivikakk.ee
Signed-off-by: Alistair Francis 
---
 hw/riscv/virt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index e96ec4cbbc..c0dc69ff33 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -42,6 +42,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/pci/pci.h"
 #include "hw/pci-host/gpex.h"
+#include "hw/display/ramfb.h"
 
 static const MemMapEntry virt_memmap[] = {
 [VIRT_DEBUG] =   {0x0, 0x100 },
@@ -781,6 +782,8 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
 mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
 mc->numa_mem_supported = true;
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
 }
 
 static const TypeInfo virt_machine_typeinfo = {
-- 
2.30.1




[PULL 16/16] target/riscv: Prevent lost illegal instruction exceptions

2021-03-22 Thread Alistair Francis
From: Georg Kotheimer 

When decode_insn16() fails, we fall back to decode_RV32_64C() for
further compressed instruction decoding. However, prior to this change,
we did not raise an illegal instruction exception, if decode_RV32_64C()
fails to decode the instruction. This means that we skipped illegal
compressed instructions instead of raising an illegal instruction
exception.

Instead of patching decode_RV32_64C(), we can just remove it,
as it is dead code since f330433b363 anyway.
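
The resulting control flow can be modelled in a few lines. This is a toy sketch, not the QEMU translator: Ctx, decode16() and translate16() are hypothetical names, and the stand-in decoder accepts only C.NOP (encoding 0x0001) so the failure path is easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation context for this sketch. */
typedef struct {
    bool illegal_raised;  /* an illegal instruction exception was generated */
    int translated;       /* number of instructions successfully decoded */
} Ctx;

/* Stand-in for the generated 16-bit decoder: accepts only C.NOP. */
static bool decode16(Ctx *ctx, uint16_t insn)
{
    if (insn == 0x0001) {
        ctx->translated++;
        return true;
    }
    return false;
}

/* A decode failure must always end in an exception, never be skipped. */
static void translate16(Ctx *ctx, uint16_t insn)
{
    assert(ctx != NULL);
    if (!decode16(ctx, insn)) {
        ctx->illegal_raised = true;
    }
}
```

With no second fallback decoder left in the chain, there is exactly one place where a failed decode is handled, which is what makes the lost-exception bug impossible.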

Signed-off-by: Georg Kotheimer 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
Message-id: 20210322121609.3097928-1-georg.kothei...@kernkonzept.com
Signed-off-by: Alistair Francis 
---
 target/riscv/translate.c | 179 +--
 1 file changed, 1 insertion(+), 178 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0f28b5f41e..2f9f5ccc62 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -67,20 +67,6 @@ typedef struct DisasContext {
 CPUState *cs;
 } DisasContext;
 
-#ifdef TARGET_RISCV64
-/* convert riscv funct3 to qemu memop for load/store */
-static const int tcg_memop_lookup[8] = {
-[0 ... 7] = -1,
-[0] = MO_SB,
-[1] = MO_TESW,
-[2] = MO_TESL,
-[3] = MO_TEQ,
-[4] = MO_UB,
-[5] = MO_TEUW,
-[6] = MO_TEUL,
-};
-#endif
-
 #ifdef TARGET_RISCV64
 #define CASE_OP_32_64(X) case X: case glue(X, W)
 #else
@@ -374,48 +360,6 @@ static void gen_jal(DisasContext *ctx, int rd, 
target_ulong imm)
 ctx->base.is_jmp = DISAS_NORETURN;
 }
 
-#ifdef TARGET_RISCV64
-static void gen_load_c(DisasContext *ctx, uint32_t opc, int rd, int rs1,
-target_long imm)
-{
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-gen_get_gpr(t0, rs1);
-tcg_gen_addi_tl(t0, t0, imm);
-int memop = tcg_memop_lookup[(opc >> 12) & 0x7];
-
-if (memop < 0) {
-gen_exception_illegal(ctx);
-return;
-}
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx, memop);
-gen_set_gpr(rd, t1);
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-}
-
-static void gen_store_c(DisasContext *ctx, uint32_t opc, int rs1, int rs2,
-target_long imm)
-{
-TCGv t0 = tcg_temp_new();
-TCGv dat = tcg_temp_new();
-gen_get_gpr(t0, rs1);
-tcg_gen_addi_tl(t0, t0, imm);
-gen_get_gpr(dat, rs2);
-int memop = tcg_memop_lookup[(opc >> 12) & 0x7];
-
-if (memop < 0) {
-gen_exception_illegal(ctx);
-return;
-}
-
-tcg_gen_qemu_st_tl(dat, t0, ctx->mem_idx, memop);
-tcg_temp_free(t0);
-tcg_temp_free(dat);
-}
-#endif
-
 #ifndef CONFIG_USER_ONLY
 /* The states of mstatus_fs are:
  * 0 = disabled, 1 = initial, 2 = clean, 3 = dirty
@@ -447,83 +391,6 @@ static void mark_fs_dirty(DisasContext *ctx)
 static inline void mark_fs_dirty(DisasContext *ctx) { }
 #endif
 
-#if !defined(TARGET_RISCV64)
-static void gen_fp_load(DisasContext *ctx, uint32_t opc, int rd,
-int rs1, target_long imm)
-{
-TCGv t0;
-
-if (ctx->mstatus_fs == 0) {
-gen_exception_illegal(ctx);
-return;
-}
-
-t0 = tcg_temp_new();
-gen_get_gpr(t0, rs1);
-tcg_gen_addi_tl(t0, t0, imm);
-
-switch (opc) {
-case OPC_RISC_FLW:
-if (!has_ext(ctx, RVF)) {
-goto do_illegal;
-}
-tcg_gen_qemu_ld_i64(cpu_fpr[rd], t0, ctx->mem_idx, MO_TEUL);
-/* RISC-V requires NaN-boxing of narrower width floating point values 
*/
-tcg_gen_ori_i64(cpu_fpr[rd], cpu_fpr[rd], 0xffffffff00000000ULL);
-break;
-case OPC_RISC_FLD:
-if (!has_ext(ctx, RVD)) {
-goto do_illegal;
-}
-tcg_gen_qemu_ld_i64(cpu_fpr[rd], t0, ctx->mem_idx, MO_TEQ);
-break;
-do_illegal:
-default:
-gen_exception_illegal(ctx);
-break;
-}
-tcg_temp_free(t0);
-
-mark_fs_dirty(ctx);
-}
-
-static void gen_fp_store(DisasContext *ctx, uint32_t opc, int rs1,
-int rs2, target_long imm)
-{
-TCGv t0;
-
-if (ctx->mstatus_fs == 0) {
-gen_exception_illegal(ctx);
-return;
-}
-
-t0 = tcg_temp_new();
-gen_get_gpr(t0, rs1);
-tcg_gen_addi_tl(t0, t0, imm);
-
-switch (opc) {
-case OPC_RISC_FSW:
-if (!has_ext(ctx, RVF)) {
-goto do_illegal;
-}
-tcg_gen_qemu_st_i64(cpu_fpr[rs2], t0, ctx->mem_idx, MO_TEUL);
-break;
-case OPC_RISC_FSD:
-if (!has_ext(ctx, RVD)) {
-goto do_illegal;
-}
-tcg_gen_qemu_st_i64(cpu_fpr[rs2], t0, ctx->mem_idx, MO_TEQ);
-break;
-do_illegal:
-default:
-gen_exception_illegal(ctx);
-break;
-}
-
-tcg_temp_free(t0);
-}
-#endif
-
 static void gen_set_rm(DisasContext *ctx, int rm)
 {
 TCGv_i32 t0;
@@ -537,49 +404,6 @@ static void gen_set_rm(DisasContext *ctx, int rm)
 tcg_temp_free_i32(t0);
 }
 
-static void decode_RV32_64C0(DisasContext *ctx, uint16_t opcode)
-{
-uint8_t funct3 = extract16(opcode, 

[PULL 02/16] hw/char: disable ibex uart receive if the buffer is full

2021-03-22 Thread Alistair Francis
From: Alexander Wagner 

Not disabling the UART leads to QEMU overwriting the UART receive buffer with
the newest received byte. The rx_level variable is added to allow the use of
the existing OpenTitan driver libraries.
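
The receive-side behaviour described above can be modelled as a one-byte buffer. This is a simplified sketch, assuming a single data register and no real FIFO, as the patch states; UartModel and the uart_* helpers are hypothetical names, not the QEMU device code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the receive path with a single-byte buffer. */
typedef struct {
    bool rx_enabled;    /* models CTRL.RX */
    bool rx_full;       /* models STATUS.RXFULL */
    uint32_t rx_level;  /* bytes currently held, reported in FIFO_STATUS */
    uint8_t rdata;      /* the single receive data register */
} UartModel;

/* The backend may only push a byte while the buffer has room. */
static bool uart_can_receive(const UartModel *u)
{
    return u->rx_enabled && !u->rx_full;
}

/* Accept one byte; returns false instead of overwriting unread data. */
static bool uart_receive(UartModel *u, uint8_t byte)
{
    assert(u != NULL);
    if (!uart_can_receive(u)) {
        return false;
    }
    u->rdata = byte;
    u->rx_full = true;
    u->rx_level += 1;
    return true;
}

/* Guest read of RDATA: frees the buffer so the next byte can arrive. */
static uint8_t uart_read_rdata(UartModel *u)
{
    uint8_t value = u->rdata;

    if (u->rx_enabled && u->rx_level > 0) {
        u->rx_level -= 1;
        u->rx_full = false;
    }
    return value;
}
```

A driver that polls the receive level before reading RDATA, as the commit message says the OpenTitan libraries do, sees a level of 1 after a byte arrives and 0 again after the read.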

Signed-off-by: Alexander Wagner 
Reviewed-by: Alistair Francis 
Message-id: 20210309152130.13038-1-alexander.wag...@ulal.de
Signed-off-by: Alistair Francis 
---
 include/hw/char/ibex_uart.h |  4 
 hw/char/ibex_uart.c | 23 ++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/hw/char/ibex_uart.h b/include/hw/char/ibex_uart.h
index 03d19e3f6f..546f958eb8 100644
--- a/include/hw/char/ibex_uart.h
+++ b/include/hw/char/ibex_uart.h
@@ -62,6 +62,8 @@ REG32(FIFO_CTRL, 0x1c)
 FIELD(FIFO_CTRL, RXILVL, 2, 3)
 FIELD(FIFO_CTRL, TXILVL, 5, 2)
 REG32(FIFO_STATUS, 0x20)
+FIELD(FIFO_STATUS, TXLVL, 0, 5)
+FIELD(FIFO_STATUS, RXLVL, 16, 5)
 REG32(OVRD, 0x24)
 REG32(VAL, 0x28)
 REG32(TIMEOUT_CTRL, 0x2c)
@@ -82,6 +84,8 @@ struct IbexUartState {
 uint8_t tx_fifo[IBEX_UART_TX_FIFO_SIZE];
 uint32_t tx_level;
 
+uint32_t rx_level;
+
 QEMUTimer *fifo_trigger_handle;
 uint64_t char_tx_time;
 
diff --git a/hw/char/ibex_uart.c b/hw/char/ibex_uart.c
index edcaa30ade..73b8f2e45b 100644
--- a/hw/char/ibex_uart.c
+++ b/hw/char/ibex_uart.c
@@ -66,7 +66,8 @@ static int ibex_uart_can_receive(void *opaque)
 {
 IbexUartState *s = opaque;
 
-if (s->uart_ctrl & R_CTRL_RX_ENABLE_MASK) {
+if ((s->uart_ctrl & R_CTRL_RX_ENABLE_MASK)
+   && !(s->uart_status & R_STATUS_RXFULL_MASK)) {
 return 1;
 }
 
@@ -83,6 +84,11 @@ static void ibex_uart_receive(void *opaque, const uint8_t 
*buf, int size)
 
 s->uart_status &= ~R_STATUS_RXIDLE_MASK;
 s->uart_status &= ~R_STATUS_RXEMPTY_MASK;
+/* The RXFULL is set after receiving a single byte
+ * as the FIFO buffers are not yet implemented.
+ */
+s->uart_status |= R_STATUS_RXFULL_MASK;
+s->rx_level += 1;
 
 if (size > rx_fifo_level) {
 s->uart_intr_state |= R_INTR_STATE_RX_WATERMARK_MASK;
@@ -199,6 +205,7 @@ static void ibex_uart_reset(DeviceState *dev)
 s->uart_timeout_ctrl = 0x00000000;
 
 s->tx_level = 0;
+s->rx_level = 0;
 
 s->char_tx_time = (NANOSECONDS_PER_SECOND / 230400) * 10;
 
@@ -243,11 +250,15 @@ static uint64_t ibex_uart_read(void *opaque, hwaddr addr,
 
 case R_RDATA:
 retvalue = s->uart_rdata;
-if (s->uart_ctrl & R_CTRL_RX_ENABLE_MASK) {
+if ((s->uart_ctrl & R_CTRL_RX_ENABLE_MASK) && (s->rx_level > 0)) {
 qemu_chr_fe_accept_input(&s->chr);
 
-s->uart_status |= R_STATUS_RXIDLE_MASK;
-s->uart_status |= R_STATUS_RXEMPTY_MASK;
+s->rx_level -= 1;
+s->uart_status &= ~R_STATUS_RXFULL_MASK;
+if (s->rx_level == 0) {
+s->uart_status |= R_STATUS_RXIDLE_MASK;
+s->uart_status |= R_STATUS_RXEMPTY_MASK;
+}
 }
 break;
 case R_WDATA:
@@ -261,7 +272,8 @@ static uint64_t ibex_uart_read(void *opaque, hwaddr addr,
 case R_FIFO_STATUS:
 retvalue = s->uart_fifo_status;
 
-retvalue |= s->tx_level & 0x1F;
+retvalue |= (s->rx_level & 0x1F) << R_FIFO_STATUS_RXLVL_SHIFT;
+retvalue |= (s->tx_level & 0x1F) << R_FIFO_STATUS_TXLVL_SHIFT;
 
 qemu_log_mask(LOG_UNIMP,
   "%s: RX fifos are not supported\n", __func__);
@@ -364,6 +376,7 @@ static void ibex_uart_write(void *opaque, hwaddr addr,
 s->uart_fifo_ctrl = value;
 
 if (value & R_FIFO_CTRL_RXRST_MASK) {
+s->rx_level = 0;
 qemu_log_mask(LOG_UNIMP,
   "%s: RX fifos are not supported\n", __func__);
 }
-- 
2.30.1




[PULL 08/16] target/riscv: Use background registers also for MSTATUS_MPV

2021-03-22 Thread Alistair Francis
From: Georg Kotheimer 

The current condition for the use of background registers only
considers the hypervisor load and store instructions,
but not accesses from M mode via MSTATUS_MPRV+MPV.

Signed-off-by: Georg Kotheimer 
Reviewed-by: Alistair Francis 
Message-id: 20210311103036.1401073-1-georg.kothei...@kernkonzept.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index b15a60d8a2..8d4a62988d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -364,7 +364,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
  * was called. Background registers will be used if the guest has
  * forced a two stage translation to be on (in HS or M mode).
  */
-if (!riscv_cpu_virt_enabled(env) && riscv_cpu_two_stage_lookup(mmu_idx)) {
+if (!riscv_cpu_virt_enabled(env) && two_stage) {
 use_background = true;
 }
 
-- 
2.30.1




[PULL 12/16] target/riscv: Add proper two-stage lookup exception detection

2021-03-22 Thread Alistair Francis
From: Georg Kotheimer 

The current two-stage lookup detection in riscv_cpu_do_interrupt falls
short of its purpose, as all it checks is whether two-stage address
translation, either via the hypervisor load/store instructions or the
MPRV feature, would be allowed.

What we really need instead is whether two-stage address translation was
active when the exception was raised. However, in riscv_cpu_do_interrupt
we do not have the information to reliably detect this. Therefore, when
we raise a memory fault exception we have to record whether two-stage
address translation is active.
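
The record-then-consume pattern can be sketched as follows. This is a minimal model under stated assumptions: CpuModel, raise_mem_fault() and take_trap() are hypothetical names, and guest_va_in_tval stands in for the hstatus bit that tells the trap handler whether a guest virtual address was written to stval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU state for this sketch. */
typedef struct {
    bool two_stage_lookup;  /* recorded when the fault is raised */
    bool guest_va_in_tval;  /* derived when the trap is taken */
} CpuModel;

/* Fault site: this is where the translation regime is actually known. */
static void raise_mem_fault(CpuModel *cpu, bool two_stage_active)
{
    assert(cpu != NULL);
    cpu->two_stage_lookup = two_stage_active;
}

/* Trap entry: consume the recorded flag, then clear it for the next trap. */
static void take_trap(CpuModel *cpu, bool write_tval)
{
    assert(cpu != NULL);
    cpu->guest_va_in_tval = cpu->two_stage_lookup && write_tval;
    cpu->two_stage_lookup = false;
}
```

Clearing the flag at trap entry keeps a stale value from leaking into a later trap that has nothing to do with address translation.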

Signed-off-by: Georg Kotheimer 
Reviewed-by: Alistair Francis 
Message-id: 20210319141459.1196741-1-georg.kothei...@kernkonzept.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h|  4 
 target/riscv/cpu.c|  1 +
 target/riscv/cpu_helper.c | 21 -
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0edb2826a2..0a33d387ba 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -213,6 +213,10 @@ struct CPURISCVState {
 target_ulong satp_hs;
 uint64_t mstatus_hs;
 
+/* Signals whether the current exception occurred with two-stage address
+   translation active. */
+bool two_stage_lookup;
+
 target_ulong scounteren;
 target_ulong mcounteren;
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 2a990f6253..7d6ed80f6b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -356,6 +356,7 @@ static void riscv_cpu_reset(DeviceState *dev)
 env->mstatus &= ~(MSTATUS_MIE | MSTATUS_MPRV);
 env->mcause = 0;
 env->pc = env->resetvec;
+env->two_stage_lookup = false;
 #endif
 cs->exception_index = EXCP_NONE;
 env->load_res = -1;
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 8d4a62988d..21c54ef561 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -654,6 +654,7 @@ static void raise_mmu_exception(CPURISCVState *env, 
target_ulong address,
 g_assert_not_reached();
 }
 env->badaddr = address;
+env->two_stage_lookup = two_stage;
 }
 
 hwaddr riscv_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
@@ -695,6 +696,8 @@ void riscv_cpu_do_transaction_failed(CPUState *cs, hwaddr 
physaddr,
 }
 
 env->badaddr = addr;
+env->two_stage_lookup = riscv_cpu_virt_enabled(env) ||
+riscv_cpu_two_stage_lookup(mmu_idx);
 riscv_raise_exception(&cpu->env, cs->exception_index, retaddr);
 }
 
@@ -718,6 +721,8 @@ void riscv_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
 g_assert_not_reached();
 }
 env->badaddr = addr;
+env->two_stage_lookup = riscv_cpu_virt_enabled(env) ||
+riscv_cpu_two_stage_lookup(mmu_idx);
 riscv_raise_exception(env, cs->exception_index, retaddr);
 }
 #endif /* !CONFIG_USER_ONLY */
@@ -967,16 +972,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 /* handle the trap in S-mode */
 if (riscv_has_ext(env, RVH)) {
 target_ulong hdeleg = async ? env->hideleg : env->hedeleg;
-bool two_stage_lookup = false;
 
-if (env->priv == PRV_M ||
-(env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) ||
-(env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
-get_field(env->hstatus, HSTATUS_HU))) {
-two_stage_lookup = true;
-}
-
-if ((riscv_cpu_virt_enabled(env) || two_stage_lookup) && 
write_tval) {
+if (env->two_stage_lookup && write_tval) {
 /*
  * If we are writing a guest virtual address to stval, set
  * this to 1. If we are trapping to VS we will set this to 0
@@ -1014,10 +1011,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 riscv_cpu_set_force_hs_excep(env, 0);
 } else {
 /* Trap into HS mode */
-if (!two_stage_lookup) {
-env->hstatus = set_field(env->hstatus, HSTATUS_SPV,
- riscv_cpu_virt_enabled(env));
-}
+env->hstatus = set_field(env->hstatus, HSTATUS_SPV, false);
 htval = env->guest_phys_fault_addr;
 }
 }
@@ -1073,6 +1067,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
  * RISC-V ISA Specification.
  */
 
+env->two_stage_lookup = false;
 #endif
 cs->exception_index = EXCP_NONE; /* mark handled to qemu */
 }
-- 
2.30.1




[PATCH v4 4/4] virtio-pci: add support for configure interrupt

2021-03-22 Thread Cindy Lu
Add support for the configure interrupt: use kvm_irqfd_assign to pass
the GSI to the kernel. When the host kernel signals the configure
notifier via eventfd_signal, an MSI-X interrupt is injected into the guest.
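
The notifier itself is an eventfd, whose counter semantics can be tried in user space without KVM. This is a Linux-only sketch that uses the real eventfd(2) interface; the notifier_* wrappers are hypothetical names, and the step where KVM turns a signal into an interrupt via the irqfd is not modelled here.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a non-blocking notifier; returns the fd, or -1 on failure. */
static int notifier_create(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Signal the notifier once by adding 1 to the eventfd counter. */
static int notifier_signal(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Return the number of signals since the last drain (0 if none). */
static uint64_t notifier_drain(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

When such an fd is bound to a GSI as an irqfd, the consumer of the signal is the kernel rather than a read() in user space, so the interrupt reaches the guest without a round trip through QEMU.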

Signed-off-by: Cindy Lu 
---
 hw/virtio/virtio-pci.c | 171 +
 1 file changed, 137 insertions(+), 34 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 36524a5728..b0c190caba 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -664,7 +664,6 @@ static uint32_t virtio_read_config(PCIDevice *pci_dev,
 }
 
 static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
-unsigned int queue_no,
 unsigned int vector)
 {
 VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
@@ -691,23 +690,17 @@ static void 
kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
 }
 
 static int kvm_virtio_pci_irqfd_use(VirtIOPCIProxy *proxy,
- unsigned int queue_no,
+ EventNotifier *n,
  unsigned int vector)
 {
 VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
-VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-VirtQueue *vq = virtio_get_queue(vdev, queue_no);
-EventNotifier *n = virtio_queue_get_guest_notifier(vq);
 return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, irqfd->virq);
 }
 
 static void kvm_virtio_pci_irqfd_release(VirtIOPCIProxy *proxy,
-  unsigned int queue_no,
+  EventNotifier *n ,
   unsigned int vector)
 {
-VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-VirtQueue *vq = virtio_get_queue(vdev, queue_no);
-EventNotifier *n = virtio_queue_get_guest_notifier(vq);
 VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
 int ret;
 
@@ -722,7 +715,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, 
int nvqs)
 VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
 unsigned int vector;
 int ret, queue_no;
-
+VirtQueue *vq;
+EventNotifier *n;
 for (queue_no = 0; queue_no < nvqs; queue_no++) {
 if (!virtio_queue_get_num(vdev, queue_no)) {
 break;
@@ -731,7 +725,7 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, 
int nvqs)
 if (vector >= msix_nr_vectors_allocated(dev)) {
 continue;
 }
-ret = kvm_virtio_pci_vq_vector_use(proxy, queue_no, vector);
+ret = kvm_virtio_pci_vq_vector_use(proxy,  vector);
 if (ret < 0) {
 goto undo;
 }
@@ -739,7 +733,9 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, 
int nvqs)
  * Otherwise, delay until unmasked in the frontend.
  */
 if (vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
-ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
+vq = virtio_get_queue(vdev, queue_no);
+n = virtio_queue_get_guest_notifier(vq);
+ret = kvm_virtio_pci_irqfd_use(proxy, n, vector);
 if (ret < 0) {
 kvm_virtio_pci_vq_vector_release(proxy, vector);
 goto undo;
@@ -755,13 +751,69 @@ undo:
 continue;
 }
 if (vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
-kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
+vq = virtio_get_queue(vdev, queue_no);
+n = virtio_queue_get_guest_notifier(vq);
+kvm_virtio_pci_irqfd_release(proxy, n, vector);
 }
 kvm_virtio_pci_vq_vector_release(proxy, vector);
 }
 return ret;
 }
 
+static int kvm_virtio_pci_vector_config_use(VirtIOPCIProxy *proxy)
+{
+
+VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+unsigned int vector;
+int ret;
+EventNotifier *n = virtio_get_config_notifier(vdev);
+
+vector = vdev->config_vector ;
+ret = kvm_virtio_pci_vq_vector_use(proxy, vector);
+if (ret < 0) {
+goto undo;
+}
+ret = kvm_virtio_pci_irqfd_use(proxy,  n, vector);
+if (ret < 0) {
+goto undo;
+}
+return 0;
+undo:
+kvm_virtio_pci_irqfd_release(proxy, n, vector);
+return ret;
+}
+static void kvm_virtio_pci_vector_config_release(VirtIOPCIProxy *proxy)
+{
+PCIDevice *dev = &proxy->pci_dev;
+VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+unsigned int vector;
+EventNotifier *n = virtio_get_config_notifier(vdev);
+vector = vdev->config_vector ;
+if (vector >= msix_nr_vectors_allocated(dev)) {
+return;
+}
+kvm_virtio_pci_irqfd_release(proxy, n, vector);
+kvm_virtio_pci_vq_vector_release(proxy, vector);
+}
+
+static int virtio_pci_set_config_notifier(DeviceState *d,  bool assign)
+{
+VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+EventNotifier 

[PULL 06/16] target/riscv: Adjust privilege level for HLV(X)/HSV instructions

2021-03-22 Thread Alistair Francis
From: Georg Kotheimer 

According to the specification the "field SPVP of hstatus controls the
privilege level of the access" for the hypervisor virtual-machine load
and store instructions HLV, HLVX and HSV.
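
The selection rule can be sketched as a small pure function. This is an illustrative model, assuming the usual privilege encodings (U = 0, S = 1, M = 3); HartModel and effective_access_mode() are hypothetical names that mirror the order of checks in the patch, not the QEMU code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

/* Hypothetical subset of hart state for this sketch. */
typedef struct {
    int priv;   /* current privilege level */
    bool mprv;  /* mstatus.MPRV */
    int mpp;    /* mstatus.MPP */
    int spvp;   /* hstatus.SPVP: 0 = U, 1 = S */
} HartModel;

/* Privilege level used for the memory access. */
static int effective_access_mode(const HartModel *h, bool hyp_ldst,
                                 bool is_fetch)
{
    assert(h != NULL);
    if (hyp_ldst) {
        /* HLV, HLVX and HSV: SPVP decides, MPRV is ignored. */
        return h->spvp;
    }
    if (h->priv == PRV_M && !is_fetch && h->mprv) {
        /* MPRV redirects M-mode data accesses to the MPP privilege. */
        return h->mpp;
    }
    return h->priv;
}
```

Checking the hypervisor load/store case first is what keeps MPRV from overriding SPVP when both apply.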

Signed-off-by: Georg Kotheimer 
Reviewed-by: Alistair Francis 
Message-id: 20210311103005.1400718-1-georg.kothei...@kernkonzept.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 0515f9aec8..b15a60d8a2 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -368,7 +368,11 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
 use_background = true;
 }
 
-if (mode == PRV_M && access_type != MMU_INST_FETCH) {
+/* MPRV does not affect the virtual-machine load/store
+   instructions, HLV, HLVX, and HSV. */
+if (riscv_cpu_two_stage_lookup(mmu_idx)) {
+mode = get_field(env->hstatus, HSTATUS_SPVP);
+} else if (mode == PRV_M && access_type != MMU_INST_FETCH) {
 if (get_field(env->mstatus, MSTATUS_MPRV)) {
 mode = get_field(env->mstatus, MSTATUS_MPP);
 }
@@ -741,19 +745,18 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
   __func__, address, access_type, mmu_idx);
 
-if (mode == PRV_M && access_type != MMU_INST_FETCH) {
-if (get_field(env->mstatus, MSTATUS_MPRV)) {
-mode = get_field(env->mstatus, MSTATUS_MPP);
+/* MPRV does not affect the virtual-machine load/store
+   instructions, HLV, HLVX, and HSV. */
+if (riscv_cpu_two_stage_lookup(mmu_idx)) {
+mode = get_field(env->hstatus, HSTATUS_SPVP);
+} else if (mode == PRV_M && access_type != MMU_INST_FETCH &&
+   get_field(env->mstatus, MSTATUS_MPRV)) {
+mode = get_field(env->mstatus, MSTATUS_MPP);
+if (riscv_has_ext(env, RVH) && get_field(env->mstatus, MSTATUS_MPV)) {
+two_stage_lookup = true;
 }
 }
 
-if (riscv_has_ext(env, RVH) && env->priv == PRV_M &&
-access_type != MMU_INST_FETCH &&
-get_field(env->mstatus, MSTATUS_MPRV) &&
-get_field(env->mstatus, MSTATUS_MPV)) {
-two_stage_lookup = true;
-}
-
 if (riscv_cpu_virt_enabled(env) ||
 ((riscv_cpu_two_stage_lookup(mmu_idx) || two_stage_lookup) &&
  access_type != MMU_INST_FETCH)) {
-- 
2.30.1




[PULL 04/16] target/riscv: add log of PMP permission checking

2021-03-22 Thread Alistair Francis
From: Jim Shu 

Like MMU translation, add qemu log of PMP permission checking for
debugging.

Signed-off-by: Jim Shu 
Reviewed-by: Alistair Francis 
Message-id: 1613916082-19528-3-git-send-email-cw...@andestech.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index fa385594df..0515f9aec8 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -794,6 +794,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 if (ret == TRANSLATE_SUCCESS) {
 ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
size, access_type, mode);
+
+qemu_log_mask(CPU_LOG_MMU,
+  "%s PMP address=" TARGET_FMT_plx " ret %d prot"
+  " %d tlb_size " TARGET_FMT_lu "\n",
+  __func__, pa, ret, prot_pmp, tlb_size);
+
 prot &= prot_pmp;
 }
 
@@ -821,6 +827,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 if (ret == TRANSLATE_SUCCESS) {
 ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
size, access_type, mode);
+
+qemu_log_mask(CPU_LOG_MMU,
+  "%s PMP address=" TARGET_FMT_plx " ret %d prot"
+  " %d tlb_size " TARGET_FMT_lu "\n",
+  __func__, pa, ret, prot_pmp, tlb_size);
+
 prot &= prot_pmp;
 }
 }
-- 
2.30.1




[PATCH v4 3/4] virtio-mmio: add support for configure interrupt

2021-03-22 Thread Cindy Lu
Add configure interrupt support for the virtio-mmio bus. This
interrupt works only when the backend is vhost-vdpa.

Signed-off-by: Cindy Lu 
---
 hw/virtio/virtio-mmio.c | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index e1b5c3b81e..beabd129ef 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -627,12 +627,30 @@ static int virtio_mmio_set_guest_notifier(DeviceState *d, 
int n, bool assign,
 }
 
 if (vdc->guest_notifier_mask && vdev->use_guest_notifier_mask) {
-vdc->guest_notifier_mask(vdev, n, !assign);
+vdc->guest_notifier_mask(vdev, n, !assign, VIRTIO_VQ_VECTOR);
 }
-
 return 0;
 }
+static int virtio_mmio_set_config_notifier(DeviceState *d,  bool assign)
+{
+VirtIOMMIOProxy *proxy = VIRTIO_MMIO(d);
+VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
 
+EventNotifier *notifier = virtio_get_config_notifier(vdev);
+int r = 0;
+if (assign) {
+r = event_notifier_init(notifier, 0);
+virtio_set_config_notifier_fd_handler(vdev, true, false);
+} else {
+virtio_set_config_notifier_fd_handler(vdev, false, false);
+event_notifier_cleanup(notifier);
+}
+if (vdc->guest_notifier_mask && vdev->use_guest_notifier_mask) {
+vdc->guest_notifier_mask(vdev, 0, !assign, VIRTIO_CONFIG_VECTOR);
+}
+return r;
+}
 static int virtio_mmio_set_guest_notifiers(DeviceState *d, int nvqs,
bool assign)
 {
@@ -654,8 +672,15 @@ static int virtio_mmio_set_guest_notifiers(DeviceState *d, 
int nvqs,
 goto assign_error;
 }
 }
+   r = virtio_mmio_set_config_notifier(d, assign);
+   if (r < 0) {
+goto config_assign_error;
+   }
 
 return 0;
+config_assign_error:
+assert(assign);
+r = virtio_mmio_set_config_notifier(d, false);
 
 assign_error:
 /* We get here on assignment failure. Recover by undoing for VQs 0 .. n. */
@@ -666,6 +691,7 @@ assign_error:
 return r;
 }
 
+
 static void virtio_mmio_pre_plugged(DeviceState *d, Error **errp)
 {
 VirtIOMMIOProxy *proxy = VIRTIO_MMIO(d);
-- 
2.21.3




[PULL 05/16] target/riscv: flush TLB pages if PMP permission has been changed

2021-03-22 Thread Alistair Francis
From: Jim Shu 

If the PMP permission of any address is changed by updating a PMP entry,
flush all TLB pages so that stale permissions are not used.
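
The hazard can be illustrated with a toy model, not QEMU code: struct toy_cpu and the toy_* helpers below are hypothetical, and each "TLB entry" simply caches the permission byte of one page. Without the flush, a lookup after a PMP update keeps returning the cached permission.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NPAGES 8  /* toy address space, one permission byte per page */

struct toy_cpu {
    uint8_t pmp_perm[NPAGES];   /* authoritative PMP permissions */
    uint8_t tlb_perm[NPAGES];   /* cached copies held in the "TLB" */
    bool    tlb_valid[NPAGES];
};

/* Return the permission for a page, filling the cache on a miss. */
static uint8_t toy_lookup(struct toy_cpu *c, int page)
{
    if (!c->tlb_valid[page]) {
        c->tlb_perm[page] = c->pmp_perm[page];
        c->tlb_valid[page] = true;
    }
    return c->tlb_perm[page];
}

/* Drop every cached entry. */
static void toy_tlb_flush(struct toy_cpu *c)
{
    memset(c->tlb_valid, 0, sizeof(c->tlb_valid));
}

/* Update a permission; 'flush' selects the patched behaviour. */
static void toy_pmp_write(struct toy_cpu *c, int page, uint8_t perm,
                          bool flush)
{
    c->pmp_perm[page] = perm;
    if (flush) {
        toy_tlb_flush(c);
    }
}
```

Calling toy_pmp_write() with flush set to false and then toy_lookup() on a previously cached page still yields the old permission; with flush set to true the new value is seen.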

Signed-off-by: Jim Shu 
Reviewed-by: Alistair Francis 
Message-id: 1613916082-19528-4-git-send-email-cw...@andestech.com
Signed-off-by: Alistair Francis 
---
 target/riscv/pmp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index ebd874cde3..cff020122a 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -28,6 +28,7 @@
 #include "qapi/error.h"
 #include "cpu.h"
 #include "trace.h"
+#include "exec/exec-all.h"
 
 static void pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
 uint8_t val);
@@ -347,6 +348,9 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 cfg_val = (val >> 8 * i)  & 0xff;
 pmp_write_cfg(env, (reg_index * 4) + i, cfg_val);
 }
+
+/* If PMP permission of any addr has been changed, flush TLB pages. */
+tlb_flush(env_cpu(env));
 }
 
 
-- 
2.30.1




[PULL 00/16] riscv-to-apply queue

2021-03-22 Thread Alistair Francis
The following changes since commit c95bd5ff1660883d15ad6e0005e4c8571604f51a:

  Merge remote-tracking branch 'remotes/philmd/tags/mips-fixes-20210322' into 
staging (2021-03-22 14:26:13 +)

are available in the Git repository at:

  g...@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20210322-2

for you to fetch changes up to 9a27f69bd668d9d71674407badc412ce1231c7d5:

  target/riscv: Prevent lost illegal instruction exceptions (2021-03-22 
21:54:40 -0400)


RISC-V PR for 6.0

This PR includes:
 - Fix for vector CSR access
 - Improvements to the Ibex UART device
 - PMP improvements and bug fixes
 - Hypervisor extension bug fixes
 - ramfb support for the virt machine
 - Fast read support for SST flash
 - Improvements to the microchip_pfsoc machine


Alexander Wagner (1):
  hw/char: disable ibex uart receive if the buffer is full

Asherah Connor (2):
  hw/riscv: Add fw_cfg support to virt
  hw/riscv: allow ramfb on virt

Bin Meng (3):
  hw/block: m25p80: Support fast read for SST flashes
  hw/riscv: microchip_pfsoc: Map EMMC/SD mux register
  docs/system: riscv: Add documentation for 'microchip-icicle-kit' machine

Frank Chang (1):
  target/riscv: fix vs() to return proper error code

Georg Kotheimer (6):
  target/riscv: Adjust privilege level for HLV(X)/HSV instructions
  target/riscv: Make VSTIP and VSEIP read-only in hip
  target/riscv: Use background registers also for MSTATUS_MPV
  target/riscv: Fix read and write accesses to vsip and vsie
  target/riscv: Add proper two-stage lookup exception detection
  target/riscv: Prevent lost illegal instruction exceptions

Jim Shu (3):
  target/riscv: propagate PMP permission to TLB page
  target/riscv: add log of PMP permission checking
  target/riscv: flush TLB pages if PMP permission has been changed

 docs/system/riscv/microchip-icicle-kit.rst |  89 ++
 docs/system/target-riscv.rst   |   1 +
 include/hw/char/ibex_uart.h|   4 +
 include/hw/riscv/microchip_pfsoc.h |   1 +
 include/hw/riscv/virt.h|   2 +
 target/riscv/cpu.h |   4 +
 target/riscv/pmp.h |   4 +-
 hw/block/m25p80.c  |   3 +
 hw/char/ibex_uart.c|  23 +++-
 hw/riscv/microchip_pfsoc.c |   6 +
 hw/riscv/virt.c|  33 ++
 target/riscv/cpu.c |   1 +
 target/riscv/cpu_helper.c  | 144 +++
 target/riscv/csr.c |  77 +++--
 target/riscv/pmp.c |  84 ++
 target/riscv/translate.c   | 179 +
 hw/riscv/Kconfig   |   1 +
 17 files changed, 367 insertions(+), 289 deletions(-)
 create mode 100644 docs/system/riscv/microchip-icicle-kit.rst



[PULL 01/16] target/riscv: fix vs() to return proper error code

2021-03-22 Thread Alistair Francis
From: Frank Chang 

vs() should return -RISCV_EXCP_ILLEGAL_INST instead of -1 if rvv feature
is not enabled.

If -1 is returned, exception will be raised and cs->exception_index will
be set to the negative return value. The exception will then be treated
as an instruction access fault instead of illegal instruction fault.
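
A small sketch of why the value matters, assuming (as the message above describes) that the caller negates a failing return value to obtain the exception cause. The cause numbers are the ones from the RISC-V privileged specification; vs_old(), vs_new() and cause_from_ret() are hypothetical stand-ins, not the QEMU functions.

```c
/* Cause codes from the RISC-V privileged specification. */
#define RISCV_EXCP_INST_ACCESS_FAULT 0x1
#define RISCV_EXCP_ILLEGAL_INST      0x2

/* Stand-in for the predicate before the fix. */
static int vs_old(int has_rvv)
{
    return has_rvv ? 0 : -1;
}

/* Stand-in for the predicate after the fix. */
static int vs_new(int has_rvv)
{
    return has_rvv ? 0 : -RISCV_EXCP_ILLEGAL_INST;
}

/* Model of a caller that turns a failing return value into a cause. */
static int cause_from_ret(int ret)
{
    return ret < 0 ? -ret : 0;
}
```

With the vector extension disabled, cause_from_ret(vs_old(0)) is 1, the instruction access fault code, while cause_from_ret(vs_new(0)) is 2, the illegal instruction code.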

Signed-off-by: Frank Chang 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
Message-id: 20210223065935.20208-1-frank.ch...@sifive.com
Signed-off-by: Alistair Francis 
---
 target/riscv/csr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index fd2e6363f3..d2ae73e4a0 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -54,7 +54,7 @@ static int vs(CPURISCVState *env, int csrno)
 if (env->misa & RVV) {
 return 0;
 }
-return -1;
+return -RISCV_EXCP_ILLEGAL_INST;
 }
 
 static int ctr(CPURISCVState *env, int csrno)
-- 
2.30.1




[PATCH v4 2/4] vhost-vdpa: add callback function for configure interrupt

2021-03-22 Thread Cindy Lu
Add a callback function for the configure interrupt.
Pass the notifier's fd to the kernel driver when vdpa starts,
and pass -1 when vdpa stops so that the kernel releases
the related resources.

Signed-off-by: Cindy Lu 
---
 hw/virtio/trace-events|  2 ++
 hw/virtio/vhost-vdpa.c| 40 +--
 include/hw/virtio/vhost-backend.h |  4 
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 2060a144a2..6710835b46 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -52,6 +52,8 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, int 
fd) "dev: %p index:
 vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 
0x%"PRIx64
 vhost_vdpa_set_owner(void *dev) "dev: %p"
 vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t 
avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 
0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
+vhost_vdpa_set_config_call(void *dev, int *fd)"dev: %p fd: %p"
+
 
 # virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned 
out_num) "elem %p size %zd in_num %u out_num %u"
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 01d2101d09..bde32eefe7 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -467,20 +467,47 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, 
uint8_t *config,
 }
 return ret;
  }
-
+static void vhost_vdpa_config_notify_start(struct vhost_dev *dev,
+struct VirtIODevice *vdev, bool start)
+{
+int fd = 0;
+int r = 0;
+if (!(dev->features & (0x1ULL << VIRTIO_NET_F_STATUS))) {
+return;
+}
+if (start) {
+fd = event_notifier_get_fd(&vdev->config_notifier);
+r = dev->vhost_ops->vhost_set_config_call(dev, &fd);
+ /*set the fd call back to vdpa driver*/
+if (!r) {
+vdev->use_config_notifier = VIRTIO_CONFIG_WORK;
+event_notifier_set(&vdev->config_notifier);
+info_report("vhost_vdpa_config_notify start!");
+  }
+} else {
+fd = -1;
+vdev->use_config_notifier = VIRTIO_CONFIG_STOP;
+r = dev->vhost_ops->vhost_set_config_call(dev, &fd);
+}
+return;
+}
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
 struct vhost_vdpa *v = dev->opaque;
 trace_vhost_vdpa_dev_start(dev, started);
+VirtIODevice *vdev = dev->vdev;
+
 if (started) {
 uint8_t status = 0;
 memory_listener_register(&v->listener, &address_space_memory);
 vhost_vdpa_set_vring_ready(dev);
 vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
 vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
-
+/*set the configure interrupt call back*/
+vhost_vdpa_config_notify_start(dev, vdev, true);
 return !(status & VIRTIO_CONFIG_S_DRIVER_OK);
 } else {
+vhost_vdpa_config_notify_start(dev, vdev, false);
 vhost_vdpa_reset_device(dev);
 vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
VIRTIO_CONFIG_S_DRIVER);
@@ -546,6 +573,14 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
 return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }
 
+static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
+   int *fd)
+{
+trace_vhost_vdpa_set_config_call(dev, fd);
+
+return vhost_vdpa_call(dev, VHOST_VDPA_SET_CONFIG_CALL, fd);
+}
+
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
  uint64_t *features)
 {
@@ -611,4 +646,5 @@ const VhostOps vdpa_ops = {
 .vhost_get_device_id = vhost_vdpa_get_device_id,
 .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
 .vhost_force_iommu = vhost_vdpa_force_iommu,
+.vhost_set_config_call = vhost_vdpa_set_config_call,
 };
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index 8a6f8e2a7a..1a2fee8994 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -125,6 +125,9 @@ typedef int (*vhost_get_device_id_op)(struct vhost_dev 
*dev, uint32_t *dev_id);
 
 typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
 
+typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
+   int *fd);
+
 typedef struct VhostOps {
 VhostBackendType backend_type;
 vhost_backend_init vhost_backend_init;
@@ -170,6 +173,7 @@ typedef struct VhostOps {
 vhost_vq_get_addr_op  vhost_vq_get_addr;
 vhost_get_device_id_op vhost_get_device_id;
 vhost_force_iommu_op vhost_force_iommu;
+vhost_set_config_call_op vhost_set_config_call;
 } VhostOps;
 
 extern const VhostOps user_ops;
-- 
2.21.3




[PATCH v4 0/4] vhost-vdpa: add support for configure interrupt

2021-03-22 Thread Cindy Lu
This code has been tested with vp-vdpa (which supports the configure
interrupt) and vdpa_sim (which does not).

Tested on the virtio-pci bus and the virtio-mmio bus.

Changes in v2:
Add support for the virtio-mmio bus
Activate the notifier only when the backend supports the configure interrupt
Misc fixes from v1

Changes in v3:
Fix the coding style problems

Changes in v4:
Misc fixes from v3
Merge set_config_notifier into set_guest_notifier
When vdpa starts, check the feature by VIRTIO_NET_F_STATUS


Cindy Lu (4):
  virtio:add support in configure interrupt
  vhost-vdpa: add callback function for configure interrupt
  virtio-mmio: add support for configure interrupt
  virtio-pci: add support for configure interrupt

 hw/display/vhost-user-gpu.c   |  14 ++-
 hw/net/vhost_net.c|  16 ++-
 hw/net/virtio-net.c   |  24 -
 hw/s390x/virtio-ccw.c |   6 +-
 hw/virtio/trace-events|   2 +
 hw/virtio/vhost-user-fs.c |  12 ++-
 hw/virtio/vhost-vdpa.c|  40 ++-
 hw/virtio/vhost-vsock-common.c|  12 ++-
 hw/virtio/vhost.c |  44 +++-
 hw/virtio/virtio-crypto.c |  13 ++-
 hw/virtio/virtio-mmio.c   |  30 +-
 hw/virtio/virtio-pci.c| 171 --
 hw/virtio/virtio.c|  28 +
 include/hw/virtio/vhost-backend.h |   4 +
 include/hw/virtio/vhost.h |   4 +
 include/hw/virtio/virtio.h|  23 +++-
 include/net/vhost_net.h   |   3 +
 17 files changed, 378 insertions(+), 68 deletions(-)

-- 
2.21.3





[PATCH v4 1/4] virtio:add support in configure interrupt

2021-03-22 Thread Cindy Lu
Add configure notifier support to virtio and the related drivers.
When the peer is vhost-vdpa, set up the configure interrupt in
vhost_net_start and release the resource in vhost_net_stop.

Signed-off-by: Cindy Lu 
---
 hw/display/vhost-user-gpu.c| 14 +++
 hw/net/vhost_net.c | 16 +++--
 hw/net/virtio-net.c| 24 +++
 hw/s390x/virtio-ccw.c  |  6 ++---
 hw/virtio/vhost-user-fs.c  | 12 ++
 hw/virtio/vhost-vsock-common.c | 12 ++
 hw/virtio/vhost.c  | 44 --
 hw/virtio/virtio-crypto.c  | 13 ++
 hw/virtio/virtio.c | 28 ++
 include/hw/virtio/vhost.h  |  4 
 include/hw/virtio/virtio.h | 23 --
 include/net/vhost_net.h|  3 +++
 12 files changed, 169 insertions(+), 30 deletions(-)

diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 51f1747c4a..959ad115b6 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -487,18 +487,24 @@ vhost_user_gpu_set_status(VirtIODevice *vdev, uint8_t val)
 }
 
 static bool
-vhost_user_gpu_guest_notifier_pending(VirtIODevice *vdev, int idx)
+vhost_user_gpu_guest_notifier_pending(VirtIODevice *vdev, int idx,
+int type)
 {
 VhostUserGPU *g = VHOST_USER_GPU(vdev);
-
+if (type != VIRTIO_VQ_VECTOR) {
+return false;
+}
 return vhost_virtqueue_pending(&g->vhost->dev, idx);
 }
 
 static void
-vhost_user_gpu_guest_notifier_mask(VirtIODevice *vdev, int idx, bool mask)
+vhost_user_gpu_guest_notifier_mask(VirtIODevice *vdev, int idx, bool mask,
+int type)
 {
 VhostUserGPU *g = VHOST_USER_GPU(vdev);
-
+if (type != VIRTIO_VQ_VECTOR) {
+return;
+}
 vhost_virtqueue_mask(&g->vhost->dev, vdev, idx, mask);
 }
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 24d555e764..2ef8cc608e 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -339,7 +339,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 dev->use_guest_notifier_mask = false;
 }
  }
-
+if (ncs->peer && ncs->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+dev->use_config_notifier = VIRTIO_CONFIG_SUPPORT;
+}
 r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
 if (r < 0) {
 error_report("Error binding guest notifier: %d", -r);
@@ -391,7 +393,6 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 for (i = 0; i < total_queues; i++) {
 vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
 }
-
 r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
 if (r < 0) {
 fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
@@ -426,6 +427,17 @@ void vhost_net_virtqueue_mask(VHostNetState *net, 
VirtIODevice *dev,
 vhost_virtqueue_mask(&net->dev, dev, idx, mask);
 }
 
+bool vhost_net_config_pending(VHostNetState *net, int idx)
+{
+return vhost_config_pending(&net->dev, idx);
+}
+
+void vhost_net_config_mask(VHostNetState *net, VirtIODevice *dev,
+  bool mask)
+{
+vhost_config_mask(&net->dev, dev, mask);
+}
+
 VHostNetState *get_vhost_net(NetClientState *nc)
 {
 VHostNetState *vhost_net = 0;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9179013ac4..b84427fe99 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3055,22 +3055,36 @@ static NetClientInfo net_virtio_info = {
 .announce = virtio_net_announce,
 };
 
-static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
+
+static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx,
+int type)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
 assert(n->vhost_started);
-return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
+
+if (type == VIRTIO_VQ_VECTOR) {
+return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
+}
+if (type == VIRTIO_CONFIG_VECTOR) {
+return vhost_net_config_pending(get_vhost_net(nc->peer), idx);
+}
+return false;
 }
 
 static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
-   bool mask)
+   bool mask, int type)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
 assert(n->vhost_started);
-vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
- vdev, idx, mask);
+
+if (type == VIRTIO_VQ_VECTOR) {
+vhost_net_virtqueue_mask(get_vhost_net(nc->peer), vdev, idx, mask);
+ }
+if (type == VIRTIO_CONFIG_VECTOR) {
+vhost_net_config_mask(get_vhost_net(nc->peer), vdev, mask);
+}
 }
 
 static void 

Re: [PATCH v5 10/10] KVM: Dirty ring support

2021-03-22 Thread Keqian Zhu



On 2021/3/23 2:52, Peter Xu wrote:
> On Mon, Mar 22, 2021 at 09:37:19PM +0800, Keqian Zhu wrote:
>>> +/* Should be with all slots_lock held for the address spaces. */
>>> +static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id,
>>> + uint32_t slot_id, uint64_t offset)
>>> +{
>>> +KVMMemoryListener *kml;
>>> +KVMSlot *mem;
>>> +
>>> +if (as_id >= s->nr_as) {
>>> +return;
>>> +}
>>> +
>>> +kml = s->as[as_id].ml;
>>> +mem = &kml->slots[slot_id];
>>> +
>>> +if (!mem->memory_size || offset >= (mem->memory_size / 
>>> TARGET_PAGE_SIZE)) {
>> It seems that TARGET_PAGE_SIZE should be qemu_real_host_page_size.
> 
> Fixed.
> 
> [...]
> 
>>> +/*
>>> + * Flush all the existing dirty pages to the KVM slot buffers.  When
>>> + * this call returns, we guarantee that all the touched dirty pages
>>> + * before calling this function have been put into the per-kvmslot
>>> + * dirty bitmap.
>>> + *
>>> + * This function must be called with BQL held.
>>> + */
>>> +static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r)
>> The argument is not used.
> 
> Indeed, removed.
> 
>>
>>> +{
>>> +trace_kvm_dirty_ring_flush(0);
>>> +/*
>>> + * The function needs to be serialized.  Since this function
>>> + * should always be with BQL held, serialization is guaranteed.
>>> + * However, let's be sure of it.
>>> + */
>>> +assert(qemu_mutex_iothread_locked());
>>> +/*
>>> + * First make sure to flush the hardware buffers by kicking all
>>> + * vcpus out in a synchronous way.
>>> + */
>>> +kvm_cpu_synchronize_kick_all();
>> Can we make this function to be architecture specific?
>> It seems that kick out vCPU is an architecture specific way to flush 
>> hardware buffers
>> to dirty ring (for x86 PML).
> 
> I can do that, but I'd say it's kind of an overkill if after all the kernel
> support is not there yet, so I still tend to make it as simple as possible.
OK.

> 
> [...]
> 
>>> +static void *kvm_dirty_ring_reaper_thread(void *data)
>>> +{
>>> +KVMState *s = data;
>>> +struct KVMDirtyRingReaper *r = &s->reaper;
>>> +
>>> +rcu_register_thread();
>>> +
>>> +trace_kvm_dirty_ring_reaper("init");
>>> +
>>> +while (true) {
>>> +r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT;
>>> +trace_kvm_dirty_ring_reaper("wait");
>>> +/*
>>> + * TODO: provide a smarter timeout rather than a constant?
>>> + */
>>> +sleep(1);
>>> +
>>> +trace_kvm_dirty_ring_reaper("wakeup");
>>> +r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
>>> +
>>> +qemu_mutex_lock_iothread();
>>> +kvm_dirty_ring_reap(s);
>>> +qemu_mutex_unlock_iothread();
>>> +
>>> +r->reaper_iteration++;
>>> +}
>> I don't know when does this iteration exit?
>> And I see that we start this reaper_thread in kvm_init(), maybe it's better 
>> to start it
>> when start dirty log and stop it when stop dirty log.
> 
> Yes we can make it conditional, but note that we can't hook at functions like
> memory_global_dirty_log_start() because that is only for migration purpose.
> 
> Currently QEMU exports the dirty tracking more than that, e.g., to the VGA
> code.  We'll need to try to detect whether there's any existing MR got its
> mr->dirty_log_mask set (besides global_dirty_log being set).  When all of them
> got cleared we'll need to detect too so as to turn the thread off.
> 
> It's just easier to me to run this thread with such a timeout, then when not
> necessary it'll see empty ring and return fast (index comparison for each
> ring).  Not to mention the VGA dirty tracking should be on for most of the VM
> lifecycle, so even if we have that knob this thread will probably be running
> for 99% of the time as long as any MR has its DIRTY_MEMORY_VGA bit set.
Makes sense. Thanks for your explanation!

Thanks,
Keqian
> 
> [...]
> 
>>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>>> index c68bc3ba8af..2f0991d93f7 100644
>>> --- a/include/hw/core/cpu.h
>>> +++ b/include/hw/core/cpu.h
>>> @@ -323,6 +323,11 @@ struct qemu_work_item;
>>>   * @ignore_memory_transaction_failures: Cached copy of the MachineState
>>>   *flag of the same name: allows the board to suppress calling of the
>>>   *CPU do_transaction_failed hook function.
>>> + * @kvm_dirty_ring_full:
>>> + *   Whether the kvm dirty ring of this vcpu is soft-full.
>>> + * @kvm_dirty_ring_avail:
>>> + *   Semaphore to be posted when the kvm dirty ring of the vcpu is
>>> + *   available again.
>> The doc does not match code.
> 
> Right; fixed.
> 
> Thanks for taking a look, keqian.
> 



Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events

2021-03-22 Thread David Gibson
On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
> 
> Patches 1 and 3 are independent of the ppc patches and can be applied
> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
> are dependent on the QAPI patches.

Implementation looks fine, but I think there's a bit more to discuss
before we can apply.

I think it would make sense to re-order this and put UNPLUG_ERROR
first.  Its semantics are clearer, and I think there's a stronger case
for it.

I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
it really tell the user/management anything useful beyond what
receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes

2021-03-22 Thread David Gibson
On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
> Kernel commit 4bce545903fa ("powerpc/topology: Update
> topology_core_cpumask") cause a regression in the pseries machine when
> defining certain SMP topologies [1]. The reasoning behind the change is
> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with
> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as
> far as the kernel understanding of SMP topologies goes, both masks are
> equivalent.
> 
> Further discussions in the kernel mailing list [2] shown that the
> powerpc kernel always considered that the number of sockets were equal
> to the number of NUMA nodes. The claim is that it doesn't make sense,
> for Power hardware at least, 2+ sockets being in the same NUMA node. The
> immediate conclusion is that all SMP topologies the pseries machine were
> supplying to the kernel, with more than one socket in the same NUMA node
> as in [1], happened to be correctly represented in the kernel by
> accident during all these years.
> 
> There's a case to be made for virtual topologies being detached from
> hardware constraints, allowing maximum flexibility to users. At the same
> time, this freedom can't result in unrealistic hardware representations
> being emulated. If the real hardware and the pseries kernel don't
> support multiple chips/sockets in the same NUMA node, neither should we.
> 
> Starting in 6.0.0, all sockets must match an unique NUMA node in the
> pseries machine. qtest changes were made to adapt to this new
> condition.

Oof.  I really don't like this idea.  It means a bunch of fiddly work
for users to match these up, for no real gain.  I'm also concerned
that this will require follow on changes in libvirt to not make this a
really cryptic and irritating point of failure.

> 
> [1] https://bugzilla.redhat.com/1934421
> [2] 
> https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364...@gmail.com/
> 
> CC: Srikar Dronamraju 
> CC: Cédric Le Goater 
> CC: Igor Mammedov 
> CC: Laurent Vivier 
> CC: Thomas Huth 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c |  3 ++
>  hw/ppc/spapr_numa.c|  7 +
>  include/hw/ppc/spapr.h |  1 +
>  tests/qtest/cpu-plug-test.c|  4 +--
>  tests/qtest/device-plug-test.c |  9 +-
>  tests/qtest/numa-test.c| 52 --
>  6 files changed, 64 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d56418ca29..745f71c243 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true);
>   */
>  static void spapr_machine_5_2_class_options(MachineClass *mc)
>  {
> +SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> +
>  spapr_machine_6_0_class_options(mc);
>  compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
> +smc->pre_6_0_smp_topology = true;
>  }
>  
>  DEFINE_SPAPR_MACHINE(5_2, "5.2", false);
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 779f18b994..0ade43dd79 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState 
> *spapr,
>  int i, j, max_nodes_with_gpus;
>  bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
>  
> +if (!smc->pre_6_0_smp_topology &&
> +nb_numa_nodes != machine->smp.sockets) {
> +error_report("Number of CPU sockets must be equal to the number "
> + "of NUMA nodes");
> +exit(EXIT_FAILURE);
> +}
> +
>  /*
>   * For all associativity arrays: first position is the size,
>   * position MAX_DISTANCE_REF_POINTS is always the numa_id,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 47cebaf3ac..98dc5d198a 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -142,6 +142,7 @@ struct SpaprMachineClass {
>  hwaddr rma_limit;  /* clamp the RMA to this size */
>  bool pre_5_1_assoc_refpoints;
>  bool pre_5_2_numa_associativity;
> +bool pre_6_0_smp_topology;
>  
>  bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
>uint64_t *buid, hwaddr *pio,
> diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c
> index a1c689414b..946b9129ea 100644
> --- a/tests/qtest/cpu-plug-test.c
> +++ b/tests/qtest/cpu-plug-test.c
> @@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname)
>  data->machine = g_strdup(mname);
>  data->cpu_model = "power8_v2.0";
>  data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
> -data->sockets = 2;
> -data->cores = 3;
> +data->sockets = 1;
> +data->cores = 6;
>  data->threads = 1;
>  data->maxcpus = data->sockets * data->cores * data->threads;
>  
> diff --git 

Re: [PATCH v10 6/7] hw/pci-host: Add emulation of Marvell MV64361 PPC system controller

2021-03-22 Thread David Gibson
On Wed, Mar 17, 2021 at 02:17:51AM +0100, BALATON Zoltan wrote:
> The Marvell Discovery II aka. MV64361 is a PowerPC system controller
> chip that is used on the pegasos2 PPC board. This adds emulation of it
> that models the device enough to boot guests on this board. The
> mv643xx.h header with register definitions is taken from Linux 4.15.10
> only fixing white space errors, removing not needed parts and changing
> formatting for QEMU coding style.
> 
> Signed-off-by: BALATON Zoltan 
> ---
>  hw/pci-host/Kconfig   |   4 +
>  hw/pci-host/meson.build   |   2 +
>  hw/pci-host/mv64361.c | 966 ++
>  hw/pci-host/mv643xx.h | 918 
>  hw/pci-host/trace-events  |   6 +
>  include/hw/pci-host/mv64361.h |   8 +
>  include/hw/pci/pci_ids.h  |   1 +
>  7 files changed, 1905 insertions(+)
>  create mode 100644 hw/pci-host/mv64361.c
>  create mode 100644 hw/pci-host/mv643xx.h
>  create mode 100644 include/hw/pci-host/mv64361.h

Adding yourself to MAINTAINERS for this would be good.

> 
> diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
> index 2ccc96f02c..79c20bf28b 100644
> --- a/hw/pci-host/Kconfig
> +++ b/hw/pci-host/Kconfig
> @@ -72,3 +72,7 @@ config REMOTE_PCIHOST
>  config SH_PCI
>  bool
>  select PCI
> +
> +config MV64361
> +bool
> +select PCI
> diff --git a/hw/pci-host/meson.build b/hw/pci-host/meson.build
> index 87a896973e..34b3538beb 100644
> --- a/hw/pci-host/meson.build
> +++ b/hw/pci-host/meson.build
> @@ -19,6 +19,8 @@ pci_ss.add(when: 'CONFIG_GRACKLE_PCI', if_true: 
> files('grackle.c'))
>  pci_ss.add(when: 'CONFIG_UNIN_PCI', if_true: files('uninorth.c'))
>  # PowerPC E500 boards
>  pci_ss.add(when: 'CONFIG_PPCE500_PCI', if_true: files('ppce500.c'))
> +# Pegasos2
> +pci_ss.add(when: 'CONFIG_MV64361', if_true: files('mv64361.c'))
>  
>  # ARM devices
>  pci_ss.add(when: 'CONFIG_VERSATILE_PCI', if_true: files('versatile.c'))
> diff --git a/hw/pci-host/mv64361.c b/hw/pci-host/mv64361.c
> new file mode 100644
> index 00..d71402f8b5
> --- /dev/null
> +++ b/hw/pci-host/mv64361.c
> @@ -0,0 +1,966 @@
> +/*
> + * Marvell Discovery II MV64361 System Controller for
> + * QEMU PowerPC CHRP (Genesi/bPlan Pegasos II) hardware System Emulator
> + *
> + * Copyright (c) 2018-2020 BALATON Zoltan
> + *
> + * This work is licensed under the GNU GPL license version 2 or later.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "hw/hw.h"
> +#include "hw/sysbus.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/irq.h"
> +#include "hw/intc/i8259.h"
> +#include "hw/qdev-properties.h"
> +#include "exec/address-spaces.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "trace.h"
> +#include "hw/pci-host/mv64361.h"
> +#include "mv643xx.h"
> +
> +#define TYPE_MV64361_PCI_BRIDGE "mv64361-pcibridge"
> +
> +static void mv64361_pcibridge_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +k->vendor_id = PCI_VENDOR_ID_MARVELL;
> +k->device_id = PCI_DEVICE_ID_MARVELL_MV6436X;
> +k->class_id = PCI_CLASS_BRIDGE_HOST;
> +/*
> + * PCI-facing part of the host bridge,
> + * not usable without the host-facing part
> + */
> +dc->user_creatable = false;
> +}
> +
> +static const TypeInfo mv64361_pcibridge_info = {
> +.name  = TYPE_MV64361_PCI_BRIDGE,
> +.parent= TYPE_PCI_DEVICE,
> +.instance_size = sizeof(PCIDevice),
> +.class_init= mv64361_pcibridge_class_init,
> +.interfaces = (InterfaceInfo[]) {
> +{ INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +{ },
> +},
> +};
> +
> +
> +#define TYPE_MV64361_PCI "mv64361-pcihost"
> +OBJECT_DECLARE_SIMPLE_TYPE(MV64361PCIState, MV64361_PCI)
> +
> +struct MV64361PCIState {
> +PCIHostState parent_obj;
> +
> +uint8_t index;
> +MemoryRegion io;
> +MemoryRegion mem;
> +qemu_irq irq[PCI_NUM_PINS];
> +
> +uint32_t io_base;
> +uint32_t io_size;
> +uint32_t mem_base[4];
> +uint32_t mem_size[4];
> +uint64_t remap[5];
> +};
> +
> +static int mv64361_pcihost_map_irq(PCIDevice *pci_dev, int n)
> +{
> +return (n + PCI_SLOT(pci_dev->devfn)) % PCI_NUM_PINS;
> +}
> +
> +static void mv64361_pcihost_set_irq(void *opaque, int n, int level)
> +{
> +MV64361PCIState *s = opaque;
> +qemu_set_irq(s->irq[n], level);
> +}
> +
> +static void mv64361_pcihost_realize(DeviceState *dev, Error **errp)
> +{
> +MV64361PCIState *s = MV64361_PCI(dev);
> +PCIHostState *h = PCI_HOST_BRIDGE(dev);
> +char *name;
> +
> +name = g_strdup_printf("pci%d-io", s->index);
> +memory_region_init(&s->io, OBJECT(dev), name, 0x1);
> +g_free(name);
> +name = g_strdup_printf("pci%d-mem", s->index);
> +memory_region_init(&s->mem, 

Re: [PATCH v10 7/7] hw/ppc: Add emulation of Genesi/bPlan Pegasos II

2021-03-22 Thread David Gibson
On Wed, Mar 17, 2021 at 02:17:51AM +0100, BALATON Zoltan wrote:
> Add new machine called pegasos2 emulating the Genesi/bPlan Pegasos II,
> a PowerPC board based on the Marvell MV64361 system controller and the
> VIA VT8231 integrated south bridge/superio chips. It can run Linux,
> AmigaOS and a wide range of MorphOS versions. Currently a firmware ROM
> image is needed to boot and only MorphOS has a video driver to produce
> graphics output. Linux could work too but distros that supported this
> machine don't include usual video drivers so those only run with
> serial console for now.
> 
> Signed-off-by: BALATON Zoltan 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  MAINTAINERS |  10 ++
>  default-configs/devices/ppc-softmmu.mak |   2 +
>  hw/ppc/Kconfig  |   9 ++
>  hw/ppc/meson.build  |   2 +
>  hw/ppc/pegasos2.c   | 144 
>  5 files changed, 167 insertions(+)
>  create mode 100644 hw/ppc/pegasos2.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b6ab3d25a7..1c3c55ef09 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1353,6 +1353,16 @@ F: pc-bios/canyonlands.dt[sb]
>  F: pc-bios/u-boot-sam460ex-20100605.bin
>  F: roms/u-boot-sam460ex
>  
> +pegasos2
> +M: BALATON Zoltan 
> +R: David Gibson 
> +L: qemu-...@nongnu.org
> +S: Maintained
> +F: hw/ppc/pegasos2.c
> +F: hw/pci-host/mv64361.c
> +F: hw/pci-host/mv643xx.h
> +F: include/hw/pci-host/mv64361.h

Oh, sorry about the comment in the previous patch.

>  RISC-V Machines
>  ---
>  OpenTitan
> diff --git a/default-configs/devices/ppc-softmmu.mak 
> b/default-configs/devices/ppc-softmmu.mak
> index 61b78b844d..4535993d8d 100644
> --- a/default-configs/devices/ppc-softmmu.mak
> +++ b/default-configs/devices/ppc-softmmu.mak
> @@ -14,5 +14,7 @@ CONFIG_SAM460EX=y
>  CONFIG_MAC_OLDWORLD=y
>  CONFIG_MAC_NEWWORLD=y
>  
> +CONFIG_PEGASOS2=y

I don't think we can have this default to enabled while it requires a
non-free ROM to start.

>  # For PReP
>  CONFIG_PREP=y
> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
> index d11dc30509..e51e0e5e5a 100644
> --- a/hw/ppc/Kconfig
> +++ b/hw/ppc/Kconfig
> @@ -68,6 +68,15 @@ config SAM460EX
>  select USB_OHCI
>  select FDT_PPC
>  
> +config PEGASOS2
> +bool
> +select MV64361
> +select VT82C686
> +select IDE_VIA
> +select SMBUS_EEPROM
> +# This should come with VT82C686
> +select ACPI_X86
> +
>  config PREP
>  bool
>  imply PCI_DEVICES
> diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
> index 218631c883..86d6f379d1 100644
> --- a/hw/ppc/meson.build
> +++ b/hw/ppc/meson.build
> @@ -78,5 +78,7 @@ ppc_ss.add(when: 'CONFIG_E500', if_true: files(
>  ))
>  # PowerPC 440 Xilinx ML507 reference board.
>  ppc_ss.add(when: 'CONFIG_VIRTEX', if_true: files('virtex_ml507.c'))
> +# Pegasos2
> +ppc_ss.add(when: 'CONFIG_PEGASOS2', if_true: files('pegasos2.c'))
>  
>  hw_arch += {'ppc': ppc_ss}
> diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
> new file mode 100644
> index 00..0bfd0928aa
> --- /dev/null
> +++ b/hw/ppc/pegasos2.c
> @@ -0,0 +1,144 @@
> +/*
> + * QEMU PowerPC CHRP (Genesi/bPlan Pegasos II) hardware System Emulator
> + *
> + * Copyright (c) 2018-2020 BALATON Zoltan
> + *
> + * This work is licensed under the GNU GPL license version 2 or later.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "hw/hw.h"
> +#include "hw/ppc/ppc.h"
> +#include "hw/sysbus.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/irq.h"
> +#include "hw/pci-host/mv64361.h"
> +#include "hw/isa/vt82c686.h"
> +#include "hw/ide/pci.h"
> +#include "hw/i2c/smbus_eeprom.h"
> +#include "hw/qdev-properties.h"
> +#include "sysemu/reset.h"
> +#include "hw/boards.h"
> +#include "hw/loader.h"
> +#include "hw/fw-path-provider.h"
> +#include "elf.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/kvm.h"
> +#include "kvm_ppc.h"
> +#include "exec/address-spaces.h"
> +#include "trace.h"
> +#include "qemu/datadir.h"
> +#include "sysemu/device_tree.h"
> +
> +#define PROM_FILENAME "pegasos2.rom"
> +#define PROM_ADDR 0xfff0
> +#define PROM_SIZE 0x8
> +
> +#define BUS_FREQ_HZ 1
> +
> +static void pegasos2_cpu_reset(void *opaque)
> +{
> +PowerPCCPU *cpu = opaque;
> +
> +cpu_reset(CPU(cpu));
> +cpu->env.spr[SPR_HID1] = 7ULL << 28;
> +}
> +
> +static void pegasos2_init(MachineState *machine)
> +{
> +PowerPCCPU *cpu = NULL;
> +MemoryRegion *rom = g_new(MemoryRegion, 1);
> +DeviceState *mv;
> +PCIBus *pci_bus;
> +PCIDevice *dev;
> +I2CBus *i2c_bus;
> +const char *fwname = machine->firmware ?: PROM_FILENAME;
> +char *filename;
> +int sz;
> +uint8_t *spd_data;
> +
> +/* init CPU */
> +cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
> +if (PPC_INPUT(&cpu->env) != PPC_FLAGS_INPUT_6xx) {
> +

Re: [PATCH v10 0/7] Pegasos2 emulation

2021-03-22 Thread David Gibson
On Wed, Mar 17, 2021 at 02:17:51AM +0100, BALATON Zoltan wrote:
> Hello,
> 
> This is adding a new PPC board called pegasos2. More info on it can be
> found at:
> 
> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2
> 
> Currently it needs a firmware ROM image that I cannot include because
> the original copyright holder (bPlan) did not release it under a free
> licence, but I plan to write a replacement in the future. With
> the original board firmware it can boot MorphOS now as:
> 
> qemu-system-ppc -M pegasos2 -cdrom morphos.iso -device ati-vga,romfile="" 
> -serial stdio
> 
> then enter "boot cd boot.img" at the firmware "ok" prompt as described
> in the MorphOS.readme. To boot Linux use same command line with e.g.
> -cdrom debian-8.11.0-powerpc-netinst.iso then enter
> "boot cd install/pegasos"
> 
> The last patch adds the actual board code after previous patches
> adding VT8231 and MV64361 system controller chip emulation.

I've applied 1..5 to a new ppc-for-6.1 branch.  Sorry it didn't make
it for 6.0, I just didn't have time to look this over until too late.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] configure: Improve alias attribute check

2021-03-22 Thread Gavin Shan

Hi Richard,

On 3/23/21 7:59 AM, Richard Henderson wrote:

On 3/22/21 4:54 AM, Gavin Shan wrote:

It looks like this issue can be avoided after "volatile" is applied to
@target_page. However, I'm not sure if it's the correct fix to have.


Certainly not.

That is the exact opposite of what we want.  We want to minimize the number of 
reads from the variable, not maximize them.



Yes, it's something I was thinking of. "volatile" can make
@target_page visible to gcc, but maximizes the number of
reads. By the way, your patch to use "-fno-lto" worked for
me and it has been split into 3 patches by Phil. Richard,
thanks for the quick fixup :)

Thanks,
Gavin




Re: [PATCH] hw/net: fsl_etsec: Tx padding length should exclude CRC

2021-03-22 Thread David Gibson
On Mon, Mar 22, 2021 at 01:48:07PM +0800, Bin Meng wrote:
> Hi David,
> 
> On Mon, Mar 22, 2021 at 1:24 PM David Gibson
>  wrote:
> >
> > On Mon, Mar 22, 2021 at 12:33:06PM +0800, Bin Meng wrote:
> > > Hi David,
> > >
> > > On Mon, Mar 22, 2021 at 12:11 PM David Gibson
> > >  wrote:
> > > >
> > > > On Tue, Mar 16, 2021 at 04:15:05PM +0800, Bin Meng wrote:
> > > > > As the comment of tx_padding_and_crc() says: "Never add CRC in QEMU",
> > > > > > min_frame_len should exclude CRC, so it should be 60 instead of 64.
> > > >
> > > > Sorry, your reasoning still isn't clear to me.  If qemu is not adding
> > > > the CRC, what is?
> > >
> > > No one is padding CRC in QEMU. QEMU network backends pass payload
> > > without CRC in between.
> >
> > Ok, but the CRCs must be added if the packets are bridged onto a real
> > device, yes?  Where does that happen?
> 
> I've never used it like that before. What's the command line to test that?
> 
> > >
> > > > Will it always append a CRC after this padding is complete?
> > >
> > > No.
> >
> > If that's true, then won't the packets still be shorter than expected
> > if we only pad to 60 bytes?
> 
> In QEMU packets are transmitted without CRC between network backends,
> and when a NIC receives a packet, the minimum required payload length
> is 60 bytes without a CRC.

Ok, I see what you're saying, and indeed it already pads to 60 bytes,
rather than 64 on the Rx side.
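
To make the 60-versus-64 arithmetic concrete, here is a minimal sketch in
C. It is not the fsl_etsec code: pad_frame() and the ETH_* macros are
hypothetical names chosen for illustration. It only assumes what the
thread states, namely that QEMU backends carry frames without the 4-byte
CRC, so the minimum length to pad to is 64 - 4 = 60 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimum Ethernet frame is 64 bytes on the wire, 4 of which are the CRC. */
#define ETH_MIN_FRAME_WITH_CRC 64u
#define ETH_CRC_LEN            4u
#define ETH_MIN_FRAME_NO_CRC   (ETH_MIN_FRAME_WITH_CRC - ETH_CRC_LEN) /* 60 */

/*
 * Hypothetical helper, not the QEMU function: zero-pad the frame held in
 * buf (currently len bytes, capacity cap) up to min_len bytes.
 * Returns the resulting frame length; frames already long enough are
 * left untouched.
 */
static size_t pad_frame(uint8_t *buf, size_t len, size_t cap, size_t min_len)
{
    assert(cap >= min_len);          /* caller must provide enough room */
    if (len < min_len) {
        memset(buf + len, 0, min_len - len);
        return min_len;
    }
    return len;
}
```

Under this assumption a 42-byte frame is padded to 60 bytes and no CRC
bytes are appended, which matches the Rx-side minimum mentioned above.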

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] hw/net: fsl_etsec: Tx padding length should exclude CRC

2021-03-22 Thread David Gibson
On Tue, Mar 16, 2021 at 04:15:05PM +0800, Bin Meng wrote:
> As the comment of tx_padding_and_crc() says: "Never add CRC in QEMU",
> min_frame_len should exclude CRC, so it should be 60 instead of 64.
> 
> Signed-off-by: Bin Meng 

Applied to ppc-for-6.0, thanks.

> ---
> 
>  hw/net/fsl_etsec/rings.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
> index d6be0d7d18..8f08446415 100644
> --- a/hw/net/fsl_etsec/rings.c
> +++ b/hw/net/fsl_etsec/rings.c
> @@ -259,7 +259,7 @@ static void process_tx_bd(eTSEC *etsec,
>  || etsec->regs[MACCFG2].value & MACCFG2_PADCRC) {
>  
>  /* Padding and CRC (Padding implies CRC) */
> -tx_padding_and_crc(etsec, 64);
> +tx_padding_and_crc(etsec, 60);
>  
>  } else if (etsec->first_bd.flags & BD_TX_TC
> || etsec->regs[MACCFG2].value & MACCFG2_CRC_EN) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v4 10/17] target/ppc: Create helper_scv

2021-03-22 Thread David Gibson
On Mon, Mar 22, 2021 at 11:05:00AM -0600, Richard Henderson wrote:
> On 3/21/21 10:00 PM, David Gibson wrote:
> > On Mon, Mar 15, 2021 at 12:46:08PM -0600, Richard Henderson wrote:
> > > Perform the test against FSCR_SCV at runtime, in the helper.
> > > 
> > > This means we can remove the incorrect set against SCV in
> > > ppc_tr_init_disas_context and do not need to add an HFLAGS bit.
> > > 
> > > Signed-off-by: Richard Henderson 
> > > ---
> > >   target/ppc/helper.h  |  1 +
> > >   target/ppc/excp_helper.c |  9 +
> > >   target/ppc/translate.c   | 20 +++-
> > >   3 files changed, 17 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> > > index 6a4dccf70c..513066d54d 100644
> > > --- a/target/ppc/helper.h
> > > +++ b/target/ppc/helper.h
> > > @@ -13,6 +13,7 @@ DEF_HELPER_1(rfci, void, env)
> > >   DEF_HELPER_1(rfdi, void, env)
> > >   DEF_HELPER_1(rfmci, void, env)
> > >   #if defined(TARGET_PPC64)
> > > +DEF_HELPER_2(scv, noreturn, env, i32)
> > >   DEF_HELPER_2(pminsn, void, env, i32)
> > >   DEF_HELPER_1(rfid, void, env)
> > >   DEF_HELPER_1(rfscv, void, env)
> > > diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> > > index 85de7e6c90..5c95e0c103 100644
> > > --- a/target/ppc/excp_helper.c
> > > +++ b/target/ppc/excp_helper.c
> > > @@ -1130,6 +1130,15 @@ void helper_store_msr(CPUPPCState *env, 
> > > target_ulong val)
> > >   }
> > >   #if defined(TARGET_PPC64)
> > > +void helper_scv(CPUPPCState *env, uint32_t lev)
> > > +{
> > > +if (env->spr[SPR_FSCR] & (1ull << FSCR_SCV)) {
> > > +raise_exception_err(env, POWERPC_EXCP_SYSCALL_VECTORED, lev);
> > > +} else {
> > > +raise_exception_err(env, POWERPC_EXCP_FU, FSCR_IC_SCV);
> > > +}
> > > +}
> > > +
> > >   void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t insn)
> > >   {
> > >   CPUState *cs;
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index 7912495f28..d48c554290 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -173,7 +173,6 @@ struct DisasContext {
> > >   bool vsx_enabled;
> > >   bool spe_enabled;
> > >   bool tm_enabled;
> > > -bool scv_enabled;
> > >   bool gtse;
> > >   ppc_spr_t *spr_cb; /* Needed to check rights for mfspr/mtspr */
> > >   int singlestep_enabled;
> > > @@ -4081,15 +4080,16 @@ static void gen_sc(DisasContext *ctx)
> > >   #if !defined(CONFIG_USER_ONLY)
> > >   static void gen_scv(DisasContext *ctx)
> > >   {
> > > -uint32_t lev;
> > > +uint32_t lev = (ctx->opcode >> 5) & 0x7F;
> > > -if (unlikely(!ctx->scv_enabled)) {
> > > -gen_exception_err(ctx, POWERPC_EXCP_FU, FSCR_IC_SCV);
> > > -return;
> > > +/* Set the PC back to the faulting instruction. */
> > > +if (ctx->exception == POWERPC_EXCP_NONE) {
> > > +gen_update_nip(ctx, ctx->base.pc_next - 4);
> > >   }
> > 
> > I don't quite understand this.  Don't we need the NIP to be on the scv
> > instruction itself for the case where we get a facility unavailable
> > exception, but on the next instruction if we actually take the system
> > call?  This appears to be unconditional.
> > 
> > > +gen_helper_scv(cpu_env, tcg_constant_i32(lev));
> > > -lev = (ctx->opcode >> 5) & 0x7F;
> > > -gen_exception_err(ctx, POWERPC_SYSCALL_VECTORED, lev);
> 
> Hmm.  In the old code, both paths use gen_exception_err, without otherwise
> manipulating NIP.  That suggests to me that both exceptions receive the same
> value in NIP.
> 
> Is there an adjustment to NIP when delivering the SCV exception?  Yep:

Ok.  Just shows my ignorance of the exception handling code.

> 
> case POWERPC_EXCP_SYSCALL_VECTORED:
> lev = env->error_code;
> dump_syscall_vectored(env);
> env->nip += 4;
> new_msr |= env->msr & ((target_ulong)1 << MSR_EE);
> new_msr |= env->msr & ((target_ulong)1 << MSR_RI);
> break;
> 
> 
> r~
> 
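
The NIP handling discussed above can be summarised with a toy model in C.
This is only a sketch under stated assumptions: toy_cpu, toy_scv() and the
TOY_EXCP_* values are hypothetical stand-ins, not QEMU types. It mirrors
the two facts from the thread: the translator sets NIP back to the scv
instruction unconditionally, and exception delivery adds 4 only for the
vectored system call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the two exceptions discussed in the thread. */
enum toy_excp { TOY_EXCP_FU, TOY_EXCP_SYSCALL_VECTORED };

struct toy_cpu {
    uint64_t nip;          /* next instruction pointer */
    bool scv_enabled;      /* models the FSCR[SCV] facility bit */
};

/*
 * Model one scv instruction located at insn_addr.
 * Returns the exception raised and leaves cpu->nip as delivery sees it.
 */
static enum toy_excp toy_scv(struct toy_cpu *cpu, uint64_t insn_addr)
{
    enum toy_excp e;

    assert((insn_addr & 3) == 0);   /* instructions are 4-byte aligned */

    /* Translation time: NIP is set back to the scv instruction itself. */
    cpu->nip = insn_addr;

    /* Run time (helper): choose the exception from the facility bit. */
    e = cpu->scv_enabled ? TOY_EXCP_SYSCALL_VECTORED : TOY_EXCP_FU;

    /* Delivery: only the system call advances NIP past the instruction. */
    if (e == TOY_EXCP_SYSCALL_VECTORED) {
        cpu->nip += 4;
    }
    return e;
}
```

In this model the facility-unavailable case reports the address of scv
itself, while the system call case reports the following instruction.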

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v4 13/17] target/ppc: Remove env->immu_idx and env->dmmu_idx

2021-03-22 Thread David Gibson
On Mon, Mar 22, 2021 at 11:27:49AM -0600, Richard Henderson wrote:
> On 3/21/21 10:26 PM, David Gibson wrote:
> > On Mon, Mar 15, 2021 at 12:46:11PM -0600, Richard Henderson wrote:
> > > We weren't recording MSR_GS in hflags, which means that BookE
> > > memory accesses were essentially random vs Guest State.
> > > 
> > > Instead of adding this bit directly, record the completed mmu
> > > indexes instead.  This makes it obvious that we are recording
> > > exactly the information that we need.
> > > 
> > > This also means that we can stop directly recording MSR_IR.
> > 
> > What still uses MSR_DR, that you can't also drop it?
> 
> #define CHK_HVRM\
> do {\
> if (unlikely(ctx->pr || !ctx->hv || ctx->dr)) { \
> 
> I have this notion that this (and CHK_HV and CHK_SV) could be a test against
> mmu_idx instead, but was reluctant to make that change.

Yeah, that's checking for hypervisor real mode (hence "HVRM") for
ldcix and friends, so it should be equivalent to (mmu_idx != 7).

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v4 07/17] target/ppc: Disconnect hflags from MSR

2021-03-22 Thread David Gibson
On Mon, Mar 22, 2021 at 10:55:46AM -0600, Richard Henderson wrote:
> On 3/21/21 9:52 PM, David Gibson wrote:
> > > +/*
> > > + * Bits for env->hflags.
> > > + *
> > > + * Most of these bits overlap with corresponding bits in MSR,
> > > + * but some come from other sources.  Be cautious when modifying.
> > 
> > Yeah.. I'm not sure "be cautious" is enough of a warning.  The exact
> > value of some but not all of these flags must equal that for the
> > corresponding MSR bits, which is terrifyingly subtle.
> 
> Fair.  How about, for the comment here, "This is validated in 
> hreg_compute_hflags."
> 
> > > +/* Some bits come straight across from MSR. */
> > > +msr_mask = ((1 << MSR_LE) | (1 << MSR_PR) |
> > > +(1 << MSR_DR) | (1 << MSR_IR) |
> > > +(1 << MSR_FP) | (1 << MSR_SA) | (1 << MSR_AP));
> 
> Here, and in every other spot within this function where we manipulate 
> msr_mask,
> 
> QEMU_BUILD_BUG_ON(MSR_LE != HFLAGS_LE);

Seems reasonable.
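
As a rough illustration of the build-time check proposed above, here is a
self-contained C11 sketch that uses static_assert in place of
QEMU_BUILD_BUG_ON. The TOY_* names and bit numbers are assumptions made
for this example, not the definitions from target/ppc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit numbers, for illustration only. */
enum { TOY_MSR_LE = 0, TOY_MSR_PR = 14 };
enum { TOY_HFLAGS_LE = 0, TOY_HFLAGS_PR = 14 };

/*
 * Bits copied straight across from MSR must sit at the same position in
 * hflags. The build fails if someone renumbers one side only.
 */
static_assert(TOY_MSR_LE == TOY_HFLAGS_LE, "LE bit must match in MSR/hflags");
static_assert(TOY_MSR_PR == TOY_HFLAGS_PR, "PR bit must match in MSR/hflags");

/* Copy the shared bits from msr into a 32-bit hflags word. */
static uint32_t toy_compute_hflags(uint64_t msr)
{
    const uint64_t msr_mask = (1ull << TOY_MSR_LE) | (1ull << TOY_MSR_PR);

    return (uint32_t)(msr & msr_mask);
}
```

The point of the design is that the subtle "values must be equal"
requirement is enforced by the compiler rather than by a comment.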

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] [NFC] Mark locally used symbols as static.

2021-03-22 Thread David Gibson
On Mon, Mar 22, 2021 at 10:55:42PM +0300, Yuri Gribov wrote:
> Hi all,
> 
> This patch makes locally used symbols static to enable more compiler
> optimizations on them. Some of the symbols turned out to not be used
> at all so I marked them with ATTRIBUTE_UNUSED (as I wasn't sure if
> they were ok to delete).
> 
> The symbols have been identified with a pet project of mine:
> https://github.com/yugr/Localizer
> 
> From 07b4f05893b7037e68e5d7bdec5ba8e74e50 Mon Sep 17 00:00:00 2001
> From: Yury Gribov 
> Date: Sat, 20 Mar 2021 23:39:15 +0300
> Subject: [PATCH] [NFC] Mark locally used symbols as static.
> 
> Signed-off-by: Yury Gribov 

ppc parts
Acked-by: David Gibson 

> ---
>  disas/alpha.c | 16 ++--
>  disas/m68k.c  | 78 -
>  disas/mips.c  | 14 ++--
>  disas/nios2.c | 84 +--
>  disas/ppc.c   | 26 +++---
>  disas/riscv.c |  2 +-
>  pc-bios/optionrom/linuxboot_dma.c |  4 +-
>  scripts/tracetool/format/c.py |  2 +-
>  target/hexagon/gen_dectree_import.c   |  2 +-
>  target/hexagon/opcodes.c  |  2 +-
>  target/i386/cpu.c |  2 +-
>  target/s390x/cpu_models.c |  2 +-
>  .../xtensa/core-dc232b/xtensa-modules.c.inc   |  2 +-
>  .../xtensa/core-dc233c/xtensa-modules.c.inc   |  2 +-
>  target/xtensa/core-de212/xtensa-modules.c.inc |  2 +-
>  .../core-de233_fpu/xtensa-modules.c.inc   |  2 +-
>  .../xtensa/core-dsp3400/xtensa-modules.c.inc  |  2 +-
>  target/xtensa/core-fsf/xtensa-modules.c.inc   |  2 +-
>  .../xtensa-modules.c.inc  |  2 +-
>  .../core-test_kc705_be/xtensa-modules.c.inc   |  2 +-
>  .../core-test_mmuhifi_c3/xtensa-modules.c.inc |  2 +-
>  21 files changed, 125 insertions(+), 127 deletions(-)
> 
> diff --git a/disas/alpha.c b/disas/alpha.c
> index 3db90fa..361a4ed 100644
> --- a/disas/alpha.c
> +++ b/disas/alpha.c
> @@ -56,8 +56,8 @@ struct alpha_opcode
>  /* The table itself is sorted by major opcode number, and is otherwise
> in the order in which the disassembler should consider
> instructions.  */
> -extern const struct alpha_opcode alpha_opcodes[];
> -extern const unsigned alpha_num_opcodes;
> +static const struct alpha_opcode alpha_opcodes[];
> +static const unsigned alpha_num_opcodes;
> 
>  /* Values defined for the flags field of a struct alpha_opcode.  */
> 
> @@ -137,8 +137,8 @@ struct alpha_operand
>  /* Elements in the table are retrieved by indexing with values from
> the operands field of the alpha_opcodes table.  */
> 
> -extern const struct alpha_operand alpha_operands[];
> -extern const unsigned alpha_num_operands;
> +static const struct alpha_operand alpha_operands[];
> +static const unsigned alpha_num_operands;
> 
>  /* Values defined for the flags field of a struct alpha_operand.  */
> 
> @@ -293,7 +293,7 @@ static int extract_ev6hwjhint (unsigned, int *);
> 
> -const struct alpha_operand alpha_operands[] =
> +static const struct alpha_operand alpha_operands[] =
>  {
>/* The fields are bits, shift, insert, extract, flags */
>/* The zero index is used to indicate end-of-list */
> @@ -424,7 +424,7 @@ const struct alpha_operand alpha_operands[] =
>  insert_ev6hwjhint, extract_ev6hwjhint }
>  };
> 
> -const unsigned alpha_num_operands =
> sizeof(alpha_operands)/sizeof(*alpha_operands);
> +static ATTRIBUTE_UNUSED const unsigned alpha_num_operands =
> sizeof(alpha_operands)/sizeof(*alpha_operands);
> 
>  /* The RB field when it is the same as the RA field in the same insn.
> This operand is marked fake.  The insertion function just copies
> @@ -706,7 +706,7 @@ extract_ev6hwjhint(unsigned insn, int *invalid
> ATTRIBUTE_UNUSED)
>   that were not assigned to a particular extension.
>  */
> 
> -const struct alpha_opcode alpha_opcodes[] = {
> +static const struct alpha_opcode alpha_opcodes[] = {
>{ "halt",  SPCD(0x00,0x), BASE, ARG_NONE },
>{ "draina",SPCD(0x00,0x0002), BASE, ARG_NONE },
>{ "bpt",   SPCD(0x00,0x0080), BASE, ARG_NONE },
> @@ -1732,7 +1732,7 @@ const struct alpha_opcode alpha_opcodes[] = {
>{ "bgt",   BRA(0x3F), BASE, ARG_BRA },
>  };
> 
> -const unsigned alpha_num_opcodes =
> sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);
> +static ATTRIBUTE_UNUSED const unsigned alpha_num_opcodes =
> sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);
> 
>  /* OSF register names.  */
> 
> diff --git a/disas/m68k.c b/disas/m68k.c
> index aefaecf..903d5cf 100644
> --- a/disas/m68k.c
> +++ b/disas/m68k.c
> @@ -95,29 +95,29 @@ struct floatformat
> 
>  /* floatformats for IEEE single and double, big and little endian.  */
> 
> -extern const struct floatformat floatformat_ieee_single_big;
> -extern const struct floatformat floatformat_ieee_single_little;

Re: [PATCH v4 03/17] target/ppc: Properly sync cpu state with new msr in cpu_load_old

2021-03-22 Thread David Gibson
On Mon, Mar 22, 2021 at 10:53:01AM -0600, Richard Henderson wrote:
> On 3/16/21 2:15 AM, Cédric Le Goater wrote:
> > On 3/15/21 7:46 PM, Richard Henderson wrote:
> > > Match cpu_post_load in using ppc_store_msr to set all of
> > > the cpu state implied by the value of msr.  Do not restore
> > > hflags or hflags_nmsr, as we recompute them in ppc_store_msr.
> > > 
> > > Signed-off-by: Richard Henderson 
> > 
> > Could we add a common routine used by cpu_post_load() and cpu_load_old() ?
> 
> Will do.  David, would you like to unqueue this one, or shall I send another
> patch on top?

Pulling that one out causes conflicts with later patches, so another
one on top, please.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 1/2] block/raw: added support of persistent dirty bitmaps

2021-03-22 Thread Lubos Matejka
When can we give each other a call about further development?

Sent from iPhone

> On 22 Mar 2021 at 12:37, Patrik Janoušek :
> 
> 
>> On 3/22/21 12:18 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 22.03.2021 13:46, Vladimir Sementsov-Ogievskiy wrote:
>>> 22.03.2021 13:18, Patrik Janoušek wrote:
>>>> On 3/22/21 9:41 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 20.03.2021 12:32, Patrik Janoušek wrote:
>>>>>> Current implementation of dirty bitmaps for raw format is very
>>>>>> limited, because those bitmaps cannot be persistent. Basically it
>>>>>> makes sense, because the raw format doesn't have space where could
>>>>>> be dirty bitmap stored when QEMU is offline. This patch solves it
>>>>>> by storing content of every dirty bitmap in separate file on the
>>>>>> host filesystem.
>>>>>> 
>>>>>> However, this only solves one part of the problem. We also have to
>>>>>> store information about the existence of the dirty bitmap. This is
>>>>>> solved by adding custom options, that stores all required metadata
>>>>>> about dirty bitmap (filename where is the bitmap stored on the
>>>>>> host filesystem, granularity, persistence, etc.).
>>>>>> 
>>>>>> Signed-off-by: Patrik Janoušek
>>>>> 
>>>>> 
>>>>> Hmm. Did you consider other ways? Honestly, I don't see a reason for
>>>>> yet another storing format for bitmaps.
>>>>> 
>>>>> The task could be simply solved with existing features:
>>>>> 
>>>>> 1. We have external-data-file feature in qcow2 (read
>>>>> docs/interop/qcow2.txt). With this thing enabled, qcow2 file contains
>>>>> only metadata (persistent bitmaps for example) and data is stored in
>>>>> separate sequential raw file. I think you should start from it.
>>>> 
>>>> I didn't know about that feature. I'll look at it.
>>>> 
>>>> In case I use NBD to access the bitmap context and qcow2 as a solution
>>>> for persistent layer. Would the patch be acceptable? This is a
>>>> significant change to my solution and I don't have enough time for it
>>>> at the moment (mainly due to other parts of my bachelor's thesis). I
>>>> just want to know if this kind of feature is interesting to you and
>>>> its implementation is worth my time.
>>> 
>>> Honestly, at this point I think it doesn't. If existing features
>>> satisfy your use-case, no reason to increase complexity of file-posix
>>> driver and QAPI.
>>> 
>> 
>> It's unpleasant to say this, keeping in mind that that's your first
>> submission :(
>> 
>> I can still recommend in a connection with your bachelor's thesis to
>> look at the videos at kvm-forum youtube channel, searching for backup:
>> 
>>  
>> https://www.youtube.com/channel/UCRCSQmAOh7yzgheq-emy1xA/search?query=backup
>> 
>> You'll get a lot of information about current developments of external
>> backup API.
>> 
>> Also note, that there is (or there will be ?) libvirt Backup API,
>> which includes an API for external backup. I don't know the current
>> status of it, but if your project is based on libvirt, it's better to
>> use libvirt backup API instead of using qemu directly. About Libvirt
>> Backup API it's better to ask Eric Blake (adding him to CC).
> Unfortunately, my solution is based on Proxmox so I can't use libvirt's
> features. I know that a beta version of Proxmox Backup Server has been
> released and it would be much better to improve their solution, but they
> did it too late so I couldn't change assignment of my bachelor's thesis.
> 



[PULL v2 19/19] acpi: Move setters/getters of oem fields to X86MachineState

2021-03-22 Thread Michael S. Tsirkin
From: Marian Postevca 

The code that sets/gets oem fields is duplicated in both PC and MICROVM
variants. This commit moves it to X86MachineState so that all x86
variants can use it and duplication is removed.

Signed-off-by: Marian Postevca 
Message-Id: <20210221001737.24499-2-poste...@mutex.one>
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/i386/microvm.h |  4 ---
 include/hw/i386/pc.h  |  4 ---
 include/hw/i386/x86.h |  4 +++
 hw/i386/acpi-build.c  | 48 ++--
 hw/i386/acpi-microvm.c| 16 +-
 hw/i386/microvm.c | 66 ---
 hw/i386/pc.c  | 63 -
 hw/i386/x86.c | 64 +
 8 files changed, 100 insertions(+), 169 deletions(-)

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index 372b05774e..f25f837441 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -76,8 +76,6 @@
 #define MICROVM_MACHINE_ISA_SERIAL  "isa-serial"
 #define MICROVM_MACHINE_OPTION_ROMS "x-option-roms"
 #define MICROVM_MACHINE_AUTO_KERNEL_CMDLINE "auto-kernel-cmdline"
-#define MICROVM_MACHINE_OEM_ID  "oem-id"
-#define MICROVM_MACHINE_OEM_TABLE_ID"oem-table-id"
 
 struct MicrovmMachineClass {
 X86MachineClass parent;
@@ -106,8 +104,6 @@ struct MicrovmMachineState {
 Notifier machine_done;
 Notifier powerdown_req;
 struct GPEXConfig gpex;
-char *oem_id;
-char *oem_table_id;
 };
 
 #define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d4c3d73c11..dcf060b791 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -46,8 +46,6 @@ typedef struct PCMachineState {
 bool pit_enabled;
 bool hpet_enabled;
 uint64_t max_fw_size;
-char *oem_id;
-char *oem_table_id;
 
 /* NUMA information: */
 uint64_t numa_nodes;
@@ -65,8 +63,6 @@ typedef struct PCMachineState {
 #define PC_MACHINE_SATA "sata"
 #define PC_MACHINE_PIT  "pit"
 #define PC_MACHINE_MAX_FW_SIZE  "max-fw-size"
-#define PC_MACHINE_OEM_ID   "oem-id"
-#define PC_MACHINE_OEM_TABLE_ID "oem-table-id"
 /**
  * PCMachineClass:
  *
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 56080bd1fb..26c9cc45a4 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -67,6 +67,8 @@ struct X86MachineState {
 OnOffAuto smm;
 OnOffAuto acpi;
 
+char *oem_id;
+char *oem_table_id;
 /*
  * Address space used by IOAPIC device. All IOAPIC interrupts
  * will be translated to MSI messages in the address space.
@@ -76,6 +78,8 @@ struct X86MachineState {
 
 #define X86_MACHINE_SMM  "smm"
 #define X86_MACHINE_ACPI "acpi"
+#define X86_MACHINE_OEM_ID   "oem-id"
+#define X86_MACHINE_OEM_TABLE_ID "oem-table-id"
 
 #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
 OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 3aeae15e57..de98750aef 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1807,7 +1807,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 g_array_append_vals(table_data, dsdt->buf->data, dsdt->buf->len);
 build_header(linker, table_data,
 (void *)(table_data->data + table_data->len - dsdt->buf->len),
- "DSDT", dsdt->buf->len, 1, pcms->oem_id, pcms->oem_table_id);
+ "DSDT", dsdt->buf->len, 1, x86ms->oem_id, 
x86ms->oem_table_id);
 free_aml_allocator();
 }
 
@@ -1984,8 +1984,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 build_header(linker, table_data,
  (void *)(table_data->data + srat_start),
  "SRAT",
- table_data->len - srat_start, 1, pcms->oem_id,
- pcms->oem_table_id);
+ table_data->len - srat_start, 1, x86ms->oem_id,
+ x86ms->oem_table_id);
 }
 
 /*
@@ -2338,13 +2338,13 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
*machine)
 if (slic_oem.id) {
 oem_id = slic_oem.id;
 } else {
-oem_id = pcms->oem_id;
+oem_id = x86ms->oem_id;
 }
 
 if (slic_oem.table_id) {
 oem_table_id = slic_oem.table_id;
 } else {
-oem_table_id = pcms->oem_table_id;
+oem_table_id = x86ms->oem_table_id;
 }
 
 table_offsets = g_array_new(false, true /* clear */,
@@ -2385,30 +2385,30 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
*machine)
 
 acpi_add_table(table_offsets, tables_blob);
 acpi_build_madt(tables_blob, tables->linker, x86ms,
-ACPI_DEVICE_IF(x86ms->acpi_dev), pcms->oem_id,
-pcms->oem_table_id);
+

[PULL v2 18/19] acpi: Set proper maximum size for "etc/acpi/rsdp" blob

2021-03-22 Thread Michael S. Tsirkin
From: David Hildenbrand 

Let's also set a maximum size for "etc/acpi/rsdp", so the maximum
size doesn't get implicitly set based on the initial table size. In my
experiments, the table size was in the range of 22 bytes, so a single
page (== what we used until now) seems to be good enough.

Now that we have defined maximum sizes for all currently used table types,
let's assert that we catch usage with new tables that need a proper maximum
size definition.

Also assert that our initial size does not exceed the maximum size; while
qemu_ram_alloc_internal() properly asserts that the initial RAMBlock size
is <= its maximum size, the result might differ when the host page size
is bigger than 4k.

Suggested-by: Laszlo Ersek 
Cc: Alistair Francis 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Peter Maydell 
Cc: Shannon Zhao 
Cc: Marcel Apfelbaum 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Laszlo Ersek 
Signed-off-by: David Hildenbrand 
Message-Id: <20210304105554.121674-5-da...@redhat.com>
Reviewed-by: Laszlo Ersek 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/utils.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/utils.c b/hw/acpi/utils.c
index f2d69a6d92..0c486ea29f 100644
--- a/hw/acpi/utils.c
+++ b/hw/acpi/utils.c
@@ -29,14 +29,19 @@
 MemoryRegion *acpi_add_rom_blob(FWCfgCallback update, void *opaque,
 GArray *blob, const char *name)
 {
-uint64_t max_size = 0;
+uint64_t max_size;
 
 /* Reserve RAM space for tables: add another order of magnitude. */
 if (!strcmp(name, ACPI_BUILD_TABLE_FILE)) {
 max_size = 0x20;
 } else if (!strcmp(name, ACPI_BUILD_LOADER_FILE)) {
 max_size = 0x1;
+} else if (!strcmp(name, ACPI_BUILD_RSDP_FILE)) {
+max_size = 0x1000;
+} else {
+g_assert_not_reached();
 }
+g_assert(acpi_data_len(blob) <= max_size);
 
 return rom_add_blob(name, blob->data, acpi_data_len(blob), max_size, -1,
 name, update, opaque, NULL, true);
-- 
MST
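
For readers following along, the name-based lookup and the two assertions added by this patch can be sketched roughly as below. This is only an illustration, not the QEMU code: blob_max_size() and blob_size_ok() are hypothetical helpers, the blob names are the ones listed in the table of the "etc/table-loader" patch of this series, the 64k and 0x1000 limits come from the commit messages, and the limit used for "etc/acpi/tables" is a placeholder chosen for the sketch. Where the patch aborts via g_assert_not_reached() and g_assert(), the sketch returns a status so the behaviour can be checked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Blob names as listed in the series; the macro names are made up here. */
#define TABLE_FILE  "etc/acpi/tables"
#define LOADER_FILE "etc/table-loader"
#define RSDP_FILE   "etc/acpi/rsdp"

/* Placeholder for this sketch only, not a value taken from the patch. */
#define TABLE_MAX_SIZE_PLACEHOLDER (2u * 1024 * 1024)

/* Map a blob name to its maximum size; 0 means "no limit defined". */
static uint64_t blob_max_size(const char *name)
{
    assert(name != NULL);

    if (!strcmp(name, TABLE_FILE)) {
        return TABLE_MAX_SIZE_PLACEHOLDER;
    } else if (!strcmp(name, LOADER_FILE)) {
        return 64 * 1024;   /* "64k" per the table-loader commit message */
    } else if (!strcmp(name, RSDP_FILE)) {
        return 0x1000;      /* a single 4k page */
    }
    return 0;
}

/*
 * Returns 1 if the blob has a defined limit and its initial length
 * does not exceed it, 0 otherwise. The patch asserts both conditions
 * instead of returning a status.
 */
static int blob_size_ok(const char *name, uint64_t initial_len)
{
    uint64_t max_size = blob_max_size(name);

    return max_size != 0 && initial_len <= max_size;
}
```

Keeping the limits in one place means a new blob type without a defined limit is caught at its first use rather than silently inheriting a limit from its initial size.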




[PULL v2 15/19] acpi: Set proper maximum size for "etc/table-loader" blob

2021-03-22 Thread Michael S. Tsirkin
From: David Hildenbrand 

The resizeable memory region / RAMBlock that is created for the cmd blob
has a maximum size of whole host pages (e.g., 4k), because RAMBlocks
work on full host pages. In addition, in i386 ACPI code:
  acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE);
makes sure to align to multiples of 4k, padding with 0.

For example, if our cmd_blob is created with a size of 2k, the maximum
size is 4k - we cannot grow beyond that. Growing might be required
due to guest action when rebuilding the tables, but also on incoming
migration.

This automatic generation of the maximum size used to be sufficient,
however, there are cases where we cross host pages now when growing at
runtime: we exceed the maximum size of the RAMBlock and can crash QEMU when
trying to resize the resizeable memory region / RAMBlock:
  $ build/qemu-system-x86_64 --enable-kvm \
  -machine q35,nvdimm=on \
  -smp 1 \
  -cpu host \
  -m size=2G,slots=8,maxmem=4G \
  -object memory-backend-file,id=mem0,mem-path=/tmp/nvdimm,size=256M \
  -device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 \
  -nodefaults \
  -device vmgenid \
  -device intel-iommu

Results in:
  Unexpected error in qemu_ram_resize() at ../softmmu/physmem.c:1850:
  qemu-system-x86_64: Size too large: /rom@etc/table-loader:
0x2000 > 0x1000: Invalid argument

In this configuration, we consume exactly 4k (32 entries, 128 bytes each)
when creating the VM. However, once the guest boots up and maps the MCFG,
we also create the MCFG table and end up consuming 2 additional entries
(pointer + checksum) -- which is where we try resizing the memory region
/ RAMBlock, however, the maximum size does not allow for it.
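
The numbers above can be double-checked with a small sketch. It assumes 4k host pages and the 128-byte entry size quoted in this message; host_page_align(), implicit_max_size() and fits() are hypothetical helpers written for this illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_SIZE      128u   /* bytes per linker command entry (from above) */
#define HOST_PAGE_SIZE  4096u  /* assumption: 4k host pages */

/* Round size up to a multiple of the assumed host page size. */
static uint64_t host_page_align(uint64_t size)
{
    assert(size <= UINT64_MAX - (HOST_PAGE_SIZE - 1)); /* no overflow */
    return (size + HOST_PAGE_SIZE - 1) / HOST_PAGE_SIZE * HOST_PAGE_SIZE;
}

/* Implicit maximum: the initial size rounded up to whole host pages. */
static uint64_t implicit_max_size(unsigned initial_entries)
{
    return host_page_align((uint64_t)initial_entries * ENTRY_SIZE);
}

/* Returns 1 if a blob holding `entries` commands fits into max_size. */
static int fits(unsigned entries, uint64_t max_size)
{
    return (uint64_t)entries * ENTRY_SIZE <= max_size;
}
```

With 32 initial entries the implicit maximum is exactly 0x1000, so the 34 entries needed after the MCFG table is added no longer fit; page-aligning their size gives the 0x2000 seen in the error message.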

Currently, we get the following maximum sizes for our different
mutable tables based on behavior of resizeable RAMBlock:

  hw       table               max_size
  -------  ------------------  --------------------------------

  virt     "etc/acpi/tables"   ACPI_BUILD_TABLE_MAX_SIZE (0x20)
  virt     "etc/table-loader"  HOST_PAGE_ALIGN(initial_size)
  virt     "etc/acpi/rsdp"     HOST_PAGE_ALIGN(initial_size)

  i386     "etc/acpi/tables"   ACPI_BUILD_TABLE_MAX_SIZE (0x20)
  i386     "etc/table-loader"  HOST_PAGE_ALIGN(initial_size)
  i386     "etc/acpi/rsdp"     HOST_PAGE_ALIGN(initial_size)

  microvm  "etc/acpi/tables"   ACPI_BUILD_TABLE_MAX_SIZE (0x20)
  microvm  "etc/table-loader"  HOST_PAGE_ALIGN(initial_size)
  microvm  "etc/acpi/rsdp"     HOST_PAGE_ALIGN(initial_size)

Let's set the maximum table size for "etc/table-loader" to 64k, so we
can properly grow at runtime, which should be good enough for the future.

Migration is not concerned with the maximum size of a RAMBlock, only
with the used size - so existing setups are not affected. Of course, we
cannot migrate a VM that would have crashed when started on older QEMU from
new QEMU to older QEMU without failing early on the destination when
synchronizing the RAM state:
qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: 
Invalid argument
qemu-system-x86_64: error while loading state for instance 0x0 of device 
'ram'
qemu-system-x86_64: load of migration failed: Invalid argument

We'll refactor the code next, to make sure we get rid of this implicit
behavior for "etc/acpi/rsdp" as well and to make the code easier to
grasp.

Reviewed-by: Igor Mammedov 
Cc: Alistair Francis 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Peter Maydell 
Cc: Shannon Zhao 
Cc: Marcel Apfelbaum 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Laszlo Ersek 
Signed-off-by: David Hildenbrand 
Message-Id: <20210304105554.121674-2-da...@redhat.com>
Reviewed-by: Laszlo Ersek 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/aml-build.h | 1 +
 hw/arm/virt-acpi-build.c| 3 ++-
 hw/i386/acpi-build.c| 3 ++-
 hw/i386/acpi-microvm.c  | 2 +-
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index e652106e26..ca781f3531 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -6,6 +6,7 @@
 
 /* Reserve RAM space for tables: add another order of magnitude. */
 #define ACPI_BUILD_TABLE_MAX_SIZE 0x20
+#define ACPI_BUILD_LOADER_MAX_SIZE0x1
 
 #define ACPI_BUILD_APPNAME6 "BOCHS "
 #define ACPI_BUILD_APPNAME8 "BXPC"
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f9c9df916c..a91550de6f 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -865,7 +865,8 @@ void virt_acpi_setup(VirtMachineState *vms)
 
 build_state->linker_mr =
 acpi_add_rom_blob(virt_acpi_build_update, build_state,
-  tables.linker->cmd_blob, ACPI_BUILD_LOADER_FILE, 0);
+  tables.linker->cmd_blob, ACPI_BUILD_LOADER_FILE,
+ 

[PULL v2 13/19] pci: acpi: add _DSM method to PCI devices

2021-03-22 Thread Michael S. Tsirkin
From: Igor Mammedov 

Implement _DSM according to:
PCI Firmware Specification 3.1
4.6.7.  DSM for Naming a PCI or PCI Express Device Under
Operating Systems
and wire it up to cold and hot-plugged PCI devices.
Feature depends on ACPI hotplug being enabled (as that provides
PCI devices descriptions in ACPI and MMIO registers that are
reused to fetch acpi-index).

acpi-index should work for
  - cold plugged NICs:
  $QEMU -device e1000,acpi-index=100
 => 'eno100'
  - hot-plugged
  (monitor) device_add e1000,acpi-index=200,id=remove_me
 => 'eno200'
  - re-plugged
  (monitor) device_del remove_me
  (monitor) device_add e1000,acpi-index=1
 => 'eno1'

Windows also sees index under "PCI Label Id" field in properties
dialog but otherwise it doesn't seem to have any effect.

Signed-off-by: Igor Mammedov 
Message-Id: <20210315180102.3008391-6-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/pci.h |   1 +
 hw/i386/acpi-build.c  | 105 --
 2 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/include/hw/acpi/pci.h b/include/hw/acpi/pci.h
index e514f179d8..b5deee0a9d 100644
--- a/include/hw/acpi/pci.h
+++ b/include/hw/acpi/pci.h
@@ -35,4 +35,5 @@ typedef struct AcpiMcfgInfo {
 
 void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info,
 const char *oem_id, const char *oem_table_id);
+Aml *aml_pci_device_dsm(void);
 #endif
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index e49fae2bfd..a95b42c8b3 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -397,6 +397,13 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
 );
 aml_append(dev, method);
+method = aml_method("_DSM", 4, AML_SERIALIZED);
+aml_append(method,
+aml_return(aml_call6("PDSM", aml_arg(0), aml_arg(1),
+ aml_arg(2), aml_arg(3),
+ aml_name("BSEL"), aml_name("_SUN")))
+);
+aml_append(dev, method);
 aml_append(parent_scope, dev);
 
 build_append_pcihp_notify_entry(notify_method, slot);
@@ -424,6 +431,16 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
 aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
 
+if (bsel) {
+aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
+method = aml_method("_DSM", 4, AML_SERIALIZED);
+aml_append(method, aml_return(
+aml_call6("PDSM", aml_arg(0), aml_arg(1), aml_arg(2),
+  aml_arg(3), aml_name("BSEL"), aml_name("_SUN"))
+));
+aml_append(dev, method);
+}
+
 if (pc->class_id == PCI_CLASS_DISPLAY_VGA) {
 /* add VGA specific AML methods */
 int s3d;
@@ -446,9 +463,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, 
PCIBus *bus,
 aml_append(method, aml_return(aml_int(s3d)));
 aml_append(dev, method);
 } else if (hotplug_enabled_dev) {
-/* add _SUN/_EJ0 to make slot hotpluggable  */
-aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
-
+/* add _EJ0 to make slot hotpluggable  */
 method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
 aml_append(method,
 aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
@@ -511,6 +526,88 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 qobject_unref(bsel);
 }
 
+Aml *aml_pci_device_dsm(void)
+{
+Aml *method, *UUID, *ifctx, *ifctx1, *ifctx2, *ifctx3, *elsectx;
+Aml *acpi_index = aml_local(0);
+Aml *zero = aml_int(0);
+Aml *bnum = aml_arg(4);
+Aml *func = aml_arg(2);
+Aml *rev = aml_arg(1);
+Aml *sun = aml_arg(5);
+
+method = aml_method("PDSM", 6, AML_SERIALIZED);
+
+/*
+ * PCI Firmware Specification 3.1
+ * 4.6.  _DSM Definitions for PCI
+ */
+UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
+ifctx = aml_if(aml_equal(aml_arg(0), UUID));
+{
+aml_append(ifctx, aml_store(aml_call2("AIDX", bnum, sun), acpi_index));
+ifctx1 = aml_if(aml_equal(func, zero));
+{
+uint8_t byte_list[1];
+
+ifctx2 = aml_if(aml_equal(rev, aml_int(2)));
+{
+/*
+ * advertise function 7 if device has acpi-index
+ * acpi_index values:
+ *0: not present (default value)
+ * : not supported (old QEMU without PIDX reg)
+ *other: device's 

[PULL v2 17/19] acpi: Move maximum size logic into acpi_add_rom_blob()

2021-03-22 Thread Michael S. Tsirkin
From: David Hildenbrand 

We want to have safety margins for all tables based on the table type.
Let's move the maximum size logic into acpi_add_rom_blob() and make it
dependent on the table name, so we don't have to replicate for each and
every instance that creates such tables.

Suggested-by: Laszlo Ersek 
Cc: Alistair Francis 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Peter Maydell 
Cc: Shannon Zhao 
Cc: Marcel Apfelbaum 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Laszlo Ersek 
Signed-off-by: David Hildenbrand 
Message-Id: <20210304105554.121674-4-da...@redhat.com>
Reviewed-by: Laszlo Ersek 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/aml-build.h |  4 
 include/hw/acpi/utils.h |  3 +--
 hw/acpi/utils.c | 12 ++--
 hw/arm/virt-acpi-build.c| 13 ++---
 hw/i386/acpi-build.c|  8 +++-
 hw/i386/acpi-microvm.c  | 16 ++--
 6 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index ca781f3531..471266d739 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -4,10 +4,6 @@
 #include "hw/acpi/acpi-defs.h"
 #include "hw/acpi/bios-linker-loader.h"
 
-/* Reserve RAM space for tables: add another order of magnitude. */
-#define ACPI_BUILD_TABLE_MAX_SIZE 0x20
-#define ACPI_BUILD_LOADER_MAX_SIZE0x1
-
 #define ACPI_BUILD_APPNAME6 "BOCHS "
 #define ACPI_BUILD_APPNAME8 "BXPC"
 
diff --git a/include/hw/acpi/utils.h b/include/hw/acpi/utils.h
index 140b4de603..0022df027d 100644
--- a/include/hw/acpi/utils.h
+++ b/include/hw/acpi/utils.h
@@ -4,6 +4,5 @@
 #include "hw/nvram/fw_cfg.h"
 
 MemoryRegion *acpi_add_rom_blob(FWCfgCallback update, void *opaque,
-GArray *blob, const char *name,
-uint64_t max_size);
+GArray *blob, const char *name);
 #endif
diff --git a/hw/acpi/utils.c b/hw/acpi/utils.c
index a134a4d554..f2d69a6d92 100644
--- a/hw/acpi/utils.c
+++ b/hw/acpi/utils.c
@@ -27,9 +27,17 @@
 #include "hw/loader.h"
 
 MemoryRegion *acpi_add_rom_blob(FWCfgCallback update, void *opaque,
-GArray *blob, const char *name,
-uint64_t max_size)
+GArray *blob, const char *name)
 {
+uint64_t max_size = 0;
+
+/* Reserve RAM space for tables: add another order of magnitude. */
+if (!strcmp(name, ACPI_BUILD_TABLE_FILE)) {
+max_size = 0x20;
+} else if (!strcmp(name, ACPI_BUILD_LOADER_FILE)) {
+max_size = 0x1;
+}
+
 return rom_add_blob(name, blob->data, acpi_data_len(blob), max_size, -1,
 name, update, opaque, NULL, true);
 }
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a91550de6f..f5a2b2d4cb 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -859,14 +859,13 @@ void virt_acpi_setup(VirtMachineState *vms)
 /* Now expose it all to Guest */
 build_state->table_mr = acpi_add_rom_blob(virt_acpi_build_update,
   build_state, tables.table_data,
-  ACPI_BUILD_TABLE_FILE,
-  ACPI_BUILD_TABLE_MAX_SIZE);
+  ACPI_BUILD_TABLE_FILE);
 assert(build_state->table_mr != NULL);
 
-build_state->linker_mr =
-acpi_add_rom_blob(virt_acpi_build_update, build_state,
-  tables.linker->cmd_blob, ACPI_BUILD_LOADER_FILE,
-  ACPI_BUILD_LOADER_MAX_SIZE);
+build_state->linker_mr = acpi_add_rom_blob(virt_acpi_build_update,
+   build_state,
+   tables.linker->cmd_blob,
+   ACPI_BUILD_LOADER_FILE);
 
 fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
 acpi_data_len(tables.tcpalog));
@@ -880,7 +879,7 @@ void virt_acpi_setup(VirtMachineState *vms)
 
 build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
  build_state, tables.rsdp,
- ACPI_BUILD_RSDP_FILE, 0);
+ ACPI_BUILD_RSDP_FILE);
 
 qemu_register_reset(virt_acpi_build_reset, build_state);
 virt_acpi_build_reset(build_state);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index dc56006353..3aeae15e57 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2628,14 +2628,12 @@ void acpi_setup(void)
 /* Now expose it all to Guest */
 build_state->table_mr = acpi_add_rom_blob(acpi_build_update,
  

[PULL v2 06/19] vhost-user: Introduce nested event loop in vhost_user_read()

2021-03-22 Thread Michael S. Tsirkin
From: Greg Kurz 

A deadlock condition potentially exists if a vhost-user process needs
to request something to QEMU on the slave channel while processing a
vhost-user message.

This doesn't seem to affect any vhost-user implementation so far, but
this is currently biting the upcoming enablement of DAX with virtio-fs.
The issue is being observed when the guest does an emergency reboot while
a mapping still exists in the DAX window, which is very easy to get with
a busy enough workload (e.g. as simulated by blogbench [1]) :

- QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd.

- In order to complete the request, virtiofsd then asks QEMU to remove
  the mapping on the slave channel.

All these dialogs are synchronous, hence the deadlock.

As pointed out by Stefan Hajnoczi:

When QEMU's vhost-user master implementation sends a vhost-user protocol
message, vhost_user_read() does a "blocking" read during which slave_fd
is not monitored by QEMU.

The natural solution for this issue is an event loop. The main event
loop cannot be nested though since we have no guarantees that its
fd handlers are prepared for re-entrancy.

Introduce a new event loop that only monitors the chardev I/O for now
in vhost_user_read() and push the actual reading to a one-shot handler.
A subsequent patch will teach the loop to monitor and process messages
from the slave channel as well.

[1] https://github.com/jedisct1/Blogbench

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Greg Kurz 
Message-Id: <20210312092212.782255-6-gr...@kaod.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 hw/virtio/vhost-user.c | 65 ++
 1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 3c1e1611b0..00256fa318 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -296,15 +296,27 @@ static int vhost_user_read_header(struct vhost_dev *dev, 
VhostUserMsg *msg)
 return 0;
 }
 
-static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
+struct vhost_user_read_cb_data {
+struct vhost_dev *dev;
+VhostUserMsg *msg;
+GMainLoop *loop;
+int ret;
+};
+
+static gboolean vhost_user_read_cb(GIOChannel *source, GIOCondition condition,
+   gpointer opaque)
 {
+struct vhost_user_read_cb_data *data = opaque;
+struct vhost_dev *dev = data->dev;
+VhostUserMsg *msg = data->msg;
 struct vhost_user *u = dev->opaque;
 CharBackend *chr = u->user->chr;
 uint8_t *p = (uint8_t *) msg;
 int r, size;
 
 if (vhost_user_read_header(dev, msg) < 0) {
-return -1;
+data->ret = -1;
+goto end;
 }
 
 /* validate message size is sane */
@@ -312,7 +324,8 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
 error_report("Failed to read msg header."
 " Size %d exceeds the maximum %zu.", msg->hdr.size,
 VHOST_USER_PAYLOAD_SIZE);
-return -1;
+data->ret = -1;
+goto end;
 }
 
 if (msg->hdr.size) {
@@ -322,11 +335,53 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
 if (r != size) {
 error_report("Failed to read msg payload."
  " Read %d instead of %d.", r, msg->hdr.size);
-return -1;
+data->ret = -1;
+goto end;
 }
 }
 
-return 0;
+end:
+g_main_loop_quit(data->loop);
+return G_SOURCE_REMOVE;
+}
+
+static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
+{
+struct vhost_user *u = dev->opaque;
+CharBackend *chr = u->user->chr;
+GMainContext *prev_ctxt = chr->chr->gcontext;
+GMainContext *ctxt = g_main_context_new();
+GMainLoop *loop = g_main_loop_new(ctxt, FALSE);
+struct vhost_user_read_cb_data data = {
+.dev = dev,
+.loop = loop,
+.msg = msg,
+.ret = 0
+};
+
+/*
+ * We want to be able to monitor the slave channel fd while waiting
+ * for chr I/O. This requires an event loop, but we can't nest the
+ * one to which chr is currently attached : its fd handlers might not
+ * be prepared for re-entrancy. So we create a new one and switch chr
+ * to use it.
+ */
+qemu_chr_be_update_read_handlers(chr->chr, ctxt);
+qemu_chr_fe_add_watch(chr, G_IO_IN | G_IO_HUP, vhost_user_read_cb, &data);
+
+g_main_loop_run(loop);
+
+/*
+ * Restore the previous event loop context. This also destroys/recreates
+ * event sources : this guarantees that all pending events in the original
+ * context that have been processed by the nested loop are purged.
+ */
+qemu_chr_be_update_read_handlers(chr->chr, prev_ctxt);
+
+g_main_loop_unref(loop);
+g_main_context_unref(ctxt);
+
+return data.ret;
 }
 
 static int process_message_reply(struct vhost_dev *dev,
-- 
MST
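
The general idea (keep servicing a side channel while blocked waiting for a reply) can be illustrated without GLib. The sketch below is not the approach taken by the patch, which uses a nested GMainLoop on a fresh GMainContext; it is a plain poll()-based analogue, and wait_for_reply() together with its one-byte "messages" is made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait for a one-byte reply on reply_fd while also servicing slave_fd.
 * Each byte arriving on slave_fd stands for a request from the peer; it
 * is consumed and counted in *slave_handled. Returns 0 once the reply
 * has been read, -1 on error or hangup.
 */
static int wait_for_reply(int reply_fd, int slave_fd,
                          unsigned char *reply, int *slave_handled)
{
    struct pollfd fds[2] = {
        { .fd = reply_fd, .events = POLLIN },
        { .fd = slave_fd, .events = POLLIN },
    };

    assert(reply != NULL && slave_handled != NULL);

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }

        /* Serve the side channel first so the peer can make progress. */
        if (fds[1].revents & POLLIN) {
            unsigned char req;

            if (read(slave_fd, &req, 1) != 1) {
                return -1;
            }
            (*slave_handled)++;
            continue;
        }

        if (fds[0].revents & POLLIN) {
            return read(reply_fd, reply, 1) == 1 ? 0 : -1;
        }

        /* Neither fd is readable: treat hangup/error as failure. */
        if ((fds[0].revents | fds[1].revents) &
            (POLLHUP | POLLERR | POLLNVAL)) {
            return -1;
        }
    }
}
```

A purely blocking read on reply_fd would never look at slave_fd, which is the deadlock described in the commit message: the peer waits for its request to be handled before it sends the reply.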




[PULL v2 11/19] pci: acpi: ensure that acpi-index is unique

2021-03-22 Thread Michael S. Tsirkin
From: Igor Mammedov 

It helps to avoid device naming conflicts when the guest OS is
configured to use acpi-index for naming.
The spec also says so:

PCI Firmware Specification Revision 3.2
4.6.7.  _DSM for Naming a PCI or PCI Express Device Under Operating Systems
"
Instance number must be unique under \_SB scope. This instance number does not 
have to
be sequential in a given system configuration.
"

Signed-off-by: Igor Mammedov 
Message-Id: <20210315180102.3008391-4-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/pcihp.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index ceab287bd3..f4cb3c979d 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -52,6 +52,21 @@ typedef struct AcpiPciHpFind {
 PCIBus *bus;
 } AcpiPciHpFind;
 
+static gint g_cmp_uint32(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+return a - b;
+}
+
+static GSequence *pci_acpi_index_list(void)
+{
+static GSequence *used_acpi_index_list;
+
+if (!used_acpi_index_list) {
+used_acpi_index_list = g_sequence_new(NULL);
+}
+return used_acpi_index_list;
+}
+
 static int acpi_pcihp_get_bsel(PCIBus *bus)
 {
 Error *local_err = NULL;
@@ -277,6 +292,23 @@ void acpi_pcihp_device_pre_plug_cb(HotplugHandler 
*hotplug_dev,
ONBOARD_INDEX_MAX);
 return;
 }
+
+/*
+ * make sure that acpi-index is unique across all present PCI devices
+ */
+if (pdev->acpi_index) {
+GSequence *used_indexes = pci_acpi_index_list();
+
+if (g_sequence_lookup(used_indexes, GINT_TO_POINTER(pdev->acpi_index),
+  g_cmp_uint32, NULL)) {
+error_setg(errp, "a PCI device with acpi-index = %" PRIu32
+   " already exist", pdev->acpi_index);
+return;
+}
+g_sequence_insert_sorted(used_indexes,
+ GINT_TO_POINTER(pdev->acpi_index),
+ g_cmp_uint32, NULL);
+}
 }
 
 void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
@@ -315,8 +347,22 @@ void acpi_pcihp_device_plug_cb(HotplugHandler 
*hotplug_dev, AcpiPciHpState *s,
 void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState 
*s,
  DeviceState *dev, Error **errp)
 {
+PCIDevice *pdev = PCI_DEVICE(dev);
+
 trace_acpi_pci_unplug(PCI_SLOT(PCI_DEVICE(dev)->devfn),
    acpi_pcihp_get_bsel(pci_get_bus(PCI_DEVICE(dev))));
+
+/*
+ * clean up acpi-index so it could reused by another device
+ */
+if (pdev->acpi_index) {
+GSequence *used_indexes = pci_acpi_index_list();
+
+g_sequence_remove(g_sequence_lookup(used_indexes,
+  GINT_TO_POINTER(pdev->acpi_index),
+  g_cmp_uint32, NULL));
+}
+
 qdev_unrealize(dev);
 }
 
-- 
MST
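
The claim/release bookkeeping described above can be sketched in plain C as follows. This is an illustration only: the patch keeps the used indexes in a GSequence, whereas the sketch uses a small sorted array, and struct index_set, index_claim(), index_release() and MAX_INDEXES are hypothetical names. Values are compared directly rather than subtracted, so the result is well defined for the full uint32_t range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INDEXES 64  /* hypothetical capacity for the sketch */

struct index_set {
    uint32_t vals[MAX_INDEXES]; /* kept sorted in ascending order */
    size_t len;
};

/* Binary search: position of the first element >= idx. */
static size_t lower_bound(const struct index_set *s, uint32_t idx)
{
    size_t lo = 0, hi = s->len;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (s->vals[mid] < idx) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Claim idx. Returns 0 on success, -1 if it is in use or the set is full. */
static int index_claim(struct index_set *s, uint32_t idx)
{
    size_t pos = lower_bound(s, idx);

    assert(s->len <= MAX_INDEXES);
    if ((pos < s->len && s->vals[pos] == idx) || s->len == MAX_INDEXES) {
        return -1;
    }
    memmove(&s->vals[pos + 1], &s->vals[pos],
            (s->len - pos) * sizeof(s->vals[0]));
    s->vals[pos] = idx;
    s->len++;
    return 0;
}

/* Release idx so another device can reuse it. Returns -1 if not found. */
static int index_release(struct index_set *s, uint32_t idx)
{
    size_t pos = lower_bound(s, idx);

    if (pos == s->len || s->vals[pos] != idx) {
        return -1;
    }
    memmove(&s->vals[pos], &s->vals[pos + 1],
            (s->len - pos - 1) * sizeof(s->vals[0]));
    s->len--;
    return 0;
}
```

Claiming corresponds to the pre-plug check and releasing to the unplug callback: a second device with the same acpi-index is rejected until the first one is unplugged.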




[PULL v2 04/19] vhost-user: Factor out duplicated slave_fd teardown code

2021-03-22 Thread Michael S. Tsirkin
From: Greg Kurz 

Signed-off-by: Greg Kurz 
Message-Id: <20210312092212.782255-4-gr...@kaod.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 hw/virtio/vhost-user.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index acde1d2936..cb0c98f30a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1392,6 +1392,13 @@ static int 
vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
 return 0;
 }
 
+static void close_slave_channel(struct vhost_user *u)
+{
+qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
+close(u->slave_fd);
+u->slave_fd = -1;
+}
+
 static void slave_read(void *opaque)
 {
 struct vhost_dev *dev = opaque;
@@ -1507,9 +1514,7 @@ static void slave_read(void *opaque)
 goto fdcleanup;
 
 err:
-qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
-close(u->slave_fd);
-u->slave_fd = -1;
+close_slave_channel(u);
 
 fdcleanup:
 for (i = 0; i < fdsize; i++) {
@@ -1560,9 +1565,7 @@ static int vhost_setup_slave_channel(struct vhost_dev 
*dev)
 out:
 close(sv[1]);
 if (ret) {
-qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
-close(u->slave_fd);
-u->slave_fd = -1;
+close_slave_channel(u);
 }
 
 return ret;
@@ -1915,9 +1918,7 @@ static int vhost_user_backend_cleanup(struct vhost_dev 
*dev)
 u->postcopy_fd.handler = NULL;
 }
 if (u->slave_fd >= 0) {
-qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
-close(u->slave_fd);
-u->slave_fd = -1;
+close_slave_channel(u);
 }
 g_free(u->region_rb);
 u->region_rb = NULL;
-- 
MST




[PULL v2 08/19] virtio-pmem: fix virtio_pmem_resp assign problem

2021-03-22 Thread Michael S. Tsirkin
From: Wang Liang 

ret in virtio_pmem_resp is a uint32_t variable, which should be assigned
using virtio_stl_p.

The kernel side driver does not guarantee virtio_pmem_resp to be initialized
to zero in advance, so sometimes the flush operation will fail.

Signed-off-by: Wang Liang 
Message-Id: <20210317024145.271212-1-wanglian...@126.com>
Reviewed-by: Stefano Garzarella 
Reviewed-by: David Hildenbrand 
Reviewed-by: Pankaj Gupta 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index a3e0688a89..d1aeb90a31 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -47,7 +47,7 @@ static int worker_cb(void *opaque)
 err = 1;
 }
 
-virtio_stw_p(req_data->vdev, &req_data->resp.ret, err);
+virtio_stl_p(req_data->vdev, &req_data->resp.ret, err);
 
 return 0;
 }
-- 
MST
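
Why the store width matters can be shown with a small sketch. It assumes a little-endian layout for simplicity (the real virtio_stw_p()/virtio_stl_p() helpers pick the byte order based on the device), and store_le16(), store_le32() and load_le32() are stand-ins written for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value at p in little-endian byte order (2 bytes). */
static void store_le16(void *p, uint16_t val)
{
    uint8_t b[2] = { (uint8_t)(val & 0xff), (uint8_t)(val >> 8) };

    assert(p != NULL);
    memcpy(p, b, sizeof(b));
}

/* Store a 32-bit value at p in little-endian byte order (4 bytes). */
static void store_le32(void *p, uint32_t val)
{
    uint8_t b[4] = {
        (uint8_t)(val & 0xff), (uint8_t)((val >> 8) & 0xff),
        (uint8_t)((val >> 16) & 0xff), (uint8_t)(val >> 24),
    };

    assert(p != NULL);
    memcpy(p, b, sizeof(b));
}

/* Read back a 32-bit little-endian value, as the driver would. */
static uint32_t load_le32(const void *p)
{
    uint8_t b[4];

    memcpy(b, p, sizeof(b));
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```

If the response buffer is not zeroed beforehand, a 16-bit store of 0 leaves the upper two bytes untouched, so a 32-bit read sees a non-zero value and the flush is reported as failed. A 32-bit store overwrites the whole field.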




[PULL v2 14/19] tests: acpi: update expected blobs

2021-03-22 Thread Michael S. Tsirkin
From: Igor Mammedov 

expected changes are:
 * larger BNMR operation region
 * new PIDX field and method to fetch acpi-index
 * PDSM method that implements PCI device _DSM +
   per device _DSM that calls PDSM

@@ -221,10 +221,11 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC", 
0x0001)
 B0EJ,   32
 }

-OperationRegion (BNMR, SystemIO, 0xAE10, 0x04)
+OperationRegion (BNMR, SystemIO, 0xAE10, 0x08)
 Field (BNMR, DWordAcc, NoLock, WriteAsZeros)
 {
-BNUM,   32
+BNUM,   32,
+PIDX,   32
 }

 Mutex (BLCK, 0x00)
@@ -236,6 +237,52 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC", 
0x0001)
 Release (BLCK)
 Return (Zero)
 }
+
+Method (AIDX, 2, NotSerialized)
+{
+Acquire (BLCK, 0x)
+BNUM = Arg0
+PIDX = (One << Arg1)
+Local0 = PIDX /* \_SB_.PCI0.PIDX */
+Release (BLCK)
+Return (Local0)
+}
+
+Method (PDSM, 6, Serialized)
+{
+If ((Arg0 == ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d") /* 
Device Labeling Interface */))
+{
+Local0 = AIDX (Arg4, Arg5)
+If ((Arg2 == Zero))
+{
+If ((Arg1 == 0x02))
+{
+If (!((Local0 == Zero) | (Local0 == 0x)))
+{
+Return (Buffer (One)
+{
+ 0x81  
   // .
+})
+}
+}
+
+Return (Buffer (One)
+{
+ 0x00 // .
+})
+}
+ElseIf ((Arg2 == 0x07))
+{
+Local1 = Package (0x02)
+{
+Zero,
+""
+}
+Local1 [Zero] = Local0
+Return (Local1)
+}
+}
+}
 }

 Scope (_SB)
@@ -785,7 +832,7 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC", 
0x0001)
 0xAE00, // Range Minimum
 0xAE00, // Range Maximum
 0x01,   // Alignment
-0x14,   // Length
+0x18,   // Length
 )
 })
 }
@@ -842,11 +889,22 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC", 
0x0001)
 Device (S00)
 {
 Name (_ADR, Zero)  // _ADR: Address
+Name (_SUN, Zero)  // _SUN: Slot User Number
+Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
+{
+Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
+}
 }

 Device (S10)
 {
 Name (_ADR, 0x0002)  // _ADR: Address
+Name (_SUN, 0x02)  // _SUN: Slot User Number
+Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
+{
+Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
+}
+
 Method (_S1D, 0, NotSerialized)  // _S1D: S1 Device State
 {
 Return (Zero)
[...]

Signed-off-by: Igor Mammedov 
Message-Id: <20210315180102.3008391-7-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/bios-tables-test-allowed-diff.h |  11 ---
 tests/data/acpi/pc/DSDT | Bin 5065 -> 6002 bytes
 tests/data/acpi/pc/DSDT.acpihmat| Bin 6390 -> 7327 bytes
 tests/data/acpi/pc/DSDT.bridge  | Bin 6924 -> 8668 bytes
 tests/data/acpi/pc/DSDT.cphp| Bin 5529 -> 6466 bytes
 tests/data/acpi/pc/DSDT.dimmpxm | Bin 6719 -> 7656 bytes
 tests/data/acpi/pc/DSDT.hpbridge| Bin 5026 -> 5969 bytes
 tests/data/acpi/pc/DSDT.ipmikcs | Bin 5137 -> 6074 bytes
 tests/data/acpi/pc/DSDT.memhp   | Bin 6424 -> 7361 bytes
 tests/data/acpi/pc/DSDT.nohpet  | Bin 4923 -> 5860 bytes
 tests/data/acpi/pc/DSDT.numamem | Bin 5071 -> 6008 bytes
 tests/data/acpi/pc/DSDT.roothp  | Bin 5261 -> 6210 bytes
 12 files changed, 11 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index fddcfc061f..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,12 +1 @@
 /* List of comma-separated changed AML files to ignore */

[PULL v2 02/19] vhost-user: Drop misleading EAGAIN checks in slave_read()

2021-03-22 Thread Michael S. Tsirkin
From: Greg Kurz 

slave_read() checks EAGAIN when reading or writing to the socket
fails. This gives the impression that the slave channel is in
non-blocking mode, which is certainly not the case with the current
code base. And the rest of the code isn't actually ready to cope
with non-blocking I/O.

Just drop the checks everywhere in this function for the sake of
clarity.

Signed-off-by: Greg Kurz 
Message-Id: <20210312092212.782255-2-gr...@kaod.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 hw/virtio/vhost-user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 2fdd5daf74..6af9b43a72 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1420,7 +1420,7 @@ static void slave_read(void *opaque)
 
 do {
 size = recvmsg(u->slave_fd, , 0);
-} while (size < 0 && (errno == EINTR || errno == EAGAIN));
+} while (size < 0 && errno == EINTR);
 
 if (size != VHOST_USER_HDR_SIZE) {
 error_report("Failed to read from slave.");
@@ -1452,7 +1452,7 @@ static void slave_read(void *opaque)
 /* Read payload */
 do {
 size = read(u->slave_fd, , hdr.size);
-} while (size < 0 && (errno == EINTR || errno == EAGAIN));
+} while (size < 0 && errno == EINTR);
 
 if (size != hdr.size) {
 error_report("Failed to read payload from slave.");
@@ -1503,7 +1503,7 @@ static void slave_read(void *opaque)
 
 do {
 size = writev(u->slave_fd, iovec, ARRAY_SIZE(iovec));
-} while (size < 0 && (errno == EINTR || errno == EAGAIN));
+} while (size < 0 && errno == EINTR);
 
 if (size != VHOST_USER_HDR_SIZE + hdr.size) {
 error_report("Failed to send msg reply to slave.");
-- 
MST
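
The loop shape left after this patch, for a blocking file descriptor, can be sketched as below. read_retry_eintr() is a hypothetical helper written for this illustration; the patch itself open-codes the loops around recvmsg(), read() and writev().

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read from a blocking fd, restarting only when a signal interrupted
 * the call. EAGAIN is deliberately not retried: on a blocking fd it is
 * not expected, and on a non-blocking fd retrying would busy-loop.
 */
static ssize_t read_retry_eintr(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

Any other failure is returned to the caller, which in slave_read() ends up in the common error path.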




[PULL v2 07/19] vhost-user: Monitor slave channel in vhost_user_read()

2021-03-22 Thread Michael S. Tsirkin
From: Greg Kurz 

Now that everything is in place, have the nested event loop monitor
the slave channel. The source in the main event loop is destroyed and
recreated to ensure any pending event for the slave channel that was
previously detected is purged. This guarantees that the main loop
won't invoke slave_read() based on an event that was already handled
by the nested loop.

Signed-off-by: Greg Kurz 
Message-Id: <20210312092212.782255-7-gr...@kaod.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 hw/virtio/vhost-user.c | 35 ---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00256fa318..ded0c10453 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -345,6 +345,35 @@ end:
 return G_SOURCE_REMOVE;
 }
 
+static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
+   gpointer opaque);
+
+/*
+ * This updates the read handler to use a new event loop context.
+ * Event sources are removed from the previous context : this ensures
+ * that events detected in the previous context are purged. They will
+ * be re-detected and processed in the new context.
+ */
+static void slave_update_read_handler(struct vhost_dev *dev,
+  GMainContext *ctxt)
+{
+struct vhost_user *u = dev->opaque;
+
+if (!u->slave_ioc) {
+return;
+}
+
+if (u->slave_src) {
+g_source_destroy(u->slave_src);
+g_source_unref(u->slave_src);
+}
+
+u->slave_src = qio_channel_add_watch_source(u->slave_ioc,
+G_IO_IN | G_IO_HUP,
+slave_read, dev, NULL,
+ctxt);
+}
+
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
 struct vhost_user *u = dev->opaque;
@@ -366,6 +395,7 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
  * be prepared for re-entrancy. So we create a new one and switch chr
  * to use it.
  */
+slave_update_read_handler(dev, ctxt);
 qemu_chr_be_update_read_handlers(chr->chr, ctxt);
 qemu_chr_fe_add_watch(chr, G_IO_IN | G_IO_HUP, vhost_user_read_cb, &data);
 
@@ -377,6 +407,7 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
  * context that have been processed by the nested loop are purged.
  */
 qemu_chr_be_update_read_handlers(chr->chr, prev_ctxt);
+slave_update_read_handler(dev, NULL);
 
 g_main_loop_unref(loop);
 g_main_context_unref(ctxt);
@@ -1580,9 +1611,7 @@ static int vhost_setup_slave_channel(struct vhost_dev 
*dev)
 return -1;
 }
 u->slave_ioc = ioc;
-u->slave_src = qio_channel_add_watch_source(u->slave_ioc,
-G_IO_IN | G_IO_HUP,
-slave_read, dev, NULL, NULL);
+slave_update_read_handler(dev, NULL);
 
 if (reply_supported) {
 msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
-- 
MST




[PULL v2 12/19] acpi: add aml_to_decimalstring() and aml_call6() helpers

2021-03-22 Thread Michael S. Tsirkin
From: Igor Mammedov 

These helpers will be used by follow-up patches.

Signed-off-by: Igor Mammedov 
Message-Id: <20210315180102.3008391-5-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/aml-build.h |  3 +++
 hw/acpi/aml-build.c | 28 
 2 files changed, 31 insertions(+)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 380d3e3924..e652106e26 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -301,6 +301,7 @@ Aml *aml_arg(int pos);
 Aml *aml_to_integer(Aml *arg);
 Aml *aml_to_hexstring(Aml *src, Aml *dst);
 Aml *aml_to_buffer(Aml *src, Aml *dst);
+Aml *aml_to_decimalstring(Aml *src, Aml *dst);
 Aml *aml_store(Aml *val, Aml *target);
 Aml *aml_and(Aml *arg1, Aml *arg2, Aml *dst);
 Aml *aml_or(Aml *arg1, Aml *arg2, Aml *dst);
@@ -323,6 +324,8 @@ Aml *aml_call3(const char *method, Aml *arg1, Aml *arg2, 
Aml *arg3);
 Aml *aml_call4(const char *method, Aml *arg1, Aml *arg2, Aml *arg3, Aml *arg4);
 Aml *aml_call5(const char *method, Aml *arg1, Aml *arg2, Aml *arg3, Aml *arg4,
Aml *arg5);
+Aml *aml_call6(const char *method, Aml *arg1, Aml *arg2, Aml *arg3, Aml *arg4,
+   Aml *arg5, Aml *arg6);
 Aml *aml_gpio_int(AmlConsumerAndProducer con_and_pro,
   AmlLevelAndEdge edge_level,
   AmlActiveHighAndLow active_level, AmlShared shared,
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a2cd7a5830..d33ce8954a 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -634,6 +634,19 @@ Aml *aml_to_buffer(Aml *src, Aml *dst)
 return var;
 }
 
+/* ACPI 2.0a: 17.2.4.4 Type 2 Opcodes Encoding: DefToDecimalString */
+Aml *aml_to_decimalstring(Aml *src, Aml *dst)
+{
+Aml *var = aml_opcode(0x97 /* ToDecimalStringOp */);
+aml_append(var, src);
+if (dst) {
+aml_append(var, dst);
+} else {
+build_append_byte(var->buf, 0x00 /* NullNameOp */);
+}
+return var;
+}
+
 /* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefStore */
 Aml *aml_store(Aml *val, Aml *target)
 {
@@ -835,6 +848,21 @@ Aml *aml_call5(const char *method, Aml *arg1, Aml *arg2, 
Aml *arg3, Aml *arg4,
 return var;
 }
 
+/* helper to call method with 6 arguments */
+Aml *aml_call6(const char *method, Aml *arg1, Aml *arg2, Aml *arg3, Aml *arg4,
+   Aml *arg5, Aml *arg6)
+{
+Aml *var = aml_alloc();
+build_append_namestring(var->buf, "%s", method);
+aml_append(var, arg1);
+aml_append(var, arg2);
+aml_append(var, arg3);
+aml_append(var, arg4);
+aml_append(var, arg5);
+aml_append(var, arg6);
+return var;
+}
+
 /*
  * ACPI 5.0: 6.4.3.8.1 GPIO Connection Descriptor
  * Type 1, Large Item Name 0xC
-- 
MST
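
Editor's note: the byte layout produced by aml_to_decimalstring() above
can be sketched without QEMU's Aml type. Everything below (struct
byte_buf, buf_put(), encode_to_decimal_string()) is a hypothetical
stand-in for illustration only. The two opcode values are the ones
quoted in the patch: 0x97 for ToDecimalStringOp and 0x00 for NullName.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    TO_DECIMAL_STRING_OP = 0x97, /* value taken from the patch */
    NULL_NAME_OP = 0x00          /* value taken from the patch */
};

/* Hypothetical flat buffer standing in for QEMU's Aml/GArray. */
struct byte_buf {
    uint8_t data[64];
    size_t len;
};

/* Append n bytes; returns 0 on success, -1 if the buffer is full. */
static int buf_put(struct byte_buf *b, const uint8_t *src, size_t n)
{
    if (n > sizeof(b->data) - b->len) {
        return -1;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/*
 * Emit: opcode, already-encoded operand, then either the encoded
 * target or a NullName byte when no target is given.
 */
static int encode_to_decimal_string(struct byte_buf *out,
                                    const uint8_t *src, size_t src_len,
                                    const uint8_t *dst, size_t dst_len)
{
    const uint8_t op = TO_DECIMAL_STRING_OP;
    const uint8_t null_name = NULL_NAME_OP;

    if (buf_put(out, &op, 1) || buf_put(out, src, src_len)) {
        return -1;
    }
    if (dst) {
        return buf_put(out, dst, dst_len);
    }
    return buf_put(out, &null_name, 1);
}
```

The optional target mirrors the "if (dst) ... else ..." branch in the
patch: AML always encodes a target slot, so an absent one is spelled
out as NullName rather than omitted.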




[PULL v2 09/19] tests: acpi: temporary whitelist DSDT changes

2021-03-22 Thread Michael S. Tsirkin
From: Igor Mammedov 

Signed-off-by: Igor Mammedov 
Message-Id: <20210315180102.3008391-2-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..fddcfc061f 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,12 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/DSDT",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/pc/DSDT.bridge",
+"tests/data/acpi/pc/DSDT.cphp",
+"tests/data/acpi/pc/DSDT.dimmpxm",
+"tests/data/acpi/pc/DSDT.hpbridge",
+"tests/data/acpi/pc/DSDT.hpbrroot",
+"tests/data/acpi/pc/DSDT.ipmikcs",
+"tests/data/acpi/pc/DSDT.memhp",
+"tests/data/acpi/pc/DSDT.numamem",
+"tests/data/acpi/pc/DSDT.roothp",
-- 
MST




[PULL v2 01/19] virtio: Fix virtio_mmio_read()/virtio_mmio_write()

2021-03-22 Thread Michael S. Tsirkin
From: Laurent Vivier 

Neither function checks the personality of the interface (legacy or
modern) before accessing the configuration memory; both always use
virtio_config_readX()/virtio_config_writeX().

With this patch, they now check the personality and in legacy mode
call virtio_config_readX()/virtio_config_writeX(), otherwise call
virtio_config_modern_readX()/virtio_config_modern_writeX().

This change has been tested with virtio-mmio guests (virt stretch/armhf and
virt sid/m68k) and virtio-pci guests (pseries RHEL-7.3/ppc64 and /ppc64le).

Signed-off-by: Laurent Vivier 
Message-Id: <20210314200300.3259170-1-laur...@vivier.eu>
Reviewed-by: Stefano Garzarella 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-mmio.c | 74 +
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 6990b9879c..342c918ea7 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -112,15 +112,28 @@ static uint64_t virtio_mmio_read(void *opaque, hwaddr 
offset, unsigned size)
 
 if (offset >= VIRTIO_MMIO_CONFIG) {
 offset -= VIRTIO_MMIO_CONFIG;
-switch (size) {
-case 1:
-return virtio_config_readb(vdev, offset);
-case 2:
-return virtio_config_readw(vdev, offset);
-case 4:
-return virtio_config_readl(vdev, offset);
-default:
-abort();
+if (proxy->legacy) {
+switch (size) {
+case 1:
+return virtio_config_readb(vdev, offset);
+case 2:
+return virtio_config_readw(vdev, offset);
+case 4:
+return virtio_config_readl(vdev, offset);
+default:
+abort();
+}
+} else {
+switch (size) {
+case 1:
+return virtio_config_modern_readb(vdev, offset);
+case 2:
+return virtio_config_modern_readw(vdev, offset);
+case 4:
+return virtio_config_modern_readl(vdev, offset);
+default:
+abort();
+}
 }
 }
 if (size != 4) {
@@ -245,20 +258,37 @@ static void virtio_mmio_write(void *opaque, hwaddr 
offset, uint64_t value,
 
 if (offset >= VIRTIO_MMIO_CONFIG) {
 offset -= VIRTIO_MMIO_CONFIG;
-switch (size) {
-case 1:
-virtio_config_writeb(vdev, offset, value);
-break;
-case 2:
-virtio_config_writew(vdev, offset, value);
-break;
-case 4:
-virtio_config_writel(vdev, offset, value);
-break;
-default:
-abort();
+if (proxy->legacy) {
+switch (size) {
+case 1:
+virtio_config_writeb(vdev, offset, value);
+break;
+case 2:
+virtio_config_writew(vdev, offset, value);
+break;
+case 4:
+virtio_config_writel(vdev, offset, value);
+break;
+default:
+abort();
+}
+return;
+} else {
+switch (size) {
+case 1:
+virtio_config_modern_writeb(vdev, offset, value);
+break;
+case 2:
+virtio_config_modern_writew(vdev, offset, value);
+break;
+case 4:
+virtio_config_modern_writel(vdev, offset, value);
+break;
+default:
+abort();
+}
+return;
 }
-return;
 }
 if (size != 4) {
 qemu_log_mask(LOG_GUEST_ERROR,
-- 
MST
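
Editor's note: a rough sketch of why the personality check matters,
assuming the usual virtio convention that legacy config fields are in
guest-native byte order while modern (virtio 1.0) fields are always
little-endian. The function names are hypothetical, and the host's byte
order stands in for the guest's in the legacy accessor.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Modern-style accessor: the field is little-endian on every host. */
static uint16_t config_modern_readw(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/*
 * Legacy-style accessor: native byte order. Assumption: the host's
 * order is used here as a stand-in for the guest's.
 */
static uint16_t config_legacy_readw(const uint8_t *cfg, size_t off)
{
    uint16_t v;

    memcpy(&v, cfg + off, sizeof(v));
    return v;
}

/* Dispatch on the interface personality, as the patch does. */
static uint16_t config_readw(bool legacy, const uint8_t *cfg, size_t off)
{
    return legacy ? config_legacy_readw(cfg, off)
                  : config_modern_readw(cfg, off);
}
```

On a little-endian guest both accessors agree, which may be why the
missing check went unnoticed; on a big-endian guest such as m68k they
return byte-swapped values.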




[PULL v2 05/19] vhost-user: Convert slave channel to QIOChannelSocket

2021-03-22 Thread Michael S. Tsirkin
From: Greg Kurz 

The slave channel is implemented with socketpair(): QEMU creates
the pair, passes one of the sockets to virtiofsd and monitors the
other one with the main event loop using qemu_set_fd_handler().

In order to fix a potential deadlock between QEMU and a vhost-user
external process (e.g. virtiofsd with DAX), we want to be able to
monitor and service the slave channel while handling vhost-user
requests.

Prepare ground for this by converting the slave channel to be a
QIOChannelSocket. This will make monitoring of the slave channel
as simple as calling qio_channel_add_watch_source(). Since the
connection is already established between the two sockets, only
incoming I/O (G_IO_IN) and disconnect (G_IO_HUP) need to be
serviced.

This also allows us to get rid of the ancillary data parsing since
QIOChannelSocket can do this for us. Note that the MSG_CTRUNC
check is dropped on the way because QIOChannelSocket ignores this
case. This isn't a problem since slave_read() provisions space for
8 file descriptors, but affected vhost-user slave protocol messages
generally only convey one. If for some reason a buggy implementation
passes more file descriptors, there is no need to break the connection,
just like we don't break it if some other type of ancillary data is
received: this doesn't explicitly violate the protocol per se, so
it seems better to ignore it.

The current code errors out on short reads and writes. Use the
qio_channel_*_all() variants to address this on the way.

Signed-off-by: Greg Kurz 
Message-Id: <20210312092212.782255-5-gr...@kaod.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 hw/virtio/vhost-user.c | 99 +-
 1 file changed, 39 insertions(+), 60 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index cb0c98f30a..3c1e1611b0 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -16,6 +16,7 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
+#include "io/channel-socket.h"
 #include "sysemu/kvm.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
@@ -237,7 +238,8 @@ struct vhost_user {
 struct vhost_dev *dev;
 /* Shared between vhost devs of the same virtio device */
 VhostUserState *user;
-int slave_fd;
+QIOChannel *slave_ioc;
+GSource *slave_src;
 NotifierWithReturn postcopy_notifier;
 struct PostCopyFD  postcopy_fd;
 uint64_t   postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
@@ -1394,61 +1396,37 @@ static int 
vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
 
 static void close_slave_channel(struct vhost_user *u)
 {
-qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
-close(u->slave_fd);
-u->slave_fd = -1;
+g_source_destroy(u->slave_src);
+g_source_unref(u->slave_src);
+u->slave_src = NULL;
+object_unref(OBJECT(u->slave_ioc));
+u->slave_ioc = NULL;
 }
 
-static void slave_read(void *opaque)
+static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
+   gpointer opaque)
 {
 struct vhost_dev *dev = opaque;
 struct vhost_user *u = dev->opaque;
 VhostUserHeader hdr = { 0, };
 VhostUserPayload payload = { 0, };
-int size, ret = 0;
+Error *local_err = NULL;
+gboolean rc = G_SOURCE_CONTINUE;
+int ret = 0;
 struct iovec iov;
-struct msghdr msgh;
-int fd[VHOST_USER_SLAVE_MAX_FDS];
-char control[CMSG_SPACE(sizeof(fd))];
-struct cmsghdr *cmsg;
-int i, fdsize = 0;
-
-memset(&msgh, 0, sizeof(msgh));
-msgh.msg_iov = &iov;
-msgh.msg_iovlen = 1;
-msgh.msg_control = control;
-msgh.msg_controllen = sizeof(control);
-
-memset(fd, -1, sizeof(fd));
+g_autofree int *fd = NULL;
+size_t fdsize = 0;
+int i;
 
 /* Read header */
 iov.iov_base = &hdr;
 iov.iov_len = VHOST_USER_HDR_SIZE;
 
-do {
-size = recvmsg(u->slave_fd, &msgh, 0);
-} while (size < 0 && errno == EINTR);
-
-if (size != VHOST_USER_HDR_SIZE) {
-error_report("Failed to read from slave.");
+if (qio_channel_readv_full_all(ioc, &iov, 1, &fd, &fdsize, &local_err)) {
+error_report_err(local_err);
 goto err;
 }
 
-if (msgh.msg_flags & MSG_CTRUNC) {
-error_report("Truncated message.");
-goto err;
-}
-
-for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
- cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
-if (cmsg->cmsg_level == SOL_SOCKET &&
-cmsg->cmsg_type == SCM_RIGHTS) {
-fdsize = cmsg->cmsg_len - CMSG_LEN(0);
-memcpy(fd, CMSG_DATA(cmsg), fdsize);
-break;
-}
-}
-
 if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
 error_report("Failed to read msg header."
 " Size %d exceeds the maximum %zu.", hdr.size,
@@ -1457,12 +1435,8 @@ static void slave_read(void *opaque)
 }
 
 /* Read payload */
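
Editor's note: the commit message says the old code errors out on short
reads and that the qio_channel_*_all() variants address this. The idea
behind an "all" variant can be sketched in plain POSIX C as below.
read_all() is a hypothetical helper, not the QIOChannel implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep reading until exactly len bytes have
 * arrived. Returns 0 on success, -1 on error or premature EOF.
 */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, try again */
            }
            return -1;          /* real error */
        }
        if (n == 0) {
            return -1;          /* peer closed before sending it all */
        }
        p += n;                 /* short read: advance and loop */
        len -= (size_t)n;
    }
    return 0;
}
```

With such a helper the caller no longer compares a byte count against
the expected size; it only checks for success or failure.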

[PULL v2 10/19] pci: introduce acpi-index property for PCI device

2021-03-22 Thread Michael S. Tsirkin
From: Igor Mammedov 

In the x86/ACPI world, Linux distros have been using predictable
network interface naming since systemd v197. On QEMU-based VMs
this results in a path-based naming scheme that names network
interfaces based on PCI topology.

With it, one has to plug the NIC into exactly the same bus/slot
that was used when the disk image was first provisioned/configured,
or one risks losing the network configuration due to the NIC being
renamed to match the topology actually in use.
That also restricts the freedom to reshape the PCI configuration of
a VM without needing to reconfigure the guest image in use.

systemd also offers "onboard" naming scheme which is
preferred over PCI slot/topology one, provided that
firmware implements:
"
PCI Firmware Specification 3.1
4.6.7.  DSM for Naming a PCI or PCI Express Device Under
Operating Systems
"
that allows assigning a user-defined index to a PCI device,
which systemd will use to name the NIC. For example, using
  -device e1000,acpi-index=100
the guest will rename the NIC to 'eno100', where 'eno' is the default
prefix for the "onboard" naming scheme. This doesn't require
any advance configuration on the guest side to come into effect,
as the 'onboard' scheme takes priority over path-based naming.

The hope is that 'acpi-index' will be easier for the management
layer to consume, compared to forcing a specific PCI topology
and/or having several disk image templates for different
topologies, and that it will help to simplify the process of spawning
VMs from the same template without the need to reconfigure
the guest NIC.

This patch adds the 'acpi-index'* property and wires up
a 32-bit register on top of the PCI hotplug register block
to pass the index value to AML code at runtime.
A following patch will add the corresponding _DSM code and
wire it up to PCI devices described in ACPI.

*) name comes from Linux kernel terminology

Signed-off-by: Igor Mammedov 
Message-Id: <20210315180102.3008391-3-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/pcihp.h |  9 +--
 include/hw/pci/pci.h|  1 +
 hw/acpi/pci.c   |  1 -
 hw/acpi/pcihp.c | 58 +++--
 hw/acpi/piix4.c |  3 ++-
 hw/i386/acpi-build.c| 13 -
 hw/pci/pci.c|  1 +
 hw/acpi/trace-events|  2 ++
 8 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
index dfd375820f..2dd90aea30 100644
--- a/include/hw/acpi/pcihp.h
+++ b/include/hw/acpi/pcihp.h
@@ -46,6 +46,7 @@ typedef struct AcpiPciHpPciStatus {
 typedef struct AcpiPciHpState {
 AcpiPciHpPciStatus acpi_pcihp_pci_status[ACPI_PCIHP_MAX_HOTPLUG_BUS];
 uint32_t hotplug_select;
+uint32_t acpi_index;
 PCIBus *root;
 MemoryRegion io;
 bool legacy_piix;
@@ -71,13 +72,17 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool 
acpihp_root_off);
 
 extern const VMStateDescription vmstate_acpi_pcihp_pci_status;
 
-#define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp) \
+bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id);
+
+#define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp, test_acpi_index) \
 VMSTATE_UINT32_TEST(pcihp.hotplug_select, state, \
 test_pcihp), \
 VMSTATE_STRUCT_ARRAY_TEST(pcihp.acpi_pcihp_pci_status, state, \
   ACPI_PCIHP_MAX_HOTPLUG_BUS, \
   test_pcihp, 1, \
   vmstate_acpi_pcihp_pci_status, \
-  AcpiPciHpPciStatus)
+  AcpiPciHpPciStatus), \
+VMSTATE_UINT32_TEST(pcihp.acpi_index, state, \
+test_acpi_index)
 
 #endif
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 1bc231480f..6be4e0c460 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -359,6 +359,7 @@ struct PCIDevice {
 
 /* ID of standby device in net_failover pair */
 char *failover_pair_id;
+uint32_t acpi_index;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
diff --git a/hw/acpi/pci.c b/hw/acpi/pci.c
index ec455c3b25..75b1103ec4 100644
--- a/hw/acpi/pci.c
+++ b/hw/acpi/pci.c
@@ -59,4 +59,3 @@ void build_mcfg(GArray *table_data, BIOSLinker *linker, 
AcpiMcfgInfo *info,
 build_header(linker, table_data, (void *)(table_data->data + mcfg_start),
  "MCFG", table_data->len - mcfg_start, 1, oem_id, 
oem_table_id);
 }
-
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 9dc4d3e2db..ceab287bd3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -39,12 +39,13 @@
 #include "trace.h"
 
 #define ACPI_PCIHP_ADDR 0xae00
-#define ACPI_PCIHP_SIZE 0x0014
+#define ACPI_PCIHP_SIZE 0x0018
 #define PCI_UP_BASE 0x0000
 #define PCI_DOWN_BASE 0x0004
 #define PCI_EJ_BASE 0x0008
 #define PCI_RMV_BASE 0x000c
 #define PCI_SEL_BASE 0x0010
+#define PCI_AIDX_BASE 0x0014
 
 typedef struct AcpiPciHpFind {
 int bsel;
@@ -251,9 +252,13 @@ void acpi_pcihp_reset(AcpiPciHpState 
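
Editor's note: the pcihp.c hunk above grows the register block from
0x14 to 0x18 bytes and places the new 32-bit acpi-index register at
0x14. A small sketch of that layout follows. The lookup helper
pcihp_reg_name() and the register labels are hypothetical; the offsets
are the ones in the diff, with PCI_UP_BASE assumed to be 0x0000.

```c
#include <stddef.h>
#include <stdint.h>

#define ACPI_PCIHP_SIZE 0x0018
#define PCI_UP_BASE     0x0000 /* assumed; the archived line is truncated */
#define PCI_DOWN_BASE   0x0004
#define PCI_EJ_BASE     0x0008
#define PCI_RMV_BASE    0x000c
#define PCI_SEL_BASE    0x0010
#define PCI_AIDX_BASE   0x0014 /* new register added by the patch */

/*
 * Hypothetical helper: label of the 32-bit register that contains
 * 'offset', or NULL when the offset is outside the block.
 */
static const char *pcihp_reg_name(uint32_t offset)
{
    if (offset >= ACPI_PCIHP_SIZE) {
        return NULL;
    }
    switch (offset & ~UINT32_C(3)) {
    case PCI_UP_BASE:   return "up";
    case PCI_DOWN_BASE: return "down";
    case PCI_EJ_BASE:   return "eject";
    case PCI_RMV_BASE:  return "removable";
    case PCI_SEL_BASE:  return "select";
    case PCI_AIDX_BASE: return "acpi-index";
    }
    return NULL;
}
```

Each register is 4 bytes wide, so the new one starts exactly where the
old block ended, which is why ACPI_PCIHP_SIZE goes up by 4.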

[PULL v2 03/19] vhost-user: Fix double-close on slave_read() error path

2021-03-22 Thread Michael S. Tsirkin
From: Greg Kurz 

Some message types, e.g. VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
can convey file descriptors. These must be closed before returning
from slave_read() to avoid being leaked. This can currently be done
in two different places:

[1] just after the request has been processed

[2] on the error path, under the goto label err:

These paths are supposed to be mutually exclusive, but they actually
are not. If the VHOST_USER_NEED_REPLY_MASK flag was passed and the
sending of the reply fails, both [1] and [2] are performed with the
same descriptor values. This can potentially cause subtle bugs if one
of the descriptors was recycled by some other thread in the meantime.

This code duplication complicates rollback for no real benefit.
Do the closing in a single place, under a new fdcleanup: goto label
at the end of the function.

Signed-off-by: Greg Kurz 
Message-Id: <20210312092212.782255-3-gr...@kaod.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 hw/virtio/vhost-user.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6af9b43a72..acde1d2936 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1475,13 +1475,6 @@ static void slave_read(void *opaque)
 ret = -EINVAL;
 }
 
-/* Close the remaining file descriptors. */
-for (i = 0; i < fdsize; i++) {
-if (fd[i] != -1) {
-close(fd[i]);
-}
-}
-
 /*
  * REPLY_ACK feature handling. Other reply types has to be managed
  * directly in their request handlers.
@@ -1511,12 +1504,14 @@ static void slave_read(void *opaque)
 }
 }
 
-return;
+goto fdcleanup;
 
 err:
 qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
 close(u->slave_fd);
 u->slave_fd = -1;
+
+fdcleanup:
 for (i = 0; i < fdsize; i++) {
 if (fd[i] != -1) {
 close(fd[i]);
-- 
MST
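
Editor's note: the single-cleanup-label shape that this patch moves to
can be sketched in isolation as below. handle_request() and its
reply_fails parameter are hypothetical. Resetting each slot to -1 after
close() is an extra guard added in this sketch, not something the patch
does.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical handler: whatever path is taken, every received
 * descriptor is closed exactly once, under the fdcleanup label.
 */
static int handle_request(int *fd, size_t nfds, int reply_fails)
{
    int ret = 0;
    size_t i;

    /* ... request processing would use fd[] here ... */

    if (reply_fails) {
        goto err;
    }
    goto fdcleanup;

err:
    /* Channel teardown would go here; it must not touch fd[]. */
    ret = -1;

fdcleanup:
    for (i = 0; i < nfds; i++) {
        if (fd[i] != -1) {
            close(fd[i]);
            fd[i] = -1; /* sketch-only guard against a second close */
        }
    }
    return ret;
}
```

Because the error label falls through into fdcleanup, the success and
failure paths share one close loop and cannot both run it.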




[PULL v2 00/19] pc,virtio,pci: fixes, features

2021-03-22 Thread Michael S. Tsirkin
Changes from v1:
dropped an acpi patch causing regressions reported by clang

The following changes since commit f0f20022a0c744930935fdb7020a8c18347d391a:

  Merge remote-tracking branch 
'remotes/thuth-gitlab/tags/pull-request-2021-03-21' into staging (2021-03-22 
10:05:45 +)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to d07b22863b8e0981bdc9384a787a703f1fd4ba42:

  acpi: Move setters/getters of oem fields to X86MachineState (2021-03-22 
18:58:19 -0400)


pc,virtio,pci: fixes, features

Fixes all over the place.
ACPI index support.

Signed-off-by: Michael S. Tsirkin 


David Hildenbrand (4):
  acpi: Set proper maximum size for "etc/table-loader" blob
  microvm: Don't open-code "etc/table-loader"
  acpi: Move maximum size logic into acpi_add_rom_blob()
  acpi: Set proper maximum size for "etc/acpi/rsdp" blob

Greg Kurz (6):
  vhost-user: Drop misleading EAGAIN checks in slave_read()
  vhost-user: Fix double-close on slave_read() error path
  vhost-user: Factor out duplicated slave_fd teardown code
  vhost-user: Convert slave channel to QIOChannelSocket
  vhost-user: Introduce nested event loop in vhost_user_read()
  vhost-user: Monitor slave channel in vhost_user_read()

Igor Mammedov (6):
  tests: acpi: temporary whitelist DSDT changes
  pci: introduce acpi-index property for PCI device
  pci: acpi: ensure that acpi-index is unique
  acpi: add aml_to_decimalstring() and aml_call6() helpers
  pci: acpi: add _DSM method to PCI devices
  tests: acpi: update expected blobs

Laurent Vivier (1):
  virtio: Fix virtio_mmio_read()/virtio_mmio_write()

Marian Postevca (1):
  acpi: Move setters/getters of oem fields to X86MachineState

Wang Liang (1):
  virtio-pmem: fix virtio_pmem_resp assign problem

 include/hw/acpi/aml-build.h  |   6 +-
 include/hw/acpi/pci.h|   1 +
 include/hw/acpi/pcihp.h  |   9 +-
 include/hw/acpi/utils.h  |   3 +-
 include/hw/i386/microvm.h|   4 -
 include/hw/i386/pc.h |   4 -
 include/hw/i386/x86.h|   4 +
 include/hw/pci/pci.h |   1 +
 hw/acpi/aml-build.c  |  28 +
 hw/acpi/pci.c|   1 -
 hw/acpi/pcihp.c  | 104 ++-
 hw/acpi/piix4.c  |   3 +-
 hw/acpi/utils.c  |  17 ++-
 hw/arm/virt-acpi-build.c |  12 +--
 hw/i386/acpi-build.c | 173 +--
 hw/i386/acpi-microvm.c   |  32 +++---
 hw/i386/microvm.c|  66 
 hw/i386/pc.c |  63 
 hw/i386/x86.c|  64 
 hw/pci/pci.c |   1 +
 hw/virtio/vhost-user.c   | 217 +--
 hw/virtio/virtio-mmio.c  |  74 +
 hw/virtio/virtio-pmem.c  |   2 +-
 hw/acpi/trace-events |   2 +
 tests/data/acpi/pc/DSDT  | Bin 5065 -> 6002 bytes
 tests/data/acpi/pc/DSDT.acpihmat | Bin 6390 -> 7327 bytes
 tests/data/acpi/pc/DSDT.bridge   | Bin 6924 -> 8668 bytes
 tests/data/acpi/pc/DSDT.cphp | Bin 5529 -> 6466 bytes
 tests/data/acpi/pc/DSDT.dimmpxm  | Bin 6719 -> 7656 bytes
 tests/data/acpi/pc/DSDT.hpbridge | Bin 5026 -> 5969 bytes
 tests/data/acpi/pc/DSDT.ipmikcs  | Bin 5137 -> 6074 bytes
 tests/data/acpi/pc/DSDT.memhp| Bin 6424 -> 7361 bytes
 tests/data/acpi/pc/DSDT.nohpet   | Bin 4923 -> 5860 bytes
 tests/data/acpi/pc/DSDT.numamem  | Bin 5071 -> 6008 bytes
 tests/data/acpi/pc/DSDT.roothp   | Bin 5261 -> 6210 bytes
 35 files changed, 583 insertions(+), 308 deletions(-)




Re: [PATCH] acpi:piix4, vt82c686: reinitialize acpi PM device on reset

2021-03-22 Thread Michael S. Tsirkin
On Wed, Mar 17, 2021 at 02:49:31PM -0700, isaku.yamah...@gmail.com wrote:
> From: Isaku Yamahata 
> 
> Commit 6be8cf56bc8b made sure that SCI is enabled in PM1.CNT
> on reset in acpi_only mode by modifying acpi_pm1_cnt_reset() and
> that worked for q35 as expected.
> 
> The function was introduced by commit
>   eaba51c573a (acpi, acpi_piix, vt82c686: factor out PM1_CNT logic)
> that forgot to actually call it at piix4 reset time and as result
> SCI_EN wasn't set as was expected by 6be8cf56bc8b in acpi_only mode.
> 
> So Windows crashes when it notices that SCI_EN is not set and FADT is
> not providing information about how to enable it anymore.
> Reproducer:
>qemu-system-x86_64 -enable-kvm -M pc-i440fx-6.0,smm=off -cdrom 
> any_windows_10x64.iso
> 
> Fix it by calling acpi_pm1_cnt_reset() at piix4 reset time.
> 
> While at it, this patch also resets the ACPI PM related registers at
> piix4/vt82c686 reset time and de-asserts SCI.
> piix4_pm_realize() initializes acpi pm tmr, evt, cnt and gpe.
> via_pm_realize() initializes acpi pm tmr, evt and cnt.
> Reset them on device reset. pm_reset() in ich9.c correctly calls
> the corresponding reset functions.
> 
> Fixes: 6be8cf56bc8b (acpi/core: always set SCI_EN when SMM isn't supported)
> Reported-by: Reinoud Zandijk 
> Co-developed-by: Igor Mammedov 
> Signed-off-by: Igor Mammedov 
> Signed-off-by: Isaku Yamahata 

This caused regressions reported by Peter. Please reproduce and debug,
then repost. Thanks!

> ---
> CC: imamm...@redhat.com
> CC: isaku.yamah...@intel.com
> CC: m...@redhat.com
> CC: rein...@netbsd.org
> CC: isaku.yamah...@gmail.com
> CC: berra...@redhat.com
> CC: pbonz...@redhat.com
> CC: f4...@amsat.org
> CC: aurel...@aurel32.net
> ---
>  hw/acpi/piix4.c   | 7 +++
>  hw/isa/vt82c686.c | 5 +
>  2 files changed, 12 insertions(+)
> 
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 1efc0ded9f..a00525025b 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -325,6 +325,13 @@ static void piix4_pm_reset(DeviceState *dev)
>  /* Mark SMM as already inited (until KVM supports SMM). */
>  pci_conf[0x5B] = 0x02;
>  }
> +
> +acpi_pm1_evt_reset(&s->ar);
> +acpi_pm1_cnt_reset(&s->ar);
> +acpi_pm_tmr_reset(&s->ar);
> +acpi_gpe_reset(&s->ar);
> +acpi_update_sci(&s->ar, s->irq);
> +
>  pm_io_space_update(s);
>  acpi_pcihp_reset(&s->acpi_pci_hotplug, !s->use_acpi_root_pci_hotplug);
>  }
> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
> index 05d084f698..7bacad03e2 100644
> --- a/hw/isa/vt82c686.c
> +++ b/hw/isa/vt82c686.c
> @@ -167,6 +167,11 @@ static void via_pm_reset(DeviceState *d)
>  /* SMBus IO base */
>  pci_set_long(s->dev.config + 0x90, 1);
>  
> +acpi_pm1_evt_reset(&s->ar);
> +acpi_pm1_cnt_reset(&s->ar);
> +acpi_pm_tmr_reset(&s->ar);
> +pm_update_sci(s);
> +
>  pm_io_space_update(s);
>  smb_io_space_update(s);
>  }
> -- 
> 2.25.1




Re: [PULL 00/20] pc,virtio,pci: fixes, features

2021-03-22 Thread Michael S. Tsirkin
On Mon, Mar 22, 2021 at 06:46:06PM +, Peter Maydell wrote:
> On Mon, 22 Mar 2021 at 16:41, Peter Maydell  wrote:
> >
> > On Mon, 22 Mar 2021 at 15:44, Michael S. Tsirkin  wrote:
> > >
> > > The following changes since commit 
> > > f0f20022a0c744930935fdb7020a8c18347d391a:
> > >
> > >   Merge remote-tracking branch 
> > > 'remotes/thuth-gitlab/tags/pull-request-2021-03-21' into staging 
> > > (2021-03-22 10:05:45 +)
> > >
> > > are available in the Git repository at:
> > >
> > >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> > >
> > > for you to fetch changes up to 5971d4a968d51a80daaad53ddaec2b285115af62:
> > >
> > >   acpi: Move setters/getters of oem fields to X86MachineState (2021-03-22 
> > > 11:39:02 -0400)
> > >
> > > 
> > > pc,virtio,pci: fixes, features
> > >
> > > Fixes all over the place.
> > > ACPI index support.
> > >
> > > Signed-off-by: Michael S. Tsirkin 
> > >
> >
> > This triggers a new clang runtime sanitizer warning:
> 
> With a backtrace:
> $ UBSAN_OPTIONS=print_stacktrace=1
> QTEST_QEMU_BINARY=build/clang/qemu-system-mips64el
> ./build/clang/tests/qtest/endianness-test -p
> /mips64el/endianness/fuloong2e
> /mips64el/endianness/fuloong2e: ../../hw/pci/pci.c:252:30: runtime
> error: shift exponent -1 is negative
> #0 0x55a17bc17a1f in pci_irq_state
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/pci/pci.c:252:30
> #1 0x55a17bc17a1f in pci_irq_handler
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/pci/pci.c:1453
> #2 0x55a17b7ed0a5 in pm_update_sci
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/isa/vt82c686.c:147:5
> #3 0x55a17b7ecce3 in via_pm_reset
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/isa/vt82c686.c:173:5
> #4 0x55a17c546cc7 in resettable_phase_hold
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/resettable.c:182:13
> #5 0x55a17c53839a in bus_reset_child_foreach
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/bus.c:97:13
> #6 0x55a17c546bc2 in resettable_phase_hold
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/resettable.c:173:5
> #7 0x55a17c5435ca in device_reset_child_foreach
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/qdev.c:366:9
> #8 0x55a17c546bc2 in resettable_phase_hold
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/resettable.c:173:5
> #9 0x55a17c53839a in bus_reset_child_foreach
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/bus.c:97:13
> #10 0x55a17c546bc2 in resettable_phase_hold
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/resettable.c:173:5
> #11 0x55a17c545ee0 in resettable_assert_reset
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/resettable.c:60:5
> #12 0x55a17c545dbf in resettable_reset
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/resettable.c:45:5
> #13 0x55a17c545d68 in qemu_devices_reset
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/reset.c:69:9
> #14 0x55a17c47b3eb in qemu_system_reset
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../softmmu/runstate.c:444:9
> #15 0x55a17ba225ee in qdev_machine_creation_done
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../hw/core/machine.c:1279:5
> #16 0x55a17c4bdb03 in qemu_machine_creation_done
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../softmmu/vl.c:2567:5
> #17 0x55a17c4bdb03 in qmp_x_exit_preconfig
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../softmmu/vl.c:2590
> #18 0x55a17c4c2c0b in qemu_init
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../softmmu/vl.c:3611:9
> #19 0x55a17b756db5 in main
> /home/petmay01/linaro/qemu-for-merges/build/clang/../../softmmu/main.c:49:5
> #20 0x7f3a9c9f6bf6 in __libc_start_main
> /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:310
> #21 0x55a17b731969 in _start
> (/home/petmay01/linaro/qemu-for-merges/build/clang/qemu-system-mips64el+0x1140969)
> 
> OK
> 
> Suggests the relevant commit is
> "acpi:piix4, vt82c686: reinitialize acpi PM device on reset"

Yep, Cc'd the authors and dropped for now. Thanks!

> This happens because pm_update_sci() calls pci_irq_handler(),
> which calls pci_intx(pci_dev), which returns -1, which is not
> a valid interrupt number to call pci_irq_handler() with.
> 
> Q: given that pci_irq_handler() says it must only be called with
> an irqnum in [0..3], shouldn't pci_set_irq() be a bit more
> cautious than to pull a byte directly out of PCI_INTERRUPT_PIN
> and assume it's valid? (Is this guest-writable, or is it read-only?)

It's read-only.

> 
> thanks
> -- PMM


-- 
MST
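
Editor's note: the sanitizer report comes from shifting by an INTx
number of -1. A minimal sketch of a range check in front of such a
shift follows, assuming the usual PCI convention that an interrupt pin
register value of 0 means "no pin" and 1..4 mean INTA..INTD. The names
intx_from_pin() and intx_mask() are hypothetical, not QEMU functions.

```c
#include <stdint.h>

#define NUM_INTX_PINS 4 /* INTA..INTD */

/* Pin register value 1..4 maps to INTx 0..3; 0 (no pin) gives -1. */
static int intx_from_pin(uint8_t interrupt_pin)
{
    return (int)interrupt_pin - 1;
}

/*
 * Bit mask for the INTx line, or 0 when the device has no valid pin.
 * The range check keeps a negative or oversized value away from the
 * shift, which would otherwise be undefined behaviour.
 */
static uint32_t intx_mask(uint8_t interrupt_pin)
{
    int intx = intx_from_pin(interrupt_pin);

    if (intx < 0 || intx >= NUM_INTX_PINS) {
        return 0;
    }
    return UINT32_C(1) << intx;
}
```

Whether a check like this belongs in pci_set_irq() or in the caller is
the open question raised in the quoted mail; the sketch only shows the
check itself.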




Re: [PULL 00/20] pc,virtio,pci: fixes, features

2021-03-22 Thread Michael S. Tsirkin
On Mon, Mar 22, 2021 at 04:41:01PM +, Peter Maydell wrote:
> On Mon, 22 Mar 2021 at 15:44, Michael S. Tsirkin  wrote:
> >
> > The following changes since commit f0f20022a0c744930935fdb7020a8c18347d391a:
> >
> >   Merge remote-tracking branch 
> > 'remotes/thuth-gitlab/tags/pull-request-2021-03-21' into staging 
> > (2021-03-22 10:05:45 +)
> >
> > are available in the Git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> >
> > for you to fetch changes up to 5971d4a968d51a80daaad53ddaec2b285115af62:
> >
> >   acpi: Move setters/getters of oem fields to X86MachineState (2021-03-22 
> > 11:39:02 -0400)
> >
> > 
> > pc,virtio,pci: fixes, features
> >
> > Fixes all over the place.
> > ACPI index support.
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> 
> This triggers a new clang runtime sanitizer warning:
> 
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/petmay01/linaro/qemu-for-merges/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-mips64el tests/qtest/qom-test --tap -k
> PASS 1 qtest-mips64el/qom-test /mips64el/qom/loongson3-virt
> PASS 2 qtest-mips64el/qom-test /mips64el/qom/none
> PASS 3 qtest-mips64el/qom-test /mips64el/qom/magnum
> PASS 4 qtest-mips64el/qom-test /mips64el/qom/mipssim
> PASS 5 qtest-mips64el/qom-test /mips64el/qom/malta
> ../../hw/pci/pci.c:252:30: runtime error: shift exponent -1 is negative
> PASS 6 qtest-mips64el/qom-test /mips64el/qom/fuloong2e
> PASS 7 qtest-mips64el/qom-test /mips64el/qom/boston
> PASS 8 qtest-mips64el/qom-test /mips64el/qom/pica61
> 
> and similarly for eg
> 
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/petmay01/linaro/qemu-for-merges/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-mips64el tests/qtest/endianness-test
> --tap -k
> ../../hw/pci/pci.c:252:30: runtime error: shift exponent -1 is negative
> PASS 1 qtest-mips64el/endianness-test /mips64el/endianness/fuloong2e
> ../../hw/pci/pci.c:252:30: runtime error: shift exponent -1 is negative
> PASS 2 qtest-mips64el/endianness-test /mips64el/endianness/split/fuloong2e
> ../../hw/pci/pci.c:252:30: runtime error: shift exponent -1 is negative
> PASS 3 qtest-mips64el/endianness-test /mips64el/endianness/combine/fuloong2e
> 
> thanks
> -- PMM

Weird I don't see any related changes. Something subtle going on.
I'd like to reproduce this.
Which clang flags do you use?

-- 
MST




Re: [PATCH v1] vhost-user-blk: use different event handlers on init and operation

2021-03-22 Thread Raphael Norwitz
I'm mostly happy with this. My biggest overall comment is that I think
this should be split into two, as your refactor using different event
handlers for init is a standalone improvement over and above the bugfix.

I would have the first commit split out vhost_user_blk_event_init() and
vhost_user_blk_event_oper(), replace the runstate_is_running() check,
etc., and have the second commit immediately call
vhost_user_blk_disconnect() during device realization.

A couple of other comments mixed in below:

On Thu, Mar 11, 2021 at 11:10:45AM +0300, Denis Plotnikov wrote:
> Commit a1a20d06b73e "vhost-user-blk: delay vhost_user_blk_disconnect"

For the hash above can we rather use the first digits of the commit hash
instead of the last?

> introduced postponing vhost_dev cleanup aiming to eliminate qemu aborts
> because of connection problems with vhost-blk daemon.
> 
> However, it introduces a new problem. Now, any communication errors
> during execution of vhost_dev_init() called by vhost_user_blk_device_realize()
> lead to qemu abort on assert in vhost_dev_get_config().
> 
> This happens because vhost_user_blk_disconnect() is postponed but
> it should have dropped s->connected flag by the time
> vhost_user_blk_device_realize() performs a new connection opening.
> On the connection opening, vhost_dev initialization in
> vhost_user_blk_connect() relies on s->connected flag and
> if it's not dropped, it skips vhost_dev initialization and returns
> with success. Then, vhost_user_blk_device_realize()'s execution flow
> goes to vhost_dev_get_config() where it's aborted on the assert.
> 
> It seems connection/disconnection processing should happen
> differently on initialization and operation of vhost-user-blk.
> On initialization (in vhost_user_blk_device_realize()) we fully
> control the initialization process. At that point, nobody can use the
> device since it isn't initialized and we don't need to postpone any
> cleanups, so we can do cleanup right away when there are communication
> problems with the vhost-blk daemon.
> On operation the disconnect may happen when the device is in use, so
> the device users may want to use vhost_dev's data to do rollback before
> vhost_dev is re-initialized (e.g. in vhost_dev_set_log()), so we
> postpone the cleanup.
> 
> The patch splits those two cases, and performs the cleanup immediately on
> initialization, and postpones cleanup when the device is initialized and
> in use.
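
The split described in the commit message can be sketched in isolation as follows. This is a toy model under stated assumptions, not the vhost-user-blk code: `Dev`, `Event`, and all `dev_*` functions are hypothetical stand-ins, and the `cleanup_pending` flag stands in for whatever deferral mechanism the real device uses.

```c
#include <stdbool.h>

typedef enum { EV_OPENED, EV_CLOSED } Event;   /* stand-in for the chardev event type */

typedef struct {
    bool connected;
    bool cleanup_pending;   /* stand-in for a deferred (scheduled) cleanup */
} Dev;

/* Drops the connection state; once this ran, a reconnect starts from scratch. */
static void dev_disconnect(Dev *d)
{
    d->connected = false;
    d->cleanup_pending = false;
}

/* Common handler: "init" selects immediate versus postponed cleanup. */
static void dev_event(Dev *d, Event ev, bool init)
{
    switch (ev) {
    case EV_OPENED:
        d->connected = true;
        break;
    case EV_CLOSED:
        if (init) {
            dev_disconnect(d);          /* nobody uses the device yet */
        } else {
            d->cleanup_pending = true;  /* users may still need the old state */
        }
        break;
    }
}

/* Thin wrappers, one per phase, so each can be registered as a callback. */
static void dev_event_init(Dev *d, Event ev) { dev_event(d, ev, true); }
static void dev_event_oper(Dev *d, Event ev) { dev_event(d, ev, false); }

/* Runs later, once in-flight users are done with the old state. */
static void dev_run_pending(Dev *d)
{
    if (d->cleanup_pending) {
        dev_disconnect(d);
    }
}
```

The point of the model is that after a close during initialization the `connected` flag is already false when the next connection attempt starts, which is the property the commit message says was lost.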
> 
> Signed-off-by: Denis Plotnikov 
> ---
>  hw/block/vhost-user-blk.c | 88 ---
>  1 file changed, 54 insertions(+), 34 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index b870a50e6b20..84940122b8ca 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -362,7 +362,17 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>  vhost_dev_cleanup(&s->dev);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event, bool 
> init);

The parameter name "init" feels a little unclear. Maybe "realized"
would be better? I would also change the vhost_user_blk_event_init
function name accordingly.

> +
> +static void vhost_user_blk_event_init(void *opaque, QEMUChrEvent event)
> +{
> +vhost_user_blk_event(opaque, event, true);
> +}
> +
> +static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> +{
> +vhost_user_blk_event(opaque, event, false);
> +}
>  
>  static void vhost_user_blk_chr_closed_bh(void *opaque)
>  {
> @@ -371,11 +381,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  
>  vhost_user_blk_disconnect(dev);
> -qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> -NULL, opaque, NULL, true);
> +qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> +vhost_user_blk_event_oper, NULL, opaque, NULL, true);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event, bool init)
>  {
>  DeviceState *dev = opaque;
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -390,38 +400,42 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
>  break;
>  case CHR_EVENT_CLOSED:
>  /*
> - * A close event may happen during a read/write, but vhost
> - * code assumes the vhost_dev remains setup, so delay the
> - * stop & clear. There are two possible paths to hit this
> - * disconnect event:
> - * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> - * vhost_user_blk_device_realize() is a caller.
> - * 2. In tha main loop phase after VM start.
> - *
> - * For p2 the disconnect event will be delayed. We can't
> - * do the same for p1, because we are not running the loop
> - * at this moment. So 

Re: [PATCH v3 1/1] acpi: Consolidate the handling of OEM ID and OEM Table ID fields

2021-03-22 Thread Michael S. Tsirkin
On Mon, Mar 22, 2021 at 11:55:54PM +0200, Marian Postevca wrote:
> Introduces structure AcpiBuildOem to hold the value of OEM fields and
> uses dedicated macros to initialize/set the values.
> Unnecessary dynamically allocated OEM fields are re-factored to static
> allocation.
> 
> Signed-off-by: Marian Postevca 
> ---
>  hw/acpi/hmat.h   |  2 +-
>  hw/i386/acpi-common.h|  2 +-
>  include/hw/acpi/acpi-build-oem.h | 55 ++
>  include/hw/acpi/aml-build.h  | 15 +++---
>  include/hw/acpi/ghes.h   |  2 +-
>  include/hw/acpi/pci.h|  2 +-
>  include/hw/acpi/vmgenid.h|  2 +-
>  include/hw/arm/virt.h|  4 +-
>  include/hw/i386/x86.h|  4 +-
>  include/hw/mem/nvdimm.h  |  4 +-
>  hw/acpi/aml-build.c  | 27 ++-
>  hw/acpi/ghes.c   |  5 +-
>  hw/acpi/hmat.c   |  4 +-
>  hw/acpi/nvdimm.c | 22 +
>  hw/acpi/pci.c|  4 +-
>  hw/acpi/vmgenid.c|  6 ++-
>  hw/arm/virt-acpi-build.c | 40 ++--
>  hw/arm/virt.c| 16 +++
>  hw/i386/acpi-build.c | 78 +++-
>  hw/i386/acpi-common.c|  4 +-
>  hw/i386/acpi-microvm.c   | 13 ++
>  hw/i386/x86.c| 19 
>  22 files changed, 182 insertions(+), 148 deletions(-)
>  create mode 100644 include/hw/acpi/acpi-build-oem.h
> 
> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> index b57f0e7e80..39c42328bd 100644
> --- a/hw/acpi/hmat.h
> +++ b/hw/acpi/hmat.h
> @@ -38,6 +38,6 @@
>  #define HMAT_PROXIMITY_INITIATOR_VALID  0x1
>  
>  void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState 
> *numa_state,
> -const char *oem_id, const char *oem_table_id);
> +struct AcpiBuildOem *bld_oem);
>  
>  #endif
> diff --git a/hw/i386/acpi-common.h b/hw/i386/acpi-common.h
> index b12cd73ea5..27c2e5b6a9 100644
> --- a/hw/i386/acpi-common.h
> +++ b/hw/i386/acpi-common.h
> @@ -10,6 +10,6 @@
>  
>  void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
>   X86MachineState *x86ms, AcpiDeviceIf *adev,
> - const char *oem_id, const char *oem_table_id);
> + struct AcpiBuildOem *bld_oem);
>  
>  #endif
> diff --git a/include/hw/acpi/acpi-build-oem.h 
> b/include/hw/acpi/acpi-build-oem.h
> new file mode 100644
> index 00..5e5edc4c22
> --- /dev/null
> +++ b/include/hw/acpi/acpi-build-oem.h
> @@ -0,0 +1,55 @@
> +#ifndef QEMU_HW_ACPI_BUILD_OEM_H
> +#define QEMU_HW_ACPI_BUILD_OEM_H
> +
> +/*
> + * Utilities for working with ACPI OEM ID and OEM TABLE ID fields
> + *
> + * Copyright (c) 2021 Marian Postevca
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "qemu/cutils.h"
> +
> +#define ACPI_BUILD_APPNAME6 "BOCHS "
> +#define ACPI_BUILD_APPNAME8 "BXPC"

A single user for each of these now ... drop the defines?

> +
> +#define ACPI_BUILD_OEM_ID_SIZE 6
> +#define ACPI_BUILD_OEM_TABLE_ID_SIZE 8
> +
> +struct AcpiBuildOem {
> +char oem_id[ACPI_BUILD_OEM_ID_SIZE + 1];
> +char oem_table_id[ACPI_BUILD_OEM_TABLE_ID_SIZE + 1];
> +};
> +
> +#define ACPI_SET_BUILD_OEM_ID(__bld_oem, __oem_id) do {\
> +pstrcpy(__bld_oem.oem_id,  \
> +sizeof __bld_oem.oem_id, __oem_id);\
> +} while (0)
> +
> +#define ACPI_SET_BUILD_OEM_TABLE_ID(__bld_oem,  __oem_table_id) do {\
> +pstrcpy(__bld_oem.oem_table_id, \
> +sizeof __bld_oem.oem_table_id, __oem_table_id); \

we generally avoid names starting with __. No need for that
when not using local variables within macros ...
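
For comparison, here is a sketch of how the same setters could look as inline functions, which avoids both the macro parameter naming question and the double evaluation of arguments. This is only an illustration, not what the patch does: `bounded_copy()` is a local stand-in written for this example (it is not QEMU's pstrcpy()), and the lower-case function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ACPI_BUILD_OEM_ID_SIZE 6
#define ACPI_BUILD_OEM_TABLE_ID_SIZE 8

struct AcpiBuildOem {
    char oem_id[ACPI_BUILD_OEM_ID_SIZE + 1];
    char oem_table_id[ACPI_BUILD_OEM_TABLE_ID_SIZE + 1];
};

/* Local stand-in: copy at most dst_size - 1 bytes and always NUL-terminate. */
static void bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0) {
        return;
    }
    n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Hypothetical function equivalents of the setter macros. */
static inline void acpi_set_build_oem_id(struct AcpiBuildOem *oem,
                                         const char *oem_id)
{
    bounded_copy(oem->oem_id, sizeof oem->oem_id, oem_id);
}

static inline void acpi_set_build_oem_table_id(struct AcpiBuildOem *oem,
                                               const char *oem_table_id)
{
    bounded_copy(oem->oem_table_id, sizeof oem->oem_table_id, oem_table_id);
}

static inline void acpi_init_build_oem(struct AcpiBuildOem *oem,
                                       const char *oem_id,
                                       const char *oem_table_id)
{
    acpi_set_build_oem_id(oem, oem_id);
    acpi_set_build_oem_table_id(oem, oem_table_id);
}
```

Over-long input is silently truncated to the field width here, which matches the bounded-copy intent of the macros but may or may not be the desired policy.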



> +} while (0)
> +
> +#define ACPI_INIT_BUILD_OEM(__bld_oem, __oem_id, __oem_table_id) do {   \
> +ACPI_SET_BUILD_OEM_ID(__bld_oem, __oem_id); \
> +ACPI_SET_BUILD_OEM_TABLE_ID(__bld_oem, __oem_table_id); \
> +} while (0)
> +
> +#define ACPI_INIT_DEFAULT_BUILD_OEM(__bld_oem) do { \
> +ACPI_INIT_BUILD_OEM(__bld_oem,  \
> +ACPI_BUILD_APPNAME6, ACPI_BUILD_APPNAME8);  \
> +} while (0)

OK but ... why are these 

Re: [PATCH] hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows

2021-03-22 Thread Michael S. Tsirkin
On Mon, Mar 22, 2021 at 08:13:36PM +0000, Peter Maydell wrote:
> Currently the gpex PCI controller implements no special behaviour for
> guest accesses to areas of the PIO and MMIO where it has not mapped
> any PCI devices, which means that for Arm you end up with a CPU
> exception due to a data abort.
> 
> Most host OSes expect "like an x86 PC" behaviour, where bad accesses
> like this return -1 for reads and ignore writes.  In the interests of
> not being surprising, make host CPU accesses to these windows behave
> as -1/discard where there's no mapped PCI device.
> 
> Reported-by: Dmitry Vyukov 
> Fixes: https://bugs.launchpad.net/qemu/+bug/1918917
> Signed-off-by: Peter Maydell 

Acked-by: Michael S. Tsirkin 

BTW it looks like launchpad butchered the lore.kernel.org
link so one can't find out what was the guest issue this is
fixing. Want to include a bit more data in the commit log
instead?

> ---
> Not convinced that this is 6.0 material, because IMHO the
> kernel shouldn't be doing this in the first place.
> Do we need to have the property machinery so that old
> virt-5.2 etc retain the previous behaviour ?
> ---
>  include/hw/pci-host/gpex.h |  2 ++
>  hw/pci-host/gpex.c | 37 +++--
>  2 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
> index d48a020a952..ad876ecd209 100644
> --- a/include/hw/pci-host/gpex.h
> +++ b/include/hw/pci-host/gpex.h
> @@ -49,6 +49,8 @@ struct GPEXHost {
>  
>  MemoryRegion io_ioport;
>  MemoryRegion io_mmio;
> +MemoryRegion io_ioport_window;
> +MemoryRegion io_mmio_window;
>  qemu_irq irq[GPEX_NUM_IRQS];
>  int irq_num[GPEX_NUM_IRQS];
>  };
> diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
> index 2bdbe7b4561..1f48c89ac6a 100644
> --- a/hw/pci-host/gpex.c
> +++ b/hw/pci-host/gpex.c
> @@ -82,13 +82,46 @@ static void gpex_host_realize(DeviceState *dev, Error 
> **errp)
>  PCIExpressHost *pex = PCIE_HOST_BRIDGE(dev);
>  int i;
>  
> +/*
> + * Note that the MemoryRegions io_mmio and io_ioport that we pass
> + * to pci_register_root_bus() are not the same as the
> + * MemoryRegions io_mmio_window and io_ioport_window that we
> + * expose as SysBus MRs. The difference is in the behaviour of
> + * accesses to addresses where no PCI device has been mapped.
> + *
> + * io_mmio and io_ioport are the underlying PCI view of the PCI
> + * address space, and when a PCI device does a bus master access
> + * to a bad address this is reported back to it as a transaction
> + * failure.
> + *
> + * io_mmio_window and io_ioport_window implement "unmapped
> + * addresses read as -1 and ignore writes"; this is traditional
> + * x86 PC behaviour, which is not mandated by the PCI spec proper
> + * but expected by much PCI-using guest software, including Linux.
> + *
> + * In the interests of not being unnecessarily surprising, we
> + * implement it in the gpex PCI host controller, by providing the
> + * _window MRs, which are containers with io ops that implement
> + * the 'background' behaviour and which hold the real PCI MRs as
> + * subregions.
> + */
>  pcie_host_mmcfg_init(pex, PCIE_MMCFG_SIZE_MAX);
>  memory_region_init(&s->io_mmio, OBJECT(s), "gpex_mmio", UINT64_MAX);
>  memory_region_init(&s->io_ioport, OBJECT(s), "gpex_ioport", 64 * 1024);
>  
> +memory_region_init_io(&s->io_mmio_window, OBJECT(s),
> +  &unassigned_io_ops, OBJECT(s),
> +  "gpex_mmio_window", UINT64_MAX);
> +memory_region_init_io(&s->io_ioport_window, OBJECT(s),
> +  &unassigned_io_ops, OBJECT(s),
> +  "gpex_ioport_window", 64 * 1024);
> +
> +memory_region_add_subregion(&s->io_mmio_window, 0, &s->io_mmio);
> +memory_region_add_subregion(&s->io_ioport_window, 0, &s->io_ioport);
> +
>  sysbus_init_mmio(sbd, &pex->mmio);
> -sysbus_init_mmio(sbd, &s->io_mmio);
> -sysbus_init_mmio(sbd, &s->io_ioport);
> +sysbus_init_mmio(sbd, &s->io_mmio_window);
> +sysbus_init_mmio(sbd, &s->io_ioport_window);
>  for (i = 0; i < GPEX_NUM_IRQS; i++) {
>  sysbus_init_irq(sbd, &s->irq[i]);
>  s->irq_num[i] = -1;
> -- 
> 2.20.1




Re: [PATCH v2] piix: fix regression during unplug in Xen HVM domUs

2021-03-22 Thread John Snow

On 3/17/21 3:00 AM, Olaf Hering wrote:

Commit ee358e919e385fdc79d59d0d47b4a81e349cd5c9 causes a regression in
Xen HVM domUs which run xenlinux based kernels.

If the domU has an USB device assigned, for example with
"usbdevice=['tablet']" in domU.cfg, the late unplug of devices will
kill the emulated USB host. As a result the khubd thread hangs, and as
a result the entire boot process.

For some reason this does not affect pvops based kernels. This is
most likely caused by the fact that unplugging happens very early
during boot.



I'm not entirely sure of how the commit message relates to the patch, 
actually. (Sorry, I am not well familiar with XEN.)



Signed-off-by: Olaf Hering 
---
  hw/ide/piix.c| 5 +
  include/hw/ide/pci.h | 1 +
  2 files changed, 6 insertions(+)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index b9860e35a5..7f1998bf04 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -109,6 +109,9 @@ static void piix_ide_reset(DeviceState *dev)
  uint8_t *pci_conf = pd->config;
  int i;
  
+if (d->xen_unplug_done == true) {

+return;
+}


My understanding is that XEN has some extra disks that it unplugs when 
it later figures out it doesn't need them. How exactly this works is 
something I've not looked into too closely.


So if these IDE devices have been "unplugged" already, we avoid 
resetting them here. What about this reset causes the bug you describe 
in the commit message?


Does this reset now happen earlier/later as compared to what it did 
prior to ee358e91 ?



  for (i = 0; i < 2; i++) {
  ide_bus_reset(&d->bus[i]);
  }
@@ -151,6 +154,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
  PCIIDEState *d = PCI_IDE(dev);
  uint8_t *pci_conf = dev->config;
  
+d->xen_unplug_done = false;

  pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
  
  bmdma_setup_bar(d);

@@ -170,6 +174,7 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux)
  BlockBackend *blk;
  
  pci_ide = PCI_IDE(dev);

+pci_ide->xen_unplug_done = true;
  
  for (i = aux ? 1 : 0; i < 4; i++) {

  idebus = &pci_ide->bus[i / 2];
diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index d8384e1c42..9e71cfec3b 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -50,6 +50,7 @@ struct PCIIDEState {
  IDEBus bus[2];
  BMDMAState bmdma[2];
  uint32_t secondary; /* used only for cmd646 */
+bool xen_unplug_done;


I am hesitant to put a new XEN-specific boolean here, but don't know 
enough about the problem to outright say "no".


This looks like a band-aid that's out of place, but I don't understand 
the problem well enough yet to suggest a better place.



  MemoryRegion bmdma_bar;
  MemoryRegion cmd_bar[2];
  MemoryRegion data_bar[2];



(If anyone else with more experience with XEN wants to take over the 
review of this patch, let me know. I only really care about the IDE bits.)





Re: [PATCH v2] target/riscv: Prevent lost illegal instruction exceptions

2021-03-22 Thread Richard Henderson

On 3/22/21 6:16 AM, Georg Kotheimer wrote:

When decode_insn16() fails, we fall back to decode_RV32_64C() for
further compressed instruction decoding. However, prior to this change,
we did not raise an illegal instruction exception, if decode_RV32_64C()
fails to decode the instruction. This means that we skipped illegal
compressed instructions instead of raising an illegal instruction
exception.

Instead of patching decode_RV32_64C(), we can just remove it,
as it is dead code since f330433b363 anyway.
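
The behaviour the commit message asks for (a failed decode must raise an exception rather than fall through silently) can be shown with a toy model. Everything here is hypothetical and greatly simplified: `ToyCtx`, `toy_decode16()`, and `toy_translate16()` are invented for the example, and the decoder rejects only the all-zero 16-bit encoding, which RISC-V defines as illegal.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool illegal_raised;   /* set when an illegal instruction exception is raised */
    uint32_t pc;
} ToyCtx;

/* Toy decoder: reports failure for the all-zero encoding, success otherwise. */
static bool toy_decode16(ToyCtx *ctx, uint16_t opcode)
{
    (void)ctx;
    return opcode != 0;
}

/* Every decode failure ends in an exception; there is no silent fallback path. */
static void toy_translate16(ToyCtx *ctx, uint16_t opcode)
{
    if (!toy_decode16(ctx, opcode)) {
        ctx->illegal_raised = true;
        return;
    }
    ctx->pc += 2;   /* compressed instructions are two bytes long */
}
```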

Signed-off-by: Georg Kotheimer
---
  target/riscv/translate.c | 179 +--
  1 file changed, 1 insertion(+), 178 deletions(-)


Reviewed-by: Richard Henderson 

r~



[PATCH v3 0/1] Rework ACPI OEM fields handling to simplify code (was: acpi: Remove duplicated code handling OEM ID and OEM table ID fields)

2021-03-22 Thread Marian Postevca
This patch consolidates ACPI OEM fields handling
by:
- Moving common code in PC and MICROVM to X86
- Changing unnecessary dynamic memory allocation to static allocation
- Using a dedicated structure to keep the values of the fields instead
  of two separate strings
- Adding helper macros to initialize the structure

v2:
- Move the setters/getters of OEM fields to X86MachineState to
  remove duplication
- Change commit message to make it clear the second commit is
  a re-factor

v3:
- Rebase "acpi: Consolidate the handling of OEM ID and OEM
  Table ID fields" to latest HEAD
- Dropped "acpi: Move setters/getters of oem fields to
   X86MachineState" since it was accepted already

Marian Postevca (1):
  acpi: Consolidate the handling of OEM ID and OEM Table ID fields

 hw/acpi/hmat.h   |  2 +-
 hw/i386/acpi-common.h|  2 +-
 include/hw/acpi/acpi-build-oem.h | 55 ++
 include/hw/acpi/aml-build.h  | 15 +++---
 include/hw/acpi/ghes.h   |  2 +-
 include/hw/acpi/pci.h|  2 +-
 include/hw/acpi/vmgenid.h|  2 +-
 include/hw/arm/virt.h|  4 +-
 include/hw/i386/x86.h|  4 +-
 include/hw/mem/nvdimm.h  |  4 +-
 hw/acpi/aml-build.c  | 27 ++-
 hw/acpi/ghes.c   |  5 +-
 hw/acpi/hmat.c   |  4 +-
 hw/acpi/nvdimm.c | 22 +
 hw/acpi/pci.c|  4 +-
 hw/acpi/vmgenid.c|  6 ++-
 hw/arm/virt-acpi-build.c | 40 ++--
 hw/arm/virt.c| 16 +++
 hw/i386/acpi-build.c | 78 +++-
 hw/i386/acpi-common.c|  4 +-
 hw/i386/acpi-microvm.c   | 13 ++
 hw/i386/x86.c| 19 
 22 files changed, 182 insertions(+), 148 deletions(-)
 create mode 100644 include/hw/acpi/acpi-build-oem.h

-- 
2.26.2




[PATCH v3 1/1] acpi: Consolidate the handling of OEM ID and OEM Table ID fields

2021-03-22 Thread Marian Postevca
Introduces structure AcpiBuildOem to hold the value of OEM fields and
uses dedicated macros to initialize/set the values.
Unnecessary dynamically allocated OEM fields are re-factored to static
allocation.

Signed-off-by: Marian Postevca 
---
 hw/acpi/hmat.h   |  2 +-
 hw/i386/acpi-common.h|  2 +-
 include/hw/acpi/acpi-build-oem.h | 55 ++
 include/hw/acpi/aml-build.h  | 15 +++---
 include/hw/acpi/ghes.h   |  2 +-
 include/hw/acpi/pci.h|  2 +-
 include/hw/acpi/vmgenid.h|  2 +-
 include/hw/arm/virt.h|  4 +-
 include/hw/i386/x86.h|  4 +-
 include/hw/mem/nvdimm.h  |  4 +-
 hw/acpi/aml-build.c  | 27 ++-
 hw/acpi/ghes.c   |  5 +-
 hw/acpi/hmat.c   |  4 +-
 hw/acpi/nvdimm.c | 22 +
 hw/acpi/pci.c|  4 +-
 hw/acpi/vmgenid.c|  6 ++-
 hw/arm/virt-acpi-build.c | 40 ++--
 hw/arm/virt.c| 16 +++
 hw/i386/acpi-build.c | 78 +++-
 hw/i386/acpi-common.c|  4 +-
 hw/i386/acpi-microvm.c   | 13 ++
 hw/i386/x86.c| 19 
 22 files changed, 182 insertions(+), 148 deletions(-)
 create mode 100644 include/hw/acpi/acpi-build-oem.h

diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index b57f0e7e80..39c42328bd 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -38,6 +38,6 @@
 #define HMAT_PROXIMITY_INITIATOR_VALID  0x1
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state,
-const char *oem_id, const char *oem_table_id);
+struct AcpiBuildOem *bld_oem);
 
 #endif
diff --git a/hw/i386/acpi-common.h b/hw/i386/acpi-common.h
index b12cd73ea5..27c2e5b6a9 100644
--- a/hw/i386/acpi-common.h
+++ b/hw/i386/acpi-common.h
@@ -10,6 +10,6 @@
 
 void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
  X86MachineState *x86ms, AcpiDeviceIf *adev,
- const char *oem_id, const char *oem_table_id);
+ struct AcpiBuildOem *bld_oem);
 
 #endif
diff --git a/include/hw/acpi/acpi-build-oem.h b/include/hw/acpi/acpi-build-oem.h
new file mode 100644
index 00..5e5edc4c22
--- /dev/null
+++ b/include/hw/acpi/acpi-build-oem.h
@@ -0,0 +1,55 @@
+#ifndef QEMU_HW_ACPI_BUILD_OEM_H
+#define QEMU_HW_ACPI_BUILD_OEM_H
+
+/*
+ * Utilities for working with ACPI OEM ID and OEM TABLE ID fields
+ *
+ * Copyright (c) 2021 Marian Postevca
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/cutils.h"
+
+#define ACPI_BUILD_APPNAME6 "BOCHS "
+#define ACPI_BUILD_APPNAME8 "BXPC"
+
+#define ACPI_BUILD_OEM_ID_SIZE 6
+#define ACPI_BUILD_OEM_TABLE_ID_SIZE 8
+
+struct AcpiBuildOem {
+char oem_id[ACPI_BUILD_OEM_ID_SIZE + 1];
+char oem_table_id[ACPI_BUILD_OEM_TABLE_ID_SIZE + 1];
+};
+
+#define ACPI_SET_BUILD_OEM_ID(__bld_oem, __oem_id) do {\
+pstrcpy(__bld_oem.oem_id,  \
+sizeof __bld_oem.oem_id, __oem_id);\
+} while (0)
+
+#define ACPI_SET_BUILD_OEM_TABLE_ID(__bld_oem,  __oem_table_id) do {\
+pstrcpy(__bld_oem.oem_table_id, \
+sizeof __bld_oem.oem_table_id, __oem_table_id); \
+} while (0)
+
+#define ACPI_INIT_BUILD_OEM(__bld_oem, __oem_id, __oem_table_id) do {   \
+ACPI_SET_BUILD_OEM_ID(__bld_oem, __oem_id); \
+ACPI_SET_BUILD_OEM_TABLE_ID(__bld_oem, __oem_table_id); \
+} while (0)
+
+#define ACPI_INIT_DEFAULT_BUILD_OEM(__bld_oem) do { \
+ACPI_INIT_BUILD_OEM(__bld_oem,  \
+ACPI_BUILD_APPNAME6, ACPI_BUILD_APPNAME8);  \
+} while (0)
+
+#endif /* QEMU_HW_ACPI_BUILD_OEM_H */
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 471266d739..b5a9223158 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -3,9 +3,8 @@
 
 #include "hw/acpi/acpi-defs.h"
 #include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/acpi-build-oem.h"
 
-#define ACPI_BUILD_APPNAME6 "BOCHS "
-#define ACPI_BUILD_APPNAME8 "BXPC"
 
 #define ACPI_BUILD_TABLE_FILE "etc/acpi/tables"
 #define 

Re: [PATCH v2 0/3] exec: Build page-vary-common.c with -fno-lto

2021-03-22 Thread Richard Henderson

On 3/22/21 7:00 AM, Paolo Bonzini wrote:

On 22/03/21 12:24, Philippe Mathieu-Daudé wrote:

Hi,

While reviewing Richard's original patch, I split it in 3
to make it more digestible to my review taste. I then simply
filled the patch descriptions. Feel free to keep Richard's v1
if this isn't worth it.

What is still missing is adding the new files to a MAINTAINERS
section.


Both versions look good, thanks (the split is indeed more digestible)! Richard, 
are you going to queue one of them?


Yes, I'll take Phil's split, move the new files into softmmu/, and update 
MAINTAINERS.



r~



Re: Fwd: [PATCH 0/2] block/raw: implemented persistent dirty bitmap and ability to dump bitmap content via qapi

2021-03-22 Thread Patrik Janoušek
On 3/22/21 1:06 PM, Max Reitz wrote:
> On 22.03.21 12:27, Patrik Janoušek wrote:
>> On 3/22/21 11:48 AM, Max Reitz wrote:
>>> Hi,
>>>
>>> On 20.03.21 11:01, Patrik Janoušek wrote:
>>>> I'm sorry, but I forgot to add you to the cc, so I'm forwarding the
>>>> patch to you additionally. I don't want to spam the mailing list
>>>> unnecessarily.
>>>
>>> I think it’s better to still CC the list.  It’s so full of mail, one
>>> more won’t hurt. :)
>>>
>>> (Re-adding qemu-block and qemu-devel, because the discussion belongs
>>> on the list(s).)
>>>
>>>> -------- Forwarded Message --------
>>>> Subject: [PATCH 0/2] block/raw: implemented persistent dirty
>>>> bitmap and ability to dump bitmap content via qapi
>>>> Date: Sat, 20 Mar 2021 10:32:33 +0100
>>>> From: Patrik Janoušek 
>>>> To: qemu-devel@nongnu.org
>>>> CC: Patrik Janoušek , lmate...@kiv.zcu.cz
>>>>
>>>> Currently, QEMU doesn't support persistent dirty bitmaps for raw
>>>> format
>>>> and also dirty bitmaps are for internal use only, and cannot be
>>>> accessed
>>>> using third-party applications. These facts are very limiting
>>>> in case someone would like to develop their own backup tool because
>>>> without access to the dirty bitmap it would be possible to implement
>>>> only full backups. And without persistent dirty bitmaps, it wouldn't
>>>> be possible to keep track of changed data after QEMU is restarted. And
>>>> this is exactly what I do as a part of my bachelor thesis. I've
>>>> developed a tool that is able to create incremental backups of drives
>>>> in raw format that are LVM volumes (ability to create snapshot is
>>>> required).
>>>
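
For readers unfamiliar with the concept, the role a dirty bitmap plays in incremental backup can be sketched as below. This is a generic illustration, not QEMU's bitmap implementation or on-disk format: the `DirtyMap` type and the `dirty_map_*` functions are hypothetical, and the granularity is simply a caller-chosen chunk size in bytes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One bit per fixed-size chunk of the disk; a set bit means "changed since last backup". */
typedef struct {
    uint64_t granularity;   /* bytes covered by one bit */
    uint64_t nchunks;
    uint8_t *bits;
} DirtyMap;

static bool dirty_map_init(DirtyMap *m, uint64_t disk_size, uint64_t granularity)
{
    if (granularity == 0) {
        return false;
    }
    m->granularity = granularity;
    m->nchunks = (disk_size + granularity - 1) / granularity;
    m->bits = calloc((size_t)(m->nchunks / 8 + 1), 1);
    return m->bits != NULL;
}

/* Called on every guest write: mark all chunks touched by [offset, offset + len). */
static void dirty_map_mark(DirtyMap *m, uint64_t offset, uint64_t len)
{
    uint64_t first, last, c;

    if (len == 0 || m->nchunks == 0) {
        return;
    }
    first = offset / m->granularity;
    if (first >= m->nchunks) {
        return;
    }
    last = (offset + len - 1) / m->granularity;
    if (last >= m->nchunks) {
        last = m->nchunks - 1;
    }
    for (c = first; c <= last; c++) {
        m->bits[c / 8] |= (uint8_t)(1u << (c % 8));
    }
}

static bool dirty_map_test(const DirtyMap *m, uint64_t chunk)
{
    return chunk < m->nchunks && (m->bits[chunk / 8] >> (chunk % 8)) & 1u;
}

/* Number of chunks an incremental backup would have to copy. */
static uint64_t dirty_map_count(const DirtyMap *m)
{
    uint64_t c, n = 0;

    for (c = 0; c < m->nchunks; c++) {
        n += dirty_map_test(m, c) ? 1 : 0;
    }
    return n;
}

static void dirty_map_free(DirtyMap *m)
{
    free(m->bits);
    m->bits = NULL;
}
```

An in-memory map like this is lost when the process exits, which is why persistence (the subject of this thread) matters: without it the first backup after a restart has to be a full one.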
>>> Similarly to what Vladimir has said already, the thing is that
>>> conceptually I can see no difference between having a raw image with
>>> the bitmaps stored in some other file, i.e.:
>>>
>>>    { "driver": "raw",
>>>  "dirty-bitmaps": [ {
>>>    "filename": "sdc1.bitmap",
>>>    "persistent": true
>>>  } ],
>>>  "file": {
>>>    "driver": "file",
>>>    "filename": "/dev/sdc1"
>>>  } }
>>>
>>> And having a qcow2 image with the raw data stored in some other file,
>>> i.e.:
>>>
>>>    { "driver": "qcow2",
>>>  "file": {
>>>    "driver": "file",
>>>    "filename": "sdc1.metadata"
>>>  },
>>>  "data-file": {
>>>    "driver": "file",
>>>    "filename": "/dev/sdc1"
>>>  } }
>>>
>>> (Where sdc1.metadata is a qcow2 file created with
>>> “data-file=/dev/sdc1,data-file-raw=on”.)
>>>
>>> To use persistent bitmaps with raw images, you need to add metadata
>>> (namely, the bitmaps).  Why not store that metadata in a qcow2 file?
>>>
>>> Max
>>
>> So if I understand it correctly. I can configure dirty bitmaps in the
>> latest version of QEMU to be persistently stored in some other file.
>> Because even Proxmox Backup Server can't perform an incremental backup
>> after restarting QEMU, and that means something to me. I think they
>> would implement it if it was that simple.
>>
>> Could you please send me simple example on how to configure (via command
>> line args) one raw format drive that can store dirty bitmaps
>> persistently in other qcow2 file? I may be missing something, but I
>> thought QEMU couldn't do it, because Proxmox community wants this
>> feature for a long time.
>
> One trouble is that if you use qemu-img create to create the qcow2
> image, it will always create an empty image, and so if you pass
> data_file to it, it will empty the existing raw image:
>
> $ cp ~/tmp/arch.iso raw.img # Just some Arch Linux ISO
>
> $ qemu-img create \
>     -f qcow2 \
>     -o data_file=raw.img,data_file_raw=on,preallocation=metadata \
>     metadata.qcow2 \
>     $(stat -c '%s' raw.img)
> Formatting 'metadata.qcow2', fmt=qcow2 cluster_size=65536
> preallocation=metadata compression_type=zlib size=687865856
> data_file=raw.img data_file_raw=on lazy_refcounts=off refcount_bits=16
>
> (If you check raw.img at this point, you’ll find that it’s empty, so
> you need to copy it from the source again:)
>
> $ cp ~/tmp/arch.iso raw.img
>
> Now if you use metadata.qcow2, the image data will actually all be
> stored in raw.img.
>
>
> To get around the “creating metadata.qcow2 clears raw.img” problem,
> you can either create a temporary empty image of the same size as
> raw.img that you pass to qemu-img create, and then you use qemu-img
> amend to change the data-file pointer (which will not overwrite the
> new data-file’s contents):
>
> $ qemu-img create -f raw tmp.img $(stat -c '%s' raw.img)
>
> $ qemu-img create \
>     -f qcow2 \
>     -o data_file=tmp.img,data_file_raw=on,preallocation=metadata \
>     metadata.qcow2 \
>     $(stat -c '%s' raw.img)
> Formatting 'metadata.qcow2', fmt=qcow2 cluster_size=65536
> preallocation=metadata compression_type=zlib size=687865856
> data_file=tmp.img data_file_raw=on lazy_refcounts=off refcount_bits=16
>
> $ qemu-img amend -o data_file=raw.img metadata.qcow2
>
> $ rm tmp.img
>
>
> Or you use the 

Re: [PATCH] exec: Build page-varry-common.c with -fno-lto

2021-03-22 Thread Richard Henderson

On 3/22/21 5:14 AM, Philippe Mathieu-Daudé wrote:

  configure|  19 ---
  meson.build  |  18 ++-
  include/exec/cpu-all.h   |  15 ++
  include/exec/page-vary.h |  34 
  exec-vary.c  | 108 ---
  page-vary-common.c   |  54 
  page-vary.c  |  41 +++
  7 files changed, 150 insertions(+), 139 deletions(-)
  create mode 100644 include/exec/page-vary.h
  delete mode 100644 exec-vary.c
  create mode 100644 page-vary-common.c
  create mode 100644 page-vary.c


Which MAINTAINERS section do these files belong to?


Hmm, yes.  I see exec-vary.c wasn't listed either.

It looks like exec-vary.c should have gone into softmmu/ when that was created.
It should also be part of "Overall TCG CPUs".


r~



Re: [PATCH] hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows

2021-03-22 Thread Arnd Bergmann
On Mon, Mar 22, 2021 at 9:13 PM Peter Maydell  wrote:
>
> Currently the gpex PCI controller implements no special behaviour for
> guest accesses to areas of the PIO and MMIO where it has not mapped
> any PCI devices, which means that for Arm you end up with a CPU
> exception due to a data abort.
>
> Most host OSes expect "like an x86 PC" behaviour, where bad accesses
> like this return -1 for reads and ignore writes.  In the interests of
> not being surprising, make host CPU accesses to these windows behave
> as -1/discard where there's no mapped PCI device.
>
> Reported-by: Dmitry Vyukov 
> Fixes: https://bugs.launchpad.net/qemu/+bug/1918917
> Signed-off-by: Peter Maydell 
> ---
> Not convinced that this is 6.0 material, because IMHO the
> kernel shouldn't be doing this in the first place.
> Do we need to have the property machinery so that old
> virt-5.2 etc retain the previous behaviour ?

I think it would be sufficient to do this for the ioport window,
which is what old-style ISA drivers access. I am not aware
of any driver accessing hardcoded addresses in the mmio
window, at least not without probing io ports first (the VGA
text console would use both).

I checked which SoCs the kernel supports that do require a special
hook to avoid an abort and found these:

arch/arm/mach-bcm/bcm_5301x.c:  hook_fault_code(16 + 6, bcm5301x_abort_handler, SIGBUS, BUS_OBJERR,
arch/arm/mach-cns3xxx/pcie.c:   hook_fault_code(16 + 6, cns3xxx_pcie_abort_handler, SIGBUS, 0,
arch/arm/mach-iop32x/pci.c: hook_fault_code(16+6, iop3xx_pci_abort, SIGBUS, 0, "imprecise external abort");
arch/arm/mach-ixp4xx/common-pci.c:  hook_fault_code(16+6, abort_handler, SIGBUS, 0,
drivers/pci/controller/dwc/pci-imx6.c:  hook_fault_code(8, imx6q_pcie_abort_handler, SIGBUS, 0,
drivers/pci/controller/dwc/pci-keystone.c:  hook_fault_code(17, ks_pcie_fault, SIGBUS, 0,

The first four (bcm5301x, cns3xxx, iop32x and ixp4xx) generate an
'imprecise external abort' (16+6), imx6q has a "precise external abort on
non-linefetch" (8), and keystone in LPAE mode has an "asynchronous external
abort". The only SoC among those that is emulated by qemu to my knowledge
is the i.MX6q in the 'sabrelite' machine.

It's possible that some of these are not caused by a PCI master abort
but some other error condition on the PCI host though. I think most other
PCI implementations either ignore the error or generate an I/O interrupt.
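
As a quick cross-check of the numbers above, the fault codes from the grep output can be tabulated. This is only a sketch in Python: the names HOOKED_ABORTS, DESCRIPTIONS and socs_using() are made up for illustration, and the descriptions are copied from this mail rather than from the kernel's fault tables.

```python
# Fault-status codes passed to hook_fault_code() in the grep output above.
# HOOKED_ABORTS, DESCRIPTIONS and socs_using() are illustrative names only.
HOOKED_ABORTS = {
    "bcm5301x": 16 + 6,
    "cns3xxx": 16 + 6,
    "iop32x": 16 + 6,
    "ixp4xx": 16 + 6,
    "imx6q": 8,
    "keystone": 17,
}

# Descriptions as given in the mail, keyed by fault code.
DESCRIPTIONS = {
    22: "imprecise external abort",
    8: "precise external abort on non-linefetch",
    17: "asynchronous external abort (LPAE)",
}


def socs_using(code):
    """Return the SoCs that hook the given fault code, sorted by name."""
    return sorted(soc for soc, c in HOOKED_ABORTS.items() if c == code)


for code in sorted(set(HOOKED_ABORTS.values())):
    print(code, DESCRIPTIONS[code], socs_using(code))
```

Grouping by code makes it easy to see that four of the six platforms share the same imprecise-abort hook.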

Arnd



Re: [PATCH] [NFC] Mark locally used symbols as static.

2021-03-22 Thread Max Filippov
On Mon, Mar 22, 2021 at 12:55 PM Yuri Gribov  wrote:
>
> Hi all,
>
> This patch makes locally used symbols static to enable more compiler
> optimizations on them. Some of the symbols turned out to not be used
> at all so I marked them with ATTRIBUTE_UNUSED (as I wasn't sure if
> they were ok to delete).
>
> The symbols have been identified with a pet project of mine:
> https://github.com/yugr/Localizer
>
> From 07b4f05893b7037e68e5d7bdec5ba8e74e50 Mon Sep 17 00:00:00 2001
> From: Yury Gribov 
> Date: Sat, 20 Mar 2021 23:39:15 +0300
> Subject: [PATCH] [NFC] Mark locally used symbols as static.
>
> Signed-off-by: Yury Gribov 
> ---
>  disas/alpha.c | 16 ++--
>  disas/m68k.c  | 78 -
>  disas/mips.c  | 14 ++--
>  disas/nios2.c | 84 +--
>  disas/ppc.c   | 26 +++---
>  disas/riscv.c |  2 +-
>  pc-bios/optionrom/linuxboot_dma.c |  4 +-
>  scripts/tracetool/format/c.py |  2 +-
>  target/hexagon/gen_dectree_import.c   |  2 +-
>  target/hexagon/opcodes.c  |  2 +-
>  target/i386/cpu.c |  2 +-
>  target/s390x/cpu_models.c |  2 +-
>  .../xtensa/core-dc232b/xtensa-modules.c.inc   |  2 +-
>  .../xtensa/core-dc233c/xtensa-modules.c.inc   |  2 +-
>  target/xtensa/core-de212/xtensa-modules.c.inc |  2 +-
>  .../core-de233_fpu/xtensa-modules.c.inc   |  2 +-
>  .../xtensa/core-dsp3400/xtensa-modules.c.inc  |  2 +-
>  target/xtensa/core-fsf/xtensa-modules.c.inc   |  2 +-
>  .../xtensa-modules.c.inc  |  2 +-
>  .../core-test_kc705_be/xtensa-modules.c.inc   |  2 +-
>  .../core-test_mmuhifi_c3/xtensa-modules.c.inc |  2 +-
>  21 files changed, 125 insertions(+), 127 deletions(-)

For the xtensa part:
Acked-by: Max Filippov 

Changed files are auto-generated, I'll add a rule to the import
script (target/xtensa/import_core.sh) to do this transformation.
-- 
Thanks.
-- Max



Re: [PATCH] i386/cpu_dump: support AVX512 ZMM regs dump

2021-03-22 Thread Richard Henderson

On 3/22/21 4:59 AM, Robert Hoo wrote:

Since commit fa4518741e (target-i386: Rename struct XMMReg to ZMMReg),
CPUX86State.xmm_regs[] has already been extended to 512bit to support
AVX512.
Also, other qemu level supports for AVX512 registers are there for
years.
But in x86_cpu_dump_state(), still only dump XMM registers.
This patch is just to complement this part, let it dump ZMM of 512bits.


I think you should examine the state of the cpu to determine what of SSE, AVX 
or AVX512 is currently enabled, then dump that.



-if (env->hflags & HF_CS64_MASK)
-nb = 16;
-else
-nb = 8;
-for(i=0;ixmm_regs) / sizeof(env->xmm_regs[0]);


E.g., you're dumping all of the registers in 32-bit mode, which is restricted 
to 8 registers, not 32.
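
One way to read this suggestion, as a rough Python model rather than QEMU code: pick the register count from the CPU mode and the register width from whichever vector extension is enabled. The function name and its three boolean parameters are hypothetical stand-ins for whatever state x86_cpu_dump_state() would actually inspect. The 8-register limit in 32-bit mode comes from the comment above; the 16/32 split in 64-bit mode assumes that AVX512 adds the upper sixteen registers only there.

```python
def vector_dump_layout(is_64bit, avx_enabled, avx512_enabled):
    """Return (register_count, bits_per_register) to dump.

    All three parameters are hypothetical booleans; real code would have to
    derive them from the CPU state.
    """
    # Register width follows the widest enabled extension.
    if avx512_enabled:
        width = 512
    elif avx_enabled:
        width = 256
    else:
        width = 128

    # Register count follows the CPU mode first, then the extension.
    if not is_64bit:
        count = 8
    elif avx512_enabled:
        count = 32
    else:
        count = 16
    return count, width


print(vector_dump_layout(is_64bit=False, avx_enabled=True, avx512_enabled=True))
```

Keeping count and width as separate decisions avoids the case called out above, where 32 registers are dumped in a mode that can only reach 8.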



r~



Re: [PATCH] configure: Improve alias attribute check

2021-03-22 Thread Richard Henderson

On 3/22/21 4:54 AM, Gavin Shan wrote:

It looks like this issue can be avoided after "volatile" is applied to
@target_page. However, I'm not sure if it's the correct fix to have.


Certainly not.

That is the exact opposite of what we want.  We want to minimize the number of 
reads from the variable, not maximize them.



r~



Re: [PATCH 2/2] docs/system: riscv: Add documentation for 'microchip-icicle-kit' machine

2021-03-22 Thread Alistair Francis
On Mon, Mar 22, 2021 at 3:53 AM Bin Meng  wrote:
>
> From: Bin Meng 
>
> This adds the documentation to describe what is supported for the
> 'microchip-icicle-kit' machine, and how to boot the machine in QEMU.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  docs/system/riscv/microchip-icicle-kit.rst | 89 ++
>  docs/system/target-riscv.rst   |  1 +
>  2 files changed, 90 insertions(+)
>  create mode 100644 docs/system/riscv/microchip-icicle-kit.rst
>
> diff --git a/docs/system/riscv/microchip-icicle-kit.rst b/docs/system/riscv/microchip-icicle-kit.rst
> new file mode 100644
> index 0000000000..4fe97bce3f
> --- /dev/null
> +++ b/docs/system/riscv/microchip-icicle-kit.rst
> @@ -0,0 +1,89 @@
> +Microchip PolarFire SoC Icicle Kit (``microchip-icicle-kit``)
> +=============================================================
> +
> +Microchip PolarFire SoC Icicle Kit integrates a PolarFire SoC, with one
> +SiFive's E51 plus four U54 cores and many on-chip peripherals and an FPGA.
> +
> +For more details about Microchip PolarFire SoC, please see:
> +https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga
> +
> +The Icicle Kit board information can be found here:
> +https://www.microsemi.com/existing-parts/parts/152514
> +
> +Supported devices
> +-----------------
> +
> +The ``microchip-icicle-kit`` machine supports the following devices:
> +
> + * 1 E51 core
> + * 4 U54 cores
> + * Core Level Interruptor (CLINT)
> + * Platform-Level Interrupt Controller (PLIC)
> + * L2 Loosely Integrated Memory (L2-LIM)
> + * DDR memory controller
> + * 5 MMUARTs
> + * 1 DMA controller
> + * 2 GEM Ethernet controllers
> + * 1 SDHC storage controller
> +
> +Boot options
> +------------
> +
> +The ``microchip-icicle-kit`` machine can start using the standard -bios
> +functionality for loading its BIOS image, aka Hart Software Services (HSS_).
> +HSS loads the second stage bootloader U-Boot from an SD card. It does not
> +support direct kernel loading via the -kernel option. One has to load kernel
> +from U-Boot.
> +
> +The memory is set to 1537 MiB by default which is the minimum required high
> +memory size by HSS. A sanity check on ram size is performed in the machine
> +init routine to prompt user to increase the RAM size to > 1537 MiB when less
> +than 1537 MiB ram is detected.
> +
> +Boot the machine
> +
> +----------------
> +HSS 2020.12 release is tested at the time of writing. To build an HSS image
> +that can be booted by the ``microchip-icicle-kit`` machine, type the following
> +in the HSS source tree:
> +
> +.. code-block:: bash
> +
> +  $ export CROSS_COMPILE=riscv64-linux-
> +  $ cp boards/mpfs-icicle-kit-es/def_config .config
> +  $ make BOARD=mpfs-icicle-kit-es
> +
> +Download the official SD card image released by Microchip and prepare it for
> +QEMU usage:
> +
> +.. code-block:: bash
> +
> +  $ wget ftp://ftpsoc.microsemi.com/outgoing/core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz
> +  $ gunzip core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz
> +  $ qemu-img resize core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic 4G
> +
> +Then we can boot the machine by:
> +
> +.. code-block:: bash
> +
> +  $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 \
> +  -bios path/to/hss.bin -sd path/to/sdcard.img \
> +  -nic user,model=cadence_gem \
> +  -nic tap,ifname=tap,model=cadence_gem,script=no \
> +  -display none -serial stdio \
> +  -chardev socket,id=serial1,path=serial1.sock,server=on,wait=on \
> +  -serial chardev:serial1
> +
> +With above command line, current terminal session will be used for the first
> +serial port. Open another terminal window, and use `minicom` to connect the
> +second serial port.
> +
> +.. code-block:: bash
> +
> +  $ minicom -D unix\#serial1.sock
> +
> +HSS output is on the first serial port (stdio) and U-Boot outputs on the
> +second serial port. U-Boot will automatically load the Linux kernel from
> +the SD card image.
> +
> +.. _HSS: https://github.com/polarfire-soc/hart-software-services
> diff --git a/docs/system/target-riscv.rst b/docs/system/target-riscv.rst
> index 94d99c4c82..8d5946fbbb 100644
> --- a/docs/system/target-riscv.rst
> +++ b/docs/system/target-riscv.rst
> @@ -66,6 +66,7 @@ undocumented; you can get a complete list by running
>  .. toctree::
> :maxdepth: 1
>
> +   riscv/microchip-icicle-kit
> riscv/sifive_u
>
>  RISC-V CPU features
> --
> 2.25.1
>
>



Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread

2021-03-22 Thread Peter Xu
On Fri, Mar 19, 2021 at 05:52:47PM +0300, Andrey Gruzdev wrote:
> Added missing qemu_fflush() on buffer file holding precopy device state.
> Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
> Typical configurations often require >200KB for device state and VMDESC.
> 
> Signed-off-by: Andrey Gruzdev 
> ---
>  migration/migration.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index ca8b97baa5..32b48fe9f5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3812,7 +3812,7 @@ static void *bg_migration_thread(void *opaque)
>   * with vCPUs running and, finally, write stashed non-RAM part of
>   * the vmstate from the buffer to the migration stream.
>   */
> -s->bioc = qio_channel_buffer_new(128 * 1024);
> +s->bioc = qio_channel_buffer_new(512 * 1024);
>  qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
>  fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
>  object_unref(OBJECT(s->bioc));
> @@ -3866,6 +3866,8 @@ static void *bg_migration_thread(void *opaque)
>  if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
>  goto fail;
>  }
> +qemu_fflush(fb);

What will happen if the vmstates are bigger than 512KB?  Would the extra data
be dropped?

In that case, I'm wondering whether we'll need a qemu_file_get_error() after
the flush to detect it, and whether we need to retry with a bigger buffer size?

Thanks,

-- 
Peter Xu




Re: [Virtio-fs] [External] Re: [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection

2021-03-22 Thread Vivek Goyal
On Mon, Mar 22, 2021 at 11:00:52AM +0000, Stefan Hajnoczi wrote:
> On Wed, Mar 17, 2021 at 08:32:31PM +0800, Jiachen Zhang wrote:
> > On Wed, Mar 17, 2021 at 6:05 PM Stefan Hajnoczi  wrote:
> > > On Fri, Dec 18, 2020 at 05:39:34PM +0800, Jiachen Zhang wrote:
> > I agreed with you that a virtiofsd must be launched by a software like
> > systemd. So we are planning to define more generic persist/restore
> > interfaces (callbacks). Then anyone can implement their own persist/restore
> > callbacks to store states to proper places.  And I think in the next
> > version we will implement default callbacks for the interfaces. Instead of
> > vhost-user messages, systemd's sd_notify(3) will be the default method for
> > storing fds, and several tmpfs files can be the default place to store the
> > shm regions.
> 
> Okay, great!
> 
> I was thinking about how to make the crash recovery mechanism reusable
> as a C library or Rust crate. The mechanism is a combination of:
> 1. sd_listen_fds(3) for restoring the fds on restart.
> 2. sd_notify(3) for storing the fds.
> 3. memfd or tmpfs for storing state (could be mmapped).
> 
> I'm not sure if there is enough common behavior to create a reusable API
> or if this is quite application-specific.

I am wondering what will happen for use cases where virtiofsd is running
inside a container (with no systemd inside containers).

Do container managers offer systemd-like services to save and restore
state?
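
For the third item in the list quoted above (memfd or tmpfs for storing state), a minimal Python sketch of the idea follows. It is not virtiofsd code: the function names and STATE_SIZE are invented for illustration, and a plain dup() stands in for the supervisor handing the fd back after a restart, which is the part that would need systemd or an equivalent container-manager service.

```python
import mmap
import os
import tempfile

STATE_SIZE = 4096  # arbitrary size chosen for this illustration


def create_state_fd():
    """Return an fd backed by memory that outlives any single mapping.

    Prefers memfd_create() (Linux); otherwise falls back to an unlinked
    temporary file, which is close enough for this demonstration.
    """
    try:
        fd = os.memfd_create("daemon-state")
    except (AttributeError, OSError):
        with tempfile.TemporaryFile() as tmp:
            fd = os.dup(tmp.fileno())
    os.ftruncate(fd, STATE_SIZE)
    return fd


def save_state(fd, payload):
    """Write payload at the start of the shared region."""
    with mmap.mmap(fd, STATE_SIZE) as region:
        region[:len(payload)] = payload


def load_state(fd, length):
    """Read length bytes back from the start of the shared region."""
    with mmap.mmap(fd, STATE_SIZE) as region:
        return bytes(region[:length])


def roundtrip(payload):
    """Save state, 'restart' by handing over a dup of the fd, then reload."""
    fd = create_state_fd()
    try:
        save_state(fd, payload)
        # A restarted daemon would get the fd back from its supervisor
        # (for example via sd_listen_fds(3)); dup() stands in for that here.
        inherited = os.dup(fd)
    finally:
        os.close(fd)
    try:
        return load_state(inherited, len(payload))
    finally:
        os.close(inherited)


print(roundtrip(b"example-state"))
```

The state survives because something other than the crashed process still holds a reference to the fd, which is exactly the service a container manager would have to provide in place of systemd.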

Vivek




[PATCH] hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows

2021-03-22 Thread Peter Maydell
Currently the gpex PCI controller implements no special behaviour for
guest accesses to areas of the PIO and MMIO where it has not mapped
any PCI devices, which means that for Arm you end up with a CPU
exception due to a data abort.

Most host OSes expect "like an x86 PC" behaviour, where bad accesses
like this return -1 for reads and ignore writes.  In the interests of
not being surprising, make host CPU accesses to these windows behave
as -1/discard where there's no mapped PCI device.

Reported-by: Dmitry Vyukov 
Fixes: https://bugs.launchpad.net/qemu/+bug/1918917
Signed-off-by: Peter Maydell 
---
Not convinced that this is 6.0 material, because IMHO the
kernel shouldn't be doing this in the first place.
Do we need to have the property machinery so that old
virt-5.2 etc retain the previous behaviour ?
---
 include/hw/pci-host/gpex.h |  2 ++
 hw/pci-host/gpex.c | 37 +++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index d48a020a952..ad876ecd209 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -49,6 +49,8 @@ struct GPEXHost {
 
 MemoryRegion io_ioport;
 MemoryRegion io_mmio;
+MemoryRegion io_ioport_window;
+MemoryRegion io_mmio_window;
 qemu_irq irq[GPEX_NUM_IRQS];
 int irq_num[GPEX_NUM_IRQS];
 };
diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
index 2bdbe7b4561..1f48c89ac6a 100644
--- a/hw/pci-host/gpex.c
+++ b/hw/pci-host/gpex.c
@@ -82,13 +82,46 @@ static void gpex_host_realize(DeviceState *dev, Error **errp)
 PCIExpressHost *pex = PCIE_HOST_BRIDGE(dev);
 int i;
 
+/*
+ * Note that the MemoryRegions io_mmio and io_ioport that we pass
+ * to pci_register_root_bus() are not the same as the
+ * MemoryRegions io_mmio_window and io_ioport_window that we
+ * expose as SysBus MRs. The difference is in the behaviour of
+ * accesses to addresses where no PCI device has been mapped.
+ *
+ * io_mmio and io_ioport are the underlying PCI view of the PCI
+ * address space, and when a PCI device does a bus master access
+ * to a bad address this is reported back to it as a transaction
+ * failure.
+ *
+ * io_mmio_window and io_ioport_window implement "unmapped
+ * addresses read as -1 and ignore writes"; this is traditional
+ * x86 PC behaviour, which is not mandated by the PCI spec proper
+ * but expected by much PCI-using guest software, including Linux.
+ *
+ * In the interests of not being unnecessarily surprising, we
+ * implement it in the gpex PCI host controller, by providing the
+ * _window MRs, which are containers with io ops that implement
+ * the 'background' behaviour and which hold the real PCI MRs as
+ * subregions.
+ */
 pcie_host_mmcfg_init(pex, PCIE_MMCFG_SIZE_MAX);
 memory_region_init(&s->io_mmio, OBJECT(s), "gpex_mmio", UINT64_MAX);
 memory_region_init(&s->io_ioport, OBJECT(s), "gpex_ioport", 64 * 1024);
 
+memory_region_init_io(&s->io_mmio_window, OBJECT(s),
+  &unassigned_io_ops, OBJECT(s),
+  "gpex_mmio_window", UINT64_MAX);
+memory_region_init_io(&s->io_ioport_window, OBJECT(s),
+  &unassigned_io_ops, OBJECT(s),
+  "gpex_ioport_window", 64 * 1024);
+
+memory_region_add_subregion(&s->io_mmio_window, 0, &s->io_mmio);
+memory_region_add_subregion(&s->io_ioport_window, 0, &s->io_ioport);
+
 sysbus_init_mmio(sbd, &pex->mmio);
-sysbus_init_mmio(sbd, &s->io_mmio);
-sysbus_init_mmio(sbd, &s->io_ioport);
+sysbus_init_mmio(sbd, &s->io_mmio_window);
+sysbus_init_mmio(sbd, &s->io_ioport_window);
 for (i = 0; i < GPEX_NUM_IRQS; i++) {
 sysbus_init_irq(sbd, &s->irq[i]);
 s->irq_num[i] = -1;
-- 
2.20.1
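
The behaviour the patch comment describes, a container region whose background returns -1 for reads and discards writes while mapped devices sit on top as subregions, can be modelled in a few lines of Python. This is a toy sketch and not QEMU's MemoryRegion API: the Window class, its methods and the 4-byte access width are all assumptions made for the example.

```python
class Window:
    """Container region with a read-as-ones / ignore-writes background.

    A toy model only: each mapped device is a plain dict of address -> value.
    """

    def __init__(self, access_bytes=4):
        # All-ones value for the modelled access width, i.e. "-1".
        self.all_ones = (1 << (8 * access_bytes)) - 1
        self.devices = []  # list of (base, size, storage dict)

    def map_device(self, base, size):
        """Map a device covering [base, base + size) and return its storage."""
        storage = {}
        self.devices.append((base, size, storage))
        return storage

    def _find(self, addr):
        for base, size, storage in self.devices:
            if base <= addr < base + size:
                return storage
        return None

    def read(self, addr):
        storage = self._find(addr)
        if storage is None:
            return self.all_ones      # unmapped: reads as -1
        return storage.get(addr, 0)

    def write(self, addr, value):
        storage = self._find(addr)
        if storage is not None:       # unmapped: write is discarded
            storage[addr] = value & self.all_ones


win = Window()
win.map_device(0x1000, 0x100)
win.write(0x1000, 0xABCD)
win.write(0x5000, 0x1234)  # no device here, silently dropped
print(hex(win.read(0x1000)), hex(win.read(0x5000)))
```

The point of the split is visible in _find(): accesses that hit a device never see the background, so only CPU accesses to holes change behaviour.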




[PATCH] [NFC] Mark locally used symbols as static.

2021-03-22 Thread Yuri Gribov
Hi all,

This patch makes locally used symbols static to enable more compiler
optimizations on them. Some of the symbols turned out to not be used
at all so I marked them with ATTRIBUTE_UNUSED (as I wasn't sure if
they were ok to delete).

The symbols have been identified with a pet project of mine:
https://github.com/yugr/Localizer

From 07b4f05893b7037e68e5d7bdec5ba8e74e50 Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Sat, 20 Mar 2021 23:39:15 +0300
Subject: [PATCH] [NFC] Mark locally used symbols as static.

Signed-off-by: Yury Gribov 
---
 disas/alpha.c | 16 ++--
 disas/m68k.c  | 78 -
 disas/mips.c  | 14 ++--
 disas/nios2.c | 84 +--
 disas/ppc.c   | 26 +++---
 disas/riscv.c |  2 +-
 pc-bios/optionrom/linuxboot_dma.c |  4 +-
 scripts/tracetool/format/c.py |  2 +-
 target/hexagon/gen_dectree_import.c   |  2 +-
 target/hexagon/opcodes.c  |  2 +-
 target/i386/cpu.c |  2 +-
 target/s390x/cpu_models.c |  2 +-
 .../xtensa/core-dc232b/xtensa-modules.c.inc   |  2 +-
 .../xtensa/core-dc233c/xtensa-modules.c.inc   |  2 +-
 target/xtensa/core-de212/xtensa-modules.c.inc |  2 +-
 .../core-de233_fpu/xtensa-modules.c.inc   |  2 +-
 .../xtensa/core-dsp3400/xtensa-modules.c.inc  |  2 +-
 target/xtensa/core-fsf/xtensa-modules.c.inc   |  2 +-
 .../xtensa-modules.c.inc  |  2 +-
 .../core-test_kc705_be/xtensa-modules.c.inc   |  2 +-
 .../core-test_mmuhifi_c3/xtensa-modules.c.inc |  2 +-
 21 files changed, 125 insertions(+), 127 deletions(-)

diff --git a/disas/alpha.c b/disas/alpha.c
index 3db90fa..361a4ed 100644
--- a/disas/alpha.c
+++ b/disas/alpha.c
@@ -56,8 +56,8 @@ struct alpha_opcode
 /* The table itself is sorted by major opcode number, and is otherwise
in the order in which the disassembler should consider
instructions.  */
-extern const struct alpha_opcode alpha_opcodes[];
-extern const unsigned alpha_num_opcodes;
+static const struct alpha_opcode alpha_opcodes[];
+static const unsigned alpha_num_opcodes;

 /* Values defined for the flags field of a struct alpha_opcode.  */

@@ -137,8 +137,8 @@ struct alpha_operand
 /* Elements in the table are retrieved by indexing with values from
the operands field of the alpha_opcodes table.  */

-extern const struct alpha_operand alpha_operands[];
-extern const unsigned alpha_num_operands;
+static const struct alpha_operand alpha_operands[];
+static const unsigned alpha_num_operands;

 /* Values defined for the flags field of a struct alpha_operand.  */

@@ -293,7 +293,7 @@ static int extract_ev6hwjhint (unsigned, int *);
 
 /* The operands table  */

-const struct alpha_operand alpha_operands[] =
+static const struct alpha_operand alpha_operands[] =
 {
   /* The fields are bits, shift, insert, extract, flags */
   /* The zero index is used to indicate end-of-list */
@@ -424,7 +424,7 @@ const struct alpha_operand alpha_operands[] =
 insert_ev6hwjhint, extract_ev6hwjhint }
 };

-const unsigned alpha_num_operands = sizeof(alpha_operands)/sizeof(*alpha_operands);
+static ATTRIBUTE_UNUSED const unsigned alpha_num_operands = sizeof(alpha_operands)/sizeof(*alpha_operands);

 /* The RB field when it is the same as the RA field in the same insn.
This operand is marked fake.  The insertion function just copies
@@ -706,7 +706,7 @@ extract_ev6hwjhint(unsigned insn, int *invalid ATTRIBUTE_UNUSED)
that were not assigned to a particular extension.
 */

-const struct alpha_opcode alpha_opcodes[] = {
+static const struct alpha_opcode alpha_opcodes[] = {
   { "halt",SPCD(0x00,0x0000), BASE, ARG_NONE },
   { "draina",  SPCD(0x00,0x0002), BASE, ARG_NONE },
   { "bpt", SPCD(0x00,0x0080), BASE, ARG_NONE },
@@ -1732,7 +1732,7 @@ const struct alpha_opcode alpha_opcodes[] = {
   { "bgt", BRA(0x3F), BASE, ARG_BRA },
 };

-const unsigned alpha_num_opcodes = sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);
+static ATTRIBUTE_UNUSED const unsigned alpha_num_opcodes = sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);

 /* OSF register names.  */

diff --git a/disas/m68k.c b/disas/m68k.c
index aefaecf..903d5cf 100644
--- a/disas/m68k.c
+++ b/disas/m68k.c
@@ -95,29 +95,29 @@ struct floatformat

 /* floatformats for IEEE single and double, big and little endian.  */

-extern const struct floatformat floatformat_ieee_single_big;
-extern const struct floatformat floatformat_ieee_single_little;
-extern const struct floatformat floatformat_ieee_double_big;
-extern const struct floatformat floatformat_ieee_double_little;
+static const struct floatformat floatformat_ieee_single_big;
+static const struct floatformat floatformat_ieee_single_little;
+static const struct floatformat 

Re: [PATCH] docs: simplify each section title

2021-03-22 Thread Marc-André Lureau
Hi

On Mon, Mar 22, 2021 at 10:23 PM John Snow  wrote:

> On 3/22/21 12:36 PM, Peter Maydell wrote:
> > On Mon, 22 Mar 2021 at 16:03,  wrote:
> >>
> >> From: Marc-André Lureau 
> >>
> >> Now that we merged into one doc, it makes the nav look nicer.
> >>
> >> Signed-off-by: Marc-André Lureau 
> >> ---
> >>   docs/devel/index.rst   | 4 ++--
> >>   docs/interop/index.rst | 4 ++--
> >>   docs/specs/index.rst   | 4 ++--
> >>   docs/system/index.rst  | 4 ++--
> >>   docs/tools/index.rst   | 4 ++--
> >>   docs/user/index.rst| 4 ++--
> >>   6 files changed, 12 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
> >> index 7c424ea6d7..09d21d3514 100644
> >> --- a/docs/devel/index.rst
> >> +++ b/docs/devel/index.rst
> >> @@ -1,8 +1,8 @@
> >>   .. This is the top level page for the 'devel' manual.
> >>
> >>
> >> -QEMU Developer's Guide
> >> -======================
> >> +Developers
> >> +==========
> >
> > I think this should be "Developer's Guide" or "Developer Information"
> > or something. Just "Developers" doesn't really read right to me:
> > it is not "documentation of developers" in the way that the "Tools"
> > section is "documentation of tools", etc.
> >
> > thanks
> > -- PMM
> >
>
> Changing it to a verb - "Development" - might fit the intent, by analogy
> with "System Emulation Management and Interoperability", "System
> Emulation", and "User Mode Emulation".
>
> Keeping it as a noun with "Developer Information" or "Information for
> Developers" also reads fine to me.
>
>
It's a collection of developer's documents regrouped in a section. Maybe we
should consider a title like "Internals" instead? Tbh, I think "Developers"
was about right too.. "Guide" does not uphold its promise.

Ok, last call for "Developer Information" ?

-- 
Marc-André Lureau


Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)

2021-03-22 Thread Peter Xu
On Mon, Mar 22, 2021 at 10:02:38PM +0800, Keqian Zhu wrote:
> Hi Peter,

Hi, Keqian,

[...]

> You emphasize that dirty ring is a "Thread-local buffers", but dirty bitmap is
> global, but I don't see it has optimization about "locking" compared to dirty
> bitmap.
>
> The thread-local means that vCPU can flush hardware buffer into dirty ring
> without locking, but for bitmap, vCPU can also use atomic set to mark dirty
> without locking.  Maybe I miss something?

Yes, the atomic ops guaranteed locking as you said, but afaiu atomics are
expensive already, since at least on x86 I think it needs to lock the memory
bus.  IIUC that'll become even slower as cores grow, as long as the cores share
the memory bus.

KVM dirty ring is per-vcpu, it means its metadata can be modified locally
without atomicity at all (but still, we'll need READ_ONCE/WRITE_ONCE to
guarantee ordering of memory accesses).  It should scale better, especially on
hosts that have lots of cores.

> 
> The second question is that you observed longer migration time (55s->73s) when
> guest has 24G ram and dirty rate is 800M/s. I am not clear about the reason. As
> with dirty ring enabled, Qemu can get dirty info faster which means it handles
> dirty page more quick, and guest can be throttled which means dirty page is
> generated slower.  What's the rationale for the longer migration time?

Because dirty ring is more sensitive to dirty rate, while dirty bitmap is more
sensitive to memory footprint.  In above 24G mem + 800MB/s dirty rate
condition, dirty bitmap seems to be more efficient, say, collecting dirty
bitmap of 24G mem (24G/4K/8=0.75MB) for each migration cycle is fast enough.
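
The 0.75MB figure can be checked with a couple of lines of Python. The helper name is made up for this note; it assumes one bit per page and the 4K page size used in the expression above.

```python
def dirty_bitmap_bytes(ram_bytes, page_bytes=4096):
    """Size of a one-bit-per-page dirty bitmap, in bytes (rounded up)."""
    pages = -(-ram_bytes // page_bytes)  # ceiling division
    return -(-pages // 8)


GiB = 1 << 30
MiB = 1 << 20
print(dirty_bitmap_bytes(24 * GiB) / MiB)  # 0.75
```

The bitmap grows with guest memory size and not with the dirty rate, which is the sense in which the bitmap is "sensitive to memory footprint".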

Not to mention that current implementation of dirty ring in QEMU is not
complete - we still have two more layers of dirty bitmap, so it's actually a
mixture of dirty bitmap and dirty ring.  This series is more like a POC on
dirty ring interface, so as to let QEMU be able to run on KVM dirty ring.
E.g., we won't have hang issue when getting dirty pages since it's totally
async, however we'll still have some legacy dirty bitmap issues e.g. memory
consumption of userspace dirty bitmaps are still linear to memory footprint.

Moreover, IMHO another important feature that dirty ring provided is actually
the full-exit, where we can pause a vcpu when it dirties too fast, while other
vcpus won't be affected.  That's something I really wanted to POC too but I
don't have enough time.  I think it's a worthwhile project in the future to really
make the full-exit throttle vcpus, then ideally we'll remove all the dirty
bitmaps in QEMU as long as dirty ring is on.
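
To make the full-exit idea concrete, here is a toy Python model; it does not reflect the KVM ABI or the QEMU implementation, and DirtyRing, run() and the ring size are invented for the example. Each vCPU owns its ring, so a vCPU that dirties pages quickly fills its own ring and takes exits, while a quiet vCPU never does.

```python
class DirtyRing:
    """Toy per-vCPU ring: one producer (the vCPU), one consumer (a reaper)."""

    def __init__(self, size):
        self.size = size
        self.entries = []

    def push(self, gfn):
        """Record a dirtied page; return False if the vCPU must exit first."""
        if len(self.entries) >= self.size:
            return False  # ring full: this vCPU exits to userspace
        self.entries.append(gfn)
        return True

    def harvest(self):
        """Consumer side: collect and clear all recorded pages."""
        collected, self.entries = self.entries, []
        return collected


def run(vcpu_writes, ring_size):
    """Return (per-vCPU exit counts, set of all dirty pages seen)."""
    exits = {}
    dirty = set()
    for vcpu, gfns in vcpu_writes.items():
        ring = DirtyRing(ring_size)
        exits[vcpu] = 0
        for gfn in gfns:
            if not ring.push(gfn):
                # Full ring: count the exit, drain, then retry the write.
                exits[vcpu] += 1
                dirty.update(ring.harvest())
                ring.push(gfn)
        dirty.update(ring.harvest())
    return exits, dirty


exits, dirty = run({0: list(range(10)), 1: [100]}, ring_size=4)
print(exits, len(dirty))
```

In this model the exit count is a natural throttle signal per vCPU, which a shared bitmap cannot provide.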

So I'd say the number I got at that time is not really helping a lot - as you
can see for small VMs it won't make things faster.  Maybe a bit more efficient?
I can't tell.  Design-wise it actually still looks better.  However, there is
still good reason for dirty logging to be the default interface we use for small
VMs, imho.

> 
> PS: As the dirty ring is still converted into dirty_bitmap of kvm_slot, so the
> "get dirty info faster" maybe not true. :-(

We can get dirty info faster even now, I think, because previously we only did
KVM_GET_DIRTY_LOG once per migration iteration, which could be tens of seconds
for a VM like the one mentioned above with 24G and 800MB/s dirty rate.  Dirty
ring is fully async, so we'll get that after the reaper thread timeout.  However
I must also confess "get dirty info faster" doesn't help us a lot on anything
yet, afaict, compared to full-featured dirty logging with clear dirty log and
so on.

Hope above helps.

Thanks,

-- 
Peter Xu




[Bug 1407808] Re: virtual console gives strange response to ANSI DSR

2021-03-22 Thread Peter Maydell
This should be fixed in head-of-git by commit 8eb13bbbac08a, which will
be in QEMU 6.0. (The underlying bug is that when the GTK front-end tries
to send sequences of more than one byte to a UART, it didn't account for
UARTs which don't have a FIFO capable of holding the whole sequence at
once.)


** Changed in: qemu
   Status: Triaged => Fix Committed

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1407808

Title:
  virtual console gives strange response to ANSI DSR

Status in QEMU:
  Fix Committed

Bug description:
  With "-serial vc" (which is the default), qemu makes strange responses
  to the ANSI DSR escape sequence (\033[6n) which can confuse guests.

  Terminal emulators supporting the ANSI escape sequences usually
  support the "Device Status Report" escape sequence, \033[6n, to which
  as a response the terminal injects as input the response \033[n;mR,
  containing the current cursor position. An application running in the
  guest can use this escape sequence to, for example, figure out the
  size of the terminal it is running under, which can be useful as the
  guest has no other standard way to figure out a "size" for the serial
  port.

  Unfortunately, when qemu is run with "-serial vc" (which appears to be
  the default) and gets the \033[6n escape sequence on the serial port,
  it just responds with a single \033, and that's it! This can confuse
  an application, which could conceivably assume that a terminal either
  supports this escape sequence and injects the correct response
  (\033[n;mR), or doesn't support it and injects absolutely nothing as
  input - but not something in between.
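
  To illustrate why a lone \033 is awkward for a guest, here is a small Python sketch of the exchange described above. The helper names are invented for this note, and the parser accepts only a complete \033[n;mR reply; a real application would also need timeouts and partial-read handling.

```python
import re

DSR_QUERY = b"\x1b[6n"  # Device Status Report request
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")  # cursor position reply


def format_cursor_report(row, col):
    """Build the reply a well-behaved terminal would inject."""
    return b"\x1b[%d;%dR" % (row, col)


def parse_cursor_report(reply):
    """Return (row, col) from a complete ESC[n;mR reply, else None."""
    match = _CPR.fullmatch(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_cursor_report(format_cursor_report(24, 80)))
print(parse_cursor_report(b"\x1b"))
```

  A truncated reply is neither a valid report nor silence, so the application cannot tell an unsupported terminal from a broken one.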

  This caused a problem on one shell implementation on OSv that tried to
  figure out the terminal's size, and had to work around this unexpected
  behavior (see
  https://github.com/cloudius-systems/osv/commit/b79223584be40459861d1c12e1cb67e3e49e2a12).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1407808/+subscriptions



Re: [RFC PATCH v2] gitlab: default to not building the documentation

2021-03-22 Thread Alex Bennée


Alex Bennée  writes:

> In d0f26e68a0 ("gitlab: force enable docs build in Fedora, Ubuntu,
> Debian") we made sure we can build the documents on more than one
> system. However we don't want to build documents all the time as it's
> a waste of cycles (and energy). So let's reduce the total amount of
> documentation we build while still keeping coverage of at least one
> build on each supported target.
>
> Fixes: a8a3abe0b3 ("gitlab: move docs and tools build across from Travis")
> Signed-off-by: Alex Bennée 
> Reviewed-by: Willian Rampazzo 
> Reviewed-by: Thomas Huth 
>
> ---
> v2
>   - enable for OpenSUSE LEAP and Centos8 as well

Predictably these two fail the documentation build :-/

>   - disable for all cross builds
>   - minor re-word of the commit text
> ---
>  .gitlab-ci.d/crossbuilds.yml | 15 ---
>  .gitlab-ci.yml   | 16 
>  2 files changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
> index d5098c986b..2d95784ed5 100644
> --- a/.gitlab-ci.d/crossbuilds.yml
> +++ b/.gitlab-ci.d/crossbuilds.yml
> @@ -6,10 +6,10 @@
>  - mkdir build
>  - cd build
>  - PKG_CONFIG_PATH=$PKG_CONFIG_PATH
> -  ../configure --enable-werror $QEMU_CONFIGURE_OPTS --disable-user
> ---target-list-exclude="arm-softmmu cris-softmmu i386-softmmu
> -  microblaze-softmmu mips-softmmu mipsel-softmmu mips64-softmmu
> -  ppc-softmmu sh4-softmmu xtensa-softmmu"
> +  ../configure --enable-werror --disable-docs $QEMU_CONFIGURE_OPTS
> +--disable-user --target-list-exclude="arm-softmmu cris-softmmu
> +  i386-softmmu microblaze-softmmu mips-softmmu mipsel-softmmu
> +  mips64-softmmu ppc-softmmu sh4-softmmu xtensa-softmmu"
>  - make -j$(expr $(nproc) + 1) all check-build $MAKE_CHECK_ARGS
>  
>  # Job to cross-build specific accelerators.
> @@ -25,8 +25,8 @@
>  - mkdir build
>  - cd build
>  - PKG_CONFIG_PATH=$PKG_CONFIG_PATH
> -  ../configure --enable-werror $QEMU_CONFIGURE_OPTS --disable-tools
> ---enable-${ACCEL:-kvm} $ACCEL_CONFIGURE_OPTS
> +  ../configure --enable-werror --disable-docs $QEMU_CONFIGURE_OPTS
> +--disable-tools --enable-${ACCEL:-kvm} $ACCEL_CONFIGURE_OPTS
>  - make -j$(expr $(nproc) + 1) all check-build
>  
>  .cross_user_build_job:
> @@ -36,7 +36,8 @@
>  - mkdir build
>  - cd build
>  - PKG_CONFIG_PATH=$PKG_CONFIG_PATH
> -  ../configure --enable-werror $QEMU_CONFIGURE_OPTS --disable-system
> +  ../configure --enable-werror --disable-docs $QEMU_CONFIGURE_OPTS
> +--disable-system
>  - make -j$(expr $(nproc) + 1) all check-build $MAKE_CHECK_ARGS
>  
>  cross-armel-system:
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index 9ffbaa7ffb..c9c4079dbb 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -23,9 +23,9 @@ include:
>  - cd build
>  - if test -n "$TARGETS";
>then
> -../configure --enable-werror $CONFIGURE_ARGS --target-list="$TARGETS" ;
> +../configure --enable-werror --disable-docs $CONFIGURE_ARGS --target-list="$TARGETS" ;
>else
> -../configure --enable-werror $CONFIGURE_ARGS ;
> +../configure --enable-werror --disable-docs $CONFIGURE_ARGS ;
>fi || { cat config.log meson-logs/meson-log.txt && exit 1; }
>  - if test -n "$LD_JOBS";
>then
> @@ -119,7 +119,7 @@ build-system-ubuntu:
>  job: amd64-ubuntu2004-container
>variables:
>  IMAGE: ubuntu2004
> -CONFIGURE_ARGS: --enable-fdt=system --enable-slirp=system
> +CONFIGURE_ARGS: --enable-docs --enable-fdt=system --enable-slirp=system
>  TARGETS: aarch64-softmmu alpha-softmmu cris-softmmu hppa-softmmu
>moxie-softmmu microblazeel-softmmu mips64el-softmmu
>  MAKE_CHECK_ARGS: check-build
> @@ -223,7 +223,7 @@ build-system-centos:
>variables:
>  IMAGE: centos8
>  CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
> ---enable-modules
> +--enable-modules --enable-docs
>  TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
>x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
>  MAKE_CHECK_ARGS: check-build
> @@ -257,7 +257,7 @@ build-system-opensuse:
>  job: amd64-opensuse-leap-container
>variables:
>  IMAGE: opensuse-leap
> -CONFIGURE_ARGS: --enable-fdt=system
> +CONFIGURE_ARGS: --enable-docs --enable-fdt=system
>  TARGETS: s390x-softmmu x86_64-softmmu aarch64-softmmu
>  MAKE_CHECK_ARGS: check-build
>artifacts:
> @@ -443,7 +443,7 @@ build-user-centos7:
>  job: amd64-centos7-container
>variables:
>  IMAGE: centos7
> -CONFIGURE_ARGS: --disable-system --disable-tools --disable-docs
> +CONFIGURE_ARGS: --disable-system --disable-tools
>  MAKE_CHECK_ARGS: check-tcg
>  
>  build-some-softmmu-plugins:
> @@ -607,7 +607,7 @@ tsan-build:
>  job: amd64-ubuntu2004-container
>
