Re: [PATCH v2 1/1] osdep: asynchronous teardown for shutdown on Linux

2022-08-03 Thread Claudio Imbrenda
On Wed, 3 Aug 2022 18:34:45 +0100
Daniel P. Berrangé  wrote:

> On Wed, Aug 03, 2022 at 07:31:41PM +0200, Claudio Imbrenda wrote:
> > This patch adds support for asynchronously tearing down a VM on Linux.
> > 
> > When qemu terminates, either naturally or because of a fatal signal,
> > the VM is torn down. If the VM is huge, it can take a considerable
> > amount of time for it to be cleaned up. In case of a protected VM, it
> > might take even longer than a non-protected VM (this is the case on
> > s390x, for example).
> > 
> > Some users might want to shut down a VM and restart it immediately,
> > without having to wait. This is especially true if management
> > infrastructure like libvirt is used.
> > 
> > This patch implements a simple trick on Linux to allow qemu to return
> > immediately, with the teardown of the VM being performed
> > asynchronously.
> > 
> > If the new commandline option -async-teardown is used, a new process is
> > spawned from qemu at startup, using the clone syscall, in such a way that
> > it will share its address space with qemu.
> > 
> > The new process will then simply wait until qemu terminates, and then it
> > will exit itself.
> > 
> > This allows qemu to terminate quickly, without having to wait for the
> > whole address space to be torn down. The teardown process will exit
> > after qemu, so it will be the last user of the address space, and
> > therefore it will take care of the actual teardown.
> > 
> > The teardown process will share the same cgroups as qemu, so both
> > memory usage and cpu time will be accounted properly.
> > 
> > This feature can already be used with libvirt by adding the following
> > to the XML domain definition:
> > 
> >   <commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
> >   <arg value='-async-teardown'/>
> >   </commandline>
> > 
> 
> How does this work in practice ?  Libvirt should be blocking until

I don't know the inner details of how libvirt works..

> all processes in the cgroup have exited, including this cloned
> child process.

..but I tested it and it works

my impression is that libvirt by default is only waiting for the
main qemu process.

the only issue I have found is the log file, which stays open as long
as some file descriptors (which the cloned process inherits from the
main qemu process) stay open. A new VM cannot be started if its log file
is still open by the logger process. The close_range() call solves the
issue.

> 
> With regards,
> Daniel
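
For readers following the thread, here is a minimal sketch of the trick described in the patch: a helper created with clone(CLONE_VM) that shares the address space and simply waits for the main process to die. It is not the code from the patch. The names spawn_teardown_helper, teardown_helper, main_pid and HELPER_STACK_SIZE are made up for illustration, the death signal is assumed to be SIGKILL for simplicity, and the close_range() handling of inherited file descriptors mentioned above is left out.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

#define HELPER_STACK_SIZE (64 * 1024)   /* illustrative size */

/* PID of the process that spawned the helper; the helper can read it
 * because both processes share one address space. */
static pid_t main_pid;

/* Body of the helper: do nothing until the main process is gone. */
static int teardown_helper(void *arg)
{
    (void)arg;

    /* Ask the kernel to kill the helper when its parent terminates. */
    prctl(PR_SET_PDEATHSIG, (unsigned long)SIGKILL);

    /* The parent may already have died before prctl() took effect. */
    if (getppid() != main_pid) {
        return 0;
    }
    for (;;) {
        pause();
    }
}

/* Returns the helper's PID, or -1 on failure. */
static pid_t spawn_teardown_helper(void)
{
    char *stack = mmap(NULL, HELPER_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) {
        return -1;
    }
    main_pid = getpid();

    /* CLONE_VM shares the address space. The stack is assumed to grow
     * down, so the top of the mapping is passed. */
    return clone(teardown_helper, stack + HELPER_STACK_SIZE, CLONE_VM, NULL);
}
```

Since the helper exits after the main process, it is the last user of the address space, so the expensive teardown happens in the helper rather than in the process that the management layer is waiting for.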




Re: [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation

2022-08-03 Thread Eugenio Perez Martin
On Thu, Aug 4, 2022 at 5:01 AM Jason Wang  wrote:
>
>
> On 2022/8/3 01:57, Eugenio Pérez wrote:
> > Since we're going to allow SVQ to add elements without the guest's
> > knowledge and without its own VirtQueueElement, it's easier to check if
> > an element is a valid head checking a different thing than the
> > VirtQueueElement.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
>
>
> Patch looks good to me. But I spotted several other issues:
>
> 1) vhost_svq_add() uses size_t for in_num and out_num, is this intended?

Would you prefer them to be unsigned? To me size_t fits better, but
VirtQueueElement uses unsigned anyway.

> 2) do we need to fail vhost_svq_add() if in_num + out_num == 0?
>

We can recover from it, but it's a failure of qemu code.
- In the case of loading the state to the destination device, we
already know the layout (it's always 1 out, 1 in).
- In the case of forwarding buffers, there is no way to get a
VirtQueueElement with 0 out and 0 in descriptors, due to the virtqueue
way of working.

Would you prefer to return success in this case?

Thanks!




Re: [PATCH v2 19/20] ppc/ppc405: QOM'ify I2C

2022-08-03 Thread Cédric Le Goater

On 8/4/22 01:31, BALATON Zoltan wrote:

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

Having an explicit I2C model object will help if one day we want to
add I2C devices on the bus.


Same here as with the UIC in the previous patch: it's not QOMifying here either. As 
for why we may need I2C, on sam460ex the firmware detects RAM by accessing the SPD 
data over I2C, so that could be the reason, but it may not be used here on 405.


You can still plug I2C devices on the PPC405 command line if you want to.

Thanks,

C.



Regards,
BALATON Zoltan


Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h    |  2 ++
hw/ppc/ppc405_uc.c | 10 --
2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index d29f738cd2d0..d13624ae309c 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -28,6 +28,7 @@
#include "qom/object.h"
#include "hw/ppc/ppc4xx.h"
#include "hw/intc/ppc-uic.h"
+#include "hw/i2c/ppc4xx_i2c.h"

#define PPC405EP_SDRAM_BASE 0x
#define PPC405EP_NVRAM_BASE 0xF000
@@ -256,6 +257,7 @@ struct Ppc405SoCState {
    Ppc405OcmState ocm;
    Ppc405GpioState gpio;
    Ppc405DmaState dma;
+    PPC4xxI2CState i2c;
    Ppc405EbcState ebc;
    Ppc405OpbaState opba;
    Ppc405PobState pob;
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 5cd32e22b7ea..8f0caa45f5f7 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -1461,6 +1461,8 @@ static void ppc405_soc_instance_init(Object *obj)

    object_initialize_child(obj, "dma", &s->dma, TYPE_PPC405_DMA);

+    object_initialize_child(obj, "i2c", &s->i2c, TYPE_PPC4xx_I2C);
+
    object_initialize_child(obj, "ebc", &s->ebc, TYPE_PPC405_EBC);

    object_initialize_child(obj, "opba", &s->opba, TYPE_PPC405_OPBA);
@@ -1569,8 +1571,12 @@ static void ppc405_soc_realize(DeviceState *dev, Error 
**errp)
    }

    /* I2C controller */
-    sysbus_create_simple(TYPE_PPC4xx_I2C, 0xef600500,
-                         qdev_get_gpio_in(DEVICE(&s->uic), 2));
+    if (!sysbus_realize(SYS_BUS_DEVICE(&s->i2c), errp)) {
+        return;
+    }
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->i2c), 0, 0xef600500);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c), 0,
+                       qdev_get_gpio_in(DEVICE(&s->uic), 2));

    /* GPIO */
    if (!sysbus_realize(SYS_BUS_DEVICE(&s->gpio), errp)) {






Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Laszlo Ersek
On 08/04/22 00:23, Michael S. Tsirkin wrote:
> On Thu, Aug 04, 2022 at 12:08:07AM +0200, Jason A. Donenfeld wrote:
>> Hi Michael,
>>
>> On Wed, Aug 03, 2022 at 06:03:20PM -0400, Michael S. Tsirkin wrote:
>>> On Wed, Aug 03, 2022 at 07:07:52PM +0200, Jason A. Donenfeld wrote:
>>>> On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote:
>>>>> On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
>>>>>> Thanks for the info. Very helpful. Looking into it now.
>>>>>
>>>>> So interestingly, this is not a new issue. If you pass any type of setup
>>>>> data, OVMF appears to be doing something unusual and passing 0x
>>>>> for all the entries, rather than the actual data. The reason this isn't
>>>>> new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
>>>>> same page fault, on all QEMU versions. The thing that passes the DTB is
>>>>> the thing that passes the RNG seed. Same mechanism, same bug.
>>>>>
>>>>> I'm looking into it...
>>>>
>>>> Fixed with: 
>>>> https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/
>>>>
>>>> Feel free to join into the discussion there. I CC'd you.
>>>>
>>>> Jason
>>>
>>> Hmm I don't think this patch will make it in 7.1 given the
>>> timeframe. I suspect we should revert the patch for now.
>>>
>>> Which is where you maybe begin to see why we generally
>>> prefer doing it with features - one can then work around
>>> bugs by turning the feature on and off.
>>
>> The bug actually precedes this patch. Just boot with -dtb on any qemu
>> version and you'll trigger it.

Yes! That's exactly what I expected, per

https://bugzilla.redhat.com/show_bug.cgi?id=2114637#c4

There I wrote that this kind of setup_data chaining was a "first" for OVMF.

While the same logic had existed in QEMU for chaining a dtb, there
never had been a reason (that I could imagine) for using that with OVMF
guests.

So it had to be either a preexistent bug in QEMU, or one in OVMF, that
now got triggered (as Jason's patch for chaining the seed closely
followed the pattern set by the dtb logic).

Now, regarding the patch at
,
including v2 at
...

I don't think this kind of setup_data block chaining, with raw
guest-physical addresses filled in by QEMU in guest RAM, in advance, is
appropriate for an edk2 guest *in general*. By the time the firmware
loads the kernel (including setup block and kernel block) from fw_cfg,
the area in question (with the seed etc) may have been overwritten
several times. Edk2 is very careful about memory ownership, but it needs
the VMM and the guest OS to play along. There is only a very small set
of "well known addresses" that are (a) open-coded in both QEMU board
code and edk2 platform code and (b) not specified by industry specs;
such addresses are used to set up everything else, and we seek not to
introduce more of them.

Consider e.g. the end of
, namely the
deprecation of the "EFI Handover Protocol". The idea is to use
well-specified information channels that don't depend on the placement
of the kernel.

At least two mechanisms exist for dealing with this; the ACPI
linker-loader, and the GUID-ed struct chaining in pflash (mostly used
with SEV and I think TDX too).

More below.

> 
> Sure but it's still a regression.
> 
>> We're still at rc0; there should be time
>> enough for a bug fix. Please do chime in on that thread and maybe we can
>> come up with something reasonable fast enough.
>>
>> Jason
> 
> Maybe.

QEMU commit 67f7e426e538 ("hw/i386: pass RNG seed via setup_data entry",
2022-07-22) references ,
and the commit message on that begins with:

--
Currently, the only way x86 can get an early boot RNG seed is via EFI,
which is generally always used now for physical machines, but is very
rarely used in VMs, especially VMs that are optimized for starting
"instantaneously", such as Firecracker's MicroVM. For tiny fast booting
VMs, EFI is not something you generally need or want.
--

So, first, I'd quite disagree with "EFI being rarely used in VMs" (the
trend has been the opposite), and I'm not saying that because I'm
married to edk2 (I switched to a different project last summer). I do
agree with EFI being rarely used in one-shot, fast-booting VMs though.

Second, I think this segmentation of use cases is actually great. If you
need this particular kind of seed-passing for non-EFI VMs only (i.e.,
where the UEFI stub in the guest kernel cannot rely on
EFI_RNG_PROTOCOL), then implement it in both QEMU and the (guest) kernel
for non-EFI only. Both the guest kernel and QEMU can tell whether the
guest firmware is UEFI (the guest kernel can tell that precisely,
whereas in QEMU, if memory serves, we equate that with a particular
pflash setup).

Again, I don't think the 

Re: [PATCH v2 02/20] ppc/ppc405: Introduce a PPC405 generic machine

2022-08-03 Thread Cédric Le Goater

On 8/4/22 00:07, BALATON Zoltan wrote:

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

We will use this machine as a base to define the ref405ep and possibly
the PPC405 hotfoot board as found in the Linux kernel.

Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405_boards.c | 31 ---
1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 1a4e7588c584..4c269b6526a5 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -50,6 +50,15 @@

#define USE_FLASH_BIOS

+struct Ppc405MachineState {
+    /* Private */
+    MachineState parent_obj;
+    /* Public */
+};
+
+#define TYPE_PPC405_MACHINE MACHINE_TYPE_NAME("ppc405")
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405MachineState, PPC405_MACHINE);
+
/*/
/* PPC405EP reference board (IBM) */
/* Standalone board with:
@@ -332,18 +341,34 @@ static void ref405ep_class_init(ObjectClass *oc, void 
*data)

    mc->desc = "ref405ep";
    mc->init = ref405ep_init;
-    mc->default_ram_size = 0x08000000;
-    mc->default_ram_id = "ef405ep.ram";
}

static const TypeInfo ref405ep_type = {
    .name = MACHINE_TYPE_NAME("ref405ep"),
-    .parent = TYPE_MACHINE,
+    .parent = TYPE_PPC405_MACHINE,
    .class_init = ref405ep_class_init,
};

+static void ppc405_machine_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->desc = "PPC405 generic machine";
+    mc->default_ram_size = 0x08000000;
+    mc->default_ram_id = "ppc405.ram";


Is the default RAM size a property of specific boards or the PPC405? I think it 
could be different for different boards so don't see why it's moved to the 
generic machine, but maybe it has something to do with how other parts of QEMU 
handle this, or I'm not getting what the generic PPC405 machine is for.


Well, the two QEMU PPC405 machines had 128M, so they were sharing the same
definition. This can be overridden in a child class if needed but I doubt
there will be any new PPC405 machines in QEMU. Let's keep it here.
 


Would it be clearer to just write 128 * MiB instead of a long hex number with 
extra zeros that's hard to read? It would be a good opportunity to change it 
here.


agree.

Thanks,

C.



Regards,
BALATON Zoltan


+}
+
+static const TypeInfo ppc405_machine_type = {
+    .name = TYPE_PPC405_MACHINE,
+    .parent = TYPE_MACHINE,
+    .instance_size = sizeof(Ppc405MachineState),
+    .class_init = ppc405_machine_class_init,
+    .abstract = true,
+};
+
static void ppc405_machine_init(void)
{
+    type_register_static(&ppc405_machine_type);
    type_register_static(_type);
}
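
As a small illustration of the readability point raised in the review above, the sketch below spells the default RAM size as 128 * MiB and checks that it equals the hex constant 0x08000000. The MiB macro is defined locally here so the example stands alone; the helper name default_ram_size is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local definition so the sketch is self-contained. */
#define MiB (INT64_C(1) << 20)

/* Hypothetical helper: the default RAM size written in readable units. */
static uint64_t default_ram_size(void)
{
    return 128 * MiB;
}

/* Same value as the hex spelling, only easier to read. */
static void check_default_ram_size(void)
{
    assert(default_ram_size() == 0x08000000);
}
```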







Re: [PATCH] hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[]

2022-08-03 Thread Cédric Le Goater

On 8/4/22 01:32, Daniel Henrique Barboza wrote:

We're not storing all GPIO lines we're retrieving with
qdev_get_gpio_in() in mal_irqs[]. We're storing just the last one in the
first index:

 for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
 mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
 }
 ppc4xx_mal_init(env, 4, 16, mal_irqs);

mal_irqs is used in ppc4xx_mal_init() to assign the IRQs to MAL:

 for (i = 0; i < 4; i++) {
 mal->irqs[i] = irqs[i];
 }

Since only irqs[0] has been initialized, mal->irqs[1,2,3] are being
zeroed.

This doesn't seem to trigger any apparent issues at this moment, but
Cedric's QOMification of the MAL device [1] is executing a
sysbus_connect_irq() that will fail if we do not store all GPIO lines
properly.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00497.html

Cc: Peter Maydell 
Cc: BALATON Zoltan 
Fixes: 706e944206d7 ("hw/ppc/sam460ex: Drop use of ppcuic_init()")
Signed-off-by: Daniel Henrique Barboza 


Yes, I found the same issue when fixing ppc4xx_mal_init().

Reviewed-by: Cédric Le Goater 

Thanks,

C.

---
  hw/ppc/sam460ex.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 7e8da657c2..0357ee077f 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -384,7 +384,7 @@ static void sam460ex_init(MachineState *machine)
  
  /* MAL */

  for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
-mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
+mal_irqs[i] = qdev_get_gpio_in(uic[2], 3 + i);
  }
  ppc4xx_mal_init(env, 4, 16, mal_irqs);
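
The effect of the one-character fix can be reproduced outside QEMU. In the sketch below, fake_gpio_in is a made-up stand-in for qdev_get_gpio_in(uic[2], n) that returns a non-zero token per line, and the two fill functions mirror the loop before and after the patch.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for qdev_get_gpio_in(): a token for line n. */
static int fake_gpio_in(int n)
{
    return 100 + n;
}

/* Before the fix: every iteration overwrites slot 0. */
static void fill_buggy(int *irqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        irqs[0] = fake_gpio_in(3 + (int)i);
    }
}

/* After the fix: each line lands in its own slot. */
static void fill_fixed(int *irqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        irqs[i] = fake_gpio_in(3 + (int)i);
    }
}

static void compare_fill_variants(void)
{
    int buggy[4] = { 0 };
    int fixed[4] = { 0 };

    fill_buggy(buggy, ARRAY_SIZE(buggy));
    fill_fixed(fixed, ARRAY_SIZE(fixed));

    /* Only slot 0 is set, and it holds the last line retrieved. */
    assert(buggy[0] == 106 && buggy[1] == 0 && buggy[2] == 0 && buggy[3] == 0);
    /* All four lines are stored in order. */
    assert(fixed[0] == 103 && fixed[3] == 106);
}
```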
  





Re: [PATCH v2 12/20] ppc/ppc405: QOM'ify EBC

2022-08-03 Thread Cédric Le Goater

On 8/4/22 01:36, Daniel Henrique Barboza wrote:

Cedric,

On 8/3/22 10:28, Cédric Le Goater wrote:

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
  hw/ppc/ppc405.h    | 16 +++
  hw/ppc/ppc405_uc.c | 71 +++---
  2 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 1da34a7f10f3..1c7fe07b8084 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -65,7 +65,22 @@ struct ppc4xx_bd_info_t {
  typedef struct Ppc405SoCState Ppc405SoCState;
+/* Peripheral controller */
+#define TYPE_PPC405_EBC "ppc405-ebc"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405EbcState, PPC405_EBC);
+struct Ppc405EbcState {
+    DeviceState parent_obj;
+
+    PowerPCCPU *cpu;
+    uint32_t addr;
+    uint32_t bcr[8];
+    uint32_t bap[8];
+    uint32_t bear;
+    uint32_t besr0;
+    uint32_t besr1;
+    uint32_t cfg;
+};
  /* DMA controller */
  #define TYPE_PPC405_DMA "ppc405-dma"
@@ -203,6 +218,7 @@ struct Ppc405SoCState {
  Ppc405OcmState ocm;
  Ppc405GpioState gpio;
  Ppc405DmaState dma;
+    Ppc405EbcState ebc;
  };
  /* PowerPC 405 core */
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 6bd93c1cb90c..0166f3fc36da 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -393,17 +393,6 @@ static void ppc4xx_opba_init(hwaddr base)
  
/*/
  /* Peripheral controller */
-typedef struct ppc4xx_ebc_t ppc4xx_ebc_t;
-struct ppc4xx_ebc_t {
-    uint32_t addr;
-    uint32_t bcr[8];
-    uint32_t bap[8];
-    uint32_t bear;
-    uint32_t besr0;
-    uint32_t besr1;
-    uint32_t cfg;
-};
-
  enum {
  EBC0_CFGADDR = 0x012,
  EBC0_CFGDATA = 0x013,
@@ -411,10 +400,9 @@ enum {
  static uint32_t dcr_read_ebc (void *opaque, int dcrn)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(opaque);
  uint32_t ret;
-    ebc = opaque;
  switch (dcrn) {
  case EBC0_CFGADDR:
  ret = ebc->addr;
@@ -496,9 +484,8 @@ static uint32_t dcr_read_ebc (void *opaque, int dcrn)
  static void dcr_write_ebc (void *opaque, int dcrn, uint32_t val)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(opaque);
-    ebc = opaque;
  switch (dcrn) {
  case EBC0_CFGADDR:
  ebc->addr = val;
@@ -554,12 +541,11 @@ static void dcr_write_ebc (void *opaque, int dcrn, 
uint32_t val)
  }
  }
-static void ebc_reset (void *opaque)
+static void ppc405_ebc_reset(DeviceState *dev)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(dev);
  int i;
-    ebc = opaque;
  ebc->addr = 0x;
  ebc->bap[0] = 0x7F8FFE80;
  ebc->bcr[0] = 0xFFE28000;
@@ -572,18 +558,46 @@ static void ebc_reset (void *opaque)
  ebc->cfg = 0x8040;
  }
-void ppc405_ebc_init(CPUPPCState *env)
+static void ppc405_ebc_realize(DeviceState *dev, Error **errp)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(dev);
+    CPUPPCState *env;
+
+    assert(ebc->cpu);
+
+    env = &ebc->cpu->env;
-    ebc = g_new0(ppc4xx_ebc_t, 1);
-    qemu_register_reset(&ebc_reset, ebc);
  ppc_dcr_register(env, EBC0_CFGADDR,
   ebc, &dcr_read_ebc, &dcr_write_ebc);
  ppc_dcr_register(env, EBC0_CFGDATA,
   ebc, &dcr_read_ebc, &dcr_write_ebc);
  }
+static Property ppc405_ebc_properties[] = {
+    DEFINE_PROP_LINK("cpu", Ppc405EbcState, cpu, TYPE_POWERPC_CPU,
+ PowerPCCPU *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ppc405_ebc_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = ppc405_ebc_realize;
+    dc->user_creatable = false;
+    dc->reset = ppc405_ebc_reset;
+    device_class_set_props(dc, ppc405_ebc_properties);
+}
+
+void ppc405_ebc_init(CPUPPCState *env)
+{
+    PowerPCCPU *cpu = env_archcpu(env);
+    DeviceState *dev = qdev_new(TYPE_PPC405_EBC);
+
+    object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), &error_abort);


This line is breaking the boot of sam460ex:


  ./qemu-system-ppc64 -display none -M sam460ex
Unexpected error in object_property_find_err() at ../qom/object.c:1304:
qemu-system-ppc64: Property '460exb-powerpc64-cpu.cpu' not found
Aborted (core dumped)


I think you meant to link the cpu prop of the EBC obj to the CPU object,
not the cpu prop of the CPU obj to the EBC dev.


Yes. ppc405_ebc_init() has only one user left, the sam460ex, which I didn't
test :/

Thanks,

C.
 


This fixes the issue:


$ git diff
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 0166f3fc36..aac3a3f761 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -594,7 +594,7 @@ void ppc405_ebc_init(CPUPPCState *env)
  PowerPCCPU *cpu = env_archcpu(env);
  DeviceState *dev = qdev_new(TYPE_PPC405_EBC);

-    object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), &error_abort);
+    object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), &error_abort);
  qdev_realize_and_unref(dev, NULL, 
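
The argument-order mix-up discussed above can be shown with a toy model. This is not the QOM API: ToyObject and toy_set_link are invented here, and each toy object has at most one named link property. The point is only that the property is looked up on the first argument, so swapping the two objects fails with "property not found".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy object with at most one named link property (not the QOM API). */
typedef struct ToyObject {
    const char *type_name;
    const char *link_name;      /* NULL if the type has no link property */
    struct ToyObject *link;     /* current link target */
} ToyObject;

/* Returns 0 on success, -1 if 'obj' has no property called 'name'. */
static int toy_set_link(ToyObject *obj, const char *name, ToyObject *target)
{
    if (!obj->link_name || strcmp(obj->link_name, name) != 0) {
        return -1;
    }
    obj->link = target;
    return 0;
}

static void check_link_direction(void)
{
    ToyObject cpu = { "cpu", NULL, NULL };
    ToyObject ebc = { "ppc405-ebc", "cpu", NULL };

    /* Wrong direction: the CPU object has no "cpu" property. */
    assert(toy_set_link(&cpu, "cpu", &ebc) == -1);
    /* Right direction: set the EBC's "cpu" property to the CPU. */
    assert(toy_set_link(&ebc, "cpu", &cpu) == 0);
}
```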

Re: [PATCH v2 07/20] ppc/ppc405: QOM'ify CPC

2022-08-03 Thread Cédric Le Goater

On 8/3/22 19:16, BALATON Zoltan wrote:

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

Introduce a QOM property "cpu" to initialize the DCR handlers. This is
a pattern that we will reuse for all the other 405 devices needing it.

Now that all clock settings are handled at the CPC level, change the
SoC "sys-clk" property to be an alias on the same property in the CPC
model.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h    |  39 +++-
hw/ppc/ppc405_uc.c | 109 +++--
2 files changed, 85 insertions(+), 63 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index ae64549537c6..88c63774d9ba 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -63,6 +63,43 @@ struct ppc4xx_bd_info_t {
    uint32_t bi_iic_fast[2];
};

+typedef struct Ppc405SoCState Ppc405SoCState;


That's a leftover from a previous experiment of passing the SoC to
the device model to initialize the DCR handlers. Passing the CPU
is enough.

We can drop the forward declaration.

Thanks,

C.




This typedef is already done by the OBJECT_DECLARE_SIMPLE_TYPE macro below. 
Could some compilers complain about double typedef? There may be some circular 
dependencies here so to avoid a separate typedef you may need to bring the 
OBJECT_DECLARE_SIMPLE_TYPE(Ppc405SoCState, PPC405_SOC); line up here to the 
front while keeping the actual declaration of the state struct and rest of the 
object later which separates them but adding a comment may explain that. I'm 
not sure if it's better to do that or repeating the typedef in advance as done 
here is better but declaring the object in advance is probably a bit cleaner 
than repeating part of its internals just in case this implementation detail 
ever changes.

Regards,
BALATON Zoltan
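
On the question of a repeated typedef: C11 allows a typedef name to be redeclared as long as it denotes the same type, while earlier C standards did not, so older or strictly pedantic compilers may complain. A minimal sketch, with the made-up type name ToySoCState:

```c
#include <assert.h>

/* Forward declaration, e.g. placed early in a header. */
typedef struct ToySoCState ToySoCState;

/* The same typedef again, as a declaration macro might emit it later.
 * Valid in C11 because it names the same type. */
typedef struct ToySoCState ToySoCState;

struct ToySoCState {
    int ram_size;
};

static void check_repeated_typedef(void)
{
    ToySoCState s = { 128 };

    assert(s.ram_size == 128);
}
```

Whether this is acceptable in a given tree depends on the C standard the project builds with, which this sketch does not assume.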


+
+#define TYPE_PPC405_CPC "ppc405-cpc"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405CpcState, PPC405_CPC);
+
+enum {
+    PPC405EP_CPU_CLK   = 0,
+    PPC405EP_PLB_CLK   = 1,
+    PPC405EP_OPB_CLK   = 2,
+    PPC405EP_EBC_CLK   = 3,
+    PPC405EP_MAL_CLK   = 4,
+    PPC405EP_PCI_CLK   = 5,
+    PPC405EP_UART0_CLK = 6,
+    PPC405EP_UART1_CLK = 7,
+    PPC405EP_CLK_NB    = 8,
+};
+
+struct Ppc405CpcState {
+    DeviceState parent_obj;
+
+    PowerPCCPU *cpu;
+
+    uint32_t sysclk;
+    clk_setup_t clk_setup[PPC405EP_CLK_NB];
+    uint32_t boot;
+    uint32_t epctl;
+    uint32_t pllmr[2];
+    uint32_t ucr;
+    uint32_t srr;
+    uint32_t jtagid;
+    uint32_t pci;
+    /* Clock and power management */
+    uint32_t er;
+    uint32_t fr;
+    uint32_t sr;
+};
+
#define TYPE_PPC405_SOC "ppc405-soc"
OBJECT_DECLARE_SIMPLE_TYPE(Ppc405SoCState, PPC405_SOC);

@@ -79,9 +116,9 @@ struct Ppc405SoCState {
    MemoryRegion *dram_mr;
    hwaddr ram_size;

-    uint32_t sysclk;
    PowerPCCPU cpu;
    DeviceState *uic;
+    Ppc405CpcState cpc;
};

/* PowerPC 405 core */
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 013dccee898b..32bfc9480bc6 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -1178,36 +1178,7 @@ enum {
#endif
};

-enum {
-    PPC405EP_CPU_CLK   = 0,
-    PPC405EP_PLB_CLK   = 1,
-    PPC405EP_OPB_CLK   = 2,
-    PPC405EP_EBC_CLK   = 3,
-    PPC405EP_MAL_CLK   = 4,
-    PPC405EP_PCI_CLK   = 5,
-    PPC405EP_UART0_CLK = 6,
-    PPC405EP_UART1_CLK = 7,
-    PPC405EP_CLK_NB    = 8,
-};
-
-typedef struct ppc405ep_cpc_t ppc405ep_cpc_t;
-struct ppc405ep_cpc_t {
-    uint32_t sysclk;
-    clk_setup_t clk_setup[PPC405EP_CLK_NB];
-    uint32_t boot;
-    uint32_t epctl;
-    uint32_t pllmr[2];
-    uint32_t ucr;
-    uint32_t srr;
-    uint32_t jtagid;
-    uint32_t pci;
-    /* Clock and power management */
-    uint32_t er;
-    uint32_t fr;
-    uint32_t sr;
-};
-
-static void ppc405ep_compute_clocks (ppc405ep_cpc_t *cpc)
+static void ppc405ep_compute_clocks(Ppc405CpcState *cpc)
{
    uint32_t CPU_clk, PLB_clk, OPB_clk, EBC_clk, MAL_clk, PCI_clk;
    uint32_t UART0_clk, UART1_clk;
@@ -1302,10 +1273,9 @@ static void ppc405ep_compute_clocks (ppc405ep_cpc_t *cpc)

static uint32_t dcr_read_epcpc (void *opaque, int dcrn)
{
-    ppc405ep_cpc_t *cpc;
+    Ppc405CpcState *cpc = PPC405_CPC(opaque);
    uint32_t ret;

-    cpc = opaque;
    switch (dcrn) {
    case PPC405EP_CPC0_BOOT:
    ret = cpc->boot;
@@ -1342,9 +1312,8 @@ static uint32_t dcr_read_epcpc (void *opaque, int dcrn)

static void dcr_write_epcpc (void *opaque, int dcrn, uint32_t val)
{
-    ppc405ep_cpc_t *cpc;
+    Ppc405CpcState *cpc = PPC405_CPC(opaque);

-    cpc = opaque;
    switch (dcrn) {
    case PPC405EP_CPC0_BOOT:
    /* Read-only register */
@@ -1377,9 +1346,9 @@ static void dcr_write_epcpc (void *opaque, int dcrn, 
uint32_t val)
    }
}

-static void ppc405ep_cpc_reset (void *opaque)
+static void ppc405_cpc_reset(DeviceState *dev)
{
-    ppc405ep_cpc_t *cpc = opaque;
+    Ppc405CpcState *cpc = PPC405_CPC(dev);

    cpc->boot = 0x0010; /* Boot from PCI - IIC EEPROM disabled */
    cpc->epctl = 0x;
@@ -1391,21 +1360,24 @@ static 

Re: [PATCH] vdpa: do not save failed dma maps in SVQ iova tree

2022-08-03 Thread Jason Wang



On 2022/8/3 16:12, Eugenio Perez Martin wrote:

On Wed, Aug 3, 2022 at 10:09 AM Jason Wang  wrote:

On Tue, Aug 2, 2022 at 10:39 PM Eugenio Pérez  wrote:

If a map fails for whatever reason, it must not be saved in the tree.
Otherwise, qemu will try to unmap it in cleanup, leading to more errors.

Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-vdpa.c | 20 +---
  1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3ff9ce3501..e44c23dce5 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -176,6 +176,7 @@ static void vhost_vdpa_listener_commit(MemoryListener 
*listener)
  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
 MemoryRegionSection *section)
  {
+DMAMap mem_region = {};
  struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, 
listener);
  hwaddr iova;
  Int128 llend, llsize;
@@ -212,13 +213,13 @@ static void vhost_vdpa_listener_region_add(MemoryListener 
*listener,

  llsize = int128_sub(llend, int128_make64(iova));
  if (v->shadow_vqs_enabled) {
-DMAMap mem_region = {
-.translated_addr = (hwaddr)(uintptr_t)vaddr,
-.size = int128_get64(llsize) - 1,
-.perm = IOMMU_ACCESS_FLAG(true, section->readonly),
-};

Nit: can we keep this part unchanged?


We can, but that implies we should look for the iova again at the fail_map
label. If you are OK with that, I'm fine with performing the search again.



I meant something like:

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 9a2daef7e3..edf40868e3 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -232,11 +232,15 @@ static void 
vhost_vdpa_listener_region_add(MemoryListener *listener,

  vaddr, section->readonly);
 if (ret) {
 error_report("vhost vdpa map fail!");
-    goto fail;
+    goto fail_unmap;
 }

 return;

+fail_unmap:
+    if (v->shadow_vqs_enabled) {
+    vhost_iova_tree_remove(v->iova_tree, &mem_region);
+    }
 fail:
 /*
  * On the initfn path, store the first error in the container so we

Thanks





Thanks


+int r;

-int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
+mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr,
+mem_region.size = int128_get64(llsize) - 1,
+mem_region.perm = IOMMU_ACCESS_FLAG(true, section->readonly),
+
+r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
  if (unlikely(r != IOVA_OK)) {
  error_report("Can't allocate a mapping (%d)", r);
  goto fail;
@@ -232,11 +233,16 @@ static void vhost_vdpa_listener_region_add(MemoryListener 
*listener,
   vaddr, section->readonly);
  if (ret) {
  error_report("vhost vdpa map fail!");
-goto fail;
+goto fail_map;
  }

  return;

+fail_map:
+if (v->shadow_vqs_enabled) {
+vhost_iova_tree_remove(v->iova_tree, &mem_region);
+}
+
  fail:
  /*
   * On the initfn path, store the first error in the container so we
--
2.31.1
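
The cleanup ordering under discussion (undo the tree insertion only when the later map step fails) follows the usual goto-label unwinding pattern. The sketch below is a toy model of it: ToyTree, toy_tree_alloc, toy_tree_remove, toy_dma_map and toy_region_add are all invented names, and the "tree" just counts live entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy "tree": counts live entries (stand-in for the IOVA tree). */
typedef struct ToyTree {
    int entries;
} ToyTree;

static int toy_tree_alloc(ToyTree *t)
{
    t->entries++;
    return 0;
}

static void toy_tree_remove(ToyTree *t)
{
    t->entries--;
}

/* Hypothetical device map step: fails when 'device_ok' is false. */
static int toy_dma_map(bool device_ok)
{
    return device_ok ? 0 : -1;
}

static int toy_region_add(ToyTree *t, bool shadow_vqs, bool device_ok)
{
    if (shadow_vqs && toy_tree_alloc(t) != 0) {
        goto fail;
    }
    if (toy_dma_map(device_ok) != 0) {
        goto fail_map;
    }
    return 0;

fail_map:
    /* The map failed, so the entry must not stay in the tree. */
    if (shadow_vqs) {
        toy_tree_remove(t);
    }
fail:
    return -1;
}

static void check_failed_map_is_not_saved(void)
{
    ToyTree t = { 0 };

    assert(toy_region_add(&t, true, true) == 0 && t.entries == 1);
    /* A failed map leaves the entry count unchanged. */
    assert(toy_region_add(&t, true, false) == -1 && t.entries == 1);
}
```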






Re: [PATCH v3 6/7] vhost_net: Add NetClientInfo prepare callback

2022-08-03 Thread Jason Wang



On 2022/8/4 01:18, Eugenio Pérez wrote:

This is used by the backend to perform actions before the device is
started.

In particular, vdpa will use it to isolate CVQ in its own ASID if
possible, and start SVQ unconditionally only in CVQ.

Signed-off-by: Eugenio Pérez 
---
  include/net/net.h  | 2 ++
  hw/net/vhost_net.c | 4 
  2 files changed, 6 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index a8d47309cd..efa6448886 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef struct NICConf {
  
  typedef void (NetPoll)(NetClientState *, bool enable);

  typedef bool (NetCanReceive)(NetClientState *);
+typedef void (NetPrepare)(NetClientState *);
  typedef int (NetLoad)(NetClientState *);
  typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
  typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
@@ -72,6 +73,7 @@ typedef struct NetClientInfo {
  NetReceive *receive_raw;
  NetReceiveIOV *receive_iov;
  NetCanReceive *can_receive;
+NetPrepare *prepare;
  NetLoad *load;
  NetCleanup *cleanup;
  LinkStatusChanged *link_status_changed;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index a9bf72dcda..6d759b 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -244,6 +244,10 @@ static int vhost_net_start_one(struct vhost_net *net,
  struct vhost_vring_file file = { };
  int r;
  
+if (net->nc->info->prepare) {

+net->nc->info->prepare(net->nc);
+}



Any chance we can reuse load()?

Thanks



+
  r = vhost_dev_enable_notifiers(&net->dev, dev);
  if (r < 0) {
  goto fail_notifiers;
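
The shape of the new hook is an optional function pointer that is called only when the backend provides it. A self-contained sketch of that pattern, using invented names (ToyClient, ToyClientInfo, toy_start_one) rather than the real net structures:

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyClient ToyClient;
typedef void (ToyPrepare)(ToyClient *);

typedef struct ToyClientInfo {
    ToyPrepare *prepare;    /* optional: NULL when there is nothing to do */
} ToyClientInfo;

struct ToyClient {
    const ToyClientInfo *info;
    int prepared;           /* how many times prepare ran */
};

static void toy_backend_prepare(ToyClient *nc)
{
    nc->prepared++;
}

static int toy_start_one(ToyClient *nc)
{
    /* Call the hook only if the backend registered one. */
    if (nc->info->prepare) {
        nc->info->prepare(nc);
    }
    /* The rest of the device start would follow here. */
    return 0;
}

static void check_optional_hook(void)
{
    ToyClientInfo with_hook = { toy_backend_prepare };
    ToyClientInfo without_hook = { NULL };
    ToyClient a = { &with_hook, 0 };
    ToyClient b = { &without_hook, 0 };

    assert(toy_start_one(&a) == 0 && a.prepared == 1);
    assert(toy_start_one(&b) == 0 && b.prepared == 0);
}
```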





Re: [PATCH v3 7/7] vdpa: Always start CVQ in SVQ mode

2022-08-03 Thread Jason Wang



On 2022/8/4 01:18, Eugenio Pérez wrote:

Isolate the control virtqueue in its own group, so that control commands can
be intercepted while the dataplane runs as full passthrough to the guest.

Signed-off-by: Eugenio Pérez 
---
v3:
* Make asid related queries print a warning instead of returning an
   error and stop the start of qemu.
---
  hw/virtio/vhost-vdpa.c |   3 +-
  net/vhost-vdpa.c   | 144 +++--
  2 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 131100841c..a4cb68862b 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -674,7 +674,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
  {
  uint64_t features;
  uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
+0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
  int r;
  
  if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index e3b65ed546..5f39f0edb5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
  /* Control commands shadow buffers */
  void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
  
+/* Number of address spaces supported by the device */

+unsigned address_space_num;
+
  /* The device always have SVQ enabled */
  bool always_svq;
  bool started;
@@ -100,6 +103,8 @@ static const uint64_t vdpa_svq_device_features =
  BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
  BIT_ULL(VIRTIO_NET_F_STANDBY);
  
+#define VHOST_VDPA_NET_CVQ_ASID 1

+
  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
  {
  VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -224,6 +229,101 @@ static NetClientInfo net_vhost_vdpa_info = {
  .check_peer_type = vhost_vdpa_check_peer_type,
  };
  
+static void vhost_vdpa_get_vring_group(int device_fd,

+   struct vhost_vring_state *state)
+{
+int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);
+if (unlikely(r < 0)) {
+/*
+ * Assume all groups are 0, the consequences are the same and we will
+ * not abort device creation
+ */
+state->num = 0;
+}
+}
+
+/**
+ * Check if all the virtqueues of the virtio device are in a different vq group
+ * than the last vq. The VQ group of the last vq is passed in cvq_group.
+ */
+static bool vhost_vdpa_cvq_group_is_independent(struct vhost_vdpa *v,
+struct vhost_vring_state cvq_group)
+{
+struct vhost_dev *dev = v->dev;
+
+for (int i = 0; i < (dev->vq_index_end - 1); ++i) {
+struct vhost_vring_state vq_group = {
+.index = i,
+};
+
+vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
+if (unlikely(vq_group.num == cvq_group.num)) {
+warn_report("CVQ %u group is the same as VQ %u one (%u)",
+ cvq_group.index, vq_group.index, cvq_group.num);



I don't get why we need warn here.



+return false;
+}
+}
+
+return true;
+}
+
+static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
+   unsigned vq_group,
+   unsigned asid_num)
+{
+struct vhost_vring_state asid = {
+.index = vq_group,
+.num = asid_num,
+};
+int ret;
+
+ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
+if (unlikely(ret < 0)) {
+warn_report("Can't set vq group %u asid %u, errno=%d (%s)",
+asid.index, asid.num, errno, g_strerror(errno));
+}
+return ret;
+}
+
+static void vhost_vdpa_net_prepare(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_vdpa *v = &s->vhost_vdpa;
+struct vhost_dev *dev = v->dev;
+struct vhost_vring_state cvq_group = {
+.index = v->dev->vq_index_end - 1,
+};
+int r;
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (dev->nvqs != 1 || dev->vq_index + dev->nvqs != dev->vq_index_end) {
+/* Only interested in CVQ */
+return;
+}
+
+if (s->always_svq) {
+/* SVQ is already enabled */
+return;
+}
+
+if (s->address_space_num < 2) {
+v->shadow_vqs_enabled = false;
+return;
+}
+
+vhost_vdpa_get_vring_group(v->device_fd, &cvq_group);
+if (!vhost_vdpa_cvq_group_is_independent(v, cvq_group)) {



If there's no other caller of vhost_vdpa_cvq_group_is_independent(), I'd
suggest unifying them into a single helper.


(Btw, the name of the function is kind of too long).

Thanks
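
For reference, the check under discussion reduces to comparing the CVQ's
group id against the group id of every data virtqueue. A minimal standalone
sketch, assuming the group ids have already been fetched into a plain array
(the name cvq_group_is_isolated and the array layout are hypothetical, not
part of the patch):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not QEMU code: groups[i] is the vq group id of
 * virtqueue i, and the control virtqueue is assumed to be the last entry.
 * Returns true only if no data virtqueue shares the CVQ's group.
 */
static bool cvq_group_is_isolated(const unsigned *groups, size_t nvqs)
{
    if (nvqs == 0) {
        return false;               /* no CVQ to isolate */
    }

    unsigned cvq_group = groups[nvqs - 1];

    for (size_t i = 0; i + 1 < nvqs; i++) {
        if (groups[i] == cvq_group) {
            return false;           /* a data vq shares the CVQ group */
        }
    }
    return true;
}
```

Folding the query and the comparison into one helper like this would also
leave a single place to decide whether a shared group deserves a warning.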



+v->shadow_vqs_enabled = false;
+return;
+}
+
+r = vhost_vdpa_set_address_space_id(v, cvq_group.num,
+VHOST_VDPA_NET_CVQ_ASID);
+

Re: [PATCH v3 4/7] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap

2022-08-03 Thread Jason Wang



On 2022/8/4 01:18, Eugenio Pérez wrote:

So the caller can choose the destination ASID.

No need to update the batch functions as they will always be called from
memory listener updates at the moment. Memory listener updates will
always update ASID 0, as it's the passthrough ASID.

All vhost devices' ASIDs are 0 at this moment.

Signed-off-by: Eugenio Pérez 
---
v3: Deleted unneeded space
---
  include/hw/virtio/vhost-vdpa.h |  8 +---
  hw/virtio/vhost-vdpa.c | 25 +++--
  net/vhost-vdpa.c   |  6 +++---
  hw/virtio/trace-events |  4 ++--
  4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d85643..6560bb9d78 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -29,6 +29,7 @@ typedef struct vhost_vdpa {
  int index;
  uint32_t msg_type;
  bool iotlb_batch_begin_sent;
+uint32_t address_space_id;
  MemoryListener listener;
  struct vhost_vdpa_iova_range iova_range;
  uint64_t acked_features;
@@ -42,8 +43,9 @@ typedef struct vhost_vdpa {
  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
  } VhostVDPA;
  
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-   void *vaddr, bool readonly);
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+   hwaddr size, void *vaddr, bool readonly);
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+ hwaddr size);
  
  #endif

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 2fefcc66ad..131100841c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -72,22 +72,24 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
  return false;
  }
  
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-   void *vaddr, bool readonly)
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+   hwaddr size, void *vaddr, bool readonly)
  {
  struct vhost_msg_v2 msg = {};
  int fd = v->device_fd;
  int ret = 0;
  
  msg.type = v->msg_type;
+msg.asid = asid;



I wonder what happens if we're running on a kernel without ASID support.

Does it work because asid will simply be ignored? Could there be a case
where we want asid != 0 on an old kernel?


Thanks
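
One way to make the old-kernel case explicit would be to gate non-zero ASIDs
on the backend feature bit. A minimal sketch under stated assumptions:
EXAMPLE_F_IOTLB_ASID stands in for VHOST_BACKEND_F_IOTLB_ASID (its numeric
value here is assumed for illustration), and asid_is_usable is a hypothetical
helper that the patch does not define:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for VHOST_BACKEND_F_IOTLB_ASID; the value is an assumption. */
#define EXAMPLE_F_IOTLB_ASID 3

/*
 * Hypothetical helper: ASID 0 is treated as always usable because the
 * series describes it as the passthrough ASID; any other ASID requires
 * the backend to have acknowledged the ASID feature bit.
 */
static bool asid_is_usable(uint64_t backend_features, uint32_t asid)
{
    if (asid == 0) {
        return true;
    }
    return (backend_features & (UINT64_C(1) << EXAMPLE_F_IOTLB_ASID)) != 0;
}
```

With a guard of this shape, a caller asking for asid != 0 on a backend
without the feature would be rejected up front instead of relying on the
kernel ignoring the field.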



  msg.iotlb.iova = iova;
  msg.iotlb.size = size;
  msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
  msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
  msg.iotlb.type = VHOST_IOTLB_UPDATE;
  
-   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
-msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
+trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
+ msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
+ msg.iotlb.type);
  
  if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
  error_report("failed to write, fd=%d, errno=%d (%s)",
@@ -98,18 +100,20 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
  return ret;
  }
  
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+ hwaddr size)
  {
  struct vhost_msg_v2 msg = {};
  int fd = v->device_fd;
  int ret = 0;
  
  msg.type = v->msg_type;
+msg.asid = asid;
  msg.iotlb.iova = iova;
  msg.iotlb.size = size;
  msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
  
-trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
+trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
 msg.iotlb.size, msg.iotlb.type);
  
  if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
@@ -229,7 +233,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
  }
  
  vhost_vdpa_iotlb_batch_begin_once(v);
-ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
+ret = vhost_vdpa_dma_map(v, 0, iova, int128_get64(llsize),
   vaddr, section->readonly);
  if (ret) {
  error_report("vhost vdpa map fail!");
@@ -299,7 +303,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
  vhost_iova_tree_remove(v->iova_tree, result);
  }
  vhost_vdpa_iotlb_batch_begin_once(v);
-ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
+ret = vhost_vdpa_dma_unmap(v, 0, iova, int128_get64(llsize));
  if (ret) {
  error_report("vhost_vdpa dma unmap error!");
  }
@@ -890,7 +894,7 @@ static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
  }
  
 

Re: [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ

2022-08-03 Thread Jason Wang



On 2022/8/3 01:57, Eugenio Pérez wrote:

CVQ of net vhost-vdpa devices can be intercepted since the work of [1]. The
virtio-net device model is updated. The migration was blocked because, although
the state can be migrated between VMMs, it was not possible to restore it on
the destination NIC.

This series adds support for SVQ to inject external messages without the guest's
knowledge, so before the guest is resumed all the guest visible state is
restored. It is done using standard CVQ messages, so the vhost-vdpa device does
not need to learn how to restore it: As long as they have the feature, they
know how to handle it.

This series needs fixes [1], [2] and [3] to be applied to achieve full live
migration.

Thanks!

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02984.html
[2] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg03993.html



Note that the above has been merged into master.

And the series looks good overall, just some comments to make the code
easier to read and maintain in the future.


Thanks



[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00325.html

v5:
- Rename s/start/load/
- Use independent NetClientInfo to only add load callback on cvq.
- Accept out sg instead of dev_buffers[] at vhost_vdpa_net_cvq_map_elem
- Use only out size instead of iovec dev_buffers to know if the descriptor is
   effectively available, allowing to delete artificial !NULL VirtQueueElement
   on vhost_svq_add call.

v4:
- Actually use NetClientInfo callback.

v3:
- Route vhost-vdpa start code through NetClientInfo callback.
- Delete extra vhost_net_stop_one() call.

v2:
- Fix SIGSEGV dereferencing SVQ when not in svq mode

v1 from RFC:
- Do not reorder DRIVER_OK & enable patches.
- Delete leftovers

Eugenio Pérez (10):
   vhost: stop transfer elem ownership in vhost_handle_guest_kick
   vhost: use SVQ element ndescs instead of opaque data for desc
 validation
   vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
   vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem
   vdpa: Extract vhost_vdpa_net_cvq_add from
 vhost_vdpa_net_handle_ctrl_avail
   vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
   vdpa: add NetClientState->load() callback
   vdpa: add net_vhost_vdpa_cvq_info NetClientInfo
   vdpa: Add virtio-net mac address via CVQ at start
   vdpa: Delete CVQ migration blocker

  include/hw/virtio/vhost-vdpa.h |   1 -
  include/net/net.h  |   2 +
  hw/net/vhost_net.c |   7 ++
  hw/virtio/vhost-shadow-virtqueue.c |  31 +++---
  hw/virtio/vhost-vdpa.c |  14 ---
  net/vhost-vdpa.c   | 163 +
  6 files changed, 145 insertions(+), 73 deletions(-)






Re: [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg

2022-08-03 Thread Jason Wang



On 2022/8/3 01:57, Eugenio Pérez wrote:

So it's generic enough to accept any out sg buffer and we can inject
NIC state messages.

Signed-off-by: Eugenio Pérez 
---
v5: Accept out sg instead of dev_buffers[]
---
  net/vhost-vdpa.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 33bf3d6409..2421bca347 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -302,16 +302,16 @@ dma_map_err:
  }
  
  /**
- * Copy the guest element into a dedicated buffer suitable to be sent to NIC
+ * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
   */
-static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
-VirtQueueElement *elem,
-size_t *out_len)
+static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
+  const struct iovec *out, size_t out_num,
+  size_t *out_len)



This still doesn't look general, as there's no guarantee that we won't have
command-in-specific-data. One example is that Ali is working on
virtio-net statistics fetching from the control virtqueue.


So it looks to me we'd better have a general bounce_map here that accepts:

1) out_sg and out_num
2) in_sg and in_num

At this level, we'd better not treat the in buffer specially as the
ack. And we need to do the bouncing:


1) for out buffer, during map
2) for in buffer during unmap

Thanks
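
The bouncing scheme suggested above can be sketched as follows: out buffers
are copied into the bounce buffer when mapping, and in buffers are copied
back to the caller's scatter-gather list when unmapping. This is a standalone
illustration using plain struct iovec; the bounce_map_out and bounce_unmap_in
names are hypothetical, and no DMA mapping is performed:

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy an out sg list into a linear bounce buffer; returns bytes copied. */
static size_t bounce_map_out(void *bounce, size_t bounce_len,
                             const struct iovec *out_sg, size_t out_num)
{
    size_t copied = 0;

    for (size_t i = 0; i < out_num && copied < bounce_len; i++) {
        size_t n = out_sg[i].iov_len;

        if (n > bounce_len - copied) {
            n = bounce_len - copied;    /* truncate to the bounce size */
        }
        memcpy((char *)bounce + copied, out_sg[i].iov_base, n);
        copied += n;
    }
    return copied;
}

/* Copy device-written bytes from the bounce buffer back into the in sg. */
static size_t bounce_unmap_in(const void *bounce, size_t written,
                              const struct iovec *in_sg, size_t in_num)
{
    size_t copied = 0;

    for (size_t i = 0; i < in_num && copied < written; i++) {
        size_t n = in_sg[i].iov_len;

        if (n > written - copied) {
            n = written - copied;       /* only copy what was written */
        }
        memcpy(in_sg[i].iov_base, (const char *)bounce + copied, n);
        copied += n;
    }
    return copied;
}
```

Neither function knows anything about the ack, so the same pair would serve
commands that return more than a status byte.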



  {
  size_t in_copied;
  bool ok;
  
-ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
+ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
  vhost_vdpa_net_cvq_cmd_len(),
  s->cvq_cmd_out_buffer, out_len, false);
  if (unlikely(!ok)) {
@@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
  };
  bool ok;
  
-ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
+ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
+   &dev_buffers[0].iov_len);
  if (unlikely(!ok)) {
  goto out;
  }





Re: [PATCH v5 04/10] vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem

2022-08-03 Thread Jason Wang



On 2022/8/3 01:57, Eugenio Pérez wrote:

There is no need to get them by parameter, since they're contained in
VhostVDPAState. The only useful information was the written length in
out.

Simplify the function removing those.

Signed-off-by: Eugenio Pérez 
---
  net/vhost-vdpa.c | 17 ++---
  1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ac1810723c..c6699edfbc 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -303,34 +303,29 @@ dma_map_err:
  
  /**
   * Copy the guest element into a dedicated buffer suitable to be sent to NIC
- *
- * @iov: [0] is the out buffer, [1] is the in one
   */
  static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
  VirtQueueElement *elem,
-struct iovec *iov)
+size_t *out_len)
  {
  size_t in_copied;
  bool ok;
  
-iov[0].iov_base = s->cvq_cmd_out_buffer;
  ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
-vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base,
-&iov[0].iov_len, false);
+vhost_vdpa_net_cvq_cmd_len(),
+s->cvq_cmd_out_buffer, out_len, false);
  if (unlikely(!ok)) {
  return false;
  }
  
-iov[1].iov_base = s->cvq_cmd_in_buffer;
  ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0,
-sizeof(virtio_net_ctrl_ack), iov[1].iov_base,
-&in_copied, true);
+sizeof(virtio_net_ctrl_ack),
+s->cvq_cmd_in_buffer, &in_copied, true);



I'd suggest a few tweaks to make it easier for reviewers:

- let vhost_vdpa_cvq_map_buf() and vhost_vdpa_net_cvq_map_elem() return
ssize_t and drop the confusing written/out_len parameter of those
functions.
- rename vhost_vdpa_net_cvq_map_elem() to
vhost_vdpa_net_cvq_bounce_map() since it actually uses a bounce buffer


Thanks



  if (unlikely(!ok)) {
  vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
  return false;
  }
  
-iov[1].iov_len = sizeof(virtio_net_ctrl_ack);
  return true;
  }
  
@@ -395,7 +390,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
  int r = -EINVAL;
  bool ok;
  
-ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers);
+ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
  if (unlikely(!ok)) {
  goto out;
  }





Re: [PATCH v7 3/4] target/riscv: smstateen check for fcsr

2022-08-03 Thread Mayuresh Chitale
On Wed, 2022-08-03 at 16:32 +0800, Weiwei Li wrote:
> On 2022/8/2 1:18 AM, Mayuresh Chitale wrote:
> > If smstateen is implemented and sstateen0.fcsr is clear then the
> > floating point operations must return illegal instruction
> > exception.
> 
> I think this is not correct. The exception for floating-point
> operations must be an illegal instruction exception if FCSR does not
> exist (that is, misa.F is zero and Zfinx is not supported). However,
> when FCSR exists, the final exception should be decided by the current
> privilege level and the stateen-related CSR values, just like the
> access control of FCSR.
Ok. We can use the language from the spec itself.
> 
> Regards,
> 
> Weiwei Li
> 
> > Signed-off-by: Mayuresh Chitale 
> > ---
> >   target/riscv/csr.c| 23 +
> >   target/riscv/insn_trans/trans_rvf.c.inc   | 40
> > +--
> >   target/riscv/insn_trans/trans_rvzfh.c.inc | 12 +++
> >   3 files changed, 72 insertions(+), 3 deletions(-)
> > 
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 011d6c5976..0512391220 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -79,6 +79,10 @@ static RISCVException fs(CPURISCVState *env, int
> > csrno)
> >   !RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
> >   return RISCV_EXCP_ILLEGAL_INST;
> >   }
> > +
> > +if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
> > +return smstateen_acc_ok(env, 0, SMSTATEEN0_FCSR);
> > +}
> >   #endif
> >   return RISCV_EXCP_NONE;
> >   }
> > @@ -1866,6 +1870,9 @@ static RISCVException
> > write_mstateen0(CPURISCVState *env, int csrno,
> > target_ulong new_val)
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> >   
> >   return write_mstateen(env, csrno, wr_mask, new_val);
> >   }
> > @@ -1914,6 +1921,10 @@ static RISCVException
> > write_mstateen0h(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_mstateenh(env, csrno, wr_mask, new_val);
> >   }
> >   
> > @@ -1963,6 +1974,10 @@ static RISCVException
> > write_hstateen0(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_hstateen(env, csrno, wr_mask, new_val);
> >   }
> >   
> > @@ -2014,6 +2029,10 @@ static RISCVException
> > write_hstateen0h(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_hstateenh(env, csrno, wr_mask, new_val);
> >   }
> >   
> > @@ -2073,6 +2092,10 @@ static RISCVException
> > write_sstateen0(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_sstateen(env, csrno, wr_mask, new_val);
> >   }
> >   
> > diff --git a/target/riscv/insn_trans/trans_rvf.c.inc
> > b/target/riscv/insn_trans/trans_rvf.c.inc
> > index a1d3eb52ad..ce8a0cc34b 100644
> > --- a/target/riscv/insn_trans/trans_rvf.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvf.c.inc
> > @@ -24,9 +24,43 @@
> >   return false; \
> >   } while (0)
> >   
> > -#define REQUIRE_ZFINX_OR_F(ctx) do {\
> > -if (!ctx->cfg_ptr->ext_zfinx) { \
> > -REQUIRE_EXT(ctx, RVF); \
> > +#ifndef CONFIG_USER_ONLY
> > +static inline bool smstateen_check(DisasContext *ctx, int index)
> > +{
> > +CPUState *cpu = ctx->cs;
> > +CPURISCVState *env = cpu->env_ptr;
> > +uint64_t stateen = env->mstateen[index];
> > +
> > +if (!ctx->cfg_ptr->ext_smstateen || env->priv == PRV_M) {
> > +return true;
> > +}
> > +
> > +if (ctx->virt_enabled) {
> > +stateen &= env->hstateen[index];
> > +}
> > +
> > +if (env->priv == PRV_U && has_ext(ctx, RVS)) {
> > +stateen &= env->sstateen[index];
> > +}
> > +
> > +if (!(stateen & SMSTATEEN0_FCSR)) {
> > +return false;
> > +}
> > +
> > +return true;
> > +}
> > +#else
> > +#define smstateen_check(ctx, index) (true)
> > +#endif
> > +
> > +#define REQUIRE_ZFINX_OR_F(ctx) do { \
> > +if (!has_ext(ctx, RVF)) { \
> > +if (!ctx->cfg_ptr->ext_zfinx) { \
> > +return false; \
> > +} \
> > +if (!smstateen_check(ctx, 0)) { \
> > +return false; \
> > +} \
> >   } \
> >   } while (0)
> >   
> > diff --git 

Re: [PATCH] target/riscv: Fix priority of csr related check in riscv_csrrw_check

2022-08-03 Thread Anup Patel
On Wed, Aug 3, 2022 at 6:16 PM Weiwei Li  wrote:
>
> Normally, riscv_csrrw_check is called when executing Zicsr instructions,
> and access control can only be done for CSRs that exist. So the priority
> of the CSR-related checks, from highest to lowest, should be as follows:
> 1) check whether Zicsr is supported: raise RISCV_EXCP_ILLEGAL_INST if not
> 2) check whether the CSR exists: raise RISCV_EXCP_ILLEGAL_INST if not
> 3) do access control: raise RISCV_EXCP_ILLEGAL_INST or
>    RISCV_EXCP_VIRT_INSTRUCTION_FAULT if not allowed
>
> The predicates cover parts of both 2) and 3), so they need to be placed
> in the middle of riscv_csrrw_check.
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> ---
>  target/riscv/csr.c | 44 +---
>  1 file changed, 25 insertions(+), 19 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 0fb042b2fd..d81f466c80 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -3270,6 +3270,30 @@ static inline RISCVException riscv_csrrw_check(CPURISCVState *env,
>  /* check privileges and return RISCV_EXCP_ILLEGAL_INST if check fails */
>  int read_only = get_field(csrno, 0xC00) == 3;
>  int csr_min_priv = csr_ops[csrno].min_priv_ver;
> +
> +/* ensure the CSR extension is enabled. */
> +if (!cpu->cfg.ext_icsr) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +if (env->priv_ver < csr_min_priv) {
> +return RISCV_EXCP_ILLEGAL_INST;

This line breaks nested virtualization: for nested virtualization
to work, a guest hypervisor accessing h and vs CSRs from VS-mode
should get a virtual instruction trap, not an illegal instruction
trap.

Regards,
Anup

> +}
> +
> +/* check predicate */
> +if (!csr_ops[csrno].predicate) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +if (write_mask && read_only) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +RISCVException ret = csr_ops[csrno].predicate(env, csrno);
> +if (ret != RISCV_EXCP_NONE) {
> +return ret;
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  int csr_priv, effective_priv = env->priv;
>
> @@ -3290,25 +3314,7 @@ static inline RISCVException riscv_csrrw_check(CPURISCVState *env,
>  return RISCV_EXCP_ILLEGAL_INST;
>  }
>  #endif
> -if (write_mask && read_only) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -/* ensure the CSR extension is enabled. */
> -if (!cpu->cfg.ext_icsr) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -/* check predicate */
> -if (!csr_ops[csrno].predicate) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -if (env->priv_ver < csr_min_priv) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -return csr_ops[csrno].predicate(env, csrno);
> +return RISCV_EXCP_NONE;
>  }
>
>  static RISCVException riscv_csrrw_do64(CPURISCVState *env, int csrno,
> --
> 2.17.1
>
>
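
The priority order listed in the commit message above can be modeled with a
small standalone function. This is a toy sketch, not the QEMU code: each step
is collapsed into a precomputed boolean, and every name in it is
hypothetical.

```c
#include <stdbool.h>

/* Toy exception codes; the names are illustrative, not QEMU's. */
enum toy_excp {
    TOY_EXCP_NONE,
    TOY_EXCP_ILLEGAL_INST,
    TOY_EXCP_VIRT_INSTRUCTION_FAULT,
};

struct toy_csr_access {
    bool zicsr_enabled;     /* step 1: is Zicsr implemented? */
    bool csr_exists;        /* step 2: does the CSR exist? */
    bool access_allowed;    /* step 3: does access control permit it? */
    bool deny_as_virt;      /* step 3: a denial traps as a virtual fault */
};

static enum toy_excp toy_csr_check(const struct toy_csr_access *a)
{
    if (!a->zicsr_enabled) {
        return TOY_EXCP_ILLEGAL_INST;       /* highest priority */
    }
    if (!a->csr_exists) {
        return TOY_EXCP_ILLEGAL_INST;
    }
    if (!a->access_allowed) {
        return a->deny_as_virt ? TOY_EXCP_VIRT_INSTRUCTION_FAULT
                               : TOY_EXCP_ILLEGAL_INST;
    }
    return TOY_EXCP_NONE;
}
```

In this model the concern raised in the reply is about step 2: a check that
treats an h or vs CSR as nonexistent would return the illegal instruction
code before step 3 gets a chance to report a virtual instruction fault.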



Re: [PATCH v7 2/4] target/riscv: smstateen check for h/senvcfg

2022-08-03 Thread Mayuresh Chitale
On Wed, 2022-08-03 at 16:24 +0800, Weiwei Li wrote:
> On 2022/8/2 1:18 AM, Mayuresh Chitale wrote:
> > Accesses to henvcfg, henvcfgh and senvcfg are allowed only if
> > corresponding bit in mstateen0/hstateen0 is enabled. Otherwise an
> > illegal instruction trap is generated.
> > 
> > Signed-off-by: Mayuresh Chitale 
> > ---
> >   roms/opensbi   |  2 +-
> >   target/riscv/csr.c | 83
> > ++
> >   2 files changed, 77 insertions(+), 8 deletions(-)
> > 
> > diff --git a/roms/opensbi b/roms/opensbi
> > index 4489876e93..48f91ee9c9 16
> > --- a/roms/opensbi
> > +++ b/roms/opensbi
> > @@ -1 +1 @@
> > -Subproject commit 4489876e933d8ba0d8bc6c64bae71e295d45faac
> > +Subproject commit 48f91ee9c960f048c4a7d1da4447d31e04931e38
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index ad1642fb9b..011d6c5976 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -40,6 +40,38 @@ void riscv_set_csr_ops(int csrno,
> > riscv_csr_operations *ops)
> >   }
> >   
> >   /* Predicates */
> > +#if !defined(CONFIG_USER_ONLY)
> > +static RISCVException smstateen_acc_ok(CPURISCVState *env, int
> > index,
> > +   uint64_t bit)
> > +{
> > +bool virt = riscv_cpu_virt_enabled(env);
> > +CPUState *cs = env_cpu(env);
> > +RISCVCPU *cpu = RISCV_CPU(cs);
> > +
> > +if (env->priv == PRV_M || !cpu->cfg.ext_smstateen) {
> > +return RISCV_EXCP_NONE;
> > +}
> > +
> > +if (!(env->mstateen[index] & bit)) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> > +if (virt) {
> > +if (!(env->hstateen[index] & bit)) {
> > +return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > +}
> > +}
> > +
> > +if (env->priv == PRV_U && riscv_has_ext(env, RVS)) {
> > +if (!(env->sstateen[index] & bit)) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> 
> VU mode does not seem to be taken into consideration. For VU mode, the
> exception will be RISCV_EXCP_VIRT_INSTRUCTION_FAULT instead if
> "!(env->sstateen[index] & bit)" holds here.
Ok. I will fix it in the next version.
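
As a rough illustration of the point above, here is a standalone toy model of
the access check with the VU-mode case handled. It is only a sketch of how
the review comment might be folded in, not the next version of the patch: all
names are hypothetical, and the ext_smstateen test is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

enum sten_priv { STEN_PRV_U = 0, STEN_PRV_S = 1, STEN_PRV_M = 3 };

enum sten_result { STEN_OK, STEN_ILLEGAL_INST, STEN_VIRT_FAULT };

struct sten_state {
    enum sten_priv priv;    /* current privilege level */
    bool virt;              /* true in VS or VU mode */
    bool has_s_mode;        /* S-mode is implemented */
    uint64_t mstateen;
    uint64_t hstateen;
    uint64_t sstateen;
};

static enum sten_result sten_access_check(const struct sten_state *st,
                                          uint64_t bit)
{
    if (st->priv == STEN_PRV_M) {
        return STEN_OK;             /* M-mode is never restricted */
    }
    if (!(st->mstateen & bit)) {
        return STEN_ILLEGAL_INST;
    }
    if (st->virt && !(st->hstateen & bit)) {
        return STEN_VIRT_FAULT;
    }
    if (st->priv == STEN_PRV_U && st->has_s_mode &&
        !(st->sstateen & bit)) {
        /* VU mode reports a virtual instruction fault, U mode does not */
        return st->virt ? STEN_VIRT_FAULT : STEN_ILLEGAL_INST;
    }
    return STEN_OK;
}
```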
> 
> 
> Regards,
> 
> Weiwei Li
> 
> > +}
> > +
> > +return RISCV_EXCP_NONE;
> > +}
> > +#endif
> > +
> >   static RISCVException fs(CPURISCVState *env, int csrno)
> >   {
> >   #if !defined(CONFIG_USER_ONLY)
> > @@ -1715,6 +1747,13 @@ static RISCVException
> > write_menvcfgh(CPURISCVState *env, int csrno,
> >   static RISCVException read_senvcfg(CPURISCVState *env, int csrno,
> >target_ulong *val)
> >   {
> > +RISCVException ret;
> > +
> > +ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
> > +if (ret != RISCV_EXCP_NONE) {
> > +return ret;
> > +}
> > +
> >   *val = env->senvcfg;
> >   return RISCV_EXCP_NONE;
> >   }
> > @@ -1723,15 +1762,27 @@ static RISCVException
> > write_senvcfg(CPURISCVState *env, int csrno,
> > target_ulong val)
> >   {
> >   uint64_t mask = SENVCFG_FIOM | SENVCFG_CBIE | SENVCFG_CBCFE |
> > SENVCFG_CBZE;
> > +RISCVException ret;
> >   
> > -env->senvcfg = (env->senvcfg & ~mask) | (val & mask);
> > +ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
> > +if (ret != RISCV_EXCP_NONE) {
> > +return ret;
> > +}
> >   
> > +env->senvcfg = (env->senvcfg & ~mask) | (val & mask);
> >   return RISCV_EXCP_NONE;
> >   }
> >   
> >   static RISCVException read_henvcfg(CPURISCVState *env, int csrno,
> >target_ulong *val)
> >   {
> > +RISCVException ret;
> > +
> > +ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
> > +if (ret != RISCV_EXCP_NONE) {
> > +return ret;
> > +}
> > +
> >   *val = env->henvcfg;
> >   return RISCV_EXCP_NONE;
> >   }
> > @@ -1740,6 +1791,12 @@ static RISCVException
> > write_henvcfg(CPURISCVState *env, int csrno,
> > target_ulong val)
> >   {
> >   uint64_t mask = HENVCFG_FIOM | HENVCFG_CBIE | HENVCFG_CBCFE |
> > HENVCFG_CBZE;
> > +RISCVException ret;
> > +
> > +ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
> > +if (ret != RISCV_EXCP_NONE) {
> > +return ret;
> > +}
> >   
> >   if (riscv_cpu_mxl(env) == MXL_RV64) {
> >   mask |= HENVCFG_PBMTE | HENVCFG_STCE;
> > @@ -1753,6 +1810,13 @@ static RISCVException
> > write_henvcfg(CPURISCVState *env, int csrno,
> >   static RISCVException read_henvcfgh(CPURISCVState *env, int
> > csrno,
> >target_ulong *val)
> >   {
> > +RISCVException ret;
> > +
> > +ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
> > +if (ret != RISCV_EXCP_NONE) {
> > +return ret;
> > +}
> > +
> >   *val = env->henvcfg >> 32;
> >   return RISCV_EXCP_NONE;
> >   }
> > @@ -1762,9 +1826,14 @@ static RISCVException
> > write_henvcfgh(CPURISCVState *env, int csrno,
> >   {
> > 

Re: [PATCH v7 1/4] target/riscv: Add smstateen support

2022-08-03 Thread Mayuresh Chitale
On Wed, 2022-08-03 at 16:15 +0800, Weiwei Li wrote:
> On 2022/8/2 1:18 AM, Mayuresh Chitale wrote:
> > Smstateen extension specifies a mechanism to close
> > the potential covert channels that could cause security issues.
> > 
> > This patch adds the CSRs defined in the specification and
> > the corresponding predicates and read/write functions.
> > 
> > Signed-off-by: Mayuresh Chitale 
> > ---
> >   target/riscv/cpu.h  |   4 +
> >   target/riscv/cpu_bits.h |  37 
> >   target/riscv/csr.c  | 369
> > 
> >   target/riscv/machine.c  |  21 +++
> >   4 files changed, 431 insertions(+)
> > 
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 4be4b82a83..6bff935c57 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -354,6 +354,9 @@ struct CPUArchState {
> >   
> >   /* CSRs for execution enviornment configuration */
> >   uint64_t menvcfg;
> > +uint64_t mstateen[SMSTATEEN_MAX_COUNT];
> > +uint64_t hstateen[SMSTATEEN_MAX_COUNT];
> > +uint64_t sstateen[SMSTATEEN_MAX_COUNT];
> >   target_ulong senvcfg;
> >   uint64_t henvcfg;
> >   #endif
> > @@ -427,6 +430,7 @@ struct RISCVCPUConfig {
> >   bool ext_ifencei;
> >   bool ext_icsr;
> >   bool ext_zihintpause;
> > +bool ext_smstateen;
> >   bool ext_svinval;
> >   bool ext_svnapot;
> >   bool ext_svpbmt;
> > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> > index 6be5a9e9f0..c773e0d310 100644
> > --- a/target/riscv/cpu_bits.h
> > +++ b/target/riscv/cpu_bits.h
> > @@ -199,6 +199,12 @@
> >   /* Supervisor Configuration CSRs */
> >   #define CSR_SENVCFG 0x10A
> >   
> > +/* Supervisor state CSRs */
> > +#define CSR_SSTATEEN0   0x10C
> > +#define CSR_SSTATEEN1   0x10D
> > +#define CSR_SSTATEEN2   0x10E
> > +#define CSR_SSTATEEN3   0x10F
> > +
> >   /* Supervisor Trap Handling */
> >   #define CSR_SSCRATCH0x140
> >   #define CSR_SEPC0x141
> > @@ -242,6 +248,16 @@
> >   #define CSR_HENVCFG 0x60A
> >   #define CSR_HENVCFGH0x61A
> >   
> > +/* Hypervisor state CSRs */
> > +#define CSR_HSTATEEN0   0x60C
> > +#define CSR_HSTATEEN0H  0x61C
> > +#define CSR_HSTATEEN1   0x60D
> > +#define CSR_HSTATEEN1H  0x61D
> > +#define CSR_HSTATEEN2   0x60E
> > +#define CSR_HSTATEEN2H  0x61E
> > +#define CSR_HSTATEEN3   0x60F
> > +#define CSR_HSTATEEN3H  0x61F
> > +
> >   /* Virtual CSRs */
> >   #define CSR_VSSTATUS0x200
> >   #define CSR_VSIE0x204
> > @@ -283,6 +299,27 @@
> >   #define CSR_MENVCFG 0x30A
> >   #define CSR_MENVCFGH0x31A
> >   
> > +/* Machine state CSRs */
> > +#define CSR_MSTATEEN0   0x30C
> > +#define CSR_MSTATEEN0H  0x31C
> > +#define CSR_MSTATEEN1   0x30D
> > +#define CSR_MSTATEEN1H  0x31D
> > +#define CSR_MSTATEEN2   0x30E
> > +#define CSR_MSTATEEN2H  0x31E
> > +#define CSR_MSTATEEN3   0x30F
> > +#define CSR_MSTATEEN3H  0x31F
> > +
> > +/* Common defines for all smstateen */
> > +#define SMSTATEEN_MAX_COUNT 4
> > +#define SMSTATEEN0_CS   (1ULL << 0)
> > +#define SMSTATEEN0_FCSR (1ULL << 1)
> > +#define SMSTATEEN0_HSCONTXT (1ULL << 57)
> > +#define SMSTATEEN0_IMSIC(1ULL << 58)
> > +#define SMSTATEEN0_AIA  (1ULL << 59)
> > +#define SMSTATEEN0_SVSLCT   (1ULL << 60)
> > +#define SMSTATEEN0_HSENVCFG (1ULL << 62)
> > +#define SMSTATEEN_STATEEN   (1ULL << 63)
> > +
> >   /* Enhanced Physical Memory Protection (ePMP) */
> >   #define CSR_MSECCFG 0x747
> >   #define CSR_MSECCFGH0x757
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 0fb042b2fd..ad1642fb9b 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -346,6 +346,68 @@ static RISCVException umode32(CPURISCVState
> > *env, int csrno)
> >   return umode(env, csrno);
> >   }
> >   
> > +static RISCVException mstateen(CPURISCVState *env, int csrno)
> > +{
> > +CPUState *cs = env_cpu(env);
> > +RISCVCPU *cpu = RISCV_CPU(cs);
> > +
> > +if (!cpu->cfg.ext_smstateen) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> > +return any(env, csrno);
> > +}
> > +
> > +static RISCVException hstateen_pred(CPURISCVState *env, int csrno,
> > int base)
> > +{
> > +CPUState *cs = env_cpu(env);
> > +RISCVCPU *cpu = RISCV_CPU(cs);
> > +
> > +if (!cpu->cfg.ext_smstateen) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> > +if (!(env->mstateen[csrno - base] & SMSTATEEN_STATEEN)) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> 
> mstateen only controls access from less-privileged levels. If the
> access is from M mode, it will always be allowed. So I think we should
> add a check here that the current priv is lower than M mode.
>
> The same applies to sstateen.

Ok. I will fix it in the next version.
> 
> Regards,
> 
> Weiwei Li
> 
> > +return hmode(env, csrno);
> > +}
> > +
> > +static 

Re: [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush

2022-08-03 Thread Jason Wang



On 2022/8/3 01:57, Eugenio Pérez wrote:

Since QEMU will be able to inject new elements on CVQ to restore the
state, we must not depend on a VirtQueueElement to know whether a new
element has been used by the device. Instead of checking that, check
whether there are new elements using only the used idx on vhost_svq_flush.

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-shadow-virtqueue.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index e6eebd0e8d..fdb550c31b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -491,7 +491,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
  /**
   * Poll the SVQ for one device used buffer.
   *
- * This function race with main event loop SVQ polling, so extra
+ * This function races with main event loop SVQ polling, so extra
   * synchronization is needed.
   *
   * Return the length written by the device.
@@ -499,20 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
  size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
  {
  int64_t start_us = g_get_monotonic_time();
-do {
+while (true) {
  uint32_t len;
-VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
-if (elem) {
-return len;
-}
  
  if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
  return 0;
  }
  
-/* Make sure we read new used_idx */
-smp_rmb();
-} while (true);
+if (!vhost_svq_more_used(svq)) {
+continue;
+}
+
+vhost_svq_get_buf(svq, &len);



I wonder if this means we won't worry about the infinite wait?

Thanks



+return len;
+}
  }
  
  /**
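
For readers following the timeout question above, the reworked loop can be
read as a generic poll-with-deadline pattern. The block below is only a
minimal standalone sketch, not the QEMU code: PollCtx, FakeDev, poll_used()
and the fake clock are hypothetical names invented for illustration, and the
clock is injected so the behaviour is deterministic. It suggests that, as long
as the deadline check runs on every iteration before the "more used?" check,
the loop ends even if the device never marks a buffer as used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical poll context: a clock and a "more used?" predicate. */
typedef struct PollCtx {
    int64_t (*now_us)(void *opaque);   /* monotonic time source */
    bool (*more_used)(void *opaque);   /* has the device used a buffer? */
    void *opaque;
} PollCtx;

/*
 * Poll until more_used() reports a used buffer or timeout_us elapses.
 * Returns used_len on success and 0 on timeout.
 */
static size_t poll_used(const PollCtx *ctx, int64_t timeout_us,
                        size_t used_len)
{
    int64_t start_us = ctx->now_us(ctx->opaque);

    while (true) {
        /* The deadline is checked first, on every iteration. */
        if (ctx->now_us(ctx->opaque) - start_us > timeout_us) {
            return 0;
        }
        if (!ctx->more_used(ctx->opaque)) {
            continue;
        }
        return used_len;
    }
}

/* Fake device for illustration: time advances 1 us per clock read. */
typedef struct FakeDev {
    int64_t time_us;
    int polls_until_used;   /* negative: never becomes used */
} FakeDev;

static int64_t fake_now_us(void *opaque)
{
    FakeDev *d = opaque;
    return d->time_us++;
}

static bool fake_more_used(void *opaque)
{
    FakeDev *d = opaque;
    if (d->polls_until_used < 0) {
        return false;
    }
    return d->polls_until_used-- == 0;
}
```

With a FakeDev that never becomes used, poll_used() returns 0 once the fake
clock passes the timeout; with one that becomes used after a few polls, it
returns the length immediately.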





Re: [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation

2022-08-03 Thread Jason Wang



On 2022/8/3 01:57, Eugenio Pérez wrote:

Since we're going to allow SVQ to add elements without the guest's
knowledge and without its own VirtQueueElement, it's easier to check
whether an element is a valid head by checking something other than the
VirtQueueElement.

Signed-off-by: Eugenio Pérez 
---



Patch looks good to me. But I spot several other issues:

1) vhost_svq_add() uses size_t for in_num and out_num, is this intended?
2) do we need to fail vhost_svq_add() if in_num + out_num == 0?

Thanks



  hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index ffd2b2c972..e6eebd0e8d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -414,7 +414,7 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
  return NULL;
  }
  
-if (unlikely(!svq->desc_state[used_elem.id].elem)) {

+if (unlikely(!svq->desc_state[used_elem.id].ndescs)) {
  qemu_log_mask(LOG_GUEST_ERROR,
  "Device %s says index %u is used, but it was not available",
  svq->vdev->name, used_elem.id);
@@ -422,6 +422,7 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
  }
  
  num = svq->desc_state[used_elem.id].ndescs;

+svq->desc_state[used_elem.id].ndescs = 0;
  last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
  svq->desc_next[last_used_chain] = svq->free_head;
  svq->free_head = used_elem.id;
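
The idea of using the descriptor count as the "is this head in flight?"
marker can be sketched outside QEMU as follows. This is an illustrative
assumption-laden example, not the patch itself: DescState, RING_SIZE,
ring_add() and ring_get() are hypothetical names. The sketch also chooses to
reject an empty chain, since a count of 0 could not be told apart from an
unused head; that is only one possible answer to question 2 above, not
necessarily what the series does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4   /* hypothetical ring size for the sketch */

/* Per-head bookkeeping; elem may be NULL for elements injected by QEMU. */
typedef struct DescState {
    void *elem;
    unsigned ndescs;   /* 0 means "this head is not in flight" */
} DescState;

static DescState desc_state[RING_SIZE];

/* Record that a chain of ndescs descriptors was made available at head id. */
static bool ring_add(uint32_t id, void *elem, unsigned ndescs)
{
    if (id >= RING_SIZE || ndescs == 0) {
        return false;   /* an empty chain would look like an unused head */
    }
    desc_state[id].elem = elem;
    desc_state[id].ndescs = ndescs;
    return true;
}

/*
 * Consume a used head. Returns the chain length, or 0 if the device
 * reported a head that was never made available or was already consumed.
 */
static unsigned ring_get(uint32_t id)
{
    unsigned num;

    if (id >= RING_SIZE || desc_state[id].ndescs == 0) {
        return 0;
    }
    num = desc_state[id].ndescs;
    desc_state[id].ndescs = 0;   /* the head is free again */
    return num;
}
```

Note that validity no longer depends on elem being non-NULL, which is what
lets an element without a VirtQueueElement pass the check.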





[PATCH v8 3/3] target/riscv: Add vstimecmp support

2022-08-03 Thread Atish Patra
The vstimecmp CSR allows the guest OS to program the next guest timer
interrupt directly. Thus, the hypervisor no longer needs to inject the
timer interrupt into the guest if vstimecmp is used. This was ratified
as a part of the Sstc extension.

Signed-off-by: Atish Patra 
---
 target/riscv/cpu.h |   4 ++
 target/riscv/cpu_bits.h|   4 ++
 target/riscv/cpu_helper.c  |  11 ++--
 target/riscv/csr.c | 102 -
 target/riscv/machine.c |   1 +
 target/riscv/time_helper.c |  16 ++
 6 files changed, 133 insertions(+), 5 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 4cda2905661e..1fd382b2717f 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -312,6 +312,8 @@ struct CPUArchState {
 /* Sstc CSRs */
 uint64_t stimecmp;
 
+uint64_t vstimecmp;
+
 /* physical memory protection */
 pmp_table_t pmp_state;
 target_ulong mseccfg;
@@ -366,6 +368,8 @@ struct CPUArchState {
 
 /* Fields from here on are preserved across CPU reset. */
 QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
+QEMUTimer *vstimer; /* Internal timer for VS-mode interrupt */
+bool vstime_irq;
 
 hwaddr kernel_addr;
 hwaddr fdt_addr;
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index ac17cf1515c0..095dab19f512 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -257,6 +257,10 @@
 #define CSR_VSIP            0x244
 #define CSR_VSATP           0x280
 
+/* Sstc virtual CSRs */
+#define CSR_VSTIMECMP       0x24D
+#define CSR_VSTIMECMPH      0x25D
+
 #define CSR_MTINST          0x34a
 #define CSR_MTVAL2          0x34b
 
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 650574accf0a..1e4faa84e839 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -345,8 +345,9 @@ uint64_t riscv_cpu_all_pending(CPURISCVState *env)
 {
 uint32_t gein = get_field(env->hstatus, HSTATUS_VGEIN);
 uint64_t vsgein = (env->hgeip & (1ULL << gein)) ? MIP_VSEIP : 0;
+uint64_t vstip = (env->vstime_irq) ? MIP_VSTIP : 0;
 
-return (env->mip | vsgein) & env->mie;
+return (env->mip | vsgein | vstip) & env->mie;
 }
 
 int riscv_cpu_mirq_pending(CPURISCVState *env)
@@ -605,7 +606,7 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t mask, 
uint64_t value)
 {
 CPURISCVState *env = &cpu->env;
 CPUState *cs = CPU(cpu);
-uint64_t gein, vsgein = 0, old = env->mip;
+uint64_t gein, vsgein = 0, vstip = 0, old = env->mip;
 bool locked = false;
 
 if (riscv_cpu_virt_enabled(env)) {
@@ -613,6 +614,10 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t 
mask, uint64_t value)
 vsgein = (env->hgeip & (1ULL << gein)) ? MIP_VSEIP : 0;
 }
 
+/* No need to update mip for VSTIP */
+mask = ((mask == MIP_VSTIP) && env->vstime_irq) ? 0 : mask;
+vstip = env->vstime_irq ? MIP_VSTIP : 0;
+
 if (!qemu_mutex_iothread_locked()) {
 locked = true;
 qemu_mutex_lock_iothread();
@@ -620,7 +625,7 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t mask, 
uint64_t value)
 
 env->mip = (env->mip & ~mask) | (value & mask);
 
-if (env->mip | vsgein) {
+if (env->mip | vsgein | vstip) {
 cpu_interrupt(cs, CPU_INTERRUPT_HARD);
 } else {
 cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index e18b000700e4..9da4d6515e7b 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -829,17 +829,100 @@ static RISCVException sstc(CPURISCVState *env, int csrno)
 return smode(env, csrno);
 }
 
+static RISCVException sstc_hmode(CPURISCVState *env, int csrno)
+{
+CPUState *cs = env_cpu(env);
+RISCVCPU *cpu = RISCV_CPU(cs);
+
+if (!cpu->cfg.ext_sstc || !env->rdtime_fn) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (env->priv == PRV_M) {
+return RISCV_EXCP_NONE;
+}
+
+if (!(get_field(env->mcounteren, COUNTEREN_TM) &
+  get_field(env->menvcfg, MENVCFG_STCE))) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (riscv_cpu_virt_enabled(env)) {
+if (!(get_field(env->hcounteren, COUNTEREN_TM) &
+  get_field(env->henvcfg, HENVCFG_STCE))) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
+}
+
+return hmode(env, csrno);
+}
+
+static RISCVException read_vstimecmp(CPURISCVState *env, int csrno,
+target_ulong *val)
+{
+*val = env->vstimecmp;
+
+return RISCV_EXCP_NONE;
+}
+
+static RISCVException read_vstimecmph(CPURISCVState *env, int csrno,
+target_ulong *val)
+{
+*val = env->vstimecmp >> 32;
+
+return RISCV_EXCP_NONE;
+}
+
+static RISCVException write_vstimecmp(CPURISCVState *env, int csrno,
+target_ulong val)
+{
+RISCVCPU *cpu = env_archcpu(env);
+
+if (riscv_cpu_mxl(env) == MXL_RV32) {
+env->vstimecmp = 

[PATCH v8 2/3] target/riscv: Add stimecmp support

2022-08-03 Thread Atish Patra
stimecmp allows the supervisor mode to update stimecmp CSR directly
to program the next timer interrupt. This CSR is part of the Sstc
extension which was ratified recently.

Signed-off-by: Atish Patra 
---
 target/riscv/cpu.c |  9 
 target/riscv/cpu.h |  5 ++
 target/riscv/cpu_bits.h|  4 ++
 target/riscv/csr.c | 77 ++
 target/riscv/machine.c |  1 +
 target/riscv/meson.build   |  3 +-
 target/riscv/time_helper.c | 98 ++
 target/riscv/time_helper.h | 30 
 8 files changed, 226 insertions(+), 1 deletion(-)
 create mode 100644 target/riscv/time_helper.c
 create mode 100644 target/riscv/time_helper.h

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d4635c7df46b..2498b93105fd 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -23,6 +23,7 @@
 #include "qemu/log.h"
 #include "cpu.h"
 #include "internals.h"
+#include "time_helper.h"
 #include "exec/exec-all.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
@@ -99,6 +100,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
 ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
 ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
+ISA_EXT_DATA_ENTRY(sstc, true, PRIV_VERSION_1_12_0, ext_sstc),
 ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0, ext_svinval),
 ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0, ext_svnapot),
 ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
@@ -675,6 +677,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 
 set_resetvec(env, cpu->cfg.resetvec);
 
+#ifndef CONFIG_USER_ONLY
+if (cpu->cfg.ext_sstc) {
+riscv_timer_init(cpu);
+}
+#endif /* CONFIG_USER_ONLY */
+
 /* Validate that MISA_MXL is set properly. */
 switch (env->misa_mxl_max) {
 #ifdef TARGET_RISCV64
@@ -995,6 +1003,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("Zve64f", RISCVCPU, cfg.ext_zve64f, false),
 DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
 DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
+DEFINE_PROP_BOOL("sstc", RISCVCPU, cfg.ext_sstc, true),
 
 DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
 DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0fae1569945c..4cda2905661e 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -309,6 +309,9 @@ struct CPUArchState {
 uint64_t mfromhost;
 uint64_t mtohost;
 
+/* Sstc CSRs */
+uint64_t stimecmp;
+
 /* physical memory protection */
 pmp_table_t pmp_state;
 target_ulong mseccfg;
@@ -362,6 +365,7 @@ struct CPUArchState {
 float_status fp_status;
 
 /* Fields from here on are preserved across CPU reset. */
+QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
 
 hwaddr kernel_addr;
 hwaddr fdt_addr;
@@ -425,6 +429,7 @@ struct RISCVCPUConfig {
 bool ext_ifencei;
 bool ext_icsr;
 bool ext_zihintpause;
+bool ext_sstc;
 bool ext_svinval;
 bool ext_svnapot;
 bool ext_svpbmt;
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 6be5a9e9f046..ac17cf1515c0 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -206,6 +206,10 @@
 #define CSR_STVAL           0x143
 #define CSR_SIP             0x144
 
+/* Sstc supervisor CSRs */
+#define CSR_STIMECMP        0x14D
+#define CSR_STIMECMPH       0x15D
+
 /* Supervisor Protection and Translation */
 #define CSR_SPTBR           0x180
 #define CSR_SATP            0x180
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0fb042b2fd0f..e18b000700e4 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -22,6 +22,7 @@
 #include "qemu/timer.h"
 #include "cpu.h"
 #include "pmu.h"
+#include "time_helper.h"
 #include "qemu/main-loop.h"
 #include "exec/exec-all.h"
 #include "sysemu/cpu-timers.h"
@@ -803,6 +804,72 @@ static RISCVException read_timeh(CPURISCVState *env, int 
csrno,
 return RISCV_EXCP_NONE;
 }
 
+static RISCVException sstc(CPURISCVState *env, int csrno)
+{
+CPUState *cs = env_cpu(env);
+RISCVCPU *cpu = RISCV_CPU(cs);
+
+if (!cpu->cfg.ext_sstc || !env->rdtime_fn) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (env->priv == PRV_M) {
+return RISCV_EXCP_NONE;
+}
+
+/*
+ * No need of separate function for rv32 as menvcfg stores both menvcfg
+ * menvcfgh for RV32.
+ */
+if (!(get_field(env->mcounteren, COUNTEREN_TM) &&
+  get_field(env->menvcfg, MENVCFG_STCE))) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+return smode(env, csrno);
+}
+
+static RISCVException read_stimecmp(CPURISCVState *env, int csrno,
+target_ulong *val)
+{
+*val = env->stimecmp;
+return RISCV_EXCP_NONE;
+}
+
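
As a side note on the CSR pairs defined above (stimecmp/stimecmph and
vstimecmp/vstimecmph): on RV32 the 64-bit compare value is accessed as two
32-bit halves. The helpers below are a hedged sketch of that split, assuming
the low CSR maps to bits 31:0 and the "h" CSR to bits 63:32; the function
names are hypothetical and are not the ones used in the patch.

```c
#include <stdint.h>

/* Replace bits 31:0 of a 64-bit register, keeping bits 63:32. */
static uint64_t write_low32(uint64_t reg, uint32_t val)
{
    return (reg & 0xFFFFFFFF00000000ULL) | val;
}

/* Replace bits 63:32 of a 64-bit register, keeping bits 31:0. */
static uint64_t write_high32(uint64_t reg, uint32_t val)
{
    return (reg & 0x00000000FFFFFFFFULL) | ((uint64_t)val << 32);
}

/* Read back each half. */
static uint32_t read_low32(uint64_t reg)
{
    return (uint32_t)reg;
}

static uint32_t read_high32(uint64_t reg)
{
    return (uint32_t)(reg >> 32);
}
```

Writing one half never disturbs the other, so a guest can update the two
CSRs in either order and still end up with the intended 64-bit value.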

[PATCH v8 1/3] hw/intc: Move mtimer/mtimecmp to aclint

2022-08-03 Thread Atish Patra
Historically, mtime/mtimecmp have been part of the CPU because
they are per-hart entities. However, they actually belong to the ACLINT,
which is an MMIO device.

Move them to the ACLINT device. This also emulates the real hardware
more closely.

Reviewed-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Andrew Jones 
Signed-off-by: Atish Patra 
---
 hw/intc/riscv_aclint.c | 41 --
 hw/timer/ibex_timer.c  | 18 ++-
 include/hw/intc/riscv_aclint.h |  2 ++
 include/hw/timer/ibex_timer.h  |  2 ++
 target/riscv/cpu.h |  2 --
 target/riscv/machine.c |  5 ++---
 6 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
index e7942c4e5a32..a125c73d535c 100644
--- a/hw/intc/riscv_aclint.c
+++ b/hw/intc/riscv_aclint.c
@@ -32,6 +32,7 @@
 #include "hw/intc/riscv_aclint.h"
 #include "qemu/timer.h"
 #include "hw/irq.h"
+#include "migration/vmstate.h"
 
 typedef struct riscv_aclint_mtimer_callback {
 RISCVAclintMTimerState *s;
@@ -65,8 +66,8 @@ static void 
riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
 
 uint64_t rtc_r = cpu_riscv_read_rtc(mtimer);
 
-cpu->env.timecmp = value;
-if (cpu->env.timecmp <= rtc_r) {
+mtimer->timecmp[hartid] = value;
+if (mtimer->timecmp[hartid] <= rtc_r) {
 /*
  * If we're setting an MTIMECMP value in the "past",
  * immediately raise the timer interrupt
@@ -77,7 +78,7 @@ static void 
riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
 
 /* otherwise, set up the future timer interrupt */
 qemu_irq_lower(mtimer->timer_irqs[hartid - mtimer->hartid_base]);
-diff = cpu->env.timecmp - rtc_r;
+diff = mtimer->timecmp[hartid] - rtc_r;
 /* back to ns (note args switched in muldiv64) */
 uint64_t ns_diff = muldiv64(diff, NANOSECONDS_PER_SECOND, timebase_freq);
 
@@ -102,7 +103,7 @@ static void 
riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
 next = MIN(next, INT64_MAX);
 }
 
-timer_mod(cpu->env.timer, next);
+timer_mod(mtimer->timers[hartid], next);
 }
 
 /*
@@ -133,11 +134,11 @@ static uint64_t riscv_aclint_mtimer_read(void *opaque, 
hwaddr addr,
   "aclint-mtimer: invalid hartid: %zu", hartid);
 } else if ((addr & 0x7) == 0) {
 /* timecmp_lo for RV32/RV64 or timecmp for RV64 */
-uint64_t timecmp = env->timecmp;
+uint64_t timecmp = mtimer->timecmp[hartid];
 return (size == 4) ? (timecmp & 0xFFFFFFFF) : timecmp;
 } else if ((addr & 0x7) == 4) {
 /* timecmp_hi */
-uint64_t timecmp = env->timecmp;
+uint64_t timecmp = mtimer->timecmp[hartid];
 return (timecmp >> 32) & 0xFFFFFFFF;
 } else {
 qemu_log_mask(LOG_UNIMP,
@@ -177,7 +178,7 @@ static void riscv_aclint_mtimer_write(void *opaque, hwaddr 
addr,
 } else if ((addr & 0x7) == 0) {
 if (size == 4) {
 /* timecmp_lo for RV32/RV64 */
-uint64_t timecmp_hi = env->timecmp >> 32;
+uint64_t timecmp_hi = mtimer->timecmp[hartid] >> 32;
 riscv_aclint_mtimer_write_timecmp(mtimer, RISCV_CPU(cpu), 
hartid,
 timecmp_hi << 32 | (value & 0xFFFFFFFF));
 } else {
@@ -188,7 +189,7 @@ static void riscv_aclint_mtimer_write(void *opaque, hwaddr 
addr,
 } else if ((addr & 0x7) == 4) {
 if (size == 4) {
 /* timecmp_hi for RV32/RV64 */
-uint64_t timecmp_lo = env->timecmp;
+uint64_t timecmp_lo = mtimer->timecmp[hartid];
 riscv_aclint_mtimer_write_timecmp(mtimer, RISCV_CPU(cpu), 
hartid,
 value << 32 | (timecmp_lo & 0xFFFFFFFF));
 } else {
@@ -234,7 +235,7 @@ static void riscv_aclint_mtimer_write(void *opaque, hwaddr 
addr,
 }
 riscv_aclint_mtimer_write_timecmp(mtimer, RISCV_CPU(cpu),
   mtimer->hartid_base + i,
-  env->timecmp);
+  mtimer->timecmp[i]);
 }
 return;
 }
@@ -284,6 +285,8 @@ static void riscv_aclint_mtimer_realize(DeviceState *dev, 
Error **errp)
 s->timer_irqs = g_new(qemu_irq, s->num_harts);
 qdev_init_gpio_out(dev, s->timer_irqs, s->num_harts);
 
+s->timers = g_new0(QEMUTimer *, s->num_harts);
+s->timecmp = g_new0(uint64_t, s->num_harts);
 /* Claim timer interrupt bits */
 for (i = 0; i < s->num_harts; i++) {
 RISCVCPU *cpu = RISCV_CPU(qemu_get_cpu(s->hartid_base + i));
@@ -310,6 +313,18 @@ static void riscv_aclint_mtimer_reset_enter(Object *obj, 
ResetType type)
 riscv_aclint_mtimer_write(mtimer, mtimer->time_base, 0, 8);
 }
 
+static const VMStateDescription vmstate_riscv_mtimer = {
+.name = 

[PATCH v8 0/3] Implement Sstc extension

2022-08-03 Thread Atish Patra
This series implements Sstc extension[1] which was ratified recently.

The first patch is a preparatory patch, PATCH 2 adds stimecmp
support, and PATCH 3 adds vstimecmp support. This series is based
on top of upstream commit (faee5441a038).

The series can also be found at
https://github.com/atishp04/qemu/tree/sstc_v8

It is tested on RV32 & RV64 with latest OpenSBI & Linux kernel[2]
patches.

Changes from v7->v8:
1. Removed redundant blank lines.
2. Invoke smode & hmode predicate function from sstc related predicate
   functions.

Changes from v6->v7:
1. Replaced g_malloc0 with g_new0.
2. Removed the over allocation for the timers.

Changes from v5->v6:
1. Rebased on top of the latest HEAD commit.

Changes from v4->v5:
1. Removed any ordering related flags and emulate the hardware more
   closely. 

Changes from v3->v4:
1. Added [v]stimecmp_wr_done to the corresponding vmstate structure.

Changes from v2->v3:
1. Dropped generic migration code improvement patches.
2. Removed the order constraints while updating stimecmp/vstimecmp.

Changes from v1->v2:
1. Rebased on the latest upstream commit.
2. Replaced PATCH 1 with another patch where mtimer/timecmp is
   moved from CPU to ACLINT.
3. Added ACLINT migration support.

[1] https://drive.google.com/file/d/1m84Re2yK8m_vbW7TspvevCDR82MOBaSX/view
[2] https://github.com/atishp04/linux/tree/sstc_v8

Atish Patra (3):
hw/intc: Move mtimer/mtimecmp to aclint
target/riscv: Add stimecmp support
target/riscv: Add vstimecmp support

hw/intc/riscv_aclint.c |  41 +---
hw/timer/ibex_timer.c  |  18 ++--
include/hw/intc/riscv_aclint.h |   2 +
include/hw/timer/ibex_timer.h  |   2 +
target/riscv/cpu.c |   9 ++
target/riscv/cpu.h |  11 ++-
target/riscv/cpu_bits.h|   8 ++
target/riscv/cpu_helper.c  |  11 ++-
target/riscv/csr.c | 175 +
target/riscv/machine.c |   7 +-
target/riscv/meson.build   |   3 +-
target/riscv/time_helper.c | 114 +
target/riscv/time_helper.h |  30 ++
13 files changed, 399 insertions(+), 32 deletions(-)
create mode 100644 target/riscv/time_helper.c
create mode 100644 target/riscv/time_helper.h

--
2.25.1




Re: [PATCH v7 3/3] target/riscv: Add vstimecmp support

2022-08-03 Thread Weiwei Li


On 2022/8/4 5:05 AM, Atish Kumar Patra wrote:



On Wed, Aug 3, 2022 at 1:49 AM Weiwei Li wrote:



On 2022/8/3 4:25 PM, Atish Patra wrote:
> The vstimecmp CSR allows the guest OS to program the next guest timer
> interrupt directly. Thus, the hypervisor no longer needs to inject the
> timer interrupt into the guest if vstimecmp is used. This was ratified
> as a part of the Sstc extension.
>
> Signed-off-by: Atish Patra
> ---
>   target/riscv/cpu.h         |   4 ++
>   target/riscv/cpu_bits.h    |   4 ++
>   target/riscv/cpu_helper.c  |  11 ++--
>   target/riscv/csr.c         | 100
-
>   target/riscv/machine.c     |   1 +
>   target/riscv/time_helper.c |  16 ++
>   6 files changed, 131 insertions(+), 5 deletions(-)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 4cda2905661e..1fd382b2717f 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -312,6 +312,8 @@ struct CPUArchState {
>       /* Sstc CSRs */
>       uint64_t stimecmp;
>
> +    uint64_t vstimecmp;
> +
>       /* physical memory protection */
>       pmp_table_t pmp_state;
>       target_ulong mseccfg;
> @@ -366,6 +368,8 @@ struct CPUArchState {
>
>       /* Fields from here on are preserved across CPU reset. */
>       QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
> +    QEMUTimer *vstimer; /* Internal timer for VS-mode interrupt */
> +    bool vstime_irq;
>
>       hwaddr kernel_addr;
>       hwaddr fdt_addr;
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index ac17cf1515c0..095dab19f512 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -257,6 +257,10 @@
>   #define CSR_VSIP            0x244
>   #define CSR_VSATP           0x280
>
> +/* Sstc virtual CSRs */
> +#define CSR_VSTIMECMP       0x24D
> +#define CSR_VSTIMECMPH      0x25D
> +
>   #define CSR_MTINST          0x34a
>   #define CSR_MTVAL2          0x34b
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 650574accf0a..1e4faa84e839 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -345,8 +345,9 @@ uint64_t riscv_cpu_all_pending(CPURISCVState
*env)
>   {
>       uint32_t gein = get_field(env->hstatus, HSTATUS_VGEIN);
>       uint64_t vsgein = (env->hgeip & (1ULL << gein)) ?
MIP_VSEIP : 0;
> +    uint64_t vstip = (env->vstime_irq) ? MIP_VSTIP : 0;
>
> -    return (env->mip | vsgein) & env->mie;
> +    return (env->mip | vsgein | vstip) & env->mie;
>   }
>
>   int riscv_cpu_mirq_pending(CPURISCVState *env)
> @@ -605,7 +606,7 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu,
uint64_t mask, uint64_t value)
>   {
>       CPURISCVState *env = &cpu->env;
>       CPUState *cs = CPU(cpu);
> -    uint64_t gein, vsgein = 0, old = env->mip;
> +    uint64_t gein, vsgein = 0, vstip = 0, old = env->mip;
>       bool locked = false;
>
>       if (riscv_cpu_virt_enabled(env)) {
> @@ -613,6 +614,10 @@ uint64_t riscv_cpu_update_mip(RISCVCPU
*cpu, uint64_t mask, uint64_t value)
>           vsgein = (env->hgeip & (1ULL << gein)) ? MIP_VSEIP : 0;
>       }
>
> +    /* No need to update mip for VSTIP */
> +    mask = ((mask == MIP_VSTIP) && env->vstime_irq) ? 0 : mask;
> +    vstip = env->vstime_irq ? MIP_VSTIP : 0;
> +
>       if (!qemu_mutex_iothread_locked()) {
>           locked = true;
>           qemu_mutex_lock_iothread();
> @@ -620,7 +625,7 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu,
uint64_t mask, uint64_t value)
>
>       env->mip = (env->mip & ~mask) | (value & mask);
>
> -    if (env->mip | vsgein) {
> +    if (env->mip | vsgein | vstip) {
>           cpu_interrupt(cs, CPU_INTERRUPT_HARD);
>       } else {
>           cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index b71e2509b64f..d4265dd3cca2 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -833,17 +833,98 @@ static RISCVException sstc(CPURISCVState
*env, int csrno)
>       return RISCV_EXCP_NONE;
>   }
>
> +static RISCVException sstc_hmode(CPURISCVState *env, int csrno)
> +{
> +    CPUState *cs = env_cpu(env);
> +    RISCVCPU *cpu = RISCV_CPU(cs);
> +
> +    if (!cpu->cfg.ext_sstc || !env->rdtime_fn) {
> +        return RISCV_EXCP_ILLEGAL_INST;
> +    }
> +
> +    if (env->priv == PRV_M) {
> +        return RISCV_EXCP_NONE;
> +    }
> +
> +    if (!(get_field(env->mcounteren, COUNTEREN_TM) &
> +          get_field(env->menvcfg, MENVCFG_STCE))) {
> +  

Re: [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions

2022-08-03 Thread Xiaoyao Li

On 8/3/2022 3:33 PM, Chenyi Qiang wrote:



On 8/2/2022 3:47 PM, Xiaoyao Li wrote:

According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
bits of TD can be classified into 6 types:


1 | As configured | configurable by VMM, independent of native value;
2 | As configured | configurable by VMM if the bit is supported natively.
  | (if native)   | Otherwise it equals the native value (0).
3 | Fixed         | fixed to 0/1
4 | Native        | reflects the native value
5 | Calculated    | calculated by the TDX module.
6 | Inducing #VE  | gets a #VE exception


Note:
1. All the configurable XFAM related features and TD attributes related
    features fall into type #2. And fixed0/1 bits of XFAM and TD
    attributes fall into type #3.

2. For CPUID leaves not listed in "CPUID virtualization Overview" table
    in TDX module spec. When they are queried, TDX module injects #VE to
    TDs. For this case, TDs can request CPUID emulation from VMM via
    TDVMCALL and the values are fully controlled by VMM.

Because the TDX module has its own virtualization policy for CPUID bits,
what is reported via KVM_GET_SUPPORTED_CPUID diverges from the CPUID bits
supported for TDs. To keep a consistent CPUID configuration between the
VMM and TDs, adjust the supported CPUID for TDs based on TDX
restrictions.

Currently only focus on the CPUID leaves recognized by QEMU's
feature_word_info[] that are indexed by a FeatureWord.

Introduce a TDX CPUID lookup table, which maintains 1 entry for each
FeatureWord. Each entry has below fields:

  - tdx_fixed0/1: The bits that are fixed as 0/1;

  - vmm_fixup:   The bits that are configurable from the view of the TDX
                 module, but that require emulation by the VMM when they
                 are configured as enabled. They are not supported if the
                 VMM doesn't report them as supported, so they need to be
                 fixed up by checking whether the VMM supports them.

  - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
                 totally configurable by VMM.

  - supported_on_ve: It's valid only when @inducing_ve is true. It
                 represents the maximum feature set that can be emulated
                 for TDs.

By applying TDX CPUID lookup table and TDX capabilities reported from
TDX module, the supported CPUID for TDs can be obtained from following
steps:

- get the base of VMM supported feature set;

- if the leaf is not a FeatureWord just return VMM's value without
   modification;

- if the leaf is an inducing_ve type, applying supported_on_ve mask and
   return;

- include all native bits, it covers type #2, #4, and parts of type #1.
   (it also includes some unsupported bits. The following step will
    correct it.)

- apply fixed0/1 to it (it covers #3, and rectifies the previous step);

- add configurable bits (it covers the other part of type #1);

- fix the ones in vmm_fixup;

- filter the one has valid .supported field;


What does .supported field filter mean here?



(Calculated type is ignored since it's determined at runtime).
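
The mask arithmetic behind the steps listed above might look roughly like the
following. This is a minimal sketch under stated assumptions, not the code in
the patch: TdxCpuidEntry and tdx_adjust() are hypothetical names, each feature
word is assumed to be 32 bits wide, and the order of operations simply follows
the list. The final ".supported" filter is left out because its meaning is
exactly what is being asked about, and the Calculated type is ignored as
stated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-FeatureWord entry mirroring the fields described above. */
typedef struct TdxCpuidEntry {
    uint32_t tdx_fixed0;       /* bits forced to 0 */
    uint32_t tdx_fixed1;       /* bits forced to 1 */
    uint32_t vmm_fixup;        /* configurable bits that need VMM support */
    bool inducing_ve;          /* leaf causes #VE, fully VMM controlled */
    uint32_t supported_on_ve;  /* only meaningful when inducing_ve is set */
} TdxCpuidEntry;

static uint32_t tdx_adjust(const TdxCpuidEntry *e, uint32_t vmm_supported,
                           uint32_t native, uint32_t configurable)
{
    uint32_t ret = vmm_supported;            /* base: VMM supported set */

    if (e->inducing_ve) {
        return ret & e->supported_on_ve;     /* #VE leaf: mask and return */
    }

    ret |= native;                           /* include all native bits */
    ret &= ~e->tdx_fixed0;                   /* apply fixed0 ... */
    ret |= e->tdx_fixed1;                    /* ... and fixed1 */
    ret |= configurable;                     /* add configurable bits */
    ret &= ~(e->vmm_fixup & ~vmm_supported); /* drop fixup bits the VMM lacks */
    return ret;
}
```

For instance, with fixed0 = 0x1, fixed1 = 0x2 and vmm_fixup = 0x8, a VMM set
of 0x4, native bits 0x11 and configurable bits 0x28 yield 0x36: bit 0 is
forced off, bit 1 is forced on, and bit 3 is dropped because the VMM does not
report it.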

Co-developed-by: Chenyi Qiang 
Signed-off-by: Chenyi Qiang 
Signed-off-by: Xiaoyao Li 
---
  target/i386/cpu.h |  16 +++
  target/i386/kvm/kvm.c |   4 +
  target/i386/kvm/tdx.c | 255 ++
  target/i386/kvm/tdx.h |   2 +
  4 files changed, 277 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b944..cc9da9fc4318 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -771,6 +771,8 @@ uint64_t 
x86_cpu_get_supported_feature_word(FeatureWord w,

  /* Support RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE */
  #define CPUID_7_0_EBX_FSGSBASE  (1U << 0)
+/* Support for TSC adjustment MSR 0x3B */
+#define CPUID_7_0_EBX_TSC_ADJUST    (1U << 1)
  /* Support SGX */
  #define CPUID_7_0_EBX_SGX   (1U << 2)
  /* 1st Group of Advanced Bit Manipulation Extensions */
@@ -789,8 +791,12 @@ uint64_t 
x86_cpu_get_supported_feature_word(FeatureWord w,

  #define CPUID_7_0_EBX_INVPCID   (1U << 10)
  /* Restricted Transactional Memory */
  #define CPUID_7_0_EBX_RTM   (1U << 11)
+/* Cache QoS Monitoring */
+#define CPUID_7_0_EBX_PQM   (1U << 12)
  /* Memory Protection Extension */
  #define CPUID_7_0_EBX_MPX   (1U << 14)
+/* Resource Director Technology Allocation */
+#define CPUID_7_0_EBX_RDT_A (1U << 15)
  /* AVX-512 Foundation */
  #define CPUID_7_0_EBX_AVX512F   (1U << 16)
  /* 

Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions

2022-08-03 Thread Gavin Shan

Hi Eric,

On 8/3/22 10:52 PM, Eric Auger wrote:

On 8/3/22 15:02, Gavin Shan wrote:

On 8/3/22 5:01 PM, Marc Zyngier wrote:

On Wed, 03 Aug 2022 04:01:04 +0100,
Gavin Shan  wrote:

On 8/2/22 7:41 PM, Eric Auger wrote:

On 8/2/22 08:45, Gavin Shan wrote:

There are 3 highmem IO regions as below. They can be disabled in
two situations: (a) The specific region is disabled by user. (b)
The specific region doesn't fit in the PA space. However, the base
address and highest_gpa are still updated no matter if the region
is enabled or disabled. It's incorrectly incurring waste in the PA
space.

If I am not wrong highmem_redists and highmem_mmio are not user
selectable.

Only highmem ecam depends on machine type & ACPI setup. But I would say
that in server use case it is always set. So is that optimization really
needed?


There are two other cases you missed.

- highmem_ecam is enabled after virt-2.12, meaning it stays disabled
    before that.


I don't get this. The current behaviour is to disable highmem_ecam if
it doesn't fit in the PA space. I can't see anything that enables it
if it was disabled in the first place.



There are several places or conditions where vms->highmem_ecam can be
disabled:

- virt_instance_init() where vms->highmem_ecam is inherited from
   !vmc->no_highmem_ecam. The option is set to true after virt-2.12
   in virt_machine_2_12_options().

- machvirt_init() where vms->highmem_ecam can be disabled if we have
   32-bit vCPUs and a failure on loading firmware.

- Another place is where we're talking about. It's address assignment
   to fit the PA space.



- The high memory region can be disabled if user is asking large
    (normal) memory space through 'maxmem=' option. When the requested
    memory by 'maxmem=' is large enough, the high memory regions are
    disabled. It means the normal memory has higher priority than those
    high memory regions. This is the case I provided in (b) of the
    commit log.


Why is that a problem? It matches the expected behaviour, as the
highmem IO region is floating and is pushed up by the memory region.



Eric thought that the VIRT_HIGH_GIC_REDIST2 and VIRT_HIGH_PCIE_MMIO regions
aren't user selectable. I intended to explain why that's not true: 'maxmem='
can affect the outcome. When the 'maxmem=' value is big enough, there will be
no free area in the PA space to hold those two regions.



In the commit log, I was supposed to say something like below for
(a):

- The specific high memory region can be disabled through changing
    the code by user or developer. For example, 'vms->highmem_mmio'
    is changed from true to false in virt_instance_init().


Huh. By this principle, the user can change anything. Why is it
important?



Still like above. I was explaining the possible cases where those
3 switches can be turned on/off by users or developers. Our code
needs to be consistent and comprehensive.

   vms->highmem_redists
   vms->highmem_ecam
   vms->mmio





Improve address assignment for highmem IO regions to avoid the waste
in the PA space by putting the logic into virt_memmap_fits().


I guess that this is what I understand the least. What do you mean by
"wasted PA space"? Either the regions fit in the PA space, and
computing their addresses is relevant, or they fall outside of it and
what we stick in memmap[index].base is completely irrelevant.



It's possible that we run into the following combination. We should
have enough PA space to enable the VIRT_HIGH_PCIE_MMIO region. However,
the region is disabled in the original implementation because the
VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} regions consumed 1GB, which is
unnecessary and wastes PA space.

each region's base is aligned on its size.


Yes.



     static MemMapEntry extended_memmap[] = {
     [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
     [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB },
     [VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB },

so anyway MMIO is at least at 512GB. Having a 1TB IPA space does not
imply any amount of RAM. This depends on the address space.
     };


Yes. Prior to the start of system memory, there is 1GB used by
various regions either.



     IPA_LIMIT   = (1UL << 40)
     '-maxmem'   = 511GB  /* Memory starts from 1GB */
     '-slots'    = 0
     vms->highmem_rdist2 = false

How can this happen? the only reason for highmem_redists to be reset is
if it does not fit into map_ipa. So if mmio fits, highmem_redists does
too. What do I miss?


The example has "vms->highmem_rdist2 = false" BEFORE the address
assignment; it's possible that a developer changes the code to disable
it intentionally. The point is that the original implementation isn't
comprehensive because it wrongly assumes that vms->highmem_{rdist2, ecam, mmio}
are all true before the address assignment. With that wrong assumption, the
base address is always increased during the address assignment in
virt_set_memmap(), even if the previous region is disabled.
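
To make the arithmetic in this discussion concrete, here is a small
standalone sketch of the proposed behaviour, where a disabled region does not
advance the base. It is not the QEMU implementation: HighRegion,
assign_high_regions() and align_up() are hypothetical names, and the only
facts taken from the thread are the region sizes, the 1 TiB PA limit, and the
rule that each region's base is aligned on its size.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Hypothetical stand-in for one extended_memmap[] entry plus its switch. */
typedef struct HighRegion {
    uint64_t base;
    uint64_t size;     /* power of two; the base is aligned to it */
    bool enabled;
} HighRegion;

static uint64_t align_up(uint64_t value, uint64_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/*
 * Assign bases starting at 'base'. A disabled region consumes no PA space;
 * an enabled region that does not fit below 'pa_limit' is disabled and
 * also consumes nothing. Returns the highest address in use.
 */
static uint64_t assign_high_regions(HighRegion *r, int count,
                                    uint64_t base, uint64_t pa_limit)
{
    uint64_t highest = base - 1;

    for (int i = 0; i < count; i++) {
        uint64_t start;

        if (!r[i].enabled) {
            continue;
        }
        start = align_up(base, r[i].size);
        if (start + r[i].size > pa_limit) {
            r[i].enabled = false;
            continue;
        }
        r[i].base = start;
        base = start + r[i].size;
        highest = base - 1;
    }
    return highest;
}
```

With memory ending at 512 GiB, a 1 TiB limit, and both the redistributor and
ECAM regions disabled beforehand, the 512 GiB MMIO region lands exactly at
512 GiB and fits. If the base were advanced past the two disabled regions
first, aligning up to 512 GiB would push MMIO to 1 TiB, where it no longer
fits. With ECAM enabled, MMIO does not fit in either scheme, which matches
the remark that MMIO is aligned to at least 512 GiB.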



     

[PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld
The boot parameter header refers to setup_data at an absolute address,
and each setup_data refers to the next setup_data at an absolute address
too. Currently QEMU simply puts the setup_datas right after the kernel
image, and since the kernel image is loaded at prot_addr -- a fixed
address knowable to QEMU a priori -- the setup_data absolute address
winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.

This mostly works fine, so long as the kernel image really is loaded at
prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
generally EFI doesn't give a good way of predicting where it's going to
load the kernel. So when it loads it at some address != prot_addr, the
absolute addresses in setup_data now point somewhere bogus, causing
crashes when EFI stub tries to follow the next link.

Fix this by placing setup_data at some fixed place in memory, relative
to real_addr, not as part of the kernel image, and then pointing the
setup_data absolute address to that fixed place in memory. This way,
even if OVMF or other chains relocate the kernel image, the boot
parameter still points to the correct absolute address.

Fixes: 3cbeb52467 ("hw/i386: add device tree support")
Reported-by: Xiaoyao Li 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Michael S. Tsirkin 
Cc: Daniel P. Berrangé 
Cc: Gerd Hoffmann 
Cc: Ard Biesheuvel 
Cc: linux-...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..8b853abf38 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -760,36 +760,36 @@ static bool load_elfboot(const char *kernel_filename,
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
 
 return true;
 }
 
 void x86_load_linux(X86MachineState *x86ms,
 FWCfgState *fw_cfg,
 int acpi_data_size,
 bool pvh_enabled,
 bool legacy_no_rng_seed)
 {
 bool linuxboot_dma_enabled = X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
 int setup_size, kernel_size, cmdline_size;
-int dtb_size, setup_data_offset;
+int dtb_size, setup_data_item_len, setup_data_total_len = 0;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel;
-hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0;
+uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
+hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0, setup_data_base;
 FILE *f;
 char *vmode;
 MachineState *machine = MACHINE(x86ms);
 struct setup_data *setup_data;
 const char *kernel_filename = machine->kernel_filename;
 const char *initrd_filename = machine->initrd_filename;
 const char *dtb_filename = machine->dtb;
 const char *kernel_cmdline = machine->kernel_cmdline;
 SevKernelLoaderContext sev_load_ctx = {};
 enum { RNG_SEED_LENGTH = 32 };
 
 /* Align to 16 bytes as a paranoia measure */
 cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
 
 /* load the kernel header */
 f = fopen(kernel_filename, "rb");
@@ -886,32 +886,33 @@ void x86_load_linux(X86MachineState *x86ms,
 if (protocol < 0x200 || !(header[0x211] & 0x01)) {
 /* Low kernel */
 real_addr    = 0x90000;
 cmdline_addr = 0x9a000 - cmdline_size;
 prot_addr    = 0x10000;
 } else if (protocol < 0x202) {
 /* High but ancient kernel */
 real_addr    = 0x90000;
 cmdline_addr = 0x9a000 - cmdline_size;
 prot_addr    = 0x100000;
 } else {
 /* High and recent kernel */
 real_addr    = 0x10000;
 cmdline_addr = 0x20000;
 prot_addr    = 0x100000;
 }
+setup_data_base = real_addr + 0x8000;
 
 /* highest address for loading the initrd */
 if (protocol >= 0x20c &&
 lduw_p(header + 0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
 /*
  * Linux has supported initrd up to 4 GB for a very long time (2007,
  * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
  * though it only sets initrd_max to 2 GB to "work around bootloader
  * bugs". Luckily, QEMU firmware(which does something like bootloader)
  * has supported this.
  *
  * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
  * be loaded into any address.
  *
  * In addition, initrd_max is uint32_t simply because QEMU doesn't
  * support the 64-bit boot protocol (specifically the ext_ramdisk_image
@@ -1049,60 +1050,61 @@ void x86_load_linux(X86MachineState *x86ms,
 fclose(f);
 
 /* append dtb to kernel */
 if (dtb_filename) {
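
The idea described in the commit message above can be sketched outside of QEMU as follows. This is an illustration under assumptions, not the patch itself: SetupDataChain and chain_prepend() are hypothetical names, the header layout follows the next/type/len fields used in the diff, and byte-order conversion (cpu_to_le64() and friends in the real code) is left out, so the sketch assumes a little-endian host. The point is that every link is computed from a fixed base address rather than from wherever the kernel image ends up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header of one setup_data node; the payload follows it directly. */
typedef struct {
    uint64_t next; /* absolute guest-physical address of next node, 0 = end */
    uint32_t type;
    uint32_t len;  /* payload length in bytes */
} SetupDataHeader;

/* Hypothetical container: all nodes packed back to back in one host buffer. */
typedef struct {
    uint8_t *buf;       /* host copy of the blob */
    size_t   total_len; /* bytes used in buf */
    uint64_t base;      /* fixed guest-physical address where buf is placed */
    uint64_t first;     /* guest address of the list head, 0 if empty */
} SetupDataChain;

/*
 * Append a node to the buffer and make it the new head of the list.
 * Links are base-relative, so they stay valid even if the kernel image
 * is loaded somewhere other than prot_addr. Returns 0 on success.
 */
static int chain_prepend(SetupDataChain *c, uint32_t type,
                         const void *payload, uint32_t len)
{
    assert(c != NULL);
    size_t item_len = sizeof(SetupDataHeader) + len;
    uint8_t *nbuf = realloc(c->buf, c->total_len + item_len);
    if (nbuf == NULL) {
        return -1;
    }
    c->buf = nbuf;

    SetupDataHeader hdr = { .next = c->first, .type = type, .len = len };
    /* memcpy avoids unaligned stores when a previous payload had odd length */
    memcpy(c->buf + c->total_len, &hdr, sizeof(hdr));
    if (len != 0) {
        memcpy(c->buf + c->total_len + sizeof(hdr), payload, len);
    }

    c->first = c->base + c->total_len;
    c->total_len += item_len;
    return 0;
}
```

After all nodes are added, c->first is the value that would be stored at offset 0x250 of the header, and buf/total_len is the blob to place at c->base.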
 

Re: [PATCH RFC v1] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld
Hey again,

On Thu, Aug 04, 2022 at 12:50:50AM +0200, Jason A. Donenfeld wrote:
> Hi Michael,
> 
> On Wed, Aug 03, 2022 at 06:25:39PM -0400, Michael S. Tsirkin wrote:
> > > -/* Offset 0x250 is a pointer to the first setup_data link. */
> > > -stq_p(header + 0x250, first_setup_data);
> > > +if (first_setup_data) {
> > > +/* Offset 0x250 is a pointer to the first setup_data link. */
> > > +stq_p(header + 0x250, first_setup_data);
> > > +rom_add_blob("setup_data", setup_datas, setup_data_total_len, setup_data_total_len,
> > > + SETUP_DATA_PHYS_BASE, NULL, NULL, NULL, NULL, false);
> > > +}
> > > +
> > >
> > 
> > Allocating memory on x86 is tricky business.  Can we maybe use
> > bios-linker-loader with COMMAND_WRITE_POINTER to get an address
> > from firmware?
> 
> Hmm. Is BIOSLinker even available to us at this stage in preparation?
> 
> One thing to note is that this memory doesn't really need to be
> persistent. It's only used extremely early in boot. So it could be
> somewhere that gets used/remapped later on.

Actually, it's possible there's one place that's already available, and
that this isn't so bad after all. In my tests, this seems to be working
in a wide variety of configurations. I'll send a v2.

Jason



Re: [PATCH] hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[]

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Daniel Henrique Barboza wrote:

We're not storing all GPIO lines we're retrieving with
qdev_get_gpio_in() in mal_irqs[]. We're storing just the last one in the
first index:

   for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
   mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
   }
   ppc4xx_mal_init(env, 4, 16, mal_irqs);


Indeed, this used to be ppc4xx_mal_init(env, 4, 16, [2][3]); before 
706e944206d7, and this typo slipped through unnoticed, likely because the 
MAL is only there to keep the firmware happy. I think it would be used 
by the EMAC Ethernet port or maybe SATA, neither of which is emulated, so 
probably nothing really uses the MAL.


Acked-by: BALATON Zoltan 



mal_irqs is used in ppc4xx_mal_init() to assign the IRQs to MAL:

   for (i = 0; i < 4; i++) {
   mal->irqs[i] = irqs[i];
   }

Since only irqs[0] has been initialized, mal->irqs[1,2,3] are being
zeroed.

This doesn't seem to trigger any apparent issues at this moment, but
Cedric's QOMification of the MAL device [1] is executing a
sysbus_connect_irq() that will fail if we do not store all GPIO lines
properly.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00497.html

Cc: Peter Maydell 
Cc: BALATON Zoltan 
Fixes: 706e944206d7 ("hw/ppc/sam460ex: Drop use of ppcuic_init()")
Signed-off-by: Daniel Henrique Barboza 
---
hw/ppc/sam460ex.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 7e8da657c2..0357ee077f 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -384,7 +384,7 @@ static void sam460ex_init(MachineState *machine)

/* MAL */
for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
-mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
+mal_irqs[i] = qdev_get_gpio_in(uic[2], 3 + i);
}
ppc4xx_mal_init(env, 4, 16, mal_irqs);
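
The effect of the one-character fix can be reproduced in a small standalone sketch. Everything here is a stand-in: get_line() is a hypothetical replacement for qdev_get_gpio_in(), plain int pointers play the role of qemu_irq, and the arrays are zero-initialized by the caller so that the untouched slots are visibly NULL.

```c
#include <assert.h>
#include <stddef.h>

#define MAL_IRQS 4

/* Hypothetical stand-in for qdev_get_gpio_in(): one distinct handle per line. */
static int lines[32];

static int *get_line(int n)
{
    assert(n >= 0 && n < 32);
    return &lines[n];
}

/* Buggy loop: every iteration overwrites slot 0, so only the last line survives. */
static void fill_buggy(int *irqs[MAL_IRQS])
{
    for (int i = 0; i < MAL_IRQS; i++) {
        irqs[0] = get_line(3 + i);
    }
}

/* Fixed loop: slot i receives line 3 + i. */
static void fill_fixed(int *irqs[MAL_IRQS])
{
    for (int i = 0; i < MAL_IRQS; i++) {
        irqs[i] = get_line(3 + i);
    }
}
```

In the buggy variant slot 0 ends up holding line 6 and slots 1 to 3 are never written, which is why a later per-slot connect call has nothing valid to connect.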



Re: [PATCH v2 16/20] ppc/ppc405: QOM'ify MAL

2022-08-03 Thread Daniel Henrique Barboza

This patch really broke sam460ex boot, but not because of the
QOMification. I managed to get it to work by doing the following:

On 8/3/22 10:28, Cédric Le Goater wrote:

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
  hw/ppc/ppc405.h |   1 +
  include/hw/ppc/ppc4xx.h |  28 ++
  hw/ppc/ppc405_uc.c  |  20 +--
  hw/ppc/ppc4xx_devs.c| 120 +---
  4 files changed, 118 insertions(+), 51 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 8ca32f35ce67..7d585a244d18 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -259,6 +259,7 @@ struct Ppc405SoCState {
  Ppc405OpbaState opba;
  Ppc405PobState pob;
  Ppc405PlbState plb;
+Ppc4xxMalState mal;
  };
  
  /* PowerPC 405 core */

diff --git a/include/hw/ppc/ppc4xx.h b/include/hw/ppc/ppc4xx.h
index 021376c2d260..c31219265273 100644
--- a/include/hw/ppc/ppc4xx.h
+++ b/include/hw/ppc/ppc4xx.h
@@ -26,6 +26,7 @@
  #define PPC4XX_H
  
  #include "hw/ppc/ppc.h"

+#include "hw/sysbus.h"
  #include "exec/memory.h"
  
  /* PowerPC 4xx core initialization */

@@ -45,6 +46,33 @@ void ppc4xx_sdram_init (CPUPPCState *env, qemu_irq irq, int nbanks,
  hwaddr *ram_sizes,
  int do_init);
  
+/* Memory Access Layer (MAL) */

+#define TYPE_PPC4xx_MAL "ppc4xx-mal"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc4xxMalState, PPC4xx_MAL);
+struct Ppc4xxMalState {
+SysBusDevice parent_obj;
+
+PowerPCCPU *cpu;
+
+qemu_irq irqs[4];
+uint32_t cfg;
+uint32_t esr;
+uint32_t ier;
+uint32_t txcasr;
+uint32_t txcarr;
+uint32_t txeobisr;
+uint32_t txdeir;
+uint32_t rxcasr;
+uint32_t rxcarr;
+uint32_t rxeobisr;
+uint32_t rxdeir;
+uint32_t *txctpr;
+uint32_t *rxctpr;
+uint32_t *rcbs;
+uint8_t  txcnum;
+uint8_t  rxcnum;
+};
+
  void ppc4xx_mal_init(CPUPPCState *env, uint8_t txcnum, uint8_t rxcnum,
   qemu_irq irqs[4]);
  
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c

index 9bbd524ad5ea..f39e0b44f9cc 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -1466,12 +1466,13 @@ static void ppc405_soc_instance_init(Object *obj)
  object_initialize_child(obj, "pob", >pob, TYPE_PPC405_POB);
  
  object_initialize_child(obj, "plb", >plb, TYPE_PPC405_PLB);

+
+object_initialize_child(obj, "mal", >mal, TYPE_PPC4xx_MAL);
  }
  
  static void ppc405_soc_realize(DeviceState *dev, Error **errp)

  {
  Ppc405SoCState *s = PPC405_SOC(dev);
-qemu_irq mal_irqs[4];
  CPUPPCState *env;
  Error *err = NULL;
  int i;
@@ -1610,11 +1611,18 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)
  }
  
  /* MAL */

-mal_irqs[0] = qdev_get_gpio_in(s->uic, 11);
-mal_irqs[1] = qdev_get_gpio_in(s->uic, 12);
-mal_irqs[2] = qdev_get_gpio_in(s->uic, 13);
-mal_irqs[3] = qdev_get_gpio_in(s->uic, 14);
-ppc4xx_mal_init(env, 4, 2, mal_irqs);
+object_property_set_int(OBJECT(>mal), "txc-num", 4, _abort);
+object_property_set_int(OBJECT(>mal), "rxc-num", 2, _abort);
+object_property_set_link(OBJECT(>mal), "cpu", OBJECT(>cpu),
+ _abort);
+if (!sysbus_realize(SYS_BUS_DEVICE(>mal), errp)) {
+return;
+}
+
+for (i = 0; i < ARRAY_SIZE(s->mal.irqs); i++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(>mal), i,
+   qdev_get_gpio_in(s->uic, 11 + i));
+}
  
  /* Ethernet */

  /* Uses UIC IRQs 9, 15, 17 */
diff --git a/hw/ppc/ppc4xx_devs.c b/hw/ppc/ppc4xx_devs.c
index f20098cf417c..0e97347e2839 100644
--- a/hw/ppc/ppc4xx_devs.c
+++ b/hw/ppc/ppc4xx_devs.c
@@ -491,32 +491,10 @@ enum {
  MAL0_RCBS1= 0x1E1,
  };
  
-typedef struct ppc4xx_mal_t ppc4xx_mal_t;

-struct ppc4xx_mal_t {
-qemu_irq irqs[4];
-uint32_t cfg;
-uint32_t esr;
-uint32_t ier;
-uint32_t txcasr;
-uint32_t txcarr;
-uint32_t txeobisr;
-uint32_t txdeir;
-uint32_t rxcasr;
-uint32_t rxcarr;
-uint32_t rxeobisr;
-uint32_t rxdeir;
-uint32_t *txctpr;
-uint32_t *rxctpr;
-uint32_t *rcbs;
-uint8_t  txcnum;
-uint8_t  rxcnum;
-};
-
-static void ppc4xx_mal_reset(void *opaque)
+static void ppc4xx_mal_reset(DeviceState *dev)
  {
-ppc4xx_mal_t *mal;
+Ppc4xxMalState *mal = PPC4xx_MAL(dev);
  
-mal = opaque;

  mal->cfg = 0x0007C000;
  mal->esr = 0x;
  mal->ier = 0x;
@@ -530,10 +508,9 @@ static void ppc4xx_mal_reset(void *opaque)
  
  static uint32_t dcr_read_mal(void *opaque, int dcrn)

  {
-ppc4xx_mal_t *mal;
+Ppc4xxMalState *mal = PPC4xx_MAL(opaque);
  uint32_t ret;
  
-mal = opaque;

  switch (dcrn) {
  case MAL0_CFG:
  ret = mal->cfg;
@@ -587,13 +564,12 @@ static uint32_t dcr_read_mal(void *opaque, int dcrn)
  
  static void dcr_write_mal(void *opaque, int dcrn, uint32_t val)

  {
-ppc4xx_mal_t *mal;
+Ppc4xxMalState *mal = 

Re: [PATCH v2 15/20] ppc/ppc405: QOM'ify PLB

2022-08-03 Thread Daniel Henrique Barboza




On 8/3/22 10:28, Cédric Le Goater wrote:

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
  hw/ppc/ppc405.h| 14 ++
  hw/ppc/ppc405_uc.c | 67 +-
  2 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 8acb90427596..8ca32f35ce67 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -65,6 +65,19 @@ struct ppc4xx_bd_info_t {
  
  typedef struct Ppc405SoCState Ppc405SoCState;
  
+/* Peripheral local bus arbitrer */

+#define TYPE_PPC405_PLB "ppc405-plb"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405PlbState, PPC405_PLB);
+struct Ppc405PlbState {
+DeviceState parent_obj;
+
+PowerPCCPU *cpu;
+
+uint32_t acr;
+uint32_t bear;
+uint32_t besr;
+};
+
  /* PLB to OPB bridge */
  #define TYPE_PPC405_POB "ppc405-pob"
  OBJECT_DECLARE_SIMPLE_TYPE(Ppc405PobState, PPC405_POB);
@@ -245,6 +258,7 @@ struct Ppc405SoCState {
  Ppc405EbcState ebc;
  Ppc405OpbaState opba;
  Ppc405PobState pob;
+Ppc405PlbState plb;
  };
  
  /* PowerPC 405 core */

diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index ca214ee4d741..9bbd524ad5ea 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -148,19 +148,11 @@ enum {
  PLB4A1_ACR = 0x089,
  };
  
-typedef struct ppc4xx_plb_t ppc4xx_plb_t;

-struct ppc4xx_plb_t {
-uint32_t acr;
-uint32_t bear;
-uint32_t besr;
-};
-
  static uint32_t dcr_read_plb (void *opaque, int dcrn)
  {
-ppc4xx_plb_t *plb;
+Ppc405PlbState *plb = PPC405_PLB(opaque);
  uint32_t ret;
  
-plb = opaque;

  switch (dcrn) {
  case PLB0_ACR:
  ret = plb->acr;
@@ -182,9 +174,8 @@ static uint32_t dcr_read_plb (void *opaque, int dcrn)
  
  static void dcr_write_plb (void *opaque, int dcrn, uint32_t val)

  {
-ppc4xx_plb_t *plb;
+Ppc405PlbState *plb = PPC405_PLB(opaque);
  
-plb = opaque;

  switch (dcrn) {
  case PLB0_ACR:
  /* We don't care about the actual parameters written as
@@ -202,28 +193,55 @@ static void dcr_write_plb (void *opaque, int dcrn, uint32_t val)
  }
  }
  
-static void ppc4xx_plb_reset (void *opaque)

+static void ppc405_plb_reset(DeviceState *dev)
  {
-ppc4xx_plb_t *plb;
+Ppc405PlbState *plb = PPC405_PLB(dev);
  
-plb = opaque;

  plb->acr = 0x;
  plb->bear = 0x;
  plb->besr = 0x;
  }
  
-void ppc4xx_plb_init(CPUPPCState *env)

+static void ppc405_plb_realize(DeviceState *dev, Error **errp)
  {
-ppc4xx_plb_t *plb;
+Ppc405PlbState *plb = PPC405_PLB(dev);
+CPUPPCState *env;
+
+assert(plb->cpu);
+
+env = >cpu->env;
  
-plb = g_new0(ppc4xx_plb_t, 1);

  ppc_dcr_register(env, PLB3A0_ACR, plb, _read_plb, _write_plb);
  ppc_dcr_register(env, PLB4A0_ACR, plb, _read_plb, _write_plb);
  ppc_dcr_register(env, PLB0_ACR, plb, _read_plb, _write_plb);
  ppc_dcr_register(env, PLB0_BEAR, plb, _read_plb, _write_plb);
  ppc_dcr_register(env, PLB0_BESR, plb, _read_plb, _write_plb);
  ppc_dcr_register(env, PLB4A1_ACR, plb, _read_plb, _write_plb);
-qemu_register_reset(ppc4xx_plb_reset, plb);
+}
+
+static Property ppc405_plb_properties[] = {
+DEFINE_PROP_LINK("cpu", Ppc405PlbState, cpu, TYPE_POWERPC_CPU,
+ PowerPCCPU *),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ppc405_plb_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+dc->realize = ppc405_plb_realize;
+dc->user_creatable = false;
+dc->reset = ppc405_plb_reset;
+device_class_set_props(dc, ppc405_plb_properties);
+}
+
+void ppc4xx_plb_init(CPUPPCState *env)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+DeviceState *dev = qdev_new(TYPE_PPC405_EBC);
+
+object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), _abort);


This causes the same problem that happened in patch 12:


$ ./qemu-system-ppc64 -display none -M sam460ex
Unexpected error in object_property_find_err() at ../qom/object.c:1304:
qemu-system-ppc64: Property '460exb-powerpc64-cpu.cpu' not found
Aborted (core dumped)


The same fix applies here as well:


$ git diff
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index dd3c05a28b..fd53cf38e5 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -240,7 +240,7 @@ void ppc4xx_plb_init(CPUPPCState *env)
 PowerPCCPU *cpu = env_archcpu(env);
 DeviceState *dev = qdev_new(TYPE_PPC405_EBC);
 
-object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), _abort);

+object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), _abort);
 qdev_realize_and_unref(dev, NULL, _fatal);
 }
 


Daniel



+qdev_realize_and_unref(dev, NULL, _fatal);
  }
  
  /*/

@@ -1446,6 +1464,8 @@ static void ppc405_soc_instance_init(Object *obj)
  object_initialize_child(obj, "opba", >opba, TYPE_PPC405_OPBA);
  
  object_initialize_child(obj, "pob", >pob, 

Re: [PATCH v2 12/20] ppc/ppc405: QOM'ify EBC

2022-08-03 Thread Daniel Henrique Barboza

Cedric,

On 8/3/22 10:28, Cédric Le Goater wrote:

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
  hw/ppc/ppc405.h| 16 +++
  hw/ppc/ppc405_uc.c | 71 +++---
  2 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 1da34a7f10f3..1c7fe07b8084 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -65,7 +65,22 @@ struct ppc4xx_bd_info_t {
  
  typedef struct Ppc405SoCState Ppc405SoCState;
  
+/* Peripheral controller */

+#define TYPE_PPC405_EBC "ppc405-ebc"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405EbcState, PPC405_EBC);
+struct Ppc405EbcState {
+DeviceState parent_obj;
+
+PowerPCCPU *cpu;
  
+uint32_t addr;

+uint32_t bcr[8];
+uint32_t bap[8];
+uint32_t bear;
+uint32_t besr0;
+uint32_t besr1;
+uint32_t cfg;
+};
  
  /* DMA controller */

  #define TYPE_PPC405_DMA "ppc405-dma"
@@ -203,6 +218,7 @@ struct Ppc405SoCState {
  Ppc405OcmState ocm;
  Ppc405GpioState gpio;
  Ppc405DmaState dma;
+Ppc405EbcState ebc;
  };
  
  /* PowerPC 405 core */

diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 6bd93c1cb90c..0166f3fc36da 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -393,17 +393,6 @@ static void ppc4xx_opba_init(hwaddr base)
  
  /*/

  /* Peripheral controller */
-typedef struct ppc4xx_ebc_t ppc4xx_ebc_t;
-struct ppc4xx_ebc_t {
-uint32_t addr;
-uint32_t bcr[8];
-uint32_t bap[8];
-uint32_t bear;
-uint32_t besr0;
-uint32_t besr1;
-uint32_t cfg;
-};
-
  enum {
  EBC0_CFGADDR = 0x012,
  EBC0_CFGDATA = 0x013,
@@ -411,10 +400,9 @@ enum {
  
  static uint32_t dcr_read_ebc (void *opaque, int dcrn)

  {
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(opaque);
  uint32_t ret;
  
-ebc = opaque;

  switch (dcrn) {
  case EBC0_CFGADDR:
  ret = ebc->addr;
@@ -496,9 +484,8 @@ static uint32_t dcr_read_ebc (void *opaque, int dcrn)
  
  static void dcr_write_ebc (void *opaque, int dcrn, uint32_t val)

  {
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(opaque);
  
-ebc = opaque;

  switch (dcrn) {
  case EBC0_CFGADDR:
  ebc->addr = val;
@@ -554,12 +541,11 @@ static void dcr_write_ebc (void *opaque, int dcrn, uint32_t val)
  }
  }
  
-static void ebc_reset (void *opaque)

+static void ppc405_ebc_reset(DeviceState *dev)
  {
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(dev);
  int i;
  
-ebc = opaque;

  ebc->addr = 0x;
  ebc->bap[0] = 0x7F8FFE80;
  ebc->bcr[0] = 0xFFE28000;
@@ -572,18 +558,46 @@ static void ebc_reset (void *opaque)
  ebc->cfg = 0x8040;
  }
  
-void ppc405_ebc_init(CPUPPCState *env)

+static void ppc405_ebc_realize(DeviceState *dev, Error **errp)
  {
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(dev);
+CPUPPCState *env;
+
+assert(ebc->cpu);
+
+env = >cpu->env;
  
-ebc = g_new0(ppc4xx_ebc_t, 1);

-qemu_register_reset(_reset, ebc);
  ppc_dcr_register(env, EBC0_CFGADDR,
   ebc, _read_ebc, _write_ebc);
  ppc_dcr_register(env, EBC0_CFGDATA,
   ebc, _read_ebc, _write_ebc);
  }
  
+static Property ppc405_ebc_properties[] = {

+DEFINE_PROP_LINK("cpu", Ppc405EbcState, cpu, TYPE_POWERPC_CPU,
+ PowerPCCPU *),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ppc405_ebc_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+dc->realize = ppc405_ebc_realize;
+dc->user_creatable = false;
+dc->reset = ppc405_ebc_reset;
+device_class_set_props(dc, ppc405_ebc_properties);
+}
+
+void ppc405_ebc_init(CPUPPCState *env)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+DeviceState *dev = qdev_new(TYPE_PPC405_EBC);
+
+object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), _abort);


This line is breaking the boot of sam460ex:


 ./qemu-system-ppc64 -display none -M sam460ex
Unexpected error in object_property_find_err() at ../qom/object.c:1304:
qemu-system-ppc64: Property '460exb-powerpc64-cpu.cpu' not found
Aborted (core dumped)


I think you meant to link the cpu prop of the EBC obj to the CPU object,
not the cpu prop of the CPU obj to the EBC dev.
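
Why the swapped arguments abort can be shown with a small mock. This is not the QOM implementation: Obj and set_link() are hypothetical, and each object here has at most one link property. The first argument is the object that owns the property, the last is the target it should point to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object with at most one named link property. */
typedef struct Obj Obj;
struct Obj {
    const char *type;      /* type name, for illustration only */
    const char *link_name; /* name of the link property, or NULL if none */
    Obj *link_target;      /* where the link currently points */
};

/*
 * Mock of a "set link" call: 'obj' must own a property called 'name'.
 * Returns false when the property does not exist, which is the situation
 * reported as "Property '<type>.<name>' not found" in the log above.
 */
static bool set_link(Obj *obj, const char *name, Obj *target)
{
    assert(obj != NULL && name != NULL);
    if (obj->link_name == NULL || strcmp(obj->link_name, name) != 0) {
        return false;
    }
    obj->link_target = target;
    return true;
}
```

Calling set_link(&cpu, "cpu", &ebc) fails in this mock because the CPU object has no "cpu" property, while set_link(&ebc, "cpu", &cpu) succeeds and leaves the EBC pointing at the CPU.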


This fixes the issue:


$ git diff
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 0166f3fc36..aac3a3f761 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -594,7 +594,7 @@ void ppc405_ebc_init(CPUPPCState *env)
 PowerPCCPU *cpu = env_archcpu(env);
 DeviceState *dev = qdev_new(TYPE_PPC405_EBC);
 
-object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), _abort);

+object_property_set_link(OBJECT(dev), "cpu", OBJECT(cpu), _abort);
 qdev_realize_and_unref(dev, NULL, _fatal);
 }


Daniel



+qdev_realize_and_unref(dev, NULL, _fatal);
+}
+
  

[PATCH] hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[]

2022-08-03 Thread Daniel Henrique Barboza
We're not storing all GPIO lines we're retrieving with
qdev_get_gpio_in() in mal_irqs[]. We're storing just the last one in the
first index:

for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
}
ppc4xx_mal_init(env, 4, 16, mal_irqs);

mal_irqs is used in ppc4xx_mal_init() to assign the IRQs to MAL:

for (i = 0; i < 4; i++) {
mal->irqs[i] = irqs[i];
}

Since only irqs[0] has been initialized, mal->irqs[1,2,3] are being
zeroed.

This doesn't seem to trigger any apparent issues at this moment, but
Cedric's QOMification of the MAL device [1] is executing a
sysbus_connect_irq() that will fail if we do not store all GPIO lines
properly.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00497.html

Cc: Peter Maydell 
Cc: BALATON Zoltan 
Fixes: 706e944206d7 ("hw/ppc/sam460ex: Drop use of ppcuic_init()")
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/sam460ex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 7e8da657c2..0357ee077f 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -384,7 +384,7 @@ static void sam460ex_init(MachineState *machine)
 
 /* MAL */
 for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
-mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
+mal_irqs[i] = qdev_get_gpio_in(uic[2], 3 + i);
 }
 ppc4xx_mal_init(env, 4, 16, mal_irqs);
 
-- 
2.36.1




Re: [PATCH v2 19/20] ppc/ppc405: QOM'ify I2C

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

Having an explicit I2C model object will help if one day we want to
add I2C devices on the bus.


Same here as with the UIC in the previous patch: nothing is being QOMified 
here either. As for why we may need I2C, on sam460ex the firmware detects RAM 
by reading the SPD data over I2C, so that could be the reason, but it may not 
be used here on 405.


Regards,
BALATON Zoltan


Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h|  2 ++
hw/ppc/ppc405_uc.c | 10 --
2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index d29f738cd2d0..d13624ae309c 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -28,6 +28,7 @@
#include "qom/object.h"
#include "hw/ppc/ppc4xx.h"
#include "hw/intc/ppc-uic.h"
+#include "hw/i2c/ppc4xx_i2c.h"

#define PPC405EP_SDRAM_BASE 0x
#define PPC405EP_NVRAM_BASE 0xF000
@@ -256,6 +257,7 @@ struct Ppc405SoCState {
Ppc405OcmState ocm;
Ppc405GpioState gpio;
Ppc405DmaState dma;
+PPC4xxI2CState i2c;
Ppc405EbcState ebc;
Ppc405OpbaState opba;
Ppc405PobState pob;
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 5cd32e22b7ea..8f0caa45f5f7 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -1461,6 +1461,8 @@ static void ppc405_soc_instance_init(Object *obj)

object_initialize_child(obj, "dma", >dma, TYPE_PPC405_DMA);

+object_initialize_child(obj, "i2c", >i2c, TYPE_PPC4xx_I2C);
+
object_initialize_child(obj, "ebc", >ebc, TYPE_PPC405_EBC);

object_initialize_child(obj, "opba", >opba, TYPE_PPC405_OPBA);
@@ -1569,8 +1571,12 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)
}

/* I2C controller */
-sysbus_create_simple(TYPE_PPC4xx_I2C, 0xef600500,
- qdev_get_gpio_in(DEVICE(>uic), 2));
+if (!sysbus_realize(SYS_BUS_DEVICE(>i2c), errp)) {
+return;
+}
+sysbus_mmio_map(SYS_BUS_DEVICE(>i2c), 0, 0xef600500);
+sysbus_connect_irq(SYS_BUS_DEVICE(>i2c), 0,
+   qdev_get_gpio_in(DEVICE(>uic), 2));

/* GPIO */
if (!sysbus_realize(SYS_BUS_DEVICE(>gpio), errp)) {


Re: [PATCH v2 18/20] ppc/ppc405: QOM'ify UIC

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h|  3 ++-
hw/ppc/ppc405_uc.c | 26 +-
2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 7d585a244d18..d29f738cd2d0 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -27,6 +27,7 @@

#include "qom/object.h"
#include "hw/ppc/ppc4xx.h"
+#include "hw/intc/ppc-uic.h"

#define PPC405EP_SDRAM_BASE 0x
#define PPC405EP_NVRAM_BASE 0xF000
@@ -249,7 +250,7 @@ struct Ppc405SoCState {
hwaddr ram_size;

PowerPCCPU cpu;
-DeviceState *uic;
+PPCUIC uic;


So this patch is probably misnamed, as nothing is QOMified here: the UIC is 
already a QOM object. What happens instead is that it gets embedded in the SoC 
rather than only a reference being stored. The advantage of embedding is likely 
that it does not have to be freed, so we don't need an exit function, but it 
adds a bunch of casts in other places. As said before, you should probably 
do the cast once and store it in a local if you need it more than once or 
twice.
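
The suggestion about casting once can be sketched with plain C structs. All names below are hypothetical stand-ins: Device and Uic mimic a base type and a derived type, DEVICE() here is an unchecked cast (QEMU's real macro performs a checked cast), and get_gpio_in() stands in for qdev_get_gpio_in().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base and derived types; the base is the first member, so a
 * pointer to the derived object may be converted to a pointer to the base. */
typedef struct { int id; } Device;
typedef struct { Device parent; int lines[32]; } Uic;

/* Unchecked upcast, for the sketch only. */
#define DEVICE(obj) ((Device *)(obj))

/* Storing a reference: the UIC is allocated and released separately. */
typedef struct { Uic *uic; } SocByReference;

/* Embedding: the UIC lives and dies with the SoC, nothing extra to free. */
typedef struct { Uic uic; } SocEmbedded;

/* Stand-in for qdev_get_gpio_in(): returns input line n of the device. */
static int *get_gpio_in(Device *dev, int n)
{
    Uic *uic = (Uic *)dev;
    assert(n >= 0 && n < 32);
    return &uic->lines[n];
}

/* Cast once, keep the result in a local, and reuse it for every lookup. */
static void wire_serial(SocEmbedded *s, int *out[2])
{
    Device *uicdev = DEVICE(&s->uic);

    out[0] = get_gpio_in(uicdev, 0);
    out[1] = get_gpio_in(uicdev, 1);
}
```

Compared with repeating DEVICE(...) at every call site, the local keeps the call sites as short as they were when the SoC stored a DeviceState pointer.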


Regards,
BALATON Zoltan


Ppc405CpcState cpc;
Ppc405GptState gpt;
Ppc405OcmState ocm;
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index f39e0b44f9cc..5cd32e22b7ea 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -1448,6 +1448,8 @@ static void ppc405_soc_instance_init(Object *obj)
object_initialize_child(obj, "cpu", >cpu,
POWERPC_CPU_TYPE_NAME("405ep"));

+object_initialize_child(obj, "uic", >uic, TYPE_PPC_UIC);
+
object_initialize_child(obj, "cpc", >cpc, TYPE_PPC405_CPC);
object_property_add_alias(obj, "sys-clk", OBJECT(>cpc), "sys-clk");

@@ -1525,17 +1527,15 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)
sysbus_mmio_map(SYS_BUS_DEVICE(>opba), 0, 0xef600600);

/* Universal interrupt controller */
-s->uic = qdev_new(TYPE_PPC_UIC);
-
-object_property_set_link(OBJECT(s->uic), "cpu", OBJECT(>cpu),
+object_property_set_link(OBJECT(>uic), "cpu", OBJECT(>cpu),
 _fatal);
-if (!sysbus_realize(SYS_BUS_DEVICE(s->uic), errp)) {
+if (!sysbus_realize(SYS_BUS_DEVICE(>uic), errp)) {
return;
}

-sysbus_connect_irq(SYS_BUS_DEVICE(s->uic), PPCUIC_OUTPUT_INT,
+sysbus_connect_irq(SYS_BUS_DEVICE(>uic), PPCUIC_OUTPUT_INT,
   qdev_get_gpio_in(DEVICE(>cpu), PPC40x_INPUT_INT));
-sysbus_connect_irq(SYS_BUS_DEVICE(s->uic), PPCUIC_OUTPUT_CINT,
+sysbus_connect_irq(SYS_BUS_DEVICE(>uic), PPCUIC_OUTPUT_CINT,
   qdev_get_gpio_in(DEVICE(>cpu), PPC40x_INPUT_CINT));

/* SDRAM controller */
@@ -1545,7 +1545,7 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)
s->ram_bases[0] = 0;
s->ram_sizes[0] = s->ram_size;

-ppc4xx_sdram_init(env, qdev_get_gpio_in(s->uic, 17),
+ppc4xx_sdram_init(env, qdev_get_gpio_in(DEVICE(>uic), 17),
  ARRAY_SIZE(s->ram_memories), s->ram_memories,
  s->ram_bases, s->ram_sizes, s->do_dram_init);

@@ -1565,12 +1565,12 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)

for (i = 0; i < ARRAY_SIZE(s->dma.irqs); i++) {
sysbus_connect_irq(SYS_BUS_DEVICE(>dma), i,
-   qdev_get_gpio_in(s->uic, 5 + i));
+   qdev_get_gpio_in(DEVICE(>uic), 5 + i));
}

/* I2C controller */
sysbus_create_simple(TYPE_PPC4xx_I2C, 0xef600500,
- qdev_get_gpio_in(s->uic, 2));
+ qdev_get_gpio_in(DEVICE(>uic), 2));

/* GPIO */
if (!sysbus_realize(SYS_BUS_DEVICE(>gpio), errp)) {
@@ -1581,13 +1581,13 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)
/* Serial ports */
if (serial_hd(0) != NULL) {
serial_mm_init(get_system_memory(), 0xef600300, 0,
-   qdev_get_gpio_in(s->uic, 0),
+   qdev_get_gpio_in(DEVICE(>uic), 0),
   PPC_SERIAL_MM_BAUDBASE, serial_hd(0),
   DEVICE_BIG_ENDIAN);
}
if (serial_hd(1) != NULL) {
serial_mm_init(get_system_memory(), 0xef600400, 0,
-   qdev_get_gpio_in(s->uic, 1),
+   qdev_get_gpio_in(DEVICE(>uic), 1),
   PPC_SERIAL_MM_BAUDBASE, serial_hd(1),
   DEVICE_BIG_ENDIAN);
}
@@ -1607,7 +1607,7 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)

for (i = 0; i < ARRAY_SIZE(s->gpt.irqs); i++) {
sysbus_connect_irq(SYS_BUS_DEVICE(>gpt), i,
-   qdev_get_gpio_in(s->uic, 19 + i));
+   qdev_get_gpio_in(DEVICE(>uic), 19 + i));
}

/* MAL */
@@ -1621,7 +1621,7 @@ static void ppc405_soc_realize(DeviceState *dev, Error **errp)

for (i = 0; i < ARRAY_SIZE(s->mal.irqs); i++) {

Re: [PATCH RFC v1] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Michael S. Tsirkin
On Wed, Aug 03, 2022 at 07:02:35PM +0200, Jason A. Donenfeld wrote:
> The boot parameter header refers to setup_data at an absolute address,
> and each setup_data refers to the next setup_data at an absolute address
> too. Currently QEMU simply puts the setup_datas right after the kernel
> image, and since the kernel_image is loaded at prot_addr -- a fixed
> address knowable to QEMU a priori -- the setup_data absolute address
> winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.
> 
> This mostly works fine, so long as the kernel image really is loaded at
> prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
> generally EFI doesn't give a good way of predicting where it's going to
> load the kernel. So when it loads it at some address != prot_addr, the
> absolute addresses in setup_data now point somewhere bogus, causing
> crashes when EFI stub tries to follow the next link.
> 
> Fix this by placing setup_data at some fixed place in memory, not as
> part of the kernel image, and then pointing the setup_data absolute
> address to that fixed place in memory. This way, even if OVMF or other
> chains relocate the kernel image, the boot parameter still points to the
> correct absolute address.
> 
> === NOTE NOTE NOTE NOTE NOTE ===
> This commit is currently garbage! It fixes the boot test case, but it
> just picks the address 0x1000. That's probably not a good idea. If
> somebody with some x86 architectural knowledge could let me know a
> better reserved place to put this, that'd be very appreciated.
> 
> Fixes: 3cbeb52467 ("hw/i386: add device tree support")
> Reported-by: Xiaoyao Li 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Peter Maydell 
> Cc: Michael S. Tsirkin 
> Cc: Daniel P. Berrangé 
> Cc: Gerd Hoffmann 
> Cc: Ard Biesheuvel 
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld 
> ---
>  hw/i386/x86.c | 38 +-
>  1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 050eedc0c8..0b0083b345 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -773,9 +773,9 @@ void x86_load_linux(X86MachineState *x86ms,
>  bool linuxboot_dma_enabled = 
> X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
>  uint16_t protocol;
>  int setup_size, kernel_size, cmdline_size;
> -int dtb_size, setup_data_offset;
> +int dtb_size, setup_data_item_len, setup_data_total_len = 0;
>  uint32_t initrd_max;
> -uint8_t header[8192], *setup, *kernel;
> +uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
>  hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
> first_setup_data = 0;
>  FILE *f;
>  char *vmode;
> @@ -1048,6 +1048,8 @@ void x86_load_linux(X86MachineState *x86ms,
>  }
>  fclose(f);
>  
> +#define SETUP_DATA_PHYS_BASE 0x1000
> +
>  /* append dtb to kernel */
>  if (dtb_filename) {
>  if (protocol < 0x209) {
> @@ -1062,34 +1064,36 @@ void x86_load_linux(X86MachineState *x86ms,
>  exit(1);
>  }
>  
> -setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
> -kernel_size = setup_data_offset + sizeof(struct setup_data) + 
> dtb_size;
> -kernel = g_realloc(kernel, kernel_size);
> -
> -
> -setup_data = (struct setup_data *)(kernel + setup_data_offset);
> +setup_data_item_len = sizeof(struct setup_data) + dtb_size;
> +setup_datas = g_realloc(setup_datas, setup_data_total_len + 
> setup_data_item_len);
> +setup_data = (struct setup_data *)(setup_datas + 
> setup_data_total_len);
>  setup_data->next = cpu_to_le64(first_setup_data);
> -first_setup_data = prot_addr + setup_data_offset;
> +first_setup_data = SETUP_DATA_PHYS_BASE + setup_data_total_len;
> +setup_data_total_len += setup_data_item_len;
>  setup_data->type = cpu_to_le32(SETUP_DTB);
>  setup_data->len = cpu_to_le32(dtb_size);
> -
>  load_image_size(dtb_filename, setup_data->data, dtb_size);
>  }
>  
>  if (!legacy_no_rng_seed) {
> -setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
> -kernel_size = setup_data_offset + sizeof(struct setup_data) + 
> RNG_SEED_LENGTH;
> -kernel = g_realloc(kernel, kernel_size);
> -setup_data = (struct setup_data *)(kernel + setup_data_offset);
> +setup_data_item_len = sizeof(struct setup_data) + SETUP_RNG_SEED;
> +setup_datas = g_realloc(setup_datas, setup_data_total_len + 
> setup_data_item_len);
> +setup_data = (struct setup_data *)(setup_datas + 
> setup_data_total_len);
>  setup_data->next = cpu_to_le64(first_setup_data);
> -first_setup_data = prot_addr + setup_data_offset;
> +first_setup_data = SETUP_DATA_PHYS_BASE + setup_data_total_len;
> +setup_data_total_len += setup_data_item_len;
>  setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
>  setup_data->len = 

Re: [PATCH v2 12/20] ppc/ppc405: QOM'ify EBC

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h| 16 +++
hw/ppc/ppc405_uc.c | 71 +++---
2 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 1da34a7f10f3..1c7fe07b8084 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -65,7 +65,22 @@ struct ppc4xx_bd_info_t {

typedef struct Ppc405SoCState Ppc405SoCState;

+/* Peripheral controller */
+#define TYPE_PPC405_EBC "ppc405-ebc"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405EbcState, PPC405_EBC);
+struct Ppc405EbcState {
+DeviceState parent_obj;
+
+PowerPCCPU *cpu;

+uint32_t addr;
+uint32_t bcr[8];
+uint32_t bap[8];
+uint32_t bear;
+uint32_t besr0;
+uint32_t besr1;
+uint32_t cfg;
+};

/* DMA controller */
#define TYPE_PPC405_DMA "ppc405-dma"
@@ -203,6 +218,7 @@ struct Ppc405SoCState {
Ppc405OcmState ocm;
Ppc405GpioState gpio;
Ppc405DmaState dma;
+Ppc405EbcState ebc;
};

/* PowerPC 405 core */
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 6bd93c1cb90c..0166f3fc36da 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -393,17 +393,6 @@ static void ppc4xx_opba_init(hwaddr base)

/*/
/* Peripheral controller */
-typedef struct ppc4xx_ebc_t ppc4xx_ebc_t;
-struct ppc4xx_ebc_t {
-uint32_t addr;
-uint32_t bcr[8];
-uint32_t bap[8];
-uint32_t bear;
-uint32_t besr0;
-uint32_t besr1;
-uint32_t cfg;
-};
-
enum {
EBC0_CFGADDR = 0x012,
EBC0_CFGDATA = 0x013,
@@ -411,10 +400,9 @@ enum {

static uint32_t dcr_read_ebc (void *opaque, int dcrn)
{
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(opaque);
uint32_t ret;

-ebc = opaque;


I think QOM casts are kind of expensive (maybe because we have qom-debug 
enabled by default even without --enable-debug and it does additional 
checks; I've tried to change this default once but it was thought to be 
better to have it enabled). So it's advised to use QOM casts sparingly, 
e.g. store the result in a local variable if you need it more than once 
and so. Therefore I tend to consider these read/write callbacks that the 
object itself registers with itself as the opaque pointer to be internal 
to the object and guaranteed to be passed the object pointer so no QOM 
cast is necessary and the direct assignment can be kept. This avoids 
potential overhead on every register access. Not sure if it's measurable 
but I think if an overhead can be avoided it probably should be.
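
A minimal sketch of the trade-off described above. The types and the checked-cast helper are hypothetical stand-ins, not the real QOM macros; the counter only exists to show how often the check runs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real QOM types or macros. */
typedef struct Obj {
    const char *type_name;
} Obj;

typedef struct EbcState {
    Obj parent;
    uint32_t addr;
} EbcState;

static unsigned long cast_checks; /* counts how many checked casts ran */

/* Checked cast: verifies the type name before converting the pointer. */
static EbcState *ebc_checked_cast(void *opaque)
{
    Obj *o = opaque;

    cast_checks++;
    assert(strcmp(o->type_name, "ebc") == 0);
    return (EbcState *)o;
}

/* Register read callback that pays for the check on every access. */
static uint32_t read_checked(void *opaque)
{
    EbcState *ebc = ebc_checked_cast(opaque);

    return ebc->addr;
}

/* Register read callback that trusts the opaque pointer it registered. */
static uint32_t read_direct(void *opaque)
{
    EbcState *ebc = opaque;

    return ebc->addr;
}
```

With the checked variant the type check runs on every register access; the direct variant relies on the callback only ever being registered with its own object as the opaque pointer.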



switch (dcrn) {
case EBC0_CFGADDR:
ret = ebc->addr;
@@ -496,9 +484,8 @@ static uint32_t dcr_read_ebc (void *opaque, int dcrn)

static void dcr_write_ebc (void *opaque, int dcrn, uint32_t val)
{
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(opaque);

-ebc = opaque;
switch (dcrn) {
case EBC0_CFGADDR:
ebc->addr = val;
@@ -554,12 +541,11 @@ static void dcr_write_ebc (void *opaque, int dcrn, 
uint32_t val)
}
}

-static void ebc_reset (void *opaque)
+static void ppc405_ebc_reset(DeviceState *dev)
{
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(dev);


In this case the cast is OK as it's casting a different object so it's 
needed and also it's infrequently called so should not matter.



int i;

-ebc = opaque;
ebc->addr = 0x;
ebc->bap[0] = 0x7F8FFE80;
ebc->bcr[0] = 0xFFE28000;
@@ -572,18 +558,46 @@ static void ebc_reset (void *opaque)
ebc->cfg = 0x8040;
}

-void ppc405_ebc_init(CPUPPCState *env)
+static void ppc405_ebc_realize(DeviceState *dev, Error **errp)
{
-ppc4xx_ebc_t *ebc;
+Ppc405EbcState *ebc = PPC405_EBC(dev);
+CPUPPCState *env;
+
+assert(ebc->cpu);
+
+env = &ebc->cpu->env;

-ebc = g_new0(ppc4xx_ebc_t, 1);
-qemu_register_reset(&ebc_reset, ebc);
ppc_dcr_register(env, EBC0_CFGADDR,
 ebc, &dcr_read_ebc, &dcr_write_ebc);
ppc_dcr_register(env, EBC0_CFGDATA,
 ebc, &dcr_read_ebc, &dcr_write_ebc);
}

+static Property ppc405_ebc_properties[] = {
+DEFINE_PROP_LINK("cpu", Ppc405EbcState, cpu, TYPE_POWERPC_CPU,
+ PowerPCCPU *),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ppc405_ebc_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+dc->realize = ppc405_ebc_realize;
+dc->user_creatable = false;
+dc->reset = ppc405_ebc_reset;
+device_class_set_props(dc, ppc405_ebc_properties);
+}
+
+void ppc405_ebc_init(CPUPPCState *env)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+DeviceState *dev = qdev_new(TYPE_PPC405_EBC);
+
+object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), &error_abort);
+qdev_realize_and_unref(dev, NULL, &error_fatal);
+}
+
/*/
/* DMA controller */
enum {
@@ -1418,6 +1432,8 

Re: [PATCH RFC v1] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld
Hi Michael,

On Wed, Aug 03, 2022 at 06:25:39PM -0400, Michael S. Tsirkin wrote:
> > -/* Offset 0x250 is a pointer to the first setup_data link. */
> > -stq_p(header + 0x250, first_setup_data);
> > +if (first_setup_data) {
> > +/* Offset 0x250 is a pointer to the first setup_data link. */
> > +stq_p(header + 0x250, first_setup_data);
> > +rom_add_blob("setup_data", setup_datas, setup_data_total_len, 
> > setup_data_total_len,
> > + SETUP_DATA_PHYS_BASE, NULL, NULL, NULL, NULL, 
> > false);
> > +}
> > +
> >
> 
> Allocating memory on x86 is tricky business.  Can we maybe use 
> bios-linker-loader
> with COMMAND_WRITE_POINTER to get an address from firmware?

Hmm. Is BIOSLinker even available to us at this stage in preparation?

One thing to note is that this memory doesn't really need to be
persistent. It's only used extremely early in boot. So it could be
somewhere that gets used/remapped later on.
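
For context, a simplified sketch of how entries can be chained when the whole blob is loaded at one fixed guest-physical base. struct sd_hdr, struct sd_blob and sd_append() are hypothetical names, the fields are left in host byte order for brevity, and the base address is whatever the caller picks:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified header; mirrors the next/type/len layout only. */
struct sd_hdr {
    uint64_t next;   /* guest-physical address of the next entry, 0 = end */
    uint32_t type;
    uint32_t len;    /* payload length in bytes */
};

struct sd_blob {
    uint8_t *buf;    /* host buffer holding all entries back to back */
    size_t total;    /* bytes used so far */
    uint64_t first;  /* guest-physical address of the current list head */
    uint64_t base;   /* fixed guest-physical base the blob is loaded at */
};

/* Append one entry; the new entry becomes the head of the list. */
static int sd_append(struct sd_blob *b, uint32_t type,
                     const void *data, uint32_t len)
{
    size_t item = sizeof(struct sd_hdr) + len;
    uint8_t *nbuf = realloc(b->buf, b->total + item);

    if (!nbuf) {
        return -1;
    }
    b->buf = nbuf;

    /* The new header points at the previous head (0 for the first entry). */
    struct sd_hdr hdr = { .next = b->first, .type = type, .len = len };
    memcpy(b->buf + b->total, &hdr, sizeof(hdr));
    memcpy(b->buf + b->total + sizeof(hdr), data, len);

    /* Addresses depend only on the fixed base, not on the kernel image. */
    b->first = b->base + b->total;
    b->total += item;
    return 0;
}
```

Because every address is derived from the fixed base, relocating the kernel image afterwards leaves the chain intact.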

Jason



Re: [PATCH v2 05/20] ppc/ppc405: Start QOMification of the SoC

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

This moves all the code previously done in the ppc405ep_init() routine
under ppc405_soc_realize(). We can also adjust the number of banks now
that we have control on ppc4xx_sdram_init().

Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h|  16 ++---
hw/ppc/ppc405_boards.c |  12 ++--
hw/ppc/ppc405_uc.c | 144 -
3 files changed, 83 insertions(+), 89 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index c8cddb71733a..2c912b328eaf 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -72,11 +72,16 @@ struct Ppc405SoCState {

/* Public */
MemoryRegion sram;
-MemoryRegion ram_memories[2];
-hwaddr ram_bases[2], ram_sizes[2];
+MemoryRegion ram_memories[1];
+hwaddr ram_bases[1], ram_sizes[1];
+bool do_dram_init;


I'm not sure about this. First of all what's the point having a 1 element 
array instead of just a normal field if you don't need more than one of 
these? (But then the names in plural become a misnomer too.) On the other 
hand the SoC likely has two banks, it's just that the board only has one 
socket and thus only uses one of them but other boards could have two 
sockets and use both. If the SoC model already has this I'd keep it for 
those cases or to emulate the SoC more precisely. But I may be wrong, I 
haven't checked the chip docs and only dimly remember how this was on 
460EX.


Regards,
BALATON Zoltan



MemoryRegion *dram_mr;
hwaddr ram_size;
+
+uint32_t sysclk;
+PowerPCCPU *cpu;
+DeviceState *uic;
};

/* PowerPC 405 core */
@@ -85,11 +90,4 @@ ram_addr_t ppc405_set_bootinfo(CPUPPCState *env, ram_addr_t 
ram_size);
void ppc4xx_plb_init(CPUPPCState *env);
void ppc405_ebc_init(CPUPPCState *env);

-PowerPCCPU *ppc405ep_init(MemoryRegion *address_space_mem,
-MemoryRegion ram_memories[2],
-hwaddr ram_bases[2],
-hwaddr ram_sizes[2],
-uint32_t sysclk, DeviceState **uicdev,
-int do_init);
-
#endif /* PPC405_H */
diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 96db52c5a309..363cb0770506 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -237,9 +237,7 @@ static void ppc405_init(MachineState *machine)
Ppc405MachineState *ppc405 = PPC405_MACHINE(machine);
MachineClass *mc = MACHINE_GET_CLASS(machine);
const char *kernel_filename = machine->kernel_filename;
-PowerPCCPU *cpu;
MemoryRegion *sysmem = get_system_memory();
-DeviceState *uicdev;

if (machine->ram_size != mc->default_ram_size) {
char *sz = size_to_str(mc->default_ram_size);
@@ -254,12 +252,12 @@ static void ppc405_init(MachineState *machine)
 machine->ram_size, &error_fatal);
object_property_set_link(OBJECT(&ppc405->soc), "dram",
 OBJECT(machine->ram), &error_abort);
+object_property_set_bool(OBJECT(&ppc405->soc), "dram-init",
+ !(kernel_filename == NULL), &error_abort);
+object_property_set_uint(OBJECT(&ppc405->soc), "sys-clk", ,
+ &error_abort);
qdev_realize(DEVICE(&ppc405->soc), NULL, &error_abort);

-cpu = ppc405ep_init(sysmem, ppc405->soc.ram_memories, 
ppc405->soc.ram_bases,
-ppc405->soc.ram_sizes,
-, , kernel_filename == NULL ? 0 : 1);
-
/* allocate and load BIOS */
if (machine->firmware) {
MemoryRegion *bios = g_new(MemoryRegion, 1);
@@ -315,7 +313,7 @@ static void ppc405_init(MachineState *machine)

/* Load ELF kernel and rootfs.cpio */
} else if (kernel_filename && !machine->firmware) {
-boot_from_kernel(machine, cpu);
+boot_from_kernel(machine, ppc405->soc.cpu);
}
}

diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 7033bac6bf3f..ed1099e08bbd 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -1432,130 +1432,128 @@ static void ppc405ep_cpc_init (CPUPPCState *env, 
clk_setup_t clk_setup[8],
#endif
}

-PowerPCCPU *ppc405ep_init(MemoryRegion *address_space_mem,
-MemoryRegion ram_memories[2],
-hwaddr ram_bases[2],
-hwaddr ram_sizes[2],
-uint32_t sysclk, DeviceState **uicdevp,
-int do_init)
+static void ppc405_soc_realize(DeviceState *dev, Error **errp)
{
+Ppc405SoCState *s = PPC405_SOC(dev);
clk_setup_t clk_setup[PPC405EP_CLK_NB], tlb_clk_setup;
qemu_irq dma_irqs[4], gpt_irqs[5], mal_irqs[4];
-PowerPCCPU *cpu;
CPUPPCState *env;
-DeviceState *uicdev;
-SysBusDevice *uicsbd;
+Error *err = NULL;
+
+/* allocate SRAM */
+memory_region_init_ram(&s->sram, OBJECT(s), "ppc405.sram",
+   PPC405EP_SRAM_SIZE, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+memory_region_add_subregion(get_system_memory(), 

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Michael S. Tsirkin
On Thu, Aug 04, 2022 at 12:08:07AM +0200, Jason A. Donenfeld wrote:
> Hi Michael,
> 
> On Wed, Aug 03, 2022 at 06:03:20PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Aug 03, 2022 at 07:07:52PM +0200, Jason A. Donenfeld wrote:
> > > On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote:
> > > > On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
> > > > > Thanks for the info. Very helpful. Looking into it now.
> > > > 
> > > > So interestingly, this is not a new issue. If you pass any type of setup
> > > > data, OVMF appears to be doing something unusual and passing 0x
> > > > for all the entries, rather than the actual data. The reason this isn't
> > > > new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
> > > > same page fault, on all QEMU versions. The thing that passes the DTB is
> > > > the thing that passes the RNG seed. Same mechanism, same bug.
> > > > 
> > > > I'm looking into it...
> > > 
> > > Fixed with: 
> > > https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/
> > > 
> > > Feel free to join into the discussion there. I CC'd you.
> > > 
> > > Jason
> > 
> > Hmm I don't think this patch will make it in 7.1 given the
> > timeframe. I suspect we should revert the patch for now.
> > 
> > Which is where you maybe begin to see why we generally
> > prefer doing it with features - one can then work around
> > bugs by turning the feature on and off.
> 
> The bug actually precedes this patch. Just boot with -dtb on any qemu
> version and you'll trigger it.

Sure but it's still a regression.

> We're still at rc0; there should be time
> enough for a bug fix. Please do chime in on that thread and maybe we can
> come up with something reasonable fast enough.
> 
> Jason

Maybe.

-- 
MST




Re: [PATCH v2 04/20] ppc/ppc405: Introduce a PPC405 SoC

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

It is an initial model to start QOMification of the PPC405 board.
QOM'ified devices will be reintroduced one by one. Start with the
memory regions, which name prefix is changed to "ppc405".


I'm not a native speaker but "which name prefix" sounds weird to me here. 
Maybe something like the name prefix of which, with their name prefix or 
maybe whose name prefix is probably better but some English speaker would 
better review this.




Also, initialize only one RAM bank. The second bank is a dummy one
(zero size) which is here to match the hard coded number of banks in
ppc405ep_init(). We will adjust this number when ppc4xx_sdram_init()
can be called directly, after we have replaced ppc405ep_init().

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h| 17 +++
hw/ppc/ppc405_boards.c | 29 +++--
hw/ppc/ppc405_uc.c | 49 ++
3 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 83f156f585c8..c8cddb71733a 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -25,6 +25,7 @@
#ifndef PPC405_H
#define PPC405_H

+#include "qom/object.h"
#include "hw/ppc/ppc4xx.h"

#define PPC405EP_SDRAM_BASE 0x
@@ -62,6 +63,22 @@ struct ppc4xx_bd_info_t {
uint32_t bi_iic_fast[2];
};

+#define TYPE_PPC405_SOC "ppc405-soc"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405SoCState, PPC405_SOC);
+
+struct Ppc405SoCState {
+/* Private */
+DeviceState parent_obj;
+
+/* Public */
+MemoryRegion sram;
+MemoryRegion ram_memories[2];
+hwaddr ram_bases[2], ram_sizes[2];
+
+MemoryRegion *dram_mr;
+hwaddr ram_size;
+};
+
/* PowerPC 405 core */
ram_addr_t ppc405_set_bootinfo(CPUPPCState *env, ram_addr_t ram_size);

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 24ec948d22a4..96db52c5a309 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -54,6 +54,8 @@ struct Ppc405MachineState {
/* Private */
MachineState parent_obj;
/* Public */
+
+Ppc405SoCState soc;
};

#define TYPE_PPC405_MACHINE MACHINE_TYPE_NAME("ppc405")
@@ -232,12 +234,10 @@ static void boot_from_kernel(MachineState *machine, 
PowerPCCPU *cpu)

static void ppc405_init(MachineState *machine)
{
+Ppc405MachineState *ppc405 = PPC405_MACHINE(machine);
MachineClass *mc = MACHINE_GET_CLASS(machine);
const char *kernel_filename = machine->kernel_filename;
PowerPCCPU *cpu;
-MemoryRegion *sram = g_new(MemoryRegion, 1);
-MemoryRegion *ram_memories = g_new(MemoryRegion, 2);
-hwaddr ram_bases[2], ram_sizes[2];
MemoryRegion *sysmem = get_system_memory();
DeviceState *uicdev;

@@ -248,23 +248,18 @@ static void ppc405_init(MachineState *machine)
exit(EXIT_FAILURE);
}

-/* XXX: fix this */
-memory_region_init_alias(&ram_memories[0], NULL, "ef405ep.ram.alias",
- machine->ram, 0, machine->ram_size);
-ram_bases[0] = 0;
-ram_sizes[0] = machine->ram_size;
-memory_region_init(&ram_memories[1], NULL, "ef405ep.ram1", 0);
-ram_bases[1] = 0x;
-ram_sizes[1] = 0x;
+object_initialize_child(OBJECT(machine), "soc", &ppc405->soc,
+TYPE_PPC405_SOC);
+object_property_set_uint(OBJECT(&ppc405->soc), "ram-size",
+ machine->ram_size, &error_fatal);
+object_property_set_link(OBJECT(&ppc405->soc), "dram",
+ OBJECT(machine->ram), &error_abort);
+qdev_realize(DEVICE(&ppc405->soc), NULL, &error_abort);

-cpu = ppc405ep_init(sysmem, ram_memories, ram_bases, ram_sizes,
+cpu = ppc405ep_init(sysmem, ppc405->soc.ram_memories, 
ppc405->soc.ram_bases,
+ppc405->soc.ram_sizes,
, , kernel_filename == NULL ? 0 : 1);

-/* allocate SRAM */
-memory_region_init_ram(sram, NULL, "ef405ep.sram", PPC405EP_SRAM_SIZE,
-   &error_fatal);
-memory_region_add_subregion(sysmem, PPC405EP_SRAM_BASE, sram);
-
/* allocate and load BIOS */
if (machine->firmware) {
MemoryRegion *bios = g_new(MemoryRegion, 1);
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index d6420c88d3a6..7033bac6bf3f 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -30,6 +30,7 @@
#include "hw/ppc/ppc.h"
#include "hw/i2c/ppc4xx_i2c.h"
#include "hw/irq.h"
+#include "hw/qdev-properties.h"
#include "ppc405.h"
#include "hw/char/serial.h"
#include "qemu/timer.h"
@@ -1530,3 +1531,51 @@ PowerPCCPU *ppc405ep_init(MemoryRegion 
*address_space_mem,

return cpu;
}
+
+static void ppc405_soc_realize(DeviceState *dev, Error **errp)
+{
+Ppc405SoCState *s = PPC405_SOC(dev);
+Error *err = NULL;
+
+memory_region_init_alias(&s->ram_memories[0], OBJECT(s),
+ "ppc405.ram.alias", s->dram_mr, 0, s->ram_size);
+s->ram_bases[0] = 0;
+s->ram_sizes[0] = s->ram_size;
+
+/* allocate SRAM */

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Jason A. Donenfeld
Hi Michael,

On Wed, Aug 03, 2022 at 06:03:20PM -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 03, 2022 at 07:07:52PM +0200, Jason A. Donenfeld wrote:
> > On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote:
> > > On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
> > > > Thanks for the info. Very helpful. Looking into it now.
> > > 
> > > So interestingly, this is not a new issue. If you pass any type of setup
> > > data, OVMF appears to be doing something unusual and passing 0x
> > > for all the entries, rather than the actual data. The reason this isn't
> > > new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
> > > same page fault, on all QEMU versions. The thing that passes the DTB is
> > > the thing that passes the RNG seed. Same mechanism, same bug.
> > > 
> > > I'm looking into it...
> > 
> > Fixed with: 
> > https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/
> > 
> > Feel free to join into the discussion there. I CC'd you.
> > 
> > Jason
> 
> Hmm I don't think this patch will make it in 7.1 given the
> timeframe. I suspect we should revert the patch for now.
> 
> Which is where you maybe begin to see why we generally
> prefer doing it with features - one can then work around
> bugs by turning the feature on and off.

The bug actually precedes this patch. Just boot with -dtb on any qemu
version and you'll trigger it. We're still at rc0; there should be time
enough for a bug fix. Please do chime in on that thread and maybe we can
come up with something reasonable fast enough.

Jason



Re: [PATCH v2 02/20] ppc/ppc405: Introduce a PPC405 generic machine

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

We will use this machine as a base to define the ref405ep and possibly
the PPC405 hotfoot board as found in the Linux kernel.

Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405_boards.c | 31 ---
1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 1a4e7588c584..4c269b6526a5 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -50,6 +50,15 @@

#define USE_FLASH_BIOS

+struct Ppc405MachineState {
+/* Private */
+MachineState parent_obj;
+/* Public */
+};
+
+#define TYPE_PPC405_MACHINE MACHINE_TYPE_NAME("ppc405")
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405MachineState, PPC405_MACHINE);
+
/*/
/* PPC405EP reference board (IBM) */
/* Standalone board with:
@@ -332,18 +341,34 @@ static void ref405ep_class_init(ObjectClass *oc, void 
*data)

mc->desc = "ref405ep";
mc->init = ref405ep_init;
-mc->default_ram_size = 0x0800;
-mc->default_ram_id = "ef405ep.ram";
}

static const TypeInfo ref405ep_type = {
.name = MACHINE_TYPE_NAME("ref405ep"),
-.parent = TYPE_MACHINE,
+.parent = TYPE_PPC405_MACHINE,
.class_init = ref405ep_class_init,
};

+static void ppc405_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+mc->desc = "PPC405 generic machine";
+mc->default_ram_size = 0x0800;
+mc->default_ram_id = "ppc405.ram";


Is the default RAM size a property of specific boards or the PPC405? I 
think it could be different for different boards so don't see why it's 
moved to the generic machine but maybe it has something to do with how 
other parts of QEMU handle this or I'm not getting what the generic 
PPC405 machine is for.


Would it be clearer to just write 128 * MiB instead of a long hex number 
with extra zeros that's hard to read? It would be a good opportunity to 
change it here.
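
For illustration, a small sketch of the readable form. The MiB definition below is written out locally and is an assumption about how a units header would define it; default_ram_size() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Local definitions for the sketch; assumed, not copied from any header. */
#define KiB (INT64_C(1) << 10)
#define MiB (INT64_C(1) << 20)

/* Hypothetical helper: the default RAM size written in readable units. */
static inline int64_t default_ram_size(void)
{
    return 128 * MiB;
}
```

The value is the same as the eight-digit hex form, but the size is visible at a glance.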


Regards,
BALATON Zoltan


+}
+
+static const TypeInfo ppc405_machine_type = {
+.name = TYPE_PPC405_MACHINE,
+.parent = TYPE_MACHINE,
+.instance_size = sizeof(Ppc405MachineState),
+.class_init = ppc405_machine_class_init,
+.abstract = true,
+};
+
static void ppc405_machine_init(void)
{
+type_register_static(&ppc405_machine_type);
type_register_static(&ref405ep_type);
}



Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Michael S. Tsirkin
On Wed, Aug 03, 2022 at 07:07:52PM +0200, Jason A. Donenfeld wrote:
> On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote:
> > On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
> > > Thanks for the info. Very helpful. Looking into it now.
> > 
> > So interestingly, this is not a new issue. If you pass any type of setup
> > data, OVMF appears to be doing something unusual and passing 0x
> > for all the entries, rather than the actual data. The reason this isn't
> > new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
> > same page fault, on all QEMU versions. The thing that passes the DTB is
> > the thing that passes the RNG seed. Same mechanism, same bug.
> > 
> > I'm looking into it...
> 
> Fixed with: 
> https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/
> 
> Feel free to join into the discussion there. I CC'd you.
> 
> Jason

Hmm I don't think this patch will make it in 7.1 given the
timeframe. I suspect we should revert the patch for now.

Which is where you maybe begin to see why we generally
prefer doing it with features - one can then work around
bugs by turning the feature on and off.

-- 
MST




Re: [PATCH for-7.1] hw/mips/malta: turn off x86 specific features of PIIX4_PM

2022-08-03 Thread Michael S. Tsirkin
On Thu, Jul 28, 2022 at 07:50:34AM -0400, Igor Mammedov wrote:
> QEMU crashes trying to save VMSTATE when only the MIPS target is compiled in
>   $ qemu-system-mips -monitor stdio
>   (qemu) migrate "exec:gzip -c > STATEFILE.gz"
>   Segmentation fault (core dumped)
> 
> It happens due to PIIX4_PM trying to parse hotplug vmstate structures
> which are valid only for x86 and not for MIPS (as it requires ACPI
> tables support which is not existent for the latter)
> 
> Issue was probably exposed by trying to cleanup/compile out unused
> ACPI bits from MIPS target (but forgetting about migration bits).
> 
> Disable compiled out features using compat properties as the least
> risky way to deal with issue.
> 
> Signed-off-by: Igor Mammedov 


For 7.1 this seems like the lesser evil.

Acked-by: Michael S. Tsirkin 

> ---
> PS:
> another approach could be setting defaults to disabled state and
> enabling them using compat props on PC machines (which is more
> code to deal with => more risky) or continue with PIIX4_PM
> refactoring to split x86-shism out (which I'm not really
> interested in due to risk of regressions for not much of
> benefit)
> ---
>  hw/mips/malta.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/mips/malta.c b/hw/mips/malta.c
> index 7a0ec513b0..0e932988e0 100644
> --- a/hw/mips/malta.c
> +++ b/hw/mips/malta.c
> @@ -1442,6 +1442,14 @@ static const TypeInfo mips_malta_device = {
>  .instance_init = mips_malta_instance_init,
>  };
>  
> +GlobalProperty malta_compat[] = {
> +{ "PIIX4_PM", "memory-hotplug-support", "off" },
> +{ "PIIX4_PM", "acpi-pci-hotplug-with-bridge-support", "off" },
> +{ "PIIX4_PM", "acpi-root-pci-hotplug", "off" },
> +{ "PIIX4_PM", "x-not-migrate-acpi-index", "true" },
> +};
> +const size_t malta_compat_len = G_N_ELEMENTS(malta_compat);
> +
>  static void mips_malta_machine_init(MachineClass *mc)
>  {
>  mc->desc = "MIPS Malta Core LV";
> @@ -1455,6 +1463,7 @@ static void mips_malta_machine_init(MachineClass *mc)
>  mc->default_cpu_type = MIPS_CPU_TYPE_NAME("24Kf");
>  #endif
>  mc->default_ram_id = "mips_malta.ram";
> +compat_props_add(mc->compat_props, malta_compat, malta_compat_len);
>  }
>  
>  DEFINE_MACHINE("malta", mips_malta_machine_init)
> -- 
> 2.31.1




Re: [PATCH for-7.1] hw/misc/grlib_ahb_apb_pnp: Support 8 and 16 bit accesses

2022-08-03 Thread Philippe Mathieu-Daudé via
On Tue, Aug 2, 2022 at 4:33 PM Peter Maydell  wrote:
>
> On Tue, 2 Aug 2022 at 15:20, Konrad, Frederic  wrote:
> >
> > Hi Peter,
> >
> > CC'ing Philippe.
> >
> > > -Original Message-
> > > From: Qemu-devel  > > bounces+fkonrad=amd@nongnu.org> On Behalf Of Peter Maydell
> > > Sent: 02 August 2022 14:19
> > > To: qemu-devel@nongnu.org
> > > Cc: Fabien Chouteau ; Frederic Konrad
> > > 
> > > Subject: [PATCH for-7.1] hw/misc/grlib_ahb_apb_pnp: Support 8 and 16 bit
> > > accesses
> > >
> > > In real hardware, the APB and AHB PNP data tables can be accessed
> > > with byte and halfword reads as well as word reads.  Our
> > > implementation currently only handles word reads.  Add support for
> > > the 8 and 16 bit accesses.  Note that we only need to handle aligned
> > > accesses -- unaligned accesses should continue to trap, as happens on
> > > hardware.
> > >
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1132
> > > Signed-off-by: Peter Maydell 
> > > ---
> > > It would be nice if we could just set the .valid.min_access_size in
> > > the MemoryRegionOps to 1 and have the memory system core synthesize
> > > the 1 and 2 byte accesses from a 4 byte read, but currently that
> > > doesn't work (see various past mailing list threads).

Hmm sorry I missed the past threads, the one I remember is about
unaligned accesses
(https://lore.kernel.org/qemu-devel/20170630030058.28943-1-and...@aj.id.au/).

> > That looks good to me but I thought this was fixed by 1a5a5570 and 0fbe394a
> > because RTEMS do bytes accesses?
> >
> > Did that break at some point?
>
> I definitely tried letting the .impl vs .valid settings handle this,
> but the access_with_adjusted_size() code doesn't do the right thing.
> (In particular, the test case ELF in the bug report works with
> this patch, and doesn't work without it...)
>
> I'm pretty sure the problem with access_with_adjusted_size() is a
> long-standing bug -- I found a couple of mailing list threads about
> it. We really ought to fix that properly, but that's definitely not
> for-7.1 material.

I agree this is sufficient for 7.1, so:
Reviewed-by: Philippe Mathieu-Daudé 

But I'd rather keep simple implementations (.impl) and use .valid fields,
adjusting with access_with_adjusted_size().
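
As a rough sketch of what synthesizing the narrow accesses from a word read involves (assuming a big-endian register table and aligned accesses only; subword_read_be() is a hypothetical helper, not code from the patch or from the memory core):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the `size`-byte value at byte offset `addr`
 * within a table of 32-bit registers, treating each word as big-endian.
 * Only aligned 1-, 2- and 4-byte accesses are supported.
 */
static uint32_t subword_read_be(const uint32_t *regs, uint32_t addr,
                                unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    assert((addr & (size - 1)) == 0);   /* aligned accesses only */

    uint32_t word = regs[addr >> 2];
    /* Big-endian: byte offset 0 is the most significant byte. */
    unsigned shift = (4 - size - (addr & 3)) * 8;
    uint32_t mask = (size == 4) ? 0xFFFFFFFFu : ((1u << (size * 8)) - 1);

    return (word >> shift) & mask;
}
```

The device then only needs a word-sized read path; the shift and mask are the part that a generic size-adjusting layer has to get right for each endianness.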

(At least we now have an ELF reproducer.)

One problem I remember is when PCI is involved.
Here I could only get MMIO working, but not PCI:
https://lore.kernel.org/qemu-devel/20200817161853.593247-1-f4...@amsat.org/

Commit 98f52cdbb5 ("memory: Fix access_with_adjusted_size(small size)
on big-endian memory regions") might be incomplete...

Regards,

Phil.



Re: [PATCH v7 2/3] target/riscv: Add stimecmp support

2022-08-03 Thread Atish Kumar Patra
On Wed, Aug 3, 2022 at 3:26 AM Ben Dooks  wrote:

> On 03/08/2022 09:25, Atish Patra wrote:
> > stimecmp allows the supervisor mode to update stimecmp CSR directly
> > to program the next timer interrupt. This CSR is part of the Sstc
> > extension which was ratified recently.
> >
> > Signed-off-by: Atish Patra 
> > ---
> >   target/riscv/cpu.c | 12 +
> >   target/riscv/cpu.h |  5 ++
> >   target/riscv/cpu_bits.h|  4 ++
> >   target/riscv/csr.c | 81 +++
> >   target/riscv/machine.c |  1 +
> >   target/riscv/meson.build   |  3 +-
> >   target/riscv/time_helper.c | 98 ++
> >   target/riscv/time_helper.h | 30 
> >   8 files changed, 233 insertions(+), 1 deletion(-)
> >   create mode 100644 target/riscv/time_helper.c
> >   create mode 100644 target/riscv/time_helper.h
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index d4635c7df46b..e0c3e786849f 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -23,6 +23,7 @@
> >   #include "qemu/log.h"
> >   #include "cpu.h"
> >   #include "internals.h"
> > +#include "time_helper.h"
> >   #include "exec/exec-all.h"
> >   #include "qapi/error.h"
> >   #include "qemu/error-report.h"
> > @@ -99,6 +100,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
> >   ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
> >   ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
> >   ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0,
> ext_zhinxmin),
> > +ISA_EXT_DATA_ENTRY(sstc, true, PRIV_VERSION_1_12_0, ext_sstc),
> >   ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0,
> ext_svinval),
> >   ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0,
> ext_svnapot),
> >   ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
> > @@ -675,6 +677,13 @@ static void riscv_cpu_realize(DeviceState *dev,
> Error **errp)
> >
> >   set_resetvec(env, cpu->cfg.resetvec);
> >
> > +#ifndef CONFIG_USER_ONLY
> > +if (cpu->cfg.ext_sstc) {
> > +riscv_timer_init(cpu);
> > +}
> > +#endif /* CONFIG_USER_ONLY */
> > +
> > +
> >   /* Validate that MISA_MXL is set properly. */
> >   switch (env->misa_mxl_max) {
> >   #ifdef TARGET_RISCV64
> > @@ -968,7 +977,9 @@ static void riscv_cpu_init(Object *obj)
> >   #ifndef CONFIG_USER_ONLY
> >   qdev_init_gpio_in(DEVICE(cpu), riscv_cpu_set_irq,
> > IRQ_LOCAL_MAX + IRQ_LOCAL_GUEST_MAX);
> > +
> >   #endif /* CONFIG_USER_ONLY */
> > +
> >   }
> >
> >   static Property riscv_cpu_extensions[] = {
> > @@ -995,6 +1006,7 @@ static Property riscv_cpu_extensions[] = {
> >   DEFINE_PROP_BOOL("Zve64f", RISCVCPU, cfg.ext_zve64f, false),
> >   DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
> >   DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
> > +DEFINE_PROP_BOOL("sstc", RISCVCPU, cfg.ext_sstc, true),
> >
> >   DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
> >   DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 0fae1569945c..4cda2905661e 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -309,6 +309,9 @@ struct CPUArchState {
> >   uint64_t mfromhost;
> >   uint64_t mtohost;
> >
> > +/* Sstc CSRs */
> > +uint64_t stimecmp;
> > +
> >   /* physical memory protection */
> >   pmp_table_t pmp_state;
> >   target_ulong mseccfg;
> > @@ -362,6 +365,7 @@ struct CPUArchState {
> >   float_status fp_status;
> >
> >   /* Fields from here on are preserved across CPU reset. */
> > +QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
> >
> >   hwaddr kernel_addr;
> >   hwaddr fdt_addr;
> > @@ -425,6 +429,7 @@ struct RISCVCPUConfig {
> >   bool ext_ifencei;
> >   bool ext_icsr;
> >   bool ext_zihintpause;
> > +bool ext_sstc;
> >   bool ext_svinval;
> >   bool ext_svnapot;
> >   bool ext_svpbmt;
> > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> > index 6be5a9e9f046..ac17cf1515c0 100644
> > --- a/target/riscv/cpu_bits.h
> > +++ b/target/riscv/cpu_bits.h
> > @@ -206,6 +206,10 @@
> >   #define CSR_STVAL   0x143
> >   #define CSR_SIP 0x144
> >
> > +/* Sstc supervisor CSRs */
> > +#define CSR_STIMECMP0x14D
> > +#define CSR_STIMECMPH   0x15D
> > +
> >   /* Supervisor Protection and Translation */
> >   #define CSR_SPTBR   0x180
> >   #define CSR_SATP0x180
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 0fb042b2fd0f..b71e2509b64f 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -22,6 +22,7 @@
> >   #include "qemu/timer.h"
> >   #include "cpu.h"
> >   #include "pmu.h"
> > +#include "time_helper.h"
> >   #include "qemu/main-loop.h"
> >   #include "exec/exec-all.h"
> >   #include 

Re: [PATCH v7 2/3] target/riscv: Add stimecmp support

2022-08-03 Thread Atish Kumar Patra
On Wed, Aug 3, 2022 at 1:42 AM Weiwei Li  wrote:

>
> On 2022/8/3 at 4:25 PM, Atish Patra wrote:
> > stimecmp allows the supervisor mode to update stimecmp CSR directly
> > to program the next timer interrupt. This CSR is part of the Sstc
> > extension which was ratified recently.
> >
> > Signed-off-by: Atish Patra 
> > ---
> >   target/riscv/cpu.c | 12 +
> >   target/riscv/cpu.h |  5 ++
> >   target/riscv/cpu_bits.h|  4 ++
> >   target/riscv/csr.c | 81 +++
> >   target/riscv/machine.c |  1 +
> >   target/riscv/meson.build   |  3 +-
> >   target/riscv/time_helper.c | 98 ++
> >   target/riscv/time_helper.h | 30 
> >   8 files changed, 233 insertions(+), 1 deletion(-)
> >   create mode 100644 target/riscv/time_helper.c
> >   create mode 100644 target/riscv/time_helper.h
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index d4635c7df46b..e0c3e786849f 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -23,6 +23,7 @@
> >   #include "qemu/log.h"
> >   #include "cpu.h"
> >   #include "internals.h"
> > +#include "time_helper.h"
> >   #include "exec/exec-all.h"
> >   #include "qapi/error.h"
> >   #include "qemu/error-report.h"
> > @@ -99,6 +100,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
> >   ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
> >   ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
> >   ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0,
> ext_zhinxmin),
> > +ISA_EXT_DATA_ENTRY(sstc, true, PRIV_VERSION_1_12_0, ext_sstc),
> >   ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0,
> ext_svinval),
> >   ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0,
> ext_svnapot),
> >   ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
> > @@ -675,6 +677,13 @@ static void riscv_cpu_realize(DeviceState *dev,
> Error **errp)
> >
> >   set_resetvec(env, cpu->cfg.resetvec);
> >
> > +#ifndef CONFIG_USER_ONLY
> > +if (cpu->cfg.ext_sstc) {
> > +riscv_timer_init(cpu);
> > +}
> > +#endif /* CONFIG_USER_ONLY */
> > +
> > +
>
> Multiple blank lines here.
>
>
Fixed it.


> Regards,
>
> Weiwei Li
>
> >   /* Validate that MISA_MXL is set properly. */
> >   switch (env->misa_mxl_max) {
> >   #ifdef TARGET_RISCV64
> > @@ -968,7 +977,9 @@ static void riscv_cpu_init(Object *obj)
> >   #ifndef CONFIG_USER_ONLY
> >   qdev_init_gpio_in(DEVICE(cpu), riscv_cpu_set_irq,
> > IRQ_LOCAL_MAX + IRQ_LOCAL_GUEST_MAX);
> > +
> >   #endif /* CONFIG_USER_ONLY */
> > +
> >   }
> >
> >   static Property riscv_cpu_extensions[] = {
> > @@ -995,6 +1006,7 @@ static Property riscv_cpu_extensions[] = {
> >   DEFINE_PROP_BOOL("Zve64f", RISCVCPU, cfg.ext_zve64f, false),
> >   DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
> >   DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
> > +DEFINE_PROP_BOOL("sstc", RISCVCPU, cfg.ext_sstc, true),
> >
> >   DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
> >   DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 0fae1569945c..4cda2905661e 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -309,6 +309,9 @@ struct CPUArchState {
> >   uint64_t mfromhost;
> >   uint64_t mtohost;
> >
> > +/* Sstc CSRs */
> > +uint64_t stimecmp;
> > +
> >   /* physical memory protection */
> >   pmp_table_t pmp_state;
> >   target_ulong mseccfg;
> > @@ -362,6 +365,7 @@ struct CPUArchState {
> >   float_status fp_status;
> >
> >   /* Fields from here on are preserved across CPU reset. */
> > +QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
> >
> >   hwaddr kernel_addr;
> >   hwaddr fdt_addr;
> > @@ -425,6 +429,7 @@ struct RISCVCPUConfig {
> >   bool ext_ifencei;
> >   bool ext_icsr;
> >   bool ext_zihintpause;
> > +bool ext_sstc;
> >   bool ext_svinval;
> >   bool ext_svnapot;
> >   bool ext_svpbmt;
> > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> > index 6be5a9e9f046..ac17cf1515c0 100644
> > --- a/target/riscv/cpu_bits.h
> > +++ b/target/riscv/cpu_bits.h
> > @@ -206,6 +206,10 @@
> >   #define CSR_STVAL   0x143
> >   #define CSR_SIP 0x144
> >
> > +/* Sstc supervisor CSRs */
> > +#define CSR_STIMECMP0x14D
> > +#define CSR_STIMECMPH   0x15D
> > +
> >   /* Supervisor Protection and Translation */
> >   #define CSR_SPTBR   0x180
> >   #define CSR_SATP0x180
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 0fb042b2fd0f..b71e2509b64f 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -22,6 +22,7 @@
> >   #include "qemu/timer.h"
> >   #include "cpu.h"
> >   #include "pmu.h"
> > +#include "time_helper.h"
> >   #include 

Re: [PATCH v7 3/3] target/riscv: Add vstimecmp support

2022-08-03 Thread Atish Kumar Patra
On Wed, Aug 3, 2022 at 1:49 AM Weiwei Li  wrote:

>
> On 2022/8/3 at 4:25 PM, Atish Patra wrote:
> > vstimecmp CSR allows the guest OS to program the next guest timer
> > interrupt directly. Thus, the hypervisor no longer needs to inject the
> > timer interrupt into the guest if vstimecmp is used. This was ratified
> > as a part of the Sstc extension.
> >
> > Signed-off-by: Atish Patra 
> > ---
> >   target/riscv/cpu.h |   4 ++
> >   target/riscv/cpu_bits.h|   4 ++
> >   target/riscv/cpu_helper.c  |  11 ++--
> >   target/riscv/csr.c | 100 -
> >   target/riscv/machine.c |   1 +
> >   target/riscv/time_helper.c |  16 ++
> >   6 files changed, 131 insertions(+), 5 deletions(-)
> >
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 4cda2905661e..1fd382b2717f 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -312,6 +312,8 @@ struct CPUArchState {
> >   /* Sstc CSRs */
> >   uint64_t stimecmp;
> >
> > +uint64_t vstimecmp;
> > +
> >   /* physical memory protection */
> >   pmp_table_t pmp_state;
> >   target_ulong mseccfg;
> > @@ -366,6 +368,8 @@ struct CPUArchState {
> >
> >   /* Fields from here on are preserved across CPU reset. */
> >   QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
> > +QEMUTimer *vstimer; /* Internal timer for VS-mode interrupt */
> > +bool vstime_irq;
> >
> >   hwaddr kernel_addr;
> >   hwaddr fdt_addr;
> > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> > index ac17cf1515c0..095dab19f512 100644
> > --- a/target/riscv/cpu_bits.h
> > +++ b/target/riscv/cpu_bits.h
> > @@ -257,6 +257,10 @@
> >   #define CSR_VSIP0x244
> >   #define CSR_VSATP   0x280
> >
> > +/* Sstc virtual CSRs */
> > +#define CSR_VSTIMECMP   0x24D
> > +#define CSR_VSTIMECMPH  0x25D
> > +
> >   #define CSR_MTINST  0x34a
> >   #define CSR_MTVAL2  0x34b
> >
> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> > index 650574accf0a..1e4faa84e839 100644
> > --- a/target/riscv/cpu_helper.c
> > +++ b/target/riscv/cpu_helper.c
> > @@ -345,8 +345,9 @@ uint64_t riscv_cpu_all_pending(CPURISCVState *env)
> >   {
> >   uint32_t gein = get_field(env->hstatus, HSTATUS_VGEIN);
> >   uint64_t vsgein = (env->hgeip & (1ULL << gein)) ? MIP_VSEIP : 0;
> > +uint64_t vstip = (env->vstime_irq) ? MIP_VSTIP : 0;
> >
> > -return (env->mip | vsgein) & env->mie;
> > +return (env->mip | vsgein | vstip) & env->mie;
> >   }
> >
> >   int riscv_cpu_mirq_pending(CPURISCVState *env)
> > @@ -605,7 +606,7 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu,
> uint64_t mask, uint64_t value)
> >   {
> >   CPURISCVState *env = &cpu->env;
> >   CPUState *cs = CPU(cpu);
> > -uint64_t gein, vsgein = 0, old = env->mip;
> > +uint64_t gein, vsgein = 0, vstip = 0, old = env->mip;
> >   bool locked = false;
> >
> >   if (riscv_cpu_virt_enabled(env)) {
> > @@ -613,6 +614,10 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu,
> uint64_t mask, uint64_t value)
> >   vsgein = (env->hgeip & (1ULL << gein)) ? MIP_VSEIP : 0;
> >   }
> >
> > +/* No need to update mip for VSTIP */
> > +mask = ((mask == MIP_VSTIP) && env->vstime_irq) ? 0 : mask;
> > +vstip = env->vstime_irq ? MIP_VSTIP : 0;
> > +
> >   if (!qemu_mutex_iothread_locked()) {
> >   locked = true;
> >   qemu_mutex_lock_iothread();
> > @@ -620,7 +625,7 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu,
> uint64_t mask, uint64_t value)
> >
> >   env->mip = (env->mip & ~mask) | (value & mask);
> >
> > -if (env->mip | vsgein) {
> > +if (env->mip | vsgein | vstip) {
> >   cpu_interrupt(cs, CPU_INTERRUPT_HARD);
> >   } else {
> >   cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index b71e2509b64f..d4265dd3cca2 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -833,17 +833,98 @@ static RISCVException sstc(CPURISCVState *env, int
> csrno)
> >   return RISCV_EXCP_NONE;
> >   }
> >
> > +static RISCVException sstc_hmode(CPURISCVState *env, int csrno)
> > +{
> > +CPUState *cs = env_cpu(env);
> > +RISCVCPU *cpu = RISCV_CPU(cs);
> > +
> > +if (!cpu->cfg.ext_sstc || !env->rdtime_fn) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> > +if (env->priv == PRV_M) {
> > +return RISCV_EXCP_NONE;
> > +}
> > +
> > +if (!(get_field(env->mcounteren, COUNTEREN_TM) &
> > +  get_field(env->menvcfg, MENVCFG_STCE))) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> > +if (!(get_field(env->hcounteren, COUNTEREN_TM) &
> > +  get_field(env->henvcfg, HENVCFG_STCE))) {
> > +return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > +}
> > +
> I think hcounteren only applies to VS mode here, so we should add a check
> that virt mode is enabled.
>
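
One reading of the review comment above can be sketched as follows. This is a minimal sketch, not the code from the patch: the struct, the enum values and the function name are hypothetical stand-ins for the QEMU state the real predicate consults. The only point is that the hcounteren/henvcfg gate is applied when virtualization mode is on, and skipped otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU state consulted by the check. */
enum sketch_priv { SK_PRV_U = 0, SK_PRV_S = 1, SK_PRV_M = 3 };
enum sketch_excp { SK_EXCP_NONE, SK_EXCP_ILLEGAL_INST, SK_EXCP_VIRT_FAULT };

struct sketch_env {
    bool ext_sstc;                     /* Sstc extension enabled */
    bool have_rdtime;                  /* a time source is registered */
    enum sketch_priv priv;             /* current privilege level */
    bool virt;                         /* virtualization mode (V=1) */
    bool mcounteren_tm, menvcfg_stce;  /* M-level enables */
    bool hcounteren_tm, henvcfg_stce;  /* HS-level enables */
};

static enum sketch_excp sketch_sstc_check(const struct sketch_env *env)
{
    assert(env != NULL);

    if (!env->ext_sstc || !env->have_rdtime) {
        return SK_EXCP_ILLEGAL_INST;
    }
    if (env->priv == SK_PRV_M) {
        return SK_EXCP_NONE;           /* M-mode is never gated */
    }
    if (!(env->mcounteren_tm && env->menvcfg_stce)) {
        return SK_EXCP_ILLEGAL_INST;
    }
    /* The hypervisor-level gate is only consulted with V=1. */
    if (env->virt && !(env->hcounteren_tm && env->henvcfg_stce)) {
        return SK_EXCP_VIRT_FAULT;
    }
    return SK_EXCP_NONE;
}
```

With this ordering an access from HS-mode (V=0) is not rejected just because hcounteren.TM or henvcfg.STCE is clear.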

vstimecmp can 

Re: [RFC PATCH 1/3] target/ppc: Bugfix fadd/fsub result with OE/UE set

2022-08-03 Thread Lucas Mateus Martins Araujo e Castro


On 03/08/2022 15:16, Richard Henderson wrote:


On 8/3/22 10:45, Lucas Mateus Martins Araujo e Castro wrote:


On 03/08/2022 13:18, Richard Henderson wrote:


On 8/3/22 05:22, Lucas Mateus Castro(alqotel) wrote:

From: "Lucas Mateus Castro (alqotel)" 

As mentioned in the functions float_overflow_excp and
float_underflow_excp, the result should be adjusted as mentioned in the
ISA (subtracted 192/1536 from the exponent of the intermediate result if
an overflow occurs with OE set and added 192/1536 to the exponent of the
intermediate result if an underflow occurs with UE set), but at those
functions the result has already been rounded so it is not possible to
add/subtract from the intermediate result anymore.

This patch creates a new function that receives the value that should be
subtracted/added from the exponent if an overflow/underflow happens, to
not leave some arbitrary numbers from the PowerISA in the middle of the
FPU code. If these numbers are 0 the new functions just call the old
ones.

I used 2 values here for overflow and underflow, maybe it'd be better to
just use the same ones, any thoughts?

Signed-off-by: Lucas Mateus Castro (alqotel) 


---
An alternative I've thought of was to always return the adjusted value if an
overflow or underflow occurs and in float_underflow_excp and
float_overflow_excp adjust it to inf/den/0 if OE/UE is 0, but I didn't
see many advantages to that approach.
---
  fpu/softfloat.c | 75 +
  include/fpu/softfloat.h |  2 ++
  target/ppc/fpu_helper.c | 10 --
  3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4a871ef2a1..a407129dcb 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -268,6 +268,8 @@ typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);

  typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
  typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
+typedef float64 (*soft_f64_op2_int2_fn)(float64 a, float64 b, int c, int d,
+    float_status *s);
  typedef float   (*hard_f32_op2_fn)(float a, float b);
  typedef double  (*hard_f64_op2_fn)(double a, double b);

@@ -401,6 +403,19 @@ float64_gen2(float64 xa, float64 xb, float_status *s,
  return soft(ua.s, ub.s, s);
  }

+static inline float64
+float64_gen2_excp(float64 xa, float64 xb, int xc, int xd, float_status *s,
+  hard_f64_op2_fn hard, soft_f64_op2_fn soft,
+  soft_f64_op2_int2_fn soft_excp, f64_check_fn pre,
+  f64_check_fn post)
+{
+    if (xc || xd) {
+    return soft_excp(xa, xb, xc, xd, s);
+    } else {
+    return float64_gen2(xa, xb, s, hard, soft, pre, post);
+    }
+}
+
  /*
   * Classify a floating point number. Everything above float_class_qnan
   * is a NaN so cls >= float_class_qnan is any NaN.
@@ -1929,6 +1944,39 @@ static double hard_f64_sub(double a, double b)
  return a - b;
  }

+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_addsub_excp_en(float64 a, float64 b, int oe_sub, int ue_sum,
+    float_status *status, bool subtract)
+{
+    FloatParts64 pa, pb, *pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
+
+    if (unlikely(oe_sub && (pr->exp > 1023))) {
+    pr->exp -= oe_sub;
+    float_raise(float_flag_overflow, status);
+    } else if (unlikely(ue_sum && (pr->exp < -1022))) {
+    pr->exp += ue_sum;
+    float_raise(float_flag_underflow, status);
+    }
+
+    return float64_round_pack_canonical(pr, status);


This is incorrect, because the exponent is not fixed until the middle of
round_pack_canonical.

I think you should not add new functions like this, with new parameters, but
instead add fields to float_status, which would then be checked at the places
currently setting underflow and overflow.


So add overflow_correction and underflow_correction in 
'partsN(uncanon_normal)' so that:


if (exp >= exp_max) {
 if (overflow_correction != 0) {
 exp -= overflow_correction;
 }
}

And the equivalent for underflow, or a bool ppc_overflow_enable that 
uses a fixed value like:


if (exp >= exp_max) {
 if (ppc_overflow_enable) {
 exp -= ((fmt->exp_bias + 1) + (fmt->exp_bias + 1)/2);
 }
}

(and the equivalent for underflow) ?


Something like that.

I would suggest pre-computing that adjustment into fmt, via FLOAT_PARAMS.
Naming is always hard, but how about exp_re_bias?

The flag(s) should not contain "ppc" in the name.  But perhaps

  s->rebias_overflow
  s->rebias_underflow


rebias_* sounds good to me.

Also I imagine that these bools would be set by mtfsf, mtfsfi, mtfsb0 
and mtfsb1, in which case it'd make these patches significantly shorter. 
I'll send a v2 with these changes
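
To illustrate the idea of letting the FPSCR-writing instructions drive the flags, here is a minimal sketch. It assumes OE and UE sit at bits 6 and 5 counted from the least significant bit of the FPSCR, and it uses a hypothetical stand-in struct rather than the real float_status; the helper name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, counted from the least significant bit. */
#define SKETCH_FPSCR_UE 5
#define SKETCH_FPSCR_OE 6

/* Hypothetical subset of float_status carrying the proposed flags. */
struct sketch_float_status {
    bool rebias_overflow;
    bool rebias_underflow;
};

/*
 * Mirror the enable bits into the softfloat-side flags. A helper like
 * this would be called after every instruction that writes the FPSCR.
 */
static void sketch_sync_rebias(uint64_t fpscr, struct sketch_float_status *s)
{
    assert(s != NULL);
    s->rebias_overflow = (fpscr >> SKETCH_FPSCR_OE) & 1;
    s->rebias_underflow = (fpscr >> SKETCH_FPSCR_UE) & 1;
}
```

Keeping the synchronisation in one place means the arithmetic helpers themselves would not need extra parameters.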






r~

--
Lucas Mateus M. Araujo e Castro
Instituto de Pesquisas 

Re: [RFC PATCH 1/3] target/ppc: Bugfix fadd/fsub result with OE/UE set

2022-08-03 Thread Richard Henderson

On 8/3/22 10:45, Lucas Mateus Martins Araujo e Castro wrote:


On 03/08/2022 13:18, Richard Henderson wrote:


On 8/3/22 05:22, Lucas Mateus Castro(alqotel) wrote:

From: "Lucas Mateus Castro (alqotel)" 

As mentioned in the functions float_overflow_excp and
float_underflow_excp, the result should be adjusted as mentioned in the
ISA (subtracted 192/1536 from the exponent of the intermediate result if
an overflow occurs with OE set and added 192/1536 to the exponent of the
intermediate result if an underflow occurs with UE set), but at those
functions the result has already been rounded so it is not possible to
add/subtract from the intermediate result anymore.

This patch creates a new function that receives the value that should be
subtracted/added from the exponent if an overflow/underflow happens, to
not leave some arbitrary numbers from the PowerISA in the middle of the
FPU code. If these numbers are 0 the new functions just call the old
ones.

I used 2 values here for overflow and underflow, maybe it'd be better to
just use the same ones, any thoughts?

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
An alternative I've thought of was to always return the adjusted value if an
overflow or underflow occurs and in float_underflow_excp and
float_overflow_excp adjust it to inf/den/0 if OE/UE is 0, but I didn't
see many advantages to that approach.
---
  fpu/softfloat.c | 75 +
  include/fpu/softfloat.h |  2 ++
  target/ppc/fpu_helper.c | 10 --
  3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4a871ef2a1..a407129dcb 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -268,6 +268,8 @@ typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);

  typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
  typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
+typedef float64 (*soft_f64_op2_int2_fn)(float64 a, float64 b, int c, int d,
+    float_status *s);
  typedef float   (*hard_f32_op2_fn)(float a, float b);
  typedef double  (*hard_f64_op2_fn)(double a, double b);

@@ -401,6 +403,19 @@ float64_gen2(float64 xa, float64 xb, float_status *s,
  return soft(ua.s, ub.s, s);
  }

+static inline float64
+float64_gen2_excp(float64 xa, float64 xb, int xc, int xd, float_status *s,
+  hard_f64_op2_fn hard, soft_f64_op2_fn soft,
+  soft_f64_op2_int2_fn soft_excp, f64_check_fn pre,
+  f64_check_fn post)
+{
+    if (xc || xd) {
+    return soft_excp(xa, xb, xc, xd, s);
+    } else {
+    return float64_gen2(xa, xb, s, hard, soft, pre, post);
+    }
+}
+
  /*
   * Classify a floating point number. Everything above float_class_qnan
   * is a NaN so cls >= float_class_qnan is any NaN.
@@ -1929,6 +1944,39 @@ static double hard_f64_sub(double a, double b)
  return a - b;
  }

+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_addsub_excp_en(float64 a, float64 b, int oe_sub, int ue_sum,
+    float_status *status, bool subtract)
+{
+    FloatParts64 pa, pb, *pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
+
+    if (unlikely(oe_sub && (pr->exp > 1023))) {
+    pr->exp -= oe_sub;
+    float_raise(float_flag_overflow, status);
+    } else if (unlikely(ue_sum && (pr->exp < -1022))) {
+    pr->exp += ue_sum;
+    float_raise(float_flag_underflow, status);
+    }
+
+    return float64_round_pack_canonical(pr, status);


This is incorrect, because the exponent is not fixed until the middle of 
round_pack_canonical.


I think you should not add new functions like this, with new parameters, but
instead add fields to float_status, which would then be checked at the places
currently setting underflow and overflow.


So add overflow_correction and underflow_correction in 'partsN(uncanon_normal)' 
so that:

if (exp >= exp_max) {
     if (overflow_correction != 0) {
         exp -= overflow_correction;
     }
}

And the equivalent for underflow, or a bool ppc_overflow_enable that uses a 
fixed value like:

if (exp >= exp_max) {
     if (ppc_overflow_enable) {
         exp -= ((fmt->exp_bias + 1) + (fmt->exp_bias + 1)/2);
     }
}

(and the equivalent for underflow) ?


Something like that.

I would suggest pre-computing that adjustment into fmt, via FLOAT_PARAMS.
Naming is always hard, but how about exp_re_bias?

The flag(s) should not contain "ppc" in the name.  But perhaps

  s->rebias_overflow
  s->rebias_underflow
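
The pre-computation suggested above could look roughly like this. It is only a sketch: the struct and macro are hypothetical and much smaller than the real format descriptor and FLOAT_PARAMS; the field name exp_re_bias follows the suggestion in this thread, and the formula is the one quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down format descriptor. */
struct sketch_fmt {
    int exp_bias;
    int exp_re_bias;   /* adjustment applied on a trapped over/underflow */
};

/* Compute the adjustment once per format instead of at every use. */
#define SKETCH_FMT(BIAS) \
    { .exp_bias = (BIAS), .exp_re_bias = ((BIAS) + 1) + ((BIAS) + 1) / 2 }

static const struct sketch_fmt sketch_float32 = SKETCH_FMT(127);
static const struct sketch_fmt sketch_float64 = SKETCH_FMT(1023);

/* Overflow with re-biasing enabled: pull the exponent back into range. */
static int sketch_rebias_overflow(int exp, const struct sketch_fmt *fmt)
{
    assert(fmt != NULL);
    return exp - fmt->exp_re_bias;
}

/* Underflow with re-biasing enabled: push the exponent up into range. */
static int sketch_rebias_underflow(int exp, const struct sketch_fmt *fmt)
{
    assert(fmt != NULL);
    return exp + fmt->exp_re_bias;
}
```

For biases 127 and 1023 the formula yields 192 and 1536, the two constants named in the commit message, so no PowerISA-specific numbers need to appear in the generic code.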


r~



Re: [PATCH v2 07/20] ppc/ppc405: QOM'ify CPC

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

Introduce a QOM property "cpu" to initialize the DCR handlers. This is
a pattern that we will reuse for all the other 405 devices needing it.

Now that all clock settings are handled at the CPC level, change the
SoC "sys-clk" property to be an alias on the same property in the CPC
model.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405.h|  39 +++-
hw/ppc/ppc405_uc.c | 109 +++--
2 files changed, 85 insertions(+), 63 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index ae64549537c6..88c63774d9ba 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -63,6 +63,43 @@ struct ppc4xx_bd_info_t {
uint32_t bi_iic_fast[2];
};

+typedef struct Ppc405SoCState Ppc405SoCState;


This typedef is already done by the OBJECT_DECLARE_SIMPLE_TYPE macro 
below. Could some compilers complain about double typedef? There may be 
some circular dependencies here so to avoid a separate typedef you may 
need to bring the OBJECT_DECLARE_SIMPLE_TYPE(Ppc405SoCState, PPC405_SOC); 
line up here to the front while keeping the actual declaration of the 
state struct and rest of the object later which separates them but adding 
a comment may explain that. I'm not sure if it's better to do that or 
repeating the typedef in advance as done here is better but declaring the 
object in advance is probably a bit cleaner than repeating part of its 
internals just in case this implementation detail ever changes.
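
For illustration, the "declare early, define later" arrangement could be sketched as below. The macro is a hypothetical stand-in that only emits the typedef (the real OBJECT_DECLARE_SIMPLE_TYPE does more), and the type names are made up. As background, C11 permits repeating an identical typedef while older C standards do not, which is why a single early declaration avoids any question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the typedef emitted by an object-declaration macro. */
#define SKETCH_DECLARE_TYPE(T) typedef struct T T

/* Both types are declared up front, exactly once each. */
SKETCH_DECLARE_TYPE(SketchSoC);
SKETCH_DECLARE_TYPE(SketchCpc);

/* The CPC may refer to the SoC type before the SoC struct is defined. */
struct SketchCpc {
    SketchSoC *soc;
    uint32_t sysclk;
};

/* The SoC definition follows later and embeds the CPC by value. */
struct SketchSoC {
    SketchCpc cpc;
};

static uint32_t sketch_soc_sysclk(const SketchSoC *soc)
{
    assert(soc != NULL);
    return soc->cpc.sysclk;
}
```

The cost is that the declaration and the struct definition end up apart, so a short comment at the early declaration helps.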


Regards,
BALATON Zoltan


+
+#define TYPE_PPC405_CPC "ppc405-cpc"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405CpcState, PPC405_CPC);
+
+enum {
+PPC405EP_CPU_CLK   = 0,
+PPC405EP_PLB_CLK   = 1,
+PPC405EP_OPB_CLK   = 2,
+PPC405EP_EBC_CLK   = 3,
+PPC405EP_MAL_CLK   = 4,
+PPC405EP_PCI_CLK   = 5,
+PPC405EP_UART0_CLK = 6,
+PPC405EP_UART1_CLK = 7,
+PPC405EP_CLK_NB= 8,
+};
+
+struct Ppc405CpcState {
+DeviceState parent_obj;
+
+PowerPCCPU *cpu;
+
+uint32_t sysclk;
+clk_setup_t clk_setup[PPC405EP_CLK_NB];
+uint32_t boot;
+uint32_t epctl;
+uint32_t pllmr[2];
+uint32_t ucr;
+uint32_t srr;
+uint32_t jtagid;
+uint32_t pci;
+/* Clock and power management */
+uint32_t er;
+uint32_t fr;
+uint32_t sr;
+};
+
#define TYPE_PPC405_SOC "ppc405-soc"
OBJECT_DECLARE_SIMPLE_TYPE(Ppc405SoCState, PPC405_SOC);

@@ -79,9 +116,9 @@ struct Ppc405SoCState {
MemoryRegion *dram_mr;
hwaddr ram_size;

-uint32_t sysclk;
PowerPCCPU cpu;
DeviceState *uic;
+Ppc405CpcState cpc;
};

/* PowerPC 405 core */
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 013dccee898b..32bfc9480bc6 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -1178,36 +1178,7 @@ enum {
#endif
};

-enum {
-PPC405EP_CPU_CLK   = 0,
-PPC405EP_PLB_CLK   = 1,
-PPC405EP_OPB_CLK   = 2,
-PPC405EP_EBC_CLK   = 3,
-PPC405EP_MAL_CLK   = 4,
-PPC405EP_PCI_CLK   = 5,
-PPC405EP_UART0_CLK = 6,
-PPC405EP_UART1_CLK = 7,
-PPC405EP_CLK_NB= 8,
-};
-
-typedef struct ppc405ep_cpc_t ppc405ep_cpc_t;
-struct ppc405ep_cpc_t {
-uint32_t sysclk;
-clk_setup_t clk_setup[PPC405EP_CLK_NB];
-uint32_t boot;
-uint32_t epctl;
-uint32_t pllmr[2];
-uint32_t ucr;
-uint32_t srr;
-uint32_t jtagid;
-uint32_t pci;
-/* Clock and power management */
-uint32_t er;
-uint32_t fr;
-uint32_t sr;
-};
-
-static void ppc405ep_compute_clocks (ppc405ep_cpc_t *cpc)
+static void ppc405ep_compute_clocks(Ppc405CpcState *cpc)
{
uint32_t CPU_clk, PLB_clk, OPB_clk, EBC_clk, MAL_clk, PCI_clk;
uint32_t UART0_clk, UART1_clk;
@@ -1302,10 +1273,9 @@ static void ppc405ep_compute_clocks (ppc405ep_cpc_t *cpc)

static uint32_t dcr_read_epcpc (void *opaque, int dcrn)
{
-ppc405ep_cpc_t *cpc;
+Ppc405CpcState *cpc = PPC405_CPC(opaque);
uint32_t ret;

-cpc = opaque;
switch (dcrn) {
case PPC405EP_CPC0_BOOT:
ret = cpc->boot;
@@ -1342,9 +1312,8 @@ static uint32_t dcr_read_epcpc (void *opaque, int dcrn)

static void dcr_write_epcpc (void *opaque, int dcrn, uint32_t val)
{
-ppc405ep_cpc_t *cpc;
+Ppc405CpcState *cpc = PPC405_CPC(opaque);

-cpc = opaque;
switch (dcrn) {
case PPC405EP_CPC0_BOOT:
/* Read-only register */
@@ -1377,9 +1346,9 @@ static void dcr_write_epcpc (void *opaque, int dcrn, uint32_t val)
}
}

-static void ppc405ep_cpc_reset (void *opaque)
+static void ppc405_cpc_reset(DeviceState *dev)
{
-ppc405ep_cpc_t *cpc = opaque;
+Ppc405CpcState *cpc = PPC405_CPC(dev);

cpc->boot = 0x0010; /* Boot from PCI - IIC EEPROM disabled */
cpc->epctl = 0x00000000;
@@ -1391,21 +1360,24 @@ static void ppc405ep_cpc_reset (void *opaque)
cpc->er = 0x00000000;
cpc->fr = 0x00000000;
cpc->sr = 0x00000000;
+cpc->jtagid = 0x20267049;
ppc405ep_compute_clocks(cpc);
}

/* XXX: sysclk should be between 25 and 100 MHz 

Re: [PATCH for-7.1] hw/mips/malta: turn off x86 specific features of PIIX4_PM

2022-08-03 Thread Peter Maydell
On Wed, 3 Aug 2022 at 18:26, Bernhard Beschow  wrote:
>
> On Tue, Aug 2, 2022 at 8:37 AM Philippe Mathieu-Daudé via 
>  wrote:
>>
>> On 28/7/22 15:16, Igor Mammedov wrote:
>> > On Thu, 28 Jul 2022 13:29:07 +0100
>> > Peter Maydell  wrote:
>> >
>> >> On Thu, 28 Jul 2022 at 12:50, Igor Mammedov  wrote:
>> >>> Disable compiled out features using compat properties as the least
>> >>> risky way to deal with issue.
>>
>> So now MIPS is forced to use meaningless compat[] to satisfy X86.
>>
>> Am I wrong seeing this as a dirty hack creeping in, yet another
>> technical debt that will hit (me...) back in a close future?
>>
>> Are we sure there is no better solution (probably more time consuming
>> and involving refactors) we could do instead?
>
>
> Working on the consolidation of piix3 and -4 southbridges [1] I've stumbled 
> over certain design decisions where board/platform specific assumptions are 
> baked into the piix device models. I figure that's the core of the issue.
>
> In our case the ACPI functionality is implemented by inheritance while 
> perhaps it should be implemented using composition. With composition, the 
> ACPI functionality could be injected by the caller: The pc board would inject 
> it while the Malta board wouldn't. This would solve both the crash and above 
> design problem.
>
> I'd be willing to implement it but can't make any promises about the time 
> frame since I'm currently doing this in my free time. Any hints regarding the 
> implementation would be welcome, though.
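
The composition idea described above can be sketched in a few lines. Everything here is hypothetical (struct names, the single callback, the realize helper) and is not the PIIX4_PM interface; it only shows a power-management device that works with or without an injected ACPI component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical optional ACPI component supplied by the board. */
struct sketch_acpi_ops {
    void (*hotplug_init)(void *opaque);
};

struct sketch_pm_device {
    const struct sketch_acpi_ops *acpi;  /* NULL if nothing was injected */
    void *acpi_opaque;
    bool realized;
};

/* The device only touches ACPI functionality when the caller provided it. */
static void sketch_pm_realize(struct sketch_pm_device *pm,
                              const struct sketch_acpi_ops *acpi,
                              void *opaque)
{
    assert(pm != NULL);
    pm->acpi = acpi;
    pm->acpi_opaque = opaque;
    if (pm->acpi && pm->acpi->hotplug_init) {
        pm->acpi->hotplug_init(pm->acpi_opaque);
    }
    pm->realized = true;
}

/* Example component a PC-like board might inject; it only counts calls. */
static void sketch_count_init(void *opaque)
{
    *(int *)opaque += 1;
}

static const struct sketch_acpi_ops sketch_pc_acpi = {
    .hotplug_init = sketch_count_init,
};
```

A PC-like board would pass &sketch_pc_acpi, while a Malta-like board would pass NULL and never reach the x86-specific code, so no compat properties are needed to switch it off.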


For the 7.1 release (coming up real soon now) can we get consensus
on this patch from Igor as the least risky way to at least fix
the segfault ? We can look at better approaches for 7.2.

thanks
-- PMM



Re: [PATCH v1 00/40] TDX QEMU support

2022-08-03 Thread Daniel P . Berrangé
On Tue, Aug 02, 2022 at 06:55:48PM +0800, Xiaoyao Li wrote:
> On 8/2/2022 5:49 PM, Daniel P. Berrangé wrote:
> > On Tue, Aug 02, 2022 at 03:47:10PM +0800, Xiaoyao Li wrote:
> 
> > > - CPU model
> > > 
> > >   We cannot create a TD with an arbitrary CPU model as we can for
> > >   non-TDX VMs, because only a subset of features can be configured
> > >   for a TD.
> > >   - It's recommended to use '-cpu host' to create a TD;
> > >   - '+feature/-feature' might not work as expected;
> > > 
> > >   Future work: introduce a specific CPU model for TDs and enhance
> > >   +/-features for TDs.
> > 
> > Which features are incompatible with TDX ?
> 
> TDX enforces some features fixed to 1 (e.g., CPUID_EXT_X2APIC,
> CPUID_EXT_HYPERVISOR) and some fixed to 0 (e.g., CPUID_EXT_VMX).
> 
> Details can be found in patch 8 and TDX spec chapter "CPUID virtualization"
> 
> > Presumably you have such a list, so that KVM can block them when
> > using '-cpu host' ?
> 
> No, KVM doesn't do this. As a result, KVM reports no error, but what the
> TD OS sees from CPUID might differ from what the user specifies in QEMU.
> 
> > If so, we should be able to sanity check the
> > use of these features in QEMU for the named CPU models / feature
> > selection too.
> 
> This series enhances get_supported_cpuid() for TDX. If named CPU models are
> used to boot a TDX guest, it will likely get warnings of "xxx feature is not
> available".

If the ',check=on' arg is given to -cpu, does it ensure that the
guest fails to start up with an incompatible feature set? That's
really the key thing to protect the user from mistakes.
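
The kind of check being asked about can be sketched as a mask comparison. This is a sketch under assumptions: the helper names and the `enforce` parameter are hypothetical, and only the three CPUID.01H:ECX bits mentioned in this thread are modelled (VMX is bit 5, x2APIC bit 21, the hypervisor bit is bit 31).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit positions for the features named in the thread. */
#define SKETCH_ECX_VMX        (UINT32_C(1) << 5)
#define SKETCH_ECX_X2APIC     (UINT32_C(1) << 21)
#define SKETCH_ECX_HYPERVISOR (UINT32_C(1) << 31)

/*
 * Return the bits on which the request disagrees with the platform:
 * requested bits that are forced to 0, plus omitted bits forced to 1.
 */
static uint32_t sketch_conflicts(uint32_t requested, uint32_t fixed0,
                                 uint32_t fixed1)
{
    assert((fixed0 & fixed1) == 0);   /* a bit cannot be fixed both ways */
    return (requested & fixed0) | (~requested & fixed1);
}

/* With enforcement on, any conflict prevents the guest from starting. */
static bool sketch_may_start(uint32_t requested, uint32_t fixed0,
                             uint32_t fixed1, bool enforce)
{
    return !enforce || sketch_conflicts(requested, fixed0, fixed1) == 0;
}
```

Without enforcement the same conflict mask could drive the warnings, which keeps the warn and fail paths consistent.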


> We have another series to enhance "-feature" for TDX, to warn if a
> fixed-1 feature is requested to be removed. Besides, we will introduce a
> specific named CPU model for TDX, e.g., TDX-SapphireRapids, which contains
> the maximum feature set a TDX guest can have on an SPR host.

I don't know if this is the right approach or not, but we should at least
consider making use of CPU versioning here.  ie have a single "SapphireRapids"
alias, which resolves to a suitable specific CPU version depending on whether
TDX is used or not.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v2 0/2] virtio: remove unnecessary host_features in ->get_features()

2022-08-03 Thread Stefan Hajnoczi
v2:
- Document vdc->get_features() callback [Cornelia]

The vdc->get_features() callbacks are a little inconsistent in how they use
vdev->host_features. This is because the function's behavior changed over time.
Clean things up.
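
As a rough illustration of the convention this cleanup aims for, the sketch below assumes the caller already passes the device's host_features as the callback argument, so a callback only sets or clears bits and never ORs host_features back in. The types, feature bits and function names are hypothetical and are not the virtio API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits. */
#define SKETCH_F_ALWAYS   (UINT64_C(1) << 0)
#define SKETCH_F_OPTIONAL (UINT64_C(1) << 1)

struct sketch_vdev {
    uint64_t host_features;
    int backend_has_optional;
};

typedef uint64_t (*sketch_get_features_fn)(struct sketch_vdev *vdev,
                                           uint64_t features);

/* Device callback: adjust the bits it was handed and return them. */
static uint64_t sketch_dev_get_features(struct sketch_vdev *vdev,
                                        uint64_t features)
{
    features |= SKETCH_F_ALWAYS;
    if (!vdev->backend_has_optional) {
        features &= ~SKETCH_F_OPTIONAL;
    }
    return features;   /* no "| vdev->host_features" needed here */
}

/* Caller: seeds the callback with host_features exactly once. */
static uint64_t sketch_plug(struct sketch_vdev *vdev, sketch_get_features_fn fn)
{
    assert(vdev != NULL && fn != NULL);
    vdev->host_features = fn(vdev, vdev->host_features);
    return vdev->host_features;
}
```

ORing host_features in again inside the callback would be redundant at best, and could re-add a bit the callback had just cleared.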

Stefan Hajnoczi (2):
  virtio: document vdc->get_features() callback
  virtio: remove unnecessary host_features in ->get_features()

 include/hw/virtio/virtio.h  | 20 
 hw/block/virtio-blk.c   |  3 ---
 hw/char/virtio-serial-bus.c |  1 -
 hw/net/virtio-net.c |  3 ---
 hw/scsi/vhost-scsi-common.c |  3 ---
 hw/scsi/virtio-scsi.c   |  4 
 hw/virtio/virtio-balloon.c  |  2 --
 7 files changed, 20 insertions(+), 16 deletions(-)

-- 
2.37.1




Re: [RFC PATCH 1/3] target/ppc: Bugfix fadd/fsub result with OE/UE set

2022-08-03 Thread Lucas Mateus Martins Araujo e Castro


On 03/08/2022 13:18, Richard Henderson wrote:


On 8/3/22 05:22, Lucas Mateus Castro(alqotel) wrote:

From: "Lucas Mateus Castro (alqotel)" 

As mentioned in the functions float_overflow_excp and
float_underflow_excp, the result should be adjusted as mentioned in the
ISA (subtracted 192/1536 from the exponent of the intermediate result if
an overflow occurs with OE set and added 192/1536 to the exponent of the
intermediate result if an underflow occurs with UE set), but at those
functions the result has already been rounded so it is not possible to
add/subtract from the intermediate result anymore.

This patch creates a new function that receives the value that should be
subtracted/added from the exponent if an overflow/underflow happens, to
not leave some arbitrary numbers from the PowerISA in the middle of the
FPU code. If these numbers are 0 the new functions just call the old
ones.

I used 2 values here for overflow and underflow, maybe it'd be better to
just use the same ones, any thoughts?

Signed-off-by: Lucas Mateus Castro (alqotel) 


---
An alternative I've thought of was to always return the adjusted value if an
overflow or underflow occurs and in float_underflow_excp and
float_overflow_excp adjust it to inf/den/0 if OE/UE is 0, but I didn't
see many advantages to that approach.
---
  fpu/softfloat.c | 75 +
  include/fpu/softfloat.h |  2 ++
  target/ppc/fpu_helper.c | 10 --
  3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4a871ef2a1..a407129dcb 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -268,6 +268,8 @@ typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);

  typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
  typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
+typedef float64 (*soft_f64_op2_int2_fn)(float64 a, float64 b, int c, int d,
+    float_status *s);
  typedef float   (*hard_f32_op2_fn)(float a, float b);
  typedef double  (*hard_f64_op2_fn)(double a, double b);

@@ -401,6 +403,19 @@ float64_gen2(float64 xa, float64 xb, float_status *s,
  return soft(ua.s, ub.s, s);
  }

+static inline float64
+float64_gen2_excp(float64 xa, float64 xb, int xc, int xd, float_status *s,
+  hard_f64_op2_fn hard, soft_f64_op2_fn soft,
+  soft_f64_op2_int2_fn soft_excp, f64_check_fn pre,
+  f64_check_fn post)
+{
+    if (xc || xd) {
+    return soft_excp(xa, xb, xc, xd, s);
+    } else {
+    return float64_gen2(xa, xb, s, hard, soft, pre, post);
+    }
+}
+
  /*
   * Classify a floating point number. Everything above float_class_qnan
   * is a NaN so cls >= float_class_qnan is any NaN.
@@ -1929,6 +1944,39 @@ static double hard_f64_sub(double a, double b)
  return a - b;
  }

+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_addsub_excp_en(float64 a, float64 b, int oe_sub, int ue_sum,
+    float_status *status, bool subtract)
+{
+    FloatParts64 pa, pb, *pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
+
+    if (unlikely(oe_sub && (pr->exp > 1023))) {
+    pr->exp -= oe_sub;
+    float_raise(float_flag_overflow, status);
+    } else if (unlikely(ue_sum && (pr->exp < -1022))) {
+    pr->exp += ue_sum;
+    float_raise(float_flag_underflow, status);
+    }
+
+    return float64_round_pack_canonical(pr, status);


This is incorrect, because the exponent is not fixed until the middle 
of round_pack_canonical.


I think you should not add new functions like this, with new
parameters, but instead add fields to float_status, which would then be
checked at the places currently setting underflow and overflow.


So add overflow_correction and underflow_correction in 
'partsN(uncanon_normal)' so that:


if (exp >= exp_max) {
    if (overflow_correction != 0) {
        exp -= overflow_correction;
    }
}

And the equivalent for underflow, or a bool ppc_overflow_enable that 
uses a fixed value like:


if (exp >= exp_max) {
    if (ppc_overflow_enable) {
        exp -= ((fmt->exp_bias + 1) + (fmt->exp_bias + 1)/2);
    }
}

(and the equivalent for underflow) ?
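
As a quick sanity check of the fixed-value variant above, here is a minimal sketch that evaluates the same expression for the single and double precision biases (127 and 1023). The helper name ppc_exp_correction is made up for illustration and is not part of softfloat; only the expression itself comes from the snippet above.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a softfloat function): the exponent correction
 * computed from the format's exponent bias, using the expression quoted
 * in the mail above.
 */
static int ppc_exp_correction(int exp_bias)
{
    assert(exp_bias > 0);
    return (exp_bias + 1) + (exp_bias + 1) / 2;
}

/*
 * Apply the correction only when the exponent reached exp_max and the
 * (hypothetical) enable flag is set, mirroring the shape of the
 * "if (exp >= exp_max)" check above.
 */
static int ppc_adjust_overflow_exp(int exp, int exp_max, int exp_bias,
                                   int overflow_enable)
{
    if (exp >= exp_max && overflow_enable) {
        exp -= ppc_exp_correction(exp_bias);
    }
    return exp;
}
```

With a bias of 127 the expression gives 128 + 64 = 192, and with a bias of 1023 it gives 1024 + 512 = 1536, which matches the 192/1536 pair from the commit message, so a single flag plus fmt->exp_bias would seem to be enough to derive both constants.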




r~

--
Lucas Mateus M. Araujo e Castro
Instituto de Pesquisas ELDORADO 


Departamento Computação Embarcada
Analista de Software Trainee


Re: [PATCH v2 02/20] ppc/ppc405: Introduce a PPC405 generic machine

2022-08-03 Thread Daniel Henrique Barboza




On 8/3/22 14:03, BALATON Zoltan wrote:

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

We will use this machine as a base to define the ref405ep and possibly
the PPC405 hotfoot board as found in the Linux kernel.

Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405_boards.c | 31 ---
1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 1a4e7588c584..4c269b6526a5 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -50,6 +50,15 @@

#define USE_FLASH_BIOS

+struct Ppc405MachineState {
+    /* Private */
+    MachineState parent_obj;
+    /* Public */
+};
+
+#define TYPE_PPC405_MACHINE MACHINE_TYPE_NAME("ppc405")
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405MachineState, PPC405_MACHINE);


In other patches the declaration of the state struct comes after the 
OBJECT_DECLARE macro, so here rather than above. It would be better to write it 
like that here too, for consistency and also because then the DECLARE macro 
starts the object declaration and everything belonging to the object is 
together below it. Declaring the structure before is kind of outside the 
object, although this is only cosmetic and may be a matter of style.


Good point. I moved the struct declaration to after the OBJECT_DECLARE macro.


Thanks,

Daniel



Regards,
BALATON Zoltan


+
/*/
/* PPC405EP reference board (IBM) */
/* Standalone board with:
@@ -332,18 +341,34 @@ static void ref405ep_class_init(ObjectClass *oc, void 
*data)

    mc->desc = "ref405ep";
    mc->init = ref405ep_init;
-    mc->default_ram_size = 0x0800;
-    mc->default_ram_id = "ef405ep.ram";
}

static const TypeInfo ref405ep_type = {
    .name = MACHINE_TYPE_NAME("ref405ep"),
-    .parent = TYPE_MACHINE,
+    .parent = TYPE_PPC405_MACHINE,
    .class_init = ref405ep_class_init,
};

+static void ppc405_machine_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->desc = "PPC405 generic machine";
+    mc->default_ram_size = 0x0800;
+    mc->default_ram_id = "ppc405.ram";
+}
+
+static const TypeInfo ppc405_machine_type = {
+    .name = TYPE_PPC405_MACHINE,
+    .parent = TYPE_MACHINE,
+    .instance_size = sizeof(Ppc405MachineState),
+    .class_init = ppc405_machine_class_init,
+    .abstract = true,
+};
+
static void ppc405_machine_init(void)
{
+    type_register_static(&ppc405_machine_type);
    type_register_static(&ref405ep_type);
}






[PATCH v2 1/1] osdep: asynchronous teardown for shutdown on Linux

2022-08-03 Thread Claudio Imbrenda
This patch adds support for asynchronously tearing down a VM on Linux.

When qemu terminates, either naturally or because of a fatal signal,
the VM is torn down. If the VM is huge, it can take a considerable
amount of time for it to be cleaned up. In case of a protected VM, it
might take even longer than a non-protected VM (this is the case on
s390x, for example).

Some users might want to shut down a VM and restart it immediately,
without having to wait. This is especially true if management
infrastructure like libvirt is used.

This patch implements a simple trick on Linux to allow qemu to return
immediately, with the teardown of the VM being performed
asynchronously.

If the new commandline option -async-teardown is used, a new process is
spawned from qemu at startup, using the clone syscall, in such a way that
it will share its address space with qemu.

The new process will then simply wait until qemu terminates, and then it
will exit itself.

This allows qemu to terminate quickly, without having to wait for the
whole address space to be torn down. The teardown process will exit
after qemu, so it will be the last user of the address space, and
therefore it will take care of the actual teardown.

The teardown process will share the same cgroups as qemu, so both
memory usage and cpu time will be accounted properly.
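
The mechanism described above can be sketched in isolation roughly as follows. This is only an illustration under stated assumptions, not the code from this patch: the names spawn_teardown_helper, teardown_helper, helper_pipe and HELPER_STACK_SIZE are made up, and the helper waits for EOF on a pipe rather than on a pidfd as the patch does, purely so that the sketch does not depend on pidfd_open being available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative stack size for the helper; not taken from the patch. */
#define HELPER_STACK_SIZE (64 * 1024)

/* [0] is read by the helper, [1] stays open only in the main process. */
static int helper_pipe[2];

/* Runs in the helper process, which shares the caller's address space. */
static int teardown_helper(void *arg)
{
    sigset_t all_signals;
    char c;

    (void)arg;
    /* The fd table was copied by clone: drop our copy of the write end. */
    close(helper_pipe[1]);
    /* Block all signals so nothing disturbs the wait. */
    sigfillset(&all_signals);
    sigprocmask(SIG_BLOCK, &all_signals, NULL);
    /* read() returns 0 (EOF) once the main process has terminated. */
    while (read(helper_pipe[0], &c, 1) > 0) {
    }
    /* Exiting last makes this process the one that tears down the mm. */
    _exit(0);
}

/* Returns the helper's pid, or -1 on failure. */
pid_t spawn_teardown_helper(void)
{
    char *stack;
    pid_t pid;

    /* Keep the top of the helper stack 16-byte aligned. */
    assert(HELPER_STACK_SIZE % 16 == 0);
    if (pipe2(helper_pipe, O_CLOEXEC) < 0) {
        return -1;
    }
    stack = malloc(HELPER_STACK_SIZE); /* intentionally never freed */
    if (stack == NULL) {
        return -1;
    }
    /* CLONE_VM: share the address space; the stack grows downwards. */
    pid = clone(teardown_helper, stack + HELPER_STACK_SIZE, CLONE_VM, NULL);
    if (pid > 0) {
        close(helper_pipe[0]); /* the main process only keeps the write end */
    }
    return pid;
}
```

Because the helper is created without CLONE_FILES it gets its own copy of the file descriptor table, so closing its copy of the write end leaves the main process as the only writer, and the main process exiting (for whatever reason) is seen as EOF. A real implementation would also have to think about descriptors inherited by the helper and about how the helper is reaped.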

This feature can already be used with libvirt by adding the following
to the XML domain definition:

  <commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
  <arg value='-async-teardown'/>
  </commandline>

Signed-off-by: Claudio Imbrenda 
---
 include/qemu/osdep.h |  2 ++
 os-posix.c   |  5 
 qemu-options.hx  | 17 ++
 util/osdep.c | 55 
 4 files changed, 79 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b1c161c035..3154759d79 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -549,6 +549,8 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t 
count)
 
 void qemu_set_cloexec(int fd);
 
+void init_async_teardown(void);
+
 /* Return a dynamically allocated directory path that is appropriate for 
storing
  * local state.
  *
diff --git a/os-posix.c b/os-posix.c
index 321fc4bd13..dd3e42b4c4 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -150,6 +150,11 @@ int os_parse_cmd_args(int index, const char *optarg)
 case QEMU_OPTION_daemonize:
 daemonize = 1;
 break;
+#if defined(CONFIG_LINUX)
+case QEMU_OPTION_asyncteardown:
+init_async_teardown();
+break;
+#endif
 default:
 return -1;
 }
diff --git a/qemu-options.hx b/qemu-options.hx
index 3f23a42fa8..d434353159 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4743,6 +4743,23 @@ HXCOMM Internal use
 DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL)
 DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
 
+#ifdef __linux__
+DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
+"-async-teardown enable asynchronous teardown\n",
+QEMU_ARCH_ALL)
+#endif
+SRST
+``-async-teardown``
+Enable asynchronous teardown. A new teardown process will be
+created at startup, using clone. The teardown process will share
+the address space of the main qemu process, and wait for the main
+process to terminate. At that point, the teardown process will
+also exit. This allows qemu to terminate quickly if the guest was
+huge, leaving the teardown of the address space to the teardown
+process. Since the teardown process shares the same cgroups as the
+main qemu process, accounting is performed correctly.
+ERST
+
 DEF("msg", HAS_ARG, QEMU_OPTION_msg,
 "-msg [timestamp[=on|off]][,guest-name=[on|off]]\n"
 "control error message format\n"
diff --git a/util/osdep.c b/util/osdep.c
index 60fcbbaebe..bb0baf97a0 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -23,6 +23,15 @@
  */
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+
+#ifdef CONFIG_LINUX
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
+
 #include "qemu/cutils.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
@@ -512,6 +521,52 @@ const char *qemu_hw_version(void)
 return hw_version;
 }
 
+#ifdef __linux__
+static int async_teardown_fn(void *arg)
+{
+sigset_t all_signals;
+fd_set r, w, e;
+int fd;
+
+/* open a pidfd descriptor for the parent qemu process */
+fd = syscall(__NR_pidfd_open, getppid(), 0);
+/* if something went wrong, or if the file descriptor is too big */
+if ((fd < 0) || (fd >= FD_SETSIZE)) {
+_exit(1);
+}
+/* zero all fd sets */
+FD_ZERO(&r);
+FD_ZERO(&w);
+FD_ZERO(&e);
+/* set the fd for the pidfd in the "read" set */
+FD_SET(fd, &r);
+/* block all signals */
+sigfillset(&all_signals);
+sigprocmask(SIG_BLOCK, &all_signals, NULL);
+/* wait for the pid to disappear -> fd will appear as ready for read */
+(void) select(fd + 1, &r, &w, &e, NULL);
+
+/*
+ * Close all file descriptors that might 

Re: [PATCH for-7.1] hw/mips/malta: turn off x86 specific features of PIIX4_PM

2022-08-03 Thread Bernhard Beschow
On Tue, Aug 2, 2022 at 8:37 AM Philippe Mathieu-Daudé via <
qemu-devel@nongnu.org> wrote:

> On 28/7/22 15:16, Igor Mammedov wrote:
> > On Thu, 28 Jul 2022 13:29:07 +0100
> > Peter Maydell  wrote:
> >
> >> On Thu, 28 Jul 2022 at 12:50, Igor Mammedov 
> wrote:
> >>>
> >>> QEMU crashes trying to save VMSTATE when only MIPS target are compiled
> in
> >>>$ qemu-system-mips -monitor stdio
> >>>(qemu) migrate "exec:gzip -c > STATEFILE.gz"
> >>>Segmentation fault (core dumped)
> >>>
> >>> It happens due to PIIX4_PM trying to parse hotplug vmstate structures
> >>> which are valid only for x86 and not for MIPS (as it requires ACPI
> >>> tables support which is not existent for the latter)
>
> We already discussed this Frankenstein PIIX4 problem 2 and 4 years ago:
>
> https://lore.kernel.org/qemu-devel/4d42697e-ba84-e5af-3a17-a2cc52cf0...@redhat.com/
>
> https://lore.kernel.org/qemu-devel/20190304210359-mutt-send-email-...@kernel.org/


Interesting reads!


> >>> Issue was probably exposed by trying to cleanup/compile out unused
> >>> ACPI bits from MIPS target (but forgetting about migration bits).
> >>>
> >>> Disable compiled out features using compat properties as the least
> >>> risky way to deal with issue.
>
> So now MIPS is forced to use meaningless compat[] to satisfy X86.
>
> Am I wrong seeing this as a dirty hack creeping in, yet another
> technical debt that will hit (me...) back in a close future?
>
> Are we sure there are no better solution (probably more time consuming
> and involving refactors) we could do instead?
>

Working on the consolidation of piix3 and -4 southbridges [1] I've stumbled
over certain design decisions where board/platform specific assumptions are
baked into the piix device models. I figure that's the core of the issue.

In our case the ACPI functionality is implemented by inheritance while
perhaps it should be implemented using composition. With composition, the
ACPI functionality could be injected by the caller: The pc board would
inject it while the Malta board wouldn't. This would solve both the crash
and above design problem.
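
To make the composition idea a bit more concrete, here is a toy model in plain C. It is a sketch only: none of these names (AcpiHotplugOps, ToyPiix4Pm, toy_pm_init, toy_pm_save_sections, toy_pc_hotplug) exist in QEMU, and the "sections" it counts are just a stand-in for vmstate subsections.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface for the optional ACPI hotplug part. */
typedef struct AcpiHotplugOps {
    /* Returns how many extra state sections the hotplug part saves. */
    size_t (*save_sections)(void *opaque);
    void *opaque;
} AcpiHotplugOps;

/* Toy stand-in for the PM device: the hotplug part is injected, not inherited. */
typedef struct ToyPiix4Pm {
    const AcpiHotplugOps *hotplug; /* NULL when the board injects nothing */
} ToyPiix4Pm;

static void toy_pm_init(ToyPiix4Pm *pm, const AcpiHotplugOps *hotplug)
{
    assert(pm != NULL);
    pm->hotplug = hotplug;
}

/* The core device saves one section; hotplug state is saved only if present. */
static size_t toy_pm_save_sections(const ToyPiix4Pm *pm)
{
    size_t n = 1;

    if (pm->hotplug != NULL) {
        n += pm->hotplug->save_sections(pm->hotplug->opaque);
    }
    return n;
}

/* What a pc-like board might inject: three made-up hotplug sections. */
static size_t toy_pc_hotplug_save(void *opaque)
{
    (void)opaque;
    return 3;
}

static const AcpiHotplugOps toy_pc_hotplug = {
    .save_sections = toy_pc_hotplug_save,
    .opaque = NULL,
};
```

In this model a Malta-like board would call toy_pm_init(&pm, NULL) and the save path never touches any hotplug state, while a pc-like board would pass &toy_pc_hotplug. The device model itself carries no board-specific assumption.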

I'd be willing to implement it but can't make any promises about the time
frame since I'm currently doing this in my free time. Any hints regarding
the implementation would be welcome, though.

Best regards,
Bernhard

[1] https://github.com/shentok/qemu/commits/piix-consolidate


> Thanks,
>
> Phil.
>
> >>> Signed-off-by: Igor Mammedov 
> >>
> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/995
> >>
> >>> ---
> >>> PS:
> >>> another approach could be setting defaults to disabled state and
> >>> enabling them using compat props on PC machines (which is more
> >>> code to deal with => more risky) or continue with PIIX4_PM
> >>> refactoring to split x86-shism out (which I'm not really
> >>> interested in due to risk of regressions for not much of
> >>> benefit)
> >>> ---
> >>>   hw/mips/malta.c | 9 +
> >>>   1 file changed, 9 insertions(+)
> >>>
> >>> diff --git a/hw/mips/malta.c b/hw/mips/malta.c
> >>> index 7a0ec513b0..0e932988e0 100644
> >>> --- a/hw/mips/malta.c
> >>> +++ b/hw/mips/malta.c
> >>> @@ -1442,6 +1442,14 @@ static const TypeInfo mips_malta_device = {
> >>>   .instance_init = mips_malta_instance_init,
> >>>   };
> >>>
> >>> +GlobalProperty malta_compat[] = {
> >>> +{ "PIIX4_PM", "memory-hotplug-support", "off" },
> >>> +{ "PIIX4_PM", "acpi-pci-hotplug-with-bridge-support", "off" },
> >>> +{ "PIIX4_PM", "acpi-root-pci-hotplug", "off" },
> >>> +{ "PIIX4_PM", "x-not-migrate-acpi-index", "true" },
> >>> +};
> >>
> >> Is there an easy way to assert in hw/acpi/piix4.c that if
> >> CONFIG_ACPI_PCIHP was not set then the board has initialized
> >> all these properties to the don't-use-hotplug state ?
> >> That would be a guard against similar bugs (though I suppose
> >> we probably aren't likely to add new piix4 boards...)
> >
> > unfortunately new features still creep in 'pc' machine
> > ex: "acpi-root-pci-hotplug"), and I don't see an easy
> > way to compile that nor enforce that in the future.
> >
> > Far from easy would be split piix4_pm on base/enhanced
> > classes so we wouldn't need x86 specific hacks in 'base'
> > variant (assuming 'enhanced' could maintain the current
> > VMSTATE to keep cross-version migration working).
> >
> >>> +const size_t malta_compat_len = G_N_ELEMENTS(malta_compat);
> >>> +
> >>>   static void mips_malta_machine_init(MachineClass *mc)
> >>>   {
> >>>   mc->desc = "MIPS Malta Core LV";
> >>> @@ -1455,6 +1463,7 @@ static void mips_malta_machine_init(MachineClass
> *mc)
> >>>   mc->default_cpu_type = MIPS_CPU_TYPE_NAME("24Kf");
> >>>   #endif
> >>>   mc->default_ram_id = "mips_malta.ram";
> >>> +compat_props_add(mc->compat_props, malta_compat,
> malta_compat_len);
> >>>   }
> >>>
> >>>   DEFINE_MACHINE("malta", mips_malta_machine_init)
> >>> --
> >>> 2.31.1
> >>
> >> thanks
> >> -- PMM
> >>
> >
>
>
>


[PATCH v2 2/2] virtio: remove unnecessary host_features in ->get_features()

2022-08-03 Thread Stefan Hajnoczi
Since at least commit 6b8f1020540c27246277377aa2c3331ad2bfb160 ("virtio:
move host_features") the ->get_features() function has been called with
host_features as an argument.

Some devices manually add host_features in ->get_features() although the
features argument already contains host_features. Make all devices
consistent by dropping the unnecessary code.

Cc: Cornelia Huck 
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c   | 3 ---
 hw/char/virtio-serial-bus.c | 1 -
 hw/net/virtio-net.c | 3 ---
 hw/scsi/vhost-scsi-common.c | 3 ---
 hw/scsi/virtio-scsi.c   | 4 
 hw/virtio/virtio-balloon.c  | 2 --
 6 files changed, 16 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..429aedcf2b 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -996,9 +996,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-/* Firstly sync all virtio-blk possible supported features */
-features |= s->host_features;
-
 virtio_add_feature(&features, VIRTIO_BLK_F_SEG_MAX);
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..1414fb85ae 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -557,7 +557,6 @@ static uint64_t get_features(VirtIODevice *vdev, uint64_t 
features,
 
 vser = VIRTIO_SERIAL(vdev);
 
-features |= vser->host_features;
 if (vser->bus.max_nr_ports > 1) {
 virtio_add_feature(&features, VIRTIO_CONSOLE_F_MULTIPORT);
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index dd0d056fde..8ecdc1cd83 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -715,9 +715,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
 
-/* Firstly sync all virtio-net possible supported features */
-features |= n->host_features;
-
 virtio_add_feature(&features, VIRTIO_NET_F_MAC);
 
 if (!peer_has_vnet_hdr(n)) {
diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 767f827e55..8b26f90aa1 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -124,9 +124,6 @@ uint64_t vhost_scsi_common_get_features(VirtIODevice *vdev, 
uint64_t features,
 {
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
 
-/* Turn on predefined features supported by this device */
-features |= vsc->host_features;
-
 return vhost_get_features(&vsc->dev, vsc->feature_bits, features);
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 414151..f754611dfe 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -816,10 +816,6 @@ static uint64_t virtio_scsi_get_features(VirtIODevice 
*vdev,
  uint64_t requested_features,
  Error **errp)
 {
-VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-
-/* Firstly sync all virtio-scsi possible supported features */
-requested_features |= s->host_features;
 return requested_features;
 }
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 73ac5eb675..0e9ca71b15 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -796,8 +796,6 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
 Error **errp)
 {
-VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
-f |= dev->host_features;
 virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
 
 return f;
-- 
2.37.1




Re: [RFC 1/1] hw: tpmtisspi: add SPI support to QEMU TPM implementation

2022-08-03 Thread Peter Delevoryas
On Wed, Aug 03, 2022 at 10:52:23AM +0200, Cédric Le Goater wrote:
> On 8/3/22 04:32, Iris Chen wrote:
> > From: Iris Chen 
> 
> A commit log telling us about this new device would be good to have.
> 
> 
> > Signed-off-by: Iris Chen 
> > ---
> >   configs/devices/arm-softmmu/default.mak |   1 +
> >   hw/arm/Kconfig  |   5 +
> >   hw/tpm/Kconfig  |   5 +
> >   hw/tpm/meson.build  |   1 +
> >   hw/tpm/tpm_tis_spi.c| 311 
> >   include/sysemu/tpm.h|   3 +
> >   6 files changed, 326 insertions(+)
> >   create mode 100644 hw/tpm/tpm_tis_spi.c
> > 
> > diff --git a/configs/devices/arm-softmmu/default.mak 
> > b/configs/devices/arm-softmmu/default.mak
> > index 6985a25377..80d2841568 100644
> > --- a/configs/devices/arm-softmmu/default.mak
> > +++ b/configs/devices/arm-softmmu/default.mak
> > @@ -42,3 +42,4 @@ CONFIG_FSL_IMX6UL=y
> >   CONFIG_SEMIHOSTING=y
> >   CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
> >   CONFIG_ALLWINNER_H3=y
> > +CONFIG_FBOBMC_AST=y
> 
> I don't think this extra config is useful for now
> 
> > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> > index 15fa79afd3..193decaec1 100644
> > --- a/hw/arm/Kconfig
> > +++ b/hw/arm/Kconfig
> > @@ -458,6 +458,11 @@ config ASPEED_SOC
> >   select PMBUS
> >   select MAX31785
> > +config FBOBMC_AST
> > +bool
> > +select ASPEED_SOC
> > +select TPM_TIS_SPI
> > +
> >   config MPS2
> >   bool
> >   imply I2C_DEVICES
> > diff --git a/hw/tpm/Kconfig b/hw/tpm/Kconfig
> > index 29e82f3c92..370a43f045 100644
> > --- a/hw/tpm/Kconfig
> > +++ b/hw/tpm/Kconfig
> > @@ -8,6 +8,11 @@ config TPM_TIS_SYSBUS
> >   depends on TPM
> >   select TPM_TIS
> > +config TPM_TIS_SPI
> > +bool
> > +depends on TPM
> > +select TPM_TIS
> > +
> >   config TPM_TIS
> >   bool
> >   depends on TPM
> > diff --git a/hw/tpm/meson.build b/hw/tpm/meson.build
> > index 1c68d81d6a..1a057f4e36 100644
> > --- a/hw/tpm/meson.build
> > +++ b/hw/tpm/meson.build
> > @@ -2,6 +2,7 @@ softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: 
> > files('tpm_tis_common.c'))
> >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_ISA', if_true: 
> > files('tpm_tis_isa.c'))
> >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_SYSBUS', if_true: 
> > files('tpm_tis_sysbus.c'))
> >   softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))
> > +softmmu_ss.add(when: 'CONFIG_TPM_TIS_SPI', if_true: files('tpm_tis_spi.c'))
> >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_TIS'], if_true: 
> > files('tpm_ppi.c'))
> >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_CRB'], if_true: 
> > files('tpm_ppi.c'))
> > diff --git a/hw/tpm/tpm_tis_spi.c b/hw/tpm/tpm_tis_spi.c
> > new file mode 100644
> > index 00..c98ddcfddb
> > --- /dev/null
> > +++ b/hw/tpm/tpm_tis_spi.c
> > @@ -0,0 +1,311 @@
> > +#include "qemu/osdep.h"
> > +#include "hw/qdev-properties.h"
> > +#include "migration/vmstate.h"
> > +#include "hw/acpi/tpm.h"
> > +#include "tpm_prop.h"
> > +#include "tpm_tis.h"
> > +#include "qom/object.h"
> > +#include "hw/ssi/ssi.h"
> > +#include "hw/ssi/spi_gpio.h"
> > +
> > +#define TPM_TIS_SPI_ADDR_BYTES 3
> > +#define SPI_WRITE 0
> > +
> > +typedef enum {
> > +TIS_SPI_PKT_STATE_DEACTIVATED = 0,
> > +TIS_SPI_PKT_STATE_START,
> > +TIS_SPI_PKT_STATE_ADDRESS,
> > +TIS_SPI_PKT_STATE_DATA_WR,
> > +TIS_SPI_PKT_STATE_DATA_RD,
> > +TIS_SPI_PKT_STATE_DONE,
> > +} TpmTisSpiPktState;
> > +
> > +union TpmTisRWSizeByte {
> > +uint8_t byte;
> > +struct {
> > +uint8_t data_expected_size:6;
> > +uint8_t resv:1;
> > +uint8_t rwflag:1;
> > +};
> > +};
> > +
> > +union TpmTisSpiHwAddr {
> > +hwaddr addr;
> > +uint8_t bytes[sizeof(hwaddr)];
> > +};
> > +
> > +union TpmTisSpiData {
> > +uint32_t data;
> > +uint8_t bytes[64];
> > +};
> > +
> > +struct TpmTisSpiState {
> > +/*< private >*/
> > +SSIPeripheral parent_obj;
> > +
> > +/*< public >*/
> > +TPMState tpm_state; /* not a QOM object */
> > +TpmTisSpiPktState tpm_tis_spi_state;
> > +
> > +union TpmTisRWSizeByte first_byte;
> > +union TpmTisSpiHwAddr addr;
> > +union TpmTisSpiData data;
> 
> Are these device registers ? I am not sure the unions are very useful.

+1, I don't think we should be using unions, instead we should split out
all the relevant fields we want to store and use extract32/deposit32/etc
if necessary.
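
For example, the first byte could be decoded with plain shifts and masks instead of the bit-field union. This is only a sketch: field_extract32 is a local stand-in modelled on extract32() so that the snippet builds on its own, SpiHeader and spi_header_decode are made-up names, and the bit positions (size in bits 0..5, rwflag in bit 7) are an assumption about the layout the union is meant to describe, since bit-field ordering is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-in modelled on QEMU's extract32(): return 'length' bits of
 * 'value' starting at bit 'start'. Defined here only to keep the sketch
 * self-contained.
 */
static uint32_t field_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Hypothetical decoded form of the first byte of a transfer. */
typedef struct SpiHeader {
    uint8_t data_expected_size; /* assumed: bits 0..5 */
    uint8_t rwflag;             /* assumed: bit 7 */
} SpiHeader;

static SpiHeader spi_header_decode(uint8_t first_byte)
{
    SpiHeader h;

    h.data_expected_size = (uint8_t)field_extract32(first_byte, 0, 6);
    h.rwflag = (uint8_t)field_extract32(first_byte, 7, 1);
    return h;
}
```

Unlike the union, this does not depend on how the compiler orders bit-fields within a byte, so the decoding is the same on every host.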

> 
> > +uint32_t data_size;
> > +uint8_t data_idx;
> > +uint8_t addr_idx;
> > +};
> > +
> > +struct TpmTisSpiClass {
> > +SSIPeripheralClass parent_class;
> > +};
> > +
> > +OBJECT_DECLARE_TYPE(TpmTisSpiState, TpmTisSpiClass, TPM_TIS_SPI)
> > +
> > +static void tpm_tis_spi_mmio_read(TpmTisSpiState *tts)
> > +{
> > +uint16_t offset = tts->addr.addr & 0xffc;
> > +
> > +switch (offset) {
> > +case TPM_TIS_REG_DATA_FIFO:
> > +for (uint8_t i = 0; i < 

[PATCH v2 1/2] virtio: document vdc->get_features() callback

2022-08-03 Thread Stefan Hajnoczi
Suggested-by: Cornelia Huck 
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..8d27fe1824 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -120,9 +120,29 @@ struct VirtioDeviceClass {
 /* This is what a VirtioDevice must implement */
 DeviceRealize realize;
 DeviceUnrealize unrealize;
+
+/**
+ * get_features:
+ * @vdev: the VirtIODevice
+ * @requested_features: existing device feature bits from
+ *  vdev->host_features
+ * @errp: pointer to error object
+ *
+ * Get the device feature bits.
+ *
+ * The ->get_features() function typically sets always-on device feature
+ * bits as well as conditional feature bits that require some logic to
+ * compute.
+ *
+ * Device feature bits can also be set in vdev->host_features before this
+ * function is called using DEFINE_PROP_BIT64() qdev properties.
+ *
+ * Returns: the final device feature bits to store in vdev->host_features.
+ */
 uint64_t (*get_features)(VirtIODevice *vdev,
  uint64_t requested_features,
  Error **errp);
+
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
 int (*validate_features)(VirtIODevice *vdev);
-- 
2.37.1




[PATCH v3 4/7] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap

2022-08-03 Thread Eugenio Pérez
This lets the caller choose which ASID the mapping is destined for.

No need to update the batch functions as they will always be called from
memory listener updates at the moment. Memory listener updates will
always update ASID 0, as it's the passthrough ASID.

All vhost devices' ASIDs are 0 at the moment.

Signed-off-by: Eugenio Pérez 
---
v3: Deleted unneeded space
---
 include/hw/virtio/vhost-vdpa.h |  8 +---
 hw/virtio/vhost-vdpa.c | 25 +++--
 net/vhost-vdpa.c   |  6 +++---
 hw/virtio/trace-events |  4 ++--
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d85643..6560bb9d78 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -29,6 +29,7 @@ typedef struct vhost_vdpa {
 int index;
 uint32_t msg_type;
 bool iotlb_batch_begin_sent;
+uint32_t address_space_id;
 MemoryListener listener;
 struct vhost_vdpa_iova_range iova_range;
 uint64_t acked_features;
@@ -42,8 +43,9 @@ typedef struct vhost_vdpa {
 VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-   void *vaddr, bool readonly);
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+   hwaddr size, void *vaddr, bool readonly);
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+ hwaddr size);
 
 #endif
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 2fefcc66ad..131100841c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -72,22 +72,24 @@ static bool 
vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 return false;
 }
 
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-   void *vaddr, bool readonly)
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+   hwaddr size, void *vaddr, bool readonly)
 {
 struct vhost_msg_v2 msg = {};
 int fd = v->device_fd;
 int ret = 0;
 
 msg.type = v->msg_type;
+msg.asid = asid;
 msg.iotlb.iova = iova;
 msg.iotlb.size = size;
 msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
 msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
 msg.iotlb.type = VHOST_IOTLB_UPDATE;
 
-   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
-msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
+trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
+ msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
+ msg.iotlb.type);
 
 if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
 error_report("failed to write, fd=%d, errno=%d (%s)",
@@ -98,18 +100,20 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, 
hwaddr size,
 return ret;
 }
 
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+ hwaddr size)
 {
 struct vhost_msg_v2 msg = {};
 int fd = v->device_fd;
 int ret = 0;
 
 msg.type = v->msg_type;
+msg.asid = asid;
 msg.iotlb.iova = iova;
 msg.iotlb.size = size;
 msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
 
-trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
+trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
msg.iotlb.size, msg.iotlb.type);
 
 if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
@@ -229,7 +233,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener 
*listener,
 }
 
 vhost_vdpa_iotlb_batch_begin_once(v);
-ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
+ret = vhost_vdpa_dma_map(v, 0, iova, int128_get64(llsize),
  vaddr, section->readonly);
 if (ret) {
 error_report("vhost vdpa map fail!");
@@ -299,7 +303,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener 
*listener,
 vhost_iova_tree_remove(v->iova_tree, result);
 }
 vhost_vdpa_iotlb_batch_begin_once(v);
-ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
+ret = vhost_vdpa_dma_unmap(v, 0, iova, int128_get64(llsize));
 if (ret) {
 error_report("vhost_vdpa dma unmap error!");
 }
@@ -890,7 +894,7 @@ static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
 }
 
 size = ROUND_UP(result->size, qemu_real_host_page_size());
-r = vhost_vdpa_dma_unmap(v, result->iova, size);
+r = vhost_vdpa_dma_unmap(v, v->address_space_id, result->iova, size);
 return r == 0;
 }
 
@@ -932,7 +936,8 @@ static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, 
DMAMap *needle,

Re: [PATCH v2 7/7] vdpa: Always start CVQ in SVQ mode

2022-08-03 Thread Eugenio Perez Martin
On Mon, Aug 1, 2022 at 9:50 AM Eugenio Perez Martin  wrote:
>
> On Tue, Jul 26, 2022 at 5:04 AM Jason Wang  wrote:
> >
> >
> > 在 2022/7/22 21:43, Eugenio Pérez 写道:
> > > Isolate control virtqueue in its own group, allowing to intercept control
> > > commands but letting dataplane run totally passthrough to the guest.
> > >
> > > Signed-off-by: Eugenio Pérez 
> > > ---
> > >   hw/virtio/vhost-vdpa.c |   3 +-
> > >   net/vhost-vdpa.c   | 158 +++--
> > >   2 files changed, 156 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index 79623badf2..fe1c85b086 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -668,7 +668,8 @@ static int vhost_vdpa_set_backend_cap(struct 
> > > vhost_dev *dev)
> > >   {
> > >   uint64_t features;
> > >   uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> > > -0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> > > +0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> > > +0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
> > >   int r;
> > >
> > >   if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index 6c1c64f9b1..f5075ef487 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
> > >   /* Control commands shadow buffers */
> > >   void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
> > >
> > > +/* Number of address spaces supported by the device */
> > > +unsigned address_space_num;
> > > +
> > >   /* The device always have SVQ enabled */
> > >   bool always_svq;
> > >   bool started;
> > > @@ -100,6 +103,8 @@ static const uint64_t vdpa_svq_device_features =
> > >   BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> > >   BIT_ULL(VIRTIO_NET_F_STANDBY);
> > >
> > > +#define VHOST_VDPA_NET_CVQ_ASID 1
> > > +
> > >   VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> > >   {
> > >   VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > @@ -214,6 +219,109 @@ static ssize_t vhost_vdpa_receive(NetClientState 
> > > *nc, const uint8_t *buf,
> > >   return 0;
> > >   }
> > >
> > > +static int vhost_vdpa_get_vring_group(int device_fd,
> > > +  struct vhost_vring_state *state)
> > > +{
> > > +int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);
> > > +return r < 0 ? -errno : 0;
> > > +}
> >
> >
> > It would be more convenient for the caller if we can simply return 0 here.
> >
>
> I don't follow this, how do we know if the call failed then?
>
> >
> > > +
> > > +/**
> > > + * Check if all the virtqueues of the virtio device are in a different 
> > > vq than
> > > + * the last vq. VQ group of last group passed in cvq_group.
> > > + */
> > > +static bool vhost_vdpa_cvq_group_is_independent(struct vhost_vdpa *v,
> > > +struct vhost_vring_state 
> > > cvq_group)
> > > +{
> > > +struct vhost_dev *dev = v->dev;
> > > +int ret;
> > > +
> > > +for (int i = 0; i < (dev->vq_index_end - 1); ++i) {
> > > +struct vhost_vring_state vq_group = {
> > > +.index = i,
> > > +};
> > > +
> > > +ret = vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
> > > +if (unlikely(ret)) {
> > > +goto call_err;
> > > +}
> > > +if (unlikely(vq_group.num == cvq_group.num)) {
> > > +error_report("CVQ %u group is the same as VQ %u one (%u)",
> > > + cvq_group.index, vq_group.index, cvq_group.num);
> >
> >
> > Any reason we need error_report() here?
> >
>
> We can move it to a migration blocker.
>
> > Btw, I'd suggest to introduce new field in vhost_vdpa, then we can get
> > and store the group_id there during init.
> >
> > This could be useful for the future e.g PASID virtualization.
> >
>
> Answering below.
>
> >
> > > +return false;
> > > +}
> > > +}
> > > +
> > > +return true;
> > > +
> > > +call_err:
> > > +error_report("Can't read vq group, errno=%d (%s)", -ret, 
> > > g_strerror(-ret));
> > > +return false;
> > > +}
> > > +
> > > +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> > > +   unsigned vq_group,
> > > +   unsigned asid_num)
> > > +{
> > > +struct vhost_vring_state asid = {
> > > +.index = vq_group,
> > > +.num = asid_num,
> > > +};
> > > +int ret;
> > > +
> > > > +ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> > > +if (unlikely(ret < 0)) {
> > > +error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> > > +asid.index, asid.num, errno, g_strerror(errno));
> > > +}
> > > +return ret;
> > > +}
> > > +
> > > +static void vhost_vdpa_net_prepare(NetClientState *nc)
> > > +{
> > > +  

Re: [PATCH v2 1/1] osdep: asynchronous teardown for shutdown on Linux

2022-08-03 Thread Daniel P . Berrangé
On Wed, Aug 03, 2022 at 07:31:41PM +0200, Claudio Imbrenda wrote:
> This patch adds support for asynchronously tearing down a VM on Linux.
> 
> When qemu terminates, either naturally or because of a fatal signal,
> the VM is torn down. If the VM is huge, it can take a considerable
> amount of time for it to be cleaned up. In case of a protected VM, it
> might take even longer than a non-protected VM (this is the case on
> s390x, for example).
> 
> Some users might want to shut down a VM and restart it immediately,
> without having to wait. This is especially true if management
> infrastructure like libvirt is used.
> 
> This patch implements a simple trick on Linux to allow qemu to return
> immediately, with the teardown of the VM being performed
> asynchronously.
> 
> If the new commandline option -async-teardown is used, a new process is
> spawned from qemu at startup, using the clone syscall, in such a way that
> it will share its address space with qemu.
> 
> The new process will then simply wait until qemu terminates, and then it
> will exit itself.
> 
> This allows qemu to terminate quickly, without having to wait for the
> whole address space to be torn down. The teardown process will exit
> after qemu, so it will be the last user of the address space, and
> therefore it will take care of the actual teardown.
> 
> The teardown process will share the same cgroups as qemu, so both
> memory usage and cpu time will be accounted properly.
> 
> This feature can already be used with libvirt by adding the following
> to the XML domain definition:
> 
>   http://libvirt.org/schemas/domain/qemu/1.0;>
>   
>   

How does this work in practice? Libvirt should be blocking until
all processes in the cgroup have exited, including this cloned
child process.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v3 1/7] linux-headers: Update kernel headers

2022-08-03 Thread Eugenio Pérez
The main reason is to make the new vhost_vdpa address space ioctls available.

Update the kernel headers up to
9de1f9c8ca51 ("Merge tag 'irq-core-2022-08-01' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").

Signed-off-by: Eugenio Pérez 
---
 include/standard-headers/asm-x86/bootparam.h |  7 +-
 include/standard-headers/drm/drm_fourcc.h| 69 
 include/standard-headers/linux/ethtool.h |  1 +
 include/standard-headers/linux/input.h   | 12 ++--
 include/standard-headers/linux/pci_regs.h|  1 +
 include/standard-headers/linux/vhost_types.h | 11 +++-
 include/standard-headers/linux/virtio_ids.h  | 14 ++--
 linux-headers/asm-arm64/kvm.h| 27 
 linux-headers/asm-generic/unistd.h   |  4 +-
 linux-headers/asm-riscv/kvm.h| 20 ++
 linux-headers/asm-riscv/unistd.h |  3 +-
 linux-headers/asm-x86/kvm.h  | 11 ++--
 linux-headers/asm-x86/mman.h | 14 
 linux-headers/linux/kvm.h| 56 +++-
 linux-headers/linux/userfaultfd.h| 10 ++-
 linux-headers/linux/vfio.h   |  4 +-
 linux-headers/linux/vhost.h  | 26 ++--
 17 files changed, 240 insertions(+), 50 deletions(-)

diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index b2aaad10e5..0b06d2bff1 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -10,12 +10,13 @@
 #define SETUP_EFI  4
 #define SETUP_APPLE_PROPERTIES 5
 #define SETUP_JAILHOUSE6
+#define SETUP_CC_BLOB  7
+#define SETUP_IMA  8
 #define SETUP_RNG_SEED 9
+#define SETUP_ENUM_MAX SETUP_RNG_SEED
 
 #define SETUP_INDIRECT (1<<31)
-
-/* SETUP_INDIRECT | max(SETUP_*) */
-#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE)
+#define SETUP_TYPE_MAX (SETUP_ENUM_MAX | SETUP_INDIRECT)
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK   0x07FF
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 4888f85f69..0b051545d3 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -571,6 +571,53 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
 
+/*
+ * Intel Tile 4 layout
+ *
+ * This is a tiled layout using 4KB tiles in a row-major layout. It has the same
+ * shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It
+ * only differs from Tile Y at the 256B granularity in between. At this
+ * granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape
+ * of 64B x 8 rows.
+ */
+#define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data is stored
+ * outside of the GEM object in a reserved memory area dedicated for the
+ * storage of the CCS data for all RC/RC_CC/MC compressible GEM objects. The
+ * main surface pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 media compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. For semi-planar formats
+ * like NV12, the Y and UV planes are Tile 4 and are located at plane indices
+ * 0 and 1, respectively. The CCS for all planes are stored outside of the
+ * GEM object in a reserved memory area dedicated for the storage of the
+ * CCS data for all RC/RC_CC/MC compressible GEM objects. The main surface
+ * pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+
+/*
+ * Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data is stored
+ * outside of the GEM object in a reserved memory area dedicated for the
+ * storage of the CCS data for all RC/RC_CC/MC compressible GEM objects. The
+ * main surface pitch is required to be a multiple of four Tile 4 widths. The
+ * clear color is stored at plane index 1 and the pitch should be ignored. The
+ * format of the 256 bits of clear color data matches the one used for the
+ * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description
+ * for details.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
@@ -608,6 +655,28 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_QCOM_COMPRESSED fourcc_mod_code(QCOM, 1)
 
+/*
+ * Qualcomm Tiled Format
+ *
+ * Similar to 

[PATCH v3 2/7] vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop

2022-08-03 Thread Eugenio Pérez
These functions used to rely on v->shadow_vqs != NULL to decide whether
SVQ must be started or stopped.

That check is not going to be valid anymore, as qemu is going to allocate
SVQ unconditionally (but it will only start them conditionally).
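
The difference between "allocated" and "enabled" can be sketched in isolation as follows. This is a minimal, hypothetical model rather than the QEMU code: struct demo_vdpa and demo_svqs_start() are made-up names, and a counter stands in for actually starting a shadow virtqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vhost_vdpa; only these fields matter. */
struct demo_vdpa {
    void *shadow_vqs;        /* non-NULL once the SVQs have been allocated */
    bool shadow_vqs_enabled; /* runtime decision: should the SVQs be used? */
    unsigned started;        /* how many times the SVQs were started */
};

/*
 * Start the shadow virtqueues only if they are enabled.
 * Returns true on success, including the "nothing to do" case.
 */
static bool demo_svqs_start(struct demo_vdpa *v)
{
    if (!v->shadow_vqs_enabled) {
        /* Allocated or not, SVQ is not in use for this device. */
        return true;
    }
    v->started++;
    return true;
}
```

With unconditional allocation, shadow_vqs is always non-NULL, so a pointer test would start SVQ on every device; testing the flag keeps the old behaviour.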

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8882077955..2b8d807860 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1025,7 +1025,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
 Error *err = NULL;
 unsigned i;
 
-if (!v->shadow_vqs) {
+if (!v->shadow_vqs_enabled) {
 return true;
 }
 
@@ -1078,7 +1078,7 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 {
 struct vhost_vdpa *v = dev->opaque;
 
-if (!v->shadow_vqs) {
+if (!v->shadow_vqs_enabled) {
 return true;
 }
 
-- 
2.31.1




[PATCH v3 3/7] vdpa: Allocate SVQ unconditionally

2022-08-03 Thread Eugenio Pérez
SVQ may or may not run in a device depending on runtime conditions (for
example, whether the device can move CVQ to its own group).

Allocate the resources unconditionally, and decide later whether to use
them.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-vdpa.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 2b8d807860..2fefcc66ad 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -406,6 +406,21 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 int r;
 bool ok;
 
+shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
+for (unsigned n = 0; n < hdev->nvqs; ++n) {
+g_autoptr(VhostShadowVirtqueue) svq;
+
+svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
+v->shadow_vq_ops_opaque);
+if (unlikely(!svq)) {
+error_setg(errp, "Cannot create svq %u", n);
+return -1;
+}
+g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
+}
+
+v->shadow_vqs = g_steal_pointer(&shadow_vqs);
+
 if (!v->shadow_vqs_enabled) {
 return 0;
 }
@@ -422,20 +437,6 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 return -1;
 }
 
-shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
-for (unsigned n = 0; n < hdev->nvqs; ++n) {
-g_autoptr(VhostShadowVirtqueue) svq;
-
-svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
-v->shadow_vq_ops_opaque);
-if (unlikely(!svq)) {
-error_setg(errp, "Cannot create svq %u", n);
-return -1;
-}
-g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
-}
-
-v->shadow_vqs = g_steal_pointer(&shadow_vqs);
 return 0;
 }
 
@@ -576,10 +577,6 @@ static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
 struct vhost_vdpa *v = dev->opaque;
 size_t idx;
 
-if (!v->shadow_vqs) {
-return;
-}
-
 for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
 vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
 }
-- 
2.31.1




[PATCH v6 2/2] target/s390x: support SHA-512 extensions

2022-08-03 Thread Jason A. Donenfeld
In order to fully support MSA_EXT_5, we have to also support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNaCl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.
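
For readers unfamiliar with the compact style, the round primitives can be written out more verbosely. The sketch below follows the SHA-512 definitions in FIPS 180-4; the names rotr64, ch, maj, small_sigma0 and small_sigma1 are chosen here for illustration and are not the ones used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x right by c bits; c must be in 1..63 (c == 0 would shift by 64). */
static uint64_t rotr64(uint64_t x, int c)
{
    return (x >> c) | (x << (64 - c));
}

/* "Choose": for each bit, take y where x is 1 and z where x is 0. */
static uint64_t ch(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & y) ^ (~x & z);
}

/* "Majority": each result bit is the majority vote of the three inputs. */
static uint64_t maj(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/* Message schedule functions (the lower-case sigmas of FIPS 180-4). */
static uint64_t small_sigma0(uint64_t x)
{
    return rotr64(x, 1) ^ rotr64(x, 8) ^ (x >> 7);
}

static uint64_t small_sigma1(uint64_t x)
{
    return rotr64(x, 19) ^ rotr64(x, 61) ^ (x >> 6);
}
```

Each helper is a pure function on 64-bit words, which is why the one-line form in the patch stays readable despite its density.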

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |   2 +
 target/s390x/tcg/crypto_helper.c | 157 +++
 2 files changed, 159 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 3d333e2789..b6d804fa6d 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -751,6 +751,8 @@ static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
 S390_FEAT_MSA_EXT_5,
 S390_FEAT_PRNO_TRNG,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 8ad4ef1ace..bb4823107c 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@
 /*
  *  s390x crypto helpers
  *
+ *  Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved.
  *  Copyright (c) 2017 Red Hat Inc
  *
  *  Authors:
  *   David Hildenbrand 
+ *   Jason A. Donenfeld 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -19,6 +21,153 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x & z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x & z) ^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t parameter_block,
+   uint64_t *message_reg, uint64_t *len_reg, uint8_t *stack_buffer)
+{
+enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep interactivity. */
+uint64_t z[8], b[8], a[8], w[16], t;
+uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, processed = 0;
+int i, j, reg_len = 64, blocks = 0, cc = 0;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+for (i = 0; i < 8; ++i) {
+z[i] = a[i] = cpu_ldq_be_data_ra(env, wrap_address(env, parameter_block + 8 * i), ra);

[PATCH v3 5/7] vdpa: Store x-svq parameter in VhostVDPAState

2022-08-03 Thread Eugenio Pérez
CVQ can be shadowed in two ways:
- The device has the x-svq=on parameter (the current way)
- The device can isolate CVQ in its own vq group

QEMU needs to check for the second condition dynamically, because the CVQ
index is not known at initialization time. Since this is dynamic, the
CVQ isolation could vary with different conditions, making it possible
to go from "not isolated group" to "isolated".

Save the cmdline parameter in an extra field so that CVQ SVQ is never
disabled if the device was started with x-svq=on on the cmdline.
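
How the saved parameter and the dynamic check are expected to combine can be sketched as below. This is a simplified assumption about the intended logic, not code from the series: struct demo_state and demo_update_cvq_svq() are hypothetical, and cvq_isolated stands for the result of the runtime vq group check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-device state. */
struct demo_state {
    bool always_svq;         /* x-svq=on was given on the command line */
    bool shadow_vqs_enabled; /* current decision, may change between starts */
};

/*
 * Recompute whether SVQ is enabled for CVQ.
 * cvq_isolated: the device can currently place CVQ in its own vq group.
 */
static void demo_update_cvq_svq(struct demo_state *s, bool cvq_isolated)
{
    /* The cmdline request always wins; isolation only adds to it. */
    s->shadow_vqs_enabled = s->always_svq || cvq_isolated;
}
```

Keeping always_svq separate from shadow_vqs_enabled means that a later transition to "not isolated" cannot switch off a mode the user asked for explicitly.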

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index f96c3cb1da..e3b65ed546 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -36,6 +36,9 @@ typedef struct VhostVDPAState {
 
 /* Control commands shadow buffers */
 void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
+
+/* The device always have SVQ enabled */
+bool always_svq;
 bool started;
 } VhostVDPAState;
 
@@ -564,6 +567,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
 s->vhost_vdpa.device_fd = vdpa_device_fd;
 s->vhost_vdpa.index = queue_pair_index;
+s->always_svq = svq;
 s->vhost_vdpa.shadow_vqs_enabled = svq;
 s->vhost_vdpa.iova_tree = iova_tree;
 if (!is_datapath) {
-- 
2.31.1




Re: [PATCH v2 01/20] ppc/ppc405: Remove taihu machine

2022-08-03 Thread Daniel Henrique Barboza




On 8/3/22 10:28, Cédric Le Goater wrote:

It has been deprecated since 7.0.

Signed-off-by: Cédric Le Goater 
---



Reviewed-by: Daniel Henrique Barboza 


  docs/about/deprecated.rst   |   9 --
  docs/about/removed-features.rst |   6 +
  docs/system/ppc/embedded.rst|   1 -
  hw/ppc/ppc405_boards.c  | 232 
  MAINTAINERS |   2 +-
  5 files changed, 7 insertions(+), 243 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 7ee26626d5cf..2f9b41aaea48 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -233,15 +233,6 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
  better reflects the way this property affects all random data within
  the device tree blob, not just the ``kaslr-seed`` node.
  
-PPC 405 ``taihu`` machine (since 7.0)
-'''''''''''''''''''''''''''''''''''''
-
-The PPC 405 CPU is a system-on-a-chip, so all 405 machines are very similar,
-except for some external periphery. However, the periphery of the ``taihu``
-machine is hardly emulated at all (e.g. neither the LCD nor the USB part had
-been implemented), so there is not much value added by this board. Use the
-``ref405ep`` machine instead.
-
  ``pc-i440fx-1.4`` up to ``pc-i440fx-1.7`` (since 7.0)
  '
  
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst

index c7b9dadd5d63..8fad2f4d5e9b 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -661,6 +661,12 @@ Aspeed ``swift-bmc`` machine (removed in 7.0)
  This machine was removed because it was unused. Alternative AST2500 based
  OpenPOWER machines are ``witherspoon-bmc`` and ``romulus-bmc``.
  
+ppc ``taihu`` machine (removed in 7.2)
+''''''''''''''''''''''''''''''''''''''
+
+This machine was removed because it was partially emulated and 405
+machines are very similar. Use the ``ref405ep`` machine instead.
+
  linux-user mode CPUs
  
  
diff --git a/docs/system/ppc/embedded.rst b/docs/system/ppc/embedded.rst

index cfffbda24da9..af3b3d9fa460 100644
--- a/docs/system/ppc/embedded.rst
+++ b/docs/system/ppc/embedded.rst
@@ -6,5 +6,4 @@ Embedded family boards
  - ``ppce500``  generic paravirt e500 platform
  - ``ref405ep`` ref405ep
  - ``sam460ex`` aCube Sam460ex
-- ``taihu``taihu
  - ``virtex-ml507`` Xilinx Virtex ML507 reference design
diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index a66ad05e3ac3..1a4e7588c584 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -342,241 +342,9 @@ static const TypeInfo ref405ep_type = {
  .class_init = ref405ep_class_init,
  };
  
-/*/

-/* AMCC Taihu evaluation board */
-/* - PowerPC 405EP processor
- * - SDRAM   128 MB at 0x
- * - Boot flash  2 MB   at 0xFFE0
- * - Application flash   32 MB  at 0xFC00
- * - 2 serial ports
- * - 2 ethernet PHY
- * - 1 USB 1.1 device0x5000
- * - 1 LCD display   0x5010
- * - 1 CPLD  0x5010
- * - 1 I2C EEPROM
- * - 1 I2C thermal sensor
- * - a set of LEDs
- * - bit-bang SPI port using GPIOs
- * - 1 EBC interface connector 0 0x5020
- * - 1 cardbus controller + expansion slot.
- * - 1 PCI expansion slot.
- */
-typedef struct taihu_cpld_t taihu_cpld_t;
-struct taihu_cpld_t {
-uint8_t reg0;
-uint8_t reg1;
-};
-
-static uint64_t taihu_cpld_read(void *opaque, hwaddr addr, unsigned size)
-{
-taihu_cpld_t *cpld;
-uint32_t ret;
-
-cpld = opaque;
-switch (addr) {
-case 0x0:
-ret = cpld->reg0;
-break;
-case 0x1:
-ret = cpld->reg1;
-break;
-default:
-ret = 0;
-break;
-}
-
-return ret;
-}
-
-static void taihu_cpld_write(void *opaque, hwaddr addr,
- uint64_t value, unsigned size)
-{
-taihu_cpld_t *cpld;
-
-cpld = opaque;
-switch (addr) {
-case 0x0:
-/* Read only */
-break;
-case 0x1:
-cpld->reg1 = value;
-break;
-default:
-break;
-}
-}
-
-static const MemoryRegionOps taihu_cpld_ops = {
-.read = taihu_cpld_read,
-.write = taihu_cpld_write,
-.impl = {
-.min_access_size = 1,
-.max_access_size = 1,
-},
-.endianness = DEVICE_NATIVE_ENDIAN,
-};
-
-static void taihu_cpld_reset (void *opaque)
-{
-taihu_cpld_t *cpld;
-
-cpld = opaque;
-cpld->reg0 = 0x01;
-cpld->reg1 = 0x80;
-}
-
-static void taihu_cpld_init(MemoryRegion *sysmem, uint32_t base)
-{
-taihu_cpld_t *cpld;
-MemoryRegion *cpld_memory = g_new(MemoryRegion, 1);
-
-cpld = g_new0(taihu_cpld_t, 1);
-memory_region_init_io(cpld_memory, NULL, &taihu_cpld_ops, cpld, "cpld", 0x100);
-memory_region_add_subregion(sysmem, 

[PATCH v3 0/7] ASID support in vhost-vdpa net

2022-08-03 Thread Eugenio Pérez
The control virtqueue (CVQ) is the channel through which changes to a net
device's state are sent, like the number of active queues or its MAC address.

QEMU needs to intercept this queue so it can track these changes and is able to
migrate the device. It has been able to do so since 1576dbb5bbc4 ("vdpa: Add
x-svq to NetdevVhostVDPAOptions"). However, enabling x-svq implies shadowing all
of the VirtIO device's virtqueues, which hurts performance.

This series adds address space isolation, so the device and the guest
communicate directly through the data virtqueues (passthrough) and CVQ
communication is split in two: the guest communicates with QEMU, and QEMU
forwards the commands to the device.

This series is based on [1] and needs to be applied on top of it.  Each
one of them adds a feature on isolation and could be merged individually once
conflicts are solved.

Comments are welcome. Thanks!

v3:
- Do not return an error but just print a warning if vdpa device initialization
  returns failure while getting AS num of VQ groups
- Delete extra newline

v2:
- Much as commented on series [1], handle vhost_net backend through
  NetClientInfo callbacks instead of directly.
- Fix not freeing SVQ properly when device does not support CVQ
- Add BIT_ULL missed checking device's backend feature for _F_ASID.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00349.html

Eugenio Pérez (7):
  linux-headers: Update kernel headers
  vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop
  vdpa: Allocate SVQ unconditionally
  vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap
  vdpa: Store x-svq parameter in VhostVDPAState
  vhost_net: Add NetClientInfo prepare callback
  vdpa: Always start CVQ in SVQ mode

 include/hw/virtio/vhost-vdpa.h   |   8 +-
 include/net/net.h|   2 +
 include/standard-headers/asm-x86/bootparam.h |   7 +-
 include/standard-headers/drm/drm_fourcc.h|  69 +
 include/standard-headers/linux/ethtool.h |   1 +
 include/standard-headers/linux/input.h   |  12 +-
 include/standard-headers/linux/pci_regs.h|   1 +
 include/standard-headers/linux/vhost_types.h |  11 +-
 include/standard-headers/linux/virtio_ids.h  |  14 +-
 linux-headers/asm-arm64/kvm.h|  27 
 linux-headers/asm-generic/unistd.h   |   4 +-
 linux-headers/asm-riscv/kvm.h|  20 +++
 linux-headers/asm-riscv/unistd.h |   3 +-
 linux-headers/asm-x86/kvm.h  |  11 +-
 linux-headers/asm-x86/mman.h |  14 --
 linux-headers/linux/kvm.h|  56 ++-
 linux-headers/linux/userfaultfd.h|  10 +-
 linux-headers/linux/vfio.h   |   4 +-
 linux-headers/linux/vhost.h  |  26 +++-
 hw/net/vhost_net.c   |   4 +
 hw/virtio/vhost-vdpa.c   |  65 
 net/vhost-vdpa.c | 154 ++-
 hw/virtio/trace-events   |   4 +-
 23 files changed, 434 insertions(+), 93 deletions(-)

-- 
2.31.1





[PATCH 2/2] target/s390x: support SHA-512 extensions

2022-08-03 Thread Jason A. Donenfeld
In order to fully support MSA_EXT_5, we have to also support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNaCl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |   2 +
 target/s390x/tcg/crypto_helper.c | 157 +++
 2 files changed, 159 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 3d333e2789..b6d804fa6d 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -751,6 +751,8 @@ static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
 S390_FEAT_MSA_EXT_5,
 S390_FEAT_PRNO_TRNG,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 8ad4ef1ace..bb4823107c 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@
 /*
  *  s390x crypto helpers
  *
+ *  Copyright (C) 2022 Jason A. Donenfeld . All Rights Reserved.
  *  Copyright (c) 2017 Red Hat Inc
  *
  *  Authors:
  *   David Hildenbrand 
+ *   Jason A. Donenfeld 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -19,6 +21,153 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x & z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x & z) ^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t parameter_block,
+   uint64_t *message_reg, uint64_t *len_reg, uint8_t *stack_buffer)
+{
+enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep interactivity. */
+uint64_t z[8], b[8], a[8], w[16], t;
+uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, processed = 0;
+int i, j, reg_len = 64, blocks = 0, cc = 0;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+for (i = 0; i < 8; ++i) {
+z[i] = a[i] = cpu_ldq_be_data_ra(env, wrap_address(env, parameter_block + 8 * i), ra);

[PATCH v3 6/7] vhost_net: Add NetClientInfo prepare callback

2022-08-03 Thread Eugenio Pérez
This is used by the backend to perform actions before the device is
started.

In particular, vdpa will use it to isolate CVQ in its own ASID if
possible, and to start SVQ unconditionally, but only on CVQ.
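
The optional-hook pattern described above can be sketched on its own. The types and functions below (struct demo_client, demo_prepare(), demo_start_one()) are hypothetical stand-ins for NetClientState, a backend's prepare callback and vhost_net_start_one(); only the shape of the call is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_client;

/* Hypothetical counterpart of the NetPrepare typedef added by the patch. */
typedef void (DemoPrepare)(struct demo_client *);

struct demo_client_info {
    DemoPrepare *prepare; /* optional: backends may leave this NULL */
};

struct demo_client {
    const struct demo_client_info *info;
    bool started;               /* set once the device has been started */
    bool prepared_before_start; /* set by the example hook below */
};

/* Example hook: record whether it ran while the device was still stopped. */
static void demo_prepare(struct demo_client *nc)
{
    nc->prepared_before_start = !nc->started;
}

/* Start one device, giving the backend a chance to prepare first. */
static int demo_start_one(struct demo_client *nc)
{
    if (nc->info->prepare) {
        nc->info->prepare(nc);
    }
    nc->started = true;
    return 0;
}
```

Because the pointer is checked before the call, backends that do not set the hook are unaffected by the new field.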

Signed-off-by: Eugenio Pérez 
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 4 
 2 files changed, 6 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index a8d47309cd..efa6448886 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef struct NICConf {
 
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
+typedef void (NetPrepare)(NetClientState *);
 typedef int (NetLoad)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
@@ -72,6 +73,7 @@ typedef struct NetClientInfo {
 NetReceive *receive_raw;
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
+NetPrepare *prepare;
 NetLoad *load;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index a9bf72dcda..6d759b 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -244,6 +244,10 @@ static int vhost_net_start_one(struct vhost_net *net,
 struct vhost_vring_file file = { };
 int r;
 
+if (net->nc->info->prepare) {
+net->nc->info->prepare(net->nc);
+}
+
 r = vhost_dev_enable_notifiers(&net->dev, dev);
 if (r < 0) {
 goto fail_notifiers;
-- 
2.31.1




[PATCH v3 7/7] vdpa: Always start CVQ in SVQ mode

2022-08-03 Thread Eugenio Pérez
Isolate the control virtqueue in its own group, which allows QEMU to intercept
control commands while letting the dataplane run fully passthrough to the guest.
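
The isolation test this relies on can be sketched without any ioctl. In the hypothetical helper below, groups[i] holds the vq group already queried for virtqueue i and the last entry is assumed to be CVQ; the function name and the array representation are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * groups[i] is the vq group of virtqueue i; the last entry is CVQ.
 * Returns true if no data virtqueue shares the CVQ's group.
 */
static bool demo_cvq_group_is_independent(const unsigned *groups, size_t nvqs)
{
    if (nvqs == 0) {
        return false; /* no CVQ to isolate */
    }

    unsigned cvq_group = groups[nvqs - 1];
    for (size_t i = 0; i + 1 < nvqs; ++i) {
        if (groups[i] == cvq_group) {
            return false; /* a data vq would be shadowed too */
        }
    }
    return true;
}
```

Only when this kind of check succeeds can CVQ be given its own address space while the data virtqueues keep using the guest's.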

Signed-off-by: Eugenio Pérez 
---
v3:
* Make asid-related queries print a warning instead of returning an
  error and stopping qemu from starting.
---
 hw/virtio/vhost-vdpa.c |   3 +-
 net/vhost-vdpa.c   | 144 +++--
 2 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 131100841c..a4cb68862b 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -674,7 +674,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 {
 uint64_t features;
 uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
+0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
 int r;
 
 if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index e3b65ed546..5f39f0edb5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
 /* Control commands shadow buffers */
 void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
 
+/* Number of address spaces supported by the device */
+unsigned address_space_num;
+
 /* The device always have SVQ enabled */
 bool always_svq;
 bool started;
@@ -100,6 +103,8 @@ static const uint64_t vdpa_svq_device_features =
 BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
 BIT_ULL(VIRTIO_NET_F_STANDBY);
 
+#define VHOST_VDPA_NET_CVQ_ASID 1
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -224,6 +229,101 @@ static NetClientInfo net_vhost_vdpa_info = {
 .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static void vhost_vdpa_get_vring_group(int device_fd,
+   struct vhost_vring_state *state)
+{
+int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);
+if (unlikely(r < 0)) {
+/*
+ * Assume all groups are 0, the consequences are the same and we will
+ * not abort device creation
+ */
+state->num = 0;
+}
+}
+
+/**
+ * Check if all the virtqueues of the virtio device are in a different vq
+ * group than the last vq. The vq group of the last vq is passed in cvq_group.
+ */
+static bool vhost_vdpa_cvq_group_is_independent(struct vhost_vdpa *v,
+struct vhost_vring_state cvq_group)
+{
+struct vhost_dev *dev = v->dev;
+
+for (int i = 0; i < (dev->vq_index_end - 1); ++i) {
+struct vhost_vring_state vq_group = {
+.index = i,
+};
+
+vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
+if (unlikely(vq_group.num == cvq_group.num)) {
+warn_report("CVQ %u group is the same as VQ %u one (%u)",
+ cvq_group.index, vq_group.index, cvq_group.num);
+return false;
+}
+}
+
+return true;
+}
+
+static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
+   unsigned vq_group,
+   unsigned asid_num)
+{
+struct vhost_vring_state asid = {
+.index = vq_group,
+.num = asid_num,
+};
+int ret;
+
+ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
+if (unlikely(ret < 0)) {
+warn_report("Can't set vq group %u asid %u, errno=%d (%s)",
+asid.index, asid.num, errno, g_strerror(errno));
+}
+return ret;
+}
+
+static void vhost_vdpa_net_prepare(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_vdpa *v = &s->vhost_vdpa;
+struct vhost_dev *dev = v->dev;
+struct vhost_vring_state cvq_group = {
+.index = v->dev->vq_index_end - 1,
+};
+int r;
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (dev->nvqs != 1 || dev->vq_index + dev->nvqs != dev->vq_index_end) {
+/* Only interested in CVQ */
+return;
+}
+
+if (s->always_svq) {
+/* SVQ is already enabled */
+return;
+}
+
+if (s->address_space_num < 2) {
+v->shadow_vqs_enabled = false;
+return;
+}
+
+vhost_vdpa_get_vring_group(v->device_fd, &cvq_group);
+if (!vhost_vdpa_cvq_group_is_independent(v, cvq_group)) {
+v->shadow_vqs_enabled = false;
+return;
+}
+
+r = vhost_vdpa_set_address_space_id(v, cvq_group.num,
+VHOST_VDPA_NET_CVQ_ASID);
+v->shadow_vqs_enabled = r == 0;
+s->vhost_vdpa.address_space_id = r == 0 ? 1 : 0;
+}
+
 static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
 {
 VhostIOVATree *tree = v->iova_tree;
@@ -431,6 +531,7 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
 .type = 

[PATCH v6 1/2] target/s390x: support PRNO_TRNG instruction

2022-08-03 Thread Jason A. Donenfeld
In order for hosts running inside of TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |  2 ++
 target/s390x/tcg/crypto_helper.c | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..3d333e2789 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,8 @@ static uint16_t qemu_V7_0[] = {
  */
 static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_PRNO_TRNG,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..8ad4ef1ace 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -12,12 +12,38 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
 #include "s390x-internal.h"
 #include "tcg_s390x.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, reg_len, *buf_reg + 1);
+--*len_reg;
+}
+len -= block;
+}
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  uint32_t type)
 {
@@ -52,6 +78,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
uint32_t r2, uint32_t r3,
 cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
 }
 break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
 default:
 /* we don't implement any other subfunction yet */
 g_assert_not_reached();
-- 
2.35.1
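The address-register update in fill_buf_random() above depends on deposit64() replacing only the low reg_len bits of the register, so the buffer address wraps within the current addressing mode while the upper bits stay untouched. A minimal standalone sketch of that behaviour follows; deposit64_sketch() and advance_addr_reg() are illustrative names, not QEMU's own helpers.

```c
#include <stdint.h>

/*
 * Replace bits [start, start + length) of value with the low bits of
 * fieldval. Requires 1 <= length <= 64 - start.
 */
static uint64_t deposit64_sketch(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    uint64_t mask = (~0ULL >> (64 - length)) << start;

    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Advance an address register by one byte, wrapping within reg_len bits */
static uint64_t advance_addr_reg(uint64_t reg, int reg_len)
{
    return deposit64_sketch(reg, 0, reg_len, reg + 1);
}
```

With reg_len = 24, a register holding 0xAB00000000FFFFFF advances to 0xAB00000000000000: the low 24 bits wrap to zero and the high bits are preserved.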




[PATCH 1/2] target/s390x: support PRNO_TRNG instruction

2022-08-03 Thread Jason A. Donenfeld
In order for hosts running inside of TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |  2 ++
 target/s390x/tcg/crypto_helper.c | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..3d333e2789 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,8 @@ static uint16_t qemu_V7_0[] = {
  */
 static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_PRNO_TRNG,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..8ad4ef1ace 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -12,12 +12,38 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
 #include "s390x-internal.h"
 #include "tcg_s390x.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, reg_len, *buf_reg + 1);
+--*len_reg;
+}
+len -= block;
+}
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  uint32_t type)
 {
@@ -52,6 +78,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
uint32_t r2, uint32_t r3,
 cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
 }
 break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
 default:
 /* we don't implement any other subfunction yet */
 g_assert_not_reached();
-- 
2.35.1




Re: [PULL 0/3] Linux user for 7.1 patches

2022-08-03 Thread Richard Henderson

On 8/3/22 07:56, Laurent Vivier wrote:

The following changes since commit 3e4abe2c92964aadd35344a635b0f32cb487fd5c:

   Merge tag 'pull-block-2022-07-27' of https://gitlab.com/vsementsov/qemu into 
staging (2022-07-27 20:10:15 -0700)

are available in the Git repository at:

   https://gitlab.com/laurent_vivier/qemu.git 
tags/linux-user-for-7.1-pull-request

for you to fetch changes up to 5b63de6b54add51822db3c89325c6fc05534a54c:

   linux-user: Use memfd for open syscall emulation (2022-08-02 15:44:27 +0200)


Pull request linux-user 20220803


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~






Ilya Leoshkevich (1):
   linux-user: Do not treat madvise()'s advice as a bitmask

Peter Maydell (1):
   linux-user/flatload.c: Fix setting of image_info::end_code

Rainer Müller (1):
   linux-user: Use memfd for open syscall emulation

  linux-user/flatload.c |  2 +-
  linux-user/mmap.c |  2 +-
  linux-user/syscall.c  | 22 ++
  3 files changed, 16 insertions(+), 10 deletions(-)






Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Jason A. Donenfeld
On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote:
> On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
> > Thanks for the info. Very helpful. Looking into it now.
> 
> So interestingly, this is not a new issue. If you pass any type of setup
> data, OVMF appears to be doing something unusual and passing 0x
> for all the entries, rather than the actual data. The reason this isn't
> new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
> same page fault, on all QEMU versions. The thing that passes the DTB is
> the thing that passes the RNG seed. Same mechanism, same bug.
> 
> I'm looking into it...

Fixed with: 
https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/

Feel free to join into the discussion there. I CC'd you.

Jason



[PATCH RFC v1] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld
The boot parameter header refers to setup_data at an absolute address,
and each setup_data refers to the next setup_data at an absolute address
too. Currently QEMU simply puts the setup_datas right after the kernel
image, and since the kernel_image is loaded at prot_addr -- a fixed
address knowable to QEMU a priori -- the setup_data absolute address
winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.

This mostly works fine, so long as the kernel image really is loaded at
prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
generally EFI doesn't give a good way of predicting where it's going to
load the kernel. So when it loads it at some address != prot_addr, the
absolute addresses in setup_data now point somewhere bogus, causing
crashes when EFI stub tries to follow the next link.

Fix this by placing setup_data at some fixed place in memory, not as
part of the kernel image, and then pointing the setup_data absolute
address to that fixed place in memory. This way, even if OVMF or other
chains relocate the kernel image, the boot parameter still points to the
correct absolute address.
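To illustrate the idea, here is a minimal standalone sketch of building the setup_data chain against a fixed base rather than against prot_addr. The struct setup_blob accumulator and setup_blob_push() are hypothetical names, the base value is only a placeholder, and the sketch assumes a little-endian host so it can skip the byte-order conversions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder base address, for illustration only */
#define SETUP_DATA_PHYS_BASE 0x1000ULL

/* Header of one setup_data item (next, type, len), 16 bytes */
struct setup_data_hdr {
    uint64_t next; /* absolute guest address of the next item, 0 = end */
    uint32_t type;
    uint32_t len;  /* payload length in bytes */
};

/* Hypothetical accumulator; buf is later loaded at SETUP_DATA_PHYS_BASE */
struct setup_blob {
    uint8_t *buf;
    size_t total_len;
    uint64_t first; /* guest address of the list head, 0 if empty */
};

/* Append one item and make it the new head of the list */
static int setup_blob_push(struct setup_blob *b, uint32_t type,
                           const void *payload, uint32_t len)
{
    struct setup_data_hdr hdr = { .next = b->first, .type = type, .len = len };
    size_t item_len = sizeof(hdr) + len;
    uint8_t *grown = realloc(b->buf, b->total_len + item_len);

    if (!grown) {
        return -1;
    }
    b->buf = grown;
    /* memcpy avoids unaligned stores when payload lengths are odd */
    memcpy(b->buf + b->total_len, &hdr, sizeof(hdr));
    memcpy(b->buf + b->total_len + sizeof(hdr), payload, len);
    /* The address depends only on the fixed base, never on prot_addr */
    b->first = SETUP_DATA_PHYS_BASE + b->total_len;
    b->total_len += item_len;
    return 0;
}
```

Because every link is computed from the fixed base, relocating the kernel image leaves the chain intact; the boot parameter header only needs the final value of first.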

=== NOTE NOTE NOTE NOTE NOTE ===
This commit is currently garbage! It fixes the boot test case, but it
just picks the address 0x1000. That's probably not a good idea. If
somebody with some x86 architectural knowledge could let me know a
better reserved place to put this, that'd be very appreciated.

Fixes: 3cbeb52467 ("hw/i386: add device tree support")
Reported-by: Xiaoyao Li 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Michael S. Tsirkin 
Cc: Daniel P. Berrangé 
Cc: Gerd Hoffmann 
Cc: Ard Biesheuvel 
Cc: linux-...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..0b0083b345 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -773,9 +773,9 @@ void x86_load_linux(X86MachineState *x86ms,
 bool linuxboot_dma_enabled = 
X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
 int setup_size, kernel_size, cmdline_size;
-int dtb_size, setup_data_offset;
+int dtb_size, setup_data_item_len, setup_data_total_len = 0;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel;
+uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
 hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0;
 FILE *f;
 char *vmode;
@@ -1048,6 +1048,8 @@ void x86_load_linux(X86MachineState *x86ms,
 }
 fclose(f);
 
+#define SETUP_DATA_PHYS_BASE 0x1000
+
 /* append dtb to kernel */
 if (dtb_filename) {
 if (protocol < 0x209) {
@@ -1062,34 +1064,36 @@ void x86_load_linux(X86MachineState *x86ms,
 exit(1);
 }
 
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
-kernel = g_realloc(kernel, kernel_size);
-
-
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + dtb_size;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = SETUP_DATA_PHYS_BASE + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_DTB);
 setup_data->len = cpu_to_le32(dtb_size);
-
 load_image_size(dtb_filename, setup_data->data, dtb_size);
 }
 
 if (!legacy_no_rng_seed) {
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + 
RNG_SEED_LENGTH;
-kernel = g_realloc(kernel, kernel_size);
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + RNG_SEED_LENGTH;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = SETUP_DATA_PHYS_BASE + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
 setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
 qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
 }
 
-/* Offset 0x250 is a pointer to the first setup_data link. */
-stq_p(header + 0x250, first_setup_data);
+if (first_setup_data) {
+/* Offset 0x250 is a pointer to the first 

Re: [PATCH v2 02/20] ppc/ppc405: Introduce a PPC405 generic machine

2022-08-03 Thread BALATON Zoltan

On Wed, 3 Aug 2022, Cédric Le Goater wrote:

We will use this machine as a base to define the ref405ep and possibly
the PPC405 hotfoot board as found in the Linux kernel.

Signed-off-by: Cédric Le Goater 
---
hw/ppc/ppc405_boards.c | 31 ---
1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 1a4e7588c584..4c269b6526a5 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -50,6 +50,15 @@

#define USE_FLASH_BIOS

+struct Ppc405MachineState {
+/* Private */
+MachineState parent_obj;
+/* Public */
+};
+
+#define TYPE_PPC405_MACHINE MACHINE_TYPE_NAME("ppc405")
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405MachineState, PPC405_MACHINE);


In other patches the declaration of the state struct comes after the 
OBJECT_DECLARE macro, so it should go here rather than above. Writing it 
that way here too would be consistent, and the DECLARE macro would then 
start the object declaration with everything belonging to the object 
kept together below it. Declaring the structure before the macro puts it 
somewhat outside the object, although this is only cosmetic and may be a 
matter of style.


Regards,
BALATON Zoltan


+
/*/
/* PPC405EP reference board (IBM) */
/* Standalone board with:
@@ -332,18 +341,34 @@ static void ref405ep_class_init(ObjectClass *oc, void 
*data)

mc->desc = "ref405ep";
mc->init = ref405ep_init;
-mc->default_ram_size = 0x0800;
-mc->default_ram_id = "ef405ep.ram";
}

static const TypeInfo ref405ep_type = {
.name = MACHINE_TYPE_NAME("ref405ep"),
-.parent = TYPE_MACHINE,
+.parent = TYPE_PPC405_MACHINE,
.class_init = ref405ep_class_init,
};

+static void ppc405_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+mc->desc = "PPC405 generic machine";
+mc->default_ram_size = 0x0800;
+mc->default_ram_id = "ppc405.ram";
+}
+
+static const TypeInfo ppc405_machine_type = {
+.name = TYPE_PPC405_MACHINE,
+.parent = TYPE_MACHINE,
+.instance_size = sizeof(Ppc405MachineState),
+.class_init = ppc405_machine_class_init,
+.abstract = true,
+};
+
static void ppc405_machine_init(void)
{
+type_register_static(&ppc405_machine_type);
type_register_static(&ref405ep_type);
}



[PATCH] virtio-scsi: fix race in virtio_scsi_dataplane_start()

2022-08-03 Thread Stefan Hajnoczi
As soon as virtio_scsi_dataplane_start() attaches host notifiers the
IOThread may start virtqueue processing. There is a race between
IOThread virtqueue processing and virtio_scsi_dataplane_start() because
it only assigns s->dataplane_started after attaching host notifiers.

When a virtqueue handler function in the IOThread calls
virtio_scsi_defer_to_dataplane() it may see !s->dataplane_started and
attempt to start dataplane even though we're already in the IOThread:

  #0  0x7f67b360857c __pthread_kill_implementation (libc.so.6 + 0xa257c)
  #1  0x7f67b35bbd56 raise (libc.so.6 + 0x55d56)
  #2  0x7f67b358e833 abort (libc.so.6 + 0x28833)
  #3  0x7f67b358e75b __assert_fail_base.cold (libc.so.6 + 0x2875b)
  #4  0x7f67b35b4cd6 __assert_fail (libc.so.6 + 0x4ecd6)
  #5  0x55ca87fd411b memory_region_transaction_commit (qemu-kvm + 0x67511b)
  #6  0x55ca87e17811 virtio_pci_ioeventfd_assign (qemu-kvm + 0x4b8811)
  #7  0x55ca87e14836 virtio_bus_set_host_notifier (qemu-kvm + 0x4b5836)
  #8  0x55ca87f8e14e virtio_scsi_set_host_notifier (qemu-kvm + 0x62f14e)
  #9  0x55ca87f8dd62 virtio_scsi_dataplane_start (qemu-kvm + 0x62ed62)
  #10 0x55ca87e14610 virtio_bus_start_ioeventfd (qemu-kvm + 0x4b5610)
  #11 0x55ca87f8c29a virtio_scsi_handle_ctrl (qemu-kvm + 0x62d29a)
  #12 0x55ca87fa5902 virtio_queue_host_notifier_read (qemu-kvm + 0x646902)
  #13 0x55ca882c099e aio_dispatch_handler (qemu-kvm + 0x96199e)
  #14 0x55ca882c1761 aio_poll (qemu-kvm + 0x962761)
  #15 0x55ca880e1052 iothread_run (qemu-kvm + 0x782052)
  #16 0x55ca882c562a qemu_thread_start (qemu-kvm + 0x96662a)

This patch assigns s->dataplane_started before attaching host notifiers
so that virtqueue handler functions that run in the IOThread before
virtio_scsi_dataplane_start() returns correctly identify that dataplane
does not need to be started.

Note that s->dataplane_started does not need the AioContext lock because
it is set before attaching host notifiers and cleared after detaching
host notifiers. In other words, the IOThread always sees the value true
and the main loop thread does not modify it while the IOThread is
active.
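The ordering problem can be shown with a single-threaded model. This is a sketch under assumptions, not QEMU code: struct dev, handler() and attach_notifier() are hypothetical stand-ins, and the worst case (the handler firing the moment the notifier is attached) is simulated by calling it directly.

```c
#include <stdbool.h>

struct dev {
    bool dataplane_started;
    int restart_attempts; /* times a handler tried to start dataplane again */
};

/* Stand-in for a virtqueue handler running in the IOThread */
static void handler(struct dev *d)
{
    if (!d->dataplane_started) {
        d->restart_attempts++;
    }
}

/* Stand-in for attaching a host notifier: the handler may fire at once */
static void attach_notifier(struct dev *d)
{
    handler(d);
}

/* Old ordering: the flag is assigned only after the notifier is attached */
static int start_flag_after_attach(void)
{
    struct dev d = { 0 };

    attach_notifier(&d);
    d.dataplane_started = true;
    return d.restart_attempts;
}

/* New ordering: the flag is assigned before the notifier is attached */
static int start_flag_before_attach(void)
{
    struct dev d = { 0 };

    d.dataplane_started = true;
    attach_notifier(&d);
    return d.restart_attempts;
}
```

In this model the old ordering records one spurious restart attempt, which corresponds to the nested virtio_scsi_dataplane_start() call in the backtrace, while the new ordering records none.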

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099541
Reported-by: Qing Wang 
Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/virtio-scsi-dataplane.c | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 8bb6e6acfc..a575c3f0cd 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -66,6 +66,21 @@ static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, 
VirtQueue *vq, int n)
 return 0;
 }
 
+/* Context: BH in IOThread */
+static void virtio_scsi_dataplane_start_bh(void *opaque)
+{
+VirtIOSCSI *s = opaque;
+VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+int i;
+
+virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
+virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
+
+for (i = 0; i < vs->conf.num_queues; i++) {
+virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
+}
+}
+
 /* Context: BH in IOThread */
 static void virtio_scsi_dataplane_stop_bh(void *opaque)
 {
@@ -136,16 +151,18 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
 memory_region_transaction_commit();
 
-aio_context_acquire(s->ctx);
-virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
-virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
-
-for (i = 0; i < vs->conf.num_queues; i++) {
-virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
-}
-
 s->dataplane_starting = false;
 s->dataplane_started = true;
+
+/*
+ * Attach notifiers from within the IOThread. It's possible to attach
+ * notifiers from our thread directly but this approach has the advantages
+ * that virtio_scsi_dataplane_start_bh() is symmetric with
+ * virtio_scsi_dataplane_stop_bh() and the s->dataplane_started assignment
+ * above doesn't require explicit synchronization.
+ */
+aio_context_acquire(s->ctx);
+aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_start_bh, s);
 aio_context_release(s->ctx);
 return 0;
 
-- 
2.37.1




Re: [PATCH 0/2] vmgenid: add generation counter

2022-08-03 Thread Daniel P . Berrangé
On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchal...@amazon.es wrote:
> From: Babis Chalios 
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

The VM generation ID feature in QEMU is implementing a spec defined
by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
more. This series is proposing a QEMU specific variant, which means
Linux running on all these other hypervisor platforms won't benefit
from the change. If the counter were provided entirely in the guest
kernel, then it works across all hypervisors.

It feels like the kernel ought to provide an implementation itself
as a starting point, with this QEMU change merely being an optional
enhancement to close the race window.

Ideally there would be someone at Microsoft we could connect with to
propose they include this feature in a VM Gen ID spec update, but I
don't personally know who to contact about that kind of thing. A
spec update would increase chances that this change gets provided
across all hypervisors.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH for-7.2 03/10] ppc/pnv: set root port chassis and slot using Bus properties

2022-08-03 Thread Cédric Le Goater

On 8/3/22 15:44, Daniel Henrique Barboza wrote:

For default root ports we have a way of accessing chassis and slot,
before root_port_realize(), via pnv_phb_attach_root_port(). For the
future user created root ports this won't be the case: we can't use
this helper because we don't have access to the PHB phb-id/chip-id
values.

In earlier patches we've added phb-id and chip-id to pnv-phb-root-bus
objects. We're now able to use the bus to retrieve them. The bus is
reachable for both user created and default devices, so we're changing
all the code paths. This also allows us to validate these changes with
the existing default devices.

Signed-off-by: Daniel Henrique Barboza 


Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/pci-host/pnv_phb.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index c47ed92462..826c0c144e 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -25,21 +25,19 @@
   * QOM id. 'chip_id' is going to be used as PCIE chassis for the
   * root port.
   */
-static void pnv_phb_attach_root_port(PCIHostState *pci, int index, int chip_id)
+static void pnv_phb_attach_root_port(PCIHostState *pci)
  {
  PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-g_autofree char *default_id = g_strdup_printf("%s[%d]",
-  TYPE_PNV_PHB_ROOT_PORT,
-  index);
  const char *dev_id = DEVICE(root)->id;
+g_autofree char *default_id = NULL;
+int index;
+
+index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal);
+default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
  
  object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,

OBJECT(root));
  
-/* Set unique chassis/slot values for the root port */
-qdev_prop_set_uint8(DEVICE(root), "chassis", chip_id);
-qdev_prop_set_uint16(DEVICE(root), "slot", index);
-
  pci_realize_and_unref(root, pci->bus, &error_fatal);
  }
  
@@ -93,7 +91,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)

  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
-pnv_phb_attach_root_port(pci, phb->phb_id, phb->chip_id);

+pnv_phb_attach_root_port(pci);
  }
  
  static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,

@@ -162,9 +160,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  {
  PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
  PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev);
+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev));
  PCIDevice *pci = PCI_DEVICE(dev);
  uint16_t device_id = 0;
  Error *local_err = NULL;
+int chip_id, index;
+
+chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal);
+index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal);
+
+/* Set unique chassis/slot values for the root port */
+qdev_prop_set_uint8(dev, "chassis", chip_id);
+qdev_prop_set_uint16(dev, "slot", index);
  
  rpc->parent_realize(dev, &local_err);
  if (local_err) {





Re: [PATCH for-7.2 01/10] ppc/pnv: add phb-id/chip-id PnvPHB3RootBus properties

2022-08-03 Thread Cédric Le Goater

On 8/3/22 15:44, Daniel Henrique Barboza wrote:

We rely on the phb-id and chip-id, which are PHB properties, to assign
chassis and slot to the root port. For default devices this is no big
deal: the root port is being created under pnv_phb_realize() and the
values are being passed on via the 'index' and 'chip-id' of the
pnv_phb_attach_root_port() helper.

If we want to implement user created root ports we have a problem. The
user created root port will not be aware of which PHB it belongs to,
unless we're willing to violate QOM best practices and access the PHB
via dev->parent_bus->parent. What we can do is to access the root bus
parent bus.

Since we're already assigning the root port as QOM child of the bus, and
the bus is initiated using PHB properties, let's add phb-id and chip-id
as properties of the bus. This will allow us trivial access to them, for
both user-created and default root ports, without doing anything too
shady with QOM.

Signed-off-by: Daniel Henrique Barboza 


Reviewed-by: Cédric Le Goater 

Thanks,

C.




---
  hw/pci-host/pnv_phb3.c | 50 ++
  include/hw/pci-host/pnv_phb3.h |  9 +-
  2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c
index d4c04a281a..af8575c007 100644
--- a/hw/pci-host/pnv_phb3.c
+++ b/hw/pci-host/pnv_phb3.c
@@ -1006,6 +1006,11 @@ void pnv_phb3_bus_init(DeviceState *dev, PnvPHB3 *phb)
   &phb->pci_mmio, &phb->pci_io,
   0, 4, TYPE_PNV_PHB3_ROOT_BUS);
  
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,
+&error_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+&error_abort);
+
  pci_setup_iommu(pci->bus, pnv_phb3_dma_iommu, phb);
  }
  
@@ -1105,10 +1110,55 @@ static const TypeInfo pnv_phb3_type_info = {

  .instance_init = pnv_phb3_instance_init,
  };
  
+static void pnv_phb3_root_bus_get_prop(Object *obj, Visitor *v,

+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, &value, errp);
+}
+
+static void pnv_phb3_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, &value, errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
  static void pnv_phb3_root_bus_class_init(ObjectClass *klass, void *data)
  {
  BusClass *k = BUS_CLASS(klass);
  
+object_class_property_add(klass, "phb-id", "int",

+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
  /*
   * PHB3 has only a single root complex. Enforce the limit on the
   * parent bus
diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
index bff69201d9..4854f6d2f6 100644
--- a/include/hw/pci-host/pnv_phb3.h
+++ b/include/hw/pci-host/pnv_phb3.h
@@ -104,9 +104,16 @@ struct PnvPBCQState {
  };
  
  /*

- * PHB3 PCIe Root port
+ * PHB3 PCIe Root Bus
   */
  #define TYPE_PNV_PHB3_ROOT_BUS "pnv-phb3-root"
+struct PnvPHB3RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB3RootBus, PNV_PHB3_ROOT_BUS)
  
  /*

   * PHB3 PCIe Host Bridge for PowerNV machines (POWER8)





Re: [PATCH for-7.2 02/10] ppc/pnv: add phb-id/chip-id PnvPHB4RootBus properties

2022-08-03 Thread Cédric Le Goater

On 8/3/22 15:44, Daniel Henrique Barboza wrote:

The same rationale provided in the PHB3 bus case applies here.

Note: we could have merged both buses in a single object, like we did
with the root ports, and spare some boilerplate. The reason we opted to
preserve both bus objects is twofold:

- there's no user-side advantage in doing so. Unifying the root ports
presents a clear user QOL change when we enable user created devices back.
The bus objects, aside from having different QOM names, are transparent
to the user;

- we leave a door open in case we want to increase the root port limit
for phb4/5 later on without having to deal with phb3 code.

Signed-off-by: Daniel Henrique Barboza 




Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  hw/pci-host/pnv_phb4.c | 51 ++
  include/hw/pci-host/pnv_phb4.h | 10 +++
  2 files changed, 61 insertions(+)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index b98c394713..824e1a73fb 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1551,6 +1551,12 @@ void pnv_phb4_bus_init(DeviceState *dev, PnvPHB4 *phb)
   pnv_phb4_set_irq, pnv_phb4_map_irq, phb,
   &phb->pci_mmio, &phb->pci_io,
   0, 4, TYPE_PNV_PHB4_ROOT_BUS);
+
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,
+&error_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+&error_abort);
+
  pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb);
  pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
  }
@@ -1708,10 +1714,55 @@ static const TypeInfo pnv_phb5_type_info = {
  .instance_size = sizeof(PnvPHB4),
  };
  
+static void pnv_phb4_root_bus_get_prop(Object *obj, Visitor *v,

+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, &value, errp);
+}
+
+static void pnv_phb4_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, &value, errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
  static void pnv_phb4_root_bus_class_init(ObjectClass *klass, void *data)
  {
  BusClass *k = BUS_CLASS(klass);
  
+object_class_property_add(klass, "phb-id", "int",

+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
  /*
   * PHB4 has only a single root complex. Enforce the limit on the
   * parent bus
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index 20aa4819d3..50d4faa001 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -45,7 +45,17 @@ typedef struct PnvPhb4DMASpace {
  QLIST_ENTRY(PnvPhb4DMASpace) list;
  } PnvPhb4DMASpace;
  
+/*

+ * PHB4 PCIe Root Bus
+ */
  #define TYPE_PNV_PHB4_ROOT_BUS "pnv-phb4-root"
+struct PnvPHB4RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB4RootBus, PNV_PHB4_ROOT_BUS)
  
  /*

   * PHB4 PCIe Host Bridge for PowerNV machines (POWER9)





[PATCH v2 2/2] virtio: Add shared memory capability

2022-08-03 Thread Antonio Caggiano
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'
and the data structure 'virtio_pci_shm_cap' to go with it.
They allow defining shared memory regions with sizes and offsets
of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

v2: Remove virtio_pci_shm_cap as virtio_pci_cap64 is used instead.
v3: No need for mask32 as cpu_to_le32 truncates the value.
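
As a rough illustration of the v3 note, here is a minimal sketch in plain C of
how a 64-bit length or offset ends up in two 32-bit capability fields. The
helper names (shm_lo32, shm_hi32, shm_join, shm_roundtrips) are hypothetical
and not part of QEMU, and the byte-order conversion done by cpu_to_le32 is
left out. The point is only that converting to a 32-bit type already drops the
upper half, so no explicit mask is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not QEMU API: split a 64-bit value the way the
 * patch fills cap.length / length_hi (and cap.offset / offset_hi). */
static inline uint32_t shm_lo32(uint64_t v)
{
    /* Conversion to uint32_t keeps only the low 32 bits (modulo 2^32). */
    return (uint32_t)v;
}

static inline uint32_t shm_hi32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Reassemble the value, as a reader of the capability would. */
static inline uint64_t shm_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Sanity helper: splitting and joining must give back the input. */
static inline bool shm_roundtrips(uint64_t v)
{
    return shm_join(shm_lo32(v), shm_hi32(v)) == v;
}
```

For example, a 6 GiB region (0x180000000) would be stored as a low word of
0x80000000 and a high word of 0x1.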

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 45327f0b31..50bd230122 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1164,6 +1164,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index 2446dcd9ae..5e5c4a4c6d 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -252,4 +252,8 @@ void virtio_pci_types_register(const VirtioPCIDeviceTypeInfo *t);
  */
 unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id);
+
 #endif
-- 
2.34.1




Re: [RFC PATCH 1/3] target/ppc: Bugfix fadd/fsub result with OE/UE set

2022-08-03 Thread Richard Henderson

On 8/3/22 05:22, Lucas Mateus Castro(alqotel) wrote:

From: "Lucas Mateus Castro (alqotel)" 

As mentioned in the functions float_overflow_excp and
float_underflow_excp, the result should be adjusted as mentioned in the
ISA (subtracted 192/1536 from the exponent of the intermediate result if
an overflow occurs with OE set and added 192/1536 to the exponent of the
intermediate result if an underflow occurs with UE set), but at those
functions the result has already been rounded so it is not possible to
add/subtract from the intermediate result anymore.
  
This patch creates a new function that receives the value that should be

subtracted/added from the exponent if an overflow/underflow happens, to
not leave some arbitrary numbers from the PowerISA in the middle of the
FPU code. If these numbers are 0 the new functions just call the old
ones.

I used 2 values here for overflow and underflow, maybe it'd be better to
just use the same ones, any thoughts?

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
An alternative I've thought of was to always return the adjusted value if an
overflow or underflow occurs and, in float_underflow_excp and
float_overflow_excp, adjust it to inf/den/0 if OE/UE is 0, but I didn't
see many advantages to that approach.
---
  fpu/softfloat.c | 75 +
  include/fpu/softfloat.h |  2 ++
  target/ppc/fpu_helper.c | 10 --
  3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4a871ef2a1..a407129dcb 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -268,6 +268,8 @@ typedef bool (*f64_check_fn)(union_float64 a, union_float64 
b);
  
  typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);

  typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
+typedef float64 (*soft_f64_op2_int2_fn)(float64 a, float64 b, int c, int d,
+float_status *s);
  typedef float   (*hard_f32_op2_fn)(float a, float b);
  typedef double  (*hard_f64_op2_fn)(double a, double b);
  
@@ -401,6 +403,19 @@ float64_gen2(float64 xa, float64 xb, float_status *s,

  return soft(ua.s, ub.s, s);
  }
  
+static inline float64

+float64_gen2_excp(float64 xa, float64 xb, int xc, int xd, float_status *s,
+  hard_f64_op2_fn hard, soft_f64_op2_fn soft,
+  soft_f64_op2_int2_fn soft_excp, f64_check_fn pre,
+  f64_check_fn post)
+{
+if (xc || xd) {
+return soft_excp(xa, xb, xc, xd, s);
+} else {
+return float64_gen2(xa, xb, s, hard, soft, pre, post);
+}
+}
+
  /*
   * Classify a floating point number. Everything above float_class_qnan
   * is a NaN so cls >= float_class_qnan is any NaN.
@@ -1929,6 +1944,39 @@ static double hard_f64_sub(double a, double b)
  return a - b;
  }
  
+static float64 QEMU_SOFTFLOAT_ATTR

+soft_f64_addsub_excp_en(float64 a, float64 b, int oe_sub, int ue_sum,
+float_status *status, bool subtract)
+{
+FloatParts64 pa, pb, *pr;
+
+float64_unpack_canonical(&pa, a, status);
+float64_unpack_canonical(&pb, b, status);
+pr = parts_addsub(&pa, &pb, status, subtract);
+
+if (unlikely(oe_sub && (pr->exp > 1023))) {
+pr->exp -= oe_sub;
+float_raise(float_flag_overflow, status);
+} else if (unlikely(ue_sum && (pr->exp < -1022))) {
+pr->exp += ue_sum;
+float_raise(float_flag_underflow, status);
+}
+
+return float64_round_pack_canonical(pr, status);


This is incorrect, because the exponent is not fixed until the middle of 
round_pack_canonical.

I think you should not add new functions like this, with new parameters, but instead add 
fields to float_status, which would then be checked at the places currently setting 
underflow and overflow.
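
To make the suggestion a little more concrete, here is a toy sketch of the
shape it could take. It is not QEMU's softfloat code: toy_float_status, its
two adjust fields and toy_check_range() are names invented for illustration,
and underflow detection is simplified to a plain exponent comparison. The idea
shown is only that the rebias amounts live in the status structure and are
consulted at the one place where the final, post-rounding exponent is known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_FLAG_OVERFLOW = 1, TOY_FLAG_UNDERFLOW = 2 };
enum { TOY_EXP_MAX = 1023, TOY_EXP_MIN = -1022 };

/* Hypothetical stand-in for float_status with two extra fields. */
typedef struct {
    unsigned flags;
    int overflow_exp_adjust;   /* 0 means default overflow handling */
    int underflow_exp_adjust;  /* 0 means default underflow handling */
} toy_float_status;

/*
 * Called with the exponent as it stands after rounding.
 * Returns true if the caller should pack a finite result using *exp
 * (possibly rebiased), false if default handling (inf/denormal/zero)
 * applies. The flag is raised in both cases.
 */
static bool toy_check_range(int *exp, toy_float_status *s)
{
    assert(exp != NULL && s != NULL);

    if (*exp > TOY_EXP_MAX) {
        s->flags |= TOY_FLAG_OVERFLOW;
        if (s->overflow_exp_adjust) {
            *exp -= s->overflow_exp_adjust;
            return true;
        }
        return false;
    }
    if (*exp < TOY_EXP_MIN) {
        s->flags |= TOY_FLAG_UNDERFLOW;
        if (s->underflow_exp_adjust) {
            *exp += s->underflow_exp_adjust;
            return true;
        }
        return false;
    }
    return true;  /* in range, nothing to do */
}
```

With this shape a target would set the two fields once (1536 for double
precision in the PowerISA case discussed here) and the arithmetic entry points
would keep their existing signatures.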



r~



[PATCH v2 0/2] virtio-gpu: Shared memory capability

2022-08-03 Thread Antonio Caggiano
I guess RFC has been waiting long enough [0].

v2: Squash patch #3 into patch #2, and formatting fixes to patch #1.

[0] https://www.mail-archive.com/qemu-devel@nongnu.org/msg840405.html

Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gerd Hoffmann (1):
  virtio-gpu: hostmem

 hw/display/virtio-gpu-pci.c| 15 +++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-gpu.h |  5 +
 include/hw/virtio/virtio-pci.h |  4 
 6 files changed, 67 insertions(+), 9 deletions(-)

-- 
2.34.1




[PATCH v2 1/2] virtio-gpu: hostmem

2022-08-03 Thread Antonio Caggiano
From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

v2: Formatting fixes

Signed-off-by: Antonio Caggiano 
Acked-by: Michael S. Tsirkin 
---
 hw/display/virtio-gpu-pci.c| 15 +++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..2cbbacd7fe 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,21 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
+qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus), errp);
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 20cc703dcc..506b3b8eef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1424,6 +1424,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index 4dcb34c4a7..aa8d1ab993 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 pci_register_bar(&vpci_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, &vga->vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 2e28507efe..eafce75b04 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -102,12 +102,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -131,6 +134,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
 
 int enabled_output_bitmask;
-- 
2.34.1




Re: [PATCH 0/2] vmgenid: add generation counter

2022-08-03 Thread bchalios



On 8/3/22 5:36 PM, "Michael S. Tsirkin"  wrote:




On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchal...@amazon.es wrote:
> From: Babis Chalios 
>
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
>
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
>
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

Producing another 4 bytes is not really the issue, the issue
is how does guest consume this.
So I would like this discussion to happen on the linux kernel mailing
list not just here.  Can you post the linux patch please?



CCed you in the Linux patch thread.





> Babis Chalios (2):
>vmgenid: make device data size configurable
>vmgenid: add generation counter
>
>   docs/specs/vmgenid.txt| 101 ++
>   hw/acpi/vmgenid.c | 145 +++---
>   include/hw/acpi/vmgenid.h |  23 --
>   3 files changed, 204 insertions(+), 65 deletions(-)
>
> --
> 2.37.1
>
> Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 
5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja 
M-401234 . CIF B84570936





Re: [PATCH v10 07/21] blockjob: introduce block_job _locked() APIs

2022-08-03 Thread Kevin Wolf
Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> Just as done with job.h, create _locked() functions in blockjob.h
> 
> These functions will be later useful when caller has already taken
> the lock. All blockjob _locked functions call job _locked functions.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v7 11/14] KVM: Register/unregister the guest private memory regions

2022-08-03 Thread Sean Christopherson
On Wed, Aug 03, 2022, Chao Peng wrote:
> On Tue, Aug 02, 2022 at 04:38:55PM +, Sean Christopherson wrote:
> > On Tue, Aug 02, 2022, Sean Christopherson wrote:
> > > I think we should avoid UNMAPPABLE even on the KVM side of things for the
> > > core memslots functionality and instead be very literal, e.g.
> > > 
> > >   KVM_HAS_FD_BASED_MEMSLOTS
> > >   KVM_MEM_FD_VALID
> > > 
> > > We'll still need KVM_HAS_USER_UNMAPPABLE_MEMORY, but it won't be tied
> > > directly to the memslot.  Decoupling the two things will require a bit of
> > > extra work, but the code impact should be quite small, e.g. explicitly
> > > query and propagate MEMFILE_F_USER_INACCESSIBLE to kvm_memory_slot to
> > > track if a memslot can be private.  And unless I'm missing something, it
> > > won't require an additional memslot flag.  The biggest oddity (if we don't
> > > also add KVM_MEM_PRIVATE) is that KVM would effectively ignore the hva for
> > > fd-based memslots for VM types that don't support private memory, i.e.
> > > userspace can't opt out of using the fd-based backing, but that doesn't
> > > seem like a deal breaker.
> 
> I actually love this idea. I don't mind adding extra code for potential
> usage other than confidential VMs if we can have a workable solution for
> it.
> 
> > 
> > Hrm, but basing private memory on top of a generic FD_VALID would
> > effectively require shared memory to use hva-based memslots for
> > confidential VMs.  That'd yield a very weird API, e.g. non-confidential VMs
> > could be backed entirely by fd-based memslots, but confidential VMs would
> > be forced to use hva-based memslots.
> 
> It would work if we can treat userspace_addr as optional for
> KVM_MEM_FD_VALID, e.g. userspace can opt in to decide whether needing
> the mappable part or not for a regular VM and we can enforce KVM for
> confidential VMs. But the u64 type of userspace_addr doesn't allow us to
> express a 'null' value so sounds like we will end up needing another
> flag anyway.
> 
> In concept, we could have three configurations here:
>   1. hva-only: without any flag and use userspace_addr;
>   2. fd-only:  another new flag is needed and use fd/offset;
>   3. hva/fd mixed: both userspace_addr and fd/offset are effective.
>  KVM_MEM_PRIVATE is a subset of it for confidential VMs. Not sure
>  regular VM also wants this.

My mental model breaks things down slightly differently, though the end result
is more or less the same.

After this series, there will be two types of memory: private and "regular" (I'm
trying to avoid "shared").  "Regular" memory is always hva-based
(userspace_addr), and private always fd-based (fd+offset).

In the future, if we want to support fd-based memory for "regular" memory, then
as you said we'd need to add a new flag, and a new fd+offset pair.

At that point, we'd have two new (relative to current) flags:

  KVM_MEM_PRIVATE_FD_VALID
  KVM_MEM_FD_VALID

along with two new pairs of fd+offset (private_* and "regular").  Mapping those
to your above list:
  
  1.  Neither *_FD_VALID flag set.
  2a. Both PRIVATE_FD_VALID and FD_VALID are set
  2b. FD_VALID is set and the VM doesn't support private memory
  3.  Only PRIVATE_FD_VALID is set (with private memory support in the VM).

Thus, "regular" VMs can't have a mix in a single memslot because they can't use
private memory.
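
The mapping above can be sketched as a small classifier. This is only an
illustration of the reasoning in this mail: the TOY_* flag bits, the
toy_slot_kind enum and toy_classify() are invented names, not KVM uAPI, and
the bit values are arbitrary. One combination (FD_VALID alone on a VM that
does support private memory) is not named in the list, so the sketch reports
it separately rather than guessing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; not real KVM uAPI values. */
#define TOY_MEM_FD_VALID          (1u << 0)
#define TOY_MEM_PRIVATE_FD_VALID  (1u << 1)

typedef enum {
    TOY_SLOT_HVA_ONLY,   /* case 1: neither *_FD_VALID flag set */
    TOY_SLOT_FD_ONLY,    /* cases 2a and 2b */
    TOY_SLOT_MIXED,      /* case 3: hva for regular, fd for private */
    TOY_SLOT_INVALID,    /* private fd on a VM without private memory */
    TOY_SLOT_UNLISTED    /* combination the list above does not name */
} toy_slot_kind;

static toy_slot_kind toy_classify(unsigned flags, bool vm_has_private)
{
    bool fd = (flags & TOY_MEM_FD_VALID) != 0;
    bool pfd = (flags & TOY_MEM_PRIVATE_FD_VALID) != 0;

    /* Only the two illustrative flags are understood here. */
    assert((flags & ~(TOY_MEM_FD_VALID | TOY_MEM_PRIVATE_FD_VALID)) == 0);

    if (pfd && !vm_has_private) {
        return TOY_SLOT_INVALID;   /* "regular" VMs can't use private memory */
    }
    if (!fd && !pfd) {
        return TOY_SLOT_HVA_ONLY;  /* 1 */
    }
    if (fd && pfd) {
        return TOY_SLOT_FD_ONLY;   /* 2a */
    }
    if (fd) {
        /* 2b when the VM has no private memory; otherwise not listed. */
        return vm_has_private ? TOY_SLOT_UNLISTED : TOY_SLOT_FD_ONLY;
    }
    return TOY_SLOT_MIXED;         /* 3: only PRIVATE_FD_VALID */
}
```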

> There is no direct relationship between unmappable and fd-based since
> even fd-based can also be mappable for regular VM?

Yep.

> > Ignore this idea for now.  If there's an actual use case for generic
> > fd-based memory then we'll want a separate flag, fd, and offset, i.e. that
> > support could be added independent of KVM_MEM_PRIVATE.
> 
> If we ignore this idea now (which I'm also fine with), do you still think we
> need to change KVM_MEM_PRIVATE to KVM_MEM_USER_UNMAPPABLE?

Hmm, no.  After working through this, I think it's safe to say
KVM_MEM_USER_UNMAPPABLE is a bad name because we could end up with "regular"
memory that's backed by an inaccessible (unmappable) file.

One alternative would be to call it KVM_MEM_PROTECTED.  That shouldn't cause
problems for the known use of "private" (TDX and SNP), and it gives us a little
wiggle room, e.g. if we ever get a use case where VMs can share memory that is
otherwise protected.

That's a pretty big "if" though, and odds are good we'd need more memslot flags
and fd+offset pairs to allow differentiating "private" vs. "protected-shared"
without forcing userspace to punch holes in memslots, so I don't know that
hedging now will buy us anything.

So I'd say that if people think KVM_MEM_PRIVATE brings additional and meaningful
clarity over KVM_MEM_PROTECTED, then let's go with PRIVATE.  But if PROTECTED is
just as good, go with PROTECTED as it gives us a wee bit of wiggle room for the
future.

Note, regardless of what name we settle on, I think it makes sense to do the
KVM_PRIVATE_MEM_SLOTS => KVM_INTERNAL_MEM_SLOTS 

[PATCH v3 0/2] virtio: Add shared memory capability

2022-08-03 Thread Antonio Caggiano
Previously part of [0], now a patch series on its own.

This patch series cherry picks two commits from [1] and applies one fix
according to [2], which should answer Gerd's comment [3] on previous
patch.

v2: Squash patch #3 into patch #2, and formatting fixes to patch #1.
v3: Reverse commits order.

[0] https://www.mail-archive.com/qemu-devel@nongnu.org/msg826897.html
[1] https://gitlab.freedesktop.org/virgl/qemu/-/commits/virtio-gpu-next/
[2] https://github.com/torvalds/linux/commit/0dd4ff93f4c8dba016ad79384007da4938cd54a1
[3] https://www.mail-archive.com/qemu-devel@nongnu.org/msg827306.html


Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gerd Hoffmann (1):
  virtio-gpu: hostmem

 hw/display/virtio-gpu-pci.c| 15 +++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-gpu.h |  5 +
 include/hw/virtio/virtio-pci.h |  4 
 6 files changed, 67 insertions(+), 9 deletions(-)

-- 
2.34.1




[PATCH v3 2/2] virtio-gpu: hostmem

2022-08-03 Thread Antonio Caggiano
From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

v2: Formatting fixes

Signed-off-by: Antonio Caggiano 
Acked-by: Michael S. Tsirkin 
---
 hw/display/virtio-gpu-pci.c| 15 +++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..2cbbacd7fe 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,21 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
+qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus), errp);
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 20cc703dcc..506b3b8eef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1424,6 +1424,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index 4dcb34c4a7..aa8d1ab993 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 pci_register_bar(&vpci_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, &vga->vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 2e28507efe..eafce75b04 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -102,12 +102,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -131,6 +134,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
 
 int enabled_output_bitmask;
-- 
2.34.1




[PATCH v3 1/2] virtio: Add shared memory capability

2022-08-03 Thread Antonio Caggiano
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'
and the data structure 'virtio_pci_shm_cap' to go with it.
They allow defining shared memory regions with sizes and offsets
of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

v2: Remove virtio_pci_shm_cap as virtio_pci_cap64 is used instead.
v3: No need for mask32 as cpu_to_le32 truncates the value.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 45327f0b31..50bd230122 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1164,6 +1164,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index 2446dcd9ae..5e5c4a4c6d 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -252,4 +252,8 @@ void virtio_pci_types_register(const VirtioPCIDeviceTypeInfo *t);
  */
 unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id);
+
 #endif
-- 
2.34.1




Re: [PATCH v10 06/21] job: move and update comments from blockjob.c

2022-08-03 Thread Kevin Wolf
Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> This comment applies more on job, it was left in blockjob as in the past
> the whole job logic was implemented there.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> No functional change intended.
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  blockjob.c | 20 
>  job.c  | 14 ++
>  2 files changed, 14 insertions(+), 20 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 4868453d74..7da59a1f1c 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -36,21 +36,6 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/timer.h"
>  
> -/*
> - * The block job API is composed of two categories of functions.
> - *
> - * The first includes functions used by the monitor.  The monitor is
> - * peculiar in that it accesses the block job list with block_job_get, and
> - * therefore needs consistency across block_job_get and the actual operation
> - * (e.g. block_job_set_speed).  The consistency is achieved with
> - * aio_context_acquire/release.  These functions are declared in blockjob.h.
> - *
> - * The second includes functions used by the block job drivers and sometimes
> - * by the core block layer.  These do not care about locking, because the
> - * whole coroutine runs under the AioContext lock, and are declared in
> - * blockjob_int.h.
> - */
> -
>  static bool is_block_job(Job *job)
>  {
>  return job_type(job) == JOB_TYPE_BACKUP ||
> @@ -433,11 +418,6 @@ static void block_job_event_ready(Notifier *n, void 
> *opaque)
>  }
>  
>  
> -/*
> - * API for block job drivers and the block layer.  These functions are
> - * declared in blockjob_int.h.
> - */
> -
>  void *block_job_create(const char *job_id, const BlockJobDriver *driver,
> JobTxn *txn, BlockDriverState *bs, uint64_t perm,
> uint64_t shared_perm, int64_t speed, int flags,
> diff --git a/job.c b/job.c
> index ae25db97ac..ebaa4e585b 100644
> --- a/job.c
> +++ b/job.c
> @@ -32,6 +32,20 @@
>  #include "trace/trace-root.h"
>  #include "qapi/qapi-events-job.h"
>  
> +/*
> + * The job API is composed of two categories of functions.
> + *
> + * The first includes functions used by the monitor.  The monitor is
> + * peculiar in that it accesses the block job list with job_get, and

s/block job/job/

> + * therefore needs consistency across job_get and the actual operation
> + * (e.g. job_user_cancel). To achieve this consistency, the caller
> + * calls job_lock/job_unlock itself around the whole operation.
> + *
> + *
> + * The second includes functions used by the block job drivers and sometimes

Same here.

> + * by the core block layer. These delegate the locking to the callee instead.
> + */

Unless I'm missing something, this comment (specifically the part with
calling job_lock/job_unlock outside of job.c) is actually not true at
this point in the series. I would suggest adding a comment to this
effect, like:

* TODO Actually make this true

Then we know that when you remove the comment, we need to review that
it's actually true at that point in the series.
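
For readers following along, the two categories the comment describes can be
sketched roughly as below. This is a toy model, not the code in job.c: ToyJob,
the toy_job_* functions and the boolean that stands in for the mutex are all
made up for illustration. The first category expects the caller to hold the
lock across lookup and operation; the second takes the lock itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a real mutex so that misuse trips an assertion. */
static bool toy_job_mutex_held;

static void toy_job_lock(void)
{
    assert(!toy_job_mutex_held);
    toy_job_mutex_held = true;
}

static void toy_job_unlock(void)
{
    assert(toy_job_mutex_held);
    toy_job_mutex_held = false;
}

typedef struct {
    int id;
    long speed;
} ToyJob;

static ToyJob toy_jobs[] = { { 1, 0 }, { 2, 0 } };

/* Category 1: "_locked" functions, the caller must already hold the lock. */
static ToyJob *toy_job_get_locked(int id)
{
    size_t i;

    assert(toy_job_mutex_held);
    for (i = 0; i < sizeof(toy_jobs) / sizeof(toy_jobs[0]); i++) {
        if (toy_jobs[i].id == id) {
            return &toy_jobs[i];
        }
    }
    return NULL;
}

static void toy_job_set_speed_locked(ToyJob *job, long speed)
{
    assert(toy_job_mutex_held);
    job->speed = speed;
}

/* Category 2: the callee takes the lock on behalf of the caller. */
static void toy_job_set_speed(ToyJob *job, long speed)
{
    toy_job_lock();
    toy_job_set_speed_locked(job, speed);
    toy_job_unlock();
}

/* Monitor-style caller: lookup and operation in one critical section. */
static bool toy_monitor_set_speed(int id, long speed)
{
    ToyJob *job;
    bool found = false;

    toy_job_lock();
    job = toy_job_get_locked(id);
    if (job) {
        toy_job_set_speed_locked(job, speed);
        found = true;
    }
    toy_job_unlock();
    return found;
}
```

Holding the lock across both steps is what gives the monitor path the
consistency between lookup and operation that the comment talks about.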

For now, I'll just try to remember checking this later.

Kevin




Re: [PATCH v2 2/2] virtio: Add shared memory capability

2022-08-03 Thread Michael S. Tsirkin
On Wed, Aug 03, 2022 at 05:21:35PM +0200, Antonio Caggiano wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'
> and the data structure 'virtio_pci_shm_cap' to go with it.
> They allow defining shared memory regions with sizes and offsets
> of 2^32 and more.
> Multiple instances of the capability are allowed and distinguished
> by a device-specific 'id'.
> 
> v2: Remove virtio_pci_shm_cap as virtio_pci_cap64 is used instead.
> v3: No need for mask32 as cpu_to_le32 truncates the value.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> Signed-off-by: Antonio Caggiano 


looks like the patches are in the reverse order, 1/2 won't
build without 2/2
> ---
>  hw/virtio/virtio-pci.c | 18 ++
>  include/hw/virtio/virtio-pci.h |  4 
>  2 files changed, 22 insertions(+)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 45327f0b31..50bd230122 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1164,6 +1164,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy 
> *proxy,
>  return offset;
>  }
>  
> +int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
> +   uint8_t bar, uint64_t offset, uint64_t length,
> +   uint8_t id)
> +{
> +struct virtio_pci_cap64 cap = {
> +.cap.cap_len = sizeof cap,
> +.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
> +};
> +
> +cap.cap.bar = bar;
> +cap.cap.length = cpu_to_le32(length);
> +cap.length_hi = cpu_to_le32(length >> 32);
> +cap.cap.offset = cpu_to_le32(offset);
> +cap.offset_hi = cpu_to_le32(offset >> 32);
> +cap.cap.id = id;
> +return virtio_pci_add_mem_cap(proxy, &cap.cap);
> +}
> +
>  static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
> unsigned size)
>  {
> diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
> index 2446dcd9ae..5e5c4a4c6d 100644
> --- a/include/hw/virtio/virtio-pci.h
> +++ b/include/hw/virtio/virtio-pci.h
> @@ -252,4 +252,8 @@ void virtio_pci_types_register(const 
> VirtioPCIDeviceTypeInfo *t);
>   */
>  unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
>  
> +int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
> +   uint8_t bar, uint64_t offset, uint64_t length,
> +   uint8_t id);
> +
>  #endif
> -- 
> 2.34.1



