date:20220223

Re: [PATCH 5/7] hw/smbios: code cleanup - use macro definitions for table header handles

2022-02-23 Thread Philippe Mathieu-Daudé


On 23/2/22 15:33, Ani Sinha wrote:

This is a minor cleanup. Using macro definitions makes the code more
readable. It is at once clear which tables use which handle numbers in their
header. It also makes it easy to calculate the gaps between the numbers and
update them if needed.

Reviewed-by: Igor Mammedov 
Signed-off-by: Ani Sinha 
---
  hw/smbios/smbios.c | 38 ++
  1 file changed, 26 insertions(+), 12 deletions(-)



Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v6 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-23 Thread Christian Borntraeger


On 23.02.22 at 23:29, David Miller wrote:

Yes I'm adding to this patch,  I haven't quite figured out where to
put them,  they are inline to various things in the patch themselves
so I'm putting in the cover letter under the patch they go to.
I hope that's correct.


You usually put it under your Signed-off-by: line in the patch.
I think Thomas can fixup when applying.



Thanks
- David Miller

On Wed, Feb 23, 2022 at 8:40 AM Christian Borntraeger
 wrote:



On 18.02.22 at 00:17, David Miller wrote:

resolves: https://gitlab.com/qemu-project/qemu/-/issues/737
implements:
AND WITH COMPLEMENT   (NCRK, NCGRK)
NAND                  (NNRK, NNGRK)
NOT EXCLUSIVE OR      (NXRK, NXGRK)
NOR                   (NORK, NOGRK)
OR WITH COMPLEMENT    (OCRK, OCGRK)
SELECT                (SELR, SELGR)
SELECT HIGH           (SELFHR)
MOVE RIGHT TO LEFT    (MVCRL)
POPULATION COUNT      (POPCNT)

Signed-off-by: David Miller 


For your next patches, feel free to add previous Reviewed-by: tags so that
others can see what review has already happened.

Re: [PATCH] qapi: fix mistake in example command illustration

2022-02-23 Thread Markus Armbruster

"Dr. David Alan Gilbert"  writes:

> * Daniel P. Berrangé (berra...@redhat.com) wrote:
>> The snapshot-load/save/delete commands illustrated their usage, but
>> mistakenly used 'data' rather than 'arguments' as the field name.
>> 
>> Signed-off-by: Daniel P. Berrangé 
>
> Fabian Holler's patch from yesterday beat you to it slightly;
> I think Markus has it queued.

Correct.  Thanks anyway!

> (20220222170116.63105-1-fabian.hol...@simplesurance.de )

[PATCH v2] qapi, target/i386/sev: Add cpu0-id to query-sev-capabilities

2022-02-23 Thread Dov Murik

Add a new field 'cpu0-id' to the response of query-sev-capabilities QMP
command.  The value of the field is the base64-encoded 64-byte unique ID
of the CPU0 (socket 0), which can be used to retrieve the signed CEK of
the CPU from AMD's Key Distribution Service (KDS).
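
As a rough size check: padded base64 turns every 3 input bytes into 4 output
characters, so a 64-byte ID should come out as 88 characters ending in two '='
pads, which is consistent with a sample value ending in "==". A minimal sketch
in plain C follows; the helper names are hypothetical and are not part of QEMU
or GLib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as SEV_CPU_UNIQUE_ID_LEN in the patch. */
#define CPU_ID_LEN 64

/* Hypothetical helper: characters produced by padded base64 (no NUL). */
static size_t base64_encoded_len(size_t n)
{
    assert(n <= SIZE_MAX - 2);   /* guard the n + 2 below against wrap-around */
    return 4 * ((n + 2) / 3);
}

/* Hypothetical helper: number of trailing '=' pad characters. */
static size_t base64_padding(size_t n)
{
    return (3 - n % 3) % 3;
}
```

With CPU_ID_LEN as input, the two helpers give the encoded length and the
number of pad characters a client should expect in the "cpu0-id" string.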

Signed-off-by: Dov Murik 

---

v2:
- change encoding to Base64 (thanks Daniel)
- rename constant to SEV_CPU_UNIQUE_ID_LEN
---
 qapi/misc-target.json |  4 
 target/i386/sev.c | 27 +++
 2 files changed, 31 insertions(+)

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4bc45d2474..c6d9ad69e1 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -177,6 +177,8 @@
 #
 # @cert-chain:  PDH certificate chain (base64 encoded)
 #
+# @cpu0-id: 64-byte unique ID of CPU0 (base64 encoded) (since 7.0)
+#
 # @cbitpos: C-bit location in page table entry
 #
 # @reduced-phys-bits: Number of physical Address bit reduction when SEV is
@@ -187,6 +189,7 @@
 { 'struct': 'SevCapability',
   'data': { 'pdh': 'str',
 'cert-chain': 'str',
+'cpu0-id': 'str',
 'cbitpos': 'int',
 'reduced-phys-bits': 'int'},
   'if': 'TARGET_I386' }
@@ -205,6 +208,7 @@
 #
 # -> { "execute": "query-sev-capabilities" }
 # <- { "return": { "pdh": "8CCDD8DDD", "cert-chain": "888CCCDDDEE",
+#  "cpu0-id": "2lvmGwo+...61iEinw==",
 #  "cbitpos": 47, "reduced-phys-bits": 5}}
 #
 ##
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 025ff7a6f8..d3d2680e16 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -82,6 +82,8 @@ struct SevGuestState {
 #define DEFAULT_GUEST_POLICY    0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
 
+#define SEV_CPU_UNIQUE_ID_LEN   64
+
 #define SEV_INFO_BLOCK_GUID "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
 typedef struct __attribute__((__packed__)) SevInfoBlock {
 /* SEV-ES Reset Vector Address */
@@ -531,11 +533,31 @@ e_free:
 return 1;
 }
 
+static int
+sev_get_id(int fd, guchar *id_buf, size_t id_buf_len, Error **errp)
+{
+struct sev_user_data_get_id2 id = {
+.address = (unsigned long)id_buf,
+.length = id_buf_len
+};
+int err, r;
+
+r = sev_platform_ioctl(fd, SEV_GET_ID2, &id, &err);
+if (r < 0) {
+error_setg(errp, "SEV: Failed to get ID ret=%d fw_err=%d (%s)",
+   r, err, fw_error_to_str(err));
+return 1;
+}
+
+return 0;
+}
+
 static SevCapability *sev_get_capabilities(Error **errp)
 {
 SevCapability *cap = NULL;
 guchar *pdh_data = NULL;
 guchar *cert_chain_data = NULL;
+guchar cpu0_id[SEV_CPU_UNIQUE_ID_LEN];
 size_t pdh_len = 0, cert_chain_len = 0;
 uint32_t ebx;
 int fd;
@@ -561,9 +583,14 @@ static SevCapability *sev_get_capabilities(Error **errp)
 goto out;
 }
 
+if (sev_get_id(fd, cpu0_id, sizeof(cpu0_id), errp)) {
+goto out;
+}
+
 cap = g_new0(SevCapability, 1);
 cap->pdh = g_base64_encode(pdh_data, pdh_len);
 cap->cert_chain = g_base64_encode(cert_chain_data, cert_chain_len);
+cap->cpu0_id = g_base64_encode(cpu0_id, sizeof(cpu0_id));
 
 host_cpuid(0x8000001F, 0, NULL, &ebx, NULL, NULL);
 cap->cbitpos = ebx & 0x3f;
-- 
2.25.1

[PATCH] vl: transform QemuOpts device to JSON syntax device

2022-02-23 Thread Zhenzhong Duan

When the traditional -device option and the JSON syntax option are mixed,
QEMU reports a conflict, e.g.:

/usr/libexec/qemu-kvm -nodefaults \
  -device '{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x02.0"}' \
  -device virtio-scsi-pci,id=scsi1,bus=pci.0

It breaks with:

qemu-kvm: -device {"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x02.0"}: PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-scsi-pci

But if we reformat the first -device to match the second, so that all
devices use the same kind of option, it succeeds (and vice versa), e.g.:

/usr/libexec/qemu-kvm -nodefaults \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=02.0 \
  -device virtio-scsi-pci,id=scsi1,bus=pci.0

This succeeds.

This happens because each kind of option is inserted into its own list,
which breaks the QEMU command-line order during BDF auto-assignment. Fix it
by transforming QemuOpts into JSON syntax and inserting them into the JSON
device list, so that the command-line order is kept.
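
The ordering problem can be reproduced with a toy model. The sketch below is
not QEMU code: it assumes a 32-slot bus whose slots 0 and 1 are already taken,
and it assumes the devices from the legacy list are created before those from
the JSON list, which is what the error message above suggests. All names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 32
#define SLOT_AUTO (-1)

/* Toy device request: an explicit slot, or SLOT_AUTO for "first free". */
struct dev_req {
    int slot;
};

/*
 * Create devices in the given order on a toy bus whose slots below
 * first_free are already occupied. Returns the index of the first
 * request that fails, or -1 if every device gets a slot.
 */
static int create_in_order(const struct dev_req *reqs, size_t n,
                           int first_free)
{
    bool used[NUM_SLOTS] = { false };

    assert(first_free >= 0 && first_free <= NUM_SLOTS);
    for (int s = 0; s < first_free; s++) {
        used[s] = true;
    }

    for (size_t i = 0; i < n; i++) {
        int slot = reqs[i].slot;

        if (slot == SLOT_AUTO) {
            /* Auto-assign: scan for the first free slot. */
            for (slot = 0; slot < NUM_SLOTS && used[slot]; slot++) {
            }
        }
        if (slot < 0 || slot >= NUM_SLOTS || used[slot]) {
            return (int)i;   /* slot conflict, bad slot, or bus full */
        }
        used[slot] = true;
    }
    return -1;
}
```

Feeding the requests in command-line order (explicit slot 2, then auto)
succeeds, while feeding the auto-assigned device first makes the explicit
request for slot 2 fail, mirroring the report above.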

Signed-off-by: Zhenzhong Duan 
---
 softmmu/vl.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 1fe028800fdf..3def40b5405e 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3394,21 +3394,26 @@ void qemu_init(int argc, char **argv, char **envp)
 qdict_put_str(machine_opts_dict, "usb", "on");
 add_device_config(DEV_USB, optarg);
 break;
-case QEMU_OPTION_device:
+case QEMU_OPTION_device: {
+QObject *obj;
 if (optarg[0] == '{') {
-QObject *obj = qobject_from_json(optarg, &error_fatal);
-DeviceOption *opt = g_new0(DeviceOption, 1);
-opt->opts = qobject_to(QDict, obj);
-loc_save(&opt->loc);
-assert(opt->opts != NULL);
-QTAILQ_INSERT_TAIL(&device_opts, opt, next);
+obj = qobject_from_json(optarg, &error_fatal);
 } else {
-if (!qemu_opts_parse_noisily(qemu_find_opts("device"),
- optarg, true)) {
+opts = qemu_opts_parse_noisily(qemu_find_opts("device"),
+   optarg, true);
+if (!opts) {
 exit(1);
 }
+obj = QOBJECT(qemu_opts_to_qdict(opts, NULL));
+qemu_opts_del(opts);
 }
+DeviceOption *opt = g_new0(DeviceOption, 1);
+opt->opts = qobject_to(QDict, obj);
+loc_save(&opt->loc);
+assert(opt->opts != NULL);
+QTAILQ_INSERT_TAIL(&device_opts, opt, next);
 break;
+}
 case QEMU_OPTION_smp:
 machine_parse_property_opt(qemu_find_opts("smp-opts"),
"smp", optarg);
-- 
2.25.1

Re: [PATCH v3] target/riscv: Add isa extenstion strings to the device tree

2022-02-23 Thread Alistair Francis

On Wed, Feb 23, 2022 at 8:39 AM Atish Patra  wrote:
>
> The Linux kernel parses the ISA extensions from "riscv,isa" DT
> property. It used to parse only the single letter base extensions
> until now. A generic ISA extension parsing framework was proposed[1]
> recently that can parse multi-letter ISA extensions as well.
>
> Generate the extended ISA string by appending  the available ISA extensions
> to the "riscv,isa" string if it is enabled so that kernel can process it.
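
The appending step described above can be sketched in plain C. This is only an
illustration, not the patch's implementation: it uses malloc() and strcat()
instead of g_strconcat(), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct isa_ext {
    const char *name;
    bool enabled;
};

/*
 * Return a newly allocated copy of base with "_<name>" appended for each
 * enabled extension, or NULL on allocation failure. The caller frees it.
 */
static char *isa_string_append(const char *base,
                               const struct isa_ext *exts, size_t n)
{
    size_t len = strlen(base) + 1;   /* +1 for the terminating NUL */
    char *out;

    for (size_t i = 0; i < n; i++) {
        if (exts[i].enabled) {
            len += 1 + strlen(exts[i].name);   /* '_' plus the name */
        }
    }

    out = malloc(len);
    if (!out) {
        return NULL;
    }
    strcpy(out, base);
    for (size_t i = 0; i < n; i++) {
        if (exts[i].enabled) {
            strcat(out, "_");
            strcat(out, exts[i].name);
        }
    }
    assert(strlen(out) + 1 == len);
    return out;
}
```

Computing the final length up front avoids the repeated allocate-and-free
cycle of concatenating one extension at a time, at the cost of a second pass
over the table.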
>
> [1] https://lkml.org/lkml/2022/2/15/263
>
> Suggested-by: Heiko Stubner 
> Signed-off-by: Atish Patra 

Reviewed-by: Alistair Francis 

Alistair

> ---
> Changes from v2->v3:
> 1. Used g_strconcat to replace snprintf & a max isa string length as
> suggested by Anup.
> 2. I have not included the Tested-by Tag from Heiko because the
> implementation changed from v2 to v3.
>
> Changes from v1->v2:
> 1. Improved the code readability by using arrays instead of individual checks
> ---
>  target/riscv/cpu.c | 29 +
>  1 file changed, 29 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index b0a40b83e7a8..2c7ff6ef555a 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -34,6 +34,12 @@
>
>  /* RISC-V CPU definitions */
>
> +/* This includes the null terminated character '\0' */
> +struct isa_ext_data {
> +const char *name;
> +bool enabled;
> +};
> +
>  static const char riscv_exts[26] = "IEMAFDQCLBJTPVNSUHKORWXYZG";
>
>  const char * const riscv_int_regnames[] = {
> @@ -881,6 +887,28 @@ static void riscv_cpu_class_init(ObjectClass *c, void *data)
>  device_class_set_props(dc, riscv_cpu_properties);
>  }
>
> +static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int max_str_len)
> +{
> +char *old = *isa_str;
> +char *new = *isa_str;
> +int i;
> +struct isa_ext_data isa_edata_arr[] = {
> +{ "svpbmt", cpu->cfg.ext_svpbmt   },
> +{ "svinval", cpu->cfg.ext_svinval },
> +{ "svnapot", cpu->cfg.ext_svnapot },
> +};
> +
> +for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
> +if (isa_edata_arr[i].enabled) {
> +new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
> +g_free(old);
> +old = new;
> +}
> +}
> +
> +*isa_str = new;
> +}
> +
>  char *riscv_isa_string(RISCVCPU *cpu)
>  {
>  int i;
> @@ -893,6 +921,7 @@ char *riscv_isa_string(RISCVCPU *cpu)
>  }
>  }
>  *p = '\0';
> +riscv_isa_string_ext(cpu, &isa_str, maxlen);
>  return isa_str;
>  }
>
> --
> 2.30.2
>
>

Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ

2022-02-23 Thread Jason Wang

On Wed, Feb 23, 2022 at 4:06 PM Eugenio Perez Martin
 wrote:
>
> On Wed, Feb 23, 2022 at 4:47 AM Jason Wang  wrote:
> >
> > On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Tue, Feb 22, 2022 at 8:41 AM Jason Wang  wrote:
> > > >
> > > >
> > > > On 2022/2/17 at 4:22 PM, Eugenio Perez Martin wrote:
> > > > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang  
> > > > > wrote:
> > > > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > > > >>  wrote:
> > > > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang  
> > > > >>> wrote:
> > > > 
> > > >  在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
> > > > > On Sun, Jan 30, 2022 at 7:50 AM Jason Wang  
> > > > > wrote:
> > > > > >> On 2022/1/22 at 4:27 AM, Eugenio Pérez wrote:
> > > > >>> SVQ is able to log the dirty bits by itself, so let's use it to 
> > > > >>> not
> > > > >>> block migration.
> > > > >>>
> > > > >>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features 
> > > > >>> if SVQ is
> > > > >>> enabled. Even if the device supports it, the reports would be 
> > > > >>> nonsense
> > > > >>> because SVQ memory is in the qemu region.
> > > > >>>
> > > > >>> The log region is still allocated. Future changes might skip 
> > > > >>> that, but
> > > > >>> this series is already long enough.
> > > > >>>
> > > > >>> Signed-off-by: Eugenio Pérez 
> > > > >>> ---
> > > > >>> hw/virtio/vhost-vdpa.c | 20 
> > > > >>> 1 file changed, 20 insertions(+)
> > > > >>>
> > > > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > >>> index fb0a338baa..75090d65e8 100644
> > > > >>> --- a/hw/virtio/vhost-vdpa.c
> > > > >>> +++ b/hw/virtio/vhost-vdpa.c
> > > > >>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct 
> > > > >>> vhost_dev *dev, uint64_t *features)
> > > > >>> if (ret == 0 && v->shadow_vqs_enabled) {
> > > > >>> /* Filter only features that SVQ can offer to guest 
> > > > >>> */
> > > > >>> vhost_svq_valid_guest_features(features);
> > > > >>> +
> > > > >>> +/* Add SVQ logging capabilities */
> > > > >>> +*features |= BIT_ULL(VHOST_F_LOG_ALL);
> > > > >>> }
> > > > >>>
> > > > >>> return ret;
> > > > >>> @@ -1039,8 +1042,25 @@ static int 
> > > > >>> vhost_vdpa_set_features(struct vhost_dev *dev,
> > > > >>>
> > > > >>> if (v->shadow_vqs_enabled) {
> > > > >>> uint64_t dev_features, svq_features, acked_features;
> > > > >>> +uint8_t status = 0;
> > > > >>> bool ok;
> > > > >>>
> > > > > >>> +ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > > > >>> +if (unlikely(ret)) {
> > > > >>> +return ret;
> > > > >>> +}
> > > > >>> +
> > > > >>> +if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > >>> +/*
> > > > >>> + * vhost is trying to enable or disable _F_LOG, 
> > > > >>> and the device
> > > > >>> + * would report wrong dirty pages. SVQ handles it.
> > > > >>> + */
> > > > >> I fail to understand this comment, I'd think there's no way to 
> > > > >> disable
> > > > >> dirty page tracking for SVQ.
> > > > >>
> > > > > vhost_log_global_{start,stop} are called at the beginning and end 
> > > > > of
> > > > > migration. To inform the device that it should start logging, 
> > > > > they set
> > > > > or clean VHOST_F_LOG_ALL at vhost_dev_set_log.
> > > > 
> > > >  Yes, but for SVQ, we can't disable dirty page tracking, isn't it? 
> > > >  The
> > > >  only thing is to ignore or filter out the F_LOG_ALL and pretend to 
> > > >  be
> > > >  enabled and disabled.
> > > > 
> > > > >>> Yes, that's what this patch does.
> > > > >>>
> > > > > While SVQ does not use VHOST_F_LOG_ALL, it exports the feature 
> > > > > bit so
> > > > > vhost does not block migration. Maybe we need to look for another 
> > > > > way
> > > > > to do this?
> > > > 
> > > >  I'm fine with filtering since it's much more simpler, but I fail to
> > > >  understand why we need to check DRIVER_OK.
> > > > 
> > > > >>> Ok maybe I can make that part more clear,
> > > > >>>
> > > > >>> Since both operations use vhost_vdpa_set_features we must just 
> > > > >>> filter
> > > > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > > > >>> affecting other features.
> > > > >>>
> > > > >>> In practice, that means to not forward the set features after
> > > > >>> DRIVER_OK. The device is not expecting them anymore.
> > > > >> I wonder what happens if we don't do this.
> > > > >>
> > > > > If we simply delete the check vhost_dev_set_features will return an
> > > > > error, failing the

Re: [PATCH v1 1/2] hw/ssi: Add Ibex SPI device model

2022-02-23 Thread Alistair Francis

On Wed, Feb 23, 2022 at 7:45 AM Alistair Francis
 wrote:
>
> From: Wilfred Mallawa 
>
> Adds the SPI_HOST device model for ibex. The device specification is as per
> [1]. The model has been tested on opentitan with spi_host unit tests
> written for TockOS.
>
> [1] https://docs.opentitan.org/hw/ip/spi_host/doc/
>
> Signed-off-by: Wilfred Mallawa 
> ---
>  hw/ssi/ibex_spi_host.c | 629 +
>  hw/ssi/meson.build |   1 +
>  hw/ssi/trace-events|   7 +
>  include/hw/ssi/ibex_spi_host.h |  91 +
>  4 files changed, 728 insertions(+)
>  create mode 100644 hw/ssi/ibex_spi_host.c
>  create mode 100644 include/hw/ssi/ibex_spi_host.h
>
> diff --git a/hw/ssi/ibex_spi_host.c b/hw/ssi/ibex_spi_host.c
> new file mode 100644
> index 00..7343eb0f61
> --- /dev/null
> +++ b/hw/ssi/ibex_spi_host.c
> @@ -0,0 +1,629 @@
> +
> +/*
> + * QEMU model of the Ibex SPI Controller
> + * SPEC Reference: https://docs.opentitan.org/hw/ip/spi_host/doc/
> + *
> + * Copyright (C) 2022 Western Digital
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "hw/ssi/ibex_spi_host.h"
> +#include "hw/irq.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/qdev-properties-system.h"
> +#include "migration/vmstate.h"
> +#include "trace.h"
> +
> +REG32(INTR_STATE, 0x00)
> +FIELD(INTR_STATE, ERROR, 0, 1)
> +FIELD(INTR_STATE, SPI_EVENT, 1, 1)
> +REG32(INTR_ENABLE, 0x04)
> +FIELD(INTR_ENABLE, ERROR, 0, 1)
> +FIELD(INTR_ENABLE, SPI_EVENT, 1, 1)
> +REG32(INTR_TEST, 0x08)
> +FIELD(INTR_TEST, ERROR, 0, 1)
> +FIELD(INTR_TEST, SPI_EVENT, 1, 1)
> +REG32(ALERT_TEST, 0x0c)
> +FIELD(ALERT_TEST, FETAL_TEST, 0, 1)
> +REG32(CONTROL, 0x10)
> +FIELD(CONTROL, RX_WATERMARK, 0, 8)
> +FIELD(CONTROL, TX_WATERMARK, 1, 8)
> +FIELD(CONTROL, OUTPUT_EN, 29, 1)
> +FIELD(CONTROL, SW_RST, 30, 1)
> +FIELD(CONTROL, SPIEN, 31, 1)
> +REG32(STATUS, 0x14)
> +FIELD(STATUS, TXQD, 0, 8)
> +FIELD(STATUS, RXQD, 18, 8)
> +FIELD(STATUS, CMDQD, 16, 3)
> +FIELD(STATUS, RXWM, 20, 1)
> +FIELD(STATUS, BYTEORDER, 22, 1)
> +FIELD(STATUS, RXSTALL, 23, 1)
> +FIELD(STATUS, RXEMPTY, 24, 1)
> +FIELD(STATUS, RXFULL, 25, 1)
> +FIELD(STATUS, TXWM, 26, 1)
> +FIELD(STATUS, TXSTALL, 27, 1)
> +FIELD(STATUS, TXEMPTY, 28, 1)
> +FIELD(STATUS, TXFULL, 29, 1)
> +FIELD(STATUS, ACTIVE, 30, 1)
> +FIELD(STATUS, READY, 31, 1)
> +REG32(CONFIGOPTS, 0x18)
> +FIELD(CONFIGOPTS, CLKDIV_0, 0, 16)
> +FIELD(CONFIGOPTS, CSNIDLE_0, 16, 4)
> +FIELD(CONFIGOPTS, CSNTRAIL_0, 20, 4)
> +FIELD(CONFIGOPTS, CSNLEAD_0, 24, 4)
> +FIELD(CONFIGOPTS, FULLCYC_0, 29, 1)
> +FIELD(CONFIGOPTS, CPHA_0, 30, 1)
> +FIELD(CONFIGOPTS, CPOL_0, 31, 1)
> +REG32(CSID, 0x1c)
> +FIELD(CSID, CSID, 0, 32)
> +REG32(COMMAND, 0x20)
> +FIELD(COMMAND, LEN, 0, 8)
> +FIELD(COMMAND, CSAAT, 9, 1)
> +FIELD(COMMAND, SPEED, 10, 2)
> +FIELD(COMMAND, DIRECTION, 12, 2)
> +REG32(ERROR_ENABLE, 0x2c)
> +FIELD(ERROR_ENABLE, CMDBUSY, 0, 1)
> +FIELD(ERROR_ENABLE, OVERFLOW, 1, 1)
> +FIELD(ERROR_ENABLE, UNDERFLOW, 2, 1)
> +FIELD(ERROR_ENABLE, CMDINVAL, 3, 1)
> +FIELD(ERROR_ENABLE, CSIDINVAL, 4, 1)
> +REG32(ERROR_STATUS, 0x30)
> +FIELD(ERROR_STATUS, CMDBUSY, 0, 1)
> +FIELD(ERROR_STATUS, OVERFLOW, 1, 1)
> +FIELD(ERROR_STATUS, UNDERFLOW, 2, 1)
> +FIELD(ERROR_STATUS, CMDINVAL, 3, 1)
> +FIELD(ERROR_STATUS, CSIDINVAL, 4, 1)
> +FIELD(ERROR_STATUS, ACCESSINVAL, 5, 1)
> +REG32(EVENT_ENABLE, 0x30)
> +FIELD(EVENT_ENABLE, RXFULL, 0, 1)
> +FIELD(EVENT_ENABLE, TXEMPTY, 1, 1)
> +FIELD(EVENT_ENABLE, RXWM, 2, 1)
> +FIELD(EVENT_ENABLE, TXWM, 3, 1)
> +FIELD(EVENT_ENABLE, READY, 4, 1)
> +FIELD(EVENT_ENABLE, IDLE, 5, 1)
> +
> +/*
> + * Used to track the

Re: [PATCH v2 4/4] hw: hyperv: Initial commit for Synthetic Debugging device

2022-02-23 Thread Jon Doron

ping

On Wed, Feb 16, 2022, 12:25 Jon Doron  wrote:

> Signed-off-by: Jon Doron 
> ---
>  hw/hyperv/Kconfig |   5 +
>  hw/hyperv/meson.build |   1 +
>  hw/hyperv/syndbg.c| 402 ++
>  3 files changed, 408 insertions(+)
>  create mode 100644 hw/hyperv/syndbg.c
>
> diff --git a/hw/hyperv/Kconfig b/hw/hyperv/Kconfig
> index 3fbfe41c9e..fcf65903bd 100644
> --- a/hw/hyperv/Kconfig
> +++ b/hw/hyperv/Kconfig
> @@ -11,3 +11,8 @@ config VMBUS
>  bool
>  default y
>  depends on HYPERV
> +
> +config SYNDBG
> +bool
> +default y
> +depends on VMBUS
> diff --git a/hw/hyperv/meson.build b/hw/hyperv/meson.build
> index 1367e2994f..b43f119ea5 100644
> --- a/hw/hyperv/meson.build
> +++ b/hw/hyperv/meson.build
> @@ -1,3 +1,4 @@
>  specific_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'))
>  specific_ss.add(when: 'CONFIG_HYPERV_TESTDEV', if_true:
> files('hyperv_testdev.c'))
>  specific_ss.add(when: 'CONFIG_VMBUS', if_true: files('vmbus.c'))
> +specific_ss.add(when: 'CONFIG_SYNDBG', if_true: files('syndbg.c'))
> diff --git a/hw/hyperv/syndbg.c b/hw/hyperv/syndbg.c
> new file mode 100644
> index 00..8816bc4082
> --- /dev/null
> +++ b/hw/hyperv/syndbg.c
> @@ -0,0 +1,402 @@
> +/*
> + * QEMU Hyper-V Synthetic Debugging device
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/ctype.h"
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/sockets.h"
> +#include "qemu-common.h"
> +#include "qapi/error.h"
> +#include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/loader.h"
> +#include "cpu.h"
> +#include "hw/hyperv/hyperv.h"
> +#include "hw/hyperv/vmbus-bridge.h"
> +#include "hw/hyperv/hyperv-proto.h"
> +#include "net/net.h"
> +#include "net/eth.h"
> +#include "net/checksum.h"
> +#include "trace.h"
> +
> +#define TYPE_HV_SYNDBG   "hv-syndbg"
> +
> +typedef struct HvSynDbg {
> +DeviceState parent_obj;
> +
> +char *host_ip;
> +uint16_t host_port;
> +bool use_hcalls;
> +
> +uint32_t target_ip;
> +struct sockaddr_in servaddr;
> +int socket;
> +bool has_data_pending;
> +uint64_t pending_page_gpa;
> +} HvSynDbg;
> +
> +#define HVSYNDBG(obj) OBJECT_CHECK(HvSynDbg, (obj), TYPE_HV_SYNDBG)
> +
> +/* returns NULL unless there is exactly one HV Synth debug device */
> +static HvSynDbg *hv_syndbg_find(void)
> +{
> +/* Returns NULL unless there is exactly one hvsd device */
> +return HVSYNDBG(object_resolve_path_type("", TYPE_HV_SYNDBG, NULL));
> +}
> +
> +static void set_pending_state(HvSynDbg *syndbg, bool has_pending)
> +{
> +hwaddr out_len;
> +void *out_data;
> +
> +syndbg->has_data_pending = has_pending;
> +
> +if (!syndbg->pending_page_gpa) {
> +return;
> +}
> +
> +out_len = 1;
> +out_data = cpu_physical_memory_map(syndbg->pending_page_gpa, &out_len, 1);
> +if (out_data) {
> +*(uint8_t *)out_data = !!has_pending;
> +cpu_physical_memory_unmap(out_data, out_len, 1, out_len);
> +}
> +}
> +
> +static bool get_udb_pkt_data(void *p, uint32_t len, uint32_t *data_ofs,
> + uint32_t *src_ip)
> +{
> +uint32_t offset, curr_len = len;
> +
> +if (curr_len < sizeof(struct eth_header) ||
> +(be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto) != ETH_P_IP)) {
> +return false;
> +}
> +offset = sizeof(struct eth_header);
> +curr_len -= sizeof(struct eth_header);
> +
> +if (curr_len < sizeof(struct ip_header) ||
> +PKT_GET_IP_HDR(p)->ip_p != IP_PROTO_UDP) {
> +return false;
> +}
> +offset += PKT_GET_IP_HDR_LEN(p);
> +curr_len -= PKT_GET_IP_HDR_LEN(p);
> +
> +if (curr_len < sizeof(struct udp_header)) {
> +return false;
> +}
> +
> +offset += sizeof(struct udp_header);
> +*data_ofs = offset;
> +*src_ip = PKT_GET_IP_HDR(p)->ip_src;
> +return true;
> +}
> +
> +static uint16_t handle_send_msg(HvSynDbg *syndbg, uint64_t ingpa,
> +uint32_t count, bool is_raw,
> +uint32_t *pending_count)
> +{
> +uint16_t ret;
> +hwaddr data_len;
> +void *debug_data = NULL;
> +uint32_t udp_data_ofs = 0;
> +const void *pkt_data;
> +int sent_count;
> +
> +data_len = count;
> +debug_data = cpu_physical_memory_map(ingpa, &data_len, 0);
> +if (!debug_data || data_len < count) {
> +ret = HV_STATUS_INSUFFICIENT_MEMORY;
> +goto cleanup;
> +}
> +
> +if (is_raw &&
> +!get_udb_pkt_data(debug_data, count, &udp_data_ofs,
> +  &syndbg->target_ip)) {
> +ret = HV_STATUS_SUCCESS;
> +goto cleanup;
> +}
> +
> +pkt_data = (const void *)((uintptr_t)debug_data + udp_data_ofs);
> +sent_count = qemu_sendto(syndbg->socket, pkt_data, count -
>

Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Eric Biggers

On Wed, Feb 23, 2022 at 02:12:30PM +0100, Jason A. Donenfeld wrote:
> When a VM forks, we must immediately mix in additional information to
> the stream of random output so that two forks or a rollback don't
> produce the same stream of random numbers, which could have catastrophic
> cryptographic consequences. This commit adds a simple API,
> add_vmfork_randomness(), for that.
> 
> Cc: Theodore Ts'o 
> Cc: Jann Horn 
> Signed-off-by: Jason A. Donenfeld 
> ---
>  drivers/char/random.c  | 58 ++
>  include/linux/random.h |  1 +
>  2 files changed, 59 insertions(+)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 536237a0f073..29d6ce484d15 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -344,6 +344,46 @@ static void crng_reseed(void)
>   }
>  }
>  
> +/*
> + * This mixes unique_vm_id directly into the base_crng key as soon as
> + * possible, similarly to crng_pre_init_inject(), even if the crng is
> + * already running, in order to immediately branch streams from prior
> + * VM instances.
> + */
> +static void crng_vm_fork_inject(const void *unique_vm_id, size_t len)
> +{
> + unsigned long flags, next_gen;
> + struct blake2s_state hash;
> +
> + /*
> +  * Unlike crng_reseed(), we take the lock as early as possible,
> +  * since we don't want the RNG to be used until it's updated.
> +  */
> + spin_lock_irqsave(&base_crng.lock, flags);
> +
> + /*
> +  * Also update the generation, while locked, as early as
> +  * possible. This will mean unlocked reads of the generation
> +  * will cause a reseeding of per-cpu crngs, and those will
> +  * spin on the base_crng lock waiting for the rest of this
> +  * operation to complete, which achieves the goal of blocking
> +  * the production of new output until this is done.
> +  */
> + next_gen = base_crng.generation + 1;
> + if (next_gen == ULONG_MAX)
> + ++next_gen;
> + WRITE_ONCE(base_crng.generation, next_gen);
> + WRITE_ONCE(base_crng.birth, jiffies);
> +
> + /* This is the same formulation used by crng_pre_init_inject(). */
> + blake2s_init(&hash, sizeof(base_crng.key));
> + blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
> + blake2s_update(&hash, unique_vm_id, len);
> + blake2s_final(&hash, base_crng.key);
> +
> + spin_unlock_irqrestore(&base_crng.lock, flags);
> +}
[...]
> +/*
> + * Handle a new unique VM ID, which is unique, not secret, so we
> + * don't credit it, but we do mix it into the entropy pool and
> + * inject it into the crng.
> + */
> +void add_vmfork_randomness(const void *unique_vm_id, size_t size)
> +{
> + add_device_randomness(unique_vm_id, size);
> + crng_vm_fork_inject(unique_vm_id, size);
> +}
> +EXPORT_SYMBOL_GPL(add_vmfork_randomness);

I think we should be removing cases where the base_crng key is changed directly
besides extraction from the input_pool, not adding new ones.  Why not implement
this as add_device_randomness() followed by crng_reseed(force=true), where the
'force' argument forces a reseed to occur even if the entropy_count is too low?

- Eric
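
For reference, the generation update in the quoted crng_vm_fork_inject() can
be isolated as below. This is a standalone sketch with a hypothetical function
name; that ULONG_MAX is skipped because it serves as a sentinel elsewhere is
an assumption, since the quoted hunk does not say why.

```c
#include <assert.h>
#include <limits.h>

/* Advance the generation counter, never landing on ULONG_MAX. */
static unsigned long next_generation(unsigned long gen)
{
    unsigned long next = gen + 1;   /* unsigned arithmetic wraps to 0 */

    if (next == ULONG_MAX) {
        ++next;                     /* skip the reserved value; wraps to 0 */
    }
    assert(next != ULONG_MAX);
    return next;
}
```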

Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Eric Biggers

On Thu, Feb 24, 2022 at 01:54:54AM +0100, Jason A. Donenfeld wrote:
> On 2/24/22, Eric Biggers  wrote:
> > I think we should be removing cases where the base_crng key is changed
> > directly besides extraction from the input_pool, not adding new ones.
> > Why not implement this as add_device_randomness() followed by
> > crng_reseed(force=true), where the 'force' argument forces a reseed to
> > occur even if the entropy_count is too low?
> 
> Because that induces a "premature next" condition which can let that
> entropy, potentially newly acquired by a storm of IRQs at power-on, be
> bruteforced by unprivileged userspace. I actually had it exactly the
> way you describe at first, but decided that this here is the lesser of
> evils and doesn't really complicate things the way an intentional
> premature next would. The only thing we care about here is branching
> the crng stream, and so this does explicitly that, without having to
> interfere with how we collect entropy. Of course we *also* add it as
> non-credited "device randomness" so that it's part of the next
> reseeding, whenever that might occur.

Can you make sure to properly explain this in the code?

- Eric

Re: Fix a potential Use-after-free in virtio_iommu_handle_command() (v6.2.0).

2022-02-23 Thread wliang


> > thanks for your report and patch - but to make sure that the right 
> > people get attention, please use the scripts/get_maintainer.pl script to 
> > get a list of people who should be on CC:, or look into the MAINTAINERS 
> > file directly (for the next time - this time, I've CC:ed them now already).
> 
> You can find the contribution guidelines here:
> https://www.qemu.org/docs/master/devel/submitting-a-patch.html



Thank you so much!
You guys are so kind! That reminds me how beautiful the world is.
Have a good day!

Thanks,
Wentao

Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Jason A. Donenfeld

On 2/24/22, Eric Biggers  wrote:
> I think we should be removing cases where the base_crng key is changed
> directly besides extraction from the input_pool, not adding new ones.
> Why not implement this as add_device_randomness() followed by
> crng_reseed(force=true), where the 'force' argument forces a reseed to
> occur even if the entropy_count is too low?

Because that induces a "premature next" condition which can let that
entropy, potentially newly acquired by a storm of IRQs at power-on, be
bruteforced by unprivileged userspace. I actually had it exactly the
way you describe at first, but decided that this here is the lesser of
evils and doesn't really complicate things the way an intentional
premature next would. The only thing we care about here is branching
the crng stream, and so this does explicitly that, without having to
interfere with how we collect entropy. Of course we *also* add it as
non-credited "device randomness" so that it's part of the next
reseeding, whenever that might occur.

Re: [PATCH v7 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-23 Thread Richard Henderson


On 2/23/22 13:43, Richard Henderson wrote:

Although none of this is going to work with .insn...


I beg your pardon, this is incorrect: .insn does have fields for the register
arguments.


r~

Re: [PATCH v7 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-23 Thread Richard Henderson


On 2/23/22 12:31, David Miller wrote:

+#define F_EPI "stg %%r0, %[res] " : [res] "+m" (res) : : "r0", "r2", "r3"
+
+#define F_PRO asm ( \
+"llihf %%r0,801\n" \
+"lg %%r2, %[a]\n"  \
+"lg %%r3, %[b] "   \
+: : [a] "m" (a),   \
+[b] "m" (b)\
+: "r2", "r3")
+
+#define FbinOp(S, ASM) uint64_t S(uint64_t a, uint64_t b) \
+{ uint64_t res = 0; F_PRO; ASM; return res; }
+
+/* AND WITH COMPLEMENT */
+FbinOp(_ncrk,  asm("ncrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2\n" F_EPI))


Better written as

  asm("ncrk %0, %3, %2" : "=&r"(res) : "r"(a), "r"(b) : "cc");

and drop F_PRO and F_EPI.  Use the asm constraints properly to place the
operands.


+/* NAND */
+FbinOp(_nnrk,  asm("nnrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nngrk, asm("nngrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NOT XOR */
+FbinOp(_nxrk,  asm("nxrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nxgrk, asm("nxgrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NOR */
+FbinOp(_nork,  asm("nork  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nogrk, asm("nogrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* OR WITH COMPLEMENT */
+FbinOp(_ocrk,  asm("ocrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ocgrk, asm("ocgrk %%r0, %%r3, %%r2\n" F_EPI))


Similarly.


+++ b/tests/tcg/s390x/mie3-sel.c
@@ -0,0 +1,42 @@
+#include <stdint.h>
+
+
+#define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c) \
+{   \
+uint64_t res = 0;   \
+asm (   \
+"lg %%r2, %[a]\n"   \
+"lg %%r3, %[b]\n"   \
+"lg %%r0, %[c]\n"   \
+"ltgr %%r0, %%r0\n" \
+ASM \
+"stg %%r0, %[res] " \
+: [res] "=m" (res)  \
+: [a] "m" (a),  \
+  [b] "m" (b),  \
+  [c] "m" (c)   \
+: "r0", "r2",   \
+  "r3", "r4"\
+);  \
+return res; \
+}
+
+
+Fi3 (_selre,    "selre    %%r0, %%r3, %%r2\n")
+Fi3 (_selgrz,   "selgrz   %%r0, %%r3, %%r2\n")
+Fi3 (_selfhrnz, "selfhrnz %%r0, %%r3, %%r2\n")


Similarly:

  asm("ltgr %3, %3; selre %0, %2, %1"
  : "=&r"(res) : "r"(a), "r"(b), "r"(c) : "cc");

Although none of this is going to work with .insn.  We *ought* to be able to use the 
debian11 update plus a change to tests/tcg/configure.sh to detect host support for 
-march=z15 to drop that change.
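
As a sketch of what these SELECT tests are expected to produce, the selection can be modelled in portable C. Assumptions, to be checked against the Principles of Operation: LTGR sets cc 0/1/2 for zero/negative/positive, the mask bit for cc N has value 8 >> N (so the e and z suffixes mean mask 8 and nz means mask 7), SELR replaces only the low word and SELFHR only the high word. All ref_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition code after LTGR: 0 = zero, 1 = negative, 2 = positive. */
static inline int ref_ltgr_cc(uint64_t v)
{
    if (v == 0) {
        return 0;
    }
    return (v >> 63) ? 1 : 2;
}

/* A 4-bit mask names the accepted condition codes: 8 = cc0 ... 1 = cc3. */
static inline bool ref_cc_matches(int cc, unsigned mask)
{
    assert(cc >= 0 && cc <= 3 && mask <= 15);
    return (mask >> (3 - cc)) & 1;
}

/* SELGR: second operand if the condition holds, otherwise the third. */
static inline uint64_t ref_selgr(uint64_t op2, uint64_t op3, int cc, unsigned mask)
{
    return ref_cc_matches(cc, mask) ? op2 : op3;
}

/* SELR: as SELGR, but only the low 32 bits of the destination change. */
static inline uint64_t ref_selr(uint64_t old, uint64_t op2, uint64_t op3,
                                int cc, unsigned mask)
{
    uint64_t sel = ref_selgr(op2, op3, cc, mask);
    return (old & 0xffffffff00000000ull) | (sel & 0xffffffffull);
}

/* SELFHR: as SELGR, but only the high 32 bits of the destination change. */
static inline uint64_t ref_selfhr(uint64_t old, uint64_t op2, uint64_t op3,
                                  int cc, unsigned mask)
{
    uint64_t sel = ref_selgr(op2, op3, cc, mask);
    return (sel & 0xffffffff00000000ull) | (old & 0xffffffffull);
}
```

In the quoted Fi3 macro the destination r0 also holds c before the select, so old would be c, op2 the value in r3 (b) and op3 the value in r2 (a).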



r~

Re: [PATCH v3 4/6] i386/pc: relocate 4g start to 1T where applicable

2022-02-23 Thread Joao Martins

On 2/23/22 21:22, Michael S. Tsirkin wrote:
> On Wed, Feb 23, 2022 at 06:44:53PM +0000, Joao Martins wrote:
>> It is assumed that the whole GPA space is available to be DMA
>> addressable, within a given address space limit, except for a
>> tiny region before the 4G. Since Linux v5.4, VFIO validates
>> whether the selected GPA is indeed valid i.e. not reserved by
>> IOMMU on behalf of some specific devices or platform-defined
>> restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
>>  -EINVAL.
>>
>> AMD systems with an IOMMU are examples of such platforms and
>> particularly may only have these ranges as allowed:
>>
>>  0000000000000000 - 00000000fedfffff (0      .. 3.982G)
>>  00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
>>  0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb[*])
>>
>> We already account for the 4G hole, albeit if the guest is big
>> enough we will fail to allocate a guest with  >1010G due to the
>> ~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).
> 
> Could you point me to which driver then reserves the
> other regions on Linux for AMD platforms?
> 
There are only two regions. The 4G hole serves the same purpose on AMD[0]
and Intel[1], and part of that hole is the IOMMU MSI reserved range. The 1T
hole is reserved for HyperTransport[2]. This is hardware behaviour, so
drivers just mark these ranges as reserved and avoid using them at all.

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/amd/iommu.c#n2203
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/intel/iommu.c#n5328
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/amd/iommu.c#n2210

Now for the 256T on AMD, it isn't reserved anywhere, and the only code
reference that I can give you is the KVM selftests, which had issues
before[4] that Paolo fixed. The errata also give a glimpse[3].

[3] https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
[4]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c8cc43c1eae2910ac96daa4216e0fb3391ad0504
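
To make the 1T hole concrete, here is a small stand-alone sketch of the overlap test. It is not the code from the patch: the constants are my reading of the HyperTransport range (FD_0000_0000h to FF_FFFF_FFFFh), and the relocation policy and all function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HT_START        0xfd00000000ULL   /* 1012 GiB */
#define HT_END          0xffffffffffULL   /* 1 TiB - 1 */
#define ABOVE_1TB_START (HT_END + 1)

/* True if [start, start + size) intersects the reserved HT window. */
static inline bool overlaps_ht_hole(uint64_t start, uint64_t size)
{
    assert(size > 0 && start + size > start);   /* no wrap-around */
    return start <= HT_END && start + size - 1 >= HT_START;
}

/*
 * Hypothetical policy: keep the default start of above-4G memory unless
 * the memory placed there would reach into the hole, in which case start
 * at 1 TiB instead.
 */
static inline uint64_t pick_above_4g_start(uint64_t default_start,
                                           uint64_t above_4g_size)
{
    if (above_4g_size == 0 ||
        !overlaps_ht_hole(default_start, above_4g_size)) {
        return default_start;
    }
    return ABOVE_1TB_START;
}
```

With a 4 GiB default start, 100 GiB of high memory stays where it is, while 1010 GiB would end at 1014 GiB, past the 1012 GiB start of the hole, and so gets moved to 1 TiB.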

>> +/*
>> + * AMD systems with an IOMMU have an additional hole close to the
>> + * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
>> + * on kernel version, VFIO may or may not let you DMA map those ranges.
>> + * Starting Linux v5.4 we validate it, and can't create guests on AMD machines
>> + * with certain memory sizes. It's also wrong to use those IOVA ranges
>> + * in detriment of leading to IOMMU INVALID_DEVICE_REQUEST or worse.
>> + * The ranges reserved for Hyper-Transport are:
>> + *
>> + * FD_0000_0000h - FF_FFFF_FFFFh
>> + *
>> + * The ranges represent the following:
>> + *
>> + * Base Address   Top Address  Use
>> + *
>> + * FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
>> + * FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
>> + * FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
>> + * FD_F910_0000h FD_F91F_FFFFh System Management
>> + * FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
>> + * FD_FB00_0000h FD_FBFF_FFFFh Address Translation
>> + * FD_FC00_0000h FD_FDFF_FFFFh I/O Space
>> + * FD_FE00_0000h FD_FFFF_FFFFh Configuration
>> + * FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
>> + * FE_2000_0000h FF_FFFF_FFFFh Reserved
>> + *
>> + * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
>> + * Table 3: Special Address Controls (GPA) for more information.
>> + */
>> +#define AMD_HT_START         0xfd00000000UL
>> +#define AMD_HT_END           0xffffffffffUL
>> +#define AMD_ABOVE_1TB_START  (AMD_HT_END + 1)
>> +#define AMD_HT_SIZE          (AMD_ABOVE_1TB_START - AMD_HT_START)
>> +
>> +static hwaddr x86_max_phys_addr(PCMachineState *pcms,
>> +uint64_t pci_hole64_size)
>> +{
>> +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +X86MachineState *x86ms = X86_MACHINE(pcms);
>> +MachineState *machine = MACHINE(pcms);
>> +ram_addr_t device_mem_size = 0;
>> +hwaddr base;
>> +
>> +if (pcmc->has_reserved_memory &&
>> +   (machine->ram_size < machine->maxram_size)) {
>> +device_mem_size = machine->maxram_size - machine->ram_size;
>> +}
>> +
>> +base = ROUND_UP(x86ms->above_4g_mem_start + x86ms->above_4g_mem_size +
>> +pcms->sgx_epc.size, 1 * GiB);
>> +
>> +return base + device_mem_size + pci_hole64_size;
>> +}
>> +
>> +static void x86_update_above_4g_mem_start(PCMachineState *pcms,
>> +  uint64_t pci_hole64_size)
>> +{
>> +X86MachineState *x86ms = X86_MACHINE(pcms);
>> +uint32_t eax, vendor[3];
>> +
>> +host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
>> +if (!IS_AMD_VENDOR(vendor)) {
>> +return;
>> +}
> 
> Wait a sec, should this actually be tying things to the host CPU ID?
> It's really about what we present to the guest though,
> isn't it?
> 

It was the easier catch all to use cpuid

Re: [PATCH v7 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-23 Thread Richard Henderson


On 2/23/22 12:31, David Miller wrote:

resolves: https://gitlab.com/qemu-project/qemu/-/issues/737
implements:
AND WITH COMPLEMENT   (NCRK, NCGRK)
NAND                  (NNRK, NNGRK)
NOT EXCLUSIVE OR      (NXRK, NXGRK)
NOR                   (NORK, NOGRK)
OR WITH COMPLEMENT    (OCRK, OCGRK)
SELECT                (SELR, SELGR)
SELECT HIGH           (SELFHR)
MOVE RIGHT TO LEFT    (MVCRL)
POPULATION COUNT      (POPCNT)

Signed-off-by: David Miller
---
  target/s390x/gen-features.c|  1 +
  target/s390x/helper.h  |  1 +
  target/s390x/tcg/insn-data.def | 30 +++--
  target/s390x/tcg/mem_helper.c  | 20 
  target/s390x/tcg/translate.c   | 60 --
  5 files changed, 107 insertions(+), 5 deletions(-)


Reviewed-by: Richard Henderson 

r~

[PATCH v3 16/17] tests/avocado: Limit test_virt_tcg_gicv[23] to cortex-a72

2022-02-23 Thread Richard Henderson

These tests currently use Fedora Core 31, with a v5.3.7 kernel,
which is broken vs FEAT_LPA2.  Before we can re-enable these tests
for -cpu max, we need to advance to at least a v5.12 kernel.

Signed-off-by: Richard Henderson 
---

Fedora Cloud 35 uses a v5.14 kernel, and does work with FEAT_LPA2.
However, I have no idea how to update the makefile/avocado combo
to get that to happen.

Cc: Cleber Rosa 
Cc: Philippe Mathieu-Daudé 
Cc: Wainer dos Santos Moschetta 
Cc: Beraldo Leal 
---
 tests/avocado/boot_linux.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/avocado/boot_linux.py b/tests/avocado/boot_linux.py
index ab19146d1e..a79c1578a6 100644
--- a/tests/avocado/boot_linux.py
+++ b/tests/avocado/boot_linux.py
@@ -74,7 +74,7 @@ def add_common_args(self):
     def test_virt_tcg_gicv2(self):
         """
         :avocado: tags=accel:tcg
-        :avocado: tags=cpu:max
+        :avocado: tags=cpu:cortex-a72
         :avocado: tags=device:gicv2
         """
         self.require_accelerator("tcg")
@@ -86,7 +86,7 @@ def test_virt_tcg_gicv2(self):
     def test_virt_tcg_gicv3(self):
         """
         :avocado: tags=accel:tcg
-        :avocado: tags=cpu:max
+        :avocado: tags=cpu:cortex-a72
         :avocado: tags=device:gicv3
         """
         self.require_accelerator("tcg")
-- 
2.25.1

[PATCH v3 17/17] target/arm: Implement FEAT_LPA2

2022-02-23 Thread Richard Henderson

This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
4k or 16k pages.

This introduces the DS bit to TCR_ELx, which is RES0 unless the
page size is enabled and supports LPA2, resulting in the effective
value of DS for a given table walk.  The DS bit changes the format
of the page table descriptor slightly, moving the SH field out to
TCR so that all pages have the same shareability and repurposing
those bits of the page table descriptor for the highest bits of
the output address.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
v2: Check DS in tlbi_aa64_get_range.
Check TGRAN4_2 and TGRAN16_2.
---
 docs/system/arm/emulation.rst |   1 +
 target/arm/cpu.h  |  22 
 target/arm/internals.h|   2 +
 target/arm/cpu64.c|   7 ++-
 target/arm/helper.c   | 102 +-
 5 files changed, 116 insertions(+), 18 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 0053ddce20..520fd39071 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -25,6 +25,7 @@ the following architecture extensions:
 - FEAT_JSCVT (JavaScript conversion instructions)
 - FEAT_LOR (Limited ordering regions)
 - FEAT_LPA (Large Physical Address space)
+- FEAT_LPA2 (Large Physical and virtual Address space v2)
 - FEAT_LRCPC (Load-acquire RCpc instructions)
 - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
 - FEAT_LSE (Large System Extensions)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c52d56f669..24d9fff170 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -4284,6 +4284,28 @@ static inline bool isar_feature_aa64_i8mm(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, I8MM) != 0;
 }
 
+static inline bool isar_feature_aa64_tgran4_lpa2(const ARMISARegisters *id)
+{
+return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4) >= 1;
+}
+
+static inline bool isar_feature_aa64_tgran4_2_lpa2(const ARMISARegisters *id)
+{
+unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4_2);
+return t >= 3 || (t == 0 && isar_feature_aa64_tgran4_lpa2(id));
+}
+
+static inline bool isar_feature_aa64_tgran16_lpa2(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16) >= 2;
+}
+
+static inline bool isar_feature_aa64_tgran16_2_lpa2(const ARMISARegisters *id)
+{
+unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16_2);
+return t >= 3 || (t == 0 && isar_feature_aa64_tgran16_lpa2(id));
+}
+
 static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 00af41d792..a34be2e459 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1056,6 +1056,7 @@ static inline uint32_t aarch64_pstate_valid_mask(const 
ARMISARegisters *id)
 typedef struct ARMVAParameters {
 unsigned tsz: 8;
 unsigned ps : 3;
+unsigned sh : 2;
 unsigned select : 1;
 bool tbi: 1;
 bool epd: 1;
@@ -1063,6 +1064,7 @@ typedef struct ARMVAParameters {
 bool using16k   : 1;
 bool using64k   : 1;
 bool tsz_oob: 1;  /* tsz has been clamped to legal range */
+bool ds : 1;
 } ARMVAParameters;
 
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 2fdc16bf18..fc3c65ab2a 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -796,10 +796,11 @@ static void aarch64_max_initfn(Object *obj)
 
 t = cpu->isar.id_aa64mmfr0;
 t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 6); /* FEAT_LPA: 52 bits */
-t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16, 1);   /* 16k pages supported */
-t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16_2, 2); /* 16k stage2 supported */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16, 2);   /* 16k pages w/ LPA2 */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4, 1);/*  4k pages w/ LPA2 */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16_2, 3); /* 16k stage2 w/ LPA2 */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 3);  /*  4k stage2 w/ LPA2 */
 t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
-t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
 cpu->isar.id_aa64mmfr0 = t;
 
 t = cpu->isar.id_aa64mmfr1;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3a7f5cf6f0..088956eecf 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4546,6 +4546,14 @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, 
ARMMMUIdx mmuidx,
 } else {
 ret.base = extract64(value, 0, 37);
 }
+if (param.ds) {
+/*
+ * With DS=1, BaseADDR is always shifted 16 so that it is able
+ * to address all 52 va bits.  The input address is perforce
+

[PATCH v3 15/17] target/arm: Advertise all page sizes for -cpu max

2022-02-23 Thread Richard Henderson

We support 16k pages, but do not advertise that in ID_AA64MMFR0.

The value 0 in the TGRAN*_2 fields indicates that stage2 lookups defer
to the same support as stage1 lookups.  This setting is deprecated, so
indicate support for all stage2 page sizes directly.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu64.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index d88662cef6..2fdc16bf18 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -796,6 +796,10 @@ static void aarch64_max_initfn(Object *obj)
 
 t = cpu->isar.id_aa64mmfr0;
 t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 6); /* FEAT_LPA: 52 bits */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16, 1);   /* 16k pages supported */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16_2, 2); /* 16k stage2 supported */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
+t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
 cpu->isar.id_aa64mmfr0 = t;
 
 t = cpu->isar.id_aa64mmfr1;
-- 
2.25.1
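
The "value 0 defers to stage1" rule described in this patch can be sketched in isolation as follows. The encoding assumed for the TGRAN*_2 fields is 0 = defer to the stage1 field, 1 = not supported, 2 = supported, 3 = supported with 52-bit addresses; the function name is hypothetical and is not a QEMU helper.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a stage2 granule is usable, given the raw 4-bit
 * TGRAN*_2 field and whether stage1 supports the same granule.
 */
static inline bool stage2_granule_supported(unsigned tgran_2,
                                            bool stage1_supported)
{
    assert(tgran_2 <= 15);          /* ID register fields are 4 bits wide */
    if (tgran_2 == 0) {
        return stage1_supported;    /* deprecated "same as stage1" form */
    }
    return tgran_2 >= 2;            /* 1 means explicitly not supported */
}
```

Writing 2 into each TGRAN*_2 field, as the patch does, makes the answer independent of the stage1 fields.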

[PATCH v3 14/17] target/arm: Validate tlbi TG matches translation granule in use

2022-02-23 Thread Richard Henderson

For FEAT_LPA2, we will need other ARMVAParameters, which themselves
depend on the translation granule in use.  We might as well validate
that the given TG matches; the architecture "does not require that
the instruction invalidates any entries" if this is not true.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index e455397fb5..3a7f5cf6f0 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4520,12 +4520,16 @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, 
ARMMMUIdx mmuidx,
  uint64_t value)
 {
 unsigned int page_size_granule, page_shift, num, scale, exponent;
+/* Extract one bit to represent the va selector in use. */
+uint64_t select = sextract64(value, 36, 1);
+ARMVAParameters param = aa64_va_parameters(env, select, mmuidx, true);
 TLBIRange ret = { };
 
 page_size_granule = extract64(value, 46, 2);
 
-if (page_size_granule == 0) {
-qemu_log_mask(LOG_GUEST_ERROR, "Invalid page size granule %d\n",
+/* The granule encoded in value must match the granule in use. */
+if (page_size_granule != (param.using64k ? 3 : param.using16k ? 2 : 1)) {
+qemu_log_mask(LOG_GUEST_ERROR, "Invalid tlbi page size granule %d\n",
   page_size_granule);
 return ret;
 }
@@ -4537,7 +4541,7 @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, 
ARMMMUIdx mmuidx,
 
 ret.length = (num + 1) << (exponent + page_shift);
 
-if (regime_has_2_ranges(mmuidx)) {
+if (param.select) {
 ret.base = sextract64(value, 0, 37);
 } else {
 ret.base = extract64(value, 0, 37);
-- 
2.25.1

[PATCH v3 13/17] target/arm: Fix TLBIRange.base for 16k and 64k pages

2022-02-23 Thread Richard Henderson

The shift of the BaseADDR field depends on the translation
granule in use.

Fixes: 84940ed8255 ("target/arm: Add support for FEAT_TLBIRANGE")
Reported-by: Peter Maydell 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 31c2a716f2..e455397fb5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4538,10 +4538,11 @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, 
ARMMMUIdx mmuidx,
 ret.length = (num + 1) << (exponent + page_shift);
 
 if (regime_has_2_ranges(mmuidx)) {
-ret.base = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
+ret.base = sextract64(value, 0, 37);
 } else {
-ret.base = extract64(value, 0, 37) << TARGET_PAGE_BITS;
+ret.base = extract64(value, 0, 37);
 }
+ret.base <<= page_shift;
 
 return ret;
 }
-- 
2.25.1
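
For reference, the range decode touched by this patch can be reproduced outside QEMU. The sketch below follows the field positions used in the hunk above (BaseADDR in bits [36:0], NUM in [43:39], SCALE in [45:44], TG in [47:46]); the helper names are mine, and the arithmetic is done in 64 bits throughout, which is a choice of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t base;
    uint64_t length;
} TlbiRangeSketch;

/* Extract len bits of v starting at bit 'start', zero-extended. */
static inline uint64_t bits(uint64_t v, unsigned start, unsigned len)
{
    assert(len >= 1 && len < 64 && start + len <= 64);
    return (v >> start) & ((1ULL << len) - 1);
}

/* Same, but sign-extended from the top bit of the field. */
static inline uint64_t sbits(uint64_t v, unsigned start, unsigned len)
{
    uint64_t x = bits(v, start, len);
    uint64_t sign = 1ULL << (len - 1);
    return (x ^ sign) - sign;
}

static inline TlbiRangeSketch decode_tlbi_range(uint64_t value, bool two_ranges)
{
    TlbiRangeSketch r = { 0, 0 };
    unsigned tg = (unsigned)bits(value, 46, 2);

    if (tg == 0) {
        return r;                               /* reserved encoding */
    }

    unsigned page_shift = (tg - 1) * 2 + 12;    /* 4k, 16k, 64k */
    uint64_t num = bits(value, 39, 5);
    unsigned scale = (unsigned)bits(value, 44, 2);
    unsigned exponent = 5 * scale + 1;

    r.length = (num + 1) << (exponent + page_shift);
    r.base = two_ranges ? sbits(value, 0, 37) : bits(value, 0, 37);
    r.base <<= page_shift;                      /* granule-dependent shift */
    return r;
}
```

With TG=2 (16k), NUM=0, SCALE=0 and BaseADDR=1, the base comes out as 0x4000, not the 0x1000 that a fixed 4k shift would give.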

[PATCH v3 12/17] target/arm: Introduce tlbi_aa64_get_range

2022-02-23 Thread Richard Henderson

Merge tlbi_aa64_range_get_length and tlbi_aa64_range_get_base,
returning a structure containing both results.  Pass in the
ARMMMUIdx, rather than the digested two_ranges boolean.

This is in preparation for FEAT_LPA2, where the interpretation
of 'value' depends on the effective value of DS for the regime.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 58 +++--
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 950f56599e..31c2a716f2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4511,70 +4511,60 @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, 
const ARMCPRegInfo *ri,
 }
 
 #ifdef TARGET_AARCH64
-static uint64_t tlbi_aa64_range_get_length(CPUARMState *env,
-   uint64_t value)
-{
-unsigned int page_shift;
-unsigned int page_size_granule;
-uint64_t num;
-uint64_t scale;
-uint64_t exponent;
+typedef struct {
+uint64_t base;
 uint64_t length;
+} TLBIRange;
+
+static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
+ uint64_t value)
+{
+unsigned int page_size_granule, page_shift, num, scale, exponent;
+TLBIRange ret = { };
 
-num = extract64(value, 39, 5);
-scale = extract64(value, 44, 2);
 page_size_granule = extract64(value, 46, 2);
 
 if (page_size_granule == 0) {
 qemu_log_mask(LOG_GUEST_ERROR, "Invalid page size granule %d\n",
   page_size_granule);
-return 0;
+return ret;
 }
 
 page_shift = (page_size_granule - 1) * 2 + 12;
-
+num = extract64(value, 39, 5);
+scale = extract64(value, 44, 2);
 exponent = (5 * scale) + 1;
-length = (num + 1) << (exponent + page_shift);
 
-return length;
-}
+ret.length = (num + 1) << (exponent + page_shift);
 
-static uint64_t tlbi_aa64_range_get_base(CPUARMState *env, uint64_t value,
-bool two_ranges)
-{
-/* TODO: ARMv8.7 FEAT_LPA2 */
-uint64_t pageaddr;
-
-if (two_ranges) {
-pageaddr = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
+if (regime_has_2_ranges(mmuidx)) {
+ret.base = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
 } else {
-pageaddr = extract64(value, 0, 37) << TARGET_PAGE_BITS;
+ret.base = extract64(value, 0, 37) << TARGET_PAGE_BITS;
 }
 
-return pageaddr;
+return ret;
 }
 
 static void do_rvae_write(CPUARMState *env, uint64_t value,
   int idxmap, bool synced)
 {
 ARMMMUIdx one_idx = ARM_MMU_IDX_A | ctz32(idxmap);
-bool two_ranges = regime_has_2_ranges(one_idx);
-uint64_t baseaddr, length;
+TLBIRange range;
 int bits;
 
-baseaddr = tlbi_aa64_range_get_base(env, value, two_ranges);
-length = tlbi_aa64_range_get_length(env, value);
-bits = tlbbits_for_regime(env, one_idx, baseaddr);
+range = tlbi_aa64_get_range(env, one_idx, value);
+bits = tlbbits_for_regime(env, one_idx, range.base);
 
 if (synced) {
 tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env),
-  baseaddr,
-  length,
+  range.base,
+  range.length,
   idxmap,
   bits);
 } else {
-tlb_flush_range_by_mmuidx(env_cpu(env), baseaddr,
-  length, idxmap, bits);
+tlb_flush_range_by_mmuidx(env_cpu(env), range.base,
+  range.length, idxmap, bits);
 }
 }
 
-- 
2.25.1

[PATCH v3 10/17] target/arm: Implement FEAT_LPA

2022-02-23 Thread Richard Henderson

This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
64k pages.  The only thing left at this point is to handle the
extra bits in the TTBR and in the table descriptors.

Note that PAR_EL1 and HPFAR_EL2 are nominally extended, but we don't
mask out the high bits when writing to those registers, so no changes
are required there.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu-param.h|  2 +-
 target/arm/cpu64.c|  2 +-
 target/arm/helper.c   | 19 ---
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index f3eabddfb5..0053ddce20 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -24,6 +24,7 @@ the following architecture extensions:
 - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
 - FEAT_JSCVT (JavaScript conversion instructions)
 - FEAT_LOR (Limited ordering regions)
+- FEAT_LPA (Large Physical Address space)
 - FEAT_LRCPC (Load-acquire RCpc instructions)
 - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
 - FEAT_LSE (Large System Extensions)
diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 5f9c288b1a..b59d505761 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -10,7 +10,7 @@
 
 #ifdef TARGET_AARCH64
 # define TARGET_LONG_BITS 64
-# define TARGET_PHYS_ADDR_SPACE_BITS  48
+# define TARGET_PHYS_ADDR_SPACE_BITS  52
 # define TARGET_VIRT_ADDR_SPACE_BITS  52
 #else
 # define TARGET_LONG_BITS 32
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 1de31ffb40..d88662cef6 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -795,7 +795,7 @@ static void aarch64_max_initfn(Object *obj)
 cpu->isar.id_aa64pfr1 = t;
 
 t = cpu->isar.id_aa64mmfr0;
-t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 5); /* PARange: 48 bits */
+t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 6); /* FEAT_LPA: 52 bits */
 cpu->isar.id_aa64mmfr0 = t;
 
 t = cpu->isar.id_aa64mmfr1;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 28b4347213..950f56599e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11173,6 +11173,7 @@ static const uint8_t pamax_map[] = {
 [3] = 42,
 [4] = 44,
 [5] = 48,
+[6] = 52,
 };
 
 /* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
@@ -11564,11 +11565,15 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 descaddr = extract64(ttbr, 0, 48);
 
 /*
- * If the base address is out of range, raise AddressSizeFault.
+ * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [5:2] of TTBR.
+ *
+ * Otherwise, if the base address is out of range, raise AddressSizeFault.
  * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
  * but we've just cleared the bits above 47, so simplify the test.
  */
-if (descaddr >> outputsize) {
+if (outputsize > 48) {
+descaddr |= extract64(ttbr, 2, 4) << 48;
+} else if (descaddr >> outputsize) {
 level = 0;
 fault_type = ARMFault_AddressSize;
 goto do_fault;
@@ -11620,7 +11625,15 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 }
 
 descaddr = descriptor & descaddrmask;
-if (descaddr >> outputsize) {
+
+/*
+ * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
+ * of descriptor.  Otherwise, if descaddr is out of range, raise
+ * AddressSizeFault.
+ */
+if (outputsize > 48) {
+descaddr |= extract64(descriptor, 12, 4) << 48;
+} else if (descaddr >> outputsize) {
 fault_type = ARMFault_AddressSize;
 goto do_fault;
 }
-- 
2.25.1
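
The two bit shuffles added by this patch can be tried stand-alone. This is only a sketch: the bit positions are taken from the comments in the hunks above, while the helper names and the fixed 64k descriptor mask (address bits [47:16]) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Extract len bits of v starting at bit 'start'. */
static inline uint64_t field(uint64_t v, unsigned start, unsigned len)
{
    assert(len >= 1 && len < 64 && start + len <= 64);
    return (v >> start) & ((1ULL << len) - 1);
}

/* Address bits [51:48] carried in TTBR[5:2] when the output size exceeds 48. */
static inline uint64_t ttbr_high_bits(uint64_t ttbr, unsigned outputsize)
{
    return outputsize > 48 ? field(ttbr, 2, 4) << 48 : 0;
}

/*
 * 64k-granule descriptor: address bits [47:16] sit in place, and bits
 * [51:48] are carried in descriptor[15:12] when the output size exceeds 48.
 */
static inline uint64_t desc_output_addr_64k(uint64_t descriptor,
                                            unsigned outputsize)
{
    uint64_t addr = descriptor & 0x0000ffffffff0000ULL;

    if (outputsize > 48) {
        addr |= field(descriptor, 12, 4) << 48;
    }
    return addr;
}
```

For a descriptor of 0x000012345678F003 and a 52-bit output size, this gives 0x000F123456780000.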

[PATCH v3 11/17] target/arm: Extend arm_fi_to_lfsc to level -1

2022-02-23 Thread Richard Henderson

With FEAT_LPA2, rather than introducing translation level 4,
we introduce level -1, below the current level 0.  Extend
arm_fi_to_lfsc to handle these faults.

Assert that this new translation level does not leak into
fault types for which it is not defined, which allows some
masking of fi->level to be removed.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/internals.h | 35 +--
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 3d3d41ba2b..00af41d792 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -462,28 +462,51 @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
 case ARMFault_None:
 return 0;
 case ARMFault_AddressSize:
-fsc = fi->level & 3;
+assert(fi->level >= -1 && fi->level <= 3);
+if (fi->level < 0) {
+fsc = 0b101001;
+} else {
+fsc = fi->level;
+}
 break;
 case ARMFault_AccessFlag:
-fsc = (fi->level & 3) | (0x2 << 2);
+assert(fi->level >= 0 && fi->level <= 3);
+fsc = 0b001000 | fi->level;
 break;
 case ARMFault_Permission:
-fsc = (fi->level & 3) | (0x3 << 2);
+assert(fi->level >= 0 && fi->level <= 3);
+fsc = 0b001100 | fi->level;
 break;
 case ARMFault_Translation:
-fsc = (fi->level & 3) | (0x1 << 2);
+assert(fi->level >= -1 && fi->level <= 3);
+if (fi->level < 0) {
+fsc = 0b101011;
+} else {
+fsc = 0b000100 | fi->level;
+}
 break;
 case ARMFault_SyncExternal:
 fsc = 0x10 | (fi->ea << 12);
 break;
 case ARMFault_SyncExternalOnWalk:
-fsc = (fi->level & 3) | (0x5 << 2) | (fi->ea << 12);
+assert(fi->level >= -1 && fi->level <= 3);
+if (fi->level < 0) {
+fsc = 0b010011;
+} else {
+fsc = 0b010100 | fi->level;
+}
+fsc |= fi->ea << 12;
 break;
 case ARMFault_SyncParity:
 fsc = 0x18;
 break;
 case ARMFault_SyncParityOnWalk:
-fsc = (fi->level & 3) | (0x7 << 2);
+assert(fi->level >= -1 && fi->level <= 3);
+if (fi->level < 0) {
+fsc = 0b011011;
+} else {
+fsc = 0b011100 | fi->level;
+}
 break;
 case ARMFault_AsyncParity:
 fsc = 0x19;
-- 
2.25.1
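
As a quick check of the level -1 encodings, two of the cases above can be lifted into stand-alone functions. The values are the ones in the hunk, written in hex so the sketch does not depend on binary literals; the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Translation fault: levels 0..3 encode as 0b0001LL, level -1 as 0b101011. */
static inline uint32_t lfsc_translation_fault(int level)
{
    assert(level >= -1 && level <= 3);
    return level < 0 ? 0x2b : (0x04 | (uint32_t)level);
}

/* Address size fault: levels 0..3 encode as 0b0000LL, level -1 as 0b101001. */
static inline uint32_t lfsc_address_size_fault(int level)
{
    assert(level >= -1 && level <= 3);
    return level < 0 ? 0x29 : (uint32_t)level;
}
```

The level -1 codes do not fit the "class | level" pattern, which is why each case needs its own branch instead of masking with 3.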

[PATCH v3 08/17] target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA

2022-02-23 Thread Richard Henderson

The original A.a revision of the AArch64 ARM required that we
force-extend the addresses in these registers from 49 bits.
This language has been loosened via a combination of IMPLEMENTATION
DEFINED and CONSTRAINED UNPREDICTABLE to allow consideration of
the entire aligned address.

This means that we do not have to consider whether or not FEAT_LVA
is enabled, and decide from which bit an address might need to be
extended.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c002100979..2eff30d18c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6423,11 +6423,18 @@ static void dbgwvr_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 ARMCPU *cpu = env_archcpu(env);
 int i = ri->crm;
 
-/* Bits [63:49] are hardwired to the value of bit [48]; that is, the
- * register reads and behaves as if values written are sign extended.
+/*
  * Bits [1:0] are RES0.
+ *
+ * It is IMPLEMENTATION DEFINED whether [63:49] ([63:53] with FEAT_LVA)
+ * are hardwired to the value of bit [48] ([52] with FEAT_LVA), or if
+ * they contain the value written.  It is CONSTRAINED UNPREDICTABLE
+ * whether the RESS bits are ignored when comparing an address.
+ *
+ * Therefore we are allowed to compare the entire register, which lets
+ * us avoid considering whether or not FEAT_LVA is actually enabled.
  */
-value = sextract64(value, 0, 49) & ~3ULL;
+value &= ~3ULL;
 
 raw_write(env, ri, value);
 hw_watchpoint_update(cpu, i);
@@ -6473,10 +6480,19 @@ void hw_breakpoint_update(ARMCPU *cpu, int n)
 case 0: /* unlinked address match */
 case 1: /* linked address match */
 {
-/* Bits [63:49] are hardwired to the value of bit [48]; that is,
- * we behave as if the register was sign extended. Bits [1:0] are
- * RES0. The BAS field is used to allow setting breakpoints on 16
- * bit wide instructions; it is CONSTRAINED UNPREDICTABLE whether
+/*
+ * Bits [1:0] are RES0.
+ *
+ * It is IMPLEMENTATION DEFINED whether bits [63:49]
+ * ([63:53] for FEAT_LVA) are hardwired to a copy of the sign bit
+ * of the VA field ([48] or [52] for FEAT_LVA), or whether the
+ * value is read as written.  It is CONSTRAINED UNPREDICTABLE
+ * whether the RESS bits are ignored when comparing an address.
+ * Therefore we are allowed to compare the entire register, which
+ * lets us avoid considering whether FEAT_LVA is actually enabled.
+ *
+ * The BAS field is used to allow setting breakpoints on 16-bit
+ * wide instructions; it is CONSTRAINED UNPREDICTABLE whether
  * a bp will fire if the addresses covered by the bp and the addresses
  * covered by the insn overlap but the insn doesn't start at the
  * start of the bp address range. We choose to require the insn and
@@ -6489,7 +6505,7 @@ void hw_breakpoint_update(ARMCPU *cpu, int n)
  * See also figure D2-3 in the v8 ARM ARM (DDI0487A.c).
  */
 int bas = extract64(bcr, 5, 4);
-addr = sextract64(bvr, 0, 49) & ~3ULL;
+addr = bvr & ~3ULL;
 if (bas == 0) {
 return;
 }
-- 
2.25.1
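
The behavioural difference between the old and new masking is easiest to see on a value with bit 48 set. A stand-alone sketch, with a local sign-extension helper standing in for sextract64 and function names invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 'len' bits of v to 64 bits. */
static inline uint64_t sext_low(uint64_t v, unsigned len)
{
    assert(len >= 1 && len <= 64);
    uint64_t mask = len < 64 ? (1ULL << len) - 1 : ~0ULL;
    uint64_t sign = 1ULL << (len - 1);
    return ((v & mask) ^ sign) - sign;
}

/* Before: force-extend from 49 bits, then clear the RES0 bits [1:0]. */
static inline uint64_t dbg_addr_old(uint64_t value)
{
    return sext_low(value, 49) & ~3ULL;
}

/* After: keep the value as written, only clear the RES0 bits [1:0]. */
static inline uint64_t dbg_addr_new(uint64_t value)
{
    return value & ~3ULL;
}
```

For ordinary low addresses both versions agree; they differ only when the upper bits are not already a sign extension of bit 48.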

[PATCH v3 07/17] target/arm: Honor TCR_ELx.{I}PS

2022-02-23 Thread Richard Henderson

This field controls the output (intermediate) physical address size
of the translation process.  V8 requires that an AddressSize fault
be raised if the page tables are programmed incorrectly, such that any
intermediate descriptor address, or the final translated address,
is out of range.

Add a PS field to ARMVAParameters, and properly compute outputsize
in get_phys_addr_lpae.  Test the descaddr as extracted from TTBR
and from page table entries.

Restrict descaddrmask so that we won't raise the fault for v7.
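
A minimal stand-alone sketch of the two steps described above: bound PS by PARANGE to get the effective output size, then fault on any address bit at or above that size. The table mirrors the pamax_map values in this patch; the function names are illustrative and not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encoding shared by ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS (pre-LPA). */
static const uint8_t pa_bits[] = { 32, 36, 40, 42, 44, 48 };

/* Effective output address size: the smaller of PS and PARANGE wins. */
static inline unsigned effective_output_bits(unsigned parange, unsigned ps)
{
    unsigned idx = ps < parange ? ps : parange;

    assert(idx < sizeof(pa_bits) / sizeof(pa_bits[0]));
    return pa_bits[idx];
}

/* True if descaddr has any bit set at or above the output size. */
static inline bool address_size_fault(uint64_t descaddr, unsigned outputsize)
{
    assert(outputsize < 64);
    return (descaddr >> outputsize) != 0;
}
```

For example, a CPU with PARANGE=5 (48 bits) and a guest that programs PS=2 ends up with a 40-bit output size, so a descriptor address with bit 40 set faults.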

Reviewed-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/internals.h |  1 +
 target/arm/helper.c| 72 --
 2 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index fefd1fb8d8..3d3d41ba2b 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1032,6 +1032,7 @@ static inline uint32_t aarch64_pstate_valid_mask(const 
ARMISARegisters *id)
  */
 typedef struct ARMVAParameters {
 unsigned tsz: 8;
+unsigned ps : 3;
 unsigned select : 1;
 bool tbi: 1;
 bool epd: 1;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 675aec4bf3..c002100979 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11149,17 +11149,19 @@ static uint8_t convert_stage2_attrs(CPUARMState *env, 
uint8_t s2attrs)
 }
 #endif /* !CONFIG_USER_ONLY */
 
+/* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
+static const uint8_t pamax_map[] = {
+[0] = 32,
+[1] = 36,
+[2] = 40,
+[3] = 42,
+[4] = 44,
+[5] = 48,
+};
+
 /* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
 unsigned int arm_pamax(ARMCPU *cpu)
 {
-static const unsigned int pamax_map[] = {
-[0] = 32,
-[1] = 36,
-[2] = 40,
-[3] = 42,
-[4] = 44,
-[5] = 48,
-};
 unsigned int parange =
 FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
 
@@ -11210,7 +11212,7 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 {
 uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
 bool epd, hpd, using16k, using64k, tsz_oob;
-int select, tsz, tbi, max_tsz, min_tsz;
+int select, tsz, tbi, max_tsz, min_tsz, ps;
 
 if (!regime_has_2_ranges(mmu_idx)) {
 select = 0;
@@ -11224,6 +11226,7 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 hpd = extract32(tcr, 24, 1);
 }
 epd = false;
+ps = extract32(tcr, 16, 3);
 } else {
 /*
  * Bit 55 is always between the two regions, and is canonical for
@@ -11244,6 +11247,7 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 epd = extract32(tcr, 23, 1);
 hpd = extract64(tcr, 42, 1);
 }
+ps = extract64(tcr, 32, 3);
 }
 
 if (cpu_isar_feature(aa64_st, env_archcpu(env))) {
@@ -11272,6 +11276,7 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 
 return (ARMVAParameters) {
 .tsz = tsz,
+.ps = ps,
 .select = select,
 .tbi = tbi,
 .epd = epd,
@@ -11399,6 +11404,8 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 
 /* TODO: This code does not support shareability levels. */
 if (aarch64) {
+int ps;
+
 param = aa64_va_parameters(env, address, mmu_idx,
access_type != MMU_INST_FETCH);
 level = 0;
@@ -11419,7 +11426,16 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 
 addrsize = 64 - 8 * param.tbi;
 inputsize = 64 - param.tsz;
-outputsize = arm_pamax(cpu);
+
+/*
+ * Bound PS by PARANGE to find the effective output address size.
+ * ID_AA64MMFR0 is a read-only register so values outside of the
+ * supported mappings can be considered an implementation error.
+ */
+ps = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
+ps = MIN(ps, param.ps);
+assert(ps < ARRAY_SIZE(pamax_map));
+outputsize = pamax_map[ps];
 } else {
 param = aa32_va_parameters(env, address, mmu_idx);
 level = 1;
@@ -11523,19 +11539,38 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 
 /* Now we can extract the actual base address from the TTBR */
 descaddr = extract64(ttbr, 0, 48);
+
+/*
+ * If the base address is out of range, raise AddressSizeFault.
+ * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
+ * but we've just cleared the bits above 47, so simplify the test.
+ */
+if (descaddr >> outputsize) {
+level = 0;
+fault_type = ARMFault_AddressSize;
+goto do_fault;
+}
+
 /*
  * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
  * and also

[PATCH v3 06/17] target/arm: Use MAKE_64BIT_MASK to compute indexmask

2022-02-23 Thread Richard Henderson

The macro is a bit more readable than the inlined computation.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 431b0c1405..675aec4bf3 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11518,8 +11518,8 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 level = startlevel;
 }
 
-indexmask_grainsize = (1ULL << (stride + 3)) - 1;
-indexmask = (1ULL << (inputsize - (stride * (4 - level)))) - 1;
+indexmask_grainsize = MAKE_64BIT_MASK(0, stride + 3);
+indexmask = MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
 
 /* Now we can extract the actual base address from the TTBR */
 descaddr = extract64(ttbr, 0, 48);
-- 
2.25.1
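
For readers comparing the two forms in the hunk above, here is a minimal sketch of the mask computation. The macro body is reproduced from memory of include/qemu/bitops.h and should be checked against the tree; open_coded_mask() and level_indexmask() are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed to match the macro in QEMU's include/qemu/bitops.h.
 * length must be in the range 1..64, otherwise the shift is undefined.
 */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* The open-coded form that the patch removes; only valid for n < 64. */
static uint64_t open_coded_mask(unsigned n)
{
    assert(n < 64);
    return (1ULL << n) - 1;
}

/* Hypothetical wrapper mirroring the patched indexmask expression. */
static uint64_t level_indexmask(int inputsize, int stride, int level)
{
    return MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
}
```

With inputsize = 48, stride = 9 and level = 0 the mask covers 12 bits, the same value the open-coded form produces.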

[PATCH v3 05/17] target/arm: Pass outputsize down to check_s2_mmu_setup

2022-02-23 Thread Richard Henderson

Pass down the width of the output address from translation.
For now this is still just PAMax, but a subsequent patch will
compute the correct value from TCR_ELx.{I}PS.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 71e575f352..431b0c1405 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11065,7 +11065,7 @@ do_fault:
  * false otherwise.
  */
 static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
-   int inputsize, int stride)
+   int inputsize, int stride, int outputsize)
 {
 const int grainsize = stride + 3;
 int startsizecheck;
@@ -11081,22 +11081,19 @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool 
is_aa64, int level,
 }
 
 if (is_aa64) {
-CPUARMState *env = &cpu->env;
-unsigned int pamax = arm_pamax(cpu);
-
 switch (stride) {
 case 13: /* 64KB Pages.  */
-if (level == 0 || (level == 1 && pamax <= 42)) {
+if (level == 0 || (level == 1 && outputsize <= 42)) {
 return false;
 }
 break;
 case 11: /* 16KB Pages.  */
-if (level == 0 || (level == 1 && pamax <= 40)) {
+if (level == 0 || (level == 1 && outputsize <= 40)) {
 return false;
 }
 break;
 case 9: /* 4KB Pages.  */
-if (level == 0 && pamax <= 42) {
+if (level == 0 && outputsize <= 42) {
 return false;
 }
 break;
@@ -11105,8 +11102,8 @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool 
is_aa64, int level,
 }
 
 /* Inputsize checks.  */
-if (inputsize > pamax &&
-(arm_el_is_aa64(env, 1) || inputsize > 40)) {
+if (inputsize > outputsize &&
+(arm_el_is_aa64(&cpu->env, 1) || inputsize > 40)) {
 /* This is CONSTRAINED UNPREDICTABLE and we choose to fault.  */
 return false;
 }
@@ -11392,7 +11389,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 target_ulong page_size;
 uint32_t attrs;
 int32_t stride;
-int addrsize, inputsize;
+int addrsize, inputsize, outputsize;
 TCR *tcr = regime_tcr(env, mmu_idx);
 int ap, ns, xn, pxn;
 uint32_t el = regime_el(env, mmu_idx);
@@ -11422,11 +11419,13 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 
 addrsize = 64 - 8 * param.tbi;
 inputsize = 64 - param.tsz;
+outputsize = arm_pamax(cpu);
 } else {
 param = aa32_va_parameters(env, address, mmu_idx);
 level = 1;
 addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
 inputsize = addrsize - param.tsz;
+outputsize = 40;
 }
 
 /*
@@ -11511,7 +11510,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 
 /* Check that the starting level is valid. */
 ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
-inputsize, stride);
+inputsize, stride, outputsize);
 if (!ok) {
 fault_type = ARMFault_Translation;
 goto do_fault;
-- 
2.25.1
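
The AArch64 start-level test in check_s2_mmu_setup depends only on stride, level and the output size, so it can be sketched as a standalone predicate. This is an illustration under assumptions, not the QEMU function: s2_start_level_ok() is a hypothetical name, and returning false for an unknown stride is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * stride is 9, 11 or 13 (4KB, 16KB, 64KB pages); outputsize is the
 * output address width in bits. Returns true if the start level is valid.
 */
static bool s2_start_level_ok(int stride, int level, int outputsize)
{
    assert(level >= 0 && level <= 3);

    switch (stride) {
    case 13: /* 64KB pages */
        return !(level == 0 || (level == 1 && outputsize <= 42));
    case 11: /* 16KB pages */
        return !(level == 0 || (level == 1 && outputsize <= 40));
    case 9:  /* 4KB pages */
        return !(level == 0 && outputsize <= 42);
    default:
        return false; /* stride not covered by this sketch */
    }
}
```

Passing outputsize as a parameter, rather than calling arm_pamax() inside, is what lets a later patch substitute a value bounded by TCR_ELx.{I}PS without touching the predicate.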

[PATCH v3 04/17] target/arm: Move arm_pamax out of line

2022-02-23 Thread Richard Henderson

We will shortly share parts of this function with other portions
of address translation.

Reviewed-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/internals.h | 19 +--
 target/arm/helper.c| 22 ++
 2 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index ef6c25d8cb..fefd1fb8d8 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -243,24 +243,7 @@ static inline void update_spsel(CPUARMState *env, uint32_t 
imm)
  * Returns the implementation defined bit-width of physical addresses.
  * The ARMv8 reference manuals refer to this as PAMax().
  */
-static inline unsigned int arm_pamax(ARMCPU *cpu)
-{
-static const unsigned int pamax_map[] = {
-[0] = 32,
-[1] = 36,
-[2] = 40,
-[3] = 42,
-[4] = 44,
-[5] = 48,
-};
-unsigned int parange =
-FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
-
-/* id_aa64mmfr0 is a read-only register so values outside of the
- * supported mappings can be considered an implementation error.  */
-assert(parange < ARRAY_SIZE(pamax_map));
-return pamax_map[parange];
-}
+unsigned int arm_pamax(ARMCPU *cpu);
 
 /* Return true if extended addresses are enabled.
  * This is always the case if our translation regime is 64 bit,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index dd4d95bda2..71e575f352 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11152,6 +11152,28 @@ static uint8_t convert_stage2_attrs(CPUARMState *env, 
uint8_t s2attrs)
 }
 #endif /* !CONFIG_USER_ONLY */
 
+/* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
+unsigned int arm_pamax(ARMCPU *cpu)
+{
+static const unsigned int pamax_map[] = {
+[0] = 32,
+[1] = 36,
+[2] = 40,
+[3] = 42,
+[4] = 44,
+[5] = 48,
+};
+unsigned int parange =
+FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
+
+/*
+ * id_aa64mmfr0 is a read-only register so values outside of the
+ * supported mappings can be considered an implementation error.
+ */
+assert(parange < ARRAY_SIZE(pamax_map));
+return pamax_map[parange];
+}
+
 static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
 if (regime_has_2_ranges(mmu_idx)) {
-- 
2.25.1
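
A minimal sketch of the PARANGE lookup, for readers without the tree at hand. The table values are the ones in the patch; pamax_from_mmfr0() and effective_outputsize() are hypothetical names. The second helper shows the bound by TCR_ELx.{I}PS that a later patch in this series applies, assuming both fields use the same encoding.

```c
#include <assert.h>
#include <stdint.h>

/* PARANGE encoding -> physical address width in bits. */
static const unsigned int pamax_map[] = { 32, 36, 40, 42, 44, 48 };
#define PAMAX_MAP_LEN (sizeof(pamax_map) / sizeof(pamax_map[0]))

/* PARANGE occupies bits [3:0] of ID_AA64MMFR0_EL1. */
static unsigned int pamax_from_mmfr0(uint64_t id_aa64mmfr0)
{
    unsigned int parange = id_aa64mmfr0 & 0xf;

    /* The register is read-only, so an unknown value is a CPU model bug. */
    assert(parange < PAMAX_MAP_LEN);
    return pamax_map[parange];
}

/* Output size when the programmed PS field is bounded by PARANGE. */
static unsigned int effective_outputsize(unsigned int parange, unsigned int ps)
{
    unsigned int idx = ps < parange ? ps : parange;

    assert(idx < PAMAX_MAP_LEN);
    return pamax_map[idx];
}
```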

[PATCH v3 03/17] target/arm: Fault on invalid TCR_ELx.TxSZ

2022-02-23 Thread Richard Henderson

Without FEAT_LVA, the behaviour of programming an invalid value
is IMPLEMENTATION DEFINED.  With FEAT_LVA, programming an invalid
minimum value requires a Translation fault.

It is most self-consistent to choose to generate the fault always.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
v2: Continue to bound in aa64_va_parameters, so that PAuth gets
something it can use, but provide a flag for get_phys_addr_lpae
to raise a fault.
---
 target/arm/internals.h |  1 +
 target/arm/helper.c| 32 
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 3f05748ea4..ef6c25d8cb 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1055,6 +1055,7 @@ typedef struct ARMVAParameters {
 bool hpd: 1;
 bool using16k   : 1;
 bool using64k   : 1;
+bool tsz_oob: 1;  /* tsz has been clamped to legal range */
 } ARMVAParameters;
 
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7bf50fdd76..dd4d95bda2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11190,8 +11190,8 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
ARMMMUIdx mmu_idx, bool data)
 {
 uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
-bool epd, hpd, using16k, using64k;
-int select, tsz, tbi, max_tsz;
+bool epd, hpd, using16k, using64k, tsz_oob;
+int select, tsz, tbi, max_tsz, min_tsz;
 
 if (!regime_has_2_ranges(mmu_idx)) {
 select = 0;
@@ -11232,9 +11232,17 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 } else {
 max_tsz = 39;
 }
+min_tsz = 16;  /* TODO: ARMv8.2-LVA  */
 
-tsz = MIN(tsz, max_tsz);
-tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
+if (tsz > max_tsz) {
+tsz = max_tsz;
+tsz_oob = true;
+} else if (tsz < min_tsz) {
+tsz = min_tsz;
+tsz_oob = true;
+} else {
+tsz_oob = false;
+}
 
 /* Present TBI as a composite with TBID.  */
 tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
@@ -11251,6 +11259,7 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 .hpd = hpd,
 .using16k = using16k,
 .using64k = using64k,
+.tsz_oob = tsz_oob,
 };
 }
 
@@ -11374,6 +11383,21 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 param = aa64_va_parameters(env, address, mmu_idx,
access_type != MMU_INST_FETCH);
 level = 0;
+
+/*
+ * If TxSZ is programmed to a value larger than the maximum,
+ * or smaller than the effective minimum, it is IMPLEMENTATION
+ * DEFINED whether we behave as if the field were programmed
+ * within bounds, or if a level 0 Translation fault is generated.
+ *
+ * With FEAT_LVA, fault on less than minimum becomes required,
+ * so our choice is to always raise the fault.
+ */
+if (param.tsz_oob) {
+fault_type = ARMFault_Translation;
+goto do_fault;
+}
+
 addrsize = 64 - 8 * param.tbi;
 inputsize = 64 - param.tsz;
 } else {
-- 
2.25.1

[PATCH v7 4/4] tests/tcg/s390x: changed to using .insn for tests requiring z15

2022-02-23 Thread David Miller

Signed-off-by: David Miller 
---
 tests/tcg/s390x/mie3-compl.c | 21 +++--
 tests/tcg/s390x/mie3-mvcrl.c |  2 +-
 tests/tcg/s390x/mie3-sel.c   |  6 +++---
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/tests/tcg/s390x/mie3-compl.c b/tests/tcg/s390x/mie3-compl.c
index 98281ee683..31820e4a2a 100644
--- a/tests/tcg/s390x/mie3-compl.c
+++ b/tests/tcg/s390x/mie3-compl.c
@@ -14,25 +14,26 @@
 #define FbinOp(S, ASM) uint64_t S(uint64_t a, uint64_t b) \
 { uint64_t res = 0; F_PRO; ASM; return res; }
 
+
 /* AND WITH COMPLEMENT */
-FbinOp(_ncrk,  asm("ncrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ncrk,  asm(".insn rrf, 0xB9F5, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_ncgrk, asm(".insn rrf, 0xB9E5, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* NAND */
-FbinOp(_nnrk,  asm("nnrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_nngrk, asm("nngrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nnrk,  asm(".insn rrf, 0xB974, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_nngrk, asm(".insn rrf, 0xB964, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* NOT XOR */
-FbinOp(_nxrk,  asm("nxrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_nxgrk, asm("nxgrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nxrk,  asm(".insn rrf, 0xB977, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_nxgrk, asm(".insn rrf, 0xB967, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* NOR */
-FbinOp(_nork,  asm("nork  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_nogrk, asm("nogrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nork,  asm(".insn rrf, 0xB976, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_nogrk, asm(".insn rrf, 0xB966, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* OR WITH COMPLEMENT */
-FbinOp(_ocrk,  asm("ocrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_ocgrk, asm("ocgrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ocrk,  asm(".insn rrf, 0xB975, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_ocgrk, asm(".insn rrf, 0xB965, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 
 int main(int argc, char *argv[])
diff --git a/tests/tcg/s390x/mie3-mvcrl.c b/tests/tcg/s390x/mie3-mvcrl.c
index 81cf3ad702..f0be83b197 100644
--- a/tests/tcg/s390x/mie3-mvcrl.c
+++ b/tests/tcg/s390x/mie3-mvcrl.c
@@ -6,7 +6,7 @@ static inline void mvcrl_8(const char *dst, const char *src)
 {
 asm volatile (
 "llill %%r0, 8\n"
-"mvcrl 0(%[dst]), 0(%[src])\n"
+".insn sse, 0xE50A, 0(%[dst]), 0(%[src])"
 : : [dst] "d" (dst), [src] "d" (src)
 : "memory");
 }
diff --git a/tests/tcg/s390x/mie3-sel.c b/tests/tcg/s390x/mie3-sel.c
index 2e99e00b47..ee619a763d 100644
--- a/tests/tcg/s390x/mie3-sel.c
+++ b/tests/tcg/s390x/mie3-sel.c
@@ -22,9 +22,9 @@ asm (   \
 }
 
 
-Fi3 (_selre,    "selre    %%r0, %%r3, %%r2\n")
-Fi3 (_selgrz,   "selgrz   %%r0, %%r3, %%r2\n")
-Fi3 (_selfhrnz, "selfhrnz %%r0, %%r3, %%r2\n")
+Fi3 (_selre, ".insn rrf, 0xB9F0, %%r0, %%r3, %%r2, 8\n")
+Fi3 (_selgrz,".insn rrf, 0xB9E3, %%r0, %%r3, %%r2, 8\n")
+Fi3 (_selfhrnz,  ".insn rrf, 0xB9C0, %%r0, %%r3, %%r2, 7\n")
 
 
 int main(int argc, char *argv[])
-- 
2.32.0
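
For checking expected values by hand, here is a portable C reference model of the logical operations that mie3-compl.c exercises. The function names are hypothetical; the operand roles (second operand combined with the complement of the third) and the rule that 32-bit forms replace only the low word of the destination are stated from the architecture as I understand it, so treat them as assumptions.

```c
#include <stdint.h>

/* x is the second operand, y the third operand of the instruction. */
static uint64_t ref_andc(uint64_t x, uint64_t y) { return x & ~y; }   /* NCRK/NCGRK */
static uint64_t ref_orc(uint64_t x, uint64_t y)  { return x | ~y; }   /* OCRK/OCGRK */
static uint64_t ref_nand(uint64_t x, uint64_t y) { return ~(x & y); } /* NNRK/NNGRK */
static uint64_t ref_nor(uint64_t x, uint64_t y)  { return ~(x | y); } /* NORK/NOGRK */
static uint64_t ref_nxor(uint64_t x, uint64_t y) { return ~(x ^ y); } /* NXRK/NXGRK */

/* 32-bit forms: keep the high word of the old destination value. */
static uint64_t merge_low32(uint64_t old, uint64_t result)
{
    return (old & 0xFFFFFFFF00000000ULL) | (result & 0xFFFFFFFFULL);
}
```

Since the tests issue the instructions as "r0, r3, r2", the model is called with x taken from r3 and y from r2.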

[PATCH v3 09/17] target/arm: Implement FEAT_LVA

2022-02-23 Thread Richard Henderson

This feature is relatively small, as it applies only to
64k pages and thus requires no additional changes to the
table descriptor walking algorithm, only a change to the
minimum TSZ (which is the inverse of the maximum virtual
address space size).

Note that this feature widens VBAR_ELx, but we already
treat the register as being 64 bits wide.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/cpu-param.h| 2 +-
 target/arm/cpu.h  | 5 +
 target/arm/cpu64.c| 1 +
 target/arm/helper.c   | 9 -
 5 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 144dc491d9..f3eabddfb5 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -27,6 +27,7 @@ the following architecture extensions:
 - FEAT_LRCPC (Load-acquire RCpc instructions)
 - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
 - FEAT_LSE (Large System Extensions)
+- FEAT_LVA (Large Virtual Address space)
 - FEAT_MTE (Memory Tagging Extension)
 - FEAT_MTE2 (Memory Tagging Extension)
 - FEAT_MTE3 (MTE Asymmetric Fault Handling)
diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 7f38d33b8e..5f9c288b1a 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -11,7 +11,7 @@
 #ifdef TARGET_AARCH64
 # define TARGET_LONG_BITS 64
 # define TARGET_PHYS_ADDR_SPACE_BITS  48
-# define TARGET_VIRT_ADDR_SPACE_BITS  48
+# define TARGET_VIRT_ADDR_SPACE_BITS  52
 #else
 # define TARGET_LONG_BITS 32
 # define TARGET_PHYS_ADDR_SPACE_BITS  40
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c6a4d50e82..c52d56f669 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -4289,6 +4289,11 @@ static inline bool isar_feature_aa64_ccidx(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
 }
 
+static inline bool isar_feature_aa64_lva(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, VARANGE) != 0;
+}
+
 static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 1171ab16b9..1de31ffb40 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -811,6 +811,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64MMFR2, UAO, 1);
 t = FIELD_DP64(t, ID_AA64MMFR2, CNP, 1); /* TTCNP */
 t = FIELD_DP64(t, ID_AA64MMFR2, ST, 1); /* TTST */
+t = FIELD_DP64(t, ID_AA64MMFR2, VARANGE, 1); /* FEAT_LVA */
 cpu->isar.id_aa64mmfr2 = t;
 
 t = cpu->isar.id_aa64zfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 2eff30d18c..28b4347213 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11271,7 +11271,14 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 } else {
 max_tsz = 39;
 }
-min_tsz = 16;  /* TODO: ARMv8.2-LVA  */
+
+min_tsz = 16;
+if (using64k) {
+if (cpu_isar_feature(aa64_lva, env_archcpu(env))) {
+min_tsz = 12;
+}
+}
+/* TODO: FEAT_LPA2 */
 
 if (tsz > max_tsz) {
 tsz = max_tsz;
-- 
2.25.1
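
Since TSZ is the inverse of the address space size, the effect of the new minimum is easy to check numerically. A minimal sketch, with aa64_min_tsz() and max_va_bits() as hypothetical names:

```c
#include <stdbool.h>

/* Minimum TxSZ: 12 only for 64KB pages on a CPU with FEAT_LVA, else 16. */
static int aa64_min_tsz(bool using64k, bool have_lva)
{
    return (using64k && have_lva) ? 12 : 16;
}

/* Largest virtual address width, in bits, for a given minimum TxSZ. */
static int max_va_bits(int min_tsz)
{
    return 64 - min_tsz;
}
```

A minimum of 12 gives a 52-bit virtual address space, which is why TARGET_VIRT_ADDR_SPACE_BITS moves from 48 to 52 in the same patch.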

[PATCH v7 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-23 Thread David Miller

resolves: https://gitlab.com/qemu-project/qemu/-/issues/737
implements:
AND WITH COMPLEMENT   (NCRK, NCGRK)
NAND                  (NNRK, NNGRK)
NOT EXCLUSIVE OR      (NXRK, NXGRK)
NOR                   (NORK, NOGRK)
OR WITH COMPLEMENT    (OCRK, OCGRK)
SELECT                (SELR, SELGR)
SELECT HIGH           (SELFHR)
MOVE RIGHT TO LEFT    (MVCRL)
POPULATION COUNT      (POPCNT)

Signed-off-by: David Miller 
---
 target/s390x/gen-features.c|  1 +
 target/s390x/helper.h  |  1 +
 target/s390x/tcg/insn-data.def | 30 +++--
 target/s390x/tcg/mem_helper.c  | 20 
 target/s390x/tcg/translate.c   | 60 --
 5 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 7cb1a6ec10..a3f30f69d9 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -740,6 +740,7 @@ static uint16_t qemu_LATEST[] = {
 
 /* add all new definitions before this point */
 static uint16_t qemu_MAX[] = {
+S390_FEAT_MISC_INSTRUCTION_EXT3,
 /* generates a dependency warning, leave it out for now */
 S390_FEAT_MSA_EXT_5,
 };
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 271b081e8c..69f69cf718 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -4,6 +4,7 @@ DEF_HELPER_FLAGS_4(nc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(oc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(xc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvc, TCG_CALL_NO_WG, void, env, i32, i64, i64)
+DEF_HELPER_FLAGS_4(mvcrl, TCG_CALL_NO_WG, void, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mvcin, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(clc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_3(mvcl, i32, env, i32, i32)
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 1c3e115712..35e55d454e 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -105,6 +105,9 @@
 D(0xa507, NILL,RI_a,  Z,   r1_o, i2_16u, r1, 0, andi, 0, 0x1000)
 D(0x9400, NI,  SI,Z,   la1, i2_8u, new, 0, ni, nz64, MO_UB)
 D(0xeb54, NIY, SIY,   LD,  la1, i2_8u, new, 0, ni, nz64, MO_UB)
+/* AND WITH COMPLEMENT */
+C(0xb9f5, NCRK,RRF_a, MIE3, r2, r3, new, r1_32, andc, nz32)
+C(0xb9e5, NCGRK,   RRF_a, MIE3, r2, r3, r1, 0, andc, nz64)
 
 /* BRANCH AND LINK */
 C(0x0500, BALR,RR_a,  Z,   0, r2_nz, r1, 0, bal, 0)
@@ -640,6 +643,8 @@
 C(0xeb8e, MVCLU,   RSY_a, E2,  0, a2, 0, 0, mvclu, 0)
 /* MOVE NUMERICS */
 C(0xd100, MVN, SS_a,  Z,   la1, a2, 0, 0, mvn, 0)
+/* MOVE RIGHT TO LEFT */
+C(0xe50a, MVCRL,   SSE,  MIE3, la1, a2, 0, 0, mvcrl, 0)
 /* MOVE PAGE */
 C(0xb254, MVPG,RRE,   Z,   0, 0, 0, 0, mvpg, 0)
 /* MOVE STRING */
@@ -707,6 +712,16 @@
 F(0xed0f, MSEB,RXF,   Z,   e1, m2_32u, new, e1, mseb, 0, IF_BFP)
 F(0xed1f, MSDB,RXF,   Z,   f1, m2_64, new, f1, msdb, 0, IF_BFP)
 
+/* NAND */
+C(0xb974, NNRK,RRF_a, MIE3, r2, r3, new, r1_32, nand, nz32)
+C(0xb964, NNGRK,   RRF_a, MIE3, r2, r3, r1, 0, nand, nz64)
+/* NOR */
+C(0xb976, NORK,RRF_a, MIE3, r2, r3, new, r1_32, nor, nz32)
+C(0xb966, NOGRK,   RRF_a, MIE3, r2, r3, r1, 0, nor, nz64)
+/* NOT EXCLUSIVE OR */
+C(0xb977, NXRK,RRF_a, MIE3, r2, r3, new, r1_32, nxor, nz32)
+C(0xb967, NXGRK,   RRF_a, MIE3, r2, r3, r1, 0, nxor, nz64)
+
 /* OR */
 C(0x1600, OR,  RR_a,  Z,   r1, r2, new, r1_32, or, nz32)
 C(0xb9f6, ORK, RRF_a, DO,  r2, r3, new, r1_32, or, nz32)
@@ -725,6 +740,9 @@
 D(0xa50b, OILL,RI_a,  Z,   r1_o, i2_16u, r1, 0, ori, 0, 0x1000)
 D(0x9600, OI,  SI,Z,   la1, i2_8u, new, 0, oi, nz64, MO_UB)
 D(0xeb56, OIY, SIY,   LD,  la1, i2_8u, new, 0, oi, nz64, MO_UB)
+/* OR WITH COMPLEMENT */
+C(0xb975, OCRK,RRF_a, MIE3, r2, r3, new, r1_32, orc, nz32)
+C(0xb965, OCGRK,   RRF_a, MIE3, r2, r3, r1, 0, orc, nz64)
 
 /* PACK */
 /* Really format SS_b, but we pack both lengths into one argument
@@ -735,6 +753,9 @@
 /* PACK UNICODE */
 C(0xe100, PKU, SS_f,  E2,  la1, a2, 0, 0, pku, 0)
 
+/* POPULATION COUNT */
+C(0xb9e1, POPCNT,  RRF_c, PC,  0, r2_o, r1, 0, popcnt, nz64)
+
 /* PREFETCH */
 /* Implemented as nops of course.  */
 C(0xe336, PFD, RXY_b, GIE, 0, 0, 0, 0, 0, 0)
@@ -743,9 +764,6 @@
 /* Implemented as nop of course.  */
 C(0xb2e8, PPA, RRF_c, PPA, 0, 0, 0, 0, 0, 0)
 
-/* POPULATION COUNT */
-C(0xb9e1, POPCNT,  RRE,   PC,  0, r2_o, r1, 0, popcnt, nz64)
-
 /* ROTATE LEFT SINGLE LOGICAL */
 C(0xeb1d, RLL, RSY_a, Z,   r3_o, sh, new, r1_32, rll32, 0)
 C(0xeb1c, RLLG,RSY_a, Z,   r3_o, sh, r1, 0, rll64, 0)
@@ -765,6 +783,12 @@
 /* SEARCH STRING UNICODE */
 C(0xb9be, SRSTU,   RRE,   ETF3, 0, 0, 0, 0, srstu, 0)
 
+/* SELECT */
+C(0xb9f0, SELR,RRF_a, MIE3, r3, r2, new, r1_32, loc, 0)
+C(0xb9e3, SELGR,   RRF_a, MIE3,

[PATCH v3 02/17] target/arm: Set TCR_EL1.TSZ for user-only

2022-02-23 Thread Richard Henderson

Set this as the kernel would, to 48 bits, to keep the computation
of the address space correct for PAuth.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index c085dc10ee..e251f0df4b 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -206,10 +206,11 @@ static void arm_cpu_reset(DeviceState *dev)
 aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
 }
 /*
+ * Enable 48-bit address space (TODO: take reserved_va into account).
  * Enable TBI0 but not TBI1.
  * Note that this must match useronly_clean_ptr.
  */
-env->cp15.tcr_el[1].raw_tcr = (1ULL << 37);
+env->cp15.tcr_el[1].raw_tcr = 5 | (1ULL << 37);
 
 /* Enable MTE */
 if (cpu_isar_feature(aa64_mte, cpu)) {
-- 
2.25.1

[PATCH v7 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-23 Thread David Miller

tests/tcg/s390x/mie3-compl.c: [N]*K instructions
tests/tcg/s390x/mie3-mvcrl.c: MVCRL instruction
tests/tcg/s390x/mie3-sel.c:  SELECT instruction

Signed-off-by: David Miller 
---
 tests/tcg/s390x/Makefile.target |  5 ++-
 tests/tcg/s390x/mie3-compl.c| 55 +
 tests/tcg/s390x/mie3-mvcrl.c| 31 +++
 tests/tcg/s390x/mie3-sel.c  | 42 +
 4 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/s390x/mie3-compl.c
 create mode 100644 tests/tcg/s390x/mie3-mvcrl.c
 create mode 100644 tests/tcg/s390x/mie3-sel.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 1a7238b4eb..54e67446aa 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -1,12 +1,15 @@
 S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
 VPATH+=$(S390X_SRC)
-CFLAGS+=-march=zEC12 -m64
+CFLAGS+=-march=z15 -m64
 TESTS+=hello-s390x
 TESTS+=csst
 TESTS+=ipm
 TESTS+=exrl-trt
 TESTS+=exrl-trtr
 TESTS+=pack
+TESTS+=mie3-compl
+TESTS+=mie3-mvcrl
+TESTS+=mie3-sel
 TESTS+=mvo
 TESTS+=mvc
 TESTS+=shift
diff --git a/tests/tcg/s390x/mie3-compl.c b/tests/tcg/s390x/mie3-compl.c
new file mode 100644
index 00..98281ee683
--- /dev/null
+++ b/tests/tcg/s390x/mie3-compl.c
@@ -0,0 +1,55 @@
+#include <stdint.h>
+
+
+#define F_EPI "stg %%r0, %[res] " : [res] "+m" (res) : : "r0", "r2", "r3"
+
+#define F_PRO asm ( \
+"llihf %%r0,801\n" \
+"lg %%r2, %[a]\n"  \
+"lg %%r3, %[b] "   \
+: : [a] "m" (a),   \
+[b] "m" (b)\
+: "r2", "r3")
+
+#define FbinOp(S, ASM) uint64_t S(uint64_t a, uint64_t b) \
+{ uint64_t res = 0; F_PRO; ASM; return res; }
+
+/* AND WITH COMPLEMENT */
+FbinOp(_ncrk,  asm("ncrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NAND */
+FbinOp(_nnrk,  asm("nnrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nngrk, asm("nngrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NOT XOR */
+FbinOp(_nxrk,  asm("nxrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nxgrk, asm("nxgrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NOR */
+FbinOp(_nork,  asm("nork  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nogrk, asm("nogrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* OR WITH COMPLEMENT */
+FbinOp(_ocrk,  asm("ocrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ocgrk, asm("ocgrk %%r0, %%r3, %%r2\n" F_EPI))
+
+
+int main(int argc, char *argv[])
+{
+if (_ncrk(0xFF88, 0xAA11)  != 0x03210011ull ||
+_nnrk(0xFF88, 0xAA11)  != 0x032155FFull ||
+_nork(0xFF88, 0xAA11)  != 0x03210066ull ||
+_nxrk(0xFF88, 0xAA11)  != 0x0321AA66ull ||
+_ocrk(0xFF88, 0xAA11)  != 0x0321AA77ull ||
+_ncgrk(0xFF88, 0xAA11) != 0x0011ull ||
+_nngrk(0xFF88, 0xAA11) != 0x55FFull ||
+_nogrk(0xFF88, 0xAA11) != 0x0066ull ||
+_nxgrk(0xFF88, 0xAA11) != 0xAA66ull ||
+_ocgrk(0xFF88, 0xAA11) != 0xAA77ull)
+{
+return 1;
+}
+
+return 0;
+}
diff --git a/tests/tcg/s390x/mie3-mvcrl.c b/tests/tcg/s390x/mie3-mvcrl.c
new file mode 100644
index 00..81cf3ad702
--- /dev/null
+++ b/tests/tcg/s390x/mie3-mvcrl.c
@@ -0,0 +1,31 @@
+#include <stdint.h>
+#include <string.h>
+
+
+static inline void mvcrl_8(const char *dst, const char *src)
+{
+asm volatile (
+"llill %%r0, 8\n"
+"mvcrl 0(%[dst]), 0(%[src])\n"
+: : [dst] "d" (dst), [src] "d" (src)
+: "memory");
+}
+
+
+int main(int argc, char *argv[])
+{
+const char *alpha = "abcdefghijklmnop";
+
+/* array missing 'i' */
+char tstr[17] = "abcdefghjklmnop\0" ;
+
+/* mvcrl reference use: 'open a hole in an array' */
+mvcrl_8(tstr + 9, tstr + 8);
+
+/* place missing 'i' */
+tstr[8] = 'i';
+
+return strncmp(alpha, tstr, 16ul);
+}
+
+
diff --git a/tests/tcg/s390x/mie3-sel.c b/tests/tcg/s390x/mie3-sel.c
new file mode 100644
index 00..2e99e00b47
--- /dev/null
+++ b/tests/tcg/s390x/mie3-sel.c
@@ -0,0 +1,42 @@
+#include <stdint.h>
+
+
+#define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c) \
+{   \
+uint64_t res = 0;   \
+asm (   \
+"lg %%r2, %[a]\n"   \
+"lg %%r3, %[b]\n"   \
+"lg %%r0, %[c]\n"   \
+"ltgr %%r0, %%r0\n" \
+ASM \
+"stg %%r0, %[res] " \
+: [res] "=m" (res)  \
+: [a] "m" (a),  \
+  [b] "m" (b),  \
+  [c] "m" (c)   \
+: "r0", "r2",   \
+  "r3", "r4"\
+);  \
+return res; \
+}
+
+
+Fi3 (_selre,    "selre    %%r0, %%r3, %%r2\n")
+Fi3 (_selgrz,   "selgrz   %%r0, %%r3, %%r2\n")
+Fi3 (_selfhrnz, "selfhrnz %%r0, %%r3, %%r2\n")
+
+
+int main(int argc, char *argv[])
+{
+uint64_t a = ~0, b = ~0, c = ~0;
+a =_selre(0x06660066ull, 0x06660006ull, a);
+b =   _selgrz(0xF00D0005ull, 0xF00D0055ull, b);
+c = _selfhrnz(0x04320044ull, 0x06540004ull, c);
+
+return (int) (
+(0x0066ull

[PATCH v3 01/17] hw/registerfields: Add FIELD_SEX and FIELD_SDP

2022-02-23 Thread Richard Henderson

Add new macros to manipulate signed fields within the register.

Reviewed-by: Philippe Mathieu-Daudé 
Suggested-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/hw/registerfields.h | 48 -
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/hw/registerfields.h b/include/hw/registerfields.h
index f2a3c9c41f..3a88e135d0 100644
--- a/include/hw/registerfields.h
+++ b/include/hw/registerfields.h
@@ -59,6 +59,19 @@
 extract64((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
   R_ ## reg ## _ ## field ## _LENGTH)
 
+#define FIELD_SEX8(storage, reg, field)   \
+sextract8((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH)
+#define FIELD_SEX16(storage, reg, field)  \
+sextract16((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
+   R_ ## reg ## _ ## field ## _LENGTH)
+#define FIELD_SEX32(storage, reg, field)  \
+sextract32((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
+   R_ ## reg ## _ ## field ## _LENGTH)
+#define FIELD_SEX64(storage, reg, field)  \
+sextract64((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
+   R_ ## reg ## _ ## field ## _LENGTH)
+
 /* Extract a field from an array of registers */
 #define ARRAY_FIELD_EX32(regs, reg, field)\
 FIELD_EX32((regs)[R_ ## reg], reg, field)
@@ -95,7 +108,40 @@
 _d; })
 #define FIELD_DP64(storage, reg, field, val) ({   \
 struct {  \
-uint64_t v:R_ ## reg ## _ ## field ## _LENGTH;\
+uint64_t v:R_ ## reg ## _ ## field ## _LENGTH;\
+} _v = { .v = val };  \
+uint64_t _d;  \
+_d = deposit64((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
+  R_ ## reg ## _ ## field ## _LENGTH, _v.v);  \
+_d; })
+
+#define FIELD_SDP8(storage, reg, field, val) ({   \
+struct {  \
+signed int v:R_ ## reg ## _ ## field ## _LENGTH;  \
+} _v = { .v = val };  \
+uint8_t _d;   \
+_d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
+  R_ ## reg ## _ ## field ## _LENGTH, _v.v);  \
+_d; })
+#define FIELD_SDP16(storage, reg, field, val) ({  \
+struct {  \
+signed int v:R_ ## reg ## _ ## field ## _LENGTH;  \
+} _v = { .v = val };  \
+uint16_t _d;  \
+_d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
+  R_ ## reg ## _ ## field ## _LENGTH, _v.v);  \
+_d; })
+#define FIELD_SDP32(storage, reg, field, val) ({  \
+struct {  \
+signed int v:R_ ## reg ## _ ## field ## _LENGTH;  \
+} _v = { .v = val };  \
+uint32_t _d;  \
+_d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
+  R_ ## reg ## _ ## field ## _LENGTH, _v.v);  \
+_d; })
+#define FIELD_SDP64(storage, reg, field, val) ({  \
+struct {  \
+int64_t v:R_ ## reg ## _ ## field ## _LENGTH; \
 } _v = { .v = val };  \
 uint64_t _d;  \
 _d = deposit64((storage), R_ ## reg ## _ ## field ## _SHIFT,  \
-- 
2.25.1
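
What the signed variants add over FIELD_EX and FIELD_DP is sign extension on extract and truncation of a signed value on deposit. A minimal portable sketch of both operations on a 32-bit word follows; sx32() and sdp32() are hypothetical names, while the macros themselves build on sextract32() and deposit32().

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low 'length' bits set, for length in 1..32. */
static uint32_t field_mask32(int length)
{
    return length == 32 ? 0xFFFFFFFFu : (1u << length) - 1;
}

/* Extract bits [start, start+length) and sign-extend the result. */
static int32_t sx32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);

    int64_t field = (value >> start) & field_mask32(length);
    int64_t sign = (int64_t)1 << (length - 1);

    /* Flipping and subtracting the sign bit sign-extends the field. */
    return (int32_t)((field ^ sign) - sign);
}

/* Deposit a signed value into bits [start, start+length) of storage. */
static uint32_t sdp32(uint32_t storage, int start, int length, int32_t val)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);

    uint32_t mask = field_mask32(length) << start;

    /* The unsigned conversion keeps the two's complement low bits. */
    return (storage & ~mask) | (((uint32_t)val << start) & mask);
}
```

Depositing -1 into an 8-bit field and extracting it again yields -1, where the unsigned extract would return 255.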

[PATCH v7 0/4] s390x: Add partial z15 support and tests

2022-02-23 Thread David Miller

Add partial support for s390x z15 GA1 and specific tests for MIE3.


v6 -> v7:
* Modified SELFHR insn-data + test to ensure high 32bits are copied.
* Changed m3 mask test value for popcnt to fix mie3 variant.

v5 -> v6:
* Swap operands for sel* instructions
* Use .insn in tests for z15 arch instructions

v4 -> v5:
* Readd missing tests/tcg/s390x/mie3-*.c to patch

v3 -> v4:
* Change popcnt encoding RRE -> RRF_c
* Remove redundant code op_sel -> op_loc
* Cleanup for checkpatch.pl
* Readded mie3-* to Makefile.target

v2 -> v3:
* Moved tests to separate patch.
* Combined patches into series.

David Miller (4):
  s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3
for the s390x
   * Reviewed-by: David Hildenbrand 
  s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z15 GA1
   * Reviewed-by: David Hildenbrand 
  tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions
Facility 3
  tests/tcg/s390x: changed to using .insn for tests requiring z15
   * Reviewed-by: Thomas Huth 

 hw/s390x/s390-virtio-ccw.c  |  3 ++
 target/s390x/cpu_models.c   |  6 ++--
 target/s390x/gen-features.c |  6 +++-
 target/s390x/helper.h   |  1 +
 target/s390x/tcg/insn-data.def  | 30 +++--
 target/s390x/tcg/mem_helper.c   | 20 +++
 target/s390x/tcg/translate.c| 60 +++--
 tests/tcg/s390x/Makefile.target |  5 ++-
 tests/tcg/s390x/mie3-compl.c| 56 ++
 tests/tcg/s390x/mie3-mvcrl.c| 31 +
 tests/tcg/s390x/mie3-sel.c  | 42 +++
 11 files changed, 250 insertions(+), 10 deletions(-)
 create mode 100644 tests/tcg/s390x/mie3-compl.c
 create mode 100644 tests/tcg/s390x/mie3-mvcrl.c
 create mode 100644 tests/tcg/s390x/mie3-sel.c

-- 
2.32.0

[PATCH v7 2/4] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z15 GA1

2022-02-23 Thread David Miller

TCG implements everything we need to run a basic z15 OS and software.

Signed-off-by: David Miller 
---
 hw/s390x/s390-virtio-ccw.c  | 3 +++
 target/s390x/cpu_models.c   | 6 +++---
 target/s390x/gen-features.c | 7 +--
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 84e3e63c43..90480e7cf9 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -802,7 +802,10 @@ DEFINE_CCW_MACHINE(7_0, "7.0", true);
 
 static void ccw_machine_6_2_instance_options(MachineState *machine)
 {
+static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 };
+
 ccw_machine_7_0_instance_options(machine);
+s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat);
 }
 
 static void ccw_machine_6_2_class_options(MachineClass *mc)
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 11e06cc51f..89f83e81d5 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -85,9 +85,9 @@ static S390CPUDef s390_cpu_defs[] = {
 CPUDEF_INIT(0x3932, 16, 1, 47, 0x0800U, "gen16b", "IBM 3932 GA1"),
 };
 
-#define QEMU_MAX_CPU_TYPE 0x3906
-#define QEMU_MAX_CPU_GEN 14
-#define QEMU_MAX_CPU_EC_GA 2
+#define QEMU_MAX_CPU_TYPE 0x8561
+#define QEMU_MAX_CPU_GEN 15
+#define QEMU_MAX_CPU_EC_GA 1
 static const S390FeatInit qemu_max_cpu_feat_init = { S390_FEAT_LIST_QEMU_MAX };
 static S390FeatBitmap qemu_max_cpu_feat;
 
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index a3f30f69d9..22846121c4 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -731,16 +731,18 @@ static uint16_t qemu_V6_0[] = {
 S390_FEAT_ESOP,
 };
 
-static uint16_t qemu_LATEST[] = {
+static uint16_t qemu_V6_2[] = {
 S390_FEAT_INSTRUCTION_EXEC_PROT,
 S390_FEAT_MISC_INSTRUCTION_EXT2,
 S390_FEAT_MSA_EXT_8,
 S390_FEAT_VECTOR_ENH,
 };
 
+static uint16_t qemu_LATEST[] = {
+S390_FEAT_MISC_INSTRUCTION_EXT3,
+};
 /* add all new definitions before this point */
 static uint16_t qemu_MAX[] = {
-S390_FEAT_MISC_INSTRUCTION_EXT3,
 /* generates a dependency warning, leave it out for now */
 S390_FEAT_MSA_EXT_5,
 };
@@ -863,6 +865,7 @@ static FeatGroupDefSpec QemuFeatDef[] = {
 QEMU_FEAT_INITIALIZER(V4_0),
 QEMU_FEAT_INITIALIZER(V4_1),
 QEMU_FEAT_INITIALIZER(V6_0),
+QEMU_FEAT_INITIALIZER(V6_2),
 QEMU_FEAT_INITIALIZER(LATEST),
 QEMU_FEAT_INITIALIZER(MAX),
 };
-- 
2.32.0

[PATCH v3 00/17] target/arm: Implement LVA, LPA, LPA2 features

2022-02-23 Thread Richard Henderson

Changes for v3:
  * Update emulation.rst.
  * Split out separate update to ID_AA64MMFR0.
  * Hack for avocado.

If the avocado hack isn't acceptable, perhaps just drop the
last two patches for now?


r~


Richard Henderson (17):
  hw/registerfields: Add FIELD_SEX and FIELD_SDP
  target/arm: Set TCR_EL1.TSZ for user-only
  target/arm: Fault on invalid TCR_ELx.TxSZ
  target/arm: Move arm_pamax out of line
  target/arm: Pass outputsize down to check_s2_mmu_setup
  target/arm: Use MAKE_64BIT_MASK to compute indexmask
  target/arm: Honor TCR_ELx.{I}PS
  target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
  target/arm: Implement FEAT_LVA
  target/arm: Implement FEAT_LPA
  target/arm: Extend arm_fi_to_lfsc to level -1
  target/arm: Introduce tlbi_aa64_get_range
  target/arm: Fix TLBIRange.base for 16k and 64k pages
  target/arm: Validate tlbi TG matches translation granule in use
  target/arm: Advertise all page sizes for -cpu max
  tests/avocado: Limit test_virt_tcg_gicv[23] to cortex-a72
  target/arm: Implement FEAT_LPA2

 docs/system/arm/emulation.rst |   3 +
 include/hw/registerfields.h   |  48 -
 target/arm/cpu-param.h|   4 +-
 target/arm/cpu.h  |  27 +++
 target/arm/internals.h|  58 +++---
 target/arm/cpu.c  |   3 +-
 target/arm/cpu64.c|   8 +-
 target/arm/helper.c   | 332 ++
 tests/avocado/boot_linux.py   |   4 +-
 9 files changed, 384 insertions(+), 103 deletions(-)

-- 
2.25.1

Re: [PATCH v6 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-23 Thread David Miller

Yes I'm adding to this patch,  I haven't quite figured out where to
put them,  they are inline to various things in the patch themselves
so I'm putting in the cover letter under the patch they go to.
I hope that's correct.

Thanks
- David Miller

On Wed, Feb 23, 2022 at 8:40 AM Christian Borntraeger
 wrote:
>
>
> Am 18.02.22 um 00:17 schrieb David Miller:
> > resolves: https://gitlab.com/qemu-project/qemu/-/issues/737
> > implements:
> > AND WITH COMPLEMENT   (NCRK, NCGRK)
> > NAND  (NNRK, NNGRK)
> > NOT EXCLUSIVE OR  (NXRK, NXGRK)
> > NOR   (NORK, NOGRK)
> > OR WITH COMPLEMENT(OCRK, OCGRK)
> > SELECT(SELR, SELGR)
> > SELECT HIGH   (SELFHR)
> > MOVE RIGHT TO LEFT(MVCRL)
> > POPULATION COUNT  (POPCNT)
> >
> > Signed-off-by: David Miller 
>
> For your next patches, feel free to add previous Reviewed-by: tags so that 
> others
> can see what review has already happened.

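The bitwise instructions in the list quoted above can be modelled in a few lines. This is only a sketch of the boolean semantics, not QEMU's TCG implementation; the helper name `mie3_logic` is hypothetical, and the operand order (second operand combined with the third) is an assumption to check against the architecture manual.

```python
def mie3_logic(op: str, r2: int, r3: int, bits: int = 64) -> int:
    """Model the MIE3 boolean operations on `bits`-wide registers.

    `op` is the two-letter mnemonic prefix (hypothetical naming):
    NC = AND WITH COMPLEMENT, NN = NAND, NX = NOT EXCLUSIVE OR,
    NO = NOR, OC = OR WITH COMPLEMENT.
    """
    mask = (1 << bits) - 1
    results = {
        "NC": r2 & ~r3,
        "NN": ~(r2 & r3),
        "NX": ~(r2 ^ r3),
        "NO": ~(r2 | r3),
        "OC": r2 | ~r3,
    }
    # Python integers are unbounded, so truncate to the register width.
    return results[op] & mask
```

Passing `bits=32` would correspond to the xxRK forms and `bits=64` to the xxGRK forms.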
Re: [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree

2022-02-23 Thread Richard Henderson


On 2/23/22 11:43, Matheus K. Ferst wrote:

Note that rotlv does the masking itself:

/*
  * Expand D = A << (B % element bits)
  *
  * Unlike scalar shifts, where it is easy for the target front end
  * to include the modulo as part of the expansion.  If the target
  * naturally includes the modulo as part of the operation, great!
  * If the target has some other behaviour from out-of-range shifts,
  * then it could not use this function anyway, and would need to
  * do its own expansion with custom functions.
  */



Using tcg_gen_rotlv_vec(vece, vrt, vra, vrb) works on PPC but fails on x86. It looks like 
a problem on the i386 backend. It's using VPS[RL]LV[DQ], but instead of this modulo 
behavior, these instructions write zero to the element[1]. I'm not sure how to fix that. 


You don't want to use tcg_gen_rotlv_vec directly, but tcg_gen_rotlv_mod_vec.

The generic modulo is being applied here:

static void tcg_gen_rotlv_mod_vec(unsigned vece, TCGv_vec d,
  TCGv_vec a, TCGv_vec b)
{
TCGv_vec t = tcg_temp_new_vec_matching(d);
TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);

tcg_gen_and_vec(vece, t, b, m);
tcg_gen_rotlv_vec(vece, d, a, t);
tcg_temp_free_vec(t);
}
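
As a rough illustration of what the masking above achieves, the per-element behaviour can be modelled in Python. This is a sketch on plain integers, not TCG code; `rotl_mod` is a hypothetical name, and `vece` is assumed to be the log2 of the element size in bytes, as in the C snippet.

```python
def rotl_mod(a: int, b: int, vece: int) -> int:
    """Rotate one element left by b modulo the element width."""
    bits = 8 << vece            # element width in bits: 8, 16, 32 or 64
    count = b & (bits - 1)      # same mask as (8 << vece) - 1 above
    lane = (1 << bits) - 1
    a &= lane
    # With count == 0 the right shift yields 0, so the result is just a.
    return ((a << count) | (a >> (bits - count))) & lane
```

A rotate count of 9 on an 8-bit element therefore behaves like a rotate by 1, rather than producing zero.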


r~

[PULL 6/6] MAINTAINERS: python - remove ehabkost and add bleal

2022-02-23 Thread John Snow

Eduardo Habkost has left Red Hat and has other daily responsibilities to
attend to. In order to stop spamming him on every series, remove him as
"Reviewer" for the python/ library dir and add Beraldo Leal instead.

For the "python scripts" stanza (which is separate due to level of
support), replace Eduardo as maintainer with myself.

(Thanks for all of your hard work, Eduardo!)

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Beraldo Leal 
Acked-by: Eduardo Habkost 
Message-id: 20220208000525.2601011-1-js...@redhat.com
Signed-off-by: John Snow 
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c3b500345c..62bc185d10 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2747,13 +2747,13 @@ F: backends/cryptodev*.c
 Python library
 M: John Snow 
 M: Cleber Rosa 
-R: Eduardo Habkost 
+R: Beraldo Leal 
 S: Maintained
 F: python/
 T: git https://gitlab.com/jsnow/qemu.git python
 
 Python scripts
-M: Eduardo Habkost 
+M: John Snow 
 M: Cleber Rosa 
 S: Odd Fixes
 F: scripts/*.py
-- 
2.34.1

[PULL 5/6] Revert "python: pin setuptools below v60.0.0"

2022-02-23 Thread John Snow

This reverts commit 1e4d8b31be35e54b6429fea54f5ecaa0083f91e7.

Signed-off-by: John Snow 
Message-id: 20220204221804.2047468-3-js...@redhat.com
Signed-off-by: John Snow 
---
 python/Makefile  | 2 --
 python/setup.cfg | 1 -
 2 files changed, 3 deletions(-)

diff --git a/python/Makefile b/python/Makefile
index 949c472624..3334311362 100644
--- a/python/Makefile
+++ b/python/Makefile
@@ -68,8 +68,6 @@ $(QEMU_VENV_DIR) $(QEMU_VENV_DIR)/bin/activate: setup.cfg
echo "ACTIVATE $(QEMU_VENV_DIR)";   \
. $(QEMU_VENV_DIR)/bin/activate;\
echo "INSTALL qemu[devel] $(QEMU_VENV_DIR)";\
-   pip install --disable-pip-version-check \
-   "setuptools<60.0.0" 1>/dev/null;\
make develop 1>/dev/null;   \
)
@touch $(QEMU_VENV_DIR)
diff --git a/python/setup.cfg b/python/setup.cfg
index 9821db9880..241f243e8b 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -167,7 +167,6 @@ deps =
 .[devel]
 .[fuse]  # Workaround to trigger tox venv rebuild
 .[tui]   # Workaround to trigger tox venv rebuild
-setuptools < 60  # Workaround, please see commit msg.
 commands =
 make check
 
-- 
2.34.1

[PULL 2/6] python: support recording QMP session to a file

2022-02-23 Thread John Snow

From: Daniel P. Berrangé 

When running QMP commands with very large response payloads, it is often
not easy to spot the info you want. If we can save the response to a
file then tools like 'grep' or 'jq' can be used to extract information.

For convenience of processing, we merge the QMP command and response
dictionaries together:

  {
  "arguments": {},
  "execute": "query-kvm",
  "return": {
  "enabled": false,
  "present": true
  }
  }
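
The merge can be sketched with plain dictionaries. The values below are taken from the example above; the variable names are illustrative only.

```python
import json

# Command as sent and response as received, per the example above.
qmpcmd = {"execute": "query-kvm", "arguments": {}}
resp = {"return": {"enabled": False, "present": True}}

# Later keys win on a clash; a command and its reply normally share no keys.
merged = {**qmpcmd, **resp}

# sort_keys gives the alphabetical ordering shown in the log excerpt.
line = json.dumps(merged, sort_keys=True)
```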

Example usage

  $ ./scripts/qmp/qmp-shell-wrap -l q.log -p -- ./build/qemu-system-x86_64 
-display none
  Welcome to the QMP low-level shell!
  Connected
  (QEMU) query-kvm
  {
  "return": {
  "enabled": false,
  "present": true
  }
  }
  (QEMU) query-mice
  {
  "return": [
  {
  "absolute": false,
  "current": true,
  "index": 2,
  "name": "QEMU PS/2 Mouse"
  }
  ]
  }

 $ jq --slurp '. | to_entries[] | select(.value.execute == "query-kvm") |
   .value.return.enabled' < q.log
   false
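
For readers without jq, roughly the same query can be sketched in Python. This assumes the log is a sequence of JSON objects written back to back (possibly pretty-printed over several lines); the function name `iter_json_objects` is hypothetical.

```python
import json
from typing import Any, Iterator


def iter_json_objects(text: str) -> Iterator[Any]:
    """Yield each JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def kvm_enabled(text: str) -> list:
    """Collect return.enabled for every logged query-kvm command."""
    return [entry["return"]["enabled"]
            for entry in iter_json_objects(text)
            if entry.get("execute") == "query-kvm"]
```

Calling `kvm_enabled(open("q.log", encoding="utf-8").read())` would be the equivalent of the jq invocation above.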

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
Message-id: 20220128161157.36261-3-berra...@redhat.com
Signed-off-by: John Snow 
---
 python/qemu/aqmp/qmp_shell.py | 29 ++---
 python/setup.cfg  |  3 +++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/python/qemu/aqmp/qmp_shell.py b/python/qemu/aqmp/qmp_shell.py
index c60df787fc..35691494d0 100644
--- a/python/qemu/aqmp/qmp_shell.py
+++ b/python/qemu/aqmp/qmp_shell.py
@@ -89,6 +89,7 @@
 from subprocess import Popen
 import sys
 from typing import (
+IO,
 Iterator,
 List,
 NoReturn,
@@ -170,7 +171,8 @@ class QMPShell(QEMUMonitorProtocol):
 def __init__(self, address: SocketAddrT,
  pretty: bool = False,
  verbose: bool = False,
- server: bool = False):
+ server: bool = False,
+ logfile: Optional[str] = None):
 super().__init__(address, server=server)
 self._greeting: Optional[QMPMessage] = None
 self._completer = QMPCompleter()
@@ -180,6 +182,10 @@ def __init__(self, address: SocketAddrT,
   '.qmp-shell_history')
 self.pretty = pretty
 self.verbose = verbose
+self.logfile = None
+
+if logfile is not None:
+self.logfile = open(logfile, "w", encoding='utf-8')
 
 def close(self) -> None:
 # Hook into context manager of parent to save shell history.
@@ -320,11 +326,11 @@ def _build_cmd(self, cmdline: str) -> 
Optional[QMPMessage]:
 self._cli_expr(cmdargs[1:], qmpcmd['arguments'])
 return qmpcmd
 
-def _print(self, qmp_message: object) -> None:
+def _print(self, qmp_message: object, fh: IO[str] = sys.stdout) -> None:
 jsobj = json.dumps(qmp_message,
indent=4 if self.pretty else None,
sort_keys=self.pretty)
-print(str(jsobj))
+print(str(jsobj), file=fh)
 
 def _execute_cmd(self, cmdline: str) -> bool:
 try:
@@ -347,6 +353,9 @@ def _execute_cmd(self, cmdline: str) -> bool:
 print('Disconnected')
 return False
 self._print(resp)
+if self.logfile is not None:
+cmd = {**qmpcmd, **resp}
+self._print(cmd, fh=self.logfile)
 return True
 
 def connect(self, negotiate: bool = True) -> None:
@@ -414,8 +423,9 @@ class HMPShell(QMPShell):
 def __init__(self, address: SocketAddrT,
  pretty: bool = False,
  verbose: bool = False,
- server: bool = False):
-super().__init__(address, pretty, verbose, server)
+ server: bool = False,
+ logfile: Optional[str] = None):
+super().__init__(address, pretty, verbose, server, logfile)
 self._cpu_index = 0
 
 def _cmd_completion(self) -> None:
@@ -508,6 +518,8 @@ def main() -> None:
 help='Verbose (echo commands sent and received)')
 parser.add_argument('-p', '--pretty', action='store_true',
 help='Pretty-print JSON')
+parser.add_argument('-l', '--logfile',
+help='Save log of all QMP messages to PATH')
 
 default_server = os.environ.get('QMP_SOCKET')
 parser.add_argument('qmp_server', action='store',
@@ -526,7 +538,7 @@ def main() -> None:
 parser.error(f"Bad port number: {args.qmp_server}")
 return  # pycharm doesn't know error() is noreturn
 
-with shell_class(address, args.pretty, args.verbose) as qemu:
+with shell_class(address, args.pretty, args.verbose, args.logfile) as qemu:
 try:
 qemu.connect(negotiate=not args.skip_negotiation)
 except ConnectError as err:
@@ -550,6 +562,8 @@ def main_wrap() -> None:
 help='Verbose (echo

Re: [PATCH v4 20/47] target/ppc: implement vslq

2022-02-23 Thread Richard Henderson


On 2/23/22 11:53, Matheus K. Ferst wrote:

On 22/02/2022 19:14, Richard Henderson wrote:

On 2/22/22 04:36, matheus.fe...@eldorado.org.br wrote:

From: Matheus Ferst 

Signed-off-by: Matheus Ferst 
---
v4:
  -  New in v4.
---
  target/ppc/insn32.decode    |  1 +
  target/ppc/translate/vmx-impl.c.inc | 40 +
  2 files changed, 41 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 88baebe35e..3799065508 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -473,6 +473,7 @@ VSLB    000100 . . . 0010100    @VX
  VSLH    000100 . . . 00101000100    @VX
  VSLW    000100 . . . 0011100    @VX
  VSLD    000100 . . . 10111000100    @VX
+VSLQ    000100 . . . 0010101    @VX

  VSRB    000100 . . . 0100100    @VX
  VSRH    000100 . . . 01001000100    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index ec4f0e7654..ca98a545ef 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -834,6 +834,46 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, 
tcg_gen_gvec_sarv);

  TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
  TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, 
tcg_gen_gvec_sarv);

+static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
+{
+    TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    n = tcg_temp_new_i64();
+    hi = tcg_temp_new_i64();
+    lo = tcg_temp_new_i64();
+    tmp = tcg_const_i64(0);
+
+    get_avr64(lo, a->vra, false);
+    get_avr64(hi, a->vra, true);
+
+    get_avr64(n, a->vrb, true);
+    tcg_gen_andi_i64(n, n, 0x7F);
+
+    tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
+    tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);


Since you have to mask twice anyway, better use (n & 64) != 0.



Hmm, I'm not sure if I understood. To check != 0 we'll need a temp to hold n&64. We could 
use tmp here, but we'll need another one in patch 22. Is that right?


Yes.
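
A small Python model of the 128-bit shift shows why testing bit 6 of the count is enough once the count has been masked to 7 bits. This is a sketch of the semantics only, not the TCG expansion from the patch; `vslq_model` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def vslq_model(hi: int, lo: int, shift: int) -> tuple:
    """Shift the 128-bit value hi:lo left by shift & 0x7F, as two halves."""
    n = shift & 0x7F
    # After masking, n >= 64 is the same condition as (n & 64) != 0.
    if n & 64:
        hi, lo = lo, 0
    n &= 0x3F
    if n:
        hi = ((hi << n) | (lo >> (64 - n))) & MASK64
        lo = (lo << n) & MASK64
    return hi, lo
```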

r~

[PULL 1/6] python: introduce qmp-shell-wrap convenience tool

2022-02-23 Thread John Snow

From: Daniel P. Berrangé 

With the current 'qmp-shell' tool developers must first spawn QEMU with
a suitable -qmp arg and then spawn qmp-shell in a separate terminal
pointing to the right socket.

With 'qmp-shell-wrap' developers can ignore QMP sockets entirely and
just pass the QEMU command and arguments they want. The program will
listen on a UNIX socket and tell QEMU to connect QMP to that.

For example, this:

 # qmp-shell-wrap -- qemu-system-x86_64 -display none

Is roughly equivalent to running:

 # qemu-system-x86_64 -display none -qmp qmp-shell-1234 &
 # qmp-shell qmp-shell-1234

Except that 'qmp-shell-wrap' switches the socket peers around so that
it is the UNIX socket server and QEMU is the socket client. This makes
QEMU reliably go away when qmp-shell-wrap exits, closing the server
socket.
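
The role reversal can be sketched with plain sockets. In this illustration a thread stands in for QEMU (an assumption made so the example is self-contained); the names `fake_qemu` and `serve_once` are hypothetical and this is not the qmp-shell-wrap code itself.

```python
import os
import socket
import threading


def fake_qemu(sockpath: str) -> None:
    """Hypothetical stand-in for QEMU: connect as client, send a greeting."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sockpath)
        client.sendall(b'{"QMP": {}}')


def serve_once(sockpath: str) -> bytes:
    """Listen on sockpath, accept one client, return everything it sent."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(sockpath)
        server.listen(1)
        # The listener exists before the client starts, so connect() works.
        peer = threading.Thread(target=fake_qemu, args=(sockpath,))
        peer.start()
        conn, _ = server.accept()
        chunks = []
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:       # client closed its end
                    break
                chunks.append(chunk)
        peer.join()
        return b"".join(chunks)
    finally:
        # Closing the server side and removing the socket mirrors the
        # cleanup the wrapper performs on exit.
        server.close()
        if os.path.exists(sockpath):
            os.unlink(sockpath)
```

Because the wrapper owns the listening socket, the client loses its peer as soon as the wrapper goes away, which is the property described above.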

Signed-off-by: Daniel P. Berrangé 
Message-id: 20220128161157.36261-2-berra...@redhat.com
[Edited for rebase. --js]
Signed-off-by: John Snow 
---
 python/qemu/aqmp/qmp_shell.py | 65 ---
 python/setup.cfg  |  1 +
 scripts/qmp/qmp-shell-wrap| 11 ++
 3 files changed, 73 insertions(+), 4 deletions(-)
 create mode 100755 scripts/qmp/qmp-shell-wrap

diff --git a/python/qemu/aqmp/qmp_shell.py b/python/qemu/aqmp/qmp_shell.py
index d11bf54b00..c60df787fc 100644
--- a/python/qemu/aqmp/qmp_shell.py
+++ b/python/qemu/aqmp/qmp_shell.py
@@ -86,6 +86,7 @@
 import os
 import re
 import readline
+from subprocess import Popen
 import sys
 from typing import (
 Iterator,
@@ -167,8 +168,10 @@ class QMPShell(QEMUMonitorProtocol):
 :param verbose: Echo outgoing QMP messages to console.
 """
 def __init__(self, address: SocketAddrT,
- pretty: bool = False, verbose: bool = False):
-super().__init__(address)
+ pretty: bool = False,
+ verbose: bool = False,
+ server: bool = False):
+super().__init__(address, server=server)
 self._greeting: Optional[QMPMessage] = None
 self._completer = QMPCompleter()
 self._transmode = False
@@ -409,8 +412,10 @@ class HMPShell(QMPShell):
 :param verbose: Echo outgoing QMP messages to console.
 """
 def __init__(self, address: SocketAddrT,
- pretty: bool = False, verbose: bool = False):
-super().__init__(address, pretty, verbose)
+ pretty: bool = False,
+ verbose: bool = False,
+ server: bool = False):
+super().__init__(address, pretty, verbose, server)
 self._cpu_index = 0
 
 def _cmd_completion(self) -> None:
@@ -533,5 +538,57 @@ def main() -> None:
 pass
 
 
+def main_wrap() -> None:
+"""
+qmp-shell-wrap entry point: parse command line arguments and
+start the REPL.
+"""
+parser = argparse.ArgumentParser()
+parser.add_argument('-H', '--hmp', action='store_true',
+help='Use HMP interface')
+parser.add_argument('-v', '--verbose', action='store_true',
+help='Verbose (echo commands sent and received)')
+parser.add_argument('-p', '--pretty', action='store_true',
+help='Pretty-print JSON')
+
+parser.add_argument('command', nargs=argparse.REMAINDER,
+help='QEMU command line to invoke')
+
+args = parser.parse_args()
+
+cmd = args.command
+if len(cmd) != 0 and cmd[0] == '--':
+cmd = cmd[1:]
+if len(cmd) == 0:
+cmd = ["qemu-system-x86_64"]
+
+sockpath = "qmp-shell-wrap-%d" % os.getpid()
+cmd += ["-qmp", "unix:%s" % sockpath]
+
+shell_class = HMPShell if args.hmp else QMPShell
+
+try:
+address = shell_class.parse_address(sockpath)
+except QMPBadPortError:
+parser.error(f"Bad port number: {sockpath}")
+return  # pycharm doesn't know error() is noreturn
+
+try:
+with shell_class(address, args.pretty, args.verbose, True) as qemu:
+with Popen(cmd):
+
+try:
+qemu.accept()
+except ConnectError as err:
+if isinstance(err.exc, OSError):
+die(f"Couldn't connect to {sockpath}: {err!s}")
+die(str(err))
+
+for _ in qemu.repl():
+pass
+finally:
+os.unlink(sockpath)
+
+
 if __name__ == '__main__':
 main()
diff --git a/python/setup.cfg b/python/setup.cfg
index 18aea2bab3..0959603238 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -68,6 +68,7 @@ console_scripts =
 qom-fuse = qemu.utils.qom_fuse:QOMFuse.entry_point [fuse]
 qemu-ga-client = qemu.utils.qemu_ga_client:main
 qmp-shell = qemu.aqmp.qmp_shell:main
+qmp-shell-wrap = qemu.aqmp.qmp_shell:main_wrap
 aqmp-tui = qemu.aqmp.aqmp_tui:main [tui]
 
 [flake8]
diff --git a/scripts/qmp/qmp-shell-wrap b/scripts/qmp/qmp-shell-wrap
new file mode 100755
index

[PATCH v2 2/2] virt: vmgenid: introduce driver for reinitializing RNG on VM fork

2022-02-23 Thread Jason A. Donenfeld

VM Generation ID is a feature from Microsoft, described at
, and supported by
Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
, as:

If the OS is running in a VM, there is a problem that most
hypervisors can snapshot the state of the machine and later rewind
the VM state to the saved state. This results in the machine running
a second time with the exact same RNG state, which leads to serious
security problems.  To reduce the window of vulnerability, Windows
10 on a Hyper-V VM will detect when the VM state is reset, retrieve
a unique (not random) value from the hypervisor, and reseed the root
RNG with that unique value.  This does not eliminate the
vulnerability, but it greatly reduces the time during which the RNG
system will produce the same outputs as it did during a previous
instantiation of the same VM state.

Linux has the same issue, and given that vmgenid is supported already by
multiple hypervisors, we can implement more or less the same solution.
So this commit wires up the vmgenid ACPI notification to the RNG's newly
added add_vmfork_randomness() function.

It can be used from qemu via the `-device vmgenid,guid=auto` parameter.
After setting that, use `savevm` in the monitor to save the VM state,
then quit QEMU, start it again, and use `loadvm`. That will trigger this
driver's notify function, which hands the new UUID to the RNG.

This driver builds on prior work from Adrian Catangiu at Amazon, and it
is my hope that that team can resume maintenance of this driver.

Cc: Adrian Catangiu 
Cc: Dominik Brodowski 
Cc: Ard Biesheuvel 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/virt/Kconfig   |   9 
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 120 +
 3 files changed, 130 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..d3276dc2095c 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -13,6 +13,15 @@ menuconfig VIRT_DRIVERS
 
 if VIRT_DRIVERS
 
+config VMGENID
+   tristate "Virtual Machine Generation ID driver"
+   default y
+   depends on ACPI
+   help
+ Say Y here to use the hypervisor-provided Virtual Machine Generation 
ID
+ to reseed the RNG when the VM is cloned. This is highly recommended if
+ you intend to do any rollback / cloning / snapshotting of VMs.
+
 config FSL_HV_MANAGER
tristate "Freescale hypervisor management driver"
depends on FSL_SOC
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..108d0ffcc9aa 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -4,6 +4,7 @@
 #
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
+obj-$(CONFIG_VMGENID)  += vmgenid.o
 obj-y  += vboxguest/
 
 obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
new file mode 100644
index ..c2255ea6be59
--- /dev/null
+++ b/drivers/virt/vmgenid.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtual Machine Generation ID driver
+ *
+ * Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.
+ * Copyright (C) 2020 Amazon. All rights reserved.
+ * Copyright (C) 2018 Red Hat Inc. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+ACPI_MODULE_NAME("vmgenid");
+
+static struct {
+   uuid_t this_uuid;
+   uuid_t *next_uuid;
+} state;
+
+static int vmgenid_acpi_add(struct acpi_device *device)
+{
+   struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER };
+   union acpi_object *pss;
+   phys_addr_t phys_addr;
+   acpi_status status;
+   int ret = 0;
+
+   if (!device)
+   return -EINVAL;
+
+   status = acpi_evaluate_object(device->handle, "ADDR", NULL, &buffer);
+   if (ACPI_FAILURE(status)) {
+   ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
+   return -ENODEV;
+   }
+   pss = buffer.pointer;
+   if (!pss || pss->type != ACPI_TYPE_PACKAGE || pss->package.count != 2 ||
+   pss->package.elements[0].type != ACPI_TYPE_INTEGER ||
+   pss->package.elements[1].type != ACPI_TYPE_INTEGER) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   phys_addr = (pss->package.elements[0].integer.value << 0) |
+   (pss->package.elements[1].integer.value << 32);
+   state.next_uuid = acpi_os_map_memory(phys_addr, 
sizeof(*state.next_uuid));
+   if (!state.next_uuid) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   state.this_uuid = *state.next_uuid;
+   device->driver_data = &state;
+   add_device_randomness(&state.this_uuid, sizeof(state.this_uuid));
+
+out:
+   ACPI_FREE(buffer.pointer);
+   return ret;
+}
+
+static

[PULL 4/6] Python: add setuptools v60.0 workaround

2022-02-23 Thread John Snow

Setuptools v60 and later include a bundled version of distutils, a
deprecated standard library scheduled for removal in future versions of
Python. Setuptools v60 can only be installed on Python 3.7 and later.

Python has a distutils.sysconfig.get_python_lib() function that returns
'/usr/lib/pythonX.Y' on posix systems. RPM-based systems actually use
'/usr/lib64/pythonX.Y' instead, so Fedora patches stdlib distutils for
Python 3.7 and Python 3.8 to return the correct value.

Python 3.9 and later introduce a sys.platlibdir property, which returns
the correct value on RPM-based systems.

The change to a distutils package not provided by Fedora on Python 3.7
and 3.8 causes a regression in distutils.sysconfig.get_python_lib() that
ultimately causes false positives to be emitted by pylint, because it
can no longer find the system source libraries.

Many Python tools are fairly aggressive about updating setuptools
packages, and so even though this package is a fair bit newer than
Python 3.7/3.8, it's not entirely unreasonable for a given user to have
such a modern package with a fairly old Python interpreter.

Updates to Python 3.7 and Python 3.8 are being produced for Fedora which
will fix the problem on up-to-date systems. Until then, we can force the
loading of platform-provided distutils when running the pylint
test. This is the least-invasive yet most comprehensive fix.
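
The same workaround can be applied when launching the linter from Python rather than from a shell script. The helper names below are hypothetical; the environment variable and `sys.platlibdir` are the ones discussed above, and the `"lib"` fallback for older interpreters is an assumption.

```python
import os
import sys


def pylint_env(base=None) -> dict:
    """Return a copy of the environment that forces stdlib distutils."""
    env = dict(os.environ if base is None else base)
    env["SETUPTOOLS_USE_DISTUTILS"] = "stdlib"
    return env


def platform_libdir() -> str:
    """Name of the platform library directory, e.g. 'lib' or 'lib64'."""
    # sys.platlibdir only exists on Python 3.9 and later.
    return getattr(sys, "platlibdir", "lib")
```

The resulting dictionary can be passed as `env=` to `subprocess.run()` when invoking `python3 -m pylint qemu/`.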

References:
 https://github.com/pypa/setuptools/pull/2896
 https://github.com/PyCQA/pylint/issues/5704
 https://github.com/pypa/distutils/issues/110

Signed-off-by: John Snow 
Message-id: 20220204221804.2047468-2-js...@redhat.com
Signed-off-by: John Snow 
---
 python/tests/iotests-pylint.sh | 3 ++-
 python/tests/pylint.sh | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/python/tests/iotests-pylint.sh b/python/tests/iotests-pylint.sh
index 4cae03424b..33c5ae900a 100755
--- a/python/tests/iotests-pylint.sh
+++ b/python/tests/iotests-pylint.sh
@@ -1,4 +1,5 @@
 #!/bin/sh -e
 
 cd ../tests/qemu-iotests/
-python3 -m linters --pylint
+# See commit message for environment variable explainer.
+SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m linters --pylint
diff --git a/python/tests/pylint.sh b/python/tests/pylint.sh
index 4b10b34db7..03d64705a1 100755
--- a/python/tests/pylint.sh
+++ b/python/tests/pylint.sh
@@ -1,2 +1,3 @@
 #!/bin/sh -e
-python3 -m pylint qemu/
+# See commit message for environment variable explainer.
+SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pylint qemu/
-- 
2.34.1

[PULL 3/6] Python: discourage direct setup.py install

2022-02-23 Thread John Snow

When invoking setup.py directly, the default behavior for 'install' is
to run the bdist_egg installation hook, which is ... actually deprecated
by setuptools. It doesn't seem to work quite right anymore.

By contrast, 'pip install' will invoke the bdist_wheel hook
instead. This leads to differences in behavior for the two approaches. I
advocate using pip in the documentation in this directory, but the
'setup.py' which has been used for quite a long time in the Python world
may deceptively appear to work at first glance.

Add an error message that points the user towards the supported
installation invocation, which should save a bit of time and
frustration.

Reported-by: Daniel P. Berrangé 
Signed-off-by: John Snow 
Reviewed-by: Beraldo Leal 
Message-id: 20220207213039.2278569-1-js...@redhat.com
Signed-off-by: John Snow 
---
 python/setup.py | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/python/setup.py b/python/setup.py
index 2014f81b75..c5bc45919a 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -5,9 +5,26 @@
 """
 
 import setuptools
+from setuptools.command import bdist_egg
+import sys
 import pkg_resources
 
 
+class bdist_egg_guard(bdist_egg.bdist_egg):
+"""
+Prevent bdist_egg from being executed
+
+This prevents calling 'setup.py install' directly, as the 'install'
+CLI option will invoke the deprecated bdist_egg hook. "pip install"
+calls the more modern bdist_wheel hook, which is what we want.
+"""
+def run(self):
+sys.exit(
+'Installation directly via setup.py is not supported.\n'
+'Please use `pip install .` instead.'
+)
+
+
 def main():
 """
 QEMU tooling installer
@@ -16,7 +33,7 @@ def main():
 # 
https://medium.com/@daveshawley/safely-using-setup-cfg-for-metadata-1babbe54c108
 pkg_resources.require('setuptools>=39.2')
 
-setuptools.setup()
+setuptools.setup(cmdclass={'bdist_egg': bdist_egg_guard})
 
 
 if __name__ == '__main__':
-- 
2.34.1

[PULL 0/6] Python patches

2022-02-23 Thread John Snow

The following changes since commit 31e3caf21b6cdf54d11f3744b8b341f07a30b5d7:

  Merge remote-tracking branch 
'remotes/lvivier-gitlab/tags/trivial-branch-for-7.0-pull-request' into staging 
(2022-02-22 20:17:09 +)

are available in the Git repository at:

  https://gitlab.com/jsnow/qemu.git tags/python-pull-request

for you to fetch changes up to 89d38c74f4b69a93696392b55a9fee573055d78b:

  MAINTAINERS: python - remove ehabkost and add bleal (2022-02-23 17:07:26 
-0500)


Python patches

New functionality in qmp-shell from Dan, and some packaging fixes.



Daniel P. Berrangé (2):
  python: introduce qmp-shell-wrap convenience tool
  python: support recording QMP session to a file

John Snow (4):
  Python: discourage direct setup.py install
  Python: add setuptools v60.0 workaround
  Revert "python: pin setuptools below v60.0.0"
  MAINTAINERS: python - remove ehabkost and add bleal

 MAINTAINERS|  4 +-
 python/Makefile|  2 -
 python/qemu/aqmp/qmp_shell.py  | 86 +++---
 python/setup.cfg   |  5 +-
 python/setup.py| 19 +++-
 python/tests/iotests-pylint.sh |  3 +-
 python/tests/pylint.sh |  3 +-
 scripts/qmp/qmp-shell-wrap | 11 +
 8 files changed, 118 insertions(+), 15 deletions(-)
 create mode 100755 scripts/qmp/qmp-shell-wrap

-- 
2.34.1

[PATCH v2 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Jason A. Donenfeld

When a VM forks, we must immediately mix in additional information to
the stream of random output so that two forks or a rollback don't
produce the same stream of random numbers, which could have catastrophic
cryptographic consequences. This commit adds a simple API,
add_vmfork_randomness(), for that.
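
The idea can be modelled in userspace with hashlib's BLAKE2s. This is a sketch under stated assumptions, not the kernel code: the class name `BaseCrngModel` is hypothetical, locking is omitted, and a 64-bit `unsigned long` is assumed for the generation counter.

```python
import hashlib

ULONG_MAX = (1 << 64) - 1  # assumes a 64-bit unsigned long


class BaseCrngModel:
    """Toy model of a key plus generation counter (no locking)."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.generation = 0

    def vmfork_inject(self, unique_vm_id: bytes) -> None:
        # Bump the generation first; ULONG_MAX is skipped, wrapping to 0
        # as the unsigned increment in C would.
        next_gen = (self.generation + 1) & ULONG_MAX
        if next_gen == ULONG_MAX:
            next_gen = 0
        self.generation = next_gen

        # New key = BLAKE2s(old key || unique VM ID), same length as before.
        state = hashlib.blake2s(digest_size=len(self.key))
        state.update(self.key)
        state.update(unique_vm_id)
        self.key = state.digest()
```

Two clones that start from the same key but receive different IDs end up with different keys, so their output streams diverge from that point on.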

Cc: Dominik Brodowski 
Cc: Theodore Ts'o 
Cc: Jann Horn 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c  | 53 ++
 include/linux/random.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 536237a0f073..95584f6646f9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -508,6 +508,40 @@ static size_t crng_pre_init_inject(const void *input, 
size_t len,
return len;
 }
 
+/*
+ * This mixes unique_vm_id directly into the base_crng key as soon as
+ * possible, similarly to crng_pre_init_inject(), even if the crng is
+ * already running, in order to immediately branch streams from prior
+ * VM instances.
+ */
+static void crng_vmfork_inject(const void *unique_vm_id, size_t len)
+{
+   unsigned long flags, next_gen;
+   struct blake2s_state hash;
+
+   spin_lock_irqsave(&base_crng.lock, flags);
+
+   /*
+* Update the generation, while locked, as early as possible
+* This will mean unlocked reads of the generation will
+* cause a reseeding of per-cpu crngs, and those will spin
+* on the base_crng lock waiting for the rest of this function
+* to complete, which achieves the goal of blocking the
+* production of new output until this is done.
+*/
+   next_gen = base_crng.generation + 1;
+   if (next_gen == ULONG_MAX)
+   ++next_gen;
+   WRITE_ONCE(base_crng.generation, next_gen);
+
+   blake2s_init(&hash, sizeof(base_crng.key));
+   blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
+   blake2s_update(&hash, unique_vm_id, len);
+   blake2s_final(&hash, base_crng.key);
+
+   spin_unlock_irqrestore(&base_crng.lock, flags);
+}
+
 static void _get_random_bytes(void *buf, size_t nbytes)
 {
u32 chacha_state[CHACHA_STATE_WORDS];
@@ -935,6 +969,7 @@ static bool drain_entropy(void *buf, size_t nbytes)
  * void add_hwgenerator_randomness(const void *buffer, size_t count,
  * size_t entropy);
  * void add_bootloader_randomness(const void *buf, size_t size);
+ * void add_vmfork_randomness(const void *unique_vm_id, size_t size);
  * void add_interrupt_randomness(int irq);
  *
  * add_device_randomness() adds data to the input pool that
@@ -966,6 +1001,11 @@ static bool drain_entropy(void *buf, size_t nbytes)
  * add_device_randomness(), depending on whether or not the configuration
  * option CONFIG_RANDOM_TRUST_BOOTLOADER is set.
  *
+ * add_vmfork_randomness() adds a unique (but not necessarily secret) ID
+ * representing the current instance of a VM to the pool, without crediting,
+ * and then immediately mixes that ID into the current base_crng key, so
+ * that it takes effect prior to a reseeding.
+ *
  * add_interrupt_randomness() uses the interrupt timing as random
  * inputs to the entropy pool. Using the cycle counters and the irq source
  * as inputs, it feeds the input pool roughly once a second or after 64
@@ -1195,6 +1235,19 @@ void add_bootloader_randomness(const void *buf, size_t size)
 }
 EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
+/*
+ * Handle a new unique VM ID, which is unique, not secret, so we
+ * don't credit it, but we do mix it into the entropy pool and
+ * inject it into the crng.
+ */
+void add_vmfork_randomness(const void *unique_vm_id, size_t size)
+{
+   add_device_randomness(unique_vm_id, size);
+   if (crng_ready())
+   crng_vmfork_inject(unique_vm_id, size);
+}
+EXPORT_SYMBOL_GPL(add_vmfork_randomness);
+
 struct fast_pool {
union {
u32 pool32[4];
diff --git a/include/linux/random.h b/include/linux/random.h
index 6148b8d1ccf3..51b8ed797732 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,6 +34,7 @@ extern void add_input_randomness(unsigned int type, unsigned int code,
 extern void add_interrupt_randomness(int irq) __latent_entropy;
 extern void add_hwgenerator_randomness(const void *buffer, size_t count,
   size_t entropy);
+extern void add_vmfork_randomness(const void *unique_vm_id, size_t size);
 
 extern void get_random_bytes(void *buf, size_t nbytes);
 extern int wait_for_random_bytes(void);
-- 
2.35.1

[PATCH v2 0/2] VM fork detection for RNG

2022-02-23 Thread Jason A. Donenfeld

This small series picks up work from Amazon that seems to have stalled
out last year around this time: listening for the vmgenid ACPI
notification, and using it to "do something." Last year, folks proposed
a complicated userspace mmap chardev, which was fraught with
difficulty and evidently abandoned. This year, instead, I have something
much simpler in mind: simply using those ACPI notifications to tell the
RNG to reinitialize safely, so we don't repeat random numbers in cloned,
forked, or rolled-back VM instances.
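
As an illustration of the generation handling in the first patch, here is a minimal user-space sketch in C. It is not kernel code: `next_generation()` and `needs_reseed()` are hypothetical helpers, and the idea that ULONG_MAX is skipped because it is reserved as a sentinel is an assumption inferred from the `if (next_gen == ULONG_MAX)` check in crng_vmfork_inject().

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical model of the generation bump in crng_vmfork_inject().
 * ULONG_MAX is skipped, on the assumption that it is reserved as a
 * sentinel that never matches a live generation.
 */
static unsigned long next_generation(unsigned long cur)
{
    unsigned long next = cur + 1;   /* unsigned arithmetic wraps to 0 */

    if (next == ULONG_MAX)
        ++next;                     /* skip the reserved value */
    return next;
}

/* A per-CPU generator would reseed when its cached generation is stale. */
static int needs_reseed(unsigned long cached, unsigned long current)
{
    return cached != current;
}
```

Under these assumptions, any reader holding a generation from before the bump sees a mismatch and reseeds, which is the property the in-code comment relies on.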

This series consists of two patches. The first one adds the right hooks
into the actual RNG, and the second is a driver for the ACPI
notification.

I had posted an RFC v1 earlier today, thinking I really needed to
request comments, lacking much experience with ACPI drivers. But having
spent all day reworking this driver, and then testing and debugging it
in a variety of circumstances, I feel fairly confident that it works
well, so this is now the real thing. Please review! Here's a little
screencast showing it in action: 
https://data.zx2c4.com/vmgenid-appears-to-work.gif

As a side note, this series intentionally does _not_ focus on
notification of these events to userspace or to other kernel consumers.
Since these VM fork detection events first need to hit the RNG, we can
later talk about what sorts of notifications or mmap'd counters the RNG
should make accessible elsewhere. But that's a different sort of
project and ties into a lot of more complicated concerns beyond this
more basic patchset. So hopefully we can keep the discussion rather
focused here to this ACPI business.

Changes v1->v2:
- [Ard] Correct value of MODULE_LICENSE().
- [Ard] Use ordinary memory accesses instead of memcpy_fromio.
- [Ard] Make module a tristate and set MODULE_DEVICE_TABLE().
- [Ard] Free buffer after using.
- Use { } instead of { "", 0 }.
- Clean up interface into RNG.
- Minimize ACPI driver a bit.

In addition to the usual suspects, I'm CCing the original team from
Amazon who proposed this last year and the QEMU developers who added it
there, as well as the kernel Hyper-V maintainers, since this is
technically a Microsoft-proposed thing, though QEMU now implements it.

Cc: adr...@parity.io
Cc: d...@amazon.co.uk
Cc: g...@amazon.com
Cc: colmm...@amazon.com
Cc: raduw...@amazon.com
Cc: imamm...@redhat.com
Cc: ehabk...@redhat.com
Cc: b...@skyportsystems.com
Cc: m...@redhat.com
Cc: k...@microsoft.com
Cc: haiya...@microsoft.com
Cc: sthem...@microsoft.com
Cc: wei@kernel.org
Cc: de...@microsoft.com
Cc: li...@dominikbrodowski.net
Cc: a...@kernel.org
Cc: ja...@google.com
Cc: gre...@linuxfoundation.org
Cc: ty...@mit.edu

Jason A. Donenfeld (2):
  random: add mechanism for VM forks to reinitialize crng
  virt: vmgenid: introduce driver for reinitializing RNG on VM fork

 drivers/char/random.c  |  53 ++
 drivers/virt/Kconfig   |   9 
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 120 +
 include/linux/random.h |   1 +
 5 files changed, 184 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

-- 
2.35.1

Re: [PATCH v4 20/47] target/ppc: implement vslq

2022-02-23 Thread Matheus K. Ferst


On 22/02/2022 19:14, Richard Henderson wrote:

On 2/22/22 04:36, matheus.fe...@eldorado.org.br wrote:

From: Matheus Ferst 

Signed-off-by: Matheus Ferst 
---
v4:
  -  New in v4.
---
  target/ppc/insn32.decode    |  1 +
  target/ppc/translate/vmx-impl.c.inc | 40 +
  2 files changed, 41 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 88baebe35e..3799065508 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -473,6 +473,7 @@ VSLB    000100 . . . 
0010100    @VX

  VSLH    000100 . . . 00101000100    @VX
  VSLW    000100 . . . 0011100    @VX
  VSLD    000100 . . . 10111000100    @VX
+VSLQ    000100 . . . 0010101    @VX

  VSRB    000100 . . . 0100100    @VX
  VSRH    000100 . . . 01001000100    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc

index ec4f0e7654..ca98a545ef 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -834,6 +834,46 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, 
MO_16, tcg_gen_gvec_sarv);
  TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, 
tcg_gen_gvec_sarv);
  TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, 
tcg_gen_gvec_sarv);


+static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
+{
+    TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    n = tcg_temp_new_i64();
+    hi = tcg_temp_new_i64();
+    lo = tcg_temp_new_i64();
+    tmp = tcg_const_i64(0);
+
+    get_avr64(lo, a->vra, false);
+    get_avr64(hi, a->vra, true);
+
+    get_avr64(n, a->vrb, true);
+    tcg_gen_andi_i64(n, n, 0x7F);
+
+    tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
+    tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);


Since you have to mask twice anyway, better use (n & 64) != 0.



Hmm, I'm not sure if I understood. To check != 0 we'll need a temp to 
hold n&64. We could use tmp here, but we'll need another one in patch 
22. Is that right?
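
For reference, one reading of the `(n & 64) != 0` suggestion can be modelled in plain C. This is only a sketch of the arithmetic, not TCG code; `shl128()` is a hypothetical helper operating on a hi:lo pair of doublewords.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain C model of a 128-bit left shift on a hi:lo pair (not TCG code).
 * Bit 6 of the count selects the "shift by 64 or more" case, so testing
 * (n & 64) != 0 is equivalent to comparing (n & 0x7f) >= 64.
 */
static void shl128(uint64_t *hi, uint64_t *lo, unsigned n)
{
    n &= 0x7f;                  /* only the low 7 bits are used */

    if (n & 64) {               /* whole-doubleword move */
        *hi = *lo;
        *lo = 0;
    }

    n &= 63;                    /* remaining shift within a doubleword */
    if (n != 0) {               /* avoid an undefined shift by 64 */
        *hi = (*hi << n) | (*lo >> (64 - n));
        *lo <<= n;
    }
}
```

In this model the masked value `n & 64` is what a temporary would have to hold in the translated code, which is the cost the question above is asking about.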


Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO 
Software Analyst

Re: [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree

2022-02-23 Thread Matheus K. Ferst


On 22/02/2022 19:30, Richard Henderson wrote:

On 2/22/22 04:36, matheus.fe...@eldorado.org.br wrote:

+static void gen_vrlnm_vec(unsigned vece, TCGv_vec vrt, TCGv_vec vra,
+  TCGv_vec vrb)
+{
+    TCGv_vec mask, n = tcg_temp_new_vec_matching(vrt);
+
+    /* Create the mask */
+    mask = do_vrl_mask_vec(vece, vrb);
+
+    /* Extract n */
+    tcg_gen_dupi_vec(vece, n, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, n, vrb, n);
+
+    /* Rotate and mask */
+    tcg_gen_rotlv_vec(vece, vrt, vra, n);


Note that rotlv does the masking itself:

/*
  * Expand D = A << (B % element bits)
  *
  * Unlike scalar shifts, where it is easy for the target front end
  * to include the modulo as part of the expansion.  If the target
  * naturally includes the modulo as part of the operation, great!
  * If the target has some other behaviour from out-of-range shifts,
  * then it could not use this function anyway, and would need to
  * do its own expansion with custom functions.
  */



Using tcg_gen_rotlv_vec(vece, vrt, vra, vrb) works on PPC but fails on 
x86. It looks like a problem on the i386 backend. It's using 
VPS[RL]LV[DQ], but instead of this modulo behavior, these instructions 
write zero to the element[1]. I'm not sure how to fix that. Do we need 
an INDEX_op_shlv_vec case in i386 tcg_expand_vec_op?
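
The difference between the two behaviours can be shown with a small scalar model in C, using 32-bit elements. Both helpers are hypothetical and only illustrate the semantics discussed here: a rotate whose count is taken modulo the element width, versus a variable shift that yields zero for out-of-range counts.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left with the count taken modulo the element width (32 bits). */
static uint32_t rotl32_mod(uint32_t a, uint32_t n)
{
    n &= 31;
    return (a << n) | (a >> ((32 - n) & 31));
}

/*
 * Model of a per-element variable left shift where an out-of-range
 * count produces zero instead of wrapping.
 */
static uint32_t shl32_zeroing(uint32_t a, uint32_t n)
{
    return n > 31 ? 0 : a << n;
}
```

A rotate built from two such zeroing shifts would only match `rotl32_mod()` if the count were masked first, which is why the explicit mask matters on a backend with the zeroing behaviour.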



+static bool do_vrlnm(DisasContext *ctx, arg_VX *a, int vece)
+{
+    static const TCGOpcode vecop_list[] = {
+    INDEX_op_cmp_vec, INDEX_op_rotlv_vec, INDEX_op_sari_vec,
+    INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_shrv_vec, 0
+    };


Where is sari used?



I'll remove in v5.

[1] Section 5.3 of 
https://www.intel.com/content/dam/develop/external/us/en/documents/36945


Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO 
Software Analyst

Re: [PATCH v3 4/6] i386/pc: relocate 4g start to 1T where applicable

2022-02-23 Thread Michael S. Tsirkin

On Wed, Feb 23, 2022 at 06:44:53PM +, Joao Martins wrote:
> It is assumed that the whole GPA space is available to be DMA
> addressable, within a given address space limit, except for a
> tiny region before the 4G. Since Linux v5.4, VFIO validates
> whether the selected GPA is indeed valid i.e. not reserved by
> IOMMU on behalf of some specific devices or platform-defined
> restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
>  -EINVAL.
> 
> AMD systems with an IOMMU are examples of such platforms and
> particularly may only have these ranges as allowed:
> 
>    - fedf (0  .. 3.982G)
>   fef0 - 00fc (3.983G .. 1011.9G)
>   0100 -  (1Tb.. 16Pb[*])
> 
> We already account for the 4G hole, albeit if the guest is big
> enough we will fail to allocate a guest with  >1010G due to the
> ~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).

Could you point me to which driver then reserves the
other regions on Linux for AMD platforms?

> 
> [*] there is another reserved region unrelated to HT that exists
> in the 256T boundary in Fam 17h according to Errata #1286,
> also documented in "Open-Source Register Reference for AMD Family
> 17h Processors (PUB)"
> 
> When creating the region above 4G, take into account that on AMD
> platforms the HyperTransport range is reserved and hence it
> cannot be used as GPAs either. In those cases, rather than
> establishing the start of ram-above-4g to be 4G, relocate instead
> to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical
> Topology", for more information on the underlying restriction of
> IOVAs.
> 
> After accounting for the 1Tb hole on AMD hosts, mtree should
> look like:
> 
> -7fff (prio 0, i/o):
>alias ram-below-4g @pc.ram -7fff
> 0100-01ff7fff (prio 0, i/o):
>   alias ram-above-4g @pc.ram 8000-00ff
> 
> If the relocation is done, we also add the reserved HT
> e820 range as reserved.
> 
> Suggested-by: Igor Mammedov 
> Signed-off-by: Joao Martins 
> ---
>  hw/i386/pc.c  | 79 +++
>  target/i386/cpu.h |  4 +++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 360f4e10001b..6e4f5c87a2e5 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -802,6 +802,78 @@ void xen_load_linux(PCMachineState *pcms)
>  #define PC_ROM_ALIGN   0x800
>  #define PC_ROM_SIZE(PC_ROM_MAX - PC_ROM_MIN_VGA)
>  
> +/*
> + * AMD systems with an IOMMU have an additional hole close to the
> + * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
> + * on kernel version, VFIO may or may not let you DMA map those ranges.
> + * Starting Linux v5.4 we validate it, and can't create guests on AMD machines
> + * with certain memory sizes. It's also wrong to use those IOVA ranges
> + * in detriment of leading to IOMMU INVALID_DEVICE_REQUEST or worse.
> + * The ranges reserved for Hyper-Transport are:
> + *
> + * FD__h - FF__h
> + *
> + * The ranges represent the following:
> + *
> + * Base Address   Top Address  Use
> + *
> + * FD__h FD_F7FF_h Reserved interrupt address space
> + * FD_F800_h FD_F8FF_h Interrupt/EOI IntCtl
> + * FD_F900_h FD_F90F_h Legacy PIC IACK
> + * FD_F910_h FD_F91F_h System Management
> + * FD_F920_h FD_FAFF_h Reserved Page Tables
> + * FD_FB00_h FD_FBFF_h Address Translation
> + * FD_FC00_h FD_FDFF_h I/O Space
> + * FD_FE00_h FD__h Configuration
> + * FE__h FE_1FFF_h Extended Configuration/Device Messages
> + * FE_2000_h FF__h Reserved
> + *
> + * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
> + * Table 3: Special Address Controls (GPA) for more information.
> + */
> +#define AMD_HT_START 0xfdUL
> +#define AMD_HT_END   0xffUL
> +#define AMD_ABOVE_1TB_START  (AMD_HT_END + 1)
> +#define AMD_HT_SIZE  (AMD_ABOVE_1TB_START - AMD_HT_START)
> +
> +static hwaddr x86_max_phys_addr(PCMachineState *pcms,
> +uint64_t pci_hole64_size)
> +{
> +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +X86MachineState *x86ms = X86_MACHINE(pcms);
> +MachineState *machine = MACHINE(pcms);
> +ram_addr_t device_mem_size = 0;
> +hwaddr base;
> +
> +if (pcmc->has_reserved_memory &&
> +   (machine->ram_size < machine->maxram_size)) {
> +device_mem_size = machine->maxram_size - machine->ram_size;
> +}
> +
> +base = ROUND_UP(x86ms->above_4g_mem_start + x86ms->above_4g_mem_size +
> +pcms->sgx_epc.size, 1 * GiB);
> +
> +return base + device_mem_size + pci_hole64_size;
> +}
> +
> +static void x86_update_above_4g_mem_start(PCMachineState *pcms,
> +  uint64_t

Re: [PATCH v6 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-23 Thread David Miller

> Bit 0 controls this, and recall that IBM uses big-bit numbering, so "8".

> This stores the low part of r[23] in the high part of r1.
> You need to select the high part of r[23].

Good catch, these are both fixed; I will update the patch shortly.

Thanks for the review

- David Miller




On Wed, Feb 23, 2022 at 2:41 PM Richard Henderson
 wrote:
>
> On 2/17/22 13:17, David Miller wrote:
> > +/* SELECT HIGH */
> > +C(0xb9c0, SELFHR,  RRF_a, MIE3, r3, r2, new, r1_32h, loc, 0)
>
> This stores the low part of r[23] in the high part of r1.
> You need to select the high part of r[23].
>
> >   static DisasJumpType op_popcnt(DisasContext *s, DisasOps *o)
> >   {
> > -gen_helper_popcnt(o->out, o->in2);
> > +const uint8_t m3 = get_field(s, m3);
> > +
> > +if ((m3 & 1) && s390_has_feat(S390_FEAT_MISC_INSTRUCTION_EXT3)) {
>
> Bit 0 controls this, and recall that IBM uses big-bit numbering, so "8".
>
>
> r~

Re: [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features

2022-02-23 Thread Richard Henderson


On 2/17/22 04:07, Peter Maydell wrote:

This series seems to break 'make check-acceptance':

  (01/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'01-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j...
(900.74 s)
  (02/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'02-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j...
(900.71 s)

UEFI runs in the guest and seems to launch the kernel, but there's
no output from the kernel itself in the logfile. Last thing it
prints is:

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
SetUefiImageMemoryAttributes - 0x7F50 - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7C19 - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7C14 - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7F4C - 0x0003
(0x0008)
SetUefiImageMemoryAttributes - 0x7C0F - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7BFB - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7BE0 - 0x0003
(0x0008)
SetUefiImageMemoryAttributes - 0x7BDC - 0x0003
(0x0008)

This ought to be followed by the usual kernel boot log
[0.00] Booting Linux on physical CPU 0x00 [0x000f0510]
etc but it isn't. Probably the kernel is crashing in early bootup
before it gets round to printing anything.


Ug.  The v5.3.7 kernel we're trying to boot is actively broken wrt LPA2:

ENTRY(__enable_mmu)
mrs x2, ID_AA64MMFR0_EL1
ubfx x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
b.ne __no_granule_support

That's an exact match for TGRAN4 == 0, so the LPA2 value sends the cpu into a 
sleep loop.

This is fixed in 26f55386f964c, included in v5.12.
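
The difference between the exact-match check and a range check can be sketched in C. This is a model of the logic only, not the kernel's assembly: the helper names are hypothetical, the field position (bits [31:28] for TGran4) follows the Arm architecture definition of ID_AA64MMFR0_EL1, and the upper bound of 0x7 for "supported" values is an assumption about how the fix treats the field.

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64MMFR0_EL1.TGran4 lives in bits [31:28]. */
#define TGRAN4_SHIFT            28
#define TGRAN4_SUPPORTED        0x0 /* 4K granule supported */
#define TGRAN4_SUPPORTED_MAX    0x7 /* assumed upper bound of "supported" */

static unsigned tgran4(uint64_t mmfr0)
{
    return (mmfr0 >> TGRAN4_SHIFT) & 0xf;
}

/* Old-style check: only the exact value 0 counts as supported. */
static int granule_ok_exact(uint64_t mmfr0)
{
    return tgran4(mmfr0) == TGRAN4_SUPPORTED;
}

/* Range-style check: any value up to SUPPORTED_MAX passes. */
static int granule_ok_range(uint64_t mmfr0)
{
    return tgran4(mmfr0) <= TGRAN4_SUPPORTED_MAX;
}
```

With the LPA2 encoding (TGran4 == 1), `granule_ok_exact()` rejects the CPU while `granule_ok_range()` accepts it, which matches the hang described above.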

So... we're going to need to update avocado, or something.


r~

Re: [PATCH v6 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-23 Thread David Miller

> No test for popcnt, seeing as there's a bug in m3?

Originally popcnt was not in the task list; it was added later.

> You can't split these two asm, lest the ltgr and sel not be adjacent, and the
> flags not having the correct value when we arrive at the sel.

This was tested: both gcc and clang assemble multiple 'asm' statements
into a single block as long as there are no C statements between them.
I'm happy to change it.

On Wed, Feb 23, 2022 at 2:45 PM Richard Henderson
 wrote:
>
> On 2/17/22 13:17, David Miller wrote:
> > +#define F_PRO    asm ( \
> > +"lg %%r2, %[a]\n"  \
> > +"lg %%r3, %[b]\n"  \
> > +"lg %%r0, %[c]\n"  \
> > +"ltgr %%r0, %%r0"  \
> > +: : [a] "m" (a),   \
> > +[b] "m" (b),   \
> > +[c] "m" (c)\
> > +: "r0", "r2", "r3", "r4")
> > +
> > +
> > +
> > +#define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c) \
> > +{ uint64_t res = 0; F_PRO ; ASM ; return res; }
> > +
> > +
> > +Fi3 (_selre, asm("selre    %%r0, %%r3, %%r2\n" F_EPI))
> > +Fi3 (_selgrz,asm("selgrz   %%r0, %%r3, %%r2\n" F_EPI))
> > +Fi3 (_selfhrnz,  asm("selfhrnz %%r0, %%r3, %%r2\n" F_EPI))
>
> You can't split these two asm, lest the ltgr and sel not be adjacent, and the
> flags not having the correct value when we arrive at the sel.
>
> No test for popcnt, seeing as there's a bug in m3?
>
>
> r~

RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

2022-02-23 Thread Ivan Shcherbakov

Hi Paolo,

Thanks for getting back to me. Please see my comments below:

>Please use WhpxStepMode and likewise for WhpxBreakpointState.
No problem, I have updated the patch.

>(In the case of WhpxStepMode I would also consider simply a "bool exclusive" 
>in whpx_cpu_run).
This is a leftover from prior experiments with stepping into the exception 
handlers 
that involves reading the IDT/GDT, writing INT3 into the original handler, 
disabling INT1 interception
and instead intercepting INT3. It would be implemented via a third step mode. I 
ended up removing
it due to the unbelievable complexity of properly handling all corner cases, 
but I thought it would make
sense to keep it as an enum if someone else decides to add anything similar 
later.

>Please leave the empty line.
Oops, leftover from other experiments. Thanks for pointing it out.

>Out of curiosity, does the guest see TF=1 if it single steps through a PUSHF 
>(and then break horribly on POPF :))?
This is a very good point. It would indeed get pushed onto the stack and
popped back with POPF, raising another INT1. It shouldn't be too horrible
though: it would report another stop to gdb, and doing a step again would
clear it. A somewhat bigger issue would be that stepping through POPF or
LAHF could reset the TF to 0, effectively ending the single-stepping mode.

It could be addressed by emulating PUSHF/POPF, but there are a few corner
cases (alignment checks, page faults on RSP, flag translation for the
virtual 8086 mode) that could make things more complicated. Also,
PUSHF/POPF is not the only special case. Stepping over IRET, into page
fault handlers, and when the guest itself is running a debugger that wants
to do single-stepping would also require rather non-trivial workarounds.

I have added a detailed comment outlining the limitations of the current
stepping logic and possible workarounds above the definition of
WhpxStepMode. The current implementation works like a charm for debugging
real-world kernel modules on x64 Ubuntu, and if some of these limitations
end up breaking other debug scenarios, I would much prefer addressing
specific narrow issues to adding another layer of complexity proactively,
if that's OK with you.

I have taken special care to make sure the new functionality won't cause any 
regressions when not debugging.
The intercepted exception mask is set to 0 when gdb is not connected, 100% 
matching the behavior prior to the patch.

>Why separate the original addresses in a different array
This accommodates the case when different parts of QEMU would set multiple 
breakpoints at the same location, or when
a breakpoint removed on the QEMU level could not be removed from RAM (e.g. the 
page got swapped out). 
Synchronizing the QEMU breakpoint list with the RAM breakpoint list currently 
has O(N^2) complexity. However, in many
cases (e.g. stopping to handle timers), the QEMU breakpoints won't change 
between the invocations, so we can just do a
quick O(N) comparison of the old breakpoint list vs. new breakpoint list, and 
avoid the O(N^2) part. You can find this logic
by searching for the 'update_pending' variable.
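
The fast path described above can be sketched as a simple linear comparison in C. This is an illustrative model rather than the code in the patch: `breakpoints_changed()` is a hypothetical helper, and it assumes both lists hold addresses in the same order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the breakpoint addresses differ
 * from the previously seen list, i.e. when the expensive O(N^2) resync
 * of the RAM breakpoints would be needed. Assumes both lists are kept
 * in the same order.
 */
static bool breakpoints_changed(const uint64_t *old_addrs, size_t old_count,
                                const uint64_t *new_addrs, size_t new_count)
{
    size_t i;

    if (old_count != new_count)
        return true;

    for (i = 0; i < new_count; i++) {
        if (old_addrs[i] != new_addrs[i])
            return true;
    }
    return false;   /* unchanged: the resync can be skipped */
}
```

A caller would run the full synchronization only when this returns true, keeping the common "nothing changed" case at O(N).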

This could be avoided if cpu_breakpoint_insert() or gdb_breakpoint_insert() 
could invoke a WHPX-specific handler,
similar to what is currently done with kvm_insert_breakpoint(), or a generic 
callback via AccelOpsClass, that could just
mark the RAM breakpoint list dirty. But I didn't want to introduce unnecessary 
changes outside the WHPX module.

>(and why the different logic, with used/allocated for one array and an exact 
>size for the other)
When we are rebuilding the memory breakpoint list, we don't know how many
of the CPU breakpoints will refer to the same memory addresses, or overlap
with the breakpoints that could not be removed. So, we allocate for the
worst case, and keep track of the actually created entries in the 'used'
field.

Updated patch below. Let me know if you have any further concerns and I will be 
happy to address them.

This adds support for breakpoints and stepping when debugging WHPX-accelerated 
guests with gdb.
It enables reliable debugging of the Linux kernel in both single-CPU and SMP 
modes.

Signed-off-by: Ivan Shcherbakov 
---
 gdbstub.c|  10 +
 include/exec/gdbstub.h   |   8 +
 target/i386/whpx/whpx-all.c  | 763 ++-
 target/i386/whpx/whpx-internal.h |  29 ++
 4 files changed, 796 insertions(+), 14 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3c14c6a038..d30cbfa478 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -373,6 +373,12 @@ typedef struct GDBState {
 } GDBState;
 
 static GDBState gdbserver_state;
+static bool gdbserver_is_connected;
+
+bool gdb_is_connected(void)
+{
+return gdbserver_is_connected;
+}
 
 static void init_gdbserver_state(void)
 {
@@ -3410,6 +3416,10 @@ static void gdb_chr_event(void *opaque, QEMUChrEvent event)
 vm_stop(RUN_STATE_PAUSED);

Re: [PATCH v6 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-23 Thread Richard Henderson


On 2/17/22 13:17, David Miller wrote:

+#define F_PRO    asm ( \
+"lg %%r2, %[a]\n"  \
+"lg %%r3, %[b]\n"  \
+"lg %%r0, %[c]\n"  \
+"ltgr %%r0, %%r0"  \
+: : [a] "m" (a),   \
+[b] "m" (b),   \
+[c] "m" (c)\
+: "r0", "r2", "r3", "r4")
+
+
+
+#define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c) \
+{ uint64_t res = 0; F_PRO ; ASM ; return res; }
+
+
+Fi3 (_selre, asm("selre    %%r0, %%r3, %%r2\n" F_EPI))
+Fi3 (_selgrz,asm("selgrz   %%r0, %%r3, %%r2\n" F_EPI))
+Fi3 (_selfhrnz,  asm("selfhrnz %%r0, %%r3, %%r2\n" F_EPI))


You can't split these two asm, lest the ltgr and sel not be adjacent, and the flags not 
having the correct value when we arrive at the sel.


No test for popcnt, seeing as there's a bug in m3?


r~

Re: [PATCH v6 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-23 Thread Richard Henderson


On 2/17/22 13:17, David Miller wrote:

+/* SELECT HIGH */
+C(0xb9c0, SELFHR,  RRF_a, MIE3, r3, r2, new, r1_32h, loc, 0)


This stores the low part of r[23] in the high part of r1.
You need to select the high part of r[23].


  static DisasJumpType op_popcnt(DisasContext *s, DisasOps *o)
  {
-gen_helper_popcnt(o->out, o->in2);
+const uint8_t m3 = get_field(s, m3);
+
+if ((m3 & 1) && s390_has_feat(S390_FEAT_MISC_INSTRUCTION_EXT3)) {


Bit 0 controls this, and recall that IBM uses big-bit numbering, so "8".
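
The bit-numbering point can be made concrete with a small C helper. `ibm_bit_mask()` is a hypothetical name used only for illustration; it converts a bit number counted from the most significant end of a field into an ordinary mask, and expects `bit < width <= 32`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an IBM-style bit number (bit 0 is the most significant bit)
 * within a field of `width` bits into a conventional mask.
 * Callers must ensure bit < width <= 32.
 */
static uint32_t ibm_bit_mask(unsigned width, unsigned bit)
{
    return UINT32_C(1) << (width - 1 - bit);
}
```

For the 4-bit m3 field, bit 0 maps to the mask 8, so the test in the patch would read `m3 & 8` rather than `m3 & 1`.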


r~

Re: Analysis of slow distro boots in check-avocado (BootLinuxAarch64.test_virt_tcg*)

2022-02-23 Thread Peter Maydell

On Wed, 23 Feb 2022 at 16:38, Laszlo Ersek  wrote:
> BTW I still don't understand the problem with the DEBUG firmware builds;
> in the test suite, as many debug messages should be printed as possible,
> for helping with the analysis of any new issue that pops up. I've
> re-read Alex's message that I got first CC'd on, and I can't connect the
> dots, sorry.

As well as the performance question, these images aren't purely
used by the test suite -- we install them for end-users via
'make install'. If we want debug-images for use with the
test suite as well as generally usable ones, we should label
them appropriately and not install them.

-- PMM

Re: Fix a potential Use-after-free bug in handle_simd_shift_fpint_conv() (v6.2.0).

2022-02-23 Thread Richard Henderson


On 2/23/22 04:33, wli...@stu.xidian.edu.cn wrote:


Hi all,

I found a potential use-after-free bug in QEMU 6.2.0, in 
handle_simd_shift_fpint_conv() (./target/arm/translate-a64.c).


At line 9048, the variable 'tcg_fpstatus' is freed by invoking tcg_temp_free_ptr(). However, 
at line 9050, 'tcg_fpstatus' is subsequently used as the 3rd parameter of the 
function gen_helper_set_rmode(). This may result in a use-after-free bug.



9048    tcg_temp_free_ptr(tcg_fpstatus);
9049    tcg_temp_free_i32(tcg_shift);
9050    gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);


I believe the bug can be fixed by invoking gen_helper_set_rmode() before 
'tcg_fpstatus' is freed by tcg_temp_free_ptr().



  ---    tcg_temp_free_ptr(tcg_fpstatus);
9049    tcg_temp_free_i32(tcg_shift);
9050    gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
  +++    tcg_temp_free_ptr(tcg_fpstatus);

I'm looking forward to your confirmation.


The fix is correct.  We just need the submission formatted properly, with your 
Signed-off-by tag.  When re-formatting, you can add my


Reviewed-by: Richard Henderson 


r~

[PATCH v3 6/6] i386/pc: restrict AMD only enforcing of valid IOVAs to new machine type

2022-02-23 Thread Joao Martins

The added enforcing is only relevant in the case of AMD where the
range right before the 1TB is restricted and cannot be DMA mapped
by the kernel consequently leading to IOMMU INVALID_DEVICE_REQUEST
or possibly other kinds of IOMMU events in the AMD IOMMU.

However, there is a case where it may make sense to disable the
IOVA relocation/validation: when migrating from a
non-valid-IOVA-aware qemu to one that supports it.

Relocating RAM regions to after the 1Tb hole has consequences for
guest ABI because we are changing the memory mapping, so make
sure that only new machine types enforce it, and not older ones.
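
The gating this patch adds can be pictured with a small stand-alone C model. Everything here is a sketch under assumptions: `above_4g_start()` is a hypothetical function, `enforce` stands in for the per-machine-type flag, `is_amd` for the host vendor check, and the constants are taken from the HyperTransport range described earlier in the series (starting at FD_0000_0000h, with RAM relocated to 1 TiB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GIB            (UINT64_C(4) << 30)
#define HT_START            UINT64_C(0xfd00000000)  /* 1012 GiB, assumed */
#define ABOVE_1TB_START     UINT64_C(0x10000000000) /* 1 TiB, assumed */

/*
 * Hypothetical model of the decision: where should RAM above 4G start?
 * `max_phys_addr` is the highest guest physical address the
 * configuration would use if RAM above 4G started at 4G.
 */
static uint64_t above_4g_start(bool enforce, bool is_amd,
                               uint64_t max_phys_addr)
{
    if (!enforce || !is_amd)
        return FOUR_GIB;        /* older machine types keep the old layout */
    if (max_phys_addr < HT_START)
        return FOUR_GIB;        /* guest never reaches the reserved range */
    return ABOVE_1TB_START;     /* relocate past the HyperTransport hole */
}
```

In this model a 6.2 machine type passes `enforce == false` and so keeps its existing memory map regardless of host vendor or guest size.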

Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 6 ++
 hw/i386/pc_piix.c| 2 ++
 hw/i386/pc_q35.c | 2 ++
 include/hw/i386/pc.h | 1 +
 4 files changed, 11 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 11598a0a39e4..ef0a5325f98a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -859,9 +859,14 @@ static hwaddr x86_max_phys_addr(PCMachineState *pcms,
 static void x86_update_above_4g_mem_start(PCMachineState *pcms,
   uint64_t pci_hole64_size)
 {
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
 uint32_t eax, vendor[3];
 
+if (!pcmc->enforce_valid_iova) {
+return;
+}
+
 host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
 if (!IS_AMD_VENDOR(vendor)) {
 return;
@@ -1804,6 +1809,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 pcmc->has_reserved_memory = true;
 pcmc->kvmclock_enabled = true;
 pcmc->enforce_aligned_dimm = true;
+pcmc->enforce_valid_iova = true;
 /* BIOS ACPI tables: 128K. Other BIOS datastructures: less than 4K reported
  * to be used at the moment, 32K should be enough for a while.  */
 pcmc->acpi_data_size = 0x2 + 0x8000;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 5a608e30e28f..c322f9494384 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -435,9 +435,11 @@ DEFINE_I440FX_MACHINE(v7_0, "pc-i440fx-7.0", NULL,
 
 static void pc_i440fx_6_2_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_7_0_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
+pcmc->enforce_valid_iova = false;
 compat_props_add(m->compat_props, hw_compat_6_2, hw_compat_6_2_len);
 compat_props_add(m->compat_props, pc_compat_6_2, pc_compat_6_2_len);
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index c81d21d1ebb4..53ed6df1e0e0 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -379,8 +379,10 @@ DEFINE_Q35_MACHINE(v7_0, "pc-q35-7.0", NULL,
 
 static void pc_q35_6_2_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_7_0_machine_options(m);
 m->alias = NULL;
+pcmc->enforce_valid_iova = false;
 compat_props_add(m->compat_props, hw_compat_6_2, hw_compat_6_2_len);
 compat_props_add(m->compat_props, pc_compat_6_2, pc_compat_6_2_len);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d8b9c4ebd748..914340750498 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -117,6 +117,7 @@ struct PCMachineClass {
 bool has_reserved_memory;
 bool enforce_aligned_dimm;
 bool broken_reserved_end;
+bool enforce_valid_iova;
 
 /* generate legacy CPU hotplug AML */
 bool legacy_cpu_hotplug;
-- 
2.17.2

[PATCH v3 3/6] i386/pc: pass pci_hole64_size to pc_memory_init()

2022-02-23 Thread Joao Martins

Use the pre-initialized pci-host qdev and pass its
pci-hole64-size to pc_memory_init() via a newly added argument.
piix needs a bit of care given all the !pci_enabled()
and that the pci_hole64_size is private to i440fx.

This is in preparation to determine that host-phys-bits are
enough and for pci-hole64-size to be considered to relocate
ram-above-4g to be at 1T (on AMD platforms).

Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 3 ++-
 hw/i386/pc_piix.c| 5 -
 hw/i386/pc_q35.c | 8 +++-
 hw/pci-host/i440fx.c | 7 +++
 include/hw/i386/pc.h | 3 ++-
 include/hw/pci-host/i440fx.h | 1 +
 6 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7de0e87f4a3f..360f4e10001b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -805,7 +805,8 @@ void xen_load_linux(PCMachineState *pcms)
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
-MemoryRegion **ram_memory)
+MemoryRegion **ram_memory,
+uint64_t pci_hole64_size)
 {
 int linux_boot, i;
 MemoryRegion *option_rom_mr;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 9ff49e672628..5a608e30e28f 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -91,6 +91,7 @@ static void pc_init1(MachineState *machine,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 ram_addr_t lowmem;
+uint64_t hole64_size;
 DeviceState *i440fx_dev;
 
 /*
@@ -166,10 +167,12 @@ static void pc_init1(MachineState *machine,
 memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
 rom_memory = pci_memory;
 i440fx_dev = qdev_new(host_type);
+hole64_size = i440fx_pci_hole64_size(i440fx_dev);
 } else {
 pci_memory = NULL;
 rom_memory = system_memory;
 i440fx_dev = NULL;
+hole64_size = 0;
 }
 
 pc_guest_info_init(pcms);
@@ -186,7 +189,7 @@ static void pc_init1(MachineState *machine,
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
 pc_memory_init(pcms, system_memory,
-   rom_memory, _memory);
+   rom_memory, _memory, hole64_size);
 } else {
 pc_system_flash_cleanup_unused(pcms);
 if (machine->kernel_filename != NULL) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 2881afd75a82..c81d21d1ebb4 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -138,6 +138,7 @@ static void pc_q35_init(MachineState *machine)
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 bool acpi_pcihp;
 bool keep_pci_slot_hpc;
+uint64_t pci_hole64_size = 0;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -206,8 +207,13 @@ static void pc_q35_init(MachineState *machine)
 /* create pci host bus */
 q35_host = Q35_HOST_DEVICE(qdev_new(TYPE_Q35_HOST_DEVICE));
 
+if (pcmc->pci_enabled) {
+pci_hole64_size = q35_host->mch.pci_hole64_size;
+}
+
 /* allocate ram and load rom/bios */
-pc_memory_init(pcms, get_system_memory(), rom_memory, _memory);
+pc_memory_init(pcms, get_system_memory(), rom_memory, _memory,
+   pci_hole64_size);
 
 object_property_add_child(qdev_get_machine(), "q35", OBJECT(q35_host));
 object_property_set_link(OBJECT(q35_host), MCH_HOST_PROP_RAM_MEM,
diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
index 5c1bab5c58ed..c5cc28250d5c 100644
--- a/hw/pci-host/i440fx.c
+++ b/hw/pci-host/i440fx.c
@@ -237,6 +237,13 @@ static void i440fx_realize(PCIDevice *dev, Error **errp)
 }
 }
 
+uint64_t i440fx_pci_hole64_size(DeviceState *i440fx_dev)
+{
+I440FXState *i440fx = I440FX_PCI_HOST_BRIDGE(i440fx_dev);
+
+return i440fx->pci_hole64_size;
+}
+
 PCIBus *i440fx_init(const char *host_type, const char *pci_type,
 DeviceState *dev,
 PCII440FXState **pi440fx_state,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 9c9f4ac74810..d8b9c4ebd748 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -158,7 +158,8 @@ void xen_load_linux(PCMachineState *pcms);
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
-MemoryRegion **ram_memory);
+MemoryRegion **ram_memory,
+uint64_t pci_hole64_size);
 uint64_t pc_pci_hole64_start(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(struct PCMachineState *pcms,
diff --git a/include/hw/pci-host/i440fx.h b/include/hw/pci-host/i440fx.h
index c4710445e30a..1299d6a2b0e4 100644
--- a/include/hw/pci-host/i440fx.h
+++ b/include/hw/pci-host/i440fx.h
@@ -45,5 +45,6 @@ PCIBus *i440fx_init(const

[PATCH v3 5/6] i386/pc: warn if phys-bits is too low

2022-02-23 Thread Joao Martins

Default phys-bits on QEMU is TCG_PHYS_BITS (40), which is enough
to address 1Tb (0xff ffff ffff). On AMD platforms, if a
ram-above-4g relocation happens and the CPU wasn't configured
with a big enough phys-bits, warn the user. There isn't a
catastrophic failure exactly, the guest will still boot, but
most likely won't be able to use more than ~4G of RAM.
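
The check described above comes down to comparing two numbers. The following is only an illustrative Python sketch of that comparison, not the patch itself; the helper names `max_phys_addr` and `phys_bits_too_low` are hypothetical.

```python
def max_phys_addr(phys_bits: int) -> int:
    """Highest address representable with phys_bits physical address bits."""
    return (1 << phys_bits) - 1


def phys_bits_too_low(phys_bits: int, max_used_addr: int) -> bool:
    """True when the guest layout reaches beyond what phys_bits can address."""
    return max_phys_addr(phys_bits) < max_used_addr


GIB = 1 << 30
TIB = 1 << 40

# Assumed layout for illustration: RAM relocated to 1 TiB plus 16 GiB in use.
example_max_used = TIB + 16 * GIB
needs_warning = phys_bits_too_low(40, example_max_used)
```

With the default of 40 bits the example layout would trigger the warning, while 41 bits would not.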

Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6e4f5c87a2e5..11598a0a39e4 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -888,6 +888,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
+hwaddr maxphysaddr, maxusedaddr;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -896,6 +897,15 @@ void pc_memory_init(PCMachineState *pcms,
 
 x86_update_above_4g_mem_start(pcms, pci_hole64_size);
 
+maxphysaddr = ((hwaddr)1 << X86_CPU(first_cpu)->phys_bits) - 1;
+maxusedaddr = x86_max_phys_addr(pcms, pci_hole64_size);
+if (maxphysaddr < maxusedaddr) {
+warn_report("Address space above 4G at %"PRIx64"-%"PRIx64
+" phys-bits too low (%u)",
+x86ms->above_4g_mem_start, maxusedaddr,
+X86_CPU(first_cpu)->phys_bits);
+}
+
 /*
  * Split single memory region and use aliases to address portions of it,
  * done for backwards compatibility with older qemus.
-- 
2.17.2

[PATCH v3 2/6] i386/pc: create pci-host qdev prior to pc_memory_init()

2022-02-23 Thread Joao Martins

At the start of pc_memory_init() we usually pass a range of
0..UINT64_MAX as pci_memory, when really it's 2G (i440fx) or
32G (q35). To get the real user value, we need the pci-host
property that holds the default pci_hole64_size. To get that,
create the qdev prior to memory init, so that the max used/phys
addr can be estimated better.

This is in preparation to determine that host-phys-bits are
enough and also for pci-hole64-size to be considered to relocate
ram-above-4g to be at 1T (on AMD platforms).

Signed-off-by: Joao Martins 
---
 hw/i386/pc_piix.c| 5 -
 hw/i386/pc_q35.c | 6 +++---
 hw/pci-host/i440fx.c | 3 +--
 include/hw/pci-host/i440fx.h | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d9b344248dac..9ff49e672628 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -91,6 +91,7 @@ static void pc_init1(MachineState *machine,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 ram_addr_t lowmem;
+DeviceState *i440fx_dev;
 
 /*
  * Calculate ram split, for memory below and above 4G.  It's a bit
@@ -164,9 +165,11 @@ static void pc_init1(MachineState *machine,
 pci_memory = g_new(MemoryRegion, 1);
 memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
 rom_memory = pci_memory;
+i440fx_dev = qdev_new(host_type);
 } else {
 pci_memory = NULL;
 rom_memory = system_memory;
+i440fx_dev = NULL;
 }
 
 pc_guest_info_init(pcms);
@@ -199,7 +202,7 @@ static void pc_init1(MachineState *machine,
 
 pci_bus = i440fx_init(host_type,
   pci_type,
-  _state,
+  i440fx_dev, _state,
   system_memory, system_io, machine->ram_size,
   x86ms->below_4g_mem_size,
   x86ms->above_4g_mem_size,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 1780f79bc127..2881afd75a82 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -203,12 +203,12 @@ static void pc_q35_init(MachineState *machine)
 pcms->smbios_entry_point_type);
 }
 
-/* allocate ram and load rom/bios */
-pc_memory_init(pcms, get_system_memory(), rom_memory, _memory);
-
 /* create pci host bus */
 q35_host = Q35_HOST_DEVICE(qdev_new(TYPE_Q35_HOST_DEVICE));
 
+/* allocate ram and load rom/bios */
+pc_memory_init(pcms, get_system_memory(), rom_memory, _memory);
+
 object_property_add_child(qdev_get_machine(), "q35", OBJECT(q35_host));
 object_property_set_link(OBJECT(q35_host), MCH_HOST_PROP_RAM_MEM,
  OBJECT(ram_memory), NULL);
diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
index e08716142b6e..5c1bab5c58ed 100644
--- a/hw/pci-host/i440fx.c
+++ b/hw/pci-host/i440fx.c
@@ -238,6 +238,7 @@ static void i440fx_realize(PCIDevice *dev, Error **errp)
 }
 
 PCIBus *i440fx_init(const char *host_type, const char *pci_type,
+DeviceState *dev,
 PCII440FXState **pi440fx_state,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
@@ -247,7 +248,6 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
 MemoryRegion *pci_address_space,
 MemoryRegion *ram_memory)
 {
-DeviceState *dev;
 PCIBus *b;
 PCIDevice *d;
 PCIHostState *s;
@@ -255,7 +255,6 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
 unsigned i;
 I440FXState *i440fx;
 
-dev = qdev_new(host_type);
 s = PCI_HOST_BRIDGE(dev);
 b = pci_root_bus_new(dev, NULL, pci_address_space,
  address_space_io, 0, TYPE_PCI_BUS);
diff --git a/include/hw/pci-host/i440fx.h b/include/hw/pci-host/i440fx.h
index f068aaba8fda..c4710445e30a 100644
--- a/include/hw/pci-host/i440fx.h
+++ b/include/hw/pci-host/i440fx.h
@@ -36,7 +36,7 @@ struct PCII440FXState {
 #define TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE "igd-passthrough-i440FX"
 
 PCIBus *i440fx_init(const char *host_type, const char *pci_type,
-PCII440FXState **pi440fx_state,
+DeviceState *dev, PCII440FXState **pi440fx_state,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 ram_addr_t ram_size,
-- 
2.17.2

Re: [PATCH 19/20] migration: Postcopy recover with preempt enabled

2022-02-23 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Wed, Feb 23, 2022 at 09:52:08AM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Tue, Feb 22, 2022 at 11:32:10AM +, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (pet...@redhat.com) wrote:
> > > > > To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
> > > > > needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
> > > > > instead of stopping the thread it halts with a semaphore, preparing to be
> > > > > kicked again when recovery is detected.
> > > > > 
> > > > > A mutex is introduced to make sure there's no concurrent operation upon the
> > > > > socket.  To make it simple, the fast ram load thread will take the mutex during
> > > > > its whole procedure, and only release it if it's paused.  The fast-path socket
> > > > > will be properly released by the main loading thread safely when there's
> > > > > network failures during postcopy with that mutex held.
> > > > 
> > > > I *think* this is mostly OK; but I worry I don't understand all the
> > > > cases; e.g.
> > > >   a) If the postcopy channel errors first
> > > >   b) If the main channel errors first
> > > 
> > > Ah right, I don't think I handled all the cases.  Sorry.
> > > 
> > > We always check the main channel, but if the postcopy channel got faulted,
> > > we may not fall into paused mode as expected.
> > > 
> > > I'll fix that up.
> > 
> > Thanks.
> > 
> > > > 
> > > > Can you add some docs to walk through those and explain the locking ?
> > > 
> > > Sure.
> > > 
> > > The sem is mentioned in the last sentence of paragraph 1, where it's purely
> > > used for a way to yield the fast ram load thread so that when something
> > > wrong happens it can sleep on that semaphore.  Then when we recover we'll
> > > post to the semaphore to kick it up.  We used it like that in many places,
> > > e.g. postcopy_pause_sem_dst to yield the main load thread.
> > > 
> > > The 2nd paragraph above was for explaining why we need the mutex; it's
> > > basically the same as rp_mutex protecting to_src_file, so that we won't
> > > accidentally close() the qemufile during some other thread using it.  So
> > > the fast ram load thread needs to take that new mutex for mostly the whole
> > > lifecycle of itself (because it's loading from that qemufile), meanwhile
> > > only drop the mutex when it prepares to sleep.  Then the main load thread
> > > can recycle the postcopy channel using qemu_fclose() safely.
> > 
> > Yes, that feels like it needs to go in the code somewhere.
> 
> Sure, I'll further squash below comment update into the same patch.  I
> reworded some places but mostly it should be telling the same thing:
> 
> ---8<---
> diff --git a/migration/migration.h b/migration/migration.h
> index 945088064a..91f845e9e4 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -118,7 +118,17 @@ struct MigrationIncomingState {
>  /* Postcopy priority thread is used to receive postcopy requested pages */
>  QemuThread postcopy_prio_thread;
>  bool postcopy_prio_thread_created;
> -/* Used to sync with the prio thread */
> +/*
> + * Used to sync between the ram load main thread and the fast ram load
> + * thread.  It protects postcopy_qemufile_dst, which is the postcopy
> + * fast channel.
> + *
> + * The ram fast load thread will take it mostly for the whole lifecycle
> + * because it needs to continuously read data from the channel, and
> + * it'll only release this mutex if postcopy is interrupted, so that
> + * the ram load main thread will take this mutex over and properly
> + * release the broken channel.
> + */
>  QemuMutex postcopy_prio_thread_mutex;
>  /*
>   * An array of temp host huge pages to be used, one for each postcopy
> @@ -149,6 +159,12 @@ struct MigrationIncomingState {
>  /* notify PAUSED postcopy incoming migrations to try to continue */
>  QemuSemaphore postcopy_pause_sem_dst;
>  QemuSemaphore postcopy_pause_sem_fault;
> +/*
> + * This semaphore is used to allow the ram fast load thread (only when
> + * postcopy preempt is enabled) fall into sleep when there's network
> + * interruption detected.  When the recovery is done, the main load
> + * thread will kick the fast ram load thread using this semaphore.
> + */
>  QemuSemaphore postcopy_pause_sem_fast_load;

Acked-by: Dr. David Alan Gilbert 

>  
>  /* List of listening socket addresses  */
> ---8<---
> 
> > 
> > > [...]
> > > 
> > > > > @@ -3466,6 +3468,17 @@ static MigThrError 
> > > > > postcopy_pause(MigrationState *s)
> > > > >  qemu_file_shutdown(file);
> > > > >  qemu_fclose(file);
> > > > >  
> > > > > +/*
> > > > > + * Do the same to postcopy fast path socket too if there is. 
> > > > >  No
> > > > > +

[PATCH v3 0/6] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

2022-02-23 Thread Joao Martins

RFCv2[3] -> v3:

* Add missing brackets in single line statement, in patch 5 (David)
* Change ranges printf to use PRIx64, in patch 5 (David)
* Move the check to after changing above_4g_mem_start, in patch 5 (David)
* Make the check generic and move it to pc_memory_init rather being specific
to AMD, as the check is useful to capture invalid phys-bits
configs (patch 5, Igor).
* Fix comment as 'Start address of the initial RAM above 4G' in patch 1 (Igor)
* Consider pci_hole64_size in patch 4 (Igor)
* To consider pci_hole64_size in max used addr we need to get it from pci-host,
so introduce two new patches (2 and 3) which move only the qdev_new("i440fx") or
qdev_new("q35") to be before pc_memory_init().
* Consider sgx_epc.size in max used address, in patch 4 (Igor)
* Rename relocate_4g() to x86_update_above_4g_mem_start() (Igor)
* Keep warn_report() in patch 5, as erroring out will break a few x86_64 qtests
due to the pci_hole64 accounting surpassing the maxphysaddr that phys-bits allows.

Thanks Igor/David for the comments!

---

This series lets Qemu spawn i386 guests with >= 1010G with VFIO,
particularly when running on AMD systems with an IOMMU.

Since Linux v5.4, VFIO validates whether the IOVA in the DMA_MAP ioctl is valid
and returns -EINVAL when it is not. On x86, Intel hosts aren't particularly
affected by this extra validation. But AMD systems with IOMMU have a hole at
the 1TB boundary which is *reserved* for HyperTransport I/O addresses located
here: FD_0000_0000h - FF_FFFF_FFFFh. See IOMMU manual [1], specifically
section '2.1.2 IOMMU Logical Topology', Table 3 on what those addresses mean.

VFIO DMA_MAP calls in this IOVA address range are rejected by this check and
hence return -EINVAL, consequently failing the creation of guests bigger than
1010G. Example of the failure:

qemu-system-x86_64: -device vfio-pci,host=:41:10.1,bootindex=-1: 
VFIO_MAP_DMA: -22
qemu-system-x86_64: -device vfio-pci,host=:41:10.1,bootindex=-1: vfio 
:41:10.1: 
failed to setup container for group 258: memory listener initialization 
failed:
Region pc.ram: vfio_dma_map(0x55ba53e7a9d0, 0x1, 
0xff3000, 0x7ed243e0) = -22 (Invalid argument)

Prior to v5.4, we could map to these IOVAs *but* that's still not the right
thing to do and could trigger certain IOMMU events (e.g. INVALID_DEVICE_REQUEST),
or spurious guest VF failures from the resultant IOMMU target abort (see Errata
1155[2]) as documented on the links down below.

This small series tries to address that by dealing with this AMD-specific 1Tb
hole, but rather than handling it like the 4G hole, it instead relocates RAM
above 4G to be above 1T if the maximum RAM range crosses the HT reserved range.
It is organized as follows:

patch 1: Introduce a @above_4g_mem_start which defaults to 4 GiB as starting
 address of the 4G boundary

patches 2-3: Move pci-host qdev creation to be before pc_memory_init(),
 to get access to pci_hole64_size. The actual pci-host
 initialization is kept as is, only the qdev_new.

patch 4: Change @above_4g_mem_start to 1TiB if we are on AMD and the max
possible address crosses the HT region.

patch 5: Warns user if phys-bits is too low

patch 6: Ensure valid IOVAs only on new machine types, but not older
ones (<= v6.2.0)

The 'consequence' of this approach is that we may need more than the default
phys-bits e.g. a guest with >1010G, will have most of its RAM after the 1TB
address, consequently needing 41 phys-bits as opposed to the default of 40
(TCG_PHYS_BITS). Today there's already a precedent to depend on the user to
pick the right value of phys-bits (regardless of this series), so we warn in
case phys-bits aren't enough. Finally, CMOS loses its meaning for the above-4G
RAM blocks, but it was mentioned during the RFC that CMOS is only useful for
very old SeaBIOS.
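
To make the 40 versus 41 bit arithmetic concrete, here is a small Python sketch. It is an illustration under stated assumptions, not code from the series: the guest size (1015 GiB with 2 GiB kept below 4G) is made up for the example, the helper `phys_bits_needed` is hypothetical, and pci-hole64 and device memory are ignored.

```python
GIB = 1 << 30
TIB = 1 << 40


def phys_bits_needed(above_4g_start: int, above_4g_size: int) -> int:
    """Bits needed to address the last byte of the RAM block placed above 4G."""
    last_byte = above_4g_start + above_4g_size - 1
    return last_byte.bit_length()


# Assumed guest: 1015 GiB total, 2 GiB below 4G, so 1013 GiB above 4G.
above_4g_size = 1013 * GIB

print(phys_bits_needed(4 * GIB, above_4g_size))  # RAM starting at 4 GiB: 40
print(phys_bits_needed(1 * TIB, above_4g_size))  # RAM relocated to 1 TiB: 41
```

The same amount of RAM fits in 40 bits when it starts at 4 GiB, but needs one more bit once it starts at 1 TiB.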

Additionally, the reserved region is added to E820 if the relocation is done.

Alternative options considered (in RFC[0]):

a) Dealing with the 1T hole like the 4G hole -- which also represents what
hardware closely does.

Thanks,
Joao

Older Changelog,

RFC[0] -> RFCv2[3]:

* At Igor's suggestion in one of the patches I reworked the series entirely,
and more or less as he was thinking it is far simpler to relocate the
ram-above-4g to be at 1TiB where applicable. The changeset is 3x simpler,
and less intrusive. (patch 1 & 2)
* Check phys-bits is big enough prior to relocating (new patch 3)
* Remove the machine property, and it's only internal and set by new machine
version (Igor, patch 4).
* Clarify whether it's GPA or HPA as a more clear meaning (Igor, patch 2)
* Add IOMMU SDM in the commit message (Igor, patch 2)

[0] 
https://lore.kernel.org/qemu-devel/20210622154905.30858-1-joao.m.mart...@oracle.com/
[1] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
[2] https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
[3]

[PATCH v3 1/6] hw/i386: add 4g boundary start to X86MachineState

2022-02-23 Thread Joao Martins

Rather than hardcoding the 4G boundary everywhere, introduce a
X86MachineState property @above_4g_mem_start and use it
accordingly.

This is in preparation for relocating ram-above-4g to
dynamically start at 1T on AMD platforms.

Signed-off-by: Joao Martins 
---
 hw/i386/acpi-build.c  | 2 +-
 hw/i386/pc.c  | 9 +
 hw/i386/sgx.c | 2 +-
 hw/i386/x86.c | 1 +
 include/hw/i386/x86.h | 3 +++
 5 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ebd47aa26fd8..4bf54ccdab91 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2063,7 +2063,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 build_srat_memory(table_data, mem_base, mem_len, i - 1,
   MEM_AFFINITY_ENABLED);
 }
-mem_base = 1ULL << 32;
+mem_base = x86ms->above_4g_mem_start;
 mem_len = next_base - x86ms->below_4g_mem_size;
 next_base = mem_base + mem_len;
 }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c8696ac01e85..7de0e87f4a3f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -837,9 +837,10 @@ void pc_memory_init(PCMachineState *pcms,
  machine->ram,
  x86ms->below_4g_mem_size,
  x86ms->above_4g_mem_size);
-memory_region_add_subregion(system_memory, 0x1ULL,
+memory_region_add_subregion(system_memory, x86ms->above_4g_mem_start,
 ram_above_4g);
-e820_add_entry(0x1ULL, x86ms->above_4g_mem_size, E820_RAM);
+e820_add_entry(x86ms->above_4g_mem_start, x86ms->above_4g_mem_size,
+   E820_RAM);
 }
 
 if (pcms->sgx_epc.size != 0) {
@@ -880,7 +881,7 @@ void pc_memory_init(PCMachineState *pcms,
 machine->device_memory->base = 
sgx_epc_above_4g_end(>sgx_epc);
 } else {
 machine->device_memory->base =
-0x1ULL + x86ms->above_4g_mem_size;
+x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 machine->device_memory->base =
@@ -972,7 +973,7 @@ uint64_t pc_pci_hole64_start(void)
 } else if (pcms->sgx_epc.size != 0) {
 hole64_start = sgx_epc_above_4g_end(>sgx_epc);
 } else {
-hole64_start = 0x1ULL + x86ms->above_4g_mem_size;
+hole64_start = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 return ROUND_UP(hole64_start, 1 * GiB);
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index a2b318dd9387..164ee1ddb8de 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -295,7 +295,7 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 return;
 }
 
-sgx_epc->base = 0x1ULL + x86ms->above_4g_mem_size;
+sgx_epc->base = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 
 memory_region_init(_epc->mr, OBJECT(pcms), "sgx-epc", UINT64_MAX);
 memory_region_add_subregion(get_system_memory(), sgx_epc->base,
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b84840a1bb99..912e96718ee8 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1319,6 +1319,7 @@ static void x86_machine_initfn(Object *obj)
 x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
 x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
 x86ms->bus_lock_ratelimit = 0;
+x86ms->above_4g_mem_start = 0x1ULL;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index a145a303703f..ec6ead296064 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -58,6 +58,9 @@ struct X86MachineState {
 /* RAM information (sizes, addresses, configuration): */
 ram_addr_t below_4g_mem_size, above_4g_mem_size;
 
+/* Start address of the initial RAM above 4G */
+ram_addr_t above_4g_mem_start;
+
 /* CPU and apic information: */
 bool apic_xrupt_override;
 unsigned pci_irq_mask;
-- 
2.17.2

[PATCH v3 4/6] i386/pc: relocate 4g start to 1T where applicable

2022-02-23 Thread Joao Martins

It is assumed that the whole GPA space is available to be DMA
addressable, within a given address space limit, except for a
tiny region before the 4G. Since Linux v5.4, VFIO validates
whether the selected GPA is indeed valid, i.e. not reserved by
the IOMMU on behalf of some specific devices or platform-defined
restrictions, and otherwise fails the ioctl(VFIO_DMA_MAP) with
-EINVAL.

AMD systems with an IOMMU are examples of such platforms and
particularly may only have these ranges as allowed:

0000000000000000 - 00000000fedfffff (0      .. 3.982G)
00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb[*])

We already account for the 4G hole, albeit if the guest is big
enough we will fail to allocate a guest with  >1010G due to the
~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).

[*] there is another reserved region unrelated to HT that exists
at the 256T boundary in Fam 17h according to Errata #1286,
documented also in "Open-Source Register Reference for AMD Family
17h Processors (PUB)"

When creating the region above 4G, take into account that on AMD
platforms the HyperTransport range is reserved and hence it
cannot be used either as GPAs. On those cases rather than
establishing the start of ram-above-4g to be 4G, relocate instead
to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical
Topology", for more information on the underlying restriction of
IOVAs.

After accounting for the 1Tb hole on AMD hosts, mtree should
look like:

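0000000000000000-000000007fffffff (prio 0, i/o):
    alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
0000010000000000-000001ff7fffffff (prio 0, i/o):
    alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff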

If the relocation is done, we also add the HT range to
e820 as reserved.
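
The relocation rule described in this commit message can be modelled in a few lines of Python. This is a sketch for illustration only, not the C code of the patch: the HT range constants are the values from the AMD IOMMU spec as understood here, the function names are hypothetical, and the host vendor check is reduced to a boolean argument.

```python
GIB = 1 << 30

# HyperTransport reserved range (assumed values, per the AMD IOMMU spec).
AMD_HT_START = 0xFD_0000_0000
AMD_HT_END = 0xFF_FFFF_FFFF
AMD_ABOVE_1TB_START = AMD_HT_END + 1


def round_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return -(-value // align) * align


def max_used_addr(above_4g_start, above_4g_size, sgx_epc_size=0,
                  device_mem_size=0, pci_hole64_size=0):
    """Upper bound of the guest physical layout above 4G."""
    base = round_up(above_4g_start + above_4g_size + sgx_epc_size, GIB)
    return base + device_mem_size + pci_hole64_size


def pick_above_4g_start(is_amd_host, above_4g_size, **extra):
    """Keep the 4 GiB start unless an AMD host layout reaches the HT range."""
    default_start = 4 * GIB
    if not is_amd_host:
        return default_start
    if max_used_addr(default_start, above_4g_size, **extra) < AMD_HT_START:
        return default_start
    return AMD_ABOVE_1TB_START
```

Note that the pci-hole64 size alone can push a smaller guest over the threshold, which is why the earlier patches make that value available before memory init.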

Suggested-by: Igor Mammedov 
Signed-off-by: Joao Martins 
---
 hw/i386/pc.c  | 79 +++
 target/i386/cpu.h |  4 +++
 2 files changed, 83 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 360f4e10001b..6e4f5c87a2e5 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -802,6 +802,78 @@ void xen_load_linux(PCMachineState *pcms)
 #define PC_ROM_ALIGN   0x800
 #define PC_ROM_SIZE(PC_ROM_MAX - PC_ROM_MIN_VGA)
 
+/*
+ * AMD systems with an IOMMU have an additional hole close to the
+ * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
+ * on kernel version, VFIO may or may not let you DMA map those ranges.
+ * Starting Linux v5.4 we validate it, and can't create guests on AMD machines
+ * with certain memory sizes. It's also wrong to use those IOVA ranges
+ * in detriment of leading to IOMMU INVALID_DEVICE_REQUEST or worse.
+ * The ranges reserved for Hyper-Transport are:
+ *
+ * FD__h - FF__h
+ *
+ * The ranges represent the following:
+ *
+ * Base Address   Top Address  Use
+ *
+ * FD__h FD_F7FF_h Reserved interrupt address space
+ * FD_F800_h FD_F8FF_h Interrupt/EOI IntCtl
+ * FD_F900_h FD_F90F_h Legacy PIC IACK
+ * FD_F910_h FD_F91F_h System Management
+ * FD_F920_h FD_FAFF_h Reserved Page Tables
+ * FD_FB00_h FD_FBFF_h Address Translation
+ * FD_FC00_h FD_FDFF_h I/O Space
+ * FD_FE00_h FD__h Configuration
+ * FE__h FE_1FFF_h Extended Configuration/Device Messages
+ * FE_2000_h FF__h Reserved
+ *
+ * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
+ * Table 3: Special Address Controls (GPA) for more information.
+ */
+#define AMD_HT_START 0xfdUL
+#define AMD_HT_END   0xffUL
+#define AMD_ABOVE_1TB_START  (AMD_HT_END + 1)
+#define AMD_HT_SIZE  (AMD_ABOVE_1TB_START - AMD_HT_START)
+
+static hwaddr x86_max_phys_addr(PCMachineState *pcms,
+uint64_t pci_hole64_size)
+{
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+X86MachineState *x86ms = X86_MACHINE(pcms);
+MachineState *machine = MACHINE(pcms);
+ram_addr_t device_mem_size = 0;
+hwaddr base;
+
+if (pcmc->has_reserved_memory &&
+   (machine->ram_size < machine->maxram_size)) {
+device_mem_size = machine->maxram_size - machine->ram_size;
+}
+
+base = ROUND_UP(x86ms->above_4g_mem_start + x86ms->above_4g_mem_size +
+pcms->sgx_epc.size, 1 * GiB);
+
+return base + device_mem_size + pci_hole64_size;
+}
+
+static void x86_update_above_4g_mem_start(PCMachineState *pcms,
+  uint64_t pci_hole64_size)
+{
+X86MachineState *x86ms = X86_MACHINE(pcms);
+uint32_t eax, vendor[3];
+
+host_cpuid(0x0, 0, , [0], [2], [1]);
+if (!IS_AMD_VENDOR(vendor)) {
+return;
+}
+
+if (x86_max_phys_addr(pcms, pci_hole64_size) < AMD_HT_START) {
+return;
+}
+
+x86ms->above_4g_mem_start = AMD_ABOVE_1TB_START;
+}
+
 void

Re: [PATCH] qapi: fix mistake in example command illustration

2022-02-23 Thread Dr. David Alan Gilbert

* Daniel P. Berrangé (berra...@redhat.com) wrote:
> The snapshot-load/save/delete commands illustrated their usage, but
> mistakenly used 'data' rather than 'arguments' as the field name.
> 
> Signed-off-by: Daniel P. Berrangé 

Fabian Holler's patch from yesterday beat you to it slightly;
I think Markus has it queued.

(20220222170116.63105-1-fabian.hol...@simplesurance.de )
> ---
>  qapi/migration.json | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5975a0e104..1c6296897d 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1888,7 +1888,7 @@
>  # Example:
>  #
>  # -> { "execute": "snapshot-save",
> -#  "data": {
> +#  "arguments": {
>  # "job-id": "snapsave0",
>  # "tag": "my-snap",
>  # "vmstate": "disk0",
> @@ -1949,7 +1949,7 @@
>  # Example:
>  #
>  # -> { "execute": "snapshot-load",
> -#  "data": {
> +#  "arguments": {
>  # "job-id": "snapload0",
>  # "tag": "my-snap",
>  # "vmstate": "disk0",
> @@ -2002,7 +2002,7 @@
>  # Example:
>  #
>  # -> { "execute": "snapshot-delete",
> -#  "data": {
> +#  "arguments": {
>  # "job-id": "snapdelete0",
>  # "tag": "my-snap",
>  # "devices": ["disk0", "disk1"]
> -- 
> 2.34.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v4 12/12] KVM: Expose KVM_MEM_PRIVATE

2022-02-23 Thread Maciej S. Szmigiero


On 23.02.2022 13:00, Chao Peng wrote:

On Tue, Feb 22, 2022 at 02:16:46AM +0100, Maciej S. Szmigiero wrote:

On 17.02.2022 14:45, Chao Peng wrote:

On Tue, Jan 25, 2022 at 09:20:39PM +0100, Maciej S. Szmigiero wrote:

On 18.01.2022 14:21, Chao Peng wrote:

KVM_MEM_PRIVATE is not exposed by default but architecture code can turn
on it by implementing kvm_arch_private_memory_supported().

Also private memslot cannot be movable and the same file+offset can not
be mapped into different GFNs.

Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---

(..)

static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
- gfn_t start, gfn_t end)
+ struct file *file,
+ gfn_t start, gfn_t end,
+ loff_t start_off, loff_t end_off)
{
struct kvm_memslot_iter iter;
+   struct kvm_memory_slot *slot;
+   struct inode *inode;
+   int bkt;
kvm_for_each_memslot_in_gfn_range(, slots, start, end) {
if (iter.slot->id != id)
return true;
}
+   /* Disallow mapping the same file+offset into multiple gfns. */
+   if (file) {
+   inode = file_inode(file);
+   kvm_for_each_memslot(slot, bkt, slots) {
+   if (slot->private_file &&
+file_inode(slot->private_file) == inode &&
+!(end_off <= slot->private_offset ||
+  start_off >= slot->private_offset
++ (slot->npages >> PAGE_SHIFT)))
+   return true;
+   }
+   }


That's a linear scan of all memslots on each CREATE (and MOVE) operation
with a fd - we just spent more than a year rewriting similar linear scans
into more efficient operations in KVM.



(..)

So linear scan is used before I can find a better way.


Another option would be to simply not check for overlap at add or move
time, declare such configuration undefined behavior under KVM API and
make sure in MMU notifiers that nothing bad happens to the host kernel
if it turns out somebody actually set up a VM this way (it could be
inefficient in this case, since it's not supposed to ever happen
unless there is a bug somewhere in the userspace part).


Specific to the TDX case, SEAMMODULE will fail the overlapping case and then
KVM prints a message to the kernel log. It will not cause any other side
effect, though it does look weird. Yes, warning about that in the API
documentation can help to some extent.


So for the functionality you are adding this code for (TDX) this scan
isn't necessary and the overlapping case (not supported anyway) is safely
handled by the hardware (or firmware)?
Then I would simply remove the scan and, maybe, add a comment instead
that the overlap check is done by the hardware.

By the way, if a kernel log message could be triggered by (misbehaving)
userspace then it should be rate limited (if it isn't already).


Thanks,
Chao


Thanks,
Maciej

Re: [PATCH 4/5] python: qmp_shell: add -e/--exit-on-error option

2022-02-23 Thread John Snow

On Wed, Feb 23, 2022 at 12:09 PM Damien Hedde
 wrote:
>
>
>
> On 2/23/22 17:18, John Snow wrote:
> > On Wed, Feb 23, 2022 at 10:44 AM Daniel P. Berrangé  
> > wrote:
> >>
> >> On Wed, Feb 23, 2022 at 10:41:11AM -0500, John Snow wrote:
> >>> On Wed, Feb 23, 2022 at 10:27 AM Daniel P. Berrangé  
> >>> wrote:
> 
>  On Wed, Feb 23, 2022 at 10:22:11AM -0500, John Snow wrote:
> > On Mon, Feb 21, 2022 at 10:55 AM Damien Hedde
> >  wrote:
> >>
> >> This option makes qmp_shell exit (with error code 1)
> >> as soon as one of the following errors occurs:
> >> + command parsing error
> >> + disconnection
> >> + command failure (response is an error)
> >>
> >> _execute_cmd() method now returns None or the response
> >> so that read_exec_command() can do the last check.
> >>
> >> This is meant to be used in combination with an input file
> >> redirection. It allows storing a list of commands
> >> in a file, running them through qmp_shell, and easily
> >> seeing whether they failed or not.
> >>
> >> Signed-off-by: Damien Hedde 
> >
> > Based on this patch, it looks like you really want something
> > scriptable, so I think the qemu-send idea that Dan has suggested might
> > be the best way to go. Are you still hoping to use the interactive
> > "short" QMP command format? That might be a bad idea, given how flaky
> > the parsing is -- and how we don't actually have a published standard
> > for that format. We've *never* liked the bad parsing here, so I have a
> > reluctance to use it in more places.
> >
> > I'm having the naive idea that a script file could be as simple as a
> > list of QMP commands to send:
> >
> > [
> >  {"execute": "block-dirty-bitmap-add", "arguments": { ... }},
> >  ...
> > ]
> 
>  I'd really recommend against creating a new format for the script
>  file, especially one needing opening & closing  [] like this, as
>  that isn't so amenable to dynamic usage/creation. ie you can't
>  just append an extra command to an existing file.
> 
>  IMHO, the "file" format should be identical to the result of
>  capturing the socket data off the wire. ie just a concatenation
>  of QMP commands, with no extra wrapping / change in format.
> 
> >>>
> >>> Eugh. That's just so hard to parse, because there's no off-the-shelf
> >>> tooling for "load a sequence of JSON documents". Nothing in Python
> >>> does it. :\
> >>
> >> It isn't that hard if you require each JSON doc to be followed by
> >> a newline.
> >>
> >> Feed one line at a time to the JSON parser, until you get a complete
> >> JSON doc, process that, then re-init the parser and carry on feeding
> >> it lines until it emits the next JSON doc, and so on.
> >>
> >
> > There's two interfaces in Python:
> >
> > (1) json.load(), which takes a file pointer and either returns a
> > single, complete JSON document or it raises an Exception. It's not
> > useful here at all.
> > (2) json.JSONDecoder().raw_decode(strbuf), which takes a string buffer
> > and returns a 2-tuple of a JSON Document and the position at which it
> > stopped decoding.
> >
> > The second is what we need here, but it does require buffering the
> > entire file into a string first, and then iteratively calling it. It
> > feels like working against the grain a little bit. We also can't use
> > the QAPI parser, as that parser has intentionally removed support for
> > constructs we don't use in the qapi schema language. Boo. (Not that I
> > want more non-standard configuration files like that propagating,
> > either.)
> >
> > It would be possible to generate a JSON-Schema document to describe a
> > script file that used a containing list construct, but impossible for
> > a concatenation of JSON documents. This is one of the reasons I
> > instinctively shy away from non-standard file formats, they tend to
> > cut off support for this sort of thing.
> >
> > Wanting to keep the script easy to append to is legitimate. I'm keen
> > to hear a bit more about the use case here before I press extremely
> > hard in any given direction, but those are my impulses here.
> >
>
> The use case is to be able to feed qemu with a bunch of commands we
> expect to succeed and let qemu continue (unlike Daniel's wrap use case,
> we don't want to quit qemu after the last command).
>
> Typically it's the use case I present in the following cover-letter:
> https://lore.kernel.org/qemu-devel/20220223090706.4888-1-damien.he...@greensocs.com/
>

OK (Sorry for blowing this out into a bigger ordeal than you had maybe
hoped for. I want to get you happy and on your way ASAP, I promise)

So, I see comments and simple QMP commands using the short-hand
format. If I understand correctly, you want this script upstream so
that you don't have to re-engineer the hack every time I shift
something around in qmp-shell, and so that examples can be easily
shared and

Re: [PATCH v3] docs/system/i386: Add measurement calculation details to amd-memory-encryption

2022-02-23 Thread Dr. David Alan Gilbert

* Dov Murik (dovmu...@linux.ibm.com) wrote:
> Add a section explaining how the Guest Owner should calculate the
> expected guest launch measurement for SEV and SEV-ES.
> 
> Also update the name and links to the SEV API Spec document.
> 
> Signed-off-by: Dov Murik 
> Suggested-by: Daniel P. Berrangé 
> 

Thanks; my guess is we're going to need to document the expected VMSA
values at some point.

Reviewed-by: Dr. David Alan Gilbert 

> ---
> 
> v2:
> - Explain that firmware must be built without NVRAM store.
> 
> v3:
> - rstify
> ---
>  docs/system/i386/amd-memory-encryption.rst | 54 --
>  1 file changed, 50 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/system/i386/amd-memory-encryption.rst 
> b/docs/system/i386/amd-memory-encryption.rst
> index 215946f813..dcf4add0e7 100644
> --- a/docs/system/i386/amd-memory-encryption.rst
> +++ b/docs/system/i386/amd-memory-encryption.rst
> @@ -47,7 +47,7 @@ The guest policy is passed as plaintext. A hypervisor may 
> choose to read it,
>  but should not modify it (any modification of the policy bits will result
>  in bad measurement). The guest policy is a 4-byte data structure containing
>  several flags that restricts what can be done on a running SEV guest.
> -See KM Spec section 3 and 6.2 for more details.
> +See SEV API Spec ([SEVAPI]_) section 3 and 6.2 for more details.
>  
>  The guest policy can be provided via the ``policy`` property::
>  
> @@ -92,7 +92,7 @@ expects.
>  ``LAUNCH_FINISH`` finalizes the guest launch and destroys the cryptographic
>  context.
>  
> -See SEV KM API Spec ([SEVKM]_) 'Launching a guest' usage flow (Appendix A) 
> for the
> +See SEV API Spec ([SEVAPI]_) 'Launching a guest' usage flow (Appendix A) for 
> the
>  complete flow chart.
>  
>  To launch a SEV guest::
> @@ -118,6 +118,49 @@ a SEV-ES guest:
>   - Requires in-kernel irqchip - the burden is placed on the hypervisor to
> manage booting APs.
>  
> +Calculating expected guest launch measurement
> +---------------------------------------------
> +
> +In order to verify the guest launch measurement, The Guest Owner must compute
> +it in the exact same way as it is calculated by the AMD-SP.  SEV API Spec
> +([SEVAPI]_) section 6.5.1 describes the AMD-SP operations:
> +
> +GCTX.LD is finalized, producing the hash digest of all plaintext data
> +imported into the guest.
> +
> +The launch measurement is calculated as:
> +
> +HMAC(0x04 || API_MAJOR || API_MINOR || BUILD || GCTX.POLICY || GCTX.LD || MNONCE; GCTX.TIK)
> +
> +where "||" represents concatenation.
> +
> +The values of API_MAJOR, API_MINOR, BUILD, and GCTX.POLICY can be obtained
> +from the ``query-sev`` qmp command.
> +
> +The value of MNONCE is part of the response of ``query-sev-launch-measure``: 
> it
> +is the last 16 bytes of the base64-decoded data field (see SEV API Spec
> +([SEVAPI]_) section 6.5.2 Table 52: LAUNCH_MEASURE Measurement Buffer).
> +
> +The value of GCTX.LD is
> +``SHA256(firmware_blob || kernel_hashes_blob || vmsas_blob)``, where:
> +
> +* ``firmware_blob`` is the content of the entire firmware flash file (for
> +  example, ``OVMF.fd``).  Note that you must build a stateless firmware file
> +  which doesn't use an NVRAM store, because the NVRAM area is not measured, 
> and
> +  therefore it is not secure to use a firmware which uses state from an NVRAM
> +  store.
> +* if kernel is used, and ``kernel-hashes=on``, then ``kernel_hashes_blob`` is
> +  the content of PaddedSevHashTable (including the zero padding), which 
> itself
> +  includes the hashes of kernel, initrd, and cmdline that are passed to the
> +  guest.  The PaddedSevHashTable struct is defined in ``target/i386/sev.c``.
> +* if SEV-ES is enabled (``policy & 0x4 != 0``), ``vmsas_blob`` is the
> +  concatenation of all VMSAs of the guest vcpus.  Each VMSA is 4096 bytes 
> long;
> +  its content is defined inside Linux kernel code as ``struct 
> vmcb_save_area``,
> +  or in AMD APM Volume 2 ([APMVOL2]_) Table B-2: VMCB Layout, State Save 
> Area.
> +
> +If kernel hashes are not used, or SEV-ES is disabled, use empty blobs for
> +``kernel_hashes_blob`` and ``vmsas_blob`` as needed.
> +
>  Debugging
>  -
>  
> @@ -142,8 +185,11 @@ References
>  `AMD Memory Encryption whitepaper
>  
> `_
>  
> -.. [SEVKM] `Secure Encrypted Virtualization Key Management
> -   
> `_
> +.. [SEVAPI] `Secure Encrypted Virtualization API
> +   
> `_
> +
> +.. [APMVOL2] `AMD64 Architecture Programmer's Manual Volume 2: System 
> Programming
> +   `_
>  
>  KVM Forum slides:
>  
> 
> base-commit: c13b8e9973635f34f3ce4356af27a311c993729c
> -- 
> 2.25.1
> 
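For anyone wanting to try the calculation described in the patch, here is a rough Python sketch of it. It is not part of the patch. The function names are made up for illustration, and the field widths (one byte each for API_MAJOR, API_MINOR and BUILD, a 4-byte little-endian GCTX.POLICY, HMAC-SHA-256 keyed with GCTX.TIK) are my reading of the SEV API spec, so please check them against the spec before relying on this.

```python
import base64
import hashlib
import hmac


def launch_digest(firmware_blob, kernel_hashes_blob=b"", vmsas_blob=b""):
    """GCTX.LD as described in the patch: SHA256 over the concatenated blobs.

    Pass empty blobs when kernel hashes are unused or SEV-ES is disabled.
    """
    return hashlib.sha256(firmware_blob + kernel_hashes_blob + vmsas_blob).digest()


def expected_measurement(api_major, api_minor, build, policy, ld, mnonce, tik):
    """HMAC(0x04 || API_MAJOR || API_MINOR || BUILD || POLICY || LD || MNONCE; TIK).

    Assumption: single-byte version fields and a 4-byte little-endian policy.
    """
    msg = (bytes([0x04, api_major, api_minor, build])
           + policy.to_bytes(4, "little")
           + ld
           + mnonce)
    return hmac.new(tik, msg, hashlib.sha256).digest()


def split_measure_data(data_b64):
    """Split the base64 'data' field of query-sev-launch-measure.

    Per the patch text, MNONCE is the last 16 bytes; the rest is the measurement.
    """
    raw = base64.b64decode(data_b64)
    return raw[:-16], raw[-16:]
```

The Guest Owner would compare the first value returned by split_measure_data() with the result of expected_measurement(), ideally using hmac.compare_digest().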
-- 
Dr. David Alan Gilbert /

[PATCH] qapi: fix mistake in example command illustration

2022-02-23 Thread Daniel P . Berrangé

The snapshot-load/save/delete commands illustrated their usage, but
mistakenly used 'data' rather than 'arguments' as the field name.

Signed-off-by: Daniel P. Berrangé 
---
 qapi/migration.json | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 5975a0e104..1c6296897d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1888,7 +1888,7 @@
 # Example:
 #
 # -> { "execute": "snapshot-save",
-#  "data": {
+#  "arguments": {
 # "job-id": "snapsave0",
 # "tag": "my-snap",
 # "vmstate": "disk0",
@@ -1949,7 +1949,7 @@
 # Example:
 #
 # -> { "execute": "snapshot-load",
-#  "data": {
+#  "arguments": {
 # "job-id": "snapload0",
 # "tag": "my-snap",
 # "vmstate": "disk0",
@@ -2002,7 +2002,7 @@
 # Example:
 #
 # -> { "execute": "snapshot-delete",
-#  "data": {
+#  "arguments": {
 # "job-id": "snapdelete0",
 # "tag": "my-snap",
 # "devices": ["disk0", "disk1"]
-- 
2.34.1

Re: [PATCH RFC 4/4] rtc: Have event RTC_CHANGE identify the RTC by QOM path

2022-02-23 Thread Cédric Le Goater


On 2/22/22 14:06, Peter Maydell wrote:

On Tue, 22 Feb 2022 at 12:56, Philippe Mathieu-Daudé
 wrote:

On 22/2/22 13:02, Markus Armbruster wrote:

Event RTC_CHANGE is "emitted when the guest changes the RTC time" (and
the RTC supports the event).  What if there's more than one RTC?


w.r.t. RTC, a machine having multiple RTC devices is silly...


I don't think we have any examples in the tree currently, but
I bet real hardware like that does exist: the most plausible
thing would be a board where there's an RTC built into the SoC
but the board designers put an external RTC on the board (perhaps
because it was better/more accurate/easier to make battery-backed).


Yes. like Aspeed machines.

C.




In fact, here's an old bug report from a user trying to get
their Debian system to use the battery-backed RTC as the
"real" one rather than the non-battery-backed RTC device
that's also part of the arm board they're using:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785445

-- PMM

Re: [PATCH 4/5] python: qmp_shell: add -e/--exit-on-error option

2022-02-23 Thread Daniel P . Berrangé

On Wed, Feb 23, 2022 at 11:18:26AM -0500, John Snow wrote:
> On Wed, Feb 23, 2022 at 10:44 AM Daniel P. Berrangé  
> wrote:
> >
> > On Wed, Feb 23, 2022 at 10:41:11AM -0500, John Snow wrote:
> > > On Wed, Feb 23, 2022 at 10:27 AM Daniel P. Berrangé  
> > > wrote:
> > > >
> > > > On Wed, Feb 23, 2022 at 10:22:11AM -0500, John Snow wrote:
> > > > > On Mon, Feb 21, 2022 at 10:55 AM Damien Hedde
> > > > >  wrote:
> > > > > >
> > > > > > This option makes qmp_shell exit (with error code 1)
> > > > > > as soon as one of the following errors occurs:
> > > > > > + command parsing error
> > > > > > + disconnection
> > > > > > + command failure (response is an error)
> > > > > >
> > > > > > _execute_cmd() method now returns None or the response
> > > > > > so that read_exec_command() can do the last check.
> > > > > >
> > > > > > This is meant to be used in combination with an input file
> > > > > > redirection. It allows to store a list of commands
> > > > > > into a file and try to run them by qmp_shell and easily
> > > > > > see if it failed or not.
> > > > > >
> > > > > > Signed-off-by: Damien Hedde 
> > > > >
> > > > > Based on this patch, it looks like you really want something
> > > > > scriptable, so I think the qemu-send idea that Dan has suggested might
> > > > > be the best way to go. Are you still hoping to use the interactive
> > > > > "short" QMP command format? That might be a bad idea, given how flaky
> > > > > the parsing is -- and how we don't actually have a published standard
> > > > > for that format. We've *never* liked the bad parsing here, so I have a
> > > > > reluctance to use it in more places.
> > > > >
> > > > > I'm having the naive idea that a script file could be as simple as a
> > > > > list of QMP commands to send:
> > > > >
> > > > > [
> > > > > {"execute": "block-dirty-bitmap-add", "arguments": { ... }},
> > > > > ...
> > > > > ]
> > > >
> > > > I'd really recommend against creating a new format for the script
> > > > file, especially one needing opening & closing  [] like this, as
> > > > that isn't so amenable to dynamic usage/creation. ie you can't
> > > > just append an extra command to an existing file.
> > > >
> > > > IMHO, the "file" format should be identical to the result of
> > > > capturing the socket data off the wire. ie just a concatenation
> > > > of QMP commands, with no extra wrapping / change in format.
> > > >
> > >
> > > Eugh. That's just so hard to parse, because there's no off-the-shelf
> > > tooling for "load a sequence of JSON documents". Nothing in Python
> > > does it. :\
> >
> > It isn't that hard if you require each JSON doc to be followed by
> > a newline.
> >
> > Feed one line at a time to the JSON parser, until you get a complete
> > JSON doc, process that, then re-init the parser and carry on feeding
> > it lines until it emits the next JSON doc, and so on.
> >
> 
> There's two interfaces in Python:
> 
> (1) json.load(), which takes a file pointer and either returns a
> single, complete JSON document or it raises an Exception. It's not
> useful here at all.
> (2) json.JSONDecoder().raw_decode(strbuf), which takes a string buffer
> and returns a 2-tuple of a JSON Document and the position at which it
> stopped decoding.

Yes, the latter would do it, but you can also be lazy and just
repeatedly call json.loads() until you get a valid parse

$ cat demo.py
import json

cmds = []
bits = []
with open("qmp.txt", "r") as fh:
    for line in fh:
        # Accumulate lines until they form one complete JSON document
        bits.append(line)
        try:
            cmdstr = "".join(bits)
            cmds.append(json.loads(cmdstr))
            bits = []
        except json.JSONDecodeError:
            # Incomplete document so far, keep reading
            pass


for cmd in cmds:
    print("Command: %s" % cmd)


$ cat qmp.txt
{ "execute": "qmp_capabilities" }
{ "execute": "blockdev-add",
  "arguments": {
    "node-name": "drive0",
    "driver": "file",
    "filename": "$TEST_IMG"
  }
}
{ "execute": "blockdev-add",
  "arguments": {
    "driver": "$IMGFMT",
    "node-name": "drive0-debug",
    "file": {
      "driver": "blkdebug",
      "image": "drive0",
      "inject-error": [{
        "event": "l2_load"
      }]
    }
  }
}
{ "execute": "human-monitor-command",
  "arguments": {
    "command-line": "qemu-io drive0-debug \"read 0 512\""
  }
}
{ "execute": "quit" }


$ python demo.py
Command: {'execute': 'qmp_capabilities'}
Command: {'execute': 'blockdev-add', 'arguments': {'node-name': 'drive0', 
'driver': 'file', 'filename': '$TEST_IMG'}}
Command: {'execute': 'blockdev-add', 'arguments': {'driver': '$IMGFMT', 
'node-name': 'drive0-debug', 'file': {'driver': 'blkdebug', 'image': 'drive0', 
'inject-error': [{'event': 'l2_load'}]}}}
Command: {'execute': 'human-monitor-command', 'arguments': {'command-line': 
'qemu-io drive0-debug "read 0 512"'}}
Command: {'execute': 'quit'}
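
For comparison, the raw_decode() route can be sketched roughly as below. This is only an illustration: iter_json_docs is a made-up helper name, and it assumes the whole script fits in memory as a single string. Note that raw_decode() does not skip leading whitespace on its own, so the helper does that by hand between documents.

```python
import json


def iter_json_docs(text):
    """Yield each JSON document from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode() rejects leading whitespace, so skip it ourselves
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the decoded document and the index where decoding stopped
        doc, pos = decoder.raw_decode(text, pos)
        yield doc
```

Unlike the line-by-line loop, this raises json.JSONDecodeError at the first malformed document instead of silently buffering until end of file.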


> Wanting to keep the script easy to append to is legitimate. I'm keen
> to hear a bit more about the use case here before I press

Re: Fix a potential Use-after-free in virtio_iommu_handle_command() (v6.2.0).

2022-02-23 Thread Eric Auger

Hi,

On 2/23/22 5:02 PM, Thomas Huth wrote:
> On 23/02/2022 15.36, wli...@stu.xidian.edu.cn wrote:
>> Hi all,
>>
>> I find a potential Use-after-free in QEMU 6.2.0, which is in
>> virtio_iommu_handle_command() (./hw/virtio/virtio-iommu.c).
>>
>> Specifically, in the loop body, the variable 'buf' allocated at line
>> 639 can be freed by g_free() at line 659. However, if the execution
>> path enters the loop body again and the if branch takes true at line
>> 616, the control will directly jump to 'out' at line 651. At this
>> time, 'buf' is a freed pointer, which is not assigned with an
>> allocated memory but used at line 653. As a result, a UAF bug is
>> triggered.
>>
>>
>>
>> 599    for (;;) {
>> ...
>> 615        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
>> 616        if (unlikely(sz != sizeof(head))) {
>> 617            tail.status = VIRTIO_IOMMU_S_DEVERR;
>> 618            goto out;
>> 619        }
>> ...
>> 639            buf = g_malloc0(output_size);
>> ...
>> 651out:
>> 652        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
>> 653                          buf ? buf : &tail, output_size);
>> ...
>> 659        g_free(buf);
>> 660    }
>>
>>
>> We can fix it by setting 'buf' to NULL after freeing it:
>>
>>
>>
>> 651out:
>> 652        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
>> 653                          buf ? buf : &tail, output_size);
>> ...
>> 659        g_free(buf);
>> +++buf = NULL;
>> 660    }
>>
>>
>> I'm looking forward to your confirmation.
Thank you for the report. Yes, setting buf to NULL after the g_free()
looks like the right thing to do here. Please feel free to send the patch.
>
>  Hi,
>
> thanks for your report and patch - but to make sure that the right
> people get attention, please use the scripts/get_maintainer.pl script
> to get a list of people who should be on CC:, or look into the
> MAINTAINERS file directly (for the next time - this time, I've CC:ed
> them now already).
Thank you, Thomas, for the cc ;-)

Eric
>
>  Thanks,
>   Thomas
>

Re: [PATCH RFCv2 3/4] i386/pc: warn if phys-bits is too low

2022-02-23 Thread Joao Martins

On 2/14/22 15:18, Joao Martins wrote:
> On 2/14/22 15:03, Igor Mammedov wrote:
>> On Mon,  7 Feb 2022 20:24:21 +
>> Joao Martins  wrote:
>>
>>> Default phys-bits on Qemu is TCG_PHYS_BITS (40) which is enough
>>> to address 1Tb (0xff ffff ffff). On AMD platforms, if a
>>> ram-above-4g relocation happens and the CPU wasn't configured
>>> with a big enough phys-bits, warn the user. There isn't a
>>> catastrophic failure exactly, the guest will still boot, but
>>> most likely won't be able to use more than ~4G of RAM.
>>
>> how 'unable to use" would manifest?
>> It might be better to prevent QEMU startup with broken setup (CLI)
>> rather then letting guest run and trying to figure out what's
>> going wrong when users start to complain. 
>>
> Sounds better to be conservative here.
> 
> I will change from warn_report() to error_report()
> and exit.
> 

I was running through x86_64 qtests prior to submission
and it seems that the inclusion of a pci_hole64_size in
the check added by this patch would break tests if we were
to error out. So far, I'm keeping it as a warning over
compatibility concerns, not limited to these 5 test failures
below. Let me know otherwise if you disagree, or if you
prefer another way.

Summary of Failures:

 1/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/qom-test                ERROR  0.07s  killed by signal 6 SIGABRT
 4/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/test-hmp                ERROR  0.07s  killed by signal 6 SIGABRT
 7/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/boot-serial-test        ERROR  0.07s  killed by signal 6 SIGABRT
44/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/test-x86-cpuid-compat   ERROR  0.09s  killed by signal 6 SIGABRT
45/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/numa-test               ERROR  0.17s  killed by signal 6 SIGABRT

Re: [PATCH 4/5] python: qmp_shell: add -e/--exit-on-error option

2022-02-23 Thread Damien Hedde





On 2/23/22 17:18, John Snow wrote:

On Wed, Feb 23, 2022 at 10:44 AM Daniel P. Berrangé  wrote:


On Wed, Feb 23, 2022 at 10:41:11AM -0500, John Snow wrote:

On Wed, Feb 23, 2022 at 10:27 AM Daniel P. Berrangé  wrote:


On Wed, Feb 23, 2022 at 10:22:11AM -0500, John Snow wrote:

On Mon, Feb 21, 2022 at 10:55 AM Damien Hedde
 wrote:


This option makes qmp_shell exit (with error code 1)
as soon as one of the following errors occurs:
+ command parsing error
+ disconnection
+ command failure (response is an error)

_execute_cmd() method now returns None or the response
so that read_exec_command() can do the last check.

This is meant to be used in combination with an input file
redirection. It allows to store a list of commands
into a file and try to run them by qmp_shell and easily
see if it failed or not.

Signed-off-by: Damien Hedde 


Based on this patch, it looks like you really want something
scriptable, so I think the qemu-send idea that Dan has suggested might
be the best way to go. Are you still hoping to use the interactive
"short" QMP command format? That might be a bad idea, given how flaky
the parsing is -- and how we don't actually have a published standard
for that format. We've *never* liked the bad parsing here, so I have a
reluctance to use it in more places.

I'm having the naive idea that a script file could be as simple as a
list of QMP commands to send:

[
 {"execute": "block-dirty-bitmap-add", "arguments": { ... }},
 ...
]


I'd really recommend against creating a new format for the script
file, especially one needing opening & closing  [] like this, as
that isn't so amenable to dynamic usage/creation. ie you can't
just append an extra command to an existing file.

IMHO, the "file" format should be identical to the result of
capturing the socket data off the wire. ie just a concatenation
of QMP commands, with no extra wrapping / change in format.



Eugh. That's just so hard to parse, because there's no off-the-shelf
tooling for "load a sequence of JSON documents". Nothing in Python
does it. :\


It isn't that hard if you require each JSON doc to be followed by
a newline.

Feed one line at a time to the JSON parser, until you get a complete
JSON doc, process that, then re-init the parser and carry on feeding
it lines until it emits the next JSON doc, and so on.



There's two interfaces in Python:

(1) json.load(), which takes a file pointer and either returns a
single, complete JSON document or it raises an Exception. It's not
useful here at all.
(2) json.JSONDecoder().raw_decode(strbuf), which takes a string buffer
and returns a 2-tuple of a JSON Document and the position at which it
stopped decoding.

The second is what we need here, but it does require buffering the
entire file into a string first, and then iteratively calling it. It
feels like working against the grain a little bit. We also can't use
the QAPI parser, as that parser has intentionally removed support for
constructs we don't use in the qapi schema language. Boo. (Not that I
want more non-standard configuration files like that propagating,
either.)

It would be possible to generate a JSON-Schema document to describe a
script file that used a containing list construct, but impossible for
a concatenation of JSON documents. This is one of the reasons I
instinctively shy away from non-standard file formats, they tend to
cut off support for this sort of thing.

Wanting to keep the script easy to append to is legitimate. I'm keen
to hear a bit more about the use case here before I press extremely
hard in any given direction, but those are my impulses here.



The use case is to be able to feed qemu with a bunch of commands we 
expect to succeed and let qemu continue (unlike Daniel's wrap use case, 
we don't want to quit qemu after the last command).
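
To make the intent concrete, here is a minimal sketch of the behaviour we are after. It is not the patch itself: run_commands and the send callable are hypothetical names, with send standing in for whatever transport delivers one command and returns the parsed response. The only thing it relies on from QMP is that a failed command yields a response with a top-level "error" member.

```python
def run_commands(commands, send):
    """Send commands in order and stop at the first failure.

    'send' is a hypothetical callable: it takes one command dict and
    returns the parsed QMP response dict, or raises on disconnection.
    Returns None on success, else (index, error details).
    """
    for index, cmd in enumerate(commands):
        try:
            reply = send(cmd)
        except (ConnectionError, EOFError) as exc:
            return index, {"class": "Disconnected", "desc": str(exc)}
        if "error" in reply:
            # Stop immediately; later commands are not sent
            return index, reply["error"]
    return None
```

A wrapper script would exit with status 1 when the result is not None and simply disconnect otherwise, leaving qemu running.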


Typically it's the use case I present in the following cover-letter:
https://lore.kernel.org/qemu-devel/20220223090706.4888-1-damien.he...@greensocs.com/

--
Damien

Re: Adding a handshake to qemu-guest-agent

2022-02-23 Thread John Snow

On Wed, Feb 16, 2022 at 3:52 PM Michael Roth  wrote:
>
> On Wed, Feb 16, 2022 at 10:12:36AM +0100, Markus Armbruster wrote:
> > Michael Roth  writes:
> >
> > > On Mon, Feb 14, 2022 at 03:14:37PM +0100, Markus Armbruster wrote:
> > >> Cc: the qemu-ga maintainer
> > >>
> > >> John Snow  writes:
> > >>
> > >> > [Moving our discussion upstream, because it stopped being brief and 
> > >> > simple.]
> > >
> > > Hi John, Markus,
> > >
> > >>
> > >> Motivation: qemu-ga doesn't do capability negotiation as specified in
> > >> docs/interop/qmp-spec.txt.
> > >>
> > >> Reminder: qmp-spec.txt specifies the server shall send a greeting
> > >> containing the capabilities on offer.  The client shall send a
> > >> qmp_capabilities command before any other command.
> > >>
> > >> We can't just fix qemu-ga to comply, because it would break existing
> > >> clients.
> > >>
> > >> We could document its behavior in qmp-spec.txt.  Easy enough, but also
> > >> kind of sad.
> > >
> > > I'm not sure we could've ever done it QMP-style with the initial
> > > greeting/negotiation mode. It's been a while, but I recall virtio-serial
> > > chardev in guest not having a very straight-forward way of flushing out
> > > data from the vring after a new client connects on the host side, so
> > > new clients had a chance of reading left-over garbage from previous
> > > client sessions. Or maybe it was open/close/open on the guest/chardev
> > > side that didn't cause the flush... anyway:
> > >
> > > This is why guest-sync was there, so you could verify the stream was
> > > in sync with a given "session ID" before continuing. But that doesn't
> > > help much if the stream is in some garbage state that parser can't
> > > recover from...
> > >
> > > This is why guest-sync-delimited was introduced; it inserts a 0xFF
> > > sentinel value (invalid for any normal QMP stream) prior to the response that
> > > a client can scan for to flush the stream. Similarly, clients are
> > > supposed to precede guest-sync/guest-sync-delimited with a 0xFF byte so
> > > QGA doesn't get stuck trying to parse a partial read from an earlier
> > > client that is 'eating' a new request from a new client connection. I
> > > don't think these are really
> > > issues with vsock (or the other transports QGA accepts), but AFAIK
> > > Windows is still mostly reliant on virtio-serial, so these are probably
> > > still needed.
> >
> > I believe you're right about the reason being virtio-serial.  I
> > documented it that way in commit 72e9e569d0 "docs/interop/qmp-spec: How
> > to force known good parser state".
> >
> > 2.6 Forcing the JSON parser into known-good state
> > -
> >
> > Incomplete or invalid input can leave the server's JSON parser in a
> > state where it can't parse additional commands.  To get it back into
> > known-good state, the client should provoke a lexical error.
> >
> > The cleanest way to do that is sending an ASCII control character
> > other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new
> > line).
> >
> > Sadly, older versions of QEMU can fail to flag this as an error.  If a
> > client needs to deal with them, it should send a 0xFF byte.
> >
> > 2.7 QGA Synchronization
> > ---
> >
> > When a client connects to QGA over a transport lacking proper
> > connection semantics such as virtio-serial, QGA may have read partial
> > input from a previous client.  The client needs to force QGA's parser
> > into known-good state using the previous section's technique.
> > Moreover, the client may receive output a previous client didn't read.
> > To help with skipping that output, QGA provides the
> > 'guest-sync-delimited' command.  Refer to its documentation for
> > details.
> >
> > 0xFF is invalid UTF-8, which is kind of icky.  We should've used a
> > proper control character like EOT (end of transmission) from the start.
> > Water under the bridge.
> >
> > guest-sync has another design flaw: an unread command reply consisting
> > of just an integer can be confused with guest-sync's reply.  Unlikely as
> > long as guest-sync's @id argument is chosen at random, as its
> > documentation demands.
> >
> > guest-sync could be deprecated, I guess.
>
> Yes, should probably be deprecated in favor of guest-sync-delimited. I
> left it for clients that really don't want to dig into the transport
> layer to search for 0xFF, but still want at least some ability to
> re-sync.
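
For reference, the client side of the 0xFF dance can be sketched like this. It is an illustration only, not code from any existing client: both function names are made up, and it assumes the client already has the raw bytes read from the channel in hand.

```python
import json


def build_sync_request(sync_id):
    """Request bytes: a leading 0xFF to reset QGA's parser, then the command."""
    cmd = {"execute": "guest-sync-delimited", "arguments": {"id": sync_id}}
    return b"\xff" + json.dumps(cmd).encode("utf-8") + b"\n"


def find_sync_reply(buf, sync_id):
    """Scan received bytes for a 0xFF-delimited reply carrying our id.

    Everything before the first 0xFF is stale output and is ignored.
    Returns the reply dict, or None if it has not arrived yet.
    """
    decoder = json.JSONDecoder()
    for chunk in buf.split(b"\xff")[1:]:
        try:
            doc, _ = decoder.raw_decode(chunk.decode("utf-8").lstrip())
        except ValueError:
            # Undecodable or incomplete chunk; try the next delimiter
            continue
        if isinstance(doc, dict) and doc.get("return") == sync_id:
            return doc
    return None
```

The id should still be chosen at random per attempt, as the guest-sync documentation asks, so that a stale reply from an earlier session is not mistaken for ours.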
>
> >
> > The @id argument of guest-sync and guest-sync-delimited feels kind of
> > redundant with the command object's @id member.  Except QGA didn't
> > conform to the QMP spec until commit 4eaca8de26 "qmp: common 'id'
> > handling & make QGA conform to QMP spec" (v4.0.0).  More water under the
> > bridge.
> >
> > Note that there's no need for all this when the transport provides
> > proper connection semantics.  Clients relying on connection semantics
> > work fine even when

Re: [PATCH 4/5] python: qmp_shell: add -e/--exit-on-error option

2022-02-23 Thread Damien Hedde




On 2/23/22 17:43, Damien Hedde wrote:



On 2/23/22 16:44, Daniel P. Berrangé wrote:

On Wed, Feb 23, 2022 at 10:41:11AM -0500, John Snow wrote:
On Wed, Feb 23, 2022 at 10:27 AM Daniel P. Berrangé 
 wrote:


On Wed, Feb 23, 2022 at 10:22:11AM -0500, John Snow wrote:

On Mon, Feb 21, 2022 at 10:55 AM Damien Hedde
 wrote:


This option makes qmp_shell exit (with error code 1)
as soon as one of the following errors occurs:
+ command parsing error
+ disconnection
+ command failure (response is an error)

_execute_cmd() method now returns None or the response
so that read_exec_command() can do the last check.

This is meant to be used in combination with an input file
redirection. It allows to store a list of commands
into a file and try to run them by qmp_shell and easily
see if it failed or not.

Signed-off-by: Damien Hedde 


Based on this patch, it looks like you really want something
scriptable, so I think the qemu-send idea that Dan has suggested might
be the best way to go. Are you still hoping to use the interactive
"short" QMP command format? That might be a bad idea, given how flaky
the parsing is -- and how we don't actually have a published standard
for that format. We've *never* liked the bad parsing here, so I have a
reluctance to use it in more places.


I don't really care of the command format. I was just using the one 
expected by qmp-shell to avoid providing another script.

I think it's better not to use some standard syntax like json.

I wanted to say the opposite: it's best to use json.
As long as we can store the commands into a file and tests them easily, 
it's ok. The crucial feature is the "stop as soon something unexpected 
happens" so that we can easily spot an issue.


I'm having the naive idea that a script file could be as simple as a
list of QMP commands to send:

[
 {"execute": "block-dirty-bitmap-add", "arguments": { ... }},
 ...
]


I used this format at some point because it's so trivial to feed into 
the QMP tools. Even used a yaml version of that to get the "human 
readability" that goes with it.




I'd really recommend against creating a new format for the script
file, especially one needing opening & closing  [] like this, as
that isn't so amenable to dynamic usage/creation. ie you can't
just append an extra command to an existing file.

IMHO, the "file" format should be identical to the result of
capturing the socket data off the wire. ie just a concatenation
of QMP commands, with no extra wrapping / change in format.
>>

Eugh. That's just so hard to parse, because there's no off-the-shelf
tooling for "load a sequence of JSON documents". Nothing in Python
does it. :\


It isn't that hard if you require each JSON doc to be followed by
a newline.

Feed one line at a time to the JSON parser, until you get a complete
JSON doc, process that, then re-init the parser and carry on feeding
it lines until it emits the next JSON doc, and so on.



I agree it's doable. I can look into that.

It makes me think that I've managed to modify the chardev 'file' backend
a few months ago so that it can be used with an input file on the cli. 
This allowed to give such raw qmp file directly with the -qmp option 
instead of using an intermediate socket and a script issuing the same file.
But I gave up with this approach because then it can't stop if a command 
failed without hacking into the receiving side in qemu.


--
Damien

Re: [PATCH 4/5] python: qmp_shell: add -e/--exit-on-error option

2022-02-23 Thread Damien Hedde





On 2/23/22 16:44, Daniel P. Berrangé wrote:

On Wed, Feb 23, 2022 at 10:41:11AM -0500, John Snow wrote:

On Wed, Feb 23, 2022 at 10:27 AM Daniel P. Berrangé  wrote:


On Wed, Feb 23, 2022 at 10:22:11AM -0500, John Snow wrote:

On Mon, Feb 21, 2022 at 10:55 AM Damien Hedde
 wrote:


This option makes qmp_shell exit (with error code 1)
as soon as one of the following errors occurs:
+ command parsing error
+ disconnection
+ command failure (response is an error)

_execute_cmd() method now returns None or the response
so that read_exec_command() can do the last check.

This is meant to be used in combination with an input file
redirection. It allows to store a list of commands
into a file and try to run them by qmp_shell and easily
see if it failed or not.

Signed-off-by: Damien Hedde 


Based on this patch, it looks like you really want something
scriptable, so I think the qemu-send idea that Dan has suggested might
be the best way to go. Are you still hoping to use the interactive
"short" QMP command format? That might be a bad idea, given how flaky
the parsing is -- and how we don't actually have a published standard
for that format. We've *never* liked the bad parsing here, so I have a
reluctance to use it in more places.


I don't really care about the command format. I was just using the one
expected by qmp-shell to avoid providing another script.

I think it's better to use some standard syntax like JSON.
As long as we can store the commands in a file and test them easily,
it's OK. The crucial feature is to "stop as soon as something unexpected
happens" so that we can easily spot an issue.
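
A minimal sketch of that "stop as soon as something unexpected happens" behaviour, assuming the commands are already parsed into dicts. The `send` callable and the `fake_send` stand-in are hypothetical and only illustrate the control flow; they are not part of qmp-shell or the patch.

```python
def run_script(commands, send):
    """Send each command in order and stop at the first error response.

    `send` is a hypothetical callable that takes a command dict and
    returns the response dict. Returns (commands_sent, failing_response);
    failing_response is None when every command succeeded.
    """
    for count, cmd in enumerate(commands, start=1):
        response = send(cmd)
        if "error" in response:
            return count, response
    return len(commands), None


def fake_send(cmd):
    # Stand-in for a real QMP connection, for illustration only.
    if cmd["execute"] == "no-such-command":
        return {"error": {"class": "CommandNotFound", "desc": "not found"}}
    return {"return": {}}


commands = [
    {"execute": "query-status"},
    {"execute": "no-such-command"},
    {"execute": "query-version"},
]
sent, failure = run_script(commands, fake_send)
# Mirror the proposed -e behaviour: non-zero exit status on the first failure.
exit_code = 1 if failure is not None else 0
```

The third command is never sent, which is the property that makes a failing script easy to spot.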


I'm having the naive idea that a script file could be as simple as a
list of QMP commands to send:

[
 {"execute": "block-dirty-bitmap-add", "arguments": { ... }},
 ...
]


I used this format at some point because it's so trivial to feed into 
the QMP tools. Even used a yaml version of that to get the "human 
readability" that goes with it.




I'd really recommend against creating a new format for the script
file, especially one needing opening & closing  [] like this, as
that isn't so amenable to dynamic usage/creation. ie you can't
just append an extra command to an existing file.

IMHO, the "file" format should be identical to the result of
capturing the socket data off the wire. ie just a concatenation
of QMP commands, with no extra wrapping / change in format.

Eugh. That's just so hard to parse, because there's no off-the-shelf
tooling for "load a sequence of JSON documents". Nothing in Python
does it. :\


It isn't that hard if you require each JSON doc to be followed by
a newline.

Feed one line at a time to the JSON parser, until you get a complete
JSON doc, process that, then re-init the parser and carry on feeding
it lines until it emits the next JSON doc, and so on.



I agree it's doable. I can look into that.
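
For reference, the line-at-a-time approach described above could look roughly like this. It is only a sketch under the stated requirement that every JSON document ends at a line boundary; the helper name `iter_json_docs` is made up for illustration.

```python
import io
import json


def iter_json_docs(lines):
    """Yield one parsed object per JSON document found in `lines`.

    Lines are accumulated until the buffer parses as a complete document,
    so a document may span several lines, but it must end at a line
    boundary. Re-parsing the buffer on every line is quadratic for very
    long documents, which should not matter for short command scripts.
    """
    buf = ""
    for line in lines:
        if not buf and not line.strip():
            continue  # skip blank lines between documents
        buf += line
        try:
            doc = json.loads(buf)
        except json.JSONDecodeError:
            continue  # incomplete so far, keep feeding lines
        yield doc
        buf = ""
    if buf.strip():
        # Either truncated input or a document that is simply invalid.
        raise ValueError("trailing data is not a complete JSON document")


script = io.StringIO(
    '{"execute": "query-status"}\n'
    "\n"
    '{"execute": "quit",\n'
    ' "arguments": {}}\n'
)
docs = list(iter_json_docs(script))
```

Note that a malformed document is only reported at end of input, since the loop cannot tell "invalid" from "not finished yet".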

That reminds me: a few months ago I managed to modify the chardev 'file'
backend so that it can be used with an input file on the CLI.
This made it possible to pass such a raw QMP file directly with the -qmp
option instead of using an intermediate socket and a script issuing the
same file. But I gave up on this approach because it can't stop when a
command fails without hacking into the receiving side in QEMU.


--
Damien

Re: Analysis of slow distro boots in check-avocado (BootLinuxAarch64.test_virt_tcg*)

2022-02-23 Thread Laszlo Ersek

On 02/23/22 14:34, Philippe Mathieu-Daudé wrote:
> On 23/2/22 12:07, Daniel P. Berrangé wrote:
>> On Tue, Feb 22, 2022 at 06:33:41PM +0100, Philippe Mathieu-Daudé wrote:
>>> +Igor/MST for UEFI tests.
>>>
>>> On 22/2/22 17:38, Daniel P. Berrangé wrote:
>>>> On Tue, Feb 22, 2022 at 04:17:23PM +, Alex Bennée wrote:
>>>>>
>>>>> Alex Bennée  writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> TL;DR:
>>>>>>
>>>>>>     - pc-bios/edk2-aarch64-code.fd should be rebuilt without debug
>>>>>
>>>>> Laszlo,
>>>>>
>>>>> Would it be possible to do a less debug enabled version of EDK2 on the
>>>>> next update to pc-bios/edk2-*?
>>>>
>>>> NB, Laszlo is no longer  maintaining EDK2 in QEMU, it was handed
>>>> over to Philippe.  I'm CC'ing Gerd too since he's a reviewer and
>>>> an EDK2 contributor taking over from Laszlo in EDK2 community
>>>
>>> We need the DEBUG profile to ensure the bios-tables-tests work.
>>
>> Can you elaborate on what bios-tables-tests needs this for, and
>> what coverage we would lose by disabling DEBUG?
> 
> Maybe it was only required when the tests were developed...
> I'll defer that question to Igor.

I've briefly rechecked commits 77db55fc8155 ("tests/uefi-test-tools: add
build scripts", 2019-02-21) and 536d2173b2b3 ("roms: build edk2 firmware
binaries and variable store templates", 2019-04-17). I think my only
reason for picking the DEBUG build target was that other build targets
are generally useless for debugging -- they produce no logs (or fewer logs).

> 
>> It may well be a better tradeoff to sacrifice part of bios-tables-tests
>> in favour of shipping more broadly usable images without DEBUG.
> 
> Why not, if users are aware/happy to use an unsafe image with various
> unfixed CVEs.
> 
> Removing the debug profile is as simple as this one-line patch:
> 
> -- >8 --
> diff --git a/roms/edk2-build.sh b/roms/edk2-build.sh
> index d5391c7637..ea79dc27a2 100755
> --- a/roms/edk2-build.sh
> +++ b/roms/edk2-build.sh
> @@ -50,6 +50,6 @@ qemu_edk2_set_cross_env "$emulation_target"
>  build \
>    --cmd-len=65536 \
>    -n "$edk2_thread_count" \
> -  --buildtarget=DEBUG \
> +  --buildtarget=RELEASE \
>    --tagname="$edk2_toolchain" \
>    "${args[@]}"
> ---
> 

The patch would be larger; the DEBUG build target is included in a bunch
of pathnames (see those original two commits).

BTW I still don't understand the problem with the DEBUG firmware builds;
in the test suite, as many debug messages should be printed as possible,
for helping with the analysis of any new issue that pops up. I've
re-read Alex's message that I got first CC'd on, and I can't connect the
dots, sorry.

Thanks
Laszlo

Re: [PATCH RFC v1 2/2] drivers/virt: add vmgenid driver for reinitializing RNG

2022-02-23 Thread Jason A. Donenfeld

Adding the Hyper-V people to this:

On Wed, Feb 23, 2022 at 2:13 PM Jason A. Donenfeld  wrote:
>
> VM Generation ID is a feature from Microsoft, described at
> , and supported by
> Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
> , as:
>
> If the OS is running in a VM, there is a problem that most
> hypervisors can snapshot the state of the machine and later rewind
> the VM state to the saved state. This results in the machine running
> a second time with the exact same RNG state, which leads to serious
> security problems.  To reduce the window of vulnerability, Windows
> 10 on a Hyper-V VM will detect when the VM state is reset, retrieve
> a unique (not random) value from the hypervisor, and reseed the root
> RNG with that unique value.  This does not eliminate the
> vulnerability, but it greatly reduces the time during which the RNG
> system will produce the same outputs as it did during a previous
> instantiation of the same VM state.
>
> Linux has the same issue, and given that vmgenid is supported already by
> multiple hypervisors, we can implement more or less the same solution.
> So this commit wires up the vmgenid ACPI notification to the RNG's newly
> added add_vmfork_randomness() function.
>
> This driver builds on prior work from Adrian Catangiu at Amazon, and it
> is my hope that that team can resume maintenance of this driver.

If any of you have some experience with the Hyper-V side of this
protocol, could you take a look at this and see if it matches the way
this is supposed to work? It appears to work fine with QEMU's
behavior, at least, but I know Hyper-V has a lot of additional
complexities.

Thanks,
Jason

Re: configure: How to pass flags to the Objective-C compiler?

2022-02-23 Thread Joshua Seaton

> You can use this patch (which is going to be merged soon):
Any ETA on when this will merge?

> This entry in the machine file affects the compilation steps:
>
> +  test -n "$objcc" && echo "objc_args = [$(meson_quote $OBJCFLAGS 
> $EXTRA_OBJCFLAGS)]" >> $cross
Great! I had naively assumed that there was more plumbing involved
than merely defining the variable `objc_args`.


Joshua.

Re: [PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-23 Thread Jason A. Donenfeld

On Wed, Feb 23, 2022 at 5:08 PM Jason A. Donenfeld  wrote:
>
> On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld  wrote:
> > second patch is the reason this is just an RFC: it's a cleanup of the
> > ACPI driver from last year, and I don't really have much experience
> > writing, testing, debugging, or maintaining these types of drivers.
> > Ideally this thread would yield somebody saying, "I see the intent of
> > this; I'm happy to take over ownership of this part." That way, I can
> > focus on the RNG part, and whoever steps up for the paravirt ACPI part
> > can focus on that.
>
> I actually managed to test this in QEMU, and it seems to work quite well. 
> Steps:
>
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) savevm blah
> (qemu) quit
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) loadvm blah
>
> Doing this successfully triggers the function to reinitialize the RNG
> with the new GUID. (It appears there's a bug in QEMU which prevents
> the GUID from being reinitialized when running `loadvm` without
> quitting first; I suppose this should be discussed with QEMU
> upstream.)
>
> So that's very positive. But I would appreciate hearing from some
> ACPI/Virt/Amazon people about this.

Because something something picture thousand words something, here's a
gif to see this working as expected:
https://data.zx2c4.com/vmgenid-appears-to-work.gif

Jason

Re: [PATCH 4/5] python: qmp_shell: add -e/--exit-on-error option

2022-02-23 Thread John Snow

On Wed, Feb 23, 2022 at 10:44 AM Daniel P. Berrangé  wrote:
>
> On Wed, Feb 23, 2022 at 10:41:11AM -0500, John Snow wrote:
> > On Wed, Feb 23, 2022 at 10:27 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Wed, Feb 23, 2022 at 10:22:11AM -0500, John Snow wrote:
> > > > On Mon, Feb 21, 2022 at 10:55 AM Damien Hedde
> > > >  wrote:
> > > > >
> > > > > This option makes qmp_shell exit (with error code 1)
> > > > > > as soon as one of the following errors occurs:
> > > > > + command parsing error
> > > > > + disconnection
> > > > > + command failure (response is an error)
> > > > >
> > > > > _execute_cmd() method now returns None or the response
> > > > > so that read_exec_command() can do the last check.
> > > > >
> > > > > This is meant to be used in combination with an input file
> > > > > > redirection. It allows storing a list of commands
> > > > > > in a file, running them with qmp_shell, and easily
> > > > > > seeing whether they failed.
> > > > >
> > > > > Signed-off-by: Damien Hedde 
> > > >
> > > > Based on this patch, it looks like you really want something
> > > > scriptable, so I think the qemu-send idea that Dan has suggested might
> > > > be the best way to go. Are you still hoping to use the interactive
> > > > "short" QMP command format? That might be a bad idea, given how flaky
> > > > the parsing is -- and how we don't actually have a published standard
> > > > for that format. We've *never* liked the bad parsing here, so I have a
> > > > reluctance to use it in more places.
> > > >
> > > > I'm having the naive idea that a script file could be as simple as a
> > > > list of QMP commands to send:
> > > >
> > > > [
> > > > {"execute": "block-dirty-bitmap-add", "arguments": { ... }},
> > > > ...
> > > > ]
> > >
> > > I'd really recommend against creating a new format for the script
> > > file, especially one needing opening & closing  [] like this, as
> > > that isn't so amenable to dynamic usage/creation. ie you can't
> > > just append an extra command to an existing file.
> > >
> > > IMHO, the "file" format should be identical to the result of
> > > capturing the socket data off the wire. ie just a concatenation
> > > of QMP commands, with no extra wrapping / change in format.
> > >
> >
> > Eugh. That's just so hard to parse, because there's no off-the-shelf
> > tooling for "load a sequence of JSON documents". Nothing in Python
> > does it. :\
>
> It isn't that hard if you require each JSON doc to be followed by
> a newline.
>
> Feed one line at a time to the JSON parser, until you get a complete
> JSON doc, process that, then re-init the parser and carry on feeding
> it lines until it emits the next JSON doc, and so on.
>

There are two interfaces in Python:

(1) json.load(), which takes a file pointer and either returns a
single, complete JSON document or it raises an Exception. It's not
useful here at all.
(2) json.JSONDecoder().raw_decode(strbuf), which takes a string buffer
and returns a 2-tuple of a JSON Document and the position at which it
stopped decoding.

The second is what we need here, but it does require buffering the
entire file into a string first, and then iteratively calling it. It
feels like working against the grain a little bit. We also can't use
the QAPI parser, as that parser has intentionally removed support for
constructs we don't use in the qapi schema language. Boo. (Not that I
want more non-standard configuration files like that propagating,
either.)
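
To make option (2) concrete, here is a minimal sketch of the iterative raw_decode() loop over a fully buffered string. The function name `decode_concatenated` is invented for illustration; the only library call relied on is `json.JSONDecoder().raw_decode()`, which does not skip leading whitespace, hence the explicit `lstrip()`.

```python
import json


def decode_concatenated(text):
    """Return the list of JSON documents concatenated in `text`."""
    decoder = json.JSONDecoder()
    docs = []
    rest = text.lstrip()
    while rest:
        # raw_decode() returns the document and the index where it stopped.
        doc, end = decoder.raw_decode(rest)
        docs.append(doc)
        # Drop the consumed document plus any whitespace before the next one.
        rest = rest[end:].lstrip()
    return docs


wire_capture = (
    '{"execute": "qmp_capabilities"}'
    '{"execute": "query-status"}\n'
    ' {"execute": "block-dirty-bitmap-add", "arguments": {"node": "n0"}}\n'
)
docs = decode_concatenated(wire_capture)
```

The slicing copies the remaining string on each iteration, which is fine for a short script file but is part of why this feels like working against the grain.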

It would be possible to generate a JSON-Schema document to describe a
script file that used a containing list construct, but impossible for
a concatenation of JSON documents. This is one of the reasons I
instinctively shy away from non-standard file formats, they tend to
cut off support for this sort of thing.

Wanting to keep the script easy to append to is legitimate. I'm keen
to hear a bit more about the use case here before I press extremely
hard in any given direction, but those are my impulses here.

--js

Re: Fix a potential Use-after-free in virtio_iommu_handle_command() (v6.2.0).

2022-02-23 Thread Philippe Mathieu-Daudé


On 23/2/22 17:02, Thomas Huth wrote:

On 23/02/2022 15.36, wli...@stu.xidian.edu.cn wrote:

Hi all,

I find a potential Use-after-free in QEMU 6.2.0, which is in 
virtio_iommu_handle_command() (./hw/virtio/virtio-iommu.c).



I'm looking forward to your confirmation.


  Hi,

thanks for your report and patch - but to make sure that the right 
people get attention, please use the scripts/get_maintainer.pl script to 
get a list of people who should be on CC:, or look into the MAINTAINERS 
file directly (for the next time - this time, I've CC:ed them now already).


You can find the contribution guidelines here:
https://www.qemu.org/docs/master/devel/submitting-a-patch.html

Re: Fix a potential memory leak bug in write_boot_rom() (v6.2.0).

2022-02-23 Thread Philippe Mathieu-Daudé


On 23/2/22 15:39, wli...@stu.xidian.edu.cn wrote:

Hi all,

I find a memory leak bug in QEMU 6.2.0, which is in 
write_boot_rom()(./hw/arm/aspeed.c).


Specifically, at line 276, a memory chunk is allocated with g_new0() and 
assigned to the variable 'storage'. However, if the branch takes true at 
line 277, there will be only an error report at line 278 but not a free 
operation for 'storage' before function returns. As a result, a memory 
leak bug is triggered.



259    BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
...
276    storage = g_new0(uint8_t, rom_size);
277    if (blk_pread(blk, 0, storage, rom_size) < 0) {
278        error_setg(errp, "failed to read the initial flash content");
279        return;
280    }


I believe that the problem can be fixed by adding a g_free() before the 
function returns.



277    if (blk_pread(blk, 0, storage, rom_size) < 0) {
278        error_setg(errp, "failed to read the initial flash content");
+++    g_free(storage);
279        return;
280    }


I'm looking forward to your confirmation.


Correct.

Or using g_autofree:

-- >8 --
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index d911dc904f..170e773ef8 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -257,7 +257,7 @@ static void write_boot_rom(DriveInfo *dinfo, hwaddr 
addr, size_t rom_size,

Error **errp)
 {
 BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
-uint8_t *storage;
+g_autofree void *storage = NULL;
 int64_t size;

 /* The block backend size should have already been 'validated' by
@@ -273,14 +273,13 @@ static void write_boot_rom(DriveInfo *dinfo, 
hwaddr addr, size_t rom_size,

 rom_size = size;
 }

-storage = g_new0(uint8_t, rom_size);
+storage = g_malloc0(rom_size);
 if (blk_pread(blk, 0, storage, rom_size) < 0) {
 error_setg(errp, "failed to read the initial flash content");
 return;
 }

 rom_add_blob_fixed("aspeed.boot_rom", storage, rom_size, addr);
-g_free(storage);
 }
---

Re: [PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-23 Thread Jason A. Donenfeld

On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld  wrote:
> second patch is the reason this is just an RFC: it's a cleanup of the
> ACPI driver from last year, and I don't really have much experience
> writing, testing, debugging, or maintaining these types of drivers.
> Ideally this thread would yield somebody saying, "I see the intent of
> this; I'm happy to take over ownership of this part." That way, I can
> focus on the RNG part, and whoever steps up for the paravirt ACPI part
> can focus on that.

I actually managed to test this in QEMU, and it seems to work quite well. Steps:

$ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
(qemu) savevm blah
(qemu) quit
$ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
(qemu) loadvm blah

Doing this successfully triggers the function to reinitialize the RNG
with the new GUID. (It appears there's a bug in QEMU which prevents
the GUID from being reinitialized when running `loadvm` without
quitting first; I suppose this should be discussed with QEMU
upstream.)

So that's very positive. But I would appreciate hearing from some
ACPI/Virt/Amazon people about this.

Jason

Re: Fix a potential Use-after-free in virtio_iommu_handle_command() (v6.2.0).

2022-02-23 Thread Thomas Huth


On 23/02/2022 15.36, wli...@stu.xidian.edu.cn wrote:

Hi all,

I find a potential Use-after-free in QEMU 6.2.0, which is in 
virtio_iommu_handle_command() (./hw/virtio/virtio-iommu.c).


Specifically, in the loop body, the variable 'buf' allocated at line 639 can 
be freed by g_free() at line 659. However, if the execution path enters the 
loop body again and the if branch takes true at line 616, the control will 
directly jump to 'out' at line 651. At this time, 'buf' is a freed pointer, 
which is not assigned with an allocated memory but used at line 653. As a 
result, a UAF bug is triggered.




599    for (;;) {
...
615        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
616        if (unlikely(sz != sizeof(head))) {
617            tail.status = VIRTIO_IOMMU_S_DEVERR;
618            goto out;
619        }
...
639            buf = g_malloc0(output_size);
...
651 out:
652        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
653                          buf ? buf : &tail, output_size);
...
659        g_free(buf);
660    }


We can fix it by setting 'buf' to NULL after freeing it:



651 out:
652        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
653                          buf ? buf : &tail, output_size);
...
659        g_free(buf);
+++        buf = NULL;
660    }


I'm looking forward to your confirmation.


 Hi,

thanks for your report and patch - but to make sure that the right people 
get attention, please use the scripts/get_maintainer.pl script to get a list 
of people who should be on CC:, or look into the MAINTAINERS file directly 
(for the next time - this time, I've CC:ed them now already).


 Thanks,
  Thomas

[PATCH] aio-posix: fix spurious ->poll_ready() callbacks in main loop

2022-02-23 Thread Stefan Hajnoczi

When ->poll() succeeds the AioHandler is placed on the ready list with
revents set to the magic value 0. This magic value causes
aio_dispatch_handler() to invoke ->poll_ready() instead of ->io_read()
for G_IO_IN or ->io_write() for G_IO_OUT.

This magic value 0 hack works for the IOThread where AioHandlers are
placed on ->ready_list and processed by aio_dispatch_ready_handlers().
It does not work for the main loop where all AioHandlers are processed
by aio_dispatch_handlers(), even those that are not ready and have a
revents value of 0.

As a result the main loop invokes ->poll_ready() on AioHandlers that are
not ready. These spurious ->poll_ready() calls waste CPU cycles and
could lead to crashes if the code assumes ->poll() must have succeeded
before ->poll_ready() is called (a reasonable assumption but I haven't
seen it in practice).

Stop using revents to track whether ->poll_ready() will be called on an
AioHandler. Introduce a separate AioHandler->poll_ready field instead.
This eliminates spurious ->poll_ready() calls in the main loop.
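
The ambiguity can be shown with a toy model. This is not QEMU code: the `Handler` class and both dispatch functions are invented for illustration and only mimic the revents/poll_ready logic described above.

```python
from dataclasses import dataclass


@dataclass
class Handler:
    name: str
    revents: int = 0          # 0 is also the value of a handler with no events
    poll_ready: bool = False  # explicit flag, in the spirit of the new field


def dispatch_with_sentinel(handlers):
    """Old scheme: revents == 0 is read as 'polling found an event'."""
    return [h.name for h in handlers if h.revents == 0]


def dispatch_with_flag(handlers):
    """New scheme: only handlers explicitly marked poll_ready are called."""
    called = []
    for h in handlers:
        # Read and clear the flag, as the dispatch function does in the patch.
        ready, h.poll_ready = h.poll_ready, False
        if ready and h.revents == 0:
            called.append(h.name)
    return called


# The main loop walks every handler, including ones that are not ready.
handlers = [Handler("idle"), Handler("polled", poll_ready=True)]
spurious = dispatch_with_sentinel(handlers)
precise = dispatch_with_flag(handlers)
```

In the sentinel version the idle handler is indistinguishable from the polled one, which is the spurious callback this patch removes.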

Fixes: 826cc32423db2a99d184dbf4f507c737d7e7a4ae ("aio-posix: split poll check 
from ready handler")
Signed-off-by: Stefan Hajnoczi 
---
 util/aio-posix.h |  1 +
 util/aio-posix.c | 32 ++--
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/util/aio-posix.h b/util/aio-posix.h
index 7f2c37a684..80b927c7f4 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -37,6 +37,7 @@ struct AioHandler {
 unsigned flags; /* see fdmon-io_uring.c */
 #endif
 int64_t poll_idle_timeout; /* when to stop userspace polling */
+bool poll_ready; /* has polling detected an event? */
 bool is_external;
 };
 
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 7b9f629218..be0182a3c6 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -23,15 +23,6 @@
 #include "trace.h"
 #include "aio-posix.h"
 
-/*
- * G_IO_IN and G_IO_OUT are not appropriate revents values for polling, since
- * the handler may not need to access the file descriptor. For example, the
- * handler doesn't need to read from an EventNotifier if it polled a memory
- * location and a read syscall would be slow. Define our own unique revents
- * value to indicate that polling determined this AioHandler is ready.
- */
-#define REVENTS_POLL_READY 0
-
 /* Stop userspace polling on a handler if it isn't active for some time */
 #define POLL_IDLE_INTERVAL_NS (7 * NANOSECONDS_PER_SECOND)
 
@@ -49,6 +40,14 @@ void aio_add_ready_handler(AioHandlerList *ready_list,
 QLIST_INSERT_HEAD(ready_list, node, node_ready);
 }
 
+static void aio_add_poll_ready_handler(AioHandlerList *ready_list,
+   AioHandler *node)
+{
+QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
+node->poll_ready = true;
+QLIST_INSERT_HEAD(ready_list, node, node_ready);
+}
+
 static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 {
 AioHandler *node;
@@ -76,6 +75,7 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler 
*node)
 }
 
 node->pfd.revents = 0;
+node->poll_ready = false;
 
 /* If the fd monitor has already marked it deleted, leave it alone */
 if (QLIST_IS_INSERTED(node, node_deleted)) {
@@ -247,7 +247,7 @@ static bool poll_set_started(AioContext *ctx, 
AioHandlerList *ready_list,
 
 /* Poll one last time in case ->io_poll_end() raced with the event */
 if (!started && node->io_poll(node->opaque)) {
-aio_add_ready_handler(ready_list, node, REVENTS_POLL_READY);
+aio_add_poll_ready_handler(ready_list, node);
 progress = true;
 }
 }
@@ -282,6 +282,7 @@ bool aio_pending(AioContext *ctx)
 QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 int revents;
 
+/* TODO should this check poll ready? */
 revents = node->pfd.revents & node->pfd.events;
 if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
 aio_node_check(ctx, node->is_external)) {
@@ -323,11 +324,15 @@ static void aio_free_deleted_handlers(AioContext *ctx)
 static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
 {
 bool progress = false;
+bool poll_ready;
 int revents;
 
 revents = node->pfd.revents & node->pfd.events;
 node->pfd.revents = 0;
 
+poll_ready = node->poll_ready;
+node->poll_ready = false;
+
 /*
  * Start polling AioHandlers when they become ready because activity is
  * likely to continue.  Note that starvation is theoretically possible when
@@ -344,7 +349,7 @@ static bool aio_dispatch_handler(AioContext *ctx, 
AioHandler *node)
 QLIST_INSERT_HEAD(&ctx->poll_aio_handlers, node, node_poll);
 }
 if (!QLIST_IS_INSERTED(node, node_deleted) &&
-revents == 0 &&
+poll_ready && revents == 0 &&
 aio_node_check(ctx, node->is_external) &&
 node->io_poll_ready) {
 node->io_poll_ready(node->opaque);
@@ -432,7

Re: virtio-blk issue with vIOMMU

2022-02-23 Thread Stefan Hajnoczi

On Wed, Feb 23, 2022 at 12:37:03PM +0800, Jason Wang wrote:
> Hi Stefan:
> 
> Recently I found intel vIOMMU gives the following warning when using
> virtio-blk:
> 
> qemu-system-x86_64: vtd_iova_to_slpte: detected slpte permission error
> (iova=0x7ffde000, level=0x3, slpte=0x0, write=0)
> qemu-system-x86_64: vtd_iommu_translate: detected translation failure
> (dev=01:00:00, iova=0x7ffde000)
> qemu-system-x86_64: New fault is not recorded due to compression of faults
> qemu-system-x86_64: virtio: zero sized buffers are not allowed
> 
> It happens on the boot (device start), and virtio-blk works well after this.
> A quick stack trace is:
> 
> Thread 1 "qemu-system-x86" hit Breakpoint 1, vtd_iova_to_slpte
> (s=0x57a9f710, ce=0x7fffd6e0, iova=2147344384, is_write=false,
> slptep=0x7fffd6b8,
>     slpte_level=0x7fffd6b0, reads=0x7fffd6aa, writes=0x7fffd6ab,
> aw_bits=39 '\'') at ../hw/i386/intel_iommu.c:1055
> 1055        error_report_once("%s: detected slpte permission error "
> (gdb) bt
> #0  vtd_iova_to_slpte
>     (s=0x57a9f710, ce=0x7fffd6e0, iova=2147344384, is_write=false,
> slptep=0x7fffd6b8, slpte_level=0x7fffd6b0, reads=0x7fffd6aa,
> writes=0x7fffd6ab, aw_bits=39 '\'') at ../hw/i386/intel_iommu.c:1055
> #1  0x55b45734 in vtd_do_iommu_translate (vtd_as=0x574cd000,
> bus=0x5766e700, devfn=0 '\000', addr=2147344384, is_write=false,
> entry=0x7fffd780)
>     at ../hw/i386/intel_iommu.c:1785
> #2  0x55b48543 in vtd_iommu_translate (iommu=0x574cd070,
> addr=2147344384, flag=IOMMU_RO, iommu_idx=0) at
> ../hw/i386/intel_iommu.c:2996
> #3  0x55bd3f4d in address_space_translate_iommu
>     (iommu_mr=0x574cd070, xlat=0x7fffd9f0, plen_out=0x7fffd9e8,
> page_mask_out=0x0, is_write=false, is_mmio=true, target_as=0x7fffd938,
> attrs=...)
>     at ../softmmu/physmem.c:433
> #4  0x55bdbdd1 in address_space_translate_cached
> (cache=0x7fffed3d02e0, addr=0, xlat=0x7fffd9f0, plen=0x7fffd9e8,
> is_write=false, attrs=...)
>     at ../softmmu/physmem.c:3388
> #5  0x55bdc519 in address_space_lduw_internal_cached_slow
> (cache=0x7fffed3d02e0, addr=0, attrs=..., result=0x0,
> endian=DEVICE_LITTLE_ENDIAN)
>     at /home/devel/git/qemu/memory_ldst.c.inc:209
> #6  0x55bdc6ac in address_space_lduw_le_cached_slow
> (cache=0x7fffed3d02e0, addr=0, attrs=..., result=0x0) at
> /home/devel/git/qemu/memory_ldst.c.inc:253
> #7  0x55c71719 in address_space_lduw_le_cached
> (cache=0x7fffed3d02e0, addr=0, attrs=..., result=0x0)
>     at /home/devel/git/qemu/include/exec/memory_ldst_cached.h.inc:35
> #8  0x55c7196a in lduw_le_phys_cached (cache=0x7fffed3d02e0, addr=0)
> at /home/devel/git/qemu/include/exec/memory_ldst_phys.h.inc:67
> #9  0x55c728fd in virtio_lduw_phys_cached (vdev=0x57743720,
> cache=0x7fffed3d02e0, pa=0) at
> /home/devel/git/qemu/include/hw/virtio/virtio-access.h:166
> #10 0x55c73485 in vring_used_flags_set_bit (vq=0x74ee5010,
> mask=1) at ../hw/virtio/virtio.c:383
> #11 0x55c736a8 in virtio_queue_split_set_notification
> (vq=0x74ee5010, enable=0) at ../hw/virtio/virtio.c:433
> #12 0x55c73896 in virtio_queue_set_notification (vq=0x74ee5010,
> enable=0) at ../hw/virtio/virtio.c:490
> #13 0x55c19064 in virtio_blk_handle_vq (s=0x57743720,
> vq=0x74ee5010) at ../hw/block/virtio-blk.c:782
> #14 0x55c191f5 in virtio_blk_handle_output (vdev=0x57743720,
> vq=0x74ee5010) at ../hw/block/virtio-blk.c:819
> #15 0x55c78453 in virtio_queue_notify_vq (vq=0x74ee5010) at
> ../hw/virtio/virtio.c:2315
> #16 0x55c7b523 in virtio_queue_host_notifier_aio_poll_ready
> (n=0x74ee5084) at ../hw/virtio/virtio.c:3516
> #17 0x55eff158 in aio_dispatch_handler (ctx=0x5680fac0,
> node=0x7fffeca5bbe0) at ../util/aio-posix.c:350
> #18 0x55eff390 in aio_dispatch_handlers (ctx=0x5680fac0) at
> ../util/aio-posix.c:406
> #19 0x55eff3ea in aio_dispatch (ctx=0x5680fac0) at
> ../util/aio-posix.c:416
> #20 0x55f184eb in aio_ctx_dispatch (source=0x5680fac0,
> callback=0x0, user_data=0x0) at ../util/async.c:311
> #21 0x77b6b17d in g_main_context_dispatch () at
> /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> #22 0x55f299ed in glib_pollfds_poll () at ../util/main-loop.c:232
> #23 0x55f29a6b in os_host_main_loop_wait (timeout=0) at
> ../util/main-loop.c:255
> #24 0x55f29b7c in main_loop_wait (nonblocking=0) at
> ../util/main-loop.c:531
> #25 0x55be097c in qemu_main_loop () at ../softmmu/runstate.c:727
> #26 0x558367fa in main (argc=26, argv=0x7fffe058,
> envp=0x7fffe130) at ../softmmu/main.c:50
> 
> The slpte is 0x0 and level is 3 which probably means the device is kicked
> before it was attached to any IOMMU domain.
> 
> Bisecting points to the first bad commit:
> 
> commit

Fix a potential Use-after-free in virtio_iommu_handle_command() (v6.2.0).

2022-02-23 Thread wliang

Hi all,

I find a potential Use-after-free in QEMU 6.2.0, which is in 
virtio_iommu_handle_command() (./hw/virtio/virtio-iommu.c).



Specifically, in the loop body, the variable 'buf' allocated at line 639 can be 
freed by g_free() at line 659. However, if the execution path enters the loop 
body again and the if branch takes true at line 616, the control will directly 
jump to 'out' at line 651. At this time, 'buf' is a freed pointer, which is not 
assigned with an allocated memory but used at line 653. As a result, a UAF bug 
is triggered.





599    for (;;) {
...
615        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
616        if (unlikely(sz != sizeof(head))) {
617            tail.status = VIRTIO_IOMMU_S_DEVERR;
618            goto out;
619        }
...
639            buf = g_malloc0(output_size);
...
651 out:
652        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
653                          buf ? buf : &tail, output_size);
...
659        g_free(buf);
660    }






We can fix it by setting 'buf' to NULL after freeing it:



651 out:
652        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
653                          buf ? buf : &tail, output_size);
...
659        g_free(buf);
+++        buf = NULL;
660    }


I'm looking forward to your confirmation.


Best,
Wentao
--- ./hw/virtio/virtio-iommu.c	2022-02-23 15:06:32.040727196 +0800
+++ ./hw/virtio/virtio-iommu-PATCH.c	2022-02-23 21:12:24.605032121 +0800
@@ -657,6 +657,7 @@
 virtio_notify(vdev, vq);
 g_free(elem);
 g_free(buf);
+buf = NULL;
 }
 }

Fix a potential Use-after-free bug in handle_simd_shift_fpint_conv() (v6.2.0).

2022-02-23 Thread wliang


Hi all,

I find a potential Use-after-free bug in QEMU 6.2.0, which is in 
handle_simd_shift_fpint_conv()(./target/arm/translate-a64.c).

At line 9048, a variable 'tcg_fpstatus' is freed by invoking 
tcg_temp_free_ptr(). However, at line 9050, the variable 'tcg_fpstatus' is 
subsequently used as the 3rd parameter of the function gen_helper_set_rmode. 
This may result in a use-after-free bug.


9048    tcg_temp_free_ptr(tcg_fpstatus);
9049    tcg_temp_free_i32(tcg_shift);
9050    gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);


I believe the bug can be fixed by invoking the gen_helper_set_rmode() before 
'tcg_fpstatus' being freed by the tcg_temp_free_ptr().


 ---    tcg_temp_free_ptr(tcg_fpstatus);
9049    tcg_temp_free_i32(tcg_shift);
9050    gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 +++    tcg_temp_free_ptr(tcg_fpstatus);
 
I'm looking forward to your confirmation.


Best,

Wentao
--- ./target/arm/translate-a64.c	2022-02-23 15:06:32.212756633 +0800
+++ ./target/arm/translate-a64-PATCH.c	2022-02-23 21:13:15.604128138 +0800
@@ -9045,9 +9045,9 @@
 }
 }
 
-tcg_temp_free_ptr(tcg_fpstatus);
 tcg_temp_free_i32(tcg_shift);
 gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
+tcg_temp_free_ptr(tcg_fpstatus);
 tcg_temp_free_i32(tcg_rmode);
 }

Fix a potential Use-after-free in test_blockjob_common_drain_node() (v6.2.0).

2022-02-23 Thread wliang

Hi all,

I find a potential Use-after-free in QEMU 6.2.0, which is in 
test_blockjob_common_drain_node() (./tests/unit/test-bdrv-drain.c).

Specifically, at line 880, the variable 'src' is released by bdrv_unref(). 
However, at line 881, it is subsequently used as the 1st parameter of the 
function bdrv_set_backing_hd(). As a result, a UAF bug may be triggered.





880    bdrv_unref(src);
881    bdrv_set_backing_hd(src, src_backing, &error_abort);





I believe that the problem can be fixed by invoking bdrv_unref() after the call 
of bdrv_set_backing_hd() rather than before it.


---    bdrv_unref(src);
881    bdrv_set_backing_hd(src, src_backing, &error_abort);
+++    bdrv_unref(src);


I'm looking forward to your confirmation.

Best,
Wentao
--- ./tests/unit/test-bdrv-drain.c	2022-02-23 15:06:32.384786070 +0800
+++ ./tests/unit/test-bdrv-drain-PATCH.c	2022-02-23 21:16:43.444928992 +0800
@@ -877,8 +877,8 @@
BDRV_O_RDWR, &error_abort);
 
 bdrv_set_backing_hd(src_overlay, src, &error_abort);
-bdrv_unref(src);
 bdrv_set_backing_hd(src, src_backing, &error_abort);
+bdrv_unref(src);
 bdrv_unref(src_backing);
 
 blk_src = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);

Fix a potential memory leak bug in write_boot_rom() (v6.2.0).

2022-02-23 Thread wliang

Hi all,

I find a memory leak bug in QEMU 6.2.0, which is in 
write_boot_rom()(./hw/arm/aspeed.c).

Specifically, at line 276, a memory chunk is allocated with g_new0() and 
assigned to the variable 'storage'. However, if the branch takes true at line 
277, there will be only an error report at line 278 but not a free operation 
for 'storage' before function returns. As a result, a memory leak bug is 
triggered.


259    BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
...
276    storage = g_new0(uint8_t, rom_size);
277    if (blk_pread(blk, 0, storage, rom_size) < 0) {
278        error_setg(errp, "failed to read the initial flash content");
279        return;
280    }


I believe that the problem can be fixed by adding a g_free() before the 
function returns.


277    if (blk_pread(blk, 0, storage, rom_size) < 0) {
278        error_setg(errp, "failed to read the initial flash content");
+++        g_free(storage);
279        return;
280    }


I'm looking forward to your confirmation.

Best,
Wentao
--- ./hw/arm/aspeed.c	2022-02-23 15:06:31.928708083 +0800
+++ ./hw/arm/aspeed-PATCH.c	2022-02-23 21:22:28.200802801 +0800
@@ -276,6 +276,7 @@
 storage = g_new0(uint8_t, rom_size);
 if (blk_pread(blk, 0, storage, rom_size) < 0) {
 error_setg(errp, "failed to read the initial flash content");
+g_free(storage);
 return;
 }
