date:20220804

RE: [PATCH] net/colo.c: Fix the pointer issuse reported by Coverity.

2022-08-04 Thread Zhang, Chen



> -Original Message-
> From: Jason Wang 
> Sent: Friday, August 5, 2022 11:46 AM
> To: Zhang, Chen 
> Cc: Peter Maydell ; Li Zhijian
> ; qemu-dev 
> Subject: Re: [PATCH] net/colo.c: Fix the pointer issuse reported by Coverity.
> 
> On Tue, Aug 2, 2022 at 4:24 PM Zhang Chen  wrote:
> >
> > When enable the virtio-net-pci, guest network packet will load the
> > vnet_hdr. In COLO status, the primary VM's network packet maybe
> > redirect to another VM, it need filter-redirect enable the vnet_hdr
> > flag at the same time, COLO-proxy will correctly parse the original
> > network packet. If have any misconfiguration here, the vnet_hdr_len is
> > wrong for parse the packet, the data+offset will point to wrong place.
> >
> > Signed-off-by: Zhang Chen 
> > ---
> >  net/colo.c | 16 ++--
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/colo.c b/net/colo.c
> > index 6b0ff562ad..dfb15b4c14 100644
> > --- a/net/colo.c
> > +++ b/net/colo.c
> > @@ -44,21 +44,25 @@ int parse_packet_early(Packet *pkt)  {
> >  int network_length;
> >  static const uint8_t vlan[] = {0x81, 0x00};
> > -uint8_t *data = pkt->data + pkt->vnet_hdr_len;
> > +uint8_t *data = pkt->data;
> >  uint16_t l3_proto;
> >  ssize_t l2hdr_len;
> >
> >  if (data == NULL) {
> 
> I wonder under which case we can see data == NULL?
> 
> AFAIK, data is either dup via packet_new() or assigned to a pointer to the buf
> in packet_new_nocopy().

Yes, you are right. I just checked it for hint of bugs.
Do you think no need to do it?

Thanks
Chen

> 
> Thanks
> 
> > -trace_colo_proxy_main_vnet_info("This packet is not parsed 
> > correctly,
> "
> > -"pkt->vnet_hdr_len", 
> > pkt->vnet_hdr_len);
> > +trace_colo_proxy_main("COLO-proxy got NULL data packet ");
> >  return 1;
> >  }
> > -l2hdr_len = eth_get_l2_hdr_length(data);
> >
> > -if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
> > -trace_colo_proxy_main("pkt->size < ETH_HLEN");
> > +/* Check the received vnet_hdr_len then add the offset */
> > +if (pkt->size < sizeof(struct eth_header) + sizeof(struct vlan_header)
> > ++ pkt->vnet_hdr_len) {
> > +trace_colo_proxy_main_vnet_info("This packet may be load wrong "
> > +"pkt->vnet_hdr_len",
> > + pkt->vnet_hdr_len);
> >  return 1;
> >  }
> > +data += pkt->vnet_hdr_len;
> > +
> > +l2hdr_len = eth_get_l2_hdr_length(data);
> >
> >  /*
> >   * TODO: support vlan.
> > --
> > 2.25.1
> >

Re: [PULL 0/5] Trivial branch for 7.1 patches

2022-08-04 Thread Markus Armbruster

I was hoping for "[PATCH] contrib/vhost-user-blk: Clean up deallocation
of VuVirtqElement".

https://patchew.org/QEMU/20220630085219.1305519-1-arm...@redhat.com/

Next time, perhaps?

Re: [RFC 1/1] hw: tpmtisspi: add SPI support to QEMU TPM implementation

2022-08-04 Thread Dan Zhang

On Thu, Aug 4, 2022 at 4:21 PM Peter Delevoryas  wrote:
>
> On Thu, Aug 04, 2022 at 11:07:10AM -0700, Dan Zhang wrote:
> > On Wed, Aug 3, 2022 at 10:30 AM Peter Delevoryas  wrote:
> > >
> > > On Wed, Aug 03, 2022 at 10:52:23AM +0200, Cédric Le Goater wrote:
> > > > On 8/3/22 04:32, Iris Chen wrote:
> > > > > From: Iris Chen 
> > > >
> > > > A commit log telling us about this new device would be good to have.
> > > >
> > > >
> > > > > Signed-off-by: Iris Chen 
> > > > > ---
> > > > >   configs/devices/arm-softmmu/default.mak |   1 +
> > > > >   hw/arm/Kconfig  |   5 +
> > > > >   hw/tpm/Kconfig  |   5 +
> > > > >   hw/tpm/meson.build  |   1 +
> > > > >   hw/tpm/tpm_tis_spi.c| 311 
> > > > > 
> > > > >   include/sysemu/tpm.h|   3 +
> > > > >   6 files changed, 326 insertions(+)
> > > > >   create mode 100644 hw/tpm/tpm_tis_spi.c
> > > > >
> > > > > diff --git a/configs/devices/arm-softmmu/default.mak 
> > > > > b/configs/devices/arm-softmmu/default.mak
> > > > > index 6985a25377..80d2841568 100644
> > > > > --- a/configs/devices/arm-softmmu/default.mak
> > > > > +++ b/configs/devices/arm-softmmu/default.mak
> > > > > @@ -42,3 +42,4 @@ CONFIG_FSL_IMX6UL=y
> > > > >   CONFIG_SEMIHOSTING=y
> > > > >   CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
> > > > >   CONFIG_ALLWINNER_H3=y
> > > > > +CONFIG_FBOBMC_AST=y
> > > >
> > > > I don't think this extra config is useful for now
> > > >
> > > > > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> > > > > index 15fa79afd3..193decaec1 100644
> > > > > --- a/hw/arm/Kconfig
> > > > > +++ b/hw/arm/Kconfig
> > > > > @@ -458,6 +458,11 @@ config ASPEED_SOC
> > > > >   select PMBUS
> > > > >   select MAX31785
> > > > > +config FBOBMC_AST
> > > > > +bool
> > > > > +select ASPEED_SOC
> > > > > +select TPM_TIS_SPI
> > > > > +
> > > > >   config MPS2
> > > > >   bool
> > > > >   imply I2C_DEVICES
> > > > > diff --git a/hw/tpm/Kconfig b/hw/tpm/Kconfig
> > > > > index 29e82f3c92..370a43f045 100644
> > > > > --- a/hw/tpm/Kconfig
> > > > > +++ b/hw/tpm/Kconfig
> > > > > @@ -8,6 +8,11 @@ config TPM_TIS_SYSBUS
> > > > >   depends on TPM
> > > > >   select TPM_TIS
> > > > > +config TPM_TIS_SPI
> > > > > +bool
> > > > > +depends on TPM
> > > > > +select TPM_TIS
> > > > > +
> > > > >   config TPM_TIS
> > > > >   bool
> > > > >   depends on TPM
> > > > > diff --git a/hw/tpm/meson.build b/hw/tpm/meson.build
> > > > > index 1c68d81d6a..1a057f4e36 100644
> > > > > --- a/hw/tpm/meson.build
> > > > > +++ b/hw/tpm/meson.build
> > > > > @@ -2,6 +2,7 @@ softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: 
> > > > > files('tpm_tis_common.c'))
> > > > >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_ISA', if_true: 
> > > > > files('tpm_tis_isa.c'))
> > > > >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_SYSBUS', if_true: 
> > > > > files('tpm_tis_sysbus.c'))
> > > > >   softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))
> > > > > +softmmu_ss.add(when: 'CONFIG_TPM_TIS_SPI', if_true: 
> > > > > files('tpm_tis_spi.c'))
> > > > >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_TIS'], 
> > > > > if_true: files('tpm_ppi.c'))
> > > > >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_CRB'], 
> > > > > if_true: files('tpm_ppi.c'))
> > > > > diff --git a/hw/tpm/tpm_tis_spi.c b/hw/tpm/tpm_tis_spi.c
> > > > > new file mode 100644
> > > > > index 00..c98ddcfddb
> > > > > --- /dev/null
> > > > > +++ b/hw/tpm/tpm_tis_spi.c
> > > > > @@ -0,0 +1,311 @@
> > > > > +#include "qemu/osdep.h"
> > > > > +#include "hw/qdev-properties.h"
> > > > > +#include "migration/vmstate.h"
> > > > > +#include "hw/acpi/tpm.h"
> > > > > +#include "tpm_prop.h"
> > > > > +#include "tpm_tis.h"
> > > > > +#include "qom/object.h"
> > > > > +#include "hw/ssi/ssi.h"
> > > > > +#include "hw/ssi/spi_gpio.h"
> > > > > +
> > > > > +#define TPM_TIS_SPI_ADDR_BYTES 3
> > > > > +#define SPI_WRITE 0
> > > > > +
> > > > > +typedef enum {
> > > > > +TIS_SPI_PKT_STATE_DEACTIVATED = 0,
> > > > > +TIS_SPI_PKT_STATE_START,
> > > > > +TIS_SPI_PKT_STATE_ADDRESS,
> > > > > +TIS_SPI_PKT_STATE_DATA_WR,
> > > > > +TIS_SPI_PKT_STATE_DATA_RD,
> > > > > +TIS_SPI_PKT_STATE_DONE,
> > > > > +} TpmTisSpiPktState;
> > > > > +
> > > > > +union TpmTisRWSizeByte {
> > > > > +uint8_t byte;
> > > > > +struct {
> > > > > +uint8_t data_expected_size:6;
> > > > > +uint8_t resv:1;
> > > > > +uint8_t rwflag:1;
> > > > > +};
> > > > > +};
> > > > > +
> > > > > +union TpmTisSpiHwAddr {
> > > > > +hwaddr addr;
> > > > > +uint8_t bytes[sizeof(hwaddr)];
> > > > > +};
> > > > > +
> > > > > +union TpmTisSpiData {
> > > > > +uint32_t data;
> > > > > +uint8_t bytes[64];
> > > > > +};
> > > > > +
> > > > > +struct TpmTisSpiState {
> > > > > +/*< private >*/
> > > > > +SSIPeripheral parent_obj;
> > > > > +
> > > > > +

Re: [PATCH] target/riscv: Fix priority of csr related check in riscv_csrrw_check

2022-08-04 Thread Alistair Francis

On Wed, Aug 3, 2022 at 10:56 PM Weiwei Li  wrote:
>
> Normally, riscv_csrrw_check is called when executing Zicsr instructions.
> And we can only do access control for existed CSRs. So the priority of
> CSR related check, from highest to lowest, should be as follows:
> 1) check whether Zicsr is supported: raise RISCV_EXCP_ILLEGAL_INST if not
> 2) check whether csr is existed: raise RISCV_EXCP_ILLEGAL_INST if not
> 3) do access control: raise RISCV_EXCP_ILLEGAL_INST or RISCV_EXCP_VIRT_
> INSTRUCTION_FAULT if not allowed
>
> The predicates contain parts of function of both 2) and 3), So they need
> to be placed in the middle of riscv_csrrw_check
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/csr.c | 44 +---
>  1 file changed, 25 insertions(+), 19 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 0fb042b2fd..d81f466c80 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -3270,6 +3270,30 @@ static inline RISCVException 
> riscv_csrrw_check(CPURISCVState *env,
>  /* check privileges and return RISCV_EXCP_ILLEGAL_INST if check fails */
>  int read_only = get_field(csrno, 0xC00) == 3;
>  int csr_min_priv = csr_ops[csrno].min_priv_ver;
> +
> +/* ensure the CSR extension is enabled. */
> +if (!cpu->cfg.ext_icsr) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +if (env->priv_ver < csr_min_priv) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +/* check predicate */
> +if (!csr_ops[csrno].predicate) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +if (write_mask && read_only) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +RISCVException ret = csr_ops[csrno].predicate(env, csrno);
> +if (ret != RISCV_EXCP_NONE) {
> +return ret;
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  int csr_priv, effective_priv = env->priv;
>
> @@ -3290,25 +3314,7 @@ static inline RISCVException 
> riscv_csrrw_check(CPURISCVState *env,
>  return RISCV_EXCP_ILLEGAL_INST;
>  }
>  #endif
> -if (write_mask && read_only) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -/* ensure the CSR extension is enabled. */
> -if (!cpu->cfg.ext_icsr) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -/* check predicate */
> -if (!csr_ops[csrno].predicate) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -if (env->priv_ver < csr_min_priv) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -return csr_ops[csrno].predicate(env, csrno);
> +return RISCV_EXCP_NONE;
>  }
>
>  static RISCVException riscv_csrrw_do64(CPURISCVState *env, int csrno,
> --
> 2.17.1
>
>

[PATCH v2 0/2] This patch adds runtime check of AVX512

2022-08-04 Thread ling xu

This patch adds runtime check of AVX512 on running machine and update
avx512 support for xbzrle_encode_buffer function to accelerate xbzrle
encoding speed.

The runtime check is added in meson.build and meson_options.txt.

The updated AVX512 algorithm is provided in ram.c, xbzrle.h and
xbzrle.c.

The test code is provided in test-xbzrle.c.

Previous discussion is refered below:
https://lore.kernel.org/all/ytlshitevijwe...@redhat.com/

ling xu (2):
  Update AVX512 support for xbzrle_encode_buffer function
  Test code for AVX512 support for xbzrle_encode_buffer function

 meson.build  | 211 +++
 meson_options.txt|  28 
 migration/ram.c  |  41 ++
 migration/xbzrle.c   | 181 +++
 migration/xbzrle.h   |   4 +
 tests/unit/test-xbzrle.c | 307 ---
 6 files changed, 755 insertions(+), 17 deletions(-)

-- 
2.25.1

[PATCH v2 2/2] Test code for AVX512 support for xbzrle_encode_buffer function

2022-08-04 Thread ling xu

Signed-off-by: ling xu 
Co-authored-by: Zhou Zhao 
Co-authored-by: Jun Jin 
---
 tests/unit/test-xbzrle.c | 307 ---
 1 file changed, 290 insertions(+), 17 deletions(-)

diff --git a/tests/unit/test-xbzrle.c b/tests/unit/test-xbzrle.c
index ef951b6e54..653016826f 100644
--- a/tests/unit/test-xbzrle.c
+++ b/tests/unit/test-xbzrle.c
@@ -38,111 +38,280 @@ static void test_uleb(void)
 g_assert(val == 0);
 }
 
-static void test_encode_decode_zero(void)
+static float *test_encode_decode_zero(void)
 {
 uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE);
 uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE);
 int i = 0;
-int dlen = 0;
+int dlen = 0, dlen512 = 0;
 int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1006);
 
 for (i = diff_len; i > 0; i--) {
 buffer[1000 + i] = i;
+buffer512[1000 + i] = i;
 }
 
 buffer[1000 + diff_len + 3] = 103;
 buffer[1000 + diff_len + 5] = 105;
 
+buffer512[1000 + diff_len + 3] = 103;
+buffer512[1000 + diff_len + 5] = 105;
+
 /* encode zero page */
+time_t t_start, t_end, t_start512, t_end512;
+t_start = clock();
 dlen = xbzrle_encode_buffer(buffer, buffer, XBZRLE_PAGE_SIZE, compressed,
XBZRLE_PAGE_SIZE);
+t_end = clock();
+float time_val = difftime(t_end, t_start);
 g_assert(dlen == 0);
 
+t_start512 = clock();
+dlen512 = xbzrle_encode_buffer_512(buffer512, buffer512, XBZRLE_PAGE_SIZE,
+   compressed512, XBZRLE_PAGE_SIZE);
+t_end512 = clock();
+float time_val512 = difftime(t_end512, t_start512);
+g_assert(dlen512 == 0);
+
+static float result_zero[2];
+result_zero[0] = time_val;
+result_zero[1] = time_val512;
+
 g_free(buffer);
 g_free(compressed);
+g_free(buffer512);
+g_free(compressed512);
+
+return result_zero;
+}
+
+static void test_encode_decode_zero_range(void)
+{
+int i;
+float time_raw = 0.0, time_512 = 0.0;
+float *res;
+for (i = 0; i < 1; i++) {
+res = test_encode_decode_zero();
+time_raw += res[0];
+time_512 += res[1];
+}
+printf("Zero test:\n");
+printf("Raw xbzrle_encode time is %f ms\n", time_raw);
+printf("512 xbzrle_encode time is %f ms\n", time_512);
 }
 
-static void test_encode_decode_unchanged(void)
+static float *test_encode_decode_unchanged(void)
 {
 uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE);
 uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE);
 int i = 0;
-int dlen = 0;
+int dlen = 0, dlen512 = 0;
 int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1006);
 
 for (i = diff_len; i > 0; i--) {
 test[1000 + i] = i + 4;
+test512[1000 + i] = i + 4;
 }
 
 test[1000 + diff_len + 3] = 107;
 test[1000 + diff_len + 5] = 109;
 
+test512[1000 + diff_len + 3] = 107;
+test512[1000 + diff_len + 5] = 109;
+
 /* test unchanged buffer */
+time_t t_start, t_end, t_start512, t_end512;
+t_start = clock();
 dlen = xbzrle_encode_buffer(test, test, XBZRLE_PAGE_SIZE, compressed,
 XBZRLE_PAGE_SIZE);
+t_end = clock();
+float time_val = difftime(t_end, t_start);
 g_assert(dlen == 0);
 
+t_start512 = clock();
+dlen512 = xbzrle_encode_buffer_512(test512, test512, XBZRLE_PAGE_SIZE,
+   compressed512, XBZRLE_PAGE_SIZE);
+t_end512 = clock();
+float time_val512 = difftime(t_end512, t_start512);
+g_assert(dlen512 == 0);
+
+static float result_unchanged[2];
+result_unchanged[0] = time_val;
+result_unchanged[1] = time_val512;
+
 g_free(test);
 g_free(compressed);
+g_free(test512);
+g_free(compressed512);
+
+return result_unchanged;
 }
 
-static void test_encode_decode_1_byte(void)
+static void test_encode_decode_unchanged_range(void)
+{
+int i;
+float time_raw = 0.0, time_512 = 0.0;
+float *res;
+for (i = 0; i < 1; i++) {
+res = test_encode_decode_unchanged();
+time_raw += res[0];
+time_512 += res[1];
+}
+printf("Unchanged test:\n");
+printf("Raw xbzrle_encode time is %f ms\n", time_raw);
+printf("512 xbzrle_encode time is %f ms\n", time_512);
+}
+
+static float *test_encode_decode_1_byte(void)
 {
 uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE);
 uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE);
 uint8_t *compressed = g_malloc(XBZRLE_PAGE_SIZE);
-int dlen = 0, rc = 0;
+uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE);
+uint8_t *compressed512 = g_malloc(XBZRLE_PAGE_SIZE);
+int dlen = 0, rc = 0, dlen512 = 0, rc512 = 0;

[PATCH v2 1/2] Update AVX512 support for xbzrle_encode_buffer function

2022-08-04 Thread ling xu

This commit adds runtime check of AVX512 on running machine, and implements 
AVX512 of
xbzrle_encode_buffer function to accelerate xbzrle encoding speed.
Compared with C version of xbzrle_encode_buffer function, AVX512 version
can achieve almost 60%-70% performance improvement on unit test provided
by qemu. In addition, we provide one more unit test called
"test_encode_decode_random", in which dirty data are randomly located in
4K page, and this case can achieve almost 140% performance gain.

Signed-off-by: ling xu 
Co-authored-by: Zhou Zhao 
Co-authored-by: Jun Jin 
---
 meson.build| 211 +
 meson_options.txt  |  28 ++
 migration/ram.c|  41 +
 migration/xbzrle.c | 181 ++
 migration/xbzrle.h |   4 +
 5 files changed, 465 insertions(+)

diff --git a/meson.build b/meson.build
index 294e9a8f32..9228df2442 100644
--- a/meson.build
+++ b/meson.build
@@ -2262,6 +2262,217 @@ config_host_data.set('CONFIG_AVX512F_OPT', 
get_option('avx512f') \
 int main(int argc, char *argv[]) { return bar(argv[0]); }
   '''), error_message: 'AVX512F not available').allowed())
 
+config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512BW') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512bw")
+#include 
+#include 
+static int bar(void *a) {
+
+  __m512i x = *(__m512i *)a;
+  __m512i res= _mm512_abs_epi8(x);
+  return res[1];
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }
+  '''), error_message: 'AVX512BW not available').allowed())
+
+config_host_data.set('CONFIG_AVX512CD_OPT', get_option('avx512cd') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512CD') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512cd")
+#include 
+#include 
+static int bar(void *a) {
+
+  __m512i x = *(__m512i *)a;
+  __mmask16 k;
+  __m512i res= _mm512_maskz_lzcnt_epi32 (k, x);
+  return res[1];
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }
+  '''), error_message: 'AVX512CD not available').allowed())
+
+config_host_data.set('CONFIG_AVX512DQ_OPT', get_option('avx512dq') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512D') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512dq")
+#include 
+#include 
+static int bar(void *a) {
+
+  __mmask x = *(__mmask *)a;
+  __mmask8 b;
+  return _kxor_mask8(x,b);
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }
+  '''), error_message: 'AVX512DQ not available').allowed())
+
+config_host_data.set('CONFIG_AVX512ER_OPT', get_option('avx512er') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512ER') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512er")
+#include 
+#include 
+static int bar(void *a) {
+
+  __m512d x = *(__m512d *)a;
+  __m512d res=_mm512_rsqrt28_pd(x);
+  return res[1];
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }
+  '''), error_message: 'AVX512ER not available').allowed())
+
+
+config_host_data.set('CONFIG_AVX512IFMA52_OPT', get_option('avx512ifma52') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512ER') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512ifma")
+#include 
+#include 
+static int bar(void *a) {
+
+  __m512i x = *(__m512i *)a;
+  __m512i b,c;
+  __m512i res= _mm512_madd52lo_epu64 (x, b, c);
+  return res[1];
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }
+  '''), error_message: 'AVX512IFMA52 not available').allowed())
+
+
+config_host_data.set('CONFIG_AVX512PF_OPT', get_option('avx512pf') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512PF') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512pf")
+#include 
+#include 
+static void bar(void *a) {
+  char* base_addr;
+  __mmask8 k;
+  __m512i vindex = *(__m512i *)a;
+  _mm512_mask_prefetch_i64scatter_pd (base_addr, k, vindex, 1, 2);
+}
+int main(int argc, char *argv[]) { bar(argv[0]); return 0;}
+  '''), error_message: 'AVX512PF not available').allowed())
+
+
+config_host_data.set('CONFIG_AVX512VPOPCNTDQ_OPT', 
get_option('avx512vpopcntdq') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512VPOPCNTDQ') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512vpopcntdq")
+#include 
+#include 
+static int bar(void *a) {
+  __m512i x = *(__m512i *)a;
+  __mmask8 k;
+  __m512i

Re: [PATCH] hw/riscv: remove 'fdt' param from riscv_setup_rom_reset_vec()

2022-08-04 Thread Alistair Francis

On Fri, Jul 29, 2022 at 4:19 AM Daniel Henrique Barboza
 wrote:
>
> The 'fdt' param is not being used in riscv_setup_rom_reset_vec().
> Simplify the API by removing it. While we're at it, remove the redundant
> 'return' statement at the end of function.
>
> Cc: Palmer Dabbelt 
> Cc: Alistair Francis 
> Cc: Bin Meng 
> Cc: Vijai Kumar K 
> Signed-off-by: Daniel Henrique Barboza 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  hw/riscv/boot.c| 4 +---
>  hw/riscv/microchip_pfsoc.c | 2 +-
>  hw/riscv/shakti_c.c| 3 +--
>  hw/riscv/spike.c   | 2 +-
>  hw/riscv/virt.c| 2 +-
>  include/hw/riscv/boot.h| 2 +-
>  6 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
> index 06b4fc5ac3..1ae7596873 100644
> --- a/hw/riscv/boot.c
> +++ b/hw/riscv/boot.c
> @@ -286,7 +286,7 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
> RISCVHartArrayState *harts
> hwaddr start_addr,
> hwaddr rom_base, hwaddr rom_size,
> uint64_t kernel_entry,
> -   uint64_t fdt_load_addr, void *fdt)
> +   uint64_t fdt_load_addr)
>  {
>  int i;
>  uint32_t start_addr_hi32 = 0x;
> @@ -326,8 +326,6 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
> RISCVHartArrayState *harts
>rom_base, _space_memory);
>  riscv_rom_copy_firmware_info(machine, rom_base, rom_size, 
> sizeof(reset_vec),
>   kernel_entry);
> -
> -return;
>  }
>
>  void riscv_setup_direct_kernel(hwaddr kernel_addr, hwaddr fdt_addr)
> diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
> index 10a5d0e501..7313153606 100644
> --- a/hw/riscv/microchip_pfsoc.c
> +++ b/hw/riscv/microchip_pfsoc.c
> @@ -583,7 +583,7 @@ static void 
> microchip_icicle_kit_machine_init(MachineState *machine)
>  riscv_setup_rom_reset_vec(machine, >soc.u_cpus, 
> firmware_load_addr,
>memmap[MICROCHIP_PFSOC_ENVM_DATA].base,
>memmap[MICROCHIP_PFSOC_ENVM_DATA].size,
> -  kernel_entry, fdt_load_addr, machine->fdt);
> +  kernel_entry, fdt_load_addr);
>  }
>  }
>
> diff --git a/hw/riscv/shakti_c.c b/hw/riscv/shakti_c.c
> index 90e2cf609f..e43cc9445c 100644
> --- a/hw/riscv/shakti_c.c
> +++ b/hw/riscv/shakti_c.c
> @@ -66,8 +66,7 @@ static void shakti_c_machine_state_init(MachineState 
> *mstate)
>  riscv_setup_rom_reset_vec(mstate, >soc.cpus,
>shakti_c_memmap[SHAKTI_C_RAM].base,
>shakti_c_memmap[SHAKTI_C_ROM].base,
> -  shakti_c_memmap[SHAKTI_C_ROM].size, 0, 0,
> -  NULL);
> +  shakti_c_memmap[SHAKTI_C_ROM].size, 0, 0);
>  if (mstate->firmware) {
>  riscv_load_firmware(mstate->firmware,
>  shakti_c_memmap[SHAKTI_C_RAM].base,
> diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
> index e41b6aa9f0..5ba34543c8 100644
> --- a/hw/riscv/spike.c
> +++ b/hw/riscv/spike.c
> @@ -308,7 +308,7 @@ static void spike_board_init(MachineState *machine)
>  riscv_setup_rom_reset_vec(machine, >soc[0], memmap[SPIKE_DRAM].base,
>memmap[SPIKE_MROM].base,
>memmap[SPIKE_MROM].size, kernel_entry,
> -  fdt_load_addr, s->fdt);
> +  fdt_load_addr);
>
>  /* initialize HTIF using symbols found in load_kernel */
>  htif_mm_init(system_memory, mask_rom,
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index bc424dd2f5..2e9ed2628c 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -1299,7 +1299,7 @@ static void virt_machine_done(Notifier *notifier, void 
> *data)
>  riscv_setup_rom_reset_vec(machine, >soc[0], start_addr,
>virt_memmap[VIRT_MROM].base,
>virt_memmap[VIRT_MROM].size, kernel_entry,
> -  fdt_load_addr, machine->fdt);
> +  fdt_load_addr);
>
>  /*
>   * Only direct boot kernel is currently supported for KVM VM,
> diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h
> index d2db29721a..a36f7618f5 100644
> --- a/include/hw/riscv/boot.h
> +++ b/include/hw/riscv/boot.h
> @@ -51,7 +51,7 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
> RISCVHartArrayState *harts
> hwaddr saddr,
> hwaddr rom_base, hwaddr rom_size,
> uint64_t kernel_entry,
> -   uint64_t fdt_load_addr, void *fdt);
> +   uint64_t fdt_load_addr);
>  void

Re: [PATCH] target/riscv: Fix priority of csr related check in riscv_csrrw_check

2022-08-04 Thread Alistair Francis

On Wed, Aug 3, 2022 at 10:56 PM Weiwei Li  wrote:
>
> Normally, riscv_csrrw_check is called when executing Zicsr instructions.
> And we can only do access control for existed CSRs. So the priority of
> CSR related check, from highest to lowest, should be as follows:
> 1) check whether Zicsr is supported: raise RISCV_EXCP_ILLEGAL_INST if not
> 2) check whether csr is existed: raise RISCV_EXCP_ILLEGAL_INST if not
> 3) do access control: raise RISCV_EXCP_ILLEGAL_INST or RISCV_EXCP_VIRT_
> INSTRUCTION_FAULT if not allowed
>
> The predicates contain parts of function of both 2) and 3), So they need
> to be placed in the middle of riscv_csrrw_check
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/csr.c | 44 +---
>  1 file changed, 25 insertions(+), 19 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 0fb042b2fd..d81f466c80 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -3270,6 +3270,30 @@ static inline RISCVException 
> riscv_csrrw_check(CPURISCVState *env,
>  /* check privileges and return RISCV_EXCP_ILLEGAL_INST if check fails */
>  int read_only = get_field(csrno, 0xC00) == 3;
>  int csr_min_priv = csr_ops[csrno].min_priv_ver;
> +
> +/* ensure the CSR extension is enabled. */
> +if (!cpu->cfg.ext_icsr) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +if (env->priv_ver < csr_min_priv) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +/* check predicate */
> +if (!csr_ops[csrno].predicate) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +if (write_mask && read_only) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +
> +RISCVException ret = csr_ops[csrno].predicate(env, csrno);
> +if (ret != RISCV_EXCP_NONE) {
> +return ret;
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  int csr_priv, effective_priv = env->priv;
>
> @@ -3290,25 +3314,7 @@ static inline RISCVException 
> riscv_csrrw_check(CPURISCVState *env,
>  return RISCV_EXCP_ILLEGAL_INST;
>  }
>  #endif
> -if (write_mask && read_only) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -/* ensure the CSR extension is enabled. */
> -if (!cpu->cfg.ext_icsr) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -/* check predicate */
> -if (!csr_ops[csrno].predicate) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -if (env->priv_ver < csr_min_priv) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -
> -return csr_ops[csrno].predicate(env, csrno);
> +return RISCV_EXCP_NONE;
>  }
>
>  static RISCVException riscv_csrrw_do64(CPURISCVState *env, int csrno,
> --
> 2.17.1
>
>

Re: [PATCH v7 02/12] vhost: use SVQ element ndescs instead of opaque data for desc validation

2022-08-04 Thread Jason Wang

On Fri, Aug 5, 2022 at 2:29 AM Eugenio Pérez  wrote:
>
> Since we're going to allow SVQ to add elements without the guest's
> knowledge and without its own VirtQueueElement, it's easier to check if
> an element is a valid head checking a different thing than the
> VirtQueueElement.
>
> Signed-off-by: Eugenio Pérez 
> ---

Acked-by: Jason Wang 

>  hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> b/hw/virtio/vhost-shadow-virtqueue.c
> index ffd2b2c972..e6eebd0e8d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -414,7 +414,7 @@ static VirtQueueElement 
> *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>  return NULL;
>  }
>
> -if (unlikely(!svq->desc_state[used_elem.id].elem)) {
> +if (unlikely(!svq->desc_state[used_elem.id].ndescs)) {
>  qemu_log_mask(LOG_GUEST_ERROR,
>  "Device %s says index %u is used, but it was not available",
>  svq->vdev->name, used_elem.id);
> @@ -422,6 +422,7 @@ static VirtQueueElement 
> *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>  }
>
>  num = svq->desc_state[used_elem.id].ndescs;
> +svq->desc_state[used_elem.id].ndescs = 0;
>  last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
>  svq->desc_next[last_used_chain] = svq->free_head;
>  svq->free_head = used_elem.id;
> --
> 2.31.1
>

Re: [PATCH v7 04/12] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush

2022-08-04 Thread Jason Wang

On Fri, Aug 5, 2022 at 2:29 AM Eugenio Pérez  wrote:
>
> Since QEMU will be able to inject new elements on CVQ to restore the
> state, we need not to depend on a VirtQueueElement to know if a new
> element has been used by the device or not. Instead of check that, check
> if there are new elements only using used idx on vhost_svq_flush.
>
> Signed-off-by: Eugenio Pérez 
> ---

Acked-by: Jason Wang 

> v6: Change less from the previous function
> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> b/hw/virtio/vhost-shadow-virtqueue.c
> index 1b49bf54f2..f863b08627 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -499,17 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>  size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
>  {
>  int64_t start_us = g_get_monotonic_time();
> +uint32_t len;
> +
>  do {
> -uint32_t len;
> -VirtQueueElement *elem = vhost_svq_get_buf(svq, );
> -if (elem) {
> -return len;
> +if (vhost_svq_more_used(svq)) {
> +break;
>  }
>
>  if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
>  return 0;
>  }
>  } while (true);
> +
> +vhost_svq_get_buf(svq, );
> +return len;
>  }
>
>  /**
> --
> 2.31.1
>

Re: [PATCH v7 01/12] vhost: stop transfer elem ownership in vhost_handle_guest_kick

2022-08-04 Thread Jason Wang

On Fri, Aug 5, 2022 at 2:29 AM Eugenio Pérez  wrote:
>
> It was easier to allow vhost_svq_add to handle the memory. Now that we
> will allow qemu to add elements to a SVQ without the guest's knowledge,
> it's better to handle it in the caller.
>
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> b/hw/virtio/vhost-shadow-virtqueue.c
> index e4956728dd..ffd2b2c972 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -233,9 +233,6 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>  /**
>   * Add an element to a SVQ.
>   *
> - * The caller must check that there is enough slots for the new element. It
> - * takes ownership of the element: In case of failure not ENOSPC, it is free.
> - *
>   * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
>   */
>  int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> @@ -252,7 +249,6 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct 
> iovec *out_sg,
>
>  ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, 
> _head);
>  if (unlikely(!ok)) {
> -g_free(elem);
>  return -EINVAL;
>  }
>
> @@ -293,7 +289,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
> *svq)
>  virtio_queue_set_notification(svq->vq, false);
>
>  while (true) {
> -VirtQueueElement *elem;
> +g_autofree VirtQueueElement *elem;
>  int r;
>
>  if (svq->next_guest_avail_elem) {
> @@ -324,12 +320,14 @@ static void 
> vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>   * queue the current guest descriptor and ignore kicks
>   * until some elements are used.
>   */
> -svq->next_guest_avail_elem = elem;
> +svq->next_guest_avail_elem = g_steal_pointer();
>  }
>
>  /* VQ is full or broken, just return and ignore kicks */
>  return;
>  }
> +/* elem belongs to SVQ or external caller now */
> +elem = NULL;
>  }
>
>  virtio_queue_set_notification(svq->vq, true);
> --
> 2.31.1
>

Re: [PATCH v7 03/12] vhost: Delete useless read memory barrier

2022-08-04 Thread Jason Wang

On Fri, Aug 5, 2022 at 2:29 AM Eugenio Pérez  wrote:
>
> As discussed in previous series [1], this memory barrier is useless with
> the atomic read of used idx at vhost_svq_more_used. Deleting it.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02616.html
>
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> b/hw/virtio/vhost-shadow-virtqueue.c
> index e6eebd0e8d..1b49bf54f2 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -509,9 +509,6 @@ size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
>  if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
>  return 0;
>  }
> -
> -/* Make sure we read new used_idx */
> -smp_rmb();
>  } while (true);
>  }
>
> --
> 2.31.1
>

Re: [PATCH] net/colo.c: Fix the pointer issuse reported by Coverity.

2022-08-04 Thread Jason Wang

On Tue, Aug 2, 2022 at 4:24 PM Zhang Chen  wrote:
>
> When enable the virtio-net-pci, guest network packet will
> load the vnet_hdr. In COLO status, the primary VM's network
> packet maybe redirect to another VM, it need filter-redirect
> enable the vnet_hdr flag at the same time, COLO-proxy will
> correctly parse the original network packet. If have any
> misconfiguration here, the vnet_hdr_len is wrong for parse
> the packet, the data+offset will point to wrong place.
>
> Signed-off-by: Zhang Chen 
> ---
>  net/colo.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/net/colo.c b/net/colo.c
> index 6b0ff562ad..dfb15b4c14 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -44,21 +44,25 @@ int parse_packet_early(Packet *pkt)
>  {
>  int network_length;
>  static const uint8_t vlan[] = {0x81, 0x00};
> -uint8_t *data = pkt->data + pkt->vnet_hdr_len;
> +uint8_t *data = pkt->data;
>  uint16_t l3_proto;
>  ssize_t l2hdr_len;
>
>  if (data == NULL) {

I wonder under which case we can see data == NULL?

AFAIK, data is either dup via packet_new() or assigned to a pointer to
the buf in packet_new_nocopy().

Thanks

> -trace_colo_proxy_main_vnet_info("This packet is not parsed 
> correctly, "
> -"pkt->vnet_hdr_len", 
> pkt->vnet_hdr_len);
> +trace_colo_proxy_main("COLO-proxy got NULL data packet ");
>  return 1;
>  }
> -l2hdr_len = eth_get_l2_hdr_length(data);
>
> -if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
> -trace_colo_proxy_main("pkt->size < ETH_HLEN");
> +/* Check the received vnet_hdr_len then add the offset */
> +if (pkt->size < sizeof(struct eth_header) + sizeof(struct vlan_header)
> ++ pkt->vnet_hdr_len) {
> +trace_colo_proxy_main_vnet_info("This packet may be load wrong "
> +"pkt->vnet_hdr_len", 
> pkt->vnet_hdr_len);
>  return 1;
>  }
> +data += pkt->vnet_hdr_len;
> +
> +l2hdr_len = eth_get_l2_hdr_length(data);
>
>  /*
>   * TODO: support vlan.
> --
> 2.25.1
>

Re: [PATCH] hw/net/rocker: Avoid undefined shifts with more than 31 ports

2022-08-04 Thread Jason Wang

On Thu, Aug 4, 2022 at 11:27 PM Richard Henderson
 wrote:
>
> On 8/4/22 03:45, Peter Maydell wrote:
> > Ping?
> >
> > thanks
> > -- PMM
> >
> > On Fri, 29 Jul 2022 at 16:59, Peter Maydell  
> > wrote:
> >>
> >> In rocker_port_phys_link_status() and rocker_port_phys_enable_read()
> >> we construct a 64-bit value with one bit per front-panel port.
> >> However we accidentally do the shift as 32-bit arithmetic, which
> >> means that if there are more than 31 front-panel ports this is
> >> undefined behaviour.
> >>
> >> Fix the problem by ensuring we use 64-bit arithmetic for the whole
> >> calculation. (We won't ever shift off the 64-bit value because
> >> ROCKER_FP_PORTS_MAX is 62.)
> >>
> >> Resolves: Coverity CID 1487121, 1487160
> >> Signed-off-by: Peter Maydell 
>
> Reviewed-by: Richard Henderson 

Queued.

Thanks

>
>
> r~
>
> >> ---
> >>   hw/net/rocker/rocker.c | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
> >> index 31f2340fb91..d8f3f16fe87 100644
> >> --- a/hw/net/rocker/rocker.c
> >> +++ b/hw/net/rocker/rocker.c
> >> @@ -1010,7 +1010,7 @@ static uint64_t rocker_port_phys_link_status(Rocker 
> >> *r)
> >>   FpPort *port = r->fp_port[i];
> >>
> >>   if (fp_port_get_link_up(port)) {
> >> -status |= 1 << (i + 1);
> >> +status |= 1ULL << (i + 1);
> >>   }
> >>   }
> >>   return status;
> >> @@ -1025,7 +1025,7 @@ static uint64_t rocker_port_phys_enable_read(Rocker 
> >> *r)
> >>   FpPort *port = r->fp_port[i];
> >>
> >>   if (fp_port_enabled(port)) {
> >> -ret |= 1 << (i + 1);
> >> +ret |= 1ULL << (i + 1);
> >>   }
> >>   }
> >>   return ret;
> >
>

[BUG] cxl can not create region

2022-08-04 Thread Bobo WL

Hi list

I want to test cxl functions in arm64, and found some problems I can't
figure out.

My test environment:

1. build latest bios from https://github.com/tianocore/edk2.git master
branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
support patch: 
https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-jonathan.came...@huawei.com/
3. build Linux kernel from
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git preview
branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
4. build latest ndctl tools from https://github.com/pmem/ndctl
create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)

And my qemu test commands:
sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
-cpu max -smp 8 -nographic -no-reboot \
-kernel $KERNEL -bios $BIOS_BIN \
-drive if=none,file=$ROOTFS,format=qcow2,id=hd \
-device virtio-blk-pci,drive=hd -append 'root=/dev/vda1
nokaslr dyndbg="module cxl* +p"' \
-object memory-backend-ram,size=4G,id=mem0 \
-numa node,nodeid=0,cpus=0-7,memdev=mem0 \
-net nic -net user,hostfwd=tcp::-:22 -enable-kvm \
-object
memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M
\
-object
memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M
\
-object
memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
\
-object
memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
\
-object
memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M
\
-object
memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M
\
-object
memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M
\
-object
memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M
\
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device
cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-device
cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-device
cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-device
cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
-M 
cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k

And I have got two problems.
1. When I want to create x1 region with command: "cxl create-region -d
decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer
reference. Crash log:

[  534.697324] cxl_region region0: config state: 0
[  534.697346] cxl_region region0: probe: -6
[  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
[  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
[  534.699149] cxl region0: :0d:00.0:port2 decoder2.0 add:
mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
[  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
mem0:decoder3.0 @ 0 next: :0d:00.0 nr_eps: 1 nr_targets: 1
[  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
[  534.699182] cxl region0: ACPI0016:00:port1 target[0] = :0c:00.0
for mem0:decoder3.0 @ 0
[  534.699189] cxl region0: :0d:00.0:port2 iw: 1 ig: 256
[  534.699193] cxl region0: :0d:00.0:port2 target[0] =
:0e:00.0 for mem0:decoder3.0 @ 0
[  534.699405] Unable to handle kernel NULL pointer dereference at
virtual address 
[  534.701474] Mem abort info:
[  534.701994]   ESR = 0x8604
[  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
[  534.703616]   SET = 0, FnV = 0
[  534.704174]   EA = 0, S1PTW = 0
[  534.704803]   FSC = 0x04: level 0 translation fault
[  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=00010144a000
[  534.706875] [] pgd=, p4d=
[  534.709855] Internal error: Oops: 8604 [#1] PREEMPT SMP
[  534.710301] Modules linked in:
[  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
[  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  534.717179] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  534.719190] pc : 0x0
[  534.719928] lr : commit_store+0x118/0x2cc
[  534.721007] sp : 8aec3c30
[  534.721793] x29: 8aec3c30 x28: da62e740 x27: c0c06b30
[

[PATCH for-7.1 v2 2/5] target/loongarch: add gdb_arch_name()

2022-08-04 Thread Song Gao

Matches bfd/cpu-loongarch.c, bfd_loongarch_arch.

Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index d84ec38cf7..941e2772bc 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -661,6 +661,11 @@ static const struct SysemuCPUOps loongarch_sysemu_ops = {
 };
 #endif
 
+static gchar *loongarch_gdb_arch_name(CPUState *cs)
+{
+return g_strdup("loongarch64");
+}
+
 static void loongarch_cpu_class_init(ObjectClass *c, void *data)
 {
 LoongArchCPUClass *lacc = LOONGARCH_CPU_CLASS(c);
@@ -686,6 +691,7 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 cc->gdb_num_core_regs = 35;
 cc->gdb_core_xml_file = "loongarch-base64.xml";
 cc->gdb_stop_before_watchpoint = true;
+cc->gdb_arch_name = loongarch_gdb_arch_name;
 
 #ifdef CONFIG_TCG
 cc->tcg_ops = _tcg_ops;
-- 
2.31.1

[PATCH for-7.1 v2 1/5] target/loongarch: Fix GDB get the wrong pc

2022-08-04 Thread Song Gao

GDB LoongArch add a register orig_a0, see the base64.xml [1].
We should add the orig_a0 to match the upstream GDB.

[1]: 
https://github.com/bminor/binutils-gdb/blob/master/gdb/features/loongarch/base64.xml

Signed-off-by: Song Gao 
---
 gdb-xml/loongarch-base64.xml | 1 +
 target/loongarch/cpu.c   | 2 +-
 target/loongarch/gdbstub.c   | 7 +--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gdb-xml/loongarch-base64.xml b/gdb-xml/loongarch-base64.xml
index 4962bdbd28..a1dd4f2208 100644
--- a/gdb-xml/loongarch-base64.xml
+++ b/gdb-xml/loongarch-base64.xml
@@ -39,6 +39,7 @@
   
   
   
+  
   
   
 
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 1c69a76f2b..d84ec38cf7 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -683,7 +683,7 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 cc->gdb_read_register = loongarch_cpu_gdb_read_register;
 cc->gdb_write_register = loongarch_cpu_gdb_write_register;
 cc->disas_set_info = loongarch_cpu_disas_set_info;
-cc->gdb_num_core_regs = 34;
+cc->gdb_num_core_regs = 35;
 cc->gdb_core_xml_file = "loongarch-base64.xml";
 cc->gdb_stop_before_watchpoint = true;
 
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 24e126fb2d..5feb43445f 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -19,8 +19,11 @@ int loongarch_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 if (0 <= n && n < 32) {
 return gdb_get_regl(mem_buf, env->gpr[n]);
 } else if (n == 32) {
-return gdb_get_regl(mem_buf, env->pc);
+/* orig_a0 */
+return gdb_get_regl(mem_buf, 0);
 } else if (n == 33) {
+return gdb_get_regl(mem_buf, env->pc);
+} else if (n == 34) {
 return gdb_get_regl(mem_buf, env->CSR_BADV);
 }
 return 0;
@@ -36,7 +39,7 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 if (0 <= n && n < 32) {
 env->gpr[n] = tmp;
 length = sizeof(target_ulong);
-} else if (n == 32) {
+} else if (n == 33) {
 env->pc = tmp;
 length = sizeof(target_ulong);
 }
-- 
2.31.1

[PATCH for-7.1 v2 3/5] target/loongarch: update loongarch-base64.xml

2022-08-04 Thread Song Gao

Update loongarch-base64.xml to match the upstream GDB [1].

[1]:https://github.com/bminor/binutils-gdb/blob/master/gdb/features/loongarch/base64.xml

Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 gdb-xml/loongarch-base64.xml | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gdb-xml/loongarch-base64.xml b/gdb-xml/loongarch-base64.xml
index a1dd4f2208..2d8a1f6b73 100644
--- a/gdb-xml/loongarch-base64.xml
+++ b/gdb-xml/loongarch-base64.xml
@@ -1,5 +1,5 @@

[PATCH for-7.1 v2 4/5] target/loongarch: Update loongarch-fpu.xml

2022-08-04 Thread Song Gao

Rename loongarch-fpu64.xml to loongarch-fpu.xml and update loongarch-fpu.xml to 
match upstream GDB [1]

[1]:https://github.com/bminor/binutils-gdb/blob/master/gdb/features/loongarch/fpu.xml

Signed-off-by: Song Gao 
---
 configs/targets/loongarch64-softmmu.mak |  2 +-
 gdb-xml/loongarch-fpu.xml   | 50 ++
 gdb-xml/loongarch-fpu64.xml | 57 -
 target/loongarch/gdbstub.c  |  2 +-
 4 files changed, 52 insertions(+), 59 deletions(-)
 create mode 100644 gdb-xml/loongarch-fpu.xml
 delete mode 100644 gdb-xml/loongarch-fpu64.xml

diff --git a/configs/targets/loongarch64-softmmu.mak 
b/configs/targets/loongarch64-softmmu.mak
index 483474ba93..9abc99056f 100644
--- a/configs/targets/loongarch64-softmmu.mak
+++ b/configs/targets/loongarch64-softmmu.mak
@@ -1,5 +1,5 @@
 TARGET_ARCH=loongarch64
 TARGET_BASE_ARCH=loongarch
 TARGET_SUPPORTS_MTTCG=y
-TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu64.xml
+TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
 TARGET_NEED_FDT=y
diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
new file mode 100644
index 00..78e42cf5dd
--- /dev/null
+++ b/gdb-xml/loongarch-fpu.xml
@@ -0,0 +1,50 @@
+
+
+
+
+
+
+  
+
+
+  
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
diff --git a/gdb-xml/loongarch-fpu64.xml b/gdb-xml/loongarch-fpu64.xml
deleted file mode 100644
index e52cf89fbc..00
--- a/gdb-xml/loongarch-fpu64.xml
+++ /dev/null
@@ -1,57 +0,0 @@
-
-
-
-
-
-
-  
-
-
-  
-
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 5feb43445f..d3a5e404b0 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -80,5 +80,5 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
 void loongarch_cpu_register_gdb_regs_for_features(CPUState *cs)
 {
 gdb_register_coprocessor(cs, loongarch_gdb_get_fpu, loongarch_gdb_set_fpu,
- 41, "loongarch-fpu64.xml", 0);
+ 41, "loongarch-fpu.xml", 0);
 }
-- 
2.31.1

[PATCH for-7.1 v2 5/5] target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()

2022-08-04 Thread Song Gao

GDB LoongArch fpu use fcc register,  update gdb_set_fpu() and gdb_get_fpu() to 
match it.

Signed-off-by: Song Gao 
---
 linux-user/loongarch64/signal.c | 24 ++-
 target/loongarch/gdbstub.c  | 34 ++---
 target/loongarch/internals.h|  3 +++
 3 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index 65fd5f3857..7c7afb652e 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -71,26 +71,6 @@ struct extctx_layout {
 struct ctx_layout end;
 };
 
-/* The kernel's sc_save_fcc macro is a sequence of MOVCF2GR+BSTRINS. */
-static uint64_t read_all_fcc(CPULoongArchState *env)
-{
-uint64_t ret = 0;
-
-for (int i = 0; i < 8; ++i) {
-ret |= (uint64_t)env->cf[i] << (i * 8);
-}
-
-return ret;
-}
-
-/* The kernel's sc_restore_fcc macro is a sequence of BSTRPICK+MOVGR2CF. */
-static void write_all_fcc(CPULoongArchState *env, uint64_t val)
-{
-for (int i = 0; i < 8; ++i) {
-env->cf[i] = (val >> (i * 8)) & 1;
-}
-}
-
 static abi_ptr extframe_alloc(struct extctx_layout *extctx,
   struct ctx_layout *sctx, unsigned size,
   unsigned align, abi_ptr orig_sp)
@@ -150,7 +130,7 @@ static void setup_sigframe(CPULoongArchState *env,
 for (i = 0; i < 32; ++i) {
 __put_user(env->fpr[i], _ctx->regs[i]);
 }
-__put_user(read_all_fcc(env), _ctx->fcc);
+__put_user(read_fcc(env), _ctx->fcc);
 __put_user(env->fcsr0, _ctx->fcsr);
 
 /*
@@ -216,7 +196,7 @@ static void restore_sigframe(CPULoongArchState *env,
 __get_user(env->fpr[i], _ctx->regs[i]);
 }
 __get_user(fcc, _ctx->fcc);
-write_all_fcc(env, fcc);
+write_fcc(env, fcc);
 __get_user(env->fcsr0, _ctx->fcsr);
 restore_fp_status(env);
 }
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index d3a5e404b0..a4d1e28e36 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -11,6 +11,24 @@
 #include "internals.h"
 #include "exec/gdbstub.h"
 
+uint64_t read_fcc(CPULoongArchState *env)
+{
+uint64_t ret = 0;
+
+for (int i = 0; i < 8; ++i) {
+ret |= (uint64_t)env->cf[i] << (i * 8);
+}
+
+return ret;
+}
+
+void write_fcc(CPULoongArchState *env, uint64_t val)
+{
+for (int i = 0; i < 8; ++i) {
+env->cf[i] = (val >> (i * 8)) & 1;
+}
+}
+
 int loongarch_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
@@ -51,9 +69,10 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
 {
 if (0 <= n && n < 32) {
 return gdb_get_reg64(mem_buf, env->fpr[n]);
-} else if (32 <= n && n < 40) {
-return gdb_get_reg8(mem_buf, env->cf[n - 32]);
-} else if (n == 40) {
+} else if (n == 32) {
+uint64_t val = read_fcc(env);
+return gdb_get_reg64(mem_buf, val);
+} else if (n == 33) {
 return gdb_get_reg32(mem_buf, env->fcsr0);
 }
 return 0;
@@ -67,10 +86,11 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
 if (0 <= n && n < 32) {
 env->fpr[n] = ldq_p(mem_buf);
 length = 8;
-} else if (32 <= n && n < 40) {
-env->cf[n - 32] = ldub_p(mem_buf);
-length = 1;
-} else if (n == 40) {
+} else if (n == 32) {
+uint64_t val = ldq_p(mem_buf);
+write_fcc(env, val);
+length = 8;
+} else if (n == 33) {
 env->fcsr0 = ldl_p(mem_buf);
 length = 4;
 }
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index ea227362b6..f01635aed6 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -51,6 +51,9 @@ bool loongarch_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 hwaddr loongarch_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 #endif /* !CONFIG_USER_ONLY */
 
+uint64_t read_fcc(CPULoongArchState *env);
+void write_fcc(CPULoongArchState *env, uint64_t val);
+
 int loongarch_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n);
 int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n);
 void loongarch_cpu_register_gdb_regs_for_features(CPUState *cs);
-- 
2.31.1

[PATCH for-7.1 v2 0/5] Fix gdb bugs and update gdb-xml

2022-08-04 Thread Song Gao

Hi,All

This series fiex LoongArch GDB get the wrong pc, because the xml missing
the register orig_a0, and update loongarch gdb-xml to match GDB[1]

[1]:https://github.com/bminor/binutils-gdb/blob/master/gdb/features/loongarch

Please review!


V2:
- Update orig_a0 value to 0;
- Update fcc type to uint64;
- Share write_fcc()/read_fcc();
- Update patch2 commit message.

Thanks.
Song Gao


Song Gao (5):
  target/loongarch: Fix GDB get the wrong pc
  target/loongarch: add gdb_arch_name()
  target/loongarch: update loongarch-base64.xml
  target/loongarch: Update loongarch-fpu.xml
  target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()

 configs/targets/loongarch64-softmmu.mak |  2 +-
 gdb-xml/loongarch-base64.xml| 13 +++---
 gdb-xml/loongarch-fpu.xml   | 50 ++
 gdb-xml/loongarch-fpu64.xml | 57 -
 linux-user/loongarch64/signal.c | 24 +--
 target/loongarch/cpu.c  |  8 +++-
 target/loongarch/gdbstub.c  | 43 ++-
 target/loongarch/internals.h|  3 ++
 8 files changed, 103 insertions(+), 97 deletions(-)
 create mode 100644 gdb-xml/loongarch-fpu.xml
 delete mode 100644 gdb-xml/loongarch-fpu64.xml

-- 
2.31.1

Re: [PATCH v2 2/2] vdpa: do not save failed dma maps in SVQ iova tree

2022-08-04 Thread Jason Wang

On Thu, Aug 4, 2022 at 11:54 PM Eugenio Pérez  wrote:
>
> If a map fails for whatever reason, it must not be saved in the tree.
> Otherwise, qemu will try to unmap it in cleanup, leaving to more errors.
>
> Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
> Reported-by: Lei Yang 
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

> ---
>  hw/virtio/vhost-vdpa.c | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 983d3697b0..7e28d2f674 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -176,6 +176,7 @@ static void vhost_vdpa_listener_commit(MemoryListener 
> *listener)
>  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> MemoryRegionSection *section)
>  {
> +DMAMap mem_region = {};
>  struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, 
> listener);
>  hwaddr iova;
>  Int128 llend, llsize;
> @@ -212,13 +213,13 @@ static void 
> vhost_vdpa_listener_region_add(MemoryListener *listener,
>
>  llsize = int128_sub(llend, int128_make64(iova));
>  if (v->shadow_vqs_enabled) {
> -DMAMap mem_region = {
> -.translated_addr = (hwaddr)(uintptr_t)vaddr,
> -.size = int128_get64(llsize) - 1,
> -.perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> -};
> +int r;
>
> -int r = vhost_iova_tree_map_alloc(v->iova_tree, _region);
> +mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr,
> +mem_region.size = int128_get64(llsize) - 1,
> +mem_region.perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> +
> +r = vhost_iova_tree_map_alloc(v->iova_tree, _region);
>  if (unlikely(r != IOVA_OK)) {
>  error_report("Can't allocate a mapping (%d)", r);
>  goto fail;
> @@ -232,11 +233,16 @@ static void 
> vhost_vdpa_listener_region_add(MemoryListener *listener,
>   vaddr, section->readonly);
>  if (ret) {
>  error_report("vhost vdpa map fail!");
> -goto fail;
> +goto fail_map;
>  }
>
>  return;
>
> +fail_map:
> +if (v->shadow_vqs_enabled) {
> +vhost_iova_tree_remove(v->iova_tree, _region);
> +}
> +
>  fail:
>  /*
>   * On the initfn path, store the first error in the container so we
> --
> 2.31.1
>

Re: [PATCH v2 1/2] vdpa: Skip the maps not in the iova tree

2022-08-04 Thread Jason Wang




在 2022/8/4 23:54, Eugenio Pérez 写道:

Next patch will skip the registering of dma maps that the vdpa device
rejects in the iova tree. We need to consider that here or we cause a
SIGSEGV accessing result.

Reported-by: Lei Yang 
Signed-off-by: Eugenio Pérez 



Acked-by: Jason Wang 



---
  hw/virtio/vhost-vdpa.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3ff9ce3501..983d3697b0 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -289,6 +289,10 @@ static void vhost_vdpa_listener_region_del(MemoryListener 
*listener,
  };
  
  result = vhost_iova_tree_find_iova(v->iova_tree, _region);

+if (!result) {
+/* The memory listener map wasn't mapped */
+return;
+}
  iova = result->iova;
  vhost_iova_tree_remove(v->iova_tree, result);
  }

Re: [PULL 0/5] Trivial branch for 7.1 patches

2022-08-04 Thread Richard Henderson


On 8/4/22 12:22, Laurent Vivier wrote:

The following changes since commit 2480f3bbd03814b0651a1f74959f5c6631ee5819:

   Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-03 08:32:44 -0700)

are available in the Git repository at:

   https://gitlab.com/laurent_vivier/qemu.git 
tags/trivial-branch-for-7.1-pull-request

for you to fetch changes up to 21d4e557e2fd0cb7f10b632b35f51146a1b6d892:

   include/qemu/host-utils.h: Simplify the compiler check in mulu128() 
(2022-08-04 13:49:47 +0200)


Pull request trivial branch 20220804


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~






Cornelia Huck (1):
   README.rst: fix link formatting

Eugenio Pérez (1):
   vdpa: Fix file descriptor leak on get features error

Thomas Huth (2):
   docs/about/removed-features: Move the -soundhw into the right section
   include/qemu/host-utils.h: Simplify the compiler check in mulu128()

Yonggang Luo (1):
   ppc: Remove redundant macro MSR_BOOK3S_MASK.

  README.rst  |  4 ++--
  docs/about/removed-features.rst | 14 +++---
  include/qemu/host-utils.h   |  3 +--
  net/vhost-vdpa.c|  4 ++--
  target/ppc/excp_helper.c|  1 -
  5 files changed, 12 insertions(+), 14 deletions(-)

Re: Re: PING: [PATCH] KVM: HWPoison: Fix memory address during remap

2022-08-04 Thread zhenwei pi


Hi,

Could you please give me any hint about this issue & patch?


On 8/4/22 14:59, Eiichi Tsukata wrote:

Hi

We’ve also hit this case.


On May 5, 2022, at 9:32, zhenwei pi  wrote:

Hi, Paolo

I would appreciate it if you could review patch.

On 4/20/22 14:45, zhenwei pi wrote:

qemu exits during reset with log:
qemu-system-x86_64: Could not remap addr: 1000@22001000
Currently, after MCE on RAM of a guest, qemu records a ram_addr only,
remaps this address with a fixed size(TARGET_PAGE_SIZE) during reset.
In the hugetlbfs scenario, mmap(addr...) needs page_size aligned
address and correct size. Unaligned address leads mmap to fail.


As far as I checked, SIGBUS sent from memory_failure() due to PR_MCE_KILL_EARLY 
has aligned address
in siginfo. But SIGBUS sent from kvm_mmu_page_fault() has unaligned address. 
This happens only when Guest touches
poisoned pages before they get remapped. This is not a usual case but it can 
sometimes happen.

FYI: call path
CPU 1/KVM-328915  [005] d..1. 711765.805910: signal_generate: sig=7 
errno=0 code=4 comm=CPU 1/KVM pid=328915 grp=0 res=0
CPU 1/KVM-328915  [005] d..1. 711765.805915: 
  => trace_event_raw_event_signal_generate
  => __send_signal
  => do_send_sig_info
  => send_sig_mceerr
  => handle_abnormal_pfn
  => direct_page_fault
  => kvm_mmu_page_fault
  => kvm_arch_vcpu_ioctl_run
  => kvm_vcpu_ioctl
  => __x64_sys_ioctl
  => do_syscall_64


In addition, aligning length suppresses the following madvise error message in 
qemu_ram_setup_dump():

   qemu_madvise: Invalid argument
   madvise doesn't support MADV_DONTDUMP, but dump_guest_core=off specified


Thanks

Eiichi


--
zhenwei pi

Re: [PULL for-7.1 0/1] Block patches

2022-08-04 Thread Richard Henderson


On 8/4/22 12:02, Stefan Hajnoczi wrote:

The following changes since commit 2480f3bbd03814b0651a1f74959f5c6631ee5819:

   Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-03 08:32:44 -0700)

are available in the Git repository at:

   https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 7b0ca313647532a2c7007379ff800c9a2415c95d:

   virtiofsd: Fix format strings (2022-08-04 14:44:25 -0400)


Pull request

- Format string portability fix in virtiofsd



Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~






Stefan Weil (1):
   virtiofsd: Fix format strings

  tools/virtiofsd/fuse_lowlevel.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Re: [PATCH v1 00/40] TDX QEMU support

2022-08-04 Thread Xiaoyao Li


On 8/4/2022 1:44 AM, Daniel P. Berrangé wrote:

On Tue, Aug 02, 2022 at 06:55:48PM +0800, Xiaoyao Li wrote:

On 8/2/2022 5:49 PM, Daniel P. Berrangé wrote:

On Tue, Aug 02, 2022 at 03:47:10PM +0800, Xiaoyao Li wrote:



- CPU model

We cannot create a TD with arbitrary CPU model like what for non-TDX VMs,
because only a subset of features can be configured for TD.
- It's recommended to use '-cpu host' to create TD;
- '+feature/-feature' might not work as expected;

future work: To introduce specific CPU model for TDs and enhance +/-features
 for TDs.


Which features are incompatible with TDX ?


TDX enforces some features fixed to 1 (e.g., CPUID_EXT_X2APIC,
CPUID_EXT_HYPERVISOR)and some fixed to 0 (e.g., CPUID_EXT_VMX ).

Details can be found in patch 8 and TDX spec chapter "CPUID virtualization"


Presumably you have such a list, so that KVM can block them when
using '-cpu host' ?


No, KVM doesn't do this. The result is no error reported from KVM but what
TD OS sees from CPUID might be different what user specifies in QEMU.


If so, we should be able to sanity check the
use of these features in QEMU for the named CPU models / feature
selection too.


This series enhances get_supported_cpuid() for TDX. If named CPU models are
used to boot a TDX guest, it likely gets warning of "xxx feature is not
available"


If the  ',check=on' arg is given to -cpu, does it ensure that the
guest fails to startup with an incompatible feature set ? That's
really the key thing to protect the user from mistakes.


"check=on" won't stop startup with an incompatible feature set but 
"enforce=on". Yes, this series can ensure it with "enforce=on"





We have another series to enhance the "-feature" for TDX, to warn out if
some fixed1 is specified to be removed. Besides, we will introduce specific
named CPU model for TDX. e.g., TDX-SapphireRapids which contains the maximum
feature set a TDX guest can have on SPR host.


I don't know if this is the right approach or not, but we should at least
consider making use of CPU versioning here.  ie have a single "SapphireRapids"
alias, which resolves to a suitable specific CPU version depending on whether
TDX is used or not.


New version of a CPU model inherits from the last version. This fits 
well with CPU model fixup when features need to be removed/added to 
existing CPU model to make it work well with the latest kernel, and a 
new version is created.


However, I think it less proper to define a TDX variant with versioned- 
cpu model. For example, we have a SPR-V(x), then we need to define 
SPR-V(x+1) and alias it as SPR-TDX. For SPR-V(x+1), we need to add and 
remove several features. In the future, we may need a SPR-V(x+2) to fix 
up the normal SPR cpu model SPR-V(x). All the changes in V(x+1)/SPR-TDX 
 has to be reverted at first.


Anyway, we can discuss it in the future when we post the series of TDX 
CPU model. We plan to do that after this basic series gets merged. :)



With regards,
Daniel

Re: [RFC 1/1] hw: tpmtisspi: add SPI support to QEMU TPM implementation

2022-08-04 Thread Peter Delevoryas

On Thu, Aug 04, 2022 at 11:07:10AM -0700, Dan Zhang wrote:
> On Wed, Aug 3, 2022 at 10:30 AM Peter Delevoryas  wrote:
> >
> > On Wed, Aug 03, 2022 at 10:52:23AM +0200, Cédric Le Goater wrote:
> > > On 8/3/22 04:32, Iris Chen wrote:
> > > > From: Iris Chen 
> > >
> > > A commit log telling us about this new device would be good to have.
> > >
> > >
> > > > Signed-off-by: Iris Chen 
> > > > ---
> > > >   configs/devices/arm-softmmu/default.mak |   1 +
> > > >   hw/arm/Kconfig  |   5 +
> > > >   hw/tpm/Kconfig  |   5 +
> > > >   hw/tpm/meson.build  |   1 +
> > > >   hw/tpm/tpm_tis_spi.c| 311 
> > > >   include/sysemu/tpm.h|   3 +
> > > >   6 files changed, 326 insertions(+)
> > > >   create mode 100644 hw/tpm/tpm_tis_spi.c
> > > >
> > > > diff --git a/configs/devices/arm-softmmu/default.mak 
> > > > b/configs/devices/arm-softmmu/default.mak
> > > > index 6985a25377..80d2841568 100644
> > > > --- a/configs/devices/arm-softmmu/default.mak
> > > > +++ b/configs/devices/arm-softmmu/default.mak
> > > > @@ -42,3 +42,4 @@ CONFIG_FSL_IMX6UL=y
> > > >   CONFIG_SEMIHOSTING=y
> > > >   CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
> > > >   CONFIG_ALLWINNER_H3=y
> > > > +CONFIG_FBOBMC_AST=y
> > >
> > > I don't think this extra config is useful for now
> > >
> > > > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> > > > index 15fa79afd3..193decaec1 100644
> > > > --- a/hw/arm/Kconfig
> > > > +++ b/hw/arm/Kconfig
> > > > @@ -458,6 +458,11 @@ config ASPEED_SOC
> > > >   select PMBUS
> > > >   select MAX31785
> > > > +config FBOBMC_AST
> > > > +bool
> > > > +select ASPEED_SOC
> > > > +select TPM_TIS_SPI
> > > > +
> > > >   config MPS2
> > > >   bool
> > > >   imply I2C_DEVICES
> > > > diff --git a/hw/tpm/Kconfig b/hw/tpm/Kconfig
> > > > index 29e82f3c92..370a43f045 100644
> > > > --- a/hw/tpm/Kconfig
> > > > +++ b/hw/tpm/Kconfig
> > > > @@ -8,6 +8,11 @@ config TPM_TIS_SYSBUS
> > > >   depends on TPM
> > > >   select TPM_TIS
> > > > +config TPM_TIS_SPI
> > > > +bool
> > > > +depends on TPM
> > > > +select TPM_TIS
> > > > +
> > > >   config TPM_TIS
> > > >   bool
> > > >   depends on TPM
> > > > diff --git a/hw/tpm/meson.build b/hw/tpm/meson.build
> > > > index 1c68d81d6a..1a057f4e36 100644
> > > > --- a/hw/tpm/meson.build
> > > > +++ b/hw/tpm/meson.build
> > > > @@ -2,6 +2,7 @@ softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: 
> > > > files('tpm_tis_common.c'))
> > > >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_ISA', if_true: 
> > > > files('tpm_tis_isa.c'))
> > > >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_SYSBUS', if_true: 
> > > > files('tpm_tis_sysbus.c'))
> > > >   softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))
> > > > +softmmu_ss.add(when: 'CONFIG_TPM_TIS_SPI', if_true: 
> > > > files('tpm_tis_spi.c'))
> > > >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_TIS'], if_true: 
> > > > files('tpm_ppi.c'))
> > > >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_CRB'], if_true: 
> > > > files('tpm_ppi.c'))
> > > > diff --git a/hw/tpm/tpm_tis_spi.c b/hw/tpm/tpm_tis_spi.c
> > > > new file mode 100644
> > > > index 00..c98ddcfddb
> > > > --- /dev/null
> > > > +++ b/hw/tpm/tpm_tis_spi.c
> > > > @@ -0,0 +1,311 @@
> > > > +#include "qemu/osdep.h"
> > > > +#include "hw/qdev-properties.h"
> > > > +#include "migration/vmstate.h"
> > > > +#include "hw/acpi/tpm.h"
> > > > +#include "tpm_prop.h"
> > > > +#include "tpm_tis.h"
> > > > +#include "qom/object.h"
> > > > +#include "hw/ssi/ssi.h"
> > > > +#include "hw/ssi/spi_gpio.h"
> > > > +
> > > > +#define TPM_TIS_SPI_ADDR_BYTES 3
> > > > +#define SPI_WRITE 0
> > > > +
> > > > +typedef enum {
> > > > +TIS_SPI_PKT_STATE_DEACTIVATED = 0,
> > > > +TIS_SPI_PKT_STATE_START,
> > > > +TIS_SPI_PKT_STATE_ADDRESS,
> > > > +TIS_SPI_PKT_STATE_DATA_WR,
> > > > +TIS_SPI_PKT_STATE_DATA_RD,
> > > > +TIS_SPI_PKT_STATE_DONE,
> > > > +} TpmTisSpiPktState;
> > > > +
> > > > +union TpmTisRWSizeByte {
> > > > +uint8_t byte;
> > > > +struct {
> > > > +uint8_t data_expected_size:6;
> > > > +uint8_t resv:1;
> > > > +uint8_t rwflag:1;
> > > > +};
> > > > +};
> > > > +
> > > > +union TpmTisSpiHwAddr {
> > > > +hwaddr addr;
> > > > +uint8_t bytes[sizeof(hwaddr)];
> > > > +};
> > > > +
> > > > +union TpmTisSpiData {
> > > > +uint32_t data;
> > > > +uint8_t bytes[64];
> > > > +};
> > > > +
> > > > +struct TpmTisSpiState {
> > > > +/*< private >*/
> > > > +SSIPeripheral parent_obj;
> > > > +
> > > > +/*< public >*/
> > > > +TPMState tpm_state; /* not a QOM object */
> > > > +TpmTisSpiPktState tpm_tis_spi_state;
> > > > +
> > > > +union TpmTisRWSizeByte first_byte;
> > > > +union TpmTisSpiHwAddr addr;
> > > > +union TpmTisSpiData data;
> > >
> > > Are these device registers ? I am not sure the unions are

[PATCH v3] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

The boot parameter header refers to setup_data at an absolute address,
and each setup_data refers to the next setup_data at an absolute address
too. Currently QEMU simply puts the setup_datas right after the kernel
image, and since the kernel_image is loaded at prot_addr -- a fixed
address knowable to QEMU apriori -- the setup_data absolute address
winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.

This mostly works fine, so long as the kernel image really is loaded at
prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
generally EFI doesn't give a good way of predicting where it's going to
load the kernel. So when it loads it at some address != prot_addr, the
absolute addresses in setup_data now point somewhere bogus, causing
crashes when EFI stub tries to follow the next link.

Fix this by placing setup_data at some fixed place in memory, not as
part of the kernel image, and then pointing the setup_data absolute
address to that fixed place in memory. This way, even if OVMF or other
chains relocate the kernel image, the boot parameter still points to the
correct absolute address.

For this, an unused part of the hardware mapped area is used, which
isn't used by anything else.

Fixes: 3cbeb52467 ("hw/i386: add device tree support")
Reported-by: Xiaoyao Li 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Michael S. Tsirkin 
Cc: Daniel P. Berrangé 
Cc: Gerd Hoffmann 
Cc: Ard Biesheuvel 
Cc: Laszlo Ersek 
Cc: linux-...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..3affef3277 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -773,10 +773,10 @@ void x86_load_linux(X86MachineState *x86ms,
 bool linuxboot_dma_enabled = 
X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
 int setup_size, kernel_size, cmdline_size;
-int dtb_size, setup_data_offset;
+int dtb_size, setup_data_item_len, setup_data_total_len = 0;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel;
-hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0;
+uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
+hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0, setup_data_base;
 FILE *f;
 char *vmode;
 MachineState *machine = MACHINE(x86ms);
@@ -899,6 +899,8 @@ void x86_load_linux(X86MachineState *x86ms,
 cmdline_addr = 0x2;
 prot_addr= 0x10;
 }
+/* Nothing else uses this part of the hardware mapped region */
+setup_data_base = 0xf - 0x1000;
 
 /* highest address for loading the initrd */
 if (protocol >= 0x20c &&
@@ -1062,34 +1064,35 @@ void x86_load_linux(X86MachineState *x86ms,
 exit(1);
 }
 
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
-kernel = g_realloc(kernel, kernel_size);
-
-
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + dtb_size;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = setup_data_base + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_DTB);
 setup_data->len = cpu_to_le32(dtb_size);
-
 load_image_size(dtb_filename, setup_data->data, dtb_size);
 }
 
 if (!legacy_no_rng_seed) {
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + 
RNG_SEED_LENGTH;
-kernel = g_realloc(kernel, kernel_size);
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + RNG_SEED_LENGTH;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = setup_data_base + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
 setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
 qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
 }
 
-/* Offset 0x250 is a pointer to the first setup_data link. */
-stq_p(header + 0x250, first_setup_data);
+if

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

Hey Laszlo,

On Thu, Aug 04, 2022 at 03:56:54PM +0200, Laszlo Ersek wrote:
> - do we want setup_data chaining work generally?
> 
> - or do we want only the random seed injection to stop crashing OVMF guests?

Preferably the first - generally. Which brings us to your point:

> > Given we only need 48 bytes or so, isn't there a more subtle place we
> > could just throw this in ram that doesn't need such complex
> > coordination?
> 
> These tricks add up and go wrong after a while. The pedantic
> reservations in the firmware have proved necessary.
> 
> IIUC, with v2, the setup_data_base address would (most frequently) be 96
> KB. edk2 does have uses for very low memory. If OVMF's PlatformPei does
> not reserve away the area, UefiCpuPkg or other drivers might allocate an
> overlapping chunk, even if only temporarily. That might not break the
> firmware, but it could overwrite the random seed. 

Yea, so we don't want an address that something else might overwrite. So
my question is: isn't there some 48 bytes or so available in some low
address (or maybe a high one?) that is traditionally reserved for some
hardware function, and so software doesn't use it, but it turns out QEMU
doesn't use it for anything either, so we can get away placing it at
that address? It seems like there *ought* to be something like that. I
just don't (yet) know what it is...

Jason

Re: [PATCH for-7.1] hw/mips/malta: turn off x86 specific features of PIIX4_PM

2022-08-04 Thread BB




Am 3. August 2022 20:00:18 MESZ schrieb Peter Maydell 
:
>On Wed, 3 Aug 2022 at 18:26, Bernhard Beschow  wrote:
>>
>> On Tue, Aug 2, 2022 at 8:37 AM Philippe Mathieu-Daudé via 
>>  wrote:
>>>
>>> On 28/7/22 15:16, Igor Mammedov wrote:
>>> > On Thu, 28 Jul 2022 13:29:07 +0100
>>> > Peter Maydell  wrote:
>>> >
>>> >> On Thu, 28 Jul 2022 at 12:50, Igor Mammedov  wrote:
>>> >>> Disable compiled out features using compat properties as the least
>>> >>> risky way to deal with issue.
>>>
>>> So now MIPS is forced to use meaningless compat[] to satisfy X86.
>>>
>>> Am I wrong seeing this as a dirty hack creeping in, yet another
>>> technical debt that will hit (me...) back in a close future?
>>>
>>> Are we sure there are no better solution (probably more time consuming
>>> and involving refactors) we could do instead?
>>
>>
>> Working on the consolidation of piix3 and -4 soutbridges [1] I've stumbled 
>> over certain design decisions where board/platform specific assumptions are 
>> baked into the piix device models. I figure that's the core of the issue.
>>
>> In our case the ACPI functionality is implemented by inheritance while 
>> perhaps it should be implemented using composition. With composition, the 
>> ACPI functionality could be injected by the caller: The pc board would 
>> inject it while the Malta board wouldn't. This would solve both the crash 
>> and above design problem.
>>
>> I'd be willing to implement it but can't make any promises about the time 
>> frame since I'm currently doing this in my free time. Any hints regarding 
>> the implementation would be welcome, though.
>
>
>For the 7.1 release (coming up real soon now) can we get consensus
>on this patch from Igor as the least risky way to at least fix
>the segfault ? We can look at better approaches for 7.2.

Hi,

My proposal isn't 7.1 material. I merily intended to start a design discussion 
how to proceed after 7.1 that would make Phil's maintainer life easier and 
provide further insights for my consolidation work.

I don't feel qualified enough to judge the impact of Igor's patch, so I'd leave 
that for the competent.

Best regards,
Bernhard

>
>thanks
>-- PMM

[PATCH v2] pc: add property for Linux setup_data random number seed

2022-08-04 Thread Paolo Bonzini

Using a property makes it possible to use the normal compat property
mechanism instead of ad hoc code; it avoids parameter proliferation
in x86_load_linux; and allows shipping the code even if it is
disabled by default.

Cc: Michael S. Tsirkin 
Co-developed-by: Jason A. Donenfeld 
Signed-off-by: Paolo Bonzini 
---
 hw/i386/microvm.c |  2 +-
 hw/i386/pc.c  |  5 +++--
 hw/i386/pc_piix.c |  2 +-
 hw/i386/pc_q35.c  |  2 +-
 hw/i386/x86.c | 33 +
 include/hw/i386/pc.h  |  3 ---
 include/hw/i386/x86.h |  5 +++--
 7 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 7fe8cce03e..dc929727dc 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -332,7 +332,7 @@ static void microvm_memory_init(MicrovmMachineState *mms)
 rom_set_fw(fw_cfg);
 
 if (machine->kernel_filename != NULL) {
-x86_load_linux(x86ms, fw_cfg, 0, true, false);
+x86_load_linux(x86ms, fw_cfg, 0, true);
 }
 
 if (mms->option_roms) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7280c02ce3..9b192373c0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -112,6 +112,7 @@ const size_t pc_compat_7_0_len = 
G_N_ELEMENTS(pc_compat_7_0);
 
 GlobalProperty pc_compat_6_2[] = {
 { "virtio-mem", "unplugged-inaccessible", "off" },
+{ TYPE_X86_MACHINE, "linuxboot-randomness", "off" },
 };
 const size_t pc_compat_6_2_len = G_N_ELEMENTS(pc_compat_6_2);
 
@@ -796,7 +797,7 @@ void xen_load_linux(PCMachineState *pcms)
 rom_set_fw(fw_cfg);
 
 x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
-   pcmc->pvh_enabled, pcmc->legacy_no_rng_seed);
+   pcmc->pvh_enabled);
 for (i = 0; i < nb_option_roms; i++) {
 assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
!strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
@@ -1118,7 +1119,7 @@ void pc_memory_init(PCMachineState *pcms,
 
 if (linux_boot) {
 x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
-   pcmc->pvh_enabled, pcmc->legacy_no_rng_seed);
+   pcmc->pvh_enabled);
 }
 
 for (i = 0; i < nb_option_roms; i++) {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a5c65c1c35..1526b7e3fd 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -447,10 +447,10 @@ DEFINE_I440FX_MACHINE(v7_1, "pc-i440fx-7.1", NULL,
 static void pc_i440fx_7_0_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+
 pc_i440fx_7_1_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
-pcmc->legacy_no_rng_seed = true;
 pcmc->enforce_amd_1tb_hole = false;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 3a35193ff7..c5b38edc65 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -384,9 +384,9 @@ DEFINE_Q35_MACHINE(v7_1, "pc-q35-7.1", NULL,
 static void pc_q35_7_0_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+
 pc_q35_7_1_machine_options(m);
 m->alias = NULL;
-pcmc->legacy_no_rng_seed = true;
 pcmc->enforce_amd_1tb_hole = false;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..8c6450ee07 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -60,6 +60,8 @@
 #include CONFIG_DEVICES
 #include "kvm/kvm_i386.h"
 
+#define RNG_SEED_LENGTH 32
+
 /* Physical Address of PVH entry point read from kernel ELF NOTE */
 static size_t pvh_start_addr;
 
@@ -767,8 +769,7 @@ static bool load_elfboot(const char *kernel_filename,
 void x86_load_linux(X86MachineState *x86ms,
 FWCfgState *fw_cfg,
 int acpi_data_size,
-bool pvh_enabled,
-bool legacy_no_rng_seed)
+bool pvh_enabled)
 {
 bool linuxboot_dma_enabled = 
X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
@@ -786,7 +787,6 @@ void x86_load_linux(X86MachineState *x86ms,
 const char *dtb_filename = machine->dtb;
 const char *kernel_cmdline = machine->kernel_cmdline;
 SevKernelLoaderContext sev_load_ctx = {};
-enum { RNG_SEED_LENGTH = 32 };
 
 /* Align to 16 bytes as a paranoia measure */
 cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
@@ -1076,7 +1076,8 @@ void x86_load_linux(X86MachineState *x86ms,
 load_image_size(dtb_filename, setup_data->data, dtb_size);
 }
 
-if (!legacy_no_rng_seed) {
+if (x86ms->linuxboot_randomness != ON_OFF_AUTO_OFF &&
+(protocol >= 0x209 || x86ms->linuxboot_randomness == ON_OFF_AUTO_ON)) {
 setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
 kernel_size = setup_data_offset +

Re: [PATCH] pc: add property for Linux setup_data seed

2022-08-04 Thread Paolo Bonzini


On 8/4/22 16:31, Jason A. Donenfeld wrote:

I'm still not really keen on adding a knob for this. I understand ARM
has a knob for it for different reasons (better named "dtb-randomness").
If this knob thing is to live on here, maybe it should have
"-randomness" in the name also.


Ok, I just reused your variable name but linuxboot-randomness is fine by 
me too.



Rather, let's fix the bug. The code as-is -- going back to the 2016 DTB
addition -- is problematic and needs to be fixed. So let's fix that.
Trying to cover up the problem with a default-off knob just ensures this
stuff will never be made to work right.


It isn't covering up the problem, just providing a workaround
option, should another bug be discovered after release. We
still need to fix current discussed problems of course.


Thanks for the explanation. I don't like adding a knob. But if it's on
by default for the default machine type, then that's a compromise I
could accept.


Yes, in fact this allows enabling the seed even for older machine types 
if everything goes fine.  And if it doesn't, we only need a one-line 
patch to revert the feature, like Michael said.  So it's a good thing to 
have either way.


The patch was extracted out of my version from last month, but I didn't 
--amend the changes needed to make it compile (doh).  I incorporated 
yours instead and I'll send v2.


Paolo

Re: [PULL 0/1] ppc queue

2022-08-04 Thread Richard Henderson


On 8/4/22 11:41, Daniel Henrique Barboza wrote:

The following changes since commit 2480f3bbd03814b0651a1f74959f5c6631ee5819:

   Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-03 08:32:44 -0700)

are available in the Git repository at:

   https://gitlab.com/danielhb/qemu.git tags/pull-ppc-20220804

for you to fetch changes up to ed021daf2d6c19499ae406055156dc19c073228f:

   hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[] (2022-08-04 15:20:14 
-0300)


ppc patch queue for 2022-08-04:

In this short queue we have a fix in the sam460ex machine where we're
not storing all GPIO lines in sam460ex_init().

This is not causing problems (as far as we're aware of) at this moment,
but this is getting in the way of a ppc405 rework we want to do for 7.2,
so let's fix it now.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~





Daniel Henrique Barboza (1):
   hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[]

  hw/ppc/sam460ex.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

[PULL for-7.1 1/1] virtiofsd: Fix format strings

2022-08-04 Thread Stefan Hajnoczi

From: Stefan Weil 

Signed-off-by: Stefan Weil 
Message-Id: <20220804074833.892604-1...@weilnetz.de>
Signed-off-by: Stefan Hajnoczi 
---
 tools/virtiofsd/fuse_lowlevel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 752928741d..2f08471627 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2025,7 +2025,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 
 fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
 if (arg->major == 7 && arg->minor >= 6) {
-fuse_log(FUSE_LOG_DEBUG, "flags=0x%016llx\n", flags);
+fuse_log(FUSE_LOG_DEBUG, "flags=0x%016" PRIx64 "\n", flags);
 fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n", arg->max_readahead);
 }
 se->conn.proto_major = arg->major;
@@ -2174,7 +2174,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 if (se->conn.want & (~se->conn.capable)) {
 fuse_log(FUSE_LOG_ERR,
  "fuse: error: filesystem requested capabilities "
- "0x%llx that are not supported by kernel, aborting.\n",
+ "0x%" PRIx64 " that are not supported by kernel, aborting.\n",
  se->conn.want & (~se->conn.capable));
 fuse_reply_err(req, EPROTO);
 se->error = -EPROTO;
-- 
2.37.1

[ANNOUNCE] QEMU 7.1.0-rc1 is now available

2022-08-04 Thread Michael Roth

Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
second release candidate for the QEMU 7.1 release. This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-7.1.0-rc1.tar.xz
  http://download.qemu-project.org/qemu-7.1.0-rc1.tar.xz.sig

You can help improve the quality of the QEMU 7.1 release by testing this
release and reporting bugs using our GitLab issue tracker:

  https://gitlab.com/qemu-project/qemu/-/issues

The release plan, as well a documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/7.1

Please add entries to the ChangeLog for the 7.1 release below:

  http://wiki.qemu.org/ChangeLog/7.1

Thank you to everyone involved!

Changes since rc0:

d2656dd577: Update version for v7.1.0-rc1 release (Richard Henderson)
d44971e725: target/mips: Advance pc after semihosting exception (Richard 
Henderson)
a21ba54dd5: virtiofsd: Disable killpriv_v2 by default (Vivek Goyal)
4bcb7de072: migration: Define BLK_MIG_BLOCK_SIZE as unsigned long long (Peter 
Maydell)
ead34f64f9: migration: Assert that migrate_multifd_compression() returns an 
in-range value (Peter Maydell)
777f53c759: Revert "migration: Simplify unqueue_page()" (Thomas Huth)
df67aa3e61: migration: add remaining params->has_* = true in 
migration_instance_init() (Leonardo Bras)
21b1d97459: main loop: add missing documentation links to GS/IO macros 
(Emanuele Giuseppe Esposito)
e13fe274bf: qemu-iotests: Discard stderr when probing devices (Cole Robinson)
fd8a68ad68: hw/block/hd-geometry: Do not override specified bios-chs-trans (Lev 
Kujawski)
630179b7f7: libvduse: Pass positive value to strerror() (Xie Yongji)
d9cf16c0be: libvduse: Replace strcpy() with strncpy() (Xie Yongji)
e7156ff7cb: libvduse: Fix the incorrect function name (Xie Yongji)
77e3f038af: block/io_uring: add missing include file (Jinhao Fan)
1eaa63429a: linux-user/riscv: Align signal frame to 16 bytes (Richard Henderson)
5265d24c98: target/arm: Move sve probe inside kvm >= 4.15 branch (Richard 
Henderson)
b9e8d68a39: target/arm: Set KVM_ARM_VCPU_SVE while probing the host (Richard 
Henderson)
0dd14e9555: target/arm: Use kvm_arm_sve_supported in 
kvm_arm_get_host_cpu_features (Richard Henderson)
1bca64a3f0: tests/qtest/migration-test: Run the dirty ring tests only with the 
x86 target (Thomas Huth)
398c01da9c: aspeed/fby35: Fix owner of the BMC RAM memory region (Cédric Le 
Goater)
3867c1c5fd: aspeed: Remove unused fields from AspeedMachineState (Cédric Le 
Goater)
3fde641e72: ipmi:smbus: Add a check around a memcpy (Corey Minyard)
e2e137f642: hw/nvme: do not enable ioeventfd by default (Klaus Jensen)
04e8da8890: hw/nvme: unregister the event notifier handler on the main loop 
(Klaus Jensen)
a2da737729: hw/nvme: skip queue processing if notifier is cleared (Klaus Jensen)
a07d9df0fd: trivial: Fix duplicated words (Thomas Huth)
7a21bee2aa: misc: fix commonly doubled up words (Daniel P. Berrangé)
ebf705541c: tests/unit/test-qga: Replace the word 'blacklist' in the guest 
agent unit test (Thomas Huth)
2649a72555: migration-test: Allow test to run without uffd (Peter Xu)
219044b8e6: migration-test: Use migrate_ensure_converge() for auto-converge 
(Peter Xu)
b9e6074fc5: tests/tcg/linux-test: Fix random hangs in test_socket (Ilya 
Leoshkevich)
7eabb050ea: Hexagon (tests/tcg/hexagon) reference file for float_convd (Taylor 
Simpson)
a1ad040dba: Hexagon (tests/tcg/hexagon) Fix alignment in load_unpack.c (Taylor 
Simpson)
1e814a0dc4: Hexagon (target/hexagon) make VyV operands use a unique temp 
(Taylor Simpson)
74725231d6: hw/loongarch: Change macro name 'LS7A_XXX' to 'VIRT_XXX' (Xiaojuan 
Yang)
587858ed0d: hw/loongarch: Rename file 'loongson3.XXX' to 'virt.XXX' (Xiaojuan 
Yang)
fc2cc19ffa: ci: Upgrade msys2 release to 20220603 (Yonggang Luo)
1235cf7d31: qemu-options: bring the kernel and image options together (Alex 
Bennée)
28053143ab: docs/devel: fix description of OBJECT_DECLARE_SIMPLE_TYPE (Alex 
Bennée)
503e549e44: tests/tcg/s390x: Test unaligned accesses to lowcore (Ilya 
Leoshkevich)
0882caf4d6: qapi: Add exit-failure PanicAction (Ilya Leoshkevich)
9b1268f55c: semihosting: Fix handling of buffer in TARGET_SYS_TMPNAM (Peter 
Maydell)
fed49cdf6a: semihosting: Check for errors on SET_ARG() (Peter Maydell)
45704e8904: semihosting: Don't copy buffer after console_write() (Peter Maydell)
aed04e6357: semihosting: Don't return negative values on 
qemu_semihosting_console_write() failure (Peter Maydell)
93a02e822f: .gitlab-ci.d/windows.yml: Enable native Windows symlink (Bin Meng)
6ad5208661: .cirrus.yml: Change winsymlinks to 'native' (Bin Meng)
ca58b4931e: gitlab: drop 'containers-layer2' stage (Daniel P. Berrangé)
998f334722: gitlab: show testlog.txt contents when cirrus/custom-runner jobs 
fail (Daniel P. Berrangé)
feb6cb9369: tests: refresh to latest libvirt-ci module (Daniel P. Berrangé)
ebc55f523c: configure: pass correct cflags to

Re: [PATCH v2 12/20] ppc/ppc405: QOM'ify EBC

2022-08-04 Thread BALATON Zoltan


On Thu, 4 Aug 2022, Peter Maydell wrote:

On Thu, 4 Aug 2022 at 19:03, BALATON Zoltan  wrote:

I was trying to find out how to do it but I don't understand QOM enough to
answer the simple question of how to get the cpu object from QOM. My
guesses are:

object_resolve_path_type("/machine", TYPE_POWERPC_CPU, NULL)


Out of curiosity would this work though to get the cpu or if not why not 
and what would be a preferred way? I could not find this out from reading 
the object.h comments, the docs/deve/qom.rst, nor searching the code.



or maybe

object_resolve_path_at(OBJECT(dev)->parent, "cpu")

or how do these functions work and what is the preferred way to retrieve
an object from the QOM tree? This is what I hoped someone with more
understanding of QOM could answer.


The standard approach that we use elsewhere in the tree for handling
"this device needs to have a pointer to a CPU object or whatever"
is "the device has a QOM link property, and the SoC sets that
property when it creates the device".

There are other ways it could in theory be done, but there is
benefit in consistency, and "define and set the property" is


If this is the preferred way then so be it, I just don't like it because I 
think this is too many boilerplate code that could be avoided. This series:


 9 files changed, 894 insertions(+), 652 deletions(-)

 and that's including removing all of the taihu machine; the file where 
the QOMification is done:


 hw/ppc/ppc405_uc.c  | 799 +++-

Ideally introducing QOM should make it simpler not more complex. Four of 
the QOMified devices only have a property defined at all because of this 
cpu link that's only used once in the realize method to register DCRs. 
This is about 10 lines of code each. If there was a simple way to get the 
cpu object from these realize methods then we could get rid of all these 
properties and save about 40-50 lines and make these simpler.



straightforward. It also means the device object doesn't have
to know anything about the way the SoC container is laid out.


We only need the cpu object so we don't need to know the soc container if 
there's a way to get it otherwise I just don't know how QOM works and was 
trying to find a way to get to the cpu object. Maybe it's simpler than 
that.


If there's no simple way or you and Cédric think it isn't worth the effort 
then I'm also OK with it but if there's a way to make this simpler I'd be 
happy to get rid of things that make it harder to read and understand code 
or allow making mistakes more easily. I take whatever decision you make so 
won't say this again, I think I've explained my point now.


Regards,
BALATON Zoltan


(It's usually worth looking at whether there are cleanups
that could mean the device doesn't have to have a pointer to
that other object at all -- but that isn't always the case,
or the cleanups would be a big job in their own right that
are better not tangled up with QOMification.)

thanks
-- PMM

[PULL 1/5] README.rst: fix link formatting

2022-08-04 Thread Laurent Vivier

From: Cornelia Huck 

Make the links render correctly.

Signed-off-by: Cornelia Huck 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <20220803090250.136556-1-coh...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 README.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.rst b/README.rst
index 23795b837740..21df79ef4379 100644
--- a/README.rst
+++ b/README.rst
@@ -39,7 +39,7 @@ Documentation can be found hosted online at
 current development version that is available at
 ``_ is generated from the ``docs/``
 folder in the source tree, and is built by `Sphinx
-_`.
+`_.
 
 
 Building
@@ -78,7 +78,7 @@ format-patch' and/or 'git send-email' to format & send the 
mail to the
 qemu-devel@nongnu.org mailing list. All patches submitted must contain
 a 'Signed-off-by' line from the author. Patches should follow the
 guidelines set out in the `style section
-` of
+`_ of
 the Developers Guide.
 
 Additional information on submitting patches can be found online via
-- 
2.37.1

[PULL 0/5] Trivial branch for 7.1 patches

2022-08-04 Thread Laurent Vivier

The following changes since commit 2480f3bbd03814b0651a1f74959f5c6631ee5819:

  Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-03 08:32:44 -0700)

are available in the Git repository at:

  https://gitlab.com/laurent_vivier/qemu.git 
tags/trivial-branch-for-7.1-pull-request

for you to fetch changes up to 21d4e557e2fd0cb7f10b632b35f51146a1b6d892:

  include/qemu/host-utils.h: Simplify the compiler check in mulu128() 
(2022-08-04 13:49:47 +0200)


Pull request trivial branch 20220804



Cornelia Huck (1):
  README.rst: fix link formatting

Eugenio Pérez (1):
  vdpa: Fix file descriptor leak on get features error

Thomas Huth (2):
  docs/about/removed-features: Move the -soundhw into the right section
  include/qemu/host-utils.h: Simplify the compiler check in mulu128()

Yonggang Luo (1):
  ppc: Remove redundant macro MSR_BOOK3S_MASK.

 README.rst  |  4 ++--
 docs/about/removed-features.rst | 14 +++---
 include/qemu/host-utils.h   |  3 +--
 net/vhost-vdpa.c|  4 ++--
 target/ppc/excp_helper.c|  1 -
 5 files changed, 12 insertions(+), 14 deletions(-)

-- 
2.37.1

[PULL 4/5] ppc: Remove redundant macro MSR_BOOK3S_MASK.

2022-08-04 Thread Laurent Vivier

From: Yonggang Luo 

Signed-off-by: Yonggang Luo 
Reviewed-by: Daniel Henrique Barboza 
Message-Id: <20220728201135.223-1-luoyongg...@gmail.com>
Signed-off-by: Laurent Vivier 
---
 target/ppc/excp_helper.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index cb752b184a0a..7550aafed660 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2015,7 +2015,6 @@ void helper_rfi(CPUPPCState *env)
 do_rfi(env, env->spr[SPR_SRR0], env->spr[SPR_SRR1] & 0xul);
 }
 
-#define MSR_BOOK3S_MASK
 #if defined(TARGET_PPC64)
 void helper_rfid(CPUPPCState *env)
 {
-- 
2.37.1

[PULL 2/5] vdpa: Fix file descriptor leak on get features error

2022-08-04 Thread Laurent Vivier

From: Eugenio Pérez 

File descriptor vdpa_device_fd is not free in the case of returning
error from vhost_vdpa_get_features. Fixing it by making all errors go to
the same error path.

Resolves: Coverity CID 1490785
Fixes: 8170ab3f43 ("vdpa: Extract get features part from 
vhost_vdpa_get_max_queue_pairs")

Signed-off-by: Eugenio Pérez 
Reviewed-by: Laurent Vivier 
Reviewed-by: Michael S. Tsirkin 
Message-Id: <20220802112447.249436-2-epere...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 net/vhost-vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 6abad276a61a..303447a68e8b 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -566,7 +566,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 g_autofree NetClientState **ncs = NULL;
 g_autoptr(VhostIOVATree) iova_tree = NULL;
 NetClientState *nc;
-int queue_pairs, r, i, has_cvq = 0;
+int queue_pairs, r, i = 0, has_cvq = 0;
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = >u.vhost_vdpa;
@@ -582,7 +582,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 
 r = vhost_vdpa_get_features(vdpa_device_fd, , errp);
 if (unlikely(r < 0)) {
-return r;
+goto err;
 }
 
 queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
-- 
2.37.1

[PULL 5/5] include/qemu/host-utils.h: Simplify the compiler check in mulu128()

2022-08-04 Thread Laurent Vivier

From: Thomas Huth 

We currently require at least GCC 7.4 or Clang 6.0 for compiling QEMU.
GCC has __builtin_mul_overflow since version 5 already, and Clang 6.0
also provides this built-in function (see its documentation on this page:
https://releases.llvm.org/6.0.0/tools/clang/docs/LanguageExtensions.html ).
So we can simplify the #if statement here.

Signed-off-by: Thomas Huth 
Reviewed-by: Richard Henderson 
Message-Id: <20220721074809.1513357-1-th...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 include/qemu/host-utils.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 29f3a9987880..88d476161ccb 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -533,8 +533,7 @@ static inline bool umul64_overflow(uint64_t x, uint64_t y, 
uint64_t *ret)
  */
 static inline bool mulu128(uint64_t *plow, uint64_t *phigh, uint64_t factor)
 {
-#if defined(CONFIG_INT128) && \
-(__has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5)
+#if defined(CONFIG_INT128)
 bool res;
 __uint128_t r;
 __uint128_t f = ((__uint128_t)*phigh << 64) | *plow;
-- 
2.37.1

[PULL 3/5] docs/about/removed-features: Move the -soundhw into the right section

2022-08-04 Thread Laurent Vivier

From: Thomas Huth 

The note about the removal of '-soundhw' has been accidentally added
to the section of removed "linux-user mode CPUs" ... it should reside
in the section about removed "System emulator command line arguments"
instead.

Fixes: 039a68373c ("introduce -audio as a replacement for -soundhw")
Signed-off-by: Thomas Huth 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Christian Schoenebeck 
Message-Id: <20220802075611.346835-1-th...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 docs/about/removed-features.rst | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index c7b9dadd5d63..925e22016f98 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -396,6 +396,13 @@ Use ``-display sdl`` instead.
 
 Use ``-display curses`` instead.
 
+Creating sound card devices using ``-soundhw`` (removed in 7.1)
+'''
+
+Sound card devices should be created using ``-device`` or ``-audio``.
+The exception is ``pcspk`` which can be activated using ``-machine
+pcspk-audiodev=``.
+
 
 QEMU Machine Protocol (QMP) commands
 
@@ -681,13 +688,6 @@ tripped up the CI testing and was suspected to be quite 
broken. For that
 reason the maintainers strongly suspected no one actually used it.
 
 
-Creating sound card devices using ``-soundhw`` (removed in 7.1)
-'''
-
-Sound card devices should be created using ``-device`` or ``-audio``.
-The exception is ``pcspk`` which can be activated using ``-machine
-pcspk-audiodev=``.
-
 TCG introspection features
 --
 
-- 
2.37.1

[PATCH v7 03/12] vhost: Delete useless read memory barrier

2022-08-04 Thread Eugenio Pérez

As discussed in previous series [1], this memory barrier is useless with
the atomic read of used idx at vhost_svq_more_used. Deleting it.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02616.html

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index e6eebd0e8d..1b49bf54f2 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -509,9 +509,6 @@ size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
 if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
 return 0;
 }
-
-/* Make sure we read new used_idx */
-smp_rmb();
 } while (true);
 }
 
-- 
2.31.1

[PULL for-7.1 0/1] Block patches

2022-08-04 Thread Stefan Hajnoczi

The following changes since commit 2480f3bbd03814b0651a1f74959f5c6631ee5819:

  Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-03 08:32:44 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 7b0ca313647532a2c7007379ff800c9a2415c95d:

  virtiofsd: Fix format strings (2022-08-04 14:44:25 -0400)


Pull request

- Format string portability fix in virtiofsd



Stefan Weil (1):
  virtiofsd: Fix format strings

 tools/virtiofsd/fuse_lowlevel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.37.1

Re: [PATCH 7.1] virtio-scsi: fix race in virtio_scsi_dataplane_start()

2022-08-04 Thread Stefan Hajnoczi

On Wed, Aug 03, 2022 at 12:28:24PM -0400, Stefan Hajnoczi wrote:
> As soon as virtio_scsi_data_plane_start() attaches host notifiers the
> IOThread may start virtqueue processing. There is a race between
> IOThread virtqueue processing and virtio_scsi_data_plane_start() because
> it only assigns s->dataplane_started after attaching host notifiers.
> 
> When a virtqueue handler function in the IOThread calls
> virtio_scsi_defer_to_dataplane() it may see !s->dataplane_started and
> attempt to start dataplane even though we're already in the IOThread:
> 
>   #0  0x7f67b360857c __pthread_kill_implementation (libc.so.6 + 0xa257c)
>   #1  0x7f67b35bbd56 raise (libc.so.6 + 0x55d56)
>   #2  0x7f67b358e833 abort (libc.so.6 + 0x28833)
>   #3  0x7f67b358e75b __assert_fail_base.cold (libc.so.6 + 0x2875b)
>   #4  0x7f67b35b4cd6 __assert_fail (libc.so.6 + 0x4ecd6)
>   #5  0x55ca87fd411b memory_region_transaction_commit (qemu-kvm + 
> 0x67511b)
>   #6  0x55ca87e17811 virtio_pci_ioeventfd_assign (qemu-kvm + 0x4b8811)
>   #7  0x55ca87e14836 virtio_bus_set_host_notifier (qemu-kvm + 0x4b5836)
>   #8  0x55ca87f8e14e virtio_scsi_set_host_notifier (qemu-kvm + 0x62f14e)
>   #9  0x55ca87f8dd62 virtio_scsi_dataplane_start (qemu-kvm + 0x62ed62)
>   #10 0x55ca87e14610 virtio_bus_start_ioeventfd (qemu-kvm + 0x4b5610)
>   #11 0x55ca87f8c29a virtio_scsi_handle_ctrl (qemu-kvm + 0x62d29a)
>   #12 0x55ca87fa5902 virtio_queue_host_notifier_read (qemu-kvm + 0x646902)
>   #13 0x55ca882c099e aio_dispatch_handler (qemu-kvm + 0x96199e)
>   #14 0x55ca882c1761 aio_poll (qemu-kvm + 0x962761)
>   #15 0x55ca880e1052 iothread_run (qemu-kvm + 0x782052)
>   #16 0x55ca882c562a qemu_thread_start (qemu-kvm + 0x96662a)
> 
> This patch assigns s->dataplane_started before attaching host notifiers
> so that virtqueue handler functions that run in the IOThread before
> virtio_scsi_data_plane_start() returns correctly identify that dataplane
> does not need to be started.
> 
> Note that s->dataplane_started does not need the AioContext lock because
> it is set before attaching host notifiers and cleared after detaching
> host notifiers. In other words, the IOThread always sees the value true
> and the main loop thread does not modify it while the IOThread is
> active.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099541
> Reported-by: Qing Wang 

Qing Wang has confirmed that this solves the assertion failures.

Paolo/Michael: Can this still be merged for QEMU 7.1?

Stefan


signature.asc
Description: PGP signature

Re: [PATCH] util/aio: Defer disabling poll mode as long as possible

2022-08-04 Thread Stefan Hajnoczi

On Sun, Jul 10, 2022 at 08:08:49PM +0800, Chao Gao wrote:
> When we measure FIO read performance (cache=writethrough, bs=4k,
> iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed
> from guest to qemu.
> 
> It turns out those frequent notificatons are caused by interference from
> worker threads. Worker threads queue bottom halves after completing IO
> requests.  Pending bottom halves may lead to either aio_compute_timeout()
> zeros timeout and pass it to try_poll_mode() or run_poll_handlers() returns
> no progress after noticing pending aio_notify() events. Both cause
> run_poll_handlers() to call poll_set_started(false) to disable poll mode.
> However, for both cases, as timeout is already zeroed, the event loop
> (i.e., aio_poll()) just processes bottom halves and then starts the next
> event loop iteration. So, disabling poll mode has no value but leads to
> unnecessary notifications from guest.
> 
> To minimize unnecessary notifications from guest, defer disabling poll
> mode to when the event loop is about to be blocked.
> 
> With this patch applied, FIO seq-read performance (bs=4k, iodepth=64,
> cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS.
> 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Chao Gao 
> ---
>  util/aio-posix.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)

I just noticed that I forgot to send a pull request with this for QEMU
7.1. It's my fault that this missed QEMU 7.1, sorry. It will be merged
once the 7.2 merge window opens.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH for-7.1] virtiofsd: Fix format strings

2022-08-04 Thread Stefan Hajnoczi

On Thu, 4 Aug 2022 at 03:50, Stefan Weil via  wrote:
>
> Signed-off-by: Stefan Weil 
> ---
>
> I have also several patches which add missing G_GNUC_PRINTF.
> Would such changes still be wanted for 7.1?

Hi Stefan,
Thanks for the fix! I have merged it for 7.1. Please send the
G_GNUC_PRINTF fixes for 7.2.

Stefan

[PULL 0/1] ppc queue

2022-08-04 Thread Daniel Henrique Barboza

The following changes since commit 2480f3bbd03814b0651a1f74959f5c6631ee5819:

  Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-03 08:32:44 -0700)

are available in the Git repository at:

  https://gitlab.com/danielhb/qemu.git tags/pull-ppc-20220804

for you to fetch changes up to ed021daf2d6c19499ae406055156dc19c073228f:

  hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[] (2022-08-04 15:20:14 
-0300)


ppc patch queue for 2022-08-04:

In this short queue we have a fix in the sam460ex machine where we're
not storing all GPIO lines in sam460ex_init().

This is not causing problems (as far as we're aware of) at this moment,
but this is getting in the way of a ppc405 rework we want to do for 7.2,
so let's fix it now.


Daniel Henrique Barboza (1):
  hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[]

 hw/ppc/sam460ex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

[PATCH v7 09/12] vdpa: Extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail

2022-08-04 Thread Eugenio Pérez

So we can reuse it to inject state messages.

Signed-off-by: Eugenio Pérez 
--
v7:
* Remove double free error

v6:
* Do not assume in buffer sent to the device is sizeof(virtio_net_ctrl_ack)

v5:
* Do not use an artificial !NULL VirtQueueElement
* Use only out size instead of iovec dev_buffers for these functions.
---
 net/vhost-vdpa.c | 59 +++-
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 2c6a26cca0..10843e6d97 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -331,6 +331,38 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
 }
 }
 
+static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
+  size_t in_len)
+{
+/* Buffers for the device */
+const struct iovec out = {
+.iov_base = s->cvq_cmd_out_buffer,
+.iov_len = out_len,
+};
+const struct iovec in = {
+.iov_base = s->cvq_cmd_in_buffer,
+.iov_len = sizeof(virtio_net_ctrl_ack),
+};
+VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+int r;
+
+r = vhost_svq_add(svq, , 1, , 1, NULL);
+if (unlikely(r != 0)) {
+if (unlikely(r == -ENOSPC)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
+  __func__);
+}
+return r;
+}
+
+/*
+ * We can poll here since we've had BQL from the time we sent the
+ * descriptor. Also, we need to take the answer before SVQ pulls by itself,
+ * when BQL is released
+ */
+return vhost_svq_poll(svq);
+}
+
 static NetClientInfo net_vhost_vdpa_cvq_info = {
 .type = NET_CLIENT_DRIVER_VHOST_VDPA,
 .size = sizeof(VhostVDPAState),
@@ -387,23 +419,18 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 void *opaque)
 {
 VhostVDPAState *s = opaque;
-size_t in_len, dev_written;
+size_t in_len;
 virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
 /* Out buffer sent to both the vdpa device and the device model */
 struct iovec out = {
 .iov_base = s->cvq_cmd_out_buffer,
 };
-/* In buffer sent to the device */
-const struct iovec dev_in = {
-.iov_base = s->cvq_cmd_in_buffer,
-.iov_len = sizeof(virtio_net_ctrl_ack),
-};
 /* in buffer used for device model */
 const struct iovec in = {
 .iov_base = ,
 .iov_len = sizeof(status),
 };
-int r = -EINVAL;
+ssize_t dev_written = -EINVAL;
 bool ok;
 
 out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
@@ -414,21 +441,11 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 goto out;
 }
 
-r = vhost_svq_add(svq, , 1, _in, 1, elem);
-if (unlikely(r != 0)) {
-if (unlikely(r == -ENOSPC)) {
-qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
-  __func__);
-}
+dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
+if (unlikely(dev_written < 0)) {
 goto out;
 }
 
-/*
- * We can poll here since we've had BQL from the time we sent the
- * descriptor. Also, we need to take the answer before SVQ pulls by itself,
- * when BQL is released
- */
-dev_written = vhost_svq_poll(svq);
 if (unlikely(dev_written < sizeof(status))) {
 error_report("Insufficient written data (%zu)", dev_written);
 goto out;
@@ -436,7 +453,7 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 
 memcpy(, s->cvq_cmd_in_buffer, sizeof(status));
 if (status != VIRTIO_NET_OK) {
-goto out;
+return VIRTIO_NET_ERR;
 }
 
 status = VIRTIO_NET_ERR;
@@ -453,7 +470,7 @@ out:
 }
 vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status)));
 g_free(elem);
-return r;
+return dev_written < 0 ? dev_written : 0;
 }
 
 static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
-- 
2.31.1

[PULL 1/1] hw/ppc: sam460ex.c: store all GPIO lines in mal_irqs[]

2022-08-04 Thread Daniel Henrique Barboza

We're not storing all GPIO lines we're retrieving with
qdev_get_gpio_in() in mal_irqs[]. We're storing just the last one in the
first index:

for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
}
ppc4xx_mal_init(env, 4, 16, mal_irqs);

mal_irqs is used in ppc4xx_mal_init() to assign the IRQs to MAL:

for (i = 0; i < 4; i++) {
mal->irqs[i] = irqs[i];
}

Since only irqs[0] has been initialized, mal->irqs[1,2,3] are being
zeroed.

This doesn´t seem to trigger any apparent issues at this moment, but
Cedric's QOMification of the MAL device [1] is executing a
sysbus_connect_irq() that will fail if we do not store all GPIO lines
properly.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00497.html

Cc: Peter Maydell 
Cc: BALATON Zoltan 
Fixes: 706e944206d7 ("hw/ppc/sam460ex: Drop use of ppcuic_init()")
Acked-by: BALATON Zoltan 
Reviewed-by: Cédric Le Goater 
Message-Id: <20220803233204.2724202-1-danielhb...@gmail.com>
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/sam460ex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 7e8da657c2..0357ee077f 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -384,7 +384,7 @@ static void sam460ex_init(MachineState *machine)
 
 /* MAL */
 for (i = 0; i < ARRAY_SIZE(mal_irqs); i++) {
-mal_irqs[0] = qdev_get_gpio_in(uic[2], 3 + i);
+mal_irqs[i] = qdev_get_gpio_in(uic[2], 3 + i);
 }
 ppc4xx_mal_init(env, 4, 16, mal_irqs);
 
-- 
2.36.1

[PATCH v7 12/12] vdpa: Delete CVQ migration blocker

2022-08-04 Thread Eugenio Pérez

We can restore the device state in the destination via CVQ now. Remove
the migration blocker.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost-vdpa.h |  1 -
 hw/virtio/vhost-vdpa.c | 14 --
 net/vhost-vdpa.c   |  2 --
 3 files changed, 17 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d10a89303e..d85643 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -35,7 +35,6 @@ typedef struct vhost_vdpa {
 bool shadow_vqs_enabled;
 /* IOVA mapping used by the Shadow Virtqueue */
 VhostIOVATree *iova_tree;
-Error *migration_blocker;
 GPtrArray *shadow_vqs;
 const VhostShadowVirtqueueOps *shadow_vq_ops;
 void *shadow_vq_ops_opaque;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7e28d2f674..4b0cfc0f56 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1033,13 +1033,6 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
 return true;
 }
 
-if (v->migration_blocker) {
-int r = migrate_add_blocker(v->migration_blocker, );
-if (unlikely(r < 0)) {
-return false;
-}
-}
-
 for (i = 0; i < v->shadow_vqs->len; ++i) {
 VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
 VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
@@ -1082,10 +1075,6 @@ err:
 vhost_svq_stop(svq);
 }
 
-if (v->migration_blocker) {
-migrate_del_blocker(v->migration_blocker);
-}
-
 return false;
 }
 
@@ -1105,9 +1094,6 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 }
 }
 
-if (v->migration_blocker) {
-migrate_del_blocker(v->migration_blocker);
-}
 return true;
 }
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 4f1524c2e9..7c0d600aea 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -558,8 +558,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 
 s->vhost_vdpa.shadow_vq_ops = _vdpa_net_svq_ops;
 s->vhost_vdpa.shadow_vq_ops_opaque = s;
-error_setg(>vhost_vdpa.migration_blocker,
-   "Migration disabled: vhost-vdpa uses CVQ.");
 }
 ret = vhost_vdpa_add(nc, (void *)>vhost_vdpa, queue_pair_index, nvqs);
 if (ret) {
-- 
2.31.1

[PATCH v7 06/12] vhost_net: Add NetClientInfo stop callback

2022-08-04 Thread Eugenio Pérez

Used by the backend to perform actions after the device is stopped.

In particular, vdpa net use it to unmap CVQ buffers to the device,
cleaning the actions performend in prepare().

Signed-off-by: Eugenio Pérez 
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 3416bb3d46..7aa1ec0974 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -45,6 +45,7 @@ typedef struct NICConf {
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
 typedef int (NetPrepare)(NetClientState *);
+typedef void (NetStop)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
@@ -73,6 +74,7 @@ typedef struct NetClientInfo {
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
 NetPrepare *prepare;
+NetStop *stop;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e1150d7532..10bca15446 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -320,6 +320,9 @@ static void vhost_net_stop_one(struct vhost_net *net,
 net->nc->info->poll(net->nc, true);
 }
 vhost_dev_stop(>dev, dev);
+if (net->nc->info->stop) {
+net->nc->info->stop(net->nc);
+}
 vhost_dev_disable_notifiers(>dev, dev);
 }
 
-- 
2.31.1

[PATCH v7 11/12] vdpa: Add virtio-net mac address via CVQ at start

2022-08-04 Thread Eugenio Pérez

This is needed so the destination vdpa device see the same state a the
guest set in the source.

Signed-off-by: Eugenio Pérez 
---
v6:
* Map and unmap command buffers at the start and end of device usage.

v5:
* Rename s/start/load/
* Use independent NetClientInfo to only add load callback on cvq.
---
 net/vhost-vdpa.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 10843e6d97..4f1524c2e9 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -363,11 +363,54 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, 
size_t out_len,
 return vhost_svq_poll(svq);
 }
 
+static int vhost_vdpa_net_load(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_vdpa *v = >vhost_vdpa;
+VirtIONet *n;
+uint64_t features;
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (!v->shadow_vqs_enabled) {
+return 0;
+}
+
+n = VIRTIO_NET(v->dev->vdev);
+features = v->dev->vdev->host_features;
+if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+const struct virtio_net_ctrl_hdr ctrl = {
+.class = VIRTIO_NET_CTRL_MAC,
+.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
+};
+char *cursor = s->cvq_cmd_out_buffer;
+ssize_t dev_written;
+virtio_net_ctrl_ack state;
+
+memcpy(cursor, , sizeof(ctrl));
+cursor += sizeof(ctrl);
+memcpy(cursor, n->mac, sizeof(n->mac));
+cursor += sizeof(n->mac);
+
+dev_written = vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + sizeof(n->mac),
+ sizeof(state));
+if (unlikely(dev_written < 0)) {
+return dev_written;
+}
+
+memcpy(, s->cvq_cmd_in_buffer, sizeof(state));
+return state == VIRTIO_NET_OK ? 0 : -1;
+}
+
+return 0;
+}
+
 static NetClientInfo net_vhost_vdpa_cvq_info = {
 .type = NET_CLIENT_DRIVER_VHOST_VDPA,
 .size = sizeof(VhostVDPAState),
 .receive = vhost_vdpa_receive,
 .prepare = vhost_vdpa_net_cvq_prepare,
+.load = vhost_vdpa_net_load,
 .stop = vhost_vdpa_net_cvq_stop,
 .cleanup = vhost_vdpa_cleanup,
 .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
-- 
2.31.1

[PATCH v7 10/12] vhost_net: add NetClientState->load() callback

2022-08-04 Thread Eugenio Pérez

It allows per-net client operations right after device's successful
start. In particular, to load the device status.

Vhost-vdpa net will use it to add the CVQ buffers to restore the device
status.

Signed-off-by: Eugenio Pérez 
---
v5: Rename start / load, naming it more specifically.
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 7aa1ec0974..356e682ab6 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -45,6 +45,7 @@ typedef struct NICConf {
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
 typedef int (NetPrepare)(NetClientState *);
+typedef int (NetLoad)(NetClientState *);
 typedef void (NetStop)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
@@ -74,6 +75,7 @@ typedef struct NetClientInfo {
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
 NetPrepare *prepare;
+NetLoad *load;
 NetStop *stop;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 10bca15446..6b83d5503f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -281,6 +281,13 @@ static int vhost_net_start_one(struct vhost_net *net,
 }
 }
 }
+
+if (net->nc->info->load) {
+r = net->nc->info->load(net->nc);
+if (r < 0) {
+goto fail;
+}
+}
 return 0;
 fail:
 file.fd = -1;
-- 
2.31.1

[PATCH v7 00/12] NIC vhost-vdpa state restore via Shadow CVQ

2022-08-04 Thread Eugenio Pérez

CVQ of net vhost-vdpa devices can be intercepted since the work of [1]. The
virtio-net device model is updated. The migration was blocked because although
the state can be megrated between VMM it was not possible to restore on the
destination NIC.

This series add support for SVQ to inject external messages without the guest's
knowledge, so before the guest is resumed all the guest visible state is
restored. It is done using standard CVQ messages, so the vhost-vdpa device does
not need to learn how to restore it: As long as they have the feature, they
know how to handle it.

This series needs fix [1] to be applied to achieve full live
migration.

Thanks!

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00325.html

v7:
- Remove accidental double free.

v6:
- Move map and unmap of the buffers to the start and stop of the device. This
  implies more callbacks on NetClientInfo, but simplifies the SVQ CVQ code.
- Not assume that in buffer is sizeof(virtio_net_ctrl_ack) in
  vhost_vdpa_net_cvq_add
- Reduce the number of changes from previous versions
- Delete unused memory barrier

v5:
- Rename s/start/load/
- Use independent NetClientInfo to only add load callback on cvq.
- Accept out sg instead of dev_buffers[] at vhost_vdpa_net_cvq_map_elem
- Use only out size instead of iovec dev_buffers to know if the descriptor is
  effectively available, allowing to delete artificial !NULL VirtQueueElement
  on vhost_svq_add call.

v4:
- Actually use NetClientInfo callback.

v3:
- Route vhost-vdpa start code through NetClientInfo callback.
- Delete extra vhost_net_stop_one() call.

v2:
- Fix SIGSEGV dereferencing SVQ when not in svq mode

v1 from RFC:
- Do not reorder DRIVER_OK & enable patches.
- Delete leftovers

Eugenio Pérez (12):
  vhost: stop transfer elem ownership in vhost_handle_guest_kick
  vhost: use SVQ element ndescs instead of opaque data for desc
validation
  vhost: Delete useless read memory barrier
  vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
  vhost_net: Add NetClientInfo prepare callback
  vhost_net: Add NetClientInfo stop callback
  vdpa: add net_vhost_vdpa_cvq_info NetClientInfo
  vdpa: Move command buffers map to start of net device
  vdpa: Extract vhost_vdpa_net_cvq_add from
vhost_vdpa_net_handle_ctrl_avail
  vhost_net: add NetClientState->load() callback
  vdpa: Add virtio-net mac address via CVQ at start
  vdpa: Delete CVQ migration blocker

 include/hw/virtio/vhost-vdpa.h |   1 -
 include/net/net.h  |   6 +
 hw/net/vhost_net.c |  17 +++
 hw/virtio/vhost-shadow-virtqueue.c |  27 ++--
 hw/virtio/vhost-vdpa.c |  14 --
 net/vhost-vdpa.c   | 227 ++---
 6 files changed, 180 insertions(+), 112 deletions(-)

-- 
2.31.1

[PATCH v7 07/12] vdpa: add net_vhost_vdpa_cvq_info NetClientInfo

2022-08-04 Thread Eugenio Pérez

Next patches will add a new info callback to restore NIC status through
CVQ. Since only the CVQ vhost device is needed, create it with a new
NetClientInfo.

Signed-off-by: Eugenio Pérez 
---
v5: Create a new NetClientInfo instead of reusing the dataplane one.
---
 net/vhost-vdpa.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ac1810723c..55e8a39a56 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -334,6 +334,16 @@ static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
 return true;
 }
 
+static NetClientInfo net_vhost_vdpa_cvq_info = {
+.type = NET_CLIENT_DRIVER_VHOST_VDPA,
+.size = sizeof(VhostVDPAState),
+.receive = vhost_vdpa_receive,
+.cleanup = vhost_vdpa_cleanup,
+.has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
+.has_ufo = vhost_vdpa_has_ufo,
+.check_peer_type = vhost_vdpa_check_peer_type,
+};
+
 /**
  * Do not forward commands not supported by SVQ. Otherwise, the device could
  * accept it and qemu would not know how to update the device model.
@@ -475,7 +485,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 nc = qemu_new_net_client(_vhost_vdpa_info, peer, device,
  name);
 } else {
-nc = qemu_new_net_control_client(_vhost_vdpa_info, peer,
+nc = qemu_new_net_control_client(_vhost_vdpa_cvq_info, peer,
  device, name);
 }
 snprintf(nc->info_str, sizeof(nc->info_str), TYPE_VHOST_VDPA);
-- 
2.31.1

[PATCH 1/2] linux-user: Fix siginfo_t contents when jumping to non-readable pages

2022-08-04 Thread Ilya Leoshkevich

When the first instruction of a translation block is located in a
non-readable page, qemu-user fills siginfo_t correctly. For the other
instructions the result is as if it were the first instruction, which
is not correct.

The reason is that the current logic expects translate_insn() hook to
stop at the page boundary. This way only the first instruction can
cause a SEGV. However, this is quite difficult to properly implement
when the problematic instruction crosses a page boundary, and indeed
the actual implementations do not do this. Note that this can also
break self-modifying code detection when only bytes on the second page
are modified, but this is outside of the scope of this patch.

Instead of chaning all the translators, do a much simpler thing: when
such a situation is detected, start from scratch and stop right before
the problematic instruction.

Signed-off-by: Ilya Leoshkevich 
---
 accel/tcg/translate-all.c | 16 +++-
 accel/tcg/translator.c| 25 +
 include/hw/core/cpu.h |  2 ++
 linux-user/signal.c   |  5 +
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ef62a199c7..b4766f4661 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2295,12 +2295,18 @@ void page_set_flags(target_ulong start, target_ulong 
end, int flags)
  len != 0;
  len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
 PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+bool invalidate;
 
-/* If the write protection bit is set, then we invalidate
-   the code inside.  */
-if (!(p->flags & PAGE_WRITE) &&
-(flags & PAGE_WRITE) &&
-p->first_tb) {
+/*
+ * If the write protection bit is set, then we invalidate the code
+ * inside.  For qemu-user, we need to do this when PAGE_READ is cleared
+ * as well, in order to force a SEGV when trying to run this code.
+ */
+invalidate = !(p->flags & PAGE_WRITE) && (flags & PAGE_WRITE);
+#ifdef CONFIG_USER_ONLY
+invalidate |= (p->flags & PAGE_READ) && !(flags & PAGE_READ);
+#endif
+if (invalidate && p->first_tb) {
 tb_invalidate_phys_page(addr, 0);
 }
 if (reset_target_data) {
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index fe7af9b943..e444c17515 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -57,6 +57,18 @@ void translator_loop(const TranslatorOps *ops, 
DisasContextBase *db,
 uint32_t cflags = tb_cflags(tb);
 bool plugin_enabled;
 
+/*
+ * In case translate_insn hook touched an unreadable page, redo the
+ * translation until the problematic instruction.  We cannot just throw
+ * away the trailing ops, because the hook could have changed DisasContext.
+ */
+tcg_debug_assert(!cpu->translator_jmp);
+if (sigsetjmp(cpu->translator_jmp_env, 1) != 0) {
+cpu->translator_jmp = false;
+tcg_remove_ops_after(NULL);
+max_insns = db->num_insns - 1;
+}
+
 /* Initialize DisasContext */
 db->tb = tb;
 db->pc_first = tb->pc;
@@ -122,8 +134,21 @@ void translator_loop(const TranslatorOps *ops, 
DisasContextBase *db,
 db->is_jmp = DISAS_TOO_MANY;
 break;
 }
+
+/*
+ * Propagate SEGVs from the first instruction to the guest and handle
+ * the rest. This way guest's siginfo_t gets accurate pc and si_addr.
+ */
+cpu->translator_jmp = true;
 }
 
+/*
+ * Clear translator_jmp on all ways out of this function, otherwise
+ * instructions that fetch code as part of their operation will be
+ * confused.
+ */
+cpu->translator_jmp = false;
+
 /* Emit code to exit the TB, as indicated by db->is_jmp.  */
 ops->tb_stop(db, cpu);
 gen_tb_end(db->tb, db->num_insns);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 500503da13..6c1829b7f5 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -349,6 +349,8 @@ struct CPUState {
 int64_t icount_extra;
 uint64_t random_seed;
 sigjmp_buf jmp_env;
+bool translator_jmp;
+sigjmp_buf translator_jmp_env;
 
 QemuMutex work_mutex;
 QSIMPLEQ_HEAD(, qemu_work_item) work_list;
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 8d29bfaa6b..f7e77c8d2e 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -833,6 +833,11 @@ static void host_signal_handler(int host_sig, siginfo_t 
*info, void *puc)
 abi_ptr guest_addr;
 bool is_write;
 
+/* Translator wants to handle this. */
+if (helper_retaddr == 1 && cpu->translator_jmp) {
+siglongjmp(cpu->translator_jmp_env, 1);
+}
+
 host_addr = (uintptr_t)info->si_addr;
 
 /*
-- 
2.35.3

Re: [PATCH v2 1/1] target/ppc: fix unreachable code in do_ldst_quad()

2022-08-04 Thread Daniel Henrique Barboza





On 8/4/22 15:05, Peter Maydell wrote:

On Mon, 25 Jul 2022 at 21:24, Daniel Henrique Barboza
 wrote:


Coverity reports that commit fc34e81acd51 ("target/ppc: add macros to
check privilege level") turned the following code unreachable:

if (!prefixed && !(ctx->insns_flags2 & PPC2_LSQ_ISA207)) {
 /* lq and stq were privileged prior to V. 2.07 */
 REQUIRE_SV(ctx);


 CID 1490757:  Control flow issues  (UNREACHABLE)
 This code cannot be reached: "if (ctx->le_mode) {

 if (ctx->le_mode) {
 gen_align_no_le(ctx);
 return true;
 }
}

This happens because the macro REQUIRE_SV(), in CONFIG_USER_MODE, will
always result in a 'return true' statement. In fact, all REQUIRE_*
macros for target/ppc/translate.c behave the same way: if a condition
isn't met, an exception is generated and a 'return' statement is issued.

The difference is that all other callers are using it in insns that are
not implemented in user mode. do_ldst_quad(), on the other hand, is user
mode compatible.


This is a Coverity false positive, and I'd already marked it that way
in the Coverity UI back on the 20th. Coverity gets confused sometimes
by ifdeffery.

So you don't need this patch, unless you think the code is genuinely
better (more readable to humans, etc) this way.


The idea was to make Coverity happy. If Coverity is already happy then
let's drop this patch - there's no particular improvement made here that
justifies it.


Thanks,

Daniel



thanks
-- PMM

[PATCH v7 05/12] vhost_net: Add NetClientInfo prepare callback

2022-08-04 Thread Eugenio Pérez

This is used by the backend to perform actions before the device is
started.

In particular, vdpa net use it to map CVQ buffers to the device, so it
can send control commands using them.

Signed-off-by: Eugenio Pérez 
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 523136c7ac..3416bb3d46 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef struct NICConf {
 
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
+typedef int (NetPrepare)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
 NetReceive *receive_raw;
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
+NetPrepare *prepare;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..e1150d7532 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -244,6 +244,13 @@ static int vhost_net_start_one(struct vhost_net *net,
 struct vhost_vring_file file = { };
 int r;
 
+if (net->nc->info->prepare) {
+r = net->nc->info->prepare(net->nc);
+if (r < 0) {
+return r;
+}
+}
+
 r = vhost_dev_enable_notifiers(>dev, dev);
 if (r < 0) {
 goto fail_notifiers;
-- 
2.31.1

[PATCH v7 08/12] vdpa: Move command buffers map to start of net device

2022-08-04 Thread Eugenio Pérez

As this series will reuse them to restore the device state at the end of
a migration (or a device start), let's allocate only once at the device
start so we don't duplicate their map and unmap.

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 123 ++-
 1 file changed, 58 insertions(+), 65 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 55e8a39a56..2c6a26cca0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -263,29 +263,20 @@ static size_t vhost_vdpa_net_cvq_cmd_page_len(void)
 return ROUND_UP(vhost_vdpa_net_cvq_cmd_len(), qemu_real_host_page_size());
 }
 
-/** Copy and map a guest buffer. */
-static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v,
-   const struct iovec *out_data,
-   size_t out_num, size_t data_len, void *buf,
-   size_t *written, bool write)
+/** Map CVQ buffer. */
+static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
+  bool write)
 {
 DMAMap map = {};
 int r;
 
-if (unlikely(!data_len)) {
-qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid legnth of %s buffer\n",
-  __func__, write ? "in" : "out");
-return false;
-}
-
-*written = iov_to_buf(out_data, out_num, 0, buf, data_len);
 map.translated_addr = (hwaddr)(uintptr_t)buf;
-map.size = vhost_vdpa_net_cvq_cmd_page_len() - 1;
+map.size = size - 1;
 map.perm = write ? IOMMU_RW : IOMMU_RO,
 r = vhost_iova_tree_map_alloc(v->iova_tree, );
 if (unlikely(r != IOVA_OK)) {
 error_report("Cannot map injected element");
-return false;
+return r;
 }
 
 r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
@@ -294,50 +285,58 @@ static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v,
 goto dma_map_err;
 }
 
-return true;
+return 0;
 
 dma_map_err:
 vhost_iova_tree_remove(v->iova_tree, );
-return false;
+return r;
 }
 
-/**
- * Copy the guest element into a dedicated buffer suitable to be sent to NIC
- *
- * @iov: [0] is the out buffer, [1] is the in one
- */
-static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
-VirtQueueElement *elem,
-struct iovec *iov)
+static int vhost_vdpa_net_cvq_prepare(NetClientState *nc)
 {
-size_t in_copied;
-bool ok;
+VhostVDPAState *s;
+int r;
 
-iov[0].iov_base = s->cvq_cmd_out_buffer;
-ok = vhost_vdpa_cvq_map_buf(>vhost_vdpa, elem->out_sg, elem->out_num,
-vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base,
-[0].iov_len, false);
-if (unlikely(!ok)) {
-return false;
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+s = DO_UPCAST(VhostVDPAState, nc, nc);
+if (!s->vhost_vdpa.shadow_vqs_enabled) {
+return 0;
 }
 
-iov[1].iov_base = s->cvq_cmd_in_buffer;
-ok = vhost_vdpa_cvq_map_buf(>vhost_vdpa, NULL, 0,
-sizeof(virtio_net_ctrl_ack), iov[1].iov_base,
-_copied, true);
-if (unlikely(!ok)) {
+r = vhost_vdpa_cvq_map_buf(>vhost_vdpa, s->cvq_cmd_out_buffer,
+   vhost_vdpa_net_cvq_cmd_page_len(), false);
+if (unlikely(r < 0)) {
+return r;
+}
+
+r = vhost_vdpa_cvq_map_buf(>vhost_vdpa, s->cvq_cmd_in_buffer,
+   vhost_vdpa_net_cvq_cmd_page_len(), true);
+if (unlikely(r < 0)) {
 vhost_vdpa_cvq_unmap_buf(>vhost_vdpa, s->cvq_cmd_out_buffer);
-return false;
 }
 
-iov[1].iov_len = sizeof(virtio_net_ctrl_ack);
-return true;
+return r;
+}
+
+static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (s->vhost_vdpa.shadow_vqs_enabled) {
+vhost_vdpa_cvq_unmap_buf(>vhost_vdpa, s->cvq_cmd_out_buffer);
+vhost_vdpa_cvq_unmap_buf(>vhost_vdpa, s->cvq_cmd_in_buffer);
+}
 }
 
 static NetClientInfo net_vhost_vdpa_cvq_info = {
 .type = NET_CLIENT_DRIVER_VHOST_VDPA,
 .size = sizeof(VhostVDPAState),
 .receive = vhost_vdpa_receive,
+.prepare = vhost_vdpa_net_cvq_prepare,
+.stop = vhost_vdpa_net_cvq_stop,
 .cleanup = vhost_vdpa_cleanup,
 .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
 .has_ufo = vhost_vdpa_has_ufo,
@@ -348,19 +347,17 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
  * Do not forward commands not supported by SVQ. Otherwise, the device could
  * accept it and qemu would not know how to update the device model.
  */
-static bool vhost_vdpa_net_cvq_validate_cmd(const struct iovec *out,
-size_t out_num)
+static bool

Re: [PATCH] disas: Add LoongArch support

2022-08-04 Thread Richard Henderson


On 8/4/22 10:29, Qi Hu wrote:

Signed-off-by: Qi Hu 
---
  disas.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/disas.c b/disas.c
index e31438f349..d44f46 100644
--- a/disas.c
+++ b/disas.c
@@ -176,6 +176,8 @@ static void initialize_debug_host(CPUDebug *s)
  #else
  #error unsupported RISC-V ABI
  #endif
+#elif defined(__loongarch__)
+s->info.print_insn = print_insn_loongarch;


This is very much insufficient.  Try --target-list=i386-softmmu and watch it 
fail to link.
You need to modify the build rules to make certain that the loongarch disassembler is 
built for loongarch host.



r~

[PATCH v6 11/12] vdpa: Add virtio-net mac address via CVQ at start

2022-08-04 Thread Eugenio Pérez

This is needed so the destination vdpa device see the same state a the
guest set in the source.

Signed-off-by: Eugenio Pérez 
---
v6:
* Map and unmap command buffers at the start and end of device usage.

v5:
* Rename s/start/load/
* Use independent NetClientInfo to only add load callback on cvq.
---
 net/vhost-vdpa.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a3ca6af69f..7a50d46dae 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -363,11 +363,54 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, 
size_t out_len,
 return vhost_svq_poll(svq);
 }
 
+static int vhost_vdpa_net_load(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_vdpa *v = >vhost_vdpa;
+VirtIONet *n;
+uint64_t features;
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (!v->shadow_vqs_enabled) {
+return 0;
+}
+
+n = VIRTIO_NET(v->dev->vdev);
+features = v->dev->vdev->host_features;
+if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+const struct virtio_net_ctrl_hdr ctrl = {
+.class = VIRTIO_NET_CTRL_MAC,
+.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
+};
+char *cursor = s->cvq_cmd_out_buffer;
+ssize_t dev_written;
+virtio_net_ctrl_ack state;
+
+memcpy(cursor, , sizeof(ctrl));
+cursor += sizeof(ctrl);
+memcpy(cursor, n->mac, sizeof(n->mac));
+cursor += sizeof(n->mac);
+
+dev_written = vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + sizeof(n->mac),
+ sizeof(state));
+if (unlikely(dev_written < 0)) {
+return dev_written;
+}
+
+memcpy(, s->cvq_cmd_in_buffer, sizeof(state));
+return state == VIRTIO_NET_OK ? 0 : -1;
+}
+
+return 0;
+}
+
 static NetClientInfo net_vhost_vdpa_cvq_info = {
 .type = NET_CLIENT_DRIVER_VHOST_VDPA,
 .size = sizeof(VhostVDPAState),
 .receive = vhost_vdpa_receive,
 .prepare = vhost_vdpa_net_cvq_prepare,
+.load = vhost_vdpa_net_load,
 .stop = vhost_vdpa_net_cvq_stop,
 .cleanup = vhost_vdpa_cleanup,
 .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
-- 
2.31.1

[PATCH v7 04/12] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush

2022-08-04 Thread Eugenio Pérez

Since QEMU will be able to inject new elements on CVQ to restore the
state, we need not to depend on a VirtQueueElement to know if a new
element has been used by the device or not. Instead of check that, check
if there are new elements only using used idx on vhost_svq_flush.

Signed-off-by: Eugenio Pérez 
---
v6: Change less from the previous function
---
 hw/virtio/vhost-shadow-virtqueue.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 1b49bf54f2..f863b08627 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -499,17 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
 {
 int64_t start_us = g_get_monotonic_time();
+uint32_t len;
+
 do {
-uint32_t len;
-VirtQueueElement *elem = vhost_svq_get_buf(svq, );
-if (elem) {
-return len;
+if (vhost_svq_more_used(svq)) {
+break;
 }
 
 if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
 return 0;
 }
 } while (true);
+
+vhost_svq_get_buf(svq, );
+return len;
 }
 
 /**
-- 
2.31.1

[PATCH v7 01/12] vhost: stop transfer elem ownership in vhost_handle_guest_kick

2022-08-04 Thread Eugenio Pérez

It was easier to allow vhost_svq_add to handle the memory. Now that we
will allow qemu to add elements to a SVQ without the guest's knowledge,
it's better to handle it in the caller.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index e4956728dd..ffd2b2c972 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -233,9 +233,6 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 /**
  * Add an element to a SVQ.
  *
- * The caller must check that there is enough slots for the new element. It
- * takes ownership of the element: In case of failure not ENOSPC, it is free.
- *
  * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
  */
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
@@ -252,7 +249,6 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct 
iovec *out_sg,
 
 ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, _head);
 if (unlikely(!ok)) {
-g_free(elem);
 return -EINVAL;
 }
 
@@ -293,7 +289,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
 virtio_queue_set_notification(svq->vq, false);
 
 while (true) {
-VirtQueueElement *elem;
+g_autofree VirtQueueElement *elem;
 int r;
 
 if (svq->next_guest_avail_elem) {
@@ -324,12 +320,14 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
  * queue the current guest descriptor and ignore kicks
  * until some elements are used.
  */
-svq->next_guest_avail_elem = elem;
+svq->next_guest_avail_elem = g_steal_pointer();
 }
 
 /* VQ is full or broken, just return and ignore kicks */
 return;
 }
+/* elem belongs to SVQ or external caller now */
+elem = NULL;
 }
 
 virtio_queue_set_notification(svq->vq, true);
-- 
2.31.1

[PATCH v6 09/12] vdpa: Extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail

2022-08-04 Thread Eugenio Pérez

So we can reuse it to inject state messages.

Signed-off-by: Eugenio Pérez 
--
v6:
* Do not assume in buffer sent to the device is sizeof(virtio_net_ctrl_ack)

v5:
* Do not use an artificial !NULL VirtQueueElement
* Use only out size instead of iovec dev_buffers for these functions.
---
 net/vhost-vdpa.c | 59 +++-
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 2c6a26cca0..a3ca6af69f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -331,6 +331,38 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
 }
 }
 
+static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
+  size_t in_len)
+{
+/* Buffers for the device */
+const struct iovec out = {
+.iov_base = s->cvq_cmd_out_buffer,
+.iov_len = out_len,
+};
+const struct iovec in = {
+.iov_base = s->cvq_cmd_in_buffer,
+.iov_len = sizeof(virtio_net_ctrl_ack),
+};
+VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+int r;
+
+r = vhost_svq_add(svq, , 1, , 1, NULL);
+if (unlikely(r != 0)) {
+if (unlikely(r == -ENOSPC)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
+  __func__);
+}
+return r;
+}
+
+/*
+ * We can poll here since we've had BQL from the time we sent the
+ * descriptor. Also, we need to take the answer before SVQ pulls by itself,
+ * when BQL is released
+ */
+return vhost_svq_poll(svq);
+}
+
 static NetClientInfo net_vhost_vdpa_cvq_info = {
 .type = NET_CLIENT_DRIVER_VHOST_VDPA,
 .size = sizeof(VhostVDPAState),
@@ -387,23 +419,18 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 void *opaque)
 {
 VhostVDPAState *s = opaque;
-size_t in_len, dev_written;
+size_t in_len;
 virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
 /* Out buffer sent to both the vdpa device and the device model */
 struct iovec out = {
 .iov_base = s->cvq_cmd_out_buffer,
 };
-/* In buffer sent to the device */
-const struct iovec dev_in = {
-.iov_base = s->cvq_cmd_in_buffer,
-.iov_len = sizeof(virtio_net_ctrl_ack),
-};
 /* in buffer used for device model */
 const struct iovec in = {
 .iov_base = ,
 .iov_len = sizeof(status),
 };
-int r = -EINVAL;
+ssize_t dev_written = -EINVAL;
 bool ok;
 
 out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
@@ -414,21 +441,11 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 goto out;
 }
 
-r = vhost_svq_add(svq, , 1, _in, 1, elem);
-if (unlikely(r != 0)) {
-if (unlikely(r == -ENOSPC)) {
-qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
-  __func__);
-}
+dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
+if (unlikely(dev_written < 0)) {
 goto out;
 }
 
-/*
- * We can poll here since we've had BQL from the time we sent the
- * descriptor. Also, we need to take the answer before SVQ pulls by itself,
- * when BQL is released
- */
-dev_written = vhost_svq_poll(svq);
 if (unlikely(dev_written < sizeof(status))) {
 error_report("Insufficient written data (%zu)", dev_written);
 goto out;
@@ -436,7 +453,7 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 
 memcpy(, s->cvq_cmd_in_buffer, sizeof(status));
 if (status != VIRTIO_NET_OK) {
-goto out;
+return VIRTIO_NET_ERR;
 }
 
 status = VIRTIO_NET_ERR;
@@ -453,7 +470,7 @@ out:
 }
 vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status)));
 g_free(elem);
-return r;
+return MAX(dev_written, 0);
 }
 
 static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
-- 
2.31.1

[PATCH v7 02/12] vhost: use SVQ element ndescs instead of opaque data for desc validation

2022-08-04 Thread Eugenio Pérez

Since we're going to allow SVQ to add elements without the guest's
knowledge and without its own VirtQueueElement, it's easier to check if
an element is a valid head checking a different thing than the
VirtQueueElement.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index ffd2b2c972..e6eebd0e8d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -414,7 +414,7 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 return NULL;
 }
 
-if (unlikely(!svq->desc_state[used_elem.id].elem)) {
+if (unlikely(!svq->desc_state[used_elem.id].ndescs)) {
 qemu_log_mask(LOG_GUEST_ERROR,
 "Device %s says index %u is used, but it was not available",
 svq->vdev->name, used_elem.id);
@@ -422,6 +422,7 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 }
 
 num = svq->desc_state[used_elem.id].ndescs;
+svq->desc_state[used_elem.id].ndescs = 0;
 last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
 svq->desc_next[last_used_chain] = svq->free_head;
 svq->free_head = used_elem.id;
-- 
2.31.1

[PATCH v6 07/12] vdpa: add net_vhost_vdpa_cvq_info NetClientInfo

2022-08-04 Thread Eugenio Pérez

Next patches will add a new info callback to restore NIC status through
CVQ. Since only the CVQ vhost device is needed, create it with a new
NetClientInfo.

Signed-off-by: Eugenio Pérez 
---
v5: Create a new NetClientInfo instead of reusing the dataplane one.
---
 net/vhost-vdpa.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ac1810723c..55e8a39a56 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -334,6 +334,16 @@ static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
 return true;
 }
 
+static NetClientInfo net_vhost_vdpa_cvq_info = {
+.type = NET_CLIENT_DRIVER_VHOST_VDPA,
+.size = sizeof(VhostVDPAState),
+.receive = vhost_vdpa_receive,
+.cleanup = vhost_vdpa_cleanup,
+.has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
+.has_ufo = vhost_vdpa_has_ufo,
+.check_peer_type = vhost_vdpa_check_peer_type,
+};
+
 /**
  * Do not forward commands not supported by SVQ. Otherwise, the device could
  * accept it and qemu would not know how to update the device model.
@@ -475,7 +485,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 nc = qemu_new_net_client(_vhost_vdpa_info, peer, device,
  name);
 } else {
-nc = qemu_new_net_control_client(_vhost_vdpa_info, peer,
+nc = qemu_new_net_control_client(_vhost_vdpa_cvq_info, peer,
  device, name);
 }
 snprintf(nc->info_str, sizeof(nc->info_str), TYPE_VHOST_VDPA);
-- 
2.31.1

[PATCH v6 10/12] vhost_net: add NetClientState->load() callback

2022-08-04 Thread Eugenio Pérez

It allows per-net client operations right after device's successful
start. In particular, to load the device status.

Vhost-vdpa net will use it to add the CVQ buffers to restore the device
status.

Signed-off-by: Eugenio Pérez 
---
v5: Rename start / load, naming it more specifically.
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 7aa1ec0974..356e682ab6 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -45,6 +45,7 @@ typedef struct NICConf {
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
 typedef int (NetPrepare)(NetClientState *);
+typedef int (NetLoad)(NetClientState *);
 typedef void (NetStop)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
@@ -74,6 +75,7 @@ typedef struct NetClientInfo {
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
 NetPrepare *prepare;
+NetLoad *load;
 NetStop *stop;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 10bca15446..6b83d5503f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -281,6 +281,13 @@ static int vhost_net_start_one(struct vhost_net *net,
 }
 }
 }
+
+if (net->nc->info->load) {
+r = net->nc->info->load(net->nc);
+if (r < 0) {
+goto fail;
+}
+}
 return 0;
 fail:
 file.fd = -1;
-- 
2.31.1

[PATCH v6 04/12] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush

2022-08-04 Thread Eugenio Pérez

Since QEMU will be able to inject new elements on CVQ to restore the
state, we need not to depend on a VirtQueueElement to know if a new
element has been used by the device or not. Instead of check that, check
if there are new elements only using used idx on vhost_svq_flush.

Signed-off-by: Eugenio Pérez 
---
v6: Change less from the previous function
---
 hw/virtio/vhost-shadow-virtqueue.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 1b49bf54f2..f863b08627 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -499,17 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
 {
 int64_t start_us = g_get_monotonic_time();
+uint32_t len;
+
 do {
-uint32_t len;
-VirtQueueElement *elem = vhost_svq_get_buf(svq, );
-if (elem) {
-return len;
+if (vhost_svq_more_used(svq)) {
+break;
 }
 
 if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
 return 0;
 }
 } while (true);
+
+vhost_svq_get_buf(svq, );
+return len;
 }
 
 /**
-- 
2.31.1

[PATCH 0/2] linux-user: Fix siginfo_t contents when jumping to non-readable pages

2022-08-04 Thread Ilya Leoshkevich

Hi,

I noticed that when we get a SEGV due to jumping to non-readable
memory, sometimes si_addr and program counter in siginfo_t are slightly
off. I tracked this down to the assumption that translators stop before
the end of a page, while in reality they may stop right after it.

Patch 1 fixes the issue, patch 2 adds tests.

Best regards,
Ilya

Ilya Leoshkevich (2):
  linux-user: Fix siginfo_t contents when jumping to non-readable pages
  tests/tcg: Test siginfo_t contents when jumping to non-readable pages

 accel/tcg/translate-all.c|  16 ++--
 accel/tcg/translator.c   |  25 ++
 include/hw/core/cpu.h|   2 +
 linux-user/signal.c  |   5 ++
 tests/tcg/multiarch/noexec.h | 114 
 tests/tcg/s390x/Makefile.target  |   1 +
 tests/tcg/s390x/noexec.c | 145 +++
 tests/tcg/x86_64/Makefile.target |   3 +-
 tests/tcg/x86_64/noexec.c| 116 +
 9 files changed, 421 insertions(+), 6 deletions(-)
 create mode 100644 tests/tcg/multiarch/noexec.h
 create mode 100644 tests/tcg/s390x/noexec.c
 create mode 100644 tests/tcg/x86_64/noexec.c

-- 
2.35.3

[PATCH 2/2] tests/tcg: Test siginfo_t contents when jumping to non-readable pages

2022-08-04 Thread Ilya Leoshkevich

Add x86_64 and s390x tests to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/multiarch/noexec.h | 114 
 tests/tcg/s390x/Makefile.target  |   1 +
 tests/tcg/s390x/noexec.c | 145 +++
 tests/tcg/x86_64/Makefile.target |   3 +-
 tests/tcg/x86_64/noexec.c| 116 +
 5 files changed, 378 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/multiarch/noexec.h
 create mode 100644 tests/tcg/s390x/noexec.c
 create mode 100644 tests/tcg/x86_64/noexec.c

diff --git a/tests/tcg/multiarch/noexec.h b/tests/tcg/multiarch/noexec.h
new file mode 100644
index 00..a76e0aa9ea
--- /dev/null
+++ b/tests/tcg/multiarch/noexec.h
@@ -0,0 +1,114 @@
+/*
+ * Common code for arch-specific MMU_INST_FETCH fault testing.
+ *
+ * Declare struct arch_noexec_test before including this file and define
+ * arch_check_mcontext() after that.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Forward declarations. */
+
+static void arch_check_mcontext(const struct arch_noexec_test *test,
+const mcontext_t *ctx);
+
+/* Utility functions. */
+
+static void safe_print(const char *s)
+{
+write(0, s, strlen(s));
+}
+
+static void safe_puts(const char *s)
+{
+safe_print(s);
+safe_print("\n");
+}
+
+#define PAGE_ALIGN(p) (void *)((unsigned long)(p) & ~0xfffUL)
+
+/* Testing infrastructure. */
+
+struct noexec_test {
+const char *name;
+void (*func)(int);
+void *page;
+void *expected_si_addr;
+struct arch_noexec_test arch;
+};
+
+static const struct noexec_test *current_noexec_test;
+
+static void handle_segv(int sig, siginfo_t *info, void *ucontext)
+{
+int err;
+
+if (current_noexec_test == NULL) {
+safe_puts("[  FAILED  ] unexpected SEGV");
+_exit(1);
+}
+
+if (info->si_addr != current_noexec_test->expected_si_addr) {
+safe_puts("[  FAILED  ] wrong si_addr");
+_exit(1);
+}
+
+arch_check_mcontext(_noexec_test->arch,
+&((ucontext_t *)ucontext)->uc_mcontext);
+
+err = mprotect(current_noexec_test->page, 0x1000, PROT_READ | PROT_EXEC);
+if (err != 0) {
+safe_puts("[  FAILED  ] mprotect() failed");
+_exit(1);
+}
+
+current_noexec_test = NULL;
+}
+
+static void test_noexec_1(const struct noexec_test *test)
+{
+int ret;
+
+/* Trigger TB creation in order to test invalidation. */
+test->func(0);
+
+ret = mprotect(test->page, 0x1000, PROT_NONE);
+assert(ret == 0);
+
+/* Trigger SEGV and check that handle_segv() ran. */
+current_noexec_test = test;
+test->func(0);
+assert(current_noexec_test == NULL);
+}
+
+static int test_noexec(struct noexec_test *tests, size_t n_tests)
+{
+struct sigaction act;
+size_t i;
+int err;
+
+memset(, 0, sizeof(act));
+act.sa_sigaction = handle_segv;
+act.sa_flags = SA_SIGINFO;
+err = sigaction(SIGSEGV, , NULL);
+assert(err == 0);
+
+for (i = 0; i < n_tests; i++) {
+struct noexec_test *test = [i];
+
+safe_print("[ RUN  ] ");
+safe_puts(test->name);
+test_noexec_1(test);
+safe_puts("[   OK ]");
+}
+
+safe_puts("[  PASSED  ]");
+
+return EXIT_SUCCESS;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 1a7a4a2f59..5e13a41c3f 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -16,6 +16,7 @@ TESTS+=shift
 TESTS+=trap
 TESTS+=signals-s390x
 TESTS+=branch-relative-long
+TESTS+=noexec
 
 Z14_TESTS=vfminmax
 vfminmax: LDFLAGS+=-lm
diff --git a/tests/tcg/s390x/noexec.c b/tests/tcg/s390x/noexec.c
new file mode 100644
index 00..2dfc9ee817
--- /dev/null
+++ b/tests/tcg/s390x/noexec.c
@@ -0,0 +1,145 @@
+#define _GNU_SOURCE
+
+struct arch_noexec_test {
+void *expected_pswa;
+unsigned long expected_r2;
+};
+
+#include "../multiarch/noexec.h"
+
+static void arch_check_mcontext(const struct arch_noexec_test *test,
+const mcontext_t *ctx) {
+if (ctx->psw.addr != (unsigned long)test->expected_pswa) {
+safe_puts("[  FAILED  ] wrong psw.addr");
+_exit(1);
+}
+
+if (ctx->gregs[2] != test->expected_r2) {
+safe_puts("[  FAILED  ] wrong r2");
+_exit(1);
+}
+}
+
+#define DEFINE_NX(name, offset) \
+void name ## _1(int); \
+void name ## _2(int); \
+void name ## _exrl(int); \
+extern const short name ## _end[]; \
+asm(/* Go to the specified page offset. */ \
+".align 0x1000\n" \
+".org .+" #offset "\n" \
+/* %r2 is 0 on entry, overwrite it with 1. */ \
+".globl " #name "_1\n" \
+#name "_1:\n" \
+".cfi_startproc\n" \
+"lgfi %r2,1\n" \
+/* Overwrite %2 with 2. */ \
+".globl " #name "_2\n" \
+#name "_2:\n" \
+"lgfi %r2,2\n" \

[PATCH v6 08/12] vdpa: Move command buffers map to start of net device

2022-08-04 Thread Eugenio Pérez

As this series will reuse them to restore the device state at the end of
a migration (or a device start), let's allocate only once at the device
start so we don't duplicate their map and unmap.

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 123 ++-
 1 file changed, 58 insertions(+), 65 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 55e8a39a56..2c6a26cca0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -263,29 +263,20 @@ static size_t vhost_vdpa_net_cvq_cmd_page_len(void)
 return ROUND_UP(vhost_vdpa_net_cvq_cmd_len(), qemu_real_host_page_size());
 }
 
-/** Copy and map a guest buffer. */
-static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v,
-   const struct iovec *out_data,
-   size_t out_num, size_t data_len, void *buf,
-   size_t *written, bool write)
+/** Map CVQ buffer. */
+static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
+  bool write)
 {
 DMAMap map = {};
 int r;
 
-if (unlikely(!data_len)) {
-qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid legnth of %s buffer\n",
-  __func__, write ? "in" : "out");
-return false;
-}
-
-*written = iov_to_buf(out_data, out_num, 0, buf, data_len);
 map.translated_addr = (hwaddr)(uintptr_t)buf;
-map.size = vhost_vdpa_net_cvq_cmd_page_len() - 1;
+map.size = size - 1;
 map.perm = write ? IOMMU_RW : IOMMU_RO,
 r = vhost_iova_tree_map_alloc(v->iova_tree, );
 if (unlikely(r != IOVA_OK)) {
 error_report("Cannot map injected element");
-return false;
+return r;
 }
 
 r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
@@ -294,50 +285,58 @@ static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v,
 goto dma_map_err;
 }
 
-return true;
+return 0;
 
 dma_map_err:
 vhost_iova_tree_remove(v->iova_tree, );
-return false;
+return r;
 }
 
-/**
- * Copy the guest element into a dedicated buffer suitable to be sent to NIC
- *
- * @iov: [0] is the out buffer, [1] is the in one
- */
-static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
-VirtQueueElement *elem,
-struct iovec *iov)
+static int vhost_vdpa_net_cvq_prepare(NetClientState *nc)
 {
-size_t in_copied;
-bool ok;
+VhostVDPAState *s;
+int r;
 
-iov[0].iov_base = s->cvq_cmd_out_buffer;
-ok = vhost_vdpa_cvq_map_buf(>vhost_vdpa, elem->out_sg, elem->out_num,
-vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base,
-[0].iov_len, false);
-if (unlikely(!ok)) {
-return false;
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+s = DO_UPCAST(VhostVDPAState, nc, nc);
+if (!s->vhost_vdpa.shadow_vqs_enabled) {
+return 0;
 }
 
-iov[1].iov_base = s->cvq_cmd_in_buffer;
-ok = vhost_vdpa_cvq_map_buf(>vhost_vdpa, NULL, 0,
-sizeof(virtio_net_ctrl_ack), iov[1].iov_base,
-_copied, true);
-if (unlikely(!ok)) {
+r = vhost_vdpa_cvq_map_buf(>vhost_vdpa, s->cvq_cmd_out_buffer,
+   vhost_vdpa_net_cvq_cmd_page_len(), false);
+if (unlikely(r < 0)) {
+return r;
+}
+
+r = vhost_vdpa_cvq_map_buf(>vhost_vdpa, s->cvq_cmd_in_buffer,
+   vhost_vdpa_net_cvq_cmd_page_len(), true);
+if (unlikely(r < 0)) {
 vhost_vdpa_cvq_unmap_buf(>vhost_vdpa, s->cvq_cmd_out_buffer);
-return false;
 }
 
-iov[1].iov_len = sizeof(virtio_net_ctrl_ack);
-return true;
+return r;
+}
+
+static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (s->vhost_vdpa.shadow_vqs_enabled) {
+vhost_vdpa_cvq_unmap_buf(>vhost_vdpa, s->cvq_cmd_out_buffer);
+vhost_vdpa_cvq_unmap_buf(>vhost_vdpa, s->cvq_cmd_in_buffer);
+}
 }
 
 static NetClientInfo net_vhost_vdpa_cvq_info = {
 .type = NET_CLIENT_DRIVER_VHOST_VDPA,
 .size = sizeof(VhostVDPAState),
 .receive = vhost_vdpa_receive,
+.prepare = vhost_vdpa_net_cvq_prepare,
+.stop = vhost_vdpa_net_cvq_stop,
 .cleanup = vhost_vdpa_cleanup,
 .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
 .has_ufo = vhost_vdpa_has_ufo,
@@ -348,19 +347,17 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
  * Do not forward commands not supported by SVQ. Otherwise, the device could
  * accept it and qemu would not know how to update the device model.
  */
-static bool vhost_vdpa_net_cvq_validate_cmd(const struct iovec *out,
-size_t out_num)
+static bool

[PATCH v6 06/12] vhost_net: Add NetClientInfo stop callback

2022-08-04 Thread Eugenio Pérez

Used by the backend to perform actions after the device is stopped.

In particular, vdpa net use it to unmap CVQ buffers to the device,
cleaning the actions performend in prepare().

Signed-off-by: Eugenio Pérez 
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 3416bb3d46..7aa1ec0974 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -45,6 +45,7 @@ typedef struct NICConf {
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
 typedef int (NetPrepare)(NetClientState *);
+typedef void (NetStop)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
@@ -73,6 +74,7 @@ typedef struct NetClientInfo {
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
 NetPrepare *prepare;
+NetStop *stop;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e1150d7532..10bca15446 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -320,6 +320,9 @@ static void vhost_net_stop_one(struct vhost_net *net,
 net->nc->info->poll(net->nc, true);
 }
 vhost_dev_stop(>dev, dev);
+if (net->nc->info->stop) {
+net->nc->info->stop(net->nc);
+}
 vhost_dev_disable_notifiers(>dev, dev);
 }
 
-- 
2.31.1

[PATCH v6 03/12] vhost: Delete useless read memory barrier

2022-08-04 Thread Eugenio Pérez

As discussed in previous series [1], this memory barrier is useless with
the atomic read of used idx at vhost_svq_more_used. Deleting it.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02616.html

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index e6eebd0e8d..1b49bf54f2 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -509,9 +509,6 @@ size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
 if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
 return 0;
 }
-
-/* Make sure we read new used_idx */
-smp_rmb();
 } while (true);
 }
 
-- 
2.31.1

Re: [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ

2022-08-04 Thread Eugenio Perez Martin

On Thu, Aug 4, 2022 at 6:21 AM Jason Wang  wrote:
>
>
> 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > CVQ of net vhost-vdpa devices can be intercepted since the work of [1]. The
> > virtio-net device model is updated. The migration was blocked because 
> > although
> > the state can be megrated between VMM it was not possible to restore on the
> > destination NIC.
> >
> > This series add support for SVQ to inject external messages without the 
> > guest's
> > knowledge, so before the guest is resumed all the guest visible state is
> > restored. It is done using standard CVQ messages, so the vhost-vdpa device 
> > does
> > not need to learn how to restore it: As long as they have the feature, they
> > know how to handle it.
> >
> > This series needs fixes [1], [2] and [3] to be applied to achieve full live
> > migration.
> >
> > Thanks!
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02984.html
> > [2] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg03993.html
>
>
> Note that the above has been merged into master.
>
> And the series looks good overall, just some comments to make the code
> easier to be read and maintained in the future.
>

I think I addressed all of them, plus some others that were decided to
leave for later. We can revert them if it's not fine.

Sending a new version.

Thanks!

[PATCH v6 02/12] vhost: use SVQ element ndescs instead of opaque data for desc validation

2022-08-04 Thread Eugenio Pérez

Since we're going to allow SVQ to add elements without the guest's
knowledge and without its own VirtQueueElement, it's easier to check if
an element is a valid head checking a different thing than the
VirtQueueElement.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index ffd2b2c972..e6eebd0e8d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -414,7 +414,7 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 return NULL;
 }
 
-if (unlikely(!svq->desc_state[used_elem.id].elem)) {
+if (unlikely(!svq->desc_state[used_elem.id].ndescs)) {
 qemu_log_mask(LOG_GUEST_ERROR,
 "Device %s says index %u is used, but it was not available",
 svq->vdev->name, used_elem.id);
@@ -422,6 +422,7 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 }
 
 num = svq->desc_state[used_elem.id].ndescs;
+svq->desc_state[used_elem.id].ndescs = 0;
 last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
 svq->desc_next[last_used_chain] = svq->free_head;
 svq->free_head = used_elem.id;
-- 
2.31.1

Re: [PATCH v2 12/20] ppc/ppc405: QOM'ify EBC

2022-08-04 Thread Peter Maydell

On Thu, 4 Aug 2022 at 19:03, BALATON Zoltan  wrote:
> I was trying to find out how to do it but I don't understand QOM enough to
> answer the simple question of how to get the cpu object from QOM. My
> guesses are:
>
> object_resolve_path_type("/machine", TYPE_POWERPC_CPU, NULL)
>
> or maybe
>
> object_resolve_path_at(OBJECT(dev)->parent, "cpu")
>
> or how do these functions work and what is the preferred way to retrieve
> an object from the QOM tree? This is what I hoped someone with more
> understanding of QOM could answer.

The standard approach that we use elsewhere in the tree for handling
"this device needs to have a pointer to a CPU object or whatever"
is "the device has a QOM link property, and the SoC sets that
property when it creates the device".

There are other ways it could in theory be done, but there is
benefit in consistency, and "define and set the property" is
straightforward. It also means the device object doesn't have
to know anything about the way the SoC container is laid out.

(It's usually worth looking at whether there are cleanups
that could mean the device doesn't have to have a pointer to
that other object at all -- but that isn't always the case,
or the cleanups would be a big job in their own right that
are better not tangled up with QOMification.)

thanks
-- PMM

[PATCH v6 01/12] vhost: stop transfer elem ownership in vhost_handle_guest_kick

2022-08-04 Thread Eugenio Pérez

It was easier to allow vhost_svq_add to handle the memory. Now that we
will allow qemu to add elements to a SVQ without the guest's knowledge,
it's better to handle it in the caller.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index e4956728dd..ffd2b2c972 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -233,9 +233,6 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 /**
  * Add an element to a SVQ.
  *
- * The caller must check that there is enough slots for the new element. It
- * takes ownership of the element: In case of failure not ENOSPC, it is free.
- *
  * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
  */
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
@@ -252,7 +249,6 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct 
iovec *out_sg,
 
 ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, _head);
 if (unlikely(!ok)) {
-g_free(elem);
 return -EINVAL;
 }
 
@@ -293,7 +289,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
 virtio_queue_set_notification(svq->vq, false);
 
 while (true) {
-VirtQueueElement *elem;
+g_autofree VirtQueueElement *elem;
 int r;
 
 if (svq->next_guest_avail_elem) {
@@ -324,12 +320,14 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
  * queue the current guest descriptor and ignore kicks
  * until some elements are used.
  */
-svq->next_guest_avail_elem = elem;
+svq->next_guest_avail_elem = g_steal_pointer();
 }
 
 /* VQ is full or broken, just return and ignore kicks */
 return;
 }
+/* elem belongs to SVQ or external caller now */
+elem = NULL;
 }
 
 virtio_queue_set_notification(svq->vq, true);
-- 
2.31.1

[PATCH v6 05/12] vhost_net: Add NetClientInfo prepare callback

2022-08-04 Thread Eugenio Pérez

This is used by the backend to perform actions before the device is
started.

In particular, vdpa net use it to map CVQ buffers to the device, so it
can send control commands using them.

Signed-off-by: Eugenio Pérez 
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 523136c7ac..3416bb3d46 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef struct NICConf {
 
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
+typedef int (NetPrepare)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
 NetReceive *receive_raw;
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
+NetPrepare *prepare;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..e1150d7532 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -244,6 +244,13 @@ static int vhost_net_start_one(struct vhost_net *net,
 struct vhost_vring_file file = { };
 int r;
 
+if (net->nc->info->prepare) {
+r = net->nc->info->prepare(net->nc);
+if (r < 0) {
+return r;
+}
+}
+
 r = vhost_dev_enable_notifiers(>dev, dev);
 if (r < 0) {
 goto fail_notifiers;
-- 
2.31.1

[PATCH v6 12/12] vdpa: Delete CVQ migration blocker

2022-08-04 Thread Eugenio Pérez

We can restore the device state in the destination via CVQ now. Remove
the migration blocker.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost-vdpa.h |  1 -
 hw/virtio/vhost-vdpa.c | 14 --
 net/vhost-vdpa.c   |  2 --
 3 files changed, 17 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d10a89303e..d85643 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -35,7 +35,6 @@ typedef struct vhost_vdpa {
 bool shadow_vqs_enabled;
 /* IOVA mapping used by the Shadow Virtqueue */
 VhostIOVATree *iova_tree;
-Error *migration_blocker;
 GPtrArray *shadow_vqs;
 const VhostShadowVirtqueueOps *shadow_vq_ops;
 void *shadow_vq_ops_opaque;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7e28d2f674..4b0cfc0f56 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1033,13 +1033,6 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
 return true;
 }
 
-if (v->migration_blocker) {
-int r = migrate_add_blocker(v->migration_blocker, );
-if (unlikely(r < 0)) {
-return false;
-}
-}
-
 for (i = 0; i < v->shadow_vqs->len; ++i) {
 VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
 VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
@@ -1082,10 +1075,6 @@ err:
 vhost_svq_stop(svq);
 }
 
-if (v->migration_blocker) {
-migrate_del_blocker(v->migration_blocker);
-}
-
 return false;
 }
 
@@ -1105,9 +1094,6 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 }
 }
 
-if (v->migration_blocker) {
-migrate_del_blocker(v->migration_blocker);
-}
 return true;
 }
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 7a50d46dae..b70fdb49f5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -558,8 +558,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 
 s->vhost_vdpa.shadow_vq_ops = _vdpa_net_svq_ops;
 s->vhost_vdpa.shadow_vq_ops_opaque = s;
-error_setg(>vhost_vdpa.migration_blocker,
-   "Migration disabled: vhost-vdpa uses CVQ.");
 }
 ret = vhost_vdpa_add(nc, (void *)>vhost_vdpa, queue_pair_index, nvqs);
 if (ret) {
-- 
2.31.1

[PATCH v6 00/12] NIC vhost-vdpa state restore via Shadow CVQ

2022-08-04 Thread Eugenio Pérez

CVQ of net vhost-vdpa devices can be intercepted since the work of [1]. The
virtio-net device model is updated. The migration was blocked because although
the state can be megrated between VMM it was not possible to restore on the
destination NIC.

This series add support for SVQ to inject external messages without the guest's
knowledge, so before the guest is resumed all the guest visible state is
restored. It is done using standard CVQ messages, so the vhost-vdpa device does
not need to learn how to restore it: As long as they have the feature, they
know how to handle it.

This series needs fix [1] to be applied to achieve full live
migration.

Thanks!

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00325.html

v6:
- Move map and unmap of the buffers to the start and stop of the device. This
  implies more callbacks on NetClientInfo, but simplifies the SVQ CVQ code.
- Not assume that in buffer is sizeof(virtio_net_ctrl_ack) in
  vhost_vdpa_net_cvq_add
- Reduce the number of changes from previous versions
- Delete unused memory barrier

v5:
- Rename s/start/load/
- Use independent NetClientInfo to only add load callback on cvq.
- Accept out sg instead of dev_buffers[] at vhost_vdpa_net_cvq_map_elem
- Use only out size instead of iovec dev_buffers to know if the descriptor is
  effectively available, allowing to delete artificial !NULL VirtQueueElement
  on vhost_svq_add call.

v4:
- Actually use NetClientInfo callback.

v3:
- Route vhost-vdpa start code through NetClientInfo callback.
- Delete extra vhost_net_stop_one() call.

v2:
- Fix SIGSEGV dereferencing SVQ when not in svq mode

v1 from RFC:
- Do not reorder DRIVER_OK & enable patches.
- Delete leftovers

Eugenio Pérez (12):
  vhost: stop transfer elem ownership in vhost_handle_guest_kick
  vhost: use SVQ element ndescs instead of opaque data for desc
validation
  vhost: Delete useless read memory barrier
  vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
  vhost_net: Add NetClientInfo prepare callback
  vhost_net: Add NetClientInfo stop callback
  vdpa: add net_vhost_vdpa_cvq_info NetClientInfo
  vdpa: Move command buffers map to start of net device
  vdpa: Extract vhost_vdpa_net_cvq_add from
vhost_vdpa_net_handle_ctrl_avail
  vhost_net: add NetClientState->load() callback
  vdpa: Add virtio-net mac address via CVQ at start
  vdpa: Delete CVQ migration blocker

 include/hw/virtio/vhost-vdpa.h |   1 -
 include/net/net.h  |   6 +
 hw/net/vhost_net.c |  17 +++
 hw/virtio/vhost-shadow-virtqueue.c |  27 ++--
 hw/virtio/vhost-vdpa.c |  14 --
 net/vhost-vdpa.c   | 227 ++---
 6 files changed, 180 insertions(+), 112 deletions(-)

-- 
2.31.1

Re: [PATCH] disas: Add LoongArch support

2022-08-04 Thread Peter Maydell

On Thu, 4 Aug 2022 at 18:32, Qi Hu  wrote:
>

More specifically, this is adding support for disassembling
on LoongArch hosts. The handling of disassembling LoongArch
guests is already connected up.

thanks
-- PMM

Re: [RFC 1/1] hw: tpmtisspi: add SPI support to QEMU TPM implementation

2022-08-04 Thread Dan Zhang

On Wed, Aug 3, 2022 at 10:30 AM Peter Delevoryas  wrote:
>
> On Wed, Aug 03, 2022 at 10:52:23AM +0200, Cédric Le Goater wrote:
> > On 8/3/22 04:32, Iris Chen wrote:
> > > From: Iris Chen 
> >
> > A commit log telling us about this new device would be good to have.
> >
> >
> > > Signed-off-by: Iris Chen 
> > > ---
> > >   configs/devices/arm-softmmu/default.mak |   1 +
> > >   hw/arm/Kconfig  |   5 +
> > >   hw/tpm/Kconfig  |   5 +
> > >   hw/tpm/meson.build  |   1 +
> > >   hw/tpm/tpm_tis_spi.c| 311 
> > >   include/sysemu/tpm.h|   3 +
> > >   6 files changed, 326 insertions(+)
> > >   create mode 100644 hw/tpm/tpm_tis_spi.c
> > >
> > > diff --git a/configs/devices/arm-softmmu/default.mak 
> > > b/configs/devices/arm-softmmu/default.mak
> > > index 6985a25377..80d2841568 100644
> > > --- a/configs/devices/arm-softmmu/default.mak
> > > +++ b/configs/devices/arm-softmmu/default.mak
> > > @@ -42,3 +42,4 @@ CONFIG_FSL_IMX6UL=y
> > >   CONFIG_SEMIHOSTING=y
> > >   CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
> > >   CONFIG_ALLWINNER_H3=y
> > > +CONFIG_FBOBMC_AST=y
> >
> > I don't think this extra config is useful for now
> >
> > > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> > > index 15fa79afd3..193decaec1 100644
> > > --- a/hw/arm/Kconfig
> > > +++ b/hw/arm/Kconfig
> > > @@ -458,6 +458,11 @@ config ASPEED_SOC
> > >   select PMBUS
> > >   select MAX31785
> > > +config FBOBMC_AST
> > > +bool
> > > +select ASPEED_SOC
> > > +select TPM_TIS_SPI
> > > +
> > >   config MPS2
> > >   bool
> > >   imply I2C_DEVICES
> > > diff --git a/hw/tpm/Kconfig b/hw/tpm/Kconfig
> > > index 29e82f3c92..370a43f045 100644
> > > --- a/hw/tpm/Kconfig
> > > +++ b/hw/tpm/Kconfig
> > > @@ -8,6 +8,11 @@ config TPM_TIS_SYSBUS
> > >   depends on TPM
> > >   select TPM_TIS
> > > +config TPM_TIS_SPI
> > > +bool
> > > +depends on TPM
> > > +select TPM_TIS
> > > +
> > >   config TPM_TIS
> > >   bool
> > >   depends on TPM
> > > diff --git a/hw/tpm/meson.build b/hw/tpm/meson.build
> > > index 1c68d81d6a..1a057f4e36 100644
> > > --- a/hw/tpm/meson.build
> > > +++ b/hw/tpm/meson.build
> > > @@ -2,6 +2,7 @@ softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: 
> > > files('tpm_tis_common.c'))
> > >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_ISA', if_true: 
> > > files('tpm_tis_isa.c'))
> > >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_SYSBUS', if_true: 
> > > files('tpm_tis_sysbus.c'))
> > >   softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))
> > > +softmmu_ss.add(when: 'CONFIG_TPM_TIS_SPI', if_true: 
> > > files('tpm_tis_spi.c'))
> > >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_TIS'], if_true: 
> > > files('tpm_ppi.c'))
> > >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_CRB'], if_true: 
> > > files('tpm_ppi.c'))
> > > diff --git a/hw/tpm/tpm_tis_spi.c b/hw/tpm/tpm_tis_spi.c
> > > new file mode 100644
> > > index 00..c98ddcfddb
> > > --- /dev/null
> > > +++ b/hw/tpm/tpm_tis_spi.c
> > > @@ -0,0 +1,311 @@
> > > +#include "qemu/osdep.h"
> > > +#include "hw/qdev-properties.h"
> > > +#include "migration/vmstate.h"
> > > +#include "hw/acpi/tpm.h"
> > > +#include "tpm_prop.h"
> > > +#include "tpm_tis.h"
> > > +#include "qom/object.h"
> > > +#include "hw/ssi/ssi.h"
> > > +#include "hw/ssi/spi_gpio.h"
> > > +
> > > +#define TPM_TIS_SPI_ADDR_BYTES 3
> > > +#define SPI_WRITE 0
> > > +
> > > +typedef enum {
> > > +TIS_SPI_PKT_STATE_DEACTIVATED = 0,
> > > +TIS_SPI_PKT_STATE_START,
> > > +TIS_SPI_PKT_STATE_ADDRESS,
> > > +TIS_SPI_PKT_STATE_DATA_WR,
> > > +TIS_SPI_PKT_STATE_DATA_RD,
> > > +TIS_SPI_PKT_STATE_DONE,
> > > +} TpmTisSpiPktState;
> > > +
> > > +union TpmTisRWSizeByte {
> > > +uint8_t byte;
> > > +struct {
> > > +uint8_t data_expected_size:6;
> > > +uint8_t resv:1;
> > > +uint8_t rwflag:1;
> > > +};
> > > +};
> > > +
> > > +union TpmTisSpiHwAddr {
> > > +hwaddr addr;
> > > +uint8_t bytes[sizeof(hwaddr)];
> > > +};
> > > +
> > > +union TpmTisSpiData {
> > > +uint32_t data;
> > > +uint8_t bytes[64];
> > > +};
> > > +
> > > +struct TpmTisSpiState {
> > > +/*< private >*/
> > > +SSIPeripheral parent_obj;
> > > +
> > > +/*< public >*/
> > > +TPMState tpm_state; /* not a QOM object */
> > > +TpmTisSpiPktState tpm_tis_spi_state;
> > > +
> > > +union TpmTisRWSizeByte first_byte;
> > > +union TpmTisSpiHwAddr addr;
> > > +union TpmTisSpiData data;
> >
> > Are these device registers ? I am not sure the unions are very useful.
>
> +1, I don't think we should be using unions, instead we should split out
> all the relevant fields we want to store and use extract32/deposit32/etc
> if necessary.
These union is used to saving us code to extract the bits from first byte
and assembling the hwaddr and unint32_t data from bytes.
I think as the bitfields has

Re: [PATCH v2 1/1] target/ppc: fix unreachable code in do_ldst_quad()

2022-08-04 Thread Peter Maydell

On Mon, 25 Jul 2022 at 21:24, Daniel Henrique Barboza
 wrote:
>
> Coverity reports that commit fc34e81acd51 ("target/ppc: add macros to
> check privilege level") turned the following code unreachable:
>
> if (!prefixed && !(ctx->insns_flags2 & PPC2_LSQ_ISA207)) {
> /* lq and stq were privileged prior to V. 2.07 */
> REQUIRE_SV(ctx);
>
> >>> CID 1490757:  Control flow issues  (UNREACHABLE)
> >>> This code cannot be reached: "if (ctx->le_mode) {
> if (ctx->le_mode) {
> gen_align_no_le(ctx);
> return true;
> }
> }
>
> This happens because the macro REQUIRE_SV(), in CONFIG_USER_MODE, will
> always result in a 'return true' statement. In fact, all REQUIRE_*
> macros for target/ppc/translate.c behave the same way: if a condition
> isn't met, an exception is generated and a 'return' statement is issued.
>
> The difference is that all other callers are using it in insns that are
> not implemented in user mode. do_ldst_quad(), on the other hand, is user
> mode compatible.

This is a Coverity false positive, and I'd already marked it that way
in the Coverity UI back on the 20th. Coverity gets confused sometimes
by ifdeffery.

So you don't need this patch, unless you think the code is genuinely
better (more readable to humans, etc) this way.

thanks
-- PMM

Re: [PATCH v2 12/20] ppc/ppc405: QOM'ify EBC

2022-08-04 Thread BALATON Zoltan


On Thu, 4 Aug 2022, Cédric Le Goater wrote:

[ Replying to all ]

On 8/4/22 16:26, BALATON Zoltan wrote:

On Thu, 4 Aug 2022, Cédric Le Goater wrote:

On 8/4/22 14:09, BALATON Zoltan wrote:

On Thu, 4 Aug 2022, Cédric Le Goater wrote:

On 8/4/22 01:36, Daniel Henrique Barboza wrote:

Cedric,

On 8/3/22 10:28, Cédric Le Goater wrote:

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
  hw/ppc/ppc405.h    | 16 +++
  hw/ppc/ppc405_uc.c | 71 
+++---

  2 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/ppc405.h b/hw/ppc/ppc405.h
index 1da34a7f10f3..1c7fe07b8084 100644
--- a/hw/ppc/ppc405.h
+++ b/hw/ppc/ppc405.h
@@ -65,7 +65,22 @@ struct ppc4xx_bd_info_t {
  typedef struct Ppc405SoCState Ppc405SoCState;
+/* Peripheral controller */
+#define TYPE_PPC405_EBC "ppc405-ebc"
+OBJECT_DECLARE_SIMPLE_TYPE(Ppc405EbcState, PPC405_EBC);
+struct Ppc405EbcState {
+    DeviceState parent_obj;
+
+    PowerPCCPU *cpu;
+    uint32_t addr;
+    uint32_t bcr[8];
+    uint32_t bap[8];
+    uint32_t bear;
+    uint32_t besr0;
+    uint32_t besr1;
+    uint32_t cfg;
+};
  /* DMA controller */
  #define TYPE_PPC405_DMA "ppc405-dma"
@@ -203,6 +218,7 @@ struct Ppc405SoCState {
  Ppc405OcmState ocm;
  Ppc405GpioState gpio;
  Ppc405DmaState dma;
+    Ppc405EbcState ebc;
  };
  /* PowerPC 405 core */
diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c
index 6bd93c1cb90c..0166f3fc36da 100644
--- a/hw/ppc/ppc405_uc.c
+++ b/hw/ppc/ppc405_uc.c
@@ -393,17 +393,6 @@ static void ppc4xx_opba_init(hwaddr base)
/*/
  /* Peripheral controller */
-typedef struct ppc4xx_ebc_t ppc4xx_ebc_t;
-struct ppc4xx_ebc_t {
-    uint32_t addr;
-    uint32_t bcr[8];
-    uint32_t bap[8];
-    uint32_t bear;
-    uint32_t besr0;
-    uint32_t besr1;
-    uint32_t cfg;
-};
-
  enum {
  EBC0_CFGADDR = 0x012,
  EBC0_CFGDATA = 0x013,
@@ -411,10 +400,9 @@ enum {
  static uint32_t dcr_read_ebc (void *opaque, int dcrn)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(opaque);
  uint32_t ret;
-    ebc = opaque;
  switch (dcrn) {
  case EBC0_CFGADDR:
  ret = ebc->addr;
@@ -496,9 +484,8 @@ static uint32_t dcr_read_ebc (void *opaque, int 
dcrn)

  static void dcr_write_ebc (void *opaque, int dcrn, uint32_t val)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(opaque);
-    ebc = opaque;
  switch (dcrn) {
  case EBC0_CFGADDR:
  ebc->addr = val;
@@ -554,12 +541,11 @@ static void dcr_write_ebc (void *opaque, int 
dcrn, uint32_t val)

  }
  }
-static void ebc_reset (void *opaque)
+static void ppc405_ebc_reset(DeviceState *dev)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(dev);
  int i;
-    ebc = opaque;
  ebc->addr = 0x;
  ebc->bap[0] = 0x7F8FFE80;
  ebc->bcr[0] = 0xFFE28000;
@@ -572,18 +558,46 @@ static void ebc_reset (void *opaque)
  ebc->cfg = 0x8040;
  }
-void ppc405_ebc_init(CPUPPCState *env)
+static void ppc405_ebc_realize(DeviceState *dev, Error **errp)
  {
-    ppc4xx_ebc_t *ebc;
+    Ppc405EbcState *ebc = PPC405_EBC(dev);
+    CPUPPCState *env;
+
+    assert(ebc->cpu);
+
+    env = >cpu->env;
-    ebc = g_new0(ppc4xx_ebc_t, 1);
-    qemu_register_reset(_reset, ebc);
  ppc_dcr_register(env, EBC0_CFGADDR,
   ebc, _read_ebc, _write_ebc);
  ppc_dcr_register(env, EBC0_CFGDATA,
   ebc, _read_ebc, _write_ebc);
  }
+static Property ppc405_ebc_properties[] = {
+    DEFINE_PROP_LINK("cpu", Ppc405EbcState, cpu, TYPE_POWERPC_CPU,
+ PowerPCCPU *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ppc405_ebc_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = ppc405_ebc_realize;
+    dc->user_creatable = false;
+    dc->reset = ppc405_ebc_reset;
+    device_class_set_props(dc, ppc405_ebc_properties);
+}
+
+void ppc405_ebc_init(CPUPPCState *env)
+{
+    PowerPCCPU *cpu = env_archcpu(env);
+    DeviceState *dev = qdev_new(TYPE_PPC405_EBC);
+
+    object_property_set_link(OBJECT(cpu), "cpu", OBJECT(dev), 
_abort);


This line is breaking the boot of sam460ex:


  ./qemu-system-ppc64 -display none -M sam460ex
Unexpected error in object_property_find_err() at ../qom/object.c:1304:
qemu-system-ppc64: Property '460exb-powerpc64-cpu.cpu' not found
Aborted (core dumped)


I think you meant to link the cpu prop of the EBC obj to the CPU 
object,

not the cpu prop of the CPU obj to the EBC dev.


Yes. ppc405_ebc_init() has only one user left, the sam460ex, which I 
didn't

test :/


This patch changes ppc405_ebc_init to a realize method so shouldn't the 
sam460ex be changed to create the new object instead of calling 
ppc405_ebc_init too instead? 


Sure.

First step was to make sure nothing was broken. I can add some extra
patches in v3 to convert

Re: [RFC PATCH] cputlb and ssi: cache class to avoid expensive object_dynamic_cast_assert (HACK!!!)

2022-08-04 Thread Cédric Le Goater


On 8/4/22 18:51, Alex Bennée wrote:


Cédric Le Goater  writes:


Hello Alex,

Thanks for putting some time into this problem,

On 8/4/22 11:20, Alex Bennée wrote:

Investigating why some BMC models are so slow compared to a plain ARM
virt machines I did some profiling of:
./qemu-system-arm -M romulus-bmc -nic user \
  -drive
  file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd \
  -nographic -serial mon:stdio
And saw that object_dynamic_cast was dominating the profile times.
We
have a number of cases in the CPU hot path and more importantly for
this model in the SSI bus. As the class is static once the object is
created we just cache it and use it instead of the dynamic case
macros.
[AJB: I suspect a proper fix for this is for QOM to support a cached
class lookup, abortive macro attempt #if 0'd in this patch].
Signed-off-by: Alex Bennée 
Cc: Cédric Le Goater 



Here are some results,

* romulus-bmc, OpenBmc login prompt

   without : 82s
   with: 56s


Looks like I lucked out picking the lowest hanging fruit.


That's a huge improvement. I tend to use buildroot mostly for FW and
kernel dev but OpenBMC has become as complex as a common server distro.
The above result is probably faster than real HW, for the AST2400 and
AST2500 at least.




* ast2500-evb,execute-in-place=true, U-boot 2019.04 prompt

   without : 30s
   with: 22s

* witherspoon-bmc,execute-in-place=true, U-boot 2016.07 prompt

   without : 5.5s
   with: 4.1s

There is definitely an improvement in all scenarios.

Applying a similar technique on AspeedSMCClass, I could gain
another ~10% and boot the ast2500-evb,execute-in-place=true
machine, up to the U-boot 2019.04 prompt, in less then 20s.


There are some fundamentals to XIP which means they will be slower if
each instruction is being sucked through io_readx/device emulation


Yes. But when using XIP, there is a huge time difference between two
U-boot versions. See above. It takes 4s to reach the U-boot prompt of
the older 2016.07 and 22s on the newer U-boot 2019.04.


although I'm not sure what the exact mechanism is because surely a ROM
can just be mapped into the address space and run from there?


It can and that's the default QEMU mode for the Aspeed machines. The flash
contents is copied in a ROM at 0x0. See commit 1a15311a12fa ("hw/arm/aspeed:
add a 'execute-in-place' property to boot directly from CE0")


That's not exactly how the HW works and there are still some FW (like uboot
on the AST2600 BMC of some Meta boards) which will fetch instructions to
execute from the flash contents region at 0x2000 and not use the ROM
region copied at 0x0.


However, newer u-boot are still quite slow to boot when executing
from the flash device.


For any of those machines? 


Yes. It gets worse with the AST2600, which has 2 CPUs


Whats the next command line for me to dig into?


Here are images to reproduce.

* U-Boot 2016.07:

  wget 
https://github.com/openbmc/openbmc/releases/download/2.9.0/obmc-phosphor-image-romulus.static.mtd

  qemu-system-arm -M romulus-bmc -drive 
file=./obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd -nographic
  qemu-system-arm -M romulus-bmc,execute-in-place=true -drive 
file=./obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd -nographic

* U-Boot 2019.04:

  wget https://www.kaod.org/qemu/aspeed/romulus/flash-romulus-bmc

  same commands

Thanks,

C.

[PATCH] disas: Add LoongArch support

2022-08-04 Thread Qi Hu

Signed-off-by: Qi Hu 
---
 disas.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/disas.c b/disas.c
index e31438f349..d44f46 100644
--- a/disas.c
+++ b/disas.c
@@ -176,6 +176,8 @@ static void initialize_debug_host(CPUDebug *s)
 #else
 #error unsupported RISC-V ABI
 #endif
+#elif defined(__loongarch__)
+s->info.print_insn = print_insn_loongarch;
 #elif defined(__aarch64__)
 s->info.cap_arch = CS_ARCH_ARM64;
 #elif defined(__alpha__)
-- 
2.37.1

Re: [PATCH v10 11/21] jobs: group together API calls under the same job lock

2022-08-04 Thread Kevin Wolf

Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> Now that the API offers also _locked() functions, take advantage
> of it and give also the caller control to take the lock and call
> _locked functions.
> 
> This makes sense especially when we have for loops, because it
> makes no sense to have:
> 
> for(job = job_next(); ...)
> 
> where each job_next() takes the lock internally.
> Instead we want
> 
> JOB_LOCK_GUARD();
> for(job = job_next_locked(); ...)
> 
> In addition, protect also direct field accesses, by either creating a
> new critical section or widening the existing ones.

"In addition" sounds like it should be a separate patch. I was indeed
surprised when after a few for loops where you just pulled the existing
locking up a bit, I saw some hunks that add completely new locking.

> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  block.c| 17 ++---
>  blockdev.c | 12 +---
>  blockjob.c | 35 ++-
>  job-qmp.c  |  4 +++-
>  job.c  |  7 +--
>  monitor/qmp-cmds.c |  7 +--
>  qemu-img.c | 37 +
>  7 files changed, 75 insertions(+), 44 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 2c0080..7559965dbc 100644
> --- a/block.c
> +++ b/block.c
> @@ -4978,8 +4978,8 @@ static void bdrv_close(BlockDriverState *bs)
>  
>  void bdrv_close_all(void)
>  {
> -assert(job_next(NULL) == NULL);
>  GLOBAL_STATE_CODE();
> +assert(job_next(NULL) == NULL);
>  
>  /* Drop references from requests still in flight, such as canceled block
>   * jobs whose AIO context has not been polled yet */
> @@ -6165,13 +6165,16 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error 
> **errp)
>  }
>  }
>  
> -for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> -GSList *el;
> +WITH_JOB_LOCK_GUARD() {
> +for (job = block_job_next_locked(NULL); job;
> + job = block_job_next_locked(job)) {
> +GSList *el;
>  
> -xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> -   job->job.id);
> -for (el = job->nodes; el; el = el->next) {
> -xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> +xdbg_graph_add_node(gr, job, 
> X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> +job->job.id);
> +for (el = job->nodes; el; el = el->next) {
> +xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> +}
>  }
>  }
>  
> diff --git a/blockdev.c b/blockdev.c
> index 71f793c4ab..5b79093155 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -150,12 +150,15 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>  return;
>  }
>  
> -for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> +JOB_LOCK_GUARD();
> +
> +for (job = block_job_next_locked(NULL); job;
> + job = block_job_next_locked(job)) {
>  if (block_job_has_bdrv(job, blk_bs(blk))) {

Should this be renamed to block_job_has_bdrv_locked() now?

It looks to me like it does need the locking. (Which actually makes
this patch a fix and not just an optimisation as the commit message
suggests.)

>  AioContext *aio_context = job->job.aio_context;
>  aio_context_acquire(aio_context);
>  
> -job_cancel(>job, false);
> +job_cancel_locked(>job, false);
>  
>  aio_context_release(aio_context);
>  }
> @@ -3745,7 +3748,10 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>  BlockJobInfoList *head = NULL, **tail = 
>  BlockJob *job;
>  
> -for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> +JOB_LOCK_GUARD();
> +
> +for (job = block_job_next_locked(NULL); job;
> + job = block_job_next_locked(job)) {
>  BlockJobInfo *value;
>  AioContext *aio_context;

More context:

BlockJobInfo *value;
AioContext *aio_context;

if (block_job_is_internal(job)) {
continue;
}
aio_context = block_job_get_aio_context(job);
aio_context_acquire(aio_context);
value = block_job_query(job, errp);
aio_context_release(aio_context);

This should become block_job_query_locked(). (You do that in patch 18,
but it looks a bit out of place there - which is precisely because it
really belongs in this one.)

> diff --git a/blockjob.c b/blockjob.c
> index 0d59aba439..96fb9d9f73 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -111,8 +111,10 @@ static bool child_job_drained_poll(BdrvChild *c)
>  /* An inactive or completed job doesn't have any pending requests. Jobs
>   * with !job->busy are either already paused or have a pause point after
>   * being reentered, so no job driver code

[PATCH 3/3] iotests, parallels: Add a test for duplicated clusters

2022-08-04 Thread alexander . ivanov

From: Alexander Ivanov 

Check if original and duplicated offsets refer to the same cluster.
Repair the image and check that writing to a referred cluster
doesn't affects another referred cluster.

Signed-off-by: Natalia Kuzmina 
Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/314|  88 ++
 tests/qemu-iotests/314.out|  36 +++
 .../parallels-2-duplicated-cluster.bz2| Bin 0 -> 148 bytes
 3 files changed, 124 insertions(+)
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out
 create mode 100644 
tests/qemu-iotests/sample_images/parallels-2-duplicated-cluster.bz2

diff --git a/tests/qemu-iotests/314 b/tests/qemu-iotests/314
new file mode 100755
index 00..fdf47f86d4
--- /dev/null
+++ b/tests/qemu-iotests/314
@@ -0,0 +1,88 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Test qemu-img check on duplicated clusters
+#
+# Copyright (C) 2009 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=natalia.kuzm...@openvz.org
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1# failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.pattern
+
+_supported_fmt parallels
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "using sample corrupted image"
+echo
+_use_sample_img parallels-2-duplicated-cluster.bz2
+
+CLUSTER_SIZE=65536
+
+#read one cluster from original offset
+$QEMU_IO -c "read -P 0x11 0 $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+#read from duplicated offset (data must be the same as on original offset)
+$QEMU_IO -c "read -P 0x11 $((4 * CLUSTER_SIZE)) $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+#change data from original offset
+$QEMU_IO -c "write -P 0x55 0 $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+#read from duplicated offset (data must be the same as on original offset)
+$QEMU_IO -c "read -P 0x55 $((4 * CLUSTER_SIZE)) $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+echo
+echo "check and repair the image"
+echo
+_check_test_img -r all
+echo
+
+#read one cluster from original offset
+$QEMU_IO -c "read -P 0x55 0 $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+#read copied data from new offset
+$QEMU_IO -c "read -P 0x55 $((4 * CLUSTER_SIZE)) $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+#change data from original offset
+$QEMU_IO -c "write -P 0x11 0 $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+#read from new offset (fail, now this data was left unchanged)
+$QEMU_IO -c "read -P 0x11 $((4 * CLUSTER_SIZE)) $CLUSTER_SIZE" "$TEST_IMG" | \
+_filter_qemu_io
+
+echo
+echo
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/314.out b/tests/qemu-iotests/314.out
new file mode 100644
index 00..c36022c407
--- /dev/null
+++ b/tests/qemu-iotests/314.out
@@ -0,0 +1,36 @@
+QA output created by 314
+
+using sample corrupted image
+
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+check and repair the image
+
+Repairing BAT offset in entry 4 duplicates offset in entry 0
+The following inconsistencies were found and repaired:
+
+0 leaked clusters
+1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Pattern verification failed at offset 262144, 65536 bytes
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+
+*** done
diff --git 
a/tests/qemu-iotests/sample_images/parallels-2-duplicated-cluster.bz2 
b/tests/qemu-iotests/sample_images/parallels-2-duplicated-cluster.bz2
new file mode 100644
index

Re: [RFC 0/3] Add Generic SPI GPIO model

2022-08-04 Thread Dan Zhang

On Tue, Aug 2, 2022 at 7:25 AM Cédric Le Goater  wrote:
>
> On 7/31/22 00:06, Peter Delevoryas wrote:
> > On Sat, Jul 30, 2022 at 11:18:33PM +0200, Cédric Le Goater wrote:
> >> On 7/29/22 19:30, Peter Delevoryas wrote:
> >>> On Fri, Jul 29, 2022 at 03:25:55PM +0200, Cédric Le Goater wrote:
>  Hello Iris,
> 
>  On 7/29/22 01:23, Iris Chen wrote:
> > Hey everyone,
> >
> > I have been working on a project to add support for SPI-based TPMs in 
> > QEMU.
> > Currently, most of our vboot platforms using a SPI-based TPM use the 
> > Linux
> > SPI-GPIO driver to "bit-bang" the SPI protocol. This is because the 
> > Aspeed
> > SPI controller (modeled in QEMU under hw/ssi/aspeed_smc.c) has an 
> > implementation
> > deficiency that prevents bi-directional operations.
>  aspeed_smc models the Aspeed FMC/SPI controllers which have a well 
>  defined
>  HW interface. Your model proposal adds support for a new SPI controller
>  using bitbang GPIOs. These are really two differents models. I don't see
>  how you could reuse aspeed_smc for this purpose.
> 
>  or you mean that Linux is using the SPI-GPIO driver because the Linux
>  Aspeed SMC driver doesn't match the need ? It is true that the Linux
>  driver is not generic, it deals with flash devices only. But that's
>  another problem.
> 
> > Thus, in order to connect
> > a TPM to this bus, my patch implements a QEMU SPI-GPIO driver (as the 
> > QEMU
> > counterpart of the Linux SPI-GPIO driver).
> >
> > As we use SPI-based TPMs on many of our BMCs for the secure-boot 
> > implementation,
> > I have already tested this implementation locally with our 
> > Yosemite-v3.5 platform
> > and Facebook-OpenBMC. This model was tested by connecting a generic 
> > SPI-NOR (m25p80
> > for example) to the Yosemite-v3.5 SPI bus containing the TPM.
> >
> > This patch is an RFC because I have several questions about design. 
> > Although the
> > model is working, I understand there are many areas where the design 
> > decision
> > is not deal (ie. abstracting hard coded GPIO values). Below are some 
> > details of the
> > patch and specific areas where I would appreciate feedback on how to 
> > make this better:
> > hw/arm/aspeed.c:
> > I am not sure where the best place is to instantiate the spi_gpio 
> > besides the
> > aspeed_machine_init.
> 
>  The SPI GPIO device would be a platform device and not a SoC device.
>  Hence, it should be instantiated at the machine level, like the I2C
>  device are, using properties to let the model know about the GPIOs
>  that should be driven to implement the SPI protocol.
> >>>
> >>> Agreed, should be like an I2C device.
> >>>
> 
>  Ideally, the state of the GPIO controller pins and the SPI GPIO state
>  should be shared. I think that's what you are trying to do that with
>  attribute 'controller_state' in your patch ? But, the way it is done
>  today, is too tightly coupled (names) with the Aspeed GPIO model to
>  be generic.
> 
>  I think we need an intermediate QOM interface, or a base class, to
>  implement an abstract SPI GPIO model and an Aspeed SPI GPIO model
>  on top which would be linked to the Aspeed GPIO model of the SoC
>  in use.
> >>>
> >>> Disagree, I feel like we can accomplish this without inheritance.
> >>>
> 
>  Or we could introduce some kind of generic GPIO controller that
>  we would link the SPI GPIO model with (using a property). The
>  Aspeed GPIO model would be of that kind and the SPI GPIO model
>  would be able to drive the pins using a common interface.
>  That's another way to do it.
> >>>
> >>> Agree, I would like to go in this direction if at all possible.
> >> Let's give it a try then. I would introduce a new QOM interface,
> >> something like  :
> >>
> >>  #define TYPE_GPIO_INTERFACE "gpio-interface"
> >>  #define GPIO_INTERFACE(obj) \
> >>  INTERFACE_CHECK(GpioInterface, (obj), TYPE_GPIO_INTERFACE)
> >>  typedef struct GpioInterfaceClass GpioInterfaceClass;
> >>  DECLARE_CLASS_CHECKERS(GpioInterfaceClass, GPIO_INTERFACE,
> >> TYPE_GPIO_INTERFACE)
> >>  struct GpioInterfaceClass {
> >>  InterfaceClass parent;
> >>  int (*get)(GpioInterface *gi, ...);
> >>  int (*set)(GpioInterface *gi, ...);
> >>  ...
> >>  };
> >>
> >> and implement the interface handlers under the AspeedGPIO model.
> >> The SPI GPIO model would have a link to such an interface to drive
> >> the GPIO pins.
> >>
> >> See IPMI and XIVE for some more complete models.
> >
> > This sounds good, but I just want to clarify first:
> >
> > Is it necessary to introduce a GPIO interface?
>
> Well, my feeling is that we need an abstract layer to

[PATCH 1/3] parallels: Add checking and repairing duplicate offsets in BAT

2022-08-04 Thread alexander . ivanov

From: Alexander Ivanov 

There could be corruptions in the image file:
two quest memory areas refer to the same host cluster.

If a duplicate offset is found fix it by copying the content
of the referred cluster to a new allocated cluster and
replace one of the two referring entries by the new cluster offset.

Signed-off-by: Natalia Kuzmina 
Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 93 +--
 1 file changed, 90 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..6a82942f38 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -64,6 +64,11 @@ static QEnumLookup prealloc_mode_lookup = {
 #define PARALLELS_OPT_PREALLOC_MODE "prealloc-mode"
 #define PARALLELS_OPT_PREALLOC_SIZE "prealloc-size"
 
+#define REVERSED_BAT_UNTOUCHED  0x
+
+#define HOST_CLUSTER_INDEX(s, off) \
+((off - ((s->header->data_off) << BDRV_SECTOR_BITS)) / (s->cluster_size))
+
 static QemuOptsList parallels_runtime_opts = {
 .name = "parallels",
 .head = QTAILQ_HEAD_INITIALIZER(parallels_runtime_opts.head),
@@ -419,9 +424,11 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
BdrvCheckMode fix)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t size, prev_off, high_off;
-int ret;
-uint32_t i;
+QEMUIOVector qiov;
+int64_t size, prev_off, high_off, sector_num;
+int ret, n;
+uint32_t i, idx_host, *reversed_bat;
+int64_t *cluster_buf;
 bool flush_bat = false;
 
 size = bdrv_getlength(bs->file->bs);
@@ -443,8 +450,31 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 }
 
 res->bfi.total_clusters = s->bat_size;
+res->bfi.allocated_clusters = 0;
 res->bfi.compressed_clusters = 0; /* compression is not supported */
 
+cluster_buf = g_malloc(s->cluster_size);
+qemu_iovec_init(, 0);
+qemu_iovec_add(, cluster_buf, s->cluster_size);
+
+/*
+ * Make a reversed BAT. The table has the same size as BAT.
+ * Initially the table is filled with REVERSED_BAT_UNTOUCHED values.
+ * A position in the table is defined by a host index
+ * (a number of a cluster in the data area):
+ * index = (cluster_offset - data_area_offset) / cluster_size
+ * In the main loop fill the table with guest indexes
+ * (a number of entry in BAT).
+ * Before this, check if the relevant entry in the reversed table
+ * is REVERSED_BAT_UNTOUCHED. If that's not true, a guest index was
+ * written to the reversed table on a previous step.
+ * It means there is a duplicate offset.
+ */
+reversed_bat = g_malloc(s->bat_size * sizeof(uint32_t));
+for (i = 0; i < s->bat_size; i++) {
+reversed_bat[i] = REVERSED_BAT_UNTOUCHED;
+}
+
 high_off = 0;
 prev_off = 0;
 for (i = 0; i < s->bat_size; i++) {
@@ -468,6 +498,59 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 }
 }
 
+/* Checking bat entry uniqueness. */
+idx_host = HOST_CLUSTER_INDEX(s, off);
+if (reversed_bat[idx_host] != REVERSED_BAT_UNTOUCHED) {
+/* duplicated offset in BAT */
+fprintf(stderr,
+"%s BAT offset in entry %u duplicates offset in entry 
%u\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
+i, reversed_bat[idx_host]);
+res->corruptions++;
+
+if (fix & BDRV_FIX_ERRORS) {
+/* copy data to a new cluster */
+sector_num = bat2sect(s, reversed_bat[idx_host]);
+
+ret = bdrv_pread(bs->file, sector_num << BDRV_SECTOR_BITS,
+ s->cluster_size, cluster_buf, 0);
+if (ret < 0) {
+res->check_errors++;
+goto out;
+}
+
+s->bat_bitmap[i] = 0;
+
+sector_num = (i * s->cluster_size) >> BDRV_SECTOR_BITS;
+off = allocate_clusters(bs, sector_num, s->tracks, );
+if (off < 0) {
+res->check_errors++;
+goto out;
+}
+off <<= BDRV_SECTOR_BITS;
+
+/* off is new and we should repair idx_host accordingly. */
+idx_host = HOST_CLUSTER_INDEX(s, off);
+
+ret = bdrv_co_pwritev(bs->file, off, s->cluster_size, , 
0);
+if (ret < 0) {
+res->check_errors++;
+goto out;
+}
+
+size = bdrv_getlength(bs->file->bs);
+if (size < 0) {
+res->check_errors++;
+ret = size;
+goto out;
+}
+
+res->corruptions_fixed++;
+flush_bat = true;
+}
+}
+reversed_bat[idx_host] = i;
+

[PATCH 2/3] parallels: Let duplicates repairing pass without unwanted messages

2022-08-04 Thread alexander . ivanov

From: Alexander Ivanov 

When duplicates are repaired a new space area is allocated
and further leak check considers it as a leak.
Let fix it without printing any messages.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 6a82942f38..1f56ce26e4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -429,7 +429,7 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
 int ret, n;
 uint32_t i, idx_host, *reversed_bat;
 int64_t *cluster_buf;
-bool flush_bat = false;
+bool flush_bat = false, truncate_silently = false;
 
 size = bdrv_getlength(bs->file->bs);
 if (size < 0) {
@@ -547,6 +547,7 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
 
 res->corruptions_fixed++;
 flush_bat = true;
+truncate_silently = true;
 }
 }
 reversed_bat[idx_host] = i;
@@ -576,10 +577,13 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 if (size > res->image_end_offset) {
 int64_t count;
 count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
-fprintf(stderr, "%s space leaked at the end of the image %" PRId64 
"\n",
-fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
-size - res->image_end_offset);
-res->leaks += count;
+if (!truncate_silently) {
+fprintf(stderr,
+"%s space leaked at the end of the image %" PRId64 "\n",
+fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
+size - res->image_end_offset);
+res->leaks += count;
+}
 if (fix & BDRV_FIX_LEAKS) {
 Error *local_err = NULL;
 
@@ -594,7 +598,10 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 res->check_errors++;
 goto out;
 }
-res->leaks_fixed += count;
+
+if (!truncate_silently) {
+res->leaks_fixed += count;
+}
 }
 }
 
-- 
2.34.1

[PATCH 0/3] Check and repair duplicated clusters in parallels images

2022-08-04 Thread alexander . ivanov

From: Alexander Ivanov 

Parallels image file can be corrupted this way: two guest memory areas
refer to the same host memory area (duplicated offsets in BAT).
qemu-img check copies data from duplicated cluster to the new cluster and
writes new corresponding offset to BAT instead of duplicated one.

Test 314 uses sample corrupted image parallels-2-duplicated-cluster.bz2.
Reading from duplicated offset and from original offset returns the same
data. After repairing changing either of these blocks of data
does not affect another one.

Alexander Ivanov (3):
  parallels: Add checking and repairing duplicate offsets in BAT
  parallels: Let duplicates repairing pass without unwanted messages
  iotests, parallels: Add a test for duplicated clusters

 block/parallels.c | 112 --
 tests/qemu-iotests/314|  88 ++
 tests/qemu-iotests/314.out|  36 ++
 .../parallels-2-duplicated-cluster.bz2| Bin 0 -> 148 bytes
 4 files changed, 227 insertions(+), 9 deletions(-)
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out
 create mode 100644 
tests/qemu-iotests/sample_images/parallels-2-duplicated-cluster.bz2

-- 
2.34.1

Re: [PATCH v2 09/15] util: move 256-by-128 division helpers to int128

2022-08-04 Thread Lucas Mateus Martins Araujo e Castro



On 12/07/2022 06:35, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau

Break a cyclic dependency between int128 and host-utils.

Reviewed-by: Lucas Mateus Castro 


Signed-off-by: Marc-André Lureau
---
  include/qemu/host-utils.h |   3 -
  include/qemu/int128.h |   3 +
  util/host-utils.c | 180 --
  util/int128.c | 180 ++
  4 files changed, 183 insertions(+), 183 deletions(-)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 29f3a9987880..fa228a4a86e2 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -32,7 +32,6 @@

  #include "qemu/compiler.h"
  #include "qemu/bswap.h"
-#include "qemu/int128.h"

  #ifdef CONFIG_INT128
  static inline void mulu64(uint64_t *plow, uint64_t *phigh,
@@ -785,6 +784,4 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
  #endif
  }

-Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor);
-Int128 divs256(Int128 *plow, Int128 *phigh, Int128 divisor);
  #endif
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index d2b76ca6acdc..823c61edb0fd 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -472,4 +472,7 @@ static inline void bswap128s(Int128 *s)
  #define INT128_MAX int128_make128(UINT64_MAX, INT64_MAX)
  #define INT128_MIN int128_make128(0, INT64_MIN)

+Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor);
+Int128 divs256(Int128 *plow, Int128 *phigh, Int128 divisor);
+
  #endif /* INT128_H */
diff --git a/util/host-utils.c b/util/host-utils.c
index fb91bcba823d..96d5dc0bed25 100644
--- a/util/host-utils.c
+++ b/util/host-utils.c
@@ -266,183 +266,3 @@ void ulshift(uint64_t *plow, uint64_t *phigh, int32_t 
shift, bool *overflow)
  *plow = *plow << shift;
  }
  }
-
-/*
- * Unsigned 256-by-128 division.
- * Returns the remainder via r.
- * Returns lower 128 bit of quotient.
- * Needs a normalized divisor (most significant bit set to 1).
- *
- * Adapted from include/qemu/host-utils.h udiv_qrnnd,
- * from the GNU Multi Precision Library - longlong.h __udiv_qrnnd
- * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
- *
- * Licensed under the GPLv2/LGPLv3
- */
-static Int128 udiv256_qrnnd(Int128 *r, Int128 n1, Int128 n0, Int128 d)
-{
-Int128 d0, d1, q0, q1, r1, r0, m;
-uint64_t mp0, mp1;
-
-d0 = int128_make64(int128_getlo(d));
-d1 = int128_make64(int128_gethi(d));
-
-r1 = int128_remu(n1, d1);
-q1 = int128_divu(n1, d1);
-mp0 = int128_getlo(q1);
-mp1 = int128_gethi(q1);
-mulu128(, , int128_getlo(d0));
-m = int128_make128(mp0, mp1);
-r1 = int128_make128(int128_gethi(n0), int128_getlo(r1));
-if (int128_ult(r1, m)) {
-q1 = int128_sub(q1, int128_one());
-r1 = int128_add(r1, d);
-if (int128_uge(r1, d)) {
-if (int128_ult(r1, m)) {
-q1 = int128_sub(q1, int128_one());
-r1 = int128_add(r1, d);
-}
-}
-}
-r1 = int128_sub(r1, m);
-
-r0 = int128_remu(r1, d1);
-q0 = int128_divu(r1, d1);
-mp0 = int128_getlo(q0);
-mp1 = int128_gethi(q0);
-mulu128(, , int128_getlo(d0));
-m = int128_make128(mp0, mp1);
-r0 = int128_make128(int128_getlo(n0), int128_getlo(r0));
-if (int128_ult(r0, m)) {
-q0 = int128_sub(q0, int128_one());
-r0 = int128_add(r0, d);
-if (int128_uge(r0, d)) {
-if (int128_ult(r0, m)) {
-q0 = int128_sub(q0, int128_one());
-r0 = int128_add(r0, d);
-}
-}
-}
-r0 = int128_sub(r0, m);
-
-*r = r0;
-return int128_or(int128_lshift(q1, 64), q0);
-}
-
-/*
- * Unsigned 256-by-128 division.
- * Returns the remainder.
- * Returns quotient via plow and phigh.
- * Also returns the remainder via the function return value.
- */
-Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor)
-{
-Int128 dhi = *phigh;
-Int128 dlo = *plow;
-Int128 rem, dhighest;
-int sh;
-
-if (!int128_nz(divisor) || !int128_nz(dhi)) {
-*plow  = int128_divu(dlo, divisor);
-*phigh = int128_zero();
-return int128_remu(dlo, divisor);
-} else {
-sh = clz128(divisor);
-
-if (int128_ult(dhi, divisor)) {
-if (sh != 0) {
-/* normalize the divisor, shifting the dividend accordingly */
-divisor = int128_lshift(divisor, sh);
-dhi = int128_or(int128_lshift(dhi, sh),
-int128_urshift(dlo, (128 - sh)));
-dlo = int128_lshift(dlo, sh);
-}
-
-*phigh = int128_zero();
-*plow = udiv256_qrnnd(, dhi, dlo, divisor);
-} else {
-if (sh != 0) {
-/* normalize the divisor, shifting the dividend accordingly */
-divisor = int128_lshift(divisor, sh);
-dhighest = int128_rshift(dhi, (128 - sh));
-

Re: [PATCH v2 1/1] osdep: asynchronous teardown for shutdown on Linux

2022-08-04 Thread Daniel P . Berrangé

On Thu, Aug 04, 2022 at 09:20:59AM +0100, Daniel P. Berrangé wrote:
> On Thu, Aug 04, 2022 at 07:56:49AM +0200, Claudio Imbrenda wrote:
> > On Wed, 3 Aug 2022 18:34:45 +0100
> > Daniel P. Berrangé  wrote:
> > 
> > > On Wed, Aug 03, 2022 at 07:31:41PM +0200, Claudio Imbrenda wrote:
> > > > This patch adds support for asynchronously tearing down a VM on Linux.
> > > > 
> > > > When qemu terminates, either naturally or because of a fatal signal,
> > > > the VM is torn down. If the VM is huge, it can take a considerable
> > > > amount of time for it to be cleaned up. In case of a protected VM, it
> > > > might take even longer than a non-protected VM (this is the case on
> > > > s390x, for example).
> > > > 
> > > > Some users might want to shut down a VM and restart it immediately,
> > > > without having to wait. This is especially true if management
> > > > infrastructure like libvirt is used.
> > > > 
> > > > This patch implements a simple trick on Linux to allow qemu to return
> > > > immediately, with the teardown of the VM being performed
> > > > asynchronously.
> > > > 
> > > > If the new commandline option -async-teardown is used, a new process is
> > > > spawned from qemu at startup, using the clone syscall, in such way that
> > > > it will share its address space with qemu.
> > > > 
> > > > The new process will then simpy wait until qemu terminates, and then it
> > > > will exit itself.
> > > > 
> > > > This allows qemu to terminate quickly, without having to wait for the
> > > > whole address space to be torn down. The teardown process will exit
> > > > after qemu, so it will be the last user of the address space, and
> > > > therefore it will take care of the actual teardown.
> > > > 
> > > > The teardown process will share the same cgroups as qemu, so both
> > > > memory usage and cpu time will be accounted properly.
> > > > 
> > > > This feature can already be used with libvirt by adding the following
> > > > to the XML domain definition:
> > > > 
> > > >   http://libvirt.org/schemas/domain/qemu/1.0;>
> > > >   
> > > > 
> > > 
> > > How does this work in practice ?  Libvirt should be blocking until
> > 
> > I don't know the inner details of how libvirt works..
> > 
> > > all processes in the cgroup have exited, including this cloned
> > > child process.
> > 
> > ..but I tested it and it works
> > 
> > my impression is that libvirt by default is only waiting for the
> > main qemu process.
> 
> If true, that would be a bug that needs fixing and should not be
> relied on.

Libvirt is invoking 'TerminateMachine' DBus call on systemd-machined.
That in turn iterates over every process in the cgroup and kills
them off.

Docs are a little vague and I've not followed the code perfectly, but
that should mean TeminateMachine doesnt return until every process in
the cgroup has exited.

That said, since this is a dbus API call, libvirt will probably
timeout waiting for the DBus reply after something like 30-60
seconds IIRC.

> 
> > the only issue I have found is the log file, which stays open as long
> > as some file descriptors (which the cloned process inherits from the
> > main qemu process) stay open. A new VM cannot be started if its log file
> > is still open by the logger process. The close_range() call solves the
> > issue.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-04 Thread Alex Bennée

Daniel P. Berrangé  writes:

> On Thu, Jul 21, 2022 at 06:36:21PM +0200, Paolo Bonzini wrote:
>> From: "Jason A. Donenfeld" 
>> 
>> Tiny machines optimized for fast boot time generally don't use EFI,
>> which means a random seed has to be supplied some other way. For this
>> purpose, Linux (â‰¥5.20) supports passing a seed in the setup_data table
>> with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
>> specialized bootloaders. The linked commit shows the upstream kernel
>> implementation.
>> 
>> At Paolo's request, we don't pass these to versioned machine types â‰¤7.0.
>
>
> This change has also broken direct kernel measured boot with AMD SEV
> confidential virtualization.

FWIW this is why we had to introduce the dtb-randomness control knob for
ARM -M virt machines. Although we have deprecated the old dtb-kaslr-seed
knob and it has always enabled by default because the measured boot was
sufficiently new the few people working with it could just add it to
their command lines.

-- 
Alex Bennée

Re: [PATCH v2 1/1] target/ppc: fix unreachable code in do_ldst_quad()

2022-08-04 Thread Matheus K. Ferst


On 25/07/2022 17:21, Daniel Henrique Barboza wrote:

Coverity reports that commit fc34e81acd51 ("target/ppc: add macros to
check privilege level") turned the following code unreachable:

if (!prefixed && !(ctx->insns_flags2 & PPC2_LSQ_ISA207)) {
 /* lq and stq were privileged prior to V. 2.07 */
 REQUIRE_SV(ctx);


 CID 1490757:  Control flow issues  (UNREACHABLE)
 This code cannot be reached: "if (ctx->le_mode) {

 if (ctx->le_mode) {
 gen_align_no_le(ctx);
 return true;
 }
}

This happens because the macro REQUIRE_SV(), in CONFIG_USER_MODE, will
always result in a 'return true' statement. In fact, all REQUIRE_*
macros for target/ppc/translate.c behave the same way: if a condition
isn't met, an exception is generated and a 'return' statement is issued.

The difference is that all other callers are using it in insns that are
not implemented in user mode. do_ldst_quad(), on the other hand, is user
mode compatible.

Fixes include wrapping these lines in "if !defined(CONFIG_USER_MODE)",
making it explicit that these lines are not user mode anymore. Another
fix would be, for example, to change REQUIRE_SV() to not issue a
'return' and check if we're running in privileged mode or not by hand,
but this would change all other callers of the macro that are using it
in an adequate manner.

The code that was in place before fc34e81acd51 was good enough, so let's
go back to that: open code the ctx->pr condition and fire the exception
if we're not privileged. The difference from the code back then to what
we're doing now is an 'unlikely' compiler hint to ctx->pr and the use of
gen_priv_opc() instead of gen_priv_exception().

Fixes: Coverity CID 1490757
Cc: Matheus Ferst 
Signed-off-by: Daniel Henrique Barboza 
---
  target/ppc/translate/fixedpoint-impl.c.inc | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate/fixedpoint-impl.c.inc 
b/target/ppc/translate/fixedpoint-impl.c.inc
index db14d3bebc..a3ade4fe2b 100644
--- a/target/ppc/translate/fixedpoint-impl.c.inc
+++ b/target/ppc/translate/fixedpoint-impl.c.inc
@@ -79,8 +79,11 @@ static bool do_ldst_quad(DisasContext *ctx, arg_D *a, bool 
store, bool prefixed)
  REQUIRE_INSNS_FLAGS(ctx, 64BX);

  if (!prefixed && !(ctx->insns_flags2 & PPC2_LSQ_ISA207)) {
-/* lq and stq were privileged prior to V. 2.07 */
-REQUIRE_SV(ctx);
+if (unlikely(ctx->pr)) {
+/* lq and stq were privileged prior to V. 2.07 */
+gen_priv_opc(ctx);
+return true;
+}

  if (ctx->le_mode) {
  gen_align_no_le(ctx);
--
2.36.1



Since the remaining code in this branch is dead code in user-mode, I'd 
personally prefer the v1 approach, but the difference is unlikely to 
have any meaningful impact, so either way is good.


Reviewed-by: Matheus Ferst 

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO 
Analista de Software
Aviso Legal - Disclaimer

Re: [PATCH v2 1/1] osdep: asynchronous teardown for shutdown on Linux

2022-08-04 Thread Daniel P . Berrangé

On Wed, Aug 03, 2022 at 07:31:41PM +0200, Claudio Imbrenda wrote:
> This patch adds support for asynchronously tearing down a VM on Linux.
> 
> When qemu terminates, either naturally or because of a fatal signal,
> the VM is torn down. If the VM is huge, it can take a considerable
> amount of time for it to be cleaned up. In case of a protected VM, it
> might take even longer than a non-protected VM (this is the case on
> s390x, for example).
> 
> Some users might want to shut down a VM and restart it immediately,
> without having to wait. This is especially true if management
> infrastructure like libvirt is used.
> 
> This patch implements a simple trick on Linux to allow qemu to return
> immediately, with the teardown of the VM being performed
> asynchronously.
> 
> If the new commandline option -async-teardown is used, a new process is
> spawned from qemu at startup, using the clone syscall, in such way that
> it will share its address space with qemu.
> 
> The new process will then simpy wait until qemu terminates, and then it
> will exit itself.
> 
> This allows qemu to terminate quickly, without having to wait for the
> whole address space to be torn down. The teardown process will exit
> after qemu, so it will be the last user of the address space, and
> therefore it will take care of the actual teardown.
> 
> The teardown process will share the same cgroups as qemu, so both
> memory usage and cpu time will be accounted properly.
> 
> This feature can already be used with libvirt by adding the following
> to the XML domain definition:
> 
>   http://libvirt.org/schemas/domain/qemu/1.0;>
>   
>   
> 
> Signed-off-by: Claudio Imbrenda 
> ---
>  include/qemu/osdep.h |  2 ++
>  os-posix.c   |  5 
>  qemu-options.hx  | 17 ++
>  util/osdep.c | 55 
>  4 files changed, 79 insertions(+)


> diff --git a/util/osdep.c b/util/osdep.c
> index 60fcbbaebe..bb0baf97a0 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -23,6 +23,15 @@
>   */
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +
> +#ifdef CONFIG_LINUX
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#endif
> +
>  #include "qemu/cutils.h"
>  #include "qemu/sockets.h"
>  #include "qemu/error-report.h"
> @@ -512,6 +521,52 @@ const char *qemu_hw_version(void)
>  return hw_version;
>  }
>  
> +#ifdef __linux__
> +static int async_teardown_fn(void *arg)
> +{
> +sigset_t all_signals;
> +fd_set r, w, e;
> +int fd;
> +
> +/* open a pidfd descriptor for the parent qemu process */
> +fd = syscall(__NR_pidfd_open, getppid(), 0);

We ought to open this FD in the parent process to avoid a race
where the parent crashes immediately after clone() and gets
reparented to 'init' before this child process calls pidfd_open,
otherwise it'll sit around waiting for init to exit.

> +/* if something went wrong, or if the file descriptor is too big */
> +if ((fd < 0) || (fd >= FD_SETSIZE)) {
> +_exit(1);
> +}
> +/* zero all fd sets */
> +FD_ZERO();
> +FD_ZERO();
> +FD_ZERO();
> +/* set the fd for the pidfd in the "read" set */
> +FD_SET(fd, );
> +/* block all signals */
> +sigfillset(_signals);
> +sigprocmask(SIG_BLOCK, _signals, NULL);

Technnically this is racy as there's still a window in which the
child is running when signals are not blocked.

> +/* wait for the pid to disappear -> fd will appear as ready for read */
> +(void) select(fd + 1, , , , NULL);

While using pidfd can work, a stronger protection would be to use

   prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);

this guarantees that the kernel will deliver SIGKILL to this
process immediately when the parent QEMU exits.

We should probably do both in fact.

> +
> +/*
> + * Close all file descriptors that might have been inherited from the
> + * main qemu process when doing clone. This is needed to make libvirt
> + * happy.
> + */
> +close_range(0, ~0U, 0);

Shouldn't we close all the FDs immediately when this process is
created, rather than only at the end when we're exiting. I don't
see there's any need to keep them open. Doing it immediately
would be better when using prctl(PR_SET_PDEATHSIG)

> +_exit(0);
> +}
> +
> +void init_async_teardown(void)
> +{
> +const int size = 8192; /* should be more than enough */
> +char *stack = malloc(size);
> +

You need to block all signals here.

> +/* start a new process sharing the address space with qemu */
> +clone(async_teardown_fn, stack + size, CLONE_VM, NULL, NULL, NULL, NULL);

And unblock signals again here.

That way the "everything blocked"  mask is inherited by the child
from the very first moment of its existance.

> +}
> +#else /* __linux__ */
> +void init_async_teardown(void) {}
> +#endif
> +
>  #ifdef _WIN32
>  static void socket_cleanup(void)
>  {
> -- 
> 2.37.1
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-

1 2 3 >

1 - 100 of 257 matches

Mail list logo