Re: [PATCH v5 3/3] hw/nvme: add nvme management interface model

2023-09-13 Thread Klaus Jensen
On Sep 12 13:50, Andrew Jeffery wrote:
> Hi Klaus,
> 
> On Tue, 2023-09-05 at 10:38 +0200, Klaus Jensen wrote:
> > > 
> > > +static void nmi_handle_mi_config_get(NMIDevice *nmi, NMIRequest *request)
> > > +{
> > > +    uint32_t dw0 = le32_to_cpu(request->dw0);
> > > +    uint8_t identifier = FIELD_EX32(dw0, NMI_CMD_CONFIGURATION_GET_DW0,
> > > +                                    IDENTIFIER);
> > > +    const uint8_t *buf;
> > > +
> > > +    static const uint8_t smbus_freq[4] = {
> > > +    0x00,   /* success */
> > > +    0x01, 0x00, 0x00,   /* 100 kHz */
> > > +    };
> > > +
> > > +    static const uint8_t mtu[4] = {
> > > +    0x00,   /* success */
> > > +    0x40, 0x00, /* 64 */
> > > +    0x00,   /* reserved */
> > > +    };
> > > +
> > > +    trace_nmi_handle_mi_config_get(identifier);
> > > +
> > > +    switch (identifier) {
> > > +    case NMI_CMD_CONFIGURATION_GET_SMBUS_FREQ:
> > > +    buf = smbus_freq;
> > > +    break;
> > > +
> > > +    case NMI_CMD_CONFIGURATION_GET_MCTP_TRANSMISSION_UNIT:
> > > +    buf = mtu;
> > > +    break;
> > > +
> > > +    default:
> > > +    nmi_set_parameter_error(nmi, 0x0, offsetof(NMIRequest, dw0));
> > > +    return;
> > > +    }
> > > +
> > > +    nmi_scratch_append(nmi, buf, sizeof(buf));
> > > +}
> 
> When I tried to build this patch I got:
> 
> ```
> In file included from /usr/include/string.h:535,
>  from /home/andrew/src/qemu.org/qemu/include/qemu/osdep.h:112,
>  from ../hw/nvme/nmi-i2c.c:12:
> In function ‘memcpy’,
> inlined from ‘nmi_scratch_append’ at ../hw/nvme/nmi-i2c.c:80:5,
> inlined from ‘nmi_handle_mi_config_get’ at ../hw/nvme/nmi-i2c.c:246:5,
> inlined from ‘nmi_handle_mi’ at ../hw/nvme/nmi-i2c.c:266:9,
> inlined from ‘nmi_handle’ at ../hw/nvme/nmi-i2c.c:313:9:
> /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: error: 
> ‘__builtin_memcpy’ forming offset [4, 7] is out of the bounds [0, 4] 
> [-Werror=array-bounds=]
>29 |   return __builtin___memcpy_chk (__dest, __src, __len,
>   |  ^
>30 |  __glibc_objsize0 (__dest));
>   |  ~~
> ```
> 
> It wasn't clear initially from the error that the source of the problem
> was the size associated with the source buffer, especially as there is
> some pointer arithmetic being done to derive `__dest`.
> 
> Anyway, what we're trying to express is that the size to copy from buf
> is the size of the array pointed to by buf. However, buf is declared as
> a pointer to uint8_t, which loses the length information. To fix that I
> think we need:
> 
> - const uint8_t *buf;
> + const uint8_t (*buf)[4];
> 
> and then:
> 
> - nmi_scratch_append(nmi, buf, sizeof(buf));
> + nmi_scratch_append(nmi, buf, sizeof(*buf));
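
For reference, a minimal standalone sketch (plain C, independent of the QEMU
sources) of the difference the suggestion relies on: sizeof on a plain pointer
yields the pointer size, while sizeof on the pointee of a pointer-to-array keeps
the element count that the fortified memcpy check needs.

```
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    static const uint8_t smbus_freq[4] = { 0x00, 0x01, 0x00, 0x00 };

    const uint8_t *p = smbus_freq;         /* array decays to a pointer   */
    const uint8_t (*q)[4] = &smbus_freq;   /* pointer to the whole array  */

    printf("sizeof(p)  = %zu\n", sizeof(p));   /* pointer size, e.g. 8    */
    printf("sizeof(*q) = %zu\n", sizeof(*q));  /* 4, the real array length */
    return 0;
}
```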
> 
> Andrew
> 

Hi Andrew,

Nice (and important) catch! Just curious, are you massaging QEMU's build
system into adding additional checks or how did your compiler catch
this?

Thanks,
Klaus


signature.asc
Description: PGP signature


Re: [PATCH v3 1/5] block: remove AIOCBInfo->get_aio_context()

2023-09-13 Thread Klaus Jensen
On Sep 12 19:10, Stefan Hajnoczi wrote:
> The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
> it can call aio_poll() to wait for cancellation.
> 
> It turns out that all users run under the BQL in the main AioContext, so
> this callback is not needed.
> 
> Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
> its blk_aio_cancel() caller, and poll the main loop AioContext.
> 
> The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
> that does not work with the multi-queue block layer.
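
For readers skimming the diffstat below, a rough sketch of what the synchronous
cancel path looks like after this change (paraphrased from memory of block/io.c,
not copied from the patch; details such as the refcnt loop may differ slightly):

```
void bdrv_aio_cancel(BlockAIOCB *acb)
{
    GLOBAL_STATE_CODE();          /* BQL/main-loop only, like blk_aio_cancel() */

    qemu_aio_ref(acb);
    bdrv_aio_cancel_async(acb);   /* ask the request to cancel itself */

    /* Previously this polled acb->aiocb_info->get_aio_context(acb);
     * now it always polls the main loop AioContext. */
    while (acb->refcnt > 1) {
        aio_poll(qemu_get_aio_context(), true);
    }

    qemu_aio_unref(acb);
}
```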
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/aio.h|  1 -
>  include/block/block-global-state.h |  2 ++
>  include/block/block-io.h   |  1 -
>  block/block-backend.c  | 17 -
>  block/io.c | 23 ---
>  hw/nvme/ctrl.c |  7 ---
>  softmmu/dma-helpers.c  |  8 
>  util/thread-pool.c |  8 
>  8 files changed, 10 insertions(+), 57 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 539d273553..ee7273daa1 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -2130,11 +2130,6 @@ static inline bool nvme_is_write(NvmeRequest *req)
> rw->opcode == NVME_CMD_WRITE_ZEROES;
>  }
>  
> -static AioContext *nvme_get_aio_context(BlockAIOCB *acb)
> -{
> -return qemu_get_aio_context();
> -}
> -
>  static void nvme_misc_cb(void *opaque, int ret)
>  {
>  NvmeRequest *req = opaque;
> @@ -3302,7 +3297,6 @@ static void nvme_flush_cancel(BlockAIOCB *acb)
>  static const AIOCBInfo nvme_flush_aiocb_info = {
>  .aiocb_size = sizeof(NvmeFlushAIOCB),
>  .cancel_async = nvme_flush_cancel,
> -.get_aio_context = nvme_get_aio_context,
>  };
>  
>  static void nvme_do_flush(NvmeFlushAIOCB *iocb);
> @@ -6478,7 +6472,6 @@ static void nvme_format_cancel(BlockAIOCB *aiocb)
>  static const AIOCBInfo nvme_format_aiocb_info = {
>  .aiocb_size = sizeof(NvmeFormatAIOCB),
>  .cancel_async = nvme_format_cancel,
> -.get_aio_context = nvme_get_aio_context,
>  };
>  
>  static void nvme_format_set(NvmeNamespace *ns, uint8_t lbaf, uint8_t mset,

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature


Re: [PATCH 1/3] esp: use correct type for esp_dma_enable() in sysbus_esp_gpio_demux()

2023-09-13 Thread Philippe Mathieu-Daudé

On 13/9/23 22:44, Mark Cave-Ayland wrote:

The call to esp_dma_enable() was being made with the SYSBUS_ESP type instead of
the ESP type. This meant that when GPIO 1 was being used to trigger a DMA
request from an external DMA controller, the setting of ESPState's dma_enabled
field would clobber unknown memory, whilst the dma_cb callback pointer would
typically be NULL so the DMA request would never start.



Cc: qemu-sta...@nongnu.org
Fixes: a391fdbc7f ("esp: split esp code into generic chip emulation and 
sysbus layer")

Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Mark Cave-Ayland 
---
  hw/scsi/esp.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index e52188d022..4218a6a960 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -1395,7 +1395,7 @@ static void sysbus_esp_gpio_demux(void *opaque, int irq, 
int level)
  parent_esp_reset(s, irq, level);
  break;
  case 1:
-esp_dma_enable(opaque, irq, level);
+esp_dma_enable(s, irq, level);
  break;
  }
  }
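
To spell out the bug class for readers who don't have the ESP code in front of
them, here is a simplified, self-contained sketch (struct layouts invented for
illustration, not the real QEMU definitions): passing the container object where
the embedded ESPState is expected makes the dma_enabled store land at the wrong
offset, while the real fields stay zero/NULL.

```
#include <stdio.h>

typedef struct ESPState {
    int dma_enabled;
    void (*dma_cb)(struct ESPState *s);
} ESPState;

typedef struct SysBusESPState {
    long parent_obj[8];   /* stand-in for the SysBusDevice parent */
    ESPState esp;         /* the embedded ESP chip state */
} SysBusESPState;

int main(void)
{
    SysBusESPState sysbus = {0};

    /* Buggy: treat the container as if it were the ESPState itself.
     * The write lands at offset 0, inside parent_obj, while the real
     * esp.dma_enabled and esp.dma_cb remain 0/NULL. */
    ((ESPState *)&sysbus)->dma_enabled = 1;
    printf("esp.dma_enabled after buggy write: %d\n", sysbus.esp.dma_enabled);

    /* Fixed (what the patch does via 's'): operate on the embedded state. */
    sysbus.esp.dma_enabled = 1;
    printf("esp.dma_enabled after fixed write: %d\n", sysbus.esp.dma_enabled);
    return 0;
}
```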





Re: [PATCH v2] target/i386: Export GDS_NO bit to guests

2023-09-13 Thread Jinpu Wang
Hi Paolo,

Ping!

Thx!

On Tue, Aug 15, 2023 at 7:44 AM Xiaoyao Li  wrote:
>
> On 8/15/2023 12:54 PM, Pawan Gupta wrote:
> > Gather Data Sampling (GDS) is a side-channel attack using Gather
> > instructions. Some Intel processors will set ARCH_CAP_GDS_NO bit in
> > MSR IA32_ARCH_CAPABILITIES to report that they are not vulnerable to
> > GDS.
> >
> > Make this bit available to guests.
> >
> > Closes: 
> > https://lore.kernel.org/qemu-devel/camgffemg6tnq0n3+4ojagxc8j0oevy60khzekxcbs3lok9v...@mail.gmail.com/
> > Reported-by: Jack Wang 
> > Signed-off-by: Pawan Gupta 
> > Tested-by: Jack Wang 
> > Tested-by: Daniel Sneddon 
>
> Reviewed-by: Xiaoyao Li 
>
> > ---
> > v2: Added commit tags
> >
> > v1: 
> > https://lore.kernel.org/qemu-devel/c373f3f92b542b738f296d44bb6a916a1cded7bd.1691774049.git.pawan.kumar.gu...@linux.intel.com/
> >
> >   target/i386/cpu.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 97ad229d8ba3..48709b77689f 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -1155,7 +1155,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
> >   NULL, "sbdr-ssdp-no", "fbsdp-no", "psdp-no",
> >   NULL, "fb-clear", NULL, NULL,
> >   NULL, NULL, NULL, NULL,
> > -"pbrsb-no", NULL, NULL, NULL,
> > +"pbrsb-no", NULL, "gds-no", NULL,
> >   NULL, NULL, NULL, NULL,
> >   },
> >   .msr = {
>



Re: [PATCH v2 21/24] accel/tcg: Use CPUState in atomicity helpers

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

From: Anton Johansson 

Makes ldst_atomicity.c.inc almost target-independent, with the exception
of TARGET_PAGE_MASK, which will be addressed in a future patch.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-8-a...@rev.ng>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
  accel/tcg/cputlb.c | 20 
  accel/tcg/user-exec.c  | 16 +++
  accel/tcg/ldst_atomicity.c.inc | 88 +-
  3 files changed, 62 insertions(+), 62 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 20/24] accel/tcg: Modify atomic_mmu_lookup() to use CPUState

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

From: Anton Johansson 

The goal is to (in the future) allow for per-target compilation of
functions in atomic_template.h whilst atomic_mmu_lookup() and cputlb.c
are compiled once-per user- or system mode.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-7-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Use cpu->neg.tlb instead of cpu_tlb()]
Signed-off-by: Richard Henderson 
---
  accel/tcg/atomic_template.h | 20 
  accel/tcg/cputlb.c  | 26 +-
  accel/tcg/user-exec.c   |  8 
  3 files changed, 29 insertions(+), 25 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 19/24] accel/tcg: Modifies memory access functions to use CPUState

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

From: Anton Johansson 

do_[ld|st]*() and mmu_lookup*() are changed to use CPUState over
CPUArchState, moving the target-dependence to the target-facing facing
cpu_[ld|st] functions.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-6-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Use cpu->neg.tlb instead of cpu_tlb; cpu_env instead of env_ptr.]
Signed-off-by: Richard Henderson 
---
  accel/tcg/cputlb.c | 324 ++---
  1 file changed, 161 insertions(+), 163 deletions(-)


s/Modifies/Modify/ in subject.

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 18/24] accel/tcg: Modify probe_access_internal() to use CPUState

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

From: Anton Johansson 

probe_access_internal() is changed to instead take the generic CPUState
over CPUArchState, in order to lessen the target-specific coupling of
cputlb.c. Note: probe_access*() also don't need the full CPUArchState,
but aren't touched in this patch as they are target-facing.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-5-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Use cpu->neg.tlb instead of cpu_tlb()]
Signed-off-by: Richard Henderson 
---
  accel/tcg/cputlb.c | 46 +++---
  1 file changed, 23 insertions(+), 23 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 14/24] accel/tcg: Remove cpu_set_cpustate_pointers

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

This function is now empty, so remove it.  In the case of
m68k and tricore, this empties the class instance initfn,
so remove those as well.

Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h  | 10 --
  target/alpha/cpu.c  |  2 --
  target/arm/cpu.c|  1 -
  target/avr/cpu.c|  2 --
  target/cris/cpu.c   |  2 --
  target/hexagon/cpu.c|  3 ---
  target/hppa/cpu.c   |  1 -
  target/i386/cpu.c   |  1 -
  target/loongarch/cpu.c  |  8 +++-
  target/m68k/cpu.c   |  8 
  target/microblaze/cpu.c |  1 -
  target/mips/cpu.c   |  1 -
  target/nios2/cpu.c  |  4 +---
  target/openrisc/cpu.c   |  6 +-
  target/ppc/cpu_init.c   |  1 -
  target/riscv/cpu.c  |  6 +-
  target/rx/cpu.c |  1 -
  target/s390x/cpu.c  |  2 --
  target/sh4/cpu.c|  2 --
  target/sparc/cpu.c  |  2 --
  target/tricore/cpu.c|  9 -
  target/xtensa/cpu.c |  1 -
  22 files changed, 6 insertions(+), 68 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 12/24] tcg: Rename cpu_env to tcg_env

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

Allow the name 'cpu_env' to be used for something else.

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg.h |2 +-
  target/arm/tcg/translate-a32.h|2 +-
  target/arm/tcg/translate-a64.h|4 +-
  target/arm/tcg/translate.h|   16 +-
  target/hexagon/gen_tcg.h  |  120 +-
  target/hexagon/gen_tcg_hvx.h  |   20 +-
  target/hexagon/macros.h   |8 +-
  target/mips/tcg/translate.h   |6 +-
  accel/tcg/translator.c|8 +-
  target/alpha/translate.c  |  142 +-
  target/arm/tcg/translate-a64.c|  374 ++---
  target/arm/tcg/translate-m-nocp.c |   24 +-
  target/arm/tcg/translate-mve.c|   52 +-
  target/arm/tcg/translate-neon.c   |   78 +-
  target/arm/tcg/translate-sme.c|8 +-
  target/arm/tcg/translate-sve.c|  172 +--
  target/arm/tcg/translate-vfp.c|   56 +-
  target/arm/tcg/translate.c|  228 +--
  target/avr/translate.c|   64 +-
  target/cris/translate.c   |   68 +-
  target/hexagon/genptr.c   |   36 +-
  target/hexagon/idef-parser/parser-helpers.c   |2 +-
  target/hexagon/translate.c|   48 +-
  target/hppa/translate.c   |  144 +-
  target/i386/tcg/translate.c   |  580 
  target/loongarch/translate.c  |   18 +-
  target/m68k/translate.c   |  302 ++--
  target/microblaze/translate.c |   50 +-
  target/mips/tcg/lcsr_translate.c  |6 +-
  target/mips/tcg/msa_translate.c   |   34 +-
  target/mips/tcg/mxu_translate.c   |4 +-
  target/mips/tcg/translate.c   | 1284 -
  target/mips/tcg/vr54xx_translate.c|2 +-
  target/nios2/translate.c  |   48 +-
  target/openrisc/translate.c   |   84 +-
  target/ppc/translate.c|  362 ++---
  target/riscv/translate.c  |   50 +-
  target/rx/translate.c |   56 +-
  target/s390x/tcg/translate.c  |  424 +++---
  target/sh4/translate.c|  124 +-
  target/sparc/translate.c  |  328 ++---
  target/tricore/translate.c|  220 +--
  target/xtensa/translate.c |  188 +--
  tcg/tcg-op-gvec.c |  288 ++--
  tcg/tcg-op-ldst.c |   22 +-
  tcg/tcg-op.c  |2 +-
  tcg/tcg.c |4 +-
  target/cris/translate_v10.c.inc   |   28 +-
  target/i386/tcg/decode-new.c.inc  |2 +-
  target/i386/tcg/emit.c.inc|  262 ++--
  .../loongarch/insn_trans/trans_atomic.c.inc   |4 +-
  .../loongarch/insn_trans/trans_branch.c.inc   |2 +-
  target/loongarch/insn_trans/trans_extra.c.inc |   10 +-
  .../loongarch/insn_trans/trans_farith.c.inc   |6 +-
  target/loongarch/insn_trans/trans_fcmp.c.inc  |8 +-
  .../loongarch/insn_trans/trans_fmemory.c.inc  |8 +-
  target/loongarch/insn_trans/trans_fmov.c.inc  |   20 +-
  target/loongarch/insn_trans/trans_lsx.c.inc   |   44 +-
  .../loongarch/insn_trans/trans_memory.c.inc   |8 +-
  .../insn_trans/trans_privileged.c.inc |   52 +-
  target/mips/tcg/micromips_translate.c.inc |   12 +-
  target/mips/tcg/nanomips_translate.c.inc  |  200 +--
  target/ppc/power8-pmu-regs.c.inc  |8 +-
  target/ppc/translate/branch-impl.c.inc|2 +-
  target/ppc/translate/dfp-impl.c.inc   |   22 +-
  target/ppc/translate/fixedpoint-impl.c.inc|2 +-
  target/ppc/translate/fp-impl.c.inc|   50 +-
  .../ppc/translate/processor-ctrl-impl.c.inc   |8 +-
  target/ppc/translate/spe-impl.c.inc   |   30 +-
  target/ppc/translate/storage-ctrl-impl.c.inc  |   26 +-
  target/ppc/translate/vmx-impl.c.inc   |   34 +-
  target/ppc/translate/vsx-impl.c.inc   |   54 +-
  .../riscv/insn_trans/trans_privileged.c.inc   |8 +-
  target/riscv/insn_trans/trans_rvbf16.c.inc|   10 +-
  target/riscv/insn_trans/trans_rvd.c.inc   |   48 +-
  target/riscv/insn_trans/trans_rvf.c.inc   |   46 +-
  target/riscv/insn_trans/trans_rvh.c.inc   |8 +-
  target/riscv/insn_trans/trans_rvi.c.inc   |   16 +-
  target/riscv/insn_trans/trans_rvm.c.inc   |   16 +-
  target/riscv/insn_trans/trans_rvv.c.inc   |  130 +-
  target/riscv/insn_trans/trans_rvvk.c.inc  |   30 +-
  target/riscv/insn_trans/trans_rvzce.c.inc |2 +-
  target/riscv/insn_trans/trans_rvzfa.c.inc |   38 +-

Re: [PATCH v2 11/24] accel/tcg: Remove cpu_neg()

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

Now that CPUNegativeOffsetState is part of CPUState,
we can reference it directly.

Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h   | 11 ---
  include/exec/exec-all.h  |  2 +-
  accel/tcg/cpu-exec.c | 14 +++---
  accel/tcg/tcg-accel-ops-icount.c |  6 +++---
  accel/tcg/tcg-accel-ops.c|  2 +-
  accel/tcg/translate-all.c|  6 +++---
  softmmu/icount.c |  2 +-
  7 files changed, 16 insertions(+), 27 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 10/24] accel/tcg: Move can_do_io to CPUNegativeOffsetState

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

Minimize the displacement to can_do_io, since it may
be touched at the start of each TranslationBlock.
It fits into other padding within the substructure.

Signed-off-by: Richard Henderson 
---
  include/hw/core/cpu.h| 2 +-
  accel/dummy-cpus.c   | 2 +-
  accel/kvm/kvm-accel-ops.c| 2 +-
  accel/tcg/cpu-exec-common.c  | 2 +-
  accel/tcg/cpu-exec.c | 2 +-
  accel/tcg/cputlb.c   | 4 ++--
  accel/tcg/tcg-accel-ops-icount.c | 2 +-
  accel/tcg/tcg-accel-ops-mttcg.c  | 2 +-
  accel/tcg/tcg-accel-ops-rr.c | 4 ++--
  accel/tcg/translator.c   | 4 ++--
  hw/core/cpu-common.c | 2 +-
  softmmu/icount.c | 2 +-
  softmmu/watchpoint.c | 2 +-
  13 files changed, 16 insertions(+), 16 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 06/24] target/*: Add instance_align to all cpu base classes

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

The omission of alignment has technically been wrong since
269bd5d8f61, where QEMU_ALIGNED was added to CPUTLBDescFast.

Signed-off-by: Richard Henderson 
---
  target/alpha/cpu.c  | 1 +
  target/avr/cpu.c| 1 +
  target/cris/cpu.c   | 1 +
  target/hexagon/cpu.c| 1 +
  target/hppa/cpu.c   | 1 +
  target/i386/cpu.c   | 1 +
  target/loongarch/cpu.c  | 1 +
  target/m68k/cpu.c   | 1 +
  target/microblaze/cpu.c | 1 +
  target/mips/cpu.c   | 1 +
  target/nios2/cpu.c  | 1 +
  target/openrisc/cpu.c   | 1 +
  target/riscv/cpu.c  | 2 +-
  target/rx/cpu.c | 1 +
  target/sh4/cpu.c| 1 +
  target/sparc/cpu.c  | 1 +
  target/tricore/cpu.c| 1 +
  target/xtensa/cpu.c | 1 +
  18 files changed, 18 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 05/24] target/arm: Remove size and alignment for cpu subclasses

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

Inherit the size and alignment from TYPE_ARM_CPU.

Signed-off-by: Richard Henderson 
---
  target/arm/cpu.c   | 3 ---
  target/arm/cpu64.c | 4 
  2 files changed, 7 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 04/24] target/*: Use __alignof not __alignof__

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

No functional change, just using a common spelling.

Signed-off-by: Richard Henderson 
---
  target/arm/cpu.c  | 2 +-
  target/ppc/cpu_init.c | 2 +-
  target/s390x/cpu.c| 2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)


Why this one and not the other?

$ git grep -w __alignof__ | wc -l
  21
$ git grep -w __alignof | wc -l
   5

In particular:

include/qom/object.h:298:.instance_align = 
__alignof__(ModuleObjName), \


If we have a preference, we should clean tree-wide.

Regardless,
Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 03/24] qom: Propagate alignment through type system

2023-09-13 Thread Philippe Mathieu-Daudé

On 14/9/23 04:44, Richard Henderson wrote:

Propagate alignment just like size.  This is required in
order to get the correct alignment on most cpu subclasses
where the size and alignment is only specified for the
base cpu type.

Signed-off-by: Richard Henderson 
---
  qom/object.c | 14 ++
  1 file changed, 14 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 
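
As context for the change above, a sketch of the pattern this propagation
enables (all type and struct names below are made up for illustration, not real
QEMU types): the abstract base declares instance_size/instance_align once, and a
subclass that leaves both at 0 now inherits the alignment from its parent the
same way it already inherits the size.

```
#include "qemu/osdep.h"
#include "qom/object.h"

typedef struct DemoBaseCPU {
    uint8_t tlb_fast[64] QEMU_ALIGNED(16);  /* stand-in for CPUTLBDescFast */
} DemoBaseCPU;

static const TypeInfo demo_cpu_types[] = {
    {
        .name           = "demo-base-cpu",
        .parent         = TYPE_OBJECT,
        .abstract       = true,
        .instance_size  = sizeof(DemoBaseCPU),
        .instance_align = __alignof(DemoBaseCPU),
    },
    {
        .name   = "demo-cpu-model",
        .parent = "demo-base-cpu",
        /* instance_size and instance_align left at 0: with this patch the
         * type system fills in both from "demo-base-cpu". */
    },
};

DEFINE_TYPES(demo_cpu_types)
```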




Re: [PATCH] ui: add XBGR8888 and ABGR8888 in drm_format_pixman_map

2023-09-13 Thread Philippe Mathieu-Daudé

Cc'ing maintainers:

$ ./scripts/get_maintainer.pl -f ui/qemu-pixman.c
Gerd Hoffmann  (odd fixer:Graphics)
"Marc-André Lureau"  (odd fixer:Graphics)

On 14/9/23 03:31, Ken Xue wrote:

Android uses XBGR8888 and ABGR8888 as default scanout buffer formats, but QEMU
does not support them in the qemu_pixman_to_drm_format() conversion used by
virtio_gpu_create_dmabuf() for virtio-gpu.

So, add those two formats to drm_format_pixman_map.

Signed-off-by: Ken Xue 
---
  include/ui/qemu-pixman.h | 4 
  ui/qemu-pixman.c | 4 +++-
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/ui/qemu-pixman.h b/include/ui/qemu-pixman.h
index 51f870932791..e587c48b1fde 100644
--- a/include/ui/qemu-pixman.h
+++ b/include/ui/qemu-pixman.h
@@ -32,6 +32,8 @@
  # define PIXMAN_LE_r8g8b8 PIXMAN_b8g8r8
  # define PIXMAN_LE_a8r8g8b8   PIXMAN_b8g8r8a8
  # define PIXMAN_LE_x8r8g8b8   PIXMAN_b8g8r8x8
+# define PIXMAN_LE_a8b8g8r8   PIXMAN_r8g8b8a8
+# define PIXMAN_LE_x8b8g8r8   PIXMAN_r8g8b8x8
  #else
  # define PIXMAN_BE_r8g8b8 PIXMAN_b8g8r8
  # define PIXMAN_BE_x8r8g8b8   PIXMAN_b8g8r8x8
@@ -45,6 +47,8 @@
  # define PIXMAN_LE_r8g8b8 PIXMAN_r8g8b8
  # define PIXMAN_LE_a8r8g8b8   PIXMAN_a8r8g8b8
  # define PIXMAN_LE_x8r8g8b8   PIXMAN_x8r8g8b8
+# define PIXMAN_LE_a8b8g8r8   PIXMAN_a8b8g8r8
+# define PIXMAN_LE_x8b8g8r8   PIXMAN_x8b8g8r8
  #endif
  
  #define QEMU_PIXMAN_COLOR(r, g, b)   \

diff --git a/ui/qemu-pixman.c b/ui/qemu-pixman.c
index be00a96340d3..b43ec38bf0e9 100644
--- a/ui/qemu-pixman.c
+++ b/ui/qemu-pixman.c
@@ -96,7 +96,9 @@ static const struct {
  } drm_format_pixman_map[] = {
  { DRM_FORMAT_RGB888,   PIXMAN_LE_r8g8b8   },
 { DRM_FORMAT_ARGB8888, PIXMAN_LE_a8r8g8b8 },
-{ DRM_FORMAT_XRGB8888, PIXMAN_LE_x8r8g8b8 }
+{ DRM_FORMAT_XRGB8888, PIXMAN_LE_x8r8g8b8 },
+{ DRM_FORMAT_XBGR8888, PIXMAN_LE_x8b8g8r8 },
+{ DRM_FORMAT_ABGR8888, PIXMAN_LE_a8b8g8r8 },
  };
  
  pixman_format_code_t qemu_drm_format_to_pixman(uint32_t drm_format)


base-commit: 9a8af699677cdf58e92ff43f38ea74bbe9d37ab0





Re: [PATCH] mem/x86: add processor address space check for VM memory

2023-09-13 Thread Ani Sinha



> On 12-Sep-2023, at 9:04 PM, David Hildenbrand  wrote:
> 
> [...]
> 
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 54838c0c41..d187890675 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -908,9 +908,12 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, 
>>> uint64_t pci_hole64_size)
>>> {
>>> X86CPU *cpu = X86_CPU(first_cpu);
>>> 
>>> -/* 32-bit systems don't have hole64 thus return max CPU address */
>>> -if (cpu->phys_bits <= 32) {
>>> -return ((hwaddr)1 << cpu->phys_bits) - 1;
>>> +/*
>>> + * 32-bit systems don't have hole64, but we might have a region for
>>> + * memory hotplug.
>>> + */
>>> +if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
>>> +return pc_pci_hole64_start() - 1;
>> Ok this is very confusing! I am looking at pc_pci_hole64_start() function. I 
>> have a few questions …
>> (a) pc_get_device_memory_range() returns the size of the device memory as 
>> the difference between ram_size and maxram_size. But from what I understand, 
>> ram_size is the actual size of the ram present and maxram_size is the max 
>> size of ram *after* hot plugging additional memory. How can we assume that 
>> the additional available space is already occupied by hot plugged memory?
> 
> Let's take a look at an example:
> 
> $ ./build/qemu-system-x86_64 -m 8g,maxmem=16g,slots=1 \
>  -object memory-backend-ram,id=mem0,size=1g \
>  -device pc-dimm,memdev=mem0 \
>  -nodefaults -nographic -S -monitor stdio
> 
> (qemu) info mtree
> ...
> memory-region: system
>  - (prio 0, i/o): system
>    0000000000000000-00000000bfffffff (prio 0, ram): alias ram-below-4g @pc.ram 0000000000000000-00000000bfffffff
>- (prio -1, i/o): pci
>  000c-000d (prio 1, rom): pc.rom
>  000e-000f (prio 1, rom): alias isa-bios @pc.bios 
> 0002-0003
>  fffc- (prio 0, rom): pc.bios
>000a-000b (prio 1, i/o): alias smram-region @pci 
> 000a-000b
>000c-000c3fff (prio 1, i/o): alias pam-pci @pci 
> 000c-000c3fff
>000c4000-000c7fff (prio 1, i/o): alias pam-pci @pci 
> 000c4000-000c7fff
>000c8000-000cbfff (prio 1, i/o): alias pam-pci @pci 
> 000c8000-000cbfff
>000cc000-000c (prio 1, i/o): alias pam-pci @pci 
> 000cc000-000c
>000d-000d3fff (prio 1, i/o): alias pam-pci @pci 
> 000d-000d3fff
>000d4000-000d7fff (prio 1, i/o): alias pam-pci @pci 
> 000d4000-000d7fff
>000d8000-000dbfff (prio 1, i/o): alias pam-pci @pci 
> 000d8000-000dbfff
>000dc000-000d (prio 1, i/o): alias pam-pci @pci 
> 000dc000-000d
>000e-000e3fff (prio 1, i/o): alias pam-pci @pci 
> 000e-000e3fff
>000e4000-000e7fff (prio 1, i/o): alias pam-pci @pci 
> 000e4000-000e7fff
>000e8000-000ebfff (prio 1, i/o): alias pam-pci @pci 
> 000e8000-000ebfff
>000ec000-000e (prio 1, i/o): alias pam-pci @pci 
> 000ec000-000e
>000f-000f (prio 1, i/o): alias pam-pci @pci 
> 000f-000f
>fec0-fec00fff (prio 0, i/o): ioapic
>fed0-fed003ff (prio 0, i/o): hpet
>fee0-feef (prio 4096, i/o): apic-msi
>    0000000100000000-000000023fffffff (prio 0, ram): alias ram-above-4g @pc.ram 00000000c0000000-00000001ffffffff
>    0000000240000000-000000047fffffff (prio 0, i/o): device-memory
>      0000000240000000-000000027fffffff (prio 0, ram): mem0
> 
> 
> We requested 8G of boot memory, which is split between "<4G" memory and 
> ">=4G" memory.
> 
> We only place exactly 3G (0x0->0xbfffffff) under 4G, starting at address 0.

I can’t reconcile this with this code for q35:

   if (machine->ram_size >= 0xb0000000) {
        lowmem = 0x80000000; // max memory 0x8fffffff or 2.25 GiB
   } else {
        lowmem = 0xb0000000; // max memory 0xbfffffff or 3 GiB
   }

You assigned 8 GiB to ram which is > 0xb0000000 (2.75 GiB)


> 
> We leave the remainder (1G) of the <4G addresses available for I/O devices 
> (32bit PCI hole).
> 
> So we end up with 5G (0x100000000->0x23fffffff) of memory starting exactly at
> address 4G.
> 
> "maxram_size - ram_size"=8G is the maximum amount of memo
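
To make David's example above concrete, here is a back-of-the-envelope
recomputation of the layout for -m 8g,maxmem=16g,slots=1 on the default pc
machine. The extra 1 GiB of device-memory headroom per hotplug slot is an
assumption inferred from the mtree output above, not something spelled out in
this thread.

```
#include <stdio.h>
#include <stdint.h>

#define GiB (1024ULL * 1024 * 1024)

int main(void)
{
    uint64_t ram_size    = 8 * GiB;    /* -m 8g      */
    uint64_t maxram_size = 16 * GiB;   /* maxmem=16g */
    uint64_t slots       = 1;

    uint64_t below_4g = 3 * GiB;                 /* pc caps low RAM at 3 GiB */
    uint64_t above_4g = ram_size - below_4g;     /* 5 GiB, mapped at 4 GiB   */

    uint64_t devmem_base = 4 * GiB + above_4g;                      /* 0x240000000 */
    uint64_t devmem_size = (maxram_size - ram_size) + slots * GiB;  /* 9 GiB */

    printf("ram-below-4g:  0x%llx-0x%llx\n",
           0ULL, (unsigned long long)(below_4g - 1));
    printf("ram-above-4g:  0x%llx-0x%llx\n",
           (unsigned long long)(4 * GiB),
           (unsigned long long)(4 * GiB + above_4g - 1));
    printf("device-memory: 0x%llx-0x%llx\n",
           (unsigned long long)devmem_base,
           (unsigned long long)(devmem_base + devmem_size - 1));
    return 0;
}
```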

Re: [PATCH v2 0/2] hw/mips/jazz: Rework the NIC init code

2023-09-13 Thread Philippe Mathieu-Daudé

On 13/9/23 18:09, Thomas Huth wrote:


Thomas Huth (2):
   hw/mips/jazz: Move the NIC init code into a separate function
   hw/mips/jazz: Simplify the NIC setup code


Reviewed-by: Philippe Mathieu-Daudé 




Re: [RFC 0/7] vhost-vdpa: add support for iommufd

2023-09-13 Thread Cindy Lu
Hi Michael,
Really sorry for the delay, I was sick-leave for almost 2 months,
which caused the delay in the development of this feature.  I will
continue working on this feature soon.
Thanks
Cindy

On Wed, Sep 13, 2023 at 9:31 PM Michael S. Tsirkin  wrote:
>
> On Wed, May 03, 2023 at 05:13:30PM +0800, Cindy Lu wrote:
> > Hi All
> > There is the RFC to support the IOMMUFD in vdpa device
> > any comments are welcome
> > Thanks
> > Cindy
>
> Any plans to work on this or should I consider this abandoned?
>
>
> > Cindy Lu (7):
> >   vhost: introduce new UAPI to support IOMMUFD
> >   qapi: support iommufd in vdpa
> >   virtio : add a ptr for vdpa_iommufd in VirtIODevice
> >   net/vhost-vdpa: Add the check for iommufd
> >   vhost-vdpa: Add the iommufd support in the map/unmap function
> >   vhost-vdpa: init iommufd function in vhost_vdpa start
> >   vhost-vdpa-iommufd: Add iommufd support for vdpa
> >
> >  hw/virtio/meson.build  |   2 +-
> >  hw/virtio/vhost-vdpa-iommufd.c | 240 +
> >  hw/virtio/vhost-vdpa.c |  74 +-
> >  include/hw/virtio/vhost-vdpa.h |  47 +++
> >  include/hw/virtio/virtio.h |   5 +
> >  linux-headers/linux/vhost.h|  72 ++
> >  net/vhost-vdpa.c   |  31 +++--
> >  qapi/net.json  |   1 +
> >  8 files changed, 451 insertions(+), 21 deletions(-)
> >  create mode 100644 hw/virtio/vhost-vdpa-iommufd.c
> >
> > --
> > 2.34.3
>




Re: [PATCH 3/4] hw/cxl: CXLDVSECPortExtensions renamed to CXLDVSECPortExt

2023-09-13 Thread Philippe Mathieu-Daudé

On 13/9/23 17:05, Jonathan Cameron wrote:

Done to reduce line lengths where this is used.
Ext seems sufficiently obvious that it need not be spelt out
fully.

Signed-off-by: Jonathan Cameron 
---
  include/hw/cxl/cxl_pci.h   |  6 ++---
  hw/cxl/cxl-component-utils.c   | 49 --
  hw/pci-bridge/cxl_downstream.c |  2 +-
  hw/pci-bridge/cxl_root_port.c  |  2 +-
  hw/pci-bridge/cxl_upstream.c   |  2 +-
  5 files changed, 35 insertions(+), 26 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH] ui: add XBGR8888 and ABGR8888 in drm_format_pixman_map

2023-09-13 Thread Ken Xue
Android uses XBGR8888 and ABGR8888 as default scanout buffer formats, but QEMU
does not support them in the qemu_pixman_to_drm_format() conversion used by
virtio_gpu_create_dmabuf() for virtio-gpu.

So, add those two formats to drm_format_pixman_map.
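
A quick sanity check one could add alongside this change (a hypothetical snippet,
not part of the patch; it assumes the same includes as ui/qemu-pixman.c and a
build with the DRM headers available):

```
static void check_new_drm_formats(void)
{
    /* DRM -> pixman */
    assert(qemu_drm_format_to_pixman(DRM_FORMAT_XBGR8888) == PIXMAN_LE_x8b8g8r8);
    assert(qemu_drm_format_to_pixman(DRM_FORMAT_ABGR8888) == PIXMAN_LE_a8b8g8r8);

    /* pixman -> DRM, via the reverse lookup over the same table */
    assert(qemu_pixman_to_drm_format(PIXMAN_LE_x8b8g8r8) == DRM_FORMAT_XBGR8888);
    assert(qemu_pixman_to_drm_format(PIXMAN_LE_a8b8g8r8) == DRM_FORMAT_ABGR8888);
}
```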

Signed-off-by: Ken Xue 
---
 include/ui/qemu-pixman.h | 4 
 ui/qemu-pixman.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/ui/qemu-pixman.h b/include/ui/qemu-pixman.h
index 51f870932791..e587c48b1fde 100644
--- a/include/ui/qemu-pixman.h
+++ b/include/ui/qemu-pixman.h
@@ -32,6 +32,8 @@
 # define PIXMAN_LE_r8g8b8 PIXMAN_b8g8r8
 # define PIXMAN_LE_a8r8g8b8   PIXMAN_b8g8r8a8
 # define PIXMAN_LE_x8r8g8b8   PIXMAN_b8g8r8x8
+# define PIXMAN_LE_a8b8g8r8   PIXMAN_r8g8b8a8
+# define PIXMAN_LE_x8b8g8r8   PIXMAN_r8g8b8x8
 #else
 # define PIXMAN_BE_r8g8b8 PIXMAN_b8g8r8
 # define PIXMAN_BE_x8r8g8b8   PIXMAN_b8g8r8x8
@@ -45,6 +47,8 @@
 # define PIXMAN_LE_r8g8b8 PIXMAN_r8g8b8
 # define PIXMAN_LE_a8r8g8b8   PIXMAN_a8r8g8b8
 # define PIXMAN_LE_x8r8g8b8   PIXMAN_x8r8g8b8
+# define PIXMAN_LE_a8b8g8r8   PIXMAN_a8b8g8r8
+# define PIXMAN_LE_x8b8g8r8   PIXMAN_x8b8g8r8
 #endif
 
 #define QEMU_PIXMAN_COLOR(r, g, b) 
  \
diff --git a/ui/qemu-pixman.c b/ui/qemu-pixman.c
index be00a96340d3..b43ec38bf0e9 100644
--- a/ui/qemu-pixman.c
+++ b/ui/qemu-pixman.c
@@ -96,7 +96,9 @@ static const struct {
 } drm_format_pixman_map[] = {
 { DRM_FORMAT_RGB888,   PIXMAN_LE_r8g8b8   },
 { DRM_FORMAT_ARGB8888, PIXMAN_LE_a8r8g8b8 },
-{ DRM_FORMAT_XRGB8888, PIXMAN_LE_x8r8g8b8 }
+{ DRM_FORMAT_XRGB8888, PIXMAN_LE_x8r8g8b8 },
+{ DRM_FORMAT_XBGR8888, PIXMAN_LE_x8b8g8r8 },
+{ DRM_FORMAT_ABGR8888, PIXMAN_LE_a8b8g8r8 },
 };
 
 pixman_format_code_t qemu_drm_format_to_pixman(uint32_t drm_format)

base-commit: 9a8af699677cdf58e92ff43f38ea74bbe9d37ab0
-- 
2.35.1




50% of all time spent in victim_tlb_hit() !? (or case when OVPSim beats QEMU hands down)

2023-09-13 Thread Igor Lesik
Hi.

I came across a case where OVPSim shamelessly outperforms QEMU. In an 8-CPU test,
OVPSim single-threaded is 4 times faster than QEMU tcg-single, and ~30% faster
than QEMU mttcg.

I constructed a simple test case that reproduces it.
When I profiled the test I saw that QEMU spends ~50% of all time inside the
function victim_tlb_hit() (according to the perf tool).

Setup:
1. For both QEMU and OVPSim I made a simple machine with 8 RISC-V CPUs and one
RAM (system mode).
2. Host machine is x86 with 4 cores, but only 1 thread per core, so 4 HW
threads only.
3. The test is "bare metal", no OS.
4. All CPUs run the same program, no explicit synchronization in the code.
5. Both QEMU and OVPSim use semihosting EXIT and the simulation ends when the
"last" exit happens.

Test:

```
#define N (1000ul * 60ul)
#define M (1024*1024)

int my_main(int argc, char* argv[]) {

  volatile long unsigned int a = 0;
  volatile long unsigned int b[M] = {};
  volatile long unsigned int c[M] = {};

  for (long unsigned int i = 1; i < N; i++) {
  int j = i % M;
  a += i;
  a |= (b[j] * i);
  b[j] += a & (c[j] / i);
  c[j] += i + a;
  a += b[j] - c[j];
  }

  return (int)a; /* consume a so the loop is not optimized away */
}
```

Perf report:

```
  46.78%  qemu-system-riscv64  [.] victim_tlb_hit
  23.68%  qemu-system-riscv64  [.] helper_le_ldq_mmu
   4.46%  qemu-system-riscv64  [.] helper_latch_ld_dest_reg_id
```

victim_tlb_hit
```
   │jne1f9
   │lea(%rax,%r9,1),%rcx
   │add$0x130,%rcx
  0.25 │mov$0x7,%edi
  0.29 │126:shl$0x4,%rsi
  0.39 │mov%rdx,%r8
  1.65 │shl$0x5,%r8
  0.35 │add0x1fa8(%rax,%rsi,1),%r8
  0.32 │139:mov$0x1,%esi
  0.37 │xchg   %esi,(%rax)
 51.86 │test   %esi,%esi
   │je 150
   │jmp148
   │146:pause
```


Results:
1. OVPSim single-threaded is 4 times faster than QEMU tcg-single.
2. OVPSim single-threaded is ~30% faster than QEMU mttcg.
3. When M is changed from 1M to 2, OVPSim single-threaded is 2 times faster than
   QEMU tcg-single, and 2 times slower than QEMU mttcg.

Question: does someone have an idea/intuition how QEMU code can be improved to 
speed up the simulation in cases like this?

Thanks,
Igor



Re: [PATCH v11 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-09-13 Thread Gurchetan Singh
On Wed, Sep 13, 2023 at 4:58 AM Bernhard Beschow  wrote:

>
>
> On 23 August 2023 01:25:38 UTC, Gurchetan Singh <
> gurchetansi...@chromium.org> wrote:
> >This adds initial support for gfxstream and cross-domain.  Both
> >features rely on virtio-gpu blob resources and context types, which
> >are also implemented in this patch.
> >
> >gfxstream has a long and illustrious history in Android graphics
> >paravirtualization.  It has been powering graphics in the Android
> >Studio Emulator for more than a decade, which is the main developer
> >platform.
> >
> >Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
> >The key design characteristic was a 1:1 threading model and
> >auto-generation, which fit nicely with the OpenGLES spec.  It also
> >allowed easy layering with ANGLE on the host, which provides the GLES
> >implementations on Windows or MacOS environments.
> >
> >gfxstream has traditionally been maintained by a single engineer, and
> >between 2015 to 2021, the goldfish throne passed to Frank Yang.
> >Historians often remark this glorious reign ("pax gfxstreama" is the
> >academic term) was comparable to that of Augustus and both Queen
> >Elizabeths.  Just to name a few accomplishments in a resplendent
> >panoply: higher versions of GLES, address space graphics, snapshot
> >support and CTS compliant Vulkan [b].
> >
> >One major drawback was the use of out-of-tree goldfish drivers.
> >Android engineers didn't know much about DRM/KMS and especially TTM so
> >a simple guest to host pipe was conceived.
> >
> >Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
> >the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
> >port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
> >It was a symbol compatible replacement of virglrenderer [c] and named
> >"AVDVirglrenderer".  This implementation forms the basis of the
> >current gfxstream host implementation still in use today.
> >
> >cross-domain support follows a similar arc.  Originally conceived by
> >Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
> >2018, it initially relied on the downstream "virtio-wl" device.
> >
> >In 2020 and 2021, virtio-gpu was extended to include blob resources
> >and multiple timelines by yours truly, features gfxstream/cross-domain
> >both require to function correctly.
> >
> >Right now, we stand at the precipice of a truly fantastic possibility:
> >the Android Emulator powered by upstream QEMU and upstream Linux
> >kernel.  gfxstream will then be packaged properly, and app
> >developers can even fix gfxstream bugs on their own if they encounter
> >them.
> >
> >It's been quite the ride, my friends.  Where will gfxstream head next,
> >nobody really knows.  I wouldn't be surprised if it's around for
> >another decade, maintained by a new generation of Android graphics
> >enthusiasts.
> >
> >Technical details:
> >  - Very simple initial display integration: just used Pixman
> >  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
> >calls
> >
> >Next steps for Android VMs:
> >  - The next step would be improving display integration and UI interfaces
> >with the goal of the QEMU upstream graphics being in an emulator
> >release [d].
> >
> >Next steps for Linux VMs for display virtualization:
> >  - For widespread distribution, someone needs to package Sommelier or the
> >wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
> >versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
> >which allows disabling KMS hypercalls.  If anyone cares enough, it'll
> >probably be possible to build a custom VM variant that uses this
> display
> >virtualization strategy.
> >
> >[a]
> https://android-review.googlesource.com/c/platform/development/+/34470
> >[b]
> https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
> >[c]
> https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
> >[d] https://developer.android.com/studio/releases/emulator
> >[e] https://github.com/talex5/wayland-proxy-virtwl
> >
> >Signed-off-by: Gurchetan Singh 
> >Tested-by: Alyssa Ross 
> >Tested-by: Emmanouil Pitsidianakis 
> >Reviewed-by: Emmanouil Pitsidianakis 
> >---
> >v1: Incorported various suggestions by Akihiko Odaki and Bernard Berschow
> >- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
> >- Used error_report(..)
> >- Used g_autofree to fix leaks on error paths
> >- Removed unnecessary casts
> >- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files
> >
> >v2: Incorported various suggestions by Akihiko Odaki, Marc-André Lureau
> and
> >Bernard Berschow:
> >- Parenthesis in CHECK macro
> >- CHECK_RESULT(result, ..) --> CHECK(!result, ..)
> >- delay until g->parent_obj.enable = 1
> >- Additional cast fixes
> >- initialize directly in virtio_gpu_rutabaga_realize(..)
> >- add debug callback to hook into Q

Re: [PATCH v13 0/9] rutabaga_gfx + gfxstream

2023-09-13 Thread Gurchetan Singh
On Wed, Sep 13, 2023 at 6:49 AM Marc-André Lureau <
marcandre.lur...@gmail.com> wrote:

> Hi
>
> On Wed, Sep 13, 2023 at 5:08 AM Gurchetan Singh
>  wrote:
> > On Tue, Sep 12, 2023 at 6:59 AM Marc-André Lureau <
> marcandre.lur...@gmail.com> wrote:
> >> Packaging aemu and gfxstream is a bit problematic. I have some WIP
> >> Fedora packages.

>>
> >> AEMU:
> >> - installs files under /usr/include/host-common and
> >> /usr/include/snapshot. Can this be moved under /usr/include/aemu
> >> instead?
> >> - builds only static versions of libaemu-host-common.a and
> >> liblogging-base.a (distros don't like static libs much)
> >> - could liblogging-base(.a,.so,..) also have "aemu" name in it?
> >> - libaemu-base.so is not versioned
> >> - I can't find a release tarball, nor the (v0.1.2) release tag
> >> - could have a README file
> >>
> >> I am not very familiar with cmake, so it's not obvious how to make the
> >> required changes. Would you like me to open an issue (where?) or try
> >> to make some patches?
> >
> >
> > I filed an internal bug with all the issues you listed: Android side
> should fix this internally.
> >
> > I see a few options for packaging:
> >
> > 1) Punt on gfxstream/AEMU packaging, just do rutabaga
> >
> > gfxstream is mostly useful for Android guests, and I didn't expect
> anyone to actually package it at this point since most here are interested
> in Linux guests (where gfxstream VK is headless only right now).  Plus
> ioctl-fowarding > API forwarding for security and performance, so I'm not
> sure if it'll have any sticking power even if everything is supported
> (outside of a few niche use cases).
> >
> > Though, I sense interest in Wayland passthrough for dual Linux use
> cases.  I put up:
> >
> > crrev.com/c/4860836
> >
> > that'll allow packaging on rutabaga_gfx and even CI testing without
> gfxstream, since it is designed to function without it.  We could issue
> another rutabaga-release tag, or you can simply add a patch (a common
> packaging practice) on the Fedora package with the "UPSTREAM label".
> >
> > 2) Actually package gfxstream only
> >
> > Probably an intermediate solution that doesn't introduce
> versioning/static library issues would be just to have a copy of AEMU in
> the gfxstream repo, and link it statically.  Will need another release
> tag/commit of gfxstream.
> >
> > 3) Don't package at all
> >
> > For my particular use case since we have to build QEMU for sources, this
> is fine.  If upstream breaks virtio-gpu-rutabaga.c, we'll send a patch and
> fix it.  Being in-tree is most important.
> >
> > Let me know what you prefer!
> >
>
> I would rather have standard packaging of the various projects, so we
> can test and develop easily.
>

Ack.  Here are the requested changes:

- https://android-review.googlesource.com/q/topic:%22aemu-package-fix%22
- crrev.com/c/4865171

The main change is:

https://android-review.googlesource.com/c/platform/hardware/google/aemu/+/2751066

Once that's okay for packaging, I'll ping harder on v0.1.2 gfxstream/AEMU
release tags.  Let me know if you want another release tag for rutabaga, or
if just patching in upstream changes would be acceptable.

For rutabaga, I ended up having to patch a little bit the shared
> library, to fix SONAME:
>

Landed as crrev.com/c/4863380.


>
> diff --git a/ffi/Makefile b/ffi/Makefile
> index d2f0d38..7efc8f3 100644
> --- a/ffi/Makefile
> +++ b/ffi/Makefile
> @@ -47,13 +47,13 @@ build:
>
>  install: build
>  ifeq ($(UNAME), Linux)
> -install -D -m 755 -t $(DESTDIR)$(libdir) $(OUT)/$(LIB_NAME)
> +install -D -m 755 $(OUT)/$(LIB_NAME)
> $(DESTDIR)$(libdir)/$(LIB_NAME).$(RUTABAGA_VERSION)
>  endif
>  ifeq ($(UNAME), Darwin)
> -install_name_tool -id $(DESTDIR)$(libdir)/$(LIB_NAME)
> $(DESTDIR)$(libdir)/$(LIB_NAME)
> +install_name_tool -id
> $(DESTDIR)$(libdir)/$(LIB_NAME).$(RUTABAGA_VERSION)
> $(DESTDIR)$(libdir)/$(LIB_NAME)
>  endif
> -ln -sf $(DESTDIR)$(libdir)/$(LIB_NAME)
> $(DESTDIR)$(libdir)/$(LIB_NAME).$(RUTABAGA_VERSION)
> -ln -sf $(DESTDIR)$(libdir)/$(LIB_NAME)
> $(DESTDIR)$(libdir)/$(LIB_NAME).$(RUTABAGA_VERSION_MAJOR)
> +ln -s $(LIB_NAME).$(RUTABAGA_VERSION)
> $(DESTDIR)$(libdir)/$(LIB_NAME).$(RUTABAGA_VERSION_MAJOR)
> +ln -s $(LIB_NAME).$(RUTABAGA_VERSION) $(DESTDIR)$(libdir)/$(LIB_NAME)
>  ifeq ($(UNAME), Linux)
>  install -D -m 0644 $(SRC)/share/rutabaga_gfx_ffi.pc
> $(DESTDIR)$(libdir)/pkgconfig/rutabaga_gfx_ffi.pc
>  install -D -m 0644 $(SRC)/include/rutabaga_gfx_ffi.h
> $(DESTDIR)$(includedir)/rutabaga_gfx_ffi.h
> diff --git a/ffi/build.rs b/ffi/build.rs
> new file mode 100644
> index 000..efa18d3
> --- /dev/null
> +++ b/ffi/build.rs
> @@ -0,0 +1,3 @@
> +fn main() {
> +
>  println!("cargo:rustc-cdylib-link-arg=-Wl,-soname,librutabaga_gfx_ffi.so.0");
> +}
>
>
> The package is a bit unconventional, since it's a rust project
> providing a C shared library. I am not sure I did Fedora packaging
> right, let see:
> https://bugzilla.redhat.com/bugzilla/sh

[RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type()

2023-09-13 Thread Xiaoyao Li
pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen-specific
initialization by utilizing the kvm_type method.

commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
moves the Xen specific initialization to pc_basic_device_init().

There is no need to keep the PC-specific kvm_type() implementation
anymore. On the other hand, a later patch will implement the kvm_type()
method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.

Signed-off-by: Xiaoyao Li 
Reviewed-by: Isaku Yamahata 
---
 hw/i386/pc.c | 5 -
 include/hw/i386/pc.h | 3 ---
 2 files changed, 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3109d5e0e035..abeadd903827 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1794,11 +1794,6 @@ static void pc_machine_initfn(Object *obj)
 cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
 
-int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
-{
-return 0;
-}
-
 static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
 {
 CPUState *cs;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d54e8b1101e4..c98d628a76f3 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -296,15 +296,12 @@ extern const size_t pc_compat_1_5_len;
 extern GlobalProperty pc_compat_1_4[];
 extern const size_t pc_compat_1_4_len;
 
-int pc_machine_kvm_type(MachineState *machine, const char *vm_type);
-
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
 { \
 MachineClass *mc = MACHINE_CLASS(oc); \
 optsfn(mc); \
 mc->init = initfn; \
-mc->kvm_type = pc_machine_kvm_type; \
 } \
 static const TypeInfo pc_machine_type_##suffix = { \
 .name   = namestr TYPE_MACHINE_SUFFIX, \
-- 
2.34.1




[RFC PATCH v2 19/21] pci-host/q35: Move PAM initialization above SMRAM initialization

2023-09-13 Thread Xiaoyao Li
From: Isaku Yamahata 

In mch_realize(), process PAM initialization before SMRAM initialization so
that a later patch can skip all the SMRAM-related code with a single check.

Signed-off-by: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
---
 hw/pci-host/q35.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 91c46df9ae31..ac1518a94ee4 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -575,6 +575,16 @@ static void mch_realize(PCIDevice *d, Error **errp)
 /* setup pci memory mapping */
 pc_pci_as_mapping_init(mch->system_memory, mch->pci_address_space);
 
+/* PAM */
+init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
+ mch->system_memory, mch->pci_address_space,
+ PAM_BIOS_BASE, PAM_BIOS_SIZE);
+for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
+init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
+ mch->system_memory, mch->pci_address_space,
+ PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
+}
+
 /* if *disabled* show SMRAM to all CPUs */
 memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
  mch->pci_address_space, 
MCH_HOST_BRIDGE_SMRAM_C_BASE,
@@ -641,15 +651,6 @@ static void mch_realize(PCIDevice *d, Error **errp)
 
 object_property_add_const_link(qdev_get_machine(), "smram",
OBJECT(&mch->smram));
-
-init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
- mch->system_memory, mch->pci_address_space,
- PAM_BIOS_BASE, PAM_BIOS_SIZE);
-for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
-init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
- mch->system_memory, mch->pci_address_space,
- PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
-}
 }
 
 uint64_t mch_mcfg_base(void)
-- 
2.34.1




[RFC PATCH v2 13/21] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM

2023-09-13 Thread Xiaoyao Li
Register a memory listener for KVM_X86_SW_PROTECTED_VM. It sets RAM to
private by default.

Signed-off-by: Xiaoyao Li 
---
 include/exec/memory.h |  1 +
 target/i386/kvm/sw-protected-vm.c | 18 ++
 2 files changed, 19 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2c738b5dc420..a2602b783a38 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -820,6 +820,7 @@ struct IOMMUMemoryRegion {
 #define MEMORY_LISTENER_PRIORITY_MIN0
 #define MEMORY_LISTENER_PRIORITY_ACCEL  10
 #define MEMORY_LISTENER_PRIORITY_DEV_BACKEND10
+#define MEMORY_LISTENER_PRIORITY_ACCEL_HIGH 20
 
 /**
  * struct MemoryListener: callbacks structure for updates to the physical 
memory map
diff --git a/target/i386/kvm/sw-protected-vm.c 
b/target/i386/kvm/sw-protected-vm.c
index 3cfcc89202a6..f47ac383e1dd 100644
--- a/target/i386/kvm/sw-protected-vm.c
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -12,14 +12,32 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "exec/address-spaces.h"
+#include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
 #include "sw-protected-vm.h"
 
+static void kvm_x86_sw_protected_vm_region_add(MemoryListener *listenr,
+   MemoryRegionSection *section)
+{
+memory_region_set_default_private(section->mr);
+}
+
+static MemoryListener kvm_x86_sw_protected_vm_memory_listener = {
+.name = "kvm_x86_sw_protected_vm_memory_listener",
+.region_add = kvm_x86_sw_protected_vm_region_add,
+/* Higher than KVM memory listener = 10. */
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL_HIGH,
+};
+
 int sw_protected_vm_kvm_init(MachineState *ms, Error **errp)
 {
 SwProtectedVm *spvm = SW_PROTECTED_VM(OBJECT(ms->cgs));
 
+memory_listener_register(&kvm_x86_sw_protected_vm_memory_listener,
+ &address_space_memory);
+
 spvm->parent_obj.ready = true;
 return 0;
 }
-- 
2.34.1




[RFC PATCH v2 10/21] i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protected-vm specific functions

2023-09-13 Thread Xiaoyao Li
Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/kvm.c |  2 ++
 target/i386/kvm/sw-protected-vm.c | 10 ++
 target/i386/kvm/sw-protected-vm.h |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index fb1be16471b4..e126bf4e7ddd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2587,6 +2587,8 @@ static int kvm_confidential_guest_init(MachineState *ms, 
Error **errp)
 {
 if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
 return sev_kvm_init(ms->cgs, errp);
+} else if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SW_PROTECTED_VM)) {
+return sw_protected_vm_kvm_init(ms, errp);
 }
 
 return 0;
diff --git a/target/i386/kvm/sw-protected-vm.c 
b/target/i386/kvm/sw-protected-vm.c
index 62a1d3d5d3fe..3cfcc89202a6 100644
--- a/target/i386/kvm/sw-protected-vm.c
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -10,10 +10,20 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "qom/object_interfaces.h"
 
+#include "hw/i386/x86.h"
 #include "sw-protected-vm.h"
 
+int sw_protected_vm_kvm_init(MachineState *ms, Error **errp)
+{
+SwProtectedVm *spvm = SW_PROTECTED_VM(OBJECT(ms->cgs));
+
+spvm->parent_obj.ready = true;
+return 0;
+}
+
 /* x86-sw-protected-vm */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(SwProtectedVm,
sw_protected_vm,
diff --git a/target/i386/kvm/sw-protected-vm.h 
b/target/i386/kvm/sw-protected-vm.h
index db192a81c75e..15f63bfc7c60 100644
--- a/target/i386/kvm/sw-protected-vm.h
+++ b/target/i386/kvm/sw-protected-vm.h
@@ -14,4 +14,6 @@ typedef struct SwProtectedVm {
 ConfidentialGuestSupport parent_obj;
 } SwProtectedVm;
 
+int sw_protected_vm_kvm_init(MachineState *ms, Error **errp);
+
 #endif /* QEMU_I386_SW_PROTECTED_VM_H */
-- 
2.34.1




[RFC PATCH v2 18/21] trace/kvm: Add trace for page conversion between shared and private

2023-09-13 Thread Xiaoyao Li
From: Isaku Yamahata 

Signed-off-by: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c| 1 +
 accel/kvm/trace-events | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c67aa66b0559..229b7038a4c2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3048,6 +3048,7 @@ static int kvm_convert_memory(hwaddr start, hwaddr size, 
bool to_private)
 void *addr;
 int ret = -1;
 
+trace_kvm_convert_memory(start, size, to_private ? "shared_to_private" : 
"private_to_shared");
 section = memory_region_find(get_system_memory(), start, size);
 if (!section.mr) {
 return ret;
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 80694683acea..4935c9c5cf0b 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -25,4 +25,4 @@ kvm_dirty_ring_reaper(const char *s) "%s"
 kvm_dirty_ring_reap(uint64_t count, int64_t t) "reaped %"PRIu64" pages (took 
%"PRIi64" us)"
 kvm_dirty_ring_reaper_kick(const char *reason) "%s"
 kvm_dirty_ring_flush(int finished) "%d"
-
+kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" 
PRIx64 " size 0x%" PRIx64 " %s"
-- 
2.34.1




[RFC PATCH v2 15/21] physmem: extract ram_block_discard_range_fd() from ram_block_discard_range()

2023-09-13 Thread Xiaoyao Li
Keep the alignment and sanity checks in ram_block_discard_range() and
extract the rest into a separate function, ram_block_discard_range_fd(),
which can be passed an explicit fd as an input parameter.

ram_block_discard_range_fd() can then be used by a later patch to discard
a private memory range from the gmem fd. When doing private memory <->
shared memory conversion, a 4KB alignment is required instead of
RamBlock.page_size.
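
As a hypothetical illustration of the intended caller (not code from this
series; the helper below and its name are invented), a later private <-> shared
conversion path inside softmmu/physmem.c could look roughly like:

```
/* Discard a private range backed by the gmem fd, e.g. when the guest
 * converts it to shared. Conversion granularity is 4 KiB, not
 * rb->page_size, hence the explicit alignment check. */
static int discard_private_range(RAMBlock *rb, uint64_t offset,
                                 uint64_t size, int gmem_fd)
{
    if (!QEMU_IS_ALIGNED(offset, 4096) || !QEMU_IS_ALIGNED(size, 4096)) {
        return -EINVAL;
    }

    return ram_block_discard_range_fd(rb, offset, size, gmem_fd);
}
```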

Signed-off-by: Xiaoyao Li 
---
 softmmu/physmem.c | 192 --
 1 file changed, 100 insertions(+), 92 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 34d580ec0d39..6ee6bc794f44 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3425,117 +3425,125 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, 
void *opaque)
 return ret;
 }
 
+static int ram_block_discard_range_fd(RAMBlock *rb, uint64_t start,
+  size_t length, int fd)
+{
+uint8_t *host_startaddr = rb->host + start;
+bool need_madvise, need_fallocate;
+int ret = -1;
+
+errno = ENOTSUP; /* If we are missing MADVISE etc */
+
+/* The logic here is messy;
+ *madvise DONTNEED fails for hugepages
+ *fallocate works on hugepages and shmem
+ *shared anonymous memory requires madvise REMOVE
+ */
+need_madvise = (rb->page_size == qemu_host_page_size) && (rb->fd == fd);
+need_fallocate = fd != -1;
+
+if (need_fallocate) {
+/* For a file, this causes the area of the file to be zero'd
+ * if read, and for hugetlbfs also causes it to be unmapped
+ * so a userfault will trigger.
+ */
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+/*
+ * We'll discard data from the actual file, even though we only
+ * have a MAP_PRIVATE mapping, possibly messing with other
+ * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
+ * change that behavior whithout violating the promised
+ * semantics of ram_block_discard_range().
+ *
+ * Only warn, because it works as long as nobody else uses that
+ * file.
+ */
+if (!qemu_ram_is_shared(rb)) {
+warn_report_once("%s: Discarding RAM"
+" in private file mappings is possibly"
+" dangerous, because it will modify the"
+" underlying file and will affect other"
+" users of the file", __func__);
+}
+
+ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+start, length);
+if (ret) {
+ret = -errno;
+error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
+__func__, rb->idstr, start, length, ret);
+return ret;
+}
+#else
+ret = -ENOSYS;
+error_report("%s: fallocate not available/file "
+ "%s:%" PRIx64 " +%zx (%d)",
+ __func__, rb->idstr, start, length, ret);
+return ret;
+#endif
+}
+
+if (need_madvise) {
+/* For normal RAM this causes it to be unmapped,
+ * for shared memory it causes the local mapping to disappear
+ * and to fall back on the file contents (which we just
+ * fallocate'd away).
+ */
+#if defined(CONFIG_MADVISE)
+if (qemu_ram_is_shared(rb) && fd < 0) {
+ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
+} else {
+ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
+}
+if (ret) {
+ret = -errno;
+error_report("%s: Failed to discard range %s:%" PRIx64 " +%zx 
(%d)",
+ __func__, rb->idstr, start, length, ret);
+return ret;
+}
+#else
+ret = -ENOSYS;
+error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
+__func__, rb->idstr, start, length, ret);
+return ret;
+#endif
+}
+
+trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
+  need_madvise, need_fallocate, ret);
+return ret;
+}
+
 /*
  * Unmap pages of memory from start to start+length such that
  * they a) read as 0, b) Trigger whatever fault mechanism
  * the OS provides for postcopy.
+ *
  * The pages must be unmapped by the end of the function.
- * Returns: 0 on success, none-0 on failure
- *
+ * Returns: 0 on success, none-0 on failure.
  */
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
 {
-int ret = -1;
-
 uint8_t *host_startaddr = rb->host + start;
 
 if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
 error_report("%s: Unaligned start address: %p",
  __func__, host_startaddr);
-goto err;
+return -1;
 }
 
-if ((start + length) <= rb->max_length) {
-bool n

[RFC PATCH v2 20/21] q35: Introduce smm_ranges property for q35-pci-host

2023-09-13 Thread Xiaoyao Li
From: Isaku Yamahata 

Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG,
etc... exist for the target platform.  TDX doesn't support SMM and doesn't
play nice with QEMU modifying related guest memory ranges.

Signed-off-by: Isaku Yamahata 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Xiaoyao Li 
---
 hw/i386/pc_q35.c  |  3 ++-
 hw/pci-host/q35.c | 42 +++
 include/hw/i386/pc.h  |  1 +
 include/hw/pci-host/q35.h |  1 +
 4 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index dc27a9e223a2..73eb3bc5e826 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -232,8 +232,9 @@ static void pc_q35_init(MachineState *machine)
 x86ms->above_4g_mem_size, NULL);
 object_property_set_bool(phb, PCI_HOST_BYPASS_IOMMU,
  pcms->default_bus_bypass_iommu, NULL);
+object_property_set_bool(phb, PCI_HOST_PROP_SMM_RANGES,
+ x86_machine_is_smm_enabled(x86ms), NULL);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(phb), &error_fatal);
-
 /* pci */
 host_bus = PCI_BUS(qdev_get_child_bus(DEVICE(phb), "pcie.0"));
 pcms->bus = host_bus;
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index ac1518a94ee4..f5c01f7080f9 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -186,6 +186,8 @@ static Property q35_host_props[] = {
  mch.below_4g_mem_size, 0),
 DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MEM_SIZE, Q35PCIHost,
  mch.above_4g_mem_size, 0),
+DEFINE_PROP_BOOL(PCI_HOST_PROP_SMM_RANGES, Q35PCIHost,
+ mch.has_smm_ranges, true),
 DEFINE_PROP_BOOL("x-pci-hole64-fix", Q35PCIHost, pci_hole64_fix, true),
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -221,6 +223,7 @@ static void q35_host_initfn(Object *obj)
 /* mch's object_initialize resets the default value, set it again */
 qdev_prop_set_uint64(DEVICE(s), PCI_HOST_PROP_PCI_HOLE64_SIZE,
  Q35_PCI_HOST_HOLE64_SIZE_DEFAULT);
+
 object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "uint32",
 q35_host_get_pci_hole_start,
 NULL, NULL, NULL);
@@ -483,6 +486,10 @@ static void mch_write_config(PCIDevice *d,
 mch_update_pciexbar(mch);
 }
 
+if (!mch->has_smm_ranges) {
+return;
+}
+
 if (ranges_overlap(address, len, MCH_HOST_BRIDGE_SMRAM,
MCH_HOST_BRIDGE_SMRAM_SIZE)) {
 mch_update_smram(mch);
@@ -501,10 +508,13 @@ static void mch_write_config(PCIDevice *d,
 static void mch_update(MCHPCIState *mch)
 {
 mch_update_pciexbar(mch);
+
 mch_update_pam(mch);
-mch_update_smram(mch);
-mch_update_ext_tseg_mbytes(mch);
-mch_update_smbase_smram(mch);
+if (mch->has_smm_ranges) {
+mch_update_smram(mch);
+mch_update_ext_tseg_mbytes(mch);
+mch_update_smbase_smram(mch);
+}
 
 /*
  * pci hole goes from end-of-low-ram to io-apic.
@@ -545,19 +555,21 @@ static void mch_reset(DeviceState *qdev)
 pci_set_quad(d->config + MCH_HOST_BRIDGE_PCIEXBAR,
  MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT);
 
-d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
-d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
-d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
-d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
+if (mch->has_smm_ranges) {
+d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
+d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
+d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
+d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
 
-if (mch->ext_tseg_mbytes > 0) {
-pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
- MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+if (mch->ext_tseg_mbytes > 0) {
+pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
+MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+}
+
+d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
+d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
 }
 
-d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
-d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
-
 mch_update(mch);
 }
 
@@ -585,6 +597,10 @@ static void mch_realize(PCIDevice *d, Error **errp)
  PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
 }
 
+if (!mch->has_smm_ranges) {
+return;
+}
+
 /* if *disabled* show SMRAM to all CPUs */
 memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
 mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c98d628a76f3..

[RFC PATCH v2 21/21] i386: Disable SMM mode for X86_SW_PROTECTED_VM

2023-09-13 Thread Xiaoyao Li
Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/sw-protected-vm.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/i386/kvm/sw-protected-vm.c 
b/target/i386/kvm/sw-protected-vm.c
index f47ac383e1dd..65347067aa03 100644
--- a/target/i386/kvm/sw-protected-vm.c
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -34,10 +34,18 @@ static MemoryListener 
kvm_x86_sw_protected_vm_memory_listener = {
 int sw_protected_vm_kvm_init(MachineState *ms, Error **errp)
 {
 SwProtectedVm *spvm = SW_PROTECTED_VM(OBJECT(ms->cgs));
+X86MachineState *x86ms = X86_MACHINE(ms);
 
 memory_listener_register(&kvm_x86_sw_protected_vm_memory_listener,
  &address_space_memory);
 
+if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+x86ms->smm = ON_OFF_AUTO_OFF;
+} else if (x86ms->smm == ON_OFF_AUTO_ON) {
+error_setg(errp, "X86_SW_PROTECTED_VM doesn't support SMM");
+return -EINVAL;
+}
+
 spvm->parent_obj.ready = true;
 return 0;
 }
-- 
2.34.1




[RFC PATCH v2 16/21] physmem: Introduce ram_block_convert_range()

2023-09-13 Thread Xiaoyao Li
It's used for discarding the opposite range after memory is converted to
shared or private.

Note, private-shared page conversion is done at 4KB granularity, so don't
check alignment against rb->page_size; check against qemu_host_page_size,
which is 4K, instead.

Originally-from: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
---
 include/exec/cpu-common.h |  2 ++
 softmmu/physmem.c | 36 
 2 files changed, 38 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 87dc9a752c9a..558684b9f246 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -157,6 +157,8 @@ typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
 
 int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length);
+int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
+bool shared_to_private);
 
 #endif
 
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 6ee6bc794f44..dab3247d461c 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3733,3 +3733,39 @@ bool ram_block_discard_is_required(void)
 return qatomic_read(&ram_block_discard_required_cnt) ||
qatomic_read(&ram_block_coordinated_discard_required_cnt);
 }
+
+int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
+bool shared_to_private)
+{
+int fd;
+
+if (!rb || rb->gmem_fd < 0) {
+return -1;
+}
+
+if (!QEMU_PTR_IS_ALIGNED(start, qemu_host_page_size) ||
+!QEMU_PTR_IS_ALIGNED(length, qemu_host_page_size)) {
+return -1;
+}
+
+if (!length) {
+return -1;
+}
+
+if (start + length > rb->max_length) {
+return -1;
+}
+
+if (shared_to_private) {
+void *host_startaddr = rb->host + start;
+
+if (!QEMU_PTR_IS_ALIGNED(host_startaddr, qemu_host_page_size)) {
+return -1;
+}
+fd = rb->fd;
+} else {
+fd = rb->gmem_fd;
+}
+
+return ram_block_discard_range_fd(rb, start, length, fd);
+}
-- 
2.34.1




[RFC PATCH v2 11/21] kvm: Introduce support for memory_attributes

2023-09-13 Thread Xiaoyao Li
Introduce helper functions to set the attributes of a range of memory to
private or shared.

Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c  | 43 +++
 include/sysemu/kvm.h |  3 +++
 2 files changed, 46 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 91cee0878366..eeccc6317fa9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -105,6 +105,7 @@ bool kvm_msi_use_devid;
 bool kvm_has_guest_debug;
 static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
+static uint64_t kvm_supported_memory_attributes;
 static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -1343,6 +1344,44 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size)
 kvm_max_slot_size = max_slot_size;
 }
 
+static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr)
+{
+struct kvm_memory_attributes attrs;
+int r;
+
+attrs.attributes = attr;
+attrs.address = start;
+attrs.size = size;
+attrs.flags = 0;
+
+r = kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
+if (r) {
+warn_report("%s: failed to set memory (0x%lx+%#zx) with attr 0x%lx 
error '%s'",
+ __func__, start, size, attr, strerror(errno));
+}
+return r;
+}
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size)
+{
+if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+error_report("KVM doesn't support PRIVATE memory attribute\n");
+return -EINVAL;
+}
+
+return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+}
+
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size)
+{
+if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+error_report("KVM doesn't support PRIVATE memory attribute\n");
+return -EINVAL;
+}
+
+return kvm_set_memory_attributes(start, size, 0);
+}
+
 /* Called with KVMMemoryListener.slots_lock held */
 static void kvm_set_phys_mem(KVMMemoryListener *kml,
  MemoryRegionSection *section, bool add)
@@ -2556,6 +2595,10 @@ static int kvm_init(MachineState *ms)
 }
 s->as = g_new0(struct KVMAs, s->nr_as);
 
+if (kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES)) {
+kvm_supported_memory_attributes = kvm_ioctl(s, KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES, 0);
+}
+
 if (object_property_find(OBJECT(current_machine), "kvm-type")) {
 g_autofree char *kvm_type = 
object_property_get_str(OBJECT(current_machine),
 "kvm-type",
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index f5b74c8dd8c5..0f78f1246c7f 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,7 @@ bool kvm_dirty_ring_enabled(void);
 uint32_t kvm_dirty_ring_size(void);
 
 int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
 #endif
-- 
2.34.1




[RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot

2023-09-13 Thread Xiaoyao Li
From: Chao Peng 

Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up a memory region that is
backed both by hva-based shared memory and by gmem-fd-based private
memory.

Signed-off-by: Chao Peng 
Codeveloped-by: Xiaoyao Li 
Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c  | 55 ++--
 accel/kvm/trace-events   |  2 +-
 include/sysemu/kvm_int.h |  2 ++
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 185ae16d9620..91cee0878366 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -288,35 +288,68 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void 
*ram,
 static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, 
bool new)
 {
 KVMState *s = kvm_state;
-struct kvm_userspace_memory_region mem;
+struct kvm_userspace_memory_region2 mem;
+static int cap_user_memory2 = -1;
 int ret;
 
+if (cap_user_memory2 == -1) {
+cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2);
+}
+
+if (!cap_user_memory2 && slot->gmem_fd >= 0) {
+error_report("%s, KVM doesn't support gmem!", __func__);
+exit(1);
+}
+
 mem.slot = slot->slot | (kml->as_id << 16);
 mem.guest_phys_addr = slot->start_addr;
 mem.userspace_addr = (unsigned long)slot->ram;
 mem.flags = slot->flags;
+mem.gmem_fd = slot->gmem_fd;
+mem.gmem_offset = slot->ofs;
 
if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) {
 /* Set the slot size to 0 before setting the slot to the desired
  * value. This is needed based on KVM commit 75d61fbc. */
 mem.memory_size = 0;
-ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+
+if (cap_user_memory2) {
+ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+} else {
+ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+   }
 if (ret < 0) {
 goto err;
 }
 }
 mem.memory_size = slot->memory_size;
-ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+if (cap_user_memory2) {
+ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+} else {
+ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+}
 slot->old_flags = mem.flags;
 err:
 trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
   mem.guest_phys_addr, mem.memory_size,
-  mem.userspace_addr, ret);
+  mem.userspace_addr, mem.gmem_fd,
+ mem.gmem_offset, ret);
 if (ret < 0) {
-error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
- " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
- __func__, mem.slot, slot->start_addr,
- (uint64_t)mem.memory_size, strerror(errno));
+if (cap_user_memory2) {
+error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
+" start=0x%" PRIx64 ", size=0x%" PRIx64 ","
+" flags=0x%" PRIx32 ","
+" gmem_fd=%" PRId32 ", gmem_offset=0x%" PRIx64 ": %s",
+__func__, mem.slot, slot->start_addr,
+(uint64_t)mem.memory_size, mem.flags,
+mem.gmem_fd, (uint64_t)mem.gmem_offset,
+strerror(errno));
+} else {
+error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
+" start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
+__func__, mem.slot, slot->start_addr,
+(uint64_t)mem.memory_size, strerror(errno));
+}
 }
 return ret;
 }
@@ -472,6 +505,9 @@ static int kvm_mem_flags(MemoryRegion *mr)
 if (readonly && kvm_readonly_mem_allowed) {
 flags |= KVM_MEM_READONLY;
 }
+if (memory_region_has_gmem_fd(mr)) {
+flags |= KVM_MEM_PRIVATE;
+}
 return flags;
 }
 
@@ -1402,6 +1438,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 mem->ram_start_offset = ram_start_offset;
 mem->ram = ram;
 mem->flags = kvm_mem_flags(mr);
+mem->gmem_fd = mr->ram_block->gmem_fd;
+mem->ofs = (uint8_t*)ram - mr->ram_block->host;
+
 kvm_slot_init_dirty_bitmap(mem);
 err = kvm_set_user_memory_region(kml, mem, true);
 if (err) {
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 14ebfa1b991c..80694683acea 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route 
virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign

[RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem

2023-09-13 Thread Xiaoyao Li
From: Chao Peng 

Add KVM gmem support to RAMBlock so that both normal hva-based memory
and kvm gmem fd based private memory can be associated with one RAMBlock.

Introduce a new flag, RAM_KVM_GMEM. When it is set, QEMU calls the KVM
ioctl to create private gmem for the RAMBlock.

Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c | 17 +
 include/exec/memory.h   |  3 +++
 include/exec/ramblock.h |  1 +
 include/sysemu/kvm.h|  2 ++
 softmmu/physmem.c   | 18 +++---
 5 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 60aacd925393..185ae16d9620 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -4225,3 +4225,20 @@ void query_stats_schemas_cb(StatsSchemaList **result, 
Error **errp)
 query_stats_schema_vcpu(first_cpu, &stats_args);
 }
 }
+
+int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
+{
+int fd;
+struct kvm_create_guest_memfd gmem = {
+.size = size,
+/* TODO: to decide whether KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is supported */
+.flags = flags,
+};
+
+fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &gmem);
+if (fd < 0) {
+error_setg_errno(errp, errno, "%s: error creating kvm gmem\n", __func__);
+}
+
+return fd;
+}
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 68284428f87c..227cb2578e95 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -235,6 +235,9 @@ typedef struct IOMMUTLBEvent {
 /* RAM is an mmap-ed named file */
 #define RAM_NAMED_FILE (1 << 9)
 
+/* RAM can be private that has kvm gmem backend */
+#define RAM_KVM_GMEM(1 << 10)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 69c6a5390293..0d158b3909c9 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -41,6 +41,7 @@ struct RAMBlock {
 QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
 int fd;
 uint64_t fd_offset;
+int gmem_fd;
 size_t page_size;
 /* dirty bitmap used during migration */
 unsigned long *bmap;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 115f0cca79d1..f5b74c8dd8c5 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -580,4 +580,6 @@ bool kvm_arch_cpu_check_are_resettable(void);
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
+
+int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
 #endif
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1fe..2d98a88f41f0 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1824,6 +1824,16 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp)
 }
 }
 
+if (kvm_enabled() && new_block->flags & RAM_KVM_GMEM &&
+new_block->gmem_fd < 0) {
+new_block->gmem_fd = kvm_create_guest_memfd(new_block->max_length,
+0, errp);
+if (new_block->gmem_fd < 0) {
+qemu_mutex_unlock_ramlist();
+return;
+}
+}
+
 new_ram_size = MAX(old_ram_size,
   (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
 if (new_ram_size > old_ram_size) {
@@ -1885,7 +1895,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
 
 /* Just support these ram flags by now. */
 assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
-  RAM_PROTECTED | RAM_NAMED_FILE)) == 0);
+  RAM_PROTECTED | RAM_NAMED_FILE | RAM_KVM_GMEM)) == 0);
 
 if (xen_enabled()) {
 error_setg(errp, "-mem-path not supported with Xen");
@@ -1920,6 +1930,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
 new_block->used_length = size;
 new_block->max_length = size;
 new_block->flags = ram_flags;
+new_block->gmem_fd = -1;
 new_block->host = file_ram_alloc(new_block, size, fd, readonly,
  !file_size, offset, errp);
 if (!new_block->host) {
@@ -1978,7 +1989,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, 
ram_addr_t max_size,
 Error *local_err = NULL;
 
 assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
-  RAM_NORESERVE)) == 0);
+  RAM_NORESERVE| RAM_KVM_GMEM)) == 0);
 assert(!host ^ (ram_flags & RAM_PREALLOC));
 
 size = HOST_PAGE_ALIGN(size);
@@ -1990,6 +2001,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, 
ram_addr_t max_size,
 new_block->max_length = max_size;
 assert(max_size >= size);
 new_block->fd = -1;
+new_block->gmem_fd = -1;
 new_block->page_size = qemu_real_host_page_size();
 new_block->host = host;
 new_block->fla

[RFC PATCH v2 17/21] kvm: handle KVM_EXIT_MEMORY_FAULT

2023-09-13 Thread Xiaoyao Li
From: Chao Peng 

Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens. It indicates that userspace needs to
perform the memory conversion on the RAMBlock to give the memory the
desired attribute, i.e., private or shared.

Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has a
gmem memory backend.

Signed-off-by: Chao Peng 
Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c | 54 +
 1 file changed, 54 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7e32ee83b258..c67aa66b0559 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3040,6 +3040,50 @@ static void kvm_eat_signals(CPUState *cpu)
 } while (sigismember(&chkset, SIG_IPI));
 }
 
+static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
+{
+MemoryRegionSection section;
+ram_addr_t offset;
+RAMBlock *rb;
+void *addr;
+int ret = -1;
+
+section = memory_region_find(get_system_memory(), start, size);
+if (!section.mr) {
+return ret;
+}
+
+if (memory_region_has_gmem_fd(section.mr)) {
+if (to_private) {
+ret = kvm_set_memory_attributes_private(start, size);
+} else {
+ret = kvm_set_memory_attributes_shared(start, size);
+}
+
+if (ret) {
+memory_region_unref(section.mr);
+return ret;
+}
+
+addr = memory_region_get_ram_ptr(section.mr) +
+   section.offset_within_region;
+rb = qemu_ram_block_from_host(addr, false, &offset);
+/*
+ * With KVM_SET_MEMORY_ATTRIBUTES by kvm_set_memory_attributes(),
+ * operation on underlying file descriptor is only for releasing
+ * unnecessary pages.
+ */
+ram_block_convert_range(rb, offset, size, to_private);
+} else {
+warn_report("Convert non guest-memfd backed memory region "
+"(0x%"HWADDR_PRIx" ,+ 0x%"HWADDR_PRIx") to %s",
+start, size, to_private ? "private" : "shared");
+}
+
+memory_region_unref(section.mr);
+return ret;
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
 struct kvm_run *run = cpu->kvm_run;
@@ -3198,6 +3242,16 @@ int kvm_cpu_exec(CPUState *cpu)
 break;
 }
 break;
+case KVM_EXIT_MEMORY_FAULT:
+if (run->memory.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
+error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
+ (uint64_t)run->memory.flags);
+ret = -1;
+break;
+}
+ret = kvm_convert_memory(run->memory.gpa, run->memory.size,
+ run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE);
+break;
 default:
 DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(cpu, run);
-- 
2.34.1




[RFC PATCH v2 12/21] kvm/memory: Introduce the infrastructure to set the default shared/private value

2023-09-13 Thread Xiaoyao Li
Introduce a new flag, RAM_DEFAULT_PRIVATE, for RAMBlock. It indicates
whether the block's default attribute is private or not.

Set the RAM range to private explicitly when it defaults to private.

Originated-from: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c   | 10 ++
 include/exec/memory.h |  6 ++
 softmmu/memory.c  | 13 +
 3 files changed, 29 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eeccc6317fa9..7e32ee83b258 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1487,6 +1487,16 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 strerror(-err));
 abort();
 }
+
+if (memory_region_is_default_private(mr)) {
+err = kvm_set_memory_attributes_private(start_addr, slot_size);
+if (err) {
+error_report("%s: failed to set memory attribute private: 
%s\n",
+ __func__, strerror(-err));
+exit(1);
+}
+}
+
 start_addr += slot_size;
 ram_start_offset += slot_size;
 ram += slot_size;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4b8486ca3632..2c738b5dc420 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -238,6 +238,9 @@ typedef struct IOMMUTLBEvent {
 /* RAM can be private that has kvm gmem backend */
 #define RAM_KVM_GMEM(1 << 10)
 
+/* RAM is default private */
+#define RAM_DEFAULT_PRIVATE (1 << 11)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
@@ -1684,6 +1687,9 @@ bool memory_region_is_protected(MemoryRegion *mr);
  */
 bool memory_region_has_gmem_fd(MemoryRegion *mr);
 
+void memory_region_set_default_private(MemoryRegion *mr);
+bool memory_region_is_default_private(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index e69a5f96d5d1..dc5d0d7703b5 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1851,6 +1851,19 @@ bool memory_region_has_gmem_fd(MemoryRegion *mr)
 return mr->ram_block && mr->ram_block->gmem_fd >= 0;
 }
 
+bool memory_region_is_default_private(MemoryRegion *mr)
+{
+return memory_region_has_gmem_fd(mr) &&
+   (mr->ram_block->flags & RAM_DEFAULT_PRIVATE);
+}
+
+void memory_region_set_default_private(MemoryRegion *mr)
+{
+if (memory_region_has_gmem_fd(mr)) {
+mr->ram_block->flags |= RAM_DEFAULT_PRIVATE;
+}
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
 uint8_t mask = mr->dirty_log_mask;
-- 
2.34.1




[RFC PATCH v2 01/21] *** HACK *** linux-headers: Update headers to pull in gmem APIs

2023-09-13 Thread Xiaoyao Li
This patch needs to be regenerated with the script

scripts/update-linux-headers.sh

once gmem fd support is upstreamed in the Linux kernel.
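
For reference, the script is normally run against a kernel source tree,
along the lines of (both paths below are placeholders):

  $ ./scripts/update-linux-headers.sh /path/to/linux /path/to/qemu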

Signed-off-by: Xiaoyao Li 
---
 linux-headers/asm-x86/kvm.h |  3 +++
 linux-headers/linux/kvm.h   | 50 +
 2 files changed, 53 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 2b3a8f7bd2c0..003fb745347c 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -560,4 +560,7 @@ struct kvm_pmu_event_filter {
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
 #define KVM_EXIT_HYPERCALL_LONG_MODE   BIT(0)
 
+#define KVM_X86_DEFAULT_VM 0
+#define KVM_X86_SW_PROTECTED_VM1
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 1f3fa4d8..278bed78f98e 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -95,6 +95,19 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+/* for KVM_SET_USER_MEMORY_REGION2 */
+struct kvm_userspace_memory_region2 {
+   __u32 slot;
+   __u32 flags;
+   __u64 guest_phys_addr;
+   __u64 memory_size;
+   __u64 userspace_addr;
+   __u64 gmem_offset;
+   __u32 gmem_fd;
+   __u32 pad1;
+   __u64 pad2[14];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
  * userspace, other bits are reserved for kvm internal use which are defined
@@ -102,6 +115,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES(1UL << 0)
 #define KVM_MEM_READONLY   (1UL << 1)
+#define KVM_MEM_PRIVATE(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -264,6 +278,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI35
 #define KVM_EXIT_RISCV_CSR36
 #define KVM_EXIT_NOTIFY   37
+#define KVM_EXIT_MEMORY_FAULT 38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -506,6 +521,13 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
__u32 flags;
} notify;
+   /* KVM_EXIT_MEMORY_FAULT */
+   struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1ULL << 3)
+   __u64 flags;
+   __u64 gpa;
+   __u64 size;
+   } memory;
/* Fix the size of the union. */
char padding[256];
};
@@ -1188,6 +1210,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COUNTER_OFFSET 227
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
+#define KVM_CAP_USER_MEMORY2 230
+#define KVM_CAP_MEMORY_ATTRIBUTES 231
+#define KVM_CAP_VM_TYPES 232
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1462,6 +1487,8 @@ struct kvm_vfio_spapr_tce {
struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \
+struct kvm_userspace_memory_region2)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
@@ -2245,4 +2272,27 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST(1 << 0)
 
+/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
+#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES_IOR(KVMIO,  0xd2, __u64)
+#define KVM_SET_MEMORY_ATTRIBUTES  _IOW(KVMIO,  0xd3, struct kvm_memory_attributes)
+
+struct kvm_memory_attributes {
+   __u64 address;
+   __u64 size;
+   __u64 attributes;
+   __u64 flags;
+};
+
+#define KVM_MEMORY_ATTRIBUTE_PRIVATE   (1ULL << 3)
+
+#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+
+#define KVM_GUEST_MEMFD_ALLOW_HUGEPAGE (1ULL << 0)
+
+struct kvm_create_guest_memfd {
+   __u64 size;
+   __u64 flags;
+   __u64 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
-- 
2.34.1




[RFC PATCH v2 14/21] physmem: replace function name with __func__ in ram_block_discard_range()

2023-09-13 Thread Xiaoyao Li
Signed-off-by: Xiaoyao Li 
---
 softmmu/physmem.c | 34 +++---
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 2d98a88f41f0..34d580ec0d39 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3440,16 +3440,15 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t 
start, size_t length)
 uint8_t *host_startaddr = rb->host + start;
 
 if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
-error_report("ram_block_discard_range: Unaligned start address: %p",
- host_startaddr);
+error_report("%s: Unaligned start address: %p",
+ __func__, host_startaddr);
 goto err;
 }
 
 if ((start + length) <= rb->max_length) {
 bool need_madvise, need_fallocate;
 if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
-error_report("ram_block_discard_range: Unaligned length: %zx",
- length);
+error_report("%s: Unaligned length: %zx", __func__, length);
 goto err;
 }
 
@@ -3479,27 +3478,26 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t 
start, size_t length)
  * file.
  */
 if (!qemu_ram_is_shared(rb)) {
-warn_report_once("ram_block_discard_range: Discarding RAM"
+warn_report_once("%s: Discarding RAM"
  " in private file mappings is possibly"
  " dangerous, because it will modify the"
  " underlying file and will affect other"
- " users of the file");
+ " users of the file", __func__);
 }
 
 ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
 start, length);
 if (ret) {
 ret = -errno;
-error_report("ram_block_discard_range: Failed to fallocate "
- "%s:%" PRIx64 " +%zx (%d)",
- rb->idstr, start, length, ret);
+error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx 
(%d)",
+ __func__, rb->idstr, start, length, ret);
 goto err;
 }
 #else
 ret = -ENOSYS;
-error_report("ram_block_discard_range: fallocate not 
available/file"
+error_report("%s: fallocate not available/file"
  "%s:%" PRIx64 " +%zx (%d)",
- rb->idstr, start, length, ret);
+ __func__, rb->idstr, start, length, ret);
 goto err;
 #endif
 }
@@ -3517,25 +3515,23 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t 
start, size_t length)
 }
 if (ret) {
 ret = -errno;
-error_report("ram_block_discard_range: Failed to discard range 
"
+error_report("%s: Failed to discard range "
  "%s:%" PRIx64 " +%zx (%d)",
- rb->idstr, start, length, ret);
+ __func__, rb->idstr, start, length, ret);
 goto err;
 }
 #else
 ret = -ENOSYS;
-error_report("ram_block_discard_range: MADVISE not available"
- "%s:%" PRIx64 " +%zx (%d)",
- rb->idstr, start, length, ret);
+error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
+ __func__, rb->idstr, start, length, ret);
 goto err;
 #endif
 }
 trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
   need_madvise, need_fallocate, ret);
 } else {
-error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
- "/%zx/" RAM_ADDR_FMT")",
- rb->idstr, start, length, rb->max_length);
+error_report("%s: Overrun block '%s' (%" PRIu64 "/%zx/" 
RAM_ADDR_FMT")",
+ __func__, rb->idstr, start, length, rb->max_length);
 }
 
 err:
-- 
2.34.1




[RFC PATCH v2 09/21] target/i386: Introduce kvm_confidential_guest_init()

2023-09-13 Thread Xiaoyao Li
Introduce a separate function, kvm_confidential_guest_init(), which
dispatches to the confidential-guest-specific initialization function
based on the ms->cgs type.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
Reviewed-by: Philippe Mathieu-Daudé 
---
 target/i386/kvm/kvm.c | 11 ++-
 target/i386/sev.c |  1 -
 target/i386/sev.h |  2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index d1cf6c1f63b3..fb1be16471b4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2583,6 +2583,15 @@ static void register_smram_listener(Notifier *n, void 
*unused)
  &smram_address_space, 1, "kvm-smram");
 }
 
+static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
+{
+if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
+return sev_kvm_init(ms->cgs, errp);
+}
+
+return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
 uint64_t identity_base = 0xfffbc000;
@@ -2603,7 +2612,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  * mechanisms are supported in future (e.g. TDX), they'll need
  * their own initialization either here or elsewhere.
  */
-ret = sev_kvm_init(ms->cgs, &local_err);
+ret = kvm_confidential_guest_init(ms, &local_err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index fe2144c0388b..5aa04863846d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -39,7 +39,6 @@
 #include "hw/i386/pc.h"
 #include "exec/address-spaces.h"
 
-#define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 
 
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 7b1528248a54..64fbf186dbd2 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -20,6 +20,8 @@
 
 #include "exec/confidential-guest-support.h"
 
+#define TYPE_SEV_GUEST "sev-guest"
+
 #define SEV_POLICY_NODBG0x1
 #define SEV_POLICY_NOKS 0x2
 #define SEV_POLICY_ES   0x4
-- 
2.34.1




[RFC PATCH v2 08/21] target/i386: Implement mc->kvm_type() to get VM type

2023-09-13 Thread Xiaoyao Li
Implement mc->kvm_type() for i386 machines. It provides a way for the
user to create a SW_PROTECTED_VM.

Also store the vm_type in the machine state so that other code can query
what the VM type is.

Signed-off-by: Xiaoyao Li 
---
 hw/i386/x86.c  | 12 
 include/hw/i386/x86.h  |  1 +
 target/i386/kvm/kvm.c  | 30 ++
 target/i386/kvm/kvm_i386.h |  1 +
 4 files changed, 44 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123be..660f83935315 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1382,6 +1382,17 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, 
const char *name,
 qapi_free_SgxEPCList(list);
 }
 
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+X86MachineState *x86ms = X86_MACHINE(ms);
+int kvm_type;
+
+kvm_type = kvm_get_vm_type(ms, vm_type);
+x86ms->vm_type = kvm_type;
+
+return kvm_type;
+}
+
 static void x86_machine_initfn(Object *obj)
 {
 X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1406,6 +1417,7 @@ static void x86_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
 mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+mc->kvm_type = x86_kvm_type;
 x86mc->save_tsc_khz = true;
 x86mc->fwcfg_dma_enabled = true;
 nc->nmi_monitor_handler = x86_nmi;
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index da19ae15463a..ab1d38569019 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -41,6 +41,7 @@ struct X86MachineState {
 MachineState parent;
 
 /*< public >*/
+unsigned int vm_type;
 
 /* Pointers to devices and objects: */
 ISADevice *rtc;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f8cc8eb1fe70..d1cf6c1f63b3 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -32,6 +32,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "sw-protected-vm.h"
 #include "xen-emu.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
@@ -154,6 +155,35 @@ static KVMMSRHandlers 
msr_handlers[KVM_MSR_FILTER_MAX_RANGES];
 static RateLimit bus_lock_ratelimit_ctrl;
 static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
+static const char* vm_type_name[] = {
+[KVM_X86_DEFAULT_VM] = "default",
+[KVM_X86_SW_PROTECTED_VM] = "sw-protected-vm",
+};
+
+int kvm_get_vm_type(MachineState *ms, const char *vm_type)
+{
+int kvm_type = KVM_X86_DEFAULT_VM;
+
+if (ms->cgs && object_dynamic_cast(OBJECT(ms->cgs), TYPE_SW_PROTECTED_VM)) {
+kvm_type = KVM_X86_SW_PROTECTED_VM;
+}
+
+/*
+ * old KVM doesn't support KVM_CAP_VM_TYPES and KVM_X86_DEFAULT_VM
+ * is always supported
+ */
+if (kvm_type == KVM_X86_DEFAULT_VM) {
+return kvm_type;
+}
+
+if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
+error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
+exit(1);
+}
+
+return kvm_type;
+}
+
 int kvm_has_pit_state2(void)
 {
 return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index e24753abfe6a..ea3a5b174ac0 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -37,6 +37,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_get_vm_type(MachineState *ms, const char *vm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_after_reset_vcpu(X86CPU *cpu);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
-- 
2.34.1




[RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM

2023-09-13 Thread Xiaoyao Li
From: Isaku Yamahata 

Add a new property "private" to memory backends. When it's set to true,
it indicates the RAMblock of the backend also requires kvm gmem.
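
For example (mirroring the invocation in the cover letter), a private
backend would be created with:

  $qemu -object memory-backend-ram,id=mem0,size=1G,private=on ...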

Signed-off-by: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
---
 backends/hostmem-file.c  |  1 +
 backends/hostmem-memfd.c |  1 +
 backends/hostmem-ram.c   |  1 +
 backends/hostmem.c   | 18 ++
 include/sysemu/hostmem.h |  2 +-
 qapi/qom.json|  4 
 6 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index b4335a80e6da..861f76f2de8a 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -56,6 +56,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 name = host_memory_backend_get_name(backend);
 ram_flags = backend->share ? RAM_SHARED : 0;
 ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+ram_flags |= backend->private ? RAM_KVM_GMEM : 0;
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
 ram_flags |= RAM_NAMED_FILE;
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 3fc85c3db81b..f49990ce3bbd 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 name = host_memory_backend_get_name(backend);
 ram_flags = backend->share ? RAM_SHARED : 0;
 ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+ram_flags |= backend->private ? RAM_KVM_GMEM : 0;
 memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
backend->size, ram_flags, fd, 0, errp);
 g_free(name);
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index b8e55cdbd0f8..d6c46250dcfd 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -30,6 +30,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 name = host_memory_backend_get_name(backend);
 ram_flags = backend->share ? RAM_SHARED : 0;
 ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+ram_flags |= backend->private ? RAM_KVM_GMEM : 0;
 memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
backend->size, ram_flags, errp);
 g_free(name);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 747e7838c031..dbdbb0aafd45 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -461,6 +461,20 @@ static void host_memory_backend_set_reserve(Object *o, 
bool value, Error **errp)
 }
 backend->reserve = value;
 }
+
+static bool host_memory_backend_get_private(Object *o, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+return backend->private;
+}
+
+static void host_memory_backend_set_private(Object *o, bool value, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+backend->private = value;
+}
 #endif /* CONFIG_LINUX */
 
 static bool
@@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 host_memory_backend_get_reserve, host_memory_backend_set_reserve);
 object_class_property_set_description(oc, "reserve",
 "Reserve swap space (or huge pages) if applicable");
+object_class_property_add_bool(oc, "private",
+host_memory_backend_get_private, host_memory_backend_set_private);
+object_class_property_set_description(oc, "private",
+"Use KVM gmem private memory");
 #endif /* CONFIG_LINUX */
 /*
  * Do not delete/rename option. This option must be considered stable
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 39326f1d4f9c..d88970395618 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -65,7 +65,7 @@ struct HostMemoryBackend {
 /* protected */
 uint64_t size;
 bool merge, dump, use_canonical_path;
-bool prealloc, is_mapped, share, reserve;
+bool prealloc, is_mapped, share, reserve, private;
 uint32_t prealloc_threads;
 ThreadContext *prealloc_context;
 DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
diff --git a/qapi/qom.json b/qapi/qom.json
index fa3e88c8e6ab..d28c5403bc0f 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -605,6 +605,9 @@
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
 #
+# @private: if true, use KVM gmem private memory (default: false)
+# (since 8.2)
+#
 # @size: size of the memory region in bytes
 #
 # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
@@ -631,6 +634,7 @@
 '*prealloc-context': 'str',
 '*share': 'bool',
 '*reserve': 'bool',
+'*private': 'bool',
 'size': 'size',
 '*x-use-canonical-path-for-ramblock-id': 'bool' } }
 
-- 
2.34.1




[RFC PATCH v2 00/21] QEMU gmem implemention

2023-09-13 Thread Xiaoyao Li
This is the v2 RFC of enabling KVM gmem[1] as the backend for private
memory.

For confidential computing, KVM provides gmem/guest_mem interfaces for
userspace, like QEMU, to allocate user-inaccessible private memory. This
series aims to add gmem support to QEMU's RAMBlock so that each RAMBlock
can have both hva-based shared memory and gmem_fd based private memory.
QEMU does the shared-private conversion on KVM_EXIT_MEMORY_FAULT and
discards the memory on the now-unused side.

It chooses the design of adding a "private" property to the host memory
backend. If the "private" property is set, QEMU will allocate/create KVM
gmem when initializing the RAMBlock of the memory backend.

This series also introduces the first user of kvm gmem,
KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
can be created with

  $qemu -object sw-protected-vm,id=sp-vm0 \
-object memory-backend-ram,id=mem0,size=1G,private=on \
-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
...

Unfortunately, this patch series fails to boot OVMF at a very early
stage due to a triple fault, because KVM doesn't support emulating string
IO to private memory.

This version still leaves some opens to be discussed:
1. Does the "private" property need to be user-settable?

   It seems unnecessary because the vm-type is already determined. If the
   VM is a confidential guest, then the RAM of the guest must be able to
   be mapped as private, i.e., have a kvm gmem backend. So QEMU can
   determine the value of the "private" property automatically based on
   the vm type.

   This also aligns with the board-internal MemoryRegions that need to
   have a kvm gmem backend, e.g., TDX requires OVMF to act as private
   memory, so the bios memory region needs to have a kvm gmem fd
   associated. QEMU will no doubt do that internally and automatically.

2. hugepage support.

   KVM gmem can be allocated from hugetlbfs. How does QEMU determine
   when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? The
   easiest solution is to create KVM gmem with
   KVM_GUEST_MEMFD_ALLOW_HUGEPAGE only when the memory backend is a
   HostMemoryBackendFile on hugetlbfs (a rough sketch follows the list
   of opens below).

3. What is KVM_X86_SW_PROTECTED_VM going to look like? And do we need it?

   This series implements KVM_X86_SW_PROTECTED_VM because it is introduced
   together with gmem on the KVM side and it is supposed to be the first
   user that requires KVM gmem. However, the implementation is incomplete
   and the definition of how KVM_X86_SW_PROTECTED_VM works is still
   missing.

Any other ideas/opens/questions are welcome.
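
Returning to open (2) above, a rough sketch of that hugetlbfs heuristic in
ram_block_add() could look like the following; the flag plumbing shown
here is an assumption layered on patch 02's kvm_create_guest_memfd(), not
something this series implements:

    /* Hypothetical: request huge pages only for hugetlbfs-backed blocks. */
    uint64_t gmem_flags = 0;

    if (new_block->fd >= 0 &&
        qemu_fd_getpagesize(new_block->fd) > qemu_real_host_page_size()) {
        gmem_flags |= KVM_GUEST_MEMFD_ALLOW_HUGEPAGE;
    }
    new_block->gmem_fd = kvm_create_guest_memfd(new_block->max_length,
                                                gmem_flags, errp);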

Besides, the TDX QEMU implementation, which can be found at [2], is based
on this series to provide private gmem for TD private memory. It works
with the corresponding KVM [3] to boot a TDX guest.

[1] https://lore.kernel.org/all/20230718234512.1690985-1-sea...@google.com/
[2] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream
[3] 
https://github.com/intel/tdx/tree/kvm-upstream-2023.07.27-v6.5-rc2-workaround

===
Changes since rfc v1:
- Implement KVM_X86_SW_PROTECTED_VM with confidential-guest-support
interface;
- rename memory_region_can_be_private() to memory_region_has_gmem_fd();
- allocate kvm gmem fd when creating/initializing the memory backend by
introducing the RAM_KVM_GMEM flag;


Chao Peng (3):
  RAMBlock: Add support of KVM private gmem
  kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  kvm: handle KVM_EXIT_MEMORY_FAULT

Isaku Yamahata (4):
  HostMem: Add private property and associate it with RAM_KVM_GMEM
  trace/kvm: Add trace for page convertion between shared and private
  pci-host/q35: Move PAM initialization above SMRAM initialization
  q35: Introduce smm_ranges property for q35-pci-host

Xiaoyao Li (14):
  *** HACK *** linux-headers: Update headers to pull in gmem APIs
  memory: Introduce memory_region_has_gmem_fd()
  i386: Add support for sw-protected-vm object
  i386/pc: Drop pc_machine_kvm_type()
  target/i386: Implement mc->kvm_type() to get VM type
  target/i386: Introduce kvm_confidential_guest_init()
  i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protcted-vm
specific functions
  kvm: Introduce support for memory_attributes
  kvm/memory: Introduce the infrastructure to set the default
shared/private value
  i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM
  physmem: replace function name with __func__ in
ram_block_discard_range()
  physmem: extract ram_block_discard_range_fd() from
ram_block_discard_range()
  physmem: Introduce ram_block_convert_range()
  i386: Disable SMM mode for X86_SW_PROTECTED_VM

 accel/kvm/kvm-all.c   | 180 -
 accel/kvm/trace-events|   4 +-
 backends/hostmem-file.c   |   1 +
 backends/hostmem-memfd.c  |   1 +
 backends/hostmem-ram.c|   1 +
 backends/hostmem.c|  18 +++
 hw/i386/pc.c  |   5 -
 hw/i386/pc_q35.c  |   3 +-
 hw/i386/x86.c |  12 ++
 hw/pci-host/q35.c |  61 

[RFC PATCH v2 06/21] i386: Add support for sw-protected-vm object

2023-09-13 Thread Xiaoyao Li
Introduce the sw-protected-vm object, which implements the
CONFIDENTIAL_GUEST_SUPPORT interface and will be used to create an
X86_SW_PROTECTED_VM via

  $qemu -machine ...,confidential-guest-support=sp-vm0  \
-object sw-protected-vm,id=sp-vm0

Signed-off-by: Xiaoyao Li 
---
 qapi/qom.json |  1 +
 target/i386/kvm/meson.build   |  1 +
 target/i386/kvm/sw-protected-vm.c | 35 +++
 target/i386/kvm/sw-protected-vm.h | 17 +++
 4 files changed, 54 insertions(+)
 create mode 100644 target/i386/kvm/sw-protected-vm.c
 create mode 100644 target/i386/kvm/sw-protected-vm.h

diff --git a/qapi/qom.json b/qapi/qom.json
index d28c5403bc0f..be054ee2f348 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -944,6 +944,7 @@
   'if': 'CONFIG_SECRET_KEYRING' },
 'sev-guest',
 'thread-context',
+'sw-protected-vm',
 's390-pv-guest',
 'throttle-group',
 'tls-creds-anon',
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 40fbde96cac6..a31e760b3f19 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -5,6 +5,7 @@ i386_softmmu_kvm_ss = ss.source_set()
 i386_softmmu_kvm_ss.add(files(
   'kvm.c',
   'kvm-cpu.c',
+  'sw-protected-vm.c',
 ))
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
diff --git a/target/i386/kvm/sw-protected-vm.c b/target/i386/kvm/sw-protected-vm.c
new file mode 100644
index ..62a1d3d5d3fe
--- /dev/null
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -0,0 +1,35 @@
+/*
+ * QEMU X86_SW_PROTECTED_VM SUPPORT
+ *
+ * Author:
+ *  Xiaoyao Li 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object_interfaces.h"
+
+#include "sw-protected-vm.h"
+
+/* x86-sw-protected-vm */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(SwProtectedVm,
+   sw_protected_vm,
+   SW_PROTECTED_VM,
+   CONFIDENTIAL_GUEST_SUPPORT,
+   { TYPE_USER_CREATABLE },
+   { NULL })
+
+static void sw_protected_vm_init(Object *obj)
+{
+}
+
+static void sw_protected_vm_finalize(Object *obj)
+{
+}
+
+static void sw_protected_vm_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/sw-protected-vm.h b/target/i386/kvm/sw-protected-vm.h
new file mode 100644
index ..db192a81c75e
--- /dev/null
+++ b/target/i386/kvm/sw-protected-vm.h
@@ -0,0 +1,17 @@
+#ifndef QEMU_I386_SW_PROTECTED_VM_H
+#define QEMU_I386_SW_PROTECTED_VM_H
+
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_SW_PROTECTED_VM"sw-protected-vm"
+#define SW_PROTECTED_VM(obj) OBJECT_CHECK(SwProtectedVm, (obj), TYPE_SW_PROTECTED_VM)
+
+typedef struct SwProtectedVmClass {
+ConfidentialGuestSupportClass parent_class;
+} SwProtectedVmClass;
+
+typedef struct SwProtectedVm {
+ConfidentialGuestSupport parent_obj;
+} SwProtectedVm;
+
+#endif /* QEMU_I386_SW_PROTECTED_VM_H */
-- 
2.34.1




[RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd()

2023-09-13 Thread Xiaoyao Li
Introduce memory_region_has_gmem_fd() to query whether the MemoryRegion
has a KVM gmem fd allocated.

Signed-off-by: Xiaoyao Li 
---
 include/exec/memory.h | 10 ++
 softmmu/memory.c  |  5 +
 2 files changed, 15 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 227cb2578e95..4b8486ca3632 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1674,6 +1674,16 @@ static inline bool memory_region_is_romd(MemoryRegion 
*mr)
  */
 bool memory_region_is_protected(MemoryRegion *mr);
 
+/**
+ * memory_region_has_gmem_fd: check whether a memory region has KVM gmem fd
+ * associated
+ *
+ * Returns %true if a memory region's ram_block has valid gmem fd assigned.
+ *
+ * @mr: the memory region being queried
+ */
+bool memory_region_has_gmem_fd(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce7028..e69a5f96d5d1 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1846,6 +1846,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
 return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
 }
 
+bool memory_region_has_gmem_fd(MemoryRegion *mr)
+{
+return mr->ram_block && mr->ram_block->gmem_fd >= 0;
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
 uint8_t mask = mr->dirty_log_mask;
-- 
2.34.1




Re: [PATCH] vhost: Add a defensive check in vhost_commit against wrong deallocation

2023-09-13 Thread Jason Wang
On Wed, Sep 13, 2023 at 3:47 PM Eric Auger  wrote:
>
> In vhost_commit(), it may happen that dev->mem_sections and
> dev->tmp_sections are equal, in which case, unconditionally
> freeing old_sections at the end of the function will also free
> dev->mem_sections used on subsequent call leading to a segmentation
> fault.
>
> Check this situation before deallocating memory.
>
> Signed-off-by: Eric Auger 
> Fixes: c44317efecb2 ("vhost: Build temporary section list and deref
> after commit")
> CC: QEMU Stable 
>
> ---
>
> This SIGSEV condition can be reproduced with
> https://lore.kernel.org/all/20230904080451.424731-1-eric.au...@redhat.com/#r
> This is most probably happening in a situation where the memory API is
> used in a wrong manner but well.

Any chance to move this to the memory API or we may end up with things
like this in another listener?

Thanks

> ---
>  hw/virtio/vhost.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index e2f6ffb446..c02c599ef0 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -545,6 +545,11 @@ static void vhost_commit(MemoryListener *listener)
>  dev->mem_sections = dev->tmp_sections;
>  dev->n_mem_sections = dev->n_tmp_sections;
>
> +if (old_sections == dev->mem_sections) {
> +assert(n_old_sections ==  dev->n_mem_sections);
> +return;
> +}
> +
>  if (dev->n_mem_sections != n_old_sections) {
>  changed = true;
>  } else {
> --
> 2.41.0
>




Re: [PATCH] vdpa net: zero vhost_vdpa iova_tree pointer at cleanup

2023-09-13 Thread Jason Wang
On Wed, Sep 13, 2023 at 8:34 PM Eugenio Pérez  wrote:
>
> Not zeroing it causes a SIGSEGV if the live migration is cancelled, at
> net device restart.
>
> This is caused because CVQ tries to reuse the iova_tree that is present
> in the first vhost_vdpa device at the end of vhost_vdpa_net_cvq_start.
> As a consequence, it tries to access an iova_tree that has been already
> free.
>
> Fixes: 00ef422e9fbf ("vdpa net: move iova tree creation from init to start")
> Reported-by: Yanhui Ma 
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

Thanks

> ---
>  net/vhost-vdpa.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 34202ca009..1714ff4b11 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -385,6 +385,8 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
>  dev = s->vhost_vdpa.dev;
>  if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>  g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> +} else {
> +s->vhost_vdpa.iova_tree = NULL;
>  }
>  }
>
> --
> 2.39.3
>




Re: [PATCH v6 52/57] target/loongarch: Implement xvreplve xvinsve0 xvpickve

2023-09-13 Thread Richard Henderson

On 9/13/23 19:26, Song Gao wrote:

+static bool gen_xvrepl128(DisasContext *ctx, arg_vv_i *a, MemOp mop)
  {
-int ofs;
-TCGv_i64 desthigh, destlow, high, low;
+int index = LSX_LEN / (8 * (1 << mop));
  
-if (!avail_LSX(ctx)) {

-return false;
-}
-
-if (!check_vec(ctx, 16)) {
+if (!check_vec(ctx, 32)) {
  return true;
  }
  
-desthigh = tcg_temp_new_i64();

-destlow = tcg_temp_new_i64();
-high = tcg_temp_new_i64();
-low = tcg_temp_new_i64();
+tcg_gen_gvec_dup_mem(mop, vec_reg_offset(a->vd, 0, mop),
+ vec_reg_offset(a->vj, a->imm, mop), 16, 16);
+tcg_gen_gvec_dup_mem(mop, vec_reg_offset(a->vd, index, mop),
+ vec_reg_offset(a->vj, a->imm + index , mop), 16, 16);


I think this isn't right, because vec_reg_offset(a->vd, 0, mop) is not the beginning of 
the vector for a big-endian host -- remember the xor in vec_reg_offset.


Better as

for (i = 0; i < 32; i += 16) {
tcg_gen_gvec_dup_mem(mop, vec_full_offset(a->vd) + i,
 vec_reg_offset(a->vj, a->imm, mop) + i, 16, 16);
}


Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PULL v2 00/45] riscv-to-apply queue

2023-09-13 Thread Alistair Francis
On Tue, Sep 12, 2023 at 8:27 PM Michael Tokarev  wrote:
>
> 11.09.2023 09:42, Alistair Francis:>target/riscv: don't read CSR in 
> riscv_csrrw_do64 (2023-09-11 11:45:55 +1000)
> 2 more questions about this pull-req and -stable.
>
>
> commit 50f9464962fb41f04fd5f42e7ee2cb60942aba89
> Author: Daniel Henrique Barboza 
> Date:   Thu Jul 20 10:24:23 2023 -0300
>
>  target/riscv/cpu.c: add zmmul isa string
>
>  zmmul was promoted from experimental to ratified in commit 6d00ffad4e95.
>  Add a riscv,isa string for it.
>
>  Fixes: 6d00ffad4e95 ("target/riscv: move zmmul out of the experimental 
> properties")
>
> Does this need to be picked for -stable (based on the "Fixes" tag)?
> I don't know the full impact of this change (or lack thereof).
>
>
> commit 4cc9f284d5971ecd8055d26ef74c23ef0be8b8f5
> Author: LIU Zhiwei 
> Date:   Sat Jul 29 11:16:18 2023 +0800
>
>  target/riscv: Fix page_check_range use in fault-only-first
>
>  Commit bef6f008b98(accel/tcg: Return bool from page_check_range) converts
>  integer return value to bool type. However, it wrongly converted the use
>  of the API in riscv fault-only-first, where page_check_range < = 0, 
> should
>  be converted to !page_check_range.
>
> This one also catches an eye, the commit in question is in 8.1, and it is
> a clear bugfix (from the patch anyway).

These two are also good candidates if it isn't too late.

Alistair

>
>
> I probably should stop making such questions and rely more on Cc: qemu-stable@
> instead. It just so happened that I had a closer look at this patchset/pullreq
> while trying to cherry-pick already agreed-upon changes from there.
>
> So far, I picked the following changes for -stable from this pullreq:
>
> c255946e3d hw/char/riscv_htif: Fix printing of console characters on big 
> endian hosts
> 058096f1c5 hw/char/riscv_htif: Fix the console syscall on big endian hosts
> 50f9464962 target/riscv/cpu.c: add zmmul isa string
> 4cc9f284d5 target/riscv: Fix page_check_range use in fault-only-first
> eda633a534 target/riscv: Fix zfa fleq.d and fltq.d
> e0922b73ba hw/intc: Fix upper/lower mtime write calculation
> 9382a9eafc hw/intc: Make rtc variable names consistent
> ae7d4d625c linux-user/riscv: Use abi type for target_ucontext
> 9ff3140631 hw/riscv: virt: Fix riscv,pmu DT node path
> 3a2fc23563 target/riscv: fix satp_mode_finalize() when satp_mode.supported = 0
> 4e3adce124 target/riscv/pmp.c: respect mseccfg.RLB for pmpaddrX changes
> a7c272df82 target/riscv: Allocate itrigger timers only once
>
> Thanks,
>
> /mjt



Re: [PATCH v6 51/57] target/loongarch: Implement xvinsgr2vr xvpickve2gr

2023-09-13 Thread Richard Henderson

On 9/13/23 19:26, Song Gao wrote:

+static inline int vec_reg_offset(int regno, int index, MemOp mop)
+{
+const uint8_t size = 1 << mop;
+int offs = index * size;
+
+#if HOST_BIG_ENDIAN
+if (size < 8 ) {
+offs ^ = (8 - size);
+}
+#endif
+return offs + vec_full_offset(regno);
+}


Merge the #if into the if:

   if (HOST_BIG_ENDIAN && size < 8)
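
i.e. the quoted helper with the check folded in would read roughly:

    static inline int vec_reg_offset(int regno, int index, MemOp mop)
    {
        const uint8_t size = 1 << mop;
        int offs = index * size;

        /* HOST_BIG_ENDIAN is always defined to 0 or 1, so a C if suffices. */
        if (HOST_BIG_ENDIAN && size < 8) {
            offs ^= (8 - size);
        }
        return offs + vec_full_offset(regno);
    }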

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH v6 06/57] target/loongarch: Use gen_helper_gvec_3 for 3OP vector instructions

2023-09-13 Thread Richard Henderson

On 9/13/23 19:25, Song Gao wrote:

Signed-off-by: Song Gao
---
  target/loongarch/helper.h   | 214 +-
  target/loongarch/vec_helper.c   | 444 +---
  target/loongarch/insn_trans/trans_vec.c.inc |  19 +-
  3 files changed, 326 insertions(+), 351 deletions(-)


Reviewed-by: Richard Henderson 

r~



[PATCH v2 21/24] accel/tcg: Use CPUState in atomicity helpers

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

Makes ldst_atomicity.c.inc almost target-independent, with the exception
of TARGET_PAGE_MASK, which will be addressed in a future patch.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-8-a...@rev.ng>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 20 
 accel/tcg/user-exec.c  | 16 +++
 accel/tcg/ldst_atomicity.c.inc | 88 +-
 3 files changed, 62 insertions(+), 62 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index fcd13d522e..a7f2c848ad 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2209,7 +2209,7 @@ static uint64_t do_ld_whole_be8(CPUState *cpu, uintptr_t 
ra,
 MMULookupPageData *p, uint64_t ret_be)
 {
 int o = p->addr & 7;
-uint64_t x = load_atomic8_or_exit(cpu_env(cpu), ra, p->haddr - o);
+uint64_t x = load_atomic8_or_exit(cpu, ra, p->haddr - o);
 
 x = cpu_to_be64(x);
 x <<= o * 8;
@@ -2229,7 +2229,7 @@ static Int128 do_ld_whole_be16(CPUState *cpu, uintptr_t 
ra,
MMULookupPageData *p, uint64_t ret_be)
 {
 int o = p->addr & 15;
-Int128 x, y = load_atomic16_or_exit(cpu_env(cpu), ra, p->haddr - o);
+Int128 x, y = load_atomic16_or_exit(cpu, ra, p->haddr - o);
 int size = p->size;
 
 if (!HOST_BIG_ENDIAN) {
@@ -2373,7 +2373,7 @@ static uint16_t do_ld_2(CPUState *cpu, MMULookupPageData 
*p, int mmu_idx,
 }
 } else {
 /* Perform the load host endian, then swap if necessary. */
-ret = load_atom_2(cpu_env(cpu), ra, p->haddr, memop);
+ret = load_atom_2(cpu, ra, p->haddr, memop);
 if (memop & MO_BSWAP) {
 ret = bswap16(ret);
 }
@@ -2394,7 +2394,7 @@ static uint32_t do_ld_4(CPUState *cpu, MMULookupPageData 
*p, int mmu_idx,
 }
 } else {
 /* Perform the load host endian. */
-ret = load_atom_4(cpu_env(cpu), ra, p->haddr, memop);
+ret = load_atom_4(cpu, ra, p->haddr, memop);
 if (memop & MO_BSWAP) {
 ret = bswap32(ret);
 }
@@ -2415,7 +2415,7 @@ static uint64_t do_ld_8(CPUState *cpu, MMULookupPageData 
*p, int mmu_idx,
 }
 } else {
 /* Perform the load host endian. */
-ret = load_atom_8(cpu_env(cpu), ra, p->haddr, memop);
+ret = load_atom_8(cpu, ra, p->haddr, memop);
 if (memop & MO_BSWAP) {
 ret = bswap64(ret);
 }
@@ -2578,7 +2578,7 @@ static Int128 do_ld16_mmu(CPUState *cpu, vaddr addr,
 }
 } else {
 /* Perform the load host endian. */
-ret = load_atom_16(cpu_env(cpu), ra, l.page[0].haddr, l.memop);
+ret = load_atom_16(cpu, ra, l.page[0].haddr, l.memop);
 if (l.memop & MO_BSWAP) {
 ret = bswap128(ret);
 }
@@ -2893,7 +2893,7 @@ static void do_st_2(CPUState *cpu, MMULookupPageData *p, 
uint16_t val,
 if (memop & MO_BSWAP) {
 val = bswap16(val);
 }
-store_atom_2(cpu_env(cpu), ra, p->haddr, memop, val);
+store_atom_2(cpu, ra, p->haddr, memop, val);
 }
 }
 
@@ -2913,7 +2913,7 @@ static void do_st_4(CPUState *cpu, MMULookupPageData *p, 
uint32_t val,
 if (memop & MO_BSWAP) {
 val = bswap32(val);
 }
-store_atom_4(cpu_env(cpu), ra, p->haddr, memop, val);
+store_atom_4(cpu, ra, p->haddr, memop, val);
 }
 }
 
@@ -2933,7 +2933,7 @@ static void do_st_8(CPUState *cpu, MMULookupPageData *p, 
uint64_t val,
 if (memop & MO_BSWAP) {
 val = bswap64(val);
 }
-store_atom_8(cpu_env(cpu), ra, p->haddr, memop, val);
+store_atom_8(cpu, ra, p->haddr, memop, val);
 }
 }
 
@@ -3064,7 +3064,7 @@ static void do_st16_mmu(CPUState *cpu, vaddr addr, Int128 
val,
 if (l.memop & MO_BSWAP) {
 val = bswap128(val);
 }
-store_atom_16(cpu_env(cpu), ra, l.page[0].haddr, l.memop, val);
+store_atom_16(cpu, ra, l.page[0].haddr, l.memop, val);
 }
 return;
 }
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index d2daeafbab..f9f5cd1770 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -1002,7 +1002,7 @@ static uint16_t do_ld2_mmu(CPUArchState *env, abi_ptr 
addr,
 tcg_debug_assert((mop & MO_SIZE) == MO_16);
 cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
 haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
-ret = load_atom_2(env, ra, haddr, mop);
+ret = load_atom_2(env_cpu(env), ra, haddr, mop);
 clear_helper_retaddr();
 
 if (mop & MO_BSWAP) {
@@ -1040,7 +1040,7 @@ static uint32_t do_ld4_mmu(CPUArchState *env, abi_ptr 
addr,
 tcg_debug_assert((mop & MO_SIZE) == MO_32);
 cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
 haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
-ret = load_atom_4(env, ra, hadd

[PATCH v2 02/24] accel/tcg: Move CPUTLB definitions from cpu-defs.h

2023-09-13 Thread Richard Henderson
Accept that we will consume space in CPUState for CONFIG_USER_ONLY,
since we cannot test CONFIG_SOFTMMU within hw/core/cpu.h.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-defs.h | 150 
 include/hw/core/cpu.h   | 141 +
 2 files changed, 141 insertions(+), 150 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 0a600a312b..3915438b83 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -54,18 +54,7 @@
 
 #include "exec/target_long.h"
 
-/*
- * Fix the number of mmu modes to 16, which is also the maximum
- * supported by the softmmu tlb api.
- */
-#define NB_MMU_MODES 16
-
 #if defined(CONFIG_SOFTMMU) && defined(CONFIG_TCG)
-#include "exec/tlb-common.h"
-
-/* use a fully associative victim tlb of 8 entries */
-#define CPU_VTLB_SIZE 8
-
 #define CPU_TLB_DYN_MIN_BITS 6
 #define CPU_TLB_DYN_DEFAULT_BITS 8
 
@@ -91,143 +80,4 @@
 
 #endif /* CONFIG_SOFTMMU && CONFIG_TCG */
 
-#if defined(CONFIG_SOFTMMU)
-/*
- * The full TLB entry, which is not accessed by generated TCG code,
- * so the layout is not as critical as that of CPUTLBEntry. This is
- * also why we don't want to combine the two structs.
- */
-typedef struct CPUTLBEntryFull {
-/*
- * @xlat_section contains:
- *  - in the lower TARGET_PAGE_BITS, a physical section number
- *  - with the lower TARGET_PAGE_BITS masked off, an offset which
- *must be added to the virtual address to obtain:
- * + the ram_addr_t of the target RAM (if the physical section
- *   number is PHYS_SECTION_NOTDIRTY or PHYS_SECTION_ROM)
- * + the offset within the target MemoryRegion (otherwise)
- */
-hwaddr xlat_section;
-
-/*
- * @phys_addr contains the physical address in the address space
- * given by cpu_asidx_from_attrs(cpu, @attrs).
- */
-hwaddr phys_addr;
-
-/* @attrs contains the memory transaction attributes for the page. */
-MemTxAttrs attrs;
-
-/* @prot contains the complete protections for the page. */
-uint8_t prot;
-
-/* @lg_page_size contains the log2 of the page size. */
-uint8_t lg_page_size;
-
-/*
- * Additional tlb flags for use by the slow path. If non-zero,
- * the corresponding CPUTLBEntry comparator must have TLB_FORCE_SLOW.
- */
-uint8_t slow_flags[MMU_ACCESS_COUNT];
-
-/*
- * Allow target-specific additions to this structure.
- * This may be used to cache items from the guest cpu
- * page tables for later use by the implementation.
- */
-union {
-/*
- * Cache the attrs and shareability fields from the page table entry.
- *
- * For ARMMMUIdx_Stage2*, pte_attrs is the S2 descriptor bits [5:2].
- * Otherwise, pte_attrs is the same as the MAIR_EL1 8-bit format.
- * For shareability and guarded, as in the SH and GP fields 
respectively
- * of the VMSAv8-64 PTEs.
- */
-struct {
-uint8_t pte_attrs;
-uint8_t shareability;
-bool guarded;
-} arm;
-} extra;
-} CPUTLBEntryFull;
-#endif /* CONFIG_SOFTMMU */
-
-#if defined(CONFIG_SOFTMMU) && defined(CONFIG_TCG)
-/*
- * Data elements that are per MMU mode, minus the bits accessed by
- * the TCG fast path.
- */
-typedef struct CPUTLBDesc {
-/*
- * Describe a region covering all of the large pages allocated
- * into the tlb.  When any page within this region is flushed,
- * we must flush the entire tlb.  The region is matched if
- * (addr & large_page_mask) == large_page_addr.
- */
-vaddr large_page_addr;
-vaddr large_page_mask;
-/* host time (in ns) at the beginning of the time window */
-int64_t window_begin_ns;
-/* maximum number of entries observed in the window */
-size_t window_max_entries;
-size_t n_used_entries;
-/* The next index to use in the tlb victim table.  */
-size_t vindex;
-/* The tlb victim table, in two parts.  */
-CPUTLBEntry vtable[CPU_VTLB_SIZE];
-CPUTLBEntryFull vfulltlb[CPU_VTLB_SIZE];
-CPUTLBEntryFull *fulltlb;
-} CPUTLBDesc;
-
-/*
- * Data elements that are shared between all MMU modes.
- */
-typedef struct CPUTLBCommon {
-/* Serialize updates to f.table and d.vtable, and others as noted. */
-QemuSpin lock;
-/*
- * Within dirty, for each bit N, modifications have been made to
- * mmu_idx N since the last time that mmu_idx was flushed.
- * Protected by tlb_c.lock.
- */
-uint16_t dirty;
-/*
- * Statistics.  These are not lock protected, but are read and
- * written atomically.  This allows the monitor to print a snapshot
- * of the stats without interfering with the cpu.
- */
-size_t full_flush_count;
-size_t part_flush_count;
-size_t elide_flush_count;
-} CPUTLBCommon;
-
-/*
- * The entire softmmu tlb, for all MMU modes.
- * The meaning of each of the MMU modes is defined 

[PATCH v2 16/24] tcg: Remove TCGContext.tlb_fast_offset

2023-09-13 Thread Richard Henderson
Now that there is no padding between CPUNegativeOffsetState
and CPUArchState, this value is constant across all targets.

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h |  1 -
 accel/tcg/translate-all.c |  2 --
 tcg/tcg.c | 13 +++--
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 3cdbeaf460..7743868dc9 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -489,7 +489,6 @@ struct TCGContext {
 TCGType addr_type;/* TCG_TYPE_I32 or TCG_TYPE_I64 */
 
 #ifdef CONFIG_SOFTMMU
-int tlb_fast_offset;
 int page_mask;
 uint8_t page_bits;
 uint8_t tlb_dyn_max_bits;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 6fac5b7e29..83e07b830f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -344,8 +344,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tcg_ctx->page_bits = TARGET_PAGE_BITS;
 tcg_ctx->page_mask = TARGET_PAGE_MASK;
 tcg_ctx->tlb_dyn_max_bits = CPU_TLB_DYN_MAX_BITS;
-tcg_ctx->tlb_fast_offset = (int)offsetof(ArchCPU, parent_obj.neg.tlb.f)
- - (int)offsetof(ArchCPU, env);
 #endif
 tcg_ctx->insn_start_words = TARGET_INSN_START_WORDS;
 #ifdef TCG_GUEST_DEFAULT_MO
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 164532bafb..78070ee935 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -405,7 +405,8 @@ static uintptr_t G_GNUC_UNUSED 
get_jmp_target_addr(TCGContext *s, int which)
 #if defined(CONFIG_SOFTMMU) && !defined(CONFIG_TCG_INTERPRETER)
 static int tlb_mask_table_ofs(TCGContext *s, int which)
 {
-return s->tlb_fast_offset + which * sizeof(CPUTLBDescFast);
+return (offsetof(CPUNegativeOffsetState, tlb.f[which]) -
+sizeof(CPUNegativeOffsetState));
 }
 #endif
 
@@ -733,6 +734,11 @@ static const TCGTargetOpDef constraint_sets[] = {
 
 #include "tcg-target.c.inc"
 
+/* Validate CPUTLBDescFast placement. */
+QEMU_BUILD_BUG_ON((int)(offsetof(CPUNegativeOffsetState, tlb.f[0]) -
+sizeof(CPUNegativeOffsetState))
+  < MIN_TLB_MASK_TABLE_OFS);
+
 static void alloc_tcg_plugin_context(TCGContext *s)
 {
 #ifdef CONFIG_PLUGIN
@@ -1496,11 +1502,6 @@ void tcg_func_start(TCGContext *s)
 tcg_debug_assert(s->addr_type == TCG_TYPE_I32 ||
  s->addr_type == TCG_TYPE_I64);
 
-#if defined(CONFIG_SOFTMMU) && !defined(CONFIG_TCG_INTERPRETER)
-tcg_debug_assert(s->tlb_fast_offset < 0);
-tcg_debug_assert(s->tlb_fast_offset >= MIN_TLB_MASK_TABLE_OFS);
-#endif
-
 tcg_debug_assert(s->insn_start_words > 0);
 }
 
-- 
2.34.1




[PATCH v2 05/24] target/arm: Remove size and alignment for cpu subclasses

2023-09-13 Thread Richard Henderson
Inherit the size and alignment from TYPE_ARM_CPU.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c   | 3 ---
 target/arm/cpu64.c | 4 
 2 files changed, 7 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 906eb981b0..d0f279b87f 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2423,10 +2423,7 @@ void arm_cpu_register(const ARMCPUInfo *info)
 {
 TypeInfo type_info = {
 .parent = TYPE_ARM_CPU,
-.instance_size = sizeof(ARMCPU),
-.instance_align = __alignof__(ARMCPU),
 .instance_init = arm_cpu_instance_init,
-.class_size = sizeof(ARMCPUClass),
 .class_init = info->class_init ?: cpu_register_class_init,
 .class_data = (void *)info,
 };
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index f3d87e001f..811f3b38c2 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -822,9 +822,7 @@ void aarch64_cpu_register(const ARMCPUInfo *info)
 {
 TypeInfo type_info = {
 .parent = TYPE_AARCH64_CPU,
-.instance_size = sizeof(ARMCPU),
 .instance_init = aarch64_cpu_instance_init,
-.class_size = sizeof(ARMCPUClass),
 .class_init = info->class_init ?: cpu_register_class_init,
 .class_data = (void *)info,
 };
@@ -837,10 +835,8 @@ void aarch64_cpu_register(const ARMCPUInfo *info)
 static const TypeInfo aarch64_cpu_type_info = {
 .name = TYPE_AARCH64_CPU,
 .parent = TYPE_ARM_CPU,
-.instance_size = sizeof(ARMCPU),
 .instance_finalize = aarch64_cpu_finalizefn,
 .abstract = true,
-.class_size = sizeof(AArch64CPUClass),
 .class_init = aarch64_cpu_class_init,
 };
 
-- 
2.34.1




[PATCH v2 10/24] accel/tcg: Move can_do_io to CPUNegativeOffsetState

2023-09-13 Thread Richard Henderson
Minimize the displacement to can_do_io, since it may
be touched at the start of each TranslationBlock.
It fits into other padding within the substructure.

Signed-off-by: Richard Henderson 
---
 include/hw/core/cpu.h| 2 +-
 accel/dummy-cpus.c   | 2 +-
 accel/kvm/kvm-accel-ops.c| 2 +-
 accel/tcg/cpu-exec-common.c  | 2 +-
 accel/tcg/cpu-exec.c | 2 +-
 accel/tcg/cputlb.c   | 4 ++--
 accel/tcg/tcg-accel-ops-icount.c | 2 +-
 accel/tcg/tcg-accel-ops-mttcg.c  | 2 +-
 accel/tcg/tcg-accel-ops-rr.c | 4 ++--
 accel/tcg/translator.c   | 4 ++--
 hw/core/cpu-common.c | 2 +-
 softmmu/icount.c | 2 +-
 softmmu/watchpoint.c | 2 +-
 13 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 44955af3bc..99066da2f3 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -350,6 +350,7 @@ typedef union IcountDecr {
  */
 typedef struct CPUNegativeOffsetState {
 CPUTLB tlb;
+uint32_t can_do_io;
 IcountDecr icount_decr;
 } CPUNegativeOffsetState;
 
@@ -557,7 +558,6 @@ struct CPUState {
 int cluster_index;
 uint32_t tcg_cflags;
 uint32_t halted;
-uint32_t can_do_io;
 int32_t exception_index;
 
 AccelCPUState *accel;
diff --git a/accel/dummy-cpus.c b/accel/dummy-cpus.c
index d6a1b8d0a2..af7f90a4da 100644
--- a/accel/dummy-cpus.c
+++ b/accel/dummy-cpus.c
@@ -27,7 +27,7 @@ static void *dummy_cpu_thread_fn(void *arg)
 qemu_mutex_lock_iothread();
 qemu_thread_get_self(cpu->thread);
 cpu->thread_id = qemu_get_thread_id();
-cpu->can_do_io = 1;
+cpu->neg.can_do_io = 1;
 current_cpu = cpu;
 
 #ifndef _WIN32
diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
index 457eafa380..b610c0d378 100644
--- a/accel/kvm/kvm-accel-ops.c
+++ b/accel/kvm/kvm-accel-ops.c
@@ -36,7 +36,7 @@ static void *kvm_vcpu_thread_fn(void *arg)
 qemu_mutex_lock_iothread();
 qemu_thread_get_self(cpu->thread);
 cpu->thread_id = qemu_get_thread_id();
-cpu->can_do_io = 1;
+cpu->neg.can_do_io = 1;
 current_cpu = cpu;
 
 r = kvm_init_vcpu(cpu, &error_fatal);
diff --git a/accel/tcg/cpu-exec-common.c b/accel/tcg/cpu-exec-common.c
index 7e35d7f4b5..8ac2af4d0c 100644
--- a/accel/tcg/cpu-exec-common.c
+++ b/accel/tcg/cpu-exec-common.c
@@ -36,7 +36,7 @@ void cpu_loop_exit_noexc(CPUState *cpu)
 void cpu_loop_exit(CPUState *cpu)
 {
 /* Undo the setting in cpu_tb_exec.  */
-cpu->can_do_io = 1;
+cpu->neg.can_do_io = 1;
 /* Undo any setting in generated code.  */
 qemu_plugin_disable_mem_helpers(cpu);
 siglongjmp(cpu->jmp_env, 1);
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index e2c494e75e..b01e3e5dc8 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -455,7 +455,7 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int 
*tb_exit)
 
 qemu_thread_jit_execute();
 ret = tcg_qemu_tb_exec(env, tb_ptr);
-cpu->can_do_io = 1;
+cpu->neg.can_do_io = 1;
 qemu_plugin_disable_mem_helpers(cpu);
 /*
  * TODO: Delay swapping back to the read-write region of the TB
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index c643d66190..21c035497f 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1394,7 +1394,7 @@ static uint64_t io_readx(CPUArchState *env, 
CPUTLBEntryFull *full,
 mr = section->mr;
 mr_offset = (full->xlat_section & TARGET_PAGE_MASK) + addr;
 cpu->mem_io_pc = retaddr;
-if (!cpu->can_do_io) {
+if (!cpu->neg.can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 
@@ -1433,7 +1433,7 @@ static void io_writex(CPUArchState *env, CPUTLBEntryFull 
*full,
 section = iotlb_to_section(cpu, full->xlat_section, full->attrs);
 mr = section->mr;
 mr_offset = (full->xlat_section & TARGET_PAGE_MASK) + addr;
-if (!cpu->can_do_io) {
+if (!cpu->neg.can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 cpu->mem_io_pc = retaddr;
diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
index 3d2cfbbc97..0af643b217 100644
--- a/accel/tcg/tcg-accel-ops-icount.c
+++ b/accel/tcg/tcg-accel-ops-icount.c
@@ -153,7 +153,7 @@ void icount_handle_interrupt(CPUState *cpu, int mask)
 
 tcg_handle_interrupt(cpu, mask);
 if (qemu_cpu_is_self(cpu) &&
-!cpu->can_do_io
+!cpu->neg.can_do_io
 && (mask & ~old_mask) != 0) {
 cpu_abort(cpu, "Raised interrupt while not in I/O function");
 }
diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index b276262007..a8e375b3e6 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -80,7 +80,7 @@ static void *mttcg_cpu_thread_fn(void *arg)
 qemu_thread_get_self(cpu->thread);
 
 cpu->thread_id = qemu_get_thread_id();
-cpu->can_do_io = 1;
+cpu->neg.can_do_io = 1;
 current_cpu = cpu;
 cpu_thread_signal_created(cpu);

[PATCH v2 20/24] accel/tcg: Modify atomic_mmu_lookup() to use CPUState

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

The goal is to (in the future) allow for per-target compilation of
functions in atomic_template.h whilst atomic_mmu_lookup() and cputlb.c
are compiled once per user- or system mode.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-7-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Use cpu->neg.tlb instead of cpu_tlb()]
Signed-off-by: Richard Henderson 
---
 accel/tcg/atomic_template.h | 20 
 accel/tcg/cputlb.c  | 26 +-
 accel/tcg/user-exec.c   |  8 
 3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/accel/tcg/atomic_template.h b/accel/tcg/atomic_template.h
index 84c08b1425..1dc2151daf 100644
--- a/accel/tcg/atomic_template.h
+++ b/accel/tcg/atomic_template.h
@@ -73,7 +73,8 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr addr,
   ABI_TYPE cmpv, ABI_TYPE newv,
   MemOpIdx oi, uintptr_t retaddr)
 {
-DATA_TYPE *haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);
+DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+ DATA_SIZE, retaddr);
 DATA_TYPE ret;
 
 #if DATA_SIZE == 16
@@ -90,7 +91,8 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr addr,
 ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, abi_ptr addr, ABI_TYPE val,
MemOpIdx oi, uintptr_t retaddr)
 {
-DATA_TYPE *haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);
+DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+ DATA_SIZE, retaddr);
 DATA_TYPE ret;
 
 ret = qatomic_xchg__nocheck(haddr, val);
@@ -104,7 +106,7 @@ ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,
\
 ABI_TYPE val, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 DATA_TYPE *haddr, ret;  \
-haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);   \
+haddr = atomic_mmu_lookup(env_cpu(env), addr, oi, DATA_SIZE, retaddr);   \
 ret = qatomic_##X(haddr, val);  \
 ATOMIC_MMU_CLEANUP; \
 atomic_trace_rmw_post(env, addr, oi);   \
@@ -135,7 +137,7 @@ ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,
\
 ABI_TYPE xval, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 XDATA_TYPE *haddr, cmp, old, new, val = xval;   \
-haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);   \
+haddr = atomic_mmu_lookup(env_cpu(env), addr, oi, DATA_SIZE, retaddr);   \
 smp_mb();   \
 cmp = qatomic_read__nocheck(haddr); \
 do {\
@@ -176,7 +178,8 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr 
addr,
   ABI_TYPE cmpv, ABI_TYPE newv,
   MemOpIdx oi, uintptr_t retaddr)
 {
-DATA_TYPE *haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);
+DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+ DATA_SIZE, retaddr);
 DATA_TYPE ret;
 
 #if DATA_SIZE == 16
@@ -193,7 +196,8 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr 
addr,
 ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, abi_ptr addr, ABI_TYPE val,
MemOpIdx oi, uintptr_t retaddr)
 {
-DATA_TYPE *haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);
+DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+ DATA_SIZE, retaddr);
 ABI_TYPE ret;
 
 ret = qatomic_xchg__nocheck(haddr, BSWAP(val));
@@ -207,7 +211,7 @@ ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,
\
 ABI_TYPE val, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 DATA_TYPE *haddr, ret;  \
-haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);   \
+haddr = atomic_mmu_lookup(env_cpu(env), addr, oi, DATA_SIZE, retaddr);   \
 ret = qatomic_##X(haddr, BSWAP(val));   \
 ATOMIC_MMU_CLEANUP; \
 atomic_trace_rmw_post(env, addr, oi);   \
@@ -235,7 +239,7 @@ ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,
\
 ABI_TYPE xval, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 XDATA_TYPE *haddr, ldo, ldn, old, new, va

[PATCH v2 23/24] accel/tcg: Unify user and softmmu do_[st|ld]*_mmu()

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

The prototype of do_[st|ld]*_mmu() is unified between system- and
user-mode, allowing a large chunk of helper_[st|ld]*() and cpu_[st|ld]*()
functions to be expressed in the same manner in both modes. These
functions will be moved to ldst_common.c.inc in a following commit.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-11-a...@rev.ng>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c|  16 ++--
 accel/tcg/user-exec.c | 183 --
 2 files changed, 117 insertions(+), 82 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index a7f2c848ad..cbab7e2648 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2937,18 +2937,24 @@ static void do_st_8(CPUState *cpu, MMULookupPageData 
*p, uint64_t val,
 }
 }
 
-void helper_stb_mmu(CPUArchState *env, uint64_t addr, uint32_t val,
-MemOpIdx oi, uintptr_t ra)
+static void do_st1_mmu(CPUState *cpu, vaddr addr, uint8_t val,
+   MemOpIdx oi, uintptr_t ra)
 {
 MMULookupLocals l;
 bool crosspage;
 
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
 cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST);
-crosspage = mmu_lookup(env_cpu(env), addr, oi, ra, MMU_DATA_STORE, &l);
+crosspage = mmu_lookup(cpu, addr, oi, ra, MMU_DATA_STORE, &l);
 tcg_debug_assert(!crosspage);
 
-do_st_1(env_cpu(env), &l.page[0], val, l.mmu_idx, ra);
+do_st_1(cpu, &l.page[0], val, l.mmu_idx, ra);
+}
+
+void helper_stb_mmu(CPUArchState *env, uint64_t addr, uint32_t val,
+MemOpIdx oi, uintptr_t ra)
+{
+tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
+do_st1_mmu(env_cpu(env), addr, val, oi, ra);
 }
 
 static void do_st2_mmu(CPUState *cpu, vaddr addr, uint16_t val,
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index f9f5cd1770..a6593d0e0f 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -941,7 +941,7 @@ void page_reset_target_data(target_ulong start, 
target_ulong last) { }
 
 /* The softmmu versions of these helpers are in cputlb.c.  */
 
-static void *cpu_mmu_lookup(CPUArchState *env, vaddr addr,
+static void *cpu_mmu_lookup(CPUState *cpu, vaddr addr,
 MemOp mop, uintptr_t ra, MMUAccessType type)
 {
 int a_bits = get_alignment_bits(mop);
@@ -949,25 +949,24 @@ static void *cpu_mmu_lookup(CPUArchState *env, vaddr addr,
 
 /* Enforce guest required alignment.  */
 if (unlikely(addr & ((1 << a_bits) - 1))) {
-cpu_loop_exit_sigbus(env_cpu(env), addr, type, ra);
+cpu_loop_exit_sigbus(cpu, addr, type, ra);
 }
 
-ret = g2h(env_cpu(env), addr);
+ret = g2h(cpu, addr);
 set_helper_retaddr(ra);
 return ret;
 }
 
 #include "ldst_atomicity.c.inc"
 
-static uint8_t do_ld1_mmu(CPUArchState *env, abi_ptr addr,
-  MemOp mop, uintptr_t ra)
+static uint8_t do_ld1_mmu(CPUState *cpu, vaddr addr, MemOpIdx oi,
+  uintptr_t ra, MMUAccessType access_type)
 {
 void *haddr;
 uint8_t ret;
 
-tcg_debug_assert((mop & MO_SIZE) == MO_8);
 cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
-haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD);
+haddr = cpu_mmu_lookup(cpu, addr, get_memop(oi), ra, access_type);
 ret = ldub_p(haddr);
 clear_helper_retaddr();
 return ret;
@@ -976,33 +975,38 @@ static uint8_t do_ld1_mmu(CPUArchState *env, abi_ptr addr,
 tcg_target_ulong helper_ldub_mmu(CPUArchState *env, uint64_t addr,
  MemOpIdx oi, uintptr_t ra)
 {
-return do_ld1_mmu(env, addr, get_memop(oi), ra);
+tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
+return do_ld1_mmu(env_cpu(env), addr, oi, ra, MMU_DATA_LOAD);
 }
 
 tcg_target_ulong helper_ldsb_mmu(CPUArchState *env, uint64_t addr,
  MemOpIdx oi, uintptr_t ra)
 {
-return (int8_t)do_ld1_mmu(env, addr, get_memop(oi), ra);
+tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
+return (int8_t)do_ld1_mmu(env_cpu(env), addr, oi, ra, MMU_DATA_LOAD);
 }
 
 uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr,
 MemOpIdx oi, uintptr_t ra)
 {
-uint8_t ret = do_ld1_mmu(env, addr, get_memop(oi), ra);
+uint8_t ret;
+
+tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
+ret = do_ld1_mmu(env_cpu(env), addr, oi, ra, MMU_DATA_LOAD);
 qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
 return ret;
 }
 
-static uint16_t do_ld2_mmu(CPUArchState *env, abi_ptr addr,
-   MemOp mop, uintptr_t ra)
+static uint16_t do_ld2_mmu(CPUState *cpu, vaddr addr, MemOpIdx oi,
+   uintptr_t ra, MMUAccessType access_type)
 {
 void *haddr;
 uint16_t ret;
+MemOp mop = get_memop(oi);
 
-tcg_debug_assert((mop & MO_SIZE) == MO_16);
 cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
-haddr = cpu_mmu

[PATCH v2 15/24] accel/tcg: Remove env_neg()

2023-09-13 Thread Richard Henderson
Replace the single use within env_tlb() and remove.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 9db8544125..af9516654a 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -451,17 +451,6 @@ static inline CPUState *env_cpu(CPUArchState *env)
 return (void *)env - sizeof(CPUState);
 }
 
-/**
- * env_neg(env)
- * @env: The architecture environment
- *
- * Return the CPUNegativeOffsetState associated with the environment.
- */
-static inline CPUNegativeOffsetState *env_neg(CPUArchState *env)
-{
-return (void *)env - sizeof(CPUNegativeOffsetState);
-}
-
 /**
  * env_tlb(env)
  * @env: The architecture environment
@@ -470,7 +459,7 @@ static inline CPUNegativeOffsetState *env_neg(CPUArchState 
*env)
  */
 static inline CPUTLB *env_tlb(CPUArchState *env)
 {
-return &env_neg(env)->tlb;
+return &env_cpu(env)->neg.tlb;
 }
 
 #endif /* CPU_ALL_H */
-- 
2.34.1




[PATCH v2 14/24] accel/tcg: Remove cpu_set_cpustate_pointers

2023-09-13 Thread Richard Henderson
This function is now empty, so remove it.  In the case of
m68k and tricore, this empties the class instance initfn,
so remove those as well.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h  | 10 --
 target/alpha/cpu.c  |  2 --
 target/arm/cpu.c|  1 -
 target/avr/cpu.c|  2 --
 target/cris/cpu.c   |  2 --
 target/hexagon/cpu.c|  3 ---
 target/hppa/cpu.c   |  1 -
 target/i386/cpu.c   |  1 -
 target/loongarch/cpu.c  |  8 +++-
 target/m68k/cpu.c   |  8 
 target/microblaze/cpu.c |  1 -
 target/mips/cpu.c   |  1 -
 target/nios2/cpu.c  |  4 +---
 target/openrisc/cpu.c   |  6 +-
 target/ppc/cpu_init.c   |  1 -
 target/riscv/cpu.c  |  6 +-
 target/rx/cpu.c |  1 -
 target/s390x/cpu.c  |  2 --
 target/sh4/cpu.c|  2 --
 target/sparc/cpu.c  |  2 --
 target/tricore/cpu.c|  9 -
 target/xtensa/cpu.c |  1 -
 22 files changed, 6 insertions(+), 68 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 40831122ce..9db8544125 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -425,16 +425,6 @@ int cpu_exec(CPUState *cpu);
 void tcg_exec_realizefn(CPUState *cpu, Error **errp);
 void tcg_exec_unrealizefn(CPUState *cpu);
 
-/**
- * cpu_set_cpustate_pointers(cpu)
- * @cpu: The cpu object
- *
- * Set the generic pointers in CPUState into the outer object.
- */
-static inline void cpu_set_cpustate_pointers(ArchCPU *cpu)
-{
-}
-
 /* Validate correct placement of CPUArchState. */
 QEMU_BUILD_BUG_ON(offsetof(ArchCPU, parent_obj) != 0);
 QEMU_BUILD_BUG_ON(offsetof(ArchCPU, env) != sizeof(CPUState));
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index e2156fcb41..51b7d8d1bf 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -209,8 +209,6 @@ static void alpha_cpu_initfn(Object *obj)
 AlphaCPU *cpu = ALPHA_CPU(obj);
 CPUAlphaState *env = &cpu->env;
 
-cpu_set_cpustate_pointers(cpu);
-
 env->lock_addr = -1;
 #if defined(CONFIG_USER_ONLY)
 env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index b76025485d..78e42d6dcc 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1215,7 +1215,6 @@ static void arm_cpu_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
 
-cpu_set_cpustate_pointers(cpu);
 cpu->cp_regs = g_hash_table_new_full(g_direct_hash, g_direct_equal,
  NULL, g_free);
 
diff --git a/target/avr/cpu.c b/target/avr/cpu.c
index c5a6436336..14d8b9d1f0 100644
--- a/target/avr/cpu.c
+++ b/target/avr/cpu.c
@@ -147,8 +147,6 @@ static void avr_cpu_initfn(Object *obj)
 {
 AVRCPU *cpu = AVR_CPU(obj);
 
-cpu_set_cpustate_pointers(cpu);
-
 /* Set the number of interrupts supported by the CPU. */
 qdev_init_gpio_in(DEVICE(cpu), avr_cpu_set_int,
   sizeof(cpu->env.intsrc) * 8);
diff --git a/target/cris/cpu.c b/target/cris/cpu.c
index 8ab8a30b8d..be4a44c218 100644
--- a/target/cris/cpu.c
+++ b/target/cris/cpu.c
@@ -201,8 +201,6 @@ static void cris_cpu_initfn(Object *obj)
 CRISCPUClass *ccc = CRIS_CPU_GET_CLASS(obj);
 CPUCRISState *env = &cpu->env;
 
-cpu_set_cpustate_pointers(cpu);
-
 env->pregs[PR_VR] = ccc->vr;
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 65f198b956..1adc11b713 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -353,9 +353,6 @@ static void hexagon_cpu_realize(DeviceState *dev, Error 
**errp)
 
 static void hexagon_cpu_init(Object *obj)
 {
-HexagonCPU *cpu = HEXAGON_CPU(obj);
-
-cpu_set_cpustate_pointers(cpu);
 qdev_property_add_static(DEVICE(obj), &hexagon_lldb_compat_property);
 qdev_property_add_static(DEVICE(obj), &hexagon_lldb_stack_adjust_property);
 qdev_property_add_static(DEVICE(obj), &hexagon_short_circuit_property);
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 17fa901f6a..1644297bf8 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -149,7 +149,6 @@ static void hppa_cpu_initfn(Object *obj)
 HPPACPU *cpu = HPPA_CPU(obj);
 CPUHPPAState *env = &cpu->env;
 
-cpu_set_cpustate_pointers(cpu);
 cs->exception_index = -1;
 cpu_hppa_loaded_fr0(env);
 cpu_hppa_put_psw(env, PSW_W);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 9565e3d160..b72affdbe3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7590,7 +7590,6 @@ static void x86_cpu_initfn(Object *obj)
 CPUX86State *env = &cpu->env;
 
 env->nr_dies = 1;
-cpu_set_cpustate_pointers(cpu);
 
 object_property_add(obj, "feature-words", "X86CPUFeatureWordInfo",
 x86_cpu_get_feature_words,
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 56e67cea8d..e70773c22e 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -614,17 +614,15 @@ static const MemoryRegionOps loongarch_qemu_ops = {
 
 static void loongarch_cpu_init(Obje

[PATCH v2 01/24] target/arm: Replace TARGET_PAGE_ENTRY_EXTRA

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

TARGET_PAGE_ENTRY_EXTRA is a macro that allows guests to specify additional
fields for caching with the full TLB entry.  This macro is replaced with
a union in CPUTLBEntryFull, thus making CPUTLB target-agnostic at the
cost of a slightly inflated CPUTLBEntryFull for non-arm guests.

Note, this is needed to ensure that fields in CPUTLB don't vary in
offset between various targets.

(arm is the only guest actually making use of this feature.)

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-2-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-defs.h| 18 +++---
 target/arm/cpu-param.h | 12 
 target/arm/ptw.c   |  4 ++--
 target/arm/tcg/mte_helper.c|  2 +-
 target/arm/tcg/sve_helper.c|  2 +-
 target/arm/tcg/tlb_helper.c|  4 ++--
 target/arm/tcg/translate-a64.c |  2 +-
 7 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index fb4c8d480f..0a600a312b 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -135,9 +135,21 @@ typedef struct CPUTLBEntryFull {
  * This may be used to cache items from the guest cpu
  * page tables for later use by the implementation.
  */
-#ifdef TARGET_PAGE_ENTRY_EXTRA
-TARGET_PAGE_ENTRY_EXTRA
-#endif
+union {
+/*
+ * Cache the attrs and shareability fields from the page table entry.
+ *
+ * For ARMMMUIdx_Stage2*, pte_attrs is the S2 descriptor bits [5:2].
+ * Otherwise, pte_attrs is the same as the MAIR_EL1 8-bit format.
+ * For shareability and guarded, as in the SH and GP fields 
respectively
+ * of the VMSAv8-64 PTEs.
+ */
+struct {
+uint8_t pte_attrs;
+uint8_t shareability;
+bool guarded;
+} arm;
+} extra;
 } CPUTLBEntryFull;
 #endif /* CONFIG_SOFTMMU */
 
diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index b3b35f7aa1..f9b462a98f 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -31,18 +31,6 @@
 # define TARGET_PAGE_BITS_VARY
 # define TARGET_PAGE_BITS_MIN  10
 
-/*
- * Cache the attrs and shareability fields from the page table entry.
- *
- * For ARMMMUIdx_Stage2*, pte_attrs is the S2 descriptor bits [5:2].
- * Otherwise, pte_attrs is the same as the MAIR_EL1 8-bit format.
- * For shareability and guarded, as in the SH and GP fields respectively
- * of the VMSAv8-64 PTEs.
- */
-# define TARGET_PAGE_ENTRY_EXTRA  \
-uint8_t pte_attrs;\
-uint8_t shareability; \
-bool guarded;
 #endif
 
 #endif
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index bfbab26b9b..95db9ec4c3 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -579,7 +579,7 @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate 
*ptw,
 }
 ptw->out_phys = full->phys_addr | (addr & ~TARGET_PAGE_MASK);
 ptw->out_rw = full->prot & PAGE_WRITE;
-pte_attrs = full->pte_attrs;
+pte_attrs = full->extra.arm.pte_attrs;
 ptw->out_space = full->attrs.space;
 #else
 g_assert_not_reached();
@@ -2036,7 +2036,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 
 /* When in aarch64 mode, and BTI is enabled, remember GP in the TLB. */
 if (aarch64 && cpu_isar_feature(aa64_bti, cpu)) {
-result->f.guarded = extract64(attrs, 50, 1); /* GP */
+result->f.extra.arm.guarded = extract64(attrs, 50, 1); /* GP */
 }
 }
 
diff --git a/target/arm/tcg/mte_helper.c b/target/arm/tcg/mte_helper.c
index b23d11563a..dba21cc4d6 100644
--- a/target/arm/tcg/mte_helper.c
+++ b/target/arm/tcg/mte_helper.c
@@ -124,7 +124,7 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int 
ptr_mmu_idx,
 assert(!(flags & TLB_INVALID_MASK));
 
 /* If the virtual page MemAttr != Tagged, access unchecked. */
-if (full->pte_attrs != 0xf0) {
+if (full->extra.arm.pte_attrs != 0xf0) {
 return NULL;
 }
 
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 7c103fc9f7..f006d152cc 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -5373,7 +5373,7 @@ bool sve_probe_page(SVEHostPage *info, bool nofault, 
CPUARMState *env,
 info->tagged = (flags & PAGE_ANON) && (flags & PAGE_MTE);
 #else
 info->attrs = full->attrs;
-info->tagged = full->pte_attrs == 0xf0;
+info->tagged = full->extra.arm.pte_attrs == 0xf0;
 #endif
 
 /* Ensure that info->host[] is relative to addr, not addr + mem_off. */
diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
index b22b2a4c6e..59bff8b452 100644
--- a/target/arm/tcg/tlb_helper.c
+++ b/target/arm/tcg/tlb_helper.c
@@ -334,8 +334,8 @@ bool arm_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 address &= TARGET_PAGE_MASK;
 }
 
-res.f.pte_attrs = res.cacheattrs.attrs;
-res.

[PATCH v2 19/24] accel/tcg: Modifies memory access functions to use CPUState

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

do_[ld|st]*() and mmu_lookup*() are changed to use CPUState over
CPUArchState, moving the target-dependence to the target-facing
cpu_[ld|st] functions.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-6-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Use cpu->neg.tlb instead of cpu_tlb; cpu_env instead of env_ptr.]
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 324 ++---
 1 file changed, 161 insertions(+), 163 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index f3ac87050e..29c35bd201 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1367,11 +1367,10 @@ static void save_iotlb_data(CPUState *cs, 
MemoryRegionSection *section,
 #endif
 }
 
-static uint64_t io_readx(CPUArchState *env, CPUTLBEntryFull *full,
+static uint64_t io_readx(CPUState *cpu, CPUTLBEntryFull *full,
  int mmu_idx, vaddr addr, uintptr_t retaddr,
  MMUAccessType access_type, MemOp op)
 {
-CPUState *cpu = env_cpu(env);
 hwaddr mr_offset;
 MemoryRegionSection *section;
 MemoryRegion *mr;
@@ -1408,11 +1407,10 @@ static uint64_t io_readx(CPUArchState *env, 
CPUTLBEntryFull *full,
 return val;
 }
 
-static void io_writex(CPUArchState *env, CPUTLBEntryFull *full,
+static void io_writex(CPUState *cpu, CPUTLBEntryFull *full,
   int mmu_idx, uint64_t val, vaddr addr,
   uintptr_t retaddr, MemOp op)
 {
-CPUState *cpu = env_cpu(env);
 hwaddr mr_offset;
 MemoryRegionSection *section;
 MemoryRegion *mr;
@@ -1776,7 +1774,7 @@ typedef struct MMULookupLocals {
 
 /**
  * mmu_lookup1: translate one page
- * @env: cpu context
+ * @cpu: generic cpu state
  * @data: lookup parameters
  * @mmu_idx: virtual address context
  * @access_type: load/store/code
@@ -1787,12 +1785,12 @@ typedef struct MMULookupLocals {
  * tlb_fill will longjmp out.  Return true if the softmmu tlb for
  * @mmu_idx may have resized.
  */
-static bool mmu_lookup1(CPUArchState *env, MMULookupPageData *data,
+static bool mmu_lookup1(CPUState *cpu, MMULookupPageData *data,
 int mmu_idx, MMUAccessType access_type, uintptr_t ra)
 {
 vaddr addr = data->addr;
-uintptr_t index = tlb_index(env_cpu(env), mmu_idx, addr);
-CPUTLBEntry *entry = tlb_entry(env_cpu(env), mmu_idx, addr);
+uintptr_t index = tlb_index(cpu, mmu_idx, addr);
+CPUTLBEntry *entry = tlb_entry(cpu, mmu_idx, addr);
 uint64_t tlb_addr = tlb_read_idx(entry, access_type);
 bool maybe_resized = false;
 CPUTLBEntryFull *full;
@@ -1800,17 +1798,17 @@ static bool mmu_lookup1(CPUArchState *env, 
MMULookupPageData *data,
 
 /* If the TLB entry is for a different page, reload and try again.  */
 if (!tlb_hit(tlb_addr, addr)) {
-if (!victim_tlb_hit(env_cpu(env), mmu_idx, index, access_type,
+if (!victim_tlb_hit(cpu, mmu_idx, index, access_type,
 addr & TARGET_PAGE_MASK)) {
-tlb_fill(env_cpu(env), addr, data->size, access_type, mmu_idx, ra);
+tlb_fill(cpu, addr, data->size, access_type, mmu_idx, ra);
 maybe_resized = true;
-index = tlb_index(env_cpu(env), mmu_idx, addr);
-entry = tlb_entry(env_cpu(env), mmu_idx, addr);
+index = tlb_index(cpu, mmu_idx, addr);
+entry = tlb_entry(cpu, mmu_idx, addr);
 }
 tlb_addr = tlb_read_idx(entry, access_type) & ~TLB_INVALID_MASK;
 }
 
-full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
+full = &cpu->neg.tlb.d[mmu_idx].fulltlb[index];
 flags = tlb_addr & (TLB_FLAGS_MASK & ~TLB_FORCE_SLOW);
 flags |= full->slow_flags[access_type];
 
@@ -1824,7 +1822,7 @@ static bool mmu_lookup1(CPUArchState *env, 
MMULookupPageData *data,
 
 /**
  * mmu_watch_or_dirty
- * @env: cpu context
+ * @cpu: generic cpu state
  * @data: lookup parameters
  * @access_type: load/store/code
  * @ra: return address into tcg generated code, or 0
@@ -1832,7 +1830,7 @@ static bool mmu_lookup1(CPUArchState *env, 
MMULookupPageData *data,
  * Trigger watchpoints for @data.addr:@data.size;
  * record writes to protected clean pages.
  */
-static void mmu_watch_or_dirty(CPUArchState *env, MMULookupPageData *data,
+static void mmu_watch_or_dirty(CPUState *cpu, MMULookupPageData *data,
MMUAccessType access_type, uintptr_t ra)
 {
 CPUTLBEntryFull *full = data->full;
@@ -1843,13 +1841,13 @@ static void mmu_watch_or_dirty(CPUArchState *env, 
MMULookupPageData *data,
 /* On watchpoint hit, this will longjmp out.  */
 if (flags & TLB_WATCHPOINT) {
 int wp = access_type == MMU_DATA_STORE ? BP_MEM_WRITE : BP_MEM_READ;
-cpu_check_watchpoint(env_cpu(env), addr, size, full->attrs, wp, ra);
+cpu_check_watchpoint(cpu, addr, size, full->attrs, wp, ra);
 flags &= ~TLB_WATCHPOINT;
 }
 
 /* Note that not

[PATCH v2 17/24] accel/tcg: Modify tlb_*() to use CPUState

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

Changes tlb_*() functions to take CPUState instead of CPUArchState, as
they don't require the full CPUArchState. This makes it easier to
decouple target-(in)dependent code.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-4-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Use cpu->neg.tlb instead of cpu_tlb()]
Signed-off-by: Richard Henderson 
---
 include/exec/cpu_ldst.h |   8 +-
 accel/tcg/cputlb.c  | 220 +++-
 2 files changed, 108 insertions(+), 120 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index da10ba1433..6061e33ac9 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -361,19 +361,19 @@ static inline uint64_t tlb_addr_write(const CPUTLBEntry 
*entry)
 }
 
 /* Find the TLB index corresponding to the mmu_idx + address pair.  */
-static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
+static inline uintptr_t tlb_index(CPUState *cpu, uintptr_t mmu_idx,
   vaddr addr)
 {
-uintptr_t size_mask = env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS;
+uintptr_t size_mask = cpu->neg.tlb.f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS;
 
 return (addr >> TARGET_PAGE_BITS) & size_mask;
 }
 
 /* Find the TLB entry corresponding to the mmu_idx + address pair.  */
-static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
+static inline CPUTLBEntry *tlb_entry(CPUState *cpu, uintptr_t mmu_idx,
  vaddr addr)
 {
-return &env_tlb(env)->f[mmu_idx].table[tlb_index(env, mmu_idx, addr)];
+return &cpu->neg.tlb.f[mmu_idx].table[tlb_index(cpu, mmu_idx, addr)];
 }
 
 #endif /* defined(CONFIG_USER_ONLY) */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 095bc79b6f..08df68f03a 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -240,11 +240,11 @@ static void tlb_mmu_flush_locked(CPUTLBDesc *desc, 
CPUTLBDescFast *fast)
 memset(desc->vtable, -1, sizeof(desc->vtable));
 }
 
-static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx,
+static void tlb_flush_one_mmuidx_locked(CPUState *cpu, int mmu_idx,
 int64_t now)
 {
-CPUTLBDesc *desc = &env_tlb(env)->d[mmu_idx];
-CPUTLBDescFast *fast = &env_tlb(env)->f[mmu_idx];
+CPUTLBDesc *desc = &cpu->neg.tlb.d[mmu_idx];
+CPUTLBDescFast *fast = &cpu->neg.tlb.f[mmu_idx];
 
 tlb_mmu_resize_locked(desc, fast, now);
 tlb_mmu_flush_locked(desc, fast);
@@ -262,41 +262,39 @@ static void tlb_mmu_init(CPUTLBDesc *desc, CPUTLBDescFast 
*fast, int64_t now)
 tlb_mmu_flush_locked(desc, fast);
 }
 
-static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
+static inline void tlb_n_used_entries_inc(CPUState *cpu, uintptr_t mmu_idx)
 {
-env_tlb(env)->d[mmu_idx].n_used_entries++;
+cpu->neg.tlb.d[mmu_idx].n_used_entries++;
 }
 
-static inline void tlb_n_used_entries_dec(CPUArchState *env, uintptr_t mmu_idx)
+static inline void tlb_n_used_entries_dec(CPUState *cpu, uintptr_t mmu_idx)
 {
-env_tlb(env)->d[mmu_idx].n_used_entries--;
+cpu->neg.tlb.d[mmu_idx].n_used_entries--;
 }
 
 void tlb_init(CPUState *cpu)
 {
-CPUArchState *env = cpu_env(cpu);
 int64_t now = get_clock_realtime();
 int i;
 
-qemu_spin_init(&env_tlb(env)->c.lock);
+qemu_spin_init(&cpu->neg.tlb.c.lock);
 
 /* All tlbs are initialized flushed. */
-env_tlb(env)->c.dirty = 0;
+cpu->neg.tlb.c.dirty = 0;
 
 for (i = 0; i < NB_MMU_MODES; i++) {
-tlb_mmu_init(&env_tlb(env)->d[i], &env_tlb(env)->f[i], now);
+tlb_mmu_init(&cpu->neg.tlb.d[i], &cpu->neg.tlb.f[i], now);
 }
 }
 
 void tlb_destroy(CPUState *cpu)
 {
-CPUArchState *env = cpu_env(cpu);
 int i;
 
-qemu_spin_destroy(&env_tlb(env)->c.lock);
+qemu_spin_destroy(&cpu->neg.tlb.c.lock);
 for (i = 0; i < NB_MMU_MODES; i++) {
-CPUTLBDesc *desc = &env_tlb(env)->d[i];
-CPUTLBDescFast *fast = &env_tlb(env)->f[i];
+CPUTLBDesc *desc = &cpu->neg.tlb.d[i];
+CPUTLBDescFast *fast = &cpu->neg.tlb.f[i];
 
 g_free(fast->table);
 g_free(desc->fulltlb);
@@ -328,11 +326,9 @@ void tlb_flush_counts(size_t *pfull, size_t *ppart, size_t 
*pelide)
 size_t full = 0, part = 0, elide = 0;
 
 CPU_FOREACH(cpu) {
-CPUArchState *env = cpu_env(cpu);
-
-full += qatomic_read(&env_tlb(env)->c.full_flush_count);
-part += qatomic_read(&env_tlb(env)->c.part_flush_count);
-elide += qatomic_read(&env_tlb(env)->c.elide_flush_count);
+full += qatomic_read(&cpu->neg.tlb.c.full_flush_count);
+part += qatomic_read(&cpu->neg.tlb.c.part_flush_count);
+elide += qatomic_read(&cpu->neg.tlb.c.elide_flush_count);
 }
 *pfull = full;
 *ppart = part;
@@ -341,7 +337,6 @@ void tlb_flush_counts(size_t *pfull, size_t *ppart, size_t 
*pelide)
 
 static void tlb_flush_by_mmuidx_async_w

[PATCH v2 13/24] accel/tcg: Replace CPUState.env_ptr with cpu_env()

2023-09-13 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h   |  1 -
 include/hw/core/cpu.h|  9 ++---
 target/arm/common-semi-target.h  |  2 +-
 accel/tcg/cpu-exec.c |  8 
 accel/tcg/cputlb.c   | 18 +-
 accel/tcg/translate-all.c|  4 ++--
 gdbstub/gdbstub.c|  4 ++--
 gdbstub/user-target.c|  2 +-
 hw/i386/kvm/clock.c  |  2 +-
 hw/intc/mips_gic.c   |  2 +-
 hw/intc/riscv_aclint.c   | 12 ++--
 hw/intc/riscv_imsic.c|  2 +-
 hw/ppc/e500.c|  4 ++--
 hw/ppc/spapr.c   |  2 +-
 linux-user/elfload.c |  4 ++--
 linux-user/i386/cpu_loop.c   |  2 +-
 linux-user/main.c|  4 ++--
 linux-user/signal.c  | 15 +++
 monitor/hmp-cmds-target.c|  2 +-
 semihosting/arm-compat-semi.c|  6 +++---
 semihosting/syscalls.c   | 28 ++--
 target/alpha/translate.c |  4 ++--
 target/arm/cpu.c |  8 
 target/arm/helper.c  |  2 +-
 target/arm/tcg/translate-a64.c   |  4 ++--
 target/arm/tcg/translate.c   |  6 +++---
 target/avr/translate.c   |  2 +-
 target/cris/translate.c  |  4 ++--
 target/hexagon/translate.c   |  4 ++--
 target/hppa/mem_helper.c |  2 +-
 target/hppa/translate.c  |  4 ++--
 target/i386/tcg/sysemu/excp_helper.c |  2 +-
 target/i386/tcg/tcg-cpu.c|  2 +-
 target/i386/tcg/translate.c  |  4 ++--
 target/loongarch/translate.c |  4 ++--
 target/m68k/translate.c  |  4 ++--
 target/microblaze/translate.c|  2 +-
 target/mips/tcg/sysemu/mips-semi.c   |  4 ++--
 target/mips/tcg/translate.c  |  4 ++--
 target/nios2/translate.c |  4 ++--
 target/openrisc/translate.c  |  2 +-
 target/ppc/excp_helper.c | 10 +-
 target/ppc/translate.c   |  4 ++--
 target/riscv/translate.c |  6 +++---
 target/rx/cpu.c  |  3 ---
 target/rx/translate.c|  2 +-
 target/s390x/tcg/translate.c |  2 +-
 target/sh4/op_helper.c   |  2 +-
 target/sh4/translate.c   |  4 ++--
 target/sparc/translate.c |  4 ++--
 target/tricore/translate.c   |  4 ++--
 target/xtensa/translate.c|  4 ++--
 target/i386/tcg/decode-new.c.inc |  2 +-
 53 files changed, 125 insertions(+), 127 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ae0cb2ce50..40831122ce 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -433,7 +433,6 @@ void tcg_exec_unrealizefn(CPUState *cpu);
  */
 static inline void cpu_set_cpustate_pointers(ArchCPU *cpu)
 {
-cpu->parent_obj.env_ptr = &cpu->env;
 }
 
 /* Validate correct placement of CPUArchState. */
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 99066da2f3..f3fa1ffa95 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -440,7 +440,6 @@ struct qemu_work_item;
  * @num_ases: number of CPUAddressSpaces in @cpu_ases
  * @as: Pointer to the first AddressSpace, for the convenience of targets which
  *  only have a single AddressSpace
- * @env_ptr: Pointer to subclass-specific CPUArchState field.
  * @gdb_regs: Additional GDB registers.
  * @gdb_num_regs: Number of total registers accessible to GDB.
  * @gdb_num_g_regs: Number of registers in GDB 'g' packets.
@@ -511,8 +510,6 @@ struct CPUState {
 AddressSpace *as;
 MemoryRegion *memory;
 
-CPUArchState *env_ptr;
-
 CPUJumpCache *tb_jmp_cache;
 
 struct GDBRegisterState *gdb_regs;
@@ -594,6 +591,12 @@ struct CPUState {
 QEMU_BUILD_BUG_ON(offsetof(CPUState, neg) + sizeof(CPUNegativeOffsetState)
   != sizeof(CPUState));
 
+static inline CPUArchState *cpu_env(CPUState *cpu)
+{
+/* We validate that CPUArchState follows CPUState in cpu-all.h. */
+return (CPUArchState *)(cpu + 1);
+}
+
 typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
 extern CPUTailQ cpus;
 
diff --git a/target/arm/common-semi-target.h b/target/arm/common-semi-target.h
index 629d75ca5a..19438ed8cd 100644
--- a/target/arm/common-semi-target.h
+++ b/target/arm/common-semi-target.h
@@ -38,7 +38,7 @@ static inline void common_semi_set_ret(CPUState *cs, 
target_ulong ret)
 
 static inline bool common_semi_sys_exit_extended(CPUState *cs, int nr)
 {
-return (nr == TARGET_SYS_EXIT_EXTENDED || is_a64(cs->env_ptr));
+return nr == TARGET_SYS_EXIT_EXTENDED || is_a64(cpu_env(cs));
 }
 
 static inline bool is_64bit_semihosting(CPUArchState *env)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 4abbd037f3..0e7eeef001 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -222,7 +222,7 @@ static Trans

[PATCH v2 06/24] target/*: Add instance_align to all cpu base classes

2023-09-13 Thread Richard Henderson
The omission of alignment has technically been wrong since
269bd5d8f61, where QEMU_ALIGNED was added to CPUTLBDescFast.

Signed-off-by: Richard Henderson 
---
 target/alpha/cpu.c  | 1 +
 target/avr/cpu.c| 1 +
 target/cris/cpu.c   | 1 +
 target/hexagon/cpu.c| 1 +
 target/hppa/cpu.c   | 1 +
 target/i386/cpu.c   | 1 +
 target/loongarch/cpu.c  | 1 +
 target/m68k/cpu.c   | 1 +
 target/microblaze/cpu.c | 1 +
 target/mips/cpu.c   | 1 +
 target/nios2/cpu.c  | 1 +
 target/openrisc/cpu.c   | 1 +
 target/riscv/cpu.c  | 2 +-
 target/rx/cpu.c | 1 +
 target/sh4/cpu.c| 1 +
 target/sparc/cpu.c  | 1 +
 target/tricore/cpu.c| 1 +
 target/xtensa/cpu.c | 1 +
 18 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index 270ae787b1..e2156fcb41 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -286,6 +286,7 @@ static const TypeInfo alpha_cpu_type_infos[] = {
 .name = TYPE_ALPHA_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(AlphaCPU),
+.instance_align = __alignof(AlphaCPU),
 .instance_init = alpha_cpu_initfn,
 .abstract = true,
 .class_size = sizeof(AlphaCPUClass),
diff --git a/target/avr/cpu.c b/target/avr/cpu.c
index 8f741f258c..c5a6436336 100644
--- a/target/avr/cpu.c
+++ b/target/avr/cpu.c
@@ -390,6 +390,7 @@ static const TypeInfo avr_cpu_type_info[] = {
 .name = TYPE_AVR_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(AVRCPU),
+.instance_align = __alignof(AVRCPU),
 .instance_init = avr_cpu_initfn,
 .class_size = sizeof(AVRCPUClass),
 .class_init = avr_cpu_class_init,
diff --git a/target/cris/cpu.c b/target/cris/cpu.c
index a6a93c2359..8ab8a30b8d 100644
--- a/target/cris/cpu.c
+++ b/target/cris/cpu.c
@@ -345,6 +345,7 @@ static const TypeInfo cris_cpu_model_type_infos[] = {
 .name = TYPE_CRIS_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(CRISCPU),
+.instance_align = __alignof(CRISCPU),
 .instance_init = cris_cpu_initfn,
 .abstract = true,
 .class_size = sizeof(CRISCPUClass),
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index f155936289..65f198b956 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -408,6 +408,7 @@ static const TypeInfo hexagon_cpu_type_infos[] = {
 .name = TYPE_HEXAGON_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(HexagonCPU),
+.instance_align = __alignof(HexagonCPU),
 .instance_init = hexagon_cpu_init,
 .abstract = true,
 .class_size = sizeof(HexagonCPUClass),
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 11022f9c99..17fa901f6a 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -212,6 +212,7 @@ static const TypeInfo hppa_cpu_type_info = {
 .name = TYPE_HPPA_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(HPPACPU),
+.instance_align = __alignof(HPPACPU),
 .instance_init = hppa_cpu_initfn,
 .abstract = false,
 .class_size = sizeof(HPPACPUClass),
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 24ee67b42d..9565e3d160 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8022,6 +8022,7 @@ static const TypeInfo x86_cpu_type_info = {
 .name = TYPE_X86_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(X86CPU),
+.instance_align = __alignof(X86CPU),
 .instance_init = x86_cpu_initfn,
 .instance_post_init = x86_cpu_post_initfn,
 
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 65f9320e34..56e67cea8d 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -804,6 +804,7 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 .name = TYPE_LOONGARCH_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(LoongArchCPU),
+.instance_align = __alignof(LoongArchCPU),
 .instance_init = loongarch_cpu_init,
 
 .abstract = true,
diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index 70d58471dc..d34d1b57d0 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -611,6 +611,7 @@ static const TypeInfo m68k_cpus_type_infos[] = {
 .name = TYPE_M68K_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(M68kCPU),
+.instance_align = __alignof(M68kCPU),
 .instance_init = m68k_cpu_initfn,
 .abstract = true,
 .class_size = sizeof(M68kCPUClass),
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 03c2c4db1f..c53711da52 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -439,6 +439,7 @@ static const TypeInfo mb_cpu_type_info = {
 .name = TYPE_MICROBLAZE_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(MicroBlazeCPU),
+.instance_align = __alignof(MicroBlazeCPU),
 .instance_init = mb_cpu_initfn,
 .class_size = sizeof(MicroBlazeCPUClass),
 .class_init = mb_cpu_class_init,
diff --g

[PATCH v2 00/24] Reduce usage of CPUArchState in cputlb.c

2023-09-13 Thread Richard Henderson
Now that CPUTLB has target-independent types, and a target-independent
number of mmu_idx, there's very little reason not to merge it into CPUState.
Once we've done that, all of the objections I had vs Anton's v1 go away.
Indeed, even more cleanups are possible, like removing cpu->env_ptr,
because the placement of CPUState and CPUArchState is target-independent,
which we can double-check at build time.
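
To make that layout contract concrete, here is a condensed sketch of what
the series ends up with (distilled from patches 08/24 and 13/24 in this
series, slightly abbreviated):

/* Checked at build time: CPUArchState immediately follows CPUState. */
QEMU_BUILD_BUG_ON(offsetof(ArchCPU, parent_obj) != 0);
QEMU_BUILD_BUG_ON(offsetof(ArchCPU, env) != sizeof(CPUState));

/* With that guarantee, both conversions are plain pointer arithmetic,
 * and the env_ptr field becomes unnecessary. */
static inline CPUState *env_cpu(CPUArchState *env)
{
    return (void *)env - sizeof(CPUState);
}

static inline CPUArchState *cpu_env(CPUState *cpu)
{
    return (CPUArchState *)(cpu + 1);
}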

I'm not happy with the size of patch 12, but I also really want
the pair of functions env_cpu() and cpu_env() to match.


r~


Anton Johansson (9):
  target/arm: Replace TARGET_PAGE_ENTRY_EXTRA
  accel/tcg: Modify tlb_*() to use CPUState
  accel/tcg: Modify probe_access_internal() to use CPUState
  accel/tcg: Modifies memory access functions to use CPUState
  accel/tcg: Modify atomic_mmu_lookup() to use CPUState
  accel/tcg: Use CPUState in atomicity helpers
  accel/tcg: Remove env_tlb()
  accel/tcg: Unify user and softmmu do_[st|ld]*_mmu()
  accel/tcg: move ld/st helpers to ldst_common.c.inc

Richard Henderson (15):
  accel/tcg: Move CPUTLB definitions from cpu-defs.h
  qom: Propagate alignment through type system
  target/*: Use __alignof not __alignof__
  target/arm: Remove size and alignment for cpu subclasses
  target/*: Add instance_align to all cpu base classes
  accel/tcg: Validate placement of CPUNegativeOffsetState
  accel/tcg: Move CPUNegativeOffsetState into CPUState
  accel/tcg: Remove CPUState.icount_decr_ptr
  accel/tcg: Move can_do_io to CPUNegativeOffsetState
  accel/tcg: Remove cpu_neg()
  tcg: Rename cpu_env to tcg_env
  accel/tcg: Replace CPUState.env_ptr with cpu_env()
  accel/tcg: Remove cpu_set_cpustate_pointers
  accel/tcg: Remove env_neg()
  tcg: Remove TCGContext.tlb_fast_offset

 accel/tcg/atomic_template.h   |   20 +-
 include/exec/cpu-all.h|   53 +-
 include/exec/cpu-defs.h   |  138 --
 include/exec/cpu_ldst.h   |8 +-
 include/exec/exec-all.h   |2 +-
 include/hw/core/cpu.h |  164 ++-
 include/tcg/tcg.h |3 +-
 target/alpha/cpu.h|1 -
 target/arm/common-semi-target.h   |2 +-
 target/arm/cpu-param.h|   12 -
 target/arm/cpu.h  |1 -
 target/arm/tcg/translate-a32.h|2 +-
 target/arm/tcg/translate-a64.h|4 +-
 target/arm/tcg/translate.h|   16 +-
 target/avr/cpu.h  |1 -
 target/cris/cpu.h |1 -
 target/hexagon/cpu.h  |2 +-
 target/hexagon/gen_tcg.h  |  120 +-
 target/hexagon/gen_tcg_hvx.h  |   20 +-
 target/hexagon/macros.h   |8 +-
 target/hppa/cpu.h |1 -
 target/i386/cpu.h |1 -
 target/loongarch/cpu.h|1 -
 target/m68k/cpu.h |1 -
 target/microblaze/cpu.h   |6 +-
 target/mips/cpu.h |4 +-
 target/mips/tcg/translate.h   |6 +-
 target/nios2/cpu.h|1 -
 target/openrisc/cpu.h |1 -
 target/ppc/cpu.h  |1 -
 target/riscv/cpu.h|2 +-
 target/rx/cpu.h   |1 -
 target/s390x/cpu.h|1 -
 target/sh4/cpu.h  |1 -
 target/sparc/cpu.h|1 -
 target/tricore/cpu.h  |1 -
 target/xtensa/cpu.h   |3 +-
 accel/dummy-cpus.c|2 +-
 accel/kvm/kvm-accel-ops.c |2 +-
 accel/tcg/cpu-exec-common.c   |2 +-
 accel/tcg/cpu-exec.c  |   24 +-
 accel/tcg/cputlb.c|  762 --
 accel/tcg/tcg-accel-ops-icount.c  |8 +-
 accel/tcg/tcg-accel-ops-mttcg.c   |2 +-
 accel/tcg/tcg-accel-ops-rr.c  |4 +-
 accel/tcg/tcg-accel-ops.c |2 +-
 accel/tcg/translate-all.c |   12 +-
 accel/tcg/translator.c|   20 +-
 accel/tcg/user-exec.c |  276 +---
 gdbstub/gdbstub.c |4 +-
 gdbstub/user-target.c |2 +-
 hw/core/cpu-common.c  |6 +-
 hw/i386/kvm/clock.c   |2 +-
 hw/intc/mips_gic.c|2 +-
 hw/intc/riscv_aclint.c|   12 +-
 hw/intc/riscv_imsic.c |2 +-
 hw/ppc/e500.c |4 +-
 hw/ppc/spapr.c

[PATCH v2 08/24] accel/tcg: Move CPUNegativeOffsetState into CPUState

2023-09-13 Thread Richard Henderson
Retain the separate structure to emphasize its importance.
Enforce that CPUArchState always follows CPUState without padding.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h| 22 +-
 include/hw/core/cpu.h | 14 --
 target/alpha/cpu.h|  1 -
 target/arm/cpu.h  |  1 -
 target/avr/cpu.h  |  1 -
 target/cris/cpu.h |  1 -
 target/hexagon/cpu.h  |  2 +-
 target/hppa/cpu.h |  1 -
 target/i386/cpu.h |  1 -
 target/loongarch/cpu.h|  1 -
 target/m68k/cpu.h |  1 -
 target/microblaze/cpu.h   |  6 +++---
 target/mips/cpu.h |  4 ++--
 target/nios2/cpu.h|  1 -
 target/openrisc/cpu.h |  1 -
 target/ppc/cpu.h  |  1 -
 target/riscv/cpu.h|  2 +-
 target/rx/cpu.h   |  1 -
 target/s390x/cpu.h|  1 -
 target/sh4/cpu.h  |  1 -
 target/sparc/cpu.h|  1 -
 target/tricore/cpu.h  |  1 -
 target/xtensa/cpu.h   |  3 +--
 accel/tcg/translate-all.c |  4 ++--
 accel/tcg/translator.c|  8 
 25 files changed, 35 insertions(+), 46 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 86a7452b0d..c3c78ed8ab 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -434,9 +434,13 @@ void tcg_exec_unrealizefn(CPUState *cpu);
 static inline void cpu_set_cpustate_pointers(ArchCPU *cpu)
 {
 cpu->parent_obj.env_ptr = &cpu->env;
-cpu->parent_obj.icount_decr_ptr = &cpu->neg.icount_decr;
+cpu->parent_obj.icount_decr_ptr = &cpu->parent_obj.neg.icount_decr;
 }
 
+/* Validate correct placement of CPUArchState. */
+QEMU_BUILD_BUG_ON(offsetof(ArchCPU, parent_obj) != 0);
+QEMU_BUILD_BUG_ON(offsetof(ArchCPU, env) != sizeof(CPUState));
+
 /**
  * env_archcpu(env)
  * @env: The architecture environment
@@ -445,7 +449,7 @@ static inline void cpu_set_cpustate_pointers(ArchCPU *cpu)
  */
 static inline ArchCPU *env_archcpu(CPUArchState *env)
 {
-return container_of(env, ArchCPU, env);
+return (void *)env - sizeof(CPUState);
 }
 
 /**
@@ -456,15 +460,9 @@ static inline ArchCPU *env_archcpu(CPUArchState *env)
  */
 static inline CPUState *env_cpu(CPUArchState *env)
 {
-return &env_archcpu(env)->parent_obj;
+return (void *)env - sizeof(CPUState);
 }
 
-/*
- * Validate placement of CPUNegativeOffsetState.
- */
-QEMU_BUILD_BUG_ON(offsetof(ArchCPU, env) - offsetof(ArchCPU, neg) >=
-  sizeof(CPUNegativeOffsetState) + __alignof(CPUArchState));
-
 /**
  * env_neg(env)
  * @env: The architecture environment
@@ -473,8 +471,7 @@ QEMU_BUILD_BUG_ON(offsetof(ArchCPU, env) - 
offsetof(ArchCPU, neg) >=
  */
 static inline CPUNegativeOffsetState *env_neg(CPUArchState *env)
 {
-ArchCPU *arch_cpu = container_of(env, ArchCPU, env);
-return &arch_cpu->neg;
+return (void *)env - sizeof(CPUNegativeOffsetState);
 }
 
 /**
@@ -485,8 +482,7 @@ static inline CPUNegativeOffsetState *env_neg(CPUArchState 
*env)
  */
 static inline CPUNegativeOffsetState *cpu_neg(CPUState *cpu)
 {
-ArchCPU *arch_cpu = container_of(cpu, ArchCPU, parent_obj);
-return &arch_cpu->neg;
+return &cpu->neg;
 }
 
 /**
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 861664a5df..1f289136ec 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -345,8 +345,8 @@ typedef union IcountDecr {
 } IcountDecr;
 
 /*
- * This structure must be placed in ArchCPU immediately
- * before CPUArchState, as a field named "neg".
+ * Elements of CPUState most efficiently accessed from CPUArchState,
+ * via small negative offsets.
  */
 typedef struct CPUNegativeOffsetState {
 CPUTLB tlb;
@@ -584,8 +584,18 @@ struct CPUState {
 
 /* track IOMMUs whose translations we've cached in the TCG TLB */
 GArray *iommu_notifiers;
+
+/*
+ * MUST BE LAST in order to minimize the displacement to CPUArchState.
+ * This will be verified within exec/cpu-all.h.
+ */
+CPUNegativeOffsetState neg;
 };
 
+/* Validate placement of CPUNegativeOffsetState. */
+QEMU_BUILD_BUG_ON(offsetof(CPUState, neg) + sizeof(CPUNegativeOffsetState)
+  != sizeof(CPUState));
+
 typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
 extern CPUTailQ cpus;
 
diff --git a/target/alpha/cpu.h b/target/alpha/cpu.h
index 13306665af..e2a467ec17 100644
--- a/target/alpha/cpu.h
+++ b/target/alpha/cpu.h
@@ -263,7 +263,6 @@ struct ArchCPU {
 CPUState parent_obj;
 /*< public >*/
 
-CPUNegativeOffsetState neg;
 CPUAlphaState env;
 
 /* This alarm doesn't exist in real hardware; we wish it did.  */
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f2e3dc49a6..51963b6545 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -856,7 +856,6 @@ struct ArchCPU {
 CPUState parent_obj;
 /*< public >*/
 
-CPUNegativeOffsetState neg;
 CPUARMState env;
 
 /* Coprocessor information */
diff --git a/target/avr/cpu.h b/target/avr/cpu.h
index 7225174668..4ce22d8e4f 100644
--- a/target/avr/cpu.h
+++

[PATCH v2 04/24] target/*: Use __alignof not __alignof__

2023-09-13 Thread Richard Henderson
No functional change, just using a common spelling.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c  | 2 +-
 target/ppc/cpu_init.c | 2 +-
 target/s390x/cpu.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index b9e09a702d..906eb981b0 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2440,7 +2440,7 @@ static const TypeInfo arm_cpu_type_info = {
 .name = TYPE_ARM_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(ARMCPU),
-.instance_align = __alignof__(ARMCPU),
+.instance_align = __alignof(ARMCPU),
 .instance_init = arm_cpu_initfn,
 .instance_finalize = arm_cpu_finalizefn,
 .abstract = true,
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 7ab5ee92d9..7830640f01 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7413,7 +7413,7 @@ static const TypeInfo ppc_cpu_type_info = {
 .name = TYPE_POWERPC_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(PowerPCCPU),
-.instance_align = __alignof__(PowerPCCPU),
+.instance_align = __alignof(PowerPCCPU),
 .instance_init = ppc_cpu_instance_init,
 .instance_finalize = ppc_cpu_instance_finalize,
 .abstract = true,
diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index df167493c3..d9625bc266 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -363,7 +363,7 @@ static const TypeInfo s390_cpu_type_info = {
 .name = TYPE_S390_CPU,
 .parent = TYPE_CPU,
 .instance_size = sizeof(S390CPU),
-.instance_align = __alignof__(S390CPU),
+.instance_align = __alignof(S390CPU),
 .instance_init = s390_cpu_initfn,
 
 #ifndef CONFIG_USER_ONLY
-- 
2.34.1




[PATCH v2 07/24] accel/tcg: Validate placement of CPUNegativeOffsetState

2023-09-13 Thread Richard Henderson
Verify that the distance between CPUNegativeOffsetState and
CPUArchState is no greater than any alignment requirements.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index c2c62160c6..86a7452b0d 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -459,6 +459,12 @@ static inline CPUState *env_cpu(CPUArchState *env)
 return &env_archcpu(env)->parent_obj;
 }
 
+/*
+ * Validate placement of CPUNegativeOffsetState.
+ */
+QEMU_BUILD_BUG_ON(offsetof(ArchCPU, env) - offsetof(ArchCPU, neg) >=
+  sizeof(CPUNegativeOffsetState) + __alignof(CPUArchState));
+
 /**
  * env_neg(env)
  * @env: The architecture environment
-- 
2.34.1




[PATCH v2 18/24] accel/tcg: Modify probe_access_internal() to use CPUState

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

probe_access_internal() is changed to take the generic CPUState instead
of CPUArchState, in order to lessen the target-specific coupling of
cputlb.c. Note: probe_access*() also don't need the full CPUArchState,
but aren't touched in this patch as they are target-facing.
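
Sketch of the boundary this creates (simplified from the hunks below; the
real wrappers also fold in the clean-RAM/TLB_NOTDIRTY handling):

    int probe_access_full(CPUArchState *env, vaddr addr, int size,
                          MMUAccessType access_type, int mmu_idx,
                          bool nonfault, void **phost,
                          CPUTLBEntryFull **pfull, uintptr_t retaddr)
    {
        /* Convert once at the target-facing boundary ... */
        int flags = probe_access_internal(env_cpu(env), addr, size,
                                          access_type, mmu_idx, nonfault,
                                          phost, pfull, retaddr, true);
        /* ... TLB_NOTDIRTY handling elided in this sketch. */
        return flags;
    }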

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-5-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Use cpu->neg.tlb instead of cpu_tlb()]
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 08df68f03a..f3ac87050e 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1504,27 +1504,24 @@ static void notdirty_write(CPUState *cpu, vaddr 
mem_vaddr, unsigned size,
 }
 }
 
-static int probe_access_internal(CPUArchState *env, vaddr addr,
+static int probe_access_internal(CPUState *cpu, vaddr addr,
  int fault_size, MMUAccessType access_type,
  int mmu_idx, bool nonfault,
  void **phost, CPUTLBEntryFull **pfull,
  uintptr_t retaddr, bool check_mem_cbs)
 {
-uintptr_t index = tlb_index(env_cpu(env), mmu_idx, addr);
-CPUTLBEntry *entry = tlb_entry(env_cpu(env), mmu_idx, addr);
+uintptr_t index = tlb_index(cpu, mmu_idx, addr);
+CPUTLBEntry *entry = tlb_entry(cpu, mmu_idx, addr);
 uint64_t tlb_addr = tlb_read_idx(entry, access_type);
 vaddr page_addr = addr & TARGET_PAGE_MASK;
 int flags = TLB_FLAGS_MASK & ~TLB_FORCE_SLOW;
-bool force_mmio = check_mem_cbs && 
cpu_plugin_mem_cbs_enabled(env_cpu(env));
+bool force_mmio = check_mem_cbs && cpu_plugin_mem_cbs_enabled(cpu);
 CPUTLBEntryFull *full;
 
 if (!tlb_hit_page(tlb_addr, page_addr)) {
-if (!victim_tlb_hit(env_cpu(env), mmu_idx, index,
-access_type, page_addr)) {
-CPUState *cs = env_cpu(env);
-
-if (!cs->cc->tcg_ops->tlb_fill(cs, addr, fault_size, access_type,
-   mmu_idx, nonfault, retaddr)) {
+if (!victim_tlb_hit(cpu, mmu_idx, index, access_type, page_addr)) {
+if (!cpu->cc->tcg_ops->tlb_fill(cpu, addr, fault_size, access_type,
+mmu_idx, nonfault, retaddr)) {
 /* Non-faulting page table read failed.  */
 *phost = NULL;
 *pfull = NULL;
@@ -1532,8 +1529,8 @@ static int probe_access_internal(CPUArchState *env, vaddr 
addr,
 }
 
 /* TLB resize via tlb_fill may have moved the entry.  */
-index = tlb_index(env_cpu(env), mmu_idx, addr);
-entry = tlb_entry(env_cpu(env), mmu_idx, addr);
+index = tlb_index(cpu, mmu_idx, addr);
+entry = tlb_entry(cpu, mmu_idx, addr);
 
 /*
  * With PAGE_WRITE_INV, we set TLB_INVALID_MASK immediately,
@@ -1546,7 +1543,7 @@ static int probe_access_internal(CPUArchState *env, vaddr 
addr,
 }
 flags &= tlb_addr;
 
-*pfull = full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
+*pfull = full = &cpu->neg.tlb.d[mmu_idx].fulltlb[index];
 flags |= full->slow_flags[access_type];
 
 /* Fold all "mmio-like" bits into TLB_MMIO.  This is not RAM.  */
@@ -1567,8 +1564,9 @@ int probe_access_full(CPUArchState *env, vaddr addr, int 
size,
   bool nonfault, void **phost, CPUTLBEntryFull **pfull,
   uintptr_t retaddr)
 {
-int flags = probe_access_internal(env, addr, size, access_type, mmu_idx,
-  nonfault, phost, pfull, retaddr, true);
+int flags = probe_access_internal(env_cpu(env), addr, size, access_type,
+  mmu_idx, nonfault, phost, pfull, retaddr,
+  true);
 
 /* Handle clean RAM pages.  */
 if (unlikely(flags & TLB_NOTDIRTY)) {
@@ -1590,8 +1588,8 @@ int probe_access_full_mmu(CPUArchState *env, vaddr addr, 
int size,
 phost = phost ? phost : &discard_phost;
 pfull = pfull ? pfull : &discard_tlb;
 
-int flags = probe_access_internal(env, addr, size, access_type, mmu_idx,
-  true, phost, pfull, 0, false);
+int flags = probe_access_internal(env_cpu(env), addr, size, access_type,
+  mmu_idx, true, phost, pfull, 0, false);
 
 /* Handle clean RAM pages.  */
 if (unlikely(flags & TLB_NOTDIRTY)) {
@@ -1611,8 +1609,9 @@ int probe_access_flags(CPUArchState *env, vaddr addr, int 
size,
 
 g_assert(-(addr | TARGET_PAGE_MASK) >= size);
 
-flags = probe_access_internal(env, addr, size, access_type, mmu_idx,
-  nonfault, phost, &full, retaddr, true);
+flags = probe_access_internal(env_cpu(env), ad

[PATCH v2 24/24] accel/tcg: move ld/st helpers to ldst_common.c.inc

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

A large chunk of ld/st functions are moved from cputlb.c and user-exec.c
to ldst_common.c.inc as their implementation is the same between both
modes.

Eventually, ldst_common.c.inc could be compiled into a separate
target-specific compilation unit, and be linked in with the targets.
This keeps CPUArchState usage out of cputlb.c (CPUArchState is primarily
used there to access the mmu index in these functions).
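
The functions that move are thin wrappers of this shape (taken from the code
being removed from cputlb.c below; the body presumably stays the same in
ldst_common.c.inc, with do_ld1_mmu() shared between softmmu and user-only):

    tcg_target_ulong helper_ldub_mmu(CPUArchState *env, uint64_t addr,
                                     MemOpIdx oi, uintptr_t retaddr)
    {
        /* Only assert the access size and convert env to CPUState. */
        tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
        return do_ld1_mmu(env_cpu(env), addr, oi, retaddr, MMU_DATA_LOAD);
    }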

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-12-a...@rev.ng>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c  | 214 --
 accel/tcg/user-exec.c   | 193 ---
 accel/tcg/ldst_common.c.inc | 225 
 3 files changed, 225 insertions(+), 407 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cbab7e2648..31472fb0b2 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2436,13 +2436,6 @@ static uint8_t do_ld1_mmu(CPUState *cpu, vaddr addr, 
MemOpIdx oi,
 return do_ld_1(cpu, &l.page[0], l.mmu_idx, access_type, ra);
 }
 
-tcg_target_ulong helper_ldub_mmu(CPUArchState *env, uint64_t addr,
- MemOpIdx oi, uintptr_t retaddr)
-{
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8);
-return do_ld1_mmu(env_cpu(env), addr, oi, retaddr, MMU_DATA_LOAD);
-}
-
 static uint16_t do_ld2_mmu(CPUState *cpu, vaddr addr, MemOpIdx oi,
uintptr_t ra, MMUAccessType access_type)
 {
@@ -2468,13 +2461,6 @@ static uint16_t do_ld2_mmu(CPUState *cpu, vaddr addr, 
MemOpIdx oi,
 return ret;
 }
 
-tcg_target_ulong helper_lduw_mmu(CPUArchState *env, uint64_t addr,
- MemOpIdx oi, uintptr_t retaddr)
-{
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_16);
-return do_ld2_mmu(env_cpu(env), addr, oi, retaddr, MMU_DATA_LOAD);
-}
-
 static uint32_t do_ld4_mmu(CPUState *cpu, vaddr addr, MemOpIdx oi,
uintptr_t ra, MMUAccessType access_type)
 {
@@ -2496,13 +2482,6 @@ static uint32_t do_ld4_mmu(CPUState *cpu, vaddr addr, 
MemOpIdx oi,
 return ret;
 }
 
-tcg_target_ulong helper_ldul_mmu(CPUArchState *env, uint64_t addr,
- MemOpIdx oi, uintptr_t retaddr)
-{
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_32);
-return do_ld4_mmu(env_cpu(env), addr, oi, retaddr, MMU_DATA_LOAD);
-}
-
 static uint64_t do_ld8_mmu(CPUState *cpu, vaddr addr, MemOpIdx oi,
uintptr_t ra, MMUAccessType access_type)
 {
@@ -2524,36 +2503,6 @@ static uint64_t do_ld8_mmu(CPUState *cpu, vaddr addr, 
MemOpIdx oi,
 return ret;
 }
 
-uint64_t helper_ldq_mmu(CPUArchState *env, uint64_t addr,
-MemOpIdx oi, uintptr_t retaddr)
-{
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_64);
-return do_ld8_mmu(env_cpu(env), addr, oi, retaddr, MMU_DATA_LOAD);
-}
-
-/*
- * Provide signed versions of the load routines as well.  We can of course
- * avoid this for 64-bit data, or for 32-bit data on 32-bit host.
- */
-
-tcg_target_ulong helper_ldsb_mmu(CPUArchState *env, uint64_t addr,
- MemOpIdx oi, uintptr_t retaddr)
-{
-return (int8_t)helper_ldub_mmu(env, addr, oi, retaddr);
-}
-
-tcg_target_ulong helper_ldsw_mmu(CPUArchState *env, uint64_t addr,
- MemOpIdx oi, uintptr_t retaddr)
-{
-return (int16_t)helper_lduw_mmu(env, addr, oi, retaddr);
-}
-
-tcg_target_ulong helper_ldsl_mmu(CPUArchState *env, uint64_t addr,
- MemOpIdx oi, uintptr_t retaddr)
-{
-return (int32_t)helper_ldul_mmu(env, addr, oi, retaddr);
-}
-
 static Int128 do_ld16_mmu(CPUState *cpu, vaddr addr,
   MemOpIdx oi, uintptr_t ra)
 {
@@ -2619,81 +2568,6 @@ static Int128 do_ld16_mmu(CPUState *cpu, vaddr addr,
 return ret;
 }
 
-Int128 helper_ld16_mmu(CPUArchState *env, uint64_t addr,
-   uint32_t oi, uintptr_t retaddr)
-{
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_128);
-return do_ld16_mmu(env_cpu(env), addr, oi, retaddr);
-}
-
-Int128 helper_ld_i128(CPUArchState *env, uint64_t addr, uint32_t oi)
-{
-return helper_ld16_mmu(env, addr, oi, GETPC());
-}
-
-/*
- * Load helpers for cpu_ldst.h.
- */
-
-static void plugin_load_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi)
-{
-qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
-}
-
-uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr, MemOpIdx oi, uintptr_t ra)
-{
-uint8_t ret;
-
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_UB);
-ret = do_ld1_mmu(env_cpu(env), addr, oi, ra, MMU_DATA_LOAD);
-plugin_load_cb(env, addr, oi);
-return ret;
-}
-
-uint16_t cpu_ldw_mmu(CPUArchState *env, abi_ptr addr,
- MemOpIdx oi, uintptr_t ra)
-{
-uint16_t ret;
-
-tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_16);
-   

[PATCH v2 09/24] accel/tcg: Remove CPUState.icount_decr_ptr

2023-09-13 Thread Richard Henderson
We can now access icount_decr directly.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h | 1 -
 include/hw/core/cpu.h  | 2 --
 hw/core/cpu-common.c   | 4 ++--
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index c3c78ed8ab..3b01e4ee25 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -434,7 +434,6 @@ void tcg_exec_unrealizefn(CPUState *cpu);
 static inline void cpu_set_cpustate_pointers(ArchCPU *cpu)
 {
 cpu->parent_obj.env_ptr = &cpu->env;
-cpu->parent_obj.icount_decr_ptr = &cpu->parent_obj.neg.icount_decr;
 }
 
 /* Validate correct placement of CPUArchState. */
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 1f289136ec..44955af3bc 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -440,7 +440,6 @@ struct qemu_work_item;
  * @as: Pointer to the first AddressSpace, for the convenience of targets which
  *  only have a single AddressSpace
  * @env_ptr: Pointer to subclass-specific CPUArchState field.
- * @icount_decr_ptr: Pointer to IcountDecr field within subclass.
  * @gdb_regs: Additional GDB registers.
  * @gdb_num_regs: Number of total registers accessible to GDB.
  * @gdb_num_g_regs: Number of registers in GDB 'g' packets.
@@ -512,7 +511,6 @@ struct CPUState {
 MemoryRegion *memory;
 
 CPUArchState *env_ptr;
-IcountDecr *icount_decr_ptr;
 
 CPUJumpCache *tb_jmp_cache;
 
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index ced66c2b34..08d5bbc873 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -86,7 +86,7 @@ void cpu_exit(CPUState *cpu)
 qatomic_set(&cpu->exit_request, 1);
 /* Ensure cpu_exec will see the exit request after TCG has exited.  */
 smp_wmb();
-qatomic_set(&cpu->icount_decr_ptr->u16.high, -1);
+qatomic_set(&cpu->neg.icount_decr.u16.high, -1);
 }
 
 static int cpu_common_gdb_read_register(CPUState *cpu, GByteArray *buf, int 
reg)
@@ -130,7 +130,7 @@ static void cpu_common_reset_hold(Object *obj)
 cpu->halted = cpu->start_powered_off;
 cpu->mem_io_pc = 0;
 cpu->icount_extra = 0;
-qatomic_set(&cpu->icount_decr_ptr->u32, 0);
+qatomic_set(&cpu->neg.icount_decr.u32, 0);
 cpu->can_do_io = 1;
 cpu->exception_index = -1;
 cpu->crash_occurred = false;
-- 
2.34.1




[PATCH v2 03/24] qom: Propagate alignment through type system

2023-09-13 Thread Richard Henderson
Propagate alignment just like size.  This is required in
order to get the correct alignment on most cpu subclasses,
where the size and alignment are only specified for the
base cpu type.
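
For example (hypothetical subclass shown; the concrete cpu subclasses are
handled later in this series), a subclass can now leave both fields unset
and inherit them from its parent:

    /* .instance_size and .instance_align left at 0: type_initialize()
     * now inherits both from TYPE_ARM_CPU. */
    static const TypeInfo cortex_a9_cpu_type_info = {
        .name   = "cortex-a9-arm-cpu",
        .parent = TYPE_ARM_CPU,
    };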

Signed-off-by: Richard Henderson 
---
 qom/object.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/qom/object.c b/qom/object.c
index e25f1e96db..8557fe8e4e 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -220,6 +220,19 @@ static size_t type_object_get_size(TypeImpl *ti)
 return 0;
 }
 
+static size_t type_object_get_align(TypeImpl *ti)
+{
+if (ti->instance_align) {
+return ti->instance_align;
+}
+
+if (type_has_parent(ti)) {
+return type_object_get_align(type_get_parent(ti));
+}
+
+return 0;
+}
+
 size_t object_type_get_instance_size(const char *typename)
 {
 TypeImpl *type = type_get_by_name(typename);
@@ -293,6 +306,7 @@ static void type_initialize(TypeImpl *ti)
 
 ti->class_size = type_class_get_size(ti);
 ti->instance_size = type_object_get_size(ti);
+ti->instance_align = type_object_get_align(ti);
 /* Any type with zero instance_size is implicitly abstract.
  * This means interface types are all abstract.
  */
-- 
2.34.1




[PATCH v2 11/24] accel/tcg: Remove cpu_neg()

2023-09-13 Thread Richard Henderson
Now that CPUNegativeOffsetState is part of CPUState,
we can reference it directly.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h   | 11 ---
 include/exec/exec-all.h  |  2 +-
 accel/tcg/cpu-exec.c | 14 +++---
 accel/tcg/tcg-accel-ops-icount.c |  6 +++---
 accel/tcg/tcg-accel-ops.c|  2 +-
 accel/tcg/translate-all.c|  6 +++---
 softmmu/icount.c |  2 +-
 7 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 3b01e4ee25..ae0cb2ce50 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -473,17 +473,6 @@ static inline CPUNegativeOffsetState *env_neg(CPUArchState 
*env)
 return (void *)env - sizeof(CPUNegativeOffsetState);
 }
 
-/**
- * cpu_neg(cpu)
- * @cpu: The generic CPUState
- *
- * Return the CPUNegativeOffsetState associated with the cpu.
- */
-static inline CPUNegativeOffsetState *cpu_neg(CPUState *cpu)
-{
-return &cpu->neg;
-}
-
 /**
  * env_tlb(env)
  * @env: The architecture environment
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b2f5cd4c2a..2e4d337805 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -71,7 +71,7 @@ G_NORETURN void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t 
pc);
  */
 static inline bool cpu_loop_exit_requested(CPUState *cpu)
 {
-return (int32_t)qatomic_read(&cpu_neg(cpu)->icount_decr.u32) < 0;
+return (int32_t)qatomic_read(&cpu->neg.icount_decr.u32) < 0;
 }
 
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_TCG)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index b01e3e5dc8..4abbd037f3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -73,7 +73,7 @@ static void align_clocks(SyncClocks *sc, CPUState *cpu)
 return;
 }
 
-cpu_icount = cpu->icount_extra + cpu_neg(cpu)->icount_decr.u16.low;
+cpu_icount = cpu->icount_extra + cpu->neg.icount_decr.u16.low;
 sc->diff_clk += icount_to_ns(sc->last_cpu_icount - cpu_icount);
 sc->last_cpu_icount = cpu_icount;
 
@@ -124,7 +124,7 @@ static void init_delay_params(SyncClocks *sc, CPUState *cpu)
 sc->realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
 sc->diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - sc->realtime_clock;
 sc->last_cpu_icount
-= cpu->icount_extra + cpu_neg(cpu)->icount_decr.u16.low;
+= cpu->icount_extra + cpu->neg.icount_decr.u16.low;
 if (sc->diff_clk < max_delay) {
 max_delay = sc->diff_clk;
 }
@@ -717,7 +717,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 if (cpu->exception_index < 0) {
 #ifndef CONFIG_USER_ONLY
 if (replay_has_exception()
-&& cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0) {
+&& cpu->neg.icount_decr.u16.low + cpu->icount_extra == 0) {
 /* Execute just one insn to trigger exception pending in the log */
 cpu->cflags_next_tb = (curr_cflags(cpu) & ~CF_USE_ICOUNT)
 | CF_NOIRQ | 1;
@@ -807,7 +807,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
  * Ensure zeroing happens before reading cpu->exit_request or
  * cpu->interrupt_request (see also smp_wmb in cpu_exit())
  */
-qatomic_set_mb(&cpu_neg(cpu)->icount_decr.u16.high, 0);
+qatomic_set_mb(&cpu->neg.icount_decr.u16.high, 0);
 
 if (unlikely(qatomic_read(&cpu->interrupt_request))) {
 int interrupt_request;
@@ -898,7 +898,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
 if (unlikely(qatomic_read(&cpu->exit_request))
 || (icount_enabled()
 && (cpu->cflags_next_tb == -1 || cpu->cflags_next_tb & 
CF_USE_ICOUNT)
-&& cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0)) {
+&& cpu->neg.icount_decr.u16.low + cpu->icount_extra == 0)) {
 qatomic_set(&cpu->exit_request, 0);
 if (cpu->exception_index == -1) {
 cpu->exception_index = EXCP_INTERRUPT;
@@ -923,7 +923,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 }
 
 *last_tb = NULL;
-insns_left = qatomic_read(&cpu_neg(cpu)->icount_decr.u32);
+insns_left = qatomic_read(&cpu->neg.icount_decr.u32);
 if (insns_left < 0) {
 /* Something asked us to stop executing chained TBs; just
  * continue round the main loop. Whatever requested the exit
@@ -942,7 +942,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 icount_update(cpu);
 /* Refill decrementer and continue execution.  */
insns_left = MIN(0xffff, cpu->icount_budget);
-cpu_neg(cpu)->icount_decr.u16.low = insns_left;
+cpu->neg.icount_decr.u16.low = insns_left;
 cpu->icount_extra = cpu->icount_budget - insns_left;
 
 /*
diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
index 0af643b217..b25685fb71 100644
--- a/accel/tcg/tcg-accel-ops-icount.c
+++ b/a

[PATCH v2 22/24] accel/tcg: Remove env_tlb()

2023-09-13 Thread Richard Henderson
From: Anton Johansson 

The function is no longer used to access the TLB,
and has been replaced by cpu->neg.tlb.

Signed-off-by: Anton Johansson 
Message-Id: <20230912153428.17816-9-a...@rev.ng>
Reviewed-by: Richard Henderson 
[rth: Merge comment update patch]
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h   | 11 ---
 tcg/aarch64/tcg-target.c.inc |  2 +-
 tcg/arm/tcg-target.c.inc |  2 +-
 3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index af9516654a..b37715c27d 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -451,15 +451,4 @@ static inline CPUState *env_cpu(CPUArchState *env)
 return (void *)env - sizeof(CPUState);
 }
 
-/**
- * env_tlb(env)
- * @env: The architecture environment
- *
- * Return the CPUTLB state associated with the environment.
- */
-static inline CPUTLB *env_tlb(CPUArchState *env)
-{
-return &env_cpu(env)->neg.tlb;
-}
-
 #endif /* CPU_ALL_H */
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 0931a69448..0db4c907e3 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1676,7 +1676,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, 
HostAddress *h,
 mask_type = (s->page_bits + s->tlb_dyn_max_bits > 32
  ? TCG_TYPE_I64 : TCG_TYPE_I32);
 
-/* Load env_tlb(env)->f[mmu_idx].{mask,table} into {tmp0,tmp1}. */
+/* Load cpu->neg.tlb.f[mmu_idx].{mask,table} into {tmp0,tmp1}. */
 QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, mask) != 0);
 QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, table) != 8);
 tcg_out_insn(s, 3314, LDP, TCG_REG_TMP0, TCG_REG_TMP1, TCG_AREG0,
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index acb5f23b54..75978018c9 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1420,7 +1420,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, 
HostAddress *h,
 ldst->addrlo_reg = addrlo;
 ldst->addrhi_reg = addrhi;
 
-/* Load env_tlb(env)->f[mmu_idx].{mask,table} into {r0,r1}.  */
+/* Load cpu->neg.tlb.f[mmu_idx].{mask,table} into {r0,r1}.  */
 QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, mask) != 0);
 QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, table) != 4);
 tcg_out_ldrd_8(s, COND_AL, TCG_REG_R0, TCG_AREG0, fast_off);
-- 
2.34.1




[PATCH v6 53/57] target/loongarch: Implement xvpack xvpick xvilv{l/h}

2023-09-13 Thread Song Gao
This patch includes:
- XVPACK{EV/OD}.{B/H/W/D};
- XVPICK{EV/OD}.{B/H/W/D};
- XVILV{L/H}.{B/H/W/D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  27 
 target/loongarch/disas.c|  27 
 target/loongarch/vec_helper.c   | 138 +++-
 target/loongarch/insn_trans/trans_vec.c.inc |  24 
 4 files changed, 156 insertions(+), 60 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 74383ba3bc..a325b861c1 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -2012,3 +2012,30 @@ xvpickve_d   0111 0111 00111 110 .. . .  
 @vv_ui2
 
 xvbsll_v 0111 01101000 11100 . . .@vv_ui5
 xvbsrl_v 0111 01101000 11101 . . .@vv_ui5
+
+xvpackev_b   0111 01010001 01100 . . .@vvv
+xvpackev_h   0111 01010001 01101 . . .@vvv
+xvpackev_w   0111 01010001 01110 . . .@vvv
+xvpackev_d   0111 01010001 0 . . .@vvv
+xvpackod_b   0111 01010001 1 . . .@vvv
+xvpackod_h   0111 01010001 10001 . . .@vvv
+xvpackod_w   0111 01010001 10010 . . .@vvv
+xvpackod_d   0111 01010001 10011 . . .@vvv
+
+xvpickev_b   0111 01010001 11100 . . .@vvv
+xvpickev_h   0111 01010001 11101 . . .@vvv
+xvpickev_w   0111 01010001 0 . . .@vvv
+xvpickev_d   0111 01010001 1 . . .@vvv
+xvpickod_b   0111 01010010 0 . . .@vvv
+xvpickod_h   0111 01010010 1 . . .@vvv
+xvpickod_w   0111 01010010 00010 . . .@vvv
+xvpickod_d   0111 01010010 00011 . . .@vvv
+
+xvilvl_b 0111 01010001 10100 . . .@vvv
+xvilvl_h 0111 01010001 10101 . . .@vvv
+xvilvl_w 0111 01010001 10110 . . .@vvv
+xvilvl_d 0111 01010001 10111 . . .@vvv
+xvilvh_b 0111 01010001 11000 . . .@vvv
+xvilvh_h 0111 01010001 11001 . . .@vvv
+xvilvh_w 0111 01010001 11010 . . .@vvv
+xvilvh_d 0111 01010001 11011 . . .@vvv
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d091402db6..74ae916a10 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2547,3 +2547,30 @@ INSN_LASX(xvpickve_d,vv_i)
 
 INSN_LASX(xvbsll_v,  vv_i)
 INSN_LASX(xvbsrl_v,  vv_i)
+
+INSN_LASX(xvpackev_b,vvv)
+INSN_LASX(xvpackev_h,vvv)
+INSN_LASX(xvpackev_w,vvv)
+INSN_LASX(xvpackev_d,vvv)
+INSN_LASX(xvpackod_b,vvv)
+INSN_LASX(xvpackod_h,vvv)
+INSN_LASX(xvpackod_w,vvv)
+INSN_LASX(xvpackod_d,vvv)
+
+INSN_LASX(xvpickev_b,vvv)
+INSN_LASX(xvpickev_h,vvv)
+INSN_LASX(xvpickev_w,vvv)
+INSN_LASX(xvpickev_d,vvv)
+INSN_LASX(xvpickod_b,vvv)
+INSN_LASX(xvpickod_h,vvv)
+INSN_LASX(xvpickod_w,vvv)
+INSN_LASX(xvpickod_d,vvv)
+
+INSN_LASX(xvilvl_b,  vvv)
+INSN_LASX(xvilvl_h,  vvv)
+INSN_LASX(xvilvl_w,  vvv)
+INSN_LASX(xvilvl_d,  vvv)
+INSN_LASX(xvilvh_b,  vvv)
+INSN_LASX(xvilvh_h,  vvv)
+INSN_LASX(xvilvh_w,  vvv)
+INSN_LASX(xvilvh_d,  vvv)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 26d48ed921..2bbaee628b 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -3241,12 +3241,13 @@ XVPICKVE(xvpickve_d, D, 64, 0x3)
 void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
 {  \
 int i; \
-VReg temp; \
+VReg temp = {};\
 VReg *Vd = (VReg *)vd; \
 VReg *Vj = (VReg *)vj; \
 VReg *Vk = (VReg *)vk; \
+int oprsz = simd_oprsz(desc);  \
\
-for (i = 0; i < LSX_LEN/BIT; i++) {\
+for (i = 0; i < oprsz / (BIT / 8); i++) {  \
 temp.E(2 * i + 1) = Vj->E(2 * i);  \
 temp.E(2 *i) = Vk->E(2 * i);   \
 }  \
@@ -3262,12 +3263,13 @@ VPACKEV(vpackev_d, 128, D)
 void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
 {  \
 int i;

[PATCH v6 14/57] target/loongarch: Implement xvadd/xvsub

2023-09-13 Thread Song Gao
This patch includes:
- XVADD.{B/H/W/D/Q};
- XVSUB.{B/H/W/D/Q}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  14 +++
 target/loongarch/disas.c|  23 +
 target/loongarch/translate.c|   4 +
 target/loongarch/insn_trans/trans_vec.c.inc | 107 +---
 4 files changed, 109 insertions(+), 39 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c9c3bc2c73..bcc18fb6c5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1296,3 +1296,17 @@ vstelm_d 0011 00010001 0 .  . .  
 @vr_i8i1
 vstelm_w 0011 00010010 ..  . .@vr_i8i2
 vstelm_h 0011 0001010 ...  . .@vr_i8i3
 vstelm_b 0011 000110   . .@vr_i8i4
+
+#
+# LoongArch LASX instructions
+#
+xvadd_b  0111 0100 10100 . . .@vvv
+xvadd_h  0111 0100 10101 . . .@vvv
+xvadd_w  0111 0100 10110 . . .@vvv
+xvadd_d  0111 0100 10111 . . .@vvv
+xvadd_q  0111 01010010 11010 . . .@vvv
+xvsub_b  0111 0100 11000 . . .@vvv
+xvsub_h  0111 0100 11001 . . .@vvv
+xvsub_w  0111 0100 11010 . . .@vvv
+xvsub_d  0111 0100 11011 . . .@vvv
+xvsub_q  0111 01010010 11011 . . .@vvv
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5c402d944d..d8b62ba532 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1695,3 +1695,26 @@ INSN_LSX(vstelm_d, vr_ii)
 INSN_LSX(vstelm_w, vr_ii)
 INSN_LSX(vstelm_h, vr_ii)
 INSN_LSX(vstelm_b, vr_ii)
+
+#define INSN_LASX(insn, type)   \
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{   \
+output_##type ## _x(ctx, a, #insn); \
+return true;\
+}
+
+static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
+{
+output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
+}
+
+INSN_LASX(xvadd_b,   vvv)
+INSN_LASX(xvadd_h,   vvv)
+INSN_LASX(xvadd_w,   vvv)
+INSN_LASX(xvadd_d,   vvv)
+INSN_LASX(xvadd_q,   vvv)
+INSN_LASX(xvsub_b,   vvv)
+INSN_LASX(xvsub_h,   vvv)
+INSN_LASX(xvsub_w,   vvv)
+INSN_LASX(xvsub_d,   vvv)
+INSN_LASX(xvsub_q,   vvv)
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 7f3958a1f4..10e2fe8ff6 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -124,6 +124,10 @@ static void 
loongarch_tr_init_disas_context(DisasContextBase *dcbase,
 ctx->vl = LSX_LEN;
 }
 
+if (FIELD_EX64(env->cpucfg[2], CPUCFG2, LASX)) {
+ctx->vl = LASX_LEN;
+}
+
 ctx->la64 = is_la64(env);
 ctx->va32 = (ctx->base.tb->flags & HW_FLAGS_VA32) != 0;
 
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc 
b/target/loongarch/insn_trans/trans_vec.c.inc
index b5ca65c250..3252e1d809 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -193,6 +193,10 @@ static bool gvec_vvv_vl(DisasContext *ctx, arg_vvv *a,
 uint32_t vj_ofs = vec_full_offset(a->vj);
 uint32_t vk_ofs = vec_full_offset(a->vk);
 
+if (!check_vec(ctx, oprsz)) {
+return true;
+}
+
 func(mop, vd_ofs, vj_ofs, vk_ofs, oprsz, ctx->vl / 8);
 return true;
 }
@@ -201,13 +205,15 @@ static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp 
mop,
  void (*func)(unsigned, uint32_t, uint32_t,
   uint32_t, uint32_t, uint32_t))
 {
-if (!check_vec(ctx, 16)) {
-return true;
-}
-
 return gvec_vvv_vl(ctx, a, 16, mop, func);
 }
 
+static bool gvec_xxx(DisasContext *ctx, arg_vvv *a, MemOp mop,
+ void (*func)(unsigned, uint32_t, uint32_t,
+  uint32_t, uint32_t, uint32_t))
+{
+return gvec_vvv_vl(ctx, a, 32, mop, func);
+}
 
 static bool gvec_vv_vl(DisasContext *ctx, arg_vv *a,
uint32_t oprsz, MemOp mop,
@@ -279,47 +285,70 @@ TRANS(vadd_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_add)
 TRANS(vadd_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_add)
 TRANS(vadd_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_add)
 TRANS(vadd_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_add)
+TRANS(xvadd_b, LASX, gvec_xxx, MO_8, tcg_gen_gvec_add)
+TRANS(xvadd_h, LASX, gvec_xxx, MO_16, tcg_gen_gvec_add)
+TRANS(xvadd_w, LASX, gvec_xxx, MO_32, tcg_gen_gvec_add)
+TRANS(xvadd_d, LASX, gvec_xxx, MO_64, tcg_gen_gvec_add)
+
+static bool gen_vaddsub_q_vl(DisasContext *ctx, arg_vvv *a, 

[PATCH v6 29/57] target/loongarch: Implement xvexth

2023-09-13 Thread Song Gao
This patch includes:
- XVEXTH.{H.B/W.H/D.W/Q.D};
- XVEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  9 ++
 target/loongarch/disas.c|  9 ++
 target/loongarch/vec_helper.c   | 36 ++---
 target/loongarch/insn_trans/trans_vec.c.inc | 21 +---
 4 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index e366cf7615..7491f295a5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1571,6 +1571,15 @@ xvsat_hu 0111 01110010 1 1  . .  
 @vv_ui4
 xvsat_wu 0111 01110010 10001 . . .@vv_ui5
 xvsat_du 0111 01110010 1001 .. . .@vv_ui6
 
+xvexth_h_b   0111 01101001 11101 11000 . .@vv
+xvexth_w_h   0111 01101001 11101 11001 . .@vv
+xvexth_d_w   0111 01101001 11101 11010 . .@vv
+xvexth_q_d   0111 01101001 11101 11011 . .@vv
+xvexth_hu_bu 0111 01101001 11101 11100 . .@vv
+xvexth_wu_hu 0111 01101001 11101 11101 . .@vv
+xvexth_du_wu 0111 01101001 11101 0 . .@vv
+xvexth_qu_du 0111 01101001 11101 1 . .@vv
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 4e54dcd08a..d4bea69b61 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1988,6 +1988,15 @@ INSN_LASX(xvsat_hu,  vv_i)
 INSN_LASX(xvsat_wu,  vv_i)
 INSN_LASX(xvsat_du,  vv_i)
 
+INSN_LASX(xvexth_h_b,vv)
+INSN_LASX(xvexth_w_h,vv)
+INSN_LASX(xvexth_d_w,vv)
+INSN_LASX(xvexth_q_d,vv)
+INSN_LASX(xvexth_hu_bu,  vv)
+INSN_LASX(xvexth_wu_hu,  vv)
+INSN_LASX(xvexth_du_wu,  vv)
+INSN_LASX(xvexth_qu_du,  vv)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index f2e19343bf..2eccbc81a7 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -716,32 +716,44 @@ VSAT_U(vsat_hu, 16, UH)
 VSAT_U(vsat_wu, 32, UW)
 VSAT_U(vsat_du, 64, UD)
 
-#define VEXTH(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
-{\
-int i;   \
-VReg *Vd = (VReg *)vd;   \
-VReg *Vj = (VReg *)vj;   \
- \
-for (i = 0; i < LSX_LEN/BIT; i++) {  \
-Vd->E1(i) = Vj->E2(i + LSX_LEN/BIT); \
-}\
+#define VEXTH(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
+{\
+int i, j, ofs;   \
+VReg *Vd = (VReg *)vd;   \
+VReg *Vj = (VReg *)vj;   \
+int oprsz = simd_oprsz(desc);\
+ \
+ofs = LSX_LEN / BIT; \
+for (i = 0; i < oprsz / 16; i++) {   \
+for (j = 0; j < ofs; j++) {  \
+Vd->E1(j + i * ofs) = Vj->E2(j + ofs + ofs * 2 * i); \
+}\
+}\
 }
 
 void HELPER(vexth_q_d)(void *vd, void *vj, uint32_t desc)
 {
+int i;
 VReg *Vd = (VReg *)vd;
 VReg *Vj = (VReg *)vj;
+int oprsz = simd_oprsz(desc);
 
-Vd->Q(0) = int128_makes64(Vj->D(1));
+for (i = 0; i < oprsz / 16; i++) {
+Vd->Q(i) = int128_makes64(Vj->D(2 * i + 1));
+}
 }
 
 void HELPER(vexth_qu_du)(void *vd, void *vj, uint32_t desc)
 {
+int i;
 VReg *Vd = (VReg *)vd;
 VReg *Vj = (VReg *)vj;
+int oprsz = simd_oprsz(desc);
 
-Vd->Q(0) = int128_make64((uint64_t)Vj->D(1));
+for (i = 0; i < oprsz / 16; i++) {
+Vd->Q(i) = int128_make64(Vj->UD(2 * i + 1));
+}
 }
 
 VEXTH(vexth_h_b, 16, H, B)
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc 
b/target/loongarch/insn_trans/trans_vec.c.inc
index a6c6675a94..e002bb05d6 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -141,6 +141,10 @@ static bool gen_vv_ptr(DisasContext *ctx, arg_v

[PATCH v6 12/57] target/loongarch: check_vec support check LASX instructions

2023-09-13 Thread Song Gao
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.h  | 2 ++
 target/loongarch/cpu.c  | 2 ++
 target/loongarch/insn_trans/trans_vec.c.inc | 6 ++
 3 files changed, 10 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 347ad1c8a9..f125a8e49b 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -462,6 +462,7 @@ static inline void set_pc(CPULoongArchState *env, uint64_t 
value)
#define HW_FLAGS_CRMD_PGR_CSR_CRMD_PG_MASK   /* 0x10 */
#define HW_FLAGS_CRMD_PG    R_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
 #define HW_FLAGS_EUEN_SXE   0x08
+#define HW_FLAGS_EUEN_ASXE  0x10
 #define HW_FLAGS_VA32   0x20
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
@@ -472,6 +473,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState 
*env, vaddr *pc,
 *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
+*flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, ASXE) * HW_FLAGS_EUEN_ASXE;
 *flags |= is_va32(env) * HW_FLAGS_VA32;
 }
 
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 4d72e905aa..a1d3f680d8 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -55,6 +55,7 @@ static const char * const excp_names[] = {
 [EXCCODE_DBP] = "Debug breakpoint",
 [EXCCODE_BCE] = "Bound Check Exception",
 [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
+[EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
 };
 
 const char *loongarch_exception_name(int32_t exception)
@@ -190,6 +191,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 case EXCCODE_FPD:
 case EXCCODE_FPE:
 case EXCCODE_SXD:
+case EXCCODE_ASXD:
 env->CSR_BADV = env->pc;
 QEMU_FALLTHROUGH;
 case EXCCODE_BCE:
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc 
b/target/loongarch/insn_trans/trans_vec.c.inc
index d8ab7c3417..b5ca65c250 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -12,6 +12,12 @@ static bool check_vec(DisasContext *ctx, uint32_t oprsz)
 generate_exception(ctx, EXCCODE_SXD);
 return false;
 }
+
+if ((oprsz == 32) && ((ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0)) {
+generate_exception(ctx, EXCCODE_ASXD);
+return false;
+}
+
 return true;
 }
 
-- 
2.39.1




[PATCH v6 37/57] target/loongarch: Implement xvsrlr xvsrar

2023-09-13 Thread Song Gao
This patch includes:
- XVSRLR[I].{B/H/W/D};
- XVSRAR[I].{B/H/W/D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   | 17 +
 target/loongarch/disas.c| 18 ++
 target/loongarch/vec_helper.c   | 12 
 target/loongarch/insn_trans/trans_vec.c.inc | 16 
 4 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 8a7933eccc..ca0951e1cc 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1661,6 +1661,23 @@ xvsllwil_wu_hu   0111 0111 11000 1  . .  
 @vv_ui4
 xvsllwil_du_wu   0111 0111 11001 . . .@vv_ui5
 xvextl_qu_du 0111 0111 11010 0 . .@vv
 
+xvsrlr_b 0111 0100 0 . . .@vvv
+xvsrlr_h 0111 0100 1 . . .@vvv
+xvsrlr_w 0111 0100 00010 . . .@vvv
+xvsrlr_d 0111 0100 00011 . . .@vvv
+xvsrlri_b0111 01101010 01000 01 ... . .   @vv_ui3
+xvsrlri_h0111 01101010 01000 1  . .   @vv_ui4
+xvsrlri_w0111 01101010 01001 . . .@vv_ui5
+xvsrlri_d0111 01101010 0101 .. . .@vv_ui6
+xvsrar_b 0111 0100 00100 . . .@vvv
+xvsrar_h 0111 0100 00101 . . .@vvv
+xvsrar_w 0111 0100 00110 . . .@vvv
+xvsrar_d 0111 0100 00111 . . .@vvv
+xvsrari_b0111 01101010 1 01 ... . .   @vv_ui3
+xvsrari_h0111 01101010 1 1  . .   @vv_ui4
+xvsrari_w0111 01101010 10001 . . .@vv_ui5
+xvsrari_d0111 01101010 1001 .. . .@vv_ui6
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d93ecdb60d..bc5eb82b49 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2086,6 +2086,24 @@ INSN_LASX(xvsllwil_wu_hu,vv_i)
 INSN_LASX(xvsllwil_du_wu,vv_i)
 INSN_LASX(xvextl_qu_du,  vv)
 
+INSN_LASX(xvsrlr_b,  vvv)
+INSN_LASX(xvsrlr_h,  vvv)
+INSN_LASX(xvsrlr_w,  vvv)
+INSN_LASX(xvsrlr_d,  vvv)
+INSN_LASX(xvsrlri_b, vv_i)
+INSN_LASX(xvsrlri_h, vv_i)
+INSN_LASX(xvsrlri_w, vv_i)
+INSN_LASX(xvsrlri_d, vv_i)
+
+INSN_LASX(xvsrar_b,  vvv)
+INSN_LASX(xvsrar_h,  vvv)
+INSN_LASX(xvsrar_w,  vvv)
+INSN_LASX(xvsrar_d,  vvv)
+INSN_LASX(xvsrari_b, vv_i)
+INSN_LASX(xvsrari_h, vv_i)
+INSN_LASX(xvsrari_w, vv_i)
+INSN_LASX(xvsrari_d, vv_i)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index a3376439e3..bb30d24b89 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -1025,8 +1025,9 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t 
desc)  \
 VReg *Vd = (VReg *)vd;  \
 VReg *Vj = (VReg *)vj;  \
 VReg *Vk = (VReg *)vk;  \
+int oprsz = simd_oprsz(desc);   \
 \
-for (i = 0; i < LSX_LEN/BIT; i++) { \
+for (i = 0; i < oprsz / (BIT / 8); i++) {   \
 Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), ((T)Vk->E(i))%BIT); \
 }   \
 }
@@ -1042,8 +1043,9 @@ void HELPER(NAME)(void *vd, void *vj, uint64_t imm, 
uint32_t desc) \
 int i; \
 VReg *Vd = (VReg *)vd; \
 VReg *Vj = (VReg *)vj; \
+int oprsz = simd_oprsz(desc);  \
\
-for (i = 0; i < LSX_LEN/BIT; i++) {\
+for (i = 0; i < oprsz / (BIT / 8); i++) {  \
 Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), imm);  \
 }  \
 }
@@ -1075,8 +1077,9 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t 
desc)  \
 VReg *Vd = (VReg *)vd;  \
 VReg *Vj = (VReg *)vj;  \
 VReg *Vk = (VReg *)vk;  \
+int oprsz = simd_oprsz(desc); 

[PATCH v6 33/57] target/loongarch: Implement xvldi

2023-09-13 Thread Song Gao
This patch includes:
- XVLDI.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  2 ++
 target/loongarch/disas.c|  7 +++
 target/loongarch/insn_trans/trans_vec.c.inc | 13 ++---
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6a161d6d20..edaa756395 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1605,6 +1605,8 @@ xvmskltz_d   0111 01101001 11000 10011 . .
@vv
 xvmskgez_b   0111 01101001 11000 10100 . .@vv
 xvmsknz_b0111 01101001 11000 11000 . .@vv
 
+xvldi0111 0110 00 . . @v_i13
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 05710098ad..3f6fbeddd7 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1703,6 +1703,11 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * 
a) \
 return true;\
 }
 
+static void output_v_i_x(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "x%d, 0x%x", a->vd, a->imm);
+}
+
 static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
 {
 output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
@@ -2022,6 +2027,8 @@ INSN_LASX(xvmskltz_d,vv)
 INSN_LASX(xvmskgez_b,vv)
 INSN_LASX(xvmsknz_b, vv)
 
+INSN_LASX(xvldi, v_i)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc 
b/target/loongarch/insn_trans/trans_vec.c.inc
index 843ec6d4af..7ebe971ad9 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -3546,16 +3546,12 @@ static uint64_t vldi_get_value(DisasContext *ctx, 
uint32_t imm)
 return data;
 }
 
-static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
+static bool gen_vldi(DisasContext *ctx, arg_vldi *a, uint32_t oprsz)
 {
 int sel, vece;
 uint64_t value;
 
-if (!avail_LSX(ctx)) {
-return false;
-}
-
-if (!check_vec(ctx, 16)) {
+if (!check_vec(ctx, oprsz)) {
 return true;
 }
 
@@ -3569,11 +3565,14 @@ static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
 vece = (a->imm >> 10) & 0x3;
 }
 
-tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->vd), 16, ctx->vl/8,
+tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->vd), oprsz, ctx->vl/8,
  tcg_constant_i64(value));
 return true;
 }
 
+TRANS(vldi, LSX, gen_vldi, 16)
+TRANS(xvldi, LASX, gen_vldi, 32)
+
 TRANS(vand_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_and)
 TRANS(vor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_or)
 TRANS(vxor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_xor)
-- 
2.39.1




[PATCH v6 44/57] target/loongarch: Implement xvbitclr xvbitset xvbitrev

2023-09-13 Thread Song Gao
This patch includes:
- XVBITCLR[I].{B/H/W/D};
- XVBITSET[I].{B/H/W/D};
- XVBITREV[I].{B/H/W/D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   | 27 +
 target/loongarch/disas.c| 25 
 target/loongarch/vec_helper.c   | 44 +++--
 target/loongarch/insn_trans/trans_vec.c.inc | 24 +++
 4 files changed, 99 insertions(+), 21 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d683c6a6ab..cb6db8002a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1784,6 +1784,33 @@ xvpcnt_h 0111 01101001 11000 01001 . .   
 @vv
 xvpcnt_w 0111 01101001 11000 01010 . .@vv
 xvpcnt_d 0111 01101001 11000 01011 . .@vv
 
+xvbitclr_b   0111 0101 11000 . . .@vvv
+xvbitclr_h   0111 0101 11001 . . .@vvv
+xvbitclr_w   0111 0101 11010 . . .@vvv
+xvbitclr_d   0111 0101 11011 . . .@vvv
+xvbitclri_b  0111 01110001 0 01 ... . .   @vv_ui3
+xvbitclri_h  0111 01110001 0 1  . .   @vv_ui4
+xvbitclri_w  0111 01110001 1 . . .@vv_ui5
+xvbitclri_d  0111 01110001 0001 .. . .@vv_ui6
+
+xvbitset_b   0111 0101 11100 . . .@vvv
+xvbitset_h   0111 0101 11101 . . .@vvv
+xvbitset_w   0111 0101 0 . . .@vvv
+xvbitset_d   0111 0101 1 . . .@vvv
+xvbitseti_b  0111 01110001 01000 01 ... . .   @vv_ui3
+xvbitseti_h  0111 01110001 01000 1  . .   @vv_ui4
+xvbitseti_w  0111 01110001 01001 . . .@vv_ui5
+xvbitseti_d  0111 01110001 0101 .. . .@vv_ui6
+
+xvbitrev_b   0111 01010001 0 . . .@vvv
+xvbitrev_h   0111 01010001 1 . . .@vvv
+xvbitrev_w   0111 01010001 00010 . . .@vvv
+xvbitrev_d   0111 01010001 00011 . . .@vvv
+xvbitrevi_b  0111 01110001 1 01 ... . .   @vv_ui3
+xvbitrevi_h  0111 01110001 1 1  . .   @vv_ui4
+xvbitrevi_w  0111 01110001 10001 . . .@vv_ui5
+xvbitrevi_d  0111 01110001 1001 .. . .@vv_ui6
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ff7f7a792a..7f04c912aa 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2210,6 +2210,31 @@ INSN_LASX(xvpcnt_h,  vv)
 INSN_LASX(xvpcnt_w,  vv)
 INSN_LASX(xvpcnt_d,  vv)
 
+INSN_LASX(xvbitclr_b,vvv)
+INSN_LASX(xvbitclr_h,vvv)
+INSN_LASX(xvbitclr_w,vvv)
+INSN_LASX(xvbitclr_d,vvv)
+INSN_LASX(xvbitclri_b,   vv_i)
+INSN_LASX(xvbitclri_h,   vv_i)
+INSN_LASX(xvbitclri_w,   vv_i)
+INSN_LASX(xvbitclri_d,   vv_i)
+INSN_LASX(xvbitset_b,vvv)
+INSN_LASX(xvbitset_h,vvv)
+INSN_LASX(xvbitset_w,vvv)
+INSN_LASX(xvbitset_d,vvv)
+INSN_LASX(xvbitseti_b,   vv_i)
+INSN_LASX(xvbitseti_h,   vv_i)
+INSN_LASX(xvbitseti_w,   vv_i)
+INSN_LASX(xvbitseti_d,   vv_i)
+INSN_LASX(xvbitrev_b,vvv)
+INSN_LASX(xvbitrev_h,vvv)
+INSN_LASX(xvbitrev_w,vvv)
+INSN_LASX(xvbitrev_d,vvv)
+INSN_LASX(xvbitrevi_b,   vv_i)
+INSN_LASX(xvbitrevi_h,   vv_i)
+INSN_LASX(xvbitrevi_w,   vv_i)
+INSN_LASX(xvbitrevi_d,   vv_i)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index e529b58419..ec63efb428 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2313,17 +2313,18 @@ VPCNT(vpcnt_d, 64, UD, ctpop64)
 #define DO_BITSET(a, bit) (a | 1ull << bit)
 #define DO_BITREV(a, bit) (a ^ (1ull << bit))
 
-#define DO_BIT(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{   \
-int i;  \
-VReg *Vd = (VReg *)vd;  \
-VReg *Vj = (VReg *)vj;  \
-VReg *Vk = (VReg *)vk;  \
-\
-for (i = 0; i < LSX_LEN/BIT; i++) { \
-Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i)%BIT);   \
-}   \
+#define DO_BIT(NAME, BIT, E, DO_OP)\
+void HELPER(NAME)(void *v

[PATCH v6 00/57] Add LoongArch LASX instructions

2023-09-13 Thread Song Gao
Based-on: 
https://patchew.org/QEMU/20230831030904.1194667-1-richard.hender...@linaro.org/

Hi,

This series adds LoongArch LASX instructions.

About testing:
We use RISU to test the LoongArch LASX instructions.

QEMU:
https://github.com/loongson/qemu/tree/tcg-old-abi-support-lasx
RISU:
https://github.com/loongson/risu/tree/loongarch-suport-lasx

Patches 6, 51, 52 and 54 still need review. Please review, thanks.

Changes for v6:
- Rebase;
- tcg_gen_gvec_xx: set data = 0;
- Move check_vec to gen/gvec_xxx_vl;
- Create gen_{v/x}2r/gen_r2{x/v} common helper functions (patch 51);
- Some functions use a loop;
- xvrepl128vei uses a helper function gen_xvrepl128 (patch 52);
- xvreplve0 uses a helper function and TRANS() (patch 52);
- Optimize the vshuf and vpermi_q instructions (patch 54);
- Use tcg_gen_gvec_cmpi() (patch 48);
- R-b.

Changes for v5:
- Rebase;
- Split V4'patch 10 to 7 patches(patch3-9);
- LSX use gen/gvec_vv..
- LASX use gen/gvec_xx...
- Don't use an array of shift_res. (patch40,41);
- Move simply DO_XX marcos together. (patch56);
- Renamed lsx*.c to vec*.c. (patch 1);
- Change marcos CHECK_VEC to check_vec(ctx, oprsz);
- R-b.

Changes for v4:
- Rebase;
- Add avail_LASX to check LASX instructions.

Changes for v3:
- Add a new patch 9, rename lsx_helper.c to vec_helper.c,
  and use gen_helper_gvec_* series functions;
- Use i < oprsz / (BIT / 8) in loop;
- Some helper functions use a loop;
- patch 46: use tcg_gen_qemu_ld/st_i64 for xvld/xvst{x};
- R-b.

Changes for v2:
- Expand the definition of VReg to be 256 bits.
- Use more LSX functions.
- R-b.

Song Gao (57):
  target/loongarch: Renamed lsx*.c to vec*.c
  target/loongarch: Implement gvec_*_vl functions
  target/loongarch: Use gen_helper_gvec_4_ptr for 4OP + env vector
instructions
  target/loongarch: Use gen_helper_gvec_4 for 4OP vector instructions
  target/loongarch: Use gen_helper_gvec_3_ptr for 3OP + env vector
instructions
  target/loongarch: Use gen_helper_gvec_3 for 3OP vector instructions
  target/loongarch: Use gen_helper_gvec_2_ptr for 2OP + env vector
instructions
  target/loongarch: Use gen_helper_gvec_2 for 2OP vector instructions
  target/loongarch: Use gen_helper_gvec_2i for 2OP + imm vector
instructions
  target/loongarch: Replace CHECK_SXE to check_vec(ctx, 16)
  target/loongarch: Add LASX data support
  target/loongarch: check_vec support check LASX instructions
  target/loongarch: Add avail_LASX to check LASX instructions
  target/loongarch: Implement xvadd/xvsub
  target/loongarch: Implement xvreplgr2vr
  target/loongarch: Implement xvaddi/xvsubi
  target/loongarch: Implement xvneg
  target/loongarch: Implement xvsadd/xvssub
  target/loongarch: Implement xvhaddw/xvhsubw
  target/loongarch: Implement xvaddw/xvsubw
  target/loongarch: Implement xvavg/xvavgr
  target/loongarch: Implement xvabsd
  target/loongarch: Implement xvadda
  target/loongarch: Implement xvmax/xvmin
  target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od}
  target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od}
  target/loongarch: Implement xvdiv/xvmod
  target/loongarch: Implement xvsat
  target/loongarch: Implement xvexth
  target/loongarch: Implement vext2xv
  target/loongarch: Implement xvsigncov
  target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz
  target/loongarch: Implement xvldi
  target/loongarch: Implement LASX logic instructions
  target/loongarch: Implement xvsll xvsrl xvsra xvrotr
  target/loongarch: Implement xvsllwil xvextl
  target/loongarch: Implement xvsrlr xvsrar
  target/loongarch: Implement xvsrln xvsran
  target/loongarch: Implement xvsrlrn xvsrarn
  target/loongarch: Implement xvssrln xvssran
  target/loongarch: Implement xvssrlrn xvssrarn
  target/loongarch: Implement xvclo xvclz
  target/loongarch: Implement xvpcnt
  target/loongarch: Implement xvbitclr xvbitset xvbitrev
  target/loongarch: Implement xvfrstp
  target/loongarch: Implement LASX fpu arith instructions
  target/loongarch: Implement LASX fpu fcvt instructions
  target/loongarch: Implement xvseq xvsle xvslt
  target/loongarch: Implement xvfcmp
  target/loongarch: Implement xvbitsel xvset
  target/loongarch: Implement xvinsgr2vr xvpickve2gr
  target/loongarch: Implement xvreplve xvinsve0 xvpickve
  target/loongarch: Implement xvpack xvpick xvilv{l/h}
  target/loongarch: Implement xvshuf xvperm{i} xvshuf4i
  target/loongarch: Implement xvld xvst
  target/loongarch: Move simple DO_XX macros together
  target/loongarch: CPUCFG support LASX

 target/loongarch/cpu.h|   26 +-
 target/loongarch/helper.h |  689 ++--
 target/loongarch/internals.h  |   22 -
 target/loongarch/translate.h  |1 +
 target/loongarch/vec.h|   75 +
 target/loongarch/insns.decode |  782 
 linux-user/loongarch64/signal.c   |1 +
 target/loongarch/cpu.c|4 +
 target/loongarch/disas.c  |  924 +
 target/loongarch/gdbstub.c

[PATCH v6 06/57] target/loongarch: Use gen_helper_gvec_3 for 3OP vector instructions

2023-09-13 Thread Song Gao
Signed-off-by: Song Gao 
---
 target/loongarch/helper.h   | 214 +-
 target/loongarch/vec_helper.c   | 444 +---
 target/loongarch/insn_trans/trans_vec.c.inc |  19 +-
 3 files changed, 326 insertions(+), 351 deletions(-)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index bcf82597aa..4b681e948f 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -133,22 +133,22 @@ DEF_HELPER_1(idle, void, env)
 #endif
 
 /* LoongArch LSX  */
-DEF_HELPER_4(vhaddw_h_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_w_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_d_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_q_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_hu_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_wu_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_du_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_qu_du, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_h_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_w_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_d_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_q_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_4(vhaddw_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(vaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
@@ -305,22 +305,22 @@ DEF_HELPER_FLAGS_4(vmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
-DEF_HELPER_4(vdiv_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_du, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_4(vdiv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(vsat_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
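
For readers not familiar with the helper macros: the conversions in patches 06-08 all follow the same shape. Below is a hedged, standalone sketch of what the before/after helper shells amount to; all names are made up for illustration, only the call shape mirrors the DEF_HELPER change:

    #include <stdint.h>
    #include <stdio.h>

    /* Old style, DEF_HELPER_4(op, void, env, i32, i32, i32): the helper got
     * env plus register *numbers* and indexed the register file itself, so
     * TCG had to assume arbitrary global accesses. */
    #define NVREG 32
    #define VLEN  32                      /* bytes per register (LASX width) */

    typedef struct EnvSketch {
        int8_t vreg[NVREG][VLEN];
    } EnvSketch;

    static void old_style_op(EnvSketch *env, uint32_t vd, uint32_t vj, uint32_t vk)
    {
        for (int i = 0; i < 16; i++) {    /* fixed 128-bit LSX width */
            env->vreg[vd][i] = env->vreg[vj][i] + env->vreg[vk][i];
        }
    }

    /* New style, DEF_HELPER_FLAGS_4(op, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32):
     * the translator passes pointers to the register slots plus a
     * descriptor-derived size, so one helper serves 16-byte LSX and
     * 32-byte LASX operands and can be declared as not touching other globals. */
    static void new_style_op(void *vd, void *vj, void *vk, uint32_t oprsz)
    {
        int8_t *d = vd;
        const int8_t *j = vj, *k = vk;

        for (uint32_t i = 0; i < oprsz; i++) {
            d[i] = j[i] + k[i];
        }
    }

    int main(void)
    {
        EnvSketch env = { 0 };
        env.vreg[1][0] = 2;
        env.vreg[2][0] = 3;

        old_style_op(&env, 0, 1, 2);
        new_style_op(env.vreg[0], env.vreg[1], env.vreg[2], 32);
        printf("vreg0[0] = %d\n", env.vreg[0][0]);
        return 0;
    }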
 

[PATCH v6 07/57] target/loongarch: Use gen_helper_gvec_2_ptr for 2OP + env vector instructions

2023-09-13 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/helper.h   | 118 +++---
 target/loongarch/vec_helper.c   | 161 +++-
 target/loongarch/insn_trans/trans_vec.c.inc | 129 +---
 3 files changed, 219 insertions(+), 189 deletions(-)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4b681e948f..0752cc7212 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -547,73 +547,73 @@ DEF_HELPER_FLAGS_5(vfmaxa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_FLAGS_5(vfmina_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_FLAGS_5(vfmina_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
 
-DEF_HELPER_3(vflogb_s, void, env, i32, i32)
-DEF_HELPER_3(vflogb_d, void, env, i32, i32)
-
-DEF_HELPER_3(vfclass_s, void, env, i32, i32)
-DEF_HELPER_3(vfclass_d, void, env, i32, i32)
-
-DEF_HELPER_3(vfsqrt_s, void, env, i32, i32)
-DEF_HELPER_3(vfsqrt_d, void, env, i32, i32)
-DEF_HELPER_3(vfrecip_s, void, env, i32, i32)
-DEF_HELPER_3(vfrecip_d, void, env, i32, i32)
-DEF_HELPER_3(vfrsqrt_s, void, env, i32, i32)
-DEF_HELPER_3(vfrsqrt_d, void, env, i32, i32)
-
-DEF_HELPER_3(vfcvtl_s_h, void, env, i32, i32)
-DEF_HELPER_3(vfcvth_s_h, void, env, i32, i32)
-DEF_HELPER_3(vfcvtl_d_s, void, env, i32, i32)
-DEF_HELPER_3(vfcvth_d_s, void, env, i32, i32)
+DEF_HELPER_FLAGS_4(vflogb_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vflogb_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vfclass_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfclass_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vfsqrt_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfsqrt_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrecip_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrecip_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrsqrt_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrsqrt_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vfcvtl_s_h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfcvth_s_h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfcvtl_d_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfcvth_d_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
 DEF_HELPER_FLAGS_5(vfcvt_h_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_FLAGS_5(vfcvt_s_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
 
-DEF_HELPER_3(vfrintrne_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrne_d, void, env, i32, i32)
-DEF_HELPER_3(vfrintrz_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrz_d, void, env, i32, i32)
-DEF_HELPER_3(vfrintrp_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrp_d, void, env, i32, i32)
-DEF_HELPER_3(vfrintrm_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrm_d, void, env, i32, i32)
-DEF_HELPER_3(vfrint_s, void, env, i32, i32)
-DEF_HELPER_3(vfrint_d, void, env, i32, i32)
-
-DEF_HELPER_3(vftintrne_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrne_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrp_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrp_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrm_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrm_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftint_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftint_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_wu_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_lu_d, void, env, i32, i32)
-DEF_HELPER_3(vftint_wu_s, void, env, i32, i32)
-DEF_HELPER_3(vftint_lu_d, void, env, i32, i32)
+DEF_HELPER_FLAGS_4(vfrintrne_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrne_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrz_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrz_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrp_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrp_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrm_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrm_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrint_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrint_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vftintrne_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrne_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrz_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrz_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrp_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrp_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrm_w_s

[PATCH v6 46/57] target/loongarch: Implement LASX fpu arith instructions

2023-09-13 Thread Song Gao
This patch includes:
- XVF{ADD/SUB/MUL/DIV}.{S/D};
- XVF{MADD/MSUB/NMADD/NMSUB}.{S/D};
- XVF{MAX/MIN}.{S/D};
- XVF{MAXA/MINA}.{S/D};
- XVFLOGB.{S/D};
- XVFCLASS.{S/D};
- XVF{SQRT/RECIP/RSQRT}.{S/D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   | 41 +++
 target/loongarch/disas.c| 46 +
 target/loongarch/vec_helper.c   | 12 ++--
 target/loongarch/insn_trans/trans_vec.c.inc | 75 +
 4 files changed, 158 insertions(+), 16 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6035fe139c..4224b0a4b1 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1816,6 +1816,47 @@ xvfrstp_h    0111 01010010 10111 . . .    @vvv
 xvfrstpi_b   0111 01101001 10100 . . .@vv_ui5
 xvfrstpi_h   0111 01101001 10101 . . .@vv_ui5
 
+xvfadd_s 0111 01010011 1 . . .@vvv
+xvfadd_d 0111 01010011 00010 . . .@vvv
+xvfsub_s 0111 01010011 00101 . . .@vvv
+xvfsub_d 0111 01010011 00110 . . .@vvv
+xvfmul_s 0111 01010011 10001 . . .@vvv
+xvfmul_d 0111 01010011 10010 . . .@vvv
+xvfdiv_s 0111 01010011 10101 . . .@vvv
+xvfdiv_d 0111 01010011 10110 . . .@vvv
+
+xvfmadd_s 1011 . . . .@
+xvfmadd_d 10100010 . . . .@
+xvfmsub_s 10100101 . . . .@
+xvfmsub_d 10100110 . . . .@
+xvfnmadd_s    10101001 . . . .@
+xvfnmadd_d    10101010 . . . .@
+xvfnmsub_s    10101101 . . . .@
+xvfnmsub_d    10101110 . . . .@
+
+xvfmax_s 0111 01010011 11001 . . .@vvv
+xvfmax_d 0111 01010011 11010 . . .@vvv
+xvfmin_s 0111 01010011 11101 . . .@vvv
+xvfmin_d 0111 01010011 0 . . .@vvv
+
+xvfmaxa_s0111 01010100 1 . . .@vvv
+xvfmaxa_d0111 01010100 00010 . . .@vvv
+xvfmina_s0111 01010100 00101 . . .@vvv
+xvfmina_d0111 01010100 00110 . . .@vvv
+
+xvflogb_s0111 01101001 11001 10001 . .@vv
+xvflogb_d0111 01101001 11001 10010 . .@vv
+
+xvfclass_s   0111 01101001 11001 10101 . .@vv
+xvfclass_d   0111 01101001 11001 10110 . .@vv
+
+xvfsqrt_s0111 01101001 11001 11001 . .@vv
+xvfsqrt_d0111 01101001 11001 11010 . .@vv
+xvfrecip_s   0111 01101001 11001 11101 . .@vv
+xvfrecip_d   0111 01101001 11001 0 . .@vv
+xvfrsqrt_s   0111 01101001 11010 1 . .@vv
+xvfrsqrt_d   0111 01101001 11010 00010 . .@vv
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1c4aecaa93..1fb9d7eac1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1708,6 +1708,11 @@ static void output_v_i_x(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
 output(ctx, mnemonic, "x%d, 0x%x", a->vd, a->imm);
 }
 
+static void output_vvvv_x(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "x%d, x%d, x%d, x%d", a->vd, a->vj, a->vk, a->va);
+}
+
 static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
 {
 output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
@@ -2240,6 +2245,47 @@ INSN_LASX(xvfrstp_h, vvv)
 INSN_LASX(xvfrstpi_b,vv_i)
 INSN_LASX(xvfrstpi_h,vv_i)
 
+INSN_LASX(xvfadd_s,  vvv)
+INSN_LASX(xvfadd_d,  vvv)
+INSN_LASX(xvfsub_s,  vvv)
+INSN_LASX(xvfsub_d,  vvv)
+INSN_LASX(xvfmul_s,  vvv)
+INSN_LASX(xvfmul_d,  vvv)
+INSN_LASX(xvfdiv_s,  vvv)
+INSN_LASX(xvfdiv_d,  vvv)
+
+INSN_LASX(xvfmadd_s, vvvv)
+INSN_LASX(xvfmadd_d, vvvv)
+INSN_LASX(xvfmsub_s, vvvv)
+INSN_LASX(xvfmsub_d, vvvv)
+INSN_LASX(xvfnmadd_s,   vvvv)
+INSN_LASX(xvfnmadd_d,   vvvv)
+INSN_LASX(xvfnmsub_s,   vvvv)
+INSN_LASX(xvfnmsub_d,   vvvv)
+
+INSN_LASX(xvfmax_s,  vvv)
+INSN_LASX(xvfmax_d,  vvv)
+INSN_LASX(xvfmin_s,  vvv)
+INSN_LASX(xvfmin_d,  vvv)
+
+INSN_LASX(xvfmaxa_s, vvv)
+INSN_LASX(xvfmaxa_d, vvv)
+INSN_LASX(xvfmina_s, vvv)
+INSN_LASX(xvfmina_d, vvv)
+
+INSN_LASX(xvflogb_

[PATCH v6 54/57] target/loongarch: Implement xvshuf xvperm{i} xvshuf4i

2023-09-13 Thread Song Gao
This patch includes:
- XVSHUF.{B/H/W/D};
- XVPERM.W;
- XVSHUF4i.{B/H/W/D};
- XVPERMI.{W/D/Q};
- XVEXTRINS.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/helper.h   |   3 +
 target/loongarch/insns.decode   |  21 +++
 target/loongarch/disas.c|  21 +++
 target/loongarch/vec_helper.c   | 146 ++--
 target/loongarch/insn_trans/trans_vec.c.inc |  30 +++-
 5 files changed, 175 insertions(+), 46 deletions(-)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index fb489dda2d..b3b64a0215 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -709,7 +709,10 @@ DEF_HELPER_FLAGS_4(vshuf4i_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vshuf4i_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vshuf4i_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
+DEF_HELPER_FLAGS_4(vperm_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vpermi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vpermi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vpermi_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
 DEF_HELPER_FLAGS_4(vextrins_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vextrins_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index a325b861c1..64b67ee9ac 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -2039,3 +2039,24 @@ xvilvh_b 0111 01010001 11000 . . .    @vvv
 xvilvh_h 0111 01010001 11001 . . .@vvv
 xvilvh_w 0111 01010001 11010 . . .@vvv
 xvilvh_d 0111 01010001 11011 . . .@vvv
+
+xvshuf_b  11010110 . . . .@
+xvshuf_h 0111 01010111 10101 . . .@vvv
+xvshuf_w 0111 01010111 10110 . . .@vvv
+xvshuf_d 0111 01010111 10111 . . .@vvv
+
+xvperm_w 0111 01010111 11010 . . .@vvv
+
+xvshuf4i_b   0111 0001 00  . .@vv_ui8
+xvshuf4i_h   0111 0001 01  . .@vv_ui8
+xvshuf4i_w   0111 0001 10  . .@vv_ui8
+xvshuf4i_d   0111 0001 11  . .@vv_ui8
+
+xvpermi_w0111 0110 01  . .@vv_ui8
+xvpermi_d0111 0110 10  . .@vv_ui8
+xvpermi_q0111 0110 11  . .@vv_ui8
+
+xvextrins_d  0111 0000 00  . .@vv_ui8
+xvextrins_w  0111 0000 01  . .@vv_ui8
+xvextrins_h  0111 0000 10  . .@vv_ui8
+xvextrins_b  0111 0000 11  . .@vv_ui8
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 74ae916a10..1ec8e21e01 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2574,3 +2574,24 @@ INSN_LASX(xvilvh_b,  vvv)
 INSN_LASX(xvilvh_h,  vvv)
 INSN_LASX(xvilvh_w,  vvv)
 INSN_LASX(xvilvh_d,  vvv)
+
+INSN_LASX(xvshuf_b,  vvvv)
+INSN_LASX(xvshuf_h,  vvv)
+INSN_LASX(xvshuf_w,  vvv)
+INSN_LASX(xvshuf_d,  vvv)
+
+INSN_LASX(xvperm_w,  vvv)
+
+INSN_LASX(xvshuf4i_b,vv_i)
+INSN_LASX(xvshuf4i_h,vv_i)
+INSN_LASX(xvshuf4i_w,vv_i)
+INSN_LASX(xvshuf4i_d,vv_i)
+
+INSN_LASX(xvpermi_w, vv_i)
+INSN_LASX(xvpermi_d, vv_i)
+INSN_LASX(xvpermi_q, vv_i)
+
+INSN_LASX(xvextrins_d,   vv_i)
+INSN_LASX(xvextrins_w,   vv_i)
+INSN_LASX(xvextrins_h,   vv_i)
+INSN_LASX(xvextrins_b,   vv_i)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 2bbaee628b..6b61a5c447 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -3381,57 +3381,65 @@ VILVH(vilvh_h, 32, H)
 VILVH(vilvh_w, 64, W)
 VILVH(vilvh_d, 128, D)
 
+#define SHF_POS(i, imm) (((i) & 0xfc) + (((imm) >> (2 * ((i) & 0x03))) & 0x03))
+
 void HELPER(vshuf_b)(void *vd, void *vj, void *vk, void *va, uint32_t desc)
 {
-int i, m;
-VReg temp;
+int i, j, m;
+VReg temp = {};
 VReg *Vd = (VReg *)vd;
 VReg *Vj = (VReg *)vj;
 VReg *Vk = (VReg *)vk;
 VReg *Va = (VReg *)va;
+int oprsz = simd_oprsz(desc);
 
-m = LSX_LEN/8;
-for (i = 0; i < m ; i++) {
+m = LSX_LEN / 8;
+for (i = 0; i < (oprsz / 16) * m; i++) {
+j = i < m ? 0 : 1;
 uint64_t k = (uint8_t)Va->B(i) % (2 * m);
-temp.B(i) = k < m ? Vk->B(k) : Vj->B(k - m);
+temp.B(i) = k < m ? Vk->B(k + j * m): Vj->B(k + (j - 1) * m);
 }
 *Vd = temp;
 }
 
-#define VSHUF(NAME, BIT, E)\
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{  \
-   
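
As an aside, the reworked vshuf_b loop above splits the byte selection per 128-bit lane. A minimal standalone sketch of that indexing, using plain arrays instead of VReg; the 16-byte lane size and all names below are assumptions for illustration, not part of the patch:

    #include <stdint.h>
    #include <stdio.h>

    /* Within each 16-byte lane, selector k < 16 picks from vk's lane and
     * k >= 16 picks from vj's lane (k - 16); oprsz is 16 for LSX, 32 for LASX. */
    static void shuf_b_sketch(uint8_t *d, const uint8_t *vj, const uint8_t *vk,
                              const uint8_t *va, int oprsz)
    {
        const int m = 16;                 /* bytes per 128-bit lane */
        uint8_t tmp[32] = { 0 };

        for (int i = 0; i < oprsz; i++) {
            int lane = i / m;             /* 0 = low lane, 1 = high lane */
            int idx = va[i] % (2 * m);    /* selector byte, modulo 2 lanes */
            tmp[i] = idx < m ? vk[idx + lane * m] : vj[idx - m + lane * m];
        }
        for (int i = 0; i < oprsz; i++) {
            d[i] = tmp[i];
        }
    }

    int main(void)
    {
        uint8_t vj[32], vk[32], va[32], vd[32];

        for (int i = 0; i < 32; i++) {
            vj[i] = 0x40 + i;             /* high source bytes */
            vk[i] = 0x20 + i;             /* low source bytes */
            va[i] = 31 - i;               /* reversed selectors */
        }
        shuf_b_sketch(vd, vj, vk, va, 32);
        printf("vd[0]=%02x vd[16]=%02x\n", vd[0], vd[16]);
        return 0;
    }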

[PATCH v6 51/57] target/loongarch: Implement xvinsgr2vr xvpickve2gr

2023-09-13 Thread Song Gao
This patch includes:
- XVINSGR2VR.{W/D};
- XVPICKVE2GR.{W/D}[U].

Signed-off-by: Song Gao 
---
 target/loongarch/insns.decode   |   7 +
 target/loongarch/disas.c|  17 ++
 target/loongarch/translate.c|  13 ++
 target/loongarch/insn_trans/trans_vec.c.inc | 208 
 4 files changed, 75 insertions(+), 170 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ad6751fdfb..bb3bb447ae 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1976,6 +1976,13 @@ xvsetallnez_h    0111 01101001 11001 01101 . 00 ...    @cv
 xvsetallnez_w0111 01101001 11001 01110 . 00 ...   @cv
 xvsetallnez_d0111 01101001 11001 0 . 00 ...   @cv
 
+xvinsgr2vr_w 0111 01101110 10111 10 ... . .   @vr_ui3
+xvinsgr2vr_d 0111 01101110 10111 110 .. . .   @vr_ui2
+xvpickve2gr_w0111 01101110 1 10 ... . .   @rv_ui3
+xvpickve2gr_d0111 01101110 1 110 .. . .   @rv_ui2
+xvpickve2gr_wu   0111 0110 00111 10 ... . .   @rv_ui3
+xvpickve2gr_du   0111 0110 00111 110 .. . .   @rv_ui2
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index abe113b150..04f9f9fa4b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1738,6 +1738,16 @@ static void output_vv_x(DisasContext *ctx, arg_vv *a, const char *mnemonic)
 output(ctx, mnemonic, "x%d, x%d", a->vd, a->vj);
 }
 
+static void output_vr_i_x(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "x%d, r%d, 0x%x", a->vd, a->rj, a->imm);
+}
+
+static void output_rv_i_x(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "r%d, x%d, 0x%x", a->rd, a->vj, a->imm);
+}
+
 INSN_LASX(xvadd_b,   vvv)
 INSN_LASX(xvadd_h,   vvv)
 INSN_LASX(xvadd_w,   vvv)
@@ -2497,6 +2507,13 @@ INSN_LASX(xvsetallnez_h, cv)
 INSN_LASX(xvsetallnez_w, cv)
 INSN_LASX(xvsetallnez_d, cv)
 
+INSN_LASX(xvinsgr2vr_w,  vr_i)
+INSN_LASX(xvinsgr2vr_d,  vr_i)
+INSN_LASX(xvpickve2gr_w, rv_i)
+INSN_LASX(xvpickve2gr_d, rv_i)
+INSN_LASX(xvpickve2gr_wu,rv_i)
+INSN_LASX(xvpickve2gr_du,rv_i)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 10e2fe8ff6..4892834d0c 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -37,6 +37,19 @@ static inline int vec_full_offset(int regno)
 return  offsetof(CPULoongArchState, fpr[regno]);
 }
 
+static inline int vec_reg_offset(int regno, int index, MemOp mop)
+{
+const uint8_t size = 1 << mop;
+int offs = index * size;
+
+#if HOST_BIG_ENDIAN
+if (size < 8) {
+offs ^= (8 - size);
+}
+#endif
+return offs + vec_full_offset(regno);
+}
+
 static inline void get_vreg64(TCGv_i64 dest, int regno, int index)
 {
 tcg_gen_ld_i64(dest, cpu_env,
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc b/target/loongarch/insn_trans/trans_vec.c.inc
index 0dec3dfffe..e1ba54075e 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -4829,209 +4829,77 @@ TRANS(xvsetallnez_h, LASX, gen_cx, gen_helper_vsetallnez_h)
 TRANS(xvsetallnez_w, LASX, gen_cx, gen_helper_vsetallnez_w)
 TRANS(xvsetallnez_d, LASX, gen_cx, gen_helper_vsetallnez_d)
 
-static bool trans_vinsgr2vr_b(DisasContext *ctx, arg_vr_i *a)
+static bool gen_g2v_vl(DisasContext *ctx, arg_vr_i *a, uint32_t oprsz, MemOp mop,
+   void (*func)(TCGv, TCGv_ptr, tcg_target_long))
 {
 TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
 
-if (!avail_LSX(ctx)) {
-return false;
-}
-
-if (!check_vec(ctx, 16)) {
-return true;
-}
-
-tcg_gen_st8_i64(src, cpu_env,
-offsetof(CPULoongArchState, fpr[a->vd].vreg.B(a->imm)));
-return true;
-}
-
-static bool trans_vinsgr2vr_h(DisasContext *ctx, arg_vr_i *a)
-{
-TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
-
-if (!avail_LSX(ctx)) {
-return false;
-}
-
-if (!check_vec(ctx, 16)) {
-return true;
-}
-
-tcg_gen_st16_i64(src, cpu_env,
-offsetof(CPULoongArchState, fpr[a->vd].vreg.H(a->imm)));
-return true;
-}
-
-static bool trans_vinsgr2vr_w(DisasContext *ctx, arg_vr_i *a)
-{
-TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
-
-if (!avail_LSX(ctx)) {
-return false;
-}
-
-if (!check_vec(ctx, 16)) {
+if (!check_vec(ctx, oprsz)) {
 return true;
 }
 
-tcg_gen_st32_i64(src, cpu_env,
- offsetof(CPULoongArchState, fpr[a->vd].vreg.W(a->imm)));
-return true
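
A quick illustration of the vec_reg_offset() arithmetic added in translate.c above: on big-endian hosts the offset of a sub-64-bit element is XOR-flipped so that element 0 still addresses the least significant bytes of its 64-bit word. A hedged sketch of just the offset math (the real function adds vec_full_offset(regno) on top; the names here are made up):

    #include <stdint.h>
    #include <stdio.h>

    /* "host_big_endian" stands in for the HOST_BIG_ENDIAN compile-time test. */
    static int element_offset_sketch(int index, int elem_size, int host_big_endian)
    {
        int offs = index * elem_size;

        if (host_big_endian && elem_size < 8) {
            /* Flip the sub-word part so element 0 maps to the least
             * significant bytes of the containing 64-bit word. */
            offs ^= 8 - elem_size;
        }
        return offs;
    }

    int main(void)
    {
        /* 16-bit elements: LE offsets 0,2,4,6,...  BE offsets 6,4,2,0,... per word */
        for (int i = 0; i < 4; i++) {
            printf("H(%d): le=%d be=%d\n", i,
                   element_offset_sketch(i, 2, 0),
                   element_offset_sketch(i, 2, 1));
        }
        return 0;
    }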

[PATCH v6 11/57] target/loongarch: Add LASX data support

2023-09-13 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu.h  | 24 --
 target/loongarch/internals.h| 22 
 target/loongarch/vec.h  | 33 ++
 linux-user/loongarch64/signal.c |  1 +
 target/loongarch/cpu.c  |  1 +
 target/loongarch/gdbstub.c  |  1 +
 target/loongarch/machine.c  | 36 -
 target/loongarch/translate.c|  1 +
 target/loongarch/vec_helper.c   |  1 +
 9 files changed, 86 insertions(+), 34 deletions(-)
 create mode 100644 target/loongarch/vec.h

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 4d7201995a..347ad1c8a9 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -251,18 +251,20 @@ FIELD(TLB_MISC, ASID, 1, 10)
 FIELD(TLB_MISC, VPPN, 13, 35)
 FIELD(TLB_MISC, PS, 48, 6)
 
-#define LSX_LEN   (128)
+#define LSX_LEN(128)
+#define LASX_LEN   (256)
+
 typedef union VReg {
-int8_t   B[LSX_LEN / 8];
-int16_t  H[LSX_LEN / 16];
-int32_t  W[LSX_LEN / 32];
-int64_t  D[LSX_LEN / 64];
-uint8_t  UB[LSX_LEN / 8];
-uint16_t UH[LSX_LEN / 16];
-uint32_t UW[LSX_LEN / 32];
-uint64_t UD[LSX_LEN / 64];
-Int128   Q[LSX_LEN / 128];
-}VReg;
+int8_t   B[LASX_LEN / 8];
+int16_t  H[LASX_LEN / 16];
+int32_t  W[LASX_LEN / 32];
+int64_t  D[LASX_LEN / 64];
+uint8_t  UB[LASX_LEN / 8];
+uint16_t UH[LASX_LEN / 16];
+uint32_t UW[LASX_LEN / 32];
+uint64_t UD[LASX_LEN / 64];
+Int128   Q[LASX_LEN / 128];
+} VReg;
 
 typedef union fpr_t fpr_t;
 union fpr_t {
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index 7b0f29c942..c492863cc5 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -21,28 +21,6 @@
 /* Global bit for huge page */
 #define LOONGARCH_HGLOBAL_SHIFT 12
 
-#if  HOST_BIG_ENDIAN
-#define B(x)  B[15 - (x)]
-#define H(x)  H[7 - (x)]
-#define W(x)  W[3 - (x)]
-#define D(x)  D[1 - (x)]
-#define UB(x) UB[15 - (x)]
-#define UH(x) UH[7 - (x)]
-#define UW(x) UW[3 - (x)]
-#define UD(x) UD[1 -(x)]
-#define Q(x)  Q[x]
-#else
-#define B(x)  B[x]
-#define H(x)  H[x]
-#define W(x)  W[x]
-#define D(x)  D[x]
-#define UB(x) UB[x]
-#define UH(x) UH[x]
-#define UW(x) UW[x]
-#define UD(x) UD[x]
-#define Q(x)  Q[x]
-#endif
-
 void loongarch_translate_init(void);
 
 void loongarch_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
new file mode 100644
index 00..2f23cae7d7
--- /dev/null
+++ b/target/loongarch/vec.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch vector utilitites
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
+#ifndef LOONGARCH_VEC_H
+#define LOONGARCH_VEC_H
+
+#if HOST_BIG_ENDIAN
+#define B(x)  B[(x) ^ 15]
+#define H(x)  H[(x) ^ 7]
+#define W(x)  W[(x) ^ 3]
+#define D(x)  D[(x) ^ 1]
+#define UB(x) UB[(x) ^ 15]
+#define UH(x) UH[(x) ^ 7]
+#define UW(x) UW[(x) ^ 3]
+#define UD(x) UD[(x) ^ 1]
+#define Q(x)  Q[x]
+#else
+#define B(x)  B[x]
+#define H(x)  H[x]
+#define W(x)  W[x]
+#define D(x)  D[x]
+#define UB(x) UB[x]
+#define UH(x) UH[x]
+#define UW(x) UW[x]
+#define UD(x) UD[x]
+#define Q(x)  Q[x]
+#endif /* HOST_BIG_ENDIAN */
+
+#endif /* LOONGARCH_VEC_H */
diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index bb8efb1172..39572c1190 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -12,6 +12,7 @@
 #include "linux-user/trace.h"
 
 #include "target/loongarch/internals.h"
+#include "target/loongarch/vec.h"
 
 /* FP context was used */
 #define SC_USED_FP  (1 << 0)
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 65f9320e34..4d72e905aa 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -19,6 +19,7 @@
 #include "cpu-csr.h"
 #include "sysemu/reset.h"
 #include "tcg/tcg.h"
+#include "vec.h"
 
 const char * const regnames[32] = {
 "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index b09804b62f..5fc2f19e96 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -11,6 +11,7 @@
 #include "internals.h"
 #include "exec/gdbstub.h"
 #include "gdbstub/helpers.h"
+#include "vec.h"
 
 uint64_t read_fcc(CPULoongArchState *env)
 {
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index d8ac99c9a4..1c4e01d076 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -8,7 +8,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "migration/cpu.h"
-#include "internals.h"
+#include "vec.h"
 
 static const VMStateDescription vmstate_fpu_reg = {
 .name = "fpu_reg",
@@ -76,6 +76,39 @@ static const VMStateDescription vmstate_lsx = {
 },
 };
 
+static const VMStateDescription vmstate_lasxh_reg = {
+.name = "lasxh_reg",
+.version_id = 1,
+.
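
To make the cpu.h change above concrete: the VReg union now spans 256 bits, LSX code simply keeps to the low half while LASX uses the whole register. A small host-side sketch with placeholder names (not the QEMU definitions, and the 128-bit Q member is omitted):

    #include <stdint.h>
    #include <stdio.h>

    #define LASX_LEN 256
    #define LSX_LEN  128

    typedef union SketchVReg {
        int8_t  B[LASX_LEN / 8];
        int16_t H[LASX_LEN / 16];
        int32_t W[LASX_LEN / 32];
        int64_t D[LASX_LEN / 64];
    } SketchVReg;

    int main(void)
    {
        SketchVReg v = { .D = { 1, 2, 3, 4 } };

        printf("sizeof(SketchVReg) = %zu bytes\n", sizeof(SketchVReg)); /* 32 */
        printf("LSX view:  D[0]=%lld D[1]=%lld\n",
               (long long)v.D[0], (long long)v.D[1]);
        printf("LASX view: D[2]=%lld D[3]=%lld\n",
               (long long)v.D[2], (long long)v.D[3]);
        return 0;
    }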

[PATCH v6 34/57] target/loongarch: Implement LASX logic instructions

2023-09-13 Thread Song Gao
This patch includes:
- XV{AND/OR/XOR/NOR/ANDN/ORN}.V;
- XV{AND/OR/XOR/NOR}I.B.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   | 12 +++
 target/loongarch/disas.c| 12 +++
 target/loongarch/vec_helper.c   |  4 +--
 target/loongarch/insn_trans/trans_vec.c.inc | 38 -
 4 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index edaa756395..fb28666577 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1607,6 +1607,18 @@ xvmsknz_b    0111 01101001 11000 11000 . .    @vv
 
 xvldi0111 0110 00 . . @v_i13
 
+xvand_v  0111 01010010 01100 . . .@vvv
+xvor_v   0111 01010010 01101 . . .@vvv
+xvxor_v  0111 01010010 01110 . . .@vvv
+xvnor_v  0111 01010010 0 . . .@vvv
+xvandn_v 0111 01010010 1 . . .@vvv
+xvorn_v  0111 01010010 10001 . . .@vvv
+
+xvandi_b 0111 0101 00  . .@vv_ui8
+xvori_b  0111 0101 01  . .@vv_ui8
+xvxori_b 0111 0101 10  . .@vv_ui8
+xvnori_b 0111 0101 11  . .@vv_ui8
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 3f6fbeddd7..e9adc017db 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2029,6 +2029,18 @@ INSN_LASX(xvmsknz_b, vv)
 
 INSN_LASX(xvldi, v_i)
 
+INSN_LASX(xvand_v,   vvv)
+INSN_LASX(xvor_v,vvv)
+INSN_LASX(xvxor_v,   vvv)
+INSN_LASX(xvnor_v,   vvv)
+INSN_LASX(xvandn_v,  vvv)
+INSN_LASX(xvorn_v,   vvv)
+
+INSN_LASX(xvandi_b,  vv_i)
+INSN_LASX(xvori_b,   vv_i)
+INSN_LASX(xvxori_b,  vv_i)
+INSN_LASX(xvnori_b,  vv_i)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index f749800880..1a602ee548 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -941,13 +941,13 @@ void HELPER(vmsknz_b)(void *vd, void *vj, uint32_t desc)
 }
 }
 
-void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
+void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t desc)
 {
 int i;
 VReg *Vd = (VReg *)vd;
 VReg *Vj = (VReg *)vj;
 
-for (i = 0; i < LSX_LEN/8; i++) {
+for (i = 0; i < simd_oprsz(desc); i++) {
 Vd->B(i) = ~(Vj->B(i) | (uint8_t)imm);
 }
 }
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc b/target/loongarch/insn_trans/trans_vec.c.inc
index 7ebe971ad9..5b14d0f894 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -3573,20 +3573,11 @@ static bool gen_vldi(DisasContext *ctx, arg_vldi *a, uint32_t oprsz)
 TRANS(vldi, LSX, gen_vldi, 16)
 TRANS(xvldi, LASX, gen_vldi, 32)
 
-TRANS(vand_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_and)
-TRANS(vor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_or)
-TRANS(vxor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_xor)
-TRANS(vnor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_nor)
-
-static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
+static bool gen_vandn_v(DisasContext *ctx, arg_vvv *a, uint32_t oprsz)
 {
 uint32_t vd_ofs, vj_ofs, vk_ofs;
 
-if (!avail_LSX(ctx)) {
-return false;
-}
-
-if (!check_vec(ctx, 16)) {
+if (!check_vec(ctx, oprsz)) {
 return true;
 }
 
@@ -3594,13 +3585,9 @@ static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
 vj_ofs = vec_full_offset(a->vj);
 vk_ofs = vec_full_offset(a->vk);
 
-tcg_gen_gvec_andc(MO_64, vd_ofs, vk_ofs, vj_ofs, 16, ctx->vl/8);
+tcg_gen_gvec_andc(MO_64, vd_ofs, vk_ofs, vj_ofs, oprsz, ctx->vl / 8);
 return true;
 }
-TRANS(vorn_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_orc)
-TRANS(vandi_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_andi)
-TRANS(vori_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_ori)
-TRANS(vxori_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_xori)
 
 static void gen_vnori(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
 {
@@ -3633,7 +3620,26 @@ static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
 tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op);
 }
 
+TRANS(vand_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_and)
+TRANS(vor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_or)
+TRANS(vxor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_xor)
+TRANS(vnor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_nor)
+TRANS(vandn_v, LSX, gen_vandn_v, 16)
+TRANS(vorn_v, LSX, gvec_vvv, MO
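
Worth noting from the hunk above: gen_vandn_v passes vk before vj to tcg_gen_gvec_andc, so the generated operation is vd = vk & ~vj, while vorn_v maps straight onto gvec orc, vd = vj | ~vk. A scalar sketch of those 64-bit lane semantics as the translation implies them (names are illustrative only):

    #include <stdint.h>
    #include <stdio.h>

    /* vandn: the first source operand is the one that gets complemented. */
    static uint64_t vandn_lane(uint64_t vj, uint64_t vk) { return vk & ~vj; }
    /* vorn: the second source operand is complemented. */
    static uint64_t vorn_lane(uint64_t vj, uint64_t vk)  { return vj | ~vk; }

    int main(void)
    {
        uint64_t vj = 0x00ff00ff00ff00ffULL;
        uint64_t vk = 0x0f0f0f0f0f0f0f0fULL;

        printf("vandn: %016llx\n", (unsigned long long)vandn_lane(vj, vk));
        printf("vorn:  %016llx\n", (unsigned long long)vorn_lane(vj, vk));
        return 0;
    }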

[PATCH v6 49/57] target/loongarch: Implement xvfcmp

2023-09-13 Thread Song Gao
This patch includes:
- XVFCMP.cond.{S/D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/helper.h   |  8 +-
 target/loongarch/insns.decode   |  3 +
 target/loongarch/disas.c| 93 +
 target/loongarch/vec_helper.c   |  4 +-
 target/loongarch/insn_trans/trans_vec.c.inc | 31 ---
 5 files changed, 117 insertions(+), 22 deletions(-)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e9c5412267..b54ce68077 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -652,10 +652,10 @@ DEF_HELPER_FLAGS_4(vslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
-DEF_HELPER_5(vfcmp_c_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_c_s, void, env, i32, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_s_s, void, env, i32, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_c_d, void, env, i32, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_s_d, void, env, i32, i32, i32, i32, i32)
 
 DEF_HELPER_FLAGS_4(vbitseli_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 82c26a318b..0d46bd5e5e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1958,6 +1958,9 @@ xvslti_hu    0111 01101000 10001 . . .    @vv_ui5
 xvslti_wu0111 01101000 10010 . . .@vv_ui5
 xvslti_du0111 01101000 10011 . . .@vv_ui5
 
+xvfcmp_cond_s 11001001 . . . .@vvv_fcond
+xvfcmp_cond_d 11001010 . . . .@vvv_fcond
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 48e0b559f2..4ab51b712e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2385,6 +2385,99 @@ INSN_LASX(xvslti_hu, vv_i)
 INSN_LASX(xvslti_wu, vv_i)
 INSN_LASX(xvslti_du, vv_i)
 
+#define output_xvfcmp(C, PREFIX, SUFFIX)\
+{   \
+(C)->info->fprintf_func((C)->info->stream, "%08x  %s%s\tx%d, x%d, x%d", \
+(C)->insn, PREFIX, SUFFIX, a->vd,   \
+a->vj, a->vk);  \
+}
+static bool output_xxx_fcond(DisasContext *ctx, arg_vvv_fcond * a,
+ const char *suffix)
+{
+bool ret = true;
+switch (a->fcond) {
+case 0x0:
+output_xvfcmp(ctx, "xvfcmp_caf_", suffix);
+break;
+case 0x1:
+output_xvfcmp(ctx, "xvfcmp_saf_", suffix);
+break;
+case 0x2:
+output_xvfcmp(ctx, "xvfcmp_clt_", suffix);
+break;
+case 0x3:
+output_xvfcmp(ctx, "xvfcmp_slt_", suffix);
+break;
+case 0x4:
+output_xvfcmp(ctx, "xvfcmp_ceq_", suffix);
+break;
+case 0x5:
+output_xvfcmp(ctx, "xvfcmp_seq_", suffix);
+break;
+case 0x6:
+output_xvfcmp(ctx, "xvfcmp_cle_", suffix);
+break;
+case 0x7:
+output_xvfcmp(ctx, "xvfcmp_sle_", suffix);
+break;
+case 0x8:
+output_xvfcmp(ctx, "xvfcmp_cun_", suffix);
+break;
+case 0x9:
+output_xvfcmp(ctx, "xvfcmp_sun_", suffix);
+break;
+case 0xA:
+output_xvfcmp(ctx, "xvfcmp_cult_", suffix);
+break;
+case 0xB:
+output_xvfcmp(ctx, "xvfcmp_sult_", suffix);
+break;
+case 0xC:
+output_xvfcmp(ctx, "xvfcmp_cueq_", suffix);
+break;
+case 0xD:
+output_xvfcmp(ctx, "xvfcmp_sueq_", suffix);
+break;
+case 0xE:
+output_xvfcmp(ctx, "xvfcmp_cule_", suffix);
+break;
+case 0xF:
+output_xvfcmp(ctx, "xvfcmp_sule_", suffix);
+break;
+case 0x10:
+output_xvfcmp(ctx, "xvfcmp_cne_", suffix);
+break;
+case 0x11:
+output_xvfcmp(ctx, "xvfcmp_sne_", suffix);
+break;
+case 0x14:
+output_xvfcmp(ctx, "xvfcmp_cor_", suffix);
+break;
+case 0x15:
+output_xvfcmp(ctx, "xvfcmp_sor_", suffix);
+break;
+case 0x18:
+output_xvfcmp(ctx, "xvfcmp_cune_", suffix);
+break;
+case 0x19:
+output_xvfcmp(ctx, "xvfcmp_sune_", suffix);
+break;
+default:
+ret = false;
+}
+return ret;
+}
+
+#define LASX_FCMP_INSN(suffix)\
+static bool trans_xvfcmp_cond

[PATCH v6 17/57] target/loongarch: Implement xvneg

2023-09-13 Thread Song Gao
This patch includes:
- XVNEG.{B/H/W/D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  5 +
 target/loongarch/disas.c| 10 ++
 target/loongarch/insn_trans/trans_vec.c.inc | 19 +++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c48dca70b8..759172628f 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1320,6 +1320,11 @@ xvsubi_hu    0111 01101000 11001 . . .    @vv_ui5
 xvsubi_wu0111 01101000 11010 . . .@vv_ui5
 xvsubi_du0111 01101000 11011 . . .@vv_ui5
 
+xvneg_b  0111 01101001 11000 01100 . .@vv
+xvneg_h  0111 01101001 11000 01101 . .@vv
+xvneg_w  0111 01101001 11000 01110 . .@vv
+xvneg_d  0111 01101001 11000 0 . .@vv
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 20df9c7c99..a7455840a0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1718,6 +1718,11 @@ static void output_vv_i_x(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
 output(ctx, mnemonic, "x%d, x%d, 0x%x", a->vd, a->vj, a->imm);
 }
 
+static void output_vv_x(DisasContext *ctx, arg_vv *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "x%d, x%d", a->vd, a->vj);
+}
+
 INSN_LASX(xvadd_b,   vvv)
 INSN_LASX(xvadd_h,   vvv)
 INSN_LASX(xvadd_w,   vvv)
@@ -1738,6 +1743,11 @@ INSN_LASX(xvsubi_hu, vv_i)
 INSN_LASX(xvsubi_wu, vv_i)
 INSN_LASX(xvsubi_du, vv_i)
 
+INSN_LASX(xvneg_b,   vv)
+INSN_LASX(xvneg_h,   vv)
+INSN_LASX(xvneg_w,   vv)
+INSN_LASX(xvneg_d,   vv)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc b/target/loongarch/insn_trans/trans_vec.c.inc
index 689db12d71..f837d695d1 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -223,6 +223,10 @@ static bool gvec_vv_vl(DisasContext *ctx, arg_vv *a,
 uint32_t vd_ofs = vec_full_offset(a->vd);
 uint32_t vj_ofs = vec_full_offset(a->vj);
 
+if (!check_vec(ctx, oprsz)) {
+return true;
+}
+
 func(mop, vd_ofs, vj_ofs, oprsz, ctx->vl / 8);
 return true;
 }
@@ -232,13 +236,16 @@ static bool gvec_vv(DisasContext *ctx, arg_vv *a, MemOp mop,
 void (*func)(unsigned, uint32_t, uint32_t,
  uint32_t, uint32_t))
 {
-if (!check_vec(ctx, 16)) {
-return true;
-}
-
 return gvec_vv_vl(ctx, a, 16, mop, func);
 }
 
+static bool gvec_xx(DisasContext *ctx, arg_vv *a, MemOp mop,
+void (*func)(unsigned, uint32_t, uint32_t,
+ uint32_t, uint32_t))
+{
+return gvec_vv_vl(ctx, a, 32, mop, func);
+}
+
 static bool gvec_vv_i_vl(DisasContext *ctx, arg_vv_i *a,
  uint32_t oprsz, MemOp mop,
  void (*func)(unsigned, uint32_t, uint32_t,
@@ -383,6 +390,10 @@ TRANS(vneg_b, LSX, gvec_vv, MO_8, tcg_gen_gvec_neg)
 TRANS(vneg_h, LSX, gvec_vv, MO_16, tcg_gen_gvec_neg)
 TRANS(vneg_w, LSX, gvec_vv, MO_32, tcg_gen_gvec_neg)
 TRANS(vneg_d, LSX, gvec_vv, MO_64, tcg_gen_gvec_neg)
+TRANS(xvneg_b, LASX, gvec_xx, MO_8, tcg_gen_gvec_neg)
+TRANS(xvneg_h, LASX, gvec_xx, MO_16, tcg_gen_gvec_neg)
+TRANS(xvneg_w, LASX, gvec_xx, MO_32, tcg_gen_gvec_neg)
+TRANS(xvneg_d, LASX, gvec_xx, MO_64, tcg_gen_gvec_neg)
 
 TRANS(vsadd_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_ssadd)
 TRANS(vsadd_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_ssadd)
-- 
2.39.1
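
The pattern used throughout this patch is a single *_vl expander that takes the operation size, with thin LSX/LASX wrappers passing 16 or 32 bytes, and the check_vec() call now living in the shared expander. A minimal sketch of that shape, using per-byte negation and placeholder names (not the QEMU API):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdbool.h>

    /* Placeholder for check_vec(): the real code raises an exception and
     * returns false when the SXE/ASXE enable bits are clear. */
    static bool vec_enabled(int oprsz)
    {
        (void)oprsz;
        return true;
    }

    static bool neg_vl(int8_t *d, const int8_t *j, int oprsz)
    {
        if (!vec_enabled(oprsz)) {
            return true;                  /* exception already generated */
        }
        for (int i = 0; i < oprsz; i++) {
            d[i] = -j[i];
        }
        return true;
    }

    /* The wrappers only differ in the operand size they pass down. */
    static bool lsx_neg(int8_t *d, const int8_t *j)  { return neg_vl(d, j, 16); }
    static bool lasx_neg(int8_t *d, const int8_t *j) { return neg_vl(d, j, 32); }

    int main(void)
    {
        int8_t src[32], dst[32];

        for (int i = 0; i < 32; i++) {
            src[i] = (int8_t)i;
        }
        lsx_neg(dst, src);
        lasx_neg(dst, src);
        printf("dst[1]=%d dst[31]=%d\n", dst[1], dst[31]);
        return 0;
    }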




[PATCH v6 38/57] target/loongarch: Implement xvsrln xvsran

2023-09-13 Thread Song Gao
This patch includes:
- XVSRLN.{B.H/H.W/W.D};
- XVSRAN.{B.H/H.W/W.D};
- XVSRLNI.{B.H/H.W/W.D/D.Q};
- XVSRANI.{B.H/H.W/W.D/D.Q}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  16 ++
 target/loongarch/disas.c|  16 ++
 target/loongarch/vec_helper.c   | 166 +++-
 target/loongarch/insn_trans/trans_vec.c.inc |  14 ++
 4 files changed, 137 insertions(+), 75 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ca0951e1cc..204dcfa075 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1678,6 +1678,22 @@ xvsrari_h    0111 01101010 1 1  . .    @vv_ui4
 xvsrari_w0111 01101010 10001 . . .@vv_ui5
 xvsrari_d0111 01101010 1001 .. . .@vv_ui6
 
+xvsrln_b_h   0111 0100 01001 . . .@vvv
+xvsrln_h_w   0111 0100 01010 . . .@vvv
+xvsrln_w_d   0111 0100 01011 . . .@vvv
+xvsran_b_h   0111 0100 01101 . . .@vvv
+xvsran_h_w   0111 0100 01110 . . .@vvv
+xvsran_w_d   0111 0100 0 . . .@vvv
+
+xvsrlni_b_h  0111 01110100 0 1  . .   @vv_ui4
+xvsrlni_h_w  0111 01110100 1 . . .@vv_ui5
+xvsrlni_w_d  0111 01110100 0001 .. . .@vv_ui6
+xvsrlni_d_q  0111 01110100 001 ... . .@vv_ui7
+xvsrani_b_h  0111 01110101 1 1  . .   @vv_ui4
+xvsrani_h_w  0111 01110101 10001 . . .@vv_ui5
+xvsrani_w_d  0111 01110101 1001 .. . .@vv_ui6
+xvsrani_d_q  0111 01110101 101 ... . .@vv_ui7
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index bc5eb82b49..28e5e16eb2 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2104,6 +2104,22 @@ INSN_LASX(xvsrari_h, vv_i)
 INSN_LASX(xvsrari_w, vv_i)
 INSN_LASX(xvsrari_d, vv_i)
 
+INSN_LASX(xvsrln_b_h,vvv)
+INSN_LASX(xvsrln_h_w,vvv)
+INSN_LASX(xvsrln_w_d,vvv)
+INSN_LASX(xvsran_b_h,vvv)
+INSN_LASX(xvsran_h_w,vvv)
+INSN_LASX(xvsran_w_d,vvv)
+
+INSN_LASX(xvsrlni_b_h,   vv_i)
+INSN_LASX(xvsrlni_h_w,   vv_i)
+INSN_LASX(xvsrlni_w_d,   vv_i)
+INSN_LASX(xvsrlni_d_q,   vv_i)
+INSN_LASX(xvsrani_b_h,   vv_i)
+INSN_LASX(xvsrani_h_w,   vv_i)
+INSN_LASX(xvsrani_w_d,   vv_i)
+INSN_LASX(xvsrani_d_q,   vv_i)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index bb30d24b89..8c405ce32b 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -1109,105 +1109,121 @@ VSRARI(vsrari_d, 64, D)
 
 #define R_SHIFT(a, b) (a >> b)
 
-#define VSRLN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc)  \
-{   \
-int i;  \
-VReg *Vd = (VReg *)vd;  \
-VReg *Vj = (VReg *)vj;  \
-VReg *Vk = (VReg *)vk;  \
-\
-for (i = 0; i < LSX_LEN/BIT; i++) { \
-Vd->E1(i) = R_SHIFT((T)Vj->E2(i),((T)Vk->E2(i)) % BIT); \
-}   \
-Vd->D(1) = 0;   \
+#define VSRLN(NAME, BIT, E1, E2)  \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc)\
+{ \
+int i, j, ofs;\
+VReg *Vd = (VReg *)vd;\
+VReg *Vj = (VReg *)vj;\
+VReg *Vk = (VReg *)vk;\
+int oprsz = simd_oprsz(desc); \
+  \
+ofs = LSX_LEN / BIT;  \
+for (i = 0; i < oprsz / 16; i++) {\
+for (j = 0; j < ofs; j++) {   \
+Vd->E1(j + ofs * 2 * i) = R_SHIFT(Vj->E2(j + ofs * i),\
+  
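
A hedged per-lane reading of what the rewritten VSRLN macro computes for the b.h variant: each 16-bit source element is shifted right logically by the matching shift count modulo 16, truncated to a byte, and packed into the low half of the destination lane, with the upper half of the lane cleared; LASX repeats this for the second 128-bit lane. A standalone sketch for one lane (all names are illustrative):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static void srln_b_h_lane(uint8_t d[16], const uint16_t vj[8], const uint16_t vk[8])
    {
        memset(d, 0, 16);                     /* upper half of the lane is zeroed */
        for (int i = 0; i < 8; i++) {
            d[i] = (uint8_t)(vj[i] >> (vk[i] % 16));
        }
    }

    int main(void)
    {
        uint16_t vj[8] = { 0x1234, 0xff00, 0x8000, 0x00ff, 1, 2, 3, 4 };
        uint16_t vk[8] = { 4, 8, 15, 0, 0, 1, 16, 2 };
        uint8_t vd[16];

        srln_b_h_lane(vd, vj, vk);
        printf("vd[0]=%02x vd[1]=%02x vd[2]=%02x\n", vd[0], vd[1], vd[2]);
        return 0;
    }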

[PATCH v6 08/57] target/loongarch: Use gen_helper_gvec_2 for 2OP vector instructions

2023-09-13 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/helper.h   |  58 -
 target/loongarch/vec_helper.c   | 124 ++--
 target/loongarch/insn_trans/trans_vec.c.inc |  16 ++-
 3 files changed, 101 insertions(+), 97 deletions(-)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 0752cc7212..523591035d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -331,37 +331,37 @@ DEF_HELPER_FLAGS_4(vsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
-DEF_HELPER_3(vexth_h_b, void, env, i32, i32)
-DEF_HELPER_3(vexth_w_h, void, env, i32, i32)
-DEF_HELPER_3(vexth_d_w, void, env, i32, i32)
-DEF_HELPER_3(vexth_q_d, void, env, i32, i32)
-DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
-DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
-DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
-DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(vexth_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(vsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
-DEF_HELPER_3(vmskltz_b, void, env, i32, i32)
-DEF_HELPER_3(vmskltz_h, void, env, i32, i32)
-DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
-DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
-DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
-DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
+DEF_HELPER_FLAGS_3(vmskltz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskltz_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskltz_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskltz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskgez_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmsknz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(vnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
 DEF_HELPER_4(vsllwil_h_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_w_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_d_w, void, env, i32, i32, i32)
-DEF_HELPER_3(vextl_q_d, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(vextl_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
-DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(vextl_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(vsrlr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsrlr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
@@ -473,19 +473,19 @@ DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
 
-DEF_HELPER_3(vclo_b, void, env, i32, i32)
-DEF_HELPER_3(vclo_h, void, env, i32, i32)
-DEF_HELPER_3(vclo_w, void, env, i32, i32)
-DEF_HELPER_3(vclo_d, void, env, i32, i32)
-DEF_HELPER_3(vclz_b, void, env, i32, i32)
-DEF_HELPER_3(vclz_h, void, env, i32, i32)
-DEF_HELPER_3(vclz_w, void, env, i32, i32)
-DEF_HELPER_3(vclz_d, void, env, i32, i32)
-
-DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
-DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
-DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
-DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(vclo_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclo_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclo_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclo_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(vpcnt_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vpcnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vpcnt_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vpcnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(vbitclr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32

[PATCH v6 43/57] target/loongarch: Implement xvpcnt

2023-09-13 Thread Song Gao
This patch includes:
- XVPCNT.{B/H/W/D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   | 5 +
 target/loongarch/disas.c| 5 +
 target/loongarch/vec_helper.c   | 3 ++-
 target/loongarch/insn_trans/trans_vec.c.inc | 4 
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3175532045..d683c6a6ab 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1779,6 +1779,11 @@ xvclz_h  0111 01101001 11000 00101 . .    @vv
 xvclz_w  0111 01101001 11000 00110 . .@vv
 xvclz_d  0111 01101001 11000 00111 . .@vv
 
+xvpcnt_b 0111 01101001 11000 01000 . .@vv
+xvpcnt_h 0111 01101001 11000 01001 . .@vv
+xvpcnt_w 0111 01101001 11000 01010 . .@vv
+xvpcnt_d 0111 01101001 11000 01011 . .@vv
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index bbf530b349..ff7f7a792a 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2205,6 +2205,11 @@ INSN_LASX(xvclz_h,   vv)
 INSN_LASX(xvclz_w,   vv)
 INSN_LASX(xvclz_d,   vv)
 
+INSN_LASX(xvpcnt_b,  vv)
+INSN_LASX(xvpcnt_h,  vv)
+INSN_LASX(xvpcnt_w,  vv)
+INSN_LASX(xvpcnt_d,  vv)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 363309b6ea..e529b58419 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2296,8 +2296,9 @@ void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
 int i;   \
 VReg *Vd = (VReg *)vd;   \
 VReg *Vj = (VReg *)vj;   \
+int oprsz = simd_oprsz(desc);\
  \
-for (i = 0; i < LSX_LEN/BIT; i++)\
+for (i = 0; i < oprsz / (BIT / 8); i++)  \
 {\
 Vd->E(i) = FN(Vj->E(i)); \
 }\
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc b/target/loongarch/insn_trans/trans_vec.c.inc
index 85d0d10355..94afdf6d70 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -3962,6 +3962,10 @@ TRANS(vpcnt_b, LSX, gen_vv, gen_helper_vpcnt_b)
 TRANS(vpcnt_h, LSX, gen_vv, gen_helper_vpcnt_h)
 TRANS(vpcnt_w, LSX, gen_vv, gen_helper_vpcnt_w)
 TRANS(vpcnt_d, LSX, gen_vv, gen_helper_vpcnt_d)
+TRANS(xvpcnt_b, LASX, gen_xx, gen_helper_vpcnt_b)
+TRANS(xvpcnt_h, LASX, gen_xx, gen_helper_vpcnt_h)
+TRANS(xvpcnt_w, LASX, gen_xx, gen_helper_vpcnt_w)
+TRANS(xvpcnt_d, LASX, gen_xx, gen_helper_vpcnt_d)
 
 static void do_vbit(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
 void (*func)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec))
-- 
2.39.1
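
The helper change above is the generic per-element pattern: iterate oprsz / (BIT / 8) elements (16 bytes for LSX, 32 for LASX) and apply the operation to each one independently. A standalone sketch of that loop for the byte variant, with a portable population count standing in for the real per-element function (names are placeholders):

    #include <stdint.h>
    #include <stdio.h>

    static int popcount8(uint8_t v)
    {
        int n = 0;
        while (v) {
            v &= v - 1;                   /* clear the lowest set bit */
            n++;
        }
        return n;
    }

    static void vpcnt_b_sketch(uint8_t *d, const uint8_t *j, int oprsz)
    {
        for (int i = 0; i < oprsz; i++) {
            d[i] = (uint8_t)popcount8(j[i]);
        }
    }

    int main(void)
    {
        uint8_t src[32], dst[32];

        for (int i = 0; i < 32; i++) {
            src[i] = (uint8_t)i;
        }
        vpcnt_b_sketch(dst, src, 32);     /* 32 == simd_oprsz() for LASX */
        printf("dst[7]=%u dst[31]=%u\n", dst[7], dst[31]);   /* 3 and 5 */
        return 0;
    }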




[PATCH v6 16/57] target/loongarch: Implement xvaddi/xvsubi

2023-09-13 Thread Song Gao
This patch includes:
- XVADDI.{B/H/W/D}U;
- XVSUBI.{B/H/W/D}U.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  9 ++
 target/loongarch/disas.c| 14 
 target/loongarch/insn_trans/trans_vec.c.inc | 36 -
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 04bd238995..c48dca70b8 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1311,6 +1311,15 @@ xvsub_w  0111 0100 11010 . . .    @vvv
 xvsub_d  0111 0100 11011 . . .@vvv
 xvsub_q  0111 01010010 11011 . . .@vvv
 
+xvaddi_bu0111 01101000 10100 . . .@vv_ui5
+xvaddi_hu0111 01101000 10101 . . .@vv_ui5
+xvaddi_wu0111 01101000 10110 . . .@vv_ui5
+xvaddi_du0111 01101000 10111 . . .@vv_ui5
+xvsubi_bu0111 01101000 11000 . . .@vv_ui5
+xvsubi_hu0111 01101000 11001 . . .@vv_ui5
+xvsubi_wu0111 01101000 11010 . . .@vv_ui5
+xvsubi_du0111 01101000 11011 . . .@vv_ui5
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c47f455ed0..20df9c7c99 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1713,6 +1713,11 @@ static void output_vr_x(DisasContext *ctx, arg_vr *a, const char *mnemonic)
 output(ctx, mnemonic, "x%d, r%d", a->vd, a->rj);
 }
 
+static void output_vv_i_x(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "x%d, x%d, 0x%x", a->vd, a->vj, a->imm);
+}
+
 INSN_LASX(xvadd_b,   vvv)
 INSN_LASX(xvadd_h,   vvv)
 INSN_LASX(xvadd_w,   vvv)
@@ -1724,6 +1729,15 @@ INSN_LASX(xvsub_w,   vvv)
 INSN_LASX(xvsub_d,   vvv)
 INSN_LASX(xvsub_q,   vvv)
 
+INSN_LASX(xvaddi_bu, vv_i)
+INSN_LASX(xvaddi_hu, vv_i)
+INSN_LASX(xvaddi_wu, vv_i)
+INSN_LASX(xvaddi_du, vv_i)
+INSN_LASX(xvsubi_bu, vv_i)
+INSN_LASX(xvsubi_hu, vv_i)
+INSN_LASX(xvsubi_wu, vv_i)
+INSN_LASX(xvsubi_du, vv_i)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc b/target/loongarch/insn_trans/trans_vec.c.inc
index 5001042870..689db12d71 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -247,6 +247,10 @@ static bool gvec_vv_i_vl(DisasContext *ctx, arg_vv_i *a,
 uint32_t vd_ofs = vec_full_offset(a->vd);
 uint32_t vj_ofs = vec_full_offset(a->vj);
 
+if (!check_vec(ctx, oprsz)) {
+return true;
+}
+
 func(mop, vd_ofs, vj_ofs, a->imm, oprsz, ctx->vl / 8);
 return true;
 }
@@ -255,32 +259,40 @@ static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
   void (*func)(unsigned, uint32_t, uint32_t,
int64_t, uint32_t, uint32_t))
 {
-if (!check_vec(ctx, 16)) {
-return true;
-}
-
 return gvec_vv_i_vl(ctx, a, 16, mop, func);
 }
 
+static bool gvec_xx_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
+  void (*func)(unsigned, uint32_t, uint32_t,
+   int64_t, uint32_t, uint32_t))
+{
+return gvec_vv_i_vl(ctx, a, 32, mop, func);
+}
+
 static bool gvec_subi_vl(DisasContext *ctx, arg_vv_i *a,
  uint32_t oprsz, MemOp mop)
 {
 uint32_t vd_ofs = vec_full_offset(a->vd);
 uint32_t vj_ofs = vec_full_offset(a->vj);
 
+if (!check_vec(ctx, oprsz)) {
+return true;
+}
+
 tcg_gen_gvec_addi(mop, vd_ofs, vj_ofs, -a->imm, oprsz, ctx->vl / 8);
 return true;
 }
 
 static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
 {
-if (!check_vec(ctx, 16)) {
-return true;
-}
-
 return gvec_subi_vl(ctx, a, 16, mop);
 }
 
+static bool gvec_xsubi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
+{
+return gvec_subi_vl(ctx, a, 32, mop);
+}
+
 TRANS(vadd_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_add)
 TRANS(vadd_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_add)
 TRANS(vadd_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_add)
@@ -358,6 +370,14 @@ TRANS(vsubi_bu, LSX, gvec_subi, MO_8)
 TRANS(vsubi_hu, LSX, gvec_subi, MO_16)
 TRANS(vsubi_wu, LSX, gvec_subi, MO_32)
 TRANS(vsubi_du, LSX, gvec_subi, MO_64)
+TRANS(xvaddi_bu, LASX, gvec_xx_i, MO_8, tcg_gen_gvec_addi)
+TRANS(xvaddi_hu, LASX, gvec_xx_i, MO_16, tcg_gen_gvec_addi)
+TRANS(xvaddi_wu, LASX, gvec_xx_i, MO_32, tcg_gen_gvec_addi)
+TRANS(xvaddi_du, LASX, gvec_xx_i, MO_64, tcg_gen_g
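
The trick visible in gvec_subi_vl above is that vsubi/xvsubi reuse the gvec "addi" expander: subtracting the unsigned 5-bit immediate is the same as adding its negation, element by element. A scalar sketch of that equivalence (sizes and names here are illustrative only):

    #include <stdint.h>
    #include <stdio.h>

    static void subi_via_addi(int32_t *d, const int32_t *j, int32_t imm, int nelem)
    {
        for (int i = 0; i < nelem; i++) {
            d[i] = j[i] + (-imm);         /* mirrors tcg_gen_gvec_addi(..., -a->imm, ...) */
        }
    }

    int main(void)
    {
        int32_t src[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };   /* one 256-bit register of words */
        int32_t dst[8];

        subi_via_addi(dst, src, 3, 8);
        printf("dst[0]=%d dst[7]=%d\n", dst[0], dst[7]);    /* -3 and 4 */
        return 0;
    }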

[PATCH v6 57/57] target/loongarch: CPUCFG support LASX

2023-09-13 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index a1d3f680d8..fc7f70fbe5 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -393,6 +393,7 @@ static void loongarch_la464_initfn(Object *obj)
 data = FIELD_DP32(data, CPUCFG2, FP_DP, 1);
 data = FIELD_DP32(data, CPUCFG2, FP_VER, 1);
 data = FIELD_DP32(data, CPUCFG2, LSX, 1),
+data = FIELD_DP32(data, CPUCFG2, LASX, 1),
 data = FIELD_DP32(data, CPUCFG2, LLFTP, 1);
 data = FIELD_DP32(data, CPUCFG2, LLFTP_VER, 1);
 data = FIELD_DP32(data, CPUCFG2, LSPW, 1);
-- 
2.39.1




[PATCH v6 26/57] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od}

2023-09-13 Thread Song Gao
This patch includes:
- XVMADD.{B/H/W/D};
- XVMSUB.{B/H/W/D};
- XVMADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVMADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/insns.decode   |  34 ++
 target/loongarch/disas.c|  34 ++
 target/loongarch/vec_helper.c   | 112 +-
 target/loongarch/insn_trans/trans_vec.c.inc | 121 ++--
 4 files changed, 212 insertions(+), 89 deletions(-)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0f9ebe641f..d6fb51ae64 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1511,6 +1511,40 @@ xvmulwod_w_hu_h  0111 01001010 00101 . . .    @vvv
 xvmulwod_d_wu_w  0111 01001010 00110 . . .@vvv
 xvmulwod_q_du_d  0111 01001010 00111 . . .@vvv
 
+xvmadd_b 0111 01001010 1 . . .@vvv
+xvmadd_h 0111 01001010 10001 . . .@vvv
+xvmadd_w 0111 01001010 10010 . . .@vvv
+xvmadd_d 0111 01001010 10011 . . .@vvv
+xvmsub_b 0111 01001010 10100 . . .@vvv
+xvmsub_h 0111 01001010 10101 . . .@vvv
+xvmsub_w 0111 01001010 10110 . . .@vvv
+xvmsub_d 0111 01001010 10111 . . .@vvv
+
+xvmaddwev_h_b0111 01001010 11000 . . .@vvv
+xvmaddwev_w_h0111 01001010 11001 . . .@vvv
+xvmaddwev_d_w0111 01001010 11010 . . .@vvv
+xvmaddwev_q_d0111 01001010 11011 . . .@vvv
+xvmaddwod_h_b0111 01001010 11100 . . .@vvv
+xvmaddwod_w_h0111 01001010 11101 . . .@vvv
+xvmaddwod_d_w0111 01001010 0 . . .@vvv
+xvmaddwod_q_d0111 01001010 1 . . .@vvv
+xvmaddwev_h_bu   0111 01001011 01000 . . .@vvv
+xvmaddwev_w_hu   0111 01001011 01001 . . .@vvv
+xvmaddwev_d_wu   0111 01001011 01010 . . .@vvv
+xvmaddwev_q_du   0111 01001011 01011 . . .@vvv
+xvmaddwod_h_bu   0111 01001011 01100 . . .@vvv
+xvmaddwod_w_hu   0111 01001011 01101 . . .@vvv
+xvmaddwod_d_wu   0111 01001011 01110 . . .@vvv
+xvmaddwod_q_du   0111 01001011 0 . . .@vvv
+xvmaddwev_h_bu_b 0111 01001011 11000 . . .@vvv
+xvmaddwev_w_hu_h 0111 01001011 11001 . . .@vvv
+xvmaddwev_d_wu_w 0111 01001011 11010 . . .@vvv
+xvmaddwev_q_du_d 0111 01001011 11011 . . .@vvv
+xvmaddwod_h_bu_b 0111 01001011 11100 . . .@vvv
+xvmaddwod_w_hu_h 0111 01001011 11101 . . .@vvv
+xvmaddwod_d_wu_w 0111 01001011 0 . . .@vvv
+xvmaddwod_q_du_d 0111 01001011 1 . . .@vvv
+
 xvreplgr2vr_b0111 01101001 0 0 . .@vr
 xvreplgr2vr_h0111 01101001 0 1 . .@vr
 xvreplgr2vr_w0111 01101001 0 00010 . .@vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f839373a7a..e4369fd08b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1928,6 +1928,40 @@ INSN_LASX(xvmulwod_w_hu_h,   vvv)
 INSN_LASX(xvmulwod_d_wu_w,   vvv)
 INSN_LASX(xvmulwod_q_du_d,   vvv)
 
+INSN_LASX(xvmadd_b,  vvv)
+INSN_LASX(xvmadd_h,  vvv)
+INSN_LASX(xvmadd_w,  vvv)
+INSN_LASX(xvmadd_d,  vvv)
+INSN_LASX(xvmsub_b,  vvv)
+INSN_LASX(xvmsub_h,  vvv)
+INSN_LASX(xvmsub_w,  vvv)
+INSN_LASX(xvmsub_d,  vvv)
+
+INSN_LASX(xvmaddwev_h_b, vvv)
+INSN_LASX(xvmaddwev_w_h, vvv)
+INSN_LASX(xvmaddwev_d_w, vvv)
+INSN_LASX(xvmaddwev_q_d, vvv)
+INSN_LASX(xvmaddwod_h_b, vvv)
+INSN_LASX(xvmaddwod_w_h, vvv)
+INSN_LASX(xvmaddwod_d_w, vvv)
+INSN_LASX(xvmaddwod_q_d, vvv)
+INSN_LASX(xvmaddwev_h_bu,vvv)
+INSN_LASX(xvmaddwev_w_hu,vvv)
+INSN_LASX(xvmaddwev_d_wu,vvv)
+INSN_LASX(xvmaddwev_q_du,vvv)
+INSN_LASX(xvmaddwod_h_bu,vvv)
+INSN_LASX(xvmaddwod_w_hu,vvv)
+INSN_LASX(xvmaddwod_d_wu,vvv)
+INSN_LASX(xvmaddwod_q_du,vvv)
+INSN_LASX(xvmaddwev_h_bu_b,  vvv)
+INSN_LASX(xvmaddwev_w_hu_h,  vvv)
+INSN_LASX(xvmaddwev_d_wu_w,  vvv)
+INSN_LASX(xvmaddwev_q_du_d,  vvv)
+INSN_LASX(xvmaddwod_h_bu_b,  vvv)
+INSN_LASX(xvmaddwod_w_hu_h,  vvv)
+INSN_LASX(xvmaddwod_d_wu_w,  vvv)
+INSN_LASX(xvmaddwod_q_du_d,  vvv)
+
 INSN_LASX(xvreplgr2vr_b, vr)
 INSN_LASX(xvreplgr2vr_h, vr)
 INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index e152998094..a800554159 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -529,16 +529,18 @@ DO_ODD_U_S(vmulwod_d_wu_w, 64, D, UD, W, UW, DO_MUL)
 #define DO_MADD(a, b, c)  (a + b * c)
 #define DO_MSUB(a, b
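
For orientation, a hedged scalar sketch of what one widening multiply-accumulate lane is expected to compute for the h.b "ev" variant: even-indexed source bytes are sign-extended, multiplied, and accumulated into the corresponding halfword of the destination (the "od" variant would use the odd-indexed bytes). Treat the exact semantics as an assumption here; array sizes assume one 256-bit register and all names are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    static void maddwev_h_b_sketch(int16_t vd[16], const int8_t vj[32], const int8_t vk[32])
    {
        for (int i = 0; i < 16; i++) {
            /* DO_MADD(a, b, c) == (a + b * c) applied to the widened operands */
            vd[i] += (int16_t)vj[2 * i] * (int16_t)vk[2 * i];
        }
    }

    int main(void)
    {
        int8_t vj[32], vk[32];
        int16_t vd[16] = { 0 };

        for (int i = 0; i < 32; i++) {
            vj[i] = (int8_t)i;
            vk[i] = 2;
        }
        maddwev_h_b_sketch(vd, vj, vk);
        printf("vd[0]=%d vd[15]=%d\n", vd[0], vd[15]);   /* 0 and 60 */
        return 0;
    }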
