Re: [PATCH v5 4/4] x86: re-enable rng seeding via SetupData

2022-12-23 Thread Jason A. Donenfeld
On Sat, Dec 24, 2022 at 04:09:08AM +0100, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> Replying to you from my telephone, and I'm traveling the next two days,
> but I thought I should mention some preliminary results right away from
> doing some termux compiles:
> 
> On Fri, Dec 23, 2022 at 04:14:00PM -0800, Eric Biggers wrote:
> > Hi Jason,
> > 
> > On Wed, Sep 21, 2022 at 11:31:34AM +0200, Jason A. Donenfeld wrote:
> > > This reverts 3824e25db1 ("x86: disable rng seeding via setup_data"), but
> > > for 7.2 rather than 7.1, now that modifying setup_data is safe to do.
> > > 
> > > Cc: Laurent Vivier 
> > > Cc: Michael S. Tsirkin 
> > > Cc: Paolo Bonzini 
> > > Cc: Peter Maydell 
> > > Cc: Philippe Mathieu-Daudé 
> > > Cc: Richard Henderson 
> > > Cc: Ard Biesheuvel 
> > > Acked-by: Gerd Hoffmann 
> > > Signed-off-by: Jason A. Donenfeld 
> > > ---
> > >  hw/i386/microvm.c | 2 +-
> > >  hw/i386/pc_piix.c | 3 ++-
> > >  hw/i386/pc_q35.c  | 3 ++-
> > >  3 files changed, 5 insertions(+), 3 deletions(-)
> > > 
> > 
> > After upgrading to QEMU 7.2, Linux 6.1 no longer boots with some configs.  
> > There
> > is no output at all.  I bisected it to this commit, and I verified that the
> > following change to QEMU's master branch makes the problem go away:
> > 
> > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> > index b48047f50c..42f5b07d2f 100644
> > --- a/hw/i386/pc_piix.c
> > +++ b/hw/i386/pc_piix.c
> > @@ -441,6 +441,7 @@ static void pc_i440fx_8_0_machine_options(MachineClass *m)
> >  pc_i440fx_machine_options(m);
> >  m->alias = "pc";
> >  m->is_default = true;
> > +PC_MACHINE_CLASS(m)->legacy_no_rng_seed = true;
> >  }
> > 
> > I've attached the kernel config I am seeing the problem on.
> > 
> > For some reason, the problem also goes away if I disable CONFIG_KASAN.
> > 
> > Any idea what is causing this?
> 
> - Commenting out the call to parse_setup_data() doesn't fix the issue.
>   So there's no KASAN issue with the actual parser.
> 
> - Using KASAN_OUTLINE rather than INLINE does fix the issue!
> 
> That makes me suspect that it's file size related, and QEMU or the BIOS
> is placing setup data at an overlapping offset by accident, or something
> similar.

I removed the file systems from your config to bring the kernel size
back down, and voila, it works, even with KASAN_INLINE. So perhaps I'm
on the right track here...
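The overlap hypothesis above is just interval arithmetic; here is a minimal sketch of it (all addresses and sizes are hypothetical, not taken from QEMU or the BIOS):

```python
# Illustrative only: a fixed setup_data placement that cleared a small
# kernel image may collide once KASAN_INLINE inflates the image.
def ranges_overlap(a_start, a_len, b_start, b_len):
    """True if [a_start, a_start+a_len) intersects [b_start, b_start+b_len)."""
    return a_start < b_start + b_len and b_start < a_start + a_len

small_kernel = (0x100000, 30 * 1024 * 1024)   # hypothetical 30 MiB image
big_kernel   = (0x100000, 90 * 1024 * 1024)   # hypothetical KASAN_INLINE image
setup_data   = (0x4000000, 4096)              # hypothetical placement at 64 MiB

assert not ranges_overlap(*small_kernel, *setup_data)
assert ranges_overlap(*big_kernel, *setup_data)
```

If that is what happens, the collision would only appear past a size threshold, which would match the observation that dropping file systems from the config makes the problem vanish.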


> 
> I'll investigate this hypothesis when I'm back at a real computer.
> 
> Jason
> 
> > 
> > - Eric
> 

Re: [PATCH 01/15] tests/avocado: add RISC-V opensbi boot test

2022-12-23 Thread Bin Meng
Hi,

On Fri, Dec 23, 2022 at 2:25 PM Bin Meng  wrote:
>
> Hi Anup,
>
> On Fri, Dec 23, 2022 at 12:56 AM Anup Patel  wrote:
> >
> > On Thu, Dec 22, 2022 at 6:27 PM Bin Meng  wrote:
> > >
> > > On Thu, Dec 22, 2022 at 6:47 PM Daniel Henrique Barboza
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On 12/22/22 07:24, Bin Meng wrote:
> > > > > On Thu, Dec 22, 2022 at 2:29 AM Daniel Henrique Barboza
> > > > >  wrote:
> > > > >> This test is used to do a quick sanity check to ensure that we're 
> > > > >> able
> > > > >> to run the existing QEMU FW image.
> > > > >>
> > > > >> 'sifive_u', 'spike' and 'virt' riscv64 machines, and 'sifive_u' and
> > > > >> 'virt' 32 bit machines are able to run the default RISCV64_BIOS_BIN |
> > > > >> RISCV32_BIOS_BIN firmware with minimal options.
> > > > >>
> > > > >> Cc: Cleber Rosa 
> > > > >> Cc: Philippe Mathieu-Daudé 
> > > > >> Cc: Wainer dos Santos Moschetta 
> > > > >> Cc: Beraldo Leal 
> > > > >> Signed-off-by: Daniel Henrique Barboza 
> > > > >> ---
> > > > >>   tests/avocado/riscv_opensbi.py | 65 
> > > > >> ++
> > > > >>   1 file changed, 65 insertions(+)
> > > > >>   create mode 100644 tests/avocado/riscv_opensbi.py
> > > > >>
> > > > >> diff --git a/tests/avocado/riscv_opensbi.py b/tests/avocado/riscv_opensbi.py
> > > > >> new file mode 100644
> > > > >> index 00..abc99ced30
> > > > >> --- /dev/null
> > > > >> +++ b/tests/avocado/riscv_opensbi.py
> > > > >> @@ -0,0 +1,65 @@
> > > > >> +# opensbi boot test for RISC-V machines
> > > > >> +#
> > > > >> +# Copyright (c) 2022, Ventana Micro
> > > > >> +#
> > > > >> +# This work is licensed under the terms of the GNU GPL, version 2 or
> > > > >> +# later.  See the COPYING file in the top-level directory.
> > > > >> +
> > > > >> +from avocado_qemu import QemuSystemTest
> > > > >> +from avocado_qemu import wait_for_console_pattern
> > > > >> +
> > > > >> +class RiscvOpensbi(QemuSystemTest):
> > > > >> +"""
> > > > >> +:avocado: tags=accel:tcg
> > > > >> +"""
> > > > >> +timeout = 5
> > > > >> +
> > > > >> +def test_riscv64_virt(self):
> > > > >> +"""
> > > > >> +:avocado: tags=arch:riscv64
> > > > >> +:avocado: tags=machine:virt
> > > > >> +"""
> > > > >> +self.vm.set_console()
> > > > >> +self.vm.launch()
> > > > >> +wait_for_console_pattern(self, 'Platform Name')
> > > > >> +wait_for_console_pattern(self, 'Boot HART MEDELEG')
> > > > >> +
> > > > >> +def test_riscv64_spike(self):
> > > > >> +"""
> > > > >> +:avocado: tags=arch:riscv64
> > > > >> +:avocado: tags=machine:spike
> > > > >> +"""
> > > > >> +self.vm.set_console()
> > > > >> +self.vm.launch()
> > > > >> +wait_for_console_pattern(self, 'Platform Name')
> > > > >> +wait_for_console_pattern(self, 'Boot HART MEDELEG')
> > > > >> +
> > > > >> +def test_riscv64_sifive_u(self):
> > > > >> +"""
> > > > >> +:avocado: tags=arch:riscv64
> > > > >> +:avocado: tags=machine:sifive_u
> > > > >> +"""
> > > > >> +self.vm.set_console()
> > > > >> +self.vm.launch()
> > > > >> +wait_for_console_pattern(self, 'Platform Name')
> > > > >> +wait_for_console_pattern(self, 'Boot HART MEDELEG')
> > > > >> +
> > > > >> +def test_riscv32_virt(self):
> > > > >> +"""
> > > > >> +:avocado: tags=arch:riscv32
> > > > >> +:avocado: tags=machine:virt
> > > > >> +"""
> > > > >> +self.vm.set_console()
> > > > >> +self.vm.launch()
> > > > >> +wait_for_console_pattern(self, 'Platform Name')
> > > > >> +wait_for_console_pattern(self, 'Boot HART MEDELEG')
> > > > > How about testing riscv32_spike too?
> > > >
> > > >
> > > > I didn't manage to make it work. This riscv64 spike command line boots opensbi:
> > > >
> > > >
> > > > $ ./qemu-system-riscv64 -nographic -display none -vga none -machine spike
> > > >
> > > > OpenSBI v1.1
> > > >    ____                    _____ ____ _____
> > > >   / __ \                  / ____|  _ \_   _|
> > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > >         | |
> > > >         |_|
> > > >
> > > > (...)
> > > >
> > > > The same command line doesn't boot riscv32 spike:
> > > >
> > > > ./qemu-system-riscv32 -nographic -display none -vga none -machine spike
> > > > (--- hangs indefinitely ---)
> > > >
> > > > I debugged it a bit and, as far as boot code goes, it goes all the way 
> > > > and loads the
> > > > opensbi 32bit binary.
> > > >
> > > > After that I tried to find any command line example that boots spike 
> > > > with riscv32
> > > > bit and didn't find any.  So I gave up digging it further because I 
> > > > became unsure
> > > > about whether 3
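As an aside on the test code quoted above: the four tests share an identical launch-and-wait body, which could be factored into one helper. A plain-Python sketch of that shape (the avocado_qemu classes are stubbed here so the example is self-contained; the real helper would live on the QemuSystemTest subclass):

```python
class FakeVM:
    """Stub for self.vm; the real object comes from QemuSystemTest."""
    def __init__(self):
        self.console = False
        self.launched = False
    def set_console(self):
        self.console = True
    def launch(self):
        self.launched = True

class RiscvOpensbi:
    timeout = 5
    def __init__(self):
        self.vm = FakeVM()
        self.seen = []
    def wait_for_console_pattern(self, pattern):
        # the real helper blocks until the pattern shows up on serial
        self.seen.append(pattern)
    def boot_opensbi(self):
        """Shared body of every machine-specific test in the patch."""
        self.vm.set_console()
        self.vm.launch()
        self.wait_for_console_pattern('Platform Name')
        self.wait_for_console_pattern('Boot HART MEDELEG')

t = RiscvOpensbi()
t.boot_opensbi()
assert t.seen == ['Platform Name', 'Boot HART MEDELEG']
```

Each per-machine test would then reduce to its avocado tags plus a one-line call to the helper.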


[PATCH v3 2/2] hw/intc/loongarch_pch_pic: add irq number property

2022-12-23 Thread Tianrui Zhao
According to the Loongson 7A1000 manual, the number of irqs supported is
reported in the PCH_PIC_INT_ID_HI register. This patch adds an irq number
property to loongarch_pch_pic, so that the virt machine can set a different
irq number when the pch_pic intc is added.

Signed-off-by: Tianrui Zhao 
---
 hw/intc/loongarch_pch_pic.c | 29 +
 hw/loongarch/virt.c |  8 +---
 include/hw/intc/loongarch_pch_pic.h |  5 ++---
 3 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/hw/intc/loongarch_pch_pic.c b/hw/intc/loongarch_pch_pic.c
index 3380b09807..26f36501b4 100644
--- a/hw/intc/loongarch_pch_pic.c
+++ b/hw/intc/loongarch_pch_pic.c
@@ -10,6 +10,7 @@
 #include "hw/loongarch/virt.h"
 #include "hw/irq.h"
 #include "hw/intc/loongarch_pch_pic.h"
+#include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
 #include "trace.h"
 
@@ -40,7 +41,7 @@ static void pch_pic_irq_handler(void *opaque, int irq, int level)
 LoongArchPCHPIC *s = LOONGARCH_PCH_PIC(opaque);
 uint64_t mask = 1ULL << irq;
 
-assert(irq < PCH_PIC_IRQ_NUM);
+assert(irq < s->irq_num);
 trace_loongarch_pch_pic_irq_handler(irq, level);
 
 if (s->intedge & mask) {
@@ -78,7 +79,12 @@ static uint64_t loongarch_pch_pic_low_readw(void *opaque, hwaddr addr,
 val = PCH_PIC_INT_ID_VAL;
 break;
 case PCH_PIC_INT_ID_HI:
-val = PCH_PIC_INT_ID_NUM;
+/*
+ * With 7A1000 manual
+ *   bit  0-15 pch irqchip version
+ *   bit 16-31 irq number supported with pch irqchip
+ */
+val = PCH_PIC_INT_ID_VER + ((s->irq_num - 1) << 16);
 break;
 case PCH_PIC_INT_MASK_LO:
 val = (uint32_t)s->int_mask;
@@ -365,6 +371,16 @@ static void loongarch_pch_pic_reset(DeviceState *d)
 s->int_polarity = 0x0;
 }
 
+static void loongarch_pch_pic_realize(DeviceState *dev, Error **errp)
+{
+LoongArchPCHPIC *s = LOONGARCH_PCH_PIC(dev);
+
+assert(s->irq_num > 0 && (s->irq_num <= 64));
+
+qdev_init_gpio_out(dev, s->parent_irq, s->irq_num);
+qdev_init_gpio_in(dev, pch_pic_irq_handler, s->irq_num);
+}
+
 static void loongarch_pch_pic_init(Object *obj)
 {
 LoongArchPCHPIC *s = LOONGARCH_PCH_PIC(obj);
@@ -382,10 +398,13 @@ static void loongarch_pch_pic_init(Object *obj)
 sysbus_init_mmio(sbd, &s->iomem8);
 sysbus_init_mmio(sbd, &s->iomem32_high);
 
-qdev_init_gpio_out(DEVICE(obj), s->parent_irq, PCH_PIC_IRQ_NUM);
-qdev_init_gpio_in(DEVICE(obj), pch_pic_irq_handler, PCH_PIC_IRQ_NUM);
 }
 
+static Property loongarch_pch_pic_properties[] = {
+DEFINE_PROP_UINT32("pch_pic_irq_num",  LoongArchPCHPIC, irq_num, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static const VMStateDescription vmstate_loongarch_pch_pic = {
 .name = TYPE_LOONGARCH_PCH_PIC,
 .version_id = 1,
@@ -411,8 +430,10 @@ static void loongarch_pch_pic_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+dc->realize = loongarch_pch_pic_realize;
 dc->reset = loongarch_pch_pic_reset;
 dc->vmsd = &vmstate_loongarch_pch_pic;
+device_class_set_props(dc, loongarch_pch_pic_properties);
 }
 
 static const TypeInfo loongarch_pch_pic_info = {
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 1e58346aeb..a39704e1e7 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -559,6 +559,8 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 }
 
 pch_pic = qdev_new(TYPE_LOONGARCH_PCH_PIC);
+num = PCH_PIC_IRQ_NUM;
+qdev_prop_set_uint32(pch_pic, "pch_pic_irq_num", num);
 d = SYS_BUS_DEVICE(pch_pic);
 sysbus_realize_and_unref(d, &error_fatal);
 memory_region_add_subregion(get_system_memory(), VIRT_IOAPIC_REG_BASE,
@@ -570,13 +572,13 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 VIRT_IOAPIC_REG_BASE + PCH_PIC_INT_STATUS_LO,
 sysbus_mmio_get_region(d, 2));
 
-/* Connect 64 pch_pic irqs to extioi */
-for (int i = 0; i < PCH_PIC_IRQ_NUM; i++) {
+/* Connect pch_pic irqs to extioi */
+for (int i = 0; i < num; i++) {
 qdev_connect_gpio_out(DEVICE(d), i, qdev_get_gpio_in(extioi, i));
 }
 
 pch_msi = qdev_new(TYPE_LOONGARCH_PCH_MSI);
-start   =  PCH_PIC_IRQ_NUM;
+start   =  num;
 num = EXTIOI_IRQS - start;
 qdev_prop_set_uint32(pch_msi, "msi_irq_base", start);
 qdev_prop_set_uint32(pch_msi, "msi_irq_num", num);
diff --git a/include/hw/intc/loongarch_pch_pic.h b/include/hw/intc/loongarch_pch_pic.h
index 2d4aa9ed6f..ba3a47fa88 100644
--- a/include/hw/intc/loongarch_pch_pic.h
+++ b/include/hw/intc/loongarch_pch_pic.h
@@ -9,11 +9,9 @@
 #define PCH_PIC_NAME(name) TYPE_LOONGARCH_PCH_PIC#name
 OBJECT_DECLARE_SIMPLE_TYPE(LoongArchPCHPIC, LOONGARCH_PCH_PIC)
 
-#define PCH_PIC_IRQ_START   0
-#define PCH_PIC_IRQ_END 63
 #define PCH_PIC_IRQ_NUM 64
 #define PCH_PIC_INT_ID_VAL  0x700UL
-#define PCH
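The PCH_PIC_INT_ID_HI encoding described in the patch comment can be sanity-checked with a short sketch (the version value used below is illustrative; the excerpt does not show what PCH_PIC_INT_ID_VER expands to):

```python
# Encoding from the patch comment (7A1000 manual):
#   bits  0-15: pch irqchip version
#   bits 16-31: number of supported irqs, minus one
def int_id_hi(version, irq_num):
    assert 0 < irq_num <= 64          # matches the realize() assertion
    return version + ((irq_num - 1) << 16)

def decode_int_id_hi(val):
    return val & 0xffff, (val >> 16) + 1   # (version, irq_num)

val = int_id_hi(version=1, irq_num=64)     # version 1 is a placeholder
assert val == 0x3f0001
assert decode_int_id_hi(val) == (1, 64)
```

With the default 64 irqs the guest reads back 63 in the upper half-word, so the `- 1` bias has to appear on both the encode and decode side.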

[PATCH v3 1/2] hw/intc/loongarch_pch_msi: add irq number property

2022-12-23 Thread Tianrui Zhao
This patch adds an irq number property for the loongarch msi interrupt
controller, and removes the hard-coded irq number macro.

Signed-off-by: Tianrui Zhao 
---
 hw/intc/loongarch_pch_msi.c | 33 ++---
 hw/loongarch/virt.c | 13 +++-
 include/hw/intc/loongarch_pch_msi.h |  3 ++-
 include/hw/pci-host/ls7a.h  |  1 -
 4 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/hw/intc/loongarch_pch_msi.c b/hw/intc/loongarch_pch_msi.c
index b36d6d76e4..09d3890ade 100644
--- a/hw/intc/loongarch_pch_msi.c
+++ b/hw/intc/loongarch_pch_msi.c
@@ -32,7 +32,7 @@ static void loongarch_msi_mem_write(void *opaque, hwaddr addr,
  */
 irq_num = (val & 0xff) - s->irq_base;
 trace_loongarch_msi_set_irq(irq_num);
-assert(irq_num < PCH_MSI_IRQ_NUM);
+assert(irq_num < s->irq_num);
 qemu_set_irq(s->pch_msi_irq[irq_num], 1);
 }
 
@@ -49,6 +49,32 @@ static void pch_msi_irq_handler(void *opaque, int irq, int level)
 qemu_set_irq(s->pch_msi_irq[irq], level);
 }
 
+static void loongarch_pch_msi_realize(DeviceState *dev, Error **errp)
+{
+LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(dev);
+
+if (!s->irq_num || s->irq_num  > PCH_MSI_IRQ_NUM) {
+error_setg(errp, "Invalid 'msi_irq_num'");
+return;
+}
+
+s->pch_msi_irq = g_new(qemu_irq, s->irq_num);
+if (!s->pch_msi_irq) {
+error_report("loongarch_pch_msi: fail to alloc memory");
+exit(1);
+}
+
+qdev_init_gpio_out(dev, s->pch_msi_irq, s->irq_num);
+qdev_init_gpio_in(dev, pch_msi_irq_handler, s->irq_num);
+}
+
+static void loongarch_pch_msi_unrealize(DeviceState *dev)
+{
+LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(dev);
+
+g_free(s->pch_msi_irq);
+}
+
 static void loongarch_pch_msi_init(Object *obj)
 {
 LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(obj);
@@ -59,12 +85,11 @@ static void loongarch_pch_msi_init(Object *obj)
 sysbus_init_mmio(sbd, &s->msi_mmio);
 msi_nonbroken = true;
 
-qdev_init_gpio_out(DEVICE(obj), s->pch_msi_irq, PCH_MSI_IRQ_NUM);
-qdev_init_gpio_in(DEVICE(obj), pch_msi_irq_handler, PCH_MSI_IRQ_NUM);
 }
 
 static Property loongarch_msi_properties[] = {
 DEFINE_PROP_UINT32("msi_irq_base", LoongArchPCHMSI, irq_base, 0),
+DEFINE_PROP_UINT32("msi_irq_num",  LoongArchPCHMSI, irq_num, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -72,6 +97,8 @@ static void loongarch_pch_msi_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+dc->realize = loongarch_pch_msi_realize;
+dc->unrealize = loongarch_pch_msi_unrealize;
 device_class_set_props(dc, loongarch_msi_properties);
 }
 
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 958be74fa1..1e58346aeb 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -496,7 +496,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 LoongArchCPU *lacpu;
 CPULoongArchState *env;
 CPUState *cpu_state;
-int cpu, pin, i;
+int cpu, pin, i, start, num;
 
 ipi = qdev_new(TYPE_LOONGARCH_IPI);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(ipi), &error_fatal);
@@ -576,14 +576,17 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 }
 
 pch_msi = qdev_new(TYPE_LOONGARCH_PCH_MSI);
-qdev_prop_set_uint32(pch_msi, "msi_irq_base", PCH_MSI_IRQ_START);
+start   =  PCH_PIC_IRQ_NUM;
+num = EXTIOI_IRQS - start;
+qdev_prop_set_uint32(pch_msi, "msi_irq_base", start);
+qdev_prop_set_uint32(pch_msi, "msi_irq_num", num);
 d = SYS_BUS_DEVICE(pch_msi);
 sysbus_realize_and_unref(d, &error_fatal);
 sysbus_mmio_map(d, 0, VIRT_PCH_MSI_ADDR_LOW);
-for (i = 0; i < PCH_MSI_IRQ_NUM; i++) {
-/* Connect 192 pch_msi irqs to extioi */
+for (i = 0; i < num; i++) {
+/* Connect pch_msi irqs to extioi */
 qdev_connect_gpio_out(DEVICE(d), i,
-  qdev_get_gpio_in(extioi, i + PCH_MSI_IRQ_START));
+  qdev_get_gpio_in(extioi, i + start));
 }
 
 loongarch_devices_init(pch_pic, lams);
diff --git a/include/hw/intc/loongarch_pch_msi.h b/include/hw/intc/loongarch_pch_msi.h
index 6d67560dea..c5a52bc327 100644
--- a/include/hw/intc/loongarch_pch_msi.h
+++ b/include/hw/intc/loongarch_pch_msi.h
@@ -15,8 +15,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(LoongArchPCHMSI, LOONGARCH_PCH_MSI)
 
 struct LoongArchPCHMSI {
 SysBusDevice parent_obj;
-qemu_irq pch_msi_irq[PCH_MSI_IRQ_NUM];
+qemu_irq *pch_msi_irq;
 MemoryRegion msi_mmio;
 /* irq base passed to upper extioi intc */
 unsigned int irq_base;
+unsigned int irq_num;
 };
diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index df7fa55a30..6443327bd7 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -34,7 +34,6 @@
  */
 #define PCH_PIC_IRQ_OFFSET   64
 #define VIRT_DEVICE_IRQS 16
-#define VIRT_PCI_IRQS48
 #define VIRT_UART_IRQ(PCH_PIC_IRQ_OFFSET + 2)
 #
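The irq split that both patches wire up can be checked numerically; EXTIOI_IRQS = 256 is an assumption here, inferred from the removed "Connect 192 pch_msi irqs" comment (64 pch_pic + 192 pch_msi):

```python
PCH_PIC_IRQ_NUM = 64    # from loongarch_pch_pic.h
EXTIOI_IRQS = 256       # assumed total: 64 pch_pic + 192 pch_msi lines
PCH_MSI_IRQ_NUM = 192   # upper bound checked in loongarch_pch_msi_realize()

msi_irq_base = PCH_PIC_IRQ_NUM
msi_irq_num = EXTIOI_IRQS - msi_irq_base
assert msi_irq_num == 192
assert 0 < msi_irq_num <= PCH_MSI_IRQ_NUM   # passes the realize() check

# extioi input line wired to pch_msi output i by qdev_connect_gpio_out()
extioi_lines = [msi_irq_base + i for i in range(msi_irq_num)]
assert extioi_lines[0] == 64 and extioi_lines[-1] == 255
```

So pch_pic owns extioi inputs 0..63 and pch_msi owns 64..255, with no overlap and no gap, which is what the two property settings in virt.c encode.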

[PATCH v3 0/2] Add irq number property for loongarch pch interrupt controller

2022-12-23 Thread Tianrui Zhao
This series add irq number property for loongarch pch_msi
and pch_pic interrupt controller.

Changes for v3: 
(1) Fix the valid range of msi_irq_num; error_setg() is triggered when
irq_num is invalid.
(2) Use g_new() to alloc msi_irqs when pch_msi is realized.
(3) Use the EXTIOI_IRQS macro to replace the hard-coded 256 irq number.

Changes for v2: 
(1) Free pch_msi_irq array in pch_msi_unrealize().

Changes for v1: 
(1) Add irq number property for loongarch_pch_msi.
(2) Add irq number property for loongarch_pch_pic.

Tianrui Zhao (2):
  hw/intc/loongarch_pch_msi: add irq number property
  hw/intc/loongarch_pch_pic: add irq number property

 hw/intc/loongarch_pch_msi.c | 33 ++---
 hw/intc/loongarch_pch_pic.c | 29 +
 hw/loongarch/virt.c | 19 +++--
 include/hw/intc/loongarch_pch_msi.h |  3 ++-
 include/hw/intc/loongarch_pch_pic.h |  5 ++---
 include/hw/pci-host/ls7a.h  |  1 -
 6 files changed, 71 insertions(+), 19 deletions(-)

-- 
2.31.1




Re: [PATCH v5 4/4] x86: re-enable rng seeding via SetupData

2022-12-23 Thread Eric Biggers
Hi Jason,

On Wed, Sep 21, 2022 at 11:31:34AM +0200, Jason A. Donenfeld wrote:
> This reverts 3824e25db1 ("x86: disable rng seeding via setup_data"), but
> for 7.2 rather than 7.1, now that modifying setup_data is safe to do.
> 
> Cc: Laurent Vivier 
> Cc: Michael S. Tsirkin 
> Cc: Paolo Bonzini 
> Cc: Peter Maydell 
> Cc: Philippe Mathieu-Daudé 
> Cc: Richard Henderson 
> Cc: Ard Biesheuvel 
> Acked-by: Gerd Hoffmann 
> Signed-off-by: Jason A. Donenfeld 
> ---
>  hw/i386/microvm.c | 2 +-
>  hw/i386/pc_piix.c | 3 ++-
>  hw/i386/pc_q35.c  | 3 ++-
>  3 files changed, 5 insertions(+), 3 deletions(-)
> 

After upgrading to QEMU 7.2, Linux 6.1 no longer boots with some configs.  There
is no output at all.  I bisected it to this commit, and I verified that the
following change to QEMU's master branch makes the problem go away:

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b48047f50c..42f5b07d2f 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -441,6 +441,7 @@ static void pc_i440fx_8_0_machine_options(MachineClass *m)
 pc_i440fx_machine_options(m);
 m->alias = "pc";
 m->is_default = true;
+PC_MACHINE_CLASS(m)->legacy_no_rng_seed = true;
 }

I've attached the kernel config I am seeing the problem on.

For some reason, the problem also goes away if I disable CONFIG_KASAN.

Any idea what is causing this?

- Eric
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 6.1.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (GCC) 12.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=120200
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23900
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23900
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=0
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_CONTEXT_TRACKING_IDLE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# end of BPF subsystem

CONFIG_PREEMPT_BUILD=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
# CONFIG_SCHED_CORE is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADE

Re: [PATCH v2 4/4] docs/devel: Rules on #include in headers

2022-12-23 Thread Alex Bennée


Markus Armbruster  writes:

> Rules for headers were proposed a long time ago, and generally liked:
>
> Message-ID: <87h9g8j57d@blackfin.pond.sub.org>
> https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html
>
> Wortk them into docs/devel/style.rst.

nit: spelling Work

>
> Suggested-by: Bernhard Beschow 
> Signed-off-by: Markus Armbruster 

Reviewed-by: Alex Bennée 

> ---
>  docs/devel/style.rst | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/docs/devel/style.rst b/docs/devel/style.rst
> index 7ddd42b6c2..68aa776930 100644
> --- a/docs/devel/style.rst
> +++ b/docs/devel/style.rst
> @@ -293,6 +293,13 @@ that QEMU depends on.
>  Do not include "qemu/osdep.h" from header files since the .c file will have
>  already included it.
>  
> +Headers should normally include everything they need beyond osdep.h.
> +If exceptions are needed for some reason, they must be documented in
> +the header.  If all that's needed from a header is typedefs, consider
> +putting those into qemu/typedefs.h instead of including the header.
> +
> +Cyclic inclusion is forbidden.
> +
>  C types
>  ===


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH v2] linux-user: Fix brk() to release pages

2022-12-23 Thread Helge Deller
The current brk() implementation does not de-allocate pages when a lower
address is given than in earlier brk() calls. But according to the manpage,
brk() shall deallocate memory in this case, and this currently breaks a
real-world application: building the debian gcl package in qemu-user fails.

Fix this issue by reworking the qemu brk() implementation.

Tested with the C-code testcase included in qemu commit 4d1de87c750, and
by building the Debian package of gcl in an hppa-linux guest on an x86-64
host.

Signed-off-by: Helge Deller 

---
v2:
- Fixed the return value of brk(). The v1 version wrongly returned the
  page-aligned address, while userspace expects the address to be
  returned unmodified.


diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4fee882cd7..2fcb6dba06 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -838,49 +838,59 @@ static inline int host_to_target_sock_type(int host_type)
 }

 static abi_ulong target_brk;
-static abi_ulong target_original_brk;
 static abi_ulong brk_page;

 void target_set_brk(abi_ulong new_brk)
 {
-target_original_brk = target_brk = HOST_PAGE_ALIGN(new_brk);
-brk_page = HOST_PAGE_ALIGN(target_brk);
+target_brk = new_brk;
+brk_page = HOST_PAGE_ALIGN(new_brk);
 }

//#define DEBUGF_BRK(message, args...) do { fprintf(stderr, (message), ## args); } while (0)
 #define DEBUGF_BRK(message, args...)

 /* do_brk() must return target values and target errnos. */
-abi_long do_brk(abi_ulong new_brk)
+abi_long do_brk(abi_ulong brk_val)
 {
 abi_long mapped_addr;
 abi_ulong new_alloc_size;
+abi_ulong new_brk, new_host_brk_page;

 /* brk pointers are always untagged */

+/* return old brk value on zero brk_val */
+if (!brk_val || brk_val == target_brk) {
+return target_brk;
+}
+
+new_brk = TARGET_PAGE_ALIGN(brk_val);
+new_host_brk_page = HOST_PAGE_ALIGN(brk_val);
+
 DEBUGF_BRK("do_brk(" TARGET_ABI_FMT_lx ") -> ", new_brk);

-if (!new_brk) {
+/* brk_val and old target_brk might be on the same page */
+if (new_brk == TARGET_PAGE_ALIGN(target_brk)) {
+if (brk_val > target_brk) {
+/* empty remaining bytes in (possibly larger) host page */
+memset(g2h_untagged(target_brk), 0, new_host_brk_page - target_brk);
+}
 DEBUGF_BRK(TARGET_ABI_FMT_lx " (!new_brk)\n", target_brk);
-return target_brk;
-}
-if (new_brk < target_original_brk) {
-DEBUGF_BRK(TARGET_ABI_FMT_lx " (new_brk < target_original_brk)\n",
-   target_brk);
-return target_brk;
+target_brk = brk_val;
+return brk_val;
 }

-/* If the new brk is less than the highest page reserved to the
- * target heap allocation, set it and we're almost done...  */
-if (new_brk <= brk_page) {
-/* Heap contents are initialized to zero, as for anonymous
- * mapped pages.  */
-if (new_brk > target_brk) {
-memset(g2h_untagged(target_brk), 0, new_brk - target_brk);
-}
-   target_brk = new_brk;
+/* Release heap if necessary */
+if (new_brk < target_brk) {
+/* empty remaining bytes in (possibly larger) host page */
+memset(g2h_untagged(brk_val), 0, new_host_brk_page - brk_val);
+
+/* free unused host pages and set new brk_page */
+target_munmap(new_host_brk_page, brk_page - new_host_brk_page);
+brk_page = new_host_brk_page;
+
+   target_brk = brk_val;
 DEBUGF_BRK(TARGET_ABI_FMT_lx " (new_brk <= brk_page)\n", target_brk);
-   return target_brk;
+   return brk_val;
 }

 /* We need to allocate more memory after the brk... Note that
@@ -889,10 +899,14 @@ abi_long do_brk(abi_ulong new_brk)
  * itself); instead we treat "mapped but at wrong address" as
  * a failure and unmap again.
  */
-new_alloc_size = HOST_PAGE_ALIGN(new_brk - brk_page);
-mapped_addr = get_errno(target_mmap(brk_page, new_alloc_size,
+new_alloc_size = new_host_brk_page - brk_page;
+if (new_alloc_size) {
+mapped_addr = get_errno(target_mmap(brk_page, new_alloc_size,
 PROT_READ|PROT_WRITE,
 MAP_ANON|MAP_PRIVATE, 0, 0));
+} else {
+mapped_addr = brk_page;
+}

 if (mapped_addr == brk_page) {
 /* Heap contents are initialized to zero, as for anonymous
@@ -905,10 +919,10 @@ abi_long do_brk(abi_ulong new_brk)
 memset(g2h_untagged(target_brk), 0, brk_page - target_brk);

 target_brk = new_brk;
-brk_page = HOST_PAGE_ALIGN(target_brk);
+brk_page = new_host_brk_page;
 DEBUGF_BRK(TARGET_ABI_FMT_lx " (mapped_addr == brk_page)\n",
 target_brk);
-return target_brk;
+return brk_val;
 } else if (mapped_addr != -1) {
 /* Mapped but at wrong address, meaning there wasn't actually
  * enough space for this brk.



[PATCH] scripts/coverity-scan/model.c: update address_space_*_cached

2022-12-23 Thread Vladimir Sementsov-Ogievskiy
Make the prototypes match the real declarations. Also drop
address_space_rw_cached(), which doesn't exist anywhere in the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/coverity-scan/model.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/scripts/coverity-scan/model.c b/scripts/coverity-scan/model.c
index 686d1a3008..b40d0fcbf3 100644
--- a/scripts/coverity-scan/model.c
+++ b/scripts/coverity-scan/model.c
@@ -69,7 +69,6 @@ static void __bufread(uint8_t *buf, ssize_t len)
 }
 
 MemTxResult address_space_read_cached(MemoryRegionCache *cache, hwaddr addr,
-  MemTxAttrs attrs,
   void *buf, int len)
 {
 MemTxResult result;
@@ -80,25 +79,13 @@ MemTxResult address_space_read_cached(MemoryRegionCache 
*cache, hwaddr addr,
 }
 
 MemTxResult address_space_write_cached(MemoryRegionCache *cache, hwaddr addr,
-MemTxAttrs attrs,
-const void *buf, int len)
+   const void *buf, int len)
 {
 MemTxResult result;
 __bufread(buf, len);
 return result;
 }
 
-MemTxResult address_space_rw_cached(MemoryRegionCache *cache, hwaddr addr,
-MemTxAttrs attrs,
-void *buf, int len, bool is_write)
-{
-if (is_write) {
-return address_space_write_cached(cache, addr, attrs, buf, len);
-} else {
-return address_space_read_cached(cache, addr, attrs, buf, len);
-}
-}
-
 MemTxResult address_space_read(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs,
void *buf, int len)
-- 
2.34.1




Re: [RFC v4 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread Chuang Xu
On 2022/12/23 11:50 PM, Peter Xu wrote:

Chuang,

On Fri, Dec 23, 2022 at 10:23:04PM +0800, Chuang Xu wrote:

In this version:

- attach more information in the cover letter.
- remove changes on virtio_load().
- add rcu_read_locked() to detect holding of rcu lock.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory regions during the period of
loading non-iterable vmstate into a single memory transaction. As the
number of devices increases, this patch greatly improves performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net devices
  - 16 4-queue vhost-user-blk devices.

        time of loading non-iterable vmstate    downtime
before  about 150 ms                            740+ ms
after   about 30 ms                             630+ ms

Have you investigated why multi-queue added so much downtime overhead in
the same environment, compared to [1] below?

I have analyzed the downtime in detail. Both stopping and starting the
devices are time-consuming.

For stopping vhost-net devices, vhost_net_stop_one() will be called once
more for each additional queue, while vhost_virtqueue_stop() will be
called twice in vhost_dev_stop(). For example, we need to call
vhost_virtqueue_stop() 32 (= 16 * 2) times to stop a 16-queue vhost-net
device. In vhost_virtqueue_stop(), QEMU needs to negotiate with the
vhost-user daemon. The same is true for vhost-net device startup.

For stopping vhost-user-blk devices, vhost_virtqueue_stop() will be called
once more for each additional queue. For example, we need to call
vhost_virtqueue_stop() 4 times to stop a 4-queue vhost-user-blk device.
The same is true for vhost-user-blk device startup.

It seems that the vhost-user-blk device is less affected by the number of
queues than the vhost-net device. However, the vhost-user-blk device needs
to prepare inflight tracking when it is started. The negotiation with SPDK
in this step is also time-consuming. I tried to move this step to the
startup phase of the target QEMU, before the migration starts. In my test,
this optimization can greatly reduce the vhost-user-blk device startup
time and thus reduce the downtime. I'm not sure whether this is hacky. If
you are interested in this, maybe we can discuss it further.

(This result is different from that of v1. It may be that someone has
changed something on my host, but it does not affect the demonstration of
the optimization effect.)


In test2, we keep the number of devices the same as in test1 and reduce
the number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net devices
  - 16 1-queue vhost-user-blk devices.

        time of loading non-iterable vmstate    downtime
before  about 90 ms                             about 250 ms
after   about 25 ms                             about 160 ms

[1]


In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate    downtime
before  about 20 ms                             about 70 ms
after   about 11 ms                             about 60 ms


As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading
non-iterable vmstate. Growth in the number of devices and queues leads to
more memory region commits, and the time spent on flatview reconstruction
also increases.

The downtime measured in precopy can be more complicated than in postcopy,
because the switchover time is calculated by QEMU based on the downtime
setup, and it also contains part of the RAM migration. Postcopy should be
more accurate on that because there's no such calculation, and no RAM is
transferred during the downtime.

However, postcopy downtime is not accurate either in its implementation in
postcopy_start(), where the downtime is measured right after we flush the
packed data; right below that there's an idea for optimizing it:

if (migrate_postcopy_ram()) {
/*
 * Although this ping is just for debug, it could potentially be
 * used for getting a better measurement of downtime at the source.
 */
qemu_savevm_send_ping(ms->to_dst_file, 4);
}

So maybe I'll have a look there.

The current calculation of downtime is really inaccurate, because the source

[PATCH v2 03/15] RISC-V: Adding XTheadBa ISA extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadBa ISA extension.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Split XtheadB* extension into individual commits
- Use single decoder for XThead extensions

Co-developed-by: Philipp Tomsich 
Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 ++
 target/riscv/cpu.h |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 39 ++
 target/riscv/translate.c   |  3 +-
 target/riscv/xthead.decode | 22 
 5 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a848836d2e..809b6eb4ed 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -108,6 +108,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0, ext_svinval),
 ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0, ext_svnapot),
 ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
+ISA_EXT_DATA_ENTRY(xtheadba, true, PRIV_VERSION_1_11_0, ext_xtheadba),
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
@@ -1062,6 +1063,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
 
 /* Vendor-specific custom extensions */
+DEFINE_PROP_BOOL("xtheadba", RISCVCPU, cfg.ext_xtheadba, false),
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 4d3da2acfa..ec2588a0f0 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -465,6 +465,7 @@ struct RISCVCPUConfig {
 uint64_t mimpid;
 
 /* Vendor-specific custom extensions */
+bool ext_xtheadba;
 bool ext_xtheadcmo;
 bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index 6009d61c81..79e1f5bde9 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -16,6 +16,12 @@
  * this program.  If not, see .
  */
 
+#define REQUIRE_XTHEADBA(ctx) do {   \
+if (!ctx->cfg_ptr->ext_xtheadba) {   \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADCMO(ctx) do {  \
 if (!ctx->cfg_ptr->ext_xtheadcmo) {  \
 return false;\
@@ -28,6 +34,39 @@
 }\
 } while (0)
 
+/* XTheadBa */
+
+/*
+ * th.addsl is similar to sh[123]add (from Zba), but not an
+ * alternative encoding: while sh[123] applies the shift to rs1,
+ * th.addsl shifts rs2.
+ */
+
+#define GEN_TH_ADDSL(SHAMT) \
+static void gen_th_addsl##SHAMT(TCGv ret, TCGv arg1, TCGv arg2) \
+{   \
+TCGv t = tcg_temp_new();\
+tcg_gen_shli_tl(t, arg2, SHAMT);\
+tcg_gen_add_tl(ret, t, arg1);   \
+tcg_temp_free(t);   \
+}
+
+GEN_TH_ADDSL(1)
+GEN_TH_ADDSL(2)
+GEN_TH_ADDSL(3)
+
+#define GEN_TRANS_TH_ADDSL(SHAMT)   \
+static bool trans_th_addsl##SHAMT(DisasContext *ctx,\
+  arg_th_addsl##SHAMT * a)  \
+{   \
+REQUIRE_XTHEADBA(ctx);  \
+return gen_arith(ctx, a, EXT_NONE, gen_th_addsl##SHAMT, NULL);  \
+}
+
+GEN_TRANS_TH_ADDSL(1)
+GEN_TRANS_TH_ADDSL(2)
+GEN_TRANS_TH_ADDSL(3)
+
 /* XTheadCmo */
 
 static inline int priv_level(DisasContext *ctx)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index c40617662a..7b35f1d71b 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -127,7 +127,8 @@ static bool always_true_p(DisasContext *ctx  
__attribute__((__unused__)))
 
 static bool has_xthead_p(DisasContext *ctx  __attribute__((__unused__)))
 {
-return ctx->cfg_ptr->ext_xtheadcmo || ctx->cfg_ptr->ext_xtheadsync;
+return ctx->cfg_ptr->ext_xtheadba || ctx->cfg_ptr->ext_xtheadcmo ||
+   ctx->cfg_ptr->ext_xtheadsync;
 }
 
 #define MATERIALISE_EXT_PREDICATE(ext)  \
diff --git a/target/r

[PATCH v2 04/15] RISC-V: Adding XTheadBb ISA extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadBb ISA extension.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Split XtheadB* extension into individual commits
- Make implementation compatible with RV32.
- Use single decoder for XThead extensions

Co-developed-by: Philipp Tomsich 
Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |   2 +
 target/riscv/cpu.h |   1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 124 +
 target/riscv/translate.c   |   4 +-
 target/riscv/xthead.decode |  20 
 5 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 809b6eb4ed..b5285fb7a7 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -109,6 +109,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0, ext_svnapot),
 ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
 ISA_EXT_DATA_ENTRY(xtheadba, true, PRIV_VERSION_1_11_0, ext_xtheadba),
+ISA_EXT_DATA_ENTRY(xtheadbb, true, PRIV_VERSION_1_11_0, ext_xtheadbb),
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
@@ -1064,6 +1065,7 @@ static Property riscv_cpu_extensions[] = {
 
 /* Vendor-specific custom extensions */
 DEFINE_PROP_BOOL("xtheadba", RISCVCPU, cfg.ext_xtheadba, false),
+DEFINE_PROP_BOOL("xtheadbb", RISCVCPU, cfg.ext_xtheadbb, false),
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index ec2588a0f0..0ac1d3f5ef 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -466,6 +466,7 @@ struct RISCVCPUConfig {
 
 /* Vendor-specific custom extensions */
 bool ext_xtheadba;
+bool ext_xtheadbb;
 bool ext_xtheadcmo;
 bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index 79e1f5bde9..a55d1491fa 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -22,6 +22,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADBB(ctx) do {   \
+if (!ctx->cfg_ptr->ext_xtheadbb) {   \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADCMO(ctx) do {  \
 if (!ctx->cfg_ptr->ext_xtheadcmo) {  \
 return false;\
@@ -67,6 +73,124 @@ GEN_TRANS_TH_ADDSL(1)
 GEN_TRANS_TH_ADDSL(2)
 GEN_TRANS_TH_ADDSL(3)
 
+/* XTheadBb */
+
+/* th.srri is an alternate encoding for rori (from Zbb) */
+static bool trans_th_srri(DisasContext *ctx, arg_th_srri * a)
+{
+REQUIRE_XTHEADBB(ctx);
+return gen_shift_imm_fn_per_ol(ctx, a, EXT_NONE,
+   tcg_gen_rotri_tl, gen_roriw, NULL);
+}
+
+/* th.srriw is an alternate encoding for roriw (from Zbb) */
+static bool trans_th_srriw(DisasContext *ctx, arg_th_srriw *a)
+{
+REQUIRE_XTHEADBB(ctx);
+REQUIRE_64BIT(ctx);
+ctx->ol = MXL_RV32;
+return gen_shift_imm_fn(ctx, a, EXT_NONE, gen_roriw, NULL);
+}
+
+/* th.ext and th.extu perform signed/unsigned bitfield extraction */
+static bool gen_th_bfextract(DisasContext *ctx, arg_th_bfext *a,
+ void (*f)(TCGv, TCGv, unsigned int, unsigned int))
+{
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv source = get_gpr(ctx, a->rs1, EXT_ZERO);
+
+if (a->lsb <= a->msb) {
+f(dest, source, a->lsb, a->msb - a->lsb + 1);
+gen_set_gpr(ctx, a->rd, dest);
+}
+return true;
+}
+
+static bool trans_th_ext(DisasContext *ctx, arg_th_ext *a)
+{
+REQUIRE_XTHEADBB(ctx);
+return gen_th_bfextract(ctx, a, tcg_gen_sextract_tl);
+}
+
+static bool trans_th_extu(DisasContext *ctx, arg_th_extu *a)
+{
+REQUIRE_XTHEADBB(ctx);
+return gen_th_bfextract(ctx, a, tcg_gen_extract_tl);
+}
+
+/* th.ff0: find first zero (clz on an inverted input) */
+static bool gen_th_ff0(DisasContext *ctx, arg_th_ff0 *a, DisasExtend ext)
+{
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
+
+int olen = get_olen(ctx);
+TCGv t = tcg_temp_new();
+
+tcg_gen_not_tl(t, src1);
+if (olen != TARGET_LONG_BITS) {
+if (olen == 32) {
+gen_clzw(dest, t);
+} else {
+g_assert_not_reached();
+}
+} else

[PATCH v2 08/15] RISC-V: Adding T-Head MemPair extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the T-Head MemPair instructions.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Use single decoder for XThead extensions
- Use get_address() to calculate addresses

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 +
 target/riscv/cpu.h |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 88 ++
 target/riscv/translate.c   |  2 +-
 target/riscv/xthead.decode | 13 
 5 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 88ad2138db..de00f69710 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -114,6 +114,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadcondmov, true, PRIV_VERSION_1_11_0, 
ext_xtheadcondmov),
 ISA_EXT_DATA_ENTRY(xtheadmac, true, PRIV_VERSION_1_11_0, ext_xtheadmac),
+ISA_EXT_DATA_ENTRY(xtheadmempair, true, PRIV_VERSION_1_11_0, 
ext_xtheadmempair),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
 };
@@ -1073,6 +1074,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadcondmov", RISCVCPU, cfg.ext_xtheadcondmov, false),
 DEFINE_PROP_BOOL("xtheadmac", RISCVCPU, cfg.ext_xtheadmac, false),
+DEFINE_PROP_BOOL("xtheadmempair", RISCVCPU, cfg.ext_xtheadmempair, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
 
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 92198be9d8..836445115e 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -471,6 +471,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadcmo;
 bool ext_xtheadcondmov;
 bool ext_xtheadmac;
+bool ext_xtheadmempair;
 bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
 
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index 109be58c9b..49314306eb 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -52,6 +52,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADMEMPAIR(ctx) do {  \
+if (!ctx->cfg_ptr->ext_xtheadmempair) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADSYNC(ctx) do { \
 if (!ctx->cfg_ptr->ext_xtheadsync) { \
 return false;\
@@ -390,6 +396,88 @@ static bool trans_th_mulsw(DisasContext *ctx, arg_th_mulsw 
*a)
 return gen_th_mac(ctx, a, tcg_gen_sub_tl, NULL);
 }
 
+/* XTheadMemPair */
+
+static bool gen_loadpair_tl(DisasContext *ctx, arg_th_pair *a, MemOp memop,
+int shamt)
+{
+TCGv rd1 = dest_gpr(ctx, a->rd1);
+TCGv rd2 = dest_gpr(ctx, a->rd2);
+TCGv addr1 = tcg_temp_new();
+TCGv addr2 = tcg_temp_new();
+
+addr1 = get_address(ctx, a->rs, a->sh2 << shamt);
+if ((memop & MO_SIZE) == MO_64) {
+addr2 = get_address(ctx, a->rs, 8 + (a->sh2 << shamt));
+} else {
+addr2 = get_address(ctx, a->rs, 4 + (a->sh2 << shamt));
+}
+
+tcg_gen_qemu_ld_tl(rd1, addr1, ctx->mem_idx, memop);
+tcg_gen_qemu_ld_tl(rd2, addr2, ctx->mem_idx, memop);
+gen_set_gpr(ctx, a->rd1, rd1);
+gen_set_gpr(ctx, a->rd2, rd2);
+
+tcg_temp_free(addr1);
+tcg_temp_free(addr2);
+return true;
+}
+
+static bool trans_th_ldd(DisasContext *ctx, arg_th_pair *a)
+{
+REQUIRE_XTHEADMEMPAIR(ctx);
+REQUIRE_64BIT(ctx);
+return gen_loadpair_tl(ctx, a, MO_TESQ, 4);
+}
+
+static bool trans_th_lwd(DisasContext *ctx, arg_th_pair *a)
+{
+REQUIRE_XTHEADMEMPAIR(ctx);
+return gen_loadpair_tl(ctx, a, MO_TESL, 3);
+}
+
+static bool trans_th_lwud(DisasContext *ctx, arg_th_pair *a)
+{
+REQUIRE_XTHEADMEMPAIR(ctx);
+return gen_loadpair_tl(ctx, a, MO_TEUL, 3);
+}
+
+static bool gen_storepair_tl(DisasContext *ctx, arg_th_pair *a, MemOp memop,
+ int shamt)
+{
+TCGv data1 = get_gpr(ctx, a->rd1, EXT_NONE);
+TCGv data2 = get_gpr(ctx, a->rd2, EXT_NONE);
+TCGv addr1 = tcg_temp_new();
+TCGv addr2 = tcg_temp_new();
+
+addr1 = get_address(ctx, a->rs, a->sh2 << shamt);
+if ((memop & MO_SIZE) == MO_64) {
+addr2 = get_address(ctx, a->rs, 8 + (a->sh2 << shamt));
+} else {
+addr2 = get_address(ctx, a->rs, 4 + (a->sh2 << shamt));
+}
+
+tcg_gen_qemu_st_tl(data1, addr1, ctx->mem_idx, memop)

[PATCH v2 07/15] RISC-V: Adding T-Head multiply-accumulate instructions

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the T-Head MAC instructions.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Use single decoder for XThead extensions

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 +
 target/riscv/cpu.h |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 83 ++
 target/riscv/translate.c   |  3 +-
 target/riscv/xthead.decode |  8 +++
 5 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 36a53784dd..88ad2138db 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -113,6 +113,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadbs, true, PRIV_VERSION_1_11_0, ext_xtheadbs),
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadcondmov, true, PRIV_VERSION_1_11_0, 
ext_xtheadcondmov),
+ISA_EXT_DATA_ENTRY(xtheadmac, true, PRIV_VERSION_1_11_0, ext_xtheadmac),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
 };
@@ -1071,6 +1072,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xtheadbs", RISCVCPU, cfg.ext_xtheadbs, false),
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadcondmov", RISCVCPU, cfg.ext_xtheadcondmov, false),
+DEFINE_PROP_BOOL("xtheadmac", RISCVCPU, cfg.ext_xtheadmac, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
 
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 01f035d8e9..92198be9d8 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -470,6 +470,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadbs;
 bool ext_xtheadcmo;
 bool ext_xtheadcondmov;
+bool ext_xtheadmac;
 bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
 
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index bf549bbd74..109be58c9b 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -46,6 +46,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADMAC(ctx) do {  \
+if (!ctx->cfg_ptr->ext_xtheadmac) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADSYNC(ctx) do { \
 if (!ctx->cfg_ptr->ext_xtheadsync) { \
 return false;\
@@ -307,6 +313,83 @@ static bool trans_th_mvnez(DisasContext *ctx, arg_th_mveqz 
*a)
 return gen_th_condmove(ctx, a, TCG_COND_NE);
 }
 
+/* XTheadMac */
+
+static bool gen_th_mac(DisasContext *ctx, arg_r *a,
+   void (*accumulate_func)(TCGv, TCGv, TCGv),
+   void (*extend_operand_func)(TCGv, TCGv))
+{
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src0 = get_gpr(ctx, a->rd, EXT_NONE);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+TCGv tmp = tcg_temp_new();
+
+if (extend_operand_func) {
+TCGv tmp2 = tcg_temp_new();
+extend_operand_func(tmp, src1);
+extend_operand_func(tmp2, src2);
+tcg_gen_mul_tl(tmp, tmp, tmp2);
+tcg_temp_free(tmp2);
+} else {
+tcg_gen_mul_tl(tmp, src1, src2);
+}
+
+accumulate_func(dest, src0, tmp);
+gen_set_gpr(ctx, a->rd, dest);
+tcg_temp_free(tmp);
+
+return true;
+}
+
+/* th.mula: "rd = rd + rs1 * rs2" */
+static bool trans_th_mula(DisasContext *ctx, arg_th_mula *a)
+{
+REQUIRE_XTHEADMAC(ctx);
+return gen_th_mac(ctx, a, tcg_gen_add_tl, NULL);
+}
+
+/* th.mulah: "rd = sext.w(rd + sext.w(rs1[15:0]) * sext.w(rs2[15:0]))" */
+static bool trans_th_mulah(DisasContext *ctx, arg_th_mulah *a)
+{
+REQUIRE_XTHEADMAC(ctx);
+ctx->ol = MXL_RV32;
+return gen_th_mac(ctx, a, tcg_gen_add_tl, tcg_gen_ext16s_tl);
+}
+
+/* th.mulaw: "rd = sext.w(rd + rs1 * rs2)" */
+static bool trans_th_mulaw(DisasContext *ctx, arg_th_mulaw *a)
+{
+REQUIRE_XTHEADMAC(ctx);
+REQUIRE_64BIT(ctx);
+ctx->ol = MXL_RV32;
+return gen_th_mac(ctx, a, tcg_gen_add_tl, NULL);
+}
+
+/* th.muls: "rd = rd - rs1 * rs2" */
+static bool trans_th_muls(DisasContext *ctx, arg_th_muls *a)
+{
+REQUIRE_XTHEADMAC(ctx);
+return gen_th_mac(ctx, a, tcg_gen_sub_tl, NULL);
+}
+
+/* th.mulsh: "rd = sext.w(rd - sext.w(rs1[15:0]) * sext.w(rs2[15:0]))" */
+static bool trans_th_mulsh(DisasContext *ctx, arg_th_mulsh *a)
+{
+REQUIRE_XTHEADMAC(ctx);
+ctx->ol = MXL_RV32;
+return gen_th_mac(

[PATCH v2 13/15] RISC-V: Add initial support for T-Head C906

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds the T-Head C906 to the list of known CPUs.
Selecting this CPU will automatically enable its available
ISA extensions (incl. vendor extensions).

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 

Changes in v2:
- Drop C910 as it does not differ from C906
- Set priv version to 1.11 (new fmin/fmax behaviour)

Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c  | 31 +++
 target/riscv/cpu.h  |  2 ++
 target/riscv/cpu_vendorid.h |  6 ++
 3 files changed, 39 insertions(+)
 create mode 100644 target/riscv/cpu_vendorid.h

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a38127365e..d3d8587710 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -279,6 +279,36 @@ static void rv64_sifive_e_cpu_init(Object *obj)
 cpu->cfg.mmu = false;
 }
 
+static void rv64_thead_c906_cpu_init(Object *obj)
+{
+CPURISCVState *env = &RISCV_CPU(obj)->env;
+RISCVCPU *cpu = RISCV_CPU(obj);
+
+set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
+set_priv_version(env, PRIV_VERSION_1_11_0);
+
+cpu->cfg.ext_g = true;
+cpu->cfg.ext_c = true;
+cpu->cfg.ext_u = true;
+cpu->cfg.ext_s = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.ext_zfh = true;
+cpu->cfg.mmu = true;
+cpu->cfg.ext_xtheadba = true;
+cpu->cfg.ext_xtheadbb = true;
+cpu->cfg.ext_xtheadbs = true;
+cpu->cfg.ext_xtheadcmo = true;
+cpu->cfg.ext_xtheadcondmov = true;
+cpu->cfg.ext_xtheadfmemidx = true;
+cpu->cfg.ext_xtheadmac = true;
+cpu->cfg.ext_xtheadmemidx = true;
+cpu->cfg.ext_xtheadmempair = true;
+cpu->cfg.ext_xtheadsync = true;
+cpu->cfg.ext_xtheadxmae = true;
+
+cpu->cfg.mvendorid = THEAD_VENDOR_ID;
+}
+
 static void rv128_base_cpu_init(Object *obj)
 {
 if (qemu_tcg_mttcg_enabled()) {
@@ -1311,6 +1341,7 @@ static const TypeInfo riscv_cpu_type_infos[] = {
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_E51,   rv64_sifive_e_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_U54,   rv64_sifive_u_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SHAKTI_C, rv64_sifive_u_cpu_init),
+DEFINE_CPU(TYPE_RISCV_CPU_THEAD_C906,   rv64_thead_c906_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_BASE128,  rv128_base_cpu_init),
 #endif
 };
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 897962f107..28184bbe40 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -27,6 +27,7 @@
 #include "qom/object.h"
 #include "qemu/int128.h"
 #include "cpu_bits.h"
+#include "cpu_vendorid.h"
 
 #define TCG_GUEST_DEFAULT_MO 0
 
@@ -53,6 +54,7 @@
 #define TYPE_RISCV_CPU_SIFIVE_E51   RISCV_CPU_TYPE_NAME("sifive-e51")
 #define TYPE_RISCV_CPU_SIFIVE_U34   RISCV_CPU_TYPE_NAME("sifive-u34")
 #define TYPE_RISCV_CPU_SIFIVE_U54   RISCV_CPU_TYPE_NAME("sifive-u54")
+#define TYPE_RISCV_CPU_THEAD_C906   RISCV_CPU_TYPE_NAME("thead-c906")
 #define TYPE_RISCV_CPU_HOST RISCV_CPU_TYPE_NAME("host")
 
 #if defined(TARGET_RISCV32)
diff --git a/target/riscv/cpu_vendorid.h b/target/riscv/cpu_vendorid.h
new file mode 100644
index 00..a5aa249bc9
--- /dev/null
+++ b/target/riscv/cpu_vendorid.h
@@ -0,0 +1,6 @@
+#ifndef TARGET_RISCV_CPU_VENDORID_H
+#define TARGET_RISCV_CPU_VENDORID_H
+
+#define THEAD_VENDOR_ID 0x5b7
+
+#endif /*  TARGET_RISCV_CPU_VENDORID_H */
-- 
2.38.1




[PATCH v2 12/15] RISC-V: Set minimum priv version for Zfh to 1.11

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

There are no differences for floating-point instructions between priv
versions 1.11 and 1.12. There is also no dependency of Zfh on priv
version 1.12. Therefore, allow Zfh to be enabled for priv version 1.11.

Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index bb310755b1..a38127365e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -76,7 +76,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
 ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
 ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
ext_zihintpause),
-ISA_EXT_DATA_ENTRY(zfh, true, PRIV_VERSION_1_12_0, ext_zfh),
+ISA_EXT_DATA_ENTRY(zfh, true, PRIV_VERSION_1_11_0, ext_zfh),
 ISA_EXT_DATA_ENTRY(zfhmin, true, PRIV_VERSION_1_12_0, ext_zfhmin),
 ISA_EXT_DATA_ENTRY(zfinx, true, PRIV_VERSION_1_12_0, ext_zfinx),
 ISA_EXT_DATA_ENTRY(zdinx, true, PRIV_VERSION_1_12_0, ext_zdinx),
-- 
2.38.1




[PATCH v2 00/15] Add support for the T-Head vendor extensions

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This series introduces support for the T-Head vendor extensions,
which are implemented e.g. in the XuanTie C906 and XuanTie C910:
* XTheadBa
* XTheadBb
* XTheadBs
* XTheadCmo
* XTheadCondMov
* XTheadFMemIdx
* XTheadFmv
* XTheadMac
* XTheadMemIdx
* XTheadMemPair
* XTheadSync

The xthead* extensions are documented here:
  https://github.com/T-head-Semi/thead-extension-spec/releases/latest

The "th." instruction prefix prevents future conflicts with standard
extensions and has been documented in the RISC-V toolchain conventions:
  https://github.com/riscv-non-isa/riscv-toolchain-conventions

Note that the T-Head vendor extensions do not contain all
vendor-specific functionality of the T-Head SoCs
(e.g. no vendor-specific CSRs are included).
Instead, the extensions cover coherent functionality
that is exposed to S and U mode.

To enable the extensions above, the following two methods are possible:
* add the extension to the arch string
  E.g. QEMU_CPU="any,xtheadcmo=true,xtheadsync=true"
* implicitly select the extensions via CPU selection
  E.g. QEMU_CPU="thead-c906"

Major changes in v2:
- Add ISA_EXT_DATA_ENTRY()s
- Use single decoder for XThead extensions
- Simplify a lot of translation functions
- Fix RV32 behaviour
- Added XTheadFmv
- Addressed all comments of v1

Christoph Müllner (15):
  RISC-V: Adding XTheadCmo ISA extension
  RISC-V: Adding XTheadSync ISA extension
  RISC-V: Adding XTheadBa ISA extension
  RISC-V: Adding XTheadBb ISA extension
  RISC-V: Adding XTheadBs ISA extension
  RISC-V: Adding XTheadCondMov ISA extension
  RISC-V: Adding T-Head multiply-accumulate instructions
  RISC-V: Adding T-Head MemPair extension
  RISC-V: Adding T-Head MemIdx extension
  RISC-V: Adding T-Head FMemIdx extension
  RISC-V: Adding T-Head XMAE support
  RISC-V: Set minimum priv version for Zfh to 1.11
  RISC-V: Add initial support for T-Head C906
  RISC-V: Adding XTheadFmv ISA extension
  target/riscv: add a MAINTAINERS entry for XThead* extension support

 MAINTAINERS|8 +
 target/riscv/cpu.c |   57 +-
 target/riscv/cpu.h |   14 +
 target/riscv/cpu_helper.c  |6 +-
 target/riscv/cpu_vendorid.h|6 +
 target/riscv/helper.h  |1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 1089 
 target/riscv/meson.build   |1 +
 target/riscv/op_helper.c   |6 +
 target/riscv/translate.c   |   38 +-
 target/riscv/xthead.decode |  185 
 11 files changed, 1405 insertions(+), 6 deletions(-)
 create mode 100644 target/riscv/cpu_vendorid.h
 create mode 100644 target/riscv/insn_trans/trans_xthead.c.inc
 create mode 100644 target/riscv/xthead.decode

-- 
2.38.1




[PATCH v2 15/15] target/riscv: add a MAINTAINERS entry for XThead* extension support

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

The XThead* extensions are maintained by T-Head and VRULL.
Add a point of contact from each company.

Signed-off-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b270eb8e5b..38f3ab3772 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -294,6 +294,14 @@ F: include/hw/riscv/
 F: linux-user/host/riscv32/
 F: linux-user/host/riscv64/
 
+RISC-V XThead* extensions
+M: Christoph Muellner 
+M: LIU Zhiwei 
+L: qemu-ri...@nongnu.org
+S: Supported
+F: target/riscv/insn_trans/trans_xthead.c.inc
+F: target/riscv/xthead*.decode
+
 RISC-V XVentanaCondOps extension
 M: Philipp Tomsich 
 L: qemu-ri...@nongnu.org
-- 
2.38.1




[PATCH v2 14/15] RISC-V: Adding XTheadFmv ISA extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadFmv ISA extension.
The patch uses the T-Head specific decoder and translation.
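As a rough sketch of the instruction semantics (modelled on plain integers, not the TCG implementation below): th.fmv.hw.x writes a GPR into the high 32 bits of a 64-bit FP register, and th.fmv.x.hw reads them back, giving RV32 access to the upper half of a double.

```c
#include <stdint.h>

/* th.fmv.hw.x: deposit the 32-bit GPR value into fpr[63:32],
 * keeping fpr[31:0] unchanged. */
static uint64_t fmv_hw_x(uint64_t fpr, uint32_t gpr)
{
    return (fpr & 0xffffffffULL) | ((uint64_t)gpr << 32);
}

/* th.fmv.x.hw: extract fpr[63:32] into a GPR. */
static uint32_t fmv_x_hw(uint64_t fpr)
{
    return (uint32_t)(fpr >> 32);
}
```

The two operations are inverses on the upper half, which is what the deposit/extract pair in the translation code implements.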

Signed-off-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 +
 target/riscv/cpu.h |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 45 ++
 target/riscv/translate.c   |  6 +--
 target/riscv/xthead.decode |  4 ++
 5 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d3d8587710..d3f711cc41 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -114,6 +114,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadcondmov, true, PRIV_VERSION_1_11_0, 
ext_xtheadcondmov),
 ISA_EXT_DATA_ENTRY(xtheadfmemidx, true, PRIV_VERSION_1_11_0, 
ext_xtheadfmemidx),
+ISA_EXT_DATA_ENTRY(xtheadfmv, true, PRIV_VERSION_1_11_0, ext_xtheadfmv),
 ISA_EXT_DATA_ENTRY(xtheadmac, true, PRIV_VERSION_1_11_0, ext_xtheadmac),
 ISA_EXT_DATA_ENTRY(xtheadmemidx, true, PRIV_VERSION_1_11_0, 
ext_xtheadmemidx),
 ISA_EXT_DATA_ENTRY(xtheadmempair, true, PRIV_VERSION_1_11_0, 
ext_xtheadmempair),
@@ -1107,6 +1108,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadcondmov", RISCVCPU, cfg.ext_xtheadcondmov, false),
 DEFINE_PROP_BOOL("xtheadfmemidx", RISCVCPU, cfg.ext_xtheadfmemidx, false),
+DEFINE_PROP_BOOL("xtheadfmv", RISCVCPU, cfg.ext_xtheadfmv, false),
 DEFINE_PROP_BOOL("xtheadmac", RISCVCPU, cfg.ext_xtheadmac, false),
 DEFINE_PROP_BOOL("xtheadmemidx", RISCVCPU, cfg.ext_xtheadmemidx, false),
 DEFINE_PROP_BOOL("xtheadmempair", RISCVCPU, cfg.ext_xtheadmempair, false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 28184bbe40..154c16208a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -473,6 +473,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadcmo;
 bool ext_xtheadcondmov;
 bool ext_xtheadfmemidx;
+bool ext_xtheadfmv;
 bool ext_xtheadmac;
 bool ext_xtheadmemidx;
 bool ext_xtheadmempair;
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index dc1a11070e..12d5af4f75 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -52,6 +52,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADFMV(ctx) do {  \
+if (!ctx->cfg_ptr->ext_xtheadfmv) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADMAC(ctx) do {  \
 if (!ctx->cfg_ptr->ext_xtheadmac) {  \
 return false;\
@@ -457,6 +463,45 @@ static bool trans_th_fsurw(DisasContext *ctx, 
arg_th_memidx *a)
 return gen_fstore_idx(ctx, a, MO_TEUL, true);
 }
 
+/* XTheadFmv */
+
+static bool trans_th_fmv_hw_x(DisasContext *ctx, arg_th_fmv_hw_x *a)
+{
+REQUIRE_XTHEADFMV(ctx);
+REQUIRE_32BIT(ctx);
+REQUIRE_FPU;
+REQUIRE_EXT(ctx, RVD);
+
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_ZERO);
+TCGv_i64 t1 = tcg_temp_new_i64();
+
+tcg_gen_extu_tl_i64(t1, src1);
+tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rd], t1, 32, 32);
+tcg_temp_free_i64(t1);
+mark_fs_dirty(ctx);
+return true;
+}
+
+static bool trans_th_fmv_x_hw(DisasContext *ctx, arg_th_fmv_x_hw *a)
+{
+REQUIRE_XTHEADFMV(ctx);
+REQUIRE_32BIT(ctx);
+REQUIRE_FPU;
+REQUIRE_EXT(ctx, RVD);
+TCGv dst;
+TCGv_i64 t1;
+
+dst = dest_gpr(ctx, a->rd);
+t1 = tcg_temp_new_i64();
+
+tcg_gen_extract_i64(t1, cpu_fpr[a->rs1], 32, 32);
+tcg_gen_trunc_i64_tl(dst, t1);
+gen_set_gpr(ctx, a->rd, dst);
+tcg_temp_free_i64(t1);
+mark_fs_dirty(ctx);
+return true;
+}
+
 /* XTheadMac */
 
 static bool gen_th_mac(DisasContext *ctx, arg_r *a,
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index fb77df721e..1c54c3c67d 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -130,9 +130,9 @@ static bool has_xthead_p(DisasContext *ctx  
__attribute__((__unused__)))
 return ctx->cfg_ptr->ext_xtheadba || ctx->cfg_ptr->ext_xtheadbb ||
ctx->cfg_ptr->ext_xtheadbs || ctx->cfg_ptr->ext_xtheadcmo ||
ctx->cfg_ptr->ext_xtheadcondmov ||
-   ctx->cfg_ptr->ext_xtheadfmemidx || ctx->cfg_ptr->ext_xtheadmac ||
-   ctx->cfg_ptr->ext_xtheadmemidx || ctx->cfg_ptr->ext_xtheadmempair ||
-   ctx->cfg_ptr->ext_xtheadsync;
+   ctx->cfg_ptr->ext_xtheadfmemidx || ctx->cfg_ptr->ext_xtheadfmv ||
+   ctx->cfg_ptr->ext_xtheadmac || ctx->cfg_ptr->ext_xtheadmemidx ||
+   ctx->cfg_ptr->ext_xtheadmempair || ctx->cfg_ptr->ext_xtheadsync;

[PATCH v2 02/15] RISC-V: Adding XTheadSync ISA extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadSync ISA extension.
The patch uses the T-Head specific decoder and translation.

The implementation introduces a helper to execute synchronization tasks:
helper_tlb_flush_all() performs a synchronized TLB flush on all CPUs.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Use helper to synchronize CPUs and perform TLB flushes
- Change implementation to follow the latest spec update
- Use single decoder for XThead extensions
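The broadcast behaviour can be illustrated with a toy model (not QEMU code): unlike a plain sfence.vma, th.sfence.vmas flushes the TLB of every hart, not just the hart executing the instruction.

```c
#include <stdbool.h>

#define NUM_CPUS 4

/* Toy model: one "TLB valid" flag per hart. */
static bool tlb_valid[NUM_CPUS];

/* Model of helper_tlb_flush_all(): invalidate every hart's TLB,
 * which is the broadcast semantics th.sfence.vmas requires. */
static void tlb_flush_all(void)
{
    for (int i = 0; i < NUM_CPUS; i++) {
        tlb_valid[i] = false;
    }
}
```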

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 +
 target/riscv/cpu.h |  1 +
 target/riscv/helper.h  |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 86 ++
 target/riscv/op_helper.c   |  6 ++
 target/riscv/translate.c   |  2 +-
 target/riscv/xthead.decode |  9 +++
 7 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a90b82c5c5..a848836d2e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -109,6 +109,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0, ext_svnapot),
 ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
+ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
 };
 
@@ -1062,6 +1063,7 @@ static Property riscv_cpu_extensions[] = {
 
 /* Vendor-specific custom extensions */
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
+DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
 
 /* These are experimental so mark with 'x-' */
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index ad1c19f870..4d3da2acfa 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -466,6 +466,7 @@ struct RISCVCPUConfig {
 
 /* Vendor-specific custom extensions */
 bool ext_xtheadcmo;
+bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
 
 uint8_t pmu_num;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a03014fe67..ecfb8c280f 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -109,6 +109,7 @@ DEF_HELPER_1(sret, tl, env)
 DEF_HELPER_1(mret, tl, env)
 DEF_HELPER_1(wfi, void, env)
 DEF_HELPER_1(tlb_flush, void, env)
+DEF_HELPER_1(tlb_flush_all, void, env)
 #endif
 
 /* Hypervisor functions */
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index 00e75c7dca..6009d61c81 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -22,6 +22,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADSYNC(ctx) do { \
+if (!ctx->cfg_ptr->ext_xtheadsync) { \
+return false;\
+}\
+} while (0)
+
 /* XTheadCmo */
 
 static inline int priv_level(DisasContext *ctx)
@@ -87,3 +93,83 @@ NOP_PRIVCHECK(th_icache_iva, REQUIRE_XTHEADCMO, 
REQUIRE_PRIV_MHSU)
 NOP_PRIVCHECK(th_l2cache_call, REQUIRE_XTHEADCMO, REQUIRE_PRIV_MHS)
 NOP_PRIVCHECK(th_l2cache_ciall, REQUIRE_XTHEADCMO, REQUIRE_PRIV_MHS)
 NOP_PRIVCHECK(th_l2cache_iall, REQUIRE_XTHEADCMO, REQUIRE_PRIV_MHS)
+
+/* XTheadSync */
+
+static bool trans_th_sfence_vmas(DisasContext *ctx, arg_th_sfence_vmas *a)
+{
+(void) a;
+REQUIRE_XTHEADSYNC(ctx);
+
+#ifndef CONFIG_USER_ONLY
+REQUIRE_PRIV_MHS(ctx);
+decode_save_opc(ctx);
+gen_helper_tlb_flush_all(cpu_env);
+return true;
+#else
+return false;
+#endif
+}
+
+#ifndef CONFIG_USER_ONLY
+static void gen_th_sync_local(DisasContext *ctx)
+{
+/*
+ * Emulate out-of-order barriers with pipeline flush
+ * by exiting the translation block.
+ */
+gen_set_pc_imm(ctx, ctx->pc_succ_insn);
+tcg_gen_exit_tb(NULL, 0);
+ctx->base.is_jmp = DISAS_NORETURN;
+}
+#endif
+
+static bool trans_th_sync(DisasContext *ctx, arg_th_sync *a)
+{
+(void) a;
+REQUIRE_XTHEADSYNC(ctx);
+
+#ifndef CONFIG_USER_ONLY
+REQUIRE_PRIV_MHSU(ctx);
+
+/*
+ * th.sync is an out-of-order barrier.
+ */
+gen_th_sync_local(ctx);
+
+return true;
+#else
+return false;
+#endif
+}
+
+static bool trans_th_sync_i(DisasContext *ctx, arg_th_sync_i *a)
+{
+(void) a;
+REQUIRE_XTHEADSYNC(ctx);
+
+#ifndef CONFIG_USER_ONLY
+REQUIRE_PRIV_MHSU(ctx);
+
+/*
+ * th.sync.i is th.sync plus pipeline flush.
+ */
+gen_th_sync_local(ctx);
+
+return true;
+#else
+return false;
+#endif
+}
+
+static bool trans_th_sync_is(DisasContext *ctx, arg_th_sync_is *a)
+{
+/* This instruc

[PATCH v2 09/15] RISC-V: Adding T-Head MemIdx extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the T-Head MemIdx instructions.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Use single decoder for XThead extensions
- Avoid signed-bitfield-extraction by using signed immediate field imm5
- Use get_address() to calculate addresses
- Introduce helper get_th_address_indexed for rs1+(rs2&lt;&lt;imm2) addresses
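The address computation the helper performs can be sketched on plain integers (a scalar model, not the TCG code):

```c
#include <stdint.h>

/* Indexed address for XTheadMemIdx/XTheadFMemIdx accesses:
 *   addr = rs1 + (rs2 << imm2)
 * or, for the "u" variants, with rs2's low 32 bits zero-extended:
 *   addr = rs1 + (zext(rs2[31:0]) << imm2)
 */
static uint64_t th_indexed_addr(uint64_t rs1, uint64_t rs2,
                                unsigned imm2, int zext_offs)
{
    uint64_t offs = zext_offs ? (uint64_t)(uint32_t)rs2 : rs2;
    return rs1 + (offs << imm2);
}
```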
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |   2 +
 target/riscv/cpu.h |   1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 377 +
 target/riscv/translate.c   |  21 +-
 target/riscv/xthead.decode |  54 +++
 5 files changed, 454 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index de00f69710..1fbfb7ccc3 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -114,6 +114,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadcondmov, true, PRIV_VERSION_1_11_0, 
ext_xtheadcondmov),
 ISA_EXT_DATA_ENTRY(xtheadmac, true, PRIV_VERSION_1_11_0, ext_xtheadmac),
+ISA_EXT_DATA_ENTRY(xtheadmemidx, true, PRIV_VERSION_1_11_0, 
ext_xtheadmemidx),
 ISA_EXT_DATA_ENTRY(xtheadmempair, true, PRIV_VERSION_1_11_0, 
ext_xtheadmempair),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
@@ -1074,6 +1075,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadcondmov", RISCVCPU, cfg.ext_xtheadcondmov, false),
 DEFINE_PROP_BOOL("xtheadmac", RISCVCPU, cfg.ext_xtheadmac, false),
+DEFINE_PROP_BOOL("xtheadmemidx", RISCVCPU, cfg.ext_xtheadmemidx, false),
 DEFINE_PROP_BOOL("xtheadmempair", RISCVCPU, cfg.ext_xtheadmempair, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 836445115e..965dc46591 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -471,6 +471,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadcmo;
 bool ext_xtheadcondmov;
 bool ext_xtheadmac;
+bool ext_xtheadmemidx;
 bool ext_xtheadmempair;
 bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index 49314306eb..02b82ac327 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -52,6 +52,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADMEMIDX(ctx) do {   \
+if (!ctx->cfg_ptr->ext_xtheadmemidx) {   \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADMEMPAIR(ctx) do {  \
 if (!ctx->cfg_ptr->ext_xtheadmempair) {  \
 return false;\
@@ -64,6 +70,30 @@
 }\
 } while (0)
 
+/*
+ * Calculate and return the address for indexed mem operations:
+ * If !zext_offs, then the address is rs1 + (rs2 << imm2).
+ * If  zext_offs, then the address is rs1 + (zext(rs2[31:0]) << imm2).
+ */
+static TCGv get_th_address_indexed(DisasContext *ctx, int rs1, int rs2,
+   int imm2, bool zext_offs)
+{
+TCGv src2 = get_gpr(ctx, rs2, EXT_NONE);
+TCGv offs = tcg_temp_new();
+
+if (zext_offs) {
+tcg_gen_extract_tl(offs, src2, 0, 32);
+tcg_gen_shli_tl(offs, offs, imm2);
+} else {
+tcg_gen_shli_tl(offs, src2, imm2);
+}
+
+TCGv addr = get_address_indexed(ctx, rs1, offs);
+
+tcg_temp_free(offs);
+return addr;
+}
+
 /* XTheadBa */
 
 /*
@@ -396,6 +426,353 @@ static bool trans_th_mulsw(DisasContext *ctx, 
arg_th_mulsw *a)
 return gen_th_mac(ctx, a, tcg_gen_sub_tl, NULL);
 }
 
+/* XTheadMemIdx */
+
+/*
+ * Load with memop from indexed address and add (imm5 << imm2) to rs1.
+ * If !preinc, then the load address is rs1.
+ * If  preinc, then the load address is rs1 + (imm5 << imm2).
+ */
+static bool gen_load_inc(DisasContext *ctx, arg_th_meminc *a, MemOp memop,
+ bool preinc)
+{
+TCGv rd = dest_gpr(ctx, a->rd);
+TCGv addr = get_address(ctx, a->rs1, preinc ? a->imm5 << a->imm2 : 0);
+
+tcg_gen_qemu_ld_tl(rd, addr, ctx->mem_idx, memop);
+addr = get_address(ctx, a->rs1, !preinc ? a->imm5 << a->imm2 : 0);
+gen_set_gpr(ctx, a->rd, rd);
+gen_set_gpr(ctx, a->rs1, addr);
+
+return true;
+}
+
+/*
+ * Store with memop to indexed address and add (imm5 << imm2) to rs1.
+ * If !preinc, then the store ad

[PATCH v2 01/15] RISC-V: Adding XTheadCmo ISA extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadCmo ISA extension.
To avoid interfering with standard extensions, the decoder and translation
code are in their own xthead*-specific files.
Future patches should be able to easily add additional T-Head extensions.

The implementation does not have much functionality (besides accepting
the instructions and not qualifying them as illegal instructions if
the hart executes in the required privilege level for the instruction),
as QEMU does not model CPU caches and instructions are documented
to not raise any exceptions.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Explicit test for PRV_U
- Encapsulate access to env->priv in an inline function
- Use single decoder for XThead extensions
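The privilege gating can be sketched as follows (a simplified model, not the actual REQUIRE_PRIV_* macros): the cache ops are NOPs in QEMU, but they are only accepted at the privilege levels the spec names; elsewhere the decoder rejects them so they trap as illegal instructions.

```c
#include <stdbool.h>

/* Privilege encodings as in the RISC-V spec. */
enum { PRV_U = 0, PRV_S = 1, PRV_H = 2, PRV_M = 3 };

/* Some XTheadCmo ops are valid in M/H/S only... */
static bool cmo_allowed_mhs(int priv)
{
    return priv == PRV_M || priv == PRV_H || priv == PRV_S;
}

/* ...others (e.g. the clean/flush-by-VA forms) also in U mode. */
static bool cmo_allowed_mhsu(int priv)
{
    return priv == PRV_M || priv == PRV_H ||
           priv == PRV_S || priv == PRV_U;
}
```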

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 +
 target/riscv/cpu.h |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 89 ++
 target/riscv/meson.build   |  1 +
 target/riscv/translate.c   | 15 +++-
 target/riscv/xthead.decode | 38 +
 6 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_xthead.c.inc
 create mode 100644 target/riscv/xthead.decode

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 6fe176e483..a90b82c5c5 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -108,6 +108,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0, ext_svinval),
 ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0, ext_svnapot),
 ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
+ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
 };
 
@@ -1060,6 +1061,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
 
 /* Vendor-specific custom extensions */
+DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
 
 /* These are experimental so mark with 'x-' */
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 443d15a47c..ad1c19f870 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -465,6 +465,7 @@ struct RISCVCPUConfig {
 uint64_t mimpid;
 
 /* Vendor-specific custom extensions */
+bool ext_xtheadcmo;
 bool ext_XVentanaCondOps;
 
 uint8_t pmu_num;
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
new file mode 100644
index 00..00e75c7dca
--- /dev/null
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -0,0 +1,89 @@
+/*
+ * RISC-V translation routines for the T-Head vendor extensions (xthead*).
+ *
+ * Copyright (c) 2022 VRULL GmbH.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#define REQUIRE_XTHEADCMO(ctx) do {  \
+if (!ctx->cfg_ptr->ext_xtheadcmo) {  \
+return false;\
+}\
+} while (0)
+
+/* XTheadCmo */
+
+static inline int priv_level(DisasContext *ctx)
+{
+#ifdef CONFIG_USER_ONLY
+return PRV_U;
+#else
+ /* Priv level equals mem_idx -- see cpu_mmu_index. */
+return ctx->mem_idx;
+#endif
+}
+
+#define REQUIRE_PRIV_MHSU(ctx)  \
+do {\
+int priv = priv_level(ctx); \
+if (!(priv == PRV_M ||  \
+  priv == PRV_H ||  \
+  priv == PRV_S ||  \
+  priv == PRV_U)) { \
+return false;   \
+}   \
+} while (0)
+
+#define REQUIRE_PRIV_MHS(ctx)   \
+do {\
+int priv = priv_level(ctx); \
+if (!(priv == PRV_M ||  \
+  priv == PRV_H || 

[PATCH v2 11/15] RISC-V: Adding T-Head XMAE support

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the T-Head specific extended memory
attributes. Similar to Svpbmt, this support does not have much effect,
as most of the behaviour is not modelled in QEMU.

We also don't set any EDATA information, because XMAE discovery is done
using the vendor ID in the Linux kernel.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
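The PTE change in cpu_helper.c amounts to masking off the attribute bits before extracting the PPN. A sketch with illustrative constants (the shift and mask here assume the usual Sv39/Sv48 PTE layout, bits 53..10 for the PPN):

```c
#include <stdint.h>

#define PTE_PPN_SHIFT 10
#define PTE_PPN_MASK  0x3ffffffffffc00ULL   /* bits 53..10 */

/* With extended memory attributes (XMAE), the top PTE bits carry
 * attributes, so they must be masked off before extracting the PPN;
 * without XMAE (or Svpbmt/Svnapot), a plain shift suffices. */
static uint64_t pte_ppn(uint64_t pte, int has_xmae)
{
    if (has_xmae) {
        return (pte & PTE_PPN_MASK) >> PTE_PPN_SHIFT;
    }
    return pte >> PTE_PPN_SHIFT;
}
```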

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c| 2 ++
 target/riscv/cpu.h| 1 +
 target/riscv/cpu_helper.c | 6 --
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 9c31a50e90..bb310755b1 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -118,6 +118,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadmemidx, true, PRIV_VERSION_1_11_0, 
ext_xtheadmemidx),
 ISA_EXT_DATA_ENTRY(xtheadmempair, true, PRIV_VERSION_1_11_0, 
ext_xtheadmempair),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
+ISA_EXT_DATA_ENTRY(xtheadxmae, true, PRIV_VERSION_1_11_0, ext_xtheadxmae),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
 };
 
@@ -1080,6 +1081,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xtheadmemidx", RISCVCPU, cfg.ext_xtheadmemidx, false),
 DEFINE_PROP_BOOL("xtheadmempair", RISCVCPU, cfg.ext_xtheadmempair, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
+DEFINE_PROP_BOOL("xtheadxmae", RISCVCPU, cfg.ext_xtheadxmae, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
 
 /* These are experimental so mark with 'x-' */
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index c97c1c0af0..897962f107 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -475,6 +475,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadmemidx;
 bool ext_xtheadmempair;
 bool ext_xtheadsync;
+bool ext_xtheadxmae;
 bool ext_XVentanaCondOps;
 
 uint8_t pmu_num;
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 278d163803..345bb69b79 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -938,7 +938,8 @@ restart:
 
 if (riscv_cpu_sxl(env) == MXL_RV32) {
 ppn = pte >> PTE_PPN_SHIFT;
-} else if (cpu->cfg.ext_svpbmt || cpu->cfg.ext_svnapot) {
+} else if (cpu->cfg.ext_svpbmt || cpu->cfg.ext_svnapot ||
+   cpu->cfg.ext_xtheadxmae) {
 ppn = (pte & (target_ulong)PTE_PPN_MASK) >> PTE_PPN_SHIFT;
 } else {
 ppn = pte >> PTE_PPN_SHIFT;
@@ -950,7 +951,8 @@ restart:
 if (!(pte & PTE_V)) {
 /* Invalid PTE */
 return TRANSLATE_FAIL;
-} else if (!cpu->cfg.ext_svpbmt && (pte & PTE_PBMT)) {
+} else if (!cpu->cfg.ext_svpbmt && (pte & PTE_PBMT) &&
+   !cpu->cfg.ext_xtheadxmae) {
 return TRANSLATE_FAIL;
 } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
 /* Inner PTE, continue walking */
-- 
2.38.1




[PATCH v2 06/15] RISC-V: Adding XTheadCondMov ISA extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadCondMov ISA extension.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Fix invalid use of register from dest_gpr()
- Use single decoder for XThead extensions
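The instruction semantics can be stated compactly (a scalar sketch, not the TCG movcond implementation): rd is conditionally replaced with rs1, otherwise its old value is kept.

```c
#include <stdint.h>

/* th.mveqz: if (rs2 == 0) rd = rs1; */
static int64_t th_mveqz(int64_t rd_old, int64_t rs1, int64_t rs2)
{
    return rs2 == 0 ? rs1 : rd_old;
}

/* th.mvnez: if (rs2 != 0) rd = rs1; */
static int64_t th_mvnez(int64_t rd_old, int64_t rs1, int64_t rs2)
{
    return rs2 != 0 ? rs1 : rd_old;
}
```

Keeping the old rd value on the not-taken path is why the translation must read rd with get_gpr() rather than use the uninitialized register from dest_gpr() (the v2 fix).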

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 ++
 target/riscv/cpu.h |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 35 ++
 target/riscv/translate.c   |  2 +-
 target/riscv/xthead.decode |  4 +++
 5 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 17273425a8..36a53784dd 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -112,6 +112,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadbb, true, PRIV_VERSION_1_11_0, ext_xtheadbb),
 ISA_EXT_DATA_ENTRY(xtheadbs, true, PRIV_VERSION_1_11_0, ext_xtheadbs),
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
+ISA_EXT_DATA_ENTRY(xtheadcondmov, true, PRIV_VERSION_1_11_0, 
ext_xtheadcondmov),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
 };
@@ -1069,6 +1070,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xtheadbb", RISCVCPU, cfg.ext_xtheadbb, false),
 DEFINE_PROP_BOOL("xtheadbs", RISCVCPU, cfg.ext_xtheadbs, false),
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
+DEFINE_PROP_BOOL("xtheadcondmov", RISCVCPU, cfg.ext_xtheadcondmov, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
 
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5f68cb1e1e..01f035d8e9 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -469,6 +469,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadbb;
 bool ext_xtheadbs;
 bool ext_xtheadcmo;
+bool ext_xtheadcondmov;
 bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
 
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index fb1f2c5731..bf549bbd74 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -40,6 +40,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADCONDMOV(ctx) do {  \
+if (!ctx->cfg_ptr->ext_xtheadcondmov) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADSYNC(ctx) do { \
 if (!ctx->cfg_ptr->ext_xtheadsync) { \
 return false;\
@@ -272,6 +278,35 @@ NOP_PRIVCHECK(th_l2cache_call, REQUIRE_XTHEADCMO, 
REQUIRE_PRIV_MHS)
 NOP_PRIVCHECK(th_l2cache_ciall, REQUIRE_XTHEADCMO, REQUIRE_PRIV_MHS)
 NOP_PRIVCHECK(th_l2cache_iall, REQUIRE_XTHEADCMO, REQUIRE_PRIV_MHS)
 
+/* XTheadCondMov */
+
+static bool gen_th_condmove(DisasContext *ctx, arg_r *a, TCGCond cond)
+{
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+TCGv old = get_gpr(ctx, a->rd, EXT_NONE);
+TCGv dest = dest_gpr(ctx, a->rd);
+
+tcg_gen_movcond_tl(cond, dest, src2, ctx->zero, src1, old);
+
+gen_set_gpr(ctx, a->rd, dest);
+return true;
+}
+
+/* th.mveqz: "if (rs2 == 0) rd = rs1;" */
+static bool trans_th_mveqz(DisasContext *ctx, arg_th_mveqz *a)
+{
+REQUIRE_XTHEADCONDMOV(ctx);
+return gen_th_condmove(ctx, a, TCG_COND_EQ);
+}
+
+/* th.mvnez: "if (rs2 != 0) rd = rs1;" */
+static bool trans_th_mvnez(DisasContext *ctx, arg_th_mveqz *a)
+{
+REQUIRE_XTHEADCONDMOV(ctx);
+return gen_th_condmove(ctx, a, TCG_COND_NE);
+}
+
 /* XTheadSync */
 
 static bool trans_th_sfence_vmas(DisasContext *ctx, arg_th_sfence_vmas *a)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index fc326e0a79..f15883b16b 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -129,7 +129,7 @@ static bool has_xthead_p(DisasContext *ctx  
__attribute__((__unused__)))
 {
 return ctx->cfg_ptr->ext_xtheadba || ctx->cfg_ptr->ext_xtheadbb ||
ctx->cfg_ptr->ext_xtheadbs || ctx->cfg_ptr->ext_xtheadcmo ||
-   ctx->cfg_ptr->ext_xtheadsync;
+   ctx->cfg_ptr->ext_xtheadcondmov || ctx->cfg_ptr->ext_xtheadsync;
 }
 
 #define MATERIALISE_EXT_PREDICATE(ext)  \
diff --git a/target/riscv/xthead.decode b/target/riscv/xthead.decode
index 8494805611..a8ebd8a18b 100644
--- a/target/riscv/xthead.decode
+++ b/target/riscv/xthead.decode
@@ -84,6 +84,10 @@ th_l2cache_call  000 10101 0 000 0 0001011
 th_l2cache_ciall 000 10111 0 000 0 0001011
 th_l2cache_iall  000 10110 0 000 0 

[PATCH v2 10/15] RISC-V: Adding T-Head FMemIdx extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the T-Head FMemIdx instructions.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Use single decoder for XThead extensions
- Use get_th_address_indexed for address calculations
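One detail worth calling out: 32-bit loads into the 64-bit FP registers are NaN-boxed, as the gen_nanbox_s() call in the diff below does. A sketch of that rule:

```c
#include <stdint.h>

/* RISC-V NaN-boxing: a 32-bit float stored in a 64-bit FP register
 * (e.g. after th.flrw) has its upper 32 bits set to all-ones. */
static uint64_t nanbox_s(uint32_t bits)
{
    return 0xffffffff00000000ULL | bits;
}
```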

Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |   2 +
 target/riscv/cpu.h |   1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 108 +
 target/riscv/translate.c   |   3 +-
 target/riscv/xthead.decode |  10 ++
 5 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1fbfb7ccc3..9c31a50e90 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -113,6 +113,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadbs, true, PRIV_VERSION_1_11_0, ext_xtheadbs),
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadcondmov, true, PRIV_VERSION_1_11_0, 
ext_xtheadcondmov),
+ISA_EXT_DATA_ENTRY(xtheadfmemidx, true, PRIV_VERSION_1_11_0, 
ext_xtheadfmemidx),
 ISA_EXT_DATA_ENTRY(xtheadmac, true, PRIV_VERSION_1_11_0, ext_xtheadmac),
 ISA_EXT_DATA_ENTRY(xtheadmemidx, true, PRIV_VERSION_1_11_0, 
ext_xtheadmemidx),
 ISA_EXT_DATA_ENTRY(xtheadmempair, true, PRIV_VERSION_1_11_0, 
ext_xtheadmempair),
@@ -1074,6 +1075,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xtheadbs", RISCVCPU, cfg.ext_xtheadbs, false),
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadcondmov", RISCVCPU, cfg.ext_xtheadcondmov, false),
+DEFINE_PROP_BOOL("xtheadfmemidx", RISCVCPU, cfg.ext_xtheadfmemidx, false),
 DEFINE_PROP_BOOL("xtheadmac", RISCVCPU, cfg.ext_xtheadmac, false),
 DEFINE_PROP_BOOL("xtheadmemidx", RISCVCPU, cfg.ext_xtheadmemidx, false),
 DEFINE_PROP_BOOL("xtheadmempair", RISCVCPU, cfg.ext_xtheadmempair, false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 965dc46591..c97c1c0af0 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -470,6 +470,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadbs;
 bool ext_xtheadcmo;
 bool ext_xtheadcondmov;
+bool ext_xtheadfmemidx;
 bool ext_xtheadmac;
 bool ext_xtheadmemidx;
 bool ext_xtheadmempair;
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index 02b82ac327..dc1a11070e 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -46,6 +46,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADFMEMIDX(ctx) do {  \
+if (!ctx->cfg_ptr->ext_xtheadfmemidx) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADMAC(ctx) do {  \
 if (!ctx->cfg_ptr->ext_xtheadmac) {  \
 return false;\
@@ -349,6 +355,108 @@ static bool trans_th_mvnez(DisasContext *ctx, 
arg_th_mveqz *a)
 return gen_th_condmove(ctx, a, TCG_COND_NE);
 }
 
+/* XTheadFMem */
+
+/*
+ * Load 64-bit float from indexed address.
+ * If !zext_offs, then address is rs1 + (rs2 << imm2).
+ * If  zext_offs, then address is rs1 + (zext(rs2[31:0]) << imm2).
+ */
+static bool gen_fload_idx(DisasContext *ctx, arg_th_memidx *a, MemOp memop,
+  bool zext_offs)
+{
+TCGv_i64 rd = cpu_fpr[a->rd];
+TCGv addr = get_th_address_indexed(ctx, a->rs1, a->rs2, a->imm2, 
zext_offs);
+
+tcg_gen_qemu_ld_i64(rd, addr, ctx->mem_idx, memop);
+if ((memop & MO_SIZE) == MO_32) {
+gen_nanbox_s(rd, rd);
+}
+
+mark_fs_dirty(ctx);
+return true;
+}
+
+/*
+ * Store 64-bit float to indexed address.
+ * If !zext_offs, then address is rs1 + (rs2 << imm2).
+ * If  zext_offs, then address is rs1 + (zext(rs2[31:0]) << imm2).
+ */
+static bool gen_fstore_idx(DisasContext *ctx, arg_th_memidx *a, MemOp memop,
+   bool zext_offs)
+{
+TCGv_i64 rd = cpu_fpr[a->rd];
+TCGv addr = get_th_address_indexed(ctx, a->rs1, a->rs2, a->imm2, 
zext_offs);
+
+tcg_gen_qemu_st_i64(rd, addr, ctx->mem_idx, memop);
+
+return true;
+}
+
+static bool trans_th_flrd(DisasContext *ctx, arg_th_memidx *a)
+{
+REQUIRE_XTHEADFMEMIDX(ctx);
+REQUIRE_FPU;
+REQUIRE_EXT(ctx, RVD);
+return gen_fload_idx(ctx, a, MO_TEUQ, false);
+}
+
+static bool trans_th_flrw(DisasContext *ctx, arg_th_memidx *a)
+{
+REQUIRE_XTHEADFMEMIDX(ctx);
+REQUIRE_FPU;
+REQUIRE_EXT(ctx, RVF);
+return gen_fload_idx(ctx, a, MO_TEUL, false);
+}
+
+static bool trans_th_flurd(DisasContext *ctx, arg_th_memidx *a)
+{
+REQUIRE_XTHEADFMEMIDX(ctx);
+REQUIRE_FPU;
+REQUIRE_EXT(ctx, RVD);
+return gen_fload_idx(ctx, 

[PATCH v2 05/15] RISC-V: Adding XTheadBs ISA extension

2022-12-23 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadBs ISA extension.
The patch uses the T-Head specific decoder and translation.

Changes in v2:
- Add ISA_EXT_DATA_ENTRY()
- Split XtheadB* extension into individual commits
- Use single decoder for XThead extensions

Co-developed-by: Philipp Tomsich 
Co-developed-by: LIU Zhiwei 
Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c |  2 ++
 target/riscv/cpu.h |  1 +
 target/riscv/insn_trans/trans_xthead.c.inc | 15 +++
 target/riscv/translate.c   |  3 ++-
 target/riscv/xthead.decode |  3 +++
 5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index b5285fb7a7..17273425a8 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -110,6 +110,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(svpbmt, true, PRIV_VERSION_1_12_0, ext_svpbmt),
 ISA_EXT_DATA_ENTRY(xtheadba, true, PRIV_VERSION_1_11_0, ext_xtheadba),
 ISA_EXT_DATA_ENTRY(xtheadbb, true, PRIV_VERSION_1_11_0, ext_xtheadbb),
+ISA_EXT_DATA_ENTRY(xtheadbs, true, PRIV_VERSION_1_11_0, ext_xtheadbs),
 ISA_EXT_DATA_ENTRY(xtheadcmo, true, PRIV_VERSION_1_11_0, ext_xtheadcmo),
 ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
 ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
@@ -1066,6 +1067,7 @@ static Property riscv_cpu_extensions[] = {
 /* Vendor-specific custom extensions */
 DEFINE_PROP_BOOL("xtheadba", RISCVCPU, cfg.ext_xtheadba, false),
 DEFINE_PROP_BOOL("xtheadbb", RISCVCPU, cfg.ext_xtheadbb, false),
+DEFINE_PROP_BOOL("xtheadbs", RISCVCPU, cfg.ext_xtheadbs, false),
 DEFINE_PROP_BOOL("xtheadcmo", RISCVCPU, cfg.ext_xtheadcmo, false),
 DEFINE_PROP_BOOL("xtheadsync", RISCVCPU, cfg.ext_xtheadsync, false),
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0ac1d3f5ef..5f68cb1e1e 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -467,6 +467,7 @@ struct RISCVCPUConfig {
 /* Vendor-specific custom extensions */
 bool ext_xtheadba;
 bool ext_xtheadbb;
+bool ext_xtheadbs;
 bool ext_xtheadcmo;
 bool ext_xtheadsync;
 bool ext_XVentanaCondOps;
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index a55d1491fa..fb1f2c5731 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -28,6 +28,12 @@
 }\
 } while (0)
 
+#define REQUIRE_XTHEADBS(ctx) do {   \
+if (!ctx->cfg_ptr->ext_xtheadbs) {   \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_XTHEADCMO(ctx) do {  \
 if (!ctx->cfg_ptr->ext_xtheadcmo) {  \
 return false;\
@@ -191,6 +197,15 @@ static bool trans_th_tstnbz(DisasContext *ctx, 
arg_th_tstnbz *a)
 return gen_unary(ctx, a, EXT_ZERO, gen_th_tstnbz);
 }
 
+/* XTheadBs */
+
+/* th.tst is an alternate encoding for bexti (from Zbs) */
+static bool trans_th_tst(DisasContext *ctx, arg_th_tst *a)
+{
+REQUIRE_XTHEADBS(ctx);
+return gen_shift_imm_tl(ctx, a, EXT_NONE, gen_bext);
+}
+
 /* XTheadCmo */
 
 static inline int priv_level(DisasContext *ctx)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 8439ff0bf4..fc326e0a79 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -128,7 +128,8 @@ static bool always_true_p(DisasContext *ctx  
__attribute__((__unused__)))
 static bool has_xthead_p(DisasContext *ctx  __attribute__((__unused__)))
 {
 return ctx->cfg_ptr->ext_xtheadba || ctx->cfg_ptr->ext_xtheadbb ||
-   ctx->cfg_ptr->ext_xtheadcmo || ctx->cfg_ptr->ext_xtheadsync;
+   ctx->cfg_ptr->ext_xtheadbs || ctx->cfg_ptr->ext_xtheadcmo ||
+   ctx->cfg_ptr->ext_xtheadsync;
 }
 
 #define MATERIALISE_EXT_PREDICATE(ext)  \
diff --git a/target/riscv/xthead.decode b/target/riscv/xthead.decode
index 8cd140891b..8494805611 100644
--- a/target/riscv/xthead.decode
+++ b/target/riscv/xthead.decode
@@ -58,6 +58,9 @@ th_rev   101 0 . 001 . 0001011 @r2
 th_revw  1001000 0 . 001 . 0001011 @r2
 th_tstnbz100 0 . 001 . 0001011 @r2
 
+# XTheadBs
+th_tst   100010 .. . 001 . 0001011 @sh6
+
 # XTheadCmo
 th_dcache_call   000 1 0 000 0 0001011
 th_dcache_ciall  000 00011 0 000 0 0001011
-- 
2.38.1




Re: [PATCH v4 00/12] Compiler warning fixes for libvhost-user,libvduse

2022-12-23 Thread Michael S. Tsirkin
On Fri, Dec 23, 2022 at 04:21:33PM +0100, Paolo Bonzini wrote:
> On 12/22/22 21:36, Marcel Holtmann wrote:
> > The libvhost-user and libvduse libraries are also useful for external
> > usage outside of QEMU and thus it would be nice if their files could
> > be just copied and used. However due to different compiler settings, a
> > lot of manual fixups are needed. This is the first attempt at some
> > obvious fixes that can be done without any harm to the code and its
> > readability.
> > 
> > Marcel Holtmann (12):
> >libvhost-user: Provide _GNU_SOURCE when compiling outside of QEMU
> >libvhost-user: Replace typeof with __typeof__
> >libvhost-user: Cast rc variable to avoid compiler warning
> >libvhost-user: Use unsigned int i for some for-loop iterations
> >libvhost-user: Declare uffdio_register early to make it C90 compliant
> >libvhost-user: Change dev->postcopy_ufd assignment to make it C90 
> > compliant
> >libvduse: Provide _GNU_SOURCE when compiling outside of QEMU
> >libvduse: Switch to unsigned int for inuse field in struct VduseVirtq
> >libvduse: Fix assignment in vring_set_avail_event
> >libvhost-user: Fix assignment in vring_set_avail_event
> >libvhost-user: Add extra compiler warnings
> >libvduse: Add extra compiler warnings
> > 
> >   subprojects/libvduse/libvduse.c   |  9 --
> >   subprojects/libvduse/meson.build  |  8 -
> >   subprojects/libvhost-user/libvhost-user.c | 36 +--
> >   subprojects/libvhost-user/meson.build |  8 -
> >   4 files changed, 42 insertions(+), 19 deletions(-)
> > 
> 
> Looks good, but what happened to "libvhost-user: Switch to unsigned int for
> inuse field in struct VuVirtq"?
> 
> (I can pick it up from v3, no need to respin).
> 
> Paolo

I merged that one IIRC.
Paolo, I wondered whether, if you are going to be merging patches in
these areas, you want to be added to MAINTAINERS.

-- 
MST




[PULL v2 0/6] testing updates

2022-12-23 Thread Alex Bennée
The following changes since commit 222059a0fccf4af3be776fe35a5ea2d6a68f9a0b:

  Merge tag 'pull-ppc-20221221' of https://gitlab.com/danielhb/qemu into 
staging (2022-12-21 18:08:09 +)

are available in the Git repository at:

  https://gitlab.com/stsquad/qemu.git tags/pull-testing-next-231222-1

for you to fetch changes up to 3b4f911921e4233df0ba78d4acd2077da0b144ef:

  gitlab-ci: Disable docs and GUIs for the build-tci and build-tcg-disabled 
jobs (2022-12-23 15:17:13 +)


testing updates:

  - fix minor shell-ism that can break check-tcg
  - turn off verbose logging on custom runners
  - make configure echo call in CI
  - fix unused variable in linux-test
  - add binary compiler docker image for hexagon
  - disable doc and gui builds for tci and disable-tcg builds


Alex Bennée (3):
  gitlab: turn off verbose logging for make check on custom runners
  configure: repeat ourselves for the benefit of CI
  tests/tcg: fix unused variable in linux-test

Mukilan Thiyagarajan (2):
  configure: Fix check-tcg not executing any tests
  tests/docker: use prebuilt toolchain for debian-hexagon-cross

Thomas Huth (1):
  gitlab-ci: Disable docs and GUIs for the build-tci and build-tcg-disabled 
jobs

 configure  |  11 +-
 tests/tcg/multiarch/linux/linux-test.c |   6 +-
 .gitlab-ci.d/buildtest.yml |  10 +-
 .gitlab-ci.d/container-cross.yml   |  22 +---
 .gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml |  12 +-
 .../custom-runners/ubuntu-22.04-aarch32.yml|   2 +-
 .../custom-runners/ubuntu-22.04-aarch64.yml|  12 +-
 MAINTAINERS|   1 -
 tests/docker/Makefile.include  |   4 -
 .../debian-hexagon-cross.d/build-toolchain.sh  | 141 -
 .../docker/dockerfiles/debian-hexagon-cross.docker |  53 +++-
 11 files changed, 47 insertions(+), 227 deletions(-)
 delete mode 100755 
tests/docker/dockerfiles/debian-hexagon-cross.d/build-toolchain.sh

-- 
2.34.1




Re: [PATCH v3 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-23 Thread Alexander Graf

Hey Cornelia,

On 23.12.22 13:30, Cornelia Huck wrote:

On Fri, Dec 23 2022, Alexander Graf  wrote:


Up to now, the finalize_gic_version() code open-coded what is essentially
a support-bitmap match between the host/emulation environment and the
desired target GIC type.

This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making
everything a two-pass process. First, we determine which GIC versions the
current environment is able to support; then we go through a single
state machine to determine which target GIC mode that means for us.

After this patch, the only user-noticeable changes should be consolidated
error messages, as well as TCG -M virt supporting -smp > 8 automatically.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

   - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation

v2 -> v3:

   - Fix comment
   - Flip kvm-enabled logic for host around
---
  hw/arm/virt.c | 198 ++
  include/hw/arm/virt.h |  15 ++--
  2 files changed, 112 insertions(+), 101 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea2413a0ba..6d27f044fe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  }
  }
  
+static VirtGICType finalize_gic_version_do(const char *accel_name,

+   VirtGICType gic_version,
+   int gics_supported,
+   unsigned int max_cpus)
+{
+/* Convert host/max/nosel to GIC version number */
+switch (gic_version) {
+case VIRT_GIC_VERSION_HOST:
+if (!kvm_enabled()) {
+error_report("gic-version=host requires KVM");
+exit(1);
+}
+
+/* For KVM, gic-version=host means gic-version=max */
+return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+   gics_supported, max_cpus);

I think I'd still rather use /* fallthrough */ here, but let's leave
that decision to the maintainers.



I originally had a fallthrough here, then looked at the code and
concluded for myself that I dislike fallthroughs :). They make already
complicated code flows even harder to follow and are error-prone.



In any case,

Reviewed-by: Cornelia Huck 

[As an aside, we have a QEMU_FALLTHROUGH #define that maps to
__attribute__((fallthrough)) if available, but unlike the Linux kernel,
we didn't bother to convert everything to use it in QEMU. Should we?
Would using the attribute give us some extra benefits?]



IMHO we'd be better off just refactoring code in ways that don't
require fall-throughs. Modern compilers inline functions pretty well, so
I think there's very little reason for them anymore.


Thanks a lot for the reviews!


Alex





Re: [PATCH v3] intel-iommu: Document iova_tree

2022-12-23 Thread Peter Xu
On Fri, Dec 23, 2022 at 03:48:01PM +0800, Jason Wang wrote:
> On Wed, Dec 7, 2022 at 6:13 AM Peter Xu  wrote:
> >
> > It seems not super clear on when iova_tree is used, and why.  Add a rich
> > comment above iova_tree to track why we needed the iova_tree, and when we
> > need it.
> >
> > Also comment for the map/unmap messages, on how they're used and
> > implications (e.g. unmap can be larger than the mapped ranges).
> >
> > Suggested-by: Jason Wang 
> > Signed-off-by: Peter Xu 
> > ---
> > v3:
> > - Adjust according to Eric's comment
> > ---
> >  include/exec/memory.h | 28 ++
> >  include/hw/i386/intel_iommu.h | 38 ++-
> >  2 files changed, 65 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 91f8a2395a..269ecb873b 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -129,6 +129,34 @@ struct IOMMUTLBEntry {
> >  /*
> >   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
> >   * register with one or multiple IOMMU Notifier capability bit(s).
> > + *
> > + * Normally there're two use cases for the notifiers:
> > + *
> > + *   (1) When the device needs accurate synchronizations of the vIOMMU page
> > + *   tables, it needs to register with both MAP|UNMAP notifies (which
> > + *   is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).
> > + *
> > + *   Regarding to accurate synchronization, it's when the notified
> > + *   device maintains a shadow page table and must be notified on each
> > + *   guest MAP (page table entry creation) and UNMAP (invalidation)
> > + *   events (e.g. VFIO). Both notifications must be accurate so that
> > + *   the shadow page table is fully in sync with the guest view.
> > + *
> > + *   (2) When the device doesn't need accurate synchronizations of the
> > + *   vIOMMU page tables, it needs to register only with UNMAP or
> > + *   DEVIOTLB_UNMAP notifies.
> > + *
> > + *   It's when the device maintains a cache of IOMMU translations
> > + *   (IOTLB) and is able to fill that cache by requesting translations
> > + *   from the vIOMMU through a protocol similar to ATS (Address
> > + *   Translation Service).
> > + *
> > + *   Note that in this mode the vIOMMU will not maintain a shadowed
> > + *   page table for the address space, and the UNMAP messages can be
> > + *   actually larger than the real invalidations (just like how the
> > + *   Linux IOMMU driver normally works, where an invalidation can be
> > + *   enlarged as long as it still covers the target range).  The IOMMU
> 
> Just spot this when testing your fix for DSI:
> 
> assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> 
> Do we need to remove this (but it seems a partial revert of
> 03c7140c1a0336af3d4fca768de791b9c0e2b128)?

Replied in the other thread.

I assume this documentation patch is still correct, am I right?  It's
talking about the possibility of enlarged invalidation range sent from the
guest and vIOMMU.  That should still not be bigger than the registered
range in iommu notifiers (even if bigger than the actual unmapped range).

Thanks,

-- 
Peter Xu




Re: [PATCH 3/3] intel-iommu: build iova tree during IOMMU translation

2022-12-23 Thread Peter Xu
On Fri, Dec 23, 2022 at 04:02:29PM +0800, Jason Wang wrote:
> On Tue, Dec 6, 2022 at 9:58 PM Peter Xu  wrote:
> >
> > On Tue, Dec 06, 2022 at 11:18:03AM +0800, Jason Wang wrote:
> > > On Tue, Dec 6, 2022 at 7:19 AM Peter Xu  wrote:
> > > >
> > > > Jason,
> > > >
> > > > On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> > > > > I'm fine to go without iova-tree. Would you mind to post patches for
> > > > > fix? I can test and include it in this series then.
> > > >
> > > > One sample patch attached, only compile tested.
> > >
> > > I don't see any direct connection between the attached patch and the
> > > intel-iommu?
> >
> > Sorry!  Wrong tree dumped...  Trying again.
> 
> The HWADDR breaks memory_region_notify_iommu_one():
> 
> qemu-system-x86_64: ../softmmu/memory.c:1991:
> memory_region_notify_iommu_one: Assertion `entry->iova >=
> notifier->start && entry_end <= notifier->end' failed.
> 
> I wonder if we need either:
> 
> 1) remove the assert

I vote for this one.  Besides removing the assertion, we should probably
crop the range too, just like the dev-iotlb unmaps do?

Thanks,

> 
> or
> 
> 2) introduce a new memory_region_notify_unmap_all() to unmap from
> notifier->start to notifier->end.

-- 
Peter Xu




Re: [PATCH 2/2] tests/migration: add support for ppc64le for guestperf.py

2022-12-23 Thread Daniel Henrique Barboza




On 8/8/22 21:24, Murilo Opsfelder Araujo wrote:

Add support for ppc64le for guestperf.py. On ppc, console is usually
hvc0 and serial device for pseries machine is spapr-vty.

Signed-off-by: Murilo Opsfelder Araujo 
---


Reviewed-by: Daniel Henrique Barboza 


  tests/migration/guestperf/engine.py | 28 +---
  1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/tests/migration/guestperf/engine.py 
b/tests/migration/guestperf/engine.py
index 87a6ab2009..88da516899 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -282,6 +282,26 @@ def _migrate(self, hardware, scenario, src, dst, 
connect_uri):
  resp = src.command("stop")
  paused = True
  
+def _is_ppc64le(self):

+_, _, _, _, machine = os.uname()
+if machine == "ppc64le":
+return True
+return False
+
+def _get_guest_console_args(self):
+if self._is_ppc64le():
+return "console=hvc0"
+else:
+return "console=ttyS0"
+
+def _get_qemu_serial_args(self):
+if self._is_ppc64le():
+return ["-chardev", "stdio,id=cdev0",
+"-device", "spapr-vty,chardev=cdev0"]
+else:
+return ["-chardev", "stdio,id=cdev0",
+"-device", "isa-serial,chardev=cdev0"]
+
  def _get_common_args(self, hardware, tunnelled=False):
  args = [
  "noapic",
@@ -290,8 +310,10 @@ def _get_common_args(self, hardware, tunnelled=False):
  "noreplace-smp",
  "cgroup_disable=memory",
  "pci=noearly",
-"console=ttyS0",
  ]
+
+args.append(self._get_guest_console_args())
+
  if self._debug:
  args.append("debug")
  else:
@@ -309,12 +331,12 @@ def _get_common_args(self, hardware, tunnelled=False):
  "-kernel", self._kernel,
  "-initrd", self._initrd,
  "-append", cmdline,
-"-chardev", "stdio,id=cdev0",
-"-device", "isa-serial,chardev=cdev0",
  "-m", str((hardware._mem * 1024) + 512),
  "-smp", str(hardware._cpus),
  ]
  
+argv.extend(self._get_qemu_serial_args())

+
  if self._debug:
  argv.extend(["-device", "sga"])
  




Re: [PATCH 1/2] tests/migration: add sysprof-capture-4 as dependency for stress binary

2022-12-23 Thread Daniel Henrique Barboza

Until it's upstream or rejected, no patch will be left behind.


I wasn't able to compile tests/migration/stress at all without this patch,
regardless of having the sysprof-4 libraries installed on the host.


Reviewed-by: Daniel Henrique Barboza 


Juan/Dr. David, if you don't mind, I'll take this via ppc-next since
there's a PPC-only change that depends on it.



Daniel


On 8/8/22 21:24, Murilo Opsfelder Araujo wrote:

`make tests/migration/stress` fails with:

 FAILED: tests/migration/stress
 cc -m64 -mlittle-endian  -o tests/migration/stress 
tests/migration/stress.p/stress.c.o -Wl,--as-needed -Wl,--no-undefined -pie 
-Wl,--warn-common -Wl,-z,relro -Wl,-z,now -fstack-protector-strong -static 
-pthread -Wl,--start-group -lgthread-2.0 -lglib-2.0 -Wl,--end-group
 /usr/bin/ld: 
/usr/lib/gcc/ppc64le-redhat-linux/11/../../../../lib64/libglib-2.0.a(gutils.c.o):
 in function `.annobin_gutils.c':
 (.text+0x3b4): warning: Using 'getpwuid' in statically linked applications 
requires at runtime the shared libraries from the glibc version used for linking
 /usr/bin/ld: (.text+0x178): warning: Using 'getpwnam_r' in statically 
linked applications requires at runtime the shared libraries from the glibc 
version used for linking
 /usr/bin/ld: (.text+0x1bc): warning: Using 'getpwuid_r' in statically 
linked applications requires at runtime the shared libraries from the glibc 
version used for linking
 /usr/bin/ld: 
/usr/lib/gcc/ppc64le-redhat-linux/11/../../../../lib64/libglib-2.0.a(gthread.c.o):(.toc+0x0):
 undefined reference to `sysprof_clock'
 /usr/bin/ld: 
/usr/lib/gcc/ppc64le-redhat-linux/11/../../../../lib64/libglib-2.0.a(gtrace.c.o):
 in function `.annobin_gtrace.c':
 (.text+0x24): undefined reference to `sysprof_collector_mark_vprintf'
 /usr/bin/ld: 
/usr/lib/gcc/ppc64le-redhat-linux/11/../../../../lib64/libglib-2.0.a(gtrace.c.o):
 in function `g_trace_define_int64_counter':
 (.text+0x8c): undefined reference to `sysprof_collector_request_counters'
 /usr/bin/ld: (.text+0x108): undefined reference to 
`sysprof_collector_define_counters'
 /usr/bin/ld: 
/usr/lib/gcc/ppc64le-redhat-linux/11/../../../../lib64/libglib-2.0.a(gtrace.c.o):
 in function `g_trace_set_int64_counter':
 (.text+0x23c): undefined reference to `sysprof_collector_set_counters'
 /usr/bin/ld: 
/usr/lib/gcc/ppc64le-redhat-linux/11/../../../../lib64/libglib-2.0.a(gspawn.c.o):(.toc+0x0):
 undefined reference to `sysprof_clock'
 /usr/bin/ld: 
/usr/lib/gcc/ppc64le-redhat-linux/11/../../../../lib64/libglib-2.0.a(gmain.c.o):(.toc+0x0):
 undefined reference to `sysprof_clock'
 collect2: error: ld returned 1 exit status
 ninja: build stopped: subcommand failed.
 make: *** [Makefile:162: run-ninja] Error 1

Add sysprof-capture-4 as dependency for stress binary.

Tested on:
   - CentOS Stream 9 ppc64le
   - Fedora 36 x86_64

Signed-off-by: Murilo Opsfelder Araujo 
---
  tests/migration/meson.build | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/migration/meson.build b/tests/migration/meson.build
index f215ee7d3a..3fd87f7849 100644
--- a/tests/migration/meson.build
+++ b/tests/migration/meson.build
@@ -1,7 +1,9 @@
+sysprof = dependency('sysprof-capture-4')
+
  stress = executable(
'stress',
files('stress.c'),
-  dependencies: [glib],
+  dependencies: [glib, sysprof],
link_args: ['-static'],
build_by_default: false,
  )




Re: [RFC v4 3/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread David Hildenbrand

On 23.12.22 15:23, Chuang Xu wrote:

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8260 CPU
   - NVIDIA Mellanox ConnectX-5
- VM
   - 32 CPUs 128GB RAM VM
   - 8 16-queue vhost-net device
   - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

In test2, we keep the number of the device the same as test1, reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8260 CPU
   - NVIDIA Mellanox ConnectX-5
- VM
   - 32 CPUs 128GB RAM VM
   - 8 1-queue vhost-net device
   - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms

In test3, we keep the number of queues per device the same as test1, reduce
the number of devices:

Here are the test3 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8260 CPU
   - NVIDIA Mellanox ConnectX-5
- VM
   - 32 CPUs 128GB RAM VM
   - 1 16-queue vhost-net device
   - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Signed-off-by: Chuang Xu 
---
  migration/savevm.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..19785e5a54 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2617,6 +2617,9 @@ int qemu_loadvm_state_main(QEMUFile *f, 
MigrationIncomingState *mis)
  uint8_t section_type;
  int ret = 0;
  
+/* call memory_region_transaction_begin() before loading vmstate */



I'd suggest extending the comment to explain *why* you are doing that:
that it's a pure performance optimization, and how it achieves it.


--
Thanks,

David / dhildenb




Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2022-12-23 Thread Peter Xu
Hi, Paolo,

On Fri, Dec 23, 2022 at 04:47:57PM +0100, Paolo Bonzini wrote:
> On 12/23/22 15:23, Chuang Xu wrote:
> >   static inline FlatView *address_space_to_flatview(AddressSpace *as)
> >   {
> > +/*
> > + * Before using any flatview, sanity check we're not during a memory
> > + * region transaction or the map can be invalid.  Note that this can
> > + * also be called during commit phase of memory transaction, but that
> > + * should also only happen when the depth decreases to 0 first.
> > + */
> > +assert(memory_region_transaction_get_depth() == 0 || 
> > rcu_read_locked());
> >   return qatomic_rcu_read(&as->current_map);
> >   }
> 
> This is not valid because the transaction could happen in *another* thread.
> In that case memory_region_transaction_depth() will be > 0, but RCU is
> needed.

Do you mean the code is wrong, or the comment?  Note that the code checks
rcu_read_locked(), which was introduced in patch 1, but maybe something
else was missed?

Thanks,

-- 
Peter Xu




Re: [RFC v4 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread Peter Xu
Chuang,

On Fri, Dec 23, 2022 at 10:23:04PM +0800, Chuang Xu wrote:
> In this version:
> 
> - attach more information in the cover letter.
> - remove changes on virtio_load().
> - add rcu_read_locked() to detect holding of rcu lock.
> 
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
> 
> This patch packs all the changes to memory region during the period of
> loading non-iterable vmstate in a single memory transaction. With the
> increase of devices, this patch will greatly improve the performance.
> 
> Here are the test1 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 16-queue vhost-net device
>   - 16 4-queue vhost-user-blk device.
> 
>         time of loading non-iterable vmstate   downtime
> before  about 150 ms                           740+ ms
> after   about 30 ms                            630+ ms

Have you investigated why multi-queue adds so much downtime overhead in
the same environment, compared to [1] below?

> 
> (This result is different from that of v1. It may be that someone has
> changed something on my host, but it does not affect the demonstration
> of the optimization effect.)
> 
> 
> In test2, we keep the number of the device the same as test1, reduce the 
> number of queues per device:
> 
> Here are the test2 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 1-queue vhost-net device
>   - 16 1-queue vhost-user-blk device.
> 
>         time of loading non-iterable vmstate   downtime
> before  about 90 ms                            about 250 ms
> after   about 25 ms                            about 160 ms

[1]

> 
> In test3, we keep the number of queues per device the same as test1, reduce 
> the number of devices:
> 
> Here are the test3 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 1 16-queue vhost-net device
>   - 1 4-queue vhost-user-blk device.
> 
>         time of loading non-iterable vmstate   downtime
> before  about 20 ms                            about 70 ms
> after   about 11 ms                            about 60 ms
> 
> 
> As we can see from the test results above, both the number of queues and 
> the number of devices have a great impact on the time of loading non-iterable 
> vmstate. The growth of the number of devices and queues will lead to more 
> mr commits, and the time consumption caused by the flatview reconstruction 
> will also increase.

The downtime measured in precopy can be more complicated than in postcopy,
because the time of the switchover is calculated by qemu based on the
downtime setup, and it also contains part of the RAM migration.  Postcopy
should be more accurate there, because no calculation is done and no RAM
is transferred during the downtime.

However, postcopy downtime is not measured accurately either, as
implemented in postcopy_start(): the downtime is taken right after we
flush the packed data, and right below that there's an idea for
optimizing it:

if (migrate_postcopy_ram()) {
/*
 * Although this ping is just for debug, it could potentially be
 * used for getting a better measurement of downtime at the source.
 */
qemu_savevm_send_ping(ms->to_dst_file, 4);
}

So maybe I'll have a look there.

Besides the above, I'm personally happy with the series; there's one
trivial comment in patch 2, but it's not a huge deal.  I don't expect
you'll get any more comments before the end of this year, but let's wait
until after the Xmas holiday.

Thanks!

-- 
Peter Xu




Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2022-12-23 Thread Paolo Bonzini

On 12/23/22 15:23, Chuang Xu wrote:

  static inline FlatView *address_space_to_flatview(AddressSpace *as)
  {
+/*
+ * Before using any flatview, sanity check we're not during a memory
+ * region transaction or the map can be invalid.  Note that this can
+ * also be called during commit phase of memory transaction, but that
+ * should also only happen when the depth decreases to 0 first.
+ */
+assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked());
  return qatomic_rcu_read(&as->current_map);
  }


This is not valid because the transaction could happen in *another* 
thread.  In that case memory_region_transaction_depth() will be > 0, but 
RCU is needed.


Paolo




Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2022-12-23 Thread Peter Xu
On Fri, Dec 23, 2022 at 10:23:06PM +0800, Chuang Xu wrote:
> Before using any flatview, sanity check we're not during a memory
> region transaction or the map can be invalid.
> 
> Signed-off-by: Chuang Xu 
> ---
>  include/exec/memory.h | 9 +
>  softmmu/memory.c  | 5 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 91f8a2395a..66c43b4862 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1069,8 +1069,17 @@ struct FlatView {
>  MemoryRegion *root;
>  };
>  
> +int memory_region_transaction_get_depth(void);
> +
>  static inline FlatView *address_space_to_flatview(AddressSpace *as)
>  {
> +/*
> + * Before using any flatview, sanity check we're not during a memory
> + * region transaction or the map can be invalid.  Note that this can
> + * also be called during commit phase of memory transaction, but that
> + * should also only happen when the depth decreases to 0 first.

Nitpick: after adding the RCU check the comment may need slight touch up:

* Meanwhile it's safe to access current_map with RCU read lock held
* even if during a memory transaction.  It means the user can bear
* with an obsolete map.

> + */
> +assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked());
>  return qatomic_rcu_read(&as->current_map);
>  }
>  
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index bc0be3f62c..01192e2e5b 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1116,6 +1116,11 @@ void memory_region_transaction_commit(void)
> }
>  }
>  
> +int memory_region_transaction_get_depth(void)
> +{
> +return memory_region_transaction_depth;
> +}
> +
>  static void memory_region_destructor_none(MemoryRegion *mr)
>  {
>  }
> -- 
> 2.20.1
> 

-- 
Peter Xu




Re: [PATCH v4 00/12] Compiler warning fixes for libvhost-user,libvduse

2022-12-23 Thread Marcel Holtmann
Hi Paolo,

>> The libvhost-user and libvduse libraries are also useful for external
>> usage outside of QEMU and thus it would be nice if their files could
>> be just copied and used. However due to different compiler settings, a
>> lot of manual fixups are needed. This is the first attempt at some
>> obvious fixes that can be done without any harm to the code and its
>> readability.
>> Marcel Holtmann (12):
>>   libvhost-user: Provide _GNU_SOURCE when compiling outside of QEMU
>>   libvhost-user: Replace typeof with __typeof__
>>   libvhost-user: Cast rc variable to avoid compiler warning
>>   libvhost-user: Use unsigned int i for some for-loop iterations
>>   libvhost-user: Declare uffdio_register early to make it C90 compliant
>>   libvhost-user: Change dev->postcopy_ufd assignment to make it C90 compliant
>>   libvduse: Provide _GNU_SOURCE when compiling outside of QEMU
>>   libvduse: Switch to unsigned int for inuse field in struct VduseVirtq
>>   libvduse: Fix assignment in vring_set_avail_event
>>   libvhost-user: Fix assignment in vring_set_avail_event
>>   libvhost-user: Add extra compiler warnings
>>   libvduse: Add extra compiler warnings
>>  subprojects/libvduse/libvduse.c   |  9 --
>>  subprojects/libvduse/meson.build  |  8 -
>>  subprojects/libvhost-user/libvhost-user.c | 36 +--
>>  subprojects/libvhost-user/meson.build |  8 -
>>  4 files changed, 42 insertions(+), 19 deletions(-)
> 
> Looks good, but what happened to "libvhost-user: Switch to unsigned int for 
> inuse field in struct VuVirtq"?
> 
> (I can pick it up from v3, no need to respin).

I found that it was already upstream and thus I removed it.

Regards

Marcel




Re: [PATCH v4 00/12] Compiler warning fixes for libvhost-user,libvduse

2022-12-23 Thread Paolo Bonzini

On 12/22/22 21:36, Marcel Holtmann wrote:

The libvhost-user and libvduse libraries are also useful for external
usage outside of QEMU and thus it would be nice if their files could
be just copied and used. However due to different compiler settings, a
lot of manual fixups are needed. This is the first attempt at some
obvious fixes that can be done without any harm to the code and its
readability.

Marcel Holtmann (12):
   libvhost-user: Provide _GNU_SOURCE when compiling outside of QEMU
   libvhost-user: Replace typeof with __typeof__
   libvhost-user: Cast rc variable to avoid compiler warning
   libvhost-user: Use unsigned int i for some for-loop iterations
   libvhost-user: Declare uffdio_register early to make it C90 compliant
   libvhost-user: Change dev->postcopy_ufd assignment to make it C90 compliant
   libvduse: Provide _GNU_SOURCE when compiling outside of QEMU
   libvduse: Switch to unsigned int for inuse field in struct VduseVirtq
   libvduse: Fix assignment in vring_set_avail_event
   libvhost-user: Fix assignment in vring_set_avail_event
   libvhost-user: Add extra compiler warnings
   libvduse: Add extra compiler warnings

  subprojects/libvduse/libvduse.c   |  9 --
  subprojects/libvduse/meson.build  |  8 -
  subprojects/libvhost-user/libvhost-user.c | 36 +--
  subprojects/libvhost-user/meson.build |  8 -
  4 files changed, 42 insertions(+), 19 deletions(-)



Looks good, but what happened to "libvhost-user: Switch to unsigned int 
for inuse field in struct VuVirtq"?


(I can pick it up from v3, no need to respin).

Paolo




Re: [PULL v2 07/14] accel/tcg: Use interval tree for user-only page tracking

2022-12-23 Thread Ilya Leoshkevich
On Tue, Dec 20, 2022 at 09:03:06PM -0800, Richard Henderson wrote:
> Finish weaning user-only away from PageDesc.
> 
> Using an interval tree to track page permissions means that
> we can represent very large regions efficiently.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
> Reviewed-by: Alex Bennée 
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/internal.h   |   4 +-
>  accel/tcg/tb-maint.c   |  20 +-
>  accel/tcg/user-exec.c  | 615 ++---
>  tests/tcg/multiarch/test-vma.c |  22 ++
>  4 files changed, 451 insertions(+), 210 deletions(-)
>  create mode 100644 tests/tcg/multiarch/test-vma.c

Hi,

After staring at vma-pthread.c failures for some time, I finally
spotted a few lines here that look suspicious.



>  int page_get_flags(target_ulong address)
>  {
> -PageDesc *p;
> +PageFlagsNode *p = pageflags_find(address, address);
>  
> -p = page_find(address >> TARGET_PAGE_BITS);
> -if (!p) {
> +/*
> + * See util/interval-tree.c re lockless lookups: no false positives but
> + * there are false negatives.  If we find nothing, retry with the mmap
> + * lock acquired.
> + */
> +if (p) {
> +return p->flags;
> +}
> +if (have_mmap_lock()) {
>  return 0;
>  }
> -return p->flags;
> +
> +mmap_lock();
> +p = pageflags_find(address, address);
> +mmap_unlock();

How does the code ensure that p is not freed here?

> +return p ? p->flags : 0;
> +}



>  int page_check_range(target_ulong start, target_ulong len, int flags)
>  {
> -PageDesc *p;
> -target_ulong end;
> -target_ulong addr;
> -
> -/*
> - * This function should never be called with addresses outside the
> - * guest address space.  If this assert fires, it probably indicates
> - * a missing call to h2g_valid.
> - */
> -if (TARGET_ABI_BITS > L1_MAP_ADDR_SPACE_BITS) {
> -assert(start < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
> -}
> +target_ulong last;
>  
>  if (len == 0) {
> -return 0;
> -}
> -if (start + len - 1 < start) {
> -/* We've wrapped around.  */
> -return -1;
> +return 0;  /* trivial length */
>  }
>  
> -/* must do before we loose bits in the next step */
> -end = TARGET_PAGE_ALIGN(start + len);
> -start = start & TARGET_PAGE_MASK;
> +last = start + len - 1;
> +if (last < start) {
> +return -1; /* wrap around */
> +}
> +
> +while (true) {
> +PageFlagsNode *p = pageflags_find(start, last);

We can end up here without mmap_lock if we come from the syscall code.
Do we need a retry like in page_get_flags()?
Or would it make sense to just take mmap_lock in lock_user()?

Speaking of which: does lock_user() actually guarantee that it's safe
to access the respective pages until unlock_user()? If yes, doesn't
this mean that mmap_lock must be held between the two? And if no, and
the SEGV handler is already supposed to gracefully handle SEGVs in
syscall.c, do we need to call access_ok_untagged() there at all?

> +int missing;
>  
> -for (addr = start, len = end - start;
> - len != 0;
> - len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
> -p = page_find(addr >> TARGET_PAGE_BITS);
>  if (!p) {
> -return -1;
> +return -1; /* entire region invalid */
>  }
> -if (!(p->flags & PAGE_VALID)) {
> -return -1;
> +if (start < p->itree.start) {
> +return -1; /* initial bytes invalid */
>  }
>  
> -if ((flags & PAGE_READ) && !(p->flags & PAGE_READ)) {
> -return -1;
> +missing = flags & ~p->flags;
> +if (missing & PAGE_READ) {
> +return -1; /* page not readable */
>  }
> -if (flags & PAGE_WRITE) {
> +if (missing & PAGE_WRITE) {
>  if (!(p->flags & PAGE_WRITE_ORG)) {
> +return -1; /* page not writable */
> +}
> +/* Asking about writable, but has been protected: undo. */
> +if (!page_unprotect(start, 0)) {
>  return -1;
>  }
> -/* unprotect the page if it was put read-only because it
> -   contains translated code */
> -if (!(p->flags & PAGE_WRITE)) {
> -if (!page_unprotect(addr, 0)) {
> -return -1;
> -}
> +/* TODO: page_unprotect should take a range, not a single page. */
> +if (last - start < TARGET_PAGE_SIZE) {
> +return 0; /* ok */
>  }
> +start += TARGET_PAGE_SIZE;
> +continue;
>  }
> +
> +if (last <= p->itree.last) {
> +return 0; /* ok */
> +}
> +s

Re: [PATCH v2] pflash: Only read non-zero parts of backend image

2022-12-23 Thread Ard Biesheuvel
On Tue, 20 Dec 2022 at 16:33, Gerd Hoffmann  wrote:
>
> On Tue, Dec 20, 2022 at 10:30:43AM +0100, Philippe Mathieu-Daudé wrote:
> > [Extending to people using UEFI VARStore on Virt machines]
> >
> > On 20/12/22 09:42, Gerd Hoffmann wrote:
> > > From: Xiang Zheng 
> > >
> > > Currently we fill the VIRT_FLASH memory space with two 64MB NOR images
> > > when using persistent UEFI variables on the virt board. Actually we
> > > only use a very small (non-zero) part of the memory, while the
> > > remaining, significantly larger (zero) part is wasted.
> > >
> > > So this patch checks the block status and only writes the non-zero part
> > > into memory. This requires pflash devices to use sparse files for
> > > backends.
> >
> > I like the idea, but I'm not sure how to relate with NOR flash devices.
> >
> > From the block layer, we get BDRV_BLOCK_ZERO when a block is fully
> > filled by zeroes ('\0').
> >
> > We don't want to waste host memory, I get it.
> >
> > Now what "sees" the guest? Is the UEFI VARStore filled with zeroes?
>
> The varstore is filled with 0xff.  It's 768k in size.  The padding
> following (63M plus a bit) is 0x00.  To be exact:
>
> kraxel@sirius ~# hex /usr/share/edk2/aarch64/vars-template-pflash.raw
>   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  
> 0010  8d 2b f1 ff  96 76 8b 4c  a9 85 27 47  07 5b 4f 50  .+...v.L..'G.[OP
> 0020  00 00 0c 00  00 00 00 00  5f 46 56 48  ff fe 04 00  _FVH
> 0030  48 00 28 09  00 00 00 02  03 00 00 00  00 00 04 00  H.(.
> 0040  00 00 00 00  00 00 00 00  78 2c f3 aa  7b 94 9a 43  x,..{..C
> 0050  a1 80 2e 14  4e c3 77 92  b8 ff 03 00  5a fe 00 00  N.w.Z...
> 0060  00 00 00 00  ff ff ff ff  ff ff ff ff  ff ff ff ff  
> 0070  ff ff ff ff  ff ff ff ff  ff ff ff ff  ff ff ff ff  
> *
> 0004  2b 29 58 9e  68 7c 7d 49  a0 ce 65 00  fd 9f 1b 95  +)X.h|}I..e.
> 00040010  5b e7 c6 86  fe ff ff ff  e0 ff 03 00  00 00 00 00  [...
> 00040020  ff ff ff ff  ff ff ff ff  ff ff ff ff  ff ff ff ff  
> *
> 000c  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  
> *
>
> > If so, is it a EDK2 specific case for all virt machines?  This would
> > be a virtualization optimization and in that case, this patch would
> > work.
>
> vars-template-pflash.raw (padded image) is simply QEMU_VARS.fd (unpadded
> image) with 'truncate --size 64M' applied.
>
> Yes, that's a pure virtual machine thing.  On physical hardware you
> would probably just flash the first 768k and leave the remaining flash
> capacity untouched.
>
> > * or you are trying to optimize paravirtualized guests.
>
> This.  Ideally without putting everything upside-down.
>
> >   In that case why insist with emulated NOR devices? Why not have EDK2
> >   directly use a paravirtualized block driver which we can optimize /
> >   tune without interfering with emulated models?
>
> While that probably would work for the variable store (I think we could
> very well do with variable store not being mapped and requiring explicit
> read/write requests) that idea is not going to work very well for the
> firmware code which must be mapped into the address space.  pflash is
> almost the only device we have which serves that need.  The only other
> option I can see would be a rom (the code is usually mapped r/o anyway),
> but that has pretty much the same problem space.  We would likewise want
> a big enough fixed-size ROM, to avoid live migration problems and all
> that, and we want the unused space not to waste memory.
>
> > Keeping insisting on optimizing guests using the QEMU pflash device
> > seems wrong to me. I'm pretty sure we can do better optimizing clouds
> > payloads.
>
> Moving away from pflash for EFI variable storage would cause a lot of
> churn through the whole stack.  firmware, qemu, libvirt, upper
> management, all affected.  Is that worth the trouble?  Using pflash
> isn't that much of a problem IMHO.
>

Agreed. pflash is a bit clunky but not a huge problem atm (although
setting up and tearing down the r/o memslot for every read resp. write
results in some performance issues under kvm/arm64)

*If* we decide to replace it, I would suggest an emulated ROM for the
executable image (without any emulated programming facility
whatsoever) and a paravirtualized get/setvariable interface which can
be used in a sane way to virtualize secure boot without having to
emulate SMM or other secure world firmware interfaces.



[RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2022-12-23 Thread Chuang Xu
Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 9 +
 softmmu/memory.c  | 5 +
 2 files changed, 14 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..66c43b4862 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+int memory_region_transaction_get_depth(void);
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+/*
+ * Before using any flatview, sanity check we're not during a memory
+ * region transaction or the map can be invalid.  Note that this can
+ * also be called during commit phase of memory transaction, but that
+ * should also only happen when the depth decreases to 0 first.
+ */
+assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked());
 return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..01192e2e5b 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1116,6 +1116,11 @@ void memory_region_transaction_commit(void)
}
 }
 
+int memory_region_transaction_get_depth(void)
+{
+return memory_region_transaction_depth;
+}
+
 static void memory_region_destructor_none(MemoryRegion *mr)
 {
 }
-- 
2.20.1




[RFC v4 1/3] rcu: introduce rcu_read_locked()

2022-12-23 Thread Chuang Xu
Add rcu_read_locked() to detect holding of the RCU read lock.

Signed-off-by: Chuang Xu 
---
 include/qemu/rcu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index b063c6fde8..42cbd0080f 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
 }
 }
 
+static inline bool rcu_read_locked(void)
+{
+struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+return p_rcu_reader->depth > 0;
+}
+
 extern void synchronize_rcu(void);
 
 /*
-- 
2.20.1




[RFC v4 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread Chuang Xu
In this version:

- attach more information in the cover letter.
- remove changes on virtio_load().
- add rcu_read_locked() to detect holding of rcu lock.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

(This result is different from that of v1. It may be that someone has
changed something on my host... but it does not affect the demonstration
of the optimization effect.)


In test2, we keep the number of devices the same as in test1 and reduce
the number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms



In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms


As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading
non-iterable vmstate. As the number of devices and queues grows, there are
more mr commits, and the time spent on flatview reconstruction increases
accordingly.

Please review, Chuang.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms




[RFC v4 3/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

In test2, we keep the number of devices the same as in test1 and reduce
the number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms

In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading
non-iterable vmstate. As the number of devices and queues grows, there are
more mr commits, and the time spent on flatview reconstruction increases
accordingly.

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..19785e5a54 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2617,6 +2617,9 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 uint8_t section_type;
 int ret = 0;
 
+/* call memory_region_transaction_begin() before loading vmstate */
+memory_region_transaction_begin();
+
 retry:
 while (true) {
 section_type = qemu_get_byte(f);
@@ -2684,6 +2687,10 @@ out:
 goto retry;
 }
 }
+
+/* call memory_region_transaction_commit() after loading vmstate */
+memory_region_transaction_commit();
+
 return ret;
 }
 
-- 
2.20.1




Re: [PATCH v2 2/5] target/riscv: Update VS timer whenever htimedelta changes

2022-12-23 Thread Anup Patel
On Thu, Dec 15, 2022 at 8:55 AM Alistair Francis  wrote:
>
> On Mon, Dec 12, 2022 at 9:12 PM Anup Patel  wrote:
> >
> > On Mon, Dec 12, 2022 at 11:23 AM Alistair Francis  
> > wrote:
> > >
> > > On Thu, Dec 8, 2022 at 6:41 PM Anup Patel  wrote:
> > > >
> > > > On Thu, Dec 8, 2022 at 9:00 AM Alistair Francis  
> > > > wrote:
> > > > >
> > > > > On Tue, Nov 8, 2022 at 11:07 PM Anup Patel  
> > > > > wrote:
> > > > > >
> > > > > > The htimedelta[h] CSR has impact on the VS timer comparison so we
> > > > > > should call riscv_timer_write_timecmp() whenever htimedelta changes.
> > > > > >
> > > > > > Fixes: 3ec0fe18a31f ("target/riscv: Add vstimecmp suppor")
> > > > > > Signed-off-by: Anup Patel 
> > > > > > Reviewed-by: Alistair Francis 
> > > > >
> > > > > This patch breaks my Xvisor test. When running OpenSBI and Xvisor 
> > > > > like this:
> > > > >
> > > > > qemu-system-riscv64 -machine virt \
> > > > > -m 1G -serial mon:stdio -serial null -nographic \
> > > > > -append 'vmm.console=uart@1000 vmm.bootcmd="vfs mount initrd
> > > > > /;vfs run /boot.xscript;vfs cat /system/banner.txt; guest kick guest0;
> > > > > vserial bind guest0/uart0"' \
> > > > > -smp 4 -d guest_errors \
> > > > > -bios none \
> > > > > -device loader,file=./images/qemuriscv64/vmm.bin,addr=0x8020 \
> > > > > -kernel ./images/qemuriscv64/fw_jump.elf \
> > > > > -initrd ./images/qemuriscv64/vmm-disk-linux.img -cpu rv64,h=true
> > > > >
> > > > > Running:
> > > > >
> > > > > Xvisor v0.3.0-129-gbc33f339 (Jan  1 1970 00:00:00)
> > > > >
> > > > > I see this failure:
> > > > >
> > > > > INIT: bootcmd:  guest kick guest0
> > > > >
> > > > > guest0: Kicked
> > > > >
> > > > > INIT: bootcmd:  vserial bind guest0/uart0
> > > > >
> > > > > [guest0/uart0] cpu_vcpu_stage2_map: guest_phys=0x3B9AC000
> > > > > size=0x4096 map failed
> > > > >
> > > > > do_error: CPU3: VCPU=guest0/vcpu0 page fault failed (error -1)
> > > > >
> > > > >zero=0x  ra=0x80001B4E
> > > > >
> > > > >  sp=0x8001CF80  gp=0x
> > > > >
> > > > >  tp=0x  s0=0x8001CFB0
> > > > >
> > > > >  s1=0x  a0=0x10001048
> > > > >
> > > > >  a1=0x  a2=0x00989680
> > > > >
> > > > >  a3=0x3B9ACA00  a4=0x0048
> > > > >
> > > > >  a5=0x  a6=0x00019000
> > > > >
> > > > >  a7=0x  s2=0x
> > > > >
> > > > >  s3=0x  s4=0x
> > > > >
> > > > >  s5=0x  s6=0x
> > > > >
> > > > >  s7=0x  s8=0x
> > > > >
> > > > >  s9=0x s10=0x
> > > > >
> > > > > s11=0x  t0=0x4000
> > > > >
> > > > >  t1=0x0100  t2=0x
> > > > >
> > > > >  t3=0x  t4=0x
> > > > >
> > > > >  t5=0x  t6=0x
> > > > >
> > > > >sepc=0x80001918 sstatus=0x00024120
> > > > >
> > > > > hstatus=0x0002002001C0 sp_exec=0x10A64000
> > > > >
> > > > >  scause=0x0017   stval=0x3B9ACAF8
> > > > >
> > > > >   htval=0x0EE6B2BE  htinst=0x00D03021
> > > > >
> > > > > I have tried updating to a newer Xvisor release, but with that I don't
> > > > > get any serial output.
> > > > >
> > > > > Can you help get the Xvisor tests back up and running?
> > > >
> > > > I tried the latest Xvisor-next (https://github.com/avpatel/xvisor-next)
> > > > with your QEMU riscv-to-apply.next branch and it works fine (both
> > > > with and without Sstc).
> > >
> > > Does it work with the latest release?
> >
> > Yes, the latest Xvisor-next repo works for QEMU v7.2.0-rc4 and
> > your riscv-to-apply.next branch (commit 51bb9de2d188)
>
> I can't get anything to work with this patch. I have dropped this and
> the patches after this.
>
> I'm building the latest Xvisor release with:
>
> export CROSS_COMPILE=riscv64-linux-gnu-
> ARCH=riscv make generic-64b-defconfig
> make
>
> and running it as above, yet nothing. What am I missing here?

I tried multiple times with the latest Xvisor on different machines but
still can't reproduce the issue you are seeing.

We generally provide pre-built binaries with every Xvisor release
so I will share with you pre-built binaries of the upcoming Xvisor-0.3.2
release. Maybe that would help you?

Regards,
Anup

>
> Alistair
>
> >
> > Regards,
> > Anup
> >
> > >
> > > Alistair
> > >
> > > >
> > > > Here's the QEMU command which I use:
> > > >
> > > > qemu-system-riscv64 -M virt -m 512M -nographic \
> > > > -bios opensbi/build/p

Re: [PATCH 13/15] hw/riscv/spike.c: simplify create_fdt()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:29 AM Daniel Henrique Barboza
 wrote:
>
> 'mem_size' and 'cmdline' aren't being used and the MachineState pointer
> is being retrieved via a MACHINE() macro.
>
> Remove 'mem_size' and 'cmdline' and add MachineState as a parameter.

Why do you add MachineState as a parameter? What's the problem of
using the MACHINE() macro?

>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/spike.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
> index 2b9af5689e..181bf394a0 100644
> --- a/hw/riscv/spike.c
> +++ b/hw/riscv/spike.c
> @@ -48,15 +48,14 @@ static const MemMapEntry spike_memmap[] = {
>  [SPIKE_DRAM] = { 0x8000,0x0 },
>  };
>
> -static void create_fdt(SpikeState *s, const MemMapEntry *memmap,
> -   uint64_t mem_size, const char *cmdline, bool 
> is_32_bit)
> +static void create_fdt(MachineState *mc, SpikeState *s,
> +   const MemMapEntry *memmap, bool is_32_bit)
>  {
>  void *fdt;
>  int fdt_size;
>  uint64_t addr, size;
>  unsigned long clint_addr;
>  int cpu, socket;
> -MachineState *mc = MACHINE(s);
>  uint32_t *clint_cells;
>  uint32_t cpu_phandle, intc_phandle, phandle = 1;
>  char *name, *mem_name, *clint_name, *clust_name;
> @@ -254,8 +253,7 @@ static void spike_board_init(MachineState *machine)
>  mask_rom);
>
>  /* Create device tree */
> -create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline,
> -   riscv_is_32bit(&s->soc[0]));
> +create_fdt(machine, s, memmap, riscv_is_32bit(&s->soc[0]));
>
>  /*
>   * Not like other RISC-V machines that use plain binary bios images,
> --

Regards,
Bin



Re: [PATCH 12/15] hw/riscv/boot.c: make riscv_load_initrd() static

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:25 AM Daniel Henrique Barboza
 wrote:
>
> The only remaining caller is riscv_load_kernel() which belongs to the
> same file.
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c | 76 -
>  include/hw/riscv/boot.h |  1 -
>  2 files changed, 38 insertions(+), 39 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH 11/15] hw/riscv/boot.c: consolidate all kernel init in riscv_load_kernel()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:29 AM Daniel Henrique Barboza
 wrote:
>
> The microchip_icicle_kit, sifive_u, spike and virt boards are now doing
> the same steps when '-kernel' is used:
>
> - execute load_kernel()
> - load init_rd()
> - write kernel_cmdline
>
> Let's fold everything inside riscv_load_kernel() to avoid code
> repetition. Every other board that uses riscv_load_kernel() will have
> this same behavior, including boards that don't have a valid FDT, so
> we need to take care to not do FDT operations without checking it first.
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c| 21 ++---
>  hw/riscv/microchip_pfsoc.c |  9 -
>  hw/riscv/sifive_u.c|  9 -
>  hw/riscv/spike.c   |  9 -
>  hw/riscv/virt.c|  9 -
>  5 files changed, 18 insertions(+), 39 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH v3 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-23 Thread Cornelia Huck
On Fri, Dec 23 2022, Alexander Graf  wrote:

> Up to now, the finalize_gic_version() code open coded what is essentially
> a support bitmap match between host/emulation environment and desired
> target GIC type.
>
> This open coding leads to undesirable side effects. For example, a VM with
> KVM and -smp 10 will automatically choose GICv3 while the same command
> line with TCG will stay on GICv2 and fail the launch.
>
> This patch combines the TCG and KVM matching code paths by making
> everything a 2 pass process. First, we determine which GIC versions the
> current environment is able to support, then we go through a single
> state machine to determine which target GIC mode that means for us.
>
> After this patch, the only user-noticeable changes should be consolidated
> error messages as well as TCG -M virt supporting -smp > 8 automatically.
>
> Signed-off-by: Alexander Graf 
>
> ---
>
> v1 -> v2:
>
>   - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation
>
> v2 -> v3:
>
>   - Fix comment
>   - Flip kvm-enabled logic for host around
> ---
>  hw/arm/virt.c | 198 ++
>  include/hw/arm/virt.h |  15 ++--
>  2 files changed, 112 insertions(+), 101 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ea2413a0ba..6d27f044fe 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
>  }
>  }
>  
> +static VirtGICType finalize_gic_version_do(const char *accel_name,
> +   VirtGICType gic_version,
> +   int gics_supported,
> +   unsigned int max_cpus)
> +{
> +/* Convert host/max/nosel to GIC version number */
> +switch (gic_version) {
> +case VIRT_GIC_VERSION_HOST:
> +if (!kvm_enabled()) {
> +error_report("gic-version=host requires KVM");
> +exit(1);
> +}
> +
> +/* For KVM, gic-version=host means gic-version=max */
> +return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
> +   gics_supported, max_cpus);

I think I'd still rather use /* fallthrough */ here, but let's leave
that decision to the maintainers.

In any case,

Reviewed-by: Cornelia Huck 

[As an aside, we have a QEMU_FALLTHROUGH #define that maps to
__attribute__((fallthrough)) if available, but unlike the Linux kernel,
we didn't bother to convert everything to use it in QEMU. Should we?
Would using the attribute give us some extra benefits?]

> +case VIRT_GIC_VERSION_MAX:
> +if (gics_supported & VIRT_GIC_VERSION_4_MASK) {
> +gic_version = VIRT_GIC_VERSION_4;
> +} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
> +gic_version = VIRT_GIC_VERSION_3;
> +} else {
> +gic_version = VIRT_GIC_VERSION_2;
> +}
> +break;
> +case VIRT_GIC_VERSION_NOSEL:
> +if ((gics_supported & VIRT_GIC_VERSION_2_MASK) &&
> +max_cpus <= GIC_NCPU) {
> +gic_version = VIRT_GIC_VERSION_2;
> +} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
> +/*
> + * in case the host does not support v2 emulation or
> + * the end-user requested more than 8 VCPUs we now default
> + * to v3. In any case defaulting to v2 would be broken.
> + */
> +gic_version = VIRT_GIC_VERSION_3;
> +} else if (max_cpus > GIC_NCPU) {
> +error_report("%s only supports GICv2 emulation but more than 8 "
> + "vcpus are requested", accel_name);
> +exit(1);
> +}
> +break;
> +case VIRT_GIC_VERSION_2:
> +case VIRT_GIC_VERSION_3:
> +case VIRT_GIC_VERSION_4:
> +break;
> +}




Re: [PATCH] linux-user: Improve strace output of personality() and sysinfo()

2022-12-23 Thread Helge Deller

On 12/23/22 12:01, Philippe Mathieu-Daudé wrote:

On 23/12/22 11:53, Helge Deller wrote:

On 12/23/22 11:50, Philippe Mathieu-Daudé wrote:

On 23/12/22 11:01, Helge Deller wrote:

Make the strace look nicer for those two syscalls.

Signed-off-by: Helge Deller 
---
  linux-user/strace.list | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index f9254725a1..909298099e 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1043,7 +1043,7 @@
  { TARGET_NR_perfctr, "perfctr" , NULL, NULL, NULL },
  #endif
  #ifdef TARGET_NR_personality
-{ TARGET_NR_personality, "personality" , NULL, NULL, NULL },
+{ TARGET_NR_personality, "personality" , "%s(%p)", NULL, 
print_syscall_ret_addr },


Shouldn't this be:

    { TARGET_NR_personality, "personality" , "%s(%u)", NULL, NULL },


Basically yes, but...
it's a bitmap, so printing it as a hex value (similar to a pointer)
is easier to read/analyze.


Oh, good point. Then "%s(0x"TARGET_ABI_FMT_lx")" is self-explanatory.


Hmm ... I don't see any benefit for the user there, and the output is the same.


Also for clarity:

#define print_syscall_ret_persona print_syscall_ret_addr

So what do you think of:

{ TARGET_NR_personality, "personality" , "%s(0x"TARGET_ABI_FMT_lx")",
    NULL, print_syscall_ret_persona },


This change would make sense if someone fully implemented the
strace of personality(), including showing the flags as strings.
Until then it's IMHO just one more function which uses space
and gains nothing.

Helge



Re: [PATCH] tests/tcg/multiarch: add vma-pthread.c

2022-12-23 Thread Ilya Leoshkevich
On Fri, 2022-12-23 at 13:02 +0100, Ilya Leoshkevich wrote:
> Add a test that locklessly changes and exercises page protection bits
> from various threads. This helps catch race conditions in the VMA
> handling.
> 
> Signed-off-by: Ilya Leoshkevich 
> ---
>  tests/tcg/multiarch/Makefile.target  |   3 +
>  tests/tcg/multiarch/munmap-pthread.c |  16 +--
>  tests/tcg/multiarch/nop_func.h   |  25 
>  tests/tcg/multiarch/vma-pthread.c    | 185
> +++
>  4 files changed, 214 insertions(+), 15 deletions(-)
>  create mode 100644 tests/tcg/multiarch/nop_func.h
>  create mode 100644 tests/tcg/multiarch/vma-pthread.c

This was meant to be a reply to the bug report for [1], but apparently
I forgot to Cc: the mailing list. Copying the original message here:

---
Hi,

The Wasmtime test suite started failing randomly, complaining that
clock_gettime() returns -EFAULT. Bisect points to this commit.

I could not see anything obviously wrong on manual review,
and the failure was not reproducible when running individual testcases
or using strace. So I wrote a stress test (which I will post shortly),
which runs fine on the host, but reproduces the issue with qemu-user.

When run with -strace, it also triggers an assertion:

qemu-x86_64: ../accel/tcg/tb-maint.c:595:
tb_invalidate_phys_page_unwind: Assertion `pc != 0' failed.
qemu-x86_64: /home/iii/qemu/include/qemu/rcu.h:102:
rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.

I haven't tried analyzing what is causing all this yet, but at least
now the reproducer is small (~200LOC) and fails faster than 1s.

Best regards,
Ilya
---

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg03615.html



[PATCH] tests/tcg/multiarch: add vma-pthread.c

2022-12-23 Thread Ilya Leoshkevich
Add a test that locklessly changes and exercises page protection bits
from various threads. This helps catch race conditions in the VMA
handling.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/multiarch/Makefile.target  |   3 +
 tests/tcg/multiarch/munmap-pthread.c |  16 +--
 tests/tcg/multiarch/nop_func.h   |  25 
 tests/tcg/multiarch/vma-pthread.c| 185 +++
 4 files changed, 214 insertions(+), 15 deletions(-)
 create mode 100644 tests/tcg/multiarch/nop_func.h
 create mode 100644 tests/tcg/multiarch/vma-pthread.c

diff --git a/tests/tcg/multiarch/Makefile.target 
b/tests/tcg/multiarch/Makefile.target
index 5f0fee1aadb..e7213af4925 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -39,6 +39,9 @@ signals: LDFLAGS+=-lrt -lpthread
 munmap-pthread: CFLAGS+=-pthread
 munmap-pthread: LDFLAGS+=-pthread
 
+vma-pthread: CFLAGS+=-pthread
+vma-pthread: LDFLAGS+=-pthread
+
 # We define the runner for test-mmap after the individual
 # architectures have defined their supported pages sizes. If no
 # additional page sizes are defined we only run the default test.
diff --git a/tests/tcg/multiarch/munmap-pthread.c 
b/tests/tcg/multiarch/munmap-pthread.c
index d7143b00d5f..1c79005846d 100644
--- a/tests/tcg/multiarch/munmap-pthread.c
+++ b/tests/tcg/multiarch/munmap-pthread.c
@@ -7,21 +7,7 @@
 #include 
 #include 
 
-static const char nop_func[] = {
-#if defined(__aarch64__)
-0xc0, 0x03, 0x5f, 0xd6, /* ret */
-#elif defined(__alpha__)
-0x01, 0x80, 0xFA, 0x6B, /* ret */
-#elif defined(__arm__)
-0x1e, 0xff, 0x2f, 0xe1, /* bx lr */
-#elif defined(__riscv)
-0x67, 0x80, 0x00, 0x00, /* ret */
-#elif defined(__s390__)
-0x07, 0xfe, /* br %r14 */
-#elif defined(__i386__) || defined(__x86_64__)
-0xc3,   /* ret */
-#endif
-};
+#include "nop_func.h"
 
 static void *thread_mmap_munmap(void *arg)
 {
diff --git a/tests/tcg/multiarch/nop_func.h b/tests/tcg/multiarch/nop_func.h
new file mode 100644
index 000..f714d21
--- /dev/null
+++ b/tests/tcg/multiarch/nop_func.h
@@ -0,0 +1,25 @@
+/*
+ * No-op functions that can be safely copied.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef NOP_FUNC_H
+#define NOP_FUNC_H
+
+static const char nop_func[] = {
+#if defined(__aarch64__)
+0xc0, 0x03, 0x5f, 0xd6, /* ret */
+#elif defined(__alpha__)
+0x01, 0x80, 0xFA, 0x6B, /* ret */
+#elif defined(__arm__)
+0x1e, 0xff, 0x2f, 0xe1, /* bx lr */
+#elif defined(__riscv)
+0x67, 0x80, 0x00, 0x00, /* ret */
+#elif defined(__s390__)
+0x07, 0xfe, /* br %r14 */
+#elif defined(__i386__) || defined(__x86_64__)
+0xc3,   /* ret */
+#endif
+};
+
+#endif
diff --git a/tests/tcg/multiarch/vma-pthread.c 
b/tests/tcg/multiarch/vma-pthread.c
new file mode 100644
index 000..c405cd46329
--- /dev/null
+++ b/tests/tcg/multiarch/vma-pthread.c
@@ -0,0 +1,185 @@
+/*
+ * Test that VMA updates do not race.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Map a contiguous chunk of RWX memory. Split it into 8 equally sized
+ * regions, each of which is guaranteed to have a certain combination of
+ * protection bits set.
+ *
+ * Reader, writer and executor threads perform the respective operations on
+ * pages, which are guaranteed to have the respective protection bit set.
+ * Two mutator threads change the non-fixed protection bits randomly.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "nop_func.h"
+
+#define PAGE_IDX_BITS 10
+#define PAGE_COUNT (1 << PAGE_IDX_BITS)
+#define PAGE_IDX_MASK (PAGE_COUNT - 1)
+#define REGION_IDX_BITS 3
+#define PAGE_IDX_R_MASK (1 << 7)
+#define PAGE_IDX_W_MASK (1 << 8)
+#define PAGE_IDX_X_MASK (1 << 9)
+#define REGION_MASK (PAGE_IDX_R_MASK | PAGE_IDX_W_MASK | PAGE_IDX_X_MASK)
+#define PAGES_PER_REGION (1 << (PAGE_IDX_BITS - REGION_IDX_BITS))
+
+struct context {
+int pagesize;
+char *ptr;
+int dev_null_fd;
+volatile int mutator_count;
+};
+
+static void *thread_read(void *arg)
+{
+struct context *ctx = arg;
+ssize_t sret;
+size_t i, j;
+int ret;
+
+for (i = 0; ctx->mutator_count; i++) {
+j = (i & PAGE_IDX_MASK) | PAGE_IDX_R_MASK;
+/* Read directly. */
+ret = memcmp(&ctx->ptr[j * ctx->pagesize], nop_func, sizeof(nop_func));
+assert(ret == 0);
+/* Read indirectly. */
+sret = write(ctx->dev_null_fd, &ctx->ptr[j * ctx->pagesize], 1);
+assert(sret == 1);
+}
+
+return NULL;
+}
+
+static void *thread_write(void *arg)
+{
+struct context *ctx = arg;
+struct timespec *ts;
+size_t i, j;
+int ret;
+
+for (i = 0; ctx->mutator_count; i++) {
+j = (i & PAGE_IDX_MASK) | PAGE_IDX_W_MASK;
+/* Write directly. */
+memcpy(&ctx->ptr[j * ctx->pagesize], nop_func, sizeof(nop_func));
+/* Write using a sy

[PULL 4/5] hw/9pfs: Drop unnecessary *xattr wrapper API declarations

2022-12-23 Thread Christian Schoenebeck
From: Bin Meng 

These are not used anywhere in the source tree. Drop them.

Signed-off-by: Bin Meng 
Reviewed-by: Greg Kurz 
Message-Id: <20221219102022.2167736-3-bin.m...@windriver.com>
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p-util.h | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
index c3526144c9..ccfc8b1cb3 100644
--- a/hw/9pfs/9p-util.h
+++ b/hw/9pfs/9p-util.h
@@ -90,19 +90,8 @@ static inline int errno_to_dotl(int err) {
 
 #ifdef CONFIG_DARWIN
 #define qemu_fgetxattr(...) fgetxattr(__VA_ARGS__, 0, 0)
-#define qemu_lgetxattr(...) getxattr(__VA_ARGS__, 0, XATTR_NOFOLLOW)
-#define qemu_llistxattr(...) listxattr(__VA_ARGS__, XATTR_NOFOLLOW)
-#define qemu_lremovexattr(...) removexattr(__VA_ARGS__, XATTR_NOFOLLOW)
-static inline int qemu_lsetxattr(const char *path, const char *name,
- const void *value, size_t size, int flags) {
-return setxattr(path, name, value, size, 0, flags | XATTR_NOFOLLOW);
-}
 #else
 #define qemu_fgetxattr fgetxattr
-#define qemu_lgetxattr lgetxattr
-#define qemu_llistxattr llistxattr
-#define qemu_lremovexattr lremovexattr
-#define qemu_lsetxattr lsetxattr
 #endif
 
 static inline void close_preserve_errno(int fd)
-- 
2.30.2




[PULL 5/5] hw/9pfs: Replace the direct call to xxxat() APIs with a wrapper

2022-12-23 Thread Christian Schoenebeck
From: Bin Meng 

xxxat() APIs are only available on POSIX platforms. For future
extension to Windows, let's replace the direct call to xxxat()
APIs with a wrapper.

Signed-off-by: Bin Meng 
Message-Id: <20221219102022.2167736-4-bin.m...@windriver.com>
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p-local.c | 32 
 hw/9pfs/9p-util.h  | 15 +++
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index d42ce6d8b8..d2246a3d7e 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -103,14 +103,14 @@ static void renameat_preserve_errno(int odirfd, const 
char *opath, int ndirfd,
 const char *npath)
 {
 int serrno = errno;
-renameat(odirfd, opath, ndirfd, npath);
+qemu_renameat(odirfd, opath, ndirfd, npath);
 errno = serrno;
 }
 
 static void unlinkat_preserve_errno(int dirfd, const char *path, int flags)
 {
 int serrno = errno;
-unlinkat(dirfd, path, flags);
+qemu_unlinkat(dirfd, path, flags);
 errno = serrno;
 }
 
@@ -194,7 +194,7 @@ static int local_lstat(FsContext *fs_ctx, V9fsPath 
*fs_path, struct stat *stbuf)
 goto out;
 }
 
-err = fstatat(dirfd, name, stbuf, AT_SYMLINK_NOFOLLOW);
+err = qemu_fstatat(dirfd, name, stbuf, AT_SYMLINK_NOFOLLOW);
 if (err) {
 goto err_out;
 }
@@ -253,7 +253,7 @@ static int local_set_mapped_file_attrat(int dirfd, const 
char *name,
 }
 }
 } else {
-ret = mkdirat(dirfd, VIRTFS_META_DIR, 0700);
+ret = qemu_mkdirat(dirfd, VIRTFS_META_DIR, 0700);
 if (ret < 0 && errno != EEXIST) {
 return -1;
 }
@@ -349,7 +349,7 @@ static int fchmodat_nofollow(int dirfd, const char *name, 
mode_t mode)
  */
 
  /* First, we clear non-racing symlinks out of the way. */
-if (fstatat(dirfd, name, &stbuf, AT_SYMLINK_NOFOLLOW)) {
+if (qemu_fstatat(dirfd, name, &stbuf, AT_SYMLINK_NOFOLLOW)) {
 return -1;
 }
 if (S_ISLNK(stbuf.st_mode)) {
@@ -734,7 +734,7 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath 
*dir_path,
 
 if (fs_ctx->export_flags & V9FS_SM_MAPPED ||
 fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-err = mkdirat(dirfd, name, fs_ctx->dmode);
+err = qemu_mkdirat(dirfd, name, fs_ctx->dmode);
 if (err == -1) {
 goto out;
 }
@@ -750,7 +750,7 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath 
*dir_path,
 }
 } else if (fs_ctx->export_flags & V9FS_SM_PASSTHROUGH ||
fs_ctx->export_flags & V9FS_SM_NONE) {
-err = mkdirat(dirfd, name, credp->fc_mode);
+err = qemu_mkdirat(dirfd, name, credp->fc_mode);
 if (err == -1) {
 goto out;
 }
@@ -990,7 +990,7 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
 int omap_dirfd, nmap_dirfd;
 
-ret = mkdirat(ndirfd, VIRTFS_META_DIR, 0700);
+ret = qemu_mkdirat(ndirfd, VIRTFS_META_DIR, 0700);
 if (ret < 0 && errno != EEXIST) {
 goto err_undo_link;
 }
@@ -1085,7 +1085,7 @@ static int local_utimensat(FsContext *s, V9fsPath 
*fs_path,
 goto out;
 }
 
-ret = utimensat(dirfd, name, buf, AT_SYMLINK_NOFOLLOW);
+ret = qemu_utimensat(dirfd, name, buf, AT_SYMLINK_NOFOLLOW);
 close_preserve_errno(dirfd);
 out:
 g_free(dirpath);
@@ -1116,7 +1116,7 @@ static int local_unlinkat_common(FsContext *ctx, int 
dirfd, const char *name,
 if (fd == -1) {
 return -1;
 }
-ret = unlinkat(fd, VIRTFS_META_DIR, AT_REMOVEDIR);
+ret = qemu_unlinkat(fd, VIRTFS_META_DIR, AT_REMOVEDIR);
 close_preserve_errno(fd);
 if (ret < 0 && errno != ENOENT) {
 return -1;
@@ -1124,7 +1124,7 @@ static int local_unlinkat_common(FsContext *ctx, int 
dirfd, const char *name,
 }
 map_dirfd = openat_dir(dirfd, VIRTFS_META_DIR);
 if (map_dirfd != -1) {
-ret = unlinkat(map_dirfd, name, 0);
+ret = qemu_unlinkat(map_dirfd, name, 0);
 close_preserve_errno(map_dirfd);
 if (ret < 0 && errno != ENOENT) {
 return -1;
@@ -1134,7 +1134,7 @@ static int local_unlinkat_common(FsContext *ctx, int 
dirfd, const char *name,
 }
 }
 
-return unlinkat(dirfd, name, flags);
+return qemu_unlinkat(dirfd, name, flags);
 }
 
 static int local_remove(FsContext *ctx, const char *path)
@@ -1151,7 +1151,7 @@ static int local_remove(FsContext *ctx, const char *path)
 goto out;
 }
 
-if (fstatat(dirfd, name, &stbuf, AT_SYMLINK_NOFOLLOW) < 0) {
+if (qemu_fstatat(dirfd, name, &stbuf, AT_SYMLINK_NOFOLLOW) < 0) {
 goto err_out;
 }
 
@@ -1296,7 +1296,7 @@ static int local_renameat(FsContext *ctx, V9fsPath 
*olddir,
 return -1;
 }
 

[PULL 3/5] qemu/xattr.h: Exclude <sys/xattr.h> for Windows

2022-12-23 Thread Christian Schoenebeck
From: Bin Meng 

Windows does not have <sys/xattr.h>.

Signed-off-by: Bin Meng 
Message-Id: <20221219102022.2167736-2-bin.m...@windriver.com>
Signed-off-by: Christian Schoenebeck 
---
 include/qemu/xattr.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/qemu/xattr.h b/include/qemu/xattr.h
index f1d0f7be74..b08a934acc 100644
--- a/include/qemu/xattr.h
+++ b/include/qemu/xattr.h
@@ -25,7 +25,9 @@
 #  if !defined(ENOATTR)
 #define ENOATTR ENODATA
 #  endif
-#  include <sys/xattr.h>
+#  ifndef CONFIG_WIN32
+#include <sys/xattr.h>
+#  endif
 #endif
 
 #endif
-- 
2.30.2




[PULL 0/5] 9p queue 2022-12-23

2022-12-23 Thread Christian Schoenebeck
The following changes since commit 222059a0fccf4af3be776fe35a5ea2d6a68f9a0b:

  Merge tag 'pull-ppc-20221221' of https://gitlab.com/danielhb/qemu into 
staging (2022-12-21 18:08:09 +)

are available in the Git repository at:

  https://github.com/cschoenebeck/qemu.git tags/pull-9p-20221223

for you to fetch changes up to 6ca60cd7a388a776d72739e5a404e65c19460511:

  hw/9pfs: Replace the direct call to xxxat() APIs with a wrapper (2022-12-23 
11:48:13 +0100)


9pfs: Windows host prep, cleanup

* Next preparatory patches for upcoming Windows host support.

* Cleanup patches.


Bin Meng (3):
  qemu/xattr.h: Exclude <sys/xattr.h> for Windows
  hw/9pfs: Drop unnecessary *xattr wrapper API declarations
  hw/9pfs: Replace the direct call to xxxat() APIs with a wrapper

Christian Schoenebeck (1):
  MAINTAINERS: Add 9p test client to section "virtio-9p"

Greg Kurz (1):
  9pfs: Fix some return statements in the synth backend

 MAINTAINERS  |  1 +
 hw/9pfs/9p-local.c   | 32 
 hw/9pfs/9p-synth.c   | 12 ++--
 hw/9pfs/9p-util.h| 26 +++---
 include/qemu/xattr.h |  4 +++-
 5 files changed, 37 insertions(+), 38 deletions(-)



[PULL 1/5] 9pfs: Fix some return statements in the synth backend

2022-12-23 Thread Christian Schoenebeck
From: Greg Kurz 

The qemu_v9fs_synth_mkdir() and qemu_v9fs_synth_add_file() functions
currently return a positive errno value on failure. This causes
checkpatch.pl to spit several errors like the one below:

ERROR: return of an errno should typically be -ve (return -EAGAIN)
+return EAGAIN;

Simply change the sign. This has no consequence since callers
assert() the returned value to be equal to 0.

Reported-by: Markus Armbruster 
Signed-off-by: Greg Kurz 
Message-Id: <166930551818.827792.10663674346122681963.stgit@bahia>
[C.S.: - Resolve conflict with 66997c42e02c. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p-synth.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
index 38d787f494..f62c40b639 100644
--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -75,10 +75,10 @@ int qemu_v9fs_synth_mkdir(V9fsSynthNode *parent, int mode,
 V9fsSynthNode *node, *tmp;
 
 if (!synth_fs) {
-return EAGAIN;
+return -EAGAIN;
 }
 if (!name || (strlen(name) >= NAME_MAX)) {
-return EINVAL;
+return -EINVAL;
 }
 if (!parent) {
 parent = &synth_root;
@@ -86,7 +86,7 @@ int qemu_v9fs_synth_mkdir(V9fsSynthNode *parent, int mode,
 QEMU_LOCK_GUARD(&synth_mutex);
 QLIST_FOREACH(tmp, &parent->child, sibling) {
 if (!strcmp(tmp->name, name)) {
-return EEXIST;
+return -EEXIST;
 }
 }
 /* Add the name */
@@ -106,10 +106,10 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int 
mode,
 V9fsSynthNode *node, *tmp;
 
 if (!synth_fs) {
-return EAGAIN;
+return -EAGAIN;
 }
 if (!name || (strlen(name) >= NAME_MAX)) {
-return EINVAL;
+return -EINVAL;
 }
 if (!parent) {
 parent = &synth_root;
@@ -118,7 +118,7 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int 
mode,
 QEMU_LOCK_GUARD(&synth_mutex);
 QLIST_FOREACH(tmp, &parent->child, sibling) {
 if (!strcmp(tmp->name, name)) {
-return EEXIST;
+return -EEXIST;
 }
 }
 /* Add file type and remove write bits */
-- 
2.30.2




[PULL 2/5] MAINTAINERS: Add 9p test client to section "virtio-9p"

2022-12-23 Thread Christian Schoenebeck
The 9p test cases use a dedicated, lightweight 9p client implementation
(using virtio transport) under tests/qtest/libqos/ to communicate with
QEMU's 9p server.

It has been there for a long time. Let's officially assign it to the 9p
maintainers.

Signed-off-by: Christian Schoenebeck 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Greg Kurz 
Reviewed-by: Wilfred Mallawa 
Message-Id: 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b270eb8e5b..b0091d2ad8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2053,6 +2053,7 @@ X: hw/9pfs/xen-9p*
 F: fsdev/
 F: docs/tools/virtfs-proxy-helper.rst
 F: tests/qtest/virtio-9p-test.c
+F: tests/qtest/libqos/virtio-9p*
 T: git https://gitlab.com/gkurz/qemu.git 9p-next
 T: git https://github.com/cschoenebeck/qemu.git 9p.next
 
-- 
2.30.2




Re: [PATCH] linux-user: Improve strace output of personality() and sysinfo()

2022-12-23 Thread Philippe Mathieu-Daudé

On 23/12/22 11:53, Helge Deller wrote:

On 12/23/22 11:50, Philippe Mathieu-Daudé wrote:

On 23/12/22 11:01, Helge Deller wrote:

Make the strace look nicer for those two syscalls.

Signed-off-by: Helge Deller 
---
  linux-user/strace.list | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index f9254725a1..909298099e 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1043,7 +1043,7 @@
  { TARGET_NR_perfctr, "perfctr" , NULL, NULL, NULL },
  #endif
  #ifdef TARGET_NR_personality
-{ TARGET_NR_personality, "personality" , NULL, NULL, NULL },
+{ TARGET_NR_personality, "personality" , "%s(%p)", NULL, 
print_syscall_ret_addr },


Shouldn't this be:

    { TARGET_NR_personality, "personality" , "%s(%u)", NULL, NULL },


Basically yes, but...
it's a bitmap, so printing it as a hex value (similar to a pointer)
is easier to read/analyze.


Oh, good point. Then "%s(0x"TARGET_ABI_FMT_lx")" is self-explanatory.

Also for clarity:

#define print_syscall_ret_persona print_syscall_ret_addr

So what do you think of:

{ TARGET_NR_personality, "personality" , "%s(0x"TARGET_ABI_FMT_lx")",
   NULL, print_syscall_ret_persona },

?



Re: [PATCH 10/15] hw/riscv/boot.c: use MachineState in riscv_load_kernel()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:24 AM Daniel Henrique Barboza
 wrote:
>
> All callers are using kernel_filename as machine->kernel_filename.
>
> This will also simplify the changes in riscv_load_kernel() that we're
> going to do next.
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c| 3 ++-
>  hw/riscv/microchip_pfsoc.c | 3 +--
>  hw/riscv/opentitan.c   | 3 +--
>  hw/riscv/sifive_e.c| 3 +--
>  hw/riscv/sifive_u.c| 3 +--
>  hw/riscv/spike.c   | 3 +--
>  hw/riscv/virt.c| 3 +--
>  include/hw/riscv/boot.h| 2 +-
>  8 files changed, 9 insertions(+), 14 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH] nubus-device: fix memory leak in nubus_device_realize

2022-12-23 Thread Philippe Mathieu-Daudé

On 22/12/22 18:29, Mauro Matteo Cascella wrote:

Local variable "name" is allocated through strdup_printf and should be
freed with g_free() to avoid memory leak.

Fixes: 3616f424 ("nubus-device: add romfile property for loading declaration 
ROMs")
Signed-off-by: Mauro Matteo Cascella 
---
  hw/nubus/nubus-device.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] linux-user: Improve strace output of personality() and sysinfo()

2022-12-23 Thread Helge Deller

On 12/23/22 11:50, Philippe Mathieu-Daudé wrote:

On 23/12/22 11:01, Helge Deller wrote:

Make the strace look nicer for those two syscalls.

Signed-off-by: Helge Deller 
---
  linux-user/strace.list | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index f9254725a1..909298099e 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1043,7 +1043,7 @@
  { TARGET_NR_perfctr, "perfctr" , NULL, NULL, NULL },
  #endif
  #ifdef TARGET_NR_personality
-{ TARGET_NR_personality, "personality" , NULL, NULL, NULL },
+{ TARGET_NR_personality, "personality" , "%s(%p)", NULL, 
print_syscall_ret_addr },


Shouldn't this be:

    { TARGET_NR_personality, "personality" , "%s(%u)", NULL, NULL },


Basically yes, but...
it's a bitmap, so printing it as a hex value (similar to a pointer)
is easier to read/analyze.

Helge



?


  #endif
  #ifdef TARGET_NR_pipe
  { TARGET_NR_pipe, "pipe" , NULL, NULL, NULL },
@@ -1502,7 +1502,7 @@
  { TARGET_NR_sysfs, "sysfs" , NULL, NULL, NULL },
  #endif
  #ifdef TARGET_NR_sysinfo
-{ TARGET_NR_sysinfo, "sysinfo" , NULL, NULL, NULL },
+{ TARGET_NR_sysinfo, "sysinfo" , "%s(%p)", NULL, NULL },
  #endif
  #ifdef TARGET_NR_sys_kexec_load
  { TARGET_NR_sys_kexec_load, "sys_kexec_load" , NULL, NULL, NULL },
--
2.38.1









Re: [PATCH] linux-user: Improve strace output of personality() and sysinfo()

2022-12-23 Thread Philippe Mathieu-Daudé

On 23/12/22 11:01, Helge Deller wrote:

Make the strace look nicer for those two syscalls.

Signed-off-by: Helge Deller 
---
  linux-user/strace.list | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index f9254725a1..909298099e 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1043,7 +1043,7 @@
  { TARGET_NR_perfctr, "perfctr" , NULL, NULL, NULL },
  #endif
  #ifdef TARGET_NR_personality
-{ TARGET_NR_personality, "personality" , NULL, NULL, NULL },
+{ TARGET_NR_personality, "personality" , "%s(%p)", NULL, 
print_syscall_ret_addr },


Shouldn't this be:

   { TARGET_NR_personality, "personality" , "%s(%u)", NULL, NULL },

?


  #endif
  #ifdef TARGET_NR_pipe
  { TARGET_NR_pipe, "pipe" , NULL, NULL, NULL },
@@ -1502,7 +1502,7 @@
  { TARGET_NR_sysfs, "sysfs" , NULL, NULL, NULL },
  #endif
  #ifdef TARGET_NR_sysinfo
-{ TARGET_NR_sysinfo, "sysinfo" , NULL, NULL, NULL },
+{ TARGET_NR_sysinfo, "sysinfo" , "%s(%p)", NULL, NULL },
  #endif
  #ifdef TARGET_NR_sys_kexec_load
  { TARGET_NR_sys_kexec_load, "sys_kexec_load" , NULL, NULL, NULL },
--
2.38.1







Re: [PATCH v2 4/4] docs/devel: Rules on #include in headers

2022-12-23 Thread Bernhard Beschow



Am 22. Dezember 2022 12:08:13 UTC schrieb Markus Armbruster :
>Rules for headers were proposed a long time ago, and generally liked:
>
>Message-ID: <87h9g8j57d@blackfin.pond.sub.org>
>https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html
>
>Work them into docs/devel/style.rst.
>
>Suggested-by: Bernhard Beschow 
>Signed-off-by: Markus Armbruster 
>---
> docs/devel/style.rst | 7 +++
> 1 file changed, 7 insertions(+)
>
>diff --git a/docs/devel/style.rst b/docs/devel/style.rst
>index 7ddd42b6c2..68aa776930 100644
>--- a/docs/devel/style.rst
>+++ b/docs/devel/style.rst
>@@ -293,6 +293,13 @@ that QEMU depends on.
> Do not include "qemu/osdep.h" from header files since the .c file will have
> already included it.
> 
>+Headers should normally include everything they need beyond osdep.h.
>+If exceptions are needed for some reason, they must be documented in
>+the header.  If all that's needed from a header is typedefs, consider
>+putting those into qemu/typedefs.h instead of including the header.
>+
>+Cyclic inclusion is forbidden.
>+

Nice!

I wonder if these should be bullet points like in your mail from 2016. I found 
them crystal clear since they looked like a todo list for review.

Feel free to respin. Either way:

Reviewed-by: Bernhard Beschow 

> C types
> ===
> 



Re: [PATCH 09/15] hw/riscv/boot.c: use MachineState in riscv_load_initrd()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:24 AM Daniel Henrique Barboza
 wrote:
>
> 'filename', 'mem_size' and 'fdt' from riscv_load_initrd() can all be
> retrieved by the MachineState object for all callers.
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c| 6 --
>  hw/riscv/microchip_pfsoc.c | 3 +--
>  hw/riscv/sifive_u.c| 3 +--
>  hw/riscv/spike.c   | 3 +--
>  hw/riscv/virt.c| 3 +--
>  include/hw/riscv/boot.h| 3 +--
>  6 files changed, 9 insertions(+), 12 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH v2 0/7] include/hw/pci include/hw/cxl: Clean up includes

2022-12-23 Thread Bernhard Beschow



Am 23. Dezember 2022 05:27:07 UTC schrieb Markus Armbruster :
>"Michael S. Tsirkin"  writes:
>
>> On Thu, Dec 22, 2022 at 11:48:25AM +0100, Markus Armbruster wrote:
>>> Bernhard Beschow  writes:
>>> 
>>> > Am 22. Dezember 2022 10:03:23 UTC schrieb Markus Armbruster 
>>> > :
>>> >>Back in 2016, we discussed[1] rules for headers, and these were
>>> >>generally liked:
>>> >>
>>> >>1. Have a carefully curated header that's included everywhere first.  We
>>> >>   got that already thanks to Peter: osdep.h.
>>> >>
>>> >>2. Headers should normally include everything they need beyond osdep.h.
>>> >>   If exceptions are needed for some reason, they must be documented in
>>> >>   the header.  If all that's needed from a header is typedefs, put
>>> >>   those into qemu/typedefs.h instead of including the header.
>>> >>
>>> >>3. Cyclic inclusion is forbidden.
>>> >
>>> > Sounds like these -- useful and sane -- rules belong in QEMU's coding 
>>> > style. What about putting them there for easy reference?
>>> 
>>> Makes sense.  I'll see what I can do.  Thanks!
>>
>> It would be even better if there was e.g. a make target
>> pulling in each header and making sure it's self-consistent with
>> no circularity. We could run it e.g. in CI.
>
>Yes, that would be nice, but the problem I've been unable to crack is
>deciding whether a header is supposed to compile target-independently or
>not.  In my manual testing, I use trial and error: if it fails to
>compile target-independently, compile for all targets.  This is s-l-o-w.
>
>The other problem, of course, is coding it up in Meson.  I haven't even
>tried.

There is https://include-what-you-use.org which is a Clang-based tool. Maybe 
that works?

Best regards,
Bernhard



Re: [PATCH 08/15] hw/riscv: write bootargs 'chosen' FDT after riscv_load_kernel()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:24 AM Daniel Henrique Barboza
 wrote:
>
> The sifive_u, spike and virt machines are writing the 'bootargs' FDT
> node during their respective create_fdt().
>
> Given that bootargs is written only when '-append' is used, and this
> option is only allowed with the '-kernel' option, which in turn is
> already being check before executing riscv_load_kernel(), write

being checked

> 'bootargs' in the same code path as riscv_load_kernel().
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/sifive_u.c | 11 +--
>  hw/riscv/spike.c|  9 +
>  hw/riscv/virt.c | 11 +--
>  3 files changed, 15 insertions(+), 16 deletions(-)
>

Reviewed-by: Bin Meng 



[PATCH] linux-user: Fix brk() to release pages

2022-12-23 Thread Helge Deller
The current brk() implementation does not de-allocate pages if a lower
address is given compared to earlier brk() calls.
But according to the manpage, brk() shall deallocate memory in this
case, and it currently breaks a real-world application, specifically
building the Debian gcl package in qemu-user.

Fix this issue by reworking the qemu brk() implementation.

Tested with the C-code testcase included in qemu commit 4d1de87c750, and
by building the Debian package of gcl in a hppa-linux guest on an x86-64
host.

Signed-off-by: Helge Deller 
---
 linux-user/syscall.c | 43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4fee882cd7..d306b02e21 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -838,13 +838,12 @@ static inline int host_to_target_sock_type(int host_type)
 }

 static abi_ulong target_brk;
-static abi_ulong target_original_brk;
 static abi_ulong brk_page;

 void target_set_brk(abi_ulong new_brk)
 {
-target_original_brk = target_brk = HOST_PAGE_ALIGN(new_brk);
-brk_page = HOST_PAGE_ALIGN(target_brk);
+target_brk = TARGET_PAGE_ALIGN(new_brk);
+brk_page = HOST_PAGE_ALIGN(new_brk);
 }

 //#define DEBUGF_BRK(message, args...) do { fprintf(stderr, (message), ## 
args); } while (0)
@@ -855,29 +854,29 @@ abi_long do_brk(abi_ulong new_brk)
 {
 abi_long mapped_addr;
 abi_ulong new_alloc_size;
+abi_ulong new_host_brk_page;

 /* brk pointers are always untagged */

+new_brk = TARGET_PAGE_ALIGN(new_brk);
+new_host_brk_page = HOST_PAGE_ALIGN(new_brk);
+
 DEBUGF_BRK("do_brk(" TARGET_ABI_FMT_lx ") -> ", new_brk);

-if (!new_brk) {
+if (!new_brk || new_brk == target_brk) {
 DEBUGF_BRK(TARGET_ABI_FMT_lx " (!new_brk)\n", target_brk);
 return target_brk;
 }
-if (new_brk < target_original_brk) {
-DEBUGF_BRK(TARGET_ABI_FMT_lx " (new_brk < target_original_brk)\n",
-   target_brk);
-return target_brk;
-}

-/* If the new brk is less than the highest page reserved to the
- * target heap allocation, set it and we're almost done...  */
-if (new_brk <= brk_page) {
-/* Heap contents are initialized to zero, as for anonymous
- * mapped pages.  */
-if (new_brk > target_brk) {
-memset(g2h_untagged(target_brk), 0, new_brk - target_brk);
-}
+/* Release heap if necessary */
+if (new_brk < target_brk) {
+/* empty remaining bytes in (possibly larger) host page */
+memset(g2h_untagged(new_brk), 0, new_host_brk_page - new_brk);
+
+/* free unused host pages and set new brk_page */
+target_munmap(new_host_brk_page, brk_page - new_host_brk_page);
+brk_page = new_host_brk_page;
+
target_brk = new_brk;
 DEBUGF_BRK(TARGET_ABI_FMT_lx " (new_brk <= brk_page)\n", target_brk);
return target_brk;
@@ -889,10 +888,14 @@ abi_long do_brk(abi_ulong new_brk)
  * itself); instead we treat "mapped but at wrong address" as
  * a failure and unmap again.
  */
-new_alloc_size = HOST_PAGE_ALIGN(new_brk - brk_page);
-mapped_addr = get_errno(target_mmap(brk_page, new_alloc_size,
+new_alloc_size = new_host_brk_page - brk_page;
+if (new_alloc_size) {
+mapped_addr = get_errno(target_mmap(brk_page, new_alloc_size,
 PROT_READ|PROT_WRITE,
 MAP_ANON|MAP_PRIVATE, 0, 0));
+} else {
+mapped_addr = brk_page;
+}

 if (mapped_addr == brk_page) {
 /* Heap contents are initialized to zero, as for anonymous
@@ -905,7 +908,7 @@ abi_long do_brk(abi_ulong new_brk)
 memset(g2h_untagged(target_brk), 0, brk_page - target_brk);

 target_brk = new_brk;
-brk_page = HOST_PAGE_ALIGN(target_brk);
+brk_page = new_host_brk_page;
 DEBUGF_BRK(TARGET_ABI_FMT_lx " (mapped_addr == brk_page)\n",
 target_brk);
 return target_brk;
--
2.38.1




Re: [PATCH 07/15] hw/riscv: write initrd 'chosen' FDT inside riscv_load_initrd()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:24 AM Daniel Henrique Barboza
 wrote:
>
> riscv_load_initrd() returns the initrd end addr while also writing a
> 'start' var to mark the addr start. These informations are being used
> just to write the initrd FDT node. Every existing caller of
> riscv_load_initrd() is writing the FDT in the same manner.
>
> We can simplify things by writing the FDT inside riscv_load_initrd(),
> sparing callers from having to manage start/end addrs to write the FDT
> themselves.
>
> An 'if (fdt)' check is already inserted at the end of the function
> because we'll end up using it later on with other boards that doesn´t

doesn't

> have a FDT.
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c| 18 --
>  hw/riscv/microchip_pfsoc.c | 10 ++
>  hw/riscv/sifive_u.c| 10 ++
>  hw/riscv/spike.c   | 10 ++
>  hw/riscv/virt.c| 10 ++
>  include/hw/riscv/boot.h|  4 ++--
>  6 files changed, 22 insertions(+), 40 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16

2022-12-23 Thread Philippe Mathieu-Daudé

On 23/12/22 09:50, Alexander Graf wrote:

While trying to make Windows work with GICv3 emulation, I stumbled over
the fact that it only supports ITT entry sizes that are a power of 2.

While the spec allows arbitrary sizes, in practice hardware will always
expose power-of-2 sizes, so this limitation is not really a problem
in real-world scenarios. However, we only expose a 12-byte ITT entry size,
which makes Windows blue screen on boot.

The easy way to get around that problem is to bump the size to 16. That
is a power of 2, is basically what hardware would expose given the number
of bits we need per entry, and doesn't break any existing scenarios. To
play it safe, this patch set only bumps the size on newer machine types.

Alexander Graf (2):
   hw/intc/arm_gicv3: Make ITT entry size configurable
   hw/intc/arm_gicv3: Bump ITT entry size to 16


Series:
Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH 5/6] hw/arm/xilinx_zynq: Remove tswap32() calls and constify smpboot[]

2022-12-23 Thread Philippe Mathieu-Daudé

On 23/12/22 11:01, Philippe Mathieu-Daudé wrote:

On 23/12/22 04:54, Edgar E. Iglesias wrote:

On Thu, Dec 22, 2022 at 10:55:48PM +0100, Philippe Mathieu-Daudé wrote:

ARM CPUs fetch instructions in little-endian.

smpboot[] encoded instructions are written in little-endian.

We call tswap32() on the array. The tswap32() function swaps a 32-bit
value if the target endianness doesn't match the host's; otherwise it
is a NOP.

* On a little-endian host, the array is stored as-is. tswap32()
   is a NOP, and the vCPU fetches the instructions as-is, in
   little-endian.

* On a big-endian host, the array is stored as-is. tswap32()
   swaps the instructions to little-endian, and the vCPU fetches
   the instructions as-is, in little-endian.

Using tswap() on system emulation is a bit odd: while the target
particularities might change the system emulation, the host ones
(such as its endianness) shouldn't interfere.

We can simplify by using const_le32() to always store the
instructions in the array in little-endian, removing the need
for the dubious tswap().



Hi Philippe,




Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/arm/xilinx_zynq.c | 27 ---
  1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 3190cc0b8d..4316143b71 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -71,6 +71,11 @@ static const int dma_irqs[8] = {
  #define ZYNQ_SDHCI_CAPABILITIES 0x69ec0080  /* Datasheet: UG585 (v1.12.1) */

+struct ZynqMachineState {
+    MachineState parent;
+    Clock *ps_clk;
+};
+
  #define ARMV7_IMM16(x) (extract32((x),  0, 12) | \
  extract32((x), 12,  4) << 16)
@@ -79,29 +84,21 @@ static const int dma_irqs[8] = {
   */
  #define SLCR_WRITE(addr, val) \
-    0xe3001000 + ARMV7_IMM16(extract32((val),  0, 16)), /* movw r1 ... */ \
-    0xe3401000 + ARMV7_IMM16(extract32((val), 16, 16)), /* movt r1 ... */ \

-    0xe5801000 + (addr)
-
-struct ZynqMachineState {
-    MachineState parent;
-    Clock *ps_clk;
-};
+    cpu_to_le32(0xe3001000 + ARMV7_IMM16(extract32((val),  0, 16))), /* movw r1 ... */ \
+    cpu_to_le32(0xe3401000 + ARMV7_IMM16(extract32((val), 16, 16))), /* movt r1 ... */ \


Looks like the callers all pass in constants, perhaps const_le32 
should be used everywhere or am I missing something?


extract32() is a function. I agree we can rewrite this macro to remove
it, I was simply lazy ;) I'll do for v2 so the array will be const.


Well it is already runtime const, I meant 'static const' so it becomes
build-time const.



Re: [PATCH 06/15] hw/riscv/spike.c: load initrd right after riscv_load_kernel()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:28 AM Daniel Henrique Barboza
 wrote:
>
> This will make the code more in line with what the other boards are
> doing. We'll also avoid an extra check to machine->kernel_filename since
> we already checked that before executing riscv_load_kernel().
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/spike.c | 31 +++
>  1 file changed, 15 insertions(+), 16 deletions(-)
>

Reviewed-by: Bin Meng 



[PATCH] linux-user: Improve strace output of personality() and sysinfo()

2022-12-23 Thread Helge Deller
Make the strace look nicer for those two syscalls.

Signed-off-by: Helge Deller 
---
 linux-user/strace.list | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index f9254725a1..909298099e 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1043,7 +1043,7 @@
 { TARGET_NR_perfctr, "perfctr" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_personality
-{ TARGET_NR_personality, "personality" , NULL, NULL, NULL },
+{ TARGET_NR_personality, "personality" , "%s(%p)", NULL, print_syscall_ret_addr },
 #endif
 #ifdef TARGET_NR_pipe
 { TARGET_NR_pipe, "pipe" , NULL, NULL, NULL },
@@ -1502,7 +1502,7 @@
 { TARGET_NR_sysfs, "sysfs" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_sysinfo
-{ TARGET_NR_sysinfo, "sysinfo" , NULL, NULL, NULL },
+{ TARGET_NR_sysinfo, "sysinfo" , "%s(%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_sys_kexec_load
 { TARGET_NR_sys_kexec_load, "sys_kexec_load" , NULL, NULL, NULL },
--
2.38.1




Re: [PATCH 5/6] hw/arm/xilinx_zynq: Remove tswap32() calls and constify smpboot[]

2022-12-23 Thread Philippe Mathieu-Daudé

On 23/12/22 04:54, Edgar E. Iglesias wrote:

On Thu, Dec 22, 2022 at 10:55:48PM +0100, Philippe Mathieu-Daudé wrote:

ARM CPUs fetch instructions in little-endian.

smpboot[] encoded instructions are written in little-endian.

We call tswap32() on the array. The tswap32() function swaps a 32-bit
value if the target endianness doesn't match the host's; otherwise it
is a NOP.

* On a little-endian host, the array is stored as-is. tswap32()
   is a NOP, and the vCPU fetches the instructions as-is, in
   little-endian.

* On a big-endian host, the array is stored as-is. tswap32()
   swaps the instructions to little-endian, and the vCPU fetches
   the instructions as-is, in little-endian.

Using tswap() on system emulation is a bit odd: while the target
particularities might change the system emulation, the host ones
(such as its endianness) shouldn't interfere.

We can simplify by using const_le32() to always store the
instructions in the array in little-endian, removing the need
for the dubious tswap().



Hi Philippe,




Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/arm/xilinx_zynq.c | 27 ---
  1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 3190cc0b8d..4316143b71 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -71,6 +71,11 @@ static const int dma_irqs[8] = {
  
  #define ZYNQ_SDHCI_CAPABILITIES 0x69ec0080  /* Datasheet: UG585 (v1.12.1) */
  
+struct ZynqMachineState {

+MachineState parent;
+Clock *ps_clk;
+};
+
  #define ARMV7_IMM16(x) (extract32((x),  0, 12) | \
  extract32((x), 12,  4) << 16)
  
@@ -79,29 +84,21 @@ static const int dma_irqs[8] = {

   */
  
  #define SLCR_WRITE(addr, val) \

-0xe3001000 + ARMV7_IMM16(extract32((val),  0, 16)), /* movw r1 ... */ \
-0xe3401000 + ARMV7_IMM16(extract32((val), 16, 16)), /* movt r1 ... */ \
-0xe5801000 + (addr)
-
-struct ZynqMachineState {
-MachineState parent;
-Clock *ps_clk;
-};
+cpu_to_le32(0xe3001000 + ARMV7_IMM16(extract32((val),  0, 16))), /* movw r1 ... */ \
+cpu_to_le32(0xe3401000 + ARMV7_IMM16(extract32((val), 16, 16))), /* movt r1 ... */ \


Looks like the callers all pass in constants, perhaps const_le32 should be used 
everywhere or am I missing something?


extract32() is a function. I agree we can rewrite this macro to remove
it, I was simply lazy ;) I'll do for v2 so the array will be const.





+const_le32(0xe5801000 + (addr))





Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)

2022-12-23 Thread David Hildenbrand

On 22.12.22 12:02, David Hildenbrand wrote:

For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content. This
information is immutable on the migration source while migration is active.

For example, we want to use this information for proper preallocation
support with migration: currently, we don't preallocate memory on the
migration target, and especially with hugetlb, we can easily run out of
hugetlb pages during RAM migration and will crash (SIGBUS) instead of
catching this gracefully via preallocation.

Migrating device state before we start iterating is currently impossible.
Introduce and use qemu_savevm_state_start_precopy(), and use
a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
state will be saved in qemu_savevm_state_start_precopy() or in
qemu_savevm_state_complete_precopy_*().

We have to take care of properly including the early device state in the
vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
a bit sub-optimal, but we use that explicitly or implicitly all over the
place already, so this barely matters in practice.

Note that only a select few devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of this.

Signed-off-by: David Hildenbrand 


[...]

  
  if (inactivate_disks) {

@@ -1427,6 +1474,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
  qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
  }
  
+/* Free it now to detect any inconsistencies. */

+g_free(vmdesc);


I missed converting that to json_writer_free().

--
Thanks,

David / dhildenb




Re: [PATCH 05/15] hw/riscv/boot.c: introduce riscv_default_firmware_name()

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:28 AM Daniel Henrique Barboza
 wrote:
>
> Some boards are duplicating the 'riscv_find_and_load_firmware' call
> because the 32 and 64 bits images have different names. Create
> a function to handle this detail instead of hardcoding it in the boards.
>
> Ideally we would bake this logic inside riscv_find_and_load_firmware(),
> or even create a riscv_load_default_firmware(), but at this moment we
> cannot infer whether the machine is running 32 or 64 bits without
> accessing RISCVHartArrayState, which in turn can't be accessed via the
> common code from boot.c. In the end we would exchange 'firmware_name'
> for a flag with riscv_is_32bit(), which isn't much better than what we
> already have today.
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c |  9 +
>  hw/riscv/sifive_u.c | 11 ---
>  hw/riscv/spike.c| 14 +-
>  hw/riscv/virt.c | 10 +++---
>  include/hw/riscv/boot.h |  1 +
>  5 files changed, 22 insertions(+), 23 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH v2 1/2] hw/intc/loongarch_pch_msi: add irq number property

2022-12-23 Thread maobibo


Tianrui,

  We should address all the issues raised in the previous mailing-list
review, then send the next version.
  
https://patchwork.kernel.org/project/qemu-devel/patch/20221215065011.2133471-2-zhaotian...@loongson.cn/

  We should not be in such a hurry :)

regards
bibo,mao

在 2022/12/23 16:08, Tianrui Zhao 写道:
> This patch adds irq number property for loongarch msi interrupt
> controller, and remove hard coding irq number macro.
> 
> Signed-off-by: Tianrui Zhao 
> ---
>  hw/intc/loongarch_pch_msi.c | 30 ++---
>  hw/loongarch/virt.c | 11 +++
>  include/hw/intc/loongarch_pch_msi.h |  3 ++-
>  include/hw/pci-host/ls7a.h  |  1 -
>  4 files changed, 36 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/intc/loongarch_pch_msi.c b/hw/intc/loongarch_pch_msi.c
> index b36d6d76e4..5b8de43d42 100644
> --- a/hw/intc/loongarch_pch_msi.c
> +++ b/hw/intc/loongarch_pch_msi.c
> @@ -32,7 +32,7 @@ static void loongarch_msi_mem_write(void *opaque, hwaddr addr,
>   */
>  irq_num = (val & 0xff) - s->irq_base;
>  trace_loongarch_msi_set_irq(irq_num);
> -assert(irq_num < PCH_MSI_IRQ_NUM);
> +assert(irq_num < s->irq_num);
>  qemu_set_irq(s->pch_msi_irq[irq_num], 1);
>  }
>  
> @@ -49,6 +49,29 @@ static void pch_msi_irq_handler(void *opaque, int irq, int level)
>  qemu_set_irq(s->pch_msi_irq[irq], level);
>  }
>  
> +static void loongarch_pch_msi_realize(DeviceState *dev, Error **errp)
> +{
> +LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(dev);
> +
> +assert(s->irq_num > 0);
> +
> +s->pch_msi_irq = g_malloc(sizeof(qemu_irq) * s->irq_num);
> +if (!s->pch_msi_irq) {
> +error_report("loongarch_pch_msi: fail to alloc memory");
> +exit(1);
> +}
> +
> +qdev_init_gpio_out(dev, s->pch_msi_irq, s->irq_num);
> +qdev_init_gpio_in(dev, pch_msi_irq_handler, s->irq_num);
> +}
> +
> +static void loongarch_pch_msi_unrealize(DeviceState *dev)
> +{
> +LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(dev);
> +
> +g_free(s->pch_msi_irq);
> +}
> +
>  static void loongarch_pch_msi_init(Object *obj)
>  {
>  LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(obj);
> @@ -59,12 +82,11 @@ static void loongarch_pch_msi_init(Object *obj)
>  sysbus_init_mmio(sbd, &s->msi_mmio);
>  msi_nonbroken = true;
>  
> -qdev_init_gpio_out(DEVICE(obj), s->pch_msi_irq, PCH_MSI_IRQ_NUM);
> -qdev_init_gpio_in(DEVICE(obj), pch_msi_irq_handler, PCH_MSI_IRQ_NUM);
>  }
>  
>  static Property loongarch_msi_properties[] = {
>  DEFINE_PROP_UINT32("msi_irq_base", LoongArchPCHMSI, irq_base, 0),
> +DEFINE_PROP_UINT32("msi_irq_num",  LoongArchPCHMSI, irq_num, 0),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -72,6 +94,8 @@ static void loongarch_pch_msi_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  
> +dc->realize = loongarch_pch_msi_realize;
> +dc->unrealize = loongarch_pch_msi_unrealize;
>  device_class_set_props(dc, loongarch_msi_properties);
>  }
>  
> diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
> index 958be74fa1..3547d5f711 100644
> --- a/hw/loongarch/virt.c
> +++ b/hw/loongarch/virt.c
> @@ -496,7 +496,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
>  LoongArchCPU *lacpu;
>  CPULoongArchState *env;
>  CPUState *cpu_state;
> -int cpu, pin, i;
> +int cpu, pin, i, start, num;
>  
>  ipi = qdev_new(TYPE_LOONGARCH_IPI);
>  sysbus_realize_and_unref(SYS_BUS_DEVICE(ipi), &error_fatal);
> @@ -576,14 +576,17 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
>  }
>  
>  pch_msi = qdev_new(TYPE_LOONGARCH_PCH_MSI);
> -qdev_prop_set_uint32(pch_msi, "msi_irq_base", PCH_MSI_IRQ_START);
> +start   =  PCH_PIC_IRQ_NUM;
> +num = 256 - start;
> +qdev_prop_set_uint32(pch_msi, "msi_irq_base", start);
> +qdev_prop_set_uint32(pch_msi, "msi_irq_num", num);
>  d = SYS_BUS_DEVICE(pch_msi);
>  sysbus_realize_and_unref(d, &error_fatal);
>  sysbus_mmio_map(d, 0, VIRT_PCH_MSI_ADDR_LOW);
> -for (i = 0; i < PCH_MSI_IRQ_NUM; i++) {
> +for (i = 0; i < num; i++) {
>  /* Connect 192 pch_msi irqs to extioi */
>  qdev_connect_gpio_out(DEVICE(d), i,
> -  qdev_get_gpio_in(extioi, i + PCH_MSI_IRQ_START));
> +  qdev_get_gpio_in(extioi, i + start));
>  }
>  
>  loongarch_devices_init(pch_pic, lams);
> diff --git a/include/hw/intc/loongarch_pch_msi.h b/include/hw/intc/loongarch_pch_msi.h
> index 6d67560dea..c5a52bc327 100644
> --- a/include/hw/intc/loongarch_pch_msi.h
> +++ b/include/hw/intc/loongarch_pch_msi.h
> @@ -15,8 +15,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(LoongArchPCHMSI, LOONGARCH_PCH_MSI)
>  
>  struct LoongArchPCHMSI {
>  SysBusDevice parent_obj;
> -qemu_irq pch_msi_irq[PCH_MSI_IRQ_NUM];
> +qemu_irq *pch_msi_irq;
>  MemoryRegion msi_mmio;
>  /* irq base passed to upper extioi intc */
>

Re: [PATCH 04/15] hw/riscv/boot.c: make riscv_find_firmware() static

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:27 AM Daniel Henrique Barboza
 wrote:
>
> The only caller is riscv_find_and_load_firmware(), which is in the same
> file.
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c | 44 -
>  include/hw/riscv/boot.h |  1 -
>  2 files changed, 22 insertions(+), 23 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH 03/15] hw/riscv/sifive_u: use 'fdt' from MachineState

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:29 AM Daniel Henrique Barboza
 wrote:
>
> The MachineState object provides a 'fdt' pointer that is already being
> used by other RISC-V machines, and it's also used by the 'dumpdtb' QMP
> command.
>
> Remove the 'fdt' pointer from SiFiveUState and use MachineState::fdt
> instead.
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/sifive_u.c | 15 ++-
>  include/hw/riscv/sifive_u.h |  3 ---
>  2 files changed, 6 insertions(+), 12 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH 02/15] hw/riscv/spike: use 'fdt' from MachineState

2022-12-23 Thread Bin Meng
On Thu, Dec 22, 2022 at 2:24 AM Daniel Henrique Barboza
 wrote:
>
> The MachineState object provides a 'fdt' pointer that is already being
> used by other RISC-V machines, and it's also used by the 'dumpdtb' QMP
> command.
>
> Remove the 'fdt' pointer from SpikeState and use MachineState::fdt
> instead.
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/spike.c | 12 +---
>  include/hw/riscv/spike.h |  2 --
>  2 files changed, 5 insertions(+), 9 deletions(-)
>

Reviewed-by: Bin Meng 



[PATCH v3 0/2] hw/arm/virt: Handle HVF in finalize_gic_version()

2022-12-23 Thread Alexander Graf
The finalize_gic_version() function tries to determine which GIC version
the current accelerator / host combination supports. During the initial
HVF porting efforts, I didn't realize that I also had to touch this
function. Then Zenghui brought up this function in reply to my HVF GICv3
enablement patch - and boy, is it a mess.

This patch set cleans up all of the GIC finalization so that we can
easily plug HVF in and also hopefully will have a better time extending
it in the future. As second step, it explicitly adds HVF support and
fails loudly for any unsupported accelerators.

Alex

v1 -> v2:

  - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation
  - Include TCG header for tcg_enabled()

v2 -> v3:

  - Fix comment
  - Flip kvm-enabled logic for host around

Alexander Graf (2):
  hw/arm/virt: Consolidate GIC finalize logic
  hw/arm/virt: Make accels in GIC finalize logic explicit

 hw/arm/virt.c | 200 ++
 include/hw/arm/virt.h |  15 ++--
 2 files changed, 115 insertions(+), 100 deletions(-)

-- 
2.37.1 (Apple Git-137.1)




[PATCH v3 2/2] hw/arm/virt: Make accels in GIC finalize logic explicit

2022-12-23 Thread Alexander Graf
Let's explicitly list out all accelerators that we support when trying to
determine the supported set of GIC versions. KVM was already separate, so
the only missing one is HVF which simply reuses all of TCG's emulation
code and thus has the same compatibility matrix.

Signed-off-by: Alexander Graf 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Cornelia Huck 

---

v1 -> v2:

  - Include TCG header for tcg_enabled()
---
 hw/arm/virt.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6d27f044fe..611f40c1da 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -47,6 +47,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/runstate.h"
 #include "sysemu/tpm.h"
+#include "sysemu/tcg.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
 #include "hw/loader.h"
@@ -1929,7 +1930,7 @@ static void finalize_gic_version(VirtMachineState *vms)
 /* KVM w/o kernel irqchip can only deal with GICv2 */
 gics_supported |= VIRT_GIC_VERSION_2_MASK;
 accel_name = "KVM with kernel-irqchip=off";
-} else {
+} else if (tcg_enabled() || hvf_enabled())  {
 gics_supported |= VIRT_GIC_VERSION_2_MASK;
 if (module_object_class_by_name("arm-gicv3")) {
 gics_supported |= VIRT_GIC_VERSION_3_MASK;
@@ -1938,6 +1939,9 @@ static void finalize_gic_version(VirtMachineState *vms)
 gics_supported |= VIRT_GIC_VERSION_4_MASK;
 }
 }
+} else {
+error_report("Unsupported accelerator, can not determine GIC support");
+exit(1);
 }
 
 /*
-- 
2.37.1 (Apple Git-137.1)




[PATCH v3 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-23 Thread Alexander Graf
Up to now, the finalize_gic_version() code open coded what is essentially
a support bitmap match between host/emulation environment and desired
target GIC type.

This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making
everything a 2 pass process. First, we determine which GIC versions the
current environment is able to support, then we go through a single
state machine to determine which target GIC mode that means for us.

After this patch, the only user-noticeable changes should be consolidated
error messages, as well as TCG -M virt supporting -smp > 8 automatically.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation

v2 -> v3:

  - Fix comment
  - Flip kvm-enabled logic for host around
---
 hw/arm/virt.c | 198 ++
 include/hw/arm/virt.h |  15 ++--
 2 files changed, 112 insertions(+), 101 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea2413a0ba..6d27f044fe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 }
 }
 
+static VirtGICType finalize_gic_version_do(const char *accel_name,
+   VirtGICType gic_version,
+   int gics_supported,
+   unsigned int max_cpus)
+{
+/* Convert host/max/nosel to GIC version number */
+switch (gic_version) {
+case VIRT_GIC_VERSION_HOST:
+if (!kvm_enabled()) {
+error_report("gic-version=host requires KVM");
+exit(1);
+}
+
+/* For KVM, gic-version=host means gic-version=max */
+return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+   gics_supported, max_cpus);
+case VIRT_GIC_VERSION_MAX:
+if (gics_supported & VIRT_GIC_VERSION_4_MASK) {
+gic_version = VIRT_GIC_VERSION_4;
+} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+gic_version = VIRT_GIC_VERSION_3;
+} else {
+gic_version = VIRT_GIC_VERSION_2;
+}
+break;
+case VIRT_GIC_VERSION_NOSEL:
+if ((gics_supported & VIRT_GIC_VERSION_2_MASK) &&
+max_cpus <= GIC_NCPU) {
+gic_version = VIRT_GIC_VERSION_2;
+} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+/*
+ * in case the host does not support v2 emulation or
+ * the end-user requested more than 8 VCPUs we now default
+ * to v3. In any case defaulting to v2 would be broken.
+ */
+gic_version = VIRT_GIC_VERSION_3;
+} else if (max_cpus > GIC_NCPU) {
+error_report("%s only supports GICv2 emulation but more than 8 "
+ "vcpus are requested", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_2:
+case VIRT_GIC_VERSION_3:
+case VIRT_GIC_VERSION_4:
+break;
+}
+
+/* Check chosen version is effectively supported */
+switch (gic_version) {
+case VIRT_GIC_VERSION_2:
+if (!(gics_supported & VIRT_GIC_VERSION_2_MASK)) {
+error_report("%s does not support GICv2 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_3:
+if (!(gics_supported & VIRT_GIC_VERSION_3_MASK)) {
+error_report("%s does not support GICv3 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_4:
+if (!(gics_supported & VIRT_GIC_VERSION_4_MASK)) {
+error_report("%s does not support GICv4 emulation, is virtualization=on?",
+ accel_name);
+exit(1);
+}
+break;
+default:
+error_report("logic error in finalize_gic_version");
+exit(1);
+break;
+}
+
+return gic_version;
+}
+
 /*
  * finalize_gic_version - Determines the final gic_version
  * according to the gic-version property
@@ -1828,118 +1906,46 @@ static void finalize_gic_version(VirtMachineState *vms, int pa_bits)
  */
 static void finalize_gic_version(VirtMachineState *vms)
 {
+const char *accel_name = current_accel_name();
 unsigned int max_cpus = MACHINE(vms)->smp.max_cpus;
+int gics_supported = 0;
 
-if (kvm_enabled()) {
-int probe_bitmap;
+/* Determine which GIC versions the current environment supports */
+if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+int probe_bitmap = kvm_arm_vgic_probe();
 
-if (!kvm_irqchip_in_kernel()) {
-switch (vms->gic_version) {
-case VIRT_GIC_VERSION_HOST:
-

[PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16

2022-12-23 Thread Alexander Graf
While trying to make Windows work with GICv3 emulation, I stumbled over
the fact that it only supports ITT entry sizes that are a power of 2.

While the spec allows arbitrary sizes, in practice hardware will always
expose power-of-2 sizes, so this limitation is not really a problem
in real-world scenarios. However, we only expose a 12-byte ITT entry size,
which makes Windows blue screen on boot.

The easy way to get around that problem is to bump the size to 16. That
is a power of 2, is basically what hardware would expose given the number
of bits we need per entry, and doesn't break any existing scenarios. To
play it safe, this patch set only bumps the size on newer machine types.

Alexander Graf (2):
  hw/intc/arm_gicv3: Make ITT entry size configurable
  hw/intc/arm_gicv3: Bump ITT entry size to 16

 hw/core/machine.c  |  4 +++-
 hw/intc/arm_gicv3_its.c| 13 ++---
 hw/intc/gicv3_internal.h   |  2 +-
 include/hw/intc/arm_gicv3_its_common.h |  1 +
 4 files changed, 15 insertions(+), 5 deletions(-)

-- 
2.37.1 (Apple Git-137.1)




[PATCH 1/2] hw/intc/arm_gicv3: Make ITT entry size configurable

2022-12-23 Thread Alexander Graf
An ITT entry is opaque to the OS. The only thing the OS is told by HW is
the entry size. In theory, that size can be any byte-aligned number; in
practice HW will always use powers of 2 to simplify offset calculation. We
currently expose the size as 12, which is not a power of 2.

To prepare for a future where we expose power of 2 sized entry sizes, let's
make the size itself configurable. We only need to watch out that we don't
have an entry be smaller than the fields we want to access inside. Bigger
is always fine.

Signed-off-by: Alexander Graf 
---
 hw/intc/arm_gicv3_its.c| 14 +++---
 hw/intc/gicv3_internal.h   |  2 +-
 include/hw/intc/arm_gicv3_its_common.h |  1 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 57c79da5c5..e7cabeb46c 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -215,7 +215,7 @@ static bool update_ite(GICv3ITSState *s, uint32_t eventid, const DTEntry *dte,
 {
 AddressSpace *as = &s->gicv3->dma_as;
 MemTxResult res = MEMTX_OK;
-hwaddr iteaddr = dte->ittaddr + eventid * ITS_ITT_ENTRY_SIZE;
+hwaddr iteaddr = dte->ittaddr + eventid * s->itt_entry_size;
 uint64_t itel = 0;
 uint32_t iteh = 0;
 
@@ -253,7 +253,7 @@ static MemTxResult get_ite(GICv3ITSState *s, uint32_t eventid,
 MemTxResult res = MEMTX_OK;
 uint64_t itel;
 uint32_t iteh;
-hwaddr iteaddr = dte->ittaddr + eventid * ITS_ITT_ENTRY_SIZE;
+hwaddr iteaddr = dte->ittaddr + eventid * s->itt_entry_size;
 
 itel = address_space_ldq_le(as, iteaddr, MEMTXATTRS_UNSPECIFIED, &res);
 if (res != MEMTX_OK) {
@@ -1934,6 +1934,12 @@ static void gicv3_arm_its_realize(DeviceState *dev, Error **errp)
 }
 }
 
+if (s->itt_entry_size < MIN_ITS_ITT_ENTRY_SIZE) {
+error_setg(errp, "ITT entry size must be at least %d",
+   MIN_ITS_ITT_ENTRY_SIZE);
+return;
+}
+
 gicv3_add_its(s->gicv3, dev);
 
 gicv3_its_init_mmio(s, &gicv3_its_control_ops, &gicv3_its_translation_ops);
@@ -1941,7 +1947,7 @@ static void gicv3_arm_its_realize(DeviceState *dev, Error **errp)
 /* set the ITS default features supported */
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, PHYSICAL, 1);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, ITT_ENTRY_SIZE,
-  ITS_ITT_ENTRY_SIZE - 1);
+  s->itt_entry_size - 1);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, IDBITS, ITS_IDBITS);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, DEVBITS, ITS_DEVBITS);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, CIL, 1);
@@ -2008,6 +2014,8 @@ static void gicv3_its_post_load(GICv3ITSState *s)
 static Property gicv3_its_props[] = {
 DEFINE_PROP_LINK("parent-gicv3", GICv3ITSState, gicv3, "arm-gicv3",
  GICv3State *),
+DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size,
+  MIN_ITS_ITT_ENTRY_SIZE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 29d5cdc1b6..2aca1ba095 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -450,7 +450,7 @@ FIELD(VINVALL_1, VPEID, 32, 16)
  * the value of that field in memory cannot be relied upon -- older
  * versions of QEMU did not correctly write to that memory.)
  */
-#define ITS_ITT_ENTRY_SIZE0xC
+#define MIN_ITS_ITT_ENTRY_SIZE0xC
 
 FIELD(ITE_L, VALID, 0, 1)
 FIELD(ITE_L, INTTYPE, 1, 1)
diff --git a/include/hw/intc/arm_gicv3_its_common.h b/include/hw/intc/arm_gicv3_its_common.h
index a11a0f6654..e730a5482c 100644
--- a/include/hw/intc/arm_gicv3_its_common.h
+++ b/include/hw/intc/arm_gicv3_its_common.h
@@ -66,6 +66,7 @@ struct GICv3ITSState {
 int dev_fd; /* kvm device fd if backed by kvm vgic support */
 uint64_t gits_translater_gpa;
 bool translater_gpa_known;
+uint8_t itt_entry_size;
 
 /* Registers */
 uint32_t ctlr;
-- 
2.37.1 (Apple Git-137.1)




[PATCH 2/2] hw/intc/arm_gicv3: Bump ITT entry size to 16

2022-12-23 Thread Alexander Graf
Some Operating Systems (like Windows) can only deal with ITT entry sizes
that are a power of 2. While the spec allows arbitrary ITT entry sizes,
in practice all hardware uses powers of 2, because that simplifies offset
calculation and ensures that a power-of-2-sized region can hold a whole
number of entries with no gap at the end.

So let's just bump the entry size to 16. That gives us enough space for
the 12 bytes of data that we want to have in each ITT entry and makes
QEMU look a bit more like real hardware.
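To illustrate the offset-calculation point above, here is a hedged sketch (the helper name is invented, not taken from QEMU): with a power-of-2 entry size, the per-event offset multiply reduces to a shift, and a power-of-2 sized ITT holds a whole number of entries with no trailing gap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of an event's entry inside the ITT.
 * Because entry_size is a power of 2, the multiply becomes a shift --
 * with size 12 the hardware would need a real multiply, and e.g. five
 * 12-byte entries (60 bytes) leave a 4-byte gap in a 64-byte region. */
static uint64_t itt_entry_offset(uint32_t event_id, uint32_t entry_size)
{
    /* precondition: entry_size is a power of 2, e.g. 16 */
    assert(entry_size != 0 && (entry_size & (entry_size - 1)) == 0);
    return (uint64_t)event_id << __builtin_ctz(entry_size);
}
```

The shift-based form is why real implementations round the 12 bytes of payload up to a 16-byte entry.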

Signed-off-by: Alexander Graf 
---
 hw/core/machine.c   | 4 +++-
 hw/intc/arm_gicv3_its.c | 3 +--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index f589b92909..d9a3f01ed9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,7 +40,9 @@
 #include "hw/virtio/virtio-pci.h"
 #include "qom/object_interfaces.h"
 
-GlobalProperty hw_compat_7_2[] = {};
+GlobalProperty hw_compat_7_2[] = {
+{ "arm-gicv3-its", "itt-entry-size", "12" },
+};
 const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 
 GlobalProperty hw_compat_7_1[] = {
diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index e7cabeb46c..6754523321 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -2014,8 +2014,7 @@ static void gicv3_its_post_load(GICv3ITSState *s)
 static Property gicv3_its_props[] = {
 DEFINE_PROP_LINK("parent-gicv3", GICv3ITSState, gicv3, "arm-gicv3",
  GICv3State *),
-DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size,
-  MIN_ITS_ITT_ENTRY_SIZE),
+DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size, 16),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.37.1 (Apple Git-137.1)




Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory

2022-12-23 Thread Chao Peng
On Thu, Dec 22, 2022 at 06:15:24PM +, Sean Christopherson wrote:
> On Wed, Dec 21, 2022, Chao Peng wrote:
> > On Tue, Dec 20, 2022 at 08:33:05AM +, Huang, Kai wrote:
> > > On Tue, 2022-12-20 at 15:22 +0800, Chao Peng wrote:
> > > > On Mon, Dec 19, 2022 at 08:48:10AM +, Huang, Kai wrote:
> > > > > On Mon, 2022-12-19 at 15:53 +0800, Chao Peng wrote:
> > > But for non-restricted-mem case, it is correct for KVM to decrease page's
> > > refcount after setting up mapping in the secondary mmu, otherwise the 
> > > page will
> > > be pinned by KVM for normal VM (since KVM uses GUP to get the page).
> > 
> > That's true. Actually even true for restrictedmem case, most likely we
> > will still need the kvm_release_pfn_clean() for KVM generic code. On one
> > side, other restrictedmem users like pKVM may not require page pinning
> > at all. On the other side, see below.
> > 
> > > 
> > > So what we are expecting is: for KVM if the page comes from restricted 
> > > mem, then
> > > KVM cannot decrease the refcount, otherwise for normal page via GUP KVM 
> > > should.
> 
> No, requiring the user (KVM) to guard against lack of support for page 
> migration
> in restricted mem is a terrible API.  It's totally fine for restricted mem to 
> not
> support page migration until there's a use case, but punting the problem to 
> KVM
> is not acceptable.  Restricted mem itself doesn't yet support page migration,
> e.g. explosions would occur even if KVM wanted to allow migration since there 
> is
> no notification to invalidate existing mappings.
> 
> > I argue that this page pinning (or page migration prevention) is not
> > tied to where the page comes from, instead related to how the page will
> > be used. Whether the page is restrictedmem backed or GUP() backed, once
> > it's used by current version of TDX then the page pinning is needed. So
> > such page migration prevention is really TDX thing, even not KVM generic
> > thing (that's why I think we don't need change the existing logic of
> > kvm_release_pfn_clean()). Wouldn't better to let TDX code (or who
> > requires that) to increase/decrease the refcount when it populates/drops
> > the secure EPT entries? This is exactly what the current TDX code does:
> 
> I agree that whether or not migration is supported should be controllable by 
> the
> user, but I strongly disagree on punting refcount management to KVM (or TDX).
> The whole point of restricted mem is to support technologies like TDX and SNP,
> accommodating their special needs for things like page migration should be
> part of the API, not some footnote in the documentation.

I never doubted that page migration should be part of the restrictedmem
API, but it's not part of the initial implementation, as we all agreed?
Then, before that API is introduced, we need to find a solution to
prevent page migration for TDX. Other than refcount management, do we
have any other workable solution?
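As an illustrative-only sketch of the refcount-based prevention discussed in this thread (all names below are stand-ins, not kernel API): the TDX side takes a reference when it populates a secure EPT entry and drops it when the entry goes away, so the page stays pinned exactly while it is mapped.

```c
#include <assert.h>

/* 'struct page' here is a toy stand-in for the kernel's, and the two
 * helpers are hypothetical names for the TDX-specific (not generic
 * KVM) code paths that populate/drop secure EPT entries. */
struct page { int refcount; };

static void tdx_sept_populate(struct page *p)
{
    p->refcount++;   /* pin: an elevated refcount blocks migration */
}

static void tdx_sept_drop(struct page *p)
{
    p->refcount--;   /* unpin once the secure EPT entry is removed */
}
```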

> 
> It's not difficult to let the user communicate support for page migration, 
> e.g.
> if/when restricted mem gains support, add a hook to restrictedmem_notifier_ops
> to signal support (or lack thereof) for page migration.  NULL == no migration,
> non-NULL == migration allowed.
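A hedged sketch of the hook idea above (the struct layout and field names are invented for illustration; this is not the actual restrictedmem API): a NULL migrate callback signals that the user cannot tolerate page migration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier ops: a backing store would consult the migrate
 * hook before attempting to move a page. NULL == no migration support,
 * non-NULL == migration allowed, as suggested in the mail above. */
struct restrictedmem_notifier_ops {
    void (*invalidate_start)(void *priv, unsigned long start, unsigned long end);
    void (*invalidate_end)(void *priv, unsigned long start, unsigned long end);
    int  (*migrate)(void *priv, unsigned long offset, unsigned long nr_pages);
};

static int migration_allowed(const struct restrictedmem_notifier_ops *ops)
{
    return ops->migrate != NULL;
}
```

A TDX-style user would simply register ops with `.migrate = NULL` until migration support exists.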

I know.

> 
> We know that supporting page migration in TDX and SNP is possible, and we know
> that page migration will require a dedicated API since the backing store can't
> memcpy() the page.  I don't see any reason to ignore that eventuality.

No, I'm not ignoring it. It's just about the short-term page migration
prevention before that dedicated API is introduced.

> 
> But again, unless I'm missing something, that's a future problem because 
> restricted
> mem doesn't yet support page migration regardless of the downstream user.

It's true that page migration support itself is a future problem, but
page migration prevention is not, since TDX pages need to be pinned
before page migration gets supported.

Thanks,
Chao



Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory

2022-12-23 Thread Chao Peng
On Thu, Dec 22, 2022 at 12:37:19AM +, Huang, Kai wrote:
> On Wed, 2022-12-21 at 21:39 +0800, Chao Peng wrote:
> > > On Tue, Dec 20, 2022 at 08:33:05AM +, Huang, Kai wrote:
> > > > > On Tue, 2022-12-20 at 15:22 +0800, Chao Peng wrote:
> > > > > > > On Mon, Dec 19, 2022 at 08:48:10AM +, Huang, Kai wrote:
> > > > > > > > > On Mon, 2022-12-19 at 15:53 +0800, Chao Peng wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > [...]
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > + /*
> > > > > > > > > > > > > > > +  * These pages are currently unmovable so don't 
> > > > > > > > > > > > > > > place them into
> > > > > > > > > > > > > > > movable
> > > > > > > > > > > > > > > +  * pageblocks (e.g. CMA and ZONE_MOVABLE).
> > > > > > > > > > > > > > > +  */
> > > > > > > > > > > > > > > + mapping = memfd->f_mapping;
> > > > > > > > > > > > > > > + mapping_set_unevictable(mapping);
> > > > > > > > > > > > > > > + mapping_set_gfp_mask(mapping,
> > > > > > > > > > > > > > > +  mapping_gfp_mask(mapping) 
> > > > > > > > > > > > > > > & ~__GFP_MOVABLE);
> > > > > > > > > > > > > 
> > > > > > > > > > > > > But, IIUC removing __GFP_MOVABLE flag here only makes 
> > > > > > > > > > > > > page allocation from
> > > > > > > > > > > > > non-
> > > > > > > > > > > > > movable zones, but doesn't necessarily prevent page 
> > > > > > > > > > > > > from being migrated.  My
> > > > > > > > > > > > > first glance is you need to implement either 
> > > > > > > > > > > > > a_ops->migrate_folio() or just
> > > > > > > > > > > > > get_page() after faulting in the page to prevent.
> > > > > > > > > > > 
> > > > > > > > > > > The current api restrictedmem_get_page() already does 
> > > > > > > > > > > this, after the
> > > > > > > > > > > caller calling it, it holds a reference to the page. The 
> > > > > > > > > > > caller then
> > > > > > > > > > > decides when to call put_page() appropriately.
> > > > > > > > > 
> > > > > > > > > I tried to dig some history. Perhaps I am missing something, 
> > > > > > > > > but it seems Kirill
> > > > > > > > > said in v9 that this code doesn't prevent page migration, and 
> > > > > > > > > we need to
> > > > > > > > > increase page refcount in restrictedmem_get_page():
> > > > > > > > > 
> > > > > > > > > https://lore.kernel.org/linux-mm/20221129112139.usp6dqhbih47q...@box.shutemov.name/
> > > > > > > > > 
> > > > > > > > > But looking at this series it seems restrictedmem_get_page() 
> > > > > > > > > in this v10 is
> > > > > > > > > identical to the one in v9 (except v10 uses 'folio' instead 
> > > > > > > > > of 'page')?
> > > > > > > 
> > > > > > > restrictedmem_get_page() increases page refcount several versions 
> > > > > > > ago so
> > > > > > > no change in v10 is needed. You probably missed my reply:
> > > > > > > 
> > > > > > > https://lore.kernel.org/linux-mm/20221129135844.ga902...@chaop.bj.intel.com/
> > > > > 
> > > > > But for non-restricted-mem case, it is correct for KVM to decrease 
> > > > > page's
> > > > > refcount after setting up mapping in the secondary mmu, otherwise the 
> > > > > page will
> > > > > be pinned by KVM for normal VM (since KVM uses GUP to get the page).
> > > 
> > > That's true. Actually even true for restrictedmem case, most likely we
> > > will still need the kvm_release_pfn_clean() for KVM generic code. On one
> > > side, other restrictedmem users like pKVM may not require page pinning
> > > at all. On the other side, see below.
> 
> OK. Agreed.
> 
> > > 
> > > > > 
> > > > > So what we are expecting is: for KVM if the page comes from 
> > > > > restricted mem, then
> > > > > KVM cannot decrease the refcount, otherwise for normal page via GUP 
> > > > > KVM should.
> > > 
> > > I argue that this page pinning (or page migration prevention) is not
> > > tied to where the page comes from, instead related to how the page will
> > > be used. Whether the page is restrictedmem backed or GUP() backed, once
> > > it's used by current version of TDX then the page pinning is needed. So
> > > such page migration prevention is really TDX thing, even not KVM generic
> > > thing (that's why I think we don't need change the existing logic of
> > > kvm_release_pfn_clean()). 
> > > 
> 
> This essentially boils down to who "owns" page migration handling, and sadly,
> page migration is kinda "owned" by the core-kernel, i.e. KVM cannot handle 
> page
> migration by itself -- it's just a passive receiver.

No, I'm not talking about the page migration handling itself; I know
page migration requires coordination from both core-mm and KVM. I'm more
concerned about the page migration prevention here. This is something we
need to address for TDX before page migration is supported.

> 
> For normal pages, page migration is totally done by the core-kernel (i.e. it
> unmaps the page from the VMA, allocates a new page, and uses migrate_page() or
> a_ops->migrate_page() to actually migrate the page).
> 
> In the sense of

[PATCH v2 1/2] hw/intc/loongarch_pch_msi: add irq number property

2022-12-23 Thread Tianrui Zhao
This patch adds an irq number property for the loongarch msi interrupt
controller, and removes the hard-coded irq number macro.

Signed-off-by: Tianrui Zhao 
---
 hw/intc/loongarch_pch_msi.c | 30 ++---
 hw/loongarch/virt.c | 11 +++
 include/hw/intc/loongarch_pch_msi.h |  3 ++-
 include/hw/pci-host/ls7a.h  |  1 -
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/hw/intc/loongarch_pch_msi.c b/hw/intc/loongarch_pch_msi.c
index b36d6d76e4..5b8de43d42 100644
--- a/hw/intc/loongarch_pch_msi.c
+++ b/hw/intc/loongarch_pch_msi.c
@@ -32,7 +32,7 @@ static void loongarch_msi_mem_write(void *opaque, hwaddr addr,
  */
 irq_num = (val & 0xff) - s->irq_base;
 trace_loongarch_msi_set_irq(irq_num);
-assert(irq_num < PCH_MSI_IRQ_NUM);
+assert(irq_num < s->irq_num);
 qemu_set_irq(s->pch_msi_irq[irq_num], 1);
 }
 
@@ -49,6 +49,29 @@ static void pch_msi_irq_handler(void *opaque, int irq, int 
level)
 qemu_set_irq(s->pch_msi_irq[irq], level);
 }
 
+static void loongarch_pch_msi_realize(DeviceState *dev, Error **errp)
+{
+LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(dev);
+
+assert(s->irq_num > 0);
+
+s->pch_msi_irq = g_malloc(sizeof(qemu_irq) * s->irq_num);
+if (!s->pch_msi_irq) {
+error_report("loongarch_pch_msi: fail to alloc memory");
+exit(1);
+}
+
+qdev_init_gpio_out(dev, s->pch_msi_irq, s->irq_num);
+qdev_init_gpio_in(dev, pch_msi_irq_handler, s->irq_num);
+}
+
+static void loongarch_pch_msi_unrealize(DeviceState *dev)
+{
+LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(dev);
+
+g_free(s->pch_msi_irq);
+}
+
 static void loongarch_pch_msi_init(Object *obj)
 {
 LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(obj);
@@ -59,12 +82,11 @@ static void loongarch_pch_msi_init(Object *obj)
 sysbus_init_mmio(sbd, &s->msi_mmio);
 msi_nonbroken = true;
 
-qdev_init_gpio_out(DEVICE(obj), s->pch_msi_irq, PCH_MSI_IRQ_NUM);
-qdev_init_gpio_in(DEVICE(obj), pch_msi_irq_handler, PCH_MSI_IRQ_NUM);
 }
 
 static Property loongarch_msi_properties[] = {
 DEFINE_PROP_UINT32("msi_irq_base", LoongArchPCHMSI, irq_base, 0),
+DEFINE_PROP_UINT32("msi_irq_num",  LoongArchPCHMSI, irq_num, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -72,6 +94,8 @@ static void loongarch_pch_msi_class_init(ObjectClass *klass, 
void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+dc->realize = loongarch_pch_msi_realize;
+dc->unrealize = loongarch_pch_msi_unrealize;
 device_class_set_props(dc, loongarch_msi_properties);
 }
 
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 958be74fa1..3547d5f711 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -496,7 +496,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 LoongArchCPU *lacpu;
 CPULoongArchState *env;
 CPUState *cpu_state;
-int cpu, pin, i;
+int cpu, pin, i, start, num;
 
 ipi = qdev_new(TYPE_LOONGARCH_IPI);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(ipi), &error_fatal);
@@ -576,14 +576,17 @@ static void loongarch_irq_init(LoongArchMachineState 
*lams)
 }
 
 pch_msi = qdev_new(TYPE_LOONGARCH_PCH_MSI);
-qdev_prop_set_uint32(pch_msi, "msi_irq_base", PCH_MSI_IRQ_START);
+start   =  PCH_PIC_IRQ_NUM;
+num = 256 - start;
+qdev_prop_set_uint32(pch_msi, "msi_irq_base", start);
+qdev_prop_set_uint32(pch_msi, "msi_irq_num", num);
 d = SYS_BUS_DEVICE(pch_msi);
 sysbus_realize_and_unref(d, &error_fatal);
 sysbus_mmio_map(d, 0, VIRT_PCH_MSI_ADDR_LOW);
-for (i = 0; i < PCH_MSI_IRQ_NUM; i++) {
+for (i = 0; i < num; i++) {
 /* Connect 192 pch_msi irqs to extioi */
 qdev_connect_gpio_out(DEVICE(d), i,
-  qdev_get_gpio_in(extioi, i + PCH_MSI_IRQ_START));
+  qdev_get_gpio_in(extioi, i + start));
 }
 
 loongarch_devices_init(pch_pic, lams);
diff --git a/include/hw/intc/loongarch_pch_msi.h 
b/include/hw/intc/loongarch_pch_msi.h
index 6d67560dea..c5a52bc327 100644
--- a/include/hw/intc/loongarch_pch_msi.h
+++ b/include/hw/intc/loongarch_pch_msi.h
@@ -15,8 +15,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(LoongArchPCHMSI, LOONGARCH_PCH_MSI)
 
 struct LoongArchPCHMSI {
 SysBusDevice parent_obj;
-qemu_irq pch_msi_irq[PCH_MSI_IRQ_NUM];
+qemu_irq *pch_msi_irq;
 MemoryRegion msi_mmio;
 /* irq base passed to upper extioi intc */
 unsigned int irq_base;
+unsigned int irq_num;
 };
diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index df7fa55a30..6443327bd7 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -34,7 +34,6 @@
  */
 #define PCH_PIC_IRQ_OFFSET   64
 #define VIRT_DEVICE_IRQS 16
-#define VIRT_PCI_IRQS48
 #define VIRT_UART_IRQ(PCH_PIC_IRQ_OFFSET + 2)
 #define VIRT_UART_BASE   0x1fe001e0
 #define VIRT_UART_SIZE   0X100
-- 
2.31.1




[PATCH v2 0/2] Add irq number property for loongarch pch interrupt controller

2022-12-23 Thread Tianrui Zhao
This series adds an irq number property for the loongarch pch_msi
and pch_pic interrupt controllers.

Changes for v2:
(1) Free pch_msi_irq array in pch_msi_unrealize().

Changes for v1:
(1) Add irq number property for loongarch_pch_msi.
(2) Add irq number property for loongarch_pch_pic.

Tianrui Zhao (2):
  hw/intc/loongarch_pch_msi: add irq number property
  hw/intc/loongarch_pch_pic: add irq number property

 hw/intc/loongarch_pch_msi.c | 30 ++---
 hw/intc/loongarch_pch_pic.c | 29 
 hw/loongarch/virt.c | 19 +++---
 include/hw/intc/loongarch_pch_msi.h |  3 ++-
 include/hw/intc/loongarch_pch_pic.h |  5 ++---
 include/hw/pci-host/ls7a.h  |  1 -
 6 files changed, 68 insertions(+), 19 deletions(-)

-- 
2.31.1




[PATCH v2 2/2] hw/intc/loongarch_pch_pic: add irq number property

2022-12-23 Thread Tianrui Zhao
According to the loongarch 7A1000 manual, the supported irq number is
reported in the PCH_PIC_INT_ID_HI register. This patch adds an irq
number property for loongarch_pch_pic, so that the virt machine can set
a different irq number when the pch_pic intc is added.
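For illustration, the register value this patch computes on a PCH_PIC_INT_ID_HI read can be sketched as below, following the bit layout the patch's own comment documents (bits 0-15 version, bits 16-31 supported irq count minus one). The PCH_PIC_INT_ID_VER value of 0x1 is an assumption here, not taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed version value for the sketch; the real constant lives in the
 * loongarch_pch_pic headers. */
#define PCH_PIC_INT_ID_VER 0x1u

/* bits  0-15: pch irqchip version
 * bits 16-31: number of irqs supported, minus one */
static uint32_t pch_pic_int_id_hi(uint32_t irq_num)
{
    return PCH_PIC_INT_ID_VER + ((irq_num - 1) << 16);
}
```

With the default 64 irqs this yields 0x3f0001, i.e. "63" in the high half and version 1 in the low half.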

Signed-off-by: Tianrui Zhao 
---
 hw/intc/loongarch_pch_pic.c | 29 +
 hw/loongarch/virt.c | 10 ++
 include/hw/intc/loongarch_pch_pic.h |  5 ++---
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/hw/intc/loongarch_pch_pic.c b/hw/intc/loongarch_pch_pic.c
index 3380b09807..26f36501b4 100644
--- a/hw/intc/loongarch_pch_pic.c
+++ b/hw/intc/loongarch_pch_pic.c
@@ -10,6 +10,7 @@
 #include "hw/loongarch/virt.h"
 #include "hw/irq.h"
 #include "hw/intc/loongarch_pch_pic.h"
+#include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
 #include "trace.h"
 
@@ -40,7 +41,7 @@ static void pch_pic_irq_handler(void *opaque, int irq, int 
level)
 LoongArchPCHPIC *s = LOONGARCH_PCH_PIC(opaque);
 uint64_t mask = 1ULL << irq;
 
-assert(irq < PCH_PIC_IRQ_NUM);
+assert(irq < s->irq_num);
 trace_loongarch_pch_pic_irq_handler(irq, level);
 
 if (s->intedge & mask) {
@@ -78,7 +79,12 @@ static uint64_t loongarch_pch_pic_low_readw(void *opaque, 
hwaddr addr,
 val = PCH_PIC_INT_ID_VAL;
 break;
 case PCH_PIC_INT_ID_HI:
-val = PCH_PIC_INT_ID_NUM;
+/*
+ * With 7A1000 manual
+ *   bit  0-15 pch irqchip version
+ *   bit 16-31 irq number supported with pch irqchip
+ */
+val = PCH_PIC_INT_ID_VER + ((s->irq_num - 1) << 16);
 break;
 case PCH_PIC_INT_MASK_LO:
 val = (uint32_t)s->int_mask;
@@ -365,6 +371,16 @@ static void loongarch_pch_pic_reset(DeviceState *d)
 s->int_polarity = 0x0;
 }
 
+static void loongarch_pch_pic_realize(DeviceState *dev, Error **errp)
+{
+LoongArchPCHPIC *s = LOONGARCH_PCH_PIC(dev);
+
+assert(s->irq_num > 0 && (s->irq_num <= 64));
+
+qdev_init_gpio_out(dev, s->parent_irq, s->irq_num);
+qdev_init_gpio_in(dev, pch_pic_irq_handler, s->irq_num);
+}
+
 static void loongarch_pch_pic_init(Object *obj)
 {
 LoongArchPCHPIC *s = LOONGARCH_PCH_PIC(obj);
@@ -382,10 +398,13 @@ static void loongarch_pch_pic_init(Object *obj)
 sysbus_init_mmio(sbd, &s->iomem8);
 sysbus_init_mmio(sbd, &s->iomem32_high);
 
-qdev_init_gpio_out(DEVICE(obj), s->parent_irq, PCH_PIC_IRQ_NUM);
-qdev_init_gpio_in(DEVICE(obj), pch_pic_irq_handler, PCH_PIC_IRQ_NUM);
 }
 
+static Property loongarch_pch_pic_properties[] = {
+DEFINE_PROP_UINT32("pch_pic_irq_num",  LoongArchPCHPIC, irq_num, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static const VMStateDescription vmstate_loongarch_pch_pic = {
 .name = TYPE_LOONGARCH_PCH_PIC,
 .version_id = 1,
@@ -411,8 +430,10 @@ static void loongarch_pch_pic_class_init(ObjectClass 
*klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+dc->realize = loongarch_pch_pic_realize;
 dc->reset = loongarch_pch_pic_reset;
 dc->vmsd = &vmstate_loongarch_pch_pic;
+device_class_set_props(dc, loongarch_pch_pic_properties);
 }
 
 static const TypeInfo loongarch_pch_pic_info = {
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 3547d5f711..761eb81c65 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -559,6 +559,8 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 }
 
 pch_pic = qdev_new(TYPE_LOONGARCH_PCH_PIC);
+num = PCH_PIC_IRQ_NUM;
+qdev_prop_set_uint32(pch_pic, "pch_pic_irq_num", num);
 d = SYS_BUS_DEVICE(pch_pic);
 sysbus_realize_and_unref(d, &error_fatal);
 memory_region_add_subregion(get_system_memory(), VIRT_IOAPIC_REG_BASE,
@@ -570,13 +572,13 @@ static void loongarch_irq_init(LoongArchMachineState 
*lams)
 VIRT_IOAPIC_REG_BASE + PCH_PIC_INT_STATUS_LO,
 sysbus_mmio_get_region(d, 2));
 
-/* Connect 64 pch_pic irqs to extioi */
-for (int i = 0; i < PCH_PIC_IRQ_NUM; i++) {
+/* Connect pch_pic irqs to extioi */
+for (int i = 0; i < num; i++) {
 qdev_connect_gpio_out(DEVICE(d), i, qdev_get_gpio_in(extioi, i));
 }
 
 pch_msi = qdev_new(TYPE_LOONGARCH_PCH_MSI);
-start   =  PCH_PIC_IRQ_NUM;
+start   =  num;
 num = 256 - start;
 qdev_prop_set_uint32(pch_msi, "msi_irq_base", start);
 qdev_prop_set_uint32(pch_msi, "msi_irq_num", num);
@@ -584,7 +586,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 sysbus_realize_and_unref(d, &error_fatal);
 sysbus_mmio_map(d, 0, VIRT_PCH_MSI_ADDR_LOW);
 for (i = 0; i < num; i++) {
-/* Connect 192 pch_msi irqs to extioi */
+/* Connect pch_msi irqs to extioi */
 qdev_connect_gpio_out(DEVICE(d), i,
   qdev_get_gpio_in(extioi, i + start));
 }
diff --git a/include/hw/intc/loongarch_pch_pic.h 
b/include/hw/intc/loongarch_pch_pic.h
index 2d4aa9ed6f..ba3a47fa88 100

Re: [PATCH 3/3] intel-iommu: build iova tree during IOMMU translation

2022-12-23 Thread Jason Wang
On Tue, Dec 6, 2022 at 9:58 PM Peter Xu  wrote:
>
> On Tue, Dec 06, 2022 at 11:18:03AM +0800, Jason Wang wrote:
> > On Tue, Dec 6, 2022 at 7:19 AM Peter Xu  wrote:
> > >
> > > Jason,
> > >
> > > On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> > > > I'm fine to go without iova-tree. Would you mind to post patches for
> > > > fix? I can test and include it in this series then.
> > >
> > > One sample patch attached, only compile tested.
> >
> > I don't see any direct connection between the attached patch and the
> > intel-iommu?
>
> Sorry!  Wrong tree dumped...  Trying again.

The HWADDR breaks memory_region_notify_iommu_one():

qemu-system-x86_64: ../softmmu/memory.c:1991:
memory_region_notify_iommu_one: Assertion `entry->iova >=
notifier->start && entry_end <= notifier->end' failed.

I wonder if we need either:

1) remove the assert

or

2) introduce a new memory_region_notify_unmap_all() to unmap from
notifier->start to notifier->end.
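A hedged sketch of option 2: synthesize a single unmap event spanning the notifier's whole registered range, so the assertion's bounds check holds trivially. The structs below are simplified stand-ins for QEMU's IOMMUNotifier/IOMMUTLBEntry (only the fields needed here), and real QEMU additionally constrains how a range is expressed via addr_mask.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for the QEMU types involved. */
typedef struct {
    uint64_t start, end;   /* inclusive range the notifier registered */
} IOMMUNotifier;

typedef struct {
    uint64_t iova;
    uint64_t addr_mask;    /* region length minus one */
} IOMMUTLBEntry;

/* Build one UNMAP entry covering the notifier's entire range, so that
 * 'entry->iova >= notifier->start && entry_end <= notifier->end'
 * is satisfied by construction. */
static IOMMUTLBEntry make_unmap_all(const IOMMUNotifier *n)
{
    IOMMUTLBEntry e = { .iova = n->start, .addr_mask = n->end - n->start };
    return e;
}
```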

Thanks

>
> >
> > >
> > > I can also work on this but I'll be slow in making progress, so I'll add 
> > > it
> > > into my todo.  If you can help to fix this issue it'll be more than great.
> >
> > Ok, let me try but it might take some time :)
>
> Sure. :)
>
> I'll also add it into my todo (and I think the other similar one has been
> there for a while.. :( ).
>
> --
> Peter Xu