Re: [PATCH v3 02/13] tcg/s390x: Remove TCG_REG_TB

2022-12-06 Thread Thomas Huth

On 06/12/2022 23.22, Richard Henderson wrote:
> On 12/6/22 13:29, Ilya Leoshkevich wrote:
> > On Thu, Dec 01, 2022 at 10:51:49PM -0800, Richard Henderson wrote:
> > > This reverts 829e1376d940 ("tcg/s390: Introduce TCG_REG_TB"), and
> > > several follow-up patches.  The primary motivation is to reduce the
> > > less-tested code paths, pre-z10.  Secondarily, this allows the
> > > unconditional use of TCG_TARGET_HAS_direct_jump, which might be more
> > > important for performance than any slight increase in code size.
> > >
> > > Signed-off-by: Richard Henderson 
> > > ---
> > >   tcg/s390x/tcg-target.h |   2 +-
> > >   tcg/s390x/tcg-target.c.inc | 176 +
> > >   2 files changed, 23 insertions(+), 155 deletions(-)
> >
> > Reviewed-by: Ilya Leoshkevich 
> >
> > I have a few questions/ideas for the future below.
> >
> > > diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> > > index 22d70d431b..645f522058 100644
> > > --- a/tcg/s390x/tcg-target.h
> > > +++ b/tcg/s390x/tcg-target.h
> > > @@ -103,7 +103,7 @@ extern uint64_t s390_facilities[3];
> > >   #define TCG_TARGET_HAS_mulsh_i32  0
> > >   #define TCG_TARGET_HAS_extrl_i64_i32  0
> > >   #define TCG_TARGET_HAS_extrh_i64_i32  0
> > > -#define TCG_TARGET_HAS_direct_jump    HAVE_FACILITY(GEN_INST_EXT)
> > > +#define TCG_TARGET_HAS_direct_jump    1
> >
> > This change doesn't seem to affect that, but what is the minimum
> > supported s390x qemu host? z900?
>
> Possibly z990, if I'm reading the gcc processor_flags_table[] correctly;
> long-displacement-facility is definitely a minimum.
>
> We probably should revisit what the minimum for TCG should be, assert those
> features at startup, and drop the corresponding runtime tests.


If we consider the official IBM support statement:

https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Mainframe%20Life%20Cycle%20History%20V2.10%20-%20Sept%2013%202022_1.pdf

... that would mean that the z10 and all older machines are not supported 
anymore.


 Thomas
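
To make the "assert those features at startup" idea concrete, here is a minimal sketch of such a check. It is only an illustration, not the QEMU implementation: the helper names (have_facility, baseline_ok) and the FAC_* enum are hypothetical, the bit numbers are assumed to follow the STFLE numbering used by tcg-target.h, and the list of required facilities is an assumption because the thread has not settled on a minimum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed STFLE facility bit numbers (hypothetical names for this sketch). */
enum { FAC_LONG_DISP = 18, FAC_EXT_IMM = 21, FAC_GEN_INST_EXT = 34 };

/* Facility bits are counted from the most significant bit of word 0. */
static bool have_facility(const uint64_t fac[3], unsigned bit)
{
    return (fac[bit / 64] >> (63 - bit % 64)) & 1;
}

/* Returns true when every facility in the (assumed) baseline is present. */
static bool baseline_ok(const uint64_t fac[3])
{
    static const unsigned required[] = {
        FAC_LONG_DISP, FAC_EXT_IMM, FAC_GEN_INST_EXT
    };

    for (unsigned i = 0; i < sizeof(required) / sizeof(required[0]); i++) {
        if (!have_facility(fac, required[i])) {
            return false;
        }
    }
    return true;
}
```

A startup path could call baseline_ok() once on the facility list and exit with an error if it fails; after that, per-instruction runtime tests for those facilities would no longer be needed.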




Re: [PATCH] target/riscv: Set pc_succ_insn for !rvc illegal insn

2022-12-06 Thread Philippe Mathieu-Daudé

On 3/12/22 18:57, Richard Henderson wrote:

Failure to set pc_succ_insn may result in a TB covering zero bytes,
which triggers an assert within the code generator.

Cc: qemu-sta...@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1224
Signed-off-by: Richard Henderson 
---
  target/riscv/translate.c  | 12 
  tests/tcg/Makefile.target |  2 ++
  tests/tcg/riscv64/Makefile.target |  5 +
  tests/tcg/riscv64/test-noc.S  | 32 +++
  4 files changed, 43 insertions(+), 8 deletions(-)
  create mode 100644 tests/tcg/riscv64/test-noc.S

Reviewed-by: Philippe Mathieu-Daudé 
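
For readers following along, the failure mode can be modelled in a few lines. This is a simplified sketch and not the code from target/riscv/translate.c: CtxSketch and decode16_sketch are hypothetical stand-ins for DisasContext and the 16-bit decode path, and the insn_len helper is assumed to use the standard rule that a 16-bit encoding has low two bits other than 0b11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16-bit encodings have low two bits != 0b11; everything else is 32-bit here. */
static int insn_len(uint16_t first_half)
{
    return (first_half & 0x3) < 0x3 ? 2 : 4;
}

/* Hypothetical, reduced stand-in for the translator context. */
typedef struct {
    uint64_t pc_next;       /* address of the insn being translated */
    uint64_t pc_succ_insn;  /* address of the following insn */
} CtxSketch;

/*
 * Mirrors the patched flow: record the successor PC first, then decode
 * only if compressed insns are enabled. Returns false when the insn
 * must be treated as illegal.
 */
static bool decode16_sketch(CtxSketch *ctx, bool has_rvc)
{
    ctx->pc_succ_insn = ctx->pc_next + 2;
    return has_rvc;  /* stand-in for has_ext(ctx, RVC) && decode_insn16(...) */
}
```

With the successor PC set before the illegal-instruction path is taken, the translated block always spans at least two bytes, even when the C extension is disabled.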




Re: [PATCH] target/riscv: Set pc_succ_insn for !rvc illegal insn

2022-12-06 Thread Alistair Francis
On Sun, Dec 4, 2022 at 3:58 AM Richard Henderson
 wrote:
>
> Failure to set pc_succ_insn may result in a TB covering zero bytes,
> which triggers an assert within the code generator.
>
> Cc: qemu-sta...@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1224
> Signed-off-by: Richard Henderson 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/translate.c  | 12 
>  tests/tcg/Makefile.target |  2 ++
>  tests/tcg/riscv64/Makefile.target |  5 +
>  tests/tcg/riscv64/test-noc.S  | 32 +++
>  4 files changed, 43 insertions(+), 8 deletions(-)
>  create mode 100644 tests/tcg/riscv64/test-noc.S
>
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index db123da5ec..1ed4bb5ec3 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -1064,14 +1064,10 @@ static void decode_opc(CPURISCVState *env, 
> DisasContext *ctx, uint16_t opcode)
>
>  /* Check for compressed insn */
>  if (insn_len(opcode) == 2) {
> -if (!has_ext(ctx, RVC)) {
> -gen_exception_illegal(ctx);
> -} else {
> -ctx->opcode = opcode;
> -ctx->pc_succ_insn = ctx->base.pc_next + 2;
> -if (decode_insn16(ctx, opcode)) {
> -return;
> -}
> +ctx->opcode = opcode;
> +ctx->pc_succ_insn = ctx->base.pc_next + 2;
> +if (has_ext(ctx, RVC) && decode_insn16(ctx, opcode)) {
> +return;
>  }
>  } else {
>  uint32_t opcode32 = opcode;
> diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
> index 75257f2b29..14bc013181 100644
> --- a/tests/tcg/Makefile.target
> +++ b/tests/tcg/Makefile.target
> @@ -117,6 +117,8 @@ endif
>
>  %: %.c
> $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
> +%: %.S
> +   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
>  else
>  # For softmmu targets we include a different Makefile fragement as the
>  # build options for bare programs are usually pretty different. They
> diff --git a/tests/tcg/riscv64/Makefile.target 
> b/tests/tcg/riscv64/Makefile.target
> index b5b89dfb0e..9973ba3b5f 100644
> --- a/tests/tcg/riscv64/Makefile.target
> +++ b/tests/tcg/riscv64/Makefile.target
> @@ -4,3 +4,8 @@
>  VPATH += $(SRC_PATH)/tests/tcg/riscv64
>  TESTS += test-div
>  TESTS += noexec
> +
> +# Disable compressed instructions for test-noc
> +TESTS += test-noc
> +test-noc: LDFLAGS = -nostdlib -static
> +run-test-noc: QEMU_OPTS += -cpu rv64,c=false
> diff --git a/tests/tcg/riscv64/test-noc.S b/tests/tcg/riscv64/test-noc.S
> new file mode 100644
> index 00..e29d60c8b3
> --- /dev/null
> +++ b/tests/tcg/riscv64/test-noc.S
> @@ -0,0 +1,32 @@
> +#include 
> +
> +   .text
> +   .globl _start
> +_start:
> +   .option norvc
> +   li  a0, 4   /* SIGILL */
> +   la  a1, sa
> +   li  a2, 0
> +   li  a3, 8
> +   li  a7, __NR_rt_sigaction
> +   scall
> +
> +   .option rvc
> +   li  a0, 1
> +   j   exit
> +   .option norvc
> +
> +pass:
> +   li  a0, 0
> +exit:
> +   li  a7, __NR_exit
> +   scall
> +
> +   .data
> +   /* struct kernel_sigaction sa = { .sa_handler = pass }; */
> +   .type   sa, @object
> +   .size   sa, 32
> +sa:
> +   .dword  pass
> +   .zero   24
> +
> --
> 2.34.1
>
>



Re: [PATCH v10 7/9] KVM: Update lpage info when private/shared memory are mixed

2022-12-06 Thread Isaku Yamahata
On Tue, Dec 06, 2022 at 08:02:24PM +0800,
Chao Peng  wrote:

> On Mon, Dec 05, 2022 at 02:49:59PM -0800, Isaku Yamahata wrote:
> > On Fri, Dec 02, 2022 at 02:13:45PM +0800,
> > Chao Peng  wrote:
> > 
> > > > A large page with mixed private/shared subpages can't be mapped as large
> > > > page since its sub private/shared pages are from different memory
> > > > backends and may also be treated by architecture differently. When
> > > > private/shared memory are mixed in a large page, the current lpage_info
> > > > is not sufficient to decide whether the page can be mapped as large page
> > > > or not and additional private/shared mixed information is needed.
> > > > 
> > > > Tracking this 'mixed' information with the current 'count' like
> > > > disallow_lpage is a bit challenging, so reserve a bit in 'disallow_lpage'
> > > > to indicate a large page has mixed private/shared subpages and update
> > > > this 'mixed' bit whenever the memory attribute is changed between
> > > > private and shared.
> > > 
> > > Signed-off-by: Chao Peng 
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |   8 ++
> > >  arch/x86/kvm/mmu/mmu.c  | 134 +++-
> > >  arch/x86/kvm/x86.c  |   2 +
> > >  include/linux/kvm_host.h|  19 +
> > >  virt/kvm/kvm_main.c |   9 ++-
> > >  5 files changed, 169 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > b/arch/x86/include/asm/kvm_host.h
> > > index 283cbb83d6ae..7772ab37ac89 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -38,6 +38,7 @@
> > >  #include 
> > >  
> > >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
> > > +#define __KVM_HAVE_ARCH_SET_MEMORY_ATTRIBUTES
> > >  
> > >  #define KVM_MAX_VCPUS 1024
> > >  
> > > @@ -1011,6 +1012,13 @@ struct kvm_vcpu_arch {
> > >  #endif
> > >  };
> > >  
> > > +/*
> > > + * Use a bit in disallow_lpage to indicate private/shared pages mixed at 
> > > the
> > > + * level. The remaining bits are used as a reference count.
> > > + */
> > > +#define KVM_LPAGE_PRIVATE_SHARED_MIXED   (1U << 31)
> > > +#define KVM_LPAGE_COUNT_MAX  ((1U << 31) - 1)
> > > +
> > >  struct kvm_lpage_info {
> > >   int disallow_lpage;
> > >  };
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index e2c70b5afa3e..2190fd8c95c0 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -763,11 +763,16 @@ static void update_gfn_disallow_lpage_count(const 
> > > struct kvm_memory_slot *slot,
> > >  {
> > >   struct kvm_lpage_info *linfo;
> > >   int i;
> > > + int disallow_count;
> > >  
> > >   for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
> > >   linfo = lpage_info_slot(gfn, slot, i);
> > > +
> > > + disallow_count = linfo->disallow_lpage & KVM_LPAGE_COUNT_MAX;
> > > + WARN_ON(disallow_count + count < 0 ||
> > > + disallow_count > KVM_LPAGE_COUNT_MAX - count);
> > > +
> > >   linfo->disallow_lpage += count;
> > > - WARN_ON(linfo->disallow_lpage < 0);
> > >   }
> > >  }
> > >  
> > > @@ -6986,3 +6991,130 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
> > >   if (kvm->arch.nx_huge_page_recovery_thread)
> > >   kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
> > >  }
> > > +
> > > +static bool linfo_is_mixed(struct kvm_lpage_info *linfo)
> > > +{
> > > + return linfo->disallow_lpage & KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > > +}
> > > +
> > > +static void linfo_set_mixed(gfn_t gfn, struct kvm_memory_slot *slot,
> > > + int level, bool mixed)
> > > +{
> > > + struct kvm_lpage_info *linfo = lpage_info_slot(gfn, slot, level);
> > > +
> > > + if (mixed)
> > > + linfo->disallow_lpage |= KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > > + else
> > > + linfo->disallow_lpage &= ~KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > > +}
> > > +
> > > +static bool is_expected_attr_entry(void *entry, unsigned long 
> > > expected_attrs)
> > > +{
> > > + bool expect_private = expected_attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> > > +
> > > + if (xa_to_value(entry) & KVM_MEMORY_ATTRIBUTE_PRIVATE) {
> > > + if (!expect_private)
> > > + return false;
> > > + } else if (expect_private)
> > > + return false;
> > > +
> > > + return true;
> > > +}
> > > +
> > > +static bool mem_attrs_mixed_2m(struct kvm *kvm, unsigned long attrs,
> > > +gfn_t start, gfn_t end)
> > > +{
> > > + XA_STATE(xas, >mem_attr_array, start);
> > > + gfn_t gfn = start;
> > > + void *entry;
> > > + bool mixed = false;
> > > +
> > > + rcu_read_lock();
> > > + entry = xas_load();
> > > + while (gfn < end) {
> > > + if (xas_retry(, entry))
> > > + continue;
> > > +
> > > + KVM_BUG_ON(gfn != xas.xa_index, kvm);
> > > +
> > > + if (!is_expected_attr_entry(entry, attrs)) {
> > > + mixed = true;
> > > + break;
> > > + }

Re: [PATCH v10 5/9] KVM: Use gfn instead of hva for mmu_notifier_retry

2022-12-06 Thread Isaku Yamahata
On Tue, Dec 06, 2022 at 07:56:23PM +0800,
Chao Peng  wrote:

> > > -   if (unlikely(kvm->mmu_invalidate_in_progress) &&
> > > -   hva >= kvm->mmu_invalidate_range_start &&
> > > -   hva < kvm->mmu_invalidate_range_end)
> > > -   return 1;
> > > +   if (unlikely(kvm->mmu_invalidate_in_progress)) {
> > > +   /*
> > > +* Dropping mmu_lock after bumping 
> > > mmu_invalidate_in_progress
> > > +* but before updating the range is a KVM bug.
> > > +*/
> > > +   if (WARN_ON_ONCE(kvm->mmu_invalidate_range_start == 
> > > INVALID_GPA ||
> > > +kvm->mmu_invalidate_range_end == 
> > > INVALID_GPA))
> > 
> > INVALID_GPA is an x86-specific define in
> > arch/x86/include/asm/kvm_host.h, so this doesn't build on other
> > architectures. The obvious fix is to move it to
> > include/linux/kvm_host.h.
> 
> Hmm, INVALID_GPA is defined as ZERO for x86, not 100% confident this is
> correct choice for other architectures, but after search it has not been
> used for other architectures, so should be safe to make it common.

INVALID_GPA is defined as all bits set to 1, not zero.  Please note the "~" (tilde).

#define INVALID_GPA (~(gpa_t)0)
-- 
Isaku Yamahata 
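
A quick way to confirm the point about the tilde is to evaluate the macro. The sketch below assumes gpa_t is a 64-bit unsigned type; the helper name invalid_gpa_is_all_ones is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gpa_t;          /* assumption: 64-bit guest physical address */
#define INVALID_GPA (~(gpa_t)0)  /* definition as quoted above */

/* Returns 1 when INVALID_GPA has every bit set, i.e. it is not zero. */
static int invalid_gpa_is_all_ones(void)
{
    return INVALID_GPA == UINT64_MAX && INVALID_GPA != 0;
}
```

Because the cast to gpa_t happens before the complement, the result is all ones at the full width of gpa_t rather than at the width of int.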



How to best make include/hw/pci/pcie_sriov.h self-contained

2022-12-06 Thread Markus Armbruster
pcie_sriov.h needs PCI_NUM_REGIONS from pci.h, but doesn't include it.
pci.h must be included before pcie_sriov.h or else compile fails.

Adding #include "pci/pci.h" to pcie_sriov would be wrong, because it
would close an inclusion loop: pci.h includes pcie.h (for
PCIExpressDevice) includes pcie_sriov.h (for PCIESriovPF) includes pci.h
(for PCI_NUM_REGIONS).

The obvious solution is to move PCI_NUM_REGIONS from pci.h to somewhere
pcie_sriov.h can include without creating a loop.

We already have a few headers that don't include anything: pci_ids.h,
pci_regs.h (includes include/standard-headers/linux/pci_regs.h, which
doesn't count), pcie_regs.h.  Moving PCI_NUM_REGIONS to one of these
would work, but it doesn't feel right.

We could create a new one, say pci_defs.h.  Just for PCI_NUM_REGIONS
feels silly.  So, what else should move there?

Any other ideas?
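
To illustrate the pci_defs.h option, here is a single-file sketch of what such a leaf header and its user might look like. Everything in it is an assumption for discussion: the guard name, the SriovPFSketch struct and the sriov_sketch_num_bars helper are hypothetical, and the value 7 is what I believe pci.h currently uses.

```c
#include <assert.h>

/* --- hypothetical include/hw/pci/pci_defs.h: includes nothing itself --- */
#ifndef HW_PCI_PCI_DEFS_H
#define HW_PCI_PCI_DEFS_H

#define PCI_NUM_REGIONS 7  /* assumed value, moved here from pci.h */

#endif /* HW_PCI_PCI_DEFS_H */

/* --- pcie_sriov.h would then include pci_defs.h instead of pci.h --- */
typedef struct SriovPFSketch {
    unsigned char vf_bar_type[PCI_NUM_REGIONS];  /* one entry per region */
} SriovPFSketch;

/* Number of regions the sketch structure has room for. */
static unsigned sriov_sketch_num_bars(void)
{
    return (unsigned)sizeof(((SriovPFSketch *)0)->vf_bar_type);
}
```

With this layout pci.h can keep including pcie.h, and pcie_sriov.h no longer needs anything from pci.h, so the loop is gone.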

In case you wonder why I bother you with this...

Back in 2016, we discussed[1] rules for headers, and these were
generally liked:

1. Have a carefully curated header that's included everywhere first.  We
   got that already thanks to Peter: osdep.h.

2. Headers should normally include everything they need beyond osdep.h.
   If exceptions are needed for some reason, they must be documented in
   the header.  If all that's needed from a header is typedefs, put
   those into qemu/typedefs.h instead of including the header.

3. Cyclic inclusion is forbidden.

I'm working on patches to get include/ closer to obeying 2.

[1] Message-ID: <87h9g8j57d@blackfin.pond.sub.org>
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html




Re: [PATCH] target/riscv: Set pc_succ_insn for !rvc illegal insn

2022-12-06 Thread Alistair Francis
On Sun, Dec 4, 2022 at 3:58 AM Richard Henderson
 wrote:
>
> Failure to set pc_succ_insn may result in a TB covering zero bytes,
> which triggers an assert within the code generator.
>
> Cc: qemu-sta...@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1224
> Signed-off-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair




Re: [PATCH] target/riscv: Fix mret exception cause when no pmp rule is configured

2022-12-06 Thread Alistair Francis
On Mon, Dec 5, 2022 at 4:54 PM Bin Meng  wrote:
>
> The priv spec v1.12 says:
>
>   If no PMP entry matches an M-mode access, the access succeeds. If
>   no PMP entry matches an S-mode or U-mode access, but at least one
>   PMP entry is implemented, the access fails. Failed accesses generate
>   an instruction, load, or store access-fault exception.
>
> At present the exception cause is set to 'illegal instruction' but
> should have been 'instruction access fault'.
>
> Fixes: d102f19a2085 ("target/riscv/pmp: Raise exception if no PMP entry is 
> configured")
> Signed-off-by: Bin Meng 

Thanks!

Applied to riscv-to-apply.next

Alistair
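
The rule from the commit message can be sketched as a small decision function. This is an illustration only, not the code in op_helper.c: mret_pmp_check and the enum names are hypothetical, while the numeric values follow the RISC-V privilege encodings (U=0, S=1, M=3) and mcause codes (1 = instruction access fault, 2 = illegal instruction).

```c
#include <assert.h>
#include <stdbool.h>

enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };
enum { EXCP_NONE = -1, EXCP_INST_ACCESS_FAULT = 1, EXCP_ILLEGAL_INST = 2 };

/*
 * Returns the exception to raise when mret drops to a lower privilege
 * level while PMP is implemented but no rule is configured.
 */
static int mret_pmp_check(bool pmp_implemented, unsigned num_rules,
                          int prev_priv)
{
    if (pmp_implemented && num_rules == 0 && prev_priv != PRV_M) {
        return EXCP_INST_ACCESS_FAULT;  /* was EXCP_ILLEGAL_INST before the fix */
    }
    return EXCP_NONE;
}
```

Returning to M-mode never faults here, which matches the quoted rule that an M-mode access succeeds when no PMP entry matches.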

> ---
>
>  target/riscv/op_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
> index 09f1f5185d..d7af7f056b 100644
> --- a/target/riscv/op_helper.c
> +++ b/target/riscv/op_helper.c
> @@ -202,7 +202,7 @@ target_ulong helper_mret(CPURISCVState *env)
>
>  if (riscv_feature(env, RISCV_FEATURE_PMP) &&
>  !pmp_get_num_rules(env) && (prev_priv != PRV_M)) {
> -riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
> +riscv_raise_exception(env, RISCV_EXCP_INST_ACCESS_FAULT, GETPC());
>  }
>
>  target_ulong prev_virt = get_field(env->mstatus, MSTATUS_MPV);
> --
> 2.34.1
>
>



Re: [PATCH 15/15] hw/intc: sifive_plic: Fix the pending register range check

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:10 AM Bin Meng  wrote:
>
> The pending register upper limit is currently set to
> plic->num_sources >> 3, which is wrong, e.g.: considering
> plic->num_sources is 7, the upper limit becomes 0 which fails
> the range check if reading the pending register at pending_base.
>
> Fixes: 1e24429e40df ("SiFive RISC-V PLIC Block")
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair
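
The arithmetic from the commit message, worked through for num_sources = 7, can be checked with the sketch below. The helper names pending_len_old and pending_len_new are made up for this note, and addr_between is written the way I believe the helper in sifive_plic.c behaves, so treat that as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Old upper limit in bytes: rounds down, so fewer than 8 sources gives 0. */
static uint32_t pending_len_old(uint32_t num_sources)
{
    return num_sources >> 3;
}

/* New upper limit in bytes, as in the patch. */
static uint32_t pending_len_new(uint32_t num_sources)
{
    return (num_sources + 31) >> 3;
}

/* Assumed shape of the range helper: base <= addr < base + num. */
static int addr_between(uint32_t addr, uint32_t base, uint32_t num)
{
    return addr >= base && addr - base < num;
}
```

For 7 sources the old limit is 7 >> 3 = 0, so even a read at pending_base is rejected; the new limit is (7 + 31) >> 3 = 4, which admits the first 32-bit pending word.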

>
> ---
>
>  hw/intc/sifive_plic.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c
> index 7a6a358c57..a3fc8222c7 100644
> --- a/hw/intc/sifive_plic.c
> +++ b/hw/intc/sifive_plic.c
> @@ -143,7 +143,8 @@ static uint64_t sifive_plic_read(void *opaque, hwaddr 
> addr, unsigned size)
>  uint32_t irq = (addr - plic->priority_base) >> 2;
>
>  return plic->source_priority[irq];
> -} else if (addr_between(addr, plic->pending_base, plic->num_sources >> 
> 3)) {
> +} else if (addr_between(addr, plic->pending_base,
> +(plic->num_sources + 31) >> 3)) {
>  uint32_t word = (addr - plic->pending_base) >> 2;
>
>  return plic->pending[word];
> @@ -202,7 +203,7 @@ static void sifive_plic_write(void *opaque, hwaddr addr, 
> uint64_t value,
>  sifive_plic_update(plic);
>  }
>  } else if (addr_between(addr, plic->pending_base,
> -plic->num_sources >> 3)) {
> +(plic->num_sources + 31) >> 3)) {
>  qemu_log_mask(LOG_GUEST_ERROR,
>"%s: invalid pending write: 0x%" HWADDR_PRIx "",
>__func__, addr);
> --
> 2.34.1
>
>



Re: [PATCH 14/15] hw/riscv: opentitan: Drop "hartid-base" and "priority-base" initialization

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:09 AM Bin Meng  wrote:
>
> "hartid-base" and "priority-base" are zero by default. There is no
> need to initialize them to zero again.

What if the defaults change, though? I feel like these are worth leaving in.

Alistair

>
> Signed-off-by: Bin Meng 
> ---
>
>  hw/riscv/opentitan.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
> index be7ff1eea0..da73aa51f5 100644
> --- a/hw/riscv/opentitan.c
> +++ b/hw/riscv/opentitan.c
> @@ -165,10 +165,8 @@ static void lowrisc_ibex_soc_realize(DeviceState 
> *dev_soc, Error **errp)
>
>  /* PLIC */
>  qdev_prop_set_string(DEVICE(>plic), "hart-config", "M");
> -qdev_prop_set_uint32(DEVICE(>plic), "hartid-base", 0);
>  qdev_prop_set_uint32(DEVICE(>plic), "num-sources", 180);
>  qdev_prop_set_uint32(DEVICE(>plic), "num-priorities", 3);
> -qdev_prop_set_uint32(DEVICE(>plic), "priority-base", 0x00);
>  qdev_prop_set_uint32(DEVICE(>plic), "pending-base", 0x1000);
>  qdev_prop_set_uint32(DEVICE(>plic), "enable-base", 0x2000);
>  qdev_prop_set_uint32(DEVICE(>plic), "enable-stride", 32);
> --
> 2.34.1
>
>



Re: [PATCH 13/15] hw/intc: sifive_plic: Change "priority-base" to start from interrupt source 0

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:10 AM Bin Meng  wrote:
>
> At present the SiFive PLIC model "priority-base" expects the interrupt
> priority register base to start from source 1 instead of source 0,
> which is why on most platforms "priority-base" is set to 0x04, except
> on the 'opentitan' machine. 'opentitan' should have set "priority-base"
> to 0x04 too.
>
> Note the irq number calculation in sifive_plic_{read,write} is
> correct, as the code compensates by adding 1 to the irq number.
>
> Let's simply update "priority-base" to start from interrupt source
> 0 and add a comment to make it crystal clear.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair
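
A small sketch shows why the two conventions give the same irq number for the same register address. The function names irq_old and irq_new are hypothetical; the expressions are the ones from the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/* Before the patch: priority_base points at source 1, so 1 is added back. */
static uint32_t irq_old(uint32_t addr, uint32_t priority_base)
{
    return ((addr - priority_base) >> 2) + 1;
}

/* After the patch: priority_base points at source 0. */
static uint32_t irq_new(uint32_t addr, uint32_t priority_base)
{
    return (addr - priority_base) >> 2;
}
```

For the register at offset 0x0c, the old code with base 0x04 and the new code with base 0x00 both yield irq 3. The old code with base 0x00, which is what 'opentitan' used, yields 4, which is the off-by-one the commit message refers to.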

> ---
>
>  include/hw/riscv/microchip_pfsoc.h | 2 +-
>  include/hw/riscv/shakti_c.h| 2 +-
>  include/hw/riscv/sifive_e.h| 2 +-
>  include/hw/riscv/sifive_u.h| 2 +-
>  include/hw/riscv/virt.h| 2 +-
>  hw/intc/sifive_plic.c  | 5 +++--
>  6 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/include/hw/riscv/microchip_pfsoc.h 
> b/include/hw/riscv/microchip_pfsoc.h
> index 9720bac2d5..c10d87a601 100644
> --- a/include/hw/riscv/microchip_pfsoc.h
> +++ b/include/hw/riscv/microchip_pfsoc.h
> @@ -152,7 +152,7 @@ enum {
>
>  #define MICROCHIP_PFSOC_PLIC_NUM_SOURCES187
>  #define MICROCHIP_PFSOC_PLIC_NUM_PRIORITIES 7
> -#define MICROCHIP_PFSOC_PLIC_PRIORITY_BASE  0x04
> +#define MICROCHIP_PFSOC_PLIC_PRIORITY_BASE  0x00
>  #define MICROCHIP_PFSOC_PLIC_PENDING_BASE   0x1000
>  #define MICROCHIP_PFSOC_PLIC_ENABLE_BASE0x2000
>  #define MICROCHIP_PFSOC_PLIC_ENABLE_STRIDE  0x80
> diff --git a/include/hw/riscv/shakti_c.h b/include/hw/riscv/shakti_c.h
> index daf0aae13f..539fe1156d 100644
> --- a/include/hw/riscv/shakti_c.h
> +++ b/include/hw/riscv/shakti_c.h
> @@ -65,7 +65,7 @@ enum {
>  #define SHAKTI_C_PLIC_NUM_SOURCES 28
>  /* Excluding Priority 0 */
>  #define SHAKTI_C_PLIC_NUM_PRIORITIES 2
> -#define SHAKTI_C_PLIC_PRIORITY_BASE 0x04
> +#define SHAKTI_C_PLIC_PRIORITY_BASE 0x00
>  #define SHAKTI_C_PLIC_PENDING_BASE 0x1000
>  #define SHAKTI_C_PLIC_ENABLE_BASE 0x2000
>  #define SHAKTI_C_PLIC_ENABLE_STRIDE 0x80
> diff --git a/include/hw/riscv/sifive_e.h b/include/hw/riscv/sifive_e.h
> index 9e58247fd8..b824a79e2d 100644
> --- a/include/hw/riscv/sifive_e.h
> +++ b/include/hw/riscv/sifive_e.h
> @@ -89,7 +89,7 @@ enum {
>   */
>  #define SIFIVE_E_PLIC_NUM_SOURCES 53
>  #define SIFIVE_E_PLIC_NUM_PRIORITIES 7
> -#define SIFIVE_E_PLIC_PRIORITY_BASE 0x04
> +#define SIFIVE_E_PLIC_PRIORITY_BASE 0x00
>  #define SIFIVE_E_PLIC_PENDING_BASE 0x1000
>  #define SIFIVE_E_PLIC_ENABLE_BASE 0x2000
>  #define SIFIVE_E_PLIC_ENABLE_STRIDE 0x80
> diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
> index 8f63a183c4..e680d61ece 100644
> --- a/include/hw/riscv/sifive_u.h
> +++ b/include/hw/riscv/sifive_u.h
> @@ -158,7 +158,7 @@ enum {
>
>  #define SIFIVE_U_PLIC_NUM_SOURCES 54
>  #define SIFIVE_U_PLIC_NUM_PRIORITIES 7
> -#define SIFIVE_U_PLIC_PRIORITY_BASE 0x04
> +#define SIFIVE_U_PLIC_PRIORITY_BASE 0x00
>  #define SIFIVE_U_PLIC_PENDING_BASE 0x1000
>  #define SIFIVE_U_PLIC_ENABLE_BASE 0x2000
>  #define SIFIVE_U_PLIC_ENABLE_STRIDE 0x80
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index 7c23aea4a0..37819c168c 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -99,7 +99,7 @@ enum {
>  #define VIRT_IRQCHIP_MAX_GUESTS_BITS 3
>  #define VIRT_IRQCHIP_MAX_GUESTS ((1U << VIRT_IRQCHIP_MAX_GUESTS_BITS) - 1U)
>
> -#define VIRT_PLIC_PRIORITY_BASE 0x04
> +#define VIRT_PLIC_PRIORITY_BASE 0x00
>  #define VIRT_PLIC_PENDING_BASE 0x1000
>  #define VIRT_PLIC_ENABLE_BASE 0x2000
>  #define VIRT_PLIC_ENABLE_STRIDE 0x80
> diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c
> index 2bd292410d..7a6a358c57 100644
> --- a/hw/intc/sifive_plic.c
> +++ b/hw/intc/sifive_plic.c
> @@ -140,7 +140,7 @@ static uint64_t sifive_plic_read(void *opaque, hwaddr 
> addr, unsigned size)
>  SiFivePLICState *plic = opaque;
>
>  if (addr_between(addr, plic->priority_base, plic->num_sources << 2)) {
> -uint32_t irq = ((addr - plic->priority_base) >> 2) + 1;
> +uint32_t irq = (addr - plic->priority_base) >> 2;
>
>  return plic->source_priority[irq];
>  } else if (addr_between(addr, plic->pending_base, plic->num_sources >> 
> 3)) {
> @@ -187,7 +187,7 @@ static void sifive_plic_write(void *opaque, hwaddr addr, 
> uint64_t value,
>  SiFivePLICState *plic = opaque;
>
>  if (addr_between(addr, plic->priority_base, plic->num_sources << 2)) {
> -uint32_t irq = ((addr - plic->priority_base) >> 2) + 1;
> +uint32_t irq = (addr - plic->priority_base) >> 2;
>
>  if (((plic->num_priorities + 1) & plic->num_priorities) == 0) {
>  /*
> @@ -428,6 +428,7 @@ static Property sifive_plic_properties[] = {
>  /* number of interrupt sources including interrupt source 0 */
>  DEFINE_PROP_UINT32("num-sources", 

Re: [PATCH 12/15] hw/riscv: virt: Fix the value of "riscv, ndev" in the dtb

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:11 AM Bin Meng  wrote:
>
> Commit 28d8c281200f ("hw/riscv: virt: Add optional AIA IMSIC support to virt 
> machine")
> changed the value of VIRT_IRQCHIP_NUM_SOURCES from 127 to 53, which
> is VIRTIO_NDEV and also used as the value of "riscv,ndev" property
> in the dtb. Unfortunately this is wrong as VIRT_IRQCHIP_NUM_SOURCES
> should include interrupt source 0 but "riscv,ndev" does not.
>
> While we are here, we also fix the comment on the platform bus irq
> range, which currently says "64 to 96" but should say "64 to 95". It
> was introduced by commit 1832b7cb3f64 ("hw/riscv: virt: Create a platform bus").
>
> Fixes: 28d8c281200f ("hw/riscv: virt: Add optional AIA IMSIC support to virt 
> machine")
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair
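
Both numbers in this patch follow from simple arithmetic, sketched below. The macro values are taken from the quoted diff; the helper names virt_riscv_ndev and platform_bus_last_irq are made up for this note.

```c
#include <assert.h>

#define VIRT_PLATFORM_BUS_IRQ      64
#define VIRT_PLATFORM_BUS_NUM_IRQS 32
#define VIRT_IRQCHIP_NUM_SOURCES   96  /* includes interrupt source 0 */

/* "riscv,ndev" does not count interrupt source 0. */
static int virt_riscv_ndev(void)
{
    return VIRT_IRQCHIP_NUM_SOURCES - 1;
}

/* Last irq of a range that starts at 64 and holds 32 irqs. */
static int platform_bus_last_irq(void)
{
    return VIRT_PLATFORM_BUS_IRQ + VIRT_PLATFORM_BUS_NUM_IRQS - 1;
}
```

Both helpers return 95: the highest device interrupt number reported in the dtb and the last platform bus irq, which is why the comment should read "64 to 95".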

> ---
>
>  include/hw/riscv/virt.h | 5 ++---
>  hw/riscv/virt.c | 3 ++-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index be4ab8fe7f..7c23aea4a0 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -87,15 +87,14 @@ enum {
>  VIRTIO_IRQ = 1, /* 1 to 8 */
>  VIRTIO_COUNT = 8,
>  PCIE_IRQ = 0x20, /* 32 to 35 */
> -VIRT_PLATFORM_BUS_IRQ = 64, /* 64 to 96 */
> -VIRTIO_NDEV = 96 /* Arbitrary maximum number of interrupts */
> +VIRT_PLATFORM_BUS_IRQ = 64, /* 64 to 95 */
>  };
>
>  #define VIRT_PLATFORM_BUS_NUM_IRQS 32
>
>  #define VIRT_IRQCHIP_IPI_MSI 1
>  #define VIRT_IRQCHIP_NUM_MSIS 255
> -#define VIRT_IRQCHIP_NUM_SOURCES VIRTIO_NDEV
> +#define VIRT_IRQCHIP_NUM_SOURCES 96
>  #define VIRT_IRQCHIP_NUM_PRIO_BITS 3
>  #define VIRT_IRQCHIP_MAX_GUESTS_BITS 3
>  #define VIRT_IRQCHIP_MAX_GUESTS ((1U << VIRT_IRQCHIP_MAX_GUESTS_BITS) - 1U)
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index a5bc7353b4..c4ee489a80 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -468,7 +468,8 @@ static void create_fdt_socket_plic(RISCVVirtState *s,
>  plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
>  qemu_fdt_setprop_cells(mc->fdt, plic_name, "reg",
>  0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
> -qemu_fdt_setprop_cell(mc->fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
> +qemu_fdt_setprop_cell(mc->fdt, plic_name, "riscv,ndev",
> +  VIRT_IRQCHIP_NUM_SOURCES - 1);
>  riscv_socket_fdt_write_id(mc, mc->fdt, plic_name, socket);
>  qemu_fdt_setprop_cell(mc->fdt, plic_name, "phandle",
>  plic_phandles[socket]);
> --
> 2.34.1
>
>



Re: [PATCH 11/15] hw/riscv: sifive_u: Avoid using magic number for "riscv, ndev"

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:11 AM Bin Meng  wrote:
>
> At present a magic number is used to create the "riscv,ndev" property
> in the dtb. Let's use the macro SIFIVE_U_PLIC_NUM_SOURCES that
> is used to instantiate the PLIC model instead.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  hw/riscv/sifive_u.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
> index b139824aab..b40a4767e2 100644
> --- a/hw/riscv/sifive_u.c
> +++ b/hw/riscv/sifive_u.c
> @@ -287,7 +287,8 @@ static void create_fdt(SiFiveUState *s, const MemMapEntry 
> *memmap,
>  qemu_fdt_setprop_cells(fdt, nodename, "reg",
>  0x0, memmap[SIFIVE_U_DEV_PLIC].base,
>  0x0, memmap[SIFIVE_U_DEV_PLIC].size);
> -qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", 0x35);
> +qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev",
> +  SIFIVE_U_PLIC_NUM_SOURCES - 1);
>  qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
>  plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
>  g_free(cells);
> --
> 2.34.1
>
>



Re: [PATCH 10/15] hw/riscv: sifive_e: Fix the number of interrupt sources of PLIC

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:12 AM Bin Meng  wrote:
>
> Per chapter 10 in the Freedom E310 manuals [1][2][3], the E310 G002 and G003
> support 52 interrupt sources while the G000 supports 51 interrupt sources.
>
> We use the value of G002 and G003, so it is 53 (including source 0).
>
> [1] G000 manual:
> https://sifive.cdn.prismic.io/sifive/4faf3e34-4a42-4c2f-be9e-c77baa4928c7_fe310-g000-manual-v3p2.pdf
>
> [2] G002 manual:
> https://sifive.cdn.prismic.io/sifive/034760b5-ac6a-4b1c-911c-f4148bb2c4a5_fe310-g002-v1p5.pdf
>
> [3] G003 manual:
> https://sifive.cdn.prismic.io/sifive/3af39c59-6498-471e-9dab-5355a0d539eb_fe310-g003-manual.pdf
>
> Fixes: eb637edb1241 ("SiFive Freedom E Series RISC-V Machine")
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  include/hw/riscv/sifive_e.h | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/riscv/sifive_e.h b/include/hw/riscv/sifive_e.h
> index d738745925..9e58247fd8 100644
> --- a/include/hw/riscv/sifive_e.h
> +++ b/include/hw/riscv/sifive_e.h
> @@ -82,7 +82,12 @@ enum {
>  };
>
>  #define SIFIVE_E_PLIC_HART_CONFIG "M"
> -#define SIFIVE_E_PLIC_NUM_SOURCES 127
> +/*
> + * Freedom E310 G002 and G003 supports 52 interrupt sources while
> + * Freedom E310 G000 supports 51 interrupt sources. We use the value
> + * of G002 and G003, so it is 53 (including interrupt source 0).
> + */
> +#define SIFIVE_E_PLIC_NUM_SOURCES 53
>  #define SIFIVE_E_PLIC_NUM_PRIORITIES 7
>  #define SIFIVE_E_PLIC_PRIORITY_BASE 0x04
>  #define SIFIVE_E_PLIC_PENDING_BASE 0x1000
> --
> 2.34.1
>
>



Re: [PATCH 09/15] hw/riscv: microchip_pfsoc: Fix the number of interrupt sources of PLIC

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:11 AM Bin Meng  wrote:
>
> Per chapter 6.5.2 in [1], the number of interrupt sources including
> interrupt source 0 should be 187.
>
> [1] PolarFire SoC MSS TRM:
> https://ww1.microchip.com/downloads/aemDocuments/documents/FPGA/ProductDocuments/ReferenceManuals/PolarFire_SoC_FPGA_MSS_Technical_Reference_Manual_VC.pdf
>
> Fixes: 56f6e31e7b7e ("hw/riscv: Initial support for Microchip PolarFire SoC 
> Icicle Kit board")
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  include/hw/riscv/microchip_pfsoc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/hw/riscv/microchip_pfsoc.h 
> b/include/hw/riscv/microchip_pfsoc.h
> index a757b240e0..9720bac2d5 100644
> --- a/include/hw/riscv/microchip_pfsoc.h
> +++ b/include/hw/riscv/microchip_pfsoc.h
> @@ -150,7 +150,7 @@ enum {
>  #define MICROCHIP_PFSOC_MANAGEMENT_CPU_COUNT1
>  #define MICROCHIP_PFSOC_COMPUTE_CPU_COUNT   4
>
> -#define MICROCHIP_PFSOC_PLIC_NUM_SOURCES185
> +#define MICROCHIP_PFSOC_PLIC_NUM_SOURCES187
>  #define MICROCHIP_PFSOC_PLIC_NUM_PRIORITIES 7
>  #define MICROCHIP_PFSOC_PLIC_PRIORITY_BASE  0x04
>  #define MICROCHIP_PFSOC_PLIC_PENDING_BASE   0x1000
> --
> 2.34.1
>
>



Re: [PATCH 08/15] hw/intc: sifive_plic: Update "num-sources" property default value

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:12 AM Bin Meng  wrote:
>
> At present the default value of the "num-sources" property is zero,
> which does not make much sense, because sifive_plic_realize()
> calculates s->bitfield_words as:
>
>   s->bitfield_words = (s->num_sources + 31) >> 5;
>
> If we don't configure the "num-sources" property, its default value
> of zero makes s->bitfield_words zero too, which isn't right because
> interrupt source 0 still occupies one word.
>
> Let's change the default value to 1, meaning that only interrupt
> source 0 is supported by default, and add a sanity check in realize().
>
> While we are here, add a comment describing the exact meaning of
> this property: the number should include interrupt source 0.
> An incorrectly formatted multi-line comment is fixed too.
>
> Signed-off-by: Bin Meng 
> ---
>
>  hw/intc/sifive_plic.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c
> index 5fd9a53569..2bd292410d 100644
> --- a/hw/intc/sifive_plic.c
> +++ b/hw/intc/sifive_plic.c
> @@ -363,6 +363,11 @@ static void sifive_plic_realize(DeviceState *dev, Error 
> **errp)
>
>  parse_hart_config(s);
>
> +if (!s->num_sources) {
> +error_report("plic: invalid number of interrupt sources");

We should propagate the error up via errp instead
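
For illustration, the shape of that change is roughly as below. This is only a sketch: SketchError, sketch_error_setg() and sketch_plic_check_sources() are stand-ins made up here so that it builds outside the QEMU tree. In sifive_plic_realize() itself this would presumably be error_setg(errp, ...) followed by a return, rather than error_report() and exit(1).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for QEMU's Error object (hypothetical, for this sketch only). */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Stand-in for error_setg(): store the message in *errp if it is unset. */
static void sketch_error_setg(SketchError **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(SketchError));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/*
 * Report the problem to the caller instead of exiting.
 * Returns 0 on success, -1 on failure with *errp set.
 */
static int sketch_plic_check_sources(uint32_t num_sources, SketchError **errp)
{
    if (!num_sources) {
        sketch_error_setg(errp, "plic: invalid number of interrupt sources");
        return -1;
    }
    return 0;
}
```

The point is that the caller of realize() decides what to do with the failure, so the device model itself never has to call exit().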

Otherwise:

Reviewed-by: Alistair Francis 

Alistair

> +exit(1);
> +}
> +
>  s->bitfield_words = (s->num_sources + 31) >> 5;
>  s->num_enables = s->bitfield_words * s->num_addrs;
>  s->source_priority = g_new0(uint32_t, s->num_sources);
> @@ -379,7 +384,8 @@ static void sifive_plic_realize(DeviceState *dev, Error 
> **errp)
>  s->m_external_irqs = g_malloc(sizeof(qemu_irq) * s->num_harts);
>  qdev_init_gpio_out(dev, s->m_external_irqs, s->num_harts);
>
> -/* We can't allow the supervisor to control SEIP as this would allow the
> +/*
> + * We can't allow the supervisor to control SEIP as this would allow the
>   * supervisor to clear a pending external interrupt which will result in
>   * lost a interrupt in the case a PLIC is attached. The SEIP bit must be
>   * hardware controlled when a PLIC is attached.
> @@ -419,7 +425,8 @@ static const VMStateDescription vmstate_sifive_plic = {
>  static Property sifive_plic_properties[] = {
>  DEFINE_PROP_STRING("hart-config", SiFivePLICState, hart_config),
>  DEFINE_PROP_UINT32("hartid-base", SiFivePLICState, hartid_base, 0),
> -DEFINE_PROP_UINT32("num-sources", SiFivePLICState, num_sources, 0),
> +/* number of interrupt sources including interrupt source 0 */
> +DEFINE_PROP_UINT32("num-sources", SiFivePLICState, num_sources, 1),
>  DEFINE_PROP_UINT32("num-priorities", SiFivePLICState, num_priorities, 0),
>  DEFINE_PROP_UINT32("priority-base", SiFivePLICState, priority_base, 0),
>  DEFINE_PROP_UINT32("pending-base", SiFivePLICState, pending_base, 0),
> --
> 2.34.1
>
>
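
As a quick check of the word-count arithmetic quoted in the commit message, the rounding can be tried in isolation. sketch_bitfield_words() is a helper name invented for this note and is not in sifive_plic.c; it only mirrors the (num_sources + 31) >> 5 expression.

```c
#include <stdint.h>

/* Same rounding as in sifive_plic_realize(): 32 sources per bitfield word. */
static uint32_t sketch_bitfield_words(uint32_t num_sources)
{
    return (num_sources + 31) >> 5;
}
```

With the old default of 0 this yields 0 words, while 1 (only interrupt source 0) yields 1 word. The values used elsewhere in this series give 2 words for 53 sources and 6 words for 187 sources.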



Re: [PATCH 07/15] hw/intc: sifive_plic: Improve robustness of the PLIC config parser

2022-12-06 Thread Alistair Francis
On Fri, Dec 2, 2022 at 12:15 AM Bin Meng  wrote:
>
> At present the PLIC config parser can only handle a legal config string
> such as "MS,MS". However, if a config string like ",MS,MS,,MS,MS,," is
> given, the parser won't get the correct configuration.
>
> This commit improves the config parser to make it more robust.
>
> Signed-off-by: Bin Meng 

Acked-by: Alistair Francis 

Alistair

> ---
>
>  hw/intc/sifive_plic.c | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c
> index 3f6ffb1d70..5fd9a53569 100644
> --- a/hw/intc/sifive_plic.c
> +++ b/hw/intc/sifive_plic.c
> @@ -290,7 +290,7 @@ static void sifive_plic_reset(DeviceState *dev)
>   */
>  static void parse_hart_config(SiFivePLICState *plic)
>  {
> -int addrid, hartid, modes;
> +int addrid, hartid, modes, m;
>  const char *p;
>  char c;
>
> @@ -299,11 +299,13 @@ static void parse_hart_config(SiFivePLICState *plic)
>  p = plic->hart_config;
>  while ((c = *p++)) {
>  if (c == ',') {
> -addrid += ctpop8(modes);
> -modes = 0;
> -hartid++;
> +if (modes) {
> +addrid += ctpop8(modes);
> +hartid++;
> +modes = 0;
> +}
>  } else {
> -int m = 1 << char_to_mode(c);
> +m = 1 << char_to_mode(c);
>  if (modes == (modes | m)) {
>  error_report("plic: duplicate mode '%c' in config: %s",
>   c, plic->hart_config);
> @@ -314,8 +316,9 @@ static void parse_hart_config(SiFivePLICState *plic)
>  }
>  if (modes) {
>  addrid += ctpop8(modes);
> +hartid++;
> +modes = 0;
>  }
> -hartid++;
>
>  plic->num_addrs = addrid;
>  plic->num_harts = hartid;
> @@ -326,11 +329,16 @@ static void parse_hart_config(SiFivePLICState *plic)
>  p = plic->hart_config;
>  while ((c = *p++)) {
>  if (c == ',') {
> -hartid++;
> +if (modes) {
> +hartid++;
> +modes = 0;
> +}
>  } else {
> +m = char_to_mode(c);
>  plic->addr_config[addrid].addrid = addrid;
>  plic->addr_config[addrid].hartid = hartid;
> -plic->addr_config[addrid].mode = char_to_mode(c);
> +plic->addr_config[addrid].mode = m;
> +modes |= (1 << m);
>  addrid++;
>  }
>  }
> --
> 2.34.1
>
>
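
The counting rule the patch introduces (a comma only closes a hart if that hart declared at least one mode) can be exercised standalone. The sketch below is not the QEMU code: the SketchHartCount type, the helper names and the 'U'/'S'/'H'/'M' bit mapping are assumptions for illustration, and unknown or duplicate mode characters are silently ignored here instead of being reported as errors.

```c
/* Hypothetical result type for this sketch; not part of QEMU. */
typedef struct {
    int num_harts;
    int num_addrs;
} SketchHartCount;

/* Map a mode character to a bit index; -1 for anything unrecognised. */
static int sketch_mode_bit(char c)
{
    switch (c) {
    case 'U': return 0;
    case 'S': return 1;
    case 'H': return 2;
    case 'M': return 3;
    default:  return -1;
    }
}

/* Portable population count for the small mode bitmask. */
static int sketch_popcount(unsigned v)
{
    int n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Close the current hart only if it declared at least one mode. */
static void sketch_close_hart(SketchHartCount *out, unsigned *modes)
{
    if (*modes) {
        out->num_addrs += sketch_popcount(*modes);
        out->num_harts++;
        *modes = 0;
    }
}

static SketchHartCount sketch_count_harts(const char *config)
{
    SketchHartCount out = { 0, 0 };
    unsigned modes = 0;
    char c;

    while ((c = *config++) != '\0') {
        if (c == ',') {
            /* Leading, trailing and repeated commas fall through here. */
            sketch_close_hart(&out, &modes);
        } else {
            int bit = sketch_mode_bit(c);
            if (bit >= 0) {
                modes |= 1u << bit;
            }
        }
    }
    /* A config that does not end in ',' still has one open hart. */
    sketch_close_hart(&out, &modes);
    return out;
}
```

Under these assumptions ",MS,MS,,MS,MS,," counts as 4 harts with 8 address slots, the same as "MS,MS,MS,MS".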



Re: [PATCH v3 1/3] hw/misc: sifive_e_aon: Support the watchdog timer of HiFive 1 rev b.

2022-12-06 Thread Alistair Francis
On Wed, Nov 30, 2022 at 11:56 AM Tommy Wu  wrote:
>
> The watchdog timer is in the always-on domain device of HiFive 1 rev b,
> so this patch adds the AON device to the sifive_e machine. This patch
> only implements the functionality of the watchdog timer.
>
> Signed-off-by: Tommy Wu 
> ---
>  hw/misc/Kconfig|   3 +
>  hw/misc/meson.build|   1 +
>  hw/misc/sifive_e_aon.c | 383 +
>  include/hw/misc/sifive_e_aon.h |  60 ++
>  4 files changed, 447 insertions(+)
>  create mode 100644 hw/misc/sifive_e_aon.c
>  create mode 100644 include/hw/misc/sifive_e_aon.h
>
> diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
> index cbabe9f78c..7d1247822e 100644
> --- a/hw/misc/Kconfig
> +++ b/hw/misc/Kconfig
> @@ -162,6 +162,9 @@ config SIFIVE_TEST
>  config SIFIVE_E_PRCI
>  bool
>
> +config SIFIVE_E_AON
> +bool
> +
>  config SIFIVE_U_OTP
>  bool
>
> diff --git a/hw/misc/meson.build b/hw/misc/meson.build
> index 95268eddc0..94170dce76 100644
> --- a/hw/misc/meson.build
> +++ b/hw/misc/meson.build
> @@ -31,6 +31,7 @@ softmmu_ss.add(when: 'CONFIG_MCHP_PFSOC_IOSCB', if_true: 
> files('mchp_pfsoc_ioscb
>  softmmu_ss.add(when: 'CONFIG_MCHP_PFSOC_SYSREG', if_true: 
> files('mchp_pfsoc_sysreg.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_TEST', if_true: files('sifive_test.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_E_PRCI', if_true: 
> files('sifive_e_prci.c'))
> +softmmu_ss.add(when: 'CONFIG_SIFIVE_E_AON', if_true: files('sifive_e_aon.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_U_OTP', if_true: files('sifive_u_otp.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_U_PRCI', if_true: 
> files('sifive_u_prci.c'))
>
> diff --git a/hw/misc/sifive_e_aon.c b/hw/misc/sifive_e_aon.c
> new file mode 100644
> index 00..27ec26cf7c
> --- /dev/null
> +++ b/hw/misc/sifive_e_aon.c
> @@ -0,0 +1,383 @@
> +/*
> + * SiFive HiFive1 AON (Always On Domain) for QEMU.
> + *
> + * Copyright (c) 2022 SiFive, Inc. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/timer.h"
> +#include "qemu/log.h"
> +#include "hw/irq.h"
> +#include "hw/registerfields.h"
> +#include "hw/misc/sifive_e_aon.h"
> +#include "qapi/visitor.h"
> +#include "qapi/error.h"
> +#include "sysemu/watchdog.h"
> +
> +REG32(AON_WDT_WDOGCFG, 0x0)
> +FIELD(AON_WDT_WDOGCFG,
> +  SCALE, 0, 4)
> +FIELD(AON_WDT_WDOGCFG,
> +  RSVD0, 4, 4)
> +FIELD(AON_WDT_WDOGCFG,
> +  RSTEN, 8, 1)
> +FIELD(AON_WDT_WDOGCFG,
> +  ZEROCMP, 9, 1)
> +FIELD(AON_WDT_WDOGCFG,
> +  RSVD1, 10, 2)
> +FIELD(AON_WDT_WDOGCFG,
> +  EN_ALWAYS, 12, 1)
> +FIELD(AON_WDT_WDOGCFG,
> +  EN_CORE_AWAKE, 13, 1)
> +FIELD(AON_WDT_WDOGCFG,
> +  RSVD2, 14, 14)
> +FIELD(AON_WDT_WDOGCFG,
> +  IP0, 28, 1)
> +FIELD(AON_WDT_WDOGCFG,
> +  RSVD3, 29, 3)
> +REG32(AON_WDT_WDOGCOUNT, 0x8)
> +REG32(AON_WDT_WDOGS, 0x10)
> +REG32(AON_WDT_WDOGFEED, 0x18)
> +REG32(AON_WDT_WDOGKEY, 0x1c)
> +REG32(AON_WDT_WDOGCMP0, 0x20)
> +
> +static void sifive_e_aon_wdt_update_wdogcount(SiFiveEAONState *r)
> +{
> +int64_t now;
> +if (0 == FIELD_EX32(r->wdogcfg,
> +AON_WDT_WDOGCFG,
> +EN_ALWAYS) &&
> +0 == FIELD_EX32(r->wdogcfg,
> +AON_WDT_WDOGCFG,
> +EN_CORE_AWAKE)) {
> +return;
> +}
> +
> +now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +r->wdogcount += muldiv64(now - r->wdog_restart_time,
> + r->wdogclk_freq, NANOSECONDS_PER_SECOND);
> +/* Clean the most significant bit. */
> +r->wdogcount = ((r->wdogcount << 1) >> 1);
> +r->wdog_restart_time = now;
> +}
> +
> +static void sifive_e_aon_wdt_update_state(SiFiveEAONState *r)
> +{
> +uint16_t wdogs;
> +bool cmp_signal = false;
> +sifive_e_aon_wdt_update_wdogcount(r);
> +wdogs = (uint16_t)(r->wdogcount >>
> +FIELD_EX32(r->wdogcfg,
> +   AON_WDT_WDOGCFG,
> +   SCALE));
> +if (wdogs >= r->wdogcmp0) {
> +cmp_signal = true;
> +if (1 == FIELD_EX32(r->wdogcfg,
> +AON_WDT_WDOGCFG,
> +ZEROCMP)) {
> +   

Re: [PATCH v4] RISC-V: Add Zawrs ISA extension support

2022-12-06 Thread Alistair Francis
On Thu, Oct 6, 2022 at 12:52 AM Christoph Muellner
 wrote:
>
> This patch adds support for the Zawrs ISA extension.
> Given the current (incomplete) implementation of reservation sets
> there seems to be no way to provide a full emulation of the WRS
> instruction (wake on reservation set invalidation or timeout or
> interrupt). Therefore, we just exit the TB and return to the main loop.
>
> The specification can be found here:
>   https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
>
> Note that the Zawrs extension is frozen, but not ratified yet.
>
> Changes since v3:
> * Remove "RFC" since the extension is frozen
> * Rebase on master and fix integration issues
> * Fix entry ordering in extension list
>
> Changes since v2:
> * Rebase on master and resolve conflicts
> * Adjustments according to a specification change
> * Inline REQUIRE_ZAWRS() since it has only one user
>
> Changes since v1:
> * Adding zawrs to the ISA string that is passed to the kernel
>
> Signed-off-by: Christoph Müllner 
> ---
>  target/riscv/cpu.c  |  7 +++
>  target/riscv/cpu.h  |  1 +
>  target/riscv/insn32.decode  |  4 ++
>  target/riscv/insn_trans/trans_rvzawrs.c.inc | 51 +
>  target/riscv/translate.c|  1 +
>  5 files changed, 64 insertions(+)
>  create mode 100644 target/riscv/insn_trans/trans_rvzawrs.c.inc
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index b29c88b9f0..b08ce94ba6 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -76,6 +76,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
>  ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
>  ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
>  ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
> ext_zihintpause),
> +ISA_EXT_DATA_ENTRY(zawrs, true, PRIV_VERSION_1_12_0, ext_zawrs),
>  ISA_EXT_DATA_ENTRY(zfh, true, PRIV_VERSION_1_12_0, ext_zfh),
>  ISA_EXT_DATA_ENTRY(zfhmin, true, PRIV_VERSION_1_12_0, ext_zfhmin),
>  ISA_EXT_DATA_ENTRY(zfinx, true, PRIV_VERSION_1_12_0, ext_zfinx),
> @@ -744,6 +745,11 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>  return;
>  }
>
> +if ((cpu->cfg.ext_zawrs) && !cpu->cfg.ext_a) {
> +error_setg(errp, "Zawrs extension requires A extension");
> +return;
> +}
> +
>  if ((cpu->cfg.ext_zfh || cpu->cfg.ext_zfhmin) && !cpu->cfg.ext_f) {
>  error_setg(errp, "Zfh/Zfhmin extensions require F extension");
>  return;
> @@ -999,6 +1005,7 @@ static Property riscv_cpu_extensions[] = {
>  DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
>  DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
>  DEFINE_PROP_BOOL("Zihintpause", RISCVCPU, cfg.ext_zihintpause, true),
> +DEFINE_PROP_BOOL("Zawrs", RISCVCPU, cfg.ext_zawrs, true),
>  DEFINE_PROP_BOOL("Zfh", RISCVCPU, cfg.ext_zfh, false),
>  DEFINE_PROP_BOOL("Zfhmin", RISCVCPU, cfg.ext_zfhmin, false),
>  DEFINE_PROP_BOOL("Zve32f", RISCVCPU, cfg.ext_zve32f, false),
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index b131fa8c8e..2b87966373 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -446,6 +446,7 @@ struct RISCVCPUConfig {
>  bool ext_svnapot;
>  bool ext_svpbmt;
>  bool ext_zdinx;
> +bool ext_zawrs;
>  bool ext_zfh;
>  bool ext_zfhmin;
>  bool ext_zfinx;
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index d0253b8104..b7e7613ea2 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -718,6 +718,10 @@ vsetvli 0 ... . 111 . 1010111  
> @r2_zimm11
>  vsetivli11 .. . 111 . 1010111  @r2_zimm10
>  vsetvl  100 . . 111 . 1010111  @r
>
> +# *** Zawrs Standard Extension ***
> +wrs_nto1101 0 000 0 1110011
> +wrs_sto00011101 0 000 0 1110011
> +
>  # *** RV32 Zba Standard Extension ***
>  sh1add 001 .. 010 . 0110011 @r
>  sh2add 001 .. 100 . 0110011 @r
> diff --git a/target/riscv/insn_trans/trans_rvzawrs.c.inc 
> b/target/riscv/insn_trans/trans_rvzawrs.c.inc
> new file mode 100644
> index 00..f0da2fe50a
> --- /dev/null
> +++ b/target/riscv/insn_trans/trans_rvzawrs.c.inc
> @@ -0,0 +1,51 @@
> +/*
> + * RISC-V translation routines for the RISC-V Zawrs Extension.
> + *
> + * Copyright (c) 2022 Christoph Muellner, christoph.muell...@vrull.io
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of 

Re: [PATCH] hw/intc: sifive_plic: fix out-of-bound access of source_priority array

2022-12-06 Thread Alistair Francis
On Mon, Nov 28, 2022 at 2:59 AM Jim Shu  wrote:
>
> If the number of interrupts is not a multiple of 32, the PLIC will make
> an out-of-bounds access to the source_priority array. Compute the number
> of interrupts in the last word to avoid this out-of-bounds access.
>
> Signed-off-by: Jim Shu 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  hw/intc/sifive_plic.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c
> index c2dfacf028..1cf156cf85 100644
> --- a/hw/intc/sifive_plic.c
> +++ b/hw/intc/sifive_plic.c
> @@ -78,6 +78,7 @@ static uint32_t sifive_plic_claimed(SiFivePLICState *plic, 
> uint32_t addrid)
>  uint32_t max_irq = 0;
>  uint32_t max_prio = plic->target_priority[addrid];
>  int i, j;
> +int num_irq_in_word = 32;
>
>  for (i = 0; i < plic->bitfield_words; i++) {
>  uint32_t pending_enabled_not_claimed =
> @@ -88,7 +89,16 @@ static uint32_t sifive_plic_claimed(SiFivePLICState *plic, 
> uint32_t addrid)
>  continue;
>  }
>
> -for (j = 0; j < 32; j++) {
> +if (i == (plic->bitfield_words - 1)) {
> +/*
> + * If plic->num_sources is not multiple of 32, num-of-irq in last
> + * word is not 32. Compute the num-of-irq of last word to avoid
> + * out-of-bound access of source_priority array.
> + */
> +num_irq_in_word = plic->num_sources - ((plic->bitfield_words - 
> 1) << 5);
> +}
> +
> +for (j = 0; j < num_irq_in_word; j++) {
>  int irq = (i << 5) + j;
>  uint32_t prio = plic->source_priority[irq];
>  int enabled = pending_enabled_not_claimed & (1 << j);
> --
> 2.17.1
>
>
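
The last-word computation from the patch can also be checked on its own. sketch_irqs_in_word() is a made-up helper for this note, not a function in sifive_plic.c; it assumes the same 32-sources-per-word layout and returns how many IRQ bits of a given word are backed by a source_priority entry.

```c
#include <stdint.h>

/*
 * Number of valid IRQ bits in bitfield word 'word' when the PLIC has
 * 'num_sources' sources (including interrupt source 0).
 */
static uint32_t sketch_irqs_in_word(uint32_t num_sources, uint32_t word)
{
    uint32_t words = (num_sources + 31) >> 5;

    if (word + 1 < words) {
        return 32;                        /* a full word */
    }
    if (word + 1 == words) {
        return num_sources - (word << 5); /* the (possibly partial) last word */
    }
    return 0;                             /* past the end */
}
```

For 53 sources, word 0 holds 32 IRQs and word 1 holds 21, so an inner loop bounded by this value never indexes source_priority[53] or beyond.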



[QEMU][PATCH v3 0/4] Introduce Xilinx Versal CANFD

2022-12-06 Thread Vikram Garhwal
Hi,
This patch series implements the CANFD controller for the xlnx-versal-virt
machine. Two controllers, CANFD0@0xFF06_ and CANFD1@0xFF07_, are connected
to the machine.

It also adds basic qtests for data exchange between the two controllers in
various supported configs.

Changelog:
v2->v3:
Corrected reg2frame().
Added assert to prevent out-of-bounds cases.
Replaced tx_id linked list with GSList and removed sorting function.
Replaced PTIMER_POLICY_LEGACY with proper timer policies.
Corrected minor code format issues.

v1->v2
Update xlnx-versal-virt.rst with CANFD examples and add this in 03/05 patch.
Addressed comments for patch 02/05 and 03/05.
Add reviewed-by tags for patch 01/05, 04/05 and 05/05.
Change commit message for patch 02/05.
Add SPDX license for Qtest.

Regards,
Vikram

Vikram Garhwal (4):
  MAINTAINERS: Include canfd tests under Xilinx CAN
  hw/net/can: Introduce Xilinx Versal CANFD controller
  xlnx-versal: Connect Xilinx VERSAL CANFD controllers
  tests/qtest: Introduce tests for Xilinx VERSAL CANFD controller

 MAINTAINERS  |2 +-
 docs/system/arm/xlnx-versal-virt.rst |   31 +
 hw/arm/xlnx-versal-virt.c|   48 +
 hw/arm/xlnx-versal.c |   37 +
 hw/net/can/meson.build   |1 +
 hw/net/can/trace-events  |7 +
 hw/net/can/xlnx-versal-canfd.c   | 2121 ++
 include/hw/arm/xlnx-versal.h |   12 +
 include/hw/net/xlnx-versal-canfd.h   |   90 ++
 tests/qtest/meson.build  |1 +
 tests/qtest/xlnx-canfd-test.c|  422 +
 11 files changed, 2771 insertions(+), 1 deletion(-)
 create mode 100644 hw/net/can/xlnx-versal-canfd.c
 create mode 100644 include/hw/net/xlnx-versal-canfd.h
 create mode 100644 tests/qtest/xlnx-canfd-test.c

-- 
2.17.1




[QEMU][PATCH v3 2/4] hw/net/can: Introduce Xilinx Versal CANFD controller

2022-12-06 Thread Vikram Garhwal
The Xilinx Versal CANFD controller is developed based on SocketCAN and the QEMU
CAN bus implementation. The bus connection and SocketCAN connection for each CAN
module can be set through command-line options.

Signed-off-by: Vikram Garhwal 
---
 hw/net/can/meson.build |1 +
 hw/net/can/trace-events|7 +
 hw/net/can/xlnx-versal-canfd.c | 2121 
 include/hw/net/xlnx-versal-canfd.h |   90 ++
 4 files changed, 2219 insertions(+)
 create mode 100644 hw/net/can/xlnx-versal-canfd.c
 create mode 100644 include/hw/net/xlnx-versal-canfd.h

diff --git a/hw/net/can/meson.build b/hw/net/can/meson.build
index 8fabbd9ee6..8d85201cb0 100644
--- a/hw/net/can/meson.build
+++ b/hw/net/can/meson.build
@@ -5,3 +5,4 @@ softmmu_ss.add(when: 'CONFIG_CAN_PCI', if_true: 
files('can_mioe3680_pci.c'))
 softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD', if_true: files('ctucan_core.c'))
 softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD_PCI', if_true: files('ctucan_pci.c'))
 softmmu_ss.add(when: 'CONFIG_XLNX_ZYNQMP', if_true: files('xlnx-zynqmp-can.c'))
+softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: 
files('xlnx-versal-canfd.c'))
diff --git a/hw/net/can/trace-events b/hw/net/can/trace-events
index 8346a98ab5..de64ac1b31 100644
--- a/hw/net/can/trace-events
+++ b/hw/net/can/trace-events
@@ -7,3 +7,10 @@ xlnx_can_filter_mask_pre_write(uint8_t filter_num, uint32_t 
value) "Filter%d MAS
 xlnx_can_tx_data(uint32_t id, uint8_t dlc, uint8_t db0, uint8_t db1, uint8_t 
db2, uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: 
ID: 0x%08x DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 
0x%02x"
 xlnx_can_rx_data(uint32_t id, uint32_t dlc, uint8_t db0, uint8_t db1, uint8_t 
db2, uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: 
ID: 0x%08x DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 
0x%02x"
 xlnx_can_rx_discard(uint32_t status) "Controller is not enabled for bus 
communication. Status Register: 0x%08x"
+
+# xlnx-versal-canfd.c
+xlnx_canfd_update_irq(char *path, uint32_t isr, uint32_t ier, uint32_t irq) 
"%s: ISR: 0x%08x IER: 0x%08x IRQ: 0x%08x"
+xlnx_canfd_rx_fifo_filter_reject(char *path, uint32_t id, uint8_t dlc) "%s: 
Frame: ID: 0x%08x DLC: 0x%02x"
+xlnx_canfd_rx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flags) "%s: 
Frame: ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x"
+xlnx_canfd_tx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flags) "%s: 
Frame: ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x"
+xlnx_canfd_reset(char *path, uint32_t val) "%s: Resetting controller with 
value = 0x%08x"
diff --git a/hw/net/can/xlnx-versal-canfd.c b/hw/net/can/xlnx-versal-canfd.c
new file mode 100644
index 00..02a32d6042
--- /dev/null
+++ b/hw/net/can/xlnx-versal-canfd.c
@@ -0,0 +1,2121 @@
+/*
+ * QEMU model of the Xilinx Versal CANFD device.
+ *
+ * This implementation is based on the following datasheet:
+ * https://docs.xilinx.com/v/u/2.0-English/pg223-canfd
+ *
+ * Copyright (c) 2022 AMD Inc.
+ *
+ * Written-by: Vikram Garhwal
+ *
+ * Based on QEMU CANFD Device emulation implemented by Jin Yang, Deniz Eren and
+ * Pavel Pisa
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/irq.h"
+#include "hw/register.h"
+#include "qapi/error.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "qemu/cutils.h"
+#include "qemu/event_notifier.h"
+#include "hw/qdev-properties.h"
+#include "qom/object_interfaces.h"
+#include "migration/vmstate.h"
+#include "hw/net/xlnx-versal-canfd.h"
+#include "trace.h"
+
+/*
+ * This is done to avoid the build issues on Windows machines. The ERROR field
+ * of INTERRUPT_STATUS_REGISTER collides with a macro in the Windows build
+ * environment.
+ */
+#undef ERROR
+
+REG32(SOFTWARE_RESET_REGISTER, 0x0)
+FIELD(SOFTWARE_RESET_REGISTER, CEN, 1, 1)
+FIELD(SOFTWARE_RESET_REGISTER, SRST, 0, 1)

[QEMU][PATCH v3 1/4] MAINTAINERS: Include canfd tests under Xilinx CAN

2022-12-06 Thread Vikram Garhwal
Signed-off-by: Vikram Garhwal 
Reviewed-by: Peter Maydell 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6966490c94..a76221f260 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1753,7 +1753,7 @@ M: Francisco Iglesias 
 S: Maintained
 F: hw/net/can/xlnx-*
 F: include/hw/net/xlnx-*
-F: tests/qtest/xlnx-can-test*
+F: tests/qtest/xlnx-can*-test*
 
 EDU
 M: Jiri Slaby 
-- 
2.17.1




[QEMU][PATCH v3 3/4] xlnx-zynqmp: Connect Xilinx VERSAL CANFD controllers

2022-12-06 Thread Vikram Garhwal
Connect CANFD0 and CANFD1 on the Versal-virt machine and update xlnx-versal-virt
document with CANFD command line examples.

Signed-off-by: Vikram Garhwal 
---
 docs/system/arm/xlnx-versal-virt.rst | 31 ++
 hw/arm/xlnx-versal-virt.c| 48 
 hw/arm/xlnx-versal.c | 37 +
 include/hw/arm/xlnx-versal.h | 12 +++
 4 files changed, 128 insertions(+)

diff --git a/docs/system/arm/xlnx-versal-virt.rst 
b/docs/system/arm/xlnx-versal-virt.rst
index 92ad10d2da..372e4249f0 100644
--- a/docs/system/arm/xlnx-versal-virt.rst
+++ b/docs/system/arm/xlnx-versal-virt.rst
@@ -34,6 +34,7 @@ Implemented devices:
 - DDR memory
 - BBRAM (36 bytes of Battery-backed RAM)
 - eFUSE (3072 bytes of one-time field-programmable bit array)
+- 2 CANFDs
 
 QEMU does not yet model any other devices, including the PL and the AI Engine.
 
@@ -224,3 +225,33 @@ To use a different index value, N, from default of 1, add:
 
   Better yet, do not use actual product data when running guest image
   on this Xilinx Versal Virt board.
+
+Using CANFDs for Versal Virt
+
+Versal CANFD controller is developed based on SocketCAN and QEMU CAN bus
+implementation. Bus connection and socketCAN connection for each CAN module
+can be set through command lines.
+
+To connect both CANFD0 and CANFD1 on the same bus:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus
+
+To connect CANFD0 and CANFD1 to separate buses:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus0 -object can-bus,id=canbus1 \
+-machine canbus0=canbus0 -machine canbus1=canbus1
+
+The SocketCAN interface can connect to a physical or a virtual CAN interface on
+the host machine. Please check this document to learn about CAN interfaces on Linux:
+docs/system/devices/can.rst
+
+To connect CANFD0 and CANFD1 to host machine's CAN interface can0:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus \
+-object can-host-socketcan,id=canhost0,if=can0,canbus=canbus
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index 37fc9b919c..963ace861e 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -40,9 +40,11 @@ struct VersalVirt {
 uint32_t clk_25Mhz;
 uint32_t usb;
 uint32_t dwc;
+uint32_t canfd[2];
 } phandle;
 struct arm_boot_info binfo;
 
+CanBusState *canbus[XLNX_VERSAL_NR_CANFD];
 struct {
 bool secure;
 } cfg;
@@ -235,6 +237,33 @@ static void fdt_add_uart_nodes(VersalVirt *s)
 }
 }
 
+static void fdt_add_canfd_nodes(VersalVirt *s)
+{
+uint64_t addrs[] = { MM_CANFD0, MM_CANFD1 };
+uint32_t size[] = { MM_CANFD0_SIZE, MM_CANFD1_SIZE };
+unsigned int irqs[] = { VERSAL_CANFD0_IRQ_0, VERSAL_CANFD1_IRQ_0 };
+int i;
+
+/* Create and connect CANFD0 and CANFD1 nodes to canbus0. */
+for (i = 0; i < ARRAY_SIZE(addrs); i++) {
+char *name = g_strdup_printf("/canfd@%" PRIx64, addrs[i]);
+qemu_fdt_add_subnode(s->fdt, name);
+qemu_fdt_setprop_cell(s->fdt, name, "rx-fifo0", 0x40);
+qemu_fdt_setprop_cell(s->fdt, name, "enable-rx-fifo1", 0x1);
+qemu_fdt_setprop_cell(s->fdt, name, "rx-fifo1", 0x40);
+
+qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+   GIC_FDT_IRQ_TYPE_SPI, irqs[i],
+   GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+ 2, addrs[i], 2, size[i]);
+qemu_fdt_setprop_string(s->fdt, name, "compatible",
+"xlnx,versal-canfd");
+
+g_free(name);
+}
+}
+
 static void fdt_add_fixed_link_nodes(VersalVirt *s, char *gemname,
  uint32_t phandle)
 {
@@ -639,12 +668,17 @@ static void versal_virt_init(MachineState *machine)
 TYPE_XLNX_VERSAL);
 object_property_set_link(OBJECT(&s->soc), "ddr", OBJECT(machine->ram),
  &error_abort);
+object_property_set_link(OBJECT(&s->soc), "canbus0", OBJECT(s->canbus[0]),
+ &error_abort);
+object_property_set_link(OBJECT(&s->soc), "canbus1", OBJECT(s->canbus[1]),
+ &error_abort);
 sysbus_realize(SYS_BUS_DEVICE(&s->soc), &error_fatal);
 
 fdt_create(s);
 create_virtio_regions(s);
 fdt_add_gem_nodes(s);
 fdt_add_uart_nodes(s);
+fdt_add_canfd_nodes(s);
 fdt_add_gic_nodes(s);
 fdt_add_timer_nodes(s);
 fdt_add_zdma_nodes(s);
@@ -712,6 +746,20 @@ static void versal_virt_init(MachineState *machine)
 
 static void versal_virt_machine_instance_init(Object *obj)
 {
+VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(obj);
+
+/*
+ * User can set canbus0 and canbus1 properties to can-bus object and 
connect
+ * to socketcan(optional) interface via 

[QEMU][PATCH v3 4/4] tests/qtest: Introduce tests for Xilinx VERSAL CANFD controller

2022-12-06 Thread Vikram Garhwal
The QTests perform three tests on the Xilinx VERSAL CANFD controller:
Tests the CANFD controllers in loopback.
Tests the CANFD controllers in normal mode with CAN frame.
Tests the CANFD controllers in normal mode with CANFD frame.

Signed-off-by: Vikram Garhwal 
Acked-by: Thomas Huth 
Reviewed-by: Francisco Iglesias 
---
 tests/qtest/meson.build   |   1 +
 tests/qtest/xlnx-canfd-test.c | 422 ++
 2 files changed, 423 insertions(+)
 create mode 100644 tests/qtest/xlnx-canfd-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c07a5b1a5f..9486ebee24 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -213,6 +213,7 @@ qtests_aarch64 = \
   (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? 
['tpm-tis-device-test'] : []) +\
   (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? 
['tpm-tis-device-swtpm-test'] : []) +  \
   (config_all_devices.has_key('CONFIG_XLNX_ZYNQMP_ARM') ? ['xlnx-can-test', 
'fuzz-xlnx-dp-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_XLNX_VERSAL') ? ['xlnx-canfd-test'] : 
[]) + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
diff --git a/tests/qtest/xlnx-canfd-test.c b/tests/qtest/xlnx-canfd-test.c
new file mode 100644
index 00..4ccc0267d4
--- /dev/null
+++ b/tests/qtest/xlnx-canfd-test.c
@@ -0,0 +1,422 @@
+/* SPDX-License-Identifier: MIT
+ *
+ * QTests for the Xilinx Versal CANFD controller.
+ *
+ * Copyright (c) 2022 AMD Inc.
+ *
+ * Written-by: Vikram Garhwal
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+
+/* Base address. */
+#define CANFD0_BASE_ADDR0xff06
+#define CANFD1_BASE_ADDR0xff07
+
+/* Register addresses. */
+#define R_SRR_OFFSET0x00
+#define R_MSR_OFFSET0x04
+#define R_FILTER_CONTROL_REGISTER   0xe0
+#define R_SR_OFFSET 0x18
+#define R_ISR_OFFSET0x1c
+#define R_IER_OFFSET0x20
+#define R_ICR_OFFSET0x24
+#define R_TX_READY_REQ_REGISTER 0x90
+#define RX_FIFO_STATUS_REGISTER 0xe8
+#define R_TXID_OFFSET   0x100
+#define R_TXDLC_OFFSET  0x104
+#define R_TXDATA1_OFFSET0x108
+#define R_TXDATA2_OFFSET0x10c
+#define R_AFMR_REGISTER00xa00
+#define R_AFIR_REGISTER00xa04
+#define R_RX0_ID_OFFSET 0x2100
+#define R_RX0_DLC_OFFSET0x2104
+#define R_RX0_DATA1_OFFSET  0x2108
+#define R_RX0_DATA2_OFFSET  0x210c
+
+/* CANFD modes. */
+#define SRR_CONFIG_MODE                 0x00
+#define MSR_NORMAL_MODE                 0x00
+#define MSR_LOOPBACK_MODE               (1 << 1)
+#define ENABLE_CANFD                    (1 << 1)
+
+/* CANFD status. */
+#define STATUS_CONFIG_MODE              (1 << 0)
+#define STATUS_NORMAL_MODE              (1 << 3)
+#define STATUS_LOOPBACK_MODE            (1 << 1)
+#define ISR_TXOK                        (1 << 1)
+#define ISR_RXOK                        (1 << 4)
+
+#define ENABLE_ALL_FILTERS  0x
+#define ENABLE_ALL_INTERRUPTS   0x
+
+/* We are sending one canfd message. */
+#define TX_READY_REG_VAL                0x1
+
+#define FIRST_RX_STORE_INDEX            0x1
+#define STATUS_REG_MASK                 0xf
+#define DLC_FD_BIT_SHIFT                0x1b
+#define DLC_FD_BIT_MASK                 0xf800
+#define FIFO_STATUS_READ_INDEX_MASK     0x3f
+#define FIFO_STATUS_FILL_LEVEL_MASK     0x7f00
+#define FILL_LEVEL_SHIFT                0x8
+
+/* CANFD frame size ID, DLC and 16 DATA word. */
+#define CANFD_FRAME_SIZE                18
+/* CAN frame size ID, DLC and 2 DATA word. */
+#define CAN_FRAME_SIZE                  4
+
+/* Set the filters for CANFD controller. */
+static void 

[QEMU][PATCH v3 3/4] xlnx-versal: Connect Xilinx VERSAL CANFD controllers

2022-12-06 Thread Vikram Garhwal
Connect CANFD0 and CANFD1 on the Versal-virt machine and update xlnx-versal-virt
document with CANFD command line examples.

Signed-off-by: Vikram Garhwal 
---
 docs/system/arm/xlnx-versal-virt.rst | 31 ++
 hw/arm/xlnx-versal-virt.c| 48 
 hw/arm/xlnx-versal.c | 37 +
 include/hw/arm/xlnx-versal.h | 12 +++
 4 files changed, 128 insertions(+)

diff --git a/docs/system/arm/xlnx-versal-virt.rst b/docs/system/arm/xlnx-versal-virt.rst
index 92ad10d2da..d2d1b26692 100644
--- a/docs/system/arm/xlnx-versal-virt.rst
+++ b/docs/system/arm/xlnx-versal-virt.rst
@@ -34,6 +34,7 @@ Implemented devices:
 - DDR memory
 - BBRAM (36 bytes of Battery-backed RAM)
 - eFUSE (3072 bytes of one-time field-programmable bit array)
+- 2 CANFDs
 
 QEMU does not yet model any other devices, including the PL and the AI Engine.
 
@@ -224,3 +225,33 @@ To use a different index value, N, from default of 1, add:
 
   Better yet, do not use actual product data when running guest image
   on this Xilinx Versal Virt board.
+
+Using CANFDs for Versal Virt
+
+Versal CANFD controller is developed based on SocketCAN and QEMU CAN bus
+implementation. Bus connection and socketCAN connection for each CAN module
+can be set through command lines.
+
+To connect both CANFD0 and CANFD1 on the same bus:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus
+
+To connect CANFD0 and CANFD1 to separate buses:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus0 -object can-bus,id=canbus1 \
+-machine canbus0=canbus0 -machine canbus1=canbus1
+
+The SocketCAN interface can connect to a Physical or a Virtual CAN interfaces on
+the host machine. Please check this document to learn about CAN interface on
+Linux: docs/system/devices/can.rst
+
+To connect CANFD0 and CANFD1 to host machine's CAN interface can0:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus
+-object can-host-socketcan,id=canhost0,if=can0,canbus=canbus
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index 37fc9b919c..963ace861e 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -40,9 +40,11 @@ struct VersalVirt {
 uint32_t clk_25Mhz;
 uint32_t usb;
 uint32_t dwc;
+uint32_t canfd[2];
 } phandle;
 struct arm_boot_info binfo;
 
+CanBusState *canbus[XLNX_VERSAL_NR_CANFD];
 struct {
 bool secure;
 } cfg;
@@ -235,6 +237,33 @@ static void fdt_add_uart_nodes(VersalVirt *s)
 }
 }
 
+static void fdt_add_canfd_nodes(VersalVirt *s)
+{
+    uint64_t addrs[] = { MM_CANFD0, MM_CANFD1 };
+    uint32_t size[] = { MM_CANFD0_SIZE, MM_CANFD1_SIZE };
+    unsigned int irqs[] = { VERSAL_CANFD0_IRQ_0, VERSAL_CANFD1_IRQ_0 };
+    int i;
+
+    /* Create and connect CANFD0 and CANFD1 nodes to canbus0. */
+    for (i = 0; i < ARRAY_SIZE(addrs); i++) {
+        char *name = g_strdup_printf("/canfd@%" PRIx64, addrs[i]);
+        qemu_fdt_add_subnode(s->fdt, name);
+        qemu_fdt_setprop_cell(s->fdt, name, "rx-fifo0", 0x40);
+        qemu_fdt_setprop_cell(s->fdt, name, "enable-rx-fifo1", 0x1);
+        qemu_fdt_setprop_cell(s->fdt, name, "rx-fifo1", 0x40);
+
+        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+                               GIC_FDT_IRQ_TYPE_SPI, irqs[i],
+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+                                     2, addrs[i], 2, size[i]);
+        qemu_fdt_setprop_string(s->fdt, name, "compatible",
+                                "xlnx,versal-canfd");
+
+        g_free(name);
+    }
+}
+
 static void fdt_add_fixed_link_nodes(VersalVirt *s, char *gemname,
  uint32_t phandle)
 {
@@ -639,12 +668,17 @@ static void versal_virt_init(MachineState *machine)
                             TYPE_XLNX_VERSAL);
     object_property_set_link(OBJECT(&s->soc), "ddr", OBJECT(machine->ram),
                              &error_abort);
+    object_property_set_link(OBJECT(&s->soc), "canbus0", OBJECT(s->canbus[0]),
+                             &error_abort);
+    object_property_set_link(OBJECT(&s->soc), "canbus1", OBJECT(s->canbus[1]),
+                             &error_abort);
     sysbus_realize(SYS_BUS_DEVICE(&s->soc), &error_fatal);

     fdt_create(s);
     create_virtio_regions(s);
     fdt_add_gem_nodes(s);
     fdt_add_uart_nodes(s);
+    fdt_add_canfd_nodes(s);
     fdt_add_gic_nodes(s);
     fdt_add_timer_nodes(s);
     fdt_add_zdma_nodes(s);
@@ -712,6 +746,20 @@ static void versal_virt_init(MachineState *machine)
 
 static void versal_virt_machine_instance_init(Object *obj)
 {
+VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(obj);
+
+/*
+ * User can set canbus0 and canbus1 properties to can-bus object and connect
+ * to socketcan(optional) 

Re: [PATCH v3 02/13] tcg/s390x: Remove TCG_REG_TB

2022-12-06 Thread Richard Henderson

On 12/6/22 16:22, Richard Henderson wrote:

Wouldn't it be worth keeping XILF/XIHF here?


I don't know.  It's difficult for me to guess whether a dependency chain like

     val -> xor -> xor

(3 insns with serial dependencies) is better than

     val   --> xor
     load  -/

(3 insns, but only one serial dependency).  There may also be instruction
fusion going on at the micro-architectural level, so that there's really only one xor.


If you have suggestions, I'm all ears.
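
To make the two shapes above concrete, here is a minimal C sketch. It is not the
TCG backend code: xor_two_imm() only mimics an XILF/XIHF-style pair (two
xor-immediates, the second consuming the result of the first), and
xor_from_pool() mimics a constant-pool load followed by a single xor. Both
function names and the pool pointer are hypothetical, and a compiler is free to
emit something quite different for either of them.

```c
#include <stdint.h>

/* Serial chain: the second xor must wait for the first (val -> xor -> xor). */
static uint64_t xor_two_imm(uint64_t val, uint64_t imm)
{
    val ^= imm & UINT64_C(0x00000000ffffffff);  /* low half, XILF-like */
    val ^= imm & UINT64_C(0xffffffff00000000);  /* high half, XIHF-like */
    return val;
}

/* The load does not depend on val; only the final xor needs both inputs. */
static uint64_t xor_from_pool(uint64_t val, const uint64_t *pool_entry)
{
    uint64_t imm = *pool_entry;  /* hypothetical constant-pool entry */
    return val ^ imm;
}
```

Both helpers compute the same value; the open question in this thread is only
which shape the hardware executes faster.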


Related microarchitectural question:

If a 32-bit insn and a 64-bit insn have a similar size encoding (and perhaps even if they 
don't), is it better to produce a 64-bit output so that the hw doesn't have a false 
dependency on the upper 32-bits of the register?


Just wondering whether most of the distinction between 32-bit and 64-bit opcodes ought to 
be discarded, simplifying code generation.  The only items that seem most likely to have 
real execution time differences are multiply and divide.



r~



Re: [PATCH v2 1/2] vhost: configure all host notifiers in a single MR transaction

2022-12-06 Thread longpeng2--- via




On 2022/12/6 18:45, Philippe Mathieu-Daudé wrote:
On 6/12/22 11:28, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:



On 2022/12/6 17:07, Philippe Mathieu-Daudé wrote:

On 6/12/22 09:18, Longpeng(Mike) via wrote:

From: Longpeng 

This allows the vhost device to batch the setup of all its host notifiers.
This significantly reduces the device starting time, e.g. the time spent
on enabling notifiers drops from 376ms to 9.1ms for a VM with 64 vCPUs
and 3 vhost-vDPA generic devices[1] (64vq per device).

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg921541.html

Signed-off-by: Longpeng 
---
  hw/virtio/vhost.c | 40 ++--
  1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 7fb008bc9e..16f8391d86 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1507,7 +1507,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    int i, r, e;
+    int i, n, r, e;

     /* We will pass the notifiers to the kernel, make sure that QEMU
      * doesn't interfere.
@@ -1518,6 +1518,12 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
         goto fail;
     }

+    /*
+     * Batch all the host notifiers in a single transaction to avoid
+     * quadratic time complexity in address_space_update_ioeventfds().
+     */
+    memory_region_transaction_begin();
+
     for (i = 0; i < hdev->nvqs; ++i) {
         r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i,
                                          true);
@@ -1527,8 +1533,12 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
         }
     }

+    memory_region_transaction_commit();
+
     return 0;
 fail_vq:
+    /* save i for a second iteration after transaction is committed. */
+    n = i;
     while (--i >= 0) {
         e = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i,
                                          false);
@@ -1536,8 +1546,18 @@ fail_vq:
             error_report("vhost VQ %d notifier cleanup error: %d", i, -r);
         }
         assert (e >= 0);
-        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i);
     }
+
+    /*
+     * The transaction expects the ioeventfds to be open when it
+     * commits. Do it now, before the cleanup loop.
+     */
+    memory_region_transaction_commit();
+
+    while (--n >= 0) {
+        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + n);
+    }
+
     virtio_device_release_ioeventfd(vdev);
 fail:
     return r;


Similarly to patch #2, removing both goto statements in this function
(as a preliminary patch) will 1/ simplify the code 2/ simplify
reviewing your changes, resulting in something like:


int vhost_dev_enable_notifiers(struct vhost_dev *hdev,
                               VirtIODevice *vdev)
{
    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
    int i, r, e;

    /* We will pass the notifiers to the kernel, make sure that QEMU
     * doesn't interfere.
     */
    r = virtio_device_grab_ioeventfd(vdev);
    if (r < 0) {
        error_report("binding does not support host notifiers");
        return r;
    }

    memory_region_transaction_begin();

    for (i = 0; i < hdev->nvqs; ++i) {
        r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus),
                                         hdev->vq_index + i,
                                         true);
        if (r < 0) {
            error_report("vhost VQ %d notifier binding failed: %d",
                         i, -r);
            while (--i >= 0) {
                e = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus),
                                                 hdev->vq_index + i,
                                                 false);
                if (e < 0) {
                    error_report(
                                "vhost VQ %d notifier cleanup error: %d",
                                 i, -r);
                }
                assert (e >= 0);
                virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus),
                                                 hdev->vq_index + i);
            }
            virtio_device_release_ioeventfd(vdev);
            break;
        }
    }

    memory_region_transaction_commit();

    return r;
}

What do you think?

Maybe we can use vhost_dev_disable_notifiers to further simplify the error path?


Good idea, but having the BusState resolved on each call seems a waste.
Eventually factor it out and pass as argument ...


And we must commit before invoking virtio_bus_cleanup_host_notifier.


... but with that info on top, finally your original patch is simpler.


Yes, I'll try in next version, thanks.
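
For readers following the ordering argument (batch everything in one
transaction, and on failure commit before the fds are closed), here is a small
self-contained mock in C. It does not use the QEMU API: txn_begin(),
txn_commit(), set_notifier() and cleanup_notifier() are hypothetical stand-ins
for memory_region_transaction_begin()/commit(),
virtio_bus_set_host_notifier() and virtio_bus_cleanup_host_notifier(), and the
counters exist only to make the ordering observable.

```c
#include <stdbool.h>

#define NVQS 4

static int txn_depth;        /* nesting depth of the mock transaction */
static int flushes;          /* times the (expensive) update actually ran */
static int early_cleanups;   /* cleanups attempted while a transaction was open */
static bool assigned[NVQS];  /* notifier currently registered */
static bool fd_open[NVQS];   /* backing eventfd still open */

static void txn_begin(void)
{
    txn_depth++;
}

static void txn_commit(void)
{
    if (--txn_depth == 0) {
        flushes++;           /* one update for the whole batch */
    }
}

/* Register or unregister one notifier; enabling fails for queue fail_at. */
static int set_notifier(int vq, bool on, int fail_at)
{
    if (on && vq == fail_at) {
        return -1;           /* injected failure */
    }
    if (on) {
        fd_open[vq] = true;
    }
    assigned[vq] = on;
    return 0;
}

static void cleanup_notifier(int vq)
{
    if (txn_depth > 0) {
        early_cleanups++;    /* would close an fd the commit still expects */
    }
    fd_open[vq] = false;
}

static int enable_notifiers(int fail_at)
{
    int i, n, r = 0;

    txn_begin();
    for (i = 0; i < NVQS; i++) {
        r = set_notifier(i, true, fail_at);
        if (r < 0) {
            break;
        }
    }
    if (r == 0) {
        txn_commit();
        return 0;
    }

    n = i;                   /* remember how many queues were set up */
    while (--i >= 0) {
        set_notifier(i, false, fail_at);
    }
    txn_commit();            /* commit first ... */
    while (--n >= 0) {
        cleanup_notifier(n); /* ... then close the fds */
    }
    return r;
}
```

In this mock, both the success path and the failure path trigger exactly one
flush per call, and early_cleanups stays at zero because the cleanup loop only
runs after the commit.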


.




[QEMU][PATCH v3 2/4] hw/net/can: Introduce Xilinx Versal CANFD controller

2022-12-06 Thread Vikram Garhwal
The Xilinx Versal CANFD controller is developed based on SocketCAN, QEMU CAN bus
implementation. Bus connection and socketCAN connection for each CAN module
can be set through command lines.

Signed-off-by: Vikram Garhwal 
---
 hw/net/can/meson.build |1 +
 hw/net/can/trace-events|7 +
 hw/net/can/xlnx-versal-canfd.c | 2121 
 include/hw/net/xlnx-versal-canfd.h |   90 ++
 4 files changed, 2219 insertions(+)
 create mode 100644 hw/net/can/xlnx-versal-canfd.c
 create mode 100644 include/hw/net/xlnx-versal-canfd.h

diff --git a/hw/net/can/meson.build b/hw/net/can/meson.build
index 8fabbd9ee6..8d85201cb0 100644
--- a/hw/net/can/meson.build
+++ b/hw/net/can/meson.build
@@ -5,3 +5,4 @@ softmmu_ss.add(when: 'CONFIG_CAN_PCI', if_true: files('can_mioe3680_pci.c'))
 softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD', if_true: files('ctucan_core.c'))
 softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD_PCI', if_true: files('ctucan_pci.c'))
 softmmu_ss.add(when: 'CONFIG_XLNX_ZYNQMP', if_true: files('xlnx-zynqmp-can.c'))
+softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal-canfd.c'))
diff --git a/hw/net/can/trace-events b/hw/net/can/trace-events
index 8346a98ab5..de64ac1b31 100644
--- a/hw/net/can/trace-events
+++ b/hw/net/can/trace-events
@@ -7,3 +7,10 @@ xlnx_can_filter_mask_pre_write(uint8_t filter_num, uint32_t value) "Filter%d MAS
 xlnx_can_tx_data(uint32_t id, uint8_t dlc, uint8_t db0, uint8_t db1, uint8_t db2, uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: ID: 0x%08x DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x"
 xlnx_can_rx_data(uint32_t id, uint32_t dlc, uint8_t db0, uint8_t db1, uint8_t db2, uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: ID: 0x%08x DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x"
 xlnx_can_rx_discard(uint32_t status) "Controller is not enabled for bus communication. Status Register: 0x%08x"
+
+# xlnx-versal-canfd.c
+xlnx_canfd_update_irq(char *path, uint32_t isr, uint32_t ier, uint32_t irq) "%s: ISR: 0x%08x IER: 0x%08x IRQ: 0x%08x"
+xlnx_canfd_rx_fifo_filter_reject(char *path, uint32_t id, uint8_t dlc) "%s: Frame: ID: 0x%08x DLC: 0x%02x"
+xlnx_canfd_rx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flags) "%s: Frame: ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x"
+xlnx_canfd_tx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flgas) "%s: Frame: ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x"
+xlnx_canfd_reset(char *path, uint32_t val) "%s: Resetting controller with value = 0x%08x"
diff --git a/hw/net/can/xlnx-versal-canfd.c b/hw/net/can/xlnx-versal-canfd.c
new file mode 100644
index 00..02a32d6042
--- /dev/null
+++ b/hw/net/can/xlnx-versal-canfd.c
@@ -0,0 +1,2121 @@
+/*
+ * QEMU model of the Xilinx Versal CANFD device.
+ *
+ * This implementation is based on the following datasheet:
+ * https://docs.xilinx.com/v/u/2.0-English/pg223-canfd
+ *
+ * Copyright (c) 2022 AMD Inc.
+ *
+ * Written-by: Vikram Garhwal
+ *
+ * Based on QEMU CANFD Device emulation implemented by Jin Yang, Deniz Eren and
+ * Pavel Pisa
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/irq.h"
+#include "hw/register.h"
+#include "qapi/error.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "qemu/cutils.h"
+#include "qemu/event_notifier.h"
+#include "hw/qdev-properties.h"
+#include "qom/object_interfaces.h"
+#include "migration/vmstate.h"
+#include "hw/net/xlnx-versal-canfd.h"
+#include "trace.h"
+
+/*
+ * This is done to avoid the build issues on Windows machines. The ERROR field
+ * of INTERRUPT_STATUS_REGISTER collides with a macro in the Windows build
+ * environment.
+ */
+#undef ERROR
+
+REG32(SOFTWARE_RESET_REGISTER, 0x0)
+FIELD(SOFTWARE_RESET_REGISTER, CEN, 1, 1)
+FIELD(SOFTWARE_RESET_REGISTER, SRST, 0, 1)

[QEMU][PATCH v3 3/4] xlnx-zynqmp: Connect Xilinx VERSAL CANFD controllers

2022-12-06 Thread Vikram Garhwal
Connect CANFD0 and CANFD1 on the Versal-virt machine and update xlnx-versal-virt
document with CANFD command line examples.

Signed-off-by: Vikram Garhwal 
---
 docs/system/arm/xlnx-versal-virt.rst | 31 ++
 hw/arm/xlnx-versal-virt.c| 48 
 hw/arm/xlnx-versal.c | 37 +
 include/hw/arm/xlnx-versal.h | 12 +++
 4 files changed, 128 insertions(+)

diff --git a/docs/system/arm/xlnx-versal-virt.rst b/docs/system/arm/xlnx-versal-virt.rst
index 92ad10d2da..372e4249f0 100644
--- a/docs/system/arm/xlnx-versal-virt.rst
+++ b/docs/system/arm/xlnx-versal-virt.rst
@@ -34,6 +34,7 @@ Implemented devices:
 - DDR memory
 - BBRAM (36 bytes of Battery-backed RAM)
 - eFUSE (3072 bytes of one-time field-programmable bit array)
+- 2 CANFDs
 
 QEMU does not yet model any other devices, including the PL and the AI Engine.
 
@@ -224,3 +225,33 @@ To use a different index value, N, from default of 1, add:
 
   Better yet, do not use actual product data when running guest image
   on this Xilinx Versal Virt board.
+
+Using CANFDs for Versal Virt
+
+Versal CANFD controller is developed based on SocketCAN and QEMU CAN bus
+implementation. Bus connection and socketCAN connection for each CAN module
+can be set through command lines.
+
+To connect both CANFD0 and CANFD1 on the same bus:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus
+
+To connect CANFD0 and CANFD1 to separate buses:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus0 -object can-bus,id=canbus1 \
+-machine canbus0=canbus0 -machine canbus1=canbus1
+
+SocketCAN interface can connect to a Physical or a Virtual CAN interfaces on
+host machine. Please check this document to learn about CAN interface on Linux:
+docs/system/devices/can.rst
+
+To connect CANFD0 and CANFD1 to host machine's CAN interface can0:
+
+.. code-block:: bash
+
+-object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus
+-object can-host-socketcan,id=canhost0,if=can0,canbus=canbus
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index 37fc9b919c..963ace861e 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -40,9 +40,11 @@ struct VersalVirt {
 uint32_t clk_25Mhz;
 uint32_t usb;
 uint32_t dwc;
+uint32_t canfd[2];
 } phandle;
 struct arm_boot_info binfo;
 
+CanBusState *canbus[XLNX_VERSAL_NR_CANFD];
 struct {
 bool secure;
 } cfg;
@@ -235,6 +237,33 @@ static void fdt_add_uart_nodes(VersalVirt *s)
 }
 }
 
+static void fdt_add_canfd_nodes(VersalVirt *s)
+{
+    uint64_t addrs[] = { MM_CANFD0, MM_CANFD1 };
+    uint32_t size[] = { MM_CANFD0_SIZE, MM_CANFD1_SIZE };
+    unsigned int irqs[] = { VERSAL_CANFD0_IRQ_0, VERSAL_CANFD1_IRQ_0 };
+    int i;
+
+    /* Create and connect CANFD0 and CANFD1 nodes to canbus0. */
+    for (i = 0; i < ARRAY_SIZE(addrs); i++) {
+        char *name = g_strdup_printf("/canfd@%" PRIx64, addrs[i]);
+        qemu_fdt_add_subnode(s->fdt, name);
+        qemu_fdt_setprop_cell(s->fdt, name, "rx-fifo0", 0x40);
+        qemu_fdt_setprop_cell(s->fdt, name, "enable-rx-fifo1", 0x1);
+        qemu_fdt_setprop_cell(s->fdt, name, "rx-fifo1", 0x40);
+
+        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+                               GIC_FDT_IRQ_TYPE_SPI, irqs[i],
+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+                                     2, addrs[i], 2, size[i]);
+        qemu_fdt_setprop_string(s->fdt, name, "compatible",
+                                "xlnx,versal-canfd");
+
+        g_free(name);
+    }
+}
+
 static void fdt_add_fixed_link_nodes(VersalVirt *s, char *gemname,
  uint32_t phandle)
 {
@@ -639,12 +668,17 @@ static void versal_virt_init(MachineState *machine)
                             TYPE_XLNX_VERSAL);
     object_property_set_link(OBJECT(&s->soc), "ddr", OBJECT(machine->ram),
                              &error_abort);
+    object_property_set_link(OBJECT(&s->soc), "canbus0", OBJECT(s->canbus[0]),
+                             &error_abort);
+    object_property_set_link(OBJECT(&s->soc), "canbus1", OBJECT(s->canbus[1]),
+                             &error_abort);
     sysbus_realize(SYS_BUS_DEVICE(&s->soc), &error_fatal);

     fdt_create(s);
     create_virtio_regions(s);
     fdt_add_gem_nodes(s);
     fdt_add_uart_nodes(s);
+    fdt_add_canfd_nodes(s);
     fdt_add_gic_nodes(s);
     fdt_add_timer_nodes(s);
     fdt_add_zdma_nodes(s);
@@ -712,6 +746,20 @@ static void versal_virt_init(MachineState *machine)
 
 static void versal_virt_machine_instance_init(Object *obj)
 {
+VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(obj);
+
+/*
+ * User can set canbus0 and canbus1 properties to can-bus object and connect
+ * to socketcan(optional) interface via 

[QEMU][PATCH v3 1/4] MAINTAINERS: Include canfd tests under Xilinx CAN

2022-12-06 Thread Vikram Garhwal
Signed-off-by: Vikram Garhwal 
Reviewed-by: Peter Maydell 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6966490c94..a76221f260 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1753,7 +1753,7 @@ M: Francisco Iglesias 
 S: Maintained
 F: hw/net/can/xlnx-*
 F: include/hw/net/xlnx-*
-F: tests/qtest/xlnx-can-test*
+F: tests/qtest/xlnx-can*-test*
 
 EDU
 M: Jiri Slaby 
-- 
2.17.1




[QEMU][PATCH v3 4/4] tests/qtest: Introduce tests for Xilinx VERSAL CANFD controller

2022-12-06 Thread Vikram Garhwal
The QTests perform three tests on the Xilinx VERSAL CANFD controller:
Tests the CANFD controllers in loopback.
Tests the CANFD controllers in normal mode with CAN frame.
Tests the CANFD controllers in normal mode with CANFD frame.
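
As a reading aid for the RX FIFO status masks that the test file in this patch
defines, here is a minimal sketch of how such masks are typically applied to a
status word. The three constants are copied from the patch; the helper names
fifo_read_index() and fifo_fill_level() are hypothetical, and the actual test
may read and decode the register differently.

```c
#include <stdint.h>

/* Values as defined in tests/qtest/xlnx-canfd-test.c in this patch. */
#define FIFO_STATUS_READ_INDEX_MASK 0x3f
#define FIFO_STATUS_FILL_LEVEL_MASK 0x7f00
#define FILL_LEVEL_SHIFT            0x8

/* Hypothetical helper: index of the next RX FIFO entry to read. */
static uint32_t fifo_read_index(uint32_t status)
{
    return status & FIFO_STATUS_READ_INDEX_MASK;
}

/* Hypothetical helper: number of frames currently stored in the RX FIFO. */
static uint32_t fifo_fill_level(uint32_t status)
{
    return (status & FIFO_STATUS_FILL_LEVEL_MASK) >> FILL_LEVEL_SHIFT;
}
```

For a status word of 0x0105, these helpers yield a read index of 5 and a fill
level of 1.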

Signed-off-by: Vikram Garhwal 
Acked-by: Thomas Huth 
Reviewed-by: Francisco Iglesias 
---
 tests/qtest/meson.build   |   1 +
 tests/qtest/xlnx-canfd-test.c | 422 ++
 2 files changed, 423 insertions(+)
 create mode 100644 tests/qtest/xlnx-canfd-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c07a5b1a5f..9486ebee24 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -213,6 +213,7 @@ qtests_aarch64 = \
   (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? ['tpm-tis-device-test'] : []) +\
   (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? ['tpm-tis-device-swtpm-test'] : []) +  \
   (config_all_devices.has_key('CONFIG_XLNX_ZYNQMP_ARM') ? ['xlnx-can-test', 'fuzz-xlnx-dp-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_XLNX_VERSAL') ? ['xlnx-canfd-test'] : []) + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
diff --git a/tests/qtest/xlnx-canfd-test.c b/tests/qtest/xlnx-canfd-test.c
new file mode 100644
index 00..4ccc0267d4
--- /dev/null
+++ b/tests/qtest/xlnx-canfd-test.c
@@ -0,0 +1,422 @@
+/* SPDX-License-Identifier: MIT
+ *
+ * QTests for the Xilinx Versal CANFD controller.
+ *
+ * Copyright (c) 2022 AMD Inc.
+ *
+ * Written-by: Vikram Garhwal
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+
+/* Base address. */
+#define CANFD0_BASE_ADDR                0xff06
+#define CANFD1_BASE_ADDR                0xff07
+
+/* Register addresses. */
+#define R_SRR_OFFSET                    0x00
+#define R_MSR_OFFSET                    0x04
+#define R_FILTER_CONTROL_REGISTER       0xe0
+#define R_SR_OFFSET                     0x18
+#define R_ISR_OFFSET                    0x1c
+#define R_IER_OFFSET                    0x20
+#define R_ICR_OFFSET                    0x24
+#define R_TX_READY_REQ_REGISTER         0x90
+#define RX_FIFO_STATUS_REGISTER         0xe8
+#define R_TXID_OFFSET                   0x100
+#define R_TXDLC_OFFSET                  0x104
+#define R_TXDATA1_OFFSET                0x108
+#define R_TXDATA2_OFFSET                0x10c
+#define R_AFMR_REGISTER0                0xa00
+#define R_AFIR_REGISTER0                0xa04
+#define R_RX0_ID_OFFSET                 0x2100
+#define R_RX0_DLC_OFFSET                0x2104
+#define R_RX0_DATA1_OFFSET              0x2108
+#define R_RX0_DATA2_OFFSET              0x210c
+
+/* CANFD modes. */
+#define SRR_CONFIG_MODE                 0x00
+#define MSR_NORMAL_MODE                 0x00
+#define MSR_LOOPBACK_MODE               (1 << 1)
+#define ENABLE_CANFD                    (1 << 1)
+
+/* CANFD status. */
+#define STATUS_CONFIG_MODE              (1 << 0)
+#define STATUS_NORMAL_MODE              (1 << 3)
+#define STATUS_LOOPBACK_MODE            (1 << 1)
+#define ISR_TXOK                        (1 << 1)
+#define ISR_RXOK                        (1 << 4)
+
+#define ENABLE_ALL_FILTERS  0x
+#define ENABLE_ALL_INTERRUPTS   0x
+
+/* We are sending one canfd message. */
+#define TX_READY_REG_VAL                0x1
+
+#define FIRST_RX_STORE_INDEX            0x1
+#define STATUS_REG_MASK                 0xf
+#define DLC_FD_BIT_SHIFT                0x1b
+#define DLC_FD_BIT_MASK                 0xf800
+#define FIFO_STATUS_READ_INDEX_MASK     0x3f
+#define FIFO_STATUS_FILL_LEVEL_MASK     0x7f00
+#define FILL_LEVEL_SHIFT                0x8
+
+/* CANFD frame size ID, DLC and 16 DATA word. */
+#define CANFD_FRAME_SIZE                18
+/* CAN frame size ID, DLC and 2 DATA word. */
+#define CAN_FRAME_SIZE                  4
+
+/* Set the filters for CANFD controller. */
+static void 

Re: [PATCH for-8.0] hw/rtc/mc146818rtc: Make this rtc device target independent

2022-12-06 Thread Bernhard Beschow



On 6 December 2022 20:06:41 UTC, Thomas Huth wrote:
>The only code that is really, really target dependent is the apic-related
>code in rtc_policy_slew_deliver_irq(). By moving this code into the hw/i386/
>folder (renamed to rtc_apic_policy_slew_deliver_irq()) and passing this
>function as parameter to mc146818_rtc_init(), we can make the RTC completely
>target-independent.
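
The approach described above (the board passes the target-specific delivery
function to the device at init time) can be sketched in plain C as follows.
This is only an illustration, not the QEMU code: RtcSketch, rtc_sketch_init(),
rtc_sketch_tick() and board_irq_not_delivered() are hypothetical names, and
treating a NULL hook as "just raise the IRQ" is an assumption, since the quoted
part of the patch does not show how the device handles NULL.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct RtcSketch RtcSketch;
typedef bool (*DeliverIrqFn)(RtcSketch *s);

struct RtcSketch {
    int irq_level;                        /* 1 once the IRQ line was raised */
    int coalesced;                        /* ticks whose IRQ was not delivered */
    DeliverIrqFn policy_slew_deliver_irq; /* optional target-specific hook */
};

/* The board decides which hook (if any) the device gets. */
static void rtc_sketch_init(RtcSketch *s, DeliverIrqFn fn)
{
    s->irq_level = 0;
    s->coalesced = 0;
    s->policy_slew_deliver_irq = fn;
}

/* Example board hook that raises the IRQ but reports it as not delivered. */
static bool board_irq_not_delivered(RtcSketch *s)
{
    s->irq_level = 1;
    return false;
}

/* Device code stays target independent: it only calls through the hook. */
static void rtc_sketch_tick(RtcSketch *s)
{
    if (s->policy_slew_deliver_irq) {
        if (!s->policy_slew_deliver_irq(s)) {
            s->coalesced++;               /* remember the lost tick */
        }
    } else {
        s->irq_level = 1;                 /* assumption: no hook, plain raise */
    }
}
```

A board without an APIC would pass NULL, while an x86 board would pass its own
delivery function, which is the split the commit message aims for.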
>
>Signed-off-by: Thomas Huth 
>---
> include/hw/rtc/mc146818rtc.h |  7 +--
> hw/alpha/dp264.c |  2 +-
> hw/hppa/machine.c|  2 +-
> hw/i386/microvm.c|  3 ++-
> hw/i386/pc.c | 10 +-
> hw/mips/jazz.c   |  2 +-
> hw/ppc/pnv.c |  2 +-
> hw/rtc/mc146818rtc.c | 34 +++---
> hw/rtc/meson.build   |  3 +--
> 9 files changed, 32 insertions(+), 33 deletions(-)
>
>diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
>index 1db0fcee92..c687953cc4 100644
>--- a/include/hw/rtc/mc146818rtc.h
>+++ b/include/hw/rtc/mc146818rtc.h
>@@ -46,14 +46,17 @@ struct RTCState {
> Notifier clock_reset_notifier;
> LostTickPolicy lost_tick_policy;
> Notifier suspend_notifier;
>+bool (*policy_slew_deliver_irq)(RTCState *s);
> QLIST_ENTRY(RTCState) link;
> };
> 
> #define RTC_ISA_IRQ 8
> 
>-ISADevice *mc146818_rtc_init(ISABus *bus, int base_year,
>- qemu_irq intercept_irq);
>+ISADevice *mc146818_rtc_init(ISABus *bus, int base_year, qemu_irq intercept_irq,
>+ bool (*policy_slew_deliver_irq)(RTCState *s));
> void rtc_set_memory(ISADevice *dev, int addr, int val);
> int rtc_get_memory(ISADevice *dev, int addr);
>+bool rtc_apic_policy_slew_deliver_irq(RTCState *s);
>+void qmp_rtc_reset_reinjection(Error **errp);
> 
> #endif /* HW_RTC_MC146818RTC_H */
>diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
>index c502c8c62a..8723942b52 100644
>--- a/hw/alpha/dp264.c
>+++ b/hw/alpha/dp264.c
>@@ -118,7 +118,7 @@ static void clipper_init(MachineState *machine)
> qdev_connect_gpio_out(i82378_dev, 0, isa_irq);
> 
> /* Since we have an SRM-compatible PALcode, use the SRM epoch.  */
>-mc146818_rtc_init(isa_bus, 1900, rtc_irq);
>+mc146818_rtc_init(isa_bus, 1900, rtc_irq, NULL);
> 
> /* VGA setup.  Don't bother loading the bios.  */
> pci_vga_init(pci_bus);
>diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
>index de1cc7ab71..311031714a 100644
>--- a/hw/hppa/machine.c
>+++ b/hw/hppa/machine.c
>@@ -232,7 +232,7 @@ static void machine_hppa_init(MachineState *machine)
> assert(isa_bus);
> 
> /* Realtime clock, used by firmware for PDC_TOD call. */
>-mc146818_rtc_init(isa_bus, 2000, NULL);
>+mc146818_rtc_init(isa_bus, 2000, NULL, NULL);
> 
> /* Serial ports: Lasi and Dino use a 7.272727 MHz clock. */
> serial_mm_init(addr_space, LASI_UART_HPA + 0x800, 0,
>diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>index 170a331e3f..d0ed4dca50 100644
>--- a/hw/i386/microvm.c
>+++ b/hw/i386/microvm.c
>@@ -267,7 +267,8 @@ static void microvm_devices_init(MicrovmMachineState *mms)
> 
> if (mms->rtc == ON_OFF_AUTO_ON ||
> (mms->rtc == ON_OFF_AUTO_AUTO && !kvm_enabled())) {
>-rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
>+rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL,
>+  rtc_apic_policy_slew_deliver_irq);
> microvm_set_rtc(mms, rtc_state);
> }
> 
>diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>index 546b703cb4..650e7bc199 100644
>--- a/hw/i386/pc.c
>+++ b/hw/i386/pc.c
>@@ -1244,6 +1244,13 @@ static void pc_superio_init(ISABus *isa_bus, bool 
>create_fdctrl,
> g_free(a20_line);
> }
> 
>+bool rtc_apic_policy_slew_deliver_irq(RTCState *s)
>+{
>+apic_reset_irq_delivered();
>+qemu_irq_raise(s->irq);
>+return apic_get_irq_delivered();
>+}
>+
> void pc_basic_device_init(struct PCMachineState *pcms,
>   ISABus *isa_bus, qemu_irq *gsi,
>   ISADevice **rtc_state,
>@@ -1299,7 +1306,8 @@ void pc_basic_device_init(struct PCMachineState *pcms,
> pit_alt_irq = qdev_get_gpio_in(hpet, HPET_LEGACY_PIT_INT);
> rtc_irq = qdev_get_gpio_in(hpet, HPET_LEGACY_RTC_INT);
> }
>-*rtc_state = mc146818_rtc_init(isa_bus, 2000, rtc_irq);
>+*rtc_state = mc146818_rtc_init(isa_bus, 2000, rtc_irq,
>+   rtc_apic_policy_slew_deliver_irq);

In my PIIX consolidation series [1] I'm instantiating the RTC in the south 
bridges since embedding the struct in the host device is the preferred new way. 
In the end there is one initialization shared by both PIIX3 and -4. While PIIX3 
(PC) will require rtc_apic_policy_slew_deliver_irq, PIIX4 (Malta) won't. 
Furthermore, my goal is to reuse PIIX4 in the PC machine to eliminate today's 
Frankenstein PIIX4 ACPI controller. Any idea how to solve this?

Thanks,
Bernhard

[1] 

Re: [PATCH for-8.0] hw/rtc/mc146818rtc: Make this rtc device target independent

2022-12-06 Thread BALATON Zoltan

On Tue, 6 Dec 2022, Thomas Huth wrote:

The only code that is really, really target dependent is the apic-related
code in rtc_policy_slew_deliver_irq(). By moving this code into the hw/i386/
folder (renamed to rtc_apic_policy_slew_deliver_irq()) and passing this
function as parameter to mc146818_rtc_init(), we can make the RTC completely
target-independent.
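
For illustration only, the approach described above (generic device code calling through an optional function pointer that the board code supplies at init time) can be sketched in standalone C. `DemoRTC`, `demo_rtc_init()` and the hook below are hypothetical names, not the QEMU API, and the behaviour for a NULL hook (always report the IRQ as delivered) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RTCState; not the QEMU structure. */
typedef struct DemoRTC DemoRTC;
typedef bool (*DemoDeliverFn)(DemoRTC *s);

struct DemoRTC {
    int irq_level;                          /* 1 once the IRQ line was raised */
    DemoDeliverFn policy_slew_deliver_irq;  /* NULL if the board has no hook */
};

/* Board code passes its target-specific hook (or NULL) at init time. */
void demo_rtc_init(DemoRTC *s, DemoDeliverFn hook)
{
    assert(s != NULL);
    s->irq_level = 0;
    s->policy_slew_deliver_irq = hook;
}

/*
 * Target-independent device code: raise the IRQ and report whether it
 * was delivered.  Without a hook we assume delivery always succeeds.
 */
bool demo_rtc_deliver_irq(DemoRTC *s)
{
    if (s->policy_slew_deliver_irq) {
        return s->policy_slew_deliver_irq(s);
    }
    s->irq_level = 1;
    return true;
}

/* Example hook: raises the line but reports the IRQ as coalesced. */
bool demo_hook_coalesced(DemoRTC *s)
{
    s->irq_level = 1;
    return false;
}
```

With this shape, a board that does not care about slew policy simply passes NULL, and only the board-specific file needs to know about the interrupt controller.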

Signed-off-by: Thomas Huth 
---
include/hw/rtc/mc146818rtc.h |  7 +--
hw/alpha/dp264.c |  2 +-
hw/hppa/machine.c|  2 +-
hw/i386/microvm.c|  3 ++-
hw/i386/pc.c | 10 +-
hw/mips/jazz.c   |  2 +-
hw/ppc/pnv.c |  2 +-
hw/rtc/mc146818rtc.c | 34 +++---
hw/rtc/meson.build   |  3 +--
9 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 1db0fcee92..c687953cc4 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -46,14 +46,17 @@ struct RTCState {
Notifier clock_reset_notifier;
LostTickPolicy lost_tick_policy;
Notifier suspend_notifier;
+bool (*policy_slew_deliver_irq)(RTCState *s);
QLIST_ENTRY(RTCState) link;
};

#define RTC_ISA_IRQ 8

-ISADevice *mc146818_rtc_init(ISABus *bus, int base_year,
- qemu_irq intercept_irq);
+ISADevice *mc146818_rtc_init(ISABus *bus, int base_year, qemu_irq 
intercept_irq,
+ bool (*policy_slew_deliver_irq)(RTCState *s));
void rtc_set_memory(ISADevice *dev, int addr, int val);
int rtc_get_memory(ISADevice *dev, int addr);
+bool rtc_apic_policy_slew_deliver_irq(RTCState *s);
+void qmp_rtc_reset_reinjection(Error **errp);

#endif /* HW_RTC_MC146818RTC_H */
diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index c502c8c62a..8723942b52 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -118,7 +118,7 @@ static void clipper_init(MachineState *machine)
qdev_connect_gpio_out(i82378_dev, 0, isa_irq);

/* Since we have an SRM-compatible PALcode, use the SRM epoch.  */
-mc146818_rtc_init(isa_bus, 1900, rtc_irq);
+mc146818_rtc_init(isa_bus, 1900, rtc_irq, NULL);

/* VGA setup.  Don't bother loading the bios.  */
pci_vga_init(pci_bus);
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index de1cc7ab71..311031714a 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -232,7 +232,7 @@ static void machine_hppa_init(MachineState *machine)
assert(isa_bus);

/* Realtime clock, used by firmware for PDC_TOD call. */
-mc146818_rtc_init(isa_bus, 2000, NULL);
+mc146818_rtc_init(isa_bus, 2000, NULL, NULL);

/* Serial ports: Lasi and Dino use a 7.272727 MHz clock. */
serial_mm_init(addr_space, LASI_UART_HPA + 0x800, 0,
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 170a331e3f..d0ed4dca50 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -267,7 +267,8 @@ static void microvm_devices_init(MicrovmMachineState *mms)

if (mms->rtc == ON_OFF_AUTO_ON ||
(mms->rtc == ON_OFF_AUTO_AUTO && !kvm_enabled())) {
-rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
+rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL,
+  rtc_apic_policy_slew_deliver_irq);
microvm_set_rtc(mms, rtc_state);
}

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 546b703cb4..650e7bc199 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1244,6 +1244,13 @@ static void pc_superio_init(ISABus *isa_bus, bool 
create_fdctrl,
g_free(a20_line);
}

+bool rtc_apic_policy_slew_deliver_irq(RTCState *s)
+{
+apic_reset_irq_delivered();
+qemu_irq_raise(s->irq);
+return apic_get_irq_delivered();
+}
+
void pc_basic_device_init(struct PCMachineState *pcms,
  ISABus *isa_bus, qemu_irq *gsi,
  ISADevice **rtc_state,
@@ -1299,7 +1306,8 @@ void pc_basic_device_init(struct PCMachineState *pcms,
pit_alt_irq = qdev_get_gpio_in(hpet, HPET_LEGACY_PIT_INT);
rtc_irq = qdev_get_gpio_in(hpet, HPET_LEGACY_RTC_INT);
}
-*rtc_state = mc146818_rtc_init(isa_bus, 2000, rtc_irq);
+*rtc_state = mc146818_rtc_init(isa_bus, 2000, rtc_irq,
+   rtc_apic_policy_slew_deliver_irq);

qemu_register_boot_set(pc_boot_set, *rtc_state);

diff --git a/hw/mips/jazz.c b/hw/mips/jazz.c
index 6aefe9a61b..50fbd57b23 100644
--- a/hw/mips/jazz.c
+++ b/hw/mips/jazz.c
@@ -356,7 +356,7 @@ static void mips_jazz_init(MachineState *machine,
fdctrl_init_sysbus(qdev_get_gpio_in(rc4030, 1), 0x80003000, fds);

/* Real time clock */
-mc146818_rtc_init(isa_bus, 1980, NULL);
+mc146818_rtc_init(isa_bus, 1980, NULL, NULL);
memory_region_init_io(rtc, NULL, _ops, NULL, "rtc", 0x1000);
memory_region_add_subregion(address_space, 0x80004000, rtc);

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 3d01e26f84..c5482554b7 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -992,7 +992,7 

Re: [QEMU][PATCH v2 2/5] hw/net/can: Introduce Xilinx Versal CANFD controller

2022-12-06 Thread Vikram Garhwal

Hi Peter,

On 11/10/22 9:07 AM, Peter Maydell wrote:

On Sat, 22 Oct 2022 at 06:48, Vikram Garhwal  wrote:

The Xilinx Versal CANFD controller is developed based on SocketCAN, QEMU CAN bus
implementation. Bus connection and socketCAN connection for each CAN module
can be set through command lines.

Signed-off-by: Vikram Garhwal
---
  hw/net/can/meson.build |1 +
  hw/net/can/trace-events|7 +
  hw/net/can/xlnx-versal-canfd.c | 2160 
  include/hw/net/xlnx-versal-canfd.h |   91 ++
  4 files changed, 2259 insertions(+)
  create mode 100644 hw/net/can/xlnx-versal-canfd.c
  create mode 100644 include/hw/net/xlnx-versal-canfd.h

diff --git a/hw/net/can/meson.build b/hw/net/can/meson.build
index 8fabbd9ee6..8d85201cb0 100644
--- a/hw/net/can/meson.build
+++ b/hw/net/can/meson.build
@@ -5,3 +5,4 @@ softmmu_ss.add(when: 'CONFIG_CAN_PCI', if_true: 
files('can_mioe3680_pci.c'))
  softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD', if_true: files('ctucan_core.c'))
  softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD_PCI', if_true: 
files('ctucan_pci.c'))
  softmmu_ss.add(when: 'CONFIG_XLNX_ZYNQMP', if_true: 
files('xlnx-zynqmp-can.c'))
+softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: 
files('xlnx-versal-canfd.c'))
diff --git a/hw/net/can/trace-events b/hw/net/can/trace-events
index 8346a98ab5..de64ac1b31 100644
--- a/hw/net/can/trace-events
+++ b/hw/net/can/trace-events
@@ -7,3 +7,10 @@ xlnx_can_filter_mask_pre_write(uint8_t filter_num, uint32_t value) 
"Filter%d MAS
  xlnx_can_tx_data(uint32_t id, uint8_t dlc, uint8_t db0, uint8_t db1, uint8_t db2, 
uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: ID: 0x%08x 
DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x"
  xlnx_can_rx_data(uint32_t id, uint32_t dlc, uint8_t db0, uint8_t db1, uint8_t db2, 
uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: ID: 0x%08x 
DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x"
  xlnx_can_rx_discard(uint32_t status) "Controller is not enabled for bus 
communication. Status Register: 0x%08x"
+
+# xlnx-versal-canfd.c
+xlnx_canfd_update_irq(char *path, uint32_t isr, uint32_t ier, uint32_t irq) "%s: 
ISR: 0x%08x IER: 0x%08x IRQ: 0x%08x"
+xlnx_canfd_rx_fifo_filter_reject(char *path, uint32_t id, uint8_t dlc) "%s: Frame: 
ID: 0x%08x DLC: 0x%02x"
+xlnx_canfd_rx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flags) "%s: Frame: 
ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x"
+xlnx_canfd_tx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flgas) "%s: Frame: 
ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x"
+xlnx_canfd_reset(char *path, uint32_t val) "%s: Resetting controller with value = 
0x%08x"
diff --git a/hw/net/can/xlnx-versal-canfd.c b/hw/net/can/xlnx-versal-canfd.c
new file mode 100644
index 00..592c61fcf3
--- /dev/null
+++ b/hw/net/can/xlnx-versal-canfd.c
@@ -0,0 +1,2160 @@
+/*
+ * QEMU model of the Xilinx Versal CANFD device.
+ *
+ * This implementation is based on the following datasheet:
+ *https://docs.xilinx.com/v/u/2.0-English/pg223-canfd
+ *
+ * Copyright (c) 2022 AMD Inc.
+ *
+ * Written-by: Vikram Garhwal
+ *
+ * Based on QEMU CANFD Device emulation implemented by Jin Yang, Deniz Eren and
+ * Pavel Pisa
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/irq.h"
+#include "hw/register.h"
+#include "qapi/error.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "qemu/cutils.h"
+#include "qemu/event_notifier.h"
+#include "hw/qdev-properties.h"
+#include "qom/object_interfaces.h"
+#include "migration/vmstate.h"
+#include "hw/net/xlnx-versal-canfd.h"
+#include "trace.h"
+
+/* This is done to avoid the build issues on Windows machines. The ERROR field
+ * of INTERRUPT_STATUS_REGISTER collides with a macro in the Windows build
+ * environment.
+ */

QEMU coding style wants the initial "/*" on 

Re: [PATCH v3 02/13] tcg/s390x: Remove TCG_REG_TB

2022-12-06 Thread Richard Henderson

On 12/6/22 13:29, Ilya Leoshkevich wrote:

On Thu, Dec 01, 2022 at 10:51:49PM -0800, Richard Henderson wrote:

This reverts 829e1376d940 ("tcg/s390: Introduce TCG_REG_TB"), and
several follow-up patches.  The primary motivation is to reduce the
less-tested code paths, pre-z10.  Secondarily, this allows the
unconditional use of TCG_TARGET_HAS_direct_jump, which might be more
important for performance than any slight increase in code size.

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target.h |   2 +-
  tcg/s390x/tcg-target.c.inc | 176 +
  2 files changed, 23 insertions(+), 155 deletions(-)


Reviewed-by: Ilya Leoshkevich 

I have a few questions/ideas for the future below.
  

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 22d70d431b..645f522058 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -103,7 +103,7 @@ extern uint64_t s390_facilities[3];
  #define TCG_TARGET_HAS_mulsh_i32  0
  #define TCG_TARGET_HAS_extrl_i64_i32  0
  #define TCG_TARGET_HAS_extrh_i64_i32  0
-#define TCG_TARGET_HAS_direct_jump    HAVE_FACILITY(GEN_INST_EXT)
+#define TCG_TARGET_HAS_direct_jump    1


This change doesn't seem to affect that, but what is the minimum
supported s390x qemu host? z900?


Possibly z990, if I'm reading the gcc processor_flags_table[] correctly; 
long-displacement-facility is definitely a minimum.


We probably should revisit what the minimum for TCG should be, assert those features at 
startup, and drop the corresponding runtime tests.



I did some benchmarking of various ways to load constants in context of
GCC in the past, and it turned out that LLIHF+OILF is more efficient
than literal pool [1].


Interesting.  If we include extended-immediate-facility (base_GEN9_GA1, z9-109?) in the 
bare minimum that would definitely simplify a few things.
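
As a rough illustration of the LLIHF+OILF sequence mentioned above, the following C sketch models the two instructions as plain functions: LLIHF places a 32-bit immediate in the high half and zeroes the low half, and OILF ORs a 32-bit immediate into the low half. The `demo_*` names are made up for this sketch, and it only models the arithmetic result, not encoding or timing.

```c
#include <assert.h>
#include <stdint.h>

/* Models LLIHF: immediate goes to the high 32 bits, low 32 bits are zero. */
uint64_t demo_llihf(uint32_t imm)
{
    return (uint64_t)imm << 32;
}

/* Models OILF: OR the immediate into the low 32 bits. */
uint64_t demo_oilf(uint64_t reg, uint32_t imm)
{
    return reg | imm;
}

/* Build an arbitrary 64-bit constant from two 32-bit immediates. */
uint64_t demo_load_const(uint64_t val)
{
    uint64_t reg = demo_llihf((uint32_t)(val >> 32));

    /* LLIHF must have cleared the low half before OILF fills it in. */
    assert((reg & 0xffffffffu) == 0);
    return demo_oilf(reg, (uint32_t)val);
}
```

Both steps carry their operand in the instruction stream, so no literal pool entry (and no base register for it) is needed.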



-/* Use the constant pool if USE_REG_TB, but not for small constants.  */
-if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
-if (type == TCG_TYPE_I32) {
-tcg_out_insn(s, RR, XR, dest, TCG_TMP0);
-} else {
-tcg_out_insn(s, RRE, XGR, dest, TCG_TMP0);
-}
-} else if (USE_REG_TB) {
-tcg_out_insn(s, RXY, XG, dest, TCG_REG_TB, TCG_REG_NONE, 0);
-new_pool_label(s, val, R_390_20, s->code_ptr - 2,
-   tcg_tbrel_diff(s, NULL));
+tcg_out_movi(s, type, TCG_TMP0, val);
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RR, XR, dest, TCG_TMP0);
  } else {
-/* Perform the xor by parts.  */
-tcg_debug_assert(HAVE_FACILITY(EXT_IMM));
-if (val & 0xffffffff) {
-tcg_out_insn(s, RIL, XILF, dest, val);
-}
-if (val > 0xffffffff) {
-tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
-}
+tcg_out_insn(s, RRE, XGR, dest, TCG_TMP0);
  }
  }


Wouldn't it be worth keeping XILF/XIHF here?


I don't know.  It's difficult for me to guess whether a dependency chain like

val -> xor -> xor

(3 insns with serial dependencies) is better than

val   --> xor
load  -/

(3 insns, but only one serial dependency).  But there may also be instruction 
fusion going on at the micro-architectural level, so that there's really only one xor.
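
To make the two shapes concrete, here is a small C model of both variants. It only shows that they compute the same value; it says nothing about which is faster. The `demo_*` helpers are hypothetical, and the 32-bit masks are an assumption about how the immediate is split between XILF (low half) and XIHF (high half).

```c
#include <assert.h>
#include <stdint.h>

/* Models XILF: xor a 32-bit immediate into the low half. */
uint64_t demo_xilf(uint64_t reg, uint32_t imm)
{
    return reg ^ imm;
}

/* Models XIHF: xor a 32-bit immediate into the high half. */
uint64_t demo_xihf(uint64_t reg, uint32_t imm)
{
    return reg ^ ((uint64_t)imm << 32);
}

/* Variant 1: val -> xor -> xor, both xors depend serially on dest. */
uint64_t demo_xor_by_parts(uint64_t dest, uint64_t val)
{
    if (val & 0xffffffffu) {
        dest = demo_xilf(dest, (uint32_t)val);
    }
    if (val > 0xffffffffu) {
        dest = demo_xihf(dest, (uint32_t)(val >> 32));
    }
    return dest;
}

/* Variant 2: load the constant into a temporary, then a single xor. */
uint64_t demo_xor_via_tmp(uint64_t dest, uint64_t val)
{
    uint64_t tmp = val;     /* independent of dest, can issue early */
    return dest ^ tmp;
}
```

In variant 2 the load of the constant does not depend on `dest`, which is the "only one serial dependency" case from the diagram above.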


If you have suggestions, I'm all ears.


I don't have any numbers right now, but it looks more compact/efficient
than a load + XGR.


If we assume general-instruction-extension-facility (z10?), LGRL + XGR is smaller than 
XILF + XIHF, ignoring the constant pool entry which might be shared, and modulo the µarch 
questions above.




Same for OGR above; I even wonder if both implementations could be
unified.


Sadly not, because of OILL et al.  There are no 16-bit xor immediate insns.


+/*
+ * branch displacement must be aligned for atomic patching;
+ * see if we need to add extra nop before branch
+ */
+if (!QEMU_PTR_IS_ALIGNED(s->code_ptr + 1, 4)) {
+tcg_out16(s, NOP);
  }
+tcg_out16(s, RIL_BRCL | (S390_CC_ALWAYS << 4));
+s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
+tcg_out32(s, 0);
  set_jmp_reset_offset(s, a0);
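
The padding rule in the hunk above can be modelled with plain byte offsets. This is a sketch under two assumptions that are not spelled out in the quoted code: that `s->code_ptr` counts 16-bit units (so `code_ptr + 1` is two bytes ahead), and that the nop used for padding is two bytes long. The `demo_*` names are made up.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NOP_BYTES    2   /* assumed size of the padding nop */
#define DEMO_BRCL_OPCODE  2   /* opcode + condition mask halfword */

/* Bytes of padding needed so the 32-bit displacement is 4-byte aligned. */
size_t demo_brcl_pad(size_t off)
{
    assert(off % 2 == 0);   /* instructions are halfword aligned */
    return ((off + DEMO_BRCL_OPCODE) % 4 == 0) ? 0 : DEMO_NOP_BYTES;
}

/* Byte offset at which the displacement field ends up. */
size_t demo_brcl_disp_offset(size_t off)
{
    return off + demo_brcl_pad(off) + DEMO_BRCL_OPCODE;
}
```

For any halfword-aligned starting offset the displacement lands on a multiple of four, so it can be rewritten with one aligned 32-bit store; whether that is sufficient for concurrent patching is exactly the question raised below.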


This seems to work in practice, but I don't think patching branch
offsets is allowed by PoP (in a multi-threaded environment). For
example, I had to do [2] in order to work around this limitation in
ftrace.


Really?  How does the processor distinguish between overwriting opcode/condition vs 
overwriting immediate operand when invalidating cached instructions?


If overwriting operand truly isn't correct, then I think we have to use indirect branch 
always for goto_tb.



A third benefit seems to be that we now have one more register to
allocate.


Yes.  It's call clobbered, so it isn't live so often, but sometimes.


r~



[PATCH v3] intel-iommu: Document iova_tree

2022-12-06 Thread Peter Xu
It seems not super clear on when iova_tree is used, and why.  Add a rich
comment above iova_tree to track why we needed the iova_tree, and when we
need it.

Also comment for the map/unmap messages, on how they're used and
implications (e.g. unmap can be larger than the mapped ranges).

Suggested-by: Jason Wang 
Signed-off-by: Peter Xu 
---
v3:
- Adjust according to Eric's comment
---
 include/exec/memory.h | 28 ++
 include/hw/i386/intel_iommu.h | 38 ++-
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..269ecb873b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,34 @@ struct IOMMUTLBEntry {
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
+ *
+ * Normally there're two use cases for the notifiers:
+ *
+ *   (1) When the device needs accurate synchronizations of the vIOMMU page
+ *   tables, it needs to register with both MAP|UNMAP notifies (which
+ *   is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).
+ *
+ *   Regarding to accurate synchronization, it's when the notified
+ *   device maintains a shadow page table and must be notified on each
+ *   guest MAP (page table entry creation) and UNMAP (invalidation)
+ *   events (e.g. VFIO). Both notifications must be accurate so that
+ *   the shadow page table is fully in sync with the guest view.
+ *
+ *   (2) When the device doesn't need accurate synchronizations of the
+ *   vIOMMU page tables, it needs to register only with UNMAP or
+ *   DEVIOTLB_UNMAP notifies.
+ *
+ *   It's when the device maintains a cache of IOMMU translations
+ *   (IOTLB) and is able to fill that cache by requesting translations
+ *   from the vIOMMU through a protocol similar to ATS (Address
+ *   Translation Service).
+ *
+ *   Note that in this mode the vIOMMU will not maintain a shadowed
+ *   page table for the address space, and the UNMAP messages can be
+ *   actually larger than the real invalidations (just like how the
+ *   Linux IOMMU driver normally works, where an invalidation can be
+ *   enlarged as long as it still covers the target range).  The IOMMU
+ *   notifiee should be able to take care of over-sized invalidations.
  */
 typedef enum {
 IOMMU_NOTIFIER_NONE = 0,
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 46d973e629..89dcbc5e1e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -109,7 +109,43 @@ struct VTDAddressSpace {
 QLIST_ENTRY(VTDAddressSpace) next;
 /* Superset of notifier flags that this address space has */
 IOMMUNotifierFlag notifier_flags;
-IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
+/*
+ * @iova_tree traces mapped IOVA ranges.
+ *
+ * The tree is not needed if no MAP notifier is registered with current
+ * VTD address space, because all guest invalidate commands can be
+ * directly passed to the IOMMU UNMAP notifiers without any further
+ * reshuffling.
+ *
+ * The tree OTOH is required for MAP typed iommu notifiers for a few
+ * reasons.
+ *
+ * Firstly, there's no way to identify whether an PSI (Page Selective
+ * Invalidations) or DSI (Domain Selective Invalidations) event is an
+ * MAP or UNMAP event within the message itself.  Without having prior
+ * knowledge of existing state vIOMMU doesn't know whether it should
+ * notify MAP or UNMAP for a PSI message it received when caching mode
+ * is enabled (for MAP notifiers).
+ *
+ * Secondly, PSI messages received from guest driver can be enlarged in
+ * range, covers but not limited to what the guest driver wanted to
+ * invalidate.  When the range to invalidates gets bigger than the
+ * limit of a PSI message, it can even become a DSI which will
+ * invalidate the whole domain.  If the vIOMMU directly notifies the
+ * registered device with the unmodified range, it may confuse the
+ * registered drivers (e.g. vfio-pci) on either:
+ *
+ *   (1) Trying to map the same region more than once (for
+ *   VFIO_IOMMU_MAP_DMA, -EEXIST will trigger), or,
+ *
+ *   (2) Trying to UNMAP a range that is still partially mapped.
+ *
+ * That accuracy is not required for UNMAP-only notifiers, but it is a
+ * must-to-have for notifiers registered with MAP events, because the
+ * vIOMMU needs to make sure the shadow page table is always in sync
+ * with the guest IOMMU pgtables for a device.
+ */
+IOVATree *iova_tree;
 };
 
 struct VTDIOTLBEntry {
-- 
2.37.3
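
A toy model may help to see why prior state is needed for MAP notifiers. The sketch below is not the IOVATree code: it tracks a fixed number of pages in a boolean array, and `DemoShadow` and `demo_sync_range()` are hypothetical names. It only shows the idea from the comment above: an invalidation range is compared against what has already been notified, so that each page yields at most one MAP or UNMAP event, however large or repeated the invalidation is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGES 16

/* Hypothetical record of what has already been notified as mapped. */
typedef struct {
    bool mapped[DEMO_PAGES];
    int map_events;     /* MAP notifications sent so far */
    int unmap_events;   /* UNMAP notifications sent so far */
} DemoShadow;

/*
 * Handle an invalidation covering pages [first, first + npages).
 * guest_present[i] says whether the guest page table maps page i now.
 * Only pages whose state differs from the shadow produce an event.
 */
void demo_sync_range(DemoShadow *s, const bool *guest_present,
                     size_t first, size_t npages)
{
    assert(first <= DEMO_PAGES && npages <= DEMO_PAGES - first);
    for (size_t i = first; i < first + npages; i++) {
        if (guest_present[i] && !s->mapped[i]) {
            s->mapped[i] = true;
            s->map_events++;
        } else if (!guest_present[i] && s->mapped[i]) {
            s->mapped[i] = false;
            s->unmap_events++;
        }
    }
}
```

Replaying the same range twice produces no second MAP for a page (the -EEXIST case), and an enlarged range never unmaps a page that the guest still has mapped.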




Re: [PATCH v2] intel-iommu: Document iova_tree

2022-12-06 Thread Peter Xu
On Tue, Dec 06, 2022 at 05:10:39PM -0500, Peter Xu wrote:
> It seems not super clear on when iova_tree is used, and why.  Add a rich
> comment above iova_tree to track why we needed the iova_tree, and when we
> need it.
> 
> Also comment for the map/unmap messages, on how they're used and
> implications (e.g. unmap can be larger than the mapped ranges).
> 
> Suggested-by: Jason Wang 
> Signed-off-by: Peter Xu 
> ---
> v2:
> - Adjust according to Eric's comment

This is the old version.. sorry, will repost.

-- 
Peter Xu




[PATCH v2] intel-iommu: Document iova_tree

2022-12-06 Thread Peter Xu
It seems not super clear on when iova_tree is used, and why.  Add a rich
comment above iova_tree to track why we needed the iova_tree, and when we
need it.

Also comment for the map/unmap messages, on how they're used and
implications (e.g. unmap can be larger than the mapped ranges).

Suggested-by: Jason Wang 
Signed-off-by: Peter Xu 
---
v2:
- Adjust according to Eric's comment
---
 include/exec/memory.h | 20 ++
 include/hw/i386/intel_iommu.h | 38 ++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..a8489feb51 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,26 @@ struct IOMMUTLBEntry {
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
+ *
+ * Normally there're two use cases for the notifiers:
+ *
+ *   (1) When the device needs accurate synchronizations of the vIOMMU page
+ *   tables, it needs to register with both MAP|UNMAP notifies (which
+ *   is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
+ *   events are registered, the notifications will be accurate but
+ *   there's overhead on synchronizing the guest vIOMMU page tables.
+ *
+ *   (2) When the device doesn't need accurate synchronizations of the
+ *   vIOMMU page tables (when the device can both cache translations
+ *   and requesting to translate dynamically during DMA process), it
+ *   needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
+ *   Note that in this working mode the vIOMMU will not maintain a
+ *   shadowed page table for the address space, and the UNMAP messages
+ *   can be actually larger than the real invalidations (just like how
+ *   the Linux IOMMU driver normally works, where an invalidation can
+ *   be enlarged as long as it still covers the target range).  The
+ *   IOMMU notifiee should be able to take care of over-sized
+ *   invalidations.
  */
 typedef enum {
 IOMMU_NOTIFIER_NONE = 0,
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 46d973e629..89dcbc5e1e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -109,7 +109,43 @@ struct VTDAddressSpace {
 QLIST_ENTRY(VTDAddressSpace) next;
 /* Superset of notifier flags that this address space has */
 IOMMUNotifierFlag notifier_flags;
-IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
+/*
+ * @iova_tree traces mapped IOVA ranges.
+ *
+ * The tree is not needed if no MAP notifier is registered with current
+ * VTD address space, because all guest invalidate commands can be
+ * directly passed to the IOMMU UNMAP notifiers without any further
+ * reshuffling.
+ *
+ * The tree OTOH is required for MAP typed iommu notifiers for a few
+ * reasons.
+ *
+ * Firstly, there's no way to identify whether an PSI (Page Selective
+ * Invalidations) or DSI (Domain Selective Invalidations) event is an
+ * MAP or UNMAP event within the message itself.  Without having prior
+ * knowledge of existing state vIOMMU doesn't know whether it should
+ * notify MAP or UNMAP for a PSI message it received when caching mode
+ * is enabled (for MAP notifiers).
+ *
+ * Secondly, PSI messages received from guest driver can be enlarged in
+ * range, covers but not limited to what the guest driver wanted to
+ * invalidate.  When the range to invalidates gets bigger than the
+ * limit of a PSI message, it can even become a DSI which will
+ * invalidate the whole domain.  If the vIOMMU directly notifies the
+ * registered device with the unmodified range, it may confuse the
+ * registered drivers (e.g. vfio-pci) on either:
+ *
+ *   (1) Trying to map the same region more than once (for
+ *   VFIO_IOMMU_MAP_DMA, -EEXIST will trigger), or,
+ *
+ *   (2) Trying to UNMAP a range that is still partially mapped.
+ *
+ * That accuracy is not required for UNMAP-only notifiers, but it is a
+ * must-to-have for notifiers registered with MAP events, because the
+ * vIOMMU needs to make sure the shadow page table is always in sync
+ * with the guest IOMMU pgtables for a device.
+ */
+IOVATree *iova_tree;
 };
 
 struct VTDIOTLBEntry {
-- 
2.37.3




Re: [PATCH] intel-iommu: Document iova_tree

2022-12-06 Thread Peter Xu
On Tue, Dec 06, 2022 at 05:28:01PM +0100, Eric Auger wrote:
> 
> 
> On 12/6/22 17:05, Peter Xu wrote:
> > On Tue, Dec 06, 2022 at 02:16:32PM +0100, Eric Auger wrote:
> >> Hi Peter,
> >> On 12/6/22 00:28, Peter Xu wrote:
> >>> On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
>  On Fri, Dec 2, 2022 at 12:25 AM Peter Xu  wrote:
> > It seems not super clear on when iova_tree is used, and why.  Add a rich
> > comment above iova_tree to track why we needed the iova_tree, and when 
> > we
> > need it.
> >
> > Suggested-by: Jason Wang 
> > Signed-off-by: Peter Xu 
> > ---
> >  include/hw/i386/intel_iommu.h | 30 +-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/hw/i386/intel_iommu.h 
> > b/include/hw/i386/intel_iommu.h
> > index 46d973e629..8d130ab2e3 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> >  QLIST_ENTRY(VTDAddressSpace) next;
> >  /* Superset of notifier flags that this address space has */
> >  IOMMUNotifierFlag notifier_flags;
> > -IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
> > +/*
> > + * @iova_tree traces mapped IOVA ranges.
> > + *
> > + * The tree is not needed if no MAP notifiers is registered with
> > + * current VTD address space, because all UNMAP (including iotlb or
> > + * dev-iotlb) events can be transparently delivered to !MAP iommu
> > + * notifiers.
>  So this means the UNMAP notifier doesn't need to be as accurate as
>  MAP. (Should we document it in the notifier headers)?
> >>> Yes.
> >>>
>  For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
> >>> IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
> >>> b+Y] as long as the range covers [a, b]?
> >>>
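
For an UNMAP-only notifiee, the "covers [a, b]" condition and the handling of an over-sized range might look like the sketch below. It assumes a tiny translation cache keyed by IOVA with inclusive ranges; `DemoIotlbEntry`, `demo_range_covers()` and `demo_cache_unmap()` are hypothetical names and do not correspond to QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached translation, identified only by its IOVA. */
typedef struct {
    uint64_t iova;
    bool valid;
} DemoIotlbEntry;

/* True if the inclusive range [start, end] covers [a, b]. */
bool demo_range_covers(uint64_t start, uint64_t end, uint64_t a, uint64_t b)
{
    assert(start <= end && a <= b);
    return start <= a && b <= end;
}

/*
 * Drop every cached entry inside [start, end] and return how many were
 * dropped.  Addresses in the range that were never cached are ignored,
 * which is what makes an over-sized UNMAP harmless here.
 */
size_t demo_cache_unmap(DemoIotlbEntry *cache, size_t n,
                        uint64_t start, uint64_t end)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (cache[i].valid && cache[i].iova >= start && cache[i].iova <= end) {
            cache[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

Dropping too much only costs a later re-translation, so the notifiee stays correct as long as the notified range covers the real invalidation.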
> > + *
> > + * The tree OTOH is required for MAP typed iommu notifiers for a 
> > few
> > + * reasons.
> > + *
> > + * Firstly, there's no way to identify whether an PSI event is MAP 
> > or
> > + * UNMAP within the PSI message itself.  Without having prior 
> > knowledge
> > + * of existing state vIOMMU doesn't know whether it should notify 
> > MAP
> > + * or UNMAP for a PSI message it received.
> > + *
> > + * Secondly, PSI received from guest driver (or even a large PSI 
> > can
> > + * grow into a DSI at least with Linux intel-iommu driver) can be
> > + * larger in range than the newly mapped ranges for either MAP or 
> > UNMAP
> > + * events.
>  Yes, so I think we need a document that the UNMAP handler should be
>  prepared for this.
> >>> How about I squash below into this same patch?
> >>>
> >>> diff --git a/include/exec/memory.h b/include/exec/memory.h
> >>> index 91f8a2395a..c83bd11a68 100644
> >>> --- a/include/exec/memory.h
> >>> +++ b/include/exec/memory.h
> >>> @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
> >>>  /*
> >>>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
> >>>   * register with one or multiple IOMMU Notifier capability bit(s).
> >>> + *
> >>> + * Normally there're two use cases for the notifiers:
> >>> + *
> >>> + *   (1) When the device needs accurate synchronizations of the vIOMMU 
> >>> page
> >> accurate synchronizations sound too vague & subjective to me.
> > Suggestions?
> Well I would say:
> when the notified device maintains a shadow page table and must to be

s/to//

> notified on each guest MAP (page table entry creation) and UNMAP
> (invalidation) events (VFIO). Both notifications must be accurate so
> that the shadow page table is fully in sync with the guest view.

Thanks, I'll try to squash this into the new version.

> >
> >>> + *   tables, it needs to register with both MAP|UNMAP notifies (which
> >>> + *   is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as 
> >>> MAP
> >>> + *   events are registered, the notifications will be accurate but
> >>> + *   there's overhead on synchronizing the guest vIOMMU page tables.
> >>> + *
> >>> + *   (2) When the device doesn't need accurate synchronizations of the
> >>> + *   vIOMMU page tables (when the device can both cache translations
> >>> + *   and requesting to translate dynamically during DMA process), it
> when the notified device maintains a cache of IOMMU translations (IOTLB)
> and is able to fill that cache by requesting translations from the
> vIOMMU through a protocol similar to ATS. In that case the notified
> device only needs to register an UNMAP notifier. In that case the unmap
> notifications are allowed to be wider than strictly necessary.

Same here.

> 
> However the problem is since you need to satisfy the VFIO use case, how
> do you detect when you are allowed to invalidate 

Re: [PATCH v3 13/13] tcg/s390x: Implement ctpop operation

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:52:00PM -0800, Richard Henderson wrote:
> There is an older form that produces per-byte results,
> and a newer form that produces per-register results,
> and a vector form that produces per-element results.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target.h |  5 ++--
>  tcg/s390x/tcg-target.c.inc | 51 ++
>  2 files changed, 54 insertions(+), 2 deletions(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v12 1/7] s390x/cpu topology: Creating CPU topology device

2022-12-06 Thread Janis Schoetterl-Glausch
On Tue, 2022-12-06 at 15:35 +0100, Pierre Morel wrote:
> 
> On 12/6/22 14:35, Janis Schoetterl-Glausch wrote:
> > On Tue, 2022-12-06 at 11:32 +0100, Pierre Morel wrote:
> > > 
> > > On 12/6/22 10:31, Janis Schoetterl-Glausch wrote:
> > > > On Tue, 2022-11-29 at 18:42 +0100, Pierre Morel wrote:
> > > > > We will need a Topology device to transfer the topology
> > > > > during migration and to implement machine reset.
> > > > > 
> > > > > The device creation is fenced by s390_has_topology().
> > > > > 
> > > > > Signed-off-by: Pierre Morel 
> > > > > ---
> > > > >include/hw/s390x/cpu-topology.h| 44 +++
> > > > >include/hw/s390x/s390-virtio-ccw.h |  1 +
> > > > >hw/s390x/cpu-topology.c| 87 
> > > > > ++
> > > > >hw/s390x/s390-virtio-ccw.c | 25 +
> > > > >hw/s390x/meson.build   |  1 +
> > > > >5 files changed, 158 insertions(+)
> > > > >create mode 100644 include/hw/s390x/cpu-topology.h
> > > > >create mode 100644 hw/s390x/cpu-topology.c
> > > > 
[...]
> > > > >
> > > > > +static DeviceState *s390_init_topology(MachineState *machine, Error 
> > > > > **errp)
> > > > > +{
> > > > > +DeviceState *dev;
> > > > > +
> > > > > +dev = qdev_new(TYPE_S390_CPU_TOPOLOGY);
> > > > > +
> > > > > > +object_property_add_child(&machine->parent_obj,
> > > > > +  TYPE_S390_CPU_TOPOLOGY, OBJECT(dev));
> > > > 
> > > > Why set this property, and why on the machine parent?
> > > 
> > > From what I understood setting the num_cores and num_sockets as
> > > properties of the CPU Topology object allows to have them better
> > > integrated in the QEMU object framework.
> > 
> > That I understand.
> > > 
> > > The topology is added to the S390CcwMachineState, it is the parent of
> > > the machine.
> > 
> > But why? And is it added to the S390CcwMachineState, or its parent?
> 
> it is added to the S390CcwMachineState.
> We receive the MachineState as the "machine" parameter here and it is 
> added to the "machine->parent_obj" which is the S390CcwMachineState.

Oh, I was confused. &machine->parent_obj is just a cast of MachineState* to 
Object*.
It's the very same object.
And what is the reason to add the topology as child property?
Just so it shows up in the qtree? Wouldn't it anyway under the sysbus?
> 
> 
> 
> > > 
> > > 
> > > > 
> > > > > +object_property_set_int(OBJECT(dev), "num-cores",
> > > > > +machine->smp.cores * 
> > > > > machine->smp.threads, errp);
> > > > > +object_property_set_int(OBJECT(dev), "num-sockets",
> > > > > +machine->smp.sockets, errp);
> > > > > +
> > > > > +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);
> > > > 
> > > > I must admit that I haven't fully grokked qemu's memory management yet.
> > > > Is the topology device now owned by the sysbus?
> > > 
> > > Yes it is so we see it on the qtree with its properties.
> > > 
> > > 
> > > > If so, is it fine to have a pointer to it in S390CcwMachineState?
> > > 
> > > Why not?
> > 
> > If it's owned by the sysbus and the object is not explicitly referenced
> > for the pointer, it might be deallocated and then you'd have a dangling 
> > pointer.
> 
> Why would it be deallocated ?

That's beside the point, if you transfer ownership, you have no control over
when the deallocation happens.
It's going to be fine in practice, but I don't think you should rely on it.
I think you could just do sysbus_realize instead of ..._and_unref,
but like I said, I haven't fully understood qemu memory management.
(It would also leak in a sense, but since the machine exists forever that 
should be fine)
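
The ownership concern can be shown with a toy reference-counting model in
plain C. Everything here (ToyObj, toy_ref, toy_unref, toy_bus_attach, ...)
is hypothetical and does not use QEMU's QOM API; it only models the idea
that a realize-and-unref style call hands the creator's reference to the
bus, so a raw pointer kept elsewhere is valid only as long as the bus keeps
its reference.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyObj {
    int refcount;
    bool freed;   /* set instead of calling free(), so callers can inspect it */
} ToyObj;

static void toy_ref(ToyObj *o)
{
    assert(!o->freed);
    o->refcount++;
}

static void toy_unref(ToyObj *o)
{
    assert(!o->freed && o->refcount > 0);
    if (--o->refcount == 0) {
        o->freed = true;   /* a real implementation would free here */
    }
}

/* The creator starts with one reference. */
static void toy_new(ToyObj *o)
{
    o->refcount = 1;
    o->freed = false;
}

/* The bus takes its own reference when the device is attached. */
static void toy_bus_attach(ToyObj *o)
{
    toy_ref(o);
}

/* "Realize and unref": attach, then drop the creator's reference. */
static void toy_attach_and_unref(ToyObj *o)
{
    toy_bus_attach(o);
    toy_unref(o);
}

/* The bus drops its reference, e.g. when the device is removed. */
static void toy_bus_detach(ToyObj *o)
{
    toy_unref(o);
}
```

In the first pattern a pointer stored by the creator dangles once the bus
detaches; with a plain attach it stays valid, at the cost of a reference
that is never dropped.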

> as long as it is not unrealized it belongs to the sysbus, doesn't it?
> 
> Regards,
> Pierre
> 




Re: [PATCH v3 12/13] tcg/s390x: Use tgen_movcond_int in tgen_clz

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:59PM -0800, Richard Henderson wrote:
> Reuse code from movcond to conditionally copy a2 to dest,
> based on the condition codes produced by FLOGR.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target-con-set.h |  1 +
>  tcg/s390x/tcg-target.c.inc | 26 +++---
>  2 files changed, 12 insertions(+), 15 deletions(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v3 11/13] tcg/s390x: Support SELGR instruction in movcond

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:58PM -0800, Richard Henderson wrote:
> The new select instruction provides two separate register inputs,
> whereas the old load-on-condition instruction overlaps one of the
> register inputs with the destination.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target.c.inc | 21 +
>  1 file changed, 21 insertions(+)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v3 10/13] tcg/s390x: Generalize movcond implementation

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:57PM -0800, Richard Henderson wrote:
> Generalize movcond to support pre-computed conditions, and the same
> set of arguments at all times.  This will be assumed by a following
> patch, which needs to reuse tgen_movcond_int.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target-con-set.h |  3 +-
>  tcg/s390x/tcg-target.c.inc | 78 ++
>  2 files changed, 61 insertions(+), 20 deletions(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v3 06/13] tcg/s390x: Support MIE2 multiply single instructions

2022-12-06 Thread Richard Henderson
On Tue, 6 Dec 2022, 14:02 Ilya Leoshkevich,  wrote:

> On Thu, Dec 01, 2022 at 10:51:53PM -0800, Richard Henderson wrote:
> > The MIE2 facility adds 3-operand versions of multiply.
> >
> > Signed-off-by: Richard Henderson 
> > ---
> >  tcg/s390x/tcg-target-con-set.h |  1 +
> >  tcg/s390x/tcg-target.h |  1 +
> >  tcg/s390x/tcg-target.c.inc | 34 --
> >  3 files changed, 26 insertions(+), 10 deletions(-)
>
> Reviewed-by: Ilya Leoshkevich 
>
> I have one small suggestion, see below.
>
> > diff --git a/tcg/s390x/tcg-target-con-set.h
> b/tcg/s390x/tcg-target-con-set.h
> > index 00ba727b70..33a82e3286 100644
> > --- a/tcg/s390x/tcg-target-con-set.h
> > +++ b/tcg/s390x/tcg-target-con-set.h
> > @@ -23,6 +23,7 @@ C_O1_I2(r, 0, ri)
> >  C_O1_I2(r, 0, rI)
> >  C_O1_I2(r, 0, rJ)
> >  C_O1_I2(r, r, ri)
> > +C_O1_I2(r, r, rJ)
> >  C_O1_I2(r, rZ, r)
> >  C_O1_I2(v, v, r)
> >  C_O1_I2(v, v, v)
> > diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> > index 645f522058..bfd623a639 100644
> > --- a/tcg/s390x/tcg-target.h
> > +++ b/tcg/s390x/tcg-target.h
> > @@ -63,6 +63,7 @@ typedef enum TCGReg {
> >  #define FACILITY_FAST_BCR_SER FACILITY_LOAD_ON_COND
> >  #define FACILITY_DISTINCT_OPS FACILITY_LOAD_ON_COND
> > >  #define FACILITY_LOAD_ON_COND2        53
> > +#define FACILITY_MISC_INSN_EXT2   58
> >  #define FACILITY_VECTOR   129
> >  #define FACILITY_VECTOR_ENH1  135
> >
> > diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> > index d02b433271..cd39b2a208 100644
> > --- a/tcg/s390x/tcg-target.c.inc
> > +++ b/tcg/s390x/tcg-target.c.inc
> > @@ -180,6 +180,8 @@ typedef enum S390Opcode {
> >  RRE_SLBGR   = 0xb989,
> >  RRE_XGR = 0xb982,
> >
> > +RRFa_MSRKC  = 0xb9fd,
> > +RRFa_MSGRKC = 0xb9ed,
> >  RRFa_NRK= 0xb9f4,
> >  RRFa_NGRK   = 0xb9e4,
> >  RRFa_ORK= 0xb9f6,
> > @@ -2140,14 +2142,18 @@ static inline void tcg_out_op(TCGContext *s,
> TCGOpcode opc,
> >  break;
> >
> >  case INDEX_op_mul_i32:
> > +a0 = args[0], a1 = args[1], a2 = (int32_t)args[2];
> >  if (const_args[2]) {
> > -if ((int32_t)args[2] == (int16_t)args[2]) {
> > -tcg_out_insn(s, RI, MHI, args[0], args[2]);
> > +tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
>
> Should we consider a0 == a1 case here as well, in order to get rid of
> this extra move when possible?
>

tcg_out_mov already does that.
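
As an illustration only, the pattern referred to here, a move helper that
emits nothing when source and destination coincide, can be sketched in
standalone C. ToyCtx, toy_out_mov and toy_mul_imm are hypothetical names
for this sketch and do not reproduce the real tcg_out_mov.

```c
#include <assert.h>

typedef struct ToyCtx {
    int insns_emitted;   /* counts "instructions" instead of encoding them */
} ToyCtx;

/* Emit a register move, unless it would be a no-op. */
static void toy_out_mov(ToyCtx *s, int dst, int src)
{
    if (dst == src) {
        return;          /* nothing to emit */
    }
    s->insns_emitted++;
}

/* Two-operand multiply by immediate: copy a1 into a0 first,
 * then multiply in place. */
static void toy_mul_imm(ToyCtx *s, int a0, int a1)
{
    toy_out_mov(s, a0, a1);   /* costs nothing when a0 == a1 */
    s->insns_emitted++;       /* the multiply itself */
}
```

With the check inside the helper, callers do not need their own a0 == a1
special case.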


r~


Re: [PATCH for-8.0] hw/rtc/mc146818rtc: Make this rtc device target independent

2022-12-06 Thread Philippe Mathieu-Daudé

On 6/12/22 21:06, Thomas Huth wrote:

The only code that is really, really target dependent is the apic-related
code in rtc_policy_slew_deliver_irq(). By moving this code into the hw/i386/
folder (renamed to rtc_apic_policy_slew_deliver_irq()) and passing this
function as parameter to mc146818_rtc_init(), we can make the RTC completely
target-independent.

Signed-off-by: Thomas Huth 
---
  include/hw/rtc/mc146818rtc.h |  7 +--
  hw/alpha/dp264.c |  2 +-
  hw/hppa/machine.c|  2 +-
  hw/i386/microvm.c|  3 ++-
  hw/i386/pc.c | 10 +-
  hw/mips/jazz.c   |  2 +-
  hw/ppc/pnv.c |  2 +-
  hw/rtc/mc146818rtc.c | 34 +++---
  hw/rtc/meson.build   |  3 +--
  9 files changed, 32 insertions(+), 33 deletions(-)


Cool!

Reviewed-by: Philippe Mathieu-Daudé 
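
A rough standalone sketch of the approach described in the commit message:
the target-specific delivery routine is handed to the init function as a
function pointer, so the generic code never references target symbols. All
names (ToyRTC, toy_rtc_init, toy_apic_deliver, ...) are hypothetical, and
the behaviour chosen for a NULL callback is an assumption of this sketch,
not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyRTC ToyRTC;
typedef bool (*ToyDeliverFn)(ToyRTC *s);

struct ToyRTC {
    int irq_raised;
    ToyDeliverFn deliver;   /* may be NULL on targets without special handling */
};

static void toy_rtc_init(ToyRTC *s, ToyDeliverFn deliver)
{
    s->irq_raised = 0;
    s->deliver = deliver;
}

/* Generic code: no target-specific symbols are referenced here. */
static bool toy_rtc_deliver_irq(ToyRTC *s)
{
    if (s->deliver) {
        return s->deliver(s);
    }
    s->irq_raised++;        /* assumed default: raise and report delivered */
    return true;
}

/* Stand-in for a target-specific hook that lives in target code. */
static bool toy_apic_deliver(ToyRTC *s)
{
    s->irq_raised++;
    return false;           /* pretend the interrupt was coalesced */
}
```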




Re: [PATCH v3 09/13] tcg/s390x: Create tgen_cmp2 to simplify movcond

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:56PM -0800, Richard Henderson wrote:
> Return both regular and inverted condition codes from tgen_cmp2.
> This lets us choose after the fact which comparison we want.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target.c.inc | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH for-8.0] ui/vnc: fix bad address parsing

2022-12-06 Thread Philippe Mathieu-Daudé

On 6/12/22 20:23, Vladimir Sementsov-Ogievskiy wrote:

If addrstr == "[" and websocket is true, hostlen becomes 0 and we try
to access addrstr[hostlen-1], which is a bad idea.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  ui/vnc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 88f55cbf3c..8830bfe382 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3765,7 +3765,7 @@ static int vnc_display_get_address(const char *addrstr,
  
  addr->type = SOCKET_ADDRESS_TYPE_INET;

  inet = &addr->u.inet;
-if (addrstr[0] == '[' && addrstr[hostlen - 1] == ']') {
+if (hostlen >= 2 && addrstr[0] == '[' && addrstr[hostlen - 1] == ']') {
  inet->host = g_strndup(addrstr + 1, hostlen - 2);
  } else {
  inet->host = g_strndup(addrstr, hostlen);


If addrstr is "[" then inet->host ends up being "[" too now, right?

I was pretty sure we had a helper for that, but can't find any.
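
To check the behaviour being asked about, the bracket-stripping logic with
the new guard can be modelled in standalone C. toy_parse_host is a
hypothetical helper written for this sketch; it uses malloc/memcpy instead
of g_strndup and is not the code in ui/vnc.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the host part of addrstr[0..hostlen), stripping one pair of
 * surrounding brackets if present. The caller frees the result. */
static char *toy_parse_host(const char *addrstr, size_t hostlen)
{
    const char *start = addrstr;
    size_t len = hostlen;

    /* hostlen >= 2 keeps addrstr[hostlen - 1] in bounds and ensures
     * that '[' and ']' are two distinct characters. */
    if (hostlen >= 2 && addrstr[0] == '[' && addrstr[hostlen - 1] == ']') {
        start = addrstr + 1;
        len = hostlen - 2;
    }

    char *host = malloc(len + 1);
    assert(host != NULL);
    memcpy(host, start, len);
    host[len] = '\0';
    return host;
}
```

In this model a one-character host "[" fails the guard and is copied
unchanged, and a zero-length host yields an empty string, so neither case
reads out of bounds.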



Re: [PATCH v3 08/13] tcg/s390x: Support MIE3 logical operations

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:55PM -0800, Richard Henderson wrote:
> This is andc, orc, nand, nor, eqv.
> We can use nor for implementing not.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target-con-set.h |   1 +
>  tcg/s390x/tcg-target.h |  25 +
>  tcg/s390x/tcg-target.c.inc | 100 +
>  3 files changed, 114 insertions(+), 12 deletions(-)

Reviewed-by: Ilya Leoshkevich 



[PATCH for-8.0] hw/rtc/mc146818rtc: Make this rtc device target independent

2022-12-06 Thread Thomas Huth
The only code that is really, really target dependent is the apic-related
code in rtc_policy_slew_deliver_irq(). By moving this code into the hw/i386/
folder (renamed to rtc_apic_policy_slew_deliver_irq()) and passing this
function as parameter to mc146818_rtc_init(), we can make the RTC completely
target-independent.

Signed-off-by: Thomas Huth 
---
 include/hw/rtc/mc146818rtc.h |  7 +--
 hw/alpha/dp264.c |  2 +-
 hw/hppa/machine.c|  2 +-
 hw/i386/microvm.c|  3 ++-
 hw/i386/pc.c | 10 +-
 hw/mips/jazz.c   |  2 +-
 hw/ppc/pnv.c |  2 +-
 hw/rtc/mc146818rtc.c | 34 +++---
 hw/rtc/meson.build   |  3 +--
 9 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 1db0fcee92..c687953cc4 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -46,14 +46,17 @@ struct RTCState {
 Notifier clock_reset_notifier;
 LostTickPolicy lost_tick_policy;
 Notifier suspend_notifier;
+bool (*policy_slew_deliver_irq)(RTCState *s);
 QLIST_ENTRY(RTCState) link;
 };
 
 #define RTC_ISA_IRQ 8
 
-ISADevice *mc146818_rtc_init(ISABus *bus, int base_year,
- qemu_irq intercept_irq);
+ISADevice *mc146818_rtc_init(ISABus *bus, int base_year, qemu_irq 
intercept_irq,
+ bool (*policy_slew_deliver_irq)(RTCState *s));
 void rtc_set_memory(ISADevice *dev, int addr, int val);
 int rtc_get_memory(ISADevice *dev, int addr);
+bool rtc_apic_policy_slew_deliver_irq(RTCState *s);
+void qmp_rtc_reset_reinjection(Error **errp);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index c502c8c62a..8723942b52 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -118,7 +118,7 @@ static void clipper_init(MachineState *machine)
 qdev_connect_gpio_out(i82378_dev, 0, isa_irq);
 
 /* Since we have an SRM-compatible PALcode, use the SRM epoch.  */
-mc146818_rtc_init(isa_bus, 1900, rtc_irq);
+mc146818_rtc_init(isa_bus, 1900, rtc_irq, NULL);
 
 /* VGA setup.  Don't bother loading the bios.  */
 pci_vga_init(pci_bus);
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index de1cc7ab71..311031714a 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -232,7 +232,7 @@ static void machine_hppa_init(MachineState *machine)
 assert(isa_bus);
 
 /* Realtime clock, used by firmware for PDC_TOD call. */
-mc146818_rtc_init(isa_bus, 2000, NULL);
+mc146818_rtc_init(isa_bus, 2000, NULL, NULL);
 
 /* Serial ports: Lasi and Dino use a 7.272727 MHz clock. */
 serial_mm_init(addr_space, LASI_UART_HPA + 0x800, 0,
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 170a331e3f..d0ed4dca50 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -267,7 +267,8 @@ static void microvm_devices_init(MicrovmMachineState *mms)
 
 if (mms->rtc == ON_OFF_AUTO_ON ||
 (mms->rtc == ON_OFF_AUTO_AUTO && !kvm_enabled())) {
-rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
+rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL,
+  rtc_apic_policy_slew_deliver_irq);
 microvm_set_rtc(mms, rtc_state);
 }
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 546b703cb4..650e7bc199 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1244,6 +1244,13 @@ static void pc_superio_init(ISABus *isa_bus, bool 
create_fdctrl,
 g_free(a20_line);
 }
 
+bool rtc_apic_policy_slew_deliver_irq(RTCState *s)
+{
+apic_reset_irq_delivered();
+qemu_irq_raise(s->irq);
+return apic_get_irq_delivered();
+}
+
 void pc_basic_device_init(struct PCMachineState *pcms,
   ISABus *isa_bus, qemu_irq *gsi,
   ISADevice **rtc_state,
@@ -1299,7 +1306,8 @@ void pc_basic_device_init(struct PCMachineState *pcms,
 pit_alt_irq = qdev_get_gpio_in(hpet, HPET_LEGACY_PIT_INT);
 rtc_irq = qdev_get_gpio_in(hpet, HPET_LEGACY_RTC_INT);
 }
-*rtc_state = mc146818_rtc_init(isa_bus, 2000, rtc_irq);
+*rtc_state = mc146818_rtc_init(isa_bus, 2000, rtc_irq,
+   rtc_apic_policy_slew_deliver_irq);
 
 qemu_register_boot_set(pc_boot_set, *rtc_state);
 
diff --git a/hw/mips/jazz.c b/hw/mips/jazz.c
index 6aefe9a61b..50fbd57b23 100644
--- a/hw/mips/jazz.c
+++ b/hw/mips/jazz.c
@@ -356,7 +356,7 @@ static void mips_jazz_init(MachineState *machine,
 fdctrl_init_sysbus(qdev_get_gpio_in(rc4030, 1), 0x80003000, fds);
 
 /* Real time clock */
-mc146818_rtc_init(isa_bus, 1980, NULL);
+mc146818_rtc_init(isa_bus, 1980, NULL, NULL);
 memory_region_init_io(rtc, NULL, &rtc_ops, NULL, "rtc", 0x1000);
 memory_region_add_subregion(address_space, 0x80004000, rtc);
 
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 3d01e26f84..c5482554b7 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c

Re: [PATCH v3 07/13] tcg/s390x: Support MIE2 MGRK instruction

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:54PM -0800, Richard Henderson wrote:
> The MIE2 facility adds a 3-operand signed 64x64->128 multiply.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target-con-set.h | 1 +
>  tcg/s390x/tcg-target.h | 2 +-
>  tcg/s390x/tcg-target.c.inc | 8 
>  3 files changed, 10 insertions(+), 1 deletion(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v3 06/13] tcg/s390x: Support MIE2 multiply single instructions

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:53PM -0800, Richard Henderson wrote:
> The MIE2 facility adds 3-operand versions of multiply.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target-con-set.h |  1 +
>  tcg/s390x/tcg-target.h |  1 +
>  tcg/s390x/tcg-target.c.inc | 34 --
>  3 files changed, 26 insertions(+), 10 deletions(-)

Reviewed-by: Ilya Leoshkevich 

I have one small suggestion, see below.

> diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
> index 00ba727b70..33a82e3286 100644
> --- a/tcg/s390x/tcg-target-con-set.h
> +++ b/tcg/s390x/tcg-target-con-set.h
> @@ -23,6 +23,7 @@ C_O1_I2(r, 0, ri)
>  C_O1_I2(r, 0, rI)
>  C_O1_I2(r, 0, rJ)
>  C_O1_I2(r, r, ri)
> +C_O1_I2(r, r, rJ)
>  C_O1_I2(r, rZ, r)
>  C_O1_I2(v, v, r)
>  C_O1_I2(v, v, v)
> diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> index 645f522058..bfd623a639 100644
> --- a/tcg/s390x/tcg-target.h
> +++ b/tcg/s390x/tcg-target.h
> @@ -63,6 +63,7 @@ typedef enum TCGReg {
>  #define FACILITY_FAST_BCR_SER FACILITY_LOAD_ON_COND
>  #define FACILITY_DISTINCT_OPS FACILITY_LOAD_ON_COND
> >  #define FACILITY_LOAD_ON_COND2        53
> +#define FACILITY_MISC_INSN_EXT2   58
>  #define FACILITY_VECTOR   129
>  #define FACILITY_VECTOR_ENH1  135
>  
> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> index d02b433271..cd39b2a208 100644
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -180,6 +180,8 @@ typedef enum S390Opcode {
>  RRE_SLBGR   = 0xb989,
>  RRE_XGR = 0xb982,
>  
> +RRFa_MSRKC  = 0xb9fd,
> +RRFa_MSGRKC = 0xb9ed,
>  RRFa_NRK= 0xb9f4,
>  RRFa_NGRK   = 0xb9e4,
>  RRFa_ORK= 0xb9f6,
> @@ -2140,14 +2142,18 @@ static inline void tcg_out_op(TCGContext *s, 
> TCGOpcode opc,
>  break;
>  
>  case INDEX_op_mul_i32:
> +a0 = args[0], a1 = args[1], a2 = (int32_t)args[2];
>  if (const_args[2]) {
> -if ((int32_t)args[2] == (int16_t)args[2]) {
> -tcg_out_insn(s, RI, MHI, args[0], args[2]);
> +tcg_out_mov(s, TCG_TYPE_I32, a0, a1);

Should we consider a0 == a1 case here as well, in order to get rid of
this extra move when possible?

> +if (a2 == (int16_t)a2) {
> +tcg_out_insn(s, RI, MHI, a0, a2);
>  } else {
> -tcg_out_insn(s, RIL, MSFI, args[0], args[2]);
> +tcg_out_insn(s, RIL, MSFI, a0, a2);
>  }
> +} else if (a0 == a1) {
> +tcg_out_insn(s, RRE, MSR, a0, a2);
>  } else {
> -tcg_out_insn(s, RRE, MSR, args[0], args[2]);
> +tcg_out_insn(s, RRFa, MSRKC, a0, a1, a2);
>  }
>  break;
>  
> @@ -2405,14 +2411,18 @@ static inline void tcg_out_op(TCGContext *s, 
> TCGOpcode opc,
>  break;
>  
>  case INDEX_op_mul_i64:
> +a0 = args[0], a1 = args[1], a2 = args[2];
>  if (const_args[2]) {
> -if (args[2] == (int16_t)args[2]) {
> -tcg_out_insn(s, RI, MGHI, args[0], args[2]);
> +tcg_out_mov(s, TCG_TYPE_I64, a0, a1);

Same here.

> +if (a2 == (int16_t)a2) {
> +tcg_out_insn(s, RI, MGHI, a0, a2);
>  } else {
> -tcg_out_insn(s, RIL, MSGFI, args[0], args[2]);
> +tcg_out_insn(s, RIL, MSGFI, a0, a2);
>  }
> +} else if (a0 == a1) {
> +tcg_out_insn(s, RRE, MSGR, a0, a2);
>  } else {
> -tcg_out_insn(s, RRE, MSGR, args[0], args[2]);
> +tcg_out_insn(s, RRFa, MSGRKC, a0, a1, a2);
>  }
>  break;
>  
> @@ -3072,12 +3082,16 @@ static TCGConstraintSetIndex 
> tcg_target_op_def(TCGOpcode op)
> MULTIPLY SINGLE IMMEDIATE with a signed 32-bit, otherwise we
> have only MULTIPLY HALFWORD IMMEDIATE, with a signed 16-bit.  */
>  return (HAVE_FACILITY(GEN_INST_EXT)
> -? C_O1_I2(r, 0, ri)
> +? (HAVE_FACILITY(MISC_INSN_EXT2)
> +   ? C_O1_I2(r, r, ri)
> +   : C_O1_I2(r, 0, ri))
>  : C_O1_I2(r, 0, rI));
>  
>  case INDEX_op_mul_i64:
>  return (HAVE_FACILITY(GEN_INST_EXT)
> -? C_O1_I2(r, 0, rJ)
> +? (HAVE_FACILITY(MISC_INSN_EXT2)
> +   ? C_O1_I2(r, r, rJ)
> +   : C_O1_I2(r, 0, rJ))
>  : C_O1_I2(r, 0, rI));
>  
>  case INDEX_op_shl_i32:
> -- 
> 2.34.1
> 
> 



Re: [PATCH v3 05/13] tcg/s390x: Distinguish RIE formats

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:52PM -0800, Richard Henderson wrote:
> There are multiple variations, with different fields.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target.c.inc | 47 +-
>  1 file changed, 26 insertions(+), 21 deletions(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v3 04/13] tcg/s390x: Distinguish RRF-a and RRF-c formats

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:51PM -0800, Richard Henderson wrote:
> One has 3 register arguments; the other has 2 plus an m3 field.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target.c.inc | 57 +-
>  1 file changed, 32 insertions(+), 25 deletions(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v3 03/13] tcg/s390x: Use LARL+AGHI for odd addresses

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:50PM -0800, Richard Henderson wrote:
> Add one instead of dropping odd addresses to the constant pool.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target.c.inc | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)

Reviewed-by: Ilya Leoshkevich 
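
The idea in the commit message can be sketched in standalone C, assuming a
pc-relative load-address instruction whose displacement is counted in
halfwords (so it can only reach even addresses) followed by an add of 0 or
1. ToyLarl, toy_plan_larl and toy_eval are hypothetical names for this
sketch and are not taken from tcg-target.c.inc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyLarl {
    bool ok;          /* false if the target is out of pc-relative range */
    int32_t hw_disp;  /* displacement in halfwords, as the insn would encode it */
    int add;          /* 0 or 1: immediate for the follow-up add */
} ToyLarl;

/* Plan how to materialize `target` relative to an even `pc`.
 * Addresses are assumed to fit in the positive int64_t range. */
static ToyLarl toy_plan_larl(uint64_t pc, uint64_t target)
{
    ToyLarl p = { false, 0, 0 };
    int add = (int)(target & 1);
    int64_t byte_disp;
    int64_t hw;

    assert((pc & 1) == 0);
    byte_disp = (int64_t)(target - (uint64_t)add) - (int64_t)pc;
    hw = byte_disp / 2;   /* exact: both addresses are even */

    if (hw >= INT32_MIN && hw <= INT32_MAX) {
        p.ok = true;
        p.hw_disp = (int32_t)hw;
        p.add = add;
    }
    return p;
}

/* What the two-instruction sequence would compute. */
static uint64_t toy_eval(uint64_t pc, ToyLarl p)
{
    return pc + (uint64_t)((int64_t)p.hw_disp * 2) + (uint64_t)p.add;
}
```

An odd target thus costs one extra add instead of a constant pool entry
and a load.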



Re: [PATCH v3 02/13] tcg/s390x: Remove TCG_REG_TB

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:49PM -0800, Richard Henderson wrote:
> This reverts 829e1376d940 ("tcg/s390: Introduce TCG_REG_TB"), and
> several follow-up patches.  The primary motivation is to reduce the
> less-tested code paths, pre-z10.  Secondarily, this allows the
> unconditional use of TCG_TARGET_HAS_direct_jump, which might be more
> important for performance than any slight increase in code size.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target.h |   2 +-
>  tcg/s390x/tcg-target.c.inc | 176 +
>  2 files changed, 23 insertions(+), 155 deletions(-)

Reviewed-by: Ilya Leoshkevich 

I have a few questions/ideas for the future below.
 
> diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> index 22d70d431b..645f522058 100644
> --- a/tcg/s390x/tcg-target.h
> +++ b/tcg/s390x/tcg-target.h
> @@ -103,7 +103,7 @@ extern uint64_t s390_facilities[3];
>  #define TCG_TARGET_HAS_mulsh_i32  0
>  #define TCG_TARGET_HAS_extrl_i64_i32  0
>  #define TCG_TARGET_HAS_extrh_i64_i32  0
> > -#define TCG_TARGET_HAS_direct_jump    HAVE_FACILITY(GEN_INST_EXT)
> > +#define TCG_TARGET_HAS_direct_jump    1

This change doesn't seem to affect that, but what is the minimum
supported s390x qemu host? z900?

>  #define TCG_TARGET_HAS_qemu_st8_i32   0
>  
>  #define TCG_TARGET_HAS_div2_i64   1
> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> index cb00bb6999..8a4bec0a28 100644
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -65,12 +65,6 @@
>  /* A scratch register that may be be used throughout the backend.  */
> >  #define TCG_TMP0    TCG_REG_R1
>  
> -/* A scratch register that holds a pointer to the beginning of the TB.
> -   We don't need this when we have pc-relative loads with the general
> -   instructions extension facility.  */
> -#define TCG_REG_TB  TCG_REG_R12
> -#define USE_REG_TB  (!HAVE_FACILITY(GEN_INST_EXT))
> -
>  #ifndef CONFIG_SOFTMMU
>  #define TCG_GUEST_BASE_REG TCG_REG_R13
>  #endif
> @@ -813,8 +807,8 @@ static bool maybe_out_small_movi(TCGContext *s, TCGType 
> type,
>  }
>  
>  /* load a register with an immediate value */
> -static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
> - tcg_target_long sval, bool in_prologue)
> +static void tcg_out_movi(TCGContext *s, TCGType type,
> + TCGReg ret, tcg_target_long sval)
>  {
>  tcg_target_ulong uval;
>  
> @@ -853,14 +847,6 @@ static void tcg_out_movi_int(TCGContext *s, TCGType 
> type, TCGReg ret,
>  tcg_out_insn(s, RIL, LARL, ret, off);
>  return;
>  }
> -} else if (USE_REG_TB && !in_prologue) {
> -ptrdiff_t off = tcg_tbrel_diff(s, (void *)sval);
> -if (off == sextract64(off, 0, 20)) {
> -/* This is certain to be an address within TB, and therefore
> -   OFF will be negative; don't try RX_LA.  */
> -tcg_out_insn(s, RXY, LAY, ret, TCG_REG_TB, TCG_REG_NONE, off);
> -return;
> -}
>  }
>  
>  /* A 32-bit unsigned value can be loaded in 2 insns.  And given
> @@ -876,10 +862,6 @@ static void tcg_out_movi_int(TCGContext *s, TCGType 
> type, TCGReg ret,
>  if (HAVE_FACILITY(GEN_INST_EXT)) {
>  tcg_out_insn(s, RIL, LGRL, ret, 0);
>  new_pool_label(s, sval, R_390_PC32DBL, s->code_ptr - 2, 2);
> -} else if (USE_REG_TB && !in_prologue) {
> -tcg_out_insn(s, RXY, LG, ret, TCG_REG_TB, TCG_REG_NONE, 0);
> -new_pool_label(s, sval, R_390_20, s->code_ptr - 2,
> -   tcg_tbrel_diff(s, NULL));
>  } else {
>  TCGReg base = ret ? ret : TCG_TMP0;
>  tcg_out_insn(s, RIL, LARL, base, 0);
> @@ -888,12 +870,6 @@ static void tcg_out_movi_int(TCGContext *s, TCGType 
> type, TCGReg ret,
>  }
>  }

I did some benchmarking of various ways to load constants in context of
GCC in the past, and it turned out that LLIHF+OILF is more efficient
than literal pool [1].
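
For reference, the two-instruction sequence mentioned here can be modelled
in standalone C: one step places a 32-bit immediate in the high half of
the register and clears the low half, the next ORs a 32-bit immediate into
the low half. toy_llihf, toy_oilf and toy_load_const are hypothetical
names, and the sketch only models the data flow; it makes no performance
claim.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a "load logical immediate, high 32 bits" step:
 * the immediate lands in the upper half, the lower half is cleared. */
static uint64_t toy_llihf(uint32_t imm)
{
    return (uint64_t)imm << 32;
}

/* Model of an "OR immediate, low 32 bits" step. */
static uint64_t toy_oilf(uint64_t reg, uint32_t imm)
{
    return reg | imm;
}

/* Build an arbitrary 64-bit constant from two 32-bit immediates. */
static uint64_t toy_load_const(uint64_t val)
{
    uint64_t reg = toy_llihf((uint32_t)(val >> 32));
    return toy_oilf(reg, (uint32_t)val);
}
```

Both immediates are carried in the instruction stream, so no memory load
from a literal pool is involved.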

> -static void tcg_out_movi(TCGContext *s, TCGType type,
> - TCGReg ret, tcg_target_long sval)
> -{
> -tcg_out_movi_int(s, type, ret, sval, false);
> -}
> -
>  /* Emit a load/store type instruction.  Inputs are:
> DATA: The register to be loaded or stored.
> BASE+OFS: The effective address.
> @@ -1020,35 +996,6 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType 
> type, TCGArg val,
>  return false;
>  }
>  
> -/* load data from an absolute host address */
> -static void tcg_out_ld_abs(TCGContext *s, TCGType type,
> -   TCGReg dest, const void *abs)
> -{
> -intptr_t addr = (intptr_t)abs;
> -
> -if (HAVE_FACILITY(GEN_INST_EXT) && !(addr & 1)) {
> -ptrdiff_t disp = tcg_pcrel_diff(s, abs) >> 1;
> -if (disp == (int32_t)disp) {
> -if (type == TCG_TYPE_I32) {
> -tcg_out_insn(s, RIL, LRL, dest, disp);
> -} else {

[PATCH for-8.0] ui/vnc: fix bad address parsing

2022-12-06 Thread Vladimir Sementsov-Ogievskiy
If addrstr == "[" and websocket is true, hostlen becomes 0 and we try
to access addrstr[hostlen-1], which is a bad idea.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 ui/vnc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 88f55cbf3c..8830bfe382 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3765,7 +3765,7 @@ static int vnc_display_get_address(const char *addrstr,
 
 addr->type = SOCKET_ADDRESS_TYPE_INET;
 inet = &addr->u.inet;
-if (addrstr[0] == '[' && addrstr[hostlen - 1] == ']') {
+if (hostlen >= 2 && addrstr[0] == '[' && addrstr[hostlen - 1] == ']') {
 inet->host = g_strndup(addrstr + 1, hostlen - 2);
 } else {
 inet->host = g_strndup(addrstr, hostlen);
-- 
2.34.1




[Bug 1903470] Re: qemu 5.1.0: Add UNIX socket support for netdev socket

2022-12-06 Thread Laurent Vivier
This was added to support passt:

https://passt.top

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1903470

Title:
  qemu 5.1.0: Add UNIX socket support for netdev socket

Status in QEMU:
  Expired

Bug description:
  Note: this is a feature request.

  qemu has a way to connect instances using a socket:

  -netdev socket,id=str[,fd=h][,listen=[host]:port][,connect=host:port]

  This can also be used to connect a qemu instance to something else
  using a socket connection, however there is no authentication or
  security to the connection, so rather than using a port which can be
  accessed by any user on the machine, having the ability to use or
  connect to UNIX sockets would be helpful, and adding this option
  should be fairly trivial.

  UNIX sockets can be found in various parts of qemu (monitor, etc) so I
  believe having this on network would make sense.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1903470/+subscriptions




[Bug 1903470] Re: qemu 5.1.0: Add UNIX socket support for netdev socket

2022-12-06 Thread Laurent Vivier
This will be available in the next QEMU release (7.2) under a slightly
different form:

"-netdev stream" for TCP socket and "-netdev dgram" for UDP socket.

Both support inet and unix sockets. See qemu(1).





Re: Thoughts on removing the TARGET_I386 part of hw/display/vga/vbe_portio_list[]

2022-12-06 Thread Warner Losh
On Tue, Dec 6, 2022 at 5:32 AM Dr. David Alan Gilbert 
wrote:

> From intel arch manual 19.3:
>  '..16-bit ports should be aligned to even addresses (0, 2, 4, ...) so
> that all 16 bits can be transferred in a
>   single bus cycle. Likewise, 32-bit ports should be aligned to addresses
> that are multiples of four (0, 4, 8, ...). The
>   processor supports data transfers to unaligned ports, but there is a
> performance penalty because one or more
>   extra bus cycle must be used.'
>
> I think I've even seen it suggested that a 32bit access to  might be
> defined - although I'm not sure if that's legal.
>

I don't know how well defined it is from an Official Intel Bus Definition
perspective, but on at least one 486-era core and one Pentium-era core it
would wrap. So an inl(0x) would result in an inb(0x), inw(0),
inb(3) showing up on the bus. I hit this as a bug in debugging a custom
driver way too many years ago. The cores weren't from intel, but were AMD
and/or some other third party (I don't recall which ones). I'd rate my
certainty of this knowledge as medium, so if there's some other resource that
contradicts this, I'd tend to believe that source for this edge-case
behavior.

Warner


Re: [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus

2022-12-06 Thread Hyman Huang




On 2022/12/7 0:00, Peter Xu wrote:

Hi, Shivam,

On Tue, Dec 06, 2022 at 11:18:52AM +0530, Shivam Kumar wrote:

[...]


Note
--
--

We understand that there is a good scope of improvement in the current
implementation. Here is a list of things we are working on:
1) Adding dirty quota as a migration capability so that it can be toggled
through QMP command.
2) Adding support for throttling guest DMAs.
3) Not enabling dirty quota for the first migration iteration.


Agreed.


4) Falling back to current auto-converge based throttling in cases where dirty
quota throttling can overthrottle.


If overthrottle happens, would auto-converge always be better?



Please stay tuned for the next patchset.

Shivam Kumar (1):
Dirty quota-based throttling of vcpus

   accel/kvm/kvm-all.c   | 91 +++
   include/exec/memory.h |  3 ++
   include/hw/core/cpu.h |  5 +++
   include/sysemu/kvm_int.h  |  1 +
   linux-headers/linux/kvm.h |  9 
   migration/migration.c | 22 ++
   migration/migration.h | 31 +
   softmmu/memory.c  | 64 +++
   8 files changed, 226 insertions(+)



It'd be great if I could get some more feedback before I send v2. Thanks.


Sorry to respond late.

What's the status of the kernel patchset?

From a high level the approach looks good, at least to me.  It's just that (as
I used to mention) we have two similar approaches now on throttling the
guest for precopy.  I'm not sure what's the best way to move forward
without doing a comparison of the two.

https://lore.kernel.org/all/cover.1669047366.git.huang...@chinatelecom.cn/

Sorry to say so, and no intention to create a contention, but merging the
two without any thought will definitely confuse everybody.  We need to
figure out a way.

 From what I can tell..

One way is we choose one of them which will be superior to the other and
all of us stick with it (for either higher possibility of migrate, less
interference to the workloads, and so on).

The other way is we take both, when each of them may be suitable for
different scenarios.  However in this latter case, we'd better at least be
aware of the differences (which suits what), then that'll be part of
documentation we need for each of the features when the user wants to use
them.

Add Yong into the loop.

Any thoughts?

This is quite different from the "dirtylimit capability of migration". IMHO,
the quota-based implementation seems a little complicated, because it
depends on the correctness of the dirty quota and the measured data, which
involves patchsets in both QEMU and the kernel. It seems that dirtylimit
and the quota-based approach are not mutually exclusive; at least we can
first figure out which suits what based on the test results, as Peter said.

--
Best regards

Hyman Huang(黄勇)



Re: [PATCH 02/11] exec: Restrict hwaddr.h to sysemu/

2022-12-06 Thread Richard Henderson

On 12/6/22 11:09, Philippe Mathieu-Daudé wrote:

On 6/12/22 16:38, Claudio Fontana wrote:

On 12/6/22 15:53, Claudio Fontana wrote:

On 5/17/21 13:11, Philippe Mathieu-Daudé wrote:

Guard declarations within hwaddr.h against inclusion
from user-mode emulation.

To make it clearer this header is sysemu specific,
move it to the sysemu/ directory.


Hi Philippe,

do we need include/exec/sysemu/... .h

as opposed to just use the existing

include/sysemu/

?


...and I would if anything go include/sysemu/exec/ not include/exec/sysemu ,

to highlight first that it is part of the sysemu build, when trying to reason about what 
gets built for sysemu vs anything else.


While refreshing this series I moved these files directly in include/sysemu/. Do you think 
the exec/ subdirectory {help|meaning}ful?


I don't think exec/ is particularly meaningful.


r~



Re: [PATCH 02/11] exec: Restrict hwaddr.h to sysemu/

2022-12-06 Thread Philippe Mathieu-Daudé

On 6/12/22 16:38, Claudio Fontana wrote:

On 12/6/22 15:53, Claudio Fontana wrote:

On 5/17/21 13:11, Philippe Mathieu-Daudé wrote:

Guard declarations within hwaddr.h against inclusion
from user-mode emulation.

To make it clearer this header is sysemu specific,
move it to the sysemu/ directory.


Hi Philippe,

do we need include/exec/sysemu/... .h

as opposed to just use the existing

include/sysemu/

?


...and I would if anything go include/sysemu/exec/ not include/exec/sysemu ,

to highlight first that it is part of the sysemu build, when trying to reason 
about what gets built for sysemu vs anything else.


While refreshing this series I moved these files directly in 
include/sysemu/. Do you think the exec/ subdirectory {help|meaning}ful?




[Bug 1903470] Re: qemu 5.1.0: Add UNIX socket support for netdev socket

2022-12-06 Thread Yury Bushmelev
JFYI I miss the ability to use Unix socket right now.. I'm trying to use
vagrant + vagrant-qemu + socket_vmnet on Macbook m1. It'd be MUCH easier
to connect QEMU to socket_vmnet's Unix socket directly w/o any
wrappers..

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1903470

Title:
  qemu 5.1.0: Add UNIX socket support for netdev socket

Status in QEMU:
  Expired

Bug description:
  Note: this is a feature request.

  qemu has a way to connect instances using a socket:

  -netdev socket,id=str[,fd=h][,listen=[host]:port][,connect=host:port]

  This can also be used to connect a qemu instance to something else
  using a socket connection, however there is no authentication or
  security to the connection, so rather than using a port which can be
  accessed by any user on the machine, having the ability to use or
  connect to UNIX sockets would be helpful, and adding this option
  should be fairly trivial.

  UNIX sockets can be found in various parts of qemu (monitor, etc) so I
  believe having this on network would make sense.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1903470/+subscriptions




Re: [PATCH] intel-iommu: Document iova_tree

2022-12-06 Thread Eric Auger



On 12/6/22 17:05, Peter Xu wrote:
> On Tue, Dec 06, 2022 at 02:16:32PM +0100, Eric Auger wrote:
>> Hi Peter,
>> On 12/6/22 00:28, Peter Xu wrote:
>>> On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
>>>> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu  wrote:
>>>>> It seems not super clear on when iova_tree is used, and why.  Add a rich
>>>>> comment above iova_tree to track why we needed the iova_tree, and when we
>>>>> need it.
>>>>>
>>>>> Suggested-by: Jason Wang 
>>>>> Signed-off-by: Peter Xu 
>>>>> ---
>>>>>  include/hw/i386/intel_iommu.h | 30 +-
>>>>>  1 file changed, 29 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>>>> index 46d973e629..8d130ab2e3 100644
>>>>> --- a/include/hw/i386/intel_iommu.h
>>>>> +++ b/include/hw/i386/intel_iommu.h
>>>>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
>>>>>  QLIST_ENTRY(VTDAddressSpace) next;
>>>>>  /* Superset of notifier flags that this address space has */
>>>>>  IOMMUNotifierFlag notifier_flags;
>>>>> -IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
>>>>> +/*
>>>>> + * @iova_tree traces mapped IOVA ranges.
>>>>> + *
>>>>> + * The tree is not needed if no MAP notifiers is registered with
>>>>> + * current VTD address space, because all UNMAP (including iotlb or
>>>>> + * dev-iotlb) events can be transparently delivered to !MAP iommu
>>>>> + * notifiers.
>>>> So this means the UNMAP notifier doesn't need to be as accurate as
>>>> MAP. (Should we document it in the notifier headers)?
>>> Yes.
>>>
>>>> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
>>> IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
>>> b+Y] as long as the range covers [a, b]?
>>>
>>>>> + *
>>>>> + * The tree OTOH is required for MAP typed iommu notifiers for a few
>>>>> + * reasons.
>>>>> + *
>>>>> + * Firstly, there's no way to identify whether an PSI event is MAP or
>>>>> + * UNMAP within the PSI message itself.  Without having prior knowledge
>>>>> + * of existing state vIOMMU doesn't know whether it should notify MAP
>>>>> + * or UNMAP for a PSI message it received.
>>>>> + *
>>>>> + * Secondly, PSI received from guest driver (or even a large PSI can
>>>>> + * grow into a DSI at least with Linux intel-iommu driver) can be
>>>>> + * larger in range than the newly mapped ranges for either MAP or UNMAP
>>>>> + * events.
>>>> Yes, so I think we need a document that the UNMAP handler should be
>>>> prepared for this.
>>> How about I squash below into this same patch?
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 91f8a2395a..c83bd11a68 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
>>>  /*
>>>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>>>   * register with one or multiple IOMMU Notifier capability bit(s).
>>> + *
>>> + * Normally there're two use cases for the notifiers:
>>> + *
>>> + *   (1) When the device needs accurate synchronizations of the vIOMMU page
>> accurate synchronizations sound too vague & subjective to me.
> Suggestions?
Well I would say:
when the notified device maintains a shadow page table and must be
notified on each guest MAP (page table entry creation) and UNMAP
(invalidation) events (VFIO). Both notifications must be accurate so
that the shadow page table is fully in sync with the guest view.
>
>>> + *   tables, it needs to register with both MAP|UNMAP notifies (which
>>> + *   is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
>>> + *   events are registered, the notifications will be accurate but
>>> + *   there's overhead on synchronizing the guest vIOMMU page tables.
>>> + *
>>> + *   (2) When the device doesn't need accurate synchronizations of the
>>> + *   vIOMMU page tables (when the device can both cache translations
>>> + *   and requesting to translate dynamically during DMA process), it
when the notified device maintains a cache of IOMMU translations (IOTLB)
and is able to fill that cache by requesting translations from the
vIOMMU through a protocol similar to ATS. In that case the notified
device only needs to register an UNMAP notifier. In that case the unmap
notifications are allowed to be wider than strictly necessary.

However the problem is since you need to satisfy the VFIO use case, how
do you detect when you are allowed to invalidate more than strictly
necessary?

Eric
 
>> s/requesting/request
>>> + *   needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
>> would be nice to clarify the distinction between both then
>>> + *   Note that in such working mode shadow page table is not used for
>>> + *   vIOMMU unit on this address space, so the UNMAP messages can be
>> I do not catch 'is not used for vIOMMU 

Re: Thoughts on removing the TARGET_I386 part of hw/display/vga/vbe_portio_list[]

2022-12-06 Thread Richard Henderson

On 12/6/22 10:02, Peter Maydell wrote:

On Tue, 6 Dec 2022 at 15:56, Philippe Mathieu-Daudé  wrote:


On 6/12/22 13:30, Dr. David Alan Gilbert wrote:

I don't know that bit of qemu well enough to know whether the cpu part
of qemu should be splitting the unaligned accesses or not.

All I/O accesses are gated thru access_with_adjusted_size() in
softmmu/memory.c.

There is an old access_with_adjusted_size_unaligned() version [1] from
Andrew and a more recent series [2] from Richard. Maybe the latter fixes
some long-standing bug [3] we have here?


There definitely are some unaddressed bugs there -- maybe this
is the time to work through what semantics we want that
softmmu code to provide and fix the bugs...


Yes, indeed.  Let's not forget Mark C-A's m68k bug[1] which so far has no 
resolution.


r~

[1] https://gitlab.com/qemu-project/qemu/-/issues/360



Re: Thoughts on removing the TARGET_I386 part of hw/display/vga/vbe_portio_list[]

2022-12-06 Thread Philippe Mathieu-Daudé

On 6/12/22 15:38, Gerd Hoffmann wrote:

   Hi,


So on x86 we can have 16-bit I/O accesses unaligned to 8-bit boundary?


Yes.


So I _think_ today we should be good with removing the x86 line:

-# ifdef TARGET_I386
-{ 1, 1, 2, .read = vbe_ioport_read_data, .write = vbe_ioport_write_data },
-# endif


Nope.  Breaks vgabios.  Testcase:

qemu-system-x86_64 -kernel /boot/vmlinuz-$(uname -r) -append vga=ask


Adding

 -trace memory_region_ops_\*

I get:

memory_region_ops_write cpu 0 mr 0x13eefbf60 addr 0x1ce value 0x0 size 2 name 'vbe'
memory_region_ops_write cpu 0 mr 0x13eefbf60 addr 0x1cf value 0xb0c0 size 2 name 'vbe'
memory_region_ops_write cpu 0 mr 0x13eefbf60 addr 0x1ce value 0x0 size 2 name 'vbe'
memory_region_ops_read cpu 0 mr 0x13eefbf60 addr 0x1cf value 0x size 2 name 'vbe'



All graphics modes are gone.


Yeah I'll investigate, thanks for the easy test case.
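
For readers following along, here is a minimal sketch of why the offset-1 entry matters for the trace above. It assumes a simplified lookup that matches an access on both offset and access size; DemoPortio, demo_find_portio and both tables are hypothetical names made up for this illustration and are not QEMU's actual structures or dispatch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a port I/O table entry (hypothetical, not QEMU's). */
typedef struct DemoPortio {
    uint32_t offset;  /* first port offset covered by this entry */
    uint32_t len;     /* number of consecutive offsets covered */
    unsigned size;    /* access width in bytes; 0 terminates the table */
} DemoPortio;

/* Mirrors the three entries quoted in the thread (offsets 0, 1 and 2). */
static const DemoPortio demo_with_x86[] = {
    { 0, 1, 2 }, { 1, 1, 2 }, { 2, 1, 2 }, { 0, 0, 0 },
};

/* Same table with the TARGET_I386-only entry at offset 1 removed. */
static const DemoPortio demo_without_x86[] = {
    { 0, 1, 2 }, { 2, 1, 2 }, { 0, 0, 0 },
};

/* Return the index of the entry matching (offset, size), or -1 if none. */
static int demo_find_portio(const DemoPortio *tbl, uint32_t offset,
                            unsigned size)
{
    assert(tbl != NULL);
    for (int i = 0; tbl[i].size != 0; i++) {
        if (offset >= tbl[i].offset &&
            offset < tbl[i].offset + tbl[i].len &&
            size == tbl[i].size) {
            return i;
        }
    }
    return -1;
}
```

Under this model, the 2-byte access at 0x1cf (offset 1 from 0x1ce) finds a handler only in the first table. What the real code does when no entry matches is not modeled here.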




Re: [PATCH] intel-iommu: Document iova_tree

2022-12-06 Thread Peter Xu
On Tue, Dec 06, 2022 at 02:16:32PM +0100, Eric Auger wrote:
> Hi Peter,
> On 12/6/22 00:28, Peter Xu wrote:
> > On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
> >> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu  wrote:
> >>> It seems not super clear on when iova_tree is used, and why.  Add a rich
> >>> comment above iova_tree to track why we needed the iova_tree, and when we
> >>> need it.
> >>>
> >>> Suggested-by: Jason Wang 
> >>> Signed-off-by: Peter Xu 
> >>> ---
> >>>  include/hw/i386/intel_iommu.h | 30 +-
> >>>  1 file changed, 29 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> >>> index 46d973e629..8d130ab2e3 100644
> >>> --- a/include/hw/i386/intel_iommu.h
> >>> +++ b/include/hw/i386/intel_iommu.h
> >>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
> >>>  QLIST_ENTRY(VTDAddressSpace) next;
> >>>  /* Superset of notifier flags that this address space has */
> >>>  IOMMUNotifierFlag notifier_flags;
> >>> -IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
> >>> +/*
> >>> + * @iova_tree traces mapped IOVA ranges.
> >>> + *
> >>> + * The tree is not needed if no MAP notifiers is registered with
> >>> + * current VTD address space, because all UNMAP (including iotlb or
> >>> + * dev-iotlb) events can be transparently delivered to !MAP iommu
> >>> + * notifiers.
> >> So this means the UNMAP notifier doesn't need to be as accurate as
> >> MAP. (Should we document it in the notifier headers)?
> > Yes.
> >
> >> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
> > IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
> > b+Y] as long as the range covers [a, b]?
> >
> >>> + *
> >>> + * The tree OTOH is required for MAP typed iommu notifiers for a few
> >>> + * reasons.
> >>> + *
> >>> + * Firstly, there's no way to identify whether an PSI event is MAP or
> >>> + * UNMAP within the PSI message itself.  Without having prior knowledge
> >>> + * of existing state vIOMMU doesn't know whether it should notify MAP
> >>> + * or UNMAP for a PSI message it received.
> >>> + *
> >>> + * Secondly, PSI received from guest driver (or even a large PSI can
> >>> + * grow into a DSI at least with Linux intel-iommu driver) can be
> >>> + * larger in range than the newly mapped ranges for either MAP or UNMAP
> >>> + * events.
> >> Yes, so I think we need a document that the UNMAP handler should be
> >> prepared for this.
> > How about I squash below into this same patch?
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 91f8a2395a..c83bd11a68 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
> >  /*
> >   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
> >   * register with one or multiple IOMMU Notifier capability bit(s).
> > + *
> > + * Normally there're two use cases for the notifiers:
> > + *
> > + *   (1) When the device needs accurate synchronizations of the vIOMMU page
> accurate synchronizations sound too vague & subjective to me.

Suggestions?

> > + *   tables, it needs to register with both MAP|UNMAP notifies (which
> > + *   is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
> > + *   events are registered, the notifications will be accurate but
> > + *   there's overhead on synchronizing the guest vIOMMU page tables.
> > + *
> > + *   (2) When the device doesn't need accurate synchronizations of the
> > + *   vIOMMU page tables (when the device can both cache translations
> > + *   and requesting to translate dynamically during DMA process), it
> s/requesting/request
> > + *   needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
> would be nice to clarify the distinction between both then
> > + *   Note that in such working mode shadow page table is not used for
> > + *   vIOMMU unit on this address space, so the UNMAP messages can be
> I do not catch 'is not used for vIOMMU unit on this address space'

How about: "Note that in this working mode the vIOMMU will not maintain a
shadowed page table for the address space, and the UNMAP messages can be.."?

> > + *   actually larger than the real invalidations (just like how the
> > + *   Linux IOMMU driver normally works, where an invalidation can be
> > + *   enlarged as long as it still covers the target range).
> >   */
> >  typedef enum {
> >  IOMMU_NOTIFIER_NONE = 0,
> >
> > Thanks,
> >
> Thanks
> 
> Eric
> 

-- 
Peter Xu
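
The two registration modes discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the DEMO_* flag values and both helper functions are hypothetical and not copied from include/exec/memory.h, and the rule that a MAP-registered notifier expects an exact range is my reading of "the notifications will be accurate".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not taken from QEMU headers. */
enum {
    DEMO_NOTIFIER_UNMAP = 1 << 0,
    DEMO_NOTIFIER_MAP   = 1 << 1,
};

/* Case (1): a notifier that registered MAP keeps a shadow page table. */
static bool demo_needs_exact_events(unsigned flags)
{
    return (flags & DEMO_NOTIFIER_MAP) != 0;
}

/*
 * Decide whether an UNMAP of [u_start, u_end] is acceptable for a mapping
 * [m_start, m_end] (all bounds inclusive).  UNMAP-only notifiers accept any
 * range that covers the mapping; MAP notifiers are assumed to need an exact
 * match so that their shadow state stays in sync.
 */
static bool demo_unmap_ok(unsigned flags,
                          uint64_t m_start, uint64_t m_end,
                          uint64_t u_start, uint64_t u_end)
{
    assert(m_start <= m_end && u_start <= u_end);
    if (demo_needs_exact_events(flags)) {
        return u_start == m_start && u_end == m_end;
    }
    return u_start <= m_start && u_end >= m_end;
}
```

This corresponds to the "UNMAP[a-X, b+Y] as long as the range covers [a, b]" wording for the UNMAP-only case.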




Re: Thoughts on removing the TARGET_I386 part of hw/display/vga/vbe_portio_list[]

2022-12-06 Thread Peter Maydell
On Tue, 6 Dec 2022 at 15:56, Philippe Mathieu-Daudé  wrote:
>
> On 6/12/22 13:30, Dr. David Alan Gilbert wrote:
> > I don't know that bit of qemu well enough to know whether the cpu part
> > of qemu should be splitting the unaligned accesses or not.
> All I/O accesses are gated thru access_with_adjusted_size() in
> softmmu/memory.c.
>
> There is an old access_with_adjusted_size_unaligned() version [1] from
> Andrew and a more recent series [2] from Richard. Maybe the latter fixes
> some long-standing bug [3] we have here?

There definitely are some unaddressed bugs there -- maybe this
is the time to work through what semantics we want that
softmmu code to provide and fix the bugs...

-- PMM



Re: [PATCH] intel-iommu: Document iova_tree

2022-12-06 Thread Peter Xu
On Tue, Dec 06, 2022 at 02:06:54PM +0100, Eric Auger wrote:
> >>> + * current VTD address space, because all UNMAP (including iotlb or
> >>> + * dev-iotlb) events can be transparently delivered to !MAP iommu
> >>> + * notifiers.
> >> because all UNMAP notifications (iotlb or dev-iotlb) can be triggered
> >> directly, as opposed to MAP notifications. (?)
> > What I wanted to say is any PSI or DSI messages we got from the guest can
> > be transparently delivered to QEMU's iommu notifiers.  I'm not sure
> > "triggered directly" best describe the case here.
> yes "transparently delivered" is OK. Or "guest invalidate commands can
> be directly passed to the IOMMU UNMAP notifiers without any further
> reshuffling". But that's nitpicking.

Will do.

> >
> > PSI: Page Selective Invalidations
> > DSI: Domain Selective Invalidations
> >
> > Sorry to mention these terms again, but that's really what the "transparent
> > delivery" means here - we get the PSI/DSI messages, then we notify with the
> > same ranges in IOMMU notifiers.  They're not the same concept but we do
> > that transparently without changing the core of the messages.
> >
> > Maybe I should spell out "!MAP" as "UNMAP-only"?  Would that help?
> yeah those are unmap notifiers if I am correct.
> >
> >>> + *
> >>> + * The tree OTOH is required for MAP typed iommu notifiers for a few
> >>> + * reasons.
> >>> + *
> >>> + * Firstly, there's no way to identify whether an PSI event is MAP or
> >> maybe give the decryption of the 'PSI' and 'DSI" acronyms once ;-)
> > Please see above. :)
> ok thanks
> >
> > These are VT-d terms used in multiple places in the .[ch] files, I assume
> > I'll just keep using them because otherwise I'll need to comment them
> > everytime we use any PSI/DSI terms.  It might become an overkill I'm afraid.
> OK maybe just using the full terminology once is enough.

Ok, I'll add them.

Thanks Eric.

-- 
Peter Xu
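
To illustrate the "Firstly" argument (a PSI does not say whether it is a MAP or an UNMAP), here is a small sketch that assumes a per-page "tracked" array standing in for the iova_tree. DemoEvent, demo_handle_psi and the fixed page count are hypothetical; the real tree tracks IOVA ranges rather than a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGES 16

typedef enum { DEMO_EV_NONE, DEMO_EV_MAP, DEMO_EV_UNMAP } DemoEvent;

/*
 * tracked[i]: whether page i was last reported to the notifier as mapped.
 * guest[i]:   whether the guest page table maps page i right now.
 * For each page in [first, first + count) the event to send is derived by
 * comparing the two, then the tracked state is updated.
 */
static void demo_handle_psi(bool tracked[DEMO_PAGES],
                            const bool guest[DEMO_PAGES],
                            size_t first, size_t count,
                            DemoEvent out[DEMO_PAGES])
{
    assert(first <= DEMO_PAGES && count <= DEMO_PAGES - first);
    for (size_t i = 0; i < DEMO_PAGES; i++) {
        out[i] = DEMO_EV_NONE;
    }
    for (size_t i = first; i < first + count; i++) {
        if (guest[i] && !tracked[i]) {
            out[i] = DEMO_EV_MAP;      /* newly mapped since last report */
        } else if (!guest[i] && tracked[i]) {
            out[i] = DEMO_EV_UNMAP;    /* mapping went away */
        }
        tracked[i] = guest[i];
    }
}
```

Without the tracked state, the same invalidation over pages 0..7 could not be split into per-page MAP and UNMAP events, which also covers the "Secondly" point about a PSI being larger than the range that actually changed.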




Re: [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus

2022-12-06 Thread Peter Xu
Hi, Shivam,

On Tue, Dec 06, 2022 at 11:18:52AM +0530, Shivam Kumar wrote:

[...]

> > Note
> > --
> > --
> > 
> > We understand that there is a good scope of improvement in the current
> > implementation. Here is a list of things we are working on:
> > 1) Adding dirty quota as a migration capability so that it can be toggled
> > through QMP command.
> > 2) Adding support for throttling guest DMAs.
> > 3) Not enabling dirty quota for the first migration iteration.

Agreed.

> > 4) Falling back to current auto-converge based throttling in cases where 
> > dirty
> > quota throttling can overthrottle.

If overthrottle happens, would auto-converge always be better?

> > 
> > Please stay tuned for the next patchset.
> > 
> > Shivam Kumar (1):
> >Dirty quota-based throttling of vcpus
> > 
> >   accel/kvm/kvm-all.c   | 91 +++
> >   include/exec/memory.h |  3 ++
> >   include/hw/core/cpu.h |  5 +++
> >   include/sysemu/kvm_int.h  |  1 +
> >   linux-headers/linux/kvm.h |  9 
> >   migration/migration.c | 22 ++
> >   migration/migration.h | 31 +
> >   softmmu/memory.c  | 64 +++
> >   8 files changed, 226 insertions(+)
> > 
> 
> It'd be great if I could get some more feedback before I send v2. Thanks.

Sorry to respond late.

What's the status of the kernel patchset?

From high level the approach looks good at least to me.  It's just that (as
I used to mention) we have two similar approaches now on throttling the
guest for precopy.  I'm not sure what's the best way to move forward if
without doing a comparison of the two.

https://lore.kernel.org/all/cover.1669047366.git.huang...@chinatelecom.cn/

Sorry to say so, and no intention to create a contention, but merging the
two without any thought will definitely confuse everybody.  We need to
figure out a way.

From what I can tell..

One way is we choose one of them which will be superior to the other and
all of us stick with it (for either higher possibility of migrate, less
interference to the workloads, and so on).

The other way is we take both, when each of them may be suitable for
different scenarios.  However in this latter case, we'd better at least be
aware of the differences (which suits what), then that'll be part of
documentation we need for each of the features when the user wants to use
them.

Add Yong into the loop.

Any thoughts?

-- 
Peter Xu




Re: [PATCH v3 29/34] tcg: Reorg function calls

2022-12-06 Thread Ilya Leoshkevich
On Tue, 2022-12-06 at 09:49 -0600, Richard Henderson wrote:
> On 12/6/22 09:28, Ilya Leoshkevich wrote:
> > > +    switch (TCG_TARGET_CALL_ARG_I64) {
> > > +    case TCG_CALL_ARG_EVEN:
> > 
> > On a s390x host with gcc-11.0.1-0.3.1.ibm.fc34.s390x I get:
> > 
> > FAILED: libqemu-aarch64-softmmu.fa.p/tcg_tcg.c.o
> > ../tcg/tcg.c: In function ‘init_call_layout’:
> > ../tcg/tcg.c:739:13: error: case value ‘1’ not in enumerated type [-Werror=switch]
> >    739 | case TCG_CALL_ARG_EVEN:
> >    | ^~~~
> > 
> > The following helps:
> 
> Yes, I found and fixed this since.
> 
> > --- a/tcg/tcg.c
> > +++ b/tcg/tcg.c
> > @@ -735,7 +735,7 @@ static void init_call_layout(TCGHelperInfo
> > *info)
> >   break;
> >   
> >   case TCG_TYPE_I64:
> > -    switch (TCG_TARGET_CALL_ARG_I64) {
> > +    switch ((TCGCallArgumentKind)TCG_TARGET_CALL_ARG_I64)
> > {
> >   case TCG_CALL_ARG_EVEN:
> >   layout_arg_even();
> >   /* fall through */
> > 
> > This looks like a gcc bug to me.
> 
> The gcc "bug" is only in not being sufficiently verbose.  It should
> say something about 
> *differing* enumerated types, and perhaps name them.
> 
> Back in patch 20, tcg/s390x/tcg-target.h,
> 
> -#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_RET_NORMAL
> +#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
> 
> 
> r~

I looked at this line and completely missed the RET vs ARG difference.
Your diff fixes the issue for me too, of course.
Thanks!


Re: Thoughts on removing the TARGET_I386 part of hw/display/vga/vbe_portio_list[]

2022-12-06 Thread Philippe Mathieu-Daudé

On 6/12/22 13:30, Dr. David Alan Gilbert wrote:

* Philippe Mathieu-Daudé (phi...@linaro.org) wrote:

Hi,

I'm trying to understand the x86 architecture-specific code in
hw/display/vga.c:

 const MemoryRegionPortio vbe_portio_list[] = {
 { 0, 1, 2, .read = vbe_ioport_read_index,
.write = vbe_ioport_write_index },
 # ifdef TARGET_I386
 { 1, 1, 2, .read = vbe_ioport_read_data,
.write = vbe_ioport_write_data },
 # endif
 { 2, 1, 2, .read = vbe_ioport_read_data,
.write = vbe_ioport_write_data },
 PORTIO_END_OF_LIST(),
 };

Having:

 typedef struct MemoryRegionPortio {
 uint32_t offset;
 uint32_t len;
 unsigned size;
 uint32_t (*read)(...);
 void (*write)(...);
 ...
 } MemoryRegionPortio;

So on x86 we can have 16-bit I/O accesses unaligned to 8-bit boundary?


Yes, like most things in x86 the requirement for alignment is a 'should'
followed by a description of what might happen if you don't:

 From intel arch manual 19.3:
  '..16-bit ports should be aligned to even addresses (0, 2, 4, ...) so that
   all 16 bits can be transferred in a single bus cycle. Likewise, 32-bit
   ports should be aligned to addresses that are multiples of four
   (0, 4, 8, ...). The processor supports data transfers to unaligned ports,
   but there is a performance penalty because one or more extra bus cycle
   must be used.'


So you confirm this is an architecture behavior, not a device one, thanks.


I think I've even seen it suggested that a 32bit access to  might be
defined - although I'm not sure if that's legal.


Easy to test :) If unspecified and there is some ISA-to-XXX bridge, then 
I expect this to be implementation dependent of the bridge.



I don't know that bit of qemu well enough to know whether the cpu part
of qemu should be splitting the unaligned accesses or not.
All I/O accesses are gated thru access_with_adjusted_size() in 
softmmu/memory.c.


There is an old access_with_adjusted_size_unaligned() version [1] from
Andrew and a more recent series [2] from Richard. Maybe the latter fixes
some long-standing bug [3] we have here?

[1] 
https://lore.kernel.org/qemu-devel/20170630030058.28943-1-and...@aj.id.au/
[2] 
https://lore.kernel.org/qemu-devel/20210619172626.875885-15-richard.hender...@linaro.org/
[3] 
https://lore.kernel.org/qemu-devel/cafeaca-fmurwnpu90qf1lwgsq36m-pmx2uc1+kent__otlx...@mail.gmail.com/




Re: [PATCH v3 29/34] tcg: Reorg function calls

2022-12-06 Thread Richard Henderson

On 12/6/22 09:28, Ilya Leoshkevich wrote:

+switch (TCG_TARGET_CALL_ARG_I64) {
+case TCG_CALL_ARG_EVEN:


On a s390x host with gcc-11.0.1-0.3.1.ibm.fc34.s390x I get:

FAILED: libqemu-aarch64-softmmu.fa.p/tcg_tcg.c.o
../tcg/tcg.c: In function ‘init_call_layout’:
../tcg/tcg.c:739:13: error: case value ‘1’ not in enumerated type [-Werror=switch]
   739 | case TCG_CALL_ARG_EVEN:
   | ^~~~

The following helps:


Yes, I found and fixed this since.


--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -735,7 +735,7 @@ static void init_call_layout(TCGHelperInfo *info)
  break;
  
  case TCG_TYPE_I64:

-switch (TCG_TARGET_CALL_ARG_I64) {
+switch ((TCGCallArgumentKind)TCG_TARGET_CALL_ARG_I64) {
  case TCG_CALL_ARG_EVEN:
  layout_arg_even();
  /* fall through */

This looks like a gcc bug to me.


The gcc "bug" is only in not being sufficiently verbose.  It should say something about 
*differing* enumerated types, and perhaps name them.


Back in patch 20, tcg/s390x/tcg-target.h,

-#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_RET_NORMAL
+#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL


r~
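
A stripped-down sketch of the situation, for anyone who hits the same diagnostic. The enum and macro names below are hypothetical stand-ins for the TCG ones; the point is only that the macro must expand to an enumerator of the same enumerated type as the case labels.

```c
#include <assert.h>

/* Two unrelated enumerated types whose first enumerators are both 0. */
typedef enum { DEMO_RET_NORMAL } DemoCallReturnKind;
typedef enum { DEMO_ARG_NORMAL, DEMO_ARG_EVEN } DemoCallArgumentKind;

/*
 * Defining this as DEMO_RET_NORMAL instead would be the mix-up described in
 * the thread: the case labels below would then belong to a different
 * enumerated type than the switch expression.
 */
#define DEMO_TARGET_CALL_ARG_I64 DEMO_ARG_NORMAL

/* Return the slot where a 64-bit argument would start (illustrative only). */
static int demo_slot_for_i64(int next_slot)
{
    assert(next_slot >= 0);
    switch (DEMO_TARGET_CALL_ARG_I64) {
    case DEMO_ARG_EVEN:
        next_slot += next_slot & 1;   /* round up to an even slot */
        /* fall through */
    case DEMO_ARG_NORMAL:
        return next_slot;
    }
    return -1;
}
```

Fixing the macro, as in the patch 20 hunk quoted above, addresses the cause; the cast in the earlier workaround only hides the type mismatch.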



Re: [PATCH v3 01/13] tcg/s390x: Use register pair allocation for div and mulu2

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 10:51:48PM -0800, Richard Henderson wrote:
> Previously we hard-coded R2 and R3.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/s390x/tcg-target-con-set.h |  4 ++--
>  tcg/s390x/tcg-target-con-str.h |  8 +--
>  tcg/s390x/tcg-target.c.inc | 43 +-
>  3 files changed, 35 insertions(+), 20 deletions(-)

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH v10 5/9] KVM: Use gfn instead of hva for mmu_notifier_retry

2022-12-06 Thread Fuad Tabba
Hi,

On Tue, Dec 6, 2022 at 12:01 PM Chao Peng  wrote:
>
> On Mon, Dec 05, 2022 at 09:23:49AM +, Fuad Tabba wrote:
> > Hi Chao,
> >
> > On Fri, Dec 2, 2022 at 6:19 AM Chao Peng  wrote:
> > >
> > > Currently in mmu_notifier invalidate path, hva range is recorded and
> > > then checked against by mmu_notifier_retry_hva() in the page fault
> > > handling path. However, for the to be introduced private memory, a page
> > > fault may not have a hva associated, checking gfn(gpa) makes more sense.
> > >
> > > For existing hva based shared memory, gfn is expected to also work. The
> > > only downside is when aliasing multiple gfns to a single hva, the
> > > current algorithm of checking multiple ranges could result in a much
> > > larger range being rejected. Such aliasing should be uncommon, so the
> > > impact is expected small.
> > >
> > > Suggested-by: Sean Christopherson 
> > > Signed-off-by: Chao Peng 
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c   |  8 +---
> > >  include/linux/kvm_host.h | 33 +
> > >  virt/kvm/kvm_main.c  | 32 +++-
> > >  3 files changed, 49 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 4736d7849c60..e2c70b5afa3e 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -4259,7 +4259,7 @@ static bool is_page_fault_stale(struct kvm_vcpu 
> > > *vcpu,
> > > return true;
> > >
> > > return fault->slot &&
> > > -  mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
> > > +  mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn);
> > >  }
> > >
> > >  static int direct_page_fault(struct kvm_vcpu *vcpu, struct 
> > > kvm_page_fault *fault)
> > > @@ -6098,7 +6098,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t 
> > > gfn_start, gfn_t gfn_end)
> > >
> > > write_lock(>mmu_lock);
> > >
> > > -   kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end);
> > > +   kvm_mmu_invalidate_begin(kvm);
> > > +
> > > +   kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
> > >
> > > flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
> > >
> > > @@ -6112,7 +6114,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t 
> > > gfn_start, gfn_t gfn_end)
> > > kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> > >gfn_end - gfn_start);
> > >
> > > -   kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
> > > +   kvm_mmu_invalidate_end(kvm);
> > >
> > > write_unlock(>mmu_lock);
> > >  }
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 02347e386ea2..3d69484d2704 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -787,8 +787,8 @@ struct kvm {
> > > struct mmu_notifier mmu_notifier;
> > > unsigned long mmu_invalidate_seq;
> > > long mmu_invalidate_in_progress;
> > > -   unsigned long mmu_invalidate_range_start;
> > > -   unsigned long mmu_invalidate_range_end;
> > > +   gfn_t mmu_invalidate_range_start;
> > > +   gfn_t mmu_invalidate_range_end;
> > >  #endif
> > > struct list_head devices;
> > > u64 manual_dirty_log_protect;
> > > @@ -1389,10 +1389,9 @@ void kvm_mmu_free_memory_cache(struct 
> > > kvm_mmu_memory_cache *mc);
> > >  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> > >  #endif
> > >
> > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> > > - unsigned long end);
> > > -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start,
> > > -   unsigned long end);
> > > +void kvm_mmu_invalidate_begin(struct kvm *kvm);
> > > +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t 
> > > end);
> > > +void kvm_mmu_invalidate_end(struct kvm *kvm);
> > >
> > >  long kvm_arch_dev_ioctl(struct file *filp,
> > > unsigned int ioctl, unsigned long arg);
> > > @@ -1963,9 +1962,9 @@ static inline int mmu_invalidate_retry(struct kvm 
> > > *kvm, unsigned long mmu_seq)
> > > return 0;
> > >  }
> > >
> > > -static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
> > > +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
> > >unsigned long mmu_seq,
> > > -  unsigned long hva)
> > > +  gfn_t gfn)
> > >  {
> > > lockdep_assert_held(>mmu_lock);
> > > /*
> > > @@ -1974,10 +1973,20 @@ static inline int mmu_invalidate_retry_hva(struct 
> > > kvm *kvm,
> > >  * that might be being invalidated. Note that it may include some 
> > > false
> >
> > nit: "might be" (or) "is being"
> >
> > >  * positives, due to shortcuts when handing concurrent 
> > > invalidations.
> >
> > 
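
The begin / range_add / end split and the gfn-based retry check described in the commit message can be modeled in userspace roughly like this. It is a sketch under assumptions: DemoKvm and the demo_* helpers are hypothetical, locking is omitted, and merging concurrent ranges into one covering range is my simplification of the "false positives" mentioned in the quoted comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_gfn_t;

/* Hypothetical stand-in for the invalidation state kept in struct kvm. */
typedef struct DemoKvm {
    unsigned long seq;       /* bumped whenever an invalidation finishes */
    long in_progress;        /* number of invalidations in flight */
    demo_gfn_t range_start;  /* inclusive */
    demo_gfn_t range_end;    /* exclusive */
} DemoKvm;

static void demo_invalidate_begin(DemoKvm *kvm)
{
    kvm->in_progress++;
    if (kvm->in_progress == 1) {
        /* Start with an empty range until the first range_add(). */
        kvm->range_start = UINT64_MAX;
        kvm->range_end = 0;
    }
}

static void demo_invalidate_range_add(DemoKvm *kvm, demo_gfn_t start,
                                      demo_gfn_t end)
{
    assert(kvm->in_progress > 0 && start < end);
    /* Ranges are merged into a single covering range. */
    if (start < kvm->range_start) {
        kvm->range_start = start;
    }
    if (end > kvm->range_end) {
        kvm->range_end = end;
    }
}

static void demo_invalidate_end(DemoKvm *kvm)
{
    assert(kvm->in_progress > 0);
    kvm->seq++;
    kvm->in_progress--;
}

/* True if a fault on 'gfn' that sampled 'seq_snapshot' must be retried. */
static bool demo_retry_gfn(const DemoKvm *kvm, unsigned long seq_snapshot,
                           demo_gfn_t gfn)
{
    if (kvm->in_progress > 0 &&
        gfn >= kvm->range_start && gfn < kvm->range_end) {
        return true;
    }
    return kvm->seq != seq_snapshot;
}
```

With ranges [10, 20) and [40, 50) added during one invalidation, a fault on gfn 30 is also asked to retry in this model, which is the kind of false positive the comment refers to.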

Re: [PATCH v10 4/9] KVM: Add KVM_EXIT_MEMORY_FAULT exit

2022-12-06 Thread Fuad Tabba
Hi,

On Fri, Dec 2, 2022 at 6:19 AM Chao Peng  wrote:
>
> This new KVM exit allows userspace to handle memory-related errors. It
> indicates an error happens in KVM at guest memory range [gpa, gpa+size).
> The flags includes additional information for userspace to handle the
> error. Currently bit 0 is defined as 'private memory' where '1'
> indicates error happens due to private memory access and '0' indicates
> error happens due to shared memory access.
>
> When private memory is enabled, this new exit will be used for KVM to
> exit to userspace for shared <-> private memory conversion in memory
> encryption usage. In such usage, typically there are two kind of memory
> conversions:
>   - explicit conversion: happens when guest explicitly calls into KVM
> to map a range (as private or shared), KVM then exits to userspace
> to perform the map/unmap operations.
>   - implicit conversion: happens in KVM page fault handler where KVM
> exits to userspace for an implicit conversion when the page is in a
> different state than requested (private or shared).
>
> Suggested-by: Sean Christopherson 
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> Reviewed-by: Fuad Tabba 
> ---
>  Documentation/virt/kvm/api.rst | 22 ++
>  include/uapi/linux/kvm.h   |  8 
>  2 files changed, 30 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 99352170c130..d9edb14ce30b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6634,6 +6634,28 @@ array field represents return values. The userspace 
> should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>
> +::
> +
> +   /* KVM_EXIT_MEMORY_FAULT */
> +   struct {
> +  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 0)
> +   __u64 flags;

I see you've removed the padding and increased the flag size.

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad




> +   __u64 gpa;
> +   __u64 size;
> +   } memory;
> +
> +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has
> +encountered a memory error which is not handled by KVM kernel module and
> +userspace may choose to handle it. The 'flags' field indicates the memory
> +properties of the exit.
> +
> + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by
> +   private memory access when the bit is set. Otherwise the memory error is
> +   caused by shared memory access when the bit is clear.
> +
> +'gpa' and 'size' indicate the memory range the error occurs at. The userspace
> +may handle the error and return to KVM to retry the previous memory access.
> +
>  ::
>
>  /* KVM_EXIT_NOTIFY */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 13bff963b8b0..c7e9d375a902 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -300,6 +300,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI 35
>  #define KVM_EXIT_RISCV_CSR 36
>  #define KVM_EXIT_NOTIFY   37
> +#define KVM_EXIT_MEMORY_FAULT 38
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -541,6 +542,13 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
> __u32 flags;
> } notify;
> +   /* KVM_EXIT_MEMORY_FAULT */
> +   struct {
> +#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1ULL << 0)
> +   __u64 flags;
> +   __u64 gpa;
> +   __u64 size;
> +   } memory;
> /* Fix the size of the union. */
> char padding[256];
> };
> --
> 2.25.1
>
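
For illustration, a minimal sketch of how userspace might decode the exit
described in the quoted documentation. The struct below only mirrors the
'memory' member proposed in the patch; the enum and both helper functions
are hypothetical names, and the mapping "private flag set means convert to
private" is an assumption based on the implicit-conversion description, not
something the patch spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag and layout copied from the 'memory' member added to struct kvm_run. */
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 0)

struct memory_fault_exit {
    uint64_t flags;
    uint64_t gpa;
    uint64_t size;
};

/* Hypothetical: the conversion a VMM would perform before re-entering KVM. */
enum conversion {
    CONVERT_TO_SHARED,
    CONVERT_TO_PRIVATE,
};

/*
 * Hypothetical helper. Assumption: a fault flagged as a private access
 * means the guest wants the range to be private, and vice versa.
 */
static enum conversion requested_conversion(const struct memory_fault_exit *e)
{
    assert(e != NULL);
    return (e->flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) ? CONVERT_TO_PRIVATE
                                                     : CONVERT_TO_SHARED;
}

/* True if [gpa, gpa + size) is non-empty and does not wrap around. */
static bool fault_range_valid(const struct memory_fault_exit *e)
{
    assert(e != NULL);
    return e->size != 0 && e->gpa + e->size - 1 >= e->gpa;
}
```

A VMM would presumably validate the range first, perform the conversion for
[gpa, gpa + size), and then return to KVM so the access is retried.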



Re: [PATCH v1 13/13] virtio/vhost-user: dynamically assign VhostUserHostNotifiers

2022-12-06 Thread Stefan Hajnoczi
On Mon, 21 Mar 2022 at 11:59, Alex Bennée  wrote:
>
> At a couple of hundred bytes per notifier allocating one for every
> potential queue is very wasteful as most devices only have a few
> queues. Instead of having this handled statically, dynamically assign
> them and track them in a GPtrArray.
>
> [AJB: it's hard to trigger the vhost notifiers code, I assume as it
> requires a KVM guest with appropriate backend]

I think vhost works with TCG. There is ioeventfd emulation in QEMU's
memory dispatch code, see memory_region_dispatch_write_eventfds().
There is irqfd emulation code for VIRTIO devices in
virtio_queue_set_guest_notifier_fd_handler().

Why do you say it's hard to trigger?

Stefan

>
> Signed-off-by: Alex Bennée 
> ---
>  include/hw/virtio/vhost-user.h | 42 -
>  hw/virtio/vhost-user.c | 83 +++---
>  hw/virtio/trace-events |  1 +
>  3 files changed, 108 insertions(+), 18 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index 6e0e8a71a3..c6e693cd3f 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -11,21 +11,61 @@
>  #include "chardev/char-fe.h"
>  #include "hw/virtio/virtio.h"
>
> +/**
> + * VhostUserHostNotifier - notifier information for one queue
> + * @rcu: rcu_head for cleanup
> + * @mr: memory region of notifier
> + * @addr: current mapped address
> + * @unmap_addr: address to be un-mapped
> + * @idx: virtioqueue index
> + *
> + * The VhostUserHostNotifier entries are re-used. When an old mapping
> + * is to be released it is moved to @unmap_addr and @addr is replaced.
> + * Once the RCU process has completed the unmap @unmap_addr is
> + * cleared.
> + */
>  typedef struct VhostUserHostNotifier {
>  struct rcu_head rcu;
>  MemoryRegion mr;
>  void *addr;
>  void *unmap_addr;
> +int idx;
>  } VhostUserHostNotifier;
>
> +/**
> + * VhostUserState - shared state for all vhost-user devices
> + * @chr: the character backend for the socket
> + * @notifiers: GPtrArray of @VhostUserHostNotifier
> + * @memory_slots:
> + */
>  typedef struct VhostUserState {
>  CharBackend *chr;
> -VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> +GPtrArray *notifiers;
>  int memory_slots;
>  bool supports_config;
>  } VhostUserState;
>
> +/**
> + * vhost_user_init() - initialise shared vhost_user state
> + * @user: allocated area for storing shared state
> + * @chr: the chardev for the vhost socket
> + * @errp: error handle
> + *
> + * User can either directly g_new() space for the state or embed
> + * VhostUserState in their larger device structure and just point to
> + * it.
> + *
> + * Return: true on success, false on error while setting errp.
> + */
>  bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp);
> +
> +/**
> + * vhost_user_cleanup() - cleanup state
> + * @user: ptr to use state
> + *
> + * Cleans up shared state and notifiers, callee is responsible for
> + * freeing the @VhostUserState memory itself.
> + */
>  void vhost_user_cleanup(VhostUserState *user);
>
>  #endif
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 6ce082861b..4c0423de55 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1174,14 +1174,16 @@ static void vhost_user_host_notifier_free(VhostUserHostNotifier *n)
>  n->unmap_addr = NULL;
>  }
>
> -static void vhost_user_host_notifier_remove(VhostUserState *user,
> -VirtIODevice *vdev, int queue_idx)
> +/*
> + * clean-up function for notifier, will finally free the structure
> + * under rcu.
> + */
> +static void vhost_user_host_notifier_remove(VhostUserHostNotifier *n,
> +VirtIODevice *vdev)
>  {
> -VhostUserHostNotifier *n = &user->notifier[queue_idx];
> -
>  if (n->addr) {
>  if (vdev) {
> -virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> +virtio_queue_set_host_notifier_mr(vdev, n->idx, &n->mr, false);
>  }
>  assert(!n->unmap_addr);
>  n->unmap_addr = n->addr;
> @@ -1225,6 +1227,15 @@ static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
>  return 0;
>  }
>
> +static VhostUserHostNotifier *fetch_notifier(VhostUserState *u,
> + int idx)
> +{
> +if (idx >= u->notifiers->len) {
> +return NULL;
> +}
> +return g_ptr_array_index(u->notifiers, idx);
> +}
> +
>  static int vhost_user_get_vring_base(struct vhost_dev *dev,
>   struct vhost_vring_state *ring)
>  {
> @@ -1237,7 +1248,10 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>  };
>  struct vhost_user *u = dev->opaque;
>
> -vhost_user_host_notifier_remove(u->user, dev->vdev, ring->index);
> +VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
> +if 

Re: [PATCH 22/22] tcg/riscv: Implement direct branch for goto_tb

2022-12-06 Thread Richard Henderson

On 12/6/22 01:48, Philippe Mathieu-Daudé wrote:

On 6/12/22 05:17, Richard Henderson wrote:

Now that tcg can handle direct and indirect goto_tb simultaneously,
we can optimistically leave space for a direct branch and fall back
to loading the pointer from the TB for an indirect branch.

Signed-off-by: Richard Henderson 
---
  tcg/riscv/tcg-target.h |  5 +
  tcg/riscv/tcg-target.c.inc | 19 +--
  2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 56f7bc3346..a75c84f2a6 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -159,6 +159,11 @@ typedef enum {
  #define TCG_TARGET_HAS_mulsh_i64    1
  #endif
+<<<<<<< HEAD
+=======
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
+
+>>>>>>> 89ab294271 (tcg/riscv: Implement TCG_TARGET_HAS_direct_jump)


HEAD is correct :)


Ouch. Clearly didn't get the fixed branch pushed back.
I wonder what else I missed...


r~




Re: [PATCH 18/22] tcg/sparc64: Remove USE_REG_TB

2022-12-06 Thread Richard Henderson

On 12/6/22 01:44, Philippe Mathieu-Daudé wrote:

On 6/12/22 05:17, Richard Henderson wrote:

This is always true for sparc64, so this is dead since 3a5f6805c7ca.

Signed-off-by: Richard Henderson 
---
  tcg/sparc64/tcg-target.c.inc | 57 ++--
  1 file changed, 22 insertions(+), 35 deletions(-)



@@ -1897,7 +1884,7 @@ void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
  tcg_debug_assert(tb_disp == (int32_t)tb_disp);
  tcg_debug_assert(br_disp == (int32_t)br_disp);
-    if (!USE_REG_TB) {
+    if (0) {
  qatomic_set((uint32_t *)jmp_rw,
  deposit32(CALL, 0, 30, br_disp >> 2));
  flush_idcache_range(jmp_rx, jmp_rw, 4);


Why remove in the next patch and not here?


Heh.  I did that so I could move this code in the next patch.
I meant to go back and edit this patch to delete, after I'd done that.


r~



Reviewed-by: Philippe Mathieu-Daudé 
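
For readers unfamiliar with the idiom in the quoted hunk, here is a small
stand-alone sketch of what depositing a 30-bit word displacement into a
SPARC CALL instruction looks like. The deposit32() below is a local
reimplementation written for this example (it is assumed to behave like the
QEMU helper of the same name), and encode_call() is a hypothetical wrapper
that does not exist in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* SPARC CALL: op = 01 in bits 31:30, 30-bit word displacement in bits 29:0. */
#define SPARC_CALL 0x40000000u

/* Replace 'length' bits of 'value', starting at bit 'start', with 'field'. */
static uint32_t deposit32(uint32_t value, int start, int length, uint32_t field)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    uint32_t mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/*
 * Hypothetical wrapper: encode a CALL to a target 'br_disp' bytes away.
 * The displacement must be a multiple of the 4-byte instruction size.
 */
static uint32_t encode_call(int32_t br_disp)
{
    assert((br_disp & 3) == 0);
    /* Shift as unsigned; the two bits shifted in at the top are masked off. */
    return deposit32(SPARC_CALL, 0, 30, (uint32_t)br_disp >> 2);
}
```

The point of writing the whole instruction with one 32-bit store is that the
jump target changes atomically with respect to a thread executing the code.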






Re: [PATCH 02/11] exec: Restrict hwaddr.h to sysemu/

2022-12-06 Thread Claudio Fontana
On 12/6/22 15:53, Claudio Fontana wrote:
> On 5/17/21 13:11, Philippe Mathieu-Daudé wrote:
>> Guard declarations within hwaddr.h against inclusion
>> from user-mode emulation.
>>
>> To make it clearer this header is sysemu specific,
>> move it to the sysemu/ directory.
> 
> Hi Philippe,
> 
> do we need include/exec/sysemu/... .h
> 
> as opposed to just use the existing
> 
> include/sysemu/
> 
> ?

...and I would if anything go include/sysemu/exec/ not include/exec/sysemu ,

to highlight first that it is part of the sysemu build, when trying to reason 
about what gets built for sysemu vs anything else.

Ciao

C







Re: [PATCH v3 29/34] tcg: Reorg function calls

2022-12-06 Thread Ilya Leoshkevich
On Thu, Dec 01, 2022 at 09:39:53PM -0800, Richard Henderson wrote:
> Pre-compute the function call layout for each helper at startup.
> Drop TCG_CALL_DUMMY_ARG, as we no longer need to leave gaps
> in the op->args[] array.  This allows several places to stop
> checking for NULL TCGTemp, to which TCG_CALL_DUMMY_ARG mapped.
> 
> For tcg_gen_callN, loop over the arguments once.  Allocate the TCGOp
> for the call early but delay emitting it, collecting arguments first.
> This allows the argument processing loop to emit code for extensions
> and have them sequenced before the call.
> 
> For tcg_reg_alloc_call, loop over the arguments in reverse order,
> which allows stack slots to be filled first naturally.
> 
> Signed-off-by: Richard Henderson 
> ---
>  include/exec/helper-head.h |   2 +
>  include/tcg/tcg.h  |   5 +-
>  tcg/tcg-internal.h |  22 +-
>  tcg/optimize.c |   6 +-
>  tcg/tcg.c  | 609 ++---
>  5 files changed, 394 insertions(+), 250 deletions(-)

...

> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index d08323db49..74f7491d73 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -547,7 +547,7 @@ void tcg_pool_reset(TCGContext *s)
>  
>  #include "exec/helper-proto.h"
>  
> -static const TCGHelperInfo all_helpers[] = {
> +static TCGHelperInfo all_helpers[] = {
>  #include "exec/helper-tcg.h"
>  };
>  static GHashTable *helper_table;
> @@ -565,6 +565,154 @@ static ffi_type * const typecode_to_ffi[8] = {
>  };
>  #endif
>  
> +typedef struct TCGCumulativeArgs {
> +int arg_idx;/* tcg_gen_callN args[] */
> +int info_in_idx;/* TCGHelperInfo in[] */
> +int arg_slot;   /* regs+stack slot */
> +int ref_slot;   /* stack slots for references */
> +} TCGCumulativeArgs;
> +
> +static void layout_arg_even(TCGCumulativeArgs *cum)
> +{
> +cum->arg_slot += cum->arg_slot & 1;
> +}
> +
> +static void layout_arg_1(TCGCumulativeArgs *cum, TCGHelperInfo *info,
> + TCGCallArgumentKind kind)
> +{
> +TCGCallArgumentLoc *loc = &info->in[cum->info_in_idx];
> +
> +*loc = (TCGCallArgumentLoc){
> +.kind = kind,
> +.arg_idx = cum->arg_idx,
> +.arg_slot = cum->arg_slot,
> +};
> +cum->info_in_idx++;
> +cum->arg_slot++;
> +}
> +
> +static void layout_arg_normal_n(TCGCumulativeArgs *cum,
> +TCGHelperInfo *info, int n)
> +{
> +TCGCallArgumentLoc *loc = &info->in[cum->info_in_idx];
> +
> +for (int i = 0; i < n; ++i) {
> +/* Layout all using the same arg_idx, adjusting the subindex. */
> +loc[i] = (TCGCallArgumentLoc){
> +.kind = TCG_CALL_ARG_NORMAL,
> +.arg_idx = cum->arg_idx,
> +.tmp_subindex = i,
> +.arg_slot = cum->arg_slot + i,
> +};
> +}
> +cum->info_in_idx += n;
> +cum->arg_slot += n;
> +}
> +
> +static void init_call_layout(TCGHelperInfo *info)
> +{
> +int max_reg_slots = ARRAY_SIZE(tcg_target_call_iarg_regs);
> +int max_stk_slots = TCG_STATIC_CALL_ARGS_SIZE / sizeof(tcg_target_long);
> +unsigned typemask = info->typemask;
> +unsigned typecode;
> +TCGCumulativeArgs cum = { };
> +
> +/*
> + * Parse and place any function return value.
> + */
> +typecode = typemask & 7;
> +switch (typecode) {
> +case dh_typecode_void:
> +info->nr_out = 0;
> +break;
> +case dh_typecode_i32:
> +case dh_typecode_s32:
> +case dh_typecode_ptr:
> +info->nr_out = 1;
> +info->out_kind = TCG_CALL_RET_NORMAL;
> +break;
> +case dh_typecode_i64:
> +case dh_typecode_s64:
> +info->nr_out = 64 / TCG_TARGET_REG_BITS;
> +info->out_kind = TCG_CALL_RET_NORMAL;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
> +
> +/*
> + * Parse and place function arguments.
> + */
> +for (typemask >>= 3; typemask; typemask >>= 3, cum.arg_idx++) {
> +TCGCallArgumentKind kind;
> +TCGType type;
> +
> +typecode = typemask & 7;
> +switch (typecode) {
> +case dh_typecode_i32:
> +case dh_typecode_s32:
> +type = TCG_TYPE_I32;
> +break;
> +case dh_typecode_i64:
> +case dh_typecode_s64:
> +type = TCG_TYPE_I64;
> +break;
> +case dh_typecode_ptr:
> +type = TCG_TYPE_PTR;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +
> +switch (type) {
> +case TCG_TYPE_I32:
> +switch (TCG_TARGET_CALL_ARG_I32) {
> +case TCG_CALL_ARG_EVEN:
> +layout_arg_even(&cum);
> +/* fall through */
> +case TCG_CALL_ARG_NORMAL:
> +layout_arg_1(&cum, info, TCG_CALL_ARG_NORMAL);
> +break;
> +
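
As a side note on the slot arithmetic in the quoted layout_arg_even(), the
following sketch shows how "arg_slot += arg_slot & 1" rounds the next slot
up to an even index before an argument is placed. The struct and function
names are hypothetical simplifications, not the ones used in tcg/tcg.c.

```c
#include <assert.h>

/* Hypothetical, reduced counterpart of TCGCumulativeArgs: slot counter only. */
struct slot_cursor {
    int arg_slot;   /* next free register/stack slot */
};

/* Round the next slot up to an even index: odd slots gain 1, even gain 0. */
static void align_even(struct slot_cursor *c)
{
    c->arg_slot += c->arg_slot & 1;
}

/*
 * Place an argument occupying 'n' consecutive slots, optionally aligned to
 * an even slot first. Returns the first slot assigned to the argument.
 */
static int place(struct slot_cursor *c, int n, int even)
{
    assert(n > 0);
    if (even) {
        align_even(c);
    }
    int first = c->arg_slot;
    c->arg_slot += n;
    return first;
}
```

With a cursor starting at 0, placing one unaligned slot and then an aligned
two-slot argument skips slot 1 and puts the pair in slots 2 and 3.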

Re: [PATCH v10 2/9] KVM: Introduce per-page memory attributes

2022-12-06 Thread Fuad Tabba
Hi,

On Fri, Dec 2, 2022 at 6:18 AM Chao Peng  wrote:
>
> In confidential computing usages, whether a page is private or shared is
> necessary information for KVM to perform operations like page fault
> handling, page zapping etc. There are other potential use cases for
> per-page memory attributes, e.g. to make memory read-only (or no-exec,
> or exec-only, etc.) without having to modify memslots.
>
> Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> userspace to operate on the per-page memory attributes.
>   - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
> a guest memory range.
>   - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
> memory attributes.
>
> KVM internally uses xarray to store the per-page memory attributes.
>
> Suggested-by: Sean Christopherson 
> Signed-off-by: Chao Peng 
> Link: https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com/
> ---
>  Documentation/virt/kvm/api.rst | 63 
>  arch/x86/kvm/Kconfig   |  1 +
>  include/linux/kvm_host.h   |  3 ++
>  include/uapi/linux/kvm.h   | 17 
>  virt/kvm/Kconfig   |  3 ++
>  virt/kvm/kvm_main.c| 76 ++
>  6 files changed, 163 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 5617bc4f899f..bb2f709c0900 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5952,6 +5952,59 @@ delivery must be provided via the "reg_aen" struct.
>  The "pad" and "reserved" fields may be used for future extensions and should be
>  set to 0s by userspace.
>
> +4.138 KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES
> +-
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: u64 memory attributes bitmask(out)
> +:Returns: 0 on success, <0 on error
> +
> +Returns supported memory attributes bitmask. Supported memory attributes will
> +have the corresponding bits set in u64 memory attributes bitmask.
> +
> +The following memory attributes are defined::
> +
> +  #define KVM_MEMORY_ATTRIBUTE_READ  (1ULL << 0)
> +  #define KVM_MEMORY_ATTRIBUTE_WRITE (1ULL << 1)
> +  #define KVM_MEMORY_ATTRIBUTE_EXECUTE   (1ULL << 2)
> +  #define KVM_MEMORY_ATTRIBUTE_PRIVATE   (1ULL << 3)
> +
> +4.139 KVM_SET_MEMORY_ATTRIBUTES
> +-
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_memory_attributes(in/out)
> +:Returns: 0 on success, <0 on error
> +
> +Sets memory attributes for pages in a guest memory range. Parameters are
> +specified via the following structure::
> +
> +  struct kvm_memory_attributes {
> +   __u64 address;
> +   __u64 size;
> +   __u64 attributes;
> +   __u64 flags;
> +  };
> +
> +The user sets the per-page memory attributes to a guest memory range indicated
> +by address/size, and in return KVM adjusts address and size to reflect the
> +actual pages of the memory range have been successfully set to the attributes.
> +If the call returns 0, "address" is updated to the last successful address + 1
> +and "size" is updated to the remaining address size that has not been set
> +successfully. The user should check the return value as well as the size to
> +decide if the operation succeeded for the whole range or not. The user may want
> +to retry the operation with the returned address/size if the previous range was
> +partially successful.
> +
> +Both address and size should be page aligned and the supported attributes can be
> +retrieved with KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES.
> +
> +The "flags" field may be used for future extensions and should be set to 0s.
> +
>  5. The kvm_run structure
>  
>
> @@ -8270,6 +8323,16 @@ structure.
>  When getting the Modified Change Topology Report value, the attr->addr
>  must point to a byte where the value will be stored or retrieved from.
>
> +8.40 KVM_CAP_MEMORY_ATTRIBUTES
> +--
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm
> +
> +This capability indicates KVM supports per-page memory attributes and ioctls
> +KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES are available.
> +
>  9. Known KVM API problems
>  =
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index fbeaa9ddef59..a8e379a3afee 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -49,6 +49,7 @@ config KVM
> select SRCU
> select INTERVAL_TREE
> select HAVE_KVM_PM_NOTIFIER if PM
> +   select HAVE_KVM_MEMORY_ATTRIBUTES
> help
>   Support hosting fully virtualized guest machines using hardware
>   virtualization extensions.  You will need a fairly recent
> 
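
To make the partial-success semantics in the quoted documentation concrete,
here is a rough sketch of the retry loop userspace would need. The struct
mirrors struct kvm_memory_attributes from the patch; the callback type, the
set_attrs_all() wrapper and the mock are hypothetical and stand in for
ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs). The "no progress" check is
an extra safety assumption that the patch does not require.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors struct kvm_memory_attributes from the quoted documentation. */
struct mem_attrs {
    uint64_t address;
    uint64_t size;
    uint64_t attributes;
    uint64_t flags;
};

/* Hypothetical stand-in for the KVM_SET_MEMORY_ATTRIBUTES ioctl. */
typedef int (*set_attrs_fn)(struct mem_attrs *attrs);

/*
 * Keep calling 'set' until the whole range is covered. Per the quoted
 * text, a successful call advances 'address' and shrinks 'size' to what
 * is left. Returns 0 on success and a negative value on error.
 */
static int set_attrs_all(set_attrs_fn set, uint64_t address, uint64_t size,
                         uint64_t attributes)
{
    struct mem_attrs a = { address, size, attributes, 0 };

    while (a.size != 0) {
        uint64_t before = a.size;
        int ret = set(&a);
        if (ret < 0) {
            return ret;
        }
        if (a.size >= before) {
            return -1;  /* no progress: bail out instead of spinning */
        }
    }
    return 0;
}

/* Mock backend that handles at most one 4 KiB page per call. */
#define MOCK_PAGE_SIZE 4096u
static int mock_calls;

static int mock_set_one_page(struct mem_attrs *a)
{
    assert(a->flags == 0);
    uint64_t done = a->size < MOCK_PAGE_SIZE ? a->size : MOCK_PAGE_SIZE;
    a->address += done;
    a->size -= done;
    mock_calls++;
    return 0;
}
```

With the mock above, a three-page range takes three calls to complete.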

Re: [PATCH 02/11] exec: Restrict hwaddr.h to sysemu/

2022-12-06 Thread Peter Maydell
On Mon, 17 May 2021 at 12:16, Philippe Mathieu-Daudé  wrote:
>
> Guard declarations within hwaddr.h against inclusion
> from user-mode emulation.

They're all safe, though; none of them are target-dependent.

I wonder if we should move MemMapEntry somewhere else -- it's used
in less than 20 files and doesn't really warrant putting
it in hwaddr.h.

-- PMM



Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory

2022-12-06 Thread Fuad Tabba
Hi,

On Fri, Dec 2, 2022 at 6:18 AM Chao Peng  wrote:
>
> From: "Kirill A. Shutemov" 
>
> Introduce 'memfd_restricted' system call with the ability to create
> memory areas that are restricted from userspace access through ordinary
> MMU operations (e.g. read/write/mmap). The memory content is expected to
> be used through the new in-kernel interface by a third kernel module.
>
> memfd_restricted() is useful for scenarios where a file descriptor(fd)
> can be used as an interface into mm but want to restrict userspace's
> ability on the fd. Initially it is designed to provide protections for
> KVM encrypted guest memory.
>
> Normally KVM uses memfd memory via mmapping the memfd into KVM userspace
> (e.g. QEMU) and then using the mmaped virtual address to setup the
> mapping in the KVM secondary page table (e.g. EPT). With confidential
> computing technologies like Intel TDX, the memfd memory may be encrypted
> with special key for special software domain (e.g. KVM guest) and is not
> expected to be directly accessed by userspace. Precisely, userspace
> access to such encrypted memory may lead to host crash so should be
> prevented.
>
> memfd_restricted() provides semantics required for KVM guest encrypted
> memory support that a fd created with memfd_restricted() is going to be
> used as the source of guest memory in confidential computing environment
> and KVM can directly interact with core-mm without the need to expose
> the memoy content into KVM userspace.

nit: memory

>
> KVM userspace is still in charge of the lifecycle of the fd. It should
> pass the created fd to KVM. KVM uses the new restrictedmem_get_page() to
> obtain the physical memory page and then uses it to populate the KVM
> secondary page table entries.
>
> The userspace restricted memfd can be fallocate-ed or hole-punched
> from userspace. When hole-punched, KVM can get notified through
> invalidate_start/invalidate_end() callbacks, KVM then gets chance to
> remove any mapped entries of the range in the secondary page tables.
>
> Machine check can happen for memory pages in the restricted memfd,
> instead of routing this directly to userspace, we call the error()
> callback that KVM registered. KVM then gets chance to handle it
> correctly.
>
> memfd_restricted() itself is implemented as a shim layer on top of real
> memory file systems (currently tmpfs). Pages in restrictedmem are marked
> as unmovable and unevictable, this is required for current confidential
> usage. But in future this might be changed.
>
> By default memfd_restricted() prevents userspace read, write and mmap.
> By defining new bit in the 'flags', it can be extended to support other
> restricted semantics in the future.
>
> The system call is currently wired up for x86 arch.

Reviewed-by: Fuad Tabba 
After wiring the system call for arm64 (on qemu/arm64):
Tested-by: Fuad Tabba 

Cheers,
/fuad



>
> Signed-off-by: Kirill A. Shutemov 
> Signed-off-by: Chao Peng 
> ---
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  include/linux/restrictedmem.h  |  71 ++
>  include/linux/syscalls.h   |   1 +
>  include/uapi/asm-generic/unistd.h  |   5 +-
>  include/uapi/linux/magic.h |   1 +
>  kernel/sys_ni.c|   3 +
>  mm/Kconfig |   4 +
>  mm/Makefile|   1 +
>  mm/memory-failure.c|   3 +
>  mm/restrictedmem.c | 318 +
>  11 files changed, 408 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/restrictedmem.h
>  create mode 100644 mm/restrictedmem.c
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 320480a8db4f..dc70ba90247e 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -455,3 +455,4 @@
>  448	i386	process_mrelease	sys_process_mrelease
>  449	i386	futex_waitv	sys_futex_waitv
>  450	i386	set_mempolicy_home_node	sys_set_mempolicy_home_node
> +451	i386	memfd_restricted	sys_memfd_restricted
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index c84d12608cd2..06516abc8318 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -372,6 +372,7 @@
>  448	common	process_mrelease	sys_process_mrelease
>  449	common	futex_waitv	sys_futex_waitv
>  450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
> +451	common	memfd_restricted	sys_memfd_restricted
>
>  #
>  # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/include/linux/restrictedmem.h b/include/linux/restrictedmem.h
> new file mode 100644
> index 000000000000..c2700c5daa43
> --- /dev/null
> +++ b/include/linux/restrictedmem.h
> @@ 

Re: [PATCH 02/11] exec: Restrict hwaddr.h to sysemu/

2022-12-06 Thread Claudio Fontana
On 5/17/21 13:11, Philippe Mathieu-Daudé wrote:
> Guard declarations within hwaddr.h against inclusion
> from user-mode emulation.
> 
> To make it clearer this header is sysemu specific,
> move it to the sysemu/ directory.

Hi Philippe,

do we need include/exec/sysemu/... .h

as opposed to just use the existing

include/sysemu/

?

Thanks,

Claudio

> 
> Patch created mechanically using:
> 
>   $ sed -i s,exec/hwaddr.h,exec/sysemu/hwaddr.h, $(git grep -l exec/hwaddr.h)
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/audio/lm4549.h   | 2 +-
>  hw/net/can/can_sja1000.h| 2 +-
>  hw/net/can/ctucan_core.h| 2 +-
>  hw/net/net_tx_pkt.h | 2 +-
>  include/disas/disas.h   | 4 +++-
>  include/exec/cpu-all.h  | 2 +-
>  include/exec/cpu-common.h   | 2 +-
>  include/exec/cpu-defs.h | 2 +-
>  include/exec/memory.h   | 2 +-
>  include/exec/{ => sysemu}/hwaddr.h  | 7 +--
>  include/hw/arm/sharpsl.h| 2 +-
>  include/hw/arm/soc_dma.h| 2 +-
>  include/hw/arm/sysbus-fdt.h | 2 +-
>  include/hw/arm/virt.h   | 2 +-
>  include/hw/block/block.h| 2 +-
>  include/hw/block/fdc.h  | 2 +-
>  include/hw/block/flash.h| 2 +-
>  include/hw/core/cpu.h   | 4 +++-
>  include/hw/cris/etraxfs_dma.h   | 2 +-
>  include/hw/display/vga.h| 2 +-
>  include/hw/i386/microvm.h   | 2 +-
>  include/hw/i386/x86.h   | 2 +-
>  include/hw/input/lasips2.h  | 2 +-
>  include/hw/loader-fit.h | 2 +-
>  include/hw/misc/allwinner-h3-dramc.h| 2 +-
>  include/hw/misc/empty_slot.h| 2 +-
>  include/hw/nvram/fw_cfg.h   | 2 +-
>  include/hw/pci-host/gpex.h  | 2 +-
>  include/hw/remote/memory.h  | 2 +-
>  include/hw/remote/mpqemu-link.h | 2 +-
>  include/hw/rtc/m48t59.h | 2 +-
>  include/hw/rtc/sun4v-rtc.h  | 2 +-
>  include/hw/timer/tmu012.h   | 2 +-
>  include/hw/virtio/virtio-access.h   | 2 +-
>  include/monitor/monitor.h   | 2 +-
>  include/qemu/accel.h| 4 +++-
>  include/qemu/iova-tree.h| 2 +-
>  include/qemu/userfaultfd.h  | 2 +-
>  dump/dump.c | 2 +-
>  dump/win_dump.c | 2 +-
>  hw/arm/sbsa-ref.c   | 2 +-
>  hw/input/lasips2.c  | 2 +-
>  hw/m68k/next-cube.c | 2 +-
>  hw/ppc/pnv_homer.c  | 2 +-
>  tests/qtest/microbit-test.c | 2 +-
>  MAINTAINERS | 1 +
>  scripts/codeconverter/codeconverter/test_regexps.py | 4 ++--
>  47 files changed, 58 insertions(+), 48 deletions(-)
>  rename include/exec/{ => sysemu}/hwaddr.h (81%)
> 
> diff --git a/hw/audio/lm4549.h b/hw/audio/lm4549.h
> index aba9bb5b077..5d53c2f2179 100644
> --- a/hw/audio/lm4549.h
> +++ b/hw/audio/lm4549.h
> @@ -13,7 +13,7 @@
>  #define HW_LM4549_H
>  
>  #include "audio/audio.h"
> -#include "exec/hwaddr.h"
> +#include "exec/sysemu/hwaddr.h"
>  
>  typedef void (*lm4549_callback)(void *opaque);
>  
> diff --git a/hw/net/can/can_sja1000.h b/hw/net/can/can_sja1000.h
> index 7ca9cd681ed..57e6d4d34e4 100644
> --- a/hw/net/can/can_sja1000.h
> +++ b/hw/net/can/can_sja1000.h
> @@ -27,7 +27,7 @@
>  #ifndef HW_CAN_SJA1000_H
>  #define HW_CAN_SJA1000_H
>  
> -#include "exec/hwaddr.h"
> +#include "exec/sysemu/hwaddr.h"
>  #include "net/can_emu.h"
>  
>  #define CAN_SJA_MEM_SIZE  128
> diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h
> index bbc09ae0678..c0e4beafba2 100644
> --- a/hw/net/can/ctucan_core.h
> +++ b/hw/net/can/ctucan_core.h
> @@ -28,7 +28,7 @@
>  #ifndef HW_CAN_CTUCAN_CORE_H
>  #define HW_CAN_CTUCAN_CORE_H
>  
> -#include "exec/hwaddr.h"
> +#include "exec/sysemu/hwaddr.h"
>  #include "net/can_emu.h"
>  
>  #ifndef HOST_WORDS_BIGENDIAN
> diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
> index 4ec8bbe9bd9..86548b4f613 100644
> --- a/hw/net/net_tx_pkt.h
> +++ b/hw/net/net_tx_pkt.h
> @@ -19,7 +19,7 @@
>  #define NET_TX_PKT_H
>  
>  #include "net/eth.h"
> -#include "exec/hwaddr.h"
> +#include "exec/sysemu/hwaddr.h"
>  
>  /* define to enable packet 

[PATCH qemu] migration/ram: support resize of option rom

2022-12-06 Thread ~tianren
From: Tianren Zhang 

The pci option rom is a RAMBlock mapped from a rom file,
but in some cases of migration, the src and dest machine
may have rom files with different size, which causes the
migration to fail due to mismatch of RAMBlock size.

In those cases, we could make the migration more compatible
by initializing the RAMBlock of the option rom as resizeable.
When a guest with a smaller option rom size(e.g. 72k) is
migrated to the dest started with larger rom size(e.g. 256k),
the resize is totally feasible on the dest qemu because 72K
of incoming RAMBlock < local 256K RAMBlock for the option rom.

Signed-off-by: Tianren Zhang 
---
 hw/pci/pci.c  | 26 -
 include/exec/memory.h | 25 
 softmmu/memory.c  | 45 +++
 3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2f450f6a72..4a662e1d9a 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2407,6 +2407,30 @@ static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, uint32_t size)
 }
 }
 
+/*
+ * This is a hook function which is called when the RAMBlock of a option rom
+ * is resized by QEMU in a live migration. For option rom, since the PCI bar
+ * is registered based on the size of ROM MemoryRegion, after a resize is done
+ * on this ROM MemoryRegion, we should re-register its PCI bar based on the new
+ * size.
+ */
+static void pci_option_rom_resized(const char* id, uint64_t length, void *host) {
+MemoryRegion *mr = memory_region_from_ramblock_id(id);
+if (mr == NULL) {
+// The failure to react the resize may cause the later check for
+// PCI config to fail in the migration, so the migration may fail,
+// but will not affect the src VM.
+error_report("failed to find the block %s\n", id);
+return;
+}
+
+PCIDevice *pdev = (PCIDevice *)DEVICE(mr->owner);
+fprintf(stdout, "block id: %s resized to 0x" RAM_ADDR_FMT \
+", re-register pci bar for device: %s\n", id, length, pdev->name);
+
+pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
+}
+
 /* Add an option rom for the device */
 static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
Error **errp)
@@ -2486,7 +2510,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
 snprintf(name, sizeof(name), "%s.rom", object_get_typename(OBJECT(pdev)));
 }
 pdev->has_rom = true;
-memory_region_init_rom(&pdev->rom, OBJECT(pdev), name, pdev->romsize, &error_fatal);
+memory_region_init_resizeable_rom(&pdev->rom, OBJECT(pdev), name, pdev->romsize, pdev->romsize, pci_option_rom_resized, &error_fatal);
 ptr = memory_region_get_ram_ptr(&pdev->rom);
 if (load_image_size(path, ptr, size) < 0) {
 error_setg(errp, "failed to load romfile \"%s\"", pdev->romfile);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..531d3bb6ef 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1276,6 +1276,29 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
uint64_t length,
void *host),
Error **errp);
+
+/*
+ * memory_region_init_resizeable_rom:  Initialize memory region with resizeable
+ * ROM.
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
+ *must be unique within any device
+ * @size: used size of the region.
+ * @max_size: max size of the region.
+ * @resized: callback to notify owner about used size change.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_resizeable_rom(MemoryRegion *mr,
+   struct Object *owner,
+   const char *name,
+   uint64_t size,
+   uint64_t max_size,
+   void (*resized)(const char*,
+   uint64_t length,
+   void *host),
+   Error **errp);
+
 #ifdef CONFIG_POSIX
 
 /**
@@ -2820,6 +2843,8 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
 int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
 bool prepare_mmio_access(MemoryRegion *mr);
 
+MemoryRegion *memory_region_from_ramblock_id(const char *id);
+
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
 if (is_write) {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..0f69ed320b 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2399,6 +2399,18 @@ 

Re: [PATCH v12 2/7] s390x/cpu topology: reporting the CPU topology to the guest

2022-12-06 Thread Pierre Morel




On 11/29/22 18:42, Pierre Morel wrote:

The guest uses the STSI instruction to get information on the
CPU topology.

Let us implement the STSI instruction for the basis CPU topology
level, level 2.

Signed-off-by: Pierre Morel 
---
  target/s390x/cpu.h  |  77 +++
  hw/s390x/s390-virtio-ccw.c  |  12 +--
  target/s390x/cpu_topology.c | 186 
  target/s390x/kvm/kvm.c  |   6 +-
  target/s390x/meson.build|   1 +
  5 files changed, 274 insertions(+), 8 deletions(-)
  create mode 100644 target/s390x/cpu_topology.c




+ */
+static void s390_topology_add_cpu(S390Topology *topo, S390CPU *cpu)
+{
+int core_id = cpu->env.core_id;
+int bit, origin;
+int socket_id;
+
+cpu->machine_data = topo;


Sorry, this is wrong: machine_data is already used as a pointer to the 
S390CcwMachineState machine.





+socket_id = core_id / topo->num_cores;
+/*


...snip...


+
+static int setup_stsi(S390CPU *cpu, SysIB_151x *sysib, int level)
+{
+S390Topology *topo = (S390Topology *)cpu->machine_data;


Sorry, this is wrong too; it must be:

S390CcwMachineState *s390ms = cpu->machine_data;
S390Topology *topo = S390_CPU_TOPOLOGY(s390ms->topology);


+char *p = sysib->tle;
+
+sysib->mnest = level;
+switch (level) {
+case 2:
+sysib->mag[S390_TOPOLOGY_MAG2] = topo->num_sockets;
+sysib->mag[S390_TOPOLOGY_MAG1] = topo->num_cores;
+p = s390_top_set_level2(topo, p);
+break;
+}
+
+return p - (char *)sysib;
+}
+



Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen



Re: [PATCH 02/11] exec: Restrict hwaddr.h to sysemu/

2022-12-06 Thread Philippe Mathieu-Daudé

On 6/12/22 15:32, Philippe Mathieu-Daudé wrote:

On 26/5/21 20:15, Richard Henderson wrote:

On 5/17/21 4:11 AM, Philippe Mathieu-Daudé wrote:

--- a/include/exec/hwaddr.h
+++ b/include/exec/sysemu/hwaddr.h
@@ -1,8 +1,9 @@
  /* Define hwaddr if it exists.  */
-#ifndef HWADDR_H
-#define HWADDR_H
+#ifndef EXEC_SYSEMU_HWADDR_H
+#define EXEC_SYSEMU_HWADDR_H
+#ifndef CONFIG_USER_ONLY
  #define HWADDR_BITS 64
  /* hwaddr is the type of a physical address (its size can
@@ -23,4 +24,6 @@ typedef struct MemMapEntry {
  hwaddr size;
  } MemMapEntry;
+#endif /* !CONFIG_USER_ONLY */
+
  #endif


Why no #error on this one, unlike the next patch?


Because many files in user emulation include "exec/hwaddr.h" :(


See for example gdbstub/user.c:

int gdb_breakpoint_insert(CPUState *cs, int type, hwaddr addr, hwaddr len)

I suppose we should replace the two hwaddr with vaddr:

/**
 * vaddr:
 * Type wide enough to contain any #target_ulong virtual address.
 */



Re: Thoughts on removing the TARGET_I386 part of hw/display/vga/vbe_portio_list[]

2022-12-06 Thread Gerd Hoffmann
  Hi,

> So on x86 we can have 16-bit I/O accesses unaligned to 8-bit boundary?

Yes.

> So I _think_ today we should be good with removing the x86 line:
> 
> -# ifdef TARGET_I386
> -{ 1, 1, 2, .read = vbe_ioport_read_data, .write = vbe_ioport_write_data },
> -# endif

Nope.  Breaks vgabios.  Testcase:

qemu-system-x86_64 -kernel /boot/vmlinuz-$(uname -r) -append vga=ask

All graphics modes are gone.

take care,
  Gerd




Re: [PATCH v12 1/7] s390x/cpu topology: Creating CPU topology device

2022-12-06 Thread Pierre Morel




On 12/6/22 14:35, Janis Schoetterl-Glausch wrote:

On Tue, 2022-12-06 at 11:32 +0100, Pierre Morel wrote:


On 12/6/22 10:31, Janis Schoetterl-Glausch wrote:

On Tue, 2022-11-29 at 18:42 +0100, Pierre Morel wrote:

We will need a Topology device to transfer the topology
during migration and to implement machine reset.

The device creation is fenced by s390_has_topology().

Signed-off-by: Pierre Morel 
---
   include/hw/s390x/cpu-topology.h| 44 +++
   include/hw/s390x/s390-virtio-ccw.h |  1 +
   hw/s390x/cpu-topology.c| 87 ++
   hw/s390x/s390-virtio-ccw.c | 25 +
   hw/s390x/meson.build   |  1 +
   5 files changed, 158 insertions(+)
   create mode 100644 include/hw/s390x/cpu-topology.h
   create mode 100644 hw/s390x/cpu-topology.c


[...]


diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h
index 9bba21a916..47ce0aa6fa 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -28,6 +28,7 @@ struct S390CcwMachineState {
   bool dea_key_wrap;
   bool pv;
   uint8_t loadparm[8];
+DeviceState *topology;


Why is this a DeviceState, not S390Topology?
It *has* to be a S390Topology, right? Since you cast it to one in patch 2.


Yes, currently it is the S390Topology.
The idea of Cedric was to have something more generic for future use.


But it still needs to be a S390Topology otherwise you cannot cast it to one, 
can you?


Maybe I did not correctly understand what Cedric wants.
For my part, I agree with you: I do not see the point of having anything 
other than a S390Topology pointer.


Also, doing that is safer, as we no longer need a cast... which reveals a 
bug I have in setup_stsi().


Let's do that and see what Cedric says.






   };
   
   struct S390CcwMachineClass {

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
new file mode 100644
index 00..bbf97cd66a
--- /dev/null
+++ b/hw/s390x/cpu-topology.c


[...]
   
+static DeviceState *s390_init_topology(MachineState *machine, Error **errp)

+{
+DeviceState *dev;
+
+dev = qdev_new(TYPE_S390_CPU_TOPOLOGY);
+
+object_property_add_child(&machine->parent_obj,
+  TYPE_S390_CPU_TOPOLOGY, OBJECT(dev));


Why set this property, and why on the machine parent?


From what I understood, setting num_cores and num_sockets as
properties of the CPU Topology object integrates them better
into the QEMU object framework.


That I understand.


The topology is added to the S390CcwMachineState, it is the parent of
the machine.


But why? And is it added to the S390CcwMachineState, or its parent?


It is added to the S390CcwMachineState.
We receive the MachineState as the "machine" parameter here and it is 
added to the "machine->parent_obj" which is the S390CcwMachineState.










+object_property_set_int(OBJECT(dev), "num-cores",
+machine->smp.cores * machine->smp.threads, errp);
+object_property_set_int(OBJECT(dev), "num-sockets",
+machine->smp.sockets, errp);
+
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);


I must admit that I haven't fully grokked qemu's memory management yet.
Is the topology device now owned by the sysbus?


Yes, it is, so we see it on the qtree with its properties.



If so, is it fine to have a pointer to it in S390CcwMachineState?


Why not?


If it's owned by the sysbus and the object is not explicitly referenced
for the pointer, it might be deallocated and then you'd have a dangling pointer.


Why would it be deallocated?
As long as it is not unrealized, it belongs to the sysbus, doesn't it?
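
To make the lifetime question concrete, here is a toy reference-counting
model in plain C. It is not QEMU's QOM API: toy_obj, toy_new, toy_ref and
toy_unref are made-up names for illustration only. The point is just that
whoever stores a pointer should hold its own reference, so the object
survives even if another owner drops theirs.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy refcounted object; all names here are hypothetical, not QOM. */
struct toy_obj {
    unsigned refcount;
    bool *destroyed;            /* set to true when the object is freed */
};

/* Create an object with one reference, owned by the caller. */
static struct toy_obj *toy_new(bool *destroyed)
{
    struct toy_obj *o = malloc(sizeof(*o));

    if (!o) {
        return NULL;
    }
    o->refcount = 1;
    o->destroyed = destroyed;
    *destroyed = false;
    return o;
}

/* Take an additional reference, e.g. before storing the pointer. */
static void toy_ref(struct toy_obj *o)
{
    o->refcount++;
}

/* Drop one reference; the object is freed when the last one goes away. */
static void toy_unref(struct toy_obj *o)
{
    if (--o->refcount == 0) {
        *o->destroyed = true;
        free(o);
    }
}
```

In this model a stored pointer is only safe while its holder still owns a
reference; whether the machine state holds one in the patch above is exactly
the open question.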

Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen



Re: [PATCH 02/11] exec: Restrict hwaddr.h to sysemu/

2022-12-06 Thread Philippe Mathieu-Daudé

On 26/5/21 20:15, Richard Henderson wrote:

On 5/17/21 4:11 AM, Philippe Mathieu-Daudé wrote:

--- a/include/exec/hwaddr.h
+++ b/include/exec/sysemu/hwaddr.h
@@ -1,8 +1,9 @@
  /* Define hwaddr if it exists.  */
-#ifndef HWADDR_H
-#define HWADDR_H
+#ifndef EXEC_SYSEMU_HWADDR_H
+#define EXEC_SYSEMU_HWADDR_H
+#ifndef CONFIG_USER_ONLY
  #define HWADDR_BITS 64
  /* hwaddr is the type of a physical address (its size can
@@ -23,4 +24,6 @@ typedef struct MemMapEntry {
  hwaddr size;
  } MemMapEntry;
+#endif /* !CONFIG_USER_ONLY */
+
  #endif


Why no #error on this one, unlike the next patch?


Because many files in user emulation include "exec/hwaddr.h" :(



Re: [PATCH 3/3] intel-iommu: build iova tree during IOMMU translation

2022-12-06 Thread Peter Xu
On Tue, Dec 06, 2022 at 11:18:03AM +0800, Jason Wang wrote:
> On Tue, Dec 6, 2022 at 7:19 AM Peter Xu  wrote:
> >
> > Jason,
> >
> > On Mon, Dec 05, 2022 at 12:12:04PM +0800, Jason Wang wrote:
> > > I'm fine with going without iova-tree. Would you mind posting patches
> > > for the fix? I can test and include it in this series then.
> >
> > One sample patch attached, only compile tested.
> 
> I don't see any direct connection between the attached patch and the
> intel-iommu?

Sorry!  Wrong tree dumped...  Trying again.

> 
> >
> > I can also work on this but I'll be slow in making progress, so I'll add it
> > into my todo.  If you can help to fix this issue it'll be more than great.
> 
> Ok, let me try but it might take some time :)

Sure. :)

I'll also add it into my todo (and I think the other similar one has been
there for a while.. :( ).

-- 
Peter Xu
From 37c743761d20c16891856c5bef2e7b3fb89893b6 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Mon, 5 Dec 2022 18:11:36 -0500
Subject: [PATCH] intel-iommu: Send unmap notifications for domain or global
 inv desc
Content-type: text/plain

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 41 +
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a08ee85edf..2c6ca68df0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -206,6 +206,23 @@ static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as)
 return as->notifier_flags & IOMMU_NOTIFIER_MAP;
 }
 
+static void vtd_as_notify_unmap(VTDAddressSpace *as, hwaddr iova,
+hwaddr addr_mask)
+{
+IOMMUTLBEvent event = {
+.type = IOMMU_NOTIFIER_UNMAP,
+.entry = {
+.target_as = &address_space_memory,
+.iova = iova,
+.translated_addr = 0,
+.addr_mask = addr_mask,
+.perm = IOMMU_NONE,
+},
+};
+
+memory_region_notify_iommu(&as->iommu, 0, event);
+}
+
 /* GHashTable functions */
 static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -1530,13 +1547,15 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
 return vtd_page_walk(s, ce, addr, addr + size, &info, vtd_as->pasid);
 }
 
-static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
+static int vtd_address_space_sync(VTDAddressSpace *vtd_as)
 {
 int ret;
 VTDContextEntry ce;
 IOMMUNotifier *n;
 
-if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
+/* If no MAP notifier registered, we simply invalidate all the cache */
+if (!vtd_as_has_map_notifier(vtd_as)) {
+vtd_as_notify_unmap(vtd_as, 0, HWADDR_MAX);
 return 0;
 }
 
@@ -2000,7 +2019,7 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
 VTDAddressSpace *vtd_as;
 
 QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
-vtd_sync_shadow_page_table(vtd_as);
+vtd_address_space_sync(vtd_as);
 }
 }
 
@@ -2082,7 +2101,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
  * framework will skip MAP notifications if that
  * happened.
  */
-vtd_sync_shadow_page_table(vtd_as);
+vtd_address_space_sync(vtd_as);
 }
 }
 }
@@ -2140,7 +2159,7 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
   vtd_as->devfn, &ce) &&
 domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
-vtd_sync_shadow_page_table(vtd_as);
+vtd_address_space_sync(vtd_as);
 }
 }
 }
@@ -2174,17 +2193,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
  * page tables.  We just deliver the PSI down to
  * invalidate caches.
  */
-IOMMUTLBEvent event = {
-.type = IOMMU_NOTIFIER_UNMAP,
-.entry = {
-.target_as = &address_space_memory,
-.iova = addr,
-.translated_addr = 0,
-.addr_mask = size - 1,
-.perm = IOMMU_NONE,
-},
-};
-memory_region_notify_iommu(&vtd_as->iommu, 0, event);
+vtd_as_notify_unmap(vtd_as, addr, size - 1);
 }
 }
 }
-- 
2.37.3



Re: [PATCH v12 1/7] s390x/cpu topology: Creating CPU topology device

2022-12-06 Thread Janis Schoetterl-Glausch
On Tue, 2022-12-06 at 11:32 +0100, Pierre Morel wrote:
> 
> On 12/6/22 10:31, Janis Schoetterl-Glausch wrote:
> > On Tue, 2022-11-29 at 18:42 +0100, Pierre Morel wrote:
> > > We will need a Topology device to transfer the topology
> > > during migration and to implement machine reset.
> > > 
> > > The device creation is fenced by s390_has_topology().
> > > 
> > > Signed-off-by: Pierre Morel 
> > > ---
> > >   include/hw/s390x/cpu-topology.h| 44 +++
> > >   include/hw/s390x/s390-virtio-ccw.h |  1 +
> > >   hw/s390x/cpu-topology.c| 87 ++
> > >   hw/s390x/s390-virtio-ccw.c | 25 +
> > >   hw/s390x/meson.build   |  1 +
> > >   5 files changed, 158 insertions(+)
> > >   create mode 100644 include/hw/s390x/cpu-topology.h
> > >   create mode 100644 hw/s390x/cpu-topology.c
> > > 
> > [...]
> > 
> > > diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h
> > > index 9bba21a916..47ce0aa6fa 100644
> > > --- a/include/hw/s390x/s390-virtio-ccw.h
> > > +++ b/include/hw/s390x/s390-virtio-ccw.h
> > > @@ -28,6 +28,7 @@ struct S390CcwMachineState {
> > >   bool dea_key_wrap;
> > >   bool pv;
> > >   uint8_t loadparm[8];
> > > +DeviceState *topology;
> > 
> > Why is this a DeviceState, not S390Topology?
> > It *has* to be a S390Topology, right? Since you cast it to one in patch 2.
> 
> Yes, currently it is the S390Topology.
> The idea of Cedric was to have something more generic for future use.

But it still needs to be a S390Topology otherwise you cannot cast it to one, 
can you?
> 
> > 
> > >   };
> > >   
> > >   struct S390CcwMachineClass {
> > > diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
> > > new file mode 100644
> > > index 00..bbf97cd66a
> > > --- /dev/null
> > > +++ b/hw/s390x/cpu-topology.c
> > > 
> > [...]
> > >   
> > > +static DeviceState *s390_init_topology(MachineState *machine, Error **errp)
> > > +{
> > > +DeviceState *dev;
> > > +
> > > +dev = qdev_new(TYPE_S390_CPU_TOPOLOGY);
> > > +
> > > +object_property_add_child(&machine->parent_obj,
> > > +  TYPE_S390_CPU_TOPOLOGY, OBJECT(dev));
> > 
> > Why set this property, and why on the machine parent?
> 
> From what I understood, setting num_cores and num_sockets as 
> properties of the CPU Topology object integrates them better 
> into the QEMU object framework.

That I understand.
> 
> The topology is added to the S390CcwMachineState, it is the parent of 
> the machine.

But why? And is it added to the S390CcwMachineState, or its parent?
> 
> 
> > 
> > > +object_property_set_int(OBJECT(dev), "num-cores",
> > > +machine->smp.cores * machine->smp.threads, errp);
> > > +object_property_set_int(OBJECT(dev), "num-sockets",
> > > +machine->smp.sockets, errp);
> > > +
> > > +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);
> > 
> > I must admit that I haven't fully grokked qemu's memory management yet.
> > Is the topology device now owned by the sysbus?
> 
> Yes, it is, so we see it on the qtree with its properties.
> 
> 
> > If so, is it fine to have a pointer to it in S390CcwMachineState?
> 
> Why not?

If it's owned by the sysbus and the object is not explicitly referenced
for the pointer, it might be deallocated and then you'd have a dangling pointer.

> It seems logical to me that the sysbus belongs to the virtual machine.
> But sometimes the ways of QEMU are not very transparent to me :)
> so I can be wrong.
> 
> Regards,
> Pierre
> 
> > > +
> > > +return dev;
> > > +}
> > > +
> > [...]
> 




Re: [PATCH v10 2/9] KVM: Introduce per-page memory attributes

2022-12-06 Thread Fabiano Rosas
Chao Peng  writes:

> In confidential computing usages, whether a page is private or shared is
> necessary information for KVM to perform operations like page fault
> handling, page zapping etc. There are other potential use cases for
> per-page memory attributes, e.g. to make memory read-only (or no-exec,
> or exec-only, etc.) without having to modify memslots.
>
> Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> userspace to operate on the per-page memory attributes.
>   - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
> a guest memory range.
>   - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
> memory attributes.
>
> KVM internally uses xarray to store the per-page memory attributes.
>
> Suggested-by: Sean Christopherson 
> Signed-off-by: Chao Peng 
> Link: https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com/
> ---
>  Documentation/virt/kvm/api.rst | 63 
>  arch/x86/kvm/Kconfig   |  1 +
>  include/linux/kvm_host.h   |  3 ++
>  include/uapi/linux/kvm.h   | 17 
>  virt/kvm/Kconfig   |  3 ++
>  virt/kvm/kvm_main.c| 76 ++
>  6 files changed, 163 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 5617bc4f899f..bb2f709c0900 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5952,6 +5952,59 @@ delivery must be provided via the "reg_aen" struct.
>  The "pad" and "reserved" fields may be used for future extensions and should be
>  set to 0s by userspace.
>  
> +4.138 KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES
> +-
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: u64 memory attributes bitmask(out)
> +:Returns: 0 on success, <0 on error
> +
> +Returns supported memory attributes bitmask. Supported memory attributes will
> +have the corresponding bits set in u64 memory attributes bitmask.
> +
> +The following memory attributes are defined::
> +
> +  #define KVM_MEMORY_ATTRIBUTE_READ  (1ULL << 0)
> +  #define KVM_MEMORY_ATTRIBUTE_WRITE (1ULL << 1)
> +  #define KVM_MEMORY_ATTRIBUTE_EXECUTE   (1ULL << 2)
> +  #define KVM_MEMORY_ATTRIBUTE_PRIVATE   (1ULL << 3)
> +
> +4.139 KVM_SET_MEMORY_ATTRIBUTES
> +-
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_memory_attributes(in/out)
> +:Returns: 0 on success, <0 on error
> +
> +Sets memory attributes for pages in a guest memory range. Parameters are
> +specified via the following structure::
> +
> +  struct kvm_memory_attributes {
> + __u64 address;
> + __u64 size;
> + __u64 attributes;
> + __u64 flags;
> +  };
> +
> +The user sets the per-page memory attributes to a guest memory range indicated
> +by address/size, and in return KVM adjusts address and size to reflect the
> +actual pages of the memory range have been successfully set to the attributes.

This wording could cause some confusion. What about a simpler:

"reflect the range of pages that had its attributes successfully set"

> +If the call returns 0, "address" is updated to the last successful address + 1
> +and "size" is updated to the remaining address size that has not been set
> +successfully.

"address + 1 page" or "subsequent page" perhaps.

In fact, wouldn't this all become simpler if size were number of pages instead?

> The user should check the return value as well as the size to
> +decide if the operation succeeded for the whole range or not. The user may want
> +to retry the operation with the returned address/size if the previous range was
> +partially successful.
> +
> +Both address and size should be page aligned and the supported attributes can be
> +retrieved with KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES.
> +
> +The "flags" field may be used for future extensions and should be set to 0s.
> +
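
The retry behaviour described above can be sketched as a loop. This is only
an illustration under assumptions: set_attrs_fn and mock_set_one_page are
hypothetical stand-ins for the proposed ioctl, and struct mem_attrs merely
mirrors the field layout quoted above.

```c
#include <stdint.h>

/* Mirrors the layout of the structure quoted above (illustrative copy). */
struct mem_attrs {
    uint64_t address;
    uint64_t size;
    uint64_t attributes;
    uint64_t flags;
};

/*
 * Hypothetical stand-in for the ioctl: returns 0 on (possibly partial)
 * success after advancing address and shrinking size, <0 on error.
 */
typedef int (*set_attrs_fn)(struct mem_attrs *attrs);

/* Keep calling until the whole range is done or an error occurs. */
static int set_attrs_all(set_attrs_fn set, struct mem_attrs *attrs)
{
    while (attrs->size != 0) {
        uint64_t before = attrs->size;
        int ret = set(attrs);

        if (ret < 0) {
            return ret;
        }
        if (attrs->size >= before) {
            return -1;          /* no progress: give up rather than spin */
        }
    }
    return 0;
}

/* Mock that handles at most one 4 KiB page per call (an assumption). */
static int mock_set_one_page(struct mem_attrs *attrs)
{
    uint64_t done = attrs->size < 4096 ? attrs->size : 4096;

    attrs->address += done;
    attrs->size -= done;
    return 0;
}
```

With the mock, a three-page request needs three calls and ends with size
equal to 0 and address pointing just past the range.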

...

> +static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> +struct kvm_memory_attributes *attrs)
> +{
> + gfn_t start, end;
> + unsigned long i;
> + void *entry;
> + u64 supported_attrs = kvm_supported_mem_attributes(kvm);
> +
> + /* flags is currently not used. */
> + if (attrs->flags)
> + return -EINVAL;
> + if (attrs->attributes & ~supported_attrs)
> + return -EINVAL;
> + if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
> + return -EINVAL;
> + if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> + return -EINVAL;
> +
> + start = attrs->address >> PAGE_SHIFT;
> + end = (attrs->address + attrs->size - 1 + PAGE_SIZE) >> PAGE_SHIFT;

Here PAGE_SIZE and -1 cancel out.

Consider using gpa_to_gfn as well.
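
A small check of the arithmetic, assuming 4 KiB pages (the PAGE_SHIFT of 12
is an assumption for this sketch, and end_gfn_patch and end_gfn_simple are
illustrative names, not kernel functions). Since both address and size were
already checked with PAGE_ALIGNED, the rounding-up has no effect:

```c
#include <stdint.h>

#define PAGE_SHIFT 12                           /* assumed 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

typedef uint64_t gfn_t;

/* End gfn (exclusive) as computed in the patch. */
static gfn_t end_gfn_patch(uint64_t address, uint64_t size)
{
    return (address + size - 1 + PAGE_SIZE) >> PAGE_SHIFT;
}

/* Simplified form; equal to the above only for page-aligned inputs. */
static gfn_t end_gfn_simple(uint64_t address, uint64_t size)
{
    return (address + size) >> PAGE_SHIFT;
}
```

For address 0x1000 and size 0x3000 both forms give gfn 4. They differ only
for unaligned input (for example size 0x3001 gives 5 versus 4), which the
earlier PAGE_ALIGNED checks already reject.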


Re: [PATCH 2/3] intel-iommu: fail DEVIOTLB_UNMAP without dt mode

2022-12-06 Thread Eric Auger
Hi Jason,

On 11/29/22 09:10, Jason Wang wrote:
> Without dt mode, device IOTLB notifier won't work since guest won't
> send device IOTLB invalidation descriptor in this case. Let's fail
> early instead of misbehaving silently.
>
> Signed-off-by: Jason Wang 
> ---
>  hw/i386/intel_iommu.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 9143376677..d025ef2873 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3179,6 +3179,7 @@ static int vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>  {
>  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>  IntelIOMMUState *s = vtd_as->iommu_state;
> +X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
>  
>  /* TODO: add support for VFIO and vhost users */
>  if (s->snoop_control) {
> @@ -3193,6 +3194,13 @@ static int vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>   PCI_FUNC(vtd_as->devfn));
>  return -ENOTSUP;
>  }
> +if (!x86_iommu->dt_supported && (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP)) {
> +error_setg_errno(errp, ENOTSUP,
> + "device %02x.%02x.%x requires device IOTLB mode",
Maybe specify "Intel IOMMU device-IOTLB mode"; otherwise this may be
confused with the device ATS capability.

While thinking about this error handling (including the SMMU cases),
nothing should really prevent you from registering a notifier that is
not signalled. Maybe we should add to the documentation that any attempt
to register an IOMMU notifier with an IOMMU MR that is not able to signal
it will return an error.

Besides
Reviewed-by: Eric Auger 

Eric
> + pci_bus_num(vtd_as->bus), PCI_SLOT(vtd_as->devfn),
> + PCI_FUNC(vtd_as->devfn));
> +return -ENOTSUP;
> +}
>  
>  /* Update per-address-space notifier flags */
>  vtd_as->notifier_flags = new;




Re: [PATCH 1/3] intel-iommu: fail MAP notifier without caching mode

2022-12-06 Thread Eric Auger
Hi Jason,

On 11/29/22 09:10, Jason Wang wrote:
> Without caching mode, MAP notifier won't work correctly since guest
> won't send IOTLB update event when it establishes new mappings in the
> I/O page tables. Let's fail the IOMMU notifiers early instead of
> misbehaving silently.
>
> Signed-off-by: Jason Wang 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  hw/i386/intel_iommu.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index a08ee85edf..9143376677 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3186,6 +3186,13 @@ static int vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>   "Snoop Control with vhost or VFIO is not 
> supported");
>  return -ENOTSUP;
>  }
> +if (!s->caching_mode && (new & IOMMU_NOTIFIER_MAP)) {
> +error_setg_errno(errp, ENOTSUP,
> + "device %02x.%02x.%x requires caching mode",
> + pci_bus_num(vtd_as->bus), PCI_SLOT(vtd_as->devfn),
> + PCI_FUNC(vtd_as->devfn));
> +return -ENOTSUP;
> +}
>  
>  /* Update per-address-space notifier flags */
>  vtd_as->notifier_flags = new;




Re: [PATCH] intel-iommu: Document iova_tree

2022-12-06 Thread Eric Auger
Hi Peter,
On 12/6/22 00:28, Peter Xu wrote:
> On Mon, Dec 05, 2022 at 12:23:20PM +0800, Jason Wang wrote:
>> On Fri, Dec 2, 2022 at 12:25 AM Peter Xu  wrote:
>>> It seems not super clear on when iova_tree is used, and why.  Add a rich
>>> comment above iova_tree to track why we needed the iova_tree, and when we
>>> need it.
>>>
>>> Suggested-by: Jason Wang 
>>> Signed-off-by: Peter Xu 
>>> ---
>>>  include/hw/i386/intel_iommu.h | 30 +-
>>>  1 file changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>> index 46d973e629..8d130ab2e3 100644
>>> --- a/include/hw/i386/intel_iommu.h
>>> +++ b/include/hw/i386/intel_iommu.h
>>> @@ -109,7 +109,35 @@ struct VTDAddressSpace {
>>>  QLIST_ENTRY(VTDAddressSpace) next;
>>>  /* Superset of notifier flags that this address space has */
>>>  IOMMUNotifierFlag notifier_flags;
>>> -IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
>>> +/*
>>> + * @iova_tree traces mapped IOVA ranges.
>>> + *
>>> + * The tree is not needed if no MAP notifiers is registered with
>>> + * current VTD address space, because all UNMAP (including iotlb or
>>> + * dev-iotlb) events can be transparently delivered to !MAP iommu
>>> + * notifiers.
>> So this means the UNMAP notifier doesn't need to be as accurate as
>> MAP. (Should we document it in the notifier headers)?
> Yes.
>
>> For MAP[a, b] MAP[b, c] we can do a UNMAP[a. c].
> IIUC a better way to say this is, for MAP[a, b] we can do an UNMAP[a-X,
> b+Y] as long as the range covers [a, b]?
>
>>> + *
>>> + * The tree OTOH is required for MAP typed iommu notifiers for a few
>>> + * reasons.
>>> + *
>>> + * Firstly, there's no way to identify whether an PSI event is MAP or
>>> + * UNMAP within the PSI message itself.  Without having prior knowledge
>>> + * of existing state vIOMMU doesn't know whether it should notify MAP
>>> + * or UNMAP for a PSI message it received.
>>> + *
>>> + * Secondly, PSI received from guest driver (or even a large PSI can
>>> + * grow into a DSI at least with Linux intel-iommu driver) can be
>>> + * larger in range than the newly mapped ranges for either MAP or UNMAP
>>> + * events.
>> Yes, so I think we need a document that the UNMAP handler should be
>> prepared for this.
> How about I squash below into this same patch?
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 91f8a2395a..c83bd11a68 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -129,6 +129,24 @@ struct IOMMUTLBEntry {
>  /*
>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>   * register with one or multiple IOMMU Notifier capability bit(s).
> + *
> + * Normally there're two use cases for the notifiers:
> + *
> + *   (1) When the device needs accurate synchronizations of the vIOMMU page
"accurate synchronizations" sounds too vague and subjective to me.
> + *   tables, it needs to register with both MAP|UNMAP notifies (which
> + *   is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
> + *   events are registered, the notifications will be accurate but
> + *   there's overhead on synchronizing the guest vIOMMU page tables.
> + *
> + *   (2) When the device doesn't need accurate synchronizations of the
> + *   vIOMMU page tables (when the device can both cache translations
> + *   and requesting to translate dynamically during DMA process), it
s/requesting/request
> + *   needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
It would be nice to clarify the distinction between the two, then.
> + *   Note that in such working mode shadow page table is not used for
> + *   vIOMMU unit on this address space, so the UNMAP messages can be
I do not understand 'is not used for vIOMMU unit on this address space'.
> + *   actually larger than the real invalidations (just like how the
> + *   Linux IOMMU driver normally works, where an invalidation can be
> + *   enlarged as long as it still covers the target range).
>   */
>  typedef enum {
>  IOMMU_NOTIFIER_NONE = 0,
>
> Thanks,
>
Thanks

Eric
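
For reference, the covering rule discussed above (for MAP[a, b], an
UNMAP[a-X, b+Y] is fine as long as it covers [a, b]) can be sketched as
follows. The struct and function names are made up for illustration and are
not part of the QEMU API; ranges are inclusive, and iova is assumed to be
aligned to addr_mask + 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive IOVA range [start, end]; illustrative type, not from QEMU. */
struct iova_range {
    uint64_t start;
    uint64_t end;
};

/* Build a range from an iova/addr_mask pair (iova aligned to mask + 1). */
static struct iova_range range_from_mask(uint64_t iova, uint64_t addr_mask)
{
    struct iova_range r = { iova, iova | addr_mask };

    return r;
}

/* An UNMAP is acceptable for a mapping if it covers the mapping fully. */
static bool unmap_covers(struct iova_range unmap, struct iova_range mapped)
{
    return unmap.start <= mapped.start && unmap.end >= mapped.end;
}
```

Under this rule a whole-address-space UNMAP (iova 0 with an all-ones mask)
covers every mapping, while an UNMAP that stops short of the end of a
mapping does not.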



