Re: [PATCH] qmp: Stabilize preconfig

2021-11-12 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 11/12/21 12:48, Markus Armbruster wrote:
>>> The monitor starts, the question is the availability of the event loop.
>> 
>> What does the event loop depend on?
>
> It depends on moving the relevant code out of qemu_init (at least 
> conditionally, as is the case for what is in qmp_x_exit_preconfig). 
> This in turn has the problem that it's ugly to have lingering unapplied 
> settings from the command line.
>
> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
> and the VM is ready.  Devices plugged in this phase already count as
> hot-plugged.  -S starts the monitor here.
>> 
>> Why would anyone *want* to plug a device in PHASE_MACHINE_READY (when
>> the plug is hot) instead of earlier (when it's cold)?
>
> Well, PHASE_MACHINE_READY includes the whole time the guest is running. 
>   So the simplest thing to do is to tell the user "if it hurts, don't do 
> it".  If you want a cold-plugged device, plug it during 
> PHASE_MACHINE_INIT, which right now means on the command line.

One, we don't tell users anything of the sort as far as I can tell, and
two, I'm afraid you missed my question :)

I'm not asking what to do "if it hurts", or "if you want a cold-plugged
device".  I'm asking whether there's a reason for ever wanting hot plug
instead of cold plug.  Or in other words, what can hot plug possibly
gain us over cold plug?

As far as I know, the answer is "nothing but trouble".

If that's true, then what we should tell users is to stick to -device
for initial configuration, and stay away from device_add.

Such advice would rain on the "configure everything with QMP" parade.
No big deal, we already know that parade needs plenty of work before it
can hit main street, and having to provide a way to cold plug with QMP
is merely yet another sub-task.

>>>> Related question: when exactly in these phases do we create devices
>>>> specified with -device?
>>>
>>> In PHASE_MACHINE_INIT---that is, after the machine has been initialized
>>> and before machine-done-notifiers have been called.
>> 
>> In other words, you should never use device_add where -device would do,
>> because the latter gives you cold plug (which is simple and reliable),
>> and the former hot plug (which is the opposite).
>
> Exactly.
>
>>> No, because the monitor goes directly from a point where device_add 
>>> fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
>>> (PHASE_MACHINE_READY).
>> 
>> Bummer.
>
> True, but consider that these "phases" were reconstructed ex post.  It's 
> not like x-exit-preconfig was designed to skip PHASE_MACHINE_INIT; it's 
> just that preconfig used to call qemu_main_loop() at the point which is 
> now known as PHASE_ACCEL_CREATED.

Understood.  I'm just trying to map the terrain so we can hopefully get
from here to a better place.


[...]

>>>> The earlier the monitor becomes available, the better.
>>>> Ideally, we'd process the command line strictly left to right, and fail
>>>> options that are "out of phase".  Make the monitor available right when
>>>> we process its -mon.  The -chardev for its character device must precede
>>>> it.
>>>
>>> The boat for this has sailed.  The only sane way to do this is a new binary.
>> 
>> "Ideally" still applies to any new binary.
>
> Well, "ideally" any new binary would only have a few command line 
> options, and ordering would be mostly irrelevant.  For example I'd 
> expect a QMP binary to only have a few options, mostly for 
> debugging/development (-L, -trace) and for process-wide settings (such 
> as -name).

This is where we disagree.

For me, a new, alternative qemu-system-FOO binary should be able to
replace the warty one we have.

One important kind of user is management applications.  Libvirt
developers tell us that they'd like to configure as much as possible via
QMP.

Another kind of user dear to me is me^H^Hdevelopers.  For ad hoc
testing, having to configure via QMP is a pain we'd rather do without.  A
combination of configuration file(s), CLI and HMP is much quicker.  I
don't want to remain stuck on the traditional binary, I want to do this
with the new one.

Catering to this kind of user should not be hard.  All it takes is a
sensibly designed startup.  Rough sketch without much thought:

1. Start event loop

2. Feed it CLI left to right.  Each option runs a handler just like each
   QMP command does.

   Options that read a configuration file inject the file into the feed.

   Options that create a monitor create it suspended.

   Options may advance the phase / run state, and they may require
   certain phase(s).

3. When we're done with CLI, resume any monitors we created.

4. Monitors now feed commands to the event loop.  Commands may advance
   the phase / run state, and they may require certain phase(s).

>>>> Likewise, we'd fail QMP commands that are "out of phase".
>>>> @allow-preconfig is a crutch that only exists because we're afraid (with
>>>> reason) of hidden assumptions

[RFC v3 5/5] *-user: move safe-syscall.* to common-user

2021-11-12 Thread Warner Losh
Move linux-user/safe-syscall.S to common-user/common-safe-syscall.S and
replace it with a #include "common-safe-syscall.S" so that bsd-user can
also use it. Also move safe-syscall.h so that it can define a few more
externs.

Signed-off-by: Warner Losh 
---
 common-user/common-safe-syscall.S  | 30 +
 {linux-user => common-user}/safe-syscall.h |  0
 linux-user/safe-syscall.S  | 31 +-
 linux-user/signal.c|  1 +
 meson.build|  1 +
 5 files changed, 33 insertions(+), 30 deletions(-)
 create mode 100644 common-user/common-safe-syscall.S
 rename {linux-user => common-user}/safe-syscall.h (100%)

diff --git a/common-user/common-safe-syscall.S b/common-user/common-safe-syscall.S
new file mode 100644
index 00..42ea7c40ba
--- /dev/null
+++ b/common-user/common-safe-syscall.S
@@ -0,0 +1,30 @@
+/*
+ * safe-syscall.S : include the host-specific assembly fragment
+ * to handle signals occurring at the same time as system calls.
+ *
+ * Written by Peter Maydell 
+ *
+ * Copyright (C) 2016 Linaro Limited
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "hostdep.h"
+#include "target_errno_defs.h"
+
+/* We have the correct host directory on our include path
+ * so that this will pull in the right fragment for the architecture.
+ */
+#ifdef HAVE_SAFE_SYSCALL
+#include "safe-syscall.inc.S"
+#endif
+
+/* We must specifically say that we're happy for the stack to not be
+ * executable, otherwise the toolchain will default to assuming our
+ * assembly needs an executable stack and the whole QEMU binary will
+ * needlessly end up with one. This should be the last thing in this file.
+ */
+#if defined(__linux__) && defined(__ELF__)
+.section .note.GNU-stack, "", %progbits
+#endif
diff --git a/linux-user/safe-syscall.h b/common-user/safe-syscall.h
similarity index 100%
rename from linux-user/safe-syscall.h
rename to common-user/safe-syscall.h
diff --git a/linux-user/safe-syscall.S b/linux-user/safe-syscall.S
index 42ea7c40ba..c86f0aea74 100644
--- a/linux-user/safe-syscall.S
+++ b/linux-user/safe-syscall.S
@@ -1,30 +1 @@
-/*
- * safe-syscall.S : include the host-specific assembly fragment
- * to handle signals occurring at the same time as system calls.
- *
- * Written by Peter Maydell 
- *
- * Copyright (C) 2016 Linaro Limited
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#include "hostdep.h"
-#include "target_errno_defs.h"
-
-/* We have the correct host directory on our include path
- * so that this will pull in the right fragment for the architecture.
- */
-#ifdef HAVE_SAFE_SYSCALL
-#include "safe-syscall.inc.S"
-#endif
-
-/* We must specifically say that we're happy for the stack to not be
- * executable, otherwise the toolchain will default to assuming our
- * assembly needs an executable stack and the whole QEMU binary will
- * needlessly end up with one. This should be the last thing in this file.
- */
-#if defined(__linux__) && defined(__ELF__)
-.section .note.GNU-stack, "", %progbits
-#endif
+#include "common-safe-syscall.S"
diff --git a/linux-user/signal.c b/linux-user/signal.c
index ee038c2399..cfda166f9c 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -31,6 +31,7 @@
 #include "trace.h"
 #include "signal-common.h"
 #include "host-signal.h"
+#include "safe-syscall.h"
 
 static struct target_sigaction sigact_table[TARGET_NSIG];
 
diff --git a/meson.build b/meson.build
index 728d305403..2f3b0fb2d6 100644
--- a/meson.build
+++ b/meson.build
@@ -2873,6 +2873,7 @@ foreach target : target_dirs
   base_dir = 'linux-user'
   target_inc += include_directories('linux-user/host/' / config_host['ARCH'])
   target_inc += include_directories('common-user/host/' / config_host['ARCH'])
+  target_inc += include_directories('common-user')
 endif
 if 'CONFIG_BSD_USER' in config_target
   base_dir = 'bsd-user'
-- 
2.33.0




[RFC v3 1/5] linux-user: Add host_signal_set_pc to set pc in mcontext

2021-11-12 Thread Warner Losh
Add a new function host_signal_set_pc to set the next pc in an
mcontext. The caller should ensure this is a valid PC for execution.

Signed-off-by: Warner Losh 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 linux-user/host/aarch64/host-signal.h | 5 +
 linux-user/host/alpha/host-signal.h   | 5 +
 linux-user/host/arm/host-signal.h | 5 +
 linux-user/host/i386/host-signal.h| 5 +
 linux-user/host/mips/host-signal.h| 5 +
 linux-user/host/ppc/host-signal.h | 5 +
 linux-user/host/riscv/host-signal.h   | 5 +
 linux-user/host/s390/host-signal.h| 5 +
 linux-user/host/sparc/host-signal.h   | 9 +
 linux-user/host/x86_64/host-signal.h  | 5 +
 10 files changed, 54 insertions(+)

diff --git a/linux-user/host/aarch64/host-signal.h b/linux-user/host/aarch64/host-signal.h
index 0c0b08383a..9770b36dc1 100644
--- a/linux-user/host/aarch64/host-signal.h
+++ b/linux-user/host/aarch64/host-signal.h
@@ -35,6 +35,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.pc;
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.pc = pc;
+}
+
 static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
 {
     struct _aarch64_ctx *hdr;
diff --git a/linux-user/host/alpha/host-signal.h b/linux-user/host/alpha/host-signal.h
index e080be412f..f4c942948a 100644
--- a/linux-user/host/alpha/host-signal.h
+++ b/linux-user/host/alpha/host-signal.h
@@ -16,6 +16,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.sc_pc;
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.sc_pc = pc;
+}
+
 static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
 {
     uint32_t *pc = (uint32_t *)host_signal_pc(uc);
diff --git a/linux-user/host/arm/host-signal.h b/linux-user/host/arm/host-signal.h
index efb165c0c5..6c095773c0 100644
--- a/linux-user/host/arm/host-signal.h
+++ b/linux-user/host/arm/host-signal.h
@@ -16,6 +16,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.arm_pc;
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.arm_pc = pc;
+}
+
 static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
 {
     /*
diff --git a/linux-user/host/i386/host-signal.h b/linux-user/host/i386/host-signal.h
index 4c8eef99ce..abe1ece5c9 100644
--- a/linux-user/host/i386/host-signal.h
+++ b/linux-user/host/i386/host-signal.h
@@ -16,6 +16,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.gregs[REG_EIP];
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.gregs[REG_EIP] = pc;
+}
+
 static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
 {
     return uc->uc_mcontext.gregs[REG_TRAPNO] == 0xe
diff --git a/linux-user/host/mips/host-signal.h b/linux-user/host/mips/host-signal.h
index ef341f7c20..c666ed8c3f 100644
--- a/linux-user/host/mips/host-signal.h
+++ b/linux-user/host/mips/host-signal.h
@@ -16,6 +16,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.pc;
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.pc = pc;
+}
+
 #if defined(__misp16) || defined(__mips_micromips)
 #error "Unsupported encoding"
 #endif
diff --git a/linux-user/host/ppc/host-signal.h b/linux-user/host/ppc/host-signal.h
index a491c413dc..1d8e658ff7 100644
--- a/linux-user/host/ppc/host-signal.h
+++ b/linux-user/host/ppc/host-signal.h
@@ -16,6 +16,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.regs->nip;
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.regs->nip = pc;
+}
+
 static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
 {
     return uc->uc_mcontext.regs->trap != 0x400
diff --git a/linux-user/host/riscv/host-signal.h b/linux-user/host/riscv/host-signal.h
index 3b168cb58b..a4f170efb0 100644
--- a/linux-user/host/riscv/host-signal.h
+++ b/linux-user/host/riscv/host-signal.h
@@ -16,6 +16,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.__gregs[REG_PC];
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.__gregs[REG_PC] = pc;
+}
+
 static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
 {
     /*
diff --git a/linux-user/host/s390/host-signal.h b/linux-user/host/s390/host-signal.h
index 26990e4893..a524f2ab00 100644
--- a/linux-user/host/s390/host-signal.h
+++ b/linux-user/host/s390/host-signal.h
@@ -16,6 +16,11 @@ static inline uintptr_t host_signal_pc(ucontext_t *uc)
     return uc->uc_mcontext.psw.addr;
 }
 
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+    uc->uc_mcontext.psw.addr = pc;
+}
+
 static inline bool 

[RFC v3 2/5] linux-user/signal.c: Create a common rewind_if_in_safe_syscall

2021-11-12 Thread Warner Losh
All instances of rewind_if_in_safe_syscall are the same, differing only
in how the instruction pointer is fetched from the ucontext and the size
of the registers. Use the host_signal_pc and new host_signal_set_pc
interfaces to fetch the PC and adjust it if needed. Delete
all the old copies of rewind_if_in_safe_syscall.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
---
 linux-user/host/aarch64/hostdep.h | 20 
 linux-user/host/arm/hostdep.h | 20 
 linux-user/host/i386/hostdep.h| 20 
 linux-user/host/ppc64/hostdep.h   | 20 
 linux-user/host/riscv/hostdep.h   | 20 
 linux-user/host/s390x/hostdep.h   | 20 
 linux-user/host/x86_64/hostdep.h  | 20 
 linux-user/safe-syscall.h |  3 +++
 linux-user/signal.c   | 14 +-
 9 files changed, 16 insertions(+), 141 deletions(-)

diff --git a/linux-user/host/aarch64/hostdep.h b/linux-user/host/aarch64/hostdep.h
index a8d41a21ad..39299d798a 100644
--- a/linux-user/host/aarch64/hostdep.h
+++ b/linux-user/host/aarch64/hostdep.h
@@ -15,24 +15,4 @@
 /* We have a safe-syscall.inc.S */
 #define HAVE_SAFE_SYSCALL
 
-#ifndef __ASSEMBLER__
-
-/* These are defined by the safe-syscall.inc.S file */
-extern char safe_syscall_start[];
-extern char safe_syscall_end[];
-
-/* Adjust the signal context to rewind out of safe-syscall if we're in it */
-static inline void rewind_if_in_safe_syscall(void *puc)
-{
-    ucontext_t *uc = puc;
-    __u64 *pcreg = &uc->uc_mcontext.pc;
-
-    if (*pcreg > (uintptr_t)safe_syscall_start
-        && *pcreg < (uintptr_t)safe_syscall_end) {
-        *pcreg = (uintptr_t)safe_syscall_start;
-    }
-}
-
-#endif /* __ASSEMBLER__ */
-
 #endif
diff --git a/linux-user/host/arm/hostdep.h b/linux-user/host/arm/hostdep.h
index 9276fe6ceb..86b137875a 100644
--- a/linux-user/host/arm/hostdep.h
+++ b/linux-user/host/arm/hostdep.h
@@ -15,24 +15,4 @@
 /* We have a safe-syscall.inc.S */
 #define HAVE_SAFE_SYSCALL
 
-#ifndef __ASSEMBLER__
-
-/* These are defined by the safe-syscall.inc.S file */
-extern char safe_syscall_start[];
-extern char safe_syscall_end[];
-
-/* Adjust the signal context to rewind out of safe-syscall if we're in it */
-static inline void rewind_if_in_safe_syscall(void *puc)
-{
-    ucontext_t *uc = puc;
-    unsigned long *pcreg = &uc->uc_mcontext.arm_pc;
-
-    if (*pcreg > (uintptr_t)safe_syscall_start
-        && *pcreg < (uintptr_t)safe_syscall_end) {
-        *pcreg = (uintptr_t)safe_syscall_start;
-    }
-}
-
-#endif /* __ASSEMBLER__ */
-
 #endif
diff --git a/linux-user/host/i386/hostdep.h b/linux-user/host/i386/hostdep.h
index 073be74d87..ce7136501f 100644
--- a/linux-user/host/i386/hostdep.h
+++ b/linux-user/host/i386/hostdep.h
@@ -15,24 +15,4 @@
 /* We have a safe-syscall.inc.S */
 #define HAVE_SAFE_SYSCALL
 
-#ifndef __ASSEMBLER__
-
-/* These are defined by the safe-syscall.inc.S file */
-extern char safe_syscall_start[];
-extern char safe_syscall_end[];
-
-/* Adjust the signal context to rewind out of safe-syscall if we're in it */
-static inline void rewind_if_in_safe_syscall(void *puc)
-{
-    ucontext_t *uc = puc;
-    greg_t *pcreg = &uc->uc_mcontext.gregs[REG_EIP];
-
-    if (*pcreg > (uintptr_t)safe_syscall_start
-        && *pcreg < (uintptr_t)safe_syscall_end) {
-        *pcreg = (uintptr_t)safe_syscall_start;
-    }
-}
-
-#endif /* __ASSEMBLER__ */
-
 #endif
diff --git a/linux-user/host/ppc64/hostdep.h b/linux-user/host/ppc64/hostdep.h
index 98979ad917..0c290dd904 100644
--- a/linux-user/host/ppc64/hostdep.h
+++ b/linux-user/host/ppc64/hostdep.h
@@ -15,24 +15,4 @@
 /* We have a safe-syscall.inc.S */
 #define HAVE_SAFE_SYSCALL
 
-#ifndef __ASSEMBLER__
-
-/* These are defined by the safe-syscall.inc.S file */
-extern char safe_syscall_start[];
-extern char safe_syscall_end[];
-
-/* Adjust the signal context to rewind out of safe-syscall if we're in it */
-static inline void rewind_if_in_safe_syscall(void *puc)
-{
-    ucontext_t *uc = puc;
-    unsigned long *pcreg = &uc->uc_mcontext.gp_regs[PT_NIP];
-
-    if (*pcreg > (uintptr_t)safe_syscall_start
-        && *pcreg < (uintptr_t)safe_syscall_end) {
-        *pcreg = (uintptr_t)safe_syscall_start;
-    }
-}
-
-#endif /* __ASSEMBLER__ */
-
 #endif
diff --git a/linux-user/host/riscv/hostdep.h b/linux-user/host/riscv/hostdep.h
index 2ba07456ae..7f67c22868 100644
--- a/linux-user/host/riscv/hostdep.h
+++ b/linux-user/host/riscv/hostdep.h
@@ -11,24 +11,4 @@
 /* We have a safe-syscall.inc.S */
 #define HAVE_SAFE_SYSCALL
 
-#ifndef __ASSEMBLER__
-
-/* These are defined by the safe-syscall.inc.S file */
-extern char safe_syscall_start[];
-extern char safe_syscall_end[];
-
-/* Adjust the signal context to rewind out of safe-syscall if we're in it */
-static inline void rewind_if_in_safe_syscall(void *puc)
-{
-    ucontext_t *uc = puc;
-    unsigned long *pcreg = 

[RFC v3 4/5] common-user: Adjust system call return on FreeBSD

2021-11-12 Thread Warner Losh
All the *-users generally use negative errno return codes to signal
errors from a system call.  FreeBSD's system calls, on the other hand,
return a positive errno with the carry flag set, not -errno. Add ifdefs
for FreeBSD to make the adjustment on the 4 hosts that we have support for.

Signed-off-by: Warner Losh 
---
 common-user/host/aarch64/safe-syscall.inc.S | 8 
 common-user/host/arm/safe-syscall.inc.S | 7 +++
 common-user/host/i386/safe-syscall.inc.S| 9 +
 common-user/host/x86_64/safe-syscall.inc.S  | 9 +
 4 files changed, 33 insertions(+)

diff --git a/common-user/host/aarch64/safe-syscall.inc.S b/common-user/host/aarch64/safe-syscall.inc.S
index bc1f5a9792..9f9525fe25 100644
--- a/common-user/host/aarch64/safe-syscall.inc.S
+++ b/common-user/host/aarch64/safe-syscall.inc.S
@@ -64,6 +64,14 @@ safe_syscall_start:
svc 0x0
 safe_syscall_end:
/* code path for having successfully executed the syscall */
+#ifdef __FreeBSD__
+/*
+ * FreeBSD kernel returns C bit set with positive errno.
+ * Encode this for use in bsd-user as -errno:
+ *     x0 = !c ? x0 : -x0
+ */
+   csneg  x0, x0, x0, cc
+#endif
ret
 
 0:
diff --git a/common-user/host/arm/safe-syscall.inc.S b/common-user/host/arm/safe-syscall.inc.S
index 88c4958504..459e5f87c2 100644
--- a/common-user/host/arm/safe-syscall.inc.S
+++ b/common-user/host/arm/safe-syscall.inc.S
@@ -78,6 +78,13 @@ safe_syscall_start:
swi 0
 safe_syscall_end:
/* code path for having successfully executed the syscall */
+#ifdef __FreeBSD__
+/*
+ * FreeBSD kernel returns C bit set with positive errno.
+ * Encode this for use in bsd-user as -errno:
+ */
+negcs   r0, r0
+#endif
pop { r4, r5, r6, r7, r8, pc }
 
 1:
diff --git a/common-user/host/i386/safe-syscall.inc.S b/common-user/host/i386/safe-syscall.inc.S
index 9e58fc6504..ba55a35e92 100644
--- a/common-user/host/i386/safe-syscall.inc.S
+++ b/common-user/host/i386/safe-syscall.inc.S
@@ -75,6 +75,15 @@ safe_syscall_start:
int $0x80
 safe_syscall_end:
/* code path for having successfully executed the syscall */
+#ifdef __FreeBSD__
+/*
+ * FreeBSD kernel returns C bit set with positive errno.
+ * Encode this for use in bsd-user as -errno:
+ */
+jnb 2f
+neg %eax
+2:
+#endif
pop %ebx
.cfi_remember_state
.cfi_adjust_cfa_offset -4
diff --git a/common-user/host/x86_64/safe-syscall.inc.S b/common-user/host/x86_64/safe-syscall.inc.S
index f36992daa3..46c527e058 100644
--- a/common-user/host/x86_64/safe-syscall.inc.S
+++ b/common-user/host/x86_64/safe-syscall.inc.S
@@ -72,6 +72,15 @@ safe_syscall_start:
 syscall
 safe_syscall_end:
 /* code path for having successfully executed the syscall */
+#ifdef __FreeBSD__
+/*
+ * FreeBSD kernel returns C bit set with positive errno.
+ * Encode this for use in bsd-user as -errno:
+ */
+jnb 2f
+neg %rax
+2:
+#endif
 pop %rbp
 .cfi_remember_state
 .cfi_def_cfa_offset 8
-- 
2.33.0




[RFC v3 3/5] linux-user/safe-syscall.inc.S: Move to common-user

2021-11-12 Thread Warner Losh
Move all the safe-syscall.inc.S files to common-user. They are almost
identical between linux-user and bsd-user, so they can be shared.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 {linux-user => common-user}/host/aarch64/safe-syscall.inc.S | 0
 {linux-user => common-user}/host/arm/safe-syscall.inc.S | 0
 {linux-user => common-user}/host/i386/safe-syscall.inc.S| 0
 {linux-user => common-user}/host/ppc64/safe-syscall.inc.S   | 0
 {linux-user => common-user}/host/riscv/safe-syscall.inc.S   | 0
 {linux-user => common-user}/host/s390x/safe-syscall.inc.S   | 0
 {linux-user => common-user}/host/x86_64/safe-syscall.inc.S  | 0
 meson.build | 1 +
 8 files changed, 1 insertion(+)
 rename {linux-user => common-user}/host/aarch64/safe-syscall.inc.S (100%)
 rename {linux-user => common-user}/host/arm/safe-syscall.inc.S (100%)
 rename {linux-user => common-user}/host/i386/safe-syscall.inc.S (100%)
 rename {linux-user => common-user}/host/ppc64/safe-syscall.inc.S (100%)
 rename {linux-user => common-user}/host/riscv/safe-syscall.inc.S (100%)
 rename {linux-user => common-user}/host/s390x/safe-syscall.inc.S (100%)
 rename {linux-user => common-user}/host/x86_64/safe-syscall.inc.S (100%)

diff --git a/linux-user/host/aarch64/safe-syscall.inc.S b/common-user/host/aarch64/safe-syscall.inc.S
similarity index 100%
rename from linux-user/host/aarch64/safe-syscall.inc.S
rename to common-user/host/aarch64/safe-syscall.inc.S
diff --git a/linux-user/host/arm/safe-syscall.inc.S b/common-user/host/arm/safe-syscall.inc.S
similarity index 100%
rename from linux-user/host/arm/safe-syscall.inc.S
rename to common-user/host/arm/safe-syscall.inc.S
diff --git a/linux-user/host/i386/safe-syscall.inc.S b/common-user/host/i386/safe-syscall.inc.S
similarity index 100%
rename from linux-user/host/i386/safe-syscall.inc.S
rename to common-user/host/i386/safe-syscall.inc.S
diff --git a/linux-user/host/ppc64/safe-syscall.inc.S b/common-user/host/ppc64/safe-syscall.inc.S
similarity index 100%
rename from linux-user/host/ppc64/safe-syscall.inc.S
rename to common-user/host/ppc64/safe-syscall.inc.S
diff --git a/linux-user/host/riscv/safe-syscall.inc.S b/common-user/host/riscv/safe-syscall.inc.S
similarity index 100%
rename from linux-user/host/riscv/safe-syscall.inc.S
rename to common-user/host/riscv/safe-syscall.inc.S
diff --git a/linux-user/host/s390x/safe-syscall.inc.S b/common-user/host/s390x/safe-syscall.inc.S
similarity index 100%
rename from linux-user/host/s390x/safe-syscall.inc.S
rename to common-user/host/s390x/safe-syscall.inc.S
diff --git a/linux-user/host/x86_64/safe-syscall.inc.S b/common-user/host/x86_64/safe-syscall.inc.S
similarity index 100%
rename from linux-user/host/x86_64/safe-syscall.inc.S
rename to common-user/host/x86_64/safe-syscall.inc.S
diff --git a/meson.build b/meson.build
index 9702fdce6d..728d305403 100644
--- a/meson.build
+++ b/meson.build
@@ -2872,6 +2872,7 @@ foreach target : target_dirs
 if 'CONFIG_LINUX_USER' in config_target
   base_dir = 'linux-user'
   target_inc += include_directories('linux-user/host/' / config_host['ARCH'])
+  target_inc += include_directories('common-user/host/' / config_host['ARCH'])
 endif
 if 'CONFIG_BSD_USER' in config_target
   base_dir = 'bsd-user'
-- 
2.33.0




[RFC v3 0/5] linux-user: simplify safe signal handling

2021-11-12 Thread Warner Losh
This is a quick RFC to see if something like this is worth doing.

I've created a new interface host_signal_set_pc. This allows us to move all the
nearly identical copies of rewind_if_in_safe_syscall into signal.c.  This
reduces the amount of code that needs to be rewritten for bsd-user's adaptation
of both the safe signal handling and the sigsegv/sigbus changes that have
happened. Since BSD's mcontext_t differs in some cases, we wouldn't be able to
share this between platforms, but it reduces the number of nearly identical
routines I'd have to write.

In addition, the assembler glue for the safe system calls is almost identical
between linux-user and bsd-user's fork. The only difference is negating the
system call return value to comply with the -errno convention that *-user uses
in the rest of the code. That convention is native to Linux, but differs from
the BSDs and other traditional Unix targets.

I know the patches may not be sliced and diced in the typical desired
fashion. This is an RFC, and the changes are short and repetitive enough
to be easily digested. They now pass a push to gitlab and the default CI
(see the note in the v2 section about one ugly kludge that likely needs
discussion).

These were extracted from the 'blitz' branch we have in the bsd-user fork and
then adapted to use the common code. I've pushed a branch to gitlab
(viewable at https://gitlab.com/bsdimp/qemu/-/tree/blitz if you prefer that to
fetching) that shows how these will be used. I'm working on upstreaming
bsd-user/signal.c next, which will take a little bit of time to get into a state
where it can be reviewed here. I wanted to get feedback now because this is
one chunk I can cleave off to make landing that easier.

v3: o Make arm and aarch64 fixes as suggested in the review
o Fix a stray & that remained after some churn for 32-bit sparc,
  clearly not compiled in our CI pipeline...
o Fix the comments to be more descriptive as to the errno convention
  and not characterize it as the Linux way.

v2:
o move the externs for the system call setup to safe-syscall.h
o move to using the #ifdef __FreeBSD__ code for FreeBSD's adjustment
  to return value from system calls.
o move safe-syscall.inc to common-user so bsd-user can use it too
o create a kludge for mips to allow CI to pass (but maybe we should
  remove mips hosts as a supported platform instead)
o side note: the blitz bsd-user branch hasn't been updated yet since
  I think the first two of this series may be merged early to solve
  a different problem.

Warner Losh (5):
  linux-user: Add host_signal_set_pc to set pc in mcontext
  linux-user/signal.c: Create a common rewind_if_in_safe_syscall
  linux-user/safe-syscall.inc.S: Move to common-user
  common-user: Adjust system call return on FreeBSD
  *-user: move safe-syscall.* to common-user

 common-user/common-safe-syscall.S | 30 ++
 .../host/aarch64/safe-syscall.inc.S   |  8 +
 .../host/arm/safe-syscall.inc.S   |  7 +
 .../host/i386/safe-syscall.inc.S  |  9 ++
 .../host/ppc64/safe-syscall.inc.S |  0
 .../host/riscv/safe-syscall.inc.S |  0
 .../host/s390x/safe-syscall.inc.S |  0
 .../host/x86_64/safe-syscall.inc.S|  9 ++
 {linux-user => common-user}/safe-syscall.h|  3 ++
 linux-user/host/aarch64/host-signal.h |  5 +++
 linux-user/host/aarch64/hostdep.h | 20 
 linux-user/host/alpha/host-signal.h   |  5 +++
 linux-user/host/arm/host-signal.h |  5 +++
 linux-user/host/arm/hostdep.h | 20 
 linux-user/host/i386/host-signal.h|  5 +++
 linux-user/host/i386/hostdep.h| 20 
 linux-user/host/mips/host-signal.h|  5 +++
 linux-user/host/ppc/host-signal.h |  5 +++
 linux-user/host/ppc64/hostdep.h   | 20 
 linux-user/host/riscv/host-signal.h   |  5 +++
 linux-user/host/riscv/hostdep.h   | 20 
 linux-user/host/s390/host-signal.h|  5 +++
 linux-user/host/s390x/hostdep.h   | 20 
 linux-user/host/sparc/host-signal.h   |  9 ++
 linux-user/host/x86_64/host-signal.h  |  5 +++
 linux-user/host/x86_64/hostdep.h  | 20 
 linux-user/safe-syscall.S | 31 +--
 linux-user/signal.c   | 15 -
 meson.build   |  2 ++
 29 files changed, 137 insertions(+), 171 deletions(-)
 create mode 100644 common-user/common-safe-syscall.S
 rename {linux-user => common-user}/host/aarch64/safe-syscall.inc.S (92%)
 rename {linux-user => common-user}/host/arm/safe-syscall.inc.S (93%)
 rename {linux-user => common-user}/host/i386/safe-syscall.inc.S (93%)
 rename {linux-user => 

Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-12 Thread WANG Xuerui

On 11/12/21 22:05, Richard Henderson wrote:

On 11/12/21 7:53 AM, Song Gao wrote:

+#
+# Fields
+#
+%rd  0:5
+%rj  5:5
+%rk  10:5
+%sa2 15:2
+%si12    10:s12
+%ui12    10:12
+%si16    10:s16
+%si20    5:s20


You should only create separate field definitions like this when they 
are complex: e.g. the logical field is disjoint or there's a need for 
!function.



+
+#
+# Argument sets
+#
+_rdrjrk rd rj rk
+_rdrjsi12   rd rj si12
+_rdrjrksa2  rd rj rk sa2
+_rdrjsi16   rd rj si16
+_rdrjui12   rd rj ui12
+_rdsi20 rd si20


Some of these should be combined.  The width of the immediate is a 
detail of the format, not the decoded argument set.  Thus you should have


_rdimm rd imm
_rdrjimm   rd rj imm
_rdrjrk    rd rj rk
_rdrjrksa  rd rj rk sa


I'd like to add that the organization of the whole decodetree file
closely resembles that of the ISA manual, most likely on purpose (though
this is not stated anywhere in the patch). However, the manual itself is not
without errors or inconsistencies; for example, the 9 "base instruction
formats" classification is nowhere near accurate, and here we can see
the author is forced to create ad-hoc names (repeating the operand
slots). I suggest just generating the descriptions from the
loongarch-opcodes project [1]; there is no need to duplicate work. I'll
happily help if you decide to do that.


[1]: https://github.com/loongson-community/loongarch-opcodes



+alsl_w     010 .. . . . @fmt_rdrjrksa2
+alsl_wu    011 .. . . . @fmt_rdrjrksa2
+alsl_d    0010 110 .. . . . @fmt_rdrjrksa2


The encoding of these insns is that the shift is sa+1.

While you compensate for this in gen_alsl_*, we print the "wrong" 
number in the disassembly.  I think it would be better to do


%sa2p1 15:2 !function=plus_1
@fmt_rdrjrksa2p1    ... .. rk:5 rj:5 rd:5 \
  _rdrjrksa sa=%sa2p1


Here again, the manual was inconsistent with the binutils 
implementation; the manual says (for ALSL.W, it's SLADD in 
loongarch-opcodes project's revised mnemonics):


"ALSL.W logically left-shifts rj[31:0] by (sa2+1) bits, [snip]" 
(translation mine, not copied from the official translation)


Clearly the "+1" part is not meant to show up in disassembly. Yet the
binutils implementation acts as if the operand in source code already
includes the +1, and disassembles and prints it as such: an obvious
mismatch. I'd suggest fixing the disassembly code to remove this
inconsistency. And the "+1" "feature" is not used anywhere else AFAIK, so
it wouldn't hurt to just delete everything about it.

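A minimal standalone C sketch of the suggestion: the decoder applies the +1, so the decoded argument (and anything that prints it) holds the real shift amount, and the translator uses it as-is. `extract_sa2()`, `alsl_w()` and `alsl_d()` are hypothetical helpers written from the description above. In QEMU a decodetree `!function` hook is also passed the DisasContext, which is left out here.

```c
#include <stdint.h>

/* Raw 2-bit sa2 field at bits 16:15, i.e. decodetree's "15:2". */
static int extract_sa2(uint32_t insn)
{
    return (int)((insn >> 15) & 0x3);
}

/* Equivalent of the !function=plus_1 hook (DisasContext argument omitted). */
static int plus_1(int x)
{
    return x + 1;
}

/* ALSL.W: shift rj[31:0] left by sa, add rk[31:0], and sign-extend the
 * 32-bit sum to 64 bits. `sa` is the already-adjusted shift (1..4). */
static int64_t alsl_w(uint64_t rj, uint64_t rk, int sa)
{
    uint32_t sum = ((uint32_t)rj << sa) + (uint32_t)rk;
    return (int64_t)(int32_t)sum;
}

/* ALSL.D: the same operation on the full 64-bit registers. */
static uint64_t alsl_d(uint64_t rj, uint64_t rk, int sa)
{
    return (rj << sa) + rk;
}
```

Because `plus_1()` runs at decode time, an encoded sa2 of 3 reaches both the translator and the disassembler as a shift of 4.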




r~





Re: QEMU on x64

2021-11-12 Thread Christopher Caulfield
Hi folks! Wanted to share some documentation if you all want to give QEMU a
try within WinDbg. This is something we've been invested in supporting.

   - Link to public project:
   https://github.com/microsoft/WinDbg-Samples/tree/master/Exdi/exdigdbsrv
   

   - Link to external readme:  WinDbg-Samples/ExdiGdbSrv_readme.md at
   master · microsoft/WinDbg-Samples · GitHub
   


Is anyone planning to add the missing x86-64 system registers to the QEMU
x86-64 GDB server? See: QEMU registers support on x64 (#510) · Issues · QEMU /
QEMU · GitLab. (I just realized the title isn't great. Oh well...)

Thanks so much!
-Christopher

On Mon, Aug 2, 2021 at 6:34 PM Christopher Caulfield 
wrote:

> Thanks folks! I went ahead and made a feature/issue request based on
> Paolo's suggestion:
> QEMU registers support on x64 (#510) · Issues · QEMU / QEMU · GitLab
> 
>
> Please let me know if someone has the cycles to support this.
>
> -Christopher
>
> On Mon, Aug 2, 2021 at 10:37 AM Alex Bennée 
> wrote:
>
>>
>> Peter Maydell  writes:
>>
>> > On Fri, 30 Jul 2021 at 19:05, Christopher Caulfield
>> >  wrote:
>> >> This is Christopher from the debugging experiences team at Microsoft
>> >> focused on kernel debugging. I am reaching out with a few questions about
>> >> QEMU on x64.
>> >>
>> >> Is it possible for the QEMU-x86-64 GDB Server to send the full set
>> >> of x64 system registers (whether they are included in a separated
>> >> system xml file or as part of the core registers xml file)?
>> >
>> > Do you mean "is it possible for somebody to write code for
>> > QEMU to make it do that", or "does QEMU do it today if you pass
>> > it the right command line option" ? The answer to the former
>> > is "yes", to the latter "no". (If you want the debugger to
>> > be able to write to the system registers this might be a little
>> > trickier, mostly in terms of "auditing the code to make sure this
>> > can't confuse QEMU if you change some sysreg under its feet.".)
>> >
>> >> e.g. System registers missing from i386-64bit.xml file
>> >
>> >> DWORD64 IDTBase;
>> >> DWORD64 IDTLimit;
>> >> DWORD64 GDTBase;
>> >> DWORD64 GDTLimit;
>> >> DWORD SelLDT;
>> >> SEG64_DESC_INFO SegLDT;
>> >> DWORD SelTSS;
>> >> SEG64_DESC_INFO SegTSS;
>> >>
>> >> How can I access x64 MSR registers by using the QEMU-x86-64 GDB server?
>> >>
>> >> #define MSR_EFER 0xc0000080 // extended function enable register
>> >
>> > EFER is in the xml ("x64_efer") so should be already accessible.
>> > For anything else you're going to need to write some code to
>> > make it happen.
>> >
>> >> is there any plan to support reading/writing to MSRs via QEMU-x86-64
>> >> GDB server?
>>
>> Not at the moment but I am keen to see any eventual solution try to be
>> generic rather than hardwired for one architecture. The ARM code
>> currently builds custom XML from its register descriptors to expose
>> its MSR registers to the gdbstub. Ideally architecture front ends
>> should register their registers with a new subsystem which can then do
>> the glue between gdbstub as well as other systems that also care about
>> register values (logging, HMP, TCG plugins).
>>
>> That said I'm not going to block any patches that just fix up the
>> current XML and target/i386/gdbstub code. I'm not familiar enough with
>> what the internal register representation state is for x86 w.r.t. TCG
>> and hypervisor based running modes.
>>
>> > Not that I know of. We'd be happy to review patches if you want to
>> > write them.
>> >
>> > thanks
>> > -- PMM
>>
>>
>> --
>> Alex Bennée
>>
>
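As a rough illustration of the generic approach Alex describes, the following standalone C sketch generates a GDB target-description `<feature>` element from a table of register descriptors. The `RegDesc` type, the feature name and the register names are hypothetical; they are not QEMU's gdbstub API or an existing x86 feature. A real implementation would also need per-register read/write callbacks and, as Peter notes, an audit of which writes are safe.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-register descriptor an architecture front end could
 * register; this is not an existing QEMU type. */
typedef struct {
    const char *name;
    int bitsize;
} RegDesc;

/*
 * Write a GDB target-description <feature> element listing every
 * register in `regs` into `buf`. Returns the length of the generated
 * string, or -1 if it would not fit in `len` bytes.
 */
static int build_feature_xml(char *buf, size_t len, const char *feature,
                             const RegDesc *regs, size_t nregs)
{
    size_t off;
    int n;

    n = snprintf(buf, len, "<feature name=\"%s\">", feature);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    off = (size_t)n;

    for (size_t i = 0; i < nregs; i++) {
        n = snprintf(buf + off, len - off,
                     "<reg name=\"%s\" bitsize=\"%d\"/>",
                     regs[i].name, regs[i].bitsize);
        if (n < 0 || (size_t)n >= len - off) {
            return -1;
        }
        off += (size_t)n;
    }

    n = snprintf(buf + off, len - off, "</feature>");
    if (n < 0 || (size_t)n >= len - off) {
        return -1;
    }
    return (int)(off + (size_t)n);
}
```

For example, with the hypothetical descriptors {"idtr_base", 64} and {"idtr_limit", 32} and the hypothetical feature name "org.qemu.gdb.i386.sys", the function produces `<feature name="org.qemu.gdb.i386.sys"><reg name="idtr_base" bitsize="64"/><reg name="idtr_limit" bitsize="32"/></feature>`.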


Re: [PATCH] qmp: Stabilize preconfig

2021-11-12 Thread Paolo Bonzini

On 11/12/21 12:48, Markus Armbruster wrote:

The monitor starts, the question is the availability of the event loop.


What does the event loop depend on?


It depends on moving the relevant code out of qemu_init (at least 
conditionally, as is the case for what is in qmp_x_exit_preconfig). 
This in turn has the problem that it's ugly to have lingering unapplied 
settings from the command line.



5) PHASE_MACHINE_READY - machine init done notifiers have been called
and the VM is ready.  Devices plugged in this phase already count as
hot-plugged.  -S starts the monitor here.


Why would anyone *want* to plug a device in PHASE_MACHINE_READY (when
the plug is hot) instead of earlier (when it's cold)?


Well, PHASE_MACHINE_READY includes the whole time the guest is running. 
 So the simplest thing to do is to tell the user "if it hurts, don't do 
it".  If you want a cold-plugged device, plug it during 
PHASE_MACHINE_INIT, which right now means on the command line.



Related question: when exactly in these phases do we create devices
specified with -device?


In PHASE_MACHINE_INIT---that is, after the machine has been initialized
and before machine-done-notifiers have been called.


In other words, you should never use device_add where -device would do,
because the latter gives you cold plug (which is simple and reliable),
and the former hot plug (which is the opposite).


Exactly.

No, because the monitor goes directly from a point where device_add 
fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
(PHASE_MACHINE_READY).


Bummer.


True, but consider that these "phases" were reconstructed ex post.  It's 
not like x-exit-preconfig was designed to skip PHASE_MACHINE_INIT; it's 
just that preconfig used to call qemu_main_loop() at the point which is 
now known as PHASE_ACCEL_CREATED.



With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be
reached with a machine-set command (corresponding to the
non-deprecated parts of -machine) and PHASE_ACCEL_CREATED would be
reached with an accel-set command (corresponding to -accel).


I don't think this depends on "pure-QMP configuration flow".  -machine
and -accel could advance the phase just like their buddies machine-set
and accel-set.


They already do (see qemu_init's calls to phase_advance).


State transition diagram:

  PHASE_NO_MACHINE (initial state)
  |  -machine or machine-set
  PHASE_MACHINE_CREATED
  |  -accel or accel-set
  PHASE_ACCEL_CREATED
  |


qmp_x_exit_preconfig() -> qemu_init_board() -> machine_run_board_init()


I read this as "the state transition happens in
machine_run_board_init(), called from qmp_x_exit_preconfig() via
qemu_init_board()".


Exactly.  And in turn qmp_x_exit_preconfig() is reached from either the 
monitor (with -preconfig) or qemu_init (without -preconfig).



  PHASE_MACHINE_INIT
  |


qmp_x_exit_preconfig() -> qemu_machine_creation_done() ->
qdev_machine_creation_done()


I read this as "the state transition happens in
qdev_machine_creation_done(), called from qmp_x_exit_preconfig() via
qemu_machine_creation_done()".


Right again.  In both cases, just grep for calls of "phase_advance".
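The forward-only phase progression discussed above can be modelled in a few lines. This is a simplified standalone sketch loosely modelled on phase_advance()/phase_check() as described in this thread; the enum names follow the thread's shorthand, and `classify_device_add()` is hypothetical. It shows why a device_add issued from the preconfig monitor can only fail or hot-plug: the monitor never runs while the phase is PHASE_MACHINE_INIT.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    PHASE_NO_MACHINE,        /* initial state */
    PHASE_MACHINE_CREATED,   /* -machine or machine-set */
    PHASE_ACCEL_CREATED,     /* -accel or accel-set */
    PHASE_MACHINE_INIT,      /* machine_run_board_init() done */
    PHASE_MACHINE_READY,     /* machine init done notifiers called */
} Phase;

static Phase current_phase = PHASE_NO_MACHINE;

/* Phases only move forward, one step at a time. */
static void phase_advance(Phase next)
{
    assert(current_phase + 1 == next);
    current_phase = next;
}

/* True once the given phase has been reached. */
static bool phase_check(Phase p)
{
    return current_phase >= p;
}

typedef enum { PLUG_REJECT, PLUG_COLD, PLUG_HOT } PlugKind;

/* Hypothetical: what a device_add at the current phase would amount to. */
static PlugKind classify_device_add(void)
{
    if (phase_check(PHASE_MACHINE_READY)) {
        return PLUG_HOT;
    }
    if (phase_check(PHASE_MACHINE_INIT)) {
        return PLUG_COLD;
    }
    return PLUG_REJECT;
}
```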


The earlier the monitor becomes available, the better.
Ideally, we'd process the command line strictly left to right, and fail
options that are "out of phase".  Make the monitor available right when
we process its -mon.  The -chardev for its character device must precede
it.


The boat for this has sailed.  The only sane way to do this is a new binary.


"Ideally" still applies to any new binary.


Well, "ideally" any new binary would only have a few command line 
options, and ordering would be mostly irrelevant.  For example I'd 
expect a QMP binary to only have a few options, mostly for 
debugging/development (-L, -trace) and for process-wide settings (such 
as -name).



Likewise, we'd fail QMP commands that are "out of phase".
@allow-preconfig is a crutch that only exists because we're afraid (with
reason) of hidden assumptions in QMP commands.


At this point, it's not even like that anymore (except for block devices
because my patches haven't been applied).


My point is that we still have quite a few commands without
'allow-preconfig' mostly because we are afraid of allowing them in
preconfig state, not because of true phase dependencies.


I think there are very few of them, if any (outside the block layer, for
which patches exist), and those are due to distraction more than fear.

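Failing commands that are "out of phase", instead of relying on @allow-preconfig, amounts to giving each command the window of phases in which it is valid. A minimal hypothetical sketch follows. The table entries and their phase windows are illustrative only: machine-set and accel-set are the commands proposed earlier in this thread, not existing QMP commands, and this is not QEMU's QAPI metadata.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    PHASE_NO_MACHINE,
    PHASE_MACHINE_CREATED,
    PHASE_ACCEL_CREATED,
    PHASE_MACHINE_INIT,
    PHASE_MACHINE_READY,
} Phase;

/* Each command declares the phase window in which it is valid. */
typedef struct {
    const char *name;
    Phase min_phase;
    Phase max_phase;
} CmdPhase;

/* Illustrative entries only. */
static const CmdPhase cmd_phases[] = {
    { "machine-set",  PHASE_NO_MACHINE,      PHASE_NO_MACHINE },
    { "accel-set",    PHASE_MACHINE_CREATED, PHASE_MACHINE_CREATED },
    { "query-status", PHASE_NO_MACHINE,      PHASE_MACHINE_READY },
};

/* True if the command may run in phase `now`; unknown commands are refused. */
static bool cmd_allowed(const char *name, Phase now)
{
    for (size_t i = 0; i < sizeof(cmd_phases) / sizeof(cmd_phases[0]); i++) {
        if (strcmp(cmd_phases[i].name, name) == 0) {
            return now >= cmd_phases[i].min_phase &&
                   now <= cmd_phases[i].max_phase;
        }
    }
    return false;
}
```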

Paolo




Re: Guests wont start with 15 pcie-root-port devices

2021-11-12 Thread Igor Mammedov
On Fri, 12 Nov 2021 17:53:42 +
Daniel P. Berrangé  wrote:

> On Fri, Nov 12, 2021 at 12:35:07PM -0500, Brian Rak wrote:
> > In 6.1, a guest with 15 empty pcie-root-port devices will not boot properly
> > - it just hangs on "Guest has not initialized the display (yet).".  As soon
> > as I remove the last pcie-root-port, the guest begins starting up normally. 
> >   
> 
> Yes, QEMU 6.1 has a regression
> 
>   https://gitlab.com/qemu-project/qemu/-/issues/641 
> 
> 
> > commit e2a6290aab578b2170c1f5909fa556385dc0d820
> > Author: Marcel Apfelbaum 
> > Date:   Mon Aug 2 12:00:57 2021 +0300
> > 
> >     hw/pcie-root-port: Fix hotplug for PCI devices requiring IO
> > 
> > Although I can't say I really understand why that commit triggered it.  
> 
> It caused the firmware to always allocate I/O space for every port
> and there's limited total I/O space, so it runs out at 15 devices.

Alternatively, instead of reverting to native PCIe hotplug as in the issue
Daniel mentioned, you can apply the following fix:
 https://patchew.org/QEMU/2022110857.3116853-1-imamm...@redhat.com/

> 
> Regards,
> Daniel




Re: [PATCH v4 0/1] hw/hyperv/vmbus: Is it maintained?

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 09:32:31PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add Den and Roman (his new address)

Thanks, I missed it on the list indeed.

> 06.11.2021 16:41, Philippe Mathieu-Daudé wrote:
> > This is the 4th time I send this patch. Is the VMBus infrastructure
> > used / maintained? Should we deprecate & remove?

I think it's fair to say it's not maintained.  The whole
hw/hyperv/vmbus.c was submitted as a part of the work by Jon to enable
some obscure Windows debugging feature which only worked in the presence of
VMBus.  It was mostly taken from the respective branch of the (now
effectively abandoned) downstream tree with an implementation of the
core VMBus infrastructure and the devices using it; however, none of the
actual VMBus devices ever made it into the mainline tree.

> > 
> >$ ./scripts/get_maintainer.pl -f hw/hyperv/vmbus.c -f include/hw/hyperv/vmbus.h
> >get_maintainer.pl: No maintainers found
> > 
> > Philippe Mathieu-Daudé (1):
> >hw/hyperv/vmbus: Remove unused vmbus_load/save_req()
> > 
> >   include/hw/hyperv/vmbus.h |  3 --
> >   hw/hyperv/vmbus.c | 59 ---
> >   2 files changed, 62 deletions(-)

This seems to basically be the revert of 4dd8a7064b "vmbus: add
infrastructure to save/load vmbus requests"; it was originally meant to
be submitted with the code that would use it, vmbus scsi controller, but
that never happened.  I believe it's safe to remove without affecting
Jon's work, but I'd rather check with him.

Thanks,
Roman.



Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 12:37:59PM +0100, Kevin Wolf wrote:
> Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben:
> > On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote:
> > > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> > > > vhost-user-blk realize only attempts to reconnect if the previous
> > > > connection attempt failed on "a problem with the connection and not an
> > > > error related to the content (which would fail again the same way in the
> > > > next attempt)".
> > > > 
> > > > However this distinction is very subtle, and may be inadvertently broken
> > > > if the code changes somewhere deep down the stack and a new error gets
> > > > propagated up to here.
> > > > 
> > > > OTOH now that the number of reconnection attempts is limited it seems
> > > > harmless to try reconnecting on any error.
> > > > 
> > > > So relax the condition of whether to retry connecting to check for any
> > > > error.
> > > > 
> > > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> > > > during realize".
> > > > 
> > > > Signed-off-by: Roman Kagan 
> > > 
> > > It results in less than perfect error messages. With a modified export
> > > that just crashes qemu-storage-daemon during get_features, I get:
> > > 
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read msg header. Read 0 instead of 12. Original request 1.
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: vhost_backend_init failed: Protocol error
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: Failed to connect to '/tmp/vsock': Connection refused
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: Failed to connect to '/tmp/vsock': Connection refused
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect to '/tmp/vsock': Connection refused
> > 
> > This patch doesn't change any error messages.  Which ones specifically
> > became less than perfect as a result of this patch?
> 
> But it adds error messages (for each retry), which are different from
> the first error message. As I said this is not the end of the world, but
> maybe a bit more confusing.

Ah, now I see what you mean: it adds reconnection attempts where there
used to be immediate failure return, so now every failed attempt logs
its own message.

> > > I guess this might be tolerable. On the other hand, the patch doesn't
> > > really fix anything either, but just gets rid of possible subtleties.
> > 
> > The remaining patches in the series make other errors besides -EPROTO
> > propagate up to this point, and some (most) of them are retryable.  This
> > was the reason to include this patch at the beginning of the series (I
> > guess I should've mentioned that in the patch log).
> 
> I see. I hadn't looked at the rest of the series yet because I ran out
> of time, but now that I'm skimming them, I see quite a few places that
> use non-EPROTO, but I wonder which of them actually should be
> reconnected. So far all I saw were presumably persistent errors where a
> retry won't help. Can you give me some examples?

E.g. the particular case you mention earlier, -ECONNREFUSED, is not
unlikely to happen due to a vhost-user server restart for maintenance;
in this case retrying looks like a reasonable thing to do, doesn't it?

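A minimal sketch of the policy being discussed: retry on any error, but only a bounded number of times, so a persistent error still fails after a few attempts while a transient -ECONNREFUSED (for example during a server restart) gets another chance. `connect_with_retries()` and `flaky_connect()` are hypothetical stand-ins for the vhost-user-blk realize path. The one-message-per-retry output mirrors the extra log lines Kevin observed.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical connect callback: 0 on success, negative errno on failure. */
typedef int (*connect_fn)(void *opaque);

/* Try to connect up to max_attempts (>= 1) times, retrying on any error.
 * Returns 0 on success, otherwise the error from the last attempt. */
static int connect_with_retries(connect_fn fn, void *opaque, int max_attempts)
{
    int ret = -1;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        ret = fn(opaque);
        if (ret == 0) {
            return 0;
        }
        if (attempt < max_attempts) {
            fprintf(stderr, "Reconnecting after error (attempt %d): %d\n",
                    attempt, ret);
        }
    }
    return ret;
}

/* Example callback: fails with -ECONNREFUSED while *opaque > 0, as a
 * restarting server might, then succeeds. */
static int flaky_connect(void *opaque)
{
    int *failures_left = opaque;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -ECONNREFUSED;
    }
    return 0;
}
```

The attempt limit is what makes "retry on anything" cheap: a genuinely persistent error costs a few extra log lines rather than an endless loop.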
Thanks,
Roman.



[PATCH 2/2] vdpa: Check for existence of opts.vhostdev

2021-11-12 Thread Eugenio Pérez
net_init_vhost_vdpa() tries to open opts.vhostdev, so not specifying it
on the command line crashes QEMU.

Fixes: 7327813d17 ("vhost-vdpa: open device fd in net_init_vhost_vdpa()")
Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 6ffb29f4da..bbd3576f23 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -260,6 +260,10 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = >u.vhost_vdpa;
+if (!opts->vhostdev) {
+error_setg(errp, "vdpa character device not specified with vhostdev");
+return -1;
+}
 
 vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
 if (vdpa_device_fd == -1) {
-- 
2.27.0




[PATCH 1/2] vdpa: Replace qemu_open_old by qemu_open at

2021-11-12 Thread Eugenio Pérez
There is no reason to keep using the old one, since we neither use the
variadic arguments nor open it with O_DIRECT.

Also, net_client_init1, the caller of net_init_vhost_vdpa, wants all
net_client_init_fun to use Error API, so it's a good step in that
direction.

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 49ab322511..6ffb29f4da 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -261,7 +261,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = >u.vhost_vdpa;
 
-vdpa_device_fd = qemu_open_old(opts->vhostdev, O_RDWR);
+vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
 if (vdpa_device_fd == -1) {
 return -errno;
 }
-- 
2.27.0




[PATCH 0/2] vdpa: Check for existence of opts.vhostdev

2021-11-12 Thread Eugenio Pérez
net_init_vhost_vdpa() tries to open opts.vhostdev, so not specifying it
on the command line crashes QEMU.

While we're at it, stop using qemu_open_old.

Eugenio Pérez (2):
  vdpa: Replace qemu_open_old by qemu_open at
  vdpa: Check for existence of opts.vhostdev

 net/vhost-vdpa.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.27.0





Re: [RFC PATCH 1/6] mm: Add F_SEAL_GUEST to shmem/memfd

2021-11-12 Thread Kirill A. Shutemov
On Thu, Nov 11, 2021 at 10:13:40PM +0800, Chao Peng wrote:
> The new seal is only allowed if there's no pre-existing pages in the fd
> and there's no existing mapping of the file. After the seal is set, no
> read/write/mmap from userspace is allowed.
> 
> Signed-off-by: Kirill A. Shutemov 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 

Below is replacement patch with fallocate callback support.

I also replaced page_level with the order of the page, because
PG_LEVEL_2M/4K are x86-specific and cannot be used in generic code.

There's also a bugfix in guest_invalidate_page().


From 9419ccb4bc3c1df4cc88f6c8ba212f4b1699 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" 
Date: Fri, 12 Nov 2021 21:27:40 +0300
Subject: [PATCH] mm/shmem: Introduce F_SEAL_GUEST

The new seal type provides semantics required for KVM guest private
memory support. A file descriptor with the seal set is going to be used
as the source of guest memory in confidential computing environments
such as Intel TDX and AMD SEV.

F_SEAL_GUEST can only be set on an empty memfd. After the seal is set,
userspace cannot read, write or mmap the memfd.

Userspace is in charge of the guest memory lifecycle: it can allocate the
memory with fallocate() or punch holes to free memory from the guest.

The file descriptor is passed down to KVM as the guest memory backend.
KVM registers itself as the owner of the memfd via memfd_register_guest().

KVM provides callbacks that need to be called on fallocate and hole
punching.

memfd_register_guest() returns callbacks that need to be used for
requesting a new page from the memfd.

Signed-off-by: Kirill A. Shutemov 
---
 include/linux/memfd.h  |  24 
 include/linux/shmem_fs.h   |   9 +++
 include/uapi/linux/fcntl.h |   1 +
 mm/memfd.c |  32 +-
 mm/shmem.c | 117 -
 5 files changed, 179 insertions(+), 4 deletions(-)

diff --git a/include/linux/memfd.h b/include/linux/memfd.h
index 4f1600413f91..500dfe88043e 100644
--- a/include/linux/memfd.h
+++ b/include/linux/memfd.h
@@ -4,13 +4,37 @@
 
 #include 
 
+struct guest_ops {
+   void (*invalidate_page_range)(struct inode *inode, void *owner,
+ pgoff_t start, pgoff_t end);
+   void (*fallocate)(struct inode *inode, void *owner,
+ pgoff_t start, pgoff_t end);
+};
+
+struct guest_mem_ops {
+   unsigned long (*get_lock_pfn)(struct inode *inode, pgoff_t offset,
+ int *order);
+   void (*put_unlock_pfn)(unsigned long pfn);
+
+};
+
 #ifdef CONFIG_MEMFD_CREATE
 extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg);
+
+extern inline int memfd_register_guest(struct inode *inode, void *owner,
+  const struct guest_ops *guest_ops,
+  const struct guest_mem_ops **guest_mem_ops);
 #else
 static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned long a)
 {
return -EINVAL;
 }
+static inline int memfd_register_guest(struct inode *inode, void *owner,
+  const struct guest_ops *guest_ops,
+  const struct guest_mem_ops **guest_mem_ops)
+{
+   return -EINVAL;
+}
 #endif
 
 #endif /* __LINUX_MEMFD_H */
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 8e775ce517bb..265d0c13bc5e 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -12,6 +12,9 @@
 
 /* inode in-kernel data */
 
+struct guest_ops;
+struct guest_mem_ops;
+
 struct shmem_inode_info {
spinlock_t  lock;
unsigned intseals;  /* shmem seals */
@@ -24,6 +27,8 @@ struct shmem_inode_info {
struct simple_xattrsxattrs; /* list of xattrs */
atomic_tstop_eviction;  /* hold when working on inode */
struct inodevfs_inode;
+   void*guest_owner;
+   const struct guest_ops  *guest_ops;
 };
 
 struct shmem_sb_info {
@@ -90,6 +95,10 @@ extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end);
 
+extern int shmem_register_guest(struct inode *inode, void *owner,
+   const struct guest_ops *guest_ops,
+   const struct guest_mem_ops **guest_mem_ops);
+
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
SGP_READ,   /* don't exceed i_size, don't allocate page */
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 2f86b2ad6d7e..c79bc8572721 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,7 @@
 #define F_SEAL_GROW0x0004  /* prevent file from growing */
 #define F_SEAL_WRITE   0x0008  /* prevent writes 

Re: [PATCH 04/10] chardev/char-fe: don't allow EAGAIN from blocking read

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 12:24:06PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Nov 11, 2021 at 7:44 PM Roman Kagan  wrote:
> 
> > As its name suggests, ChardevClass.chr_sync_read is supposed to do a
> > blocking read.  The only implementation of it, tcp_chr_sync_read, does
> > set the underlying io channel to the blocking mode indeed.
> >
> > Therefore a failure return with EAGAIN is not expected from this call.
> >
> > So do not retry it in qemu_chr_fe_read_all; instead place an assertion
> > that it doesn't fail with EAGAIN.
> >
> 
> The code was introduced in :
> commit 7b0bfdf52d694c9a3a96505aa42ce3f8d63acd35
> Author: Nikolay Nikolaev 
> Date:   Tue May 27 15:03:48 2014 +0300
> 
> Add chardev API qemu_chr_fe_read_all

Right, but at that point chr_sync_read wasn't made to block.  It
happened later in

commit bcdeb9be566ded2eb35233aaccf38742a21e5daa
Author: Marc-André Lureau 
Date:   Thu Jul 6 19:03:53 2017 +0200

chardev: block during sync read

A sync read should block until all requested data is
available (instead of retrying in qemu_chr_fe_read_all). Change the
channel to blocking during sync_read.

> > @@ -68,13 +68,10 @@ int qemu_chr_fe_read_all(CharBackend *be, uint8_t *buf, int len)
> >  }
> >
> >  while (offset < len) {
> > -retry:
> >  res = CHARDEV_GET_CLASS(s)->chr_sync_read(s, buf + offset,
> >len - offset);
> > -if (res == -1 && errno == EAGAIN) {
> > -g_usleep(100);
> > -goto retry;
> > -}
> > +/* ->chr_sync_read should block */
> > +assert(!(res < 0 && errno == EAGAIN));
> >
> >
> While I agree with the rationale to clean this code a bit, I am not so sure
> about replacing it with an assert(). In the past, when we did such things
> we had unexpected regressions :)

Valid point, qemu may be run against some OS where a blocking call may
sporadically return -EAGAIN, and it would be hard to reliably catch this
with testing.

> A slightly better approach perhaps is g_warn_if_fail(), although it's not
> very popular in qemu.

I think the first thing to decide is whether -EAGAIN from a blocking
call is legitimate enough to justify (unlimited) retries.  I'm
tempted to just remove any special handling of -EAGAIN and treat it as
any other error, leaving it up to the caller to handle (most probably to
fail the call and initiate a recovery, if possible).

Does this make sense?
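For comparison, a minimal POSIX sketch of the option of not special-casing EAGAIN, using plain read(2) on a blocking descriptor in place of QEMU's chr_sync_read hook. The function name is hypothetical, and retrying on EINTR is an assumption of this sketch rather than something proposed in the thread.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from a blocking file descriptor.
 * Returns the number of bytes read (less than len only on EOF), or -1 with
 * errno set. EAGAIN gets no special treatment: it is reported to the
 * caller like any other error. Only EINTR is retried. */
static ssize_t read_all_blocking(int fd, void *buf, size_t len)
{
    size_t offset = 0;

    while (offset < len) {
        ssize_t res = read(fd, (char *)buf + offset, len - offset);
        if (res < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal: try again */
            }
            return -1;      /* includes EAGAIN: let the caller decide */
        }
        if (res == 0) {
            break;          /* EOF */
        }
        offset += (size_t)res;
    }
    return (ssize_t)offset;
}
```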

Thanks,
Roman.



Re: [PATCH v2 1/1] Jobs based on custom runners: add CentOS Stream 8

2021-11-12 Thread Willian Rampazzo
On Thu, Nov 11, 2021 at 1:06 PM Cleber Rosa  wrote:
>
> This introduces three different parts of a job designed to run
> on a custom runner managed by Red Hat.  The goals include:
>
>   a) propose a model for other organizations that want to onboard
>  their own runners, with their specific platforms, build
>  configuration and tests.
>
>   b) bring awareness to the differences between upstream QEMU and the
>  version available under CentOS Stream, which is "A preview of
>  upcoming Red Hat Enterprise Linux minor and major releases".
>
>   c) because of b), it should be easier to identify and reduce the gap
>  between Red Hat's downstream and upstream QEMU.
>
> The components of this custom job are:
>
>   I) OS build environment setup code:
>
>  - additions to the existing "build-environment.yml" playbook
>that can be used to set up CentOS/EL 8 systems.
>
>  - a CentOS Stream 8 specific "build-environment.yml" playbook
>that adds to the generic one.
>
>  II) QEMU build configuration: a script that will produce binaries with
>  features as similar as possible to the ones built and packaged on
>  CentOS stream 8.
>
> III) Scripts that define the minimum amount of testing that the
>  binaries built with the given configuration (point II) under the
>  given OS build environment (point I) should be subjected to.
>
>  IV) Job definition: GitLab CI jobs that will dispatch the build/test
>  jobs (see points #II and #III) to the machine specifically
>  configured according to #I.
>
> Signed-off-by: Cleber Rosa 
> ---
>  .gitlab-ci.d/custom-runners.yml   |  29 +++
>  docs/devel/ci-jobs.rst.inc|   7 +
>  .../org.centos/stream/8/build-environment.yml |  51 +
>  .../ci/org.centos/stream/8/x86_64/configure   | 208 ++
>  .../org.centos/stream/8/x86_64/test-avocado   |  70 ++
>  scripts/ci/org.centos/stream/README   |  17 ++
>  scripts/ci/setup/build-environment.yml|  38 
>  7 files changed, 420 insertions(+)
>  create mode 100644 scripts/ci/org.centos/stream/8/build-environment.yml
>  create mode 100755 scripts/ci/org.centos/stream/8/x86_64/configure
>  create mode 100755 scripts/ci/org.centos/stream/8/x86_64/test-avocado
>  create mode 100644 scripts/ci/org.centos/stream/README
>

Maybe it is too late, but just for the record:

Reviewed-by: Willian Rampazzo 
Tested-by: Willian Rampazzo 

CI job on a custom VM runner:
https://gitlab.com/willianrampazzo/qemu/-/jobs/1778451942




Re: [PATCH v4 0/1] hw/hyperv/vmbus: Is it maintained?

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

Add Den and Roman (his new address)

06.11.2021 16:41, Philippe Mathieu-Daudé wrote:

This is the 4th time I send this patch. Is the VMBus infrastructure
used / maintained? Should we deprecate & remove?

   $ ./scripts/get_maintainer.pl -f hw/hyperv/vmbus.c -f include/hw/hyperv/vmbus.h
   get_maintainer.pl: No maintainers found

Philippe Mathieu-Daudé (1):
   hw/hyperv/vmbus: Remove unused vmbus_load/save_req()

  include/hw/hyperv/vmbus.h |  3 --
  hw/hyperv/vmbus.c | 59 ---
  2 files changed, 62 deletions(-)




--
Best regards,
Vladimir



Re: [PATCH v1] job.c: add missing notifier initialization

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

03.11.2021 19:21, Emanuele Giuseppe Esposito wrote:

It seems that the on_idle list is not properly initialized like
the other notifiers.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  job.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/job.c b/job.c
index dbfa67bb0a..54db80df66 100644
--- a/job.c
+++ b/job.c
@@ -352,6 +352,7 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
  notifier_list_init(>on_finalize_completed);
  notifier_list_init(>on_pending);
  notifier_list_init(>on_ready);
+notifier_list_init(>on_idle);
  
  job_state_transition(job, JOB_STATUS_CREATED);

  aio_timer_init(qemu_get_aio_context(), >sleep_timer,



Reviewed-by: Vladimir Sementsov-Ogievskiy 


I don't think it's worth applying now:

The job object is allocated with g_malloc0(), so job->on_idle is already zero-initialized.

notifier_list_init() simply calls QLIST_INIT(), which sets the only
field of the QLIST head structure to NULL. So these notifier_list_init()
calls are actually no-ops in this context.

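A small standalone sketch of that reasoning. The list head below is a simplified stand-in with the same single-pointer shape as QEMU's QLIST_HEAD, and `LIST_INIT_SKETCH` plays the role of QLIST_INIT(); both names, and `struct job_sketch`, are hypothetical. It assumes, as common hosts do, that an all-zero pointer is NULL, which ISO C itself does not guarantee.

```c
#include <stdlib.h>

/* Simplified stand-in for a QLIST head: a single first-element pointer. */
struct list_head_sketch {
    void *lh_first;
};

/* What QLIST_INIT() amounts to for such a head. */
#define LIST_INIT_SKETCH(head) ((head)->lh_first = NULL)

/* Hypothetical job with a couple of notifier lists. */
struct job_sketch {
    int id;
    struct list_head_sketch on_ready;
    struct list_head_sketch on_idle;
};
```

Allocating a `struct job_sketch` with calloc() (the libc analogue of g_malloc0()) leaves `on_idle.lh_first` NULL both before and after `LIST_INIT_SKETCH()`, which is the sense in which the added call is a no-op.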
I've queued it in my jobs branch, but will not send a pull request until
a more critical fix for 6.2 comes along or 6.3 development starts.

Thanks!

--
Best regards,
Vladimir



Re: Guests wont start with 15 pcie-root-port devices

2021-11-12 Thread Daniel P . Berrangé
On Fri, Nov 12, 2021 at 12:35:07PM -0500, Brian Rak wrote:
> In 6.1, a guest with 15 empty pcie-root-port devices will not boot properly
> - it just hangs on "Guest has not initialized the display (yet).".  As soon
> as I remove the last pcie-root-port, the guest begins starting up normally. 

Yes, QEMU 6.1 has a regression

  https://gitlab.com/qemu-project/qemu/-/issues/641 


> commit e2a6290aab578b2170c1f5909fa556385dc0d820
> Author: Marcel Apfelbaum 
> Date:   Mon Aug 2 12:00:57 2021 +0300
> 
>     hw/pcie-root-port: Fix hotplug for PCI devices requiring IO
> 
> Although I can't say I really understand why that commit triggered it.

It caused the firmware to always allocate I/O space for every port
and there's limited total I/O space, so it runs out at 15 devices.
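The back-of-the-envelope arithmetic: x86 port I/O space is 64 KiB, and a PCI-to-PCI bridge's I/O window is allocated in 4 KiB units, so at most 16 bridge windows fit in total. How much of that space the firmware keeps for other users (legacy ports at the bottom, I/O BARs of devices on the root bus) is firmware-specific and an assumption here. The sketch only shows that once even slightly more than the lowest 4 KiB is taken, fewer than 15 windows remain, which is consistent with the failure appearing at the 15th port.

```c
/* x86 port I/O address space: 64 KiB (ports 0x0000-0xffff). */
#define IO_SPACE_SIZE      0x10000u
/* PCI-to-PCI bridge I/O windows are allocated in 4 KiB units. */
#define BRIDGE_IO_WINDOW   0x1000u

/* How many 4 KiB bridge windows fit once `reserved` bytes are taken by
 * other users (legacy ports, I/O BARs on the root bus, ...). */
static unsigned bridge_windows_available(unsigned reserved)
{
    if (reserved >= IO_SPACE_SIZE) {
        return 0;
    }
    return (IO_SPACE_SIZE - reserved) / BRIDGE_IO_WINDOW;
}
```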

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Guests wont start with 15 pcie-root-port devices

2021-11-12 Thread Brian Rak
In 6.1, a guest with 15 empty pcie-root-port devices will not boot 
properly - it just hangs on "Guest has not initialized the display 
(yet).".  As soon as I remove the last pcie-root-port, the guest begins 
starting up normally.  My qemu command line:


/usr/libexec/qemu-kvm -name guest=xxx,debug-threads=on -S -object 
{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-29-xxx/master-key.aes"} 
-machine 
pc-q35-6.1,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram 
-cpu Haswell-noTSX-IBRS -m 4096 -object 
{"qom-type":"memory-backend-ram","id":"pc.ram","size":4294967296} 
-overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
daf2c139-4991-4079-9b8a-b4c98fc675e0 -no-user-config -nodefaults 
-chardev socket,id=charmonitor,fd=31,server=on,wait=off -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot strict=on -device 
pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 
-device 
pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 
-device 
pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 
-device 
pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 
-device 
pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 
-device 
pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 
-device 
pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 
-device 
pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 
-device 
pcie-root-port,port=0x18,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 
-device 
pcie-root-port,port=0x19,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 
-device 
pcie-root-port,port=0x1a,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 
-device 
pcie-root-port,port=0x1b,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 
-device 
pcie-root-port,port=0x1c,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 
-device 
pcie-root-port,port=0x1d,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 
-device 
pcie-root-port,port=0x1e,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6 
-device qemu-xhci,id=usb,bus=pci.1,addr=0x0 -audiodev 
id=audio1,driver=none -vnc 
127.0.0.1:5410,websocket=41310,password=on,audiodev=audio1 -device 
cirrus-vga,id=video0,bus=pcie.0,addr=0x1 -device 
virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x0 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny 
-msg timestamp=on


The same guest XML that produced this worked fine in 5.2. I was able to
bisect this down to this commit:



commit e2a6290aab578b2170c1f5909fa556385dc0d820
Author: Marcel Apfelbaum 
Date:   Mon Aug 2 12:00:57 2021 +0300

    hw/pcie-root-port: Fix hotplug for PCI devices requiring IO

I can't say I really understand why that commit triggered it, though.




[RFC PATCH] hw/intc: clean-up error reporting for failed ITS cmd

2021-11-12 Thread Alex Bennée
While trying to debug a GIC ITS failure I saw some guest errors that
had poor formatting as well as leaving me confused as to what failed.
As most of the checks aren't possible without a valid dte, split that
check apart and then check the other conditions in steps. This avoids
us relying on undefined data.

I still get a failure with the current kvm-unit-tests but at least I
know (partially) why now:

  Exception return from AArch64 EL1 to AArch64 EL1 PC 0x40080588
  PASS: gicv3: its-trigger: inv/invall: dev2/eventid=20 now triggers an LPI
  ITS: MAPD devid=2 size = 0x8 itt=0x4043 valid=0
  INT dev_id=2 event_id=20
  process_its_cmd: invalid command attributes: invalid dte: 0 for 2 (MEM_TX: 0)
  PASS: gicv3: its-trigger: mapd valid=false: no LPI after device unmap
  SUMMARY: 6 tests, 1 unexpected failures

Signed-off-by: Alex Bennée 
Cc: Shashi Mallela 
Cc: Peter Maydell 
---
 hw/intc/arm_gicv3_its.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 84bcbb5f56..d5267814ab 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -274,21 +274,36 @@ static bool process_its_cmd(GICv3ITSState *s, uint64_t value, uint32_t offset,
 if (res != MEMTX_OK) {
 return result;
 }
+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: invalid command attributes: "
+  "invalid dte: %"PRIx64" for %d (MEM_TX: %d)\n",
+  __func__, dte, devid, res);
+return result;
 }
 
-if ((devid > s->dt.maxids.max_devids) || !dte_valid || !ite_valid ||
-!cte_valid || (eventid > max_eventid)) {
+
+/*
+ * In this implementation, in case of guest errors we ignore the
+ * command and move onto the next command in the queue.
+ */
+if (devid > s->dt.maxids.max_devids) {
 qemu_log_mask(LOG_GUEST_ERROR,
-  "%s: invalid command attributes "
-  "devid %d or eventid %d or invalid dte %d or"
-  "invalid cte %d or invalid ite %d\n",
-  __func__, devid, eventid, dte_valid, cte_valid,
-  ite_valid);
-/*
- * in this implementation, in case of error
- * we ignore this command and move onto the next
- * command in the queue
- */
+  "%s: invalid command attributes: devid %d > %d\n",
+  __func__, devid, s->dt.maxids.max_devids);
+
+} else if (!dte_valid || !ite_valid || !cte_valid) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: invalid command attributes: "
+  "dte: %s, ite: %s, cte: %s\n",
+  __func__,
+  dte_valid ? "valid" : "invalid",
+  ite_valid ? "valid" : "invalid",
+  cte_valid ? "valid" : "invalid");
+} else if (eventid > max_eventid) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: invalid command attributes: eventid %d > %d\n",
+  __func__, eventid, max_eventid);
 } else {
 /*
  * Current implementation only supports rdbase == procnum
-- 
2.30.2

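The restructuring in this patch (check the DTE first, then the remaining conditions one at a time) can be sketched as a standalone function. This is a hypothetical reduction, not the arm_gicv3_its.c code: `ItsCmdState`, `ItsCmdCheck` and `its_cmd_check_sketch()` are invented for illustration, and each enum value stands in for one of the distinct log messages the patch introduces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the values process_its_cmd() has looked up. */
typedef struct {
    uint32_t devid, max_devids;
    uint32_t eventid, max_eventid;  /* max_eventid only means something with a valid DTE */
    bool dte_valid, ite_valid, cte_valid;
} ItsCmdState;

/* One result per distinct guest-error message. */
typedef enum {
    ITS_CMD_OK,
    ITS_CMD_BAD_DTE,
    ITS_CMD_BAD_DEVID,
    ITS_CMD_BAD_ITE_OR_CTE,
    ITS_CMD_BAD_EVENTID,
} ItsCmdCheck;

/*
 * Check one condition at a time, so each failure can be reported on its
 * own and nothing derived from the DTE is consulted unless the DTE is valid.
 */
static ItsCmdCheck its_cmd_check_sketch(const ItsCmdState *s)
{
    if (!s->dte_valid) {
        return ITS_CMD_BAD_DTE;
    }
    if (s->devid > s->max_devids) {
        return ITS_CMD_BAD_DEVID;
    }
    if (!s->ite_valid || !s->cte_valid) {
        return ITS_CMD_BAD_ITE_OR_CTE;
    }
    if (s->eventid > s->max_eventid) {
        return ITS_CMD_BAD_EVENTID;
    }
    return ITS_CMD_OK;
}
```

Ordering the checks this way is what lets a log line name the single condition that failed, instead of dumping every flag at once.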



Re: [RFC PATCH v1 1/3] virtio: introduce virtio_force_modern()

2021-11-12 Thread Halil Pasic
On Fri, 12 Nov 2021 16:55:10 +0100
Cornelia Huck  wrote:

> On Fri, Nov 12 2021, Halil Pasic  wrote:
> 
> > On Fri, 29 Oct 2021 16:53:25 +0200
> > Cornelia Huck  wrote:
> >  
> >> On Fri, Oct 29 2021, Halil Pasic  wrote:
> >>   
> >> > Legacy vs modern should be detected via transport specific means. We
> >> > can't wait till feature negotiation is done. Let us introduce
> >> > virtio_force_modern() as a means for the transport code to signal
> >> > that the device should operate in modern mode (because a modern driver
> >> > was detected).
> >> >
> >> > Signed-off-by: Halil Pasic 
> >> > ---
> >> >
> >> > I'm still struggling with how to deal with vhost-user and co. The
> >> > problem is that I'm not very familiar with the life-cycle of, let us
> >> > say, a vhost_user device.
> >> >
> >> > Looks to me like the vhost part might be just an implementation detail,
> >> > and could even become a hot swappable thing.
> >> >
> >> > Another thing is that vhost processes set_features differently. It
> >> > might or might not be a good idea to change this.
> >> >
> >> > Does anybody know why we don't propagate the features on features_set,
> >> > but under a set of different conditions, one of which is that the vhost
> >> > device is started?
> >> > ---
> >> >  hw/virtio/virtio.c | 12 
> >> >  include/hw/virtio/virtio.h |  1 +
> >> >  2 files changed, 13 insertions(+)
> >> >
> >> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> > index 3a1f6c520c..75aee0e098 100644
> >> > --- a/hw/virtio/virtio.c
> >> > +++ b/hw/virtio/virtio.c
> >> > @@ -3281,6 +3281,18 @@ void virtio_init(VirtIODevice *vdev, const char *name,
> >> >  vdev->use_guest_notifier_mask = true;
> >> >  }
> >> >  
> >> > +void  virtio_force_modern(VirtIODevice *vdev)
> >> 
> >>  I'm not sure I like that name. We're not actually forcing the
> >> device to be modern, we just set an early indication in the device
> >> before proper feature negotiation has finished. Maybe
> >> virtio_indicate_modern()?   
> >
> >
> > I don't like virtio_indicate_modern(dev) from an object orientation
> > perspective. In an OO language one would write it like
> > dev.virtio_indicate_modern()
> > which would read like the device should indicate modern to somebody.  
> 
> I think that is actually what happens: we indicate that it is a modern
> device to the code making the endianness decisions.
> 

But in an OO school of thought that code belongs to the given
virtio device object and is one of the building blocks that makes the
object what it is. What I'm trying to explain is: that code ain't no
external entity we have to indicate something to.

On the contrary, if we had to indicate 'modern' to the driver, how would
you name that function? Clearly we don't need such functionality, I'm
just trying to make an argument here.

To take a different example, imagine a ccw channel path. We may break the
channel path, we may indicate to the OS that the channel path is
broken (via CRW), and we may first break it and then indicate that.


> >
> > In my opinion what happens is that we want to disable the legacy
> > interface if it is exposed by the device, or in other words instruct the
> > device that it should act (precisely and exclusively) according to the
> > interface specification of the modern interface.  
> 
> I don't see us disabling anything; the driver has already chosen what
> they want, and we simply need to make sure that all code honours that
> decision.

IMHO a buggy driver could make an attempt at using the legacy interface,
at least in the case of PCI.

My understanding is that the decision of the driver results in an
interaction between the driver and the device, and as a result of that
interaction, the state of the device changes. This function is supposed
to implement that state change.

Do we agree that there is a state change? If yes, how would you describe
that state change?

> 
> >
> > Maybe we can find a better name than force_modern, but I don't think
> > indicate_modern is a better name.
> >  
> >>   
> >> > +{
> >> > +/*
> >> > + * This takes care of the devices that implement config space access
> >> > + * in QEMU. For vhost-user and similar we need to make sure the features
> >> > + * are actually propagated to the device implementing the config space.
> >> > + *
> >> > + * A VirtioDeviceClass callback may be a good idea.
> >> > + */
> >> > +virtio_set_features(vdev, (1ULL << VIRTIO_F_VERSION_1));
> >> 
> >> Do we really need/want to do the whole song-and-dance for setting
> >> features, just for setting VERSION_1?   
> >
> > When doing the whole song-and-dance the chance is higher that the
> > information will propagate to every place it needs to reach. For
> > example to the acked_features of vhost_dev. I've just posted a v2 RFC.
> > It should not be hard to see what I mean after examining that RFC.
> >  
> >> Devices may modify some of their
> >> behaviour or features, 

Re: Fwd: New Defects reported by Coverity Scan for QEMU

2021-11-12 Thread Matheus K. Ferst

On 10/11/2021 05:18, Cédric Le Goater wrote:

Hello Luis,

Coverity found a couple of issues which seem related to the DFP patchset.
Could you please take a look?

Thanks,

C.


 Forwarded Message 
Subject: New Defects reported by Coverity Scan for QEMU
Date: Tue, 9 Nov 2021 22:09:40 +
From: scan-ad...@coverity.com
To: c...@kaod.org

Hi,

Please find the latest report on new defect(s) introduced to QEMU found 
with Coverity Scan.


16 new defect(s) introduced to QEMU found with Coverity Scan.
19 defect(s), reported by Coverity Scan earlier, were marked fixed in 
the recent build analyzed by Coverity Scan.


New defect(s) Reported-by: Coverity Scan
Showing 16 of 16 defect(s)


** CID 1465791:  Uninitialized variables  (UNINIT)


 


*** CID 1465791:  Uninitialized variables  (UNINIT)
/qemu/target/ppc/dfp_helper.c: 1202 in helper_DENBCD()
1196 }    \
1197 dfp_finalize_decimal##size();    \
1198 dfp_set_FPRF_from_FRT(); \
1199 set_dfp##size(t, );   \
1200 }
1201

    CID 1465791:  Uninitialized variables  (UNINIT)
    Using uninitialized element of array "dfp.vt" when calling "set_dfp64".

1202 DFP_HELPER_ENBCD(DENBCD, 64)
1203 DFP_HELPER_ENBCD(DENBCDQ, 128)


Hi Cédric,

The only change was the helper name, which is now uppercase, so nothing
new here. The underlying cause is that dfp_finalize_decimal64 only sets
dfp->vt.VsrD(1) and set_dfp64 receives a pointer to the complete struct.

But since set_dfp64 also only accesses VsrD(1), it shouldn't be a real
problem AFAICT. The same applies to CID 1465776~1465786 and 1465788~1465790.
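The argument can be illustrated with a reduced model. All names here are hypothetical (`VsrSketch` only loosely mimics a two-doubleword register image and is not QEMU's actual type): the producer writes only doubleword 1, and the consumer receives a pointer to the whole struct but reads only doubleword 1, so an analyzer that tracks the array as a whole may flag the call even though no uninitialized value is ever read.

```c
#include <stdint.h>

/* Hypothetical two-doubleword register image. */
typedef struct {
    uint64_t u64[2];
} VsrSketch;

typedef struct {
    VsrSketch vt;   /* only u64[1] is meaningful for the 64-bit case */
} DfpSketch;

/* 64-bit "finalize" step: writes only the doubleword it owns. */
static void finalize64_sketch(DfpSketch *dfp, uint64_t result)
{
    dfp->vt.u64[1] = result;
}

/* Consumer gets the whole struct but reads only that same doubleword. */
static void set_dfp64_sketch(uint64_t *t, const VsrSketch *val)
{
    *t = val->u64[1];
}
```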



** CID 1465787:    (BAD_SHIFT)
/qemu/target/ppc/int_helper.c: 369 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 370 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 356 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 356 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 356 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 369 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 370 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 370 in helper_CFUGED()
/qemu/target/ppc/int_helper.c: 369 in helper_CFUGED()


 


*** CID 1465787:    (BAD_SHIFT)
/qemu/target/ppc/int_helper.c: 369 in helper_CFUGED()
363 /*
364  * Discards the processed bits from 'src' and 'mask'. Note that we are
365  * removing 'n' trailing zeros from 'mask', but the logical shift will
366  * add 'n' leading zeros back, so the population count of 'mask' is kept
367  * the same.
368  */

    CID 1465787:    (BAD_SHIFT)
    In expression "src >>= n", right shifting by more than 63 bits has undefined behavior.  The shift amount, "n", is as much as 64.


Similar case here: the helper was just renamed. The value of "n" comes
from ctz64(mask), and mask == 0 is a trivial case handled before anything
else.
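Why the shift cannot reach 64 once the zero mask is handled first can be shown with a small sketch. `ctz64_sketch()` mimics QEMU's `ctz64()`, which returns 64 only for a zero argument; `drop_trailing_sketch()` is a hypothetical reduction of the step described in the quoted comment, not the actual helper_CFUGED() code.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-trailing-zeros that, like QEMU's ctz64(), returns 64 for 0. */
static int ctz64_sketch(uint64_t x)
{
    int n = 0;

    if (x == 0) {
        return 64;
    }
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/*
 * Discard the trailing zero bits of *mask and the same number of low bits
 * of src.  Returning early for mask == 0 keeps n at most 63, so both shifts
 * are well defined; the popcount of *mask is unchanged.
 */
static uint64_t drop_trailing_sketch(uint64_t src, uint64_t *mask)
{
    int n;

    if (*mask == 0) {
        return src;
    }
    n = ctz64_sketch(*mask);
    assert(n < 64);
    *mask >>= n;
    return src >> n;
}
```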


Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO 
Analista de Software
Aviso Legal - Disclaimer 



Re: [PATCH v2 10/10] iotests/030: Unthrottle parallel jobs in reverse

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

See the comment for why this is necessary.

Signed-off-by: Hanna Reitz 
---
  tests/qemu-iotests/030 | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 5fb65b4bef..567bf1da67 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -251,7 +251,16 @@ class TestParallelOps(iotests.QMPTestCase):
   speed=1024)
  self.assert_qmp(result, 'return', {})
  
-for job in pending_jobs:
+# Do this in reverse: After unthrottling them, some jobs may finish
+# before we have unthrottled all of them.  This will drain their
+# subgraph, and this will make jobs above them advance (despite those
+# jobs on top being throttled).  In the worst case, all jobs below the
+# top one are finished before we can unthrottle it, and this makes it
+# advance so far that it completes before we can unthrottle it - which
+# results in an error.
+# Starting from the top (i.e. in reverse) does not have this problem:
+# When a job finishes, the ones below it are not advanced.


Hmm, interesting: why can only jobs above the finished job advance in this
situation?

It looks like something may change in the future and make this workaround stop working.

Isn't it better to just handle the error, and not care whether the job has just finished?

Something like

if 'error' in result:
    # Job was finished during drain caused by finish of already unthrottled job
    self.assert_qmp(result, 'error/class', 'DeviceNotActive')
else:
    self.assert_qmp(result, 'return', {})

The next thing in the test case is checking for completion events, so we'll get all
events anyway.



+for job in reversed(pending_jobs):
  result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
  self.assert_qmp(result, 'return', {})
  




--
Best regards,
Vladimir



Re: [PATCH v2 09/10] block: Let replace_child_noperm free children

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.

Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.

In that context, it also makes sense for this function to free the
child.  Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.

In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially.  Make a note of that and assert it.

Signed-off-by: Hanna Reitz 
---
  block.c | 102 +++-
  1 file changed, 79 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index a40027161c..0ac5b163d2 100644
--- a/block.c
+++ b/block.c
@@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
  static bool bdrv_recurse_has_child(BlockDriverState *bs,
 BlockDriverState *child);
  
+static void bdrv_child_free(BdrvChild *child);
  static void bdrv_replace_child_noperm(BdrvChild **child,
-  BlockDriverState *new_bs);
+  BlockDriverState *new_bs,
+  bool free_empty_child);
  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
BdrvChild *child,
Transaction *tran);
@@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
  BdrvChild *child;
  BdrvChild **childp;
  BlockDriverState *old_bs;
+bool free_empty_child;
  } BdrvReplaceChildState;
  
  static void bdrv_replace_child_commit(void *opaque)
  {
  BdrvReplaceChildState *s = opaque;
  
+if (s->free_empty_child && !s->child->bs) {
+bdrv_child_free(s->child);
+}
  bdrv_unref(s->old_bs);
  }
  
@@ -2278,22 +2284,26 @@ static void bdrv_replace_child_abort(void *opaque)
   * modify the BdrvChild * pointer we indirectly pass to it, i.e. it
   * will not modify s->child.  From that perspective, it does not matter
   * whether we pass s->childp or >child.
- * (TODO: Right now, bdrv_replace_child_noperm() never modifies that
- * pointer anyway (though it will in the future), so at this point it
- * absolutely does not matter whether we pass s->childp or >child.)
   * (2) If new_bs is not NULL, s->childp will be NULL.  We then cannot use
   * it here.
   * (3) If new_bs is NULL, *s->childp will have been NULLed by
   * bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and we
   * must not pass a NULL *s->childp here.
- * (TODO: In its current state, bdrv_replace_child_noperm() will not
- * have NULLed *s->childp, so this does not apply yet.  It will in the
- * future.)


What I don't like about this patch is that it does two different things:
zeroing the pointer and freeing the object. And if we look at the latter in
isolation, it seems that it's not needed:

Look: bdrv_replace_child_tran()'s new parameter is set to true in two places, and
in both of them we are sure (and have an assertion and a comment) that new_bs is
not NULL, so nothing will be freed.

Similarly, bdrv_replace_child_noperm() is called with true in two places where
we are sure that new_bs is not NULL.

And there is only one place where setting the new parameter to true really does something:


@@ -2960,8 +3013,7 @@ static void bdrv_detach_child(BdrvChild **childp)
  {
  BlockDriverState *old_bs = (*childp)->bs;
  
-bdrv_replace_child_noperm(childp, NULL);
-bdrv_child_free(*childp);
+bdrv_replace_child_noperm(childp, NULL, true);
  
  if (old_bs) {
  /*


And it isn't worth the complexity of adding new parameters to two functions.

In this place we can simply do something like

BdrvChild *child = *childp;

bdrv_replace_child_noperm(childp, NULL);

bdrv_child_free(child);


I understand the idea: it seems good and intuitive to zero the pointer and free
the object in one shot. But this patch itself shows that we just can't do it in 90% of
cases. So I think it is better not to do it and live with only the "zeroing the pointer"
part of this patch.
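The alternative above can be sketched with toy types. `Node`, `Child`, `replace_child_noperm_sketch()` and `detach_child_sketch()` are hypothetical simplifications with no permissions, refcounting or transactions; the only point illustrated is that once the replace function NULLs `*childp`, a caller that wants to free the child must save the pointer beforehand.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for BlockDriverState and BdrvChild. */
typedef struct Node { int refcnt; /* unused here */ } Node;
typedef struct Child { Node *bs; } Child;

/*
 * Like the patched bdrv_replace_child_noperm(): when the child loses its
 * node, also clear the parent's pointer to it (e.g. bs->file) through the
 * indirect pointer childp.  It never frees anything itself.
 */
static void replace_child_noperm_sketch(Child **childp, Node *new_bs)
{
    Child *child = *childp;

    child->bs = new_bs;
    if (!new_bs) {
        *childp = NULL;
    }
}

/* Caller-side freeing: read *childp before it gets NULLed, then free it. */
static void detach_child_sketch(Child **childp)
{
    Child *child = *childp;

    replace_child_noperm_sketch(childp, NULL);
    free(child);
}
```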





Another idea that came to my mind while reviewing this series: did you consider zeroing
bs->file / bs->backing in .detach, like you do with the bs->children list at the start of
the series?  We can 

Re: [RFC PATCH v1 1/3] virtio: introduce virtio_force_modern()

2021-11-12 Thread Cornelia Huck
On Fri, Nov 12 2021, Halil Pasic  wrote:

> On Fri, 29 Oct 2021 16:53:25 +0200
> Cornelia Huck  wrote:
>
>> On Fri, Oct 29 2021, Halil Pasic  wrote:
>> 
>> > Legacy vs modern should be detected via transport specific means. We
>> > can't wait till feature negotiation is done. Let us introduce
>> > virtio_force_modern() as a means for the transport code to signal
>> > that the device should operate in modern mode (because a modern driver
>> > was detected).
>> >
>> > Signed-off-by: Halil Pasic 
>> > ---
>> >
>> > I'm still struggling with how to deal with vhost-user and co. The
>> > problem is that I'm not very familiar with the life-cycle of, let us
>> > say, a vhost_user device.
>> >
>> > Looks to me like the vhost part might be just an implementation detail,
>> > and could even become a hot swappable thing.
>> >
>> > Another thing is that vhost processes set_features differently. It
>> > might or might not be a good idea to change this.
>> >
>> > Does anybody know why we don't propagate the features on features_set,
>> > but under a set of different conditions, one of which is that the vhost
>> > device is started?
>> > ---
>> >  hw/virtio/virtio.c | 12 
>> >  include/hw/virtio/virtio.h |  1 +
>> >  2 files changed, 13 insertions(+)
>> >
>> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> > index 3a1f6c520c..75aee0e098 100644
>> > --- a/hw/virtio/virtio.c
>> > +++ b/hw/virtio/virtio.c
>> > @@ -3281,6 +3281,18 @@ void virtio_init(VirtIODevice *vdev, const char *name,
>> >  vdev->use_guest_notifier_mask = true;
>> >  }
>> >  
>> > +void  virtio_force_modern(VirtIODevice *vdev)  
>> 
>>  I'm not sure I like that name. We're not actually forcing the
>> device to be modern, we just set an early indication in the device
>> before proper feature negotiation has finished. Maybe
>> virtio_indicate_modern()? 
>
>
> I don't like virtio_indicate_modern(dev) from an object orientation
> perspective. In an OO language one would write it like
> dev.virtio_indicate_modern()
> which would read like the device should indicate modern to somebody.

I think that is actually what happens: we indicate that it is a modern
device to the code making the endianness decisions.

>
> In my opinion what happens is that we want to disable the legacy
> interface if it is exposed by the device, or in other words instruct the
> device that it should act (precisely and exclusively) according to the
> interface specification of the modern interface.

I don't see us disabling anything; the driver has already chosen what
they want, and we simply need to make sure that all code honours that
decision.

>
> Maybe we can find a better name than force_modern, but I don't think
> indicate_modern is a better name.
>
>> 
>> > +{
>> > +/*
>> > + * This takes care of the devices that implement config space access
>> > + * in QEMU. For vhost-user and similar we need to make sure the features
>> > + * are actually propagated to the device implementing the config space.
>> > + *
>> > + * A VirtioDeviceClass callback may be a good idea.
>> > + */
>> > +virtio_set_features(vdev, (1ULL << VIRTIO_F_VERSION_1));  
>> 
>> Do we really need/want to do the whole song-and-dance for setting
>> features, just for setting VERSION_1? 
>
> When doing the whole song-and-dance the chance is higher that the
> information will propagate to every place it needs to reach. For
> example to the acked_features of vhost_dev. I've just posted a v2 RFC.
> It should not be hard to see what I mean after examining that RFC.
>
>> Devices may modify some of their
>> behaviour or features, depending on what features they are called with,
>
> I believe, if this is the case, we want the behavior that corresponds to
> VERSION_1 set, i.e. 'modern'. So in my understanding this is good rather
> than bad.
>
>> and we will be calling this one again later with what is likely a
>> different feature set. 
>
> That is true, but the driver is allowed to set the features multiple
> times, and since transports only support piecemeal access to the
> features (32 bits at a time), I guess this is biz as usual.

Also see my comment in the v2: I'm not sure how well tested that
actually is.

>
>>Also, the return code is not checked.
>> 
>
> That is true! It might be a good idea to log an error. Unfortunately I
> don't think there is anything else we can sanely do.
>
>> Maybe introduce a new function that sets guest_features directly and
>> errors out if the features are not set in host_features? 
>
> See above.
>
>> If we try to
>> set VERSION_1 here despite the device not offering it, we are in a
>> pickle anyway, as we should not have gotten here if we did not offer it,
>> and we really should moan and fail in that case.
>
> I agree about the moan part. I'm not sure what is the best way to
> 'fail'. Maybe we should continue this discussion in the v2 thread.

Yeah, let's continue there, since that code is a bit different.

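The alternative raised in this exchange (set `guest_features` directly and fail loudly if the device never offered VERSION_1) might look roughly like this. It is a sketch under stated assumptions, not QEMU code: `FakeVirtIODevice` and `indicate_modern_sketch()` are hypothetical stand-ins for `VirtIODevice` and the function being discussed; the only external fact relied on is that VIRTIO_F_VERSION_1 is feature bit 32 in the virtio 1.x specification.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Feature bit 32 per the virtio 1.x specification. */
#define VIRTIO_F_VERSION_1 32

/* Hypothetical stand-in for the relevant VirtIODevice fields. */
typedef struct {
    uint64_t host_features;   /* what the device offers */
    uint64_t guest_features;  /* what the driver has accepted so far */
} FakeVirtIODevice;

/*
 * Set VERSION_1 in guest_features without running the full set_features
 * processing.  Complains and returns false if the device never offered
 * VERSION_1, which should not happen once a modern driver was detected.
 */
static bool indicate_modern_sketch(FakeVirtIODevice *vdev)
{
    uint64_t bit = 1ULL << VIRTIO_F_VERSION_1;

    if (!(vdev->host_features & bit)) {
        fprintf(stderr, "modern driver detected, but VERSION_1 not offered\n");
        return false;
    }
    vdev->guest_features |= bit;
    return true;
}
```

Unlike going through `virtio_set_features()`, this skips any device-specific processing: fewer side effects, but also nothing is propagated to, for example, a vhost backend, which is exactly the trade-off being debated.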



Re: [PATCH v4 14/25] include/systemu/blockdev.h: global state API

2021-11-12 Thread Hanna Reitz

Subject: s/systemu/sysemu/

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

blockdev functions always run under the BQL.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  include/sysemu/blockdev.h | 18 ++
  1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 960b54d320..b07f15df09 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -13,9 +13,6 @@
  #include "block/block.h"
  #include "qemu/queue.h"
  
-void blockdev_mark_auto_del(BlockBackend *blk);
-void blockdev_auto_del(BlockBackend *blk);
-
  typedef enum {
  IF_DEFAULT = -1,/* for use with drive_add() only */
  /*
@@ -40,6 +37,16 @@ struct DriveInfo {
  QTAILQ_ENTRY(DriveInfo) next;
  };
  
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
+void blockdev_mark_auto_del(BlockBackend *blk);
+void blockdev_auto_del(BlockBackend *blk);
+
  DriveInfo *blk_legacy_dinfo(BlockBackend *blk);
  DriveInfo *blk_set_legacy_dinfo(BlockBackend *blk, DriveInfo *dinfo);
  BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo);
@@ -50,10 +57,13 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit);
  void drive_check_orphaned(void);
  DriveInfo *drive_get_by_index(BlockInterfaceType type, int index);
  int drive_get_max_bus(BlockInterfaceType type);
-int drive_get_max_devs(BlockInterfaceType type);
  DriveInfo *drive_get_next(BlockInterfaceType type);
  
  DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type,
   Error **errp);
  
+/* Common functions that are neither I/O nor Global State */
+
+int drive_get_max_devs(BlockInterfaceType type);
+


It seems to me like this function is never used and could just be 
dropped.  In any case, if it were used, it looks to me like it’d be used 
in a GS context.  (Not that I know anything about it, but I don’t see 
what makes it different from the other functions here.)


Hanna




Re: [RFC PATCH v1 1/3] virtio: introduce virtio_force_modern()

2021-11-12 Thread Halil Pasic
On Fri, 29 Oct 2021 16:53:25 +0200
Cornelia Huck  wrote:

> On Fri, Oct 29 2021, Halil Pasic  wrote:
> 
> > Legacy vs modern should be detected via transport specific means. We
> > can't wait till feature negotiation is done. Let us introduce
> > virtio_force_modern() as a means for the transport code to signal
> > that the device should operate in modern mode (because a modern driver
> > was detected).
> >
> > Signed-off-by: Halil Pasic 
> > ---
> >
> > I'm still struggling with how to deal with vhost-user and co. The
> > problem is that I'm not very familiar with the life-cycle of, let us
> > say, a vhost_user device.
> >
> > Looks to me like the vhost part might be just an implementation detail,
> > and could even become a hot swappable thing.
> >
> > Another thing is that vhost processes set_features differently. It
> > might or might not be a good idea to change this.
> >
> > Does anybody know why we don't propagate the features on features_set,
> > but under a set of different conditions, one of which is that the vhost
> > device is started?
> > ---
> >  hw/virtio/virtio.c | 12 
> >  include/hw/virtio/virtio.h |  1 +
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 3a1f6c520c..75aee0e098 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -3281,6 +3281,18 @@ void virtio_init(VirtIODevice *vdev, const char *name,
> >  vdev->use_guest_notifier_mask = true;
> >  }
> >  
> > +void  virtio_force_modern(VirtIODevice *vdev)  
> 
>  I'm not sure I like that name. We're not actually forcing the
> device to be modern, we just set an early indication in the device
> before proper feature negotiation has finished. Maybe
> virtio_indicate_modern()? 


I don't like virtio_indicate_modern(dev) from an object orientation
perspective. In an OO language one would write it like
dev.virtio_indicate_modern()
which would read like the device should indicate modern to somebody.

In my opinion what happens is that we want to disable the legacy
interface if it is exposed by the device, or in other words instruct the
device that it should act (precisely and exclusively) according to the
interface specification of the modern interface.

Maybe we can find a better name than force_modern, but I don't think
indicate_modern is a better name.

> 
> > +{
> > +/*
> > + * This takes care of the devices that implement config space access
> > + * in QEMU. For vhost-user and similar we need to make sure the features
> > + * are actually propagated to the device implementing the config space.
> > + *
> > + * A VirtioDeviceClass callback may be a good idea.
> > + */
> > +virtio_set_features(vdev, (1ULL << VIRTIO_F_VERSION_1));  
> 
> Do we really need/want to do the whole song-and-dance for setting
> features, just for setting VERSION_1? 

When doing the whole song-and-dance the chance is higher that the
information will propagate to every place it needs to reach. For
example to the acked_features of vhost_dev. I've just posted a v2 RFC.
It should not be hard to see what I mean after examining that RFC.

> Devices may modify some of their
> behaviour or features, depending on what features they are called with,

I believe, if this is the case, we want the behavior that corresponds to
VERSION_1 set, i.e. 'modern'. So in my understanding this is good rather
than bad.

> and we will be calling this one again later with what is likely a
> different feature set. 

That is true, but the driver is allowed to set the features multiple
times, and since transports only support piecemeal access to the
features (32 bits at a time), I guess this is biz as usual.

>Also, the return code is not checked.
> 

That is true! It might be a good idea to log an error. Unfortunately I
don't think there is anything else we can sanely do.

> Maybe introduce a new function that sets guest_features directly and
> errors out if the features are not set in host_features? 

See above.

> If we try to
> set VERSION_1 here despite the device not offering it, we are in a
> pickle anyway, as we should not have gotten here if we did not offer it,
> and we really should moan and fail in that case.

I agree about the moan part. I'm not sure what is the best way to
'fail'. Maybe we should continue this discussion in the v2 thread.


Thanks for your feedback! Sorry I didn't answer before sending out a v2.

Regards,
Halil

> 
> > +}
> > +
> >  /*
> >   * Only devices that have already been around prior to defining the virtio
> >   * standard support legacy mode; this includes devices not specified in the
> 

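The "32 bits at a time" point refers to transports that expose the 64-bit feature set through a select register and a 32-bit data window (virtio-pci's common configuration, for instance, has `driver_feature_select` and `driver_feature`). A minimal sketch with hypothetical names shows that VERSION_1 (bit 32) only shows up once the upper word is written, so a device that reacts to every such write first sees an intermediate feature set.

```c
#include <stdint.h>

/* Hypothetical device-side view of a transport with a 32-bit feature window. */
typedef struct {
    uint32_t driver_feature_select;  /* 0 selects bits 0-31, 1 selects bits 32-63 */
    uint64_t driver_features;        /* accumulated 64-bit value */
} FeatureWindowSketch;

/* Store one 32-bit half of the driver features, keeping the other half. */
static void write_driver_feature(FeatureWindowSketch *w, uint32_t value)
{
    unsigned shift = w->driver_feature_select ? 32 : 0;
    uint64_t mask = 0xffffffffULL << shift;

    w->driver_features = (w->driver_features & ~mask) | ((uint64_t)value << shift);
}
```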



Re: [PATCH v4 13/25] include/sysemu/blockdev.h: move drive_add and inline drive_def

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

drive_add is only used in softmmu/vl.c, so it can be a static
function there, and drive_def is only a particular use case of
qemu_opts_parse_noisily, so it can be inlined.

Also remove drive_mark_claimed_by_board, as it is only declared
but not implemented (nor used) anywhere.

This also helps simplify the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
  block/monitor/block-hmp-cmds.c |  2 +-
  blockdev.c | 27 +--
  include/sysemu/blockdev.h  |  6 ++
  softmmu/vl.c   | 25 -
  4 files changed, 28 insertions(+), 32 deletions(-)


[...]


diff --git a/blockdev.c b/blockdev.c
index c1f6171c6c..1bf49ef610 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -73,7 +73,7 @@ void bdrv_set_monitor_owned(BlockDriverState *bs)
  QTAILQ_INSERT_TAIL(_bdrv_states, bs, monitor_list);
  }
  
-static const char *const if_name[IF_COUNT] = {
+const char *const if_name[IF_COUNT] = {


When making this global, I’d give its name a prefix, like 
`block_if_name` (or even `block_if_type_to_name`).


Hanna


  [IF_NONE] = "none",
  [IF_IDE] = "ide",
  [IF_SCSI] = "scsi",

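The naming suggestion can be sketched as a prefixed global table built with designated initializers, plus an optional bounds-checked accessor. The enum below is a hypothetical, trimmed mirror of `BlockInterfaceType` containing only the three entries visible in the quoted hunk, and `block_if_name_lookup()` is an illustrative addition, not something proposed in the thread.

```c
#include <stddef.h>

/* Hypothetical, trimmed mirror of BlockInterfaceType. */
typedef enum {
    BLK_IF_NONE,
    BLK_IF_IDE,
    BLK_IF_SCSI,
    BLK_IF_COUNT
} BlkIfType;

/* Prefixed name, so the table does not pollute the namespace once global. */
const char *const block_if_name[BLK_IF_COUNT] = {
    [BLK_IF_NONE] = "none",
    [BLK_IF_IDE]  = "ide",
    [BLK_IF_SCSI] = "scsi",
};

/* Bounds-checked lookup; returns NULL for out-of-range values. */
static const char *block_if_name_lookup(BlkIfType type)
{
    if ((unsigned)type >= (unsigned)BLK_IF_COUNT) {
        return NULL;
    }
    return block_if_name[type];
}
```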




Re: [RFC PATCH v2 1/5] virtio: introduce virtio_force_modern()

2021-11-12 Thread Cornelia Huck
On Fri, Nov 12 2021, Halil Pasic  wrote:

> Legacy vs modern should be detected via transport specific means. We
> can't wait till feature negotiation is done. Let us introduce
> virtio_force_modern() as a means for the transport code to signal
> that the device should operate in modern mode (because a modern driver
> was detected).
>
> A new callback is added for the situations where the device needs
> to do more than just setting the VIRTIO_F_VERSION_1 feature bit. For
> example, when vhost is involved, we may need to propagate the features
> to the vhost device.
>
> Signed-off-by: Halil Pasic 
> ---
>
> I'm still struggling with how to deal with vhost-user and co. The
> problem is that I'm not very familiar with the life-cycle of, let us
> say, a vhost_user device.
>
> Looks to me like the vhost part might be just an implementation detail,
> and could even become a hot swappable thing.
>
> Another thing is that vhost processes set_features differently. It
> might or might not be a good idea to change this.
>
> Does anybody know why we don't propagate the features on features_set,
> but under a set of different conditions, one of which is that the vhost
> device is started?
> ---
>  hw/virtio/virtio.c | 13 +
>  include/hw/virtio/virtio.h |  2 ++
>  2 files changed, 15 insertions(+)
>

Did you see my feedback in
https://lore.kernel.org/qemu-devel/87tugzc26y@redhat.com/? I think
at least some of it still applies.

> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 3a1f6c520c..26db1b31e6 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3281,6 +3281,19 @@ void virtio_init(VirtIODevice *vdev, const char *name,
>  vdev->use_guest_notifier_mask = true;
>  }
>  
> +void  virtio_force_modern(VirtIODevice *vdev)

I'd still prefer to call this virtio_indicate_modern: we don't really
force anything; the driver has simply already decided that it will use
the modern interface and we provide an early indication in the features
so that code looking at them makes the right decisions.

> +{
> +    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> +
> +    virtio_add_feature(&vdev->guest_features, VIRTIO_F_VERSION_1);
> +    /* Let the device do its normal thing. */
> +    virtio_set_features(vdev, vdev->guest_features);

I don't think this is substantially different from setting VERSION_1
only: At least the callers you introduce call this during reset,
i.e. when guest_features is 0 anyway. We still have the whole processing
that is done after feature setting that may have effects different from
what the ultimate feature setting will give us. While I don't think
calling set_features twice is forbidden, that sequence is likely quite
untested, and I'm not sure we can exclude side effects.
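To make the point about reset concrete, here is a minimal plain-C sketch. It assumes, as the virtio 1.x specification defines, that VIRTIO_F_VERSION_1 is feature bit 32, and models the feature word as a uint64_t; add_feature(), has_feature(), features_after_early_indication() and SKETCH_F_VERSION_1 are hypothetical stand-ins, not QEMU's virtio_add_feature() or its VIRTIO_F_VERSION_1 definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: VERSION_1 is feature bit 32, as in the virtio 1.x spec. */
#define SKETCH_F_VERSION_1 32

/* Hypothetical helper: set one bit in a 64-bit feature word. */
static void add_feature(uint64_t *features, unsigned int fbit)
{
    assert(fbit < 64);
    /* 1ULL is required: bit 32 does not fit in a plain int shift. */
    *features |= 1ULL << fbit;
}

static bool has_feature(uint64_t features, unsigned int fbit)
{
    assert(fbit < 64);
    return (features & (1ULL << fbit)) != 0;
}

/*
 * During reset guest_features is 0, so "add VERSION_1, then set the
 * features" leaves exactly one bit set.
 */
static uint64_t features_after_early_indication(uint64_t guest_features)
{
    add_feature(&guest_features, SKETCH_F_VERSION_1);
    return guest_features;
}
```

Starting from 0, the early indication yields a word with only bit 32 set, so a following set_features() call sees nothing beyond VERSION_1; any extra processing it triggers is exactly the "set features twice" sequence the review is worried about.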

> +    /* For example for vhost-user we have to propagate to the vhost dev. */
> +    if (k->force_modern) {
> +        k->force_modern(vdev);
> +    }
> +}
> +
>  /*
>   * Only devices that have already been around prior to defining the virtio
>   * standard support legacy mode; this includes devices not specified in the




Re: [PATCH v2 08/10] block: Let replace_child_tran keep indirect pointer

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

As of a future commit, bdrv_replace_child_noperm() will clear the
indirect BdrvChild pointer passed to it if the new child BDS is NULL.
bdrv_replace_child_tran() will want to let it do that, but revert this
change in its abort handler.  For that, we need to have it receive a
BdrvChild ** pointer, too, and keep it stored in the
BdrvReplaceChildState object that we attach to the transaction.

Note that we do not need to store it in the BdrvReplaceChildState when
new_bs is not NULL, because then there is nothing to revert.  This is
important so that bdrv_replace_node_noperm() can pass a pointer to a
loop-local variable to bdrv_replace_child_tran() without worrying that
this pointer will outlive one loop iteration.

(Of course, for that to work, bdrv_replace_node_noperm() and in turn
bdrv_replace_node() and its relatives may not be called with a NULL @to
node.  Luckily, they already are not, but now we should assert this.)

bdrv_remove_file_or_backing_child() on the other hand needs to ensure
that the indirect pointer it passes will stay valid for the duration of
the transaction.  Ensure this by keeping a strong reference to the BDS
whose &bs->backing or &bs->file it passes to bdrv_replace_child_tran(),
and giving up that reference only in the transaction .clean() handler.
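A stripped-down model of the pattern described above may help. Node, Child, ReplaceChildState and the three functions are hypothetical stand-ins for BlockDriverState, BdrvChild, BdrvReplaceChildState and the real transaction callbacks; permissions, refcounting and the Transaction object itself are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BlockDriverState and BdrvChild. */
typedef struct Node { const char *name; } Node;
typedef struct Child { Node *bs; } Child;

/* Modelled loosely on BdrvReplaceChildState. */
typedef struct ReplaceChildState {
    Child *child;     /* the child being modified */
    Child **childp;   /* kept only when new_bs == NULL */
    Node *old_bs;     /* restored on abort */
} ReplaceChildState;

static void replace_child_noperm(Child **childp, Node *new_bs)
{
    (*childp)->bs = new_bs;
    if (new_bs == NULL) {
        *childp = NULL;   /* clear the indirect pointer */
    }
}

static void replace_child_tran(Child **childp, Node *new_bs,
                               ReplaceChildState *s)
{
    s->child = *childp;
    s->old_bs = (*childp)->bs;
    /*
     * Keep childp only if it is about to be cleared.  Otherwise there is
     * nothing to revert, and childp may point to a loop-local variable.
     */
    s->childp = new_bs ? NULL : childp;
    replace_child_noperm(childp, new_bs);
}

static void replace_child_abort(ReplaceChildState *s)
{
    assert(s->child != NULL);
    s->child->bs = s->old_bs;
    if (s->childp) {
        *s->childp = s->child;   /* undo the clearing */
    }
}
```

Because the state stores the BdrvChild itself in addition to the indirect pointer, an abort can always restore the child's node, and it only dereferences childp in the one case where that pointer had to stay valid.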

Signed-off-by: Hanna Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir



Re: [PATCH v4 12/25] assertions for blockob.h global state API

2021-11-12 Thread Hanna Reitz

Subject: s/blockob.h/blockjob.h/

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  blockjob.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index fbd6c7d873..4982f6a2b5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -61,6 +61,7 @@ static bool is_block_job(Job *job)
  
  BlockJob *block_job_next(BlockJob *bjob)
  {
+assert(qemu_in_main_thread());
  Job *job = bjob ? &bjob->job : NULL;

Here and...


  do {
@@ -72,6 +73,7 @@ BlockJob *block_job_next(BlockJob *bjob)
  
  BlockJob *block_job_get(const char *id)
  {
+assert(qemu_in_main_thread());
  Job *job = job_get(id);


...here, the assertion and declaration should be switched.

Hanna




[PATCH v5 10/18] target/riscv: support for 128-bit bitwise instructions

2021-11-12 Thread Frédéric Pétrot
The 128-bit bitwise instructions do not need any function prototype change,
as the functions can be applied independently to the lower and upper parts
of the registers.
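The register-immediate case is the only subtle one: the immediate is sign-extended to the full register width, so the upper half of the 128-bit operand is all ones for a negative immediate and zero otherwise, which is what -(a->imm < 0) computes in gen_logic_imm_fn(). A minimal plain-C sketch, assuming a hypothetical U128 type made of two uint64_t halves (the patch itself works on TCGv pairs):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit register value split into two halves. */
typedef struct U128 { uint64_t lo, hi; } U128;

/* Register-register: apply the same 64-bit operation to each half. */
static U128 xor128(U128 a, U128 b)
{
    U128 r = { a.lo ^ b.lo, a.hi ^ b.hi };
    return r;
}

/*
 * Register-immediate: the 12-bit immediate is sign-extended to 128 bits,
 * so its upper half is all ones if imm < 0 and zero otherwise.
 */
static U128 andi128(U128 a, int64_t imm)
{
    assert(imm >= -2048 && imm <= 2047);    /* I-type immediate range */
    uint64_t imm_hi = -(uint64_t)(imm < 0);  /* 0 or all ones */
    U128 r = { a.lo & (uint64_t)imm, a.hi & imm_hi };
    return r;
}
```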

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 target/riscv/translate.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 554cf05084..508ae87985 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -448,7 +448,15 @@ static bool gen_logic_imm_fn(DisasContext *ctx, arg_i *a,
 
     func(dest, src1, a->imm);
 
-    gen_set_gpr(ctx, a->rd, dest);
+    if (get_xl(ctx) == MXL_RV128) {
+        TCGv src1h = get_gprh(ctx, a->rs1);
+        TCGv desth = dest_gprh(ctx, a->rd);
+
+        func(desth, src1h, -(a->imm < 0));
+        gen_set_gpr128(ctx, a->rd, dest, desth);
+    } else {
+        gen_set_gpr(ctx, a->rd, dest);
+    }
 
     return true;
 }
@@ -462,7 +470,16 @@ static bool gen_logic(DisasContext *ctx, arg_r *a,
 
     func(dest, src1, src2);
 
-    gen_set_gpr(ctx, a->rd, dest);
+    if (get_xl(ctx) == MXL_RV128) {
+        TCGv src1h = get_gprh(ctx, a->rs1);
+        TCGv src2h = get_gprh(ctx, a->rs2);
+        TCGv desth = dest_gprh(ctx, a->rd);
+
+        func(desth, src1h, src2h);
+        gen_set_gpr128(ctx, a->rd, dest, desth);
+    } else {
+        gen_set_gpr(ctx, a->rd, dest);
+    }
 
     return true;
 }
-- 
2.33.1




[PATCH v5 14/18] target/riscv: support for 128-bit M extension

2021-11-12 Thread Frédéric Pétrot
Multiplications are generated inline (using a cool trick pointed out by
Richard), but for div and rem, given the complexity of implementing these
instructions, we call helpers to produce their behavior. From an
implementation standpoint, the helpers return the low part of the results,
while the high part is temporarily stored in a dedicated field of cpu_env
that is used to update the architectural register in the generation wrapper.

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.h  |   3 +
 target/riscv/helper.h   |   6 +
 target/riscv/insn32.decode  |   7 +
 target/riscv/m128_helper.c  | 109 ++
 target/riscv/insn_trans/trans_rvm.c.inc | 183 ++--
 target/riscv/meson.build|   1 +
 6 files changed, 296 insertions(+), 13 deletions(-)
 create mode 100644 target/riscv/m128_helper.c

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 8ff5b08d15..ae1f9cb876 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -143,6 +143,9 @@ struct CPURISCVState {
 uint32_t misa_ext;  /* current extensions */
 uint32_t misa_ext_mask; /* max ext for this cpu */
 
+/* 128-bit helpers upper part return value */
+target_ulong retxh;
+
 uint32_t features;
 
 #ifdef CONFIG_USER_ONLY
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c7a5376227..c036825723 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1147,3 +1147,9 @@ DEF_HELPER_6(vcompress_vm_b, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+/* 128-bit integer multiplication and division */
+DEF_HELPER_5(divu_i128, tl, env, tl, tl, tl, tl)
+DEF_HELPER_5(divs_i128, tl, env, tl, tl, tl, tl)
+DEF_HELPER_5(remu_i128, tl, env, tl, tl, tl, tl)
+DEF_HELPER_5(rems_i128, tl, env, tl, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index afaf243b4e..16d40362e6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -198,6 +198,13 @@ divuw001 .  . 101 . 0111011 @r
 remw 001 .  . 110 . 0111011 @r
 remuw001 .  . 111 . 0111011 @r
 
+# *** RV128M Standard Extension (in addition to RV64M) ***
+muld 001 .  . 000 . 011 @r
+divd 001 .  . 100 . 011 @r
+divud001 .  . 101 . 011 @r
+remd 001 .  . 110 . 011 @r
+remud001 .  . 111 . 011 @r
+
 # *** RV32A Standard Extension ***
 lr_w   00010 . . 0 . 010 . 010 @atom_ld
 sc_w   00011 . . . . 010 . 010 @atom_st
diff --git a/target/riscv/m128_helper.c b/target/riscv/m128_helper.c
new file mode 100644
index 00..7bf115b85e
--- /dev/null
+++ b/target/riscv/m128_helper.c
@@ -0,0 +1,109 @@
+/*
+ * RISC-V Emulation Helpers for QEMU.
+ *
+ * Copyright (c) 2016-2017 Sagar Karandikar, sag...@eecs.berkeley.edu
+ * Copyright (c) 2017-2018 SiFive, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "qemu/main-loop.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+
+target_ulong HELPER(divu_i128)(CPURISCVState *env,
+                               target_ulong ul, target_ulong uh,
+                               target_ulong vl, target_ulong vh)
+{
+    target_ulong ql, qh;
+    Int128 q;
+
+    if (vl == 0 && vh == 0) { /* Handle special behavior on div by zero */
+        ql = ~0x0;
+        qh = ~0x0;
+    } else {
+        q = int128_divu(int128_make128(ul, uh), int128_make128(vl, vh));
+        ql = int128_getlo(q);
+        qh = int128_gethi(q);
+    }
+
+    env->retxh = qh;
+    return ql;
+}
+
+target_ulong HELPER(remu_i128)(CPURISCVState *env,
+                               target_ulong ul, target_ulong uh,
+                               target_ulong vl, target_ulong vh)
+{
+    target_ulong rl, rh;
+    Int128 r;
+
+    if (vl == 0 && vh == 0) {
+        rl = ul;
+        rh = uh;
+    } else {
+        r = int128_remu(int128_make128(ul, uh), int128_make128(vl, vh));
+        rl = int128_getlo(r);
+        rh = int128_gethi(r);
+    }
+
+env->retxh = 

[PATCH v5 15/18] target/riscv: adding high part of some csrs

2021-11-12 Thread Frédéric Pétrot
Add the high part of a very minimal set of CSRs.

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.h | 4 
 target/riscv/machine.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index ae1f9cb876..15609a5533 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -195,6 +195,10 @@ struct CPURISCVState {
 target_ulong hgatp;
 uint64_t htimedelta;
 
+/* Upper 64-bits of 128-bit CSRs */
+uint64_t mscratchh;
+uint64_t sscratchh;
+
 /* Virtual CSRs */
 /*
  * For RV32 this is 32-bit vsstatus and 32-bit vsstatush.
diff --git a/target/riscv/machine.c b/target/riscv/machine.c
index 7e2d02457e..6f0eabf66a 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -179,6 +179,8 @@ static const VMStateDescription vmstate_rv128 = {
 .needed = rv128_needed,
 .fields = (VMStateField[]) {
 VMSTATE_UINTTL_ARRAY(env.gprh, RISCVCPU, 32),
+VMSTATE_UINT64(env.mscratchh, RISCVCPU),
+VMSTATE_UINT64(env.sscratchh, RISCVCPU),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.33.1




Re: [PATCH v4 10/25] assertions for blockjob_int.h

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  blockjob.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 4bad1408cb..fbd6c7d873 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -83,6 +83,7 @@ BlockJob *block_job_get(const char *id)
  
  void block_job_free(Job *job)
  {
+assert(qemu_in_main_thread());
  BlockJob *bjob = container_of(job, BlockJob, job);


Our coding style (docs/devel/style.rst) requires all statements to come 
after all declarations in a block, so the assert() may not precede the 
bjob declaration.
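For illustration, a compilable sketch of the ordering the style guide asks for: the declaration (with its initializer) comes first and the assertion follows it. BlockJob, Job, in_main_thread() and the simplified container_of() here are hypothetical stand-ins, not QEMU's definitions (QEMU's container_of() also type-checks its argument).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's Job and BlockJob. */
typedef struct Job { int id; } Job;
typedef struct BlockJob { int flags; Job job; } BlockJob;

/* Stand-in for qemu_in_main_thread(); always true in this sketch. */
static bool in_main_thread(void) { return true; }

/* Simplified container_of(), without QEMU's type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static BlockJob *block_job_from_job(Job *job)
{
    /* Declarations first ... */
    BlockJob *bjob = container_of(job, BlockJob, job);

    /* ... then statements, including the assertion. */
    assert(in_main_thread());
    return bjob;
}
```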


  
  block_job_remove_all_bdrv(bjob);

@@ -436,6 +437,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  BlockBackend *blk;
  BlockJob *job;
  
+assert(qemu_in_main_thread());
+
  if (job_id == NULL && !(flags & JOB_INTERNAL)) {
  job_id = bdrv_get_device_name(bs);
  }
@@ -504,6 +507,7 @@ void block_job_iostatus_reset(BlockJob *job)
  
  void block_job_user_resume(Job *job)
  {
+assert(qemu_in_main_thread());
  BlockJob *bjob = container_of(job, BlockJob, job);


Same here.

(And now I see that I’ve missed such instances in the other assertion
patches, e.g. in bdrv_save_vmstate(); those should be fixed, too.)


Hanna


  block_job_iostatus_reset(bjob);
  }





[PATCH v5 08/18] target/riscv: moving some insns close to similar insns

2021-11-12 Thread Frédéric Pétrot
lwu and ld are functionally close to the other loads, but were placed after
the stores in the source file.
Similarly, xor was separated from or and and by two arithmetic functions,
while the immediate versions were nicely grouped together.
This patch moves the aforementioned loads after lhu, and xor above or,
where they more logically belong.

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 34 -
 meson   |  2 +-
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index 51607b3d40..710f5e6a85 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -176,6 +176,18 @@ static bool trans_lhu(DisasContext *ctx, arg_lhu *a)
 return gen_load(ctx, a, MO_TEUW);
 }
 
+static bool trans_lwu(DisasContext *ctx, arg_lwu *a)
+{
+REQUIRE_64BIT(ctx);
+return gen_load(ctx, a, MO_TEUL);
+}
+
+static bool trans_ld(DisasContext *ctx, arg_ld *a)
+{
+REQUIRE_64BIT(ctx);
+return gen_load(ctx, a, MO_TEUQ);
+}
+
 static bool gen_store(DisasContext *ctx, arg_sb *a, MemOp memop)
 {
 TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
@@ -207,18 +219,6 @@ static bool trans_sw(DisasContext *ctx, arg_sw *a)
 return gen_store(ctx, a, MO_TESL);
 }
 
-static bool trans_lwu(DisasContext *ctx, arg_lwu *a)
-{
-REQUIRE_64BIT(ctx);
-return gen_load(ctx, a, MO_TEUL);
-}
-
-static bool trans_ld(DisasContext *ctx, arg_ld *a)
-{
-REQUIRE_64BIT(ctx);
-return gen_load(ctx, a, MO_TEUQ);
-}
-
 static bool trans_sd(DisasContext *ctx, arg_sd *a)
 {
 REQUIRE_64BIT(ctx);
@@ -317,11 +317,6 @@ static bool trans_sltu(DisasContext *ctx, arg_sltu *a)
 return gen_arith(ctx, a, EXT_SIGN, gen_sltu);
 }
 
-static bool trans_xor(DisasContext *ctx, arg_xor *a)
-{
-return gen_logic(ctx, a, tcg_gen_xor_tl);
-}
-
 static bool trans_srl(DisasContext *ctx, arg_srl *a)
 {
 return gen_shift(ctx, a, EXT_ZERO, tcg_gen_shr_tl);
@@ -332,6 +327,11 @@ static bool trans_sra(DisasContext *ctx, arg_sra *a)
 return gen_shift(ctx, a, EXT_SIGN, tcg_gen_sar_tl);
 }
 
+static bool trans_xor(DisasContext *ctx, arg_xor *a)
+{
+return gen_logic(ctx, a, tcg_gen_xor_tl);
+}
+
 static bool trans_or(DisasContext *ctx, arg_or *a)
 {
 return gen_logic(ctx, a, tcg_gen_or_tl);
diff --git a/meson b/meson
index 12f9f04ba0..b25d94e7c7 16
--- a/meson
+++ b/meson
@@ -1 +1 @@
-Subproject commit 12f9f04ba0decfda425dbbf9a501084c153a2d18
+Subproject commit b25d94e7c77fda05a7fdfe8afe562cf9760d69da
-- 
2.33.1




[PATCH v5 16/18] target/riscv: helper functions to wrap calls to 128-bit csr insns

2021-11-12 Thread Frédéric Pétrot
Given the side effects they have, the csr instructions are realized as
helpers. We extend this existing infrastructure to 128-bit sized CSRs.
We return 128-bit values using the same approach as for div/rem.
These helpers all call a single function that currently falls back
on the 64-bit version.
The trans_csrxx functions supporting 128-bit are yet to be implemented.
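Below is a minimal sketch of that fallback, assuming a hypothetical 64-bit CSR array and read/modify/write primitive; csrrw64(), csrrw128(), csr_file, NUM_CSRS and U128 are illustrative names only. Only the low halves of the new value and mask reach the 64-bit code, and the old value comes back zero-extended, as int128_make64() does in riscv_csrrw_i128().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct U128 { uint64_t lo, hi; } U128;

#define NUM_CSRS 4096   /* CSR numbers are 12 bits wide */
static uint64_t csr_file[NUM_CSRS];

/* Hypothetical 64-bit primitive: return old value, write masked bits. */
static int csrrw64(int csrno, uint64_t *ret_value, uint64_t new_value,
                   uint64_t write_mask)
{
    assert(csrno >= 0 && csrno < NUM_CSRS);
    uint64_t old = csr_file[csrno];

    if (ret_value) {
        *ret_value = old;
    }
    /* Only the bits selected by write_mask change. */
    csr_file[csrno] = (old & ~write_mask) | (new_value & write_mask);
    return 0;   /* 0 stands in for RISCV_EXCP_NONE */
}

/* 128-bit entry point that, for now, falls back on the 64-bit one. */
static int csrrw128(int csrno, U128 *ret_value, U128 new_value,
                    U128 write_mask)
{
    uint64_t ret_64;
    int ret = csrrw64(csrno, &ret_64, new_value.lo, write_mask.lo);

    if (ret_value) {
        ret_value->lo = ret_64;   /* zero-extended, upper half is 0 */
        ret_value->hi = 0;
    }
    return ret;
}
```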

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.h   |  4 
 target/riscv/helper.h|  3 +++
 target/riscv/csr.c   | 17 
 target/riscv/op_helper.c | 44 
 4 files changed, 68 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 15609a5533..6828c136ad 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -483,6 +483,10 @@ typedef RISCVException (*riscv_csr_op_fn)(CPURISCVState *env, int csrno,
   target_ulong new_value,
   target_ulong write_mask);
 
+RISCVException riscv_csrrw_i128(CPURISCVState *env, int csrno,
+                                Int128 *ret_value,
+                                Int128 new_value, Int128 write_mask);
+
 typedef struct {
 const char *name;
 riscv_csr_predicate_fn predicate;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c036825723..bf2b338bfd 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -66,6 +66,9 @@ DEF_HELPER_FLAGS_2(clmulr, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_2(csrr, tl, env, int)
 DEF_HELPER_3(csrw, void, env, int, tl)
 DEF_HELPER_4(csrrw, tl, env, int, tl, tl)
+DEF_HELPER_2(csrr_i128, tl, env, int)
+DEF_HELPER_4(csrw_i128, void, env, int, tl, tl)
+DEF_HELPER_6(csrrw_i128, tl, env, int, tl, tl, tl, tl)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_2(sret, tl, env, tl)
 DEF_HELPER_2(mret, tl, env, tl)
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 9f41954894..dca9e19a64 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1788,6 +1788,23 @@ RISCVException riscv_csrrw(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
+RISCVException riscv_csrrw_i128(CPURISCVState *env, int csrno,
+                                Int128 *ret_value,
+                                Int128 new_value, Int128 write_mask)
+{
+    /* fall back to 64-bit version for now */
+    target_ulong ret_64;
+    RISCVException ret = riscv_csrrw(env, csrno, &ret_64,
+                                     int128_getlo(new_value),
+                                     int128_getlo(write_mask));
+
+    if (ret_value) {
+        *ret_value = int128_make64(ret_64);
+    }
+
+    return ret;
+}
+
 /*
  * Debugger support.  If not in user mode, set env->debugger before the
  * riscv_csrrw call and clear it after the call.
diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index ee7c24efe7..f4cf9c4698 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -69,6 +69,50 @@ target_ulong helper_csrrw(CPURISCVState *env, int csr,
 return val;
 }
 
+target_ulong helper_csrr_i128(CPURISCVState *env, int csr)
+{
+    Int128 rv = int128_zero();
+    RISCVException ret = riscv_csrrw_i128(env, csr, &rv,
+                                          int128_zero(),
+                                          int128_zero());
+
+    if (ret != RISCV_EXCP_NONE) {
+        riscv_raise_exception(env, ret, GETPC());
+    }
+
+    env->retxh = int128_gethi(rv);
+    return int128_getlo(rv);
+}
+
+void helper_csrw_i128(CPURISCVState *env, int csr,
+                      target_ulong srcl, target_ulong srch)
+{
+    RISCVException ret = riscv_csrrw_i128(env, csr, NULL,
+                                          int128_make128(srcl, srch),
+                                          UINT128_MAX);
+
+    if (ret != RISCV_EXCP_NONE) {
+        riscv_raise_exception(env, ret, GETPC());
+    }
+}
+
+target_ulong helper_csrrw_i128(CPURISCVState *env, int csr,
+                               target_ulong srcl, target_ulong srch,
+                               target_ulong maskl, target_ulong maskh)
+{
+    Int128 rv = int128_zero();
+    RISCVException ret = riscv_csrrw_i128(env, csr, &rv,
+                                          int128_make128(srcl, srch),
+                                          int128_make128(maskl, maskh));
+
+    if (ret != RISCV_EXCP_NONE) {
+        riscv_raise_exception(env, ret, GETPC());
+    }
+
+    env->retxh = int128_gethi(rv);
+    return int128_getlo(rv);
+}
+
 #ifndef CONFIG_USER_ONLY
 
 target_ulong helper_sret(CPURISCVState *env, target_ulong cpu_pc_deb)
-- 
2.33.1




[PATCH v5 07/18] target/riscv: setup everything so that riscv128-softmmu compiles

2021-11-12 Thread Frédéric Pétrot
This patch is kind of a mess because several files have to be slightly
modified to allow for a new target. In its current state, we have done
our best to keep RV64 and RV128 under the same RV64 umbrella, but there
is still work to do to have a single executable for both.
In particular, we have no atomic accesses for aligned 128-bit addresses.

Once this patch is applied, adding riscv128-softmmu to --target-list
produces a (not so useful yet) executable.

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 configs/devices/riscv128-softmmu/default.mak | 17 +++
 configs/targets/riscv128-softmmu.mak |  6 ++
 include/disas/dis-asm.h  |  1 +
 include/hw/riscv/sifive_cpu.h|  3 +++
 target/riscv/cpu-param.h |  5 +
 target/riscv/cpu.h   |  3 +++
 disas/riscv.c|  5 +
 target/riscv/cpu.c   | 22 ++--
 target/riscv/gdbstub.c   |  8 +++
 target/riscv/insn_trans/trans_rvd.c.inc  | 12 +--
 target/riscv/insn_trans/trans_rvf.c.inc  |  6 +++---
 target/riscv/Kconfig |  3 +++
 12 files changed, 80 insertions(+), 11 deletions(-)
 create mode 100644 configs/devices/riscv128-softmmu/default.mak
 create mode 100644 configs/targets/riscv128-softmmu.mak

diff --git a/configs/devices/riscv128-softmmu/default.mak b/configs/devices/riscv128-softmmu/default.mak
new file mode 100644
index 00..e838f35785
--- /dev/null
+++ b/configs/devices/riscv128-softmmu/default.mak
@@ -0,0 +1,17 @@
+# Default configuration for riscv128-softmmu
+
+# Uncomment the following lines to disable these optional devices:
+#
+#CONFIG_PCI_DEVICES=n
+# 'n' does not seem to be an option for these two parameters
+CONFIG_SEMIHOSTING=y
+CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
+
+# Boards:
+#
+CONFIG_SPIKE=n
+CONFIG_SIFIVE_E=n
+CONFIG_SIFIVE_U=n
+CONFIG_RISCV_VIRT=y
+CONFIG_MICROCHIP_PFSOC=n
+CONFIG_SHAKTI_C=n
diff --git a/configs/targets/riscv128-softmmu.mak b/configs/targets/riscv128-softmmu.mak
new file mode 100644
index 00..d812cc1c80
--- /dev/null
+++ b/configs/targets/riscv128-softmmu.mak
@@ -0,0 +1,6 @@
+TARGET_ARCH=riscv128
+TARGET_BASE_ARCH=riscv
+# As long as we have no atomic accesses for aligned 128-bit addresses
+TARGET_SUPPORTS_MTTCG=n
+TARGET_XML_FILES=gdb-xml/riscv-64bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-64bit-virtual.xml
+TARGET_NEED_FDT=y
diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index 08e1beec85..102a1e7f50 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -459,6 +459,7 @@ int print_insn_nios2(bfd_vma, disassemble_info*);
 int print_insn_xtensa   (bfd_vma, disassemble_info*);
 int print_insn_riscv32  (bfd_vma, disassemble_info*);
 int print_insn_riscv64  (bfd_vma, disassemble_info*);
+int print_insn_riscv128 (bfd_vma, disassemble_info*);
 int print_insn_rx(bfd_vma, disassemble_info *);
 int print_insn_hexagon(bfd_vma, disassemble_info *);
 
diff --git a/include/hw/riscv/sifive_cpu.h b/include/hw/riscv/sifive_cpu.h
index 136799633a..64078feba8 100644
--- a/include/hw/riscv/sifive_cpu.h
+++ b/include/hw/riscv/sifive_cpu.h
@@ -26,6 +26,9 @@
 #elif defined(TARGET_RISCV64)
 #define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
 #define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
+#else
+#define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
+#define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
 #endif
 
 #endif /* HW_SIFIVE_CPU_H */
diff --git a/target/riscv/cpu-param.h b/target/riscv/cpu-param.h
index 80eb615f93..c10459b56f 100644
--- a/target/riscv/cpu-param.h
+++ b/target/riscv/cpu-param.h
@@ -16,6 +16,11 @@
 # define TARGET_LONG_BITS 32
 # define TARGET_PHYS_ADDR_SPACE_BITS 34 /* 22-bit PPN */
 # define TARGET_VIRT_ADDR_SPACE_BITS 32 /* sv32 */
+#else
+/* 64-bit target, since QEMU isn't built to have TARGET_LONG_BITS over 64 */
+# define TARGET_LONG_BITS 64
+# define TARGET_PHYS_ADDR_SPACE_BITS 56 /* 44-bit PPN */
+# define TARGET_VIRT_ADDR_SPACE_BITS 48 /* sv48 */
 #endif
 #define TARGET_PAGE_BITS 12 /* 4 KiB Pages */
 /*
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 53a295efb7..8ff5b08d15 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -38,6 +38,7 @@
 #define TYPE_RISCV_CPU_ANY  RISCV_CPU_TYPE_NAME("any")
 #define TYPE_RISCV_CPU_BASE32   RISCV_CPU_TYPE_NAME("rv32")
 #define TYPE_RISCV_CPU_BASE64   RISCV_CPU_TYPE_NAME("rv64")
+#define TYPE_RISCV_CPU_BASE128  RISCV_CPU_TYPE_NAME("rv128")
 #define TYPE_RISCV_CPU_IBEX RISCV_CPU_TYPE_NAME("lowrisc-ibex")
 #define TYPE_RISCV_CPU_SHAKTI_C RISCV_CPU_TYPE_NAME("shakti-c")
 #define TYPE_RISCV_CPU_SIFIVE_E31   RISCV_CPU_TYPE_NAME("sifive-e31")
@@ -50,6 +51,8 @@
 # define TYPE_RISCV_CPU_BASETYPE_RISCV_CPU_BASE32
 #elif defined(TARGET_RISCV64)
 # define 

[PATCH v5 09/18] target/riscv: accessors to registers upper part and 128-bit load/store

2021-11-12 Thread Frédéric Pétrot
Add a get function that retrieves the upper 64 bits of a register, stored
in the gprh field of the CPU state, and a set function that writes the
whole 128-bit value at once.
Access to the gprh field cannot be checked at compile time to make sure it
only happens in the 128-bit version of the processor, because we have no
way to indicate that the misa_mxl_max field is const.

The 128-bit ISA adds ldu, lq and sq, and we provide support for these
instructions. Note that we compute only 64-bit addresses to actually access
memory, cowardly utilizing the existing address translation mechanism of
QEMU.
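A hypothetical model of the split register file and of the 64-bit address computation described above. RegFile, reg_read_hi(), reg_write128() and effective_addr() are illustrative names only; the patch's accessors (get_gprh(), gen_set_gpr128()) generate TCG code rather than touching an array directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: low halves in gpr[], high halves in gprh[]. */
typedef struct RegFile {
    uint64_t gpr[32];
    uint64_t gprh[32];
} RegFile;

/* Read the upper 64 bits of register r; x0 always reads as zero. */
static uint64_t reg_read_hi(const RegFile *rf, int r)
{
    assert(r >= 0 && r < 32);
    return r == 0 ? 0 : rf->gprh[r];
}

/* Write both halves at once; writes to x0 are discarded. */
static void reg_write128(RegFile *rf, int r, uint64_t lo, uint64_t hi)
{
    assert(r >= 0 && r < 32);
    if (r != 0) {
        rf->gpr[r] = lo;
        rf->gprh[r] = hi;
    }
}

/*
 * Effective address for lq/sq: only the low 64 bits of the base register
 * are used, so the existing 64-bit address translation can be reused.
 */
static uint64_t effective_addr(const RegFile *rf, int rs1, int64_t imm)
{
    assert(rs1 >= 0 && rs1 < 32);
    return (rs1 == 0 ? 0 : rf->gpr[rs1]) + (uint64_t)imm;
}
```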

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 target/riscv/insn16.decode  |  27 ++-
 target/riscv/insn32.decode  |   5 ++
 target/riscv/translate.c|  41 ++
 target/riscv/insn_trans/trans_rvi.c.inc | 102 ++--
 4 files changed, 165 insertions(+), 10 deletions(-)

diff --git a/target/riscv/insn16.decode b/target/riscv/insn16.decode
index 2e9212663c..02c8f61b48 100644
--- a/target/riscv/insn16.decode
+++ b/target/riscv/insn16.decode
@@ -25,14 +25,17 @@
 # Immediates:
 %imm_ci12:s1 2:5
 %nzuimm_ciw7:4 11:2 5:1 6:1   !function=ex_shift_2
+%uimm_cl_q 10:1 5:2 11:2  !function=ex_shift_4
 %uimm_cl_d 5:2 10:3   !function=ex_shift_3
 %uimm_cl_w 5:1 10:3 6:1   !function=ex_shift_2
 %imm_cb12:s1 5:2 2:1 10:2 3:2 !function=ex_shift_1
 %imm_cj12:s1 8:1 9:2 6:1 7:1 2:1 11:1 3:3 !function=ex_shift_1
 
 %shimm_6bit   12:1 2:5   !function=ex_rvc_shifti
+%uimm_6bit_lq 2:4 12:1 6:1   !function=ex_shift_4
 %uimm_6bit_ld 2:3 12:1 5:2   !function=ex_shift_3
 %uimm_6bit_lw 2:2 12:1 4:3   !function=ex_shift_2
+%uimm_6bit_sq 7:4 11:2   !function=ex_shift_4
 %uimm_6bit_sd 7:3 10:3   !function=ex_shift_3
 %uimm_6bit_sw 7:2 9:4!function=ex_shift_2
 
@@ -54,16 +57,20 @@
 # Formats 16:
 @cr  . .  ..   rs2=%rs2_5   rs1=%rd %rd
 @ci... . . .  ..   imm=%imm_ci  rs1=%rd %rd
+@cl_q  ... . .  . ..   imm=%uimm_cl_q   rs1=%rs1_3  rd=%rs2_3
 @cl_d  ... ... ... .. ... ..   imm=%uimm_cl_d   rs1=%rs1_3  rd=%rs2_3
 @cl_w  ... ... ... .. ... ..   imm=%uimm_cl_w   rs1=%rs1_3  rd=%rs2_3
 @cs_2  ... ... ... .. ... ..   rs2=%rs2_3   rs1=%rs1_3  rd=%rs1_3
+@cs_q  ... ... ... .. ... ..   imm=%uimm_cl_q   rs1=%rs1_3  rs2=%rs2_3
 @cs_d  ... ... ... .. ... ..   imm=%uimm_cl_d   rs1=%rs1_3  rs2=%rs2_3
 @cs_w  ... ... ... .. ... ..   imm=%uimm_cl_w   rs1=%rs1_3  rs2=%rs2_3
 @cj...... ..   imm=%imm_cj
 @cb_z  ... ... ... .. ... ..   imm=%imm_cb  rs1=%rs1_3  rs2=0
 
+@c_lqsp... . .  . ..   imm=%uimm_6bit_lq rs1=2 %rd
 @c_ldsp... . .  . ..   imm=%uimm_6bit_ld rs1=2 %rd
 @c_lwsp... . .  . ..   imm=%uimm_6bit_lw rs1=2 %rd
+@c_sqsp... . .  . ..   imm=%uimm_6bit_sq rs1=2 rs2=%rs2_5
 @c_sdsp... . .  . ..   imm=%uimm_6bit_sd rs1=2 rs2=%rs2_5
 @c_swsp... . .  . ..   imm=%uimm_6bit_sw rs1=2 rs2=%rs2_5
 @c_li  ... . .  . ..   imm=%imm_ci rs1=0 %rd
@@ -87,9 +94,15 @@
   illegal 000  000 000 00 --- 00
   addi000  ... ... .. ... 00 @c_addi4spn
 }
-fld   001  ... ... .. ... 00 @cl_d
+{
+  lq  001  ... ... .. ... 00 @cl_q
+  fld 001  ... ... .. ... 00 @cl_d
+}
 lw010  ... ... .. ... 00 @cl_w
-fsd   101  ... ... .. ... 00 @cs_d
+{
+  sq  101  ... ... .. ... 00 @cs_q
+  fsd 101  ... ... .. ... 00 @cs_d
+}
 sw110  ... ... .. ... 00 @cs_w
 
 # *** RV32C and RV64C specific Standard Extension (Quadrant 0) ***
@@ -132,7 +145,10 @@ addw  100 1 11 ... 01 ... 01 @cs_2
 
 # *** RV32/64C Standard Extension (Quadrant 2) ***
 slli  000 .  .  . 10 @c_shift2
-fld   001 .  .  . 10 @c_ldsp
+{
+  lq  001  ... ... .. ... 10 @c_lqsp
+  fld 001 .  .  . 10 @c_ldsp
+}
 {
   illegal 010 -  0  - 10 # c.lwsp, RES rd=0
   lw  010 .  .  . 10 @c_lwsp
@@ -147,7 +163,10 @@ fld   001 .  .  . 10 @c_ldsp
   jalr100 1  .  0 10 @c_jalr rd=1  # C.JALR
   add 100 1  .  . 10 @cr
 }
-fsd   101   ..  . 10 @c_sdsp
+{
+  sq  101  ... ... .. ... 10 @c_sqsp
+  fsd 101   ..  . 10 @c_sdsp
+}
 sw110 .  .  . 10 @c_swsp
 
 # *** RV32C and RV64C specific Standard Extension (Quadrant 2) ***
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2f251dac1b..02889c6082 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -163,6 +163,11 @@ sllw 

[PATCH v5 12/18] target/riscv: support for 128-bit shift instructions

2021-11-12 Thread Frédéric Pétrot
Handle shifts for 32-, 64- and 128-bit operation lengths on RV128, following
the general framework for handling various olens proposed by Richard.

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 target/riscv/insn32.decode  |  10 ++
 target/riscv/translate.c|  58 --
 target/riscv/insn_trans/trans_rvb.c.inc |  22 +--
 target/riscv/insn_trans/trans_rvi.c.inc | 224 ++--
 4 files changed, 270 insertions(+), 44 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 02889c6082..e338a803a0 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -22,6 +22,7 @@
 %rs1   15:5
 %rd7:5
 %sh5   20:5
+%sh6   20:6
 
 %sh720:7
 %csr20:12
@@ -92,6 +93,9 @@
 # Formats 64:
 @sh5 ...  . .  ... . ...   shamt=%sh5  %rs1 %rd
 
+# Formats 128:
+@sh6   .. .. . ... . ...  shamt=%sh6 %rs1 %rd
+
 # *** Privileged Instructions ***
 ecall    0 000 0 1110011
 ebreak  0001 0 000 0 1110011
@@ -167,6 +171,12 @@ sraw 010 .  . 101 . 0111011 @r
 ldu     . 111 . 011 @i
 lq      . 010 . 000 @i
 sq      . 100 . 0100011 @s
+sllid00 ..  . 001 . 1011011 @sh6
+srlid00 ..  . 101 . 1011011 @sh6
+sraid01 ..  . 101 . 1011011 @sh6
+slld 000 . .  001 . 011 @r
+srld 000 . .  101 . 011 @r
+srad 010 . .  101 . 011 @r
 
 # *** RV32M Standard Extension ***
 mul  001 .  . 000 . 0110011 @r
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d2a2f1021d..504fbfc26a 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -560,7 +560,8 @@ static bool gen_arith_per_ol(DisasContext *ctx, arg_r *a, DisasExtend ext,
 }
 
 static bool gen_shift_imm_fn(DisasContext *ctx, arg_shift *a, DisasExtend ext,
-                             void (*func)(TCGv, TCGv, target_long))
+                             void (*func)(TCGv, TCGv, target_long),
+                             void (*f128)(TCGv, TCGv, TCGv, TCGv, target_long))
 {
     TCGv dest, src1;
     int max_len = get_olen(ctx);
@@ -572,26 +573,38 @@ static bool gen_shift_imm_fn(DisasContext *ctx, arg_shift *a, DisasExtend ext,
     dest = dest_gpr(ctx, a->rd);
     src1 = get_gpr(ctx, a->rs1, ext);
 
-    func(dest, src1, a->shamt);
+    if (max_len < 128) {
+        func(dest, src1, a->shamt);
+        gen_set_gpr(ctx, a->rd, dest);
+    } else {
+        TCGv src1h = get_gprh(ctx, a->rs1);
+        TCGv desth = dest_gprh(ctx, a->rd);
 
-    gen_set_gpr(ctx, a->rd, dest);
+        if (f128 == NULL) {
+            return false;
+        }
+        f128(dest, desth, src1, src1h, a->shamt);
+        gen_set_gpr128(ctx, a->rd, dest, desth);
+    }
     return true;
 }
 
 static bool gen_shift_imm_fn_per_ol(DisasContext *ctx, arg_shift *a,
                                     DisasExtend ext,
                                     void (*f_tl)(TCGv, TCGv, target_long),
-                                    void (*f_32)(TCGv, TCGv, target_long))
+                                    void (*f_32)(TCGv, TCGv, target_long),
+                                    void (*f_128)(TCGv, TCGv, TCGv, TCGv,
+                                                  target_long))
 {
     int olen = get_olen(ctx);
     if (olen != TARGET_LONG_BITS) {
         if (olen == 32) {
             f_tl = f_32;
-        } else {
+        } else if (olen != 128) {
             g_assert_not_reached();
         }
     }
-    return gen_shift_imm_fn(ctx, a, ext, f_tl);
+    return gen_shift_imm_fn(ctx, a, ext, f_tl, f_128);
 }
 
 static bool gen_shift_imm_tl(DisasContext *ctx, arg_shift *a, DisasExtend ext,
@@ -615,34 +628,49 @@ static bool gen_shift_imm_tl(DisasContext *ctx, arg_shift *a, DisasExtend ext,
 }
 
 static bool gen_shift(DisasContext *ctx, arg_r *a, DisasExtend ext,
-                      void (*func)(TCGv, TCGv, TCGv))
+                      void (*func)(TCGv, TCGv, TCGv),
+                      void (*f128)(TCGv, TCGv, TCGv, TCGv, TCGv))
 {
-    TCGv dest = dest_gpr(ctx, a->rd);
-    TCGv src1 = get_gpr(ctx, a->rs1, ext);
     TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
     TCGv ext2 = tcg_temp_new();
+    int max_len = get_olen(ctx);
 
-    tcg_gen_andi_tl(ext2, src2, get_olen(ctx) - 1);
-    func(dest, src1, ext2);
+    tcg_gen_andi_tl(ext2, src2, max_len - 1);
 
-    gen_set_gpr(ctx, a->rd, dest);
+    TCGv dest = dest_gpr(ctx, a->rd);
+    TCGv src1 = get_gpr(ctx, a->rs1, ext);
+
+    if (max_len < 128) {
+        func(dest, src1, ext2);
+        gen_set_gpr(ctx, a->rd, dest);
+    } else {
+        TCGv src1h = get_gprh(ctx, a->rs1);
+        TCGv desth = dest_gprh(ctx, a->rd);
+
+        if (f128 == NULL) {
+  

[PATCH v5 13/18] target/riscv: support for 128-bit arithmetic instructions

2021-11-12 Thread Frédéric Pétrot
Addition of 128-bit adds and subs in their various sizes,
"set if less than"s and branches.
Refactored the code to have a comparison function used for both slts and
branches.
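For readers following the arithmetic: a 128-bit register is held as a low half (gpr) and a high half (gprh). An add or sub therefore has to carry or borrow from the low half into the high half. A signed comparison looks at the high halves as signed values and falls back to an unsigned comparison of the low halves only when the high halves are equal. Below is a minimal plain-C sketch of those rules, assuming 64-bit halves. Reg128 and the function names are hypothetical, and this is not the TCG code the patch emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit register: low half (gpr) and high half (gprh). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Reg128;

/* 128-bit add: unsigned wrap-around of the low half signals a carry into the high half. */
static Reg128 add128(Reg128 a, Reg128 b)
{
    Reg128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* 128-bit subtract: a borrow out of the low half is taken from the high half. */
static Reg128 sub128(Reg128 a, Reg128 b)
{
    Reg128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);
    return r;
}

/* Signed less-than (slt): high halves compared as signed, low halves as unsigned. */
static bool slt128(Reg128 a, Reg128 b)
{
    if (a.hi != b.hi) {
        return (int64_t)a.hi < (int64_t)b.hi;
    }
    return a.lo < b.lo;
}

/* Unsigned less-than (sltu): both halves compared as unsigned. */
static bool sltu128(Reg128 a, Reg128 b)
{
    return a.hi != b.hi ? a.hi < b.hi : a.lo < b.lo;
}
```

The same comparison logic can also decide whether a branch is taken, which is presumably why the patch factors it out for both slts and branches.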

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 target/riscv/insn32.decode  |   3 +
 target/riscv/translate.c|  63 --
 target/riscv/insn_trans/trans_rvb.c.inc |  20 +--
 target/riscv/insn_trans/trans_rvi.c.inc | 159 +---
 target/riscv/insn_trans/trans_rvm.c.inc |  26 ++--
 5 files changed, 222 insertions(+), 49 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e338a803a0..afaf243b4e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -171,9 +171,12 @@ sraw 010 .  . 101 . 0111011 @r
 ldu     . 111 . 011 @i
 lq      . 010 . 000 @i
 sq      . 100 . 0100011 @s
+addid  .  000 . 1011011 @i
 sllid00 ..  . 001 . 1011011 @sh6
 srlid00 ..  . 101 . 1011011 @sh6
 sraid01 ..  . 101 . 1011011 @sh6
+addd 000 . .  000 . 011 @r
+subd 010 . .  000 . 011 @r
 slld 000 . .  001 . 011 @r
 srld 000 . .  101 . 011 @r
 srad 010 . .  101 . 011 @r
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 504fbfc26a..a5554275e2 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -506,57 +506,96 @@ static bool gen_logic(DisasContext *ctx, arg_r *a,
 }
 
 static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a, DisasExtend ext,
- void (*func)(TCGv, TCGv, target_long))
+ void (*func)(TCGv, TCGv, target_long),
+ void (*f128)(TCGv, TCGv, TCGv, TCGv, target_long))
 {
 TCGv dest = dest_gpr(ctx, a->rd);
 TCGv src1 = get_gpr(ctx, a->rs1, ext);
 
-func(dest, src1, a->imm);
+if (get_ol(ctx) < MXL_RV128) {
+func(dest, src1, a->imm);
+gen_set_gpr(ctx, a->rd, dest);
+} else {
+if (f128 == NULL) {
+return false;
+}
 
-gen_set_gpr(ctx, a->rd, dest);
+TCGv src1h = get_gprh(ctx, a->rs1);
+TCGv desth = dest_gprh(ctx, a->rd);
+
+f128(dest, desth, src1, src1h, a->imm);
+gen_set_gpr128(ctx, a->rd, dest, desth);
+}
 return true;
 }
 
 static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a, DisasExtend ext,
- void (*func)(TCGv, TCGv, TCGv))
+ void (*func)(TCGv, TCGv, TCGv),
+ void (*f128)(TCGv, TCGv, TCGv, TCGv, TCGv, TCGv))
 {
 TCGv dest = dest_gpr(ctx, a->rd);
 TCGv src1 = get_gpr(ctx, a->rs1, ext);
 TCGv src2 = tcg_constant_tl(a->imm);
 
-func(dest, src1, src2);
+if (get_ol(ctx) < MXL_RV128) {
+func(dest, src1, src2);
+gen_set_gpr(ctx, a->rd, dest);
+} else {
+if (f128 == NULL) {
+return false;
+}
 
-gen_set_gpr(ctx, a->rd, dest);
+TCGv src1h = get_gprh(ctx, a->rs1);
+TCGv src2h = tcg_constant_tl(-(a->imm < 0));
+TCGv desth = dest_gprh(ctx, a->rd);
+
+f128(dest, desth, src1, src1h, src2, src2h);
+gen_set_gpr128(ctx, a->rd, dest, desth);
+}
 return true;
 }
 
 static bool gen_arith(DisasContext *ctx, arg_r *a, DisasExtend ext,
-  void (*func)(TCGv, TCGv, TCGv))
+  void (*func)(TCGv, TCGv, TCGv),
+  void (*f128)(TCGv, TCGv, TCGv, TCGv, TCGv, TCGv))
 {
 TCGv dest = dest_gpr(ctx, a->rd);
 TCGv src1 = get_gpr(ctx, a->rs1, ext);
 TCGv src2 = get_gpr(ctx, a->rs2, ext);
 
-func(dest, src1, src2);
+if (get_ol(ctx) < MXL_RV128) {
+func(dest, src1, src2);
+gen_set_gpr(ctx, a->rd, dest);
+} else {
+if (f128 == NULL) {
+return false;
+}
 
-gen_set_gpr(ctx, a->rd, dest);
+TCGv src1h = get_gprh(ctx, a->rs1);
+TCGv src2h = get_gprh(ctx, a->rs2);
+TCGv desth = dest_gprh(ctx, a->rd);
+
+f128(dest, desth, src1, src1h, src2, src2h);
+gen_set_gpr128(ctx, a->rd, dest, desth);
+}
 return true;
 }
 
 static bool gen_arith_per_ol(DisasContext *ctx, arg_r *a, DisasExtend ext,
  void (*f_tl)(TCGv, TCGv, TCGv),
- void (*f_32)(TCGv, TCGv, TCGv))
+ void (*f_32)(TCGv, TCGv, TCGv),
+ void (*f_128)(TCGv, TCGv, TCGv, TCGv, TCGv, TCGv))
 {
 int olen = get_olen(ctx);
 
 if (olen != TARGET_LONG_BITS) {
 if (olen == 32) {
 f_tl = f_32;
-} else {
+} else if (olen != 128) {
 g_assert_not_reached();
 }
 }
-

[PATCH v5 02/18] exec/memop: Adding signed quad and octo defines

2021-11-12 Thread Frédéric Pétrot
Adding defines to handle signed 64-bit, and signed and unsigned 128-bit,
quantities in memory accesses.

Signed-off-by: Frédéric Pétrot 
---
 include/exec/memop.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/exec/memop.h b/include/exec/memop.h
index 72c2f0ff3d..2a885f3917 100644
--- a/include/exec/memop.h
+++ b/include/exec/memop.h
@@ -86,28 +86,35 @@ typedef enum MemOp {
 MO_UW= MO_16,
 MO_UL= MO_32,
 MO_UQ= MO_64,
+MO_UO= MO_128,
 MO_SB= MO_SIGN | MO_8,
 MO_SW= MO_SIGN | MO_16,
 MO_SL= MO_SIGN | MO_32,
+MO_SQ= MO_SIGN | MO_64,
+MO_SO= MO_SIGN | MO_128,
 
 MO_LEUW  = MO_LE | MO_UW,
 MO_LEUL  = MO_LE | MO_UL,
 MO_LEUQ  = MO_LE | MO_UQ,
 MO_LESW  = MO_LE | MO_SW,
 MO_LESL  = MO_LE | MO_SL,
+MO_LESQ  = MO_LE | MO_SQ,
 
 MO_BEUW  = MO_BE | MO_UW,
 MO_BEUL  = MO_BE | MO_UL,
 MO_BEUQ  = MO_BE | MO_UQ,
 MO_BESW  = MO_BE | MO_SW,
 MO_BESL  = MO_BE | MO_SL,
+MO_BESQ  = MO_BE | MO_SQ,
 
 #ifdef NEED_CPU_H
 MO_TEUW  = MO_TE | MO_UW,
 MO_TEUL  = MO_TE | MO_UL,
 MO_TEUQ  = MO_TE | MO_UQ,
+MO_TEUO  = MO_TE | MO_UO,
 MO_TESW  = MO_TE | MO_SW,
 MO_TESL  = MO_TE | MO_SL,
+MO_TESQ  = MO_TE | MO_SQ,
 #endif
 
 MO_SSIZE = MO_SIZE | MO_SIGN,
-- 
2.33.1




[PATCH v5 11/18] target/riscv: support for 128-bit U-type instructions

2021-11-12 Thread Frédéric Pétrot
Adding the 128-bit versions of lui and auipc, and introducing to that end
a "set register with immediate" function that handles sign extension to 128 bits.
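Below is a plain-C sketch of what gen_set_gpri computes when the upper half exists. The low half receives the immediate; on RV32 it is first truncated to 32 bits and sign-extended. The high half is filled with copies of the sign bit, which is what `-(imm < 0)` does in the patch. It assumes the RISC-V MXL encodings RV32 = 1, RV64 = 2 and RV128 = 3. The XLen enum, Reg128 and set_reg_imm are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MXL encodings (RISC-V spec); hypothetical mirror of RISCVMXL. */
typedef enum { XL_RV32 = 1, XL_RV64 = 2, XL_RV128 = 3 } XLen;

typedef struct {
    uint64_t lo;
    uint64_t hi;
} Reg128;

/* Sketch of setting a 128-bit register from a sign-extended immediate. */
static Reg128 set_reg_imm(XLen ol, int64_t imm)
{
    Reg128 r;

    assert(ol >= XL_RV32 && ol <= XL_RV128);
    if (ol == XL_RV32) {
        r.lo = (uint64_t)(int64_t)(int32_t)imm;  /* sign-extend from bit 31 */
    } else {
        r.lo = (uint64_t)imm;
    }
    r.hi = -(uint64_t)(imm < 0);  /* all ones if negative, else zero */
    return r;
}
```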

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
---
 target/riscv/translate.c| 21 +
 target/riscv/insn_trans/trans_rvi.c.inc |  8 
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 508ae87985..d2a2f1021d 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -289,6 +289,27 @@ static void gen_set_gpr(DisasContext *ctx, int reg_num, TCGv t)
 }
 }
 
+static void gen_set_gpri(DisasContext *ctx, int reg_num, target_long imm)
+{
+if (reg_num != 0) {
+switch (get_ol(ctx)) {
+case MXL_RV32:
+tcg_gen_movi_tl(cpu_gpr[reg_num], (int32_t)imm);
+break;
+case MXL_RV64:
+case MXL_RV128:
+tcg_gen_movi_tl(cpu_gpr[reg_num], imm);
+break;
+default:
+g_assert_not_reached();
+}
+
+if (get_xl_max(ctx) == MXL_RV128) {
+tcg_gen_movi_tl(cpu_gprh[reg_num], -(imm < 0));
+}
+}
+}
+
 static void gen_set_gpr128(DisasContext *ctx, int reg_num, TCGv rl, TCGv rh)
 {
 assert(get_ol(ctx) == MXL_RV128);
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index fc73735b9e..0070fe606a 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -26,14 +26,14 @@ static bool trans_illegal(DisasContext *ctx, arg_empty *a)
 
 static bool trans_c64_illegal(DisasContext *ctx, arg_empty *a)
 {
- REQUIRE_64BIT(ctx);
- return trans_illegal(ctx, a);
+REQUIRE_64_OR_128BIT(ctx);
+return trans_illegal(ctx, a);
 }
 
 static bool trans_lui(DisasContext *ctx, arg_lui *a)
 {
 if (a->rd != 0) {
-tcg_gen_movi_tl(cpu_gpr[a->rd], a->imm);
+gen_set_gpri(ctx, a->rd, a->imm);
 }
 return true;
 }
@@ -41,7 +41,7 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
 static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
 {
 if (a->rd != 0) {
-tcg_gen_movi_tl(cpu_gpr[a->rd], a->imm + ctx->base.pc_next);
+gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
 }
 return true;
 }
-- 
2.33.1




[PATCH v5 01/18] exec/memop: Adding signedness to quad definitions

2021-11-12 Thread Frédéric Pétrot
Renaming defines for quad in their various forms so that their signedness is
now explicit.
Done using git grep as suggested by Philippe, with a bit of hand editing to
keep assignments aligned.

Signed-off-by: Frédéric Pétrot 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/exec/memop.h   |  8 +--
 include/tcg/tcg-op.h   |  4 +-
 target/arm/translate-a32.h |  4 +-
 accel/tcg/cputlb.c | 30 +--
 accel/tcg/user-exec.c  |  8 +--
 target/alpha/translate.c   | 32 ++--
 target/arm/helper-a64.c|  8 +--
 target/arm/translate-a64.c |  8 +--
 target/arm/translate-neon.c|  6 +--
 target/arm/translate-sve.c | 10 ++--
 target/arm/translate-vfp.c |  8 +--
 target/arm/translate.c |  2 +-
 target/cris/translate.c|  2 +-
 target/hppa/translate.c|  4 +-
 target/i386/tcg/mem_helper.c   |  2 +-
 target/i386/tcg/translate.c| 36 +++---
 target/m68k/op_helper.c|  2 +-
 target/mips/tcg/translate.c| 58 +++---
 target/mips/tcg/tx79_translate.c   |  8 +--
 target/ppc/translate.c | 32 ++--
 target/s390x/tcg/mem_helper.c  |  8 +--
 target/s390x/tcg/translate.c   |  8 +--
 target/sh4/translate.c | 12 ++---
 target/sparc/translate.c   | 36 +++---
 target/tricore/translate.c |  4 +-
 target/xtensa/translate.c  |  4 +-
 tcg/tcg.c  |  4 +-
 tcg/tci.c  | 16 +++---
 accel/tcg/ldst_common.c.inc|  8 +--
 target/mips/tcg/micromips_translate.c.inc  | 10 ++--
 target/ppc/translate/fixedpoint-impl.c.inc | 22 
 target/ppc/translate/fp-impl.c.inc |  4 +-
 target/ppc/translate/vsx-impl.c.inc| 42 
 target/riscv/insn_trans/trans_rva.c.inc| 22 
 target/riscv/insn_trans/trans_rvd.c.inc|  4 +-
 target/riscv/insn_trans/trans_rvh.c.inc|  4 +-
 target/riscv/insn_trans/trans_rvi.c.inc|  4 +-
 target/s390x/tcg/translate_vx.c.inc| 18 +++
 tcg/aarch64/tcg-target.c.inc   |  2 +-
 tcg/arm/tcg-target.c.inc   | 10 ++--
 tcg/i386/tcg-target.c.inc  | 12 ++---
 tcg/mips/tcg-target.c.inc  | 12 ++---
 tcg/ppc/tcg-target.c.inc   | 16 +++---
 tcg/riscv/tcg-target.c.inc |  6 +--
 tcg/s390x/tcg-target.c.inc | 18 +++
 tcg/sparc/tcg-target.c.inc | 16 +++---
 target/s390x/tcg/insn-data.def | 28 +--
 47 files changed, 311 insertions(+), 311 deletions(-)

diff --git a/include/exec/memop.h b/include/exec/memop.h
index 04264ffd6b..72c2f0ff3d 100644
--- a/include/exec/memop.h
+++ b/include/exec/memop.h
@@ -85,29 +85,29 @@ typedef enum MemOp {
 MO_UB= MO_8,
 MO_UW= MO_16,
 MO_UL= MO_32,
+MO_UQ= MO_64,
 MO_SB= MO_SIGN | MO_8,
 MO_SW= MO_SIGN | MO_16,
 MO_SL= MO_SIGN | MO_32,
-MO_Q = MO_64,
 
 MO_LEUW  = MO_LE | MO_UW,
 MO_LEUL  = MO_LE | MO_UL,
+MO_LEUQ  = MO_LE | MO_UQ,
 MO_LESW  = MO_LE | MO_SW,
 MO_LESL  = MO_LE | MO_SL,
-MO_LEQ   = MO_LE | MO_Q,
 
 MO_BEUW  = MO_BE | MO_UW,
 MO_BEUL  = MO_BE | MO_UL,
+MO_BEUQ  = MO_BE | MO_UQ,
 MO_BESW  = MO_BE | MO_SW,
 MO_BESL  = MO_BE | MO_SL,
-MO_BEQ   = MO_BE | MO_Q,
 
 #ifdef NEED_CPU_H
 MO_TEUW  = MO_TE | MO_UW,
 MO_TEUL  = MO_TE | MO_UL,
+MO_TEUQ  = MO_TE | MO_UQ,
 MO_TESW  = MO_TE | MO_SW,
 MO_TESL  = MO_TE | MO_SL,
-MO_TEQ   = MO_TE | MO_Q,
 #endif
 
 MO_SSIZE = MO_SIZE | MO_SIGN,
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 0545a6224c..caa0a63612 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -894,7 +894,7 @@ static inline void tcg_gen_qemu_ld32s(TCGv ret, TCGv addr, int mem_index)
 
 static inline void tcg_gen_qemu_ld64(TCGv_i64 ret, TCGv addr, int mem_index)
 {
-tcg_gen_qemu_ld_i64(ret, addr, mem_index, MO_TEQ);
+tcg_gen_qemu_ld_i64(ret, addr, mem_index, MO_TEUQ);
 }
 
 static inline void tcg_gen_qemu_st8(TCGv arg, TCGv addr, int mem_index)
@@ -914,7 +914,7 @@ static inline void tcg_gen_qemu_st32(TCGv arg, TCGv addr, int mem_index)
 
 static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 {
-tcg_gen_qemu_st_i64(arg, addr, mem_index, MO_TEQ);
+tcg_gen_qemu_st_i64(arg, addr, mem_index, MO_TEUQ);
 }
 
 void tcg_gen_atomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index 17af8dc95a..5be4b9b834 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h

[PATCH v5 18/18] target/riscv: actual functions to realize csr 128-bit insns

2021-11-12 Thread Frédéric Pétrot
The csrs are accessed through function pointers: we add 128-bit read
operations in the table for three csrs (writes fall back to the
64-bit version as the upper 64-bit information is handled elsewhere):
- misa, as mxl is needed for proper operation,
- mstatus and sstatus, to return sd.
In addition, we also add read and write accesses to the machine and
supervisor scratch registers.
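Below is a minimal sketch, in plain C, of the dispatch this description implies. A CSR with a 128-bit read accessor returns both halves; for misa, MXL = 3 ends up in bits 127:126. A CSR without one falls back to its 64-bit accessor. The table layout, the CSR names, the misa extension bits and the zero-extension in the fallback are all assumptions for illustration, not the patch's csr.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit CSR value split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Csr128;

/* Hypothetical table entry: a 64-bit reader plus an optional 128-bit one. */
typedef struct {
    uint64_t (*read)(void);
    Csr128 (*read128)(void);
} CsrOps;

/* Hypothetical misa extension bits: A, C, I, M, S, U. */
static const uint64_t misa_ext = 0x141105;

static uint64_t misa_read(void)
{
    return misa_ext;
}

/* misa on RV128: extensions in the low half, MXL = 3 in bits 127:126. */
static Csr128 misa_read128(void)
{
    Csr128 v = { misa_ext, 3ULL << 62 };
    return v;
}

/* Hypothetical CSR that only provides a 64-bit accessor. */
static uint64_t other_read(void)
{
    return 42;
}

/* Use the 128-bit accessor when present, else zero-extend the 64-bit value. */
static Csr128 csr_read128(const CsrOps *ops)
{
    assert(ops != NULL && ops->read != NULL);
    if (ops->read128 != NULL) {
        return ops->read128();
    }
    Csr128 v = { ops->read(), 0 };
    return v;
}
```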

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 target/riscv/cpu.h  |   7 ++
 target/riscv/cpu_bits.h |   3 +
 target/riscv/csr.c  | 199 ++--
 3 files changed, 179 insertions(+), 30 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 6828c136ad..bfba900ec7 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -487,12 +487,19 @@ RISCVException riscv_csrrw_i128(CPURISCVState *env, int csrno,
 Int128 *ret_value,
 Int128 new_value, Int128 write_mask);
 
+typedef RISCVException (*riscv_csr_read128_fn)(CPURISCVState *env, int csrno,
+   Int128 *ret_value);
+typedef RISCVException (*riscv_csr_write128_fn)(CPURISCVState *env, int csrno,
+ Int128 new_value);
+
 typedef struct {
 const char *name;
 riscv_csr_predicate_fn predicate;
 riscv_csr_read_fn read;
 riscv_csr_write_fn write;
 riscv_csr_op_fn op;
+riscv_csr_read128_fn read128;
+riscv_csr_write128_fn write128;
 } riscv_csr_operations;
 
 /* CSR function table constants */
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 9913fa9f77..390ba0a52f 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -392,6 +392,7 @@
 
 #define MSTATUS32_SD        0x80000000
 #define MSTATUS64_SD        0x8000000000000000ULL
+#define MSTATUSH128_SD      0x8000000000000000ULL
 
 #define MISA32_MXL          0xC0000000
 #define MISA64_MXL          0xC000000000000000ULL
@@ -413,6 +414,8 @@ typedef enum {
 #define SSTATUS_SUM         0x00040000 /* since: priv-1.10 */
 #define SSTATUS_MXR         0x00080000
 
+#define SSTATUS64_UXL       0x0000000300000000ULL
+
 #define SSTATUS32_SD        0x80000000
 #define SSTATUS64_SD        0x8000000000000000ULL
 
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index dca9e19a64..bfc13d4bff 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -453,7 +453,7 @@ static const target_ulong vs_delegable_excps = DELEGABLE_EXCPS &
   (1ULL << (RISCV_EXCP_STORE_GUEST_AMO_ACCESS_FAULT)));
 static const target_ulong sstatus_v1_10_mask = SSTATUS_SIE | SSTATUS_SPIE |
 SSTATUS_UIE | SSTATUS_UPIE | SSTATUS_SPP | SSTATUS_FS | SSTATUS_XS |
-SSTATUS_SUM | SSTATUS_MXR;
+SSTATUS_SUM | SSTATUS_MXR | (target_ulong)SSTATUS64_UXL;
 static const target_ulong sip_writable_mask = SIP_SSIP | MIP_USIP | MIP_UEIP;
 static const target_ulong hip_writable_mask = MIP_VSSIP;
 static const target_ulong hvip_writable_mask = MIP_VSSIP | MIP_VSTIP | MIP_VSEIP;
@@ -498,6 +498,8 @@ static uint64_t add_status_sd(RISCVMXL xl, uint64_t status)
 return status | MSTATUS32_SD;
 case MXL_RV64:
 return status | MSTATUS64_SD;
+case MXL_RV128:
+return MSTATUSH128_SD;
 default:
 g_assert_not_reached();
 }
@@ -547,10 +549,11 @@ static RISCVException write_mstatus(CPURISCVState *env, int csrno,
 
 mstatus = (mstatus & ~mask) | (val & mask);
 
-if (riscv_cpu_mxl(env) == MXL_RV64) {
+RISCVMXL xl = riscv_cpu_mxl(env);
+if (xl > MXL_RV32) {
 /* SXL and UXL fields are for now read only */
-mstatus = set_field(mstatus, MSTATUS64_SXL, MXL_RV64);
-mstatus = set_field(mstatus, MSTATUS64_UXL, MXL_RV64);
+mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
+mstatus = set_field(mstatus, MSTATUS64_UXL, xl);
 }
 env->mstatus = mstatus;
 
@@ -579,6 +582,20 @@ static RISCVException write_mstatush(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
+static RISCVException read_mstatus_i128(CPURISCVState *env, int csrno,
+Int128 *val)
+{
+*val = int128_make128(env->mstatus, add_status_sd(MXL_RV128, env->mstatus));
+return RISCV_EXCP_NONE;
+}
+
+static RISCVException read_misa_i128(CPURISCVState *env, int csrno,
+ Int128 *val)
+{
+*val = int128_make128(env->misa_ext, (uint64_t)MXL_RV128 << 62);
+return RISCV_EXCP_NONE;
+}
+
 static RISCVException read_misa(CPURISCVState *env, int csrno,
 target_ulong *val)
 {
@@ -736,6 +753,21 @@ static RISCVException write_mcounteren(CPURISCVState *env, int csrno,
 }
 
 /* Machine Trap Handling */
+static RISCVException read_mscratch_i128(CPURISCVState *env, int csrno,
+ Int128 *val)
+{
+*val = int128_make128(env->mscratch, env->mscratchh);
+return 

[PATCH v5 05/18] target/riscv: separation of bitwise logic and arithmetic helpers

2021-11-12 Thread Frédéric Pétrot
Introduction of a gen_logic function for bitwise logic to implement
instructions in which no propagation of information occurs between bits, and
use of this function for the bitwise instructions.
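Bitwise operations never carry between bit positions. A 128-bit and/or/xor is therefore just the same 64-bit operation applied independently to each half, unlike add and sub, and this is what makes a separate logic helper convenient for 128-bit support. Below is a small plain-C illustration; Reg128, logic_op and logic128 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t lo;
    uint64_t hi;
} Reg128;

typedef uint64_t (*logic_op)(uint64_t, uint64_t);

static uint64_t op_and(uint64_t a, uint64_t b) { return a & b; }
static uint64_t op_xor(uint64_t a, uint64_t b) { return a ^ b; }

/* No carries: each 64-bit half is processed independently with the same op. */
static Reg128 logic128(Reg128 a, Reg128 b, logic_op op)
{
    assert(op != NULL);
    Reg128 r = { op(a.lo, b.lo), op(a.hi, b.hi) };
    return r;
}
```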

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
---
 target/riscv/translate.c| 27 +
 target/riscv/insn_trans/trans_rvb.c.inc |  6 +++---
 target/riscv/insn_trans/trans_rvi.c.inc | 12 +--
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d98bde9b6b..b4278a6a92 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -400,6 +400,33 @@ static int ex_rvc_shifti(DisasContext *ctx, int imm)
 /* Include the auto-generated decoder for 32 bit insn */
 #include "decode-insn32.c.inc"
 
+static bool gen_logic_imm_fn(DisasContext *ctx, arg_i *a,
+ void (*func)(TCGv, TCGv, target_long))
+{
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+
+func(dest, src1, a->imm);
+
+gen_set_gpr(ctx, a->rd, dest);
+
+return true;
+}
+
+static bool gen_logic(DisasContext *ctx, arg_r *a,
+  void (*func)(TCGv, TCGv, TCGv))
+{
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+
+func(dest, src1, src2);
+
+gen_set_gpr(ctx, a->rd, dest);
+
+return true;
+}
+
 static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a, DisasExtend ext,
  void (*func)(TCGv, TCGv, target_long))
 {
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index c8d31907c5..de2cd613b1 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -86,19 +86,19 @@ static bool trans_cpop(DisasContext *ctx, arg_cpop *a)
 static bool trans_andn(DisasContext *ctx, arg_andn *a)
 {
 REQUIRE_ZBB(ctx);
-return gen_arith(ctx, a, EXT_NONE, tcg_gen_andc_tl);
+return gen_logic(ctx, a, tcg_gen_andc_tl);
 }
 
 static bool trans_orn(DisasContext *ctx, arg_orn *a)
 {
 REQUIRE_ZBB(ctx);
-return gen_arith(ctx, a, EXT_NONE, tcg_gen_orc_tl);
+return gen_logic(ctx, a, tcg_gen_orc_tl);
 }
 
 static bool trans_xnor(DisasContext *ctx, arg_xnor *a)
 {
 REQUIRE_ZBB(ctx);
-return gen_arith(ctx, a, EXT_NONE, tcg_gen_eqv_tl);
+return gen_logic(ctx, a, tcg_gen_eqv_tl);
 }
 
 static bool trans_min(DisasContext *ctx, arg_min *a)
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index 4a2aefe3a5..51607b3d40 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -252,17 +252,17 @@ static bool trans_sltiu(DisasContext *ctx, arg_sltiu *a)
 
 static bool trans_xori(DisasContext *ctx, arg_xori *a)
 {
-return gen_arith_imm_fn(ctx, a, EXT_NONE, tcg_gen_xori_tl);
+return gen_logic_imm_fn(ctx, a, tcg_gen_xori_tl);
 }
 
 static bool trans_ori(DisasContext *ctx, arg_ori *a)
 {
-return gen_arith_imm_fn(ctx, a, EXT_NONE, tcg_gen_ori_tl);
+return gen_logic_imm_fn(ctx, a, tcg_gen_ori_tl);
 }
 
 static bool trans_andi(DisasContext *ctx, arg_andi *a)
 {
-return gen_arith_imm_fn(ctx, a, EXT_NONE, tcg_gen_andi_tl);
+return gen_logic_imm_fn(ctx, a, tcg_gen_andi_tl);
 }
 
 static bool trans_slli(DisasContext *ctx, arg_slli *a)
@@ -319,7 +319,7 @@ static bool trans_sltu(DisasContext *ctx, arg_sltu *a)
 
 static bool trans_xor(DisasContext *ctx, arg_xor *a)
 {
-return gen_arith(ctx, a, EXT_NONE, tcg_gen_xor_tl);
+return gen_logic(ctx, a, tcg_gen_xor_tl);
 }
 
 static bool trans_srl(DisasContext *ctx, arg_srl *a)
@@ -334,12 +334,12 @@ static bool trans_sra(DisasContext *ctx, arg_sra *a)
 
 static bool trans_or(DisasContext *ctx, arg_or *a)
 {
-return gen_arith(ctx, a, EXT_NONE, tcg_gen_or_tl);
+return gen_logic(ctx, a, tcg_gen_or_tl);
 }
 
 static bool trans_and(DisasContext *ctx, arg_and *a)
 {
-return gen_arith(ctx, a, EXT_NONE, tcg_gen_and_tl);
+return gen_logic(ctx, a, tcg_gen_and_tl);
 }
 
 static bool trans_addiw(DisasContext *ctx, arg_addiw *a)
-- 
2.33.1




[PATCH v5 17/18] target/riscv: modification of the trans_csrxx for 128-bit support

2021-11-12 Thread Frédéric Pétrot
As opposed to the gen_arith and gen_shift generation helpers, the csr insns
do not have a common prototype, so the choice to generate 32/64 or 128-bit
helper calls is done in the trans_csrxx functions.
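The three CSR instructions can all be expressed as one masked update, assuming the helper computes new = (old & ~mask) | (src & mask). That is why trans_csrrs can pass all-ones as the source and rs1 as the mask. Below is a plain-C model of that update applied to both 64-bit halves. The csrrc variant (zero source, rs1 as mask) is an assumption by analogy, and the model ignores the rd == 0 and rs1 == 0 special cases that skip the read or the write. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t lo;
    uint64_t hi;
} Reg128;

/* Masked update: bits set in mask come from src, all other bits keep old. */
static Reg128 csr_masked_update(Reg128 old, Reg128 src, Reg128 mask)
{
    Reg128 r = {
        (old.lo & ~mask.lo) | (src.lo & mask.lo),
        (old.hi & ~mask.hi) | (src.hi & mask.hi),
    };
    return r;
}

static const Reg128 ALL_ONES = { UINT64_MAX, UINT64_MAX };
static const Reg128 ZERO128 = { 0, 0 };

/* csrrw: replace the whole CSR with rs1. */
static Reg128 csrrw128(Reg128 old, Reg128 rs1)
{
    return csr_masked_update(old, rs1, ALL_ONES);
}

/* csrrs: set the bits that are set in rs1. */
static Reg128 csrrs128(Reg128 old, Reg128 rs1)
{
    return csr_masked_update(old, ALL_ONES, rs1);
}

/* csrrc: clear the bits that are set in rs1. */
static Reg128 csrrc128(Reg128 old, Reg128 rs1)
{
    return csr_masked_update(old, ZERO128, rs1);
}
```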

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 205 ++--
 1 file changed, 160 insertions(+), 45 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index f43f00d9e5..9f05f47a43 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -883,20 +883,78 @@ static bool do_csrrw(DisasContext *ctx, int rd, int rc, TCGv src, TCGv mask)
 return do_csr_post(ctx);
 }
 
+static bool do_csrr_i128(DisasContext *ctx, int rd, int rc)
+{
+TCGv destl = dest_gpr(ctx, rd);
+TCGv desth = dest_gprh(ctx, rd);
+TCGv_i32 csr = tcg_constant_i32(rc);
+
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_csrr_i128(destl, cpu_env, csr);
+tcg_gen_ld_tl(desth, cpu_env, offsetof(CPURISCVState, retxh));
+gen_set_gpr128(ctx, rd, destl, desth);
+return do_csr_post(ctx);
+}
+
+static bool do_csrw_i128(DisasContext *ctx, int rc, TCGv srcl, TCGv srch)
+{
+TCGv_i32 csr = tcg_constant_i32(rc);
+
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_csrw_i128(cpu_env, csr, srcl, srch);
+return do_csr_post(ctx);
+}
+
+static bool do_csrrw_i128(DisasContext *ctx, int rd, int rc,
+  TCGv srcl, TCGv srch, TCGv maskl, TCGv maskh)
+{
+TCGv destl = dest_gpr(ctx, rd);
+TCGv desth = dest_gprh(ctx, rd);
+TCGv_i32 csr = tcg_constant_i32(rc);
+
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_csrrw_i128(destl, cpu_env, csr, srcl, srch, maskl, maskh);
+tcg_gen_ld_tl(desth, cpu_env, offsetof(CPURISCVState, retxh));
+gen_set_gpr128(ctx, rd, destl, desth);
+return do_csr_post(ctx);
+}
+
 static bool trans_csrrw(DisasContext *ctx, arg_csrrw *a)
 {
-TCGv src = get_gpr(ctx, a->rs1, EXT_NONE);
-
-/*
- * If rd == 0, the insn shall not read the csr, nor cause any of the
- * side effects that might occur on a csr read.
- */
-if (a->rd == 0) {
-return do_csrw(ctx, a->csr, src);
+if (get_xl(ctx) < MXL_RV128) {
+TCGv src = get_gpr(ctx, a->rs1, EXT_NONE);
+
+/*
+ * If rd == 0, the insn shall not read the csr, nor cause any of the
+ * side effects that might occur on a csr read.
+ */
+if (a->rd == 0) {
+return do_csrw(ctx, a->csr, src);
+}
+
+TCGv mask = tcg_constant_tl(-1);
+return do_csrrw(ctx, a->rd, a->csr, src, mask);
+} else {
+TCGv srcl = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv srch = get_gprh(ctx, a->rs1);
+
+/*
+ * If rd == 0, the insn shall not read the csr, nor cause any of the
+ * side effects that might occur on a csr read.
+ */
+if (a->rd == 0) {
+return do_csrw_i128(ctx, a->csr, srcl, srch);
+}
+
+TCGv mask = tcg_constant_tl(-1);
+return do_csrrw_i128(ctx, a->rd, a->csr, srcl, srch, mask, mask);
 }
-
-TCGv mask = tcg_constant_tl(-1);
-return do_csrrw(ctx, a->rd, a->csr, src, mask);
 }
 
 static bool trans_csrrs(DisasContext *ctx, arg_csrrs *a)
@@ -908,13 +966,24 @@ static bool trans_csrrs(DisasContext *ctx, arg_csrrs *a)
  * a zero value, the instruction will still attempt to write the
  * unmodified value back to the csr and will cause side effects.
  */
-if (a->rs1 == 0) {
-return do_csrr(ctx, a->rd, a->csr);
+if (get_xl(ctx) < MXL_RV128) {
+if (a->rs1 == 0) {
+return do_csrr(ctx, a->rd, a->csr);
+}
+
+TCGv ones = tcg_constant_tl(-1);
+TCGv mask = get_gpr(ctx, a->rs1, EXT_ZERO);
+return do_csrrw(ctx, a->rd, a->csr, ones, mask);
+} else {
+if (a->rs1 == 0) {
+return do_csrr_i128(ctx, a->rd, a->csr);
+}
+
+TCGv ones = tcg_constant_tl(-1);
+TCGv maskl = get_gpr(ctx, a->rs1, EXT_ZERO);
+TCGv maskh = get_gprh(ctx, a->rs1);
+return do_csrrw_i128(ctx, a->rd, a->csr, ones, ones, maskl, maskh);
 }
-
-TCGv ones = tcg_constant_tl(-1);
-TCGv mask = get_gpr(ctx, a->rs1, EXT_ZERO);
-return do_csrrw(ctx, a->rd, a->csr, ones, mask);
 }
 
 static bool trans_csrrc(DisasContext *ctx, arg_csrrc *a)
@@ -926,28 +995,54 @@ static bool trans_csrrc(DisasContext *ctx, arg_csrrc *a)
  * a zero value, the instruction will still attempt to write the
  * unmodified value back to the csr and will cause side effects.
  */
-if (a->rs1 == 0) {
-return do_csrr(ctx, a->rd, a->csr);
+if (get_xl(ctx) < MXL_RV128) {
+if (a->rs1 == 0) {
+   

[PATCH v5 04/18] target/riscv: additional macros to check instruction support

2021-11-12 Thread Frédéric Pétrot
Given that the 128-bit version of the riscv spec adds new instructions, and
that some instructions that were previously only available in 64-bit mode
are now available for both 64-bit and 128-bit, we added new macros to check
for the processor mode during translation.
Although RV128 is a superset of RV64, for now we keep the RV64-only tests
for extensions other than RVI and RVM.
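The change to REQUIRE_64BIT is easy to miss. It used to accept any XLEN of at least 64 and now accepts exactly RV64, so instructions valid on both must use REQUIRE_64_OR_128BIT instead. Below, the checks are restated as plain C predicates (true means "allowed"). This assumes MXL_RV32/RV64/RV128 are 1/2/3, as in the RISC-V MXL encoding; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed MXL encodings, mirroring RISCVMXL. */
typedef enum {
    MXL_RV32 = 1,
    MXL_RV64 = 2,
    MXL_RV128 = 3,
} Mxl;

/* Before the patch: REQUIRE_64BIT rejected only XLENs below 64. */
static bool old_require_64bit(Mxl xl) { return xl >= MXL_RV64; }

/* After the patch: each macro makes the translator return false unless its test holds. */
static bool require_64bit(Mxl xl) { return xl == MXL_RV64; }
static bool require_128bit(Mxl xl) { return xl == MXL_RV128; }
static bool require_64_or_128bit(Mxl xl) { return xl != MXL_RV32; }
```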

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Reviewed-by: Richard Henderson 
---
 target/riscv/translate.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 1d57bc97b5..d98bde9b6b 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -368,10 +368,22 @@ EX_SH(12)
 }  \
 } while (0)
 
-#define REQUIRE_64BIT(ctx) do {\
-if (get_xl(ctx) < MXL_RV64) {  \
-return false;  \
-}  \
+#define REQUIRE_64BIT(ctx) do { \
+if (get_xl(ctx) != MXL_RV64) { \
+return false;   \
+}   \
+} while (0)
+
+#define REQUIRE_128BIT(ctx) do {\
+if (get_xl(ctx) != MXL_RV128) { \
+return false;   \
+}   \
+} while (0)
+
+#define REQUIRE_64_OR_128BIT(ctx) do { \
+if (get_xl(ctx) == MXL_RV32) { \
+return false;  \
+}  \
 } while (0)
 
 static int ex_rvc_register(DisasContext *ctx, int reg)
-- 
2.33.1




[PATCH v5 06/18] target/riscv: array for the 64 upper bits of 128-bit registers

2021-11-12 Thread Frédéric Pétrot
The upper 64 bits of the 128-bit registers now have a place inside
the cpu state structure, and are created as globals for future use.
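To make the layout concrete, below is a sketch of how a full 128-bit register value would be split across the two arrays and reassembled, with x0 hardwired to zero. It is plain C and assumes a 64-bit target_ulong and a compiler that provides unsigned __int128 (GCC or Clang). RegFile and the accessors are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file mirroring gpr[]/gprh[] with 64-bit halves. */
typedef struct {
    uint64_t gpr[32];   /* low 64 bits (the existing array) */
    uint64_t gprh[32];  /* high 64 bits, only meaningful on RV128 */
} RegFile;

static unsigned __int128 reg_read128(const RegFile *rf, int i)
{
    assert(i >= 0 && i < 32);
    if (i == 0) {
        return 0;  /* x0 reads as zero in both halves */
    }
    return ((unsigned __int128)rf->gprh[i] << 64) | rf->gpr[i];
}

static void reg_write128(RegFile *rf, int i, unsigned __int128 v)
{
    assert(i >= 0 && i < 32);
    if (i == 0) {
        return;  /* writes to x0 are discarded */
    }
    rf->gpr[i] = (uint64_t)v;
    rf->gprh[i] = (uint64_t)(v >> 64);
}
```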

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 target/riscv/cpu.h   |  2 ++
 target/riscv/cpu.c   |  9 +
 target/riscv/machine.c   | 20 
 target/riscv/translate.c |  5 -
 4 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0760c0af93..53a295efb7 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -110,6 +110,7 @@ FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 1, 1)
 
 struct CPURISCVState {
 target_ulong gpr[32];
+target_ulong gprh[32]; /* 64 top bits of the 128-bit registers */
 uint64_t fpr[32]; /* assume both F and D extensions */
 
 /* vector coprocessor state. */
@@ -339,6 +340,7 @@ static inline bool riscv_feature(CPURISCVState *env, int feature)
 #include "cpu_user.h"
 
 extern const char * const riscv_int_regnames[];
+extern const char * const riscv_int_regnamesh[];
 extern const char * const riscv_fpr_regnames[];
 
 const char *riscv_cpu_get_trap_name(target_ulong cause, bool async);
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f812998123..364140f5ff 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -42,6 +42,15 @@ const char * const riscv_int_regnames[] = {
   "x28/t3",  "x29/t4", "x30/t5", "x31/t6"
 };
 
+const char * const riscv_int_regnamesh[] = {
+  "x0h/zeroh", "x1h/rah",  "x2h/sph",   "x3h/gph",   "x4h/tph",  "x5h/t0h",
+  "x6h/t1h",   "x7h/t2h",  "x8h/s0h",   "x9h/s1h",   "x10h/a0h", "x11h/a1h",
+  "x12h/a2h",  "x13h/a3h", "x14h/a4h",  "x15h/a5h",  "x16h/a6h", "x17h/a7h",
+  "x18h/s2h",  "x19h/s3h", "x20h/s4h",  "x21h/s5h",  "x22h/s6h", "x23h/s7h",
+  "x24h/s8h",  "x25h/s9h", "x26h/s10h", "x27h/s11h", "x28h/t3h", "x29h/t4h",
+  "x30h/t5h",  "x31h/t6h"
+};
+
 const char * const riscv_fpr_regnames[] = {
   "f0/ft0",   "f1/ft1",  "f2/ft2",   "f3/ft3",   "f4/ft4",  "f5/ft5",
   "f6/ft6",   "f7/ft7",  "f8/fs0",   "f9/fs1",   "f10/fa0", "f11/fa1",
diff --git a/target/riscv/machine.c b/target/riscv/machine.c
index 7b4c739564..7e2d02457e 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -92,6 +92,14 @@ static bool pointermasking_needed(void *opaque)
 return riscv_has_ext(env, RVJ);
 }
 
+static bool rv128_needed(void *opaque)
+{
+RISCVCPU *cpu = opaque;
+CPURISCVState *env = &cpu->env;
+
+return env->misa_mxl_max == MXL_RV128;
+}
+
 static const VMStateDescription vmstate_vector = {
 .name = "cpu/vector",
 .version_id = 1,
@@ -164,6 +172,17 @@ static const VMStateDescription vmstate_hyper = {
 }
 };
 
+static const VMStateDescription vmstate_rv128 = {
+.name = "cpu/rv128",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = rv128_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINTTL_ARRAY(env.gprh, RISCVCPU, 32),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_riscv_cpu = {
 .name = "cpu",
 .version_id = 3,
@@ -218,6 +237,7 @@ const VMStateDescription vmstate_riscv_cpu = {
 &vmstate_hyper,
 &vmstate_vector,
 &vmstate_pointermasking,
+&vmstate_rv128,
 NULL
 }
 };
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index b4278a6a92..00a2cfa917 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -32,7 +32,7 @@
 #include "instmap.h"
 
 /* global register indices */
-static TCGv cpu_gpr[32], cpu_pc, cpu_vl;
+static TCGv cpu_gpr[32], cpu_gprh[32], cpu_pc, cpu_vl;
 static TCGv_i64 cpu_fpr[32]; /* assume F and D extensions */
 static TCGv load_res;
 static TCGv load_val;
@@ -777,10 +777,13 @@ void riscv_translate_init(void)
  * unless you specifically block reads/writes to reg 0.
  */
 cpu_gpr[0] = NULL;
+cpu_gprh[0] = NULL;
 
 for (i = 1; i < 32; i++) {
 cpu_gpr[i] = tcg_global_mem_new(cpu_env,
 offsetof(CPURISCVState, gpr[i]), riscv_int_regnames[i]);
+cpu_gprh[i] = tcg_global_mem_new(cpu_env,
+offsetof(CPURISCVState, gprh[i]), riscv_int_regnamesh[i]);
 }
 
 for (i = 0; i < 32; i++) {
-- 
2.33.1




[PATCH v5 03/18] qemu/int128: addition of div/rem 128-bit operations

2021-11-12 Thread Frédéric Pétrot
Addition of div and rem on 128-bit integers, using the 128/64->128 divu and
64x64->128 mulu in host-utils.
These operations will be used within div/rem helpers in the 128-bit riscv
target.
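For reference, below is a sketch of the sign handling expected from int128_divs/int128_rems, written with the compiler's native __int128 (GCC or Clang assumed). It divides the magnitudes and then fixes the signs, so the quotient truncates toward zero and the remainder takes the sign of the dividend. As with divrem128, a zero divisor must be rejected by the caller. The most negative value divided by -1 wraps back to itself on GCC/Clang. The names are hypothetical.

```c
#include <assert.h>

typedef __int128 i128;
typedef unsigned __int128 u128;

/* Magnitude computed in unsigned arithmetic, so the most negative value is safe. */
static u128 mag128(i128 x)
{
    return x < 0 ? -(u128)x : (u128)x;
}

static i128 divs128(i128 a, i128 b)
{
    assert(b != 0);  /* division by zero is the caller's problem */
    u128 q = mag128(a) / mag128(b);
    /* The quotient is negative exactly when the operand signs differ. */
    return (i128)((a < 0) != (b < 0) ? -q : q);
}

static i128 rems128(i128 a, i128 b)
{
    assert(b != 0);
    u128 r = mag128(a) % mag128(b);
    /* The remainder takes the sign of the dividend. */
    return (i128)(a < 0 ? -r : r);
}
```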

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
 include/qemu/int128.h |   6 ++
 util/int128.c | 145 ++
 util/meson.build  |   1 +
 3 files changed, 152 insertions(+)
 create mode 100644 util/int128.c

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index b6d517aea4..ef41892dac 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -386,4 +386,10 @@ static inline void bswap128s(Int128 *s)
 *s = bswap128(*s);
 }
 
+#define UINT128_MAX int128_make128(~0LL, ~0LL)
+Int128 int128_divu(Int128, Int128);
+Int128 int128_remu(Int128, Int128);
+Int128 int128_divs(Int128, Int128);
+Int128 int128_rems(Int128, Int128);
+
 #endif /* INT128_H */
diff --git a/util/int128.c b/util/int128.c
new file mode 100644
index 00..c2ddf197e1
--- /dev/null
+++ b/util/int128.c
@@ -0,0 +1,145 @@
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/int128.h"
+
+#ifdef CONFIG_INT128
+
+Int128 int128_divu(Int128 a, Int128 b)
+{
+    return (__uint128_t)a / (__uint128_t)b;
+}
+
+Int128 int128_remu(Int128 a, Int128 b)
+{
+    return (__uint128_t)a % (__uint128_t)b;
+}
+
+Int128 int128_divs(Int128 a, Int128 b)
+{
+    return a / b;
+}
+
+Int128 int128_rems(Int128 a, Int128 b)
+{
+    return a % b;
+}
+
+#else
+
+/*
+ * Division and remainder algorithms for 128-bit due to Stefan Kanthak,
+ * https://skanthak.homepage.t-online.de/integer.html#udivmodti4
+ * Preconditions:
+ * - function should never be called with v equal to 0, it has to
+ *   be dealt with beforehand
+ * - quotient pointer must be valid
+ */
+static Int128 divrem128(Int128 u, Int128 v, Int128 *q)
+{
+    Int128 qq;
+    uint64_t hi, lo, tmp;
+    int s;
+
+    if ((s = clz64(v.hi)) == 64) {
+        /* we have uu÷0v => let's use divu128 */
+        hi = u.hi;
+        lo = u.lo;
+        tmp = divu128(&lo, &hi, v.lo);
+        *q = int128_make128(lo, hi);
+        return int128_make128(tmp, 0);
+    } else {
+        hi = int128_gethi(int128_lshift(v, s));
+
+        if (hi > u.hi) {
+            lo = u.lo;
+            tmp = u.hi;
+            divu128(&lo, &tmp, hi);
+            lo = int128_gethi(int128_lshift(int128_make128(lo, 0), s));
+        } else { /* prevent overflow */
+            lo = u.lo;
+            tmp = u.hi - hi;
+            divu128(&lo, &tmp, hi);
+            lo = int128_gethi(int128_lshift(int128_make128(lo, 1), s));
+        }
+
+        qq = int128_make64(lo);
+
+        tmp = lo * v.hi;
+        mulu64(&lo, &hi, lo, v.lo);
+        hi += tmp;
+
+        if (hi < tmp     /* quotient * divisor >= 2**128 > dividend */
+            || hi > u.hi /* quotient * divisor > dividend */
+            || (hi == u.hi && lo > u.lo)) {
+            qq.lo -= 1;
+            mulu64(&lo, &hi, qq.lo, v.lo);
+            hi += qq.lo * v.hi;
+        }
+
+        *q = qq;
+        u.hi -= hi + (u.lo < lo);
+        u.lo -= lo;
+        return u;
+    }
+}
+
+Int128 int128_divu(Int128 a, Int128 b)
+{
+    Int128 q;
+    divrem128(a, b, &q);
+    return q;
+}
+
+Int128 int128_remu(Int128 a, Int128 b)
+{
+    Int128 q;
+    return divrem128(a, b, &q);
+}
+
+Int128 int128_divs(Int128 a, Int128 b)
+{
+    Int128 q;
+    bool sgna = !int128_nonneg(a);
+    bool sgnb = !int128_nonneg(b);
+
+    if (sgna) {
+        a = int128_neg(a);
+    }
+
+    if (sgnb) {
+        b = int128_neg(b);
+    }
+
+    divrem128(a, b, &q);
+
+    if (sgna != sgnb) {
+        q = int128_neg(q);
+    }
+
+    return q;
+}
+
+Int128 int128_rems(Int128 a, Int128 b)
+{
+    Int128 q, r;
+    bool sgna = !int128_nonneg(a);
+    bool sgnb = !int128_nonneg(b);
+
+    if (sgna) {
+        a = int128_neg(a);
+    }
+
+    if (sgnb) {
+        b = int128_neg(b);
+    }
+
+    r = divrem128(a, b, &q);
+
+    if (sgna) {
+        r = int128_neg(r);
+    }
+
+    return r;
+}
+
+#endif
diff --git a/util/meson.build b/util/meson.build
index 05b593055a..e676b2f6c6 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -48,6 +48,7 @@ util_ss.add(files('transactions.c'))
 util_ss.add(when: 'CONFIG_POSIX', if_true: files('drm.c'))
 util_ss.add(files('guest-random.c'))
 util_ss.add(files('yank.c'))
+util_ss.add(files('int128.c'))
 
 if have_user
   util_ss.add(files('selfmap.c'))
-- 
2.33.1
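A bit-at-a-time reference makes it easy to cross-check the non-CONFIG_INT128 path of the patch above. The sketch below uses plain restoring (shift-subtract) division. It is deliberately not the Kanthak algorithm that divrem128() implements, so it is slow but simple to audit. The u128 type and all function names are hypothetical and are not QEMU API. The signed wrappers follow the sign rules of int128_divs() and int128_rems() in the patch: the quotient is negated when the operand signs differ, and the remainder takes the sign of the dividend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's Int128 on hosts without __int128. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Restoring (shift-subtract) division: one quotient bit per iteration. */
static u128 u128_divrem(u128 u, u128 v, u128 *rem)
{
    u128 q = { 0, 0 };
    u128 r = { 0, 0 };

    assert(v.lo != 0 || v.hi != 0);   /* same precondition as divrem128() */

    for (int i = 127; i >= 0; i--) {
        uint64_t bit = (i >= 64 ? u.hi >> (i - 64) : u.lo >> i) & 1;
        uint64_t carry = r.hi >> 63;  /* bit shifted out of the 128-bit r */

        /* r = (r << 1) | bit i of the dividend */
        r.hi = (r.hi << 1) | (r.lo >> 63);
        r.lo = (r.lo << 1) | bit;

        /* if r >= v (counting the carried-out bit), subtract v */
        if (carry || r.hi > v.hi || (r.hi == v.hi && r.lo >= v.lo)) {
            uint64_t borrow = r.lo < v.lo;
            r.lo -= v.lo;
            r.hi -= v.hi + borrow;
            if (i >= 64) {
                q.hi |= 1ULL << (i - 64);
            } else {
                q.lo |= 1ULL << i;
            }
        }
    }
    *rem = r;
    return q;
}

static u128 u128_neg(u128 a)
{
    u128 r = { ~a.lo + 1, ~a.hi + (a.lo == 0) };   /* two's complement */
    return r;
}

static bool u128_is_neg(u128 a)
{
    return a.hi >> 63;
}

/* Truncating signed division: quotient negated when the signs differ. */
static u128 s128_div(u128 a, u128 b)
{
    bool na = u128_is_neg(a), nb = u128_is_neg(b);
    u128 r;
    u128 q = u128_divrem(na ? u128_neg(a) : a, nb ? u128_neg(b) : b, &r);

    return na != nb ? u128_neg(q) : q;
}

/* Signed remainder: takes the sign of the dividend. */
static u128 s128_rem(u128 a, u128 b)
{
    bool na = u128_is_neg(a), nb = u128_is_neg(b);
    u128 r;

    u128_divrem(na ? u128_neg(a) : a, nb ? u128_neg(b) : b, &r);
    return na ? u128_neg(r) : r;
}
```

One way to gain confidence in the faster code would be to compare both implementations on random operands, skipping a zero divisor, which both treat as a precondition violation.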




[RFC PATCH v2 1/5] virtio: introduce virtio_force_modern()

2021-11-12 Thread Halil Pasic
Legacy vs modern should be detected via transport specific means. We
can't wait till feature negotiation is done. Let us introduce
virtio_force_modern() as a means for the transport code to signal
that the device should operate in modern mode (because a modern driver
was detected).

A new callback is added for the situations where the device needs
to do more than just setting the VIRTIO_F_VERSION_1 feature bit. For
example, when vhost is involved, we may need to propagate the features
to the vhost device.

Signed-off-by: Halil Pasic 
---

I'm still struggling with how to deal with vhost-user and co. The
problem is that I'm not very familiar with the life-cycle of, let us
say, a vhost_user device.

It looks to me like the vhost part might be just an implementation detail,
and could even become a hot-swappable thing.

Another thing is that vhost processes set_features differently. It
might or might not be a good idea to change this.

Does anybody know why we don't propagate the features on features_set,
but only under a different set of conditions, one of which is that the
vhost device is started?
---
 hw/virtio/virtio.c | 13 +
 include/hw/virtio/virtio.h |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 3a1f6c520c..26db1b31e6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3281,6 +3281,19 @@ void virtio_init(VirtIODevice *vdev, const char *name,
     vdev->use_guest_notifier_mask = true;
 }
 
+void virtio_force_modern(VirtIODevice *vdev)
+{
+    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
+
+    virtio_add_feature(&vdev->guest_features, VIRTIO_F_VERSION_1);
+    /* Let the device do its normal thing. */
+    virtio_set_features(vdev, vdev->guest_features);
+    /* For example for vhost-user we have to propagate to the vhost dev. */
+    if (k->force_modern) {
+        k->force_modern(vdev);
+    }
+}
+
 /*
  * Only devices that have already been around prior to defining the virtio
  * standard support legacy mode; this includes devices not specified in the
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 8bab9cfb75..1bb1551865 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -126,6 +126,7 @@ struct VirtioDeviceClass {
     int (*validate_features)(VirtIODevice *vdev);
     void (*get_config)(VirtIODevice *vdev, uint8_t *config);
     void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
+    void (*force_modern)(VirtIODevice *vdev);
     void (*reset)(VirtIODevice *vdev);
     void (*set_status)(VirtIODevice *vdev, uint8_t val);
     /* For transitional devices, this is a bitmap of features
@@ -394,6 +395,7 @@ static inline bool virtio_device_disabled(VirtIODevice *vdev)
     return unlikely(vdev->disabled || vdev->broken);
 }
 
+void virtio_force_modern(VirtIODevice *vdev);
 bool virtio_legacy_allowed(VirtIODevice *vdev);
 bool virtio_legacy_check_disabled(VirtIODevice *vdev);
 
-- 
2.25.1
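The control flow of virtio_force_modern() can be modelled in a few lines outside QEMU. The struct and function names in this sketch are hypothetical simplifications and do not correspond to QEMU's VirtIODevice or VirtioDeviceClass. The optional force_modern hook stands in for the vhost propagation that patch 4/5 adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32   /* feature bit number from the virtio spec */

/* Hypothetical, heavily simplified device model; not QEMU's types. */
struct dev;

struct dev_class {
    void (*set_features)(struct dev *d, uint64_t features); /* optional */
    void (*force_modern)(struct dev *d);                    /* optional */
};

struct dev {
    const struct dev_class *klass;
    uint64_t guest_features;
    uint64_t backend_features;   /* e.g. what a vhost backend was told */
};

static void dev_force_modern(struct dev *d)
{
    assert(d->klass != NULL);

    /* 1ULL, not 1: bit 32 does not fit in a 32-bit int. */
    d->guest_features |= 1ULL << VIRTIO_F_VERSION_1;

    /* Let the device do its normal set_features processing ... */
    if (d->klass->set_features) {
        d->klass->set_features(d, d->guest_features);
    }
    /* ... then give it a chance to do more, e.g. push to a backend. */
    if (d->klass->force_modern) {
        d->klass->force_modern(d);
    }
}

/* Example hook: propagate the features to the (pretend) backend. */
static void demo_push_to_backend(struct dev *d)
{
    d->backend_features = d->guest_features;
}

static const struct dev_class demo_class = {
    .set_features = NULL,
    .force_modern = demo_push_to_backend,
};
```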




[RFC PATCH v2 4/5] vhost: push features to backend on force_modern

2021-11-12 Thread Halil Pasic
In vhost we don't push the features to the vhost device when the
features are set, but when the vhost device is started. This can
lead to problems when config space is implemented in the vhost
device, and the driver does some early config space reading (early in the
sense that it precedes setting FEATURES_OK).

Signed-off-by: Halil Pasic 
---
 hw/virtio/vhost.c | 17 +
 include/hw/virtio/vhost.h |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b4b29413e6..5764970298 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1628,6 +1628,23 @@ void vhost_dev_free_inflight(struct vhost_inflight *inflight)
     }
 }
 
+int vhost_dev_force_modern(struct vhost_dev *hdev)
+{
+    uint64_t features;
+    int r;
+
+    assert(hdev->vhost_ops);
+
+    hdev->acked_features |= (0x1ULL << VIRTIO_F_VERSION_1);
+    features = hdev->acked_features;
+    r = hdev->vhost_ops->vhost_set_features(hdev, features);
+    if (r < 0) {
+        VHOST_OPS_DEBUG("vhost_set_features failed");
+        return -errno;
+    }
+    return 0;
+}
+
 static int vhost_dev_resize_inflight(struct vhost_inflight *inflight,
                                       uint64_t new_size)
 {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 1a9fc65089..9ef784e2e9 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -138,6 +138,8 @@ int vhost_dev_get_config(struct vhost_dev *hdev, uint8_t *config,
                          uint32_t config_len, Error **errp);
 int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
                          uint32_t offset, uint32_t size, uint32_t flags);
+
+int vhost_dev_force_modern(struct vhost_dev *vdev);
 /* notifier callback in case vhost device config space changed
  */
 void vhost_dev_set_config_notifier(struct vhost_dev *dev,
-- 
2.25.1




[PATCH v5 00/18] Adding partial support for 128-bit riscv target

2021-11-12 Thread Frédéric Pétrot
This series of patches provides partial 128-bit support for the riscv
target architecture, namely RVI and RVM, with minimal csr support.

First of all, thanks for the feedback on v4 and the guidance for v5.

This v5 mainly corrects flaws in the implementation pointed out by Richard
and Philippe:
- split the memop define renaming and addition in two patches
- 128-bit div/rem operations using the new versions of Luis's host-utils
  functions. The divrem algorithm is the one proposed by Stefan Kanthak,
  and the implementation in QEMU appears to be a bit faster than gcc's
  native 128-bit integer support
- removed useless rv128 tests at various places
- refactoring the slt/bxx part so as to share the comparison part
- refactoring the 128-bit csr handling to share more code; writes are
  also forwarded to the 64-bit version when no 128-bit version exists,
  as the vast majority of the csrs do not use the upper 64 bits

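The CSR forwarding described in the last item can be pictured as a table entry with an optional 128-bit handler that falls back to the 64-bit one. This is a sketch under stated assumptions. The types and names are hypothetical and are not QEMU's riscv_csr_operations. The sketch also assumes that the upper 64 bits are simply dropped when no 128-bit handler exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; not the layout of QEMU's riscv_csr_operations. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} csr_val128;

typedef struct {
    void (*write64)(uint64_t val);     /* every CSR has one */
    void (*write128)(csr_val128 val);  /* only CSRs that use the upper half */
} csr_ops;

static void csr_write(const csr_ops *op, csr_val128 val)
{
    assert(op->write64 != NULL);
    if (op->write128) {
        op->write128(val);
    } else {
        /* Assumption: the upper 64 bits are dropped for such CSRs. */
        op->write64(val.lo);
    }
}

/* Example 64-bit-only CSR. */
static uint64_t demo_csr;

static void demo_write64(uint64_t val)
{
    demo_csr = val;
}
```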
Frédéric Pétrot (18):
  exec/memop: Adding signedness to quad definitions
  exec/memop: Adding signed quad and octo defines
  qemu/int128: addition of div/rem 128-bit operations
  target/riscv: additional macros to check instruction support
  target/riscv: separation of bitwise logic and arithmetic helpers
  target/riscv: array for the 64 upper bits of 128-bit registers
  target/riscv: setup everything so that riscv128-softmmu compiles
  target/riscv: moving some insns close to similar insns
  target/riscv: accessors to registers upper part and 128-bit load/store
  target/riscv: support for 128-bit bitwise instructions
  target/riscv: support for 128-bit U-type instructions
  target/riscv: support for 128-bit shift instructions
  target/riscv: support for 128-bit arithmetic instructions
  target/riscv: support for 128-bit M extension
  target/riscv: adding high part of some csrs
  target/riscv: helper functions to wrap calls to 128-bit csr insns
  target/riscv: modification of the trans_csrxx for 128-bit support
  target/riscv: actual functions to realize crs 128-bit insns

 configs/devices/riscv128-softmmu/default.mak |  17 +
 configs/targets/riscv128-softmmu.mak |   6 +
 include/disas/dis-asm.h  |   1 +
 include/exec/memop.h |  15 +-
 include/hw/riscv/sifive_cpu.h|   3 +
 include/qemu/int128.h|   6 +
 include/tcg/tcg-op.h |   4 +-
 target/arm/translate-a32.h   |   4 +-
 target/riscv/cpu-param.h |   5 +
 target/riscv/cpu.h   |  23 +
 target/riscv/cpu_bits.h  |   3 +
 target/riscv/helper.h|   9 +
 target/riscv/insn16.decode   |  27 +-
 target/riscv/insn32.decode   |  25 +
 accel/tcg/cputlb.c   |  30 +-
 accel/tcg/user-exec.c|   8 +-
 disas/riscv.c|   5 +
 target/alpha/translate.c |  32 +-
 target/arm/helper-a64.c  |   8 +-
 target/arm/translate-a64.c   |   8 +-
 target/arm/translate-neon.c  |   6 +-
 target/arm/translate-sve.c   |  10 +-
 target/arm/translate-vfp.c   |   8 +-
 target/arm/translate.c   |   2 +-
 target/cris/translate.c  |   2 +-
 target/hppa/translate.c  |   4 +-
 target/i386/tcg/mem_helper.c |   2 +-
 target/i386/tcg/translate.c  |  36 +-
 target/m68k/op_helper.c  |   2 +-
 target/mips/tcg/translate.c  |  58 +-
 target/mips/tcg/tx79_translate.c |   8 +-
 target/ppc/translate.c   |  32 +-
 target/riscv/cpu.c   |  31 +-
 target/riscv/csr.c   | 198 -
 target/riscv/gdbstub.c   |   8 +
 target/riscv/m128_helper.c   | 109 +++
 target/riscv/machine.c   |  22 +
 target/riscv/op_helper.c |  44 ++
 target/riscv/translate.c | 252 ++-
 target/s390x/tcg/mem_helper.c|   8 +-
 target/s390x/tcg/translate.c |   8 +-
 target/sh4/translate.c   |  12 +-
 target/sparc/translate.c |  36 +-
 target/tricore/translate.c   |   4 +-
 target/xtensa/translate.c|   4 +-
 tcg/tcg.c|   4 +-
 tcg/tci.c|  16 +-
 util/int128.c| 145 
 accel/tcg/ldst_common.c.inc  |   8 +-
 target/mips/tcg/micromips_translate.c.inc|  10 +-
 target/ppc/translate/fixedpoint-impl.c.inc   |  22 +-
 target/ppc/translate/fp-impl.c.inc   |   4 +-
 target/ppc/translate/vsx-impl.c.inc  |  42 +-
 target/riscv/insn_trans/trans_rva.c.inc  |  22 +-
 

[RFC PATCH v2 5/5] virtio-net: handle force_modern for vhost

2021-11-12 Thread Halil Pasic
Signed-off-by: Halil Pasic 
---

Inspired by virtio_net_set_features(), which I don't quite understand.
Why do we have to do vhost_net_ack_features() for each possible queue?
---
 hw/net/virtio-net.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f205331dcf..43ed9ef3ba 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -766,6 +766,25 @@ static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
     return features;
 }
 
+static void virtio_net_force_modern(VirtIODevice *vdev)
+{
+    VirtIONet *n = VIRTIO_NET(vdev);
+    int i;
+
+    /*
+     * Why do we have to loop over all queues? Aren't features a
+     * per-device thing?
+     */
+    for (i = 0; i < n->max_queues; i++) {
+        NetClientState *nc = qemu_get_subqueue(n->nic, i);
+
+        if (!get_vhost_net(nc->peer)) {
+            continue;
+        }
+        vhost_dev_force_modern(&get_vhost_net(nc->peer)->dev);
+    }
+}
+
 static void virtio_net_apply_guest_offloads(VirtIONet *n)
 {
     qemu_set_offload(qemu_get_queue(n->nic)->peer,
@@ -3668,6 +3687,7 @@ static void virtio_net_class_init(ObjectClass *klass, void *data)
     vdc->get_features = virtio_net_get_features;
     vdc->set_features = virtio_net_set_features;
     vdc->bad_features = virtio_net_bad_features;
+    vdc->force_modern = virtio_net_force_modern;
     vdc->reset = virtio_net_reset;
     vdc->set_status = virtio_net_set_status;
     vdc->guest_notifier_mask = virtio_net_guest_notifier_mask;
-- 
2.25.1




[RFC PATCH v2 3/5] virtio-pci: use virtio_force_modern()

2021-11-12 Thread Halil Pasic
Let us detect usage via the modern interface by tapping into the place
that implements the 'modern' reset.

Signed-off-by: Halil Pasic 
---
 hw/virtio/virtio-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 6e16e2705c..8dd862da21 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1297,6 +1297,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
 
         if (vdev->status == 0) {
             virtio_pci_reset(DEVICE(proxy));
+            virtio_force_modern(virtio_bus_get_device(&proxy->bus));
         }
 
         break;
-- 
2.25.1




[RFC PATCH v2 2/5] virtio-ccw: use virtio_force_modern()

2021-11-12 Thread Halil Pasic
The fact that revision > 0 was negotiated implies that VIRTIO_F_VERSION_1
aka modern must be used. This negotiation is done before the obligatory
reset. Let us call virtio_force_modern() after the reset if revision > 0
was negotiated, so that the VIRTIO_F_VERSION_1 feature can be set, and
endianness starts working as it should for devices that comply with the
virtio spec.

Signed-off-by: Halil Pasic 
---
 hw/s390x/virtio-ccw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 6a2df1c1e9..88fbe87942 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -266,6 +266,9 @@ static void virtio_ccw_reset_virtio(VirtioCcwDevice *dev, VirtIODevice *vdev)
         dev->summary_indicator = NULL;
     }
     ccw_dev->sch->thinint_active = false;
+    if (dev->revision > 0) {
+        virtio_force_modern(vdev);
+    }
 }
 
 static int virtio_ccw_handle_set_vq(SubchDev *sch, CCW1 ccw, bool check_len,
-- 
2.25.1




[RFC PATCH v2 0/5] virtio: early detect 'modern' virtio

2021-11-12 Thread Halil Pasic
This is an early RFC for a transport-specific early detection of
modern virtio, which is most relevant for transitional devices on big
endian platforms, when drivers access the config space before
FEATURES_OK is set.

The most important part that is missing here is fixing all the problems
that arise in the situation described in the previous paragraph, when
the config is managed by a vhost device (and thus outside QEMU). This
series tackles this problem only for virtio_net+vhost as an example. If
this approach is deemed good, we need to do something very similar for
every single affected device.

This series was only lightly tested. The vhost part is entirely
untested; unfortunately I don't have a working setup where this
handling would be needed (because the config space is handled in the
device). DPDK is not supported on s390x, so at the moment I can't test
DPDK-based setups.

v1 -> v2:

* add callback
* tweak feature manipulation
* add generic handling for vhost that needs to be called by devices
* add handling for virtio

Halil Pasic (5):
  virtio: introduce virtio_force_modern()
  virtio-ccw: use virtio_force_modern()
  virtio-pci: use virtio_force_modern()
  vhost: push features to backend on force_modern
  virtio-net: handle force_modern for vhost

 hw/net/virtio-net.c| 20 
 hw/s390x/virtio-ccw.c  |  3 +++
 hw/virtio/vhost.c  | 17 +
 hw/virtio/virtio-pci.c |  1 +
 hw/virtio/virtio.c | 13 +
 include/hw/virtio/vhost.h  |  2 ++
 include/hw/virtio/virtio.h |  2 ++
 7 files changed, 58 insertions(+)


base-commit: 2c3e83f92d93fbab071b8a96b8ab769b01902475
-- 
2.25.1




Re: [PATCH v4 08/25] block: introduce assert_bdrv_graph_writable

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.

BQL prevents concurrent writes from the GS API, while
drains protect from I/O.

TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in a future
series.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
  block.c|  5 +
  block/io.c | 11 +++
  include/block/block_int-global-state.h | 10 +-
  3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 41c5883c5c..94bff5c757 100644
--- a/block.c
+++ b/block.c
@@ -2734,12 +2734,14 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
  if (child->klass->detach) {
  child->klass->detach(child);
  }
+assert_bdrv_graph_writable(old_bs);
  QLIST_REMOVE(child, next_parent);


I think this belongs above the .detach() call (and the QLIST_REMOVE() 
belongs into the .detach() implementation, as done in 
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
which has been merged to Kevin’s block branch).



  }
  
  child->bs = new_bs;
  
  if (new_bs) {

+assert_bdrv_graph_writable(new_bs);
  QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);


In both these places it’s a bit strange that the assertion is done on 
the child nodes.  The subgraph starting from them isn’t modified after 
all, so their subgraph technically doesn’t need to be writable.  I think 
a single assertion on the parent node would be preferable.


I presume the problem with that is that we don’t have the parent node 
here?  Do we need a new BdrvChildClass method that performs this 
assertion on the parent node?


  
  /*

@@ -2940,6 +2942,7 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
  return ret;
  }
  
+assert_bdrv_graph_writable(parent_bs);

  QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
  /*
   * child is removed in bdrv_attach_child_common_abort(), so don't care to
@@ -3140,6 +3143,7 @@ static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
  {
  assert(qemu_in_main_thread());
+assert_bdrv_graph_writable(parent);


It looks to me like we have this assertion mainly because 
bdrv_replace_child_noperm() doesn’t have a pointer to this parent node.  
It’s a workaround, but we should have this in every path that eventually 
ends up at bdrv_replace_child_noperm(), and that seems rather difficult 
for the bdrv_replace_node() family of functions. That to me sounds like 
it’d be good to have this as a BdrvChildClass function.



  if (child == NULL) {
  return;
  }
@@ -4903,6 +4907,7 @@ static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
  BdrvRemoveFilterOrCowChild *s = opaque;
  BlockDriverState *parent_bs = s->child->opaque;
  
+assert_bdrv_graph_writable(parent_bs);

  QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
  if (s->is_backing) {
  parent_bs->backing = s->child;
diff --git a/block/io.c b/block/io.c
index f271ab3684..1c71e354d6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -740,6 +740,17 @@ void bdrv_drain_all(void)
  bdrv_drain_all_end();
  }
  
+void assert_bdrv_graph_writable(BlockDriverState *bs)

+{
+/*
+ * TODO: this function is incomplete. Because the users of this
+ * assert lack the necessary drains, check only for BQL.
+ * Once the necessary drains are added,
+ * assert also for qatomic_read(&bs->quiesce_counter) > 0
+ */
+assert(qemu_in_main_thread());
+}
+
  /**
   * Remove an active request from the tracked requests list
   *
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index d08e80222c..6bd7746409 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -316,4 +316,12 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
   */
  void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
  
-#endif /* BLOCK_INT_GLOBAL_STATE*/

+/**
+ * Make sure that the function is either running under
+ * drain and BQL. The latter protects from concurrent writings


“either ... and” sounds wrong to me.  I’d drop the “either” or say 
“running under both drain and BQL”.


Hanna


+ * from the GS API, while the former prevents concurrent reads
+ * from I/O.
+ */
+void assert_bdrv_graph_writable(BlockDriverState *bs);
+
+#endif /* BLOCK_INT_GLOBAL_STATE */
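The TODO in assert_bdrv_graph_writable() amounts to requiring both conditions at once. Below is a minimal stand-alone sketch of the complete check using C11 atomics. Node, in_main_thread and the drained_*() helpers are hypothetical stand-ins for BlockDriverState, qemu_in_main_thread() and bdrv_drained_begin()/bdrv_drained_end(). They are not QEMU API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for BlockDriverState; not QEMU API. */
typedef struct {
    atomic_int quiesce_counter;   /* > 0 while the node is drained */
} Node;

/* Stands in for qemu_in_main_thread(), i.e. "we hold the BQL". */
static bool in_main_thread = true;

static void drained_begin(Node *n)
{
    atomic_fetch_add(&n->quiesce_counter, 1);
}

static void drained_end(Node *n)
{
    atomic_fetch_sub(&n->quiesce_counter, 1);
}

/* Both conditions must hold: BQL (main thread) and drained. */
static bool graph_writable(Node *n)
{
    return in_main_thread && atomic_load(&n->quiesce_counter) > 0;
}

static void assert_graph_writable(Node *n)
{
    assert(graph_writable(n));
}
```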





[PATCH v2 2/3] target/ppc: Implement Vector Extract Mask

2021-11-12 Thread matheus . ferst
From: Matheus Ferst 

Implement the following PowerISA v3.1 instructions:
vextractbm: Vector Extract Byte Mask
vextracthm: Vector Extract Halfword Mask
vextractwm: Vector Extract Word Mask
vextractdm: Vector Extract Doubleword Mask
vextractqm: Vector Extract Quadword Mask

Suggested-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
v2:
- Applied rth suggestion to do_vextractm
---
 target/ppc/insn32.decode|  6 +++
 target/ppc/translate/vmx-impl.c.inc | 60 +
 2 files changed, 66 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 9a28f1d266..639ac22bf0 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -419,6 +419,12 @@ VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
 VEXPANDDM       000100 ..... 00011 ..... 11001000010    @VX_tb
 VEXPANDQM       000100 ..... 00100 ..... 11001000010    @VX_tb
 
+VEXTRACTBM      000100 ..... 01000 ..... 11001000010    @VX_tb
+VEXTRACTHM      000100 ..... 01001 ..... 11001000010    @VX_tb
+VEXTRACTWM      000100 ..... 01010 ..... 11001000010    @VX_tb
+VEXTRACTDM      000100 ..... 01011 ..... 11001000010    @VX_tb
+VEXTRACTQM      000100 ..... 01100 ..... 11001000010    @VX_tb
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 58aca58f0f..dd7337c2f2 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1539,6 +1539,66 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
     return true;
 }
 
+static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+    const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
+    TCGv_i64 t, b, tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    t = tcg_const_i64(0);
+    b = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
+
+    for (int w = 0; w < 2; w++) {
+        get_avr64(b, a->vrb, w);
+
+        for (int i = 0; i < elem_count_half; i++) {
+            int in_bit = (i + 1) * elem_width - 1;
+            int out_bit = w * elem_count_half + i;
+
+            if (in_bit > out_bit) {
+                tcg_gen_shri_i64(tmp, b, in_bit - out_bit);
+            } else {
+                tcg_gen_shli_i64(tmp, b, out_bit - in_bit);
+            }
+            tcg_gen_andi_i64(tmp, tmp, 1 << out_bit);
+            tcg_gen_or_i64(t, t, tmp);
+        }
+    }
+    tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], t);
+
+    tcg_temp_free_i64(t);
+    tcg_temp_free_i64(b);
+    tcg_temp_free_i64(tmp);
+
+    return true;
+}
+
+TRANS(VEXTRACTBM, do_vextractm, MO_8)
+TRANS(VEXTRACTHM, do_vextractm, MO_16)
+TRANS(VEXTRACTWM, do_vextractm, MO_32)
+TRANS(VEXTRACTDM, do_vextractm, MO_64)
+
+static bool trans_VEXTRACTQM(DisasContext *ctx, arg_VX_tb *a)
+{
+    TCGv_i64 tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp = tcg_temp_new_i64();
+
+    get_avr64(tmp, a->vrb, true);
+    tcg_gen_shri_i64(tmp, tmp, 63);
+    tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], tmp);
+
+    tcg_temp_free_i64(tmp);
+
+    return true;
+}
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)   \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)  \
 {   \
-- 
2.25.1
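A plain-C model of the instruction semantics can help when writing tests for do_vextractm(). In this model, mask bit k is the MSB of the k-th element counted from the least significant end of the 128-bit value. That is the in_bit/out_bit mapping used by the TCG loop above, so PowerISA element 0 (the most significant one) ends up in the highest mask bit. The name vextractm_model() is hypothetical. vextractqm is the degenerate 128-bit case and simply yields hi >> 63.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of vextract[bhwd]m. hi/lo are the two
 * doublewords of VRB and width is the element size in bits.
 */
static uint64_t vextractm_model(uint64_t hi, uint64_t lo, unsigned width)
{
    uint64_t mask = 0;
    unsigned per_dword;

    assert(width == 8 || width == 16 || width == 32 || width == 64);
    per_dword = 64 / width;

    for (unsigned k = 0; k < 2 * per_dword; k++) {
        uint64_t dword = k < per_dword ? lo : hi;
        unsigned msb = (k % per_dword) * width + width - 1;

        /* mask bit k = most significant bit of element k */
        mask |= ((dword >> msb) & 1) << k;
    }
    return mask;
}
```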




[PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns

2021-11-12 Thread matheus . ferst
From: Matheus Ferst 

Implement the following PowerISA v3.1 instructions:
mtvsrbm: Move to VSR Byte Mask
mtvsrhm: Move to VSR Halfword Mask
mtvsrwm: Move to VSR Word Mask
mtvsrdm: Move to VSR Doubleword Mask
mtvsrqm: Move to VSR Quadword Mask
mtvsrbmi: Move to VSR Byte Mask Immediate

Suggested-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
v2:
- Applied rth suggestions to do_mtvsrm and trans_MTVSRBMI
---
 target/ppc/insn32.decode|  11 +++
 target/ppc/translate/vmx-impl.c.inc | 115 
 2 files changed, 126 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 639ac22bf0..f68931f4f3 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -40,6 +40,10 @@
 %ds_rtp         22:4   !function=times_2
 @DS_rtp         ...... ....0 ra:5 .............. ..             &D rt=%ds_rtp si=%ds_si
 
+&DX_b           vrt b
+%dx_b           6:10 16:5 0:1
+@DX_b           ...... vrt:5  ..... .......... ..... .  &DX_b b=%dx_b
+
 &DX             rt d
 %dx_d           6:s10 16:5 0:1
 @DX             ...... rt:5  ..... .......... ..... .      &DX d=%dx_d
@@ -413,6 +417,13 @@ VSRDBI          000100 ..... ..... ..... 01 ... 010110  @VN
 
 ## Vector Mask Manipulation Instructions
 
+MTVSRBM         000100 ..... 10000 ..... 11001000010    @VX_tb
+MTVSRHM         000100 ..... 10001 ..... 11001000010    @VX_tb
+MTVSRWM         000100 ..... 10010 ..... 11001000010    @VX_tb
+MTVSRDM         000100 ..... 10011 ..... 11001000010    @VX_tb
+MTVSRQM         000100 ..... 10100 ..... 11001000010    @VX_tb
+MTVSRBMI        000100 ..... ..... .......... 01010 .   @DX_b
+
 VEXPANDBM       000100 ..... 00000 ..... 11001000010    @VX_tb
 VEXPANDHM       000100 ..... 00001 ..... 11001000010    @VX_tb
 VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index dd7337c2f2..404767e4ec 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1599,6 +1599,121 @@ static bool trans_VEXTRACTQM(DisasContext *ctx, arg_VX_tb *a)
     return true;
 }
 
+static bool do_mtvsrm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+    const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
+    uint64_t c;
+    int i, j;
+    TCGv_i64 hi, lo, t0, t1;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    hi = tcg_temp_new_i64();
+    lo = tcg_temp_new_i64();
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+
+    tcg_gen_extu_tl_i64(t0, cpu_gpr[a->vrb]);
+    tcg_gen_extract_i64(hi, t0, elem_count_half, elem_count_half);
+    tcg_gen_extract_i64(lo, t0, 0, elem_count_half);
+
+    /*
+     * Spread the bits into their respective elements.
+     * E.g. for bytes:
+     * 00000000000000000000000000000000000000000000000000000000abcdefgh
+     *   << 32 - 4
+     * 0000000000000000000000000000abcdefgh0000000000000000000000000000
+     *   |
+     * 0000000000000000000000000000abcdefgh00000000000000000000abcdefgh
+     *   << 16 - 2
+     * 00000000000000abcdefgh00000000000000000000abcdefgh00000000000000
+     *   |
+     * 00000000000000abcdefgh000000abcdefgh000000abcdefgh000000abcdefgh
+     *   << 8 - 1
+     * 0000000abcdefgh000000abcdefgh000000abcdefgh000000abcdefgh0000000
+     *   |
+     * 0000000abcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgh
+     *   & dup(1)
+     * 0000000a0000000b0000000c0000000d0000000e0000000f0000000g0000000h
+     *   * 0xff
+     * aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh
+     */
+    for (i = elem_count_half / 2, j = 32; i > 0; i >>= 1, j >>= 1) {
+        tcg_gen_shli_i64(t0, hi, j - i);
+        tcg_gen_shli_i64(t1, lo, j - i);
+        tcg_gen_or_i64(hi, hi, t0);
+        tcg_gen_or_i64(lo, lo, t1);
+    }
+
+    c = dup_const(vece, 1);
+    tcg_gen_andi_i64(hi, hi, c);
+    tcg_gen_andi_i64(lo, lo, c);
+
+    c = MAKE_64BIT_MASK(0, elem_width);
+    tcg_gen_muli_i64(hi, hi, c);
+    tcg_gen_muli_i64(lo, lo, c);
+
+    set_avr64(a->vrt, lo, false);
+    set_avr64(a->vrt, hi, true);
+
+    tcg_temp_free_i64(hi);
+    tcg_temp_free_i64(lo);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+
+    return true;
+}
+
+TRANS(MTVSRBM, do_mtvsrm, MO_8)
+TRANS(MTVSRHM, do_mtvsrm, MO_16)
+TRANS(MTVSRWM, do_mtvsrm, MO_32)
+TRANS(MTVSRDM, do_mtvsrm, MO_64)
+
+static bool trans_MTVSRQM(DisasContext *ctx, arg_VX_tb *a)
+{
+    TCGv_i64 tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp = tcg_temp_new_i64();
+
+    tcg_gen_ext_tl_i64(tmp, cpu_gpr[a->vrb]);
+    tcg_gen_sextract_i64(tmp, tmp, 0, 1);
+    set_avr64(a->vrt, tmp, false);
+    set_avr64(a->vrt, tmp, true);
+
+    tcg_temp_free_i64(tmp);
+
+    return true;
+}
+
+static bool trans_MTVSRBMI(DisasContext *ctx, arg_DX_b *a)
+{
+    const uint64_t mask = dup_const(MO_8, 1);
+    uint64_t hi, lo;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+

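The bit-spreading diagram in do_mtvsrm() can be checked on the host. The sketch below applies the same shift schedule to one doubleword, meaning the lo or hi half of the mask produced by the two tcg_gen_extract_i64() calls. It then compares the result with a naive per-element loop. elem_mask() and the function names are hypothetical stand-ins for MAKE_64BIT_MASK() and dup_const(); they are not the QEMU helpers themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MAKE_64BIT_MASK(0, width). */
static uint64_t elem_mask(unsigned width)
{
    return width == 64 ? UINT64_MAX : (1ULL << width) - 1;
}

/* One doubleword of mtvsr[bhwd]m, mirroring the loop in do_mtvsrm(). */
static uint64_t mtvsrm_dword(uint64_t bits, unsigned vece)
{
    unsigned width, count;
    uint64_t x;

    assert(vece <= 3);
    width = 8u << vece;
    count = 8u >> vece;
    x = bits & ((1ULL << count) - 1);

    for (unsigned i = count / 2, j = 32; i > 0; i >>= 1, j >>= 1) {
        x |= x << (j - i);
    }
    x &= UINT64_MAX / elem_mask(width);  /* dup_const(vece, 1) */
    return x * elem_mask(width);         /* 1 -> all-ones element */
}

/* Naive reference: element k becomes all-ones iff mask bit k is set. */
static uint64_t mtvsrm_dword_ref(uint64_t bits, unsigned vece)
{
    unsigned width, count;
    uint64_t r = 0;

    assert(vece <= 3);
    width = 8u << vece;
    count = 8u >> vece;
    for (unsigned k = 0; k < count; k++) {
        if ((bits >> k) & 1) {
            r |= elem_mask(width) << (k * width);
        }
    }
    return r;
}
```

The shift schedule guarantees that mask bit n reaches bit position n * width and that no other mask bit lands on that position. This is why the single AND with dup(1) followed by one multiply is enough.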
[PATCH v2 1/3] target/ppc: Implement Vector Expand Mask

2021-11-12 Thread matheus . ferst
From: Matheus Ferst 

Implement the following PowerISA v3.1 instructions:
vexpandbm: Vector Expand Byte Mask
vexpandhm: Vector Expand Halfword Mask
vexpandwm: Vector Expand Word Mask
vexpanddm: Vector Expand Doubleword Mask
vexpandqm: Vector Expand Quadword Mask

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/insn32.decode| 11 ++
 target/ppc/translate/vmx-impl.c.inc | 34 +
 2 files changed, 45 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index e135b8aba4..9a28f1d266 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -56,6 +56,9 @@
 &VX_uim4        vrt uim vrb
 @VX_uim4        ...... vrt:5 . uim:4 vrb:5 ...........  &VX_uim4
 
+&VX_tb          vrt vrb
+@VX_tb          ...... vrt:5 ..... vrb:5 ...........    &VX_tb
+
 &X              rt ra rb
 @X              ...... rt:5 ra:5 rb:5 .......... .      &X
 
@@ -408,6 +411,14 @@ VINSWVRX        000100 ..... ..... ..... 00110001111    @VX
 VSLDBI          000100 ..... ..... ..... 00 ... 010110  @VN
 VSRDBI          000100 ..... ..... ..... 01 ... 010110  @VN
 
+## Vector Mask Manipulation Instructions
+
+VEXPANDBM       000100 ..... 00000 ..... 11001000010    @VX_tb
+VEXPANDHM       000100 ..... 00001 ..... 11001000010    @VX_tb
+VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
+VEXPANDDM       000100 ..... 00011 ..... 11001000010    @VX_tb
+VEXPANDQM       000100 ..... 00100 ..... 11001000010    @VX_tb
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index b361f73a67..58aca58f0f 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1505,6 +1505,40 @@ static bool trans_VSRDBI(DisasContext *ctx, arg_VN *a)
     return true;
 }
 
+static bool do_vexpand(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_sari(vece, avr_full_offset(a->vrt), avr_full_offset(a->vrb),
+                      (8 << vece) - 1, 16, 16);
+
+    return true;
+}
+
+TRANS(VEXPANDBM, do_vexpand, MO_8)
+TRANS(VEXPANDHM, do_vexpand, MO_16)
+TRANS(VEXPANDWM, do_vexpand, MO_32)
+TRANS(VEXPANDDM, do_vexpand, MO_64)
+
+static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
+{
+    TCGv_i64 tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp = tcg_temp_new_i64();
+
+    get_avr64(tmp, a->vrb, true);
+    tcg_gen_sari_i64(tmp, tmp, 63);
+    set_avr64(a->vrt, tmp, false);
+    set_avr64(a->vrt, tmp, true);
+
+    tcg_temp_free_i64(tmp);
+    return true;
+}
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)   \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)  \
 {   \
-- 
2.25.1
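For reference, here are the vexpand[bhwd]m semantics that tcg_gen_gvec_sari() implements above. Each element becomes all-ones when its MSB is set and zero otherwise, which is an arithmetic right shift by (element width - 1) in every lane. The scalar model of one doubleword below uses a hypothetical function name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of vexpand[bhwd]m on one doubleword:
 * every element whose MSB is set becomes all-ones, every other
 * element becomes zero.
 */
static uint64_t vexpand_dword(uint64_t x, unsigned vece)
{
    unsigned width;
    uint64_t emask, r = 0;

    assert(vece <= 3);
    width = 8u << vece;
    emask = width == 64 ? UINT64_MAX : (1ULL << width) - 1;
    for (unsigned k = 0; k < 64 / width; k++) {
        if ((x >> (k * width + width - 1)) & 1) {
            r |= emask << (k * width);
        }
    }
    return r;
}
```

vexpandqm is the 128-bit case. That is why trans_VEXPANDQM() shifts only the high doubleword by 63 and stores the result in both halves.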




[PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask

2021-11-12 Thread matheus . ferst
From: Matheus Ferst 

This is a small patch series just to allow Ubuntu 21.10 to boot with
-cpu POWER10. Glibc 2.34 is using vextractbm, so the init is killed by
SIGILL without the second patch of this series. The other two insns. are
included as they are somewhat close to Vector Extract Mask (at least in
pseudocode).

v2:
- Applied rth suggestions to VEXTRACT[BHWDQ]M and MTVSR[BHWDQ]M[I]

Matheus Ferst (3):
  target/ppc: Implement Vector Expand Mask
  target/ppc: Implement Vector Extract Mask
  target/ppc: Implement Vector Mask Move insns

 target/ppc/insn32.decode|  28 
 target/ppc/translate/vmx-impl.c.inc | 209 
 2 files changed, 237 insertions(+)

-- 
2.25.1




Re: [PATCH v10 04/26] target/loongarch: Add fixed point arithmetic instruction translation

2021-11-12 Thread Richard Henderson

On 11/12/21 7:53 AM, Song Gao wrote:

+#
+# Fields
+#
+%rd      0:5
+%rj      5:5
+%rk      10:5
+%sa2     15:2
+%si12    10:s12
+%ui12    10:12
+%si16    10:s16
+%si20    5:s20


You should only create separate field definitions like this when they are complex: e.g. 
the logical field is disjoint or there's a need for !function.



+
+#
+# Argument sets
+#
+&fmt_rdrjrk     rd rj rk
+&fmt_rdrjsi12   rd rj si12
+&fmt_rdrjrksa2  rd rj rk sa2
+&fmt_rdrjsi16   rd rj si16
+&fmt_rdrjui12   rd rj ui12
+&fmt_rdsi20     rd si20


Some of these should be combined.  The width of the immediate is a detail of the format, 
not the decoded argument set.  Thus you should have


_rdimm rd imm
_rdrjimm   rd rj imm
_rdrjrkrd rj rk
_rdrjrksa  rd rj rk sa


+alsl_w     010 .. . . .   @fmt_rdrjrksa2
+alsl_wu    011 .. . . .   @fmt_rdrjrksa2
+alsl_d    0010 110 .. . . .   @fmt_rdrjrksa2


The encoding of these insns is that the shift is sa+1.

While you compensate for this in gen_alsl_*, we print the "wrong" number in the 
disassembly.  I think it would be better to do


%sa2p1 15:2 !function=plus_1
@fmt_rdrjrksa2p1    ... .. rk:5 rj:5 rd:5 \
  _rdrjrksa sa=%sa2p1
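With `!function=plus_1`, the generated decoder passes the raw 2-bit field through a helper before storing it in the argument set. That way translation and disassembly both see the real shift amount. Below is a sketch of such a helper, using the `(DisasContext *, int)` signature decodetree expects for `!function`, plus the resulting ALSL.W computation. The `DisasContext` stub and `alsl_w_sketch` are hypothetical. The ALSL.W semantics used here are an assumption: the low 32 bits of `(rj << sa) + rk`, sign-extended to 64 bits.

```c
#include <stdint.h>

typedef struct DisasContext DisasContext;  /* opaque stub for this sketch */

/* Field post-processing helper: encoded sa2 0..3 means shift 1..4. */
static int plus_1(DisasContext *ctx, int x)
{
    (void)ctx;
    return x + 1;
}

/* Assumed ALSL.W semantics: 32-bit (rj << sa) + rk, sign-extended. */
static int64_t alsl_w_sketch(uint64_t rj, uint64_t rk, int sa)
{
    uint32_t tmp = ((uint32_t)rj << sa) + (uint32_t)rk;
    /* Sign-extend bit 31 portably. */
    return (tmp & 0x80000000u) ? (int64_t)tmp - 0x100000000LL : (int64_t)tmp;
}
```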


r~



Re: [PATCH v4 07/25] assertions for block_int global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block.c                         | 17 +
  block/backup.c                  |  1 +
  block/block-backend.c           |  3 +++
  block/commit.c                  |  2 ++
  block/dirty-bitmap.c            |  1 +
  block/io.c                      |  6 ++
  block/mirror.c                  |  4
  block/monitor/bitmap-qmp-cmds.c |  6 ++
  block/stream.c                  |  2 ++
  blockdev.c                      |  7 +++
  10 files changed, 49 insertions(+)

diff --git a/block.c b/block.c
index 672f946065..41c5883c5c 100644
--- a/block.c
+++ b/block.c


[...]


@@ -7473,6 +7488,7 @@ static bool append_strong_runtime_options(QDict *d, BlockDriverState *bs)
   * would result in exactly bs->backing. */
  bool bdrv_backing_overridden(BlockDriverState *bs)
  {
+assert(qemu_in_main_thread());
  if (bs->backing) {
  return strcmp(bs->auto_backing_file,
bs->backing->bs->filename);


This function is in block_int-common.h, though.

[...]


diff --git a/block/io.c b/block/io.c
index c5d7f8495e..f271ab3684 100644
--- a/block/io.c
+++ b/block/io.c


[...]


@@ -3419,6 +3423,7 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, int64_t src_offset,
  {
  trace_bdrv_co_copy_range_from(src, src_offset, dst, dst_offset, bytes,
read_flags, write_flags);
+assert(qemu_in_main_thread());
  return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
 bytes, read_flags, write_flags, true);
  }


This is a block_int-io.h function.


@@ -3435,6 +3440,7 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, int64_t src_offset,
  {
  trace_bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes,
  read_flags, write_flags);
+assert(qemu_in_main_thread());
  return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
 bytes, read_flags, write_flags, false);
  }


This, too.

Hanna




Re: [PULL 03/54] target/ppc: Move load and store floating point instructions to decodetree

2021-11-12 Thread Cédric Le Goater

On 11/10/21 18:04, Laurent Vivier wrote:

On 10/11/2021 17:56, Cédric Le Goater wrote:

On 11/10/21 17:33, Laurent Vivier wrote:

On 09/11/2021 06:51, David Gibson wrote:

From: Fernando Eckhardt Valle 

Move load floating point instructions (lfs, lfsu, lfsx, lfsux, lfd, lfdu, lfdx,
lfdux) and store floating point instructions (stfs, stfsu, stfsx, stfsux, stfd,
stfdu, stfdx, stfdux) from legacy system to decodetree.

Reviewed-by: Richard Henderson 
Signed-off-by: Fernando Eckhardt Valle 
Signed-off-by: Matheus Ferst 
Message-Id: <20211029202424.175401-4-matheus.fe...@eldorado.org.br>
Signed-off-by: David Gibson 
---
  target/ppc/insn32.decode           |  24 +++
  target/ppc/translate/fp-impl.c.inc | 247 +
  target/ppc/translate/fp-ops.c.inc  |  29 
  3 files changed, 95 insertions(+), 205 deletions(-)



This patch breaks qemu linux-user with an ubuntu bionic chroot.

The fix proposed by Matheus [1] fixes it for me.
When will it be merged?
It's needed in 6.2



It's queued for 6.2 :

   https://github.com/legoater/qemu/commits/ppc-6.2

I wanted to wait until the end of the week before sending a PR, unless
this is critical for you, of course.


No, it's not critical. It can wait until the end of the week.


It's merged.

Thanks,

C.
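Some context on the regression fixed later in "target/ppc: Fix register update on lf[sd]u[x]/stf[sd]u[x]": the update forms write the effective address back to RA after the access. Below is a minimal C sketch of lfdu under the Power ISA definition:

- EA = (RA) + EXTS(D).
- FRT is loaded from the big-endian doubleword at EA.
- RA = EA.
- RA = 0 is an invalid form.

The CPU state struct and the flat memory array are hypothetical stand-ins, not QEMU's CPUPPCState.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified register state. */
typedef struct {
    uint64_t gpr[32];
    uint64_t fpr[32];   /* raw bit patterns of the FPRs */
} CPUSketch;

/* lfdu FRT,D(RA); returns -1 for the invalid RA == 0 form. */
static int lfdu_sketch(CPUSketch *cpu, const uint8_t *mem,
                       int frt, int ra, int16_t d)
{
    if (ra == 0) {
        return -1;
    }
    uint64_t ea = cpu->gpr[ra] + (uint64_t)(int64_t)d;  /* EXTS(D) */
    uint64_t val = 0;
    for (int i = 0; i < 8; i++) {
        val = (val << 8) | mem[ea + i];                  /* big-endian load */
    }
    cpu->fpr[frt] = val;
    cpu->gpr[ra] = ea;   /* the "update": RA receives the EA */
    return 0;
}
```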
 



Re: does drive_get_next(IF_NONE) make sense?

2021-11-12 Thread Markus Armbruster
Thomas Huth  writes:

> On 03/11/2021 09.41, Markus Armbruster wrote:
>> Peter Maydell  writes:
>> 
>>> Does it make sense for a device/board to do drive_get_next(IF_NONE) ?
>> Short answer: hell, no!  ;)
>
> Would it make sense to add an "assert(type != IF_NONE)" to drive_get()
> to avoid such mistakes in the future?

Worth a try.
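Here is Thomas's suggestion sketched in isolation. A guard at the top of the lookup turns a board's `drive_get_next(IF_NONE)` into an immediate assertion failure during development. The enum is a hypothetical subset, the drive table is made up, and only the `assert(type != IF_NONE)` line mirrors the proposal.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { IF_NONE, IF_IDE, IF_SCSI } IfTypeSketch;  /* hypothetical subset */

typedef struct {
    IfTypeSketch type;
    int bus, unit;
    const char *id;
} DriveSketch;

static const DriveSketch drives[] = {
    { IF_NONE, 0, 0, "disk0" },  /* meant for -device ...,drive=disk0 */
    { IF_IDE,  0, 0, "ide0"  },
};

static const DriveSketch *drive_get_sketch(IfTypeSketch type, int bus, int unit)
{
    /* if=none drives belong to -device; boards must not look them up. */
    assert(type != IF_NONE);
    for (size_t i = 0; i < sizeof(drives) / sizeof(drives[0]); i++) {
        if (drives[i].type == type && drives[i].bus == bus &&
            drives[i].unit == unit) {
            return &drives[i];
        }
    }
    return NULL;
}
```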




Re: [PULL 0/3] ppc 6.2 queue

2021-11-12 Thread Richard Henderson

On 11/12/21 12:15 PM, Cédric Le Goater wrote:

The following changes since commit 0a70bcf18caf7a61d480f8448723c15209d128ef:

   Update version for v6.2.0-rc0 release (2021-11-09 18:22:57 +0100)

are available in the Git repository at:

   https://github.com/legoater/qemu/ tags/pull-ppc-2022

for you to fetch changes up to d139786e1b3d67991e6cb49a8a59bb2182350285:

   ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb() (2021-11-11 11:35:13 +0100)


ppc 6.2 queue :

* Fix of a regression in floating point load instructions (Matheus)
* Associativity fix for pseries machine (Daniel)
* tlbivax fix for BookE machines (Daniel)


Daniel Henrique Barboza (2):
   spapr_numa.c: fix FORM1 distance-less nodes
   ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb()

Matheus Ferst (1):
   target/ppc: Fix register update on lf[sd]u[x]/stf[sd]u[x]

  hw/ppc/spapr_numa.c                | 62 +++---
  target/ppc/mmu_helper.c            |  2 +-
  target/ppc/translate/fp-impl.c.inc |  2 +-
  3 files changed, 33 insertions(+), 33 deletions(-)


Applied, thanks.

r~




Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI

2021-11-12 Thread Igor Mammedov
On Wed, 10 Nov 2021 12:01:11 +0100
David Hildenbrand  wrote:

> On 10.11.21 11:33, Igor Mammedov wrote:
> > On Fri, 5 Nov 2021 23:47:37 +1100
> > Gavin Shan  wrote:
> >   
> >> Hi Drew and Igor,
> >>
> >> On 11/2/21 6:39 PM, Andrew Jones wrote:  
> >>> On Tue, Nov 02, 2021 at 10:44:08AM +1100, Gavin Shan wrote:
> 
>  Yeah, I agree. I don't have a strong reason to expose these empty nodes
>  for now. Please ignore the patch.
> 
> >>>
> >>> So was describing empty NUMA nodes on the command line ever a reasonable
> >>> thing to do? What happens on x86 machine types when describing empty numa
> >>> nodes? I'm starting to think that the solution all along was just to
> >>> error out when a numa node has memory size = 0...  
> > 
> > Memory-less nodes are fine as long as there is another type of device
> > that describes a node (apic/gic/...).
> > But there is no way in the spec to describe completely empty nodes,
> > and I dislike adding out-of-spec entries just to fake an empty node.
> >   
> 
> There are reasonable *upcoming* use cases for initially completely empty
> NUMA nodes with virtio-mem: being able to expose a dynamic amount of
> performance-differentiated memory to a VM. I don't know of any existing
> use cases that would require that as of now.
> 
> Examples include exposing HBM or PMEM to the VM. Just like on real HW,
> this memory is exposed via cpu-less, special nodes. In contrast to real
> HW, the memory is hotplugged later (I don't think HW supports hotplug
> like that yet, but it might just be a matter of time).

I suppose some of that may be covered by GENERIC_AFFINITY entries in SRAT,
some by MEMORY entries, or by nodes created dynamically, like with normal
hotplug memory.


> The same should be true when using DIMMs instead of virtio-mem in this
> example.
> 
> >   
> >> Sorry for the delay as I spent a few days looking into linux virtio-mem
> >> driver. I'm afraid we still need this patch for ARM64. I don't think x86  
> > 
> > Does it behave the same way if using pc-dimm hotplug instead of virtio-mem?
> > 
> > CCing David,
> > as it might be a virtio-mem issue.
> 
> Can someone share the details why it's a problem on arm64 but not on
> x86-64? I assume this really only applies when having a dedicated, empty
> node -- correct?
> 
> > 
> > PS:
> > Maybe for virtio-mem-pci we need to add a GENERIC_AFFINITY entry into SRAT
> > and describe it as a PCI device (we don't do that yet, if I'm not mistaken).
> 
> virtio-mem exposes the PXM itself, and avoids exposing its memory via any
> kind of platform-specific firmware maps. The PXM gets translated in the
> guest accordingly. For now there was no need to expose this in SRAT --
> the SRAT is really only used to expose the maximum possible PFN to the
> VM, just like it would have to be used to expose "this is a possible node".
> 
> Of course, we could use any other paravirtualized interface to expose
> both information. For example, on s390x, I'll have to introduce a new
> hypercall to query the "device memory region" to detect the maximum
> possible PFN, because existing interfaces don't allow for that. For now
> we're ruinning SRAT to expose "maximum possible PFN" simply because it's
> easy to re-use.
> 
> But I assume that hotplugging a DIMM to an empty node will have similar
> issues on arm64.
> 
> >   
> >> has this issue even though I didn't experiment on X86. For example, I
> >> have the following command lines. The hot added memory is put into node#0
> >> instead of node#2, which is wrong.  
> 
> I assume Linux will always fall back to node 0 if node X is not possible
> when translating the PXM.
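David's assumption can be restated as a toy model. If the guest only knows the nodes the firmware declared possible, a PXM for any other node has nowhere to go and falls back to node 0. That matches the symptom Gavin reports, where memory lands in node 0 instead of node 2. This is not Linux's implementation. The table, the identity PXM-to-node mapping, and the function name are all hypothetical.

```c
#include <stdbool.h>

#define MAX_NODES_SKETCH 8

/* Nodes the firmware (e.g. SRAT) described at boot; hypothetical table. */
static bool node_possible_sketch[MAX_NODES_SKETCH];

/* Toy model: PXM == node id here; unknown nodes fall back to node 0. */
static int pxm_to_node_sketch(int pxm)
{
    if (pxm >= 0 && pxm < MAX_NODES_SKETCH && node_possible_sketch[pxm]) {
        return pxm;
    }
    return 0;
}
```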

I tested how x86 behaves with pc-dimm, and it seems that the
fc43 guest works only sometimes.
cmd:
  -numa node,memdev=mem,cpus=0 -numa node,cpus=1 -numa node -numa node

1: hotplug into the empty last node creates a new node dynamically
2: hotplug into an intermediate empty node (last-1) is broken; memory goes into the first node

We should check whether it is possible to fix the guest instead of adding bogus SRAT entries.




Re: [PATCH for 6.2 v3 4/5] hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC

2021-11-12 Thread Ani Sinha
On Fri, Nov 12, 2021 at 4:41 PM Igor Mammedov  wrote:
>
> From: Julia Suvorova 
>
> There are two ways to enable ACPI PCI Hot-plug:
>
> * Disable the Hot-plug Capable bit on PCIe slots.
>
> This was the first approach, which led to a regression [1-2], as
> I/O space for a port is allocated only when it is hot-pluggable,
> which is determined by the HPC bit.
>
> * Leave the HPC bit on and disable PCIe Native Hot-plug in _OSC
>   method.
>
> This removes the (future) ability of hot-plugging switches with PCIe
> Native hotplug since ACPI PCI Hot-plug only works with cold-plugged
> bridges. If the user wants to explicitly use this feature, they can
> disable ACPI PCI Hot-plug with:
> --global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
>
> Change the bit in _OSC method so that the OS selects ACPI PCI Hot-plug
> instead of PCIe Native.
>
> [1] https://gitlab.com/qemu-project/qemu/-/issues/641
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409
>
> Signed-off-by: Julia Suvorova 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Ani Sinha 

> ---
> v2:
>   - (mst)
>   * drop local hotplug var and opencode it
>   * rename acpi_pcihp parameter to enable_native_pcie_hotplug
> to reflect what it actually does
>
> tested:
>   with hotplugging nic into 1 root port with seabios/ovmf/Fedora34
>   Windows tested only with seabios (using existing images)
>   (installer fails to install regardless on bios)
> ---
>  hw/i386/acpi-build.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index a3ad6abd33..a99c6e4fe3 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1337,7 +1337,7 @@ static void build_x86_acpi_pci_hotplug(Aml *table, uint64_t pcihp_addr)
>  aml_append(table, scope);
>  }
>
> -static Aml *build_q35_osc_method(void)
> +static Aml *build_q35_osc_method(bool enable_native_pcie_hotplug)
>  {
>  Aml *if_ctx;
>  Aml *if_ctx2;
> @@ -1359,8 +1359,10 @@ static Aml *build_q35_osc_method(void)
>  /*
>   * Always allow native PME, AER (no dependencies)
>   * Allow SHPC (PCI bridges can have SHPC controller)
> + * Disable PCIe Native Hot-plug if ACPI PCI Hot-plug is enabled.
>   */
> -aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl));
> +aml_append(if_ctx, aml_and(a_ctrl,
> +aml_int(0x1E | (enable_native_pcie_hotplug ? 0x1 : 0x0)), a_ctrl));
>
>  if_ctx2 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(1))));
>  /* Unknown revision */
> @@ -1449,7 +1451,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>  aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
>  aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
>  aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
> -aml_append(dev, build_q35_osc_method());
> +aml_append(dev, build_q35_osc_method(!pm->pcihp_bridge_en));
>  aml_append(sb_scope, dev);
>  if (mcfg_valid) {
>  aml_append(sb_scope, build_q35_dram_controller());
> @@ -1565,7 +1567,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>  if (pci_bus_is_express(bus)) {
>  aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
>  aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
> -aml_append(dev, build_q35_osc_method());
> +
> +/* Expander bridges do not have ACPI PCI Hot-plug enabled */
> +aml_append(dev, build_q35_osc_method(true));
>  } else {
>  aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>  }
> --
> 2.27.0
>
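The `0x1E | (enable_native_pcie_hotplug ? 0x1 : 0x0)` expression is easier to review with the _OSC Control Field bits spelled out. Per the PCI Firmware Specification, the bits are:

- bit 0: PCIe Native Hot-Plug control
- bit 1: SHPC native hot-plug
- bit 2: PCIe Native PME
- bit 3: PCIe AER
- bit 4: PCIe Capability Structure control

The macro and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* _OSC Control Field bits for PCI host bridges (names made up here). */
#define OSC_CTRL_PCIE_NATIVE_HP  (1u << 0)
#define OSC_CTRL_SHPC_NATIVE_HP  (1u << 1)
#define OSC_CTRL_PCIE_PME        (1u << 2)
#define OSC_CTRL_PCIE_AER        (1u << 3)
#define OSC_CTRL_PCIE_CAP        (1u << 4)

/* Mask the firmware ANDs with the control bits the OS requested. */
static uint32_t osc_ctrl_mask_sketch(bool enable_native_pcie_hotplug)
{
    uint32_t mask = OSC_CTRL_SHPC_NATIVE_HP | OSC_CTRL_PCIE_PME |
                    OSC_CTRL_PCIE_AER | OSC_CTRL_PCIE_CAP;   /* 0x1E */
    if (enable_native_pcie_hotplug) {
        mask |= OSC_CTRL_PCIE_NATIVE_HP;                     /* 0x1F */
    }
    return mask;
}
```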



Re: [PATCH v4 04/25] include/sysemu/block-backend: split header into I/O and global state (GS) API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Similarly to the previous patches, split block-backend.h
into block-backend-io.h and block-backend-global-state.h.

In addition, remove "block/block.h" include as it seems
it is not necessary anymore, together with "qemu/iov.h"

block-backend-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block/block-backend.c                       |   9 +-
  include/sysemu/block-backend-common.h       |  74 ++
  include/sysemu/block-backend-global-state.h | 122 +
  include/sysemu/block-backend-io.h           | 139 ++
  include/sysemu/block-backend.h              | 269 +---
  5 files changed, 344 insertions(+), 269 deletions(-)
  create mode 100644 include/sysemu/block-backend-common.h
  create mode 100644 include/sysemu/block-backend-global-state.h
  create mode 100644 include/sysemu/block-backend-io.h


[...]


diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index e5e1524f06..038be9fc40 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -13,272 +13,9 @@
  #ifndef BLOCK_BACKEND_H
  #define BLOCK_BACKEND_H
  
-#include "qemu/iov.h"

-#include "block/throttle-groups.h"
+#include "block-backend-global-state.h"
+#include "block-backend-io.h"
  
-/*

- * TODO Have to include block/block.h for a bunch of block layer
- * types.  Unfortunately, this pulls in the whole BlockDriverState
- * API, which we don't want used by many BlockBackend users.  Some of
- * the types belong here, and the rest should be split into a common
- * header and one for the BlockDriverState API.
- */
-#include "block/block.h"


This note and the include is gone.  Sounds like something positive, but 
why is this possible?


Hanna




Re: [PATCH v4 02/25] include/block/block: split header into I/O and global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

block.h currently contains a mix of functions:
some of them run under the BQL and modify the block layer graph,
others are instead thread-safe and perform I/O in iothreads.
It is not easy to understand which function is part of which
group (I/O vs GS), and this patch aims to clarify it.

The "GS" functions need the BQL, and often use
aio_context_acquire/release and/or drain to be sure they
can modify the graph safely.
The I/O functions are instead thread-safe, and can run in
any AioContext.

By splitting the header into two files, block-io.h
and block-global-state.h, we have a clearer view of what
needs what kind of protection. block-common.h
contains common structures shared by both headers.

block.h is left there for legacy and to avoid changing
all includes in all c files that use the block APIs.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block.c                            |   3 +
  block/meson.build                  |   7 +-
  include/block/block-common.h       | 389 +
  include/block/block-global-state.h | 286 ++
  include/block/block-io.h           | 306 ++
  include/block/block.h              | 878 +
  6 files changed, 1012 insertions(+), 857 deletions(-)
  create mode 100644 include/block/block-common.h
  create mode 100644 include/block/block-global-state.h
  create mode 100644 include/block/block-io.h


[...]


diff --git a/include/block/block-common.h b/include/block/block-common.h
new file mode 100644
index 00..4f1fd8de21
--- /dev/null
+++ b/include/block/block-common.h


[...]


+#define BLKDBG_EVENT(child, evt) \
+    do { \
+        if (child) { \
+            bdrv_debug_event(child->bs, evt); \
+        } \
+    } while (0)


This is defined twice, once here, and...


diff --git a/include/block/block-io.h b/include/block/block-io.h
new file mode 100644
index 00..9af4609ccb
--- /dev/null
+++ b/include/block/block-io.h


[...]


+#define BLKDBG_EVENT(child, evt) \
+    do { \
+        if (child) { \
+            bdrv_debug_event(child->bs, evt); \
+        } \
+    } while (0)


...once here.
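Since the macro is being moved anyway, note what the `do { ... } while (0)` wrapper buys. It lets `BLKDBG_EVENT(child, evt);` behave like a single statement, for example as the unbraced body of an `if` that has an `else`. Below is a self-contained sketch with minimal hypothetical stand-in types and a stub in place of bdrv_debug_event(). One definition, in a single header, is all that is needed.

```c
#include <stddef.h>

/* Minimal stand-ins for the block layer types (hypothetical). */
typedef struct BlockDriverStateSketch { int events; } BlockDriverStateSketch;
typedef struct BdrvChildSketch { BlockDriverStateSketch *bs; } BdrvChildSketch;

static void bdrv_debug_event_sketch(BlockDriverStateSketch *bs, int evt)
{
    bs->events += evt;
}

/* Defined exactly once; do/while (0) makes the expansion one statement. */
#define BLKDBG_EVENT_SKETCH(child, evt) \
    do { \
        if (child) { \
            bdrv_debug_event_sketch((child)->bs, evt); \
        } \
    } while (0)

static int log_event(BdrvChildSketch *child, int enabled)
{
    if (enabled)
        BLKDBG_EVENT_SKETCH(child, 1);   /* still parses correctly without braces */
    else
        return -1;
    return child ? child->bs->events : 0;
}
```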

[...]


+/**
+ * bdrv_drained_begin:
+ *
+ * Begin a quiesced section for exclusive access to the BDS, by disabling
+ * external request sources including NBD server and device model. Note that
+ * this doesn't block timers or coroutines from submitting more requests, which
+ * means block_job_pause is still necessary.


Where does this sentence come from?  I can’t see it in master or in the 
lines removed from block.h:



+ *
+ * This function can be recursive.
+ */
+void bdrv_drained_begin(BlockDriverState *bs);


[...]


diff --git a/include/block/block.h b/include/block/block.h
index e5dd22b034..1e6b8fef1e 100644
--- a/include/block/block.h
+++ b/include/block/block.h


[...]


-/**
- * bdrv_drained_begin:
- *
- * Begin a quiesced section for exclusive access to the BDS, by disabling
- * external request sources including NBD server, block jobs, and device model.
- *
- * This function can be recursive.
- */
-void bdrv_drained_begin(BlockDriverState *bs);





Re: [PATCH v2 07/10] transactions: Invoke clean() after everything else

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

Invoke the transaction drivers' .clean() methods only after all
.commit() or .abort() handlers are done.

This makes it easier to have nested transactions where the top-level
transactions pass objects to lower transactions that the latter can
still use throughout their commit/abort phases, while the top-level
transaction keeps a reference that is released in its .clean() method.

(Before this commit, that is also possible, but the top-level
transaction would need to take care to invoke tran_add() before the
lower-level transaction does.  This commit makes the ordering
irrelevant, which is just a bit nicer.)
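The ordering this patch establishes can be shown with a toy transaction. Every action's commit runs first, and only then is any clean() called. As a result, a top-level action's clean() can no longer release an object that a lower-level action still uses in its own commit. This is a simplified sketch, not QEMU's transaction code, and abort is not modelled. The fixed-size array and the names are hypothetical. The newest-first order is an assumption, chosen to mimic a list where new actions are pushed at the head.

```c
#include <stddef.h>

typedef struct {
    void (*commit)(void *opaque);
    void (*clean)(void *opaque);
    void *opaque;
} ActionSketch;

typedef struct {
    ActionSketch actions[8];
    int n;
} TranSketch;

/* Event log used to observe the ordering. */
static char order_log[16];
static int order_len;
static void log_commit(void *opaque) { order_log[order_len++] = *(char *)opaque; }
static void log_clean(void *opaque)  { order_log[order_len++] = (char)(*(char *)opaque - 'A' + 'a'); }

static void tran_add_sketch(TranSketch *tran, ActionSketch a)
{
    tran->actions[tran->n++] = a;   /* no overflow check in this sketch */
}

static void tran_commit_sketch(TranSketch *tran)
{
    /* Phase 1: all commits, newest action first. */
    for (int i = tran->n - 1; i >= 0; i--) {
        if (tran->actions[i].commit) {
            tran->actions[i].commit(tran->actions[i].opaque);
        }
    }
    /* Phase 2: only now are references dropped. */
    for (int i = tran->n - 1; i >= 0; i--) {
        if (tran->actions[i].clean) {
            tran->actions[i].clean(tran->actions[i].opaque);
        }
    }
    tran->n = 0;
}
```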

Signed-off-by: Hanna Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH 0/6] RfC: try improve native hotplug for pcie root ports

2021-11-12 Thread Igor Mammedov
On Fri, 12 Nov 2021 12:15:28 +0100
Gerd Hoffmann  wrote:

> On Thu, Nov 11, 2021 at 10:39:59AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2021 at 01:09:05PM +0100, Gerd Hoffmann wrote:  
> > >   Hi,
> > >   
> > > > When the acpihp driver is used the linux kernel will just call the aml
> > > > methods and I suspect the pci device will stay invisible then because
> > > > nobody flips the slot power control bit (with native-hotplug=on, for
> > > > native-hotplug=off this isn't a problem of course).  
> > > 
> > > Hmm, on a quick smoke test with both patch series (mine + igors) applied
> > > everything seems to work fine on a quick glance.  Dunno why.  Maybe the
> > > pcieport driver turns on slot power even in case pciehp is not active.  
> 
> Dug deeper.  Updating power status is handled by the plug() callback,
> which is never called in case acpi hotplug is active.  The guest seems
> to never touch slot power control either, so it's working fine.  Still
> feels a bit fragile though.
> 
> > Well power and hotplug capabilities are mostly unrelated, right?  
> 
> At least they are separate slot capabilities.  The linux pciehp driver
> checks whether the power control is present before using it, so having
> PwrCtrl- HotPlug+ seems to be a valid combination.
> 
> We even have an option for that: pcie-root-port.power_controller_present
> 
> So flipping that to off in case acpi hotplug is active should make sure
> we never run into trouble with pci devices being powered off.
> 
> Igor?  Can you add that to your patch series?

Sorry, saw it too late.
I'll test the idea with my set of guests to see if there are any adverse effects.


> > I feel switching to native so late would be inappropriate, looks more
> > like a feature than a bugfix. Given that - we need Igor's patches.
> > Given that - would you say I should apply yours?  
> 
> I think when setting power_controller_present=off for acpi hotplug it is
> safe to merge both mine and igor's.
> 
> take care,
>   Gerd
> 
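The PwrCtrl-/HotPlug+ combination Gerd describes corresponds to two independent bits in the PCIe Slot Capabilities register: Power Controller Present (bit 1) and Hot-Plug Capable (bit 6). Linux names them PCI_EXP_SLTCAP_PCP and PCI_EXP_SLTCAP_HPC. Below is a minimal sketch of the "only touch slot power if a power controller exists" check Gerd attributes to pciehp. The helper is hypothetical, and the macros are redefined locally so the block stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Capabilities register bits (same values as Linux's pci_regs.h). */
#define SLTCAP_PCP  0x00000002u  /* Power Controller Present */
#define SLTCAP_HPC  0x00000040u  /* Hot-Plug Capable */

/* Would a pciehp-like driver drive the slot's power control? */
static bool uses_slot_power_sketch(uint32_t sltcap)
{
    return (sltcap & SLTCAP_HPC) && (sltcap & SLTCAP_PCP);
}
```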




Re: [PATCH v4 06/25] include/block/block_int: split header into I/O and global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Similarly to the previous patch, split block_int.h
into block_int-io.h and block_int-global-state.h.

block_int-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  blockdev.c                             |    5 +
  include/block/block_int-common.h       | 1164 +++
  include/block/block_int-global-state.h |  319 +
  include/block/block_int-io.h           |  163 +++
  include/block/block_int.h              | 1478 +---
  5 files changed, 1654 insertions(+), 1475 deletions(-)
  create mode 100644 include/block/block_int-common.h
  create mode 100644 include/block/block_int-global-state.h
  create mode 100644 include/block/block_int-io.h


[...]


diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
new file mode 100644
index 00..79a3d801d2
--- /dev/null
+++ b/include/block/block_int-common.h


[...]


+struct BlockDriver {


[...]


+/**
+ * Try to get @bs's logical and physical block size.
+ * On success, store them in @bsz and return zero.
+ * On failure, return negative errno.
+ */
+/* I/O API, even though if it's a filter jumps on parent */


I don’t understand this...


+int (*bdrv_probe_blocksizes)(BlockDriverState *bs, BlockSizes *bsz);
+/**
+ * Try to get @bs's geometry (cyls, heads, sectors)
+ * On success, store them in @geo and return 0.
+ * On failure return -errno.
+ * Only drivers that want to override guest geometry implement this
+ * callback; see hd_geometry_guess().
+ */
+/* I/O API, even though if it's a filter jumps on parent */


...or this comment.  bdrv_probe_blocksizes() and bdrv_probe_geometry() 
are in block-global-state.h, so why are these methods part of the I/O 
API?  (And I’m afraid I can’t parse “even though if it’s a filter jumps 
on parent”.)


Hanna


+int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);





Re: [PATCH v2 06/10] block: Restructure remove_file_or_backing_child()

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

As of a future patch, bdrv_replace_child_tran() will take a BdrvChild **
pointer.  Prepare for that by getting such a pointer and using it where
applicable, and (dereferenced) as a parameter for
bdrv_replace_child_tran().

Signed-off-by: Hanna Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v2 05/10] block: Pass BdrvChild ** to replace_child_noperm

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
set it to NULL.  That is dangerous, because BDS parents generally assume
that their children's .bs pointer is never NULL.  We therefore want to
let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
to NULL, too.

This patch lays the foundation for it by passing a BdrvChild ** pointer
to bdrv_replace_child_noperm() so that it can later use it to NULL the
BdrvChild pointer immediately after setting BdrvChild.bs to NULL.

(We will still need to undertake some intermediate steps, though.)
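Passing `BdrvChild **` lets the callee clear the caller's own pointer at the moment it detaches the node. Otherwise a BdrvChild whose .bs is NULL is left behind. Below is a reduced sketch of that idea with hypothetical types and functions, not the final QEMU code. Note the local copy in the detach helper: once *childp is cleared it can no longer be used for freeing, which is the issue Vladimir points out for `bdrv_child_free(*childp)`.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct NodeSketch { int refcnt; } NodeSketch;
typedef struct ChildSketch { NodeSketch *bs; } ChildSketch;

/* Replace the child's node. When the new node is NULL, also clear the
 * caller's pointer so nobody keeps a child whose .bs is NULL. */
static void replace_child_sketch(ChildSketch **childp, NodeSketch *new_bs)
{
    ChildSketch *child = *childp;

    if (child->bs) {
        child->bs->refcnt--;
    }
    child->bs = new_bs;
    if (new_bs) {
        new_bs->refcnt++;
    } else {
        *childp = NULL;
    }
}

static void detach_child_sketch(ChildSketch **childp)
{
    ChildSketch *child = *childp;   /* keep a local copy: *childp is cleared below */
    replace_child_sketch(childp, NULL);
    free(child);
}
```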

Signed-off-by: Hanna Reitz 


The series is already applied, but I still feel responsible for tracking how
transactions changed :)

Don't bother applying my r-b marks to the applied series.


---
  block.c | 23 ---
  1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index c7d5aa5254..d668156eca 100644
--- a/block.c
+++ b/block.c
@@ -87,7 +87,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
  static bool bdrv_recurse_has_child(BlockDriverState *bs,
 BlockDriverState *child);
  
-static void bdrv_replace_child_noperm(BdrvChild *child,

+static void bdrv_replace_child_noperm(BdrvChild **child,
BlockDriverState *new_bs);
  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
BdrvChild *child,
@@ -2270,7 +2270,7 @@ static void bdrv_replace_child_abort(void *opaque)
  BlockDriverState *new_bs = s->child->bs;
  
  /* old_bs reference is transparently moved from @s to @s->child */

-bdrv_replace_child_noperm(s->child, s->old_bs);
+bdrv_replace_child_noperm(&s->child, s->old_bs);


 - no sense / no harm in clearing the pointer, as it's a field in the transaction
state struct, and should not be used after abort
 - hard to say whether we really need to clear some other pointer; the upper level
should care about it


  bdrv_unref(new_bs);
  }
  
@@ -2300,7 +2300,7 @@ static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,

  if (new_bs) {
  bdrv_ref(new_bs);
  }
-bdrv_replace_child_noperm(child, new_bs);
+bdrv_replace_child_noperm(&child, new_bs);


 - no sense / no harm, as it's a local variable, which is not used anymore
 - most probably we have some other pointer that should be cleared, but it's
not available here. To make it available, bdrv_replace_child_tran() should get a
BdrvChild **. Maybe a later patch will do it.


  /* old_bs reference is transparently moved from @child to @s */
  }
  
@@ -2672,9 +2672,10 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)

  return permissions[qapi_perm];
  }
  
-static void bdrv_replace_child_noperm(BdrvChild *child,

+static void bdrv_replace_child_noperm(BdrvChild **childp,
BlockDriverState *new_bs)
  {
+BdrvChild *child = *childp;


No real logic change for now, OK


  BlockDriverState *old_bs = child->bs;
  int new_bs_quiesce_counter;
  int drain_saldo;
@@ -2767,7 +2768,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
  BdrvChild *child = *s->child;
  BlockDriverState *bs = child->bs;
  
-bdrv_replace_child_noperm(child, NULL);

+bdrv_replace_child_noperm(s->child, NULL);


More interesting. Currently bdrv_replace_child_tran() clears the pointer as its last action, so
later we can remove this last "*s->child = NULL", as bdrv_replace_child_noperm()
will do it.
No harm: in the function we use a local variable, initialized as *s->child.

  
  if (bdrv_get_aio_context(bs) != s->old_child_ctx) {

  bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -2867,7 +2868,7 @@ static int bdrv_attach_child_common(BlockDriverState *child_bs,
  }
  
  bdrv_ref(child_bs);

-bdrv_replace_child_noperm(new_child, child_bs);
+bdrv_replace_child_noperm(&new_child, child_bs);


Here child_bs must not be NULL, otherwise bdrv_ref() crashes. So, nothing would 
be cleared.

  
  *child = new_child;
  
@@ -2922,12 +2923,12 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,

  return 0;
  }
  
-static void bdrv_detach_child(BdrvChild *child)

+static void bdrv_detach_child(BdrvChild **childp)
  {
-BlockDriverState *old_bs = child->bs;
+BlockDriverState *old_bs = (*childp)->bs;
  
-bdrv_replace_child_noperm(child, NULL);

-bdrv_child_free(child);
+bdrv_replace_child_noperm(childp, NULL);


And here for sure we'll clear the pointer


+bdrv_child_free(*childp);


This obviously should be changed in further patches

  
  if (old_bs) {

  /*
@@ -3033,7 +3034,7 @@ void bdrv_root_unref_child(BdrvChild *child)
  BlockDriverState *child_bs;
  
  child_bs = child->bs;

-bdrv_detach_child(child);
+bdrv_detach_child(&child);


 - no sense

Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux

2021-11-12 Thread Markus Armbruster
Juan Quintela  writes:

> Leonardo Bras  wrote:
>> Add property that allows zerocopy migration of memory pages,
>> and also includes a helper function migrate_use_zerocopy() to check
>> if it's enabled.
>>
>> No code is introduced to actually do the migration, but it allows
>> future implementations to enable/disable this feature.
>>
>> On non-Linux builds this parameter is compiled-out.
>>
>> Signed-off-by: Leonardo Bras 
>
> Hi
>
>> +# @zerocopy: Controls behavior on sending memory pages on migration.
>> +#When true, enables a zerocopy mechanism for sending memory
>> +#pages, if host supports it.
>> +#Defaults to false. (Since 6.2)
>> +#
>
> This needs to be changed to the next release, but it's no big deal.

Rename to zero-copy while there.  QAPI/QMP strongly prefer separating
words with dashes.  "zerocopy" is not a word, "zero" and "copy" are.

[...]




Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux

2021-11-12 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Fri, Nov 12, 2021 at 12:04:33PM +0100, Juan Quintela wrote:
>> Leonardo Bras  wrote:

[...]

>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index abaf6f9e3d..add3dabc56 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>> >  params->multifd_zlib_level = s->parameters.multifd_zlib_level;
>> >  params->has_multifd_zstd_level = true;
>> >  params->multifd_zstd_level = s->parameters.multifd_zstd_level;
>> > +#ifdef CONFIG_LINUX
>> > +params->has_zerocopy = true;
>> > +params->zerocopy = s->parameters.zerocopy;
>> > +#endif
>> >  params->has_xbzrle_cache_size = true;
>> >  params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
>> >  params->has_max_postcopy_bandwidth = true;
>> > @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>> >  if (params->has_multifd_compression) {
>> >  dest->multifd_compression = params->multifd_compression;
>> >  }
>> > +#ifdef CONFIG_LINUX
>> > +if (params->has_zerocopy) {
>> > +dest->zerocopy = params->zerocopy;
>> > +}
>> > +#endif
>> >  if (params->has_xbzrle_cache_size) {
>> >  dest->xbzrle_cache_size = params->xbzrle_cache_size;
>> >  }
>> > @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>> >  if (params->has_multifd_compression) {
>> >  s->parameters.multifd_compression = params->multifd_compression;
>> >  }
>> > +#ifdef CONFIG_LINUX
>> > +if (params->has_zerocopy) {
>> > +s->parameters.zerocopy = params->zerocopy;
>> > +}
>> > +#endif
>> 
>> After seeing all this CONFIG_LINUX mess, I am not sure that it is a good
>> idea to add the parameter only for Linux.  It appears that it is better
>> to add it for all OSes and just not allow setting it to true there.
>> 
>> But if QAPI/QOM people prefer it that way, I am not going to get in the
>> middle.
>
> I don't like all the conditionals either, but QAPI design wants the
> conditionals, as that allows mgmt apps to query whether the feature
> is supported in a build or not.

Specifically, the conditionals keep @zerocopy out of query-qmp-schema
(a.k.a. schema introspection) when it's not actually supported.

This lets management applications recognize zero-copy support.

Without conditionals, the only way to probe for it is trying to switch
it on.  This is inconvenient and error-prone.

Immature ideas to avoid conditionals:

1. Make *values* conditional, i.e. unconditional false, but true only if
CONFIG_LINUX.  The QAPI schema language lets you do this for
enumerations today, but not for bool.

2. A new kind of conditional that only applies to schema introspection,
so you can eat your introspection cake and keep the #ifdef-less code
cake (and the slight binary bloat that comes with it).
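The alternative Juan raised (declare @zerocopy on every OS, but reject true where it is unsupported) might look roughly like the sketch below. It is a hypothetical illustration, not QEMU code: MigrateParamsSketch, zerocopy_params_check() and HAVE_ZEROCOPY are invented names, and __linux__ merely stands in for CONFIG_LINUX. It also shows the cost Markus describes: a management application can only learn about support by trying to set the value to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the CONFIG_LINUX build switch. */
#ifdef __linux__
#define HAVE_ZEROCOPY 1
#else
#define HAVE_ZEROCOPY 0
#endif

/* Simplified stand-in for the generated MigrateSetParameters members. */
typedef struct {
    bool has_zerocopy;  /* was the parameter supplied at all? */
    bool zerocopy;      /* requested value */
} MigrateParamsSketch;

/*
 * Accept the member unconditionally, but refuse true on builds without
 * support.  Returns true if the request is acceptable; otherwise writes
 * a message into err and returns false.
 */
static bool zerocopy_params_check(const MigrateParamsSketch *p,
                                  char *err, size_t errlen)
{
    if (p->has_zerocopy && p->zerocopy && !HAVE_ZEROCOPY) {
        snprintf(err, errlen, "zerocopy is not supported on this host");
        return false;
    }
    return true;
}
```

With this shape, query-qmp-schema would show @zerocopy everywhere, so introspection no longer tells the two kinds of build apart.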




Re: [PATCH] qmp: Stabilize preconfig

2021-11-12 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 11/11/21 15:37, Markus Armbruster wrote:
>>> 1) PHASE_NO_MACHINE - backends can already be created here, but no
>>> machine exists yet
>>>
>>> 2) PHASE_MACHINE_CREATED - the machine object has been created.  It's
>>> not initialized, but it's there.
>>>
>>> 3) PHASE_ACCEL_CREATED - the accelerator object has been created.  The
>>> accelerator needs the machine object, because for example KVM might
>>> not support all machine types.  So the accelerator queries the machine
>>> object and fails creation in case of incompatibility.  This enables
>>> e.g. fallback to TCG.  -preconfig starts the monitor here.
>> 
>> We should be able to start monitors first, if we put in the work.
>
> The monitor starts, the question is the availability of the event loop. 

What does the event loop depend on?

>   This requires a command (or something) to advance to the next phase.
> x-exit-preconfig is such a command.
>
> In addition, one thing I don't like about preconfig is that command line
> arguments linger until they are triggered by x-exit-preconfig.  Adding
> more such commands makes things worse.

Yes, that's ugly.  I'd prefer command line left to right, and then QMP
commands in order.  If your command line advances the phase too far for
your QMP commands, then that's your own fault.

>>> 4) PHASE_MACHINE_INIT - machine initialization consists mostly of
>>> creating the onboard devices.  For this to happen, the machine needs
>>> to learn about the accelerator, because onboard devices include CPUs
>>> and other accelerator-dependent devices.  Devices plugged in this
>>> phase are cold-plugged.
>>>
>>> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
>>> and the VM is ready.  Devices plugged in this phase already count as
>>> hot-plugged.  -S starts the monitor here.
>> 
>> Remind us: what work is done in the machine init done notifiers?
>
> It depends, but---generally speaking---what they do applies only to 
> cold-plugged devices.  For example, fw_cfg gathers the boot order in the 
> machine-init-done notifier (via get_boot_devices_list).
>
>> What exactly necessitates "count as hot-plugged"?  Is it something done
>> in these notifiers?
>
> It depends on the bus.  It boils down to this code in device_initfn:
>
>  if (phase_check(PHASE_MACHINE_READY)) {
>      dev->hotplugged = 1;
>      qdev_hot_added = true;
>  }
>
> For example, hotplugged PCI devices must define function 0 last; 
> coldplugged PCI devices can define functions in any order 
> (do_pci_register_device, called by pci_qdev_realize).
>
> Another example, a device_add after machine-done causes an ACPI hotplug 
> event, because acpi_pcihp_device_plug_cb checks dev->hotplugged.

Worse, if the guest doesn't play ball, the device remains in hot plug
limbo.

Why would anyone *want* to plug a device in PHASE_MACHINE_READY (when
the plug is hot) instead of earlier (when it's cold)?

>>> x-exit-preconfig goes straight from PHASE_ACCEL_CREATED to
>>> PHASE_MACHINE_READY.  Devices can only be created after
>>> PHASE_MACHINE_INIT, so device_add cannot be enabled at preconfig
>>> stage.
>> 
>> Now I am confused again.  Can you cold plug devices with device_add in
>> presence of -preconfig, and if yes, how?
>
> No, because the monitor goes directly from a point where device_add 
> fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
> (PHASE_MACHINE_READY).

Bummer.

>> Related question: when exactly in these phases do we create devices
>> specified with -device?
>
> In PHASE_MACHINE_INIT---that is, after the machine has been initialized 
> and before machine-done-notifiers have been called.

In other words, you should never use device_add where -device would do,
because the latter gives you cold plug (which is simple and reliable),
and the former hot plug (which is the opposite).
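The phase rules discussed so far can be summarized in a small model. This is a hypothetical sketch, not QEMU's implementation: the enum mirrors the five phases listed above, phase_advance_sketch() assumes phases only move forward one step at a time, and device_create_sketch() encodes the claims in the thread (no device creation before PHASE_MACHINE_INIT, cold plug in PHASE_MACHINE_INIT, hot plug once PHASE_MACHINE_READY is reached).

```c
#include <assert.h>
#include <stdbool.h>

/* The phases as listed in the thread, in order. */
typedef enum {
    PHASE_NO_MACHINE,
    PHASE_MACHINE_CREATED,
    PHASE_ACCEL_CREATED,
    PHASE_MACHINE_INIT,
    PHASE_MACHINE_READY,
} PhaseSketch;

static PhaseSketch current_phase = PHASE_NO_MACHINE;

/* Assumption of the sketch: phases only move forward, one step at a time. */
static void phase_advance_sketch(PhaseSketch next)
{
    assert(next == current_phase + 1);
    current_phase = next;
}

typedef enum {
    PLUG_REJECTED,  /* too early: the machine is not initialized yet */
    PLUG_COLD,      /* PHASE_MACHINE_INIT: where -device devices are created */
    PLUG_HOT,       /* PHASE_MACHINE_READY: device_add ends up here */
} PlugKind;

static PlugKind device_create_sketch(void)
{
    if (current_phase < PHASE_MACHINE_INIT) {
        return PLUG_REJECTED;
    }
    /* Mirrors the phase_check(PHASE_MACHINE_READY) test in device_initfn. */
    return current_phase >= PHASE_MACHINE_READY ? PLUG_HOT : PLUG_COLD;
}
```

A preconfig monitor only ever observes PHASE_ACCEL_CREATED (device creation rejected) and then PHASE_MACHINE_READY (hot plug), so in this model there is no point at which device_add could cold plug.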

>>> With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be
>>> reached with a machine-set command (corresponding to the
>>> non-deprecated parts of -machine) and PHASE_ACCEL_CREATED would be
>>> reached with an accel-set command (corresponding to -accel).
>> 
>> I don't think this depends on "pure-QMP configuration flow".  -machine
>> and -accel could advance the phase just like their buddies machine-set
>> and accel-set.
>
> They already do (see qemu_init's calls to phase_advance).
>
>> State transition diagram:
>> 
>>  PHASE_NO_MACHINE (initial state)
>>  |
>>  |  -machine or machine-set
>>  v
>>  PHASE_MACHINE_CREATED
>>  |
>>  |  -accel or accel-set
>>  v
>>  PHASE_ACCEL_CREATED
>>  |
>>  |  ???
>
> qmp_x_exit_preconfig() -> qemu_init_board() -> machine_run_board_init()

I read this as "the state transition happens in
machine_run_board_init(), called from qmp_x_exit_preconfig() via
qemu_init_board()".

>>  v
>>  PHASE_MACHINE_INIT
>>  |
>>  |  

Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-12 Thread Kevin Wolf
Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben:
> On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote:
> > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> > > vhost-user-blk realize only attempts to reconnect if the previous
> > > connection attempt failed on "a problem with the connection and not an
> > > error related to the content (which would fail again the same way in the
> > > next attempt)".
> > > 
> > > However this distinction is very subtle, and may be inadvertently broken
> > > if the code changes somewhere deep down the stack and a new error gets
> > > propagated up to here.
> > > 
> > > OTOH now that the number of reconnection attempts is limited it seems
> > > harmless to try reconnecting on any error.
> > > 
> > > So relax the condition of whether to retry connecting to check for any
> > > error.
> > > 
> > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> > > during realize".
> > > 
> > > Signed-off-by: Roman Kagan 
> > 
> > It results in less than perfect error messages. With a modified export
> > that just crashes qemu-storage-daemon during get_features, I get:
> > 
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read msg header. Read 0 instead of 12. Original request 1.
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: vhost_backend_init failed: Protocol error
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: Failed to connect to '/tmp/vsock': Connection refused
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: Failed to connect to '/tmp/vsock': Connection refused
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect to '/tmp/vsock': Connection refused
> 
> This patch doesn't change any error messages.  Which ones specifically
> became less than perfect as a result of this patch?

But it adds error messages (for each retry), which are different from
the first error message. As I said this is not the end of the world, but
maybe a bit more confusing.

> > I guess this might be tolerable. On the other hand, the patch doesn't
> > really fix anything either, but just gets rid of possible subtleties.
> 
> The remaining patches in the series make other errors beside -EPROTO
> propagate up to this point, and some (most) of them are retryable.  This
> was the reason to include this patch at the beginning of the series (I
> guess I should've mentioned that in the patch log).

I see. I hadn't looked at the rest of the series yet because I ran out
of time, but now that I'm skimming them, I see quite a few places that
return errors other than -EPROTO, and I wonder which of them should
actually trigger a reconnect. So far all I saw were presumably persistent
errors where a retry won't help. Can you give me some examples?

Kevin
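The two retry policies under discussion can be contrasted in a minimal sketch. It is hypothetical and not the vhost-user-blk code: connect_with_retries(), ConnectFn and FlakyBackend are invented names, and errors are assumed to be negative errno values. With retry_any == false only -EPROTO is retried (the behaviour before the patch); with retry_any == true every error is retried until the attempt budget runs out (the patch).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connect attempt: returns 0 on success or a negative errno. */
typedef int (*ConnectFn)(void *opaque);

static int connect_with_retries(ConnectFn fn, void *opaque,
                                int max_attempts, bool retry_any)
{
    int ret = -EINVAL;  /* returned unchanged if max_attempts <= 0 */

    for (int i = 0; i < max_attempts; i++) {
        ret = fn(opaque);
        if (ret == 0) {
            return 0;
        }
        if (!retry_any && ret != -EPROTO) {
            break;  /* treated as persistent: give up immediately */
        }
    }
    return ret;
}

/* Example backend that fails with a fixed error for the first N calls. */
typedef struct {
    int failures_left;
    int error;  /* negative errno returned while failing */
    int calls;
} FlakyBackend;

static int flaky_connect(void *opaque)
{
    FlakyBackend *b = opaque;

    b->calls++;
    if (b->failures_left > 0) {
        b->failures_left--;
        return b->error;
    }
    return 0;
}
```

The trade-off Kevin points out is visible here: with retry_any, a persistent error costs max_attempts attempts (and one "Reconnecting after error" message each) instead of failing on the first one.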




Re: [PATCH v4 03/25] assertions for block global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block.c| 136 +++--
  block/commit.c |   2 +
  block/io.c |  20 
  blockdev.c |   1 +
  4 files changed, 156 insertions(+), 3 deletions(-)


bdrv_make_zero() seems to be missing here; it can be considered an I/O or a
GS function, but patch 2 classified it as GS.


Hanna
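The check described in the quoted commit message (every GS function asserts that it runs in the main thread) follows a simple pattern. The sketch below is hypothetical: it records the main thread with POSIX pthread_self() and compares with pthread_equal(). in_main_thread_sketch() only stands in for qemu_in_main_thread(), and bdrv_gs_example() is an invented GS function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Recorded once at startup; stands in for QEMU's notion of the main thread. */
static pthread_t main_thread;

static void main_thread_init_sketch(void)
{
    main_thread = pthread_self();
}

/* Hypothetical analogue of qemu_in_main_thread(). */
static bool in_main_thread_sketch(void)
{
    return pthread_equal(pthread_self(), main_thread);
}

/* A GS function opens with the assertion, so a wrong caller aborts early. */
static int bdrv_gs_example(int x)
{
    assert(in_main_thread_sketch());
    return x + 1;
}
```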




[PULL 2/3] spapr_numa.c: fix FORM1 distance-less nodes

2021-11-12 Thread Cédric Le Goater
From: Daniel Henrique Barboza 

Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. This happens when QEMU adds a default
NUMA node and when the user adds NUMA nodes without specifying the
distances.

During the discussions of the aforementioned patch [1] it was found that
FORM1 guests were behaving in a strange way in the same scenario, with
the kernel seeing the distances between the nodes as '160', as we can
see in this example with 4 NUMA nodes without distance information:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node   0   1   2   3
  0:  10  160  160  160
  1:  160  10  160  160
  2:  160  160  10  160
  3:  160  160  160  10

It turns out that FORM1 guests have the same problem: we are
calculating associativity domains using zeroed distance values. As it
also turns out, the solution from 71e6fae3a99 applies to FORM1 as well.

This patch creates a wrapper called 'get_numa_distance' that contains
the logic used in FORM2 to define node distances when this information
is absent. This helper is then used in all places where we need to read
distance information from machine_state->numa_state->nodes. That way
we'll guarantee that the NUMA node distance is always being curated
before being used.

After this patch, the FORM1 guest mentioned above will have the
following topology:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

This is compatible with what FORM2 guests and other archs do in this
case.
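A stripped-down model of the fallback in get_numa_distance() reproduces the topology above. This sketch replaces ms->numa_state->nodes with a plain static matrix, and get_numa_distance_sketch() and MAX_NODES_SKETCH are hypothetical names. It assumes NUMA_DISTANCE_MIN is 10 and NUMA_DISTANCE_DEFAULT is 20, which matches the local and remote values in the numactl output after the fix.

```c
#include <assert.h>

#define NUMA_DISTANCE_MIN     10  /* local distance */
#define NUMA_DISTANCE_DEFAULT 20  /* remote distance */
#define MAX_NODES_SKETCH      8

/* Simplified stand-in for ms->numa_state->nodes[src].distance[dst]. */
static int distance[MAX_NODES_SKETCH][MAX_NODES_SKETCH];

static int get_numa_distance_sketch(int src, int dst)
{
    int ret = distance[src][dst];

    if (ret != 0) {
        return ret;  /* a user-supplied distance always wins */
    }
    /* Absent (zero) entry: fall back to the local/remote defaults. */
    return src == dst ? NUMA_DISTANCE_MIN : NUMA_DISTANCE_DEFAULT;
}
```

Because every reader goes through the same helper, spapr_numa_is_symmetrical() and the FORM1 domain computation both see 10/20 instead of the raw zeros.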

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html

Fixes: 690fbe4295d5 ("spapr_numa: consider user input when defining associativity")
CC: Aneesh Kumar K.V 
CC: Nicholas Piggin 
Reviewed-by: Richard Henderson 
Signed-off-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
---
 hw/ppc/spapr_numa.c | 62 ++---
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 56ab2a5fb649..e9ef7e764696 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -66,16 +66,41 @@ static const uint32_t *get_associativity(SpaprMachineState *spapr, int node_id)
 return spapr->FORM1_assoc_array[node_id];
 }
 
+/*
+ * Wrapper that returns node distance from ms->numa_state->nodes
+ * after handling edge cases where the distance might be absent.
+ */
+static int get_numa_distance(MachineState *ms, int src, int dst)
+{
+NodeInfo *numa_info = ms->numa_state->nodes;
+int ret = numa_info[src].distance[dst];
+
+if (ret != 0) {
+return ret;
+}
+
+/*
+ * In case QEMU adds a default NUMA single node when the user
+ * did not add any, or where the user did not supply distances,
+ * the distance will be absent (zero). Return local/remote
+ * distance in this case.
+ */
+if (src == dst) {
+return NUMA_DISTANCE_MIN;
+}
+
+return NUMA_DISTANCE_DEFAULT;
+}
+
 static bool spapr_numa_is_symmetrical(MachineState *ms)
 {
-int src, dst;
 int nb_numa_nodes = ms->numa_state->num_nodes;
-NodeInfo *numa_info = ms->numa_state->nodes;
+int src, dst;
 
 for (src = 0; src < nb_numa_nodes; src++) {
 for (dst = src; dst < nb_numa_nodes; dst++) {
-if (numa_info[src].distance[dst] !=
-numa_info[dst].distance[src]) {
+if (get_numa_distance(ms, src, dst) !=
+get_numa_distance(ms, dst, src)) {
 return false;
 }
 }
@@ -133,7 +158,6 @@ static uint8_t spapr_numa_get_numa_level(uint8_t distance)
 static void spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
 {
 MachineState *ms = MACHINE(spapr);
-NodeInfo *numa_info = ms->numa_state->nodes;
 int nb_numa_nodes = ms->numa_state->num_nodes;
 int src, dst, i, j;
 
@@ -170,7 +194,7 @@ static void spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
  * The PPC kernel expects the associativity domains of node 0 to
  * be always 0, and this algorithm will grant that by default.
  */
-uint8_t distance = numa_info[src].distance[dst];
+uint8_t distance = get_numa_distance(ms, src, dst);
 uint8_t n_level = spapr_numa_get_numa_level(distance);
 uint32_t assoc_src;
 
@@ -498,7 +522,6 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
void *fdt, int rtas)
 {
 MachineState *ms = MACHINE(spapr);
-NodeInfo *numa_info = ms->numa_state->nodes;
 int nb_numa_nodes = ms->numa_state->num_nodes;
 int distance_table_entries = nb_numa_nodes * nb_numa_nodes;
 g_autofree uint32_t *lookup_index_table = NULL;
@@ -540,30 +563,7 @@ static void 

[PULL 0/3] ppc 6.2 queue

2021-11-12 Thread Cédric Le Goater
The following changes since commit 0a70bcf18caf7a61d480f8448723c15209d128ef:

  Update version for v6.2.0-rc0 release (2021-11-09 18:22:57 +0100)

are available in the Git repository at:

  https://github.com/legoater/qemu/ tags/pull-ppc-2022

for you to fetch changes up to d139786e1b3d67991e6cb49a8a59bb2182350285:

  ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb() (2021-11-11 11:35:13 +0100)


ppc 6.2 queue :

* Fix of a regression in floating point load instructions (Matheus)
* Associativity fix for pseries machine (Daniel)
* tlbivax fix for BookE machines (Daniel)


Daniel Henrique Barboza (2):
  spapr_numa.c: fix FORM1 distance-less nodes
  ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb()

Matheus Ferst (1):
  target/ppc: Fix register update on lf[sd]u[x]/stf[sd]u[x]

 hw/ppc/spapr_numa.c| 62 +++---
 target/ppc/mmu_helper.c|  2 +-
 target/ppc/translate/fp-impl.c.inc |  2 +-
 3 files changed, 33 insertions(+), 33 deletions(-)



[PATCH for 6.2 v3 5/5] tests: bios-tables-test update expected blobs

2021-11-12 Thread Igor Mammedov
The changes are the result of
'hw/i386/acpi-build: Deny control on PCIe Native Hot-Plug in _OSC',
which hides the PCIe hotplug bit in the host bridge _OSC

Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities
 {
 CreateDWordField (Arg3, Zero, CDW1)
 If ((Arg0 == ToUUID ("33db4d5b-1ff7-401c-9657-7441c03dd766") /* PCI Host Bridge Device */))
 {
 CreateDWordField (Arg3, 0x04, CDW2)
 CreateDWordField (Arg3, 0x08, CDW3)
 Local0 = CDW3 /* \_SB_.PCI0._OSC.CDW3 */
-Local0 &= 0x1F
+Local0 &= 0x1E

Signed-off-by: Igor Mammedov 
---
 tests/qtest/bios-tables-test-allowed-diff.h |  16 
 tests/data/acpi/q35/DSDT| Bin 8289 -> 8289 bytes
 tests/data/acpi/q35/DSDT.acpihmat   | Bin 9614 -> 9614 bytes
 tests/data/acpi/q35/DSDT.bridge | Bin 11003 -> 11003 bytes
 tests/data/acpi/q35/DSDT.cphp   | Bin 8753 -> 8753 bytes
 tests/data/acpi/q35/DSDT.dimmpxm| Bin 9943 -> 9943 bytes
 tests/data/acpi/q35/DSDT.ipmibt | Bin 8364 -> 8364 bytes
 tests/data/acpi/q35/DSDT.ivrs   | Bin 8306 -> 8306 bytes
 tests/data/acpi/q35/DSDT.memhp  | Bin 9648 -> 9648 bytes
 tests/data/acpi/q35/DSDT.mmio64 | Bin 9419 -> 9419 bytes
 tests/data/acpi/q35/DSDT.multi-bridge   | Bin 8583 -> 8583 bytes
 tests/data/acpi/q35/DSDT.nohpet | Bin 8147 -> 8147 bytes
 tests/data/acpi/q35/DSDT.numamem| Bin 8295 -> 8295 bytes
 tests/data/acpi/q35/DSDT.tis.tpm12  | Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.tis.tpm2   | Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.xapic  | Bin 35652 -> 35652 bytes
 16 files changed, 16 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 48e5634d4b..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,17 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/q35/DSDT",
-"tests/data/acpi/q35/DSDT.tis",
-"tests/data/acpi/q35/DSDT.bridge",
-"tests/data/acpi/q35/DSDT.mmio64",
-"tests/data/acpi/q35/DSDT.ipmibt",
-"tests/data/acpi/q35/DSDT.cphp",
-"tests/data/acpi/q35/DSDT.memhp",
-"tests/data/acpi/q35/DSDT.acpihmat",
-"tests/data/acpi/q35/DSDT.numamem",
-"tests/data/acpi/q35/DSDT.dimmpxm",
-"tests/data/acpi/q35/DSDT.nohpet",
-"tests/data/acpi/q35/DSDT.tis.tpm2",
-"tests/data/acpi/q35/DSDT.tis.tpm12",
-"tests/data/acpi/q35/DSDT.multi-bridge",
-"tests/data/acpi/q35/DSDT.ivrs",
-"tests/data/acpi/q35/DSDT.xapic",
diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
index 281fc82c03b2562d2e6b7caec0d817b034a47138..c1965f6051ef2af81dd8412abe169d87845bb033 100644
GIT binary patch
delta 24
gcmaFp@X&$FCDET<0BnZ{w*UYD

delta 24
gcmaFp@X&$FCDET<0BnK?w*UYD

diff --git a/tests/data/acpi/q35/DSDT.acpihmat b/tests/data/acpi/q35/DSDT.acpihmat
index 8c1e05a11a328ec1cc6f86e36e52c28f41f9744e..f24d4874bff8d327a165ed7c36de507aea114edd 100644
GIT binary patch
delta 24
fcmeD4?(^ny33dtTQ)OUa+&+=(3ZvY{`|DKzU@Hhn

delta 24
fcmeD4?(^ny33dtTQ)OUa+%}Qx3ZwkS`|DKzU?vDi

diff --git a/tests/data/acpi/q35/DSDT.bridge b/tests/data/acpi/q35/DSDT.bridge
index 6f1464b6c712d7f33cb4b891b7ce76fe228f44c9..424d51bd1cb39ea73501ef7d0044ee52cec5bdac 100644
GIT binary patch
delta 24
gcmewz`a6`%CDF$oF>WH)6-K#@_k$DxTWtqt

delta 24
fcmdn!veAXhCDF$oF?J%?6-N1u_k$DxTWAMo

diff --git a/tests/data/acpi/q35/DSDT.dimmpxm b/tests/data/acpi/q35/DSDT.dimmpxm
index fe5820d93d057ef09a001662369b15afbc5b87e2..76e451e829ec4c245315f7eed8731aa1be45a747 100644
GIT binary patch
delta 24
gcmccad)=4ICDD~$3R?@yKo0BrFHn*aa+

diff --git a/tests/data/acpi/q35/DSDT.memhp b/tests/data/acpi/q35/DSDT.memhp
index 9bc11518fc57687ca789dc70793b48b29a0d74ed..4e9cb3dc6896bb79ccac0fe342a404549f6610e8 100644
GIT binary patch
delta 24
gcmdnsy}_HyCD7Sg;8$f{S^uTTRaEr

delta 24
fcmZp7Zg=K#33dr-S7cyd?48JUg;9Rv{S^uTTQ>*m

diff --git a/tests/data/acpi/q35/DSDT.nohpet b/tests/data/acpi/q35/DSDT.nohpet
index e8202e6ddfbe96071f32f1ec05758f650569943e..83d1aa00ac5686df479673fb0d7830f946e25dea 100644
GIT binary patch
delta 24
gcmca?f7zbPCDn+a

delta 24
gcmaFv@Z5pRCDn+a

diff --git a/tests/data/acpi/q35/DSDT.tis.tpm12 b/tests/data/acpi/q35/DSDT.tis.tpm12
index c96b5277a14ae98174408d690d6e0246bd932623..0ebdf6fbd77967f1ab5d5337b7b1fed314cfaca8 100644
GIT binary patch
delta 24
gcmdnzy3du%CDyjp@iVCN7s?mk^h31_s6r6S=N1%5A)#+64f6_X

delta 26
icmX>yjp@iVCN7s?mk^h31_s9U6S=N1%5S`%+64f6@(F(c

-- 
2.27.0




[PULL 1/3] target/ppc: Fix register update on lf[sd]u[x]/stf[sd]u[x]

2021-11-12 Thread Cédric Le Goater
From: Matheus Ferst 

These instructions should update the GPR indicated by the field RA
instead of RT. This error caused a regression on Mac OS 9 boot and some
graphical glitches in OS X.

Fixes: a39a106634a9 ("target/ppc: Move load and store floating point instructions to decodetree")
Reported-by: Mark Cave-Ayland 
Tested-by: Mark Cave-Ayland 
Signed-off-by: Matheus Ferst 
Signed-off-by: Cédric Le Goater 
---
 target/ppc/translate/fp-impl.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/translate/fp-impl.c.inc b/target/ppc/translate/fp-impl.c.inc
index d1dbb1b96b16..c9e05201d9e7 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -1328,7 +1328,7 @@ static bool do_lsfpsd(DisasContext *ctx, int rt, int ra, TCGv displ,
 set_fpr(rt, t0);
 }
 if (update) {
-tcg_gen_mov_tl(cpu_gpr[rt], ea);
+tcg_gen_mov_tl(cpu_gpr[ra], ea);
 }
 tcg_temp_free_i64(t0);
 tcg_temp_free(ea);
-- 
2.31.1
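The semantics the fix restores can be written out for lfdu FRT,D(RA): EA = (RA) + D, FRT is loaded from EA, and then RA (not RT) receives EA. The sketch below is a hypothetical interpreter-style model, not TCG code. CpuSketch and lfdu_sketch() are invented names, memory is a small byte array read in host byte order (guest big-endian ordering is ignored), and the RA == 0 invalid form is not checked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal register file and memory for the sketch. */
typedef struct {
    uint64_t gpr[32];
    double fpr[32];
    uint8_t mem[256];
} CpuSketch;

/* lfdu FRT, D(RA): EA = (RA) + D; FRT = MEM(EA, 8); RA = EA. */
static void lfdu_sketch(CpuSketch *cpu, int frt, int ra, int16_t d)
{
    uint64_t ea = cpu->gpr[ra] + (int64_t)d;

    assert(ea + sizeof(double) <= sizeof cpu->mem);
    memcpy(&cpu->fpr[frt], &cpu->mem[ea], sizeof(double));
    cpu->gpr[ra] = ea;  /* the update goes to RA; GPR[RT] is left alone */
}
```

Before the fix, the generated code wrote EA into cpu_gpr[rt] instead, which corrupts an unrelated GPR and leaves the base register stale.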




Re: [PATCH 0/6] RfC: try improve native hotplug for pcie root ports

2021-11-12 Thread Gerd Hoffmann
On Thu, Nov 11, 2021 at 10:39:59AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 01:09:05PM +0100, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > When the acpihp driver is used the linux kernel will just call the aml
> > > methods and I suspect the pci device will stay invisible then because
> > > nobody flips the slot power control bit (with native-hotplug=on, for
> > > native-hotplug=off this isn't a problem of course).
> > 
> > Hmm, on a quick smoke test with both patch series (mine + igors) applied
> > everything seems to work fine on a quick glance.  Dunno why.  Maybe the
> > pcieport driver turns on slot power even in case pciehp is not active.

Dug deeper.  Updating power status is handled by the plug() callback,
which is never called in case acpi hotplug is active.  The guest seems
to never touch slot power control either, so it's working fine.  Still
feels a bit fragile though.

> Well power and hotplug capabilities are mostly unrelated, right?

At least they are separate slot capabilities.  The Linux pciehp driver
checks whether the power controller is present before using it, so having
PwrCtrl- HotPlug+ seems to be a valid combination.

We even have an option for that: pcie-root-port.power_controller_present

So flipping that to off in case ACPI hotplug is active should make sure
we never run into trouble with PCI devices being powered off.

Igor?  Can you add that to your patch series?

> I feel switching to native so late would be inappropriate, looks more
> like a feature than a bugfix. Given that - we need Igor's patches.
> Given that - would you say I should apply yours?

I think that with power_controller_present=off for ACPI hotplug it is
safe to merge both mine and Igor's.

take care,
  Gerd
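The PwrCtrl-/HotPlug+ combination comes down to two independent bits in the PCIe Slot Capabilities register: Power Controller Present (bit 1) and Hot-Plug Capable (bit 6), the same values Linux names PCI_EXP_SLTCAP_PCP and PCI_EXP_SLTCAP_HPC. The helpers below are a hypothetical sketch of how a pciehp-style driver can gate power control on the first bit. That power_controller_present=off clears exactly this bit is an assumption based on the property name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCIe Slot Capabilities bits (same values as Linux's PCI_EXP_SLTCAP_*). */
#define SLTCAP_PCP 0x00000002u  /* Power Controller Present */
#define SLTCAP_HPC 0x00000040u  /* Hot-Plug Capable */

/* A pciehp-style driver only drives slot power if a controller exists. */
static bool may_use_power_control(uint32_t sltcap)
{
    return (sltcap & SLTCAP_PCP) != 0;
}

static bool is_hotplug_capable(uint32_t sltcap)
{
    return (sltcap & SLTCAP_HPC) != 0;
}
```

With only SLTCAP_HPC set, the slot still reports hotplug capability while no driver has a reason to power it off.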




[PULL 3/3] ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb()

2021-11-12 Thread Cédric Le Goater
From: Daniel Henrique Barboza 

'tlbivax' is implemented by gen_tlbivax_booke206() via
gen_helper_booke206_tlbivax(). In case the TLB needs to be flushed,
booke206_invalidate_ea_tlb() is called. All these functions except
booke206_invalidate_ea_tlb() use a 64-bit effective address 'ea'.

booke206_invalidate_ea_tlb() uses a uint32_t 'ea' argument that
truncates the original 'ea' value for apparently no particular reason.
This function retrieves the tlb pointer by calling booke206_get_tlbm(),
which also uses a target_ulong address as parameter - in this case, a
truncated 'ea' address. All the surrounding logic considers the
effective TLB address as a 64 bit value, aside from the signature of
booke206_invalidate_ea_tlb().

Last but not least, PowerISA 2.07B section 6.11.4.9 [2] makes it
clear that the effective address "EA" is a 64 bit value.

Commit 01662f3e5133 introduced this code and it has not changed
since. A user detected a problem with tlbivax [1] stating that this
address truncation was the cause. This same behavior might be the source
of several subtle bugs that were never caught.

For all these reasons, this patch assumes that this address truncation
is the result of a mistake/oversight of the original commit, and changes
booke206_invalidate_ea_tlb() 'ea' argument to 'vaddr'.

[1] https://gitlab.com/qemu-project/qemu/-/issues/52
[2] https://wiki.raptorcs.com/wiki/File:PowerISA_V2.07B.pdf

Fixes: 01662f3e5133 ("PPC: Implement e500 (FSL) MMU")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/52
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Cédric Le Goater 
---
 target/ppc/mmu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index 2cb98c516987..e0c4950dda53 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -1216,7 +1216,7 @@ void helper_booke206_tlbsx(CPUPPCState *env, target_ulong address)
 }
 
 static inline void booke206_invalidate_ea_tlb(CPUPPCState *env, int tlbn,
-  uint32_t ea)
+  vaddr ea)
 {
 int i;
 int ways = booke206_tlb_ways(env, tlbn);
-- 
2.31.1
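The truncation the patch removes is ordinary C integer conversion: passing a 64-bit effective address to a uint32_t parameter keeps only the low 32 bits. In the sketch below, the two functions are hypothetical stand-ins for the old and new signatures of booke206_invalidate_ea_tlb(), with uint64_t playing the role of vaddr. Two EAs that differ only above bit 31 become indistinguishable in the old version.

```c
#include <assert.h>
#include <stdint.h>

/* Before the fix: the parameter type silently drops the upper 32 bits. */
static uint64_t ea_seen_before(uint32_t ea)
{
    return ea;
}

/* After the fix: a 64-bit type (vaddr in QEMU) keeps the full address. */
static uint64_t ea_seen_after(uint64_t ea)
{
    return ea;
}
```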




[PATCH for 6.2 v3 3/5] bios-tables-test: Allow changes in DSDT ACPI tables

2021-11-12 Thread Igor Mammedov
From: Julia Suvorova 

Prepare for changing the _OSC method in q35 DSDT.

Signed-off-by: Julia Suvorova 
Signed-off-by: Igor Mammedov 
Acked-by: Ani Sinha 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..48e5634d4b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,17 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/q35/DSDT.tis",
+"tests/data/acpi/q35/DSDT.bridge",
+"tests/data/acpi/q35/DSDT.mmio64",
+"tests/data/acpi/q35/DSDT.ipmibt",
+"tests/data/acpi/q35/DSDT.cphp",
+"tests/data/acpi/q35/DSDT.memhp",
+"tests/data/acpi/q35/DSDT.acpihmat",
+"tests/data/acpi/q35/DSDT.numamem",
+"tests/data/acpi/q35/DSDT.dimmpxm",
+"tests/data/acpi/q35/DSDT.nohpet",
+"tests/data/acpi/q35/DSDT.tis.tpm2",
+"tests/data/acpi/q35/DSDT.tis.tpm12",
+"tests/data/acpi/q35/DSDT.multi-bridge",
+"tests/data/acpi/q35/DSDT.ivrs",
+"tests/data/acpi/q35/DSDT.xapic",
-- 
2.27.0




Re: [PATCH 05/10] vhost-backend: avoid overflow on memslots_limit

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 09:56:17AM +, Daniel P. Berrangé wrote:
> On Fri, Nov 12, 2021 at 10:46:46AM +0300, Roman Kagan wrote:
> > On Thu, Nov 11, 2021 at 06:59:43PM +0100, Philippe Mathieu-Daudé wrote:
> > > On 11/11/21 16:33, Roman Kagan wrote:
> > > > Fix the (hypothetical) potential problem when the value parsed out of
> > > > the vhost module parameter in sysfs overflows the return value from
> > > > vhost_kernel_memslots_limit.
> > > > 
> > > > Signed-off-by: Roman Kagan 
> > > > ---
> > > >  hw/virtio/vhost-backend.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > > index b65f8f7e97..44f7dbb243 100644
> > > > --- a/hw/virtio/vhost-backend.c
> > > > +++ b/hw/virtio/vhost-backend.c
> > > > @@ -58,7 +58,7 @@ static int vhost_kernel_memslots_limit(struct vhost_dev *dev)
> > > >  if (g_file_get_contents("/sys/module/vhost/parameters/max_mem_regions",
> > > >  , NULL, NULL)) {
> > > >  uint64_t val = g_ascii_strtoull(s, NULL, 10);
> > > 
> > > Would using qemu_strtou64() simplify this?
> > 
> > I'm afraid not.  None of the existing strtoXX converting functions has
> > the desired output range (0 < retval < INT_MAX), so the following
> > condition will remain necessary anyway; then it doesn't seem to matter
> > which particular parser is used to extract the value which is in the
> > range, so I left the one that was already there to reduce churn.
> 
> If  qemu_strtou64() can't handle all values in (0 < retval < INT_MAX)
> isn't that a bug in qemu_strtou64 ?

I must have been unclear.  It sure can handle all values in this range;
the point is that the range check after it would still be needed, so
switching from g_ascii_strtoull to qemu_strtoXX saves nothing, therefore
I left it as it was.

Thanks,
Roman.
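Roman's point is that the parser is interchangeable but the explicit range check is not. The sketch below illustrates this. It uses the standard strtoull() instead of GLib's g_ascii_strtoull() so that it is self-contained, and parse_memslots_limit() and its fallback argument are hypothetical rather than the vhost-backend.c code. Whatever parses the text, out-of-range values (0, values >= INT_MAX, overflow, negative input wrapped by strtoull) still have to be filtered before narrowing to int.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal limit such as the contents of
 * /sys/module/vhost/parameters/max_mem_regions.  Returns the value if
 * 0 < val < INT_MAX, otherwise the caller-chosen fallback.
 */
static int parse_memslots_limit(const char *s, int fallback)
{
    char *end;
    unsigned long long val;

    errno = 0;
    val = strtoull(s, &end, 10);
    if (errno != 0 || end == s) {
        return fallback;  /* overflow (ERANGE) or no digits at all */
    }
    if (val > 0 && val < INT_MAX) {
        return (int)val;  /* safe: the range check makes this lossless */
    }
    return fallback;
}
```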



[PATCH for 6.2 v3 4/5] hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC

2021-11-12 Thread Igor Mammedov
From: Julia Suvorova 

There are two ways to enable ACPI PCI Hot-plug:

* Disable the Hot-plug Capable bit on PCIe slots.

This was the first approach, which led to regressions [1-2], as
I/O space for a port is allocated only when it is hot-pluggable,
which is determined by the HPC bit.

* Leave the HPC bit on and disable PCIe Native Hot-plug in _OSC
  method.

This removes the (future) ability of hot-plugging switches with PCIe
Native hotplug since ACPI PCI Hot-plug only works with cold-plugged
bridges. If the user wants to explicitly use this feature, they can
disable ACPI PCI Hot-plug with:
--global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off

Change the bit in _OSC method so that the OS selects ACPI PCI Hot-plug
instead of PCIe Native.

[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409

Signed-off-by: Julia Suvorova 
Signed-off-by: Igor Mammedov 
---
v2:
  - (mst)
  * drop local hotplug var and opencode it
  * rename acpi_pcihp parameter to enable_native_pcie_hotplug
to reflect what it actually does

tested:
  with hotplugging a NIC into 1 root port with seabios/ovmf/Fedora34
  Windows tested only with seabios (using existing images)
  (installer fails to install regardless of bios)
---
 hw/i386/acpi-build.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a3ad6abd33..a99c6e4fe3 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1337,7 +1337,7 @@ static void build_x86_acpi_pci_hotplug(Aml *table, uint64_t pcihp_addr)
 aml_append(table, scope);
 }
 
-static Aml *build_q35_osc_method(void)
+static Aml *build_q35_osc_method(bool enable_native_pcie_hotplug)
 {
 Aml *if_ctx;
 Aml *if_ctx2;
@@ -1359,8 +1359,10 @@ static Aml *build_q35_osc_method(void)
 /*
  * Always allow native PME, AER (no dependencies)
  * Allow SHPC (PCI bridges can have SHPC controller)
+ * Disable PCIe Native Hot-plug if ACPI PCI Hot-plug is enabled.
  */
-aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl));
+aml_append(if_ctx, aml_and(a_ctrl,
+aml_int(0x1E | (enable_native_pcie_hotplug ? 0x1 : 0x0)), a_ctrl));
 
 if_ctx2 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(1;
 /* Unknown revision */
@@ -1449,7 +1451,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
 aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
-aml_append(dev, build_q35_osc_method());
+aml_append(dev, build_q35_osc_method(!pm->pcihp_bridge_en));
 aml_append(sb_scope, dev);
 if (mcfg_valid) {
 aml_append(sb_scope, build_q35_dram_controller());
@@ -1565,7 +1567,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 if (pci_bus_is_express(bus)) {
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
 aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
-aml_append(dev, build_q35_osc_method());
+
+/* Expander bridges do not have ACPI PCI Hot-plug enabled */
+aml_append(dev, build_q35_osc_method(true));
 } else {
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
 }
-- 
2.27.0
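The 0x1F/0x1E constants in this patch are masks over the _OSC control field (CDW3). Per the PCI Firmware Specification, bit 0 is PCIe Native Hot-Plug control, bit 1 SHPC Native Hot-Plug control, bit 2 PCIe Native PME control, bit 3 PCIe AER control, and bit 4 PCIe Capability Structure control. The helper below is a hypothetical C rendering of the value that build_q35_osc_method() feeds to aml_int(); the macro names are invented for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* _OSC control field (CDW3) bits for PCI host bridges. */
#define OSC_CTRL_PCIE_NATIVE_HP (1u << 0)
#define OSC_CTRL_SHPC_NATIVE_HP (1u << 1)
#define OSC_CTRL_PCIE_PME       (1u << 2)
#define OSC_CTRL_PCIE_AER       (1u << 3)
#define OSC_CTRL_PCIE_CAP       (1u << 4)

/* Controls the firmware is willing to grant to the OS. */
static uint32_t osc_allowed_mask(bool enable_native_pcie_hotplug)
{
    uint32_t mask = OSC_CTRL_SHPC_NATIVE_HP | OSC_CTRL_PCIE_PME |
                    OSC_CTRL_PCIE_AER | OSC_CTRL_PCIE_CAP;  /* 0x1E */

    if (enable_native_pcie_hotplug) {
        mask |= OSC_CTRL_PCIE_NATIVE_HP;                     /* 0x1F */
    }
    return mask;
}
```

The DSDT computes "CDW3 &= mask", so an OS that requests all five controls gets everything except native hotplug whenever ACPI PCI hotplug is enabled (the host bridge passes !pm->pcihp_bridge_en).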




[PATCH for 6.2 v3 2/5] hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type

2021-11-12 Thread Igor Mammedov
From: Julia Suvorova 

To solve issues [1-2] the Hot Plug Capable bit in PCIe Slots will be
turned on, while the switch to ACPI Hot-plug will be done in the
DSDT table.

The new 'x-keep-pci-slot-hpc' property disables the HPC bit only
in 6.1 and as a result keeps the forced 'reserve-io' on
pcie-root-ports in 6.1 too.
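The effect of the compat entry added to pc_compat_6_1[] can be modeled simply. The property defaults to true (as set in ich9_pm_add_properties()), and a matching entry in a machine type's compat list overrides it with "false". The sketch below is a hypothetical simplification: CompatPropSketch and keep_pci_slot_hpc_for() are invented, and the real QEMU path goes through GlobalProperty arrays and QOM property setters rather than string matching in device code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Shape of a compat entry: driver, property, value (strings, as in pc.c). */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatPropSketch;

static const CompatPropSketch compat_6_1_sketch[] = {
    { "ICH9-LPC", "x-keep-pci-slot-hpc", "false" },
};

/* Default is true for new machine types; a matching entry overrides it. */
static bool keep_pci_slot_hpc_for(const CompatPropSketch *compat, size_t n)
{
    bool value = true;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(compat[i].driver, "ICH9-LPC") == 0 &&
            strcmp(compat[i].property, "x-keep-pci-slot-hpc") == 0) {
            value = strcmp(compat[i].value, "true") == 0;
        }
    }
    return value;
}
```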

[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409

Signed-off-by: Julia Suvorova 
Signed-off-by: Igor Mammedov 
---
v2:
   * s/native-hpc-bit/x-native-hotplug/ to fix conflict
---
 include/hw/acpi/ich9.h |  1 +
 hw/acpi/ich9.c | 18 ++
 hw/i386/pc.c   |  2 ++
 hw/i386/pc_q35.c   |  7 ++-
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index f04f1791bd..7ca92843c6 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -56,6 +56,7 @@ typedef struct ICH9LPCPMRegs {
 AcpiCpuHotplug gpe_cpu;
 CPUHotplugState cpuhp_state;
 
+bool keep_pci_slot_hpc;
 bool use_acpi_hotplug_bridge;
 AcpiPciHpState acpi_pci_hotplug;
 MemHotplugState acpi_memory_hotplug;
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1ee2ba2c50..ebe08ed831 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -419,6 +419,20 @@ static void ich9_pm_set_acpi_pci_hotplug(Object *obj, bool value, Error **errp)
 s->pm.use_acpi_hotplug_bridge = value;
 }
 
+static bool ich9_pm_get_keep_pci_slot_hpc(Object *obj, Error **errp)
+{
+ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+return s->pm.keep_pci_slot_hpc;
+}
+
+static void ich9_pm_set_keep_pci_slot_hpc(Object *obj, bool value, Error **errp)
+{
+ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+s->pm.keep_pci_slot_hpc = value;
+}
+
 void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 {
 static const uint32_t gpe0_len = ICH9_PMIO_GPE0_LEN;
@@ -428,6 +442,7 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 pm->disable_s4 = 0;
 pm->s4_val = 2;
 pm->use_acpi_hotplug_bridge = true;
+pm->keep_pci_slot_hpc = true;
 
 object_property_add_uint32_ptr(obj, ACPI_PM_PROP_PM_IO_BASE,
&pm->pm_io_base, OBJ_PROP_FLAG_READ);
@@ -454,6 +469,9 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 object_property_add_bool(obj, ACPI_PM_PROP_ACPI_PCIHP_BRIDGE,
  ich9_pm_get_acpi_pci_hotplug,
  ich9_pm_set_acpi_pci_hotplug);
+object_property_add_bool(obj, "x-keep-pci-slot-hpc",
+ ich9_pm_get_keep_pci_slot_hpc,
+ ich9_pm_set_keep_pci_slot_hpc);
 }
 
 void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2592a82148..a2ef40ecbc 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -98,6 +98,7 @@ GlobalProperty pc_compat_6_1[] = {
 { TYPE_X86_CPU, "hv-version-id-build", "0x1bbc" },
 { TYPE_X86_CPU, "hv-version-id-major", "0x0006" },
 { TYPE_X86_CPU, "hv-version-id-minor", "0x0001" },
+{ "ICH9-LPC", "x-keep-pci-slot-hpc", "false" },
 };
 const size_t pc_compat_6_1_len = G_N_ELEMENTS(pc_compat_6_1);
 
@@ -107,6 +108,7 @@ GlobalProperty pc_compat_6_0[] = {
 { "qemu64" "-" TYPE_X86_CPU, "stepping", "3" },
 { TYPE_X86_CPU, "x-vendor-cpuid-only", "off" },
 { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" },
+{ "ICH9-LPC", "x-keep-pci-slot-hpc", "true" },
 };
 const size_t pc_compat_6_0_len = G_N_ELEMENTS(pc_compat_6_0);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index fc34b905ee..e1e100316d 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -137,6 +137,7 @@ static void pc_q35_init(MachineState *machine)
 DriveInfo *hd[MAX_SATA_PORTS];
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 bool acpi_pcihp;
+bool keep_pci_slot_hpc;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -242,7 +243,11 @@ static void pc_q35_init(MachineState *machine)
   ACPI_PM_PROP_ACPI_PCIHP_BRIDGE,
   NULL);
 
-if (acpi_pcihp) {
+keep_pci_slot_hpc = object_property_get_bool(OBJECT(lpc),
+ "x-keep-pci-slot-hpc",
+ NULL);
+
+if (!keep_pci_slot_hpc && acpi_pcihp) {
 object_register_sugar_prop(TYPE_PCIE_SLOT, "x-native-hotplug",
"false", true);
 }
-- 
2.27.0
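How the new property, the two compat tables, and the check in pc_q35_init() combine per machine type can be modelled as below. This is a sketch, not QEMU code: the enum, struct, and function names are hypothetical, and it assumes compat tables are applied from the newest machine type to the oldest, with later entries overriding earlier ones (assumed to match how the versioned pc-q35 machine types chain their compat properties).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical machine-type enum: only the versions this patch touches. */
typedef enum { MACHINE_6_0, MACHINE_6_1, MACHINE_6_2 } MachineVersion;

/* Hypothetical model of the two ICH9-LPC bool properties involved. */
typedef struct {
    bool acpi_pcihp_bridge;   /* ACPI_PM_PROP_ACPI_PCIHP_BRIDGE */
    bool keep_pci_slot_hpc;   /* "x-keep-pci-slot-hpc" */
} LpcProps;

/*
 * Effective property values for a machine type, assuming compat tables
 * are applied newest to oldest and later entries win.
 */
static LpcProps resolve_lpc_props(MachineVersion v)
{
    /* Defaults set in ich9_pm_add_properties() */
    LpcProps p = { .acpi_pcihp_bridge = true, .keep_pci_slot_hpc = true };

    assert(v <= MACHINE_6_2);
    if (v <= MACHINE_6_1) {            /* pc_compat_6_1 */
        p.keep_pci_slot_hpc = false;
    }
    if (v <= MACHINE_6_0) {            /* pc_compat_6_0 */
        p.acpi_pcihp_bridge = false;
        p.keep_pci_slot_hpc = true;
    }
    return p;
}

/*
 * Mirrors the new condition in pc_q35_init(): slot native hotplug (the
 * HPC bit) is forced off only when the HPC bit is not kept AND ACPI PCI
 * hotplug is in use.
 */
static bool slot_native_hotplug(LpcProps p)
{
    return !(!p.keep_pci_slot_hpc && p.acpi_pcihp_bridge);
}
```

Under these assumptions, 6.2 keeps the HPC bit set (with _OSC denying native hotplug control instead), 6.1 keeps the old behaviour of clearing it, and 6.0 keeps it set because ACPI PCI hotplug on bridges is already off there.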




[PATCH for 6.2 v3 1/5] pcie: rename 'native-hotplug' to 'x-native-hotplug'

2021-11-12 Thread Igor Mammedov
Mark the property as experimental/internal by adding the 'x-' prefix.

The property was introduced in 6.1 and was meant to provide the
ability to turn on native PCIe hotplug on a port, even when
ACPI PCI hotplug is in use, if the user explicitly sets the property
on the CLI. However, that never worked, since the slot is wired to
the ACPI hotplug controller.
Another unintended use case is disabling native hotplug on a slot
when ACPI-based hotplug is disabled; that works, but the slot already
has the 'hotplug' property for this task.

It should be relatively safe to rename it to experimental,
as no users should exist for it, and given that the property
is broken, we don't really want to leave it around much
longer lest users start using it.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
---
CC: qemu-sta...@nongnu.org
---
 hw/i386/pc_q35.c   | 2 +-
 hw/pci/pcie_port.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 797e09500b..fc34b905ee 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -243,7 +243,7 @@ static void pc_q35_init(MachineState *machine)
   NULL);
 
 if (acpi_pcihp) {
-object_register_sugar_prop(TYPE_PCIE_SLOT, "native-hotplug",
+object_register_sugar_prop(TYPE_PCIE_SLOT, "x-native-hotplug",
"false", true);
 }
 
diff --git a/hw/pci/pcie_port.c b/hw/pci/pcie_port.c
index da850e8dde..e95c1e5519 100644
--- a/hw/pci/pcie_port.c
+++ b/hw/pci/pcie_port.c
@@ -148,7 +148,7 @@ static Property pcie_slot_props[] = {
 DEFINE_PROP_UINT8("chassis", PCIESlot, chassis, 0),
 DEFINE_PROP_UINT16("slot", PCIESlot, slot, 0),
 DEFINE_PROP_BOOL("hotplug", PCIESlot, hotplug, true),
-DEFINE_PROP_BOOL("native-hotplug", PCIESlot, native_hotplug, true),
+DEFINE_PROP_BOOL("x-native-hotplug", PCIESlot, native_hotplug, true),
 DEFINE_PROP_END_OF_LIST()
 };
 
-- 
2.27.0




[PATCH for-6.2 v3 0/5] Fix Q35 ACPI PCI Hot-plug I/O issues

2021-11-12 Thread Igor Mammedov

 
Changelog:
  v3:
    * drop unnecessary expected blobs

  v2:
    * simplify [1/5] and rename property to x-native-hotplug (CC stable)
    * [4/5]
       - rename function parameter to reflect actual action
       - drop local 'hotplug' variable and opencode statement
    * test with SeaBIOS/OVMF and Linux guest;
      Windows also works with SeaBIOS, but it can't be installed in EFI
      mode on current master (it gets stuck when formatting the disk or
      copying files to the HDD).

Attempt [1] to fix I/O allocation with the 'reserve-io' hint on each
pcie-root-port resulted in regression [2-3]. This patchset aims to fix
it by addressing the root cause of the problem: the disabled PCIe
Slot HPC bit.
This series enables the PCIe Slot HPC bit, which allows UEFI to enumerate
and initialize resources on ports; instead, we hide the PCIe hotplug
capability in the host bridge's ACPI _OSC method, which effectively makes
the guest use ACPI-based hotplug on the hierarchy attached to the host bridge.
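Hiding PCIe hotplug in _OSC amounts to masking bit 0 (PCI Express Native Hot Plug control) of the Control dword the OS requests. A minimal C model of that masking follows, assuming the method keeps granting the other four low control bits (SHPC hotplug, PME, AER, PCIe capability structure) as the pre-series method's 0x1F mask did. The bit layout follows the PCI Firmware Specification; the macro and function names are hypothetical. Per patch 3/5, the root host bridge passes !pcihp_bridge_en and expander bridges always pass true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* _OSC Control field bits for PCI host bridges (PCI Firmware Specification). */
#define OSC_CTRL_PCIE_NATIVE_HP   (1u << 0)
#define OSC_CTRL_SHPC_NATIVE_HP   (1u << 1)
#define OSC_CTRL_PCIE_PME         (1u << 2)
#define OSC_CTRL_PCIE_AER         (1u << 3)
#define OSC_CTRL_PCIE_CAP_STRUCT  (1u << 4)

/*
 * Hypothetical C model of the AML And() applied to the Control dword:
 * any bit outside the mask is denied to the OS. Native PCIe hotplug
 * is granted only when enable_native_pcie_hotplug is true.
 */
static uint32_t osc_granted_control(uint32_t requested,
                                    bool enable_native_pcie_hotplug)
{
    uint32_t mask = OSC_CTRL_SHPC_NATIVE_HP | OSC_CTRL_PCIE_PME |
                    OSC_CTRL_PCIE_AER | OSC_CTRL_PCIE_CAP_STRUCT;

    if (enable_native_pcie_hotplug) {
        mask |= OSC_CTRL_PCIE_NATIVE_HP;
    }
    return requested & mask;
}
```

A guest that sees bit 0 cleared in the returned Control dword must not drive hotplug through the slot's native registers, so it falls back to the ACPI hotplug interface, even though the HPC bit stays visible to firmware.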

 
[1] 'hw/pcie-root-port: Fix hotplug for PCI devices requiring IO'
[2] https://gitlab.com/qemu-project/qemu/-/issues/641
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2006409

Igor Mammedov (2):
  pcie: rename 'native-hotplug' to 'x-native-hotplug'
  tests: bios-tables-test update expected blobs

Julia Suvorova (3):
  hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type
  bios-tables-test: Allow changes in DSDT ACPI tables
  hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC

 include/hw/acpi/ich9.h|   1 +
 hw/acpi/ich9.c|  18 ++
 hw/i386/acpi-build.c  |  12 
 hw/i386/pc.c  |   2 ++
 hw/i386/pc_q35.c  |   9 +++--
 hw/pci/pcie_port.c|   2 +-
 tests/data/acpi/q35/DSDT  | Bin 8289 -> 8289 bytes
 tests/data/acpi/q35/DSDT.acpihmat | Bin 9614 -> 9614 bytes
 tests/data/acpi/q35/DSDT.bridge   | Bin 11003 -> 11003 bytes
 tests/data/acpi/q35/DSDT.cphp | Bin 8753 -> 8753 bytes
 tests/data/acpi/q35/DSDT.dimmpxm  | Bin 9943 -> 9943 bytes
 tests/data/acpi/q35/DSDT.ipmibt   | Bin 8364 -> 8364 bytes
 tests/data/acpi/q35/DSDT.ivrs | Bin 8306 -> 8306 bytes
 tests/data/acpi/q35/DSDT.memhp| Bin 9648 -> 9648 bytes
 tests/data/acpi/q35/DSDT.mmio64   | Bin 9419 -> 9419 bytes
 tests/data/acpi/q35/DSDT.multi-bridge | Bin 8583 -> 8583 bytes
 tests/data/acpi/q35/DSDT.nohpet   | Bin 8147 -> 8147 bytes
 tests/data/acpi/q35/DSDT.numamem  | Bin 8295 -> 8295 bytes
 tests/data/acpi/q35/DSDT.tis.tpm12| Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.tis.tpm2 | Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.xapic| Bin 35652 -> 35652 bytes
 21 files changed, 37 insertions(+), 7 deletions(-)

-- 
2.27.0




Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux

2021-11-12 Thread Daniel P . Berrangé
On Fri, Nov 12, 2021 at 12:04:33PM +0100, Juan Quintela wrote:
> Leonardo Bras  wrote:
> > Add property that allows zerocopy migration of memory pages,
> > and also includes a helper function migrate_use_zerocopy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allows
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
> >
> > Signed-off-by: Leonardo Bras 
> 
> Hi
> 
> > +# @zerocopy: Controls behavior on sending memory pages on migration.
> > +#When true, enables a zerocopy mechanism for sending memory
> > +#pages, if host supports it.
> > +#Defaults to false. (Since 6.2)
> > +#
> 
> This needs to be changed to the next release, but it's not a big deal.
> 
> 
> > +#ifdef CONFIG_LINUX
> > +int migrate_use_zerocopy(void);
> 
> Please, return bool
> 
> > +#else
> > +#define migrate_use_zerocopy() (0)
> > +#endif
> 
> and false here.
> 
> I know, I know.  We are not consistent here, but the preferred way is
> to return bool.
> 
> >  int migrate_use_xbzrle(void);
> >  uint64_t migrate_xbzrle_cache_size(void);
> >  bool migrate_colo_enabled(void);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index abaf6f9e3d..add3dabc56 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> >  params->multifd_zlib_level = s->parameters.multifd_zlib_level;
> >  params->has_multifd_zstd_level = true;
> >  params->multifd_zstd_level = s->parameters.multifd_zstd_level;
> > +#ifdef CONFIG_LINUX
> > +params->has_zerocopy = true;
> > +params->zerocopy = s->parameters.zerocopy;
> > +#endif
> >  params->has_xbzrle_cache_size = true;
> >  params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
> >  params->has_max_postcopy_bandwidth = true;
> > @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> >  if (params->has_multifd_compression) {
> >  dest->multifd_compression = params->multifd_compression;
> >  }
> > +#ifdef CONFIG_LINUX
> > +if (params->has_zerocopy) {
> > +dest->zerocopy = params->zerocopy;
> > +}
> > +#endif
> >  if (params->has_xbzrle_cache_size) {
> >  dest->xbzrle_cache_size = params->xbzrle_cache_size;
> >  }
> > @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> >  if (params->has_multifd_compression) {
> >  s->parameters.multifd_compression = params->multifd_compression;
> >  }
> > +#ifdef CONFIG_LINUX
> > +if (params->has_zerocopy) {
> > +s->parameters.zerocopy = params->zerocopy;
> > +}
> > +#endif
> 
> After seeing all this CONFIG_LINUX mess, I am not sure that it is a good
> idea to add the parameter only for Linux.  It appears that it is better
> to add it for all OSes and just not allow setting it to true there.
> 
> But if the QAPI/QOM people prefer it that way, I am not going to get in
> the middle.

I don't like all the conditionals either, but QAPI design wants the
conditionals, as that allows mgmt apps to query whether the feature
is supported in a build or not.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux

2021-11-12 Thread Daniel P . Berrangé
On Fri, Nov 12, 2021 at 02:10:39AM -0300, Leonardo Bras wrote:
> Add property that allows zerocopy migration of memory pages,
> and also includes a helper function migrate_use_zerocopy() to check
> if it's enabled.
> 
> No code is introduced to actually do the migration, but it allows
> future implementations to enable/disable this feature.
> 
> On non-Linux builds this parameter is compiled-out.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  qapi/migration.json   | 18 ++
>  migration/migration.h |  5 +
>  migration/migration.c | 32 
>  migration/multifd.c   | 17 +
>  migration/socket.c|  5 +
>  monitor/hmp-cmds.c|  6 ++
>  6 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbfd48cf0b..9534c299d7 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -730,6 +730,11 @@
>  #  will consume more CPU.
>  #  Defaults to 1. (Since 5.0)
>  #
> +# @zerocopy: Controls behavior on sending memory pages on migration.
> +#When true, enables a zerocopy mechanism for sending memory
> +#pages, if host supports it.
> +#Defaults to false. (Since 6.2)

Add

   Requires that QEMU be permitted to use locked memory for guest
   RAM pages.

Also, use 7.0, since this has missed the 6.2 deadline.


Both of these notes apply to the later occurrences in this file too.
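For the locked-memory requirement, a caller could check the limit before enabling the parameter. A minimal sketch using getrlimit() with RLIMIT_MEMLOCK (available on Linux); the helper name is hypothetical, and since a process holding CAP_IPC_LOCK is normally not bound by this limit, the check errs on the conservative side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Hypothetical pre-flight check: does the soft RLIMIT_MEMLOCK allow
 * locking ram_bytes of guest RAM? Ignores CAP_IPC_LOCK, so it may
 * report false for a process that could actually lock the memory.
 */
static bool memlock_allows(uint64_t ram_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return false;              /* cannot tell: be conservative */
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return true;               /* no limit configured */
    }
    return (uint64_t)rl.rlim_cur >= ram_bytes;
}
```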



> diff --git a/migration/multifd.c b/migration/multifd.c
> index 7c9deb1921..ab8f0f97be 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -854,16 +854,17 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>  trace_multifd_new_send_channel_async(p->id);
>  if (qio_task_propagate_error(task, &local_err)) {
>  goto cleanup;
> -} else {
> -p->c = QIO_CHANNEL(sioc);
> -qio_channel_set_delay(p->c, false);
> -p->running = true;
> -if (!multifd_channel_connect(p, sioc, local_err)) {
> -goto cleanup;
> -}
> -return;
>  }
>  
> +p->c = QIO_CHANNEL(sioc);
> +qio_channel_set_delay(p->c, false);
> +p->running = true;
> +if (!multifd_channel_connect(p, sioc, local_err)) {
> +goto cleanup;
> +}
> +
> +return;
> +
>  cleanup:
>  multifd_new_send_channel_cleanup(p, sioc, local_err);
>  }

This change is just a code style alteration with no relation to
zerocopy. Either remove it, or make this change in its own patch,
separate from zerocopy.


Regards,
Daniel




Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux

2021-11-12 Thread Juan Quintela
Leonardo Bras  wrote:
> Add property that allows zerocopy migration of memory pages,
> and also includes a helper function migrate_use_zerocopy() to check
> if it's enabled.
>
> No code is introduced to actually do the migration, but it allows
> future implementations to enable/disable this feature.
>
> On non-Linux builds this parameter is compiled-out.
>
> Signed-off-by: Leonardo Bras 

Hi

> +# @zerocopy: Controls behavior on sending memory pages on migration.
> +#When true, enables a zerocopy mechanism for sending memory
> +#pages, if host supports it.
> +#Defaults to false. (Since 6.2)
> +#

This needs to be changed to the next release, but it's not a big deal.


> +#ifdef CONFIG_LINUX
> +int migrate_use_zerocopy(void);

Please, return bool

> +#else
> +#define migrate_use_zerocopy() (0)
> +#endif

and false here.

I know, I know.  We are not consistent here, but the preferred way is
to return bool.
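The suggested shape is a bool-returning helper on Linux and a constant false elsewhere. A sketch under that assumption; the static flag standing in for s->parameters.zerocopy is hypothetical and only there so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef CONFIG_LINUX
/* Hypothetical stand-in for s->parameters.zerocopy in MigrationState. */
static bool zerocopy_param;

bool migrate_use_zerocopy(void)
{
    return zerocopy_param;
}
#else
/* Parameter compiled out: zerocopy can never be enabled. */
#define migrate_use_zerocopy() (false)
#endif
```

Returning bool (and false in the stub) makes the helper's intent explicit at call sites such as `if (migrate_use_zerocopy())`, matching helpers like migrate_colo_enabled().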

>  int migrate_use_xbzrle(void);
>  uint64_t migrate_xbzrle_cache_size(void);
>  bool migrate_colo_enabled(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index abaf6f9e3d..add3dabc56 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>  params->multifd_zlib_level = s->parameters.multifd_zlib_level;
>  params->has_multifd_zstd_level = true;
>  params->multifd_zstd_level = s->parameters.multifd_zstd_level;
> +#ifdef CONFIG_LINUX
> +params->has_zerocopy = true;
> +params->zerocopy = s->parameters.zerocopy;
> +#endif
>  params->has_xbzrle_cache_size = true;
>  params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
>  params->has_max_postcopy_bandwidth = true;
> @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>  if (params->has_multifd_compression) {
>  dest->multifd_compression = params->multifd_compression;
>  }
> +#ifdef CONFIG_LINUX
> +if (params->has_zerocopy) {
> +dest->zerocopy = params->zerocopy;
> +}
> +#endif
>  if (params->has_xbzrle_cache_size) {
>  dest->xbzrle_cache_size = params->xbzrle_cache_size;
>  }
> @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>  if (params->has_multifd_compression) {
>  s->parameters.multifd_compression = params->multifd_compression;
>  }
> +#ifdef CONFIG_LINUX
> +if (params->has_zerocopy) {
> +s->parameters.zerocopy = params->zerocopy;
> +}
> +#endif

After seeing all this CONFIG_LINUX mess, I am not sure that it is a good
idea to add the parameter only for Linux.  It appears that it is better
to add it for all OSes and just not allow setting it to true there.

But if the QAPI/QOM people prefer it that way, I am not going to get in the middle.
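The alternative described here (define the parameter everywhere, reject true off Linux) could look roughly like the hypothetical check below. QEMU itself would report through Error **errp and error_setg(); a plain message pointer keeps the sketch self-contained. The trade-off, as Daniel's reply in this thread notes, is that management apps could then no longer detect support through QAPI introspection of a conditional member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical validation for an unconditional zerocopy parameter:
 * setting it to true is an error on non-Linux hosts.
 */
static bool zerocopy_param_check(bool has_zerocopy, bool zerocopy,
                                 const char **errmsg)
{
#ifndef CONFIG_LINUX
    if (has_zerocopy && zerocopy) {
        *errmsg = "zerocopy is only supported on Linux hosts";
        return false;
    }
#else
    (void)has_zerocopy;
    (void)zerocopy;
#endif
    *errmsg = NULL;
    return true;
}
```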

> diff --git a/migration/multifd.c b/migration/multifd.c
> index 7c9deb1921..ab8f0f97be 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -854,16 +854,17 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>  trace_multifd_new_send_channel_async(p->id);
>  if (qio_task_propagate_error(task, &local_err)) {
>  goto cleanup;
> -} else {
> -p->c = QIO_CHANNEL(sioc);
> -qio_channel_set_delay(p->c, false);
> -p->running = true;
> -if (!multifd_channel_connect(p, sioc, local_err)) {
> -goto cleanup;
> -}
> -return;
>  }
>  
> +p->c = QIO_CHANNEL(sioc);
> +qio_channel_set_delay(p->c, false);
> +p->running = true;
> +if (!multifd_channel_connect(p, sioc, local_err)) {
> +goto cleanup;
> +}
> +
> +return;
> +
>  cleanup:
>  multifd_new_send_channel_cleanup(p, sioc, local_err);
>  }

As far as I can see, this chunk is a no-op, and it doesn't belong in this patch.

Later, Juan.



