date:20201204

[Bug 1906905] Re: qemu-system-sparc stucked while booting using ss20_v2.25_rom

2020-12-04 Thread yapkv

I have just compiled a few versions from source:

4.1.1  worked: able to boot up with -bios ss20_v2.25.rom 
5.0.0  worked: able to boot up with -bios ss20_v2.25.rom 
5.1.0  not working. Stuck after "Power-On Reset"

SS5.bin worked for 5.1.0

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906905

Title:
  qemu-system-sparc stucked while booting using ss20_v2.25_rom

Status in QEMU:
  New

Bug description:
  I cannot boot up OBP using the current (5.1) version of qemu with
  ss20_v2.25_rom. It just gets stuck at "Power-ON Reset" and hangs. However,
  using the previous version from 2015 I can successfully boot up the
  OBP.

  qemu-system-sparc -M SS-20 -m 256 -bios ss20_v2.25.rom -nographic

  Power-ON Reset

  (*hang)

  regards
  Yap KV

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906905/+subscriptions

[Bug 1906905] [NEW] qemu-system-sparc stucked while booting using ss20_v2.25_rom

2020-12-04 Thread yapkv

Public bug reported:

I cannot boot up OBP using the current (5.1) version of qemu with
ss20_v2.25_rom. It just gets stuck at "Power-ON Reset" and hangs. However,
using the previous version from 2015 I can successfully boot up the OBP.

qemu-system-sparc -M SS-20 -m 256 -bios ss20_v2.25.rom -nographic

Power-ON Reset

(*hang)

regards
Yap KV

** Affects: qemu
 Importance: Undecided
 Status: New


[Bug 1906156] Re: Host OS Reboot Required, for Guest kext to Load (Fully)

2020-12-04 Thread Russell Morris

OK, found my issue! :-). Still a bit odd, but virt-manager complains about the
custom QEMU executable, while virsh still works. So I did get the VM running,
with:
QEMU emulator version 5.1.93 (v5.2.0-rc3)
Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers

But it still performed the same. I also checked the xml file (VM
definition), and made sure to change the machine to the most current
version (pc-q35-5.2), but also no improvement.

Other things to try?

Thanks!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906156

Title:
  Host OS Reboot Required, for Guest kext to Load (Fully)

Status in QEMU:
  Incomplete

Bug description:
  Hi,

  Finding this one a bit odd, but I am loading a driver (kext) in a
  macOS guest ... and it works, on the first VM (domain) startup after a
  full / clean host OS boot (or reboot). However, if I even reboot the
  guest OS, then the driver load fails => can be "corrected" by a full
  host OS reboot (which seems very extreme).

  Is this a known issue, and/or is there a workaround?

  FYI, running,
  QEMU emulator version 5.0.0 (Debian 1:5.0-5ubuntu9.1)
  Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers

  This is for a macOS guest, on a Linux host.

  Thanks!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906156/+subscriptions

[Bug 1906156] Re: Host OS Reboot Required, for Guest kext to Load (Fully)

2020-12-04 Thread Russell Morris

My apologies, but I'm somewhat stuck here :-(. I'm trying to run the latest
(upstream) version of QEMU, but no luck getting it to execute. I even tried
setting security_driver = "none", as captured here:
https://gitlab.com/apparmor/apparmor/-/wikis/Libvirt

But no luck. Open to any suggestions.

Thanks!


[Bug 1906193] Re: riscv32 user mode emulation: fork return values broken

2020-12-04 Thread Andreas K . Hüttel

This is the (statically linked) binary resulting from the source; with
it the problem can be demonstrated "standalone", without any other rv32
libraries or a complete chroot, just running the binary with qemu-
riscv32.

Generated with

(riscv-ilp32 chroot) farino /tmp # gcc -static -o wait-test-short -g wait-test-short.c


** Attachment added: "wait-test-short"
   
https://bugs.launchpad.net/qemu/+bug/1906193/+attachment/5441136/+files/wait-test-short

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906193

Title:
  riscv32 user mode emulation: fork return values broken

Status in QEMU:
  New

Bug description:
  When running in a chroot with riscv32 (on x86_64; qemu git master as
  of today):

  The following short program forks; the child immediately returns with
  exit(42). The parent checks for the return value - and obtains 40!

  gcc-10.2

  ===
  /*
   * The archive stripped the header names and the argument of wait();
   * both are restored here from the calls the program makes.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>

  main(c, v)
   int c;
   char **v;
  {
    pid_t pid, p;
    int s, i, n;

    s = 0;
    pid = fork();
    if (pid == 0)
      exit(42);

    /* wait for the process */
    p = wait(&s);
    if (p != pid)
      exit (255);

    if (WIFEXITED(s))
    {
       int r=WEXITSTATUS(s);
       if (r!=42) {
         printf("child wants to return %i (0x%X), parent received %i (0x%X), difference %i\n",42,42,r,r,r-42);
       }
    }
  }
  ===

  (riscv-ilp32 chroot) farino /tmp # ./wait-test-short 
  child wants to return 42 (0x2A), parent received 40 (0x28), difference -2

  ===
  (riscv-ilp32 chroot) farino /tmp # gcc --version
  gcc (Gentoo 10.2.0-r1 p2) 10.2.0
  Copyright (C) 2020 Free Software Foundation, Inc.
  Dies ist freie Software; die Kopierbedingungen stehen in den Quellen. Es
  gibt KEINE Garantie; auch nicht für MARKTGÄNGIGKEIT oder FÜR SPEZIELLE ZWECKE.

  (riscv-ilp32 chroot) farino /tmp # ld --version
  GNU ld (Gentoo 2.34 p6) 2.34.0
  Copyright (C) 2020 Free Software Foundation, Inc.
  This program is free software; you may redistribute it under the terms of
  the GNU General Public License version 3 or (at your option) a later version.
  This program has absolutely no warranty.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906193/+subscriptions
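
For comparison, the round trip that the reproducer above exercises can be sketched as a small helper. This is only an illustration, not part of the bug report: the name `fork_and_collect` is hypothetical, and the sketch assumes a POSIX host. When the exit status is passed through correctly, the helper returns exactly the code the child exited with.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with `code`, then return
 * the exit status as seen by the parent, or -1 on any failure.
 */
int fork_and_collect(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;              /* fork failed */
    }
    if (pid == 0) {
        _exit(code);            /* child: terminate immediately */
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;              /* did not reap the expected child */
    }
    if (!WIFEXITED(status)) {
        return -1;              /* child did not exit normally */
    }
    return WEXITSTATUS(status); /* low 8 bits of the child's exit code */
}
```

Running such a helper both natively and under qemu-riscv32 would be one way to narrow down where the value changes.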

Re: x86 TCG helpers clobbered registers

2020-12-04 Thread Stephane Duverger

On Fri, Dec 04, 2020 at 01:35:55PM -0600, Richard Henderson wrote:

Thank you, Richard, for your answer. I don't want to start a debate
or defend the way I did things initially; I really just want to clarify
these internals. I hope it will benefit other QEMU enthusiasts.

> You can't just inject a call anywhere you like.  If you add it at
> the IR level, then the rest of the compiler will see it and work
> properly.  If you add the call in the middle of another operation,
> the compiler doesn't get to see it and Bad Things Happen.

I do understand that, but surprisingly, isn't that what is done in the
qemu slow path? I mean, the call to the helper is not generated at the IR
level but rather injected through a 'jmp' right in the middle of the
currently generated instructions, plus code added at the end of the
TB.

What's the difference between the way it is currently done for the
slow path and something like:

static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
{
    [...]
    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                     label_ptr, offsetof(CPUTLBEntry, addr_write));

    /* TLB Hit.  */
    tcg_out_qemu_st_filter(s, opc, addrlo, addrhi, datalo, datahi);
    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);

    /* Record the current context of a store into ldst label */
    add_qemu_ldst_label(s, false, is64, oi, datalo, datahi, addrlo, addrhi,
                        s->code_ptr, label_ptr);
}

Where:
static void tcg_out_qemu_st_filter(TCGContext *s, MemOp opc,
                                   TCGReg addrlo, TCGReg addrhi,
                                   TCGReg datalo, TCGReg datahi)
{
  MemOp s_bits = opc & MO_SIZE;

  tcg_out_push(s, TCG_REG_L1); // used later on by tcg_out_qemu_st_direct

  tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
              tcg_target_call_iarg_regs[0], addrlo);

  tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
              tcg_target_call_iarg_regs[1], datalo);

  tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], opc);

  tcg_out_call(s, (void*)filter_store_memop);

  tcg_out_pop(s, TCG_REG_L1);
}

Does the ldst_label mechanism, which generates the slow path code at the
end of the TB, change something? There is still an injected 'jne' in
tcg_out_tlb_load() which redirects to the slow path code, wherever it is
located, just as I do in place for tcg_out_qemu_st_filter.

For sure the TCG is blind at some point, but it works for the slow
path, so it should for the filter. The TCG qemu_st_i32 op is

DEF(qemu_st_i32, 0, TLADDR_ARGS + 1, 1,
TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)

And as you stated, tcg_reg_alloc_op() has properly managed the
call-clobbered registers. So we should be safe calling a helper from
tcg_out_qemu_st(), and arguably that's why you do so for the slow
path?

> > I noticed that 'esp' is not shifted down before stacking up the
> > args, which might corrupt last stacked words.
> 
> No, we generate code for a constant esp, as if by gcc's
> -mno-push-args option. We have reserved TCG_STATIC_CALL_ARGS_SIZE
> bytes of stack for the arguments (which is actually larger than
> necessary for any of the tcg targets).

As this is done only in the TB prologue, do you mean that the TCG will
never generate an equivalent of a push *followed* by a memory
store/load? Will our host esp never point to a last stacked word
pushed by the translation of a TCG op?
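
To make the layout under discussion concrete, here is a toy model of deferred slow-path emission. It is not QEMU code and does not settle the question above: `ToyContext`, `toy_out_store` and `toy_finalize` are hypothetical names, and strings stand in for host instructions. Each store emits its fast path inline with a conditional jump to a label, records that label, and the slow-path bodies are emitted only once the main body is complete.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_OPS    32
#define MAX_LABELS 8

typedef struct {
    char ops[MAX_OPS][48];   /* toy code buffer: one string per emitted op */
    size_t nb_ops;
    int pending[MAX_LABELS]; /* stores whose slow path is not emitted yet */
    size_t nb_pending;
} ToyContext;

/* Append one formatted pseudo-instruction to the toy code buffer. */
static void emit(ToyContext *s, const char *fmt, int id)
{
    if (s->nb_ops < MAX_OPS) {
        snprintf(s->ops[s->nb_ops++], sizeof(s->ops[0]), fmt, id);
    }
}

/* Fast path is emitted inline; the slow path is only recorded for later. */
void toy_out_store(ToyContext *s, int id)
{
    emit(s, "tlb_compare %d", id);
    emit(s, "jne slow_%d", id);
    emit(s, "store_direct %d", id);
    if (s->nb_pending < MAX_LABELS) {
        s->pending[s->nb_pending++] = id;
    }
}

/* At the end of the block, emit every recorded slow path after the body. */
void toy_finalize(ToyContext *s)
{
    for (size_t i = 0; i < s->nb_pending; i++) {
        emit(s, "slow_%d: call store_helper", s->pending[i]);
    }
    s->nb_pending = 0;
}
```

In this model the helper call sits out of line, behind the recorded label, whereas the filter proposed above would be emitted between `tlb_compare` and `store_direct` on the fast path.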

[PATCH v4 4/5] configure,meson: support Control-Flow Integrity

2020-12-04 Thread Daniele Buono

This patch adds a flag to enable/disable control flow integrity checks
on indirect function calls.
This feature only allows indirect function calls at runtime to functions
with compatible signatures.

This feature is only provided by LLVM/Clang, and depends on link-time
optimization, which is currently supported only with LLVM/Clang >= 6.0.

We also add an option to enable a debugging version of cfi, with verbose
output in case of a CFI violation.

CFI on indirect function calls does not support calls to functions in
shared libraries (since they were not known at compile time), and such
calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
so we make modules incompatible with CFI.
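
As a rough illustration of what "compatible signature" means here, consider the sketch below. The names `binop_fn`, `add`, `negate` and `apply` are made up for this example and do not come from the patch. The call through `add` matches the pointer type; the commented-out call through `negate` is the kind of mismatch that cfi-icall is meant to trap.

```c
/* Pointer type for a binary integer operation (hypothetical example). */
typedef int (*binop_fn)(int, int);

/* Matches binop_fn exactly: an indirect call through it is compatible. */
int add(int a, int b)
{
    return a + b;
}

/* Different signature: int (*)(int), not int (*)(int, int). */
int negate(int a)
{
    return -a;
}

/* The indirect call site that a cfi-icall build would instrument. */
int apply(binop_fn fn, int a, int b)
{
    return fn(a, b);
}

/*
 * apply(add, 2, 3) is a compatible call.
 *
 * apply((binop_fn)negate, 2, 3) would call negate() through a pointer of
 * the wrong type. That is undefined behavior in C, so it is left commented
 * out here; it is the pattern cfi-icall is designed to catch.
 */
```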

All the checks are performed in meson.build. configure is only used to
forward the flags to meson.

Signed-off-by: Daniele Buono 
---
 configure | 21 -
 meson.build   | 45 +
 meson_options.txt |  4 
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index fee118518b..c4e5d92167 100755
--- a/configure
+++ b/configure
@@ -400,6 +400,8 @@ coroutine=""
 coroutine_pool=""
 debug_stack_usage="no"
 crypto_afalg="no"
+cfi="disabled"
+cfi_debug="disabled"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1180,6 +1182,16 @@ for opt do
   ;;
   --disable-safe-stack) safe_stack="no"
   ;;
+  --enable-cfi)
+  cfi="enabled";
+  lto="true";
+  ;;
+  --disable-cfi) cfi="disabled"
+  ;;
+  --enable-cfi-debug) cfi_debug="enabled"
+  ;;
+  --disable-cfi-debug) cfi_debug="disabled"
+  ;;
   --disable-curses) curses="disabled"
   ;;
   --enable-curses) curses="enabled"
@@ -1760,6 +1772,13 @@ disabled with --disable-FEATURE, default is enabled if available:
   sparse  sparse checker
   safe-stack  SafeStack Stack Smash Protection. Depends on
   clang/llvm >= 3.7 and requires coroutine backend ucontext.
+  cfi Enable Control-Flow Integrity for indirect function calls.
+  In case of a cfi violation, QEMU is terminated with SIGILL
+  Depends on lto and is incompatible with modules
+  Automatically enables Link-Time Optimization (lto)
+  cfi-debug   In case of a cfi violation, a message containing the line that
+  triggered the error is written to stderr. After the error,
+  QEMU is still terminated with SIGILL
 
   gnutls  GNUTLS cryptography support
   nettle  nettle cryptography support
@@ -7020,7 +7039,7 @@ NINJA=$ninja $meson setup \
 -Diconv=$iconv -Dcurses=$curses -Dlibudev=$libudev\
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
 -Dvhost_user_blk_server=$vhost_user_blk_server \
--Db_lto=$lto \
+-Db_lto=$lto -Dcfi=$cfi -Dcfi_debug=$cfi_debug \
 $cross_arg \
 "$PWD" "$source_path"
 
diff --git a/meson.build b/meson.build
index ebd1c690e0..e1ae6521e0 100644
--- a/meson.build
+++ b/meson.build
@@ -773,6 +773,48 @@ elif get_option('vhost_user_blk_server').disabled() or not have_system
 have_vhost_user_blk_server = false
 endif
 
+if get_option('cfi').enabled()
+  cfi_flags=[]
+  # Check for dependency on LTO
+  if not get_option('b_lto')
+    error('Selected Control-Flow Integrity but LTO is disabled')
+  endif
+  if config_host.has_key('CONFIG_MODULES')
+    error('Selected Control-Flow Integrity is not compatible with modules')
+  endif
+  # Check for cfi flags. CFI requires LTO so we can't use
+  # get_supported_arguments, but need a more complex "compiles" which allows
+  # custom arguments
+  if cc.compiles('int main () { return 0; }', name: '-fsanitize=cfi-icall',
+                 args: ['-flto', '-fsanitize=cfi-icall'] )
+    cfi_flags += '-fsanitize=cfi-icall'
+  else
+    error('-fsanitize=cfi-icall is not supported by the compiler')
+  endif
+  if cc.compiles('int main () { return 0; }',
+                 name: '-fsanitize-cfi-icall-generalize-pointers',
+                 args: ['-flto', '-fsanitize=cfi-icall',
+                        '-fsanitize-cfi-icall-generalize-pointers'] )
+    cfi_flags += '-fsanitize-cfi-icall-generalize-pointers'
+  else
+    error('-fsanitize-cfi-icall-generalize-pointers is not supported by the compiler')
+  endif
+  if get_option('cfi_debug').enabled()
+    if cc.compiles('int main () { return 0; }',
+                   name: '-fno-sanitize-trap=cfi-icall',
+                   args: ['-flto', '-fsanitize=cfi-icall',
+                          '-fno-sanitize-trap=cfi-icall'] )
+      cfi_flags += '-fno-sanitize-trap=cfi-icall'
+    else
+      error('-fno-sanitize-trap=cfi-icall is not supported by the compiler')
+    endif
+  endif
+  add_project_arguments(cfi_flags, native: false, language: ['c', 'cpp',
+                                                             'objc'])
+  add_project_link_arguments(cfi_flags, native: false, language: ['c',

[PATCH v4 5/5] docs: Add CFI Documentation

2020-12-04 Thread Daniele Buono

Document how to compile with CFI and how to maintain CFI-safe code

Signed-off-by: Daniele Buono 
---
 docs/devel/control-flow-integrity.rst | 137 ++
 1 file changed, 137 insertions(+)
 create mode 100644 docs/devel/control-flow-integrity.rst

diff --git a/docs/devel/control-flow-integrity.rst b/docs/devel/control-flow-integrity.rst
new file mode 100644
index 00..ec54d16a42
--- /dev/null
+++ b/docs/devel/control-flow-integrity.rst
@@ -0,0 +1,137 @@
+============================
+Control-Flow Integrity (CFI)
+============================
+
+This document describes the current control-flow integrity (CFI) mechanism in
+QEMU: how it can be enabled, its benefits and deficiencies, and how it affects
+new and existing code in QEMU.
+
+Basics
+------
+
+CFI is a hardening technique that focuses on guaranteeing that indirect
+function calls have not been altered by an attacker.
+The type used in QEMU is a forward-edge control-flow integrity that ensures
+that function calls performed through function pointers always call a
+"compatible" function. A compatible function is a function with the same
+signature as the function pointer declared in the source code.
+
+This type of CFI is entirely compiler-based and relies on the compiler knowing
+the signature of every function and every function pointer used in the code.
+As of now, the only compiler that provides support for CFI is Clang.
+
+CFI is best used on production binaries, to protect against unknown attack
+vectors.
+
+In case of a CFI violation (i.e. call to a non-compatible function) QEMU will
+terminate abruptly, to stop the possible attack.
+
+Building with CFI
+-----------------
+
+NOTE: CFI requires the use of link-time optimization. Therefore, when CFI is
+selected, LTO will be automatically enabled.
+
+To build with CFI, the minimum requirement is Clang 6+. If you
+are planning to also enable fuzzing, then Clang 11+ is needed (more on this
+later).
+
+Given the use of LTO, a version of AR that supports LLVM IR is required.
+The easiest way of doing this is by selecting the AR provided by LLVM::
+
+ AR=llvm-ar-9 CC=clang-9 CXX=clang++-9 /path/to/configure --enable-cfi
+
+CFI is enabled on every binary produced.
+
+If desired, an additional flag to increase the verbosity of the output in case
+of a CFI violation is offered (``--enable-cfi-debug``).
+
+Using QEMU built with CFI
+-------------------------
+
+A binary with CFI will work exactly like a standard binary. In case of a CFI
+violation, the binary will terminate with an illegal instruction signal.
+
+Incompatible code with CFI
+--------------------------
+
+As mentioned above, CFI is entirely compiler-based and therefore relies on
+compile-time knowledge of the code. This means that, while generally supported
+for most code, some specific use patterns can break CFI compatibility and
+create false positives. The two main patterns that can cause issues are:
+
+* Just-in-time compiled code: since such code is created at runtime, the jump
+  to the buffer containing JIT code will fail.
+
+* Libraries loaded dynamically, e.g. with dlopen/dlsym, since the library was
+  not known at compile time.
+
+Current areas of QEMU that are not entirely compatible with CFI are:
+
+1. TCG, since the idea of TCG is to pre-compile groups of instructions at
+   runtime to speed-up interpretation, quite similarly to a JIT compiler
+
+2. TCI, where the interpreter has to interpret the generic *call* operation
+
+3. Plugins, since a plugin is implemented as an external library
+
+4. Modules, since they are implemented as an external library
+
+5. Directly calling signal handlers from the QEMU source code, since the
+   signal handler may have been provided by an external library or even plugged
+   at runtime.
+
+Disabling CFI for a specific function
+-------------------------------------
+
+If you are working on a function that performs a call in an
+incompatible way, as described before, you can selectively disable CFI checks
+for that function by using the decorator ``QEMU_DISABLE_CFI`` at the function
+definition, and add an explanation of why the function is not compatible
+with CFI. An example of the use of ``QEMU_DISABLE_CFI`` is provided here::
+
+   /*
+    * Disable CFI checks.
+    * TCG creates binary blobs at runtime, with the transformed code.
+    * A TB is a blob of binary code, created at runtime and called with an
+    * indirect function call. Since such function did not exist at compile time,
+    * the CFI runtime has no way to verify its signature and would fail.
+    * TCG is not considered a security-sensitive part of QEMU so this does not
+    * affect the impact of CFI in environment with high security requirements
+    */
+   QEMU_DISABLE_CFI
+   static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
+
+NOTE: CFI needs to be disabled at the **caller** function, (i.e. a compatible
+cfi

[PATCH v4 1/5] configure,meson: add option to enable LTO

2020-12-04 Thread Daniele Buono

This patch allows compiling QEMU with link-time optimization (LTO).
Compilation with LTO is handled directly by meson. This patch only
adds the option in configure and forwards the request to meson.

Tested with all major versions of clang from 6 to 12.

Signed-off-by: Daniele Buono 
---
 configure   | 7 +++
 meson.build | 1 +
 2 files changed, 8 insertions(+)

diff --git a/configure b/configure
index 18c26e0389..fee118518b 100755
--- a/configure
+++ b/configure
@@ -242,6 +242,7 @@ host_cc="cc"
 audio_win_int=""
 libs_qga=""
 debug_info="yes"
+lto="false"
 stack_protector=""
 safe_stack=""
 use_containers="yes"
@@ -1167,6 +1168,10 @@ for opt do
   ;;
   --disable-werror) werror="no"
   ;;
+  --enable-lto) lto="true"
+  ;;
+  --disable-lto) lto="false"
+  ;;
   --enable-stack-protector) stack_protector="yes"
   ;;
   --disable-stack-protector) stack_protector="no"
@@ -1751,6 +1756,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   module-upgrades try to load modules from alternate paths for upgrades
   debug-tcg   TCG debugging (default is disabled)
   debug-info  debugging information
+  lto Enable Link-Time Optimization.
   sparse  sparse checker
   safe-stack  SafeStack Stack Smash Protection. Depends on
   clang/llvm >= 3.7 and requires coroutine backend ucontext.
@@ -7014,6 +7020,7 @@ NINJA=$ninja $meson setup \
 -Diconv=$iconv -Dcurses=$curses -Dlibudev=$libudev\
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
 -Dvhost_user_blk_server=$vhost_user_blk_server \
+-Db_lto=$lto \
 $cross_arg \
 "$PWD" "$source_path"
 
diff --git a/meson.build b/meson.build
index e3386196ba..ebd1c690e0 100644
--- a/meson.build
+++ b/meson.build
@@ -2044,6 +2044,7 @@ summary_info += {'gprof enabled': config_host.has_key('CONFIG_GPROF')}
 summary_info += {'sparse enabled':sparse.found()}
 summary_info += {'strip binaries':get_option('strip')}
 summary_info += {'profiler':  config_host.has_key('CONFIG_PROFILER')}
+summary_info += {'link-time optimization (LTO)': get_option('b_lto')}
 summary_info += {'static build':  config_host.has_key('CONFIG_STATIC')}
 if targetos == 'darwin'
   summary_info += {'Cocoa support': config_host.has_key('CONFIG_COCOA')}
-- 
2.17.1

[PATCH v4 3/5] check-block: enable iotests with cfi-icall

2020-12-04 Thread Daniele Buono

cfi-icall is a form of Control-Flow Integrity for indirect function
calls implemented by llvm. It is enabled with a -fsanitize flag.

iotests are currently disabled when any -fsanitize option is used, with the
exception of SafeStack.

This patch implements a generic filtering mechanism to allow iotests
with a set of known-to-be-safe -fsanitize options. It then marks SafeStack
and the new options used for cfi-icall as safe for iotests.
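
The filtering rule can be sketched in C as follows. This is only a model of the shell logic in the diff below, under the assumption that a flag counts as allowed when it contains one of the allowed names as a substring (which is what the `grep -q` test does); `has_disallowed_sanitizer` and `token_contains` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sanitizer names considered safe for iotests, as in the patch. */
static const char *const allowed_sanitizers[] = { "safe-stack", "cfi-icall" };

/* Return true if the token [tok, tok + len) contains `needle`. */
static bool token_contains(const char *tok, size_t len, const char *needle)
{
    size_t nlen = strlen(needle);

    for (size_t i = 0; i + nlen <= len; i++) {
        if (memcmp(tok + i, needle, nlen) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Return true if `cflags` holds a -fsanitize flag that does not mention
 * any allowed sanitizer, i.e. the case where the tests must be skipped.
 */
bool has_disallowed_sanitizer(const char *cflags)
{
    const size_t nb_allowed =
        sizeof(allowed_sanitizers) / sizeof(allowed_sanitizers[0]);
    const char *p = cflags;

    while (*p) {
        p += strspn(p, " \t\n");            /* skip separators */
        size_t len = strcspn(p, " \t\n");   /* length of this flag */
        if (len == 0) {
            break;
        }

        bool allowed = false;
        for (size_t k = 0; k < nb_allowed; k++) {
            if (token_contains(p, len, allowed_sanitizers[k])) {
                allowed = true;
                break;
            }
        }
        if (!allowed && token_contains(p, len, "-fsanitize")) {
            return true;
        }
        p += len;
    }
    return false;
}
```

Because the match is by substring, auxiliary flags such as `-fsanitize-cfi-icall-generalize-pointers` are filtered out together with `-fsanitize=cfi-icall`.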

Signed-off-by: Daniele Buono 
---
 tests/check-block.sh | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tests/check-block.sh b/tests/check-block.sh
index f6b1bda7b9..fb4c1baae9 100755
--- a/tests/check-block.sh
+++ b/tests/check-block.sh
@@ -21,14 +21,18 @@ if grep -q "CONFIG_GPROF=y" config-host.mak 2>/dev/null ; then
 exit 0
 fi
 
-# Disable tests with any sanitizer except for SafeStack
-CFLAGS=$( grep "CFLAGS.*-fsanitize" config-host.mak 2>/dev/null )
-SANITIZE_FLAGS=""
-#Remove all occurrencies of -fsanitize=safe-stack
-for i in ${CFLAGS}; do
-    if [ "${i}" != "-fsanitize=safe-stack" ]; then
-        SANITIZE_FLAGS="${SANITIZE_FLAGS} ${i}"
+# Disable tests with any sanitizer except for specific ones
+SANITIZE_FLAGS=$( grep "CFLAGS.*-fsanitize" config-host.mak 2>/dev/null )
+ALLOWED_SANITIZE_FLAGS="safe-stack cfi-icall"
+#Remove all occurrences of allowed Sanitize flags
+for j in ${ALLOWED_SANITIZE_FLAGS}; do
+    TMP_FLAGS=${SANITIZE_FLAGS}
+    SANITIZE_FLAGS=""
+    for i in ${TMP_FLAGS}; do
+        if ! echo ${i} | grep -q "${j}" 2>/dev/null; then
+            SANITIZE_FLAGS="${SANITIZE_FLAGS} ${i}"
     fi
+    done
 done
 if echo ${SANITIZE_FLAGS} | grep -q "\-fsanitize" 2>/dev/null; then
     # Have a sanitize flag that is not allowed, stop
-- 
2.17.1

[PATCH v4 2/5] cfi: Initial support for cfi-icall in QEMU

2020-12-04 Thread Daniele Buono

LLVM/Clang supports runtime checks for forward-edge Control-Flow
Integrity (CFI).

CFI on indirect function calls (cfi-icall) ensures that, in indirect
function calls, the function called is of the right signature for the
pointer type defined at compile time.

For this check to work, the code must always respect the function
signature when using function pointers, the function must be defined
at compile time, and it must be compiled with link-time optimization.

This rules out, for example, shared libraries that are dynamically loaded
(given that functions are not known at compile time), and code that is
dynamically generated at run-time.

This patch:

1) Introduces the CONFIG_CFI flag to support cfi in QEMU

2) Introduces a decorator to allow the definition of "sensitive"
functions, where a non-instrumented function may be called at runtime
through a pointer. The decorator will take care of disabling cfi-icall
checks on such functions, when cfi is enabled.

3) Marks functions currently in QEMU that exhibit such behavior,
in particular:
- The function in TCG that calls pre-compiled TBs
- The function in TCI that interprets instructions
- Functions in the plugin infrastructures that jump to callbacks
- Functions in util that directly call a signal handler
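
A minimal sketch of how the decorator from point 2 is meant to be applied, assuming the macro definition shown in the include/qemu/compiler.h hunk below. The callback type `event_cb` and the functions `dispatch_external_cb` and `count_event` are hypothetical and only stand in for a function that calls a pointer it cannot vouch for.

```c
#include <stddef.h>

/* Same shape as the definition added to include/qemu/compiler.h. */
#ifdef CONFIG_CFI
#define QEMU_DISABLE_CFI __attribute__((no_sanitize("cfi-icall")))
#else
#define QEMU_DISABLE_CFI
#endif

/* Hypothetical callback type, e.g. for a callback registered at runtime. */
typedef void (*event_cb)(int event, void *opaque);

/*
 * Disable CFI checks.
 * In this example the callback may come from code that was not visible
 * at compile time, so its signature cannot be verified by cfi-icall.
 */
QEMU_DISABLE_CFI
void dispatch_external_cb(event_cb cb, int event, void *opaque)
{
    if (cb != NULL) {
        cb(event, opaque);   /* the indirect call exempted from the check */
    }
}

/* Example callback: accumulate the event number into an int counter. */
void count_event(int event, void *opaque)
{
    *(int *)opaque += event;
}
```

The attribute goes on the function that performs the indirect call, not on the callee. Without CONFIG_CFI the macro expands to nothing, so the code builds unchanged.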

Signed-off-by: Daniele Buono 
Acked-by: Alex Bennée
[...]
env_ptr;
diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index c76281f354..c87c242063 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -243,4 +243,16 @@ extern void QEMU_NORETURN QEMU_ERROR("code path is reachable")
 #define qemu_build_not_reached()  g_assert_not_reached()
 #endif
 
+#ifdef CONFIG_CFI
+/*
+ * If CFI is enabled, use an attribute to disable cfi-icall on the following
+ * function
+ */
+#define QEMU_DISABLE_CFI __attribute__((no_sanitize("cfi-icall")))
+#else
+/* If CFI is not enabled, use an empty define to not change the behavior */
+#define QEMU_DISABLE_CFI
+#endif
+
+
 #endif /* COMPILER_H */
diff --git a/plugins/core.c b/plugins/core.c
index 51bfc94787..87b823bbc4 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -31,6 +31,7 @@
 #include "tcg/tcg-op.h"
 #include "trace/mem-internal.h" /* mem_info macros */
 #include "plugin.h"
+#include "qemu/compiler.h"
 
 struct qemu_plugin_cb {
 struct qemu_plugin_ctx *ctx;
@@ -90,6 +91,12 @@ void plugin_unregister_cb__locked(struct qemu_plugin_ctx *ctx,
 }
 }
 
+/*
+ * Disable CFI checks.
+ * The callback function has been loaded from an external library so we do not
+ * have type information
+ */
+QEMU_DISABLE_CFI
 static void plugin_vcpu_cb__simple(CPUState *cpu, enum qemu_plugin_event ev)
 {
 struct qemu_plugin_cb *cb, *next;
@@ -111,6 +118,12 @@ static void plugin_vcpu_cb__simple(CPUState *cpu, enum qemu_plugin_event ev)
 }
 }
 
+/*
+ * Disable CFI checks.
+ * The callback function has been loaded from an external library so we do not
+ * have type information
+ */
+QEMU_DISABLE_CFI
 static void plugin_cb__simple(enum qemu_plugin_event ev)
 {
 struct qemu_plugin_cb *cb, *next;
@@ -128,6 +141,12 @@ static void plugin_cb__simple(enum qemu_plugin_event ev)
 }
 }
 
+/*
+ * Disable CFI checks.
+ * The callback function has been loaded from an external library so we do not
+ * have type information
+ */
+QEMU_DISABLE_CFI
 static void plugin_cb__udata(enum qemu_plugin_event ev)
 {
 struct qemu_plugin_cb *cb, *next;
@@ -325,6 +344,12 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
 dyn_cb->f.generic = cb;
 }
 
+/*
+ * Disable CFI checks.
+ * The callback function has been loaded from an external library so we do not
+ * have type information
+ */
+QEMU_DISABLE_CFI
 void qemu_plugin_tb_trans_cb(CPUState *cpu, struct qemu_plugin_tb *tb)
 {
 struct qemu_plugin_cb *cb, *next;
@@ -339,6 +364,12 @@ void qemu_plugin_tb_trans_cb(CPUState *cpu, struct qemu_plugin_tb *tb)
 }
 }
 
+/*
+ * Disable CFI checks.
+ * The callback function has been loaded from an external library so we do not
+ * have type information
+ */
+QEMU_DISABLE_CFI
 void
 qemu_plugin_vcpu_syscall(CPUState *cpu, int64_t num, uint64_t a1, uint64_t a2,
  uint64_t a3, uint64_t a4, uint64_t a5,
@@ -358,6 +389,12 @@ qemu_plugin_vcpu_syscall(CPUState *cpu, int64_t num, uint64_t a1, uint64_t a2,
 }
 }
 
+/*
+ * Disable CFI checks.
+ * The callback function has been loaded from an external library so we do not
+ * have type information
+ */
+QEMU_DISABLE_CFI
 void qemu_plugin_vcpu_syscall_ret(CPUState *cpu, int64_t num, int64_t ret)
 {
 struct qemu_plugin_cb *cb, *next;
diff --git a/plugins/loader.c b/plugins/loader.c
index 8ac5dbc20f..fd491961de 100644
--- a/plugins/loader.c
+++ b/plugins/loader.c
@@ -32,6 +32,7 @@
 #ifndef CONFIG_USER_ONLY
 #include "hw/boards.h"
 #endif
+#include "qemu/compiler.h"
 
 #include "plugin.h"
 
@@ -150,6 +151,12 @@ static uint64_t xorshift64star(uint64_t x)
 return x * UINT64_C(2685821657736338717);
 }
 
+/*
+ * Disable CFI checks.
+ * The install and version

[PATCH v4 0/5] Add support for Control-Flow Integrity

2020-12-04 Thread Daniele Buono

This series adds support for Control-Flow Integrity checks
on indirect function calls.

It requires the use of clang and link-time optimization.

Since it's been a month, and some of the patches are being
merged independently, I thought of rebasing, retesting
and sending an updated version. I also added documentation
in docs/devel to explain CFI and how to handle CFI-sensitive
code.

Changes in v4:
- Removed patches to avoid clang warnings, since they are
being merged independently and are not really necessary
for CFI
- Added documentation in docs/devel to explain how to
compile with CFI, and how to disable CFI for incompatible
functions

Changes in v3:

- clang 11+ warnings are now handled directly at the source,
instead of disabling specific warnings for the whole code.
Some more work may be needed here to polish the patch, I
would kindly ask for a review from the corresponding
maintainers
- Remove configure-time checks for toolchain compatibility
with LTO.
- the decorator to disable cfi checks on functions has
been renamed and moved to include/qemu/compiler.h
- configure-time checks for cfi support and dependencies
have been moved from configure to meson

Link to v3: https://www.mail-archive.com/qemu-devel@nongnu.org/msg757930.html
Link to v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg753675.html
Link to v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg718786.html

Daniele Buono (5):
  configure,meson: add option to enable LTO
  cfi: Initial support for cfi-icall in QEMU
  check-block: enable iotests with cfi-icall
  configure,meson: support Control-Flow Integrity
  docs: Add CFI Documentation

 accel/tcg/cpu-exec.c  |  11 +++
 configure |  26 +
 docs/devel/control-flow-integrity.rst | 137 ++
 include/qemu/compiler.h   |  12 +++
 meson.build   |  46 +
 meson_options.txt |   4 +
 plugins/core.c|  37 +++
 plugins/loader.c  |   7 ++
 tcg/tci.c |   7 ++
 tests/check-block.sh  |  18 ++--
 util/main-loop.c  |  11 +++
 util/oslib-posix.c|  11 +++
 12 files changed, 320 insertions(+), 7 deletions(-)
 create mode 100644 docs/devel/control-flow-integrity.rst

-- 
2.17.1

[PATCH] target/mips: Simplify gen_msa_BxZ() 'if' condition

2020-12-04 Thread Philippe Mathieu-Daudé

As gen_check_zero_element() already produces a boolean,
replace 'if (x) tcg_gen_setcondi_tl()' with tcg_gen_xori_tl(x),
which already contains the if (x).
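
The identity behind this change can be checked in plain C. The sketch below is not TCG code; `branchy` and `branch_free` are hypothetical names that mirror the old and new forms, and the equivalence holds only because the flag is already 0 or 1.

```c
#include <stdbool.h>

/* Old form: invert the 0/1 flag only when if_not is set. */
unsigned branchy(unsigned flag, bool if_not)
{
    if (if_not) {
        flag = (flag == 0);
    }
    return flag;
}

/* New form: XOR with if_not flips a 0/1 flag exactly when if_not is 1. */
unsigned branch_free(unsigned flag, bool if_not)
{
    return flag ^ (unsigned)if_not;
}
```

For a flag that can take values other than 0 and 1 the two forms differ, which is why the boolean result of gen_check_zero_element() matters.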

Suggested-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: <20201202184415.1434484-1-f4...@amsat.org>
---
 target/mips/translate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 8a35d4d0d03..112a5becfbb 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28778,9 +28778,7 @@ static bool gen_msa_BxZ(DisasContext *ctx, int df, int wt, int s16, bool if_not)
 }
 
 gen_check_zero_element(bcond, df, wt);
-if (if_not) {
-tcg_gen_setcondi_tl(TCG_COND_EQ, bcond, bcond, 0);
-}
+tcg_gen_xori_tl(bcond, bcond, if_not);
 
 ctx->btarget = ctx->base.pc_next + (s16 << 2) + 4;
 ctx->hflags |= MIPS_HFLAG_BC;
-- 
2.26.2
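
The equivalence this patch relies on can be checked on plain host integers. The sketch below is not TCG code: the function names are made up for illustration, and it assumes, as the commit message states, that the input is already a 0/1 boolean. For any other input the two forms differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branchy form: invert a 0/1 value only when if_not is set. */
uint32_t invert_with_branch(uint32_t bcond, bool if_not)
{
    if (if_not) {
        bcond = (bcond == 0);
    }
    return bcond;
}

/* Branchless form: XOR with 0 keeps the value, XOR with 1 flips it. */
uint32_t invert_with_xor(uint32_t bcond, bool if_not)
{
    return bcond ^ (uint32_t)if_not;
}

/* Compare both forms over the whole 0/1 input space. */
void check_equivalence(void)
{
    for (uint32_t bcond = 0; bcond <= 1; bcond++) {
        assert(invert_with_branch(bcond, false) ==
               invert_with_xor(bcond, false));
        assert(invert_with_branch(bcond, true) ==
               invert_with_xor(bcond, true));
    }
}
```

Calling check_equivalence() from a test program exercises all four input combinations.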

Re: [PATCH v2 7/7] qapi: More complex uses of QAPI_LIST_APPEND

2020-12-04 Thread Eric Blake

On 11/19/20 2:50 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> These cases require a bit more thought to review; in each case, the
>> code was appending to a list, but not with a FOOList **tail variable.

>> +++ b/hw/core/machine-qmp-cmds.c
> [...]
>> @@ -294,41 +281,31 @@ void qmp_set_numa_node(NumaOptions *cmd, Error **errp)
>>  static int query_memdev(Object *obj, void *opaque)

>>  v = qobject_input_visitor_new(host_nodes);
>> -visit_type_uint16List(v, NULL, &m->value->host_nodes, &error_abort);
>> +visit_type_uint16List(v, NULL, &m->host_nodes, &error_abort);
>>  visit_free(v);
>>  qobject_unref(host_nodes);
>>
>> -m->next = *list;
>> -*list = m;
>> +QAPI_LIST_APPEND(list, m);
> 
> The old code prepends, doesn't it?

Good catch, will correct and hoist this into 4/7 for v3.

> 
>>  }
>>
>>  return 0;
>> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
>> index cf0627fd01c1..1afcc29a0649 100644
>> --- a/hw/mem/memory-device.c
>> +++ b/hw/mem/memory-device.c
>> @@ -199,7 +199,7 @@ out:
>>  MemoryDeviceInfoList *qmp_memory_device_list(void)
>>  {
>>  GSList *devices = NULL, *item;
>> -MemoryDeviceInfoList *list = NULL, *prev = NULL;
>> +MemoryDeviceInfoList *list = NULL, **prev = &list;
> 
> Here, you reuse the old name for the new variable.

>> +++ b/hw/pci/pci.c
>> @@ -1681,41 +1681,34 @@ static PciDeviceInfoList *qmp_query_pci_devices(PCIBus *bus, int bus_num);
>>
>>  static PciMemoryRegionList *qmp_query_pci_regions(const PCIDevice *dev)
>>  {
>> -PciMemoryRegionList *head = NULL, *cur_item = NULL;
>> +PciMemoryRegionList *head = NULL, **tail = &head;
> 
> Here, you use a new and better name.
> 
> I'd like to encourage you to name tail pointer variables @tail
> elsewhere, too.

In v3, I will consistently rename the FOOList ** variable 'tail'.

>> @@ -2863,7 +2846,6 @@ qmp_guest_set_memory_blocks(GuestMemoryBlockList *mem_blks, Error **errp)
>>
>>  while (mem_blks != NULL) {
>>  GuestMemoryBlockResponse *result;
>> -GuestMemoryBlockResponseList *entry;
>>  GuestMemoryBlock *current_mem_blk = mem_blks->value;
>>
>>  result = g_malloc0(sizeof(*result));
>> @@ -2872,11 +2854,7 @@ qmp_guest_set_memory_blocks(GuestMemoryBlockList *mem_blks, Error **errp)
>>  if (local_err) { /* should never happen */
>>  goto err;
>>  }
>> -entry = g_malloc0(sizeof *entry);
>> -entry->value = result;
>> -
>> -*link = entry;
>> -link = &entry->next;
>> +QAPI_LIST_APPEND(link, result);
>>  mem_blks = mem_blks->next;
>>  }
>>
> 
> This one looks like a candidate for PATCH 6.

Yes.  Will hoist.

v3 will be posted soon.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
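
The prepend/append distinction raised in this review can be sketched with a stand-alone list. IntList and the int_list_* helpers below are hypothetical names chosen for illustration; they only mimic the shape of a generated QAPI list type and are not the QAPI_LIST_APPEND macro itself. The point is that appending through a tail pointer preserves insertion order, while the old "m->next = *list; *list = m;" pattern reverses it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node, shaped like a generated FOOList. */
typedef struct IntList {
    struct IntList *next;
    int value;
} IntList;

/* Prepend: the new node becomes the head, so order is reversed. */
void int_list_prepend(IntList **head, int value)
{
    IntList *node = calloc(1, sizeof(*node));

    assert(node != NULL);
    node->value = value;
    node->next = *head;
    *head = node;
}

/*
 * Append through a tail pointer: *tail is the slot where the next
 * node belongs. Returns the new tail slot for the following append.
 */
IntList **int_list_append(IntList **tail, int value)
{
    IntList *node = calloc(1, sizeof(*node));

    assert(node != NULL);
    node->value = value;
    *tail = node;
    return &node->next;
}

/* Release every node of the list. */
void int_list_free(IntList *head)
{
    while (head != NULL) {
        IntList *next = head->next;

        free(head);
        head = next;
    }
}
```

A caller starts with "IntList *head = NULL, **tail = &head;" and reassigns tail from each int_list_append() call, which is the same role the 'tail' variable plays in the patches discussed above.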

Re: [PATCH 9/9] target/mips: Explode gen_msa_branch() as gen_msa_BxZ_V/BxZ()

2020-12-04 Thread Philippe Mathieu-Daudé

On 12/4/20 6:04 PM, Richard Henderson wrote:
> On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
>> +static bool gen_msa_BxZ(DisasContext *ctx, int df, int wt, int s16, bool if_not)
>> +{
>> +check_msa_access(ctx);
>> +
>> +if (ctx->hflags & MIPS_HFLAG_BMASK) {
>> +generate_exception_end(ctx, EXCP_RI);
>> +return true;
>> +}
>> +
>> +gen_check_zero_element(bcond, df, wt);
>> +if (if_not) {
>> +tcg_gen_setcondi_tl(TCG_COND_EQ, bcond, bcond, 0);
>> +}
> 
> Since gen_check_zero_element already produces a boolean, this is better as
> 
>   tcg_gen_xori_tl(bcond, bcond, if_not);
> 
> where tcg_gen_xori_tl already contains the if.

Ah, got it.

> 
>>  case OPC_BNZ_D:
>> -gen_check_zero_element(bcond, df, wt);
>> -tcg_gen_setcondi_tl(TCG_COND_EQ, bcond, bcond, 0);
>> +gen_msa_BxZ(ctx, df, wt, s16, true);
> 
> ... oops, that'd be for a follow-up patch, to make this patch just code movement.

Yes, will follow. I'm tempted to inline gen_check_zero_element (actually
move gen_msa_BxZ as gen_check_zero_element prologue/epilogue). Not sure
gen_check_zero_element() can be reused later though.

> 
> Reviewed-by: Richard Henderson 

Thanks!

> 
> r~
>

Re: [PATCH 6/9] target/mips: Alias MSA vector registers on FPU scalar registers

2020-12-04 Thread Philippe Mathieu-Daudé

On 12/4/20 5:28 PM, Richard Henderson wrote:
> On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
>> Commits 863f264d10f ("add msa_reset(), global msa register") and
>> cb269f273fd ("fix multiple TCG registers covering same data")
>> removed the FPU scalar registers and replaced them by aliases to
>> the MSA vector registers.
>> While this might be the case for CPU implementing MSA, this makes
>> QEMU code incoherent for CPU not implementing it. It is simpler
>> to inverse the logic and alias the MSA vector registers on the
>> FPU scalar ones.
> 
> How does it make things incoherent?  I'm missing how the logic has actually
> changed, as opposed to an order of assignments.

I guess my wording isn't clear.

By "incoherent" I mean it is odd to disable MSA and still have
FPU registers displayed with MSA register names, instead of their
proper FPU names.

The MIPS ISA represents the ASEs as onion rings that extend an ISA.
I'd like to model it that way, with each ASE optional (so that we can
even leave it out of the build).
You can have a CPU without FPU, a CPU with FPU, or a CPU with MSA
(which implicitly has an FPU). If the FPU depends on MSA, we cannot
take the MSA implementation out of the equation.

Back to the patch, instead of aliasing FPU registers to the MSA ones
(even when MSA is absent), we now alias the MSA ones to the FPU ones
(only when MSA is present). This is what I call the "inverted logic".

BTW the point of this change is simply to be able to extract the MSA
code out of the huge translate.c.

Regards,

Phil.

[PATCH 3/5] target/mips: Do not initialize MT registers if MT ASE absent

2020-12-04 Thread Philippe Mathieu-Daudé

Do not initialize MT-related config registers if the MT ASE
is not present. As some functions access the 'mvp' structure,
we still zero-allocate it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate_init.c.inc | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/mips/translate_init.c.inc b/target/mips/translate_init.c.inc
index 5a926bc6df3..f72fee3b40a 100644
--- a/target/mips/translate_init.c.inc
+++ b/target/mips/translate_init.c.inc
@@ -993,6 +993,10 @@ static void mvp_init(CPUMIPSState *env)
 {
 env->mvp = g_malloc0(sizeof(CPUMIPSMVPContext));
 
+if (!ase_mt_available(env)) {
+return;
+}
+
 /* MVPConf1 implemented, TLB sharable, no gating storage support,
programmable cache partitioning implemented, number of allocatable
and shareable TLB entries, MVP has allocatable TCs, 2 VPEs
-- 
2.26.2

[PATCH 2/5] target/mips: Introduce ase_mt_available() helper

2020-12-04 Thread Philippe Mathieu-Daudé

Instead of accessing CP0_Config3 directly and checking
the 'Multi-Threading Present' bit, introduce a helper
to simplify code review.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/cpu.h| 7 +++
 hw/mips/cps.c| 3 +--
 target/mips/cp0_helper.c | 2 +-
 target/mips/cpu.c| 2 +-
 target/mips/helper.c | 2 +-
 target/mips/translate.c  | 2 +-
 6 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index 2639b0ea06c..82c60a34751 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -1289,6 +1289,13 @@ int cpu_mips_signal_handler(int host_signum, void *pinfo, void *puc);
 
 bool cpu_supports_cps_smp(const char *cpu_type);
 bool cpu_supports_isa(const char *cpu_type, uint64_t isa);
+
+/* Check presence of multi-threading ASE implementation */
+static inline bool ase_mt_available(CPUMIPSState *env)
+{
+return env->CP0_Config3 & (1 << CP0C3_MT);
+}
+
 void cpu_set_exception_base(int vp_index, target_ulong address);
 
 /* mips_int.c */
diff --git a/hw/mips/cps.c b/hw/mips/cps.c
index 962b1b0b87c..7a0d289efaf 100644
--- a/hw/mips/cps.c
+++ b/hw/mips/cps.c
@@ -58,8 +58,7 @@ static void main_cpu_reset(void *opaque)
 
 static bool cpu_mips_itu_supported(CPUMIPSState *env)
 {
-bool is_mt = (env->CP0_Config5 & (1 << CP0C5_VP)) ||
- (env->CP0_Config3 & (1 << CP0C3_MT));
+bool is_mt = (env->CP0_Config5 & (1 << CP0C5_VP)) || ase_mt_available(env);
 
 return is_mt && !kvm_enabled();
 }
diff --git a/target/mips/cp0_helper.c b/target/mips/cp0_helper.c
index caaaefcc8ad..9718c93d18c 100644
--- a/target/mips/cp0_helper.c
+++ b/target/mips/cp0_helper.c
@@ -1166,7 +1166,7 @@ void helper_mtc0_entryhi(CPUMIPSState *env, target_ulong arg1)
 old = env->CP0_EntryHi;
 val = (arg1 & mask) | (old & ~mask);
 env->CP0_EntryHi = val;
-if (env->CP0_Config3 & (1 << CP0C3_MT)) {
+if (ase_mt_available(env)) {
 sync_c0_entryhi(env, env->current_tc);
 }
 /* If the ASID changes, flush qemu's TLB.  */
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index 76d50b00b42..c03e5acf5bc 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -74,7 +74,7 @@ static bool mips_cpu_has_work(CPUState *cs)
 }
 
 /* MIPS-MT has the ability to halt the CPU.  */
-if (env->CP0_Config3 & (1 << CP0C3_MT)) {
+if (ase_mt_available(env)) {
 /*
  * The QEMU model will issue an _WAKE request whenever the CPUs
  * should be woken up.
diff --git a/target/mips/helper.c b/target/mips/helper.c
index cc46ea887e5..608fe1512a3 100644
--- a/target/mips/helper.c
+++ b/target/mips/helper.c
@@ -419,7 +419,7 @@ void cpu_mips_store_status(CPUMIPSState *env, target_ulong val)
 tlb_flush(env_cpu(env));
 }
 #endif
-if (env->CP0_Config3 & (1 << CP0C3_MT)) {
+if (ase_mt_available(env)) {
 sync_c0_status(env, env, env->current_tc);
 } else {
 compute_hflags(env);
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 0db032fc5fb..ee45dce9a50 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -31921,7 +31921,7 @@ void cpu_state_reset(CPUMIPSState *env)
 
 cpu_mips_store_count(env, 1);
 
-if (env->CP0_Config3 & (1 << CP0C3_MT)) {
+if (ase_mt_available(env)) {
 int i;
 
 /* Only TC0 on VPE 0 starts as active.  */
-- 
2.26.2
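
As a stand-alone illustration of the helper pattern introduced here, the sketch below wraps a single-bit test behind a named predicate. SketchCPUState, SKETCH_C3_MT and the bit position 2 are placeholders invented for this example, not the real CPUMIPSState layout or the real CP0C3_MT value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit position, assumed for illustration only. */
#define SKETCH_C3_MT 2

/* Minimal stand-in for the CPU state; only the field we test. */
typedef struct SketchCPUState {
    uint32_t config3;
} SketchCPUState;

/*
 * Named predicate for "is the MT ASE present?". Returning bool
 * normalizes any non-zero masked value to true.
 */
bool sketch_ase_mt_available(const SketchCPUState *env)
{
    assert(env != NULL);
    return env->config3 & (1u << SKETCH_C3_MT);
}
```

Callers then write "if (sketch_ase_mt_available(env))" instead of repeating the mask expression, which is the readability gain the commit message refers to.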

Re: [PATCH] block/nvme: Do not allow image creation with NVMe block driver

2020-12-04 Thread Philippe Mathieu-Daudé

On 12/4/20 5:57 PM, Philippe Mathieu-Daudé wrote:
> The NVMe driver does not support image creation.
> The full drive has to be passed to the guest.
> 
> Before:
> 
>   $ qemu-img create -f raw nvme://:04:00.0/1 20G
>   Formatting 'nvme://:04:00.0/1', fmt=raw size=21474836480
> 
>   $ qemu-img info nvme://:04:00.0/1
>   image: nvme://:04:00.0/1
>   file format: raw
>   virtual size: 349 GiB (375083606016 bytes)
>   disk size: unavailable
> 
> After:
> 
>   $ qemu-img create -f raw nvme://:04:00.0/1 20G
>   qemu-img: nvme://:04:00.0/1: Protocol driver 'nvme' does not support image creation
> 
> Fixes: 5a5e7f8cd86 ("block: trickle down the fallback image creation function use to the block drivers")
> Reported-by: Xueqiang Wei 
> Suggested-by: Max Reitz 

Well Max didn't suggest the change but pointed me to commit 5a5e7f8cd86.

> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Maxim Levitsky 
> ---
>  block/nvme.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index a06a188d530..73ddf837c2b 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -1515,9 +1515,6 @@ static BlockDriver bdrv_nvme = {
>  .protocol_name= "nvme",
>  .instance_size= sizeof(BDRVNVMeState),
>  
> -.bdrv_co_create_opts  = bdrv_co_create_opts_simple,
> -.create_opts  = &bdrv_create_opts_simple,
> -
>  .bdrv_parse_filename  = nvme_parse_filename,
>  .bdrv_file_open   = nvme_file_open,
>  .bdrv_close   = nvme_close,
>

[PATCH 5/5] hw/mips/malta: Rewrite CP0_MVPConf0 access using deposit()

2020-12-04 Thread Philippe Mathieu-Daudé

PTC field has 8 bits, PVPE has 4. We plan to use the
"hw/registerfields.h" API with MIPS CPU definitions
(target/mips/cpu.h). Meanwhile we use magic 8 and 4.

Signed-off-by: Philippe Mathieu-Daudé 
---
We want to move that to mips_cpu_reset() later,
because this is not Malta specific but cpu-specific.
However SMP 'cpus' come from MachineState ("hw/boards.h").
So meanwhile this is early review.
---
 hw/mips/malta.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index 350b92b4d79..c35fbf97272 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -24,6 +24,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "qemu/bitops.h"
 #include "qemu-common.h"
 #include "cpu.h"
 #include "hw/clock.h"
@@ -1135,8 +1136,11 @@ static void malta_mips_config(MIPSCPU *cpu)
 CPUState *cs = CPU(cpu);
 
 if (ase_mt_available(env)) {
-env->mvp->CP0_MVPConf0 |= ((smp_cpus - 1) << CP0MVPC0_PVPE) |
- ((smp_cpus * cs->nr_threads - 1) << CP0MVPC0_PTC);
+env->mvp->CP0_MVPConf0 = deposit32(env->mvp->CP0_MVPConf0,
+   CP0MVPC0_PTC, 8,
+   smp_cpus * cs->nr_threads - 1);
+env->mvp->CP0_MVPConf0 = deposit32(env->mvp->CP0_MVPConf0,
+   CP0MVPC0_PVPE, 4, smp_cpus - 1);
 }
 }
 
-- 
2.26.2
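
The difference between the old "|=" and the deposit-style update can be seen on plain integers. sketch_deposit32() below is a local re-implementation written for this example, following the (value, start, length, fieldval) argument order used in the patch; it is a sketch of the idea, not a copy of the QEMU helper. The field positions used when exercising it are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace the 'length'-bit field starting at bit 'start' of 'value'
 * with 'fieldval'. Bits of 'fieldval' that do not fit are dropped,
 * and the previous contents of the field are cleared first.
 */
uint32_t sketch_deposit32(uint32_t value, int start, int length,
                          uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* The old pattern: OR the shifted value in, without masking. */
uint32_t sketch_or_in(uint32_t value, int start, uint32_t fieldval)
{
    return value | (fieldval << start);
}
```

With an 8-bit field at bit 0, sketch_deposit32(0, 0, 8, 0x1ff) yields 0xff, whereas sketch_or_in(0, 0, 0x1ff) yields 0x1ff and spills into bit 8. The deposit form also overwrites stale bits already present in the field, which a plain OR cannot do.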

[PATCH 0/5] mips: Sanitize Multi-Threading ASE

2020-12-04 Thread Philippe Mathieu-Daudé

Reviewing the MIPS code, ASE after ASE.
Time for MT ASE.

- Introduce/use ase_mt_available() helper to check
  if MT ASE is present
- Avoid setting MT specific registers if MT ASE is absent

Philippe Mathieu-Daudé (5):
  target/mips: Remove mips_def_t unused argument from mvp_init()
  target/mips: Introduce ase_mt_available() helper
  target/mips: Do not initialize MT registers if MT ASE absent
  hw/mips/malta: Do not initialize MT registers if MT ASE absent
  hw/mips/malta: Rewrite CP0_MVPConf0 access using deposit()

 target/mips/cpu.h|  7 +++
 hw/mips/cps.c|  3 +--
 hw/mips/malta.c  | 10 --
 target/mips/cp0_helper.c |  2 +-
 target/mips/cpu.c|  2 +-
 target/mips/helper.c |  2 +-
 target/mips/translate.c  |  4 ++--
 target/mips/translate_init.c.inc |  6 +-
 8 files changed, 26 insertions(+), 10 deletions(-)

-- 
2.26.2

[PATCH 4/5] hw/mips/malta: Do not initialize MT registers if MT ASE absent

2020-12-04 Thread Philippe Mathieu-Daudé

Do not initialize MT-related config register if the MT ASE
is not present.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/mips/malta.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index 9d1a3b50b7a..350b92b4d79 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -1134,8 +1134,10 @@ static void malta_mips_config(MIPSCPU *cpu)
 CPUMIPSState *env = &cpu->env;
 CPUState *cs = CPU(cpu);
 
-env->mvp->CP0_MVPConf0 |= ((smp_cpus - 1) << CP0MVPC0_PVPE) |
+if (ase_mt_available(env)) {
+env->mvp->CP0_MVPConf0 |= ((smp_cpus - 1) << CP0MVPC0_PVPE) |
  ((smp_cpus * cs->nr_threads - 1) << CP0MVPC0_PTC);
+}
 }
 
 static void main_cpu_reset(void *opaque)
-- 
2.26.2

[PATCH 1/5] target/mips: Remove mips_def_t unused argument from mvp_init()

2020-12-04 Thread Philippe Mathieu-Daudé

mvp_init() doesn't require any CPU definition (besides the
information accessible via CPUMIPSState). Remove the unused
argument.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c  | 2 +-
 target/mips/translate_init.c.inc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index c64a1bc42e1..0db032fc5fb 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -31767,7 +31767,7 @@ void cpu_mips_realize_env(CPUMIPSState *env)
 mmu_init(env, env->cpu_model);
 #endif
 fpu_init(env, env->cpu_model);
-mvp_init(env, env->cpu_model);
+mvp_init(env);
 }
 
 bool cpu_supports_cps_smp(const char *cpu_type)
diff --git a/target/mips/translate_init.c.inc b/target/mips/translate_init.c.inc
index 79f75ed863c..5a926bc6df3 100644
--- a/target/mips/translate_init.c.inc
+++ b/target/mips/translate_init.c.inc
@@ -989,7 +989,7 @@ static void fpu_init (CPUMIPSState *env, const mips_def_t *def)
 memcpy(&env->active_fpu, &env->fpus[0], sizeof(env->active_fpu));
 }
 
-static void mvp_init (CPUMIPSState *env, const mips_def_t *def)
+static void mvp_init(CPUMIPSState *env)
 {
 env->mvp = g_malloc0(sizeof(CPUMIPSMVPContext));
 
-- 
2.26.2

[PATCH v14 13/13] block: apply COR-filter to block-stream jobs

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

This patch completes the series with the COR-filter applied to
block-stream operations.

Adding the filter makes it possible, in the future, to implement
discarding of copied regions in backing files during the block-stream
job, which reduces disk overuse (we need control over permissions for
that).

Also, the filter is now smart enough to do copy-on-read with a specified
base, so guest reads help us even when block-stream covers only part
of the backing chain.

Several iotests are slightly modified due to filter insertion.

Signed-off-by: Andrey Shinkevich 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/stream.c | 78 ++
 tests/qemu-iotests/030 |  8 ++--
 tests/qemu-iotests/141.out |  2 +-
 tests/qemu-iotests/245 | 20 ++
 4 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index a7fd8945ad..b92f7de55b 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -18,8 +18,10 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
+#include "qapi/qmp/qdict.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
 
 enum {
 /*
@@ -34,6 +36,7 @@ typedef struct StreamBlockJob {
 BlockJob common;
 BlockDriverState *base_overlay; /* COW overlay (stream from this) */
 BlockDriverState *above_base;   /* Node directly above the base */
+BlockDriverState *cor_filter_bs;
 BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
@@ -46,8 +49,7 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
 {
 assert(bytes < SIZE_MAX);
 
-return blk_co_preadv(blk, offset, bytes, NULL,
- BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
+return blk_co_preadv(blk, offset, bytes, NULL, BDRV_REQ_PREFETCH);
 }
 
 static void stream_abort(Job *job)
@@ -55,7 +57,7 @@ static void stream_abort(Job *job)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
 if (s->chain_frozen) {
-bdrv_unfreeze_backing_chain(s->target_bs, s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 }
 }
 
@@ -69,7 +71,7 @@ static int stream_prepare(Job *job)
 Error *local_err = NULL;
 int ret = 0;
 
-bdrv_unfreeze_backing_chain(s->target_bs, s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 s->chain_frozen = false;
 
 if (bdrv_cow_child(unfiltered_bs)) {
@@ -117,6 +119,8 @@ static void stream_clean(Job *job)
 bdrv_reopen_set_read_only(s->target_bs, true, NULL);
 }
 
+bdrv_cor_filter_drop(s->cor_filter_bs);
+
 g_free(s->backing_file_str);
 }
 
@@ -125,7 +129,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockBackend *blk = s->common.blk;
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
-bool enable_cor = !bdrv_cow_child(s->base_overlay);
 int64_t len;
 int64_t offset = 0;
 uint64_t delay_ns = 0;
@@ -143,15 +146,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 }
 job_progress_set_remaining(&s->common.job, len);
 
-/* Turn on copy-on-read for the whole block device so that guest read
- * requests help us make progress.  Only do this when copying the entire
- * backing chain since the copy-on-read operation does not take base into
- * account.
- */
-if (enable_cor) {
-bdrv_enable_copy_on_read(s->target_bs);
-}
-
 for ( ; offset < len; offset += n) {
 bool copy;
 int ret;
@@ -210,10 +204,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 }
 }
 
-if (enable_cor) {
-bdrv_disable_copy_on_read(s->target_bs);
-}
-
 /* Do not remove the backing file if an error was there but ignored. */
 return error;
 }
@@ -244,7 +234,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 bool bs_read_only;
 int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
 BlockDriverState *base_overlay;
+BlockDriverState *cor_filter_bs = NULL;
 BlockDriverState *above_base;
+QDict *opts;
 
 assert(!(base && bottom));
 assert(!(backing_file_str && bottom));
@@ -295,17 +287,49 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 }
 }
 
-/* Prevent concurrent jobs trying to modify the graph structure here, we
- * already have our own plans. Also don't allow resize as the image size is
- * queried only at the job start and then cached. */
-s = block_job_create(job_id, &stream_job_driver, NULL, bs,
- basic_flags | BLK_PERM_GRAPH_MOD,
+opts = qdict_new();
+
+qdict_put_str(opts, "driver", "copy-on-read");
+qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+/* Pass the base_overlay node

[Bug 1673976] Re: linux-user clone() can't handle glibc posix_spawn() (causes locale-gen to assert)

2020-12-04 Thread Davide Palma

any solution? trying to emulate a closed source amd64 app on my
raspberry and i'm getting this error with qemu 5.2.0-rc4 and glibc 2.27.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1673976

Title:
  linux-user clone() can't handle glibc posix_spawn() (causes locale-gen
  to assert)

Status in QEMU:
  New

Bug description:
  I'm running a command (locale-gen) inside of an armv7h chroot mounted
  on my x86_64 desktop by putting qemu-arm-static into /usr/bin/ of the
  chroot file system and I get a core dump.

  locale-gen
  Generating locales...
    en_US.UTF-8...localedef: ../sysdeps/unix/sysv/linux/spawni.c:360: __spawnix: Assertion `ec >= 0' failed.
  qemu: uncaught target signal 6 (Aborted) - core dumped
  /usr/bin/locale-gen: line 41:34 Aborted (core dumped) localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale

  I've done this same thing successfully for years, but this breakage
  has appeared some time in the last 3 or so months. Possibly with the
  update to qemu version 2.8.


[PATCH v14 08/13] copy-on-read: skip non-guest reads if no copy needed

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

If the BDRV_REQ_PREFETCH flag is set, skip idle read/write
operations in the COR driver. This can be taken into account when
optimizing the COR algorithms. At the moment, that check is made
during the block-stream job.

Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the
COR-filter.

block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are
going to use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  8 +---
 block/copy-on-read.c  | 14 ++
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 81a3894129..3499554d9c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -81,9 +81,11 @@ typedef enum {
 BDRV_REQ_NO_FALLBACK= 0x100,
 
 /*
- * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
- * on read request and means that caller doesn't really need data to be
- * written to qiov parameter which may be NULL.
+ * BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
+ * (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when a COR
+ * filter is involved), in which case it signals that the COR operation
+ * need not read the data into memory (qiov) but only ensure they are
+ * copied to the top layer (i.e., that COR operation is done).
  */
 BDRV_REQ_PREFETCH  = 0x200,
 /* Mask of valid flags */
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 67f61983c0..8b64e55e22 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -49,6 +49,8 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 return -EINVAL;
 }
 
+bs->supported_read_flags = BDRV_REQ_PREFETCH;
+
 bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
 (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
 
@@ -150,10 +152,14 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
 }
 }
 
-ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
-  local_flags);
-if (ret < 0) {
-return ret;
+/* Skip if neither read nor write are needed */
+if ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
+BDRV_REQ_PREFETCH) {
+ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+  local_flags);
+if (ret < 0) {
+return ret;
+}
 }
 
 offset += n;
-- 
2.21.3
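
The mask test added to cor_co_preadv_part() can be read as a small truth table. The sketch below reproduces only that condition on plain integers. The SKETCH_* names are local to this example; the PREFETCH value 0x200 is the one shown in the diff above, while the COPY_ON_READ value 0x1 is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    SKETCH_REQ_COPY_ON_READ = 0x1,   /* assumed value, illustration only */
    SKETCH_REQ_PREFETCH     = 0x200, /* value shown in the patch */
};

/*
 * The read from the child is skipped only when PREFETCH is set and
 * COPY_ON_READ is not: the caller wants no data and no copy is due.
 */
bool sketch_needs_read(int flags)
{
    return (flags & (SKETCH_REQ_PREFETCH | SKETCH_REQ_COPY_ON_READ)) !=
           SKETCH_REQ_PREFETCH;
}

/* Walk through the four flag combinations. */
void sketch_check_truth_table(void)
{
    assert(sketch_needs_read(0));
    assert(sketch_needs_read(SKETCH_REQ_COPY_ON_READ));
    assert(!sketch_needs_read(SKETCH_REQ_PREFETCH));
    assert(sketch_needs_read(SKETCH_REQ_PREFETCH |
                             SKETCH_REQ_COPY_ON_READ));
}
```

Comparing the masked flags against PREFETCH alone, rather than testing PREFETCH on its own, is what keeps the read in place when a copy-on-read is still pending.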

[PATCH v14 12/13] block/stream: add s->target_bs

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

Add a direct link to the target bs for convenience and to simplify
the following commit, which will insert the COR filter above the target bs.

This is part of the original commit written by Andrey.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/stream.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index a2744d07fe..a7fd8945ad 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -34,6 +34,7 @@ typedef struct StreamBlockJob {
 BlockJob common;
 BlockDriverState *base_overlay; /* COW overlay (stream from this) */
 BlockDriverState *above_base;   /* Node directly above the base */
+BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
 bool bs_read_only;
@@ -54,24 +55,21 @@ static void stream_abort(Job *job)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
 if (s->chain_frozen) {
-BlockJob *bjob = &s->common;
-bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+bdrv_unfreeze_backing_chain(s->target_bs, s->above_base);
 }
 }
 
 static int stream_prepare(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockJob *bjob = &s->common;
-BlockDriverState *bs = blk_bs(bjob->blk);
-BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
 BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
 BlockDriverState *base_unfiltered;
 BlockDriverState *backing_bs;
 Error *local_err = NULL;
 int ret = 0;
 
-bdrv_unfreeze_backing_chain(bs, s->above_base);
+bdrv_unfreeze_backing_chain(s->target_bs, s->above_base);
 s->chain_frozen = false;
 
 if (bdrv_cow_child(unfiltered_bs)) {
@@ -111,13 +109,12 @@ static void stream_clean(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockJob *bjob = &s->common;
-BlockDriverState *bs = blk_bs(bjob->blk);
 
 /* Reopen the image back in read-only mode if necessary */
 if (s->bs_read_only) {
 /* Give up write permissions before making it read-only */
 blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
-bdrv_reopen_set_read_only(bs, true, NULL);
+bdrv_reopen_set_read_only(s->target_bs, true, NULL);
 }
 
 g_free(s->backing_file_str);
@@ -127,8 +124,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockBackend *blk = s->common.blk;
-BlockDriverState *bs = blk_bs(blk);
-BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
 bool enable_cor = !bdrv_cow_child(s->base_overlay);
 int64_t len;
 int64_t offset = 0;
@@ -141,7 +137,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 return 0;
 }
 
-len = bdrv_getlength(bs);
+len = bdrv_getlength(s->target_bs);
 if (len < 0) {
 return len;
 }
@@ -153,7 +149,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
  * account.
  */
 if (enable_cor) {
-bdrv_enable_copy_on_read(bs);
+bdrv_enable_copy_on_read(s->target_bs);
 }
 
 for ( ; offset < len; offset += n) {
@@ -215,7 +211,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 }
 
 if (enable_cor) {
-bdrv_disable_copy_on_read(bs);
+bdrv_disable_copy_on_read(s->target_bs);
 }
 
 /* Do not remove the backing file if an error was there but ignored. */
@@ -330,6 +326,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->base_overlay = base_overlay;
 s->above_base = above_base;
 s->backing_file_str = g_strdup(backing_file_str);
+s->target_bs = bs;
 s->bs_read_only = bs_read_only;
 s->chain_frozen = true;
 
-- 
2.21.3

[PATCH v14 09/13] stream: skip filters when writing backing file name to QCOW2 header

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

Avoid writing a filter JSON file name and a filter format name to QCOW2
image when the backing file is being changed after the block stream
job. It can occur due to a concurrent commit job on the same backing
chain.
A user is still able to assign the 'backing-file' parameter for a
block-stream job keeping in mind the possible issue mentioned above.
If the user does not specify the 'backing-file' parameter, QEMU will
assign it automatically.

Signed-off-by: Andrey Shinkevich 
 [vsementsov: use unfiltered_bs for bdrv_find_backing_image()]
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/stream.c | 21 +++--
 blockdev.c |  8 +---
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 6e281c71ac..c208393c34 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -17,6 +17,7 @@
 #include "block/blockjob_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
 
@@ -65,6 +66,8 @@ static int stream_prepare(Job *job)
 BlockDriverState *bs = blk_bs(bjob->blk);
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
 BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
+BlockDriverState *base_unfiltered;
+BlockDriverState *backing_bs;
 Error *local_err = NULL;
 int ret = 0;
 
@@ -75,8 +78,22 @@ static int stream_prepare(Job *job)
 const char *base_id = NULL, *base_fmt = NULL;
 if (base) {
 base_id = s->backing_file_str;
-if (base->drv) {
-base_fmt = base->drv->format_name;
+if (base_id) {
+backing_bs = bdrv_find_backing_image(unfiltered_bs, base_id);
+if (backing_bs && backing_bs->drv) {
+base_fmt = backing_bs->drv->format_name;
+} else {
+error_report("Format not found for backing file %s",
+ s->backing_file_str);
+}
+} else {
+base_unfiltered = bdrv_skip_filters(base);
+if (base_unfiltered) {
+base_id = base_unfiltered->filename;
+if (base_unfiltered->drv) {
+base_fmt = base_unfiltered->drv->format_name;
+}
+}
 }
 }
 bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
diff --git a/blockdev.c b/blockdev.c
index c917625245..70900f4f77 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2508,7 +2508,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
 BlockDriverState *base_bs = NULL;
 AioContext *aio_context;
 Error *local_err = NULL;
-const char *base_name = NULL;
 int job_flags = JOB_DEFAULT;
 
 if (!has_on_error) {
@@ -2536,7 +2535,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
 goto out;
 }
 assert(bdrv_get_aio_context(base_bs) == aio_context);
-base_name = base;
 }
 
 if (has_base_node) {
@@ -2551,7 +2549,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
 }
 assert(bdrv_get_aio_context(base_bs) == aio_context);
 bdrv_refresh_filename(base_bs);
-base_name = base_bs->filename;
 }
 
 /* Check for op blockers in the whole chain between bs and base */
@@ -2571,9 +2568,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
 goto out;
 }
 
-/* backing_file string overrides base bs filename */
-base_name = has_backing_file ? backing_file : base_name;
-
 if (has_auto_finalize && !auto_finalize) {
 job_flags |= JOB_MANUAL_FINALIZE;
 }
@@ -2581,7 +2575,7 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 job_flags |= JOB_MANUAL_DISMISS;
 }
 
-stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
+stream_start(has_job_id ? job_id : NULL, bs, base_bs, backing_file,
  job_flags, has_speed ? speed : 0, on_error,
  filter_node_name, &local_err);
 if (local_err) {
-- 
2.21.3

[PATCH v14 11/13] iotests: 30: prepare to COR filter insertion by stream job

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

test_stream_parallel runs parallel stream jobs that intersect: the top
of one job is the base of another. That is OK now, but it would become a
problem once we insert the filter, as one job would want to use another
job's filter as its above_base node.

The correct thing to do is to move to the new interface: the "bottom"
argument instead of base. This guarantees that the jobs' actions don't
intersect.
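
The intersection described above can be sketched in plain Python. This is only an illustration, not iotests code: the helper names are hypothetical, and it assumes a chain of nine images (node0 is the base-most) with one job per even-numbered node, which is how the loop in the test appears to be laid out.

```python
def nodes_referenced(i, interface):
    """Return the chain indices that stream job `i` names or modifies."""
    if interface == "base":
        # Old interface: stream node i-1 into node i, naming node i-2 as base.
        return {i - 2, i - 1, i}
    if interface == "bottom":
        # New interface: node i-1 is the bottom, node i is the top.
        return {i - 1, i}
    raise ValueError(interface)


def overlapping_jobs(num_imgs, interface):
    """List pairs of jobs whose referenced node sets share a node."""
    jobs = list(range(2, num_imgs, 2))
    pairs = []
    for pos, first in enumerate(jobs):
        for second in jobs[pos + 1:]:
            if nodes_referenced(first, interface) & nodes_referenced(second, interface):
                pairs.append((first, second))
    return pairs
```

With the "base" interface every neighbouring pair of jobs shares a node (the top of one job is the base of the next); with "bottom" the node sets are disjoint.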

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/030 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index dcb4b5d6a6..bd8cf9cff7 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -245,7 +245,9 @@ class TestParallelOps(iotests.QMPTestCase):
 node_name = 'node%d' % i
 job_id = 'stream-%s' % node_name
 pending_jobs.append(job_id)
-result = self.vm.qmp('block-stream', device=node_name, job_id=job_id, base=self.imgs[i-2], speed=1024)
+result = self.vm.qmp('block-stream', device=node_name,
+ job_id=job_id, bottom=f'node{i-1}',
+ speed=1024)
 self.assert_qmp(result, 'return', {})
 
 for job in pending_jobs:
-- 
2.21.3

[PATCH v14 07/13] block: include supported_read_flags into BDS structure

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.
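
A minimal Python sketch of the check described above, under stated assumptions: the flag values are illustrative only, HYPOTHETICAL_DRIVER_FLAG is a made-up stand-in for a flag that some driver honors, and the sketch raises an exception where the patch calls abort().

```python
# Illustrative flag values; assumptions for this sketch only.
BDRV_REQ_COPY_ON_READ = 0x1
HYPOTHETICAL_DRIVER_FLAG = 0x100  # stand-in for a flag some driver honors


def flags_for_driver(flags, supported_read_flags):
    """Return the read flags that may be passed down to the driver."""
    # Copy-on-read is consumed by the generic layer and never reaches the driver.
    flags &= ~BDRV_REQ_COPY_ON_READ
    unsupported = flags & ~supported_read_flags
    if unsupported:
        # The patch calls abort() here; the sketch raises instead.
        raise ValueError(f"unsupported read flags: {unsupported:#x}")
    return flags
```

A driver that leaves supported_read_flags at zero therefore keeps receiving flags == 0, as before the change.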

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h |  4 
 block/io.c| 12 ++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index c05fa1eb6b..247e166ab6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -873,6 +873,10 @@ struct BlockDriverState {
 /* I/O Limits */
 BlockLimits bl;
 
+/*
+ * Flags honored during pread
+ */
+unsigned int supported_read_flags;
 /* Flags honored during pwrite (so far: BDRV_REQ_FUA,
  * BDRV_REQ_WRITE_UNCHANGED).
  * If a driver does not support BDRV_REQ_WRITE_UNCHANGED, those
diff --git a/block/io.c b/block/io.c
index ec5e152bb7..e28b11c42b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1405,6 +1405,9 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 if (flags & BDRV_REQ_COPY_ON_READ) {
 int64_t pnum;
 
+/* The flag BDRV_REQ_COPY_ON_READ has reached its addressee */
+flags &= ~BDRV_REQ_COPY_ON_READ;
+
 ret = bdrv_is_allocated(bs, offset, bytes, &pnum);
 if (ret < 0) {
 goto out;
@@ -1426,9 +1429,13 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 goto out;
 }
 
+if (flags & ~bs->supported_read_flags) {
+abort();
+}
+
 max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
 if (bytes <= max_bytes && bytes <= max_transfer) {
-ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
+ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, flags);
 goto out;
 }
 
@@ -1441,7 +1448,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 
 ret = bdrv_driver_preadv(bs, offset + bytes - bytes_remaining,
  num, qiov,
- qiov_offset + bytes - bytes_remaining, 0);
+ qiov_offset + bytes - bytes_remaining,
+ flags);
 max_bytes -= num;
 } else {
 num = bytes_remaining;
-- 
2.21.3

[PATCH v14 05/13] qapi: create BlockdevOptionsCor structure for COR driver

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

Create the BlockdevOptionsCor structure for COR driver specific options,
splitting it off from BlockdevOptionsGenericFormat. The only option in
the structure, 'bottom', denotes the image node that limits the COR
operations in the backing chain.
We are going to use the COR filter for the block-stream job and will pass
a bottom node name to the COR driver. The bottom node is the first
non-filter overlay of the base. It was introduced because the base node
itself may change due to possible concurrent jobs.
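
The role of the bottom node can be illustrated with a small Python sketch. It is a simplified model, not the driver code: a chain is a list of (name, allocated clusters) pairs ordered from top to base, the function name is hypothetical, and error handling is left out.

```python
def needs_copy_on_read(chain, bottom, cluster):
    """Decide whether reading `cluster` should copy it into the top layer.

    chain: list of (name, set_of_allocated_clusters), ordered top to base.
    bottom: name of the last layer (inclusive) that COR may copy from.
    """
    names = [name for name, _ in chain]
    bottom_idx = names.index(bottom)
    if cluster in chain[0][1]:
        return False  # already allocated in the top layer, nothing to copy
    # Look only at the layers below the top, down to and including bottom.
    return any(cluster in allocated
               for _, allocated in chain[1:bottom_idx + 1])
```

Data that lives only below the bottom node is still readable through the chain; it is simply not copied up.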

Suggested-by: Max Reitz 
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
  [vsementsov: fix bdrv_is_allocated_above() usage]
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json | 21 +++-
 block/copy-on-read.c | 57 ++--
 2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8ef3df6767..04055ef50c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3942,6 +3942,25 @@
   'data': { 'throttle-group': 'str',
 'file' : 'BlockdevRef'
  } }
+
+##
+# @BlockdevOptionsCor:
+#
+# Driver specific block device options for the copy-on-read driver.
+#
+# @bottom: the name of a non-filter node (allocation-bearing layer) that limits
+#  the COR operations in the backing chain (inclusive).
+#  For the block-stream job, it will be the first non-filter overlay of
+#  the base node. We do not involve the base node into the COR
+#  operations because the base may change due to a concurrent
+#  block-commit job on the same backing chain.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockdevOptionsCor',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { '*bottom': 'str' } }
+
 ##
 # @BlockdevOptions:
 #
@@ -3994,7 +4013,7 @@
   'bochs':  'BlockdevOptionsGenericFormat',
   'cloop':  'BlockdevOptionsGenericFormat',
   'compress':   'BlockdevOptionsGenericFormat',
-  'copy-on-read':'BlockdevOptionsGenericFormat',
+  'copy-on-read':'BlockdevOptionsCor',
   'dmg':'BlockdevOptionsGenericFormat',
   'file':   'BlockdevOptionsFile',
   'ftp':'BlockdevOptionsCurlFtp',
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 618c4c4f43..67f61983c0 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -24,18 +24,23 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
 #include "block/copy-on-read.h"
 
 
 typedef struct BDRVStateCOR {
 bool active;
+BlockDriverState *bottom_bs;
 } BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
+BlockDriverState *bottom_bs = NULL;
 BDRVStateCOR *state = bs->opaque;
+/* Find a bottom node name, if any */
+const char *bottom_node = qdict_get_try_str(options, "bottom");
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -51,7 +56,17 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
 bs->file->bs->supported_zero_flags);
 
+if (bottom_node) {
+bottom_bs = bdrv_lookup_bs(NULL, bottom_node, errp);
+if (!bottom_bs) {
+error_setg(errp, "Bottom node '%s' not found", bottom_node);
+qdict_del(options, "bottom");
+return -EINVAL;
+}
+qdict_del(options, "bottom");
+}
 state->active = true;
+state->bottom_bs = bottom_bs;
 
 /*
  * We don't need to call bdrv_child_refresh_perms() now as the permissions
@@ -107,8 +122,46 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
size_t qiov_offset,
int flags)
 {
-return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
-   flags | BDRV_REQ_COPY_ON_READ);
+int64_t n;
+int local_flags;
+int ret;
+BDRVStateCOR *state = bs->opaque;
+
+if (!state->bottom_bs) {
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags | BDRV_REQ_COPY_ON_READ);
+}
+
+while (bytes) {
+local_flags = flags;
+
+/* In case of failure, try to copy-on-read anyway */
+ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
+if (ret <= 0) {
+ret = bdrv_is_allocated_above(bdrv_backing_chain_next(bs->file->bs),
+  state->bottom_bs, true, offset,
+  n, &n);
+if (ret > 0 || ret < 0) {
+local_flags |= BDRV_REQ_COPY_ON_READ;
+}
+/*

[PATCH v14 10/13] qapi: block-stream: add "bottom" argument

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

The code already doesn't freeze the base node, and we try to keep it
prepared for the situation where the base node changes during the
operation. In other words, block-stream doesn't own the base node.

Let's introduce a new interface that should replace the current one and
that fits the code better. Specifying the bottom node instead of base,
and requiring it to be a non-filter node, gives us the following
benefits:

 - drop the difference between above_base and base_overlay, which will
   be renamed to just bottom once the old interface is dropped

 - a clean way to work with parallel streams/commits on the same backing
   chain, which otherwise becomes a problem when we introduce a filter
   for the stream job

 - a cleaner interface. Nobody will be surprised by the fact that the
   base node may disappear during block-stream when there is no word
   about "base" in the interface.
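
As a rough illustration of the new argument, the following Python sketch builds block-stream arguments and rejects the combinations that the description rules out. The helper name and the error text are hypothetical; only the option names come from the patch.

```python
def block_stream_arguments(device, **options):
    """Build block-stream arguments, rejecting options that conflict with bottom."""
    # Python keyword names use underscores; QMP option names use dashes.
    args = {"device": device}
    args.update({key.replace("_", "-"): value for key, value in options.items()})
    if "bottom" in args:
        conflicts = sorted({"base", "base-node", "backing-file"} & set(args))
        if conflicts:
            # Hypothetical message; the real QMP error text may differ.
            raise ValueError("'bottom' cannot be combined with: "
                             + ", ".join(conflicts))
    return args
```

For example, block_stream_arguments('node4', job_id='stream-node4', bottom='node3', speed=1024) yields a dictionary that can be passed as the arguments of a block-stream command.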

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json   |  8 +++--
 include/block/block_int.h  |  1 +
 block/monitor/block-hmp-cmds.c |  3 +-
 block/stream.c | 50 +++-
 blockdev.c | 61 --
 5 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04055ef50c..5d6681a35d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2522,6 +2522,10 @@
 # @base-node: the node name of the backing file.
 # It cannot be set if @base is also set. (Since 2.8)
 #
+# @bottom: the last node in the chain that should be streamed into
+#  top. It cannot be set if any of @base, @base-node or @backing-file
+#  is set. It cannot be a filter node. (Since 6.0)
+#
 # @backing-file: The backing file string to write into the top
 #image. This filename is not validated.
 #
@@ -2576,8 +2580,8 @@
 ##
 { 'command': 'block-stream',
   'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
-'*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
-'*on-error': 'BlockdevOnError',
+'*base-node': 'str', '*backing-file': 'str', '*bottom': 'str',
+'*speed': 'int', '*on-error': 'BlockdevOnError',
 '*filter-node-name': 'str',
 '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 247e166ab6..b13154edbf 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1152,6 +1152,7 @@ int is_windows_drive(const char *filename);
  */
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
+  BlockDriverState *bottom,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
   const char *filter_node_name,
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index e8a58f326e..afd75ab628 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -507,7 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 int64_t speed = qdict_get_try_int(qdict, "speed", 0);
 
 qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
- false, NULL, qdict_haskey(qdict, "speed"), speed, true,
+ false, NULL, false, NULL,
+ qdict_haskey(qdict, "speed"), speed, true,
  BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, false,
  false, &error);
 
diff --git a/block/stream.c b/block/stream.c
index c208393c34..a2744d07fe 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -237,6 +237,7 @@ static const BlockJobDriver stream_job_driver = {
 
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
+  BlockDriverState *bottom,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
   const char *filter_node_name,
@@ -246,25 +247,42 @@ void stream_start(const char *job_id, BlockDriverState 
*bs,
 BlockDriverState *iter;
 bool bs_read_only;
 int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
-BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
+BlockDriverState *base_overlay;
 BlockDriverState *above_base;
 
-if (!base_overlay) {
-error_setg(errp, "'%s' is not in the backing chain of '%s'",
-   base->node_name, bs->node_name);
-return;
-}
+assert(!(base && bottom));
+assert(!(backing_file_str && bottom));
+
+if (bottom) {
+/*
+ * New simple interface. The code is written in terms of old interface
+ * with @base parameter (still, it doesn't freeze link to base, so in
+ * this mean old code is correct for new interface). So, for now,

[PATCH v14 02/13] block: add API function to insert a node

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

Provide an API for inserting a node into a backing chain.
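
The graph effect of such an insertion can be modeled with a toy Python sketch. It only assumes what the diff below shows: the new node is opened on top of the existing one, and the parents of the existing node are then redirected to the new node. The Node class and insert_node() are hypothetical, and draining and error handling are not modeled.

```python
class Node:
    """A toy graph node with a single child link."""

    def __init__(self, name, child=None):
        self.name = name
        self.child = child


def insert_node(parents, bs, name):
    """Insert a new node above `bs` and redirect the parents of `bs` to it."""
    new_node = Node(name, child=bs)
    for parent in parents:
        if parent.child is bs:
            parent.child = new_node
    return new_node
```

After insert_node([top], mid, 'filter'), the chain reads top -> filter -> mid.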

Suggested-by: Max Reitz 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  2 ++
 block.c   | 25 +
 2 files changed, 27 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..81a3894129 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -350,6 +350,8 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState 
*bs_top,
  Error **errp);
 void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
Error **errp);
+BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options,
+   int flags, Error **errp);
 
 int bdrv_parse_aio(const char *mode, int *flags);
 int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough);
diff --git a/block.c b/block.c
index f1cedac362..b71c39f3e6 100644
--- a/block.c
+++ b/block.c
@@ -4698,6 +4698,31 @@ static void bdrv_delete(BlockDriverState *bs)
 g_free(bs);
 }
 
+BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options,
+   int flags, Error **errp)
+{
+BlockDriverState *new_node_bs;
+Error *local_err = NULL;
+
+new_node_bs =  bdrv_open(NULL, NULL, node_options, flags, errp);
+if (new_node_bs == NULL) {
+error_prepend(errp, "Could not create node: ");
+return NULL;
+}
+
+bdrv_drained_begin(bs);
+bdrv_replace_node(bs, new_node_bs, &local_err);
+bdrv_drained_end(bs);
+
+if (local_err) {
+bdrv_unref(new_node_bs);
+error_propagate(errp, local_err);
+return NULL;
+}
+
+return new_node_bs;
+}
+
 /*
  * Run consistency checks on an image
  *
-- 
2.21.3

[PATCH v14 04/13] qapi: add filter-node-name to block-stream

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.
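
For illustration, a block-stream command carrying the new option could be serialized as in the Python sketch below. The helper name and the node names in the usage note are made up; when the option is omitted, QEMU is expected to pick a node name itself.

```python
import json


def block_stream_wire(device, filter_node_name=None):
    """Serialize a block-stream command, optionally naming the filter node."""
    arguments = {"device": device}
    if filter_node_name is not None:
        arguments["filter-node-name"] = filter_node_name
    return json.dumps({"execute": "block-stream", "arguments": arguments})
```

For example, block_stream_wire('drive0', 'stream-filter') produces a JSON command whose arguments contain both "device" and "filter-node-name".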

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json   | 6 ++
 include/block/block_int.h  | 7 ++-
 block/monitor/block-hmp-cmds.c | 4 ++--
 block/stream.c | 4 +++-
 blockdev.c | 4 +++-
 5 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04ad80bc1e..8ef3df6767 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2543,6 +2543,11 @@
 #'stop' and 'enospc' can only be used if the block device
 #supports io-status (see BlockInfo).  Since 1.3.
 #
+# @filter-node-name: the node name that should be assigned to the
+#filter driver that the stream job inserts into the graph
+#above @device. If this option is not given, a node name is
+#autogenerated. (Since: 5.2)
+#
 # @auto-finalize: When false, this job will wait in a PENDING state after it has
 # finished its work, waiting for @block-job-finalize before
 # making any block graph changes.
@@ -2573,6 +2578,7 @@
   'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
 '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
 '*on-error': 'BlockdevOnError',
+'*filter-node-name': 'str',
 '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
 
 ##
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 95d9333be1..c05fa1eb6b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1134,6 +1134,9 @@ int is_windows_drive(const char *filename);
  *  See @BlockJobCreateFlags
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @on_error: The action to take upon error.
+ * @filter_node_name: The node name that should be assigned to the filter
+ * driver that the commit job inserts into the graph above @bs. NULL means
+ * that a node name should be autogenerated.
  * @errp: Error object.
  *
  * Start a streaming operation on @bs.  Clusters that are unallocated
@@ -1146,7 +1149,9 @@ int is_windows_drive(const char *filename);
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp);
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp);
 
 /**
  * commit_start:
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index d15a2be827..e8a58f326e 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -508,8 +508,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 
 qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
  false, NULL, qdict_haskey(qdict, "speed"), speed, true,
- BLOCKDEV_ON_ERROR_REPORT, false, false, false, false,
- &error);
+ BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, false,
+ false, &error);
 
 hmp_handle_error(mon, error);
 }
diff --git a/block/stream.c b/block/stream.c
index 236384f2f7..6e281c71ac 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = {
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp)
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp)
 {
 StreamBlockJob *s;
 BlockDriverState *iter;
diff --git a/blockdev.c b/blockdev.c
index fe6fb5dc1d..c917625245 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2499,6 +2499,7 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
   bool has_backing_file, const char *backing_file,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
+  bool has_filter_node_name, const char *filter_node_name,
   bool has_auto_finalize, bool auto_finalize,
   bool has_auto_dismiss, bool auto_dismiss,
   Error **errp)
@@ -2581,7 +2582,8 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 }
 
 stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
- job_flags, has_speed ? speed : 0, on_error, &local_err);
+

[PATCH v14 03/13] copy-on-read: add filter drop function

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

Provide an API for removing the COR filter. Also, drop the filter child
permissions for an inactive state when the filter node is being
removed.
To insert the filter, the block generic layer function
bdrv_insert_node() can be used.
The new function bdrv_cor_filter_drop() may be considered an
intermediate solution until the QEMU permission update system has been
overhauled. After that, we will be able to implement the API function
bdrv_remove_node() on the block generic layer.
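
The permission handling for the inactive state can be sketched in Python as follows. The numeric permission values and the PERM_PASSTHROUGH / PERM_UNCHANGED masks are assumptions chosen for the illustration, and the function models only the part of cor_child_perm() that is visible in the diff below.

```python
# Illustrative permission bits; the values are assumptions for this sketch.
PERM_CONSISTENT_READ = 0x01
PERM_WRITE = 0x02
PERM_WRITE_UNCHANGED = 0x04
PERM_RESIZE = 0x08
PERM_GRAPH_MOD = 0x10
PERM_ALL = 0x1F

PERM_PASSTHROUGH = PERM_CONSISTENT_READ | PERM_WRITE | PERM_RESIZE
PERM_UNCHANGED = PERM_ALL & ~PERM_PASSTHROUGH


def child_perm(active, perm, shared):
    """Return (nperm, nshared) requested on the filter's child."""
    if not active:
        # While the filter is being removed: need nothing, share everything.
        return 0, PERM_ALL
    return perm & PERM_PASSTHROUGH, (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED
```

Setting active to false before refreshing permissions is what lets the filter be replaced by its child without a permission conflict.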

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-on-read.h | 32 +
 block/copy-on-read.c | 56 
 2 files changed, 88 insertions(+)
 create mode 100644 block/copy-on-read.h

diff --git a/block/copy-on-read.h b/block/copy-on-read.h
new file mode 100644
index 00..7bf405dccd
--- /dev/null
+++ b/block/copy-on-read.h
@@ -0,0 +1,32 @@
+/*
+ * Copy-on-read filter block driver
+ *
+ * The filter driver performs Copy-On-Read (COR) operations
+ *
+ * Copyright (c) 2018-2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *   Andrey Shinkevich 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#ifndef BLOCK_COPY_ON_READ
+#define BLOCK_COPY_ON_READ
+
+#include "block/block_int.h"
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs);
+
+#endif /* BLOCK_COPY_ON_READ */
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f2d3..618c4c4f43 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -23,11 +23,20 @@
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
+#include "block/copy-on-read.h"
+
+
+typedef struct BDRVStateCOR {
+bool active;
+} BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
+BDRVStateCOR *state = bs->opaque;
+
 bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
false, errp);
@@ -42,6 +51,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
 bs->file->bs->supported_zero_flags);
 
+state->active = true;
+
+/*
+ * We don't need to call bdrv_child_refresh_perms() now as the permissions
+ * will be updated later when the filter node gets its parent.
+ */
+
 return 0;
 }
 
@@ -57,6 +73,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild 
*c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
+BDRVStateCOR *s = bs->opaque;
+
+if (!s->active) {
+/*
+ * While the filter is being removed
+ */
+*nperm = 0;
+*nshared = BLK_PERM_ALL;
+return;
+}
+
 *nperm = perm & PERM_PASSTHROUGH;
 *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -135,6 +162,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool 
locked)
 
 static BlockDriver bdrv_copy_on_read = {
 .format_name= "copy-on-read",
+.instance_size  = sizeof(BDRVStateCOR),
 
 .bdrv_open  = cor_open,
 .bdrv_child_perm= cor_child_perm,
@@ -154,6 +182,34 @@ static BlockDriver bdrv_copy_on_read = {
 .is_filter  = true,
 };
 
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
+{
+BdrvChild *child;
+BlockDriverState *bs;
+BDRVStateCOR *s = cor_filter_bs->opaque;
+
+child = bdrv_filter_child(cor_filter_bs);
+if (!child) {
+return;
+}
+bs = child->bs;
+
+/* Retain the BDS until we complete the graph change. */
+bdrv_ref(bs);
+/* Hold a guest back from writing while permissions are being reset. */
+bdrv_drained_begin(bs);
+/* Drop permissions before the graph change. */
+s->active = false;
+bdrv_child_refresh_perms(cor_filter_bs, child, &error_abort);
+bdrv_replace_node(cor_filter_bs, bs, &error_abort);
+
+bdrv_drained_end(bs);
+bdrv_unref(bs);
+bdrv_unref(cor_filter_bs);
+}
+
+
 static void bdrv_copy_on_read_init(void)
 {
 bdrv_register(&bdrv_copy_on_read);
-- 
2.21.3

[PATCH v14 06/13] iotests: add #310 to test bottom node in COR driver

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

The test case #310 is similar to #216 by Max Reitz. The difference is
that test #310 passes a bottom node to the COR filter driver.
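
The outcome the test expects can be reproduced with a small Python model. It is a sketch under simplifying assumptions: each image is a dict mapping a 1 MiB cluster index to the pattern byte written there, unallocated clusters read as zero, and read_through_filter() is a hypothetical stand-in for a read issued through the COR filter with bottom set to the middle image.

```python
def read_through_filter(top, upto_bottom, below_bottom, cluster):
    """Read one cluster; copy it into `top` only if found at or above bottom."""
    if cluster in top:
        return top[cluster]
    for layer in upto_bottom:          # layers below top, bottom included
        if cluster in layer:
            top[cluster] = layer[cluster]   # copy-on-read
            return layer[cluster]
    for layer in below_bottom:         # readable, but never copied
        if cluster in layer:
            return layer[cluster]
    return 0


# Image layout used by the test: pattern byte per 1 MiB cluster.
base = {0: 1, 3: 1}
mid = {2: 3, 4: 3}
top = {1: 2}

for cluster in range(5):               # corresponds to "read 0 5M"
    read_through_filter(top, [mid], [base], cluster)

# What top contains on its own once the lower layers are discarded.
standalone = [top.get(cluster, 0) for cluster in range(5)]
```

The standalone list matches the patterns that the final read commands of the test check: clusters copied from mid are present, clusters that exist only in base are not.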

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/310 | 114 +
 tests/qemu-iotests/310.out |  15 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 130 insertions(+)
 create mode 100755 tests/qemu-iotests/310
 create mode 100644 tests/qemu-iotests/310.out

diff --git a/tests/qemu-iotests/310 b/tests/qemu-iotests/310
new file mode 100755
index 00..c8b34cd887
--- /dev/null
+++ b/tests/qemu-iotests/310
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+#
+# Copy-on-read tests using a COR filter with a bottom node
+#
+# Copyright (C) 2018 Red Hat, Inc.
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import iotests
+from iotests import log, qemu_img, qemu_io_silent
+
+# Need backing file support
+iotests.script_initialize(supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk'],
+  supported_platforms=['linux'])
+
+log('')
+log('=== Copy-on-read across nodes ===')
+log('')
+
+# This test is similar to the 216 one by Max Reitz 
+# The difference is that this test case involves a bottom node to the
+# COR filter driver.
+
+with iotests.FilePath('base.img') as base_img_path, \
+ iotests.FilePath('mid.img') as mid_img_path, \
+ iotests.FilePath('top.img') as top_img_path, \
+ iotests.VM() as vm:
+
+log('--- Setting up images ---')
+log('')
+
+assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+assert qemu_io_silent(base_img_path, '-c', 'write -P 1 0M 1M') == 0
+assert qemu_io_silent(base_img_path, '-c', 'write -P 1 3M 1M') == 0
+assert qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
+'-F', iotests.imgfmt, mid_img_path) == 0
+assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 2M 1M') == 0
+assert qemu_io_silent(mid_img_path,  '-c', 'write -P 3 4M 1M') == 0
+assert qemu_img('create', '-f', iotests.imgfmt, '-b', mid_img_path,
+'-F', iotests.imgfmt, top_img_path) == 0
+assert qemu_io_silent(top_img_path,  '-c', 'write -P 2 1M 1M') == 0
+
+#  0 1 2 3 4
+# top2
+# mid  3   3
+# base 1 1
+
+log('Done')
+
+log('')
+log('--- Doing COR ---')
+log('')
+
+vm.launch()
+
+log(vm.qmp('blockdev-add',
+   node_name='node0',
+   driver='copy-on-read',
+   bottom='node2',
+   file={
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': top_img_path
+   },
+   'backing': {
+   'node-name': 'node2',
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': mid_img_path
+   },
+   'backing': {
+   'driver': iotests.imgfmt,
+   'file': {
+   'driver': 'file',
+   'filename': base_img_path
+   }
+   },
+   }
+   }))
+
+# Trigger COR
+log(vm.qmp('human-monitor-command',
+   command_line='qemu-io node0 "read 0 5M"'))
+
+vm.shutdown()
+
+log('')
+log('--- Checking COR result ---')
+log('')
+
+assert qemu_io_silent(base_img_path, '-c', 'discard 0 4M') == 0
+assert qemu_io_silent(mid_img_path, '-c', 'discard 0M 5M') == 0
+assert qemu_io_silent(top_img_path,  '-c', 'read -P 0 0 1M') == 0
+assert qemu_io_silent(top_img_path,  '-c', 'read -P 2 1M 1M') == 0
+assert qemu_io_silent(top_img_path,  '-c', 'read -P 3 2M 1M') == 0
+assert qemu_io_silent(top_img_path,  '-c', 'read -P 0 3M 1M') == 0
+assert qemu_io_silent(top_img_path,  '-c', 'read -P 3 4M 1M') == 0
+
+log('Done')
diff --git a/tests/qemu-iotests/310.out b/tests/qemu-iotests/310.out
new file mode 100644
index 00..a70aa5cdae
--- /dev/null
+++ b/tests/qemu-iotests/310.out
@@ -0,0 +1,15 @@
+
+=== Copy-on-read

[PATCH v14 01/13] copy-on-read: support preadv/pwritev_part functions

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

From: Andrey Shinkevich 

Add support for the recently introduced functions bdrv_co_preadv_part()
and bdrv_co_pwritev_part() to the COR-filter driver.
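
As a loose analogy for what the extra qiov_offset parameter of the "_part" variants means, the Python sketch below reads into a caller-provided buffer starting at a given offset inside that buffer. The function and its arguments are hypothetical and use plain byte buffers instead of QEMUIOVector.

```python
def preadv_part(image, offset, nbytes, buf, buf_offset):
    """Copy nbytes from image[offset:] into buf, starting at buf_offset."""
    buf[buf_offset:buf_offset + nbytes] = image[offset:offset + nbytes]
```

The caller can thus fill one large buffer piece by piece without building a new vector for every sub-request.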

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-on-read.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 2816e61afe..cb03e0f2d3 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs)
 }
 
 
-static int coroutine_fn cor_co_preadv(BlockDriverState *bs,
-  uint64_t offset, uint64_t bytes,
-  QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
+   uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov,
+   size_t qiov_offset,
+   int flags)
 {
-return bdrv_co_preadv(bs->file, offset, bytes, qiov,
-  flags | BDRV_REQ_COPY_ON_READ);
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags | BDRV_REQ_COPY_ON_READ);
 }
 
 
-static int coroutine_fn cor_co_pwritev(BlockDriverState *bs,
-   uint64_t offset, uint64_t bytes,
-   QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
+uint64_t offset,
+uint64_t bytes,
+QEMUIOVector *qiov,
+size_t qiov_offset, int flags)
 {
-
-return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
+flags);
 }
 
 
@@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = {
 
 .bdrv_getlength = cor_getlength,
 
-.bdrv_co_preadv = cor_co_preadv,
-.bdrv_co_pwritev= cor_co_pwritev,
+.bdrv_co_preadv_part= cor_co_preadv_part,
+.bdrv_co_pwritev_part   = cor_co_pwritev_part,
 .bdrv_co_pwrite_zeroes  = cor_co_pwrite_zeroes,
 .bdrv_co_pdiscard   = cor_co_pdiscard,
 .bdrv_co_pwritev_compressed = cor_co_pwritev_compressed,
-- 
2.21.3

[PATCH v14 00/13] Apply COR-filter to the block-stream permanently

2020-12-04 Thread Vladimir Sementsov-Ogievskiy

Hi all!

I decided to post v14 myself, to show how to keep the test with parallel
stream jobs.

So, the main addition in v14 is the "bottom" argument for the stream job.
Next week I'll send a follow-up that deprecates the old "base" API.

Also, I have already finished my work on updating permissions, so that we
don't need ".active"-like things to add/drop filters. Still, as v14 is
a lot bigger than v2, and I believe that this v14 is closer to being
merged, I'd better resend my
  "[PATCH v2 00/36] block: update graph permissions update", basing it on
this v14 (and reworking filter drop/remove).

05: fix bdrv_is_allocated_above() usage
09: merge change from a later commit which uses unfiltered_bs for
    bdrv_find_backing_image
10: new
11: new
12: new, split from the last commit to simplify it a bit
13: - always pass the bottom option to the filter (who knows, maybe base
      will appear during the job)
    - keep test_stream_parallel in iotest 30
    - rework changes in iotest 245

Andrey Shinkevich (10):
  copy-on-read: support preadv/pwritev_part functions
  block: add API function to insert a node
  copy-on-read: add filter drop function
  qapi: add filter-node-name to block-stream
  qapi: create BlockdevOptionsCor structure for COR driver
  iotests: add #310 to test bottom node in COR driver
  block: include supported_read_flags into BDS structure
  copy-on-read: skip non-guest reads if no copy needed
  stream: skip filters when writing backing file name to QCOW2 header
  block: apply COR-filter to block-stream jobs

Vladimir Sementsov-Ogievskiy (3):
  qapi: block-stream: add "bottom" argument
  iotests: 30: prepare to COR filter insertion by stream job
  block/stream: add s->target_bs

 qapi/block-core.json   |  35 ++-
 block/copy-on-read.h   |  32 +++
 include/block/block.h  |  10 +-
 include/block/block_int.h  |  12 ++-
 block.c|  25 +
 block/copy-on-read.c   | 143 ---
 block/io.c |  12 ++-
 block/monitor/block-hmp-cmds.c |   7 +-
 block/stream.c | 170 +++--
 blockdev.c |  71 ++
 tests/qemu-iotests/030 |  12 ++-
 tests/qemu-iotests/141.out |   2 +-
 tests/qemu-iotests/245 |  20 ++--
 tests/qemu-iotests/310 | 114 ++
 tests/qemu-iotests/310.out |  15 +++
 tests/qemu-iotests/group   |   1 +
 16 files changed, 574 insertions(+), 107 deletions(-)
 create mode 100644 block/copy-on-read.h
 create mode 100755 tests/qemu-iotests/310
 create mode 100644 tests/qemu-iotests/310.out

-- 
2.21.3
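
For illustration, a QMP invocation using the new "bottom" argument could be
built as in the sketch below. This is a minimal sketch, not taken from the
series: the job id, the node names and the helper make_block_stream_cmd() are
hypothetical, and treating "bottom" and "base" as mutually exclusive is an
assumption based on "bottom" being meant as the replacement for the old "base"
API.

```python
import json


def make_block_stream_cmd(job_id, device, bottom=None, base=None):
    """Build a QMP 'block-stream' command as a dict (illustrative only)."""
    # Assumption: 'bottom' and 'base' describe the end of the streamed
    # chain in two different ways, so only one of them may be given.
    if bottom is not None and base is not None:
        raise ValueError("'bottom' and 'base' are mutually exclusive")
    args = {"job-id": job_id, "device": device}
    if bottom is not None:
        args["bottom"] = bottom
    if base is not None:
        args["base"] = base
    return {"execute": "block-stream", "arguments": args}


if __name__ == "__main__":
    # Hypothetical node names for a chain top -> mid -> base.
    cmd = make_block_stream_cmd("stream0", "top", bottom="mid")
    print(json.dumps(cmd))
```

With "bottom", the caller names the last node that is streamed, so a base node
that appears or changes during the job does not affect the request.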

[PATCH] target/i386/sev: add the support to query the attestation report

2020-12-04 Thread Brijesh Singh

SEV FW >= 0.23 added a new command that can be used to query the
attestation report, which contains the SHA-256 digest of the guest memory
and VMSA encrypted with LAUNCH_UPDATE and is signed with the PEK.

Note, we already have a command (LAUNCH_MEASURE) that can be used to
query the SHA-256 digest of the guest memory encrypted through
LAUNCH_UPDATE. The main difference between the previous command and this
one is that the report is signed with the PEK and, unlike the LAUNCH_MEASURE
command, the ATTESTATION_REPORT command can be called while the guest
is running.

Add a QMP interface "query-sev-attestation-report" that can be used
to get the report encoded in base64.

Cc: James Bottomley 
Cc: Tom Lendacky 
Cc: Eric Blake 
Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 linux-headers/linux/kvm.h |  8 ++
 qapi/misc-target.json | 38 +++
 target/i386/monitor.c |  6 +
 target/i386/sev-stub.c|  7 +
 target/i386/sev.c | 54 +++
 target/i386/sev_i386.h|  2 ++
 6 files changed, 115 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 56ce14ad20..6d0f8101ba 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1585,6 +1585,8 @@ enum sev_cmd_id {
KVM_SEV_DBG_ENCRYPT,
/* Guest certificates commands */
KVM_SEV_CERT_EXPORT,
+   /* Attestation report */
+   KVM_SEV_GET_ATTESTATION_REPORT,
 
KVM_SEV_NR_MAX,
 };
@@ -1637,6 +1639,12 @@ struct kvm_sev_dbg {
__u32 len;
 };
 
+struct kvm_sev_attestation_report {
+   __u8 mnonce[16];
+   __u64 uaddr;
+   __u32 len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 1e561fa97b..ec6565e6ef 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -267,3 +267,41 @@
 ##
 { 'command': 'query-gic-capabilities', 'returns': ['GICCapability'],
   'if': 'defined(TARGET_ARM)' }
+
+
+##
+# @SevAttestationReport:
+#
+# The struct describes attestation report for a Secure Encrypted Virtualization
+# feature.
+#
+# @data:  guest attestation report (base64 encoded)
+#
+#
+# Since: 5.2
+##
+{ 'struct': 'SevAttestationReport',
+  'data': { 'data': 'str'},
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @query-sev-attestation-report:
+#
+# This command is used to get the SEV attestation report, and is supported on AMD
+# X86 platforms only.
+#
+# @mnonce: a random 16 bytes of data (it will be included in report)
+#
+# Returns: SevAttestationReport objects.
+#
+# Since: 5.2
+#
+# Example:
+#
+# -> { "execute" : "query-sev-attestation-report", "arguments": { "mnonce": "aaa" } }
+# <- { "return" : { "data": "bbbd"} }
+#
+##
+{ 'command': 'query-sev-attestation-report', 'data': { 'mnonce': 'str' },
+  'returns': 'SevAttestationReport',
+  'if': 'defined(TARGET_I386)' }
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 9f9e1c42f4..a4b65f330c 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -729,3 +729,9 @@ SevCapability *qmp_query_sev_capabilities(Error **errp)
 {
 return sev_get_capabilities(errp);
 }
+
+SevAttestationReport *
+qmp_query_sev_attestation_report(const char *mnonce, Error **errp)
+{
+return sev_get_attestation_report(mnonce, errp);
+}
diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
index 88e3f39a1e..66d16f53d8 100644
--- a/target/i386/sev-stub.c
+++ b/target/i386/sev-stub.c
@@ -49,3 +49,10 @@ SevCapability *sev_get_capabilities(Error **errp)
 error_setg(errp, "SEV is not available in this QEMU");
 return NULL;
 }
+
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+error_setg(errp, "SEV is not available in this QEMU");
+return NULL;
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 93c4d60b82..28958fb71b 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -68,6 +68,7 @@ struct SevGuestState {
 
 #define DEFAULT_GUEST_POLICY0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
+#define DEFAULT_ATTESATION_REPORT_BUF_SIZE  4096
 
 static SevGuestState *sev_guest;
 static Error *sev_mig_blocker;
@@ -490,6 +491,59 @@ out:
 return cap;
 }
 
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+struct kvm_sev_attestation_report input = {};
+SevGuestState *sev = sev_guest;
+SevAttestationReport *report;
+guchar *data;
+int err = 0, ret;
+
+if (!sev_enabled()) {
+error_setg(errp, "SEV is not enabled");
+return NULL;
+}
+
+/* Verify that user provided random data length */
+if (strlen(mnonce) != sizeof(input.mnonce)) {
+error_setg(errp, "Expected mnonce data len %ld got %ld",
+sizeof(input.mnonce), strlen(mnonce));
+
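
As a client-side illustration of the interface described above, the sketch
below builds a "query-sev-attestation-report" command and decodes the base64
reply. It is a minimal sketch under stated assumptions: the helper names
(make_mnonce, make_query, decode_report) are hypothetical, and the
16-character requirement simply mirrors the strlen() check against
sizeof(input.mnonce) visible in this version of the patch.

```python
import base64
import secrets
import string

MNONCE_LEN = 16  # mirrors sizeof(input.mnonce) in the patch


def make_mnonce():
    """Return a random 16-character ASCII string usable as 'mnonce'."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(MNONCE_LEN))


def make_query(mnonce):
    """Build the QMP command, applying the same length check as the patch."""
    if len(mnonce) != MNONCE_LEN:
        raise ValueError(
            f"Expected mnonce data len {MNONCE_LEN} got {len(mnonce)}")
    return {"execute": "query-sev-attestation-report",
            "arguments": {"mnonce": mnonce}}


def decode_report(reply):
    """Decode the base64 'data' member of a SevAttestationReport reply."""
    return base64.b64decode(reply["return"]["data"])
```

Checking the length on the client side only avoids a round trip; the QEMU-side
check remains authoritative.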

Re: [PATCH v13 00/10] Apply COR-filter to the block-stream permanently

2020-12-04 Thread Vladimir Sementsov-Ogievskiy


I still think we should keep the dropped iotest by introducing a "bottom" interface
for the stream job, and deprecate the old interfaces.
A patch is better than arguing, so I decided to try it myself. I'm now close to
completing v14, so I will send it soon.

02.12.2020 21:30, Andrey Shinkevich wrote:

The previous version 12 was discussed in the email thread:
Message-Id: <1603390423-980205-1-git-send-email-andrey.shinkev...@virtuozzo.com>

v13:
   02: The bdrv_remove_node() was dropped.
   05: Three patches with fixes were merged into one.
   06: Minor changes based on Vladimir's suggestions.
   08: Three patches with fixes were merged into one.
   09: The search for format_name of backing file was added.
   10: The flag BLK_PERM_GRAPH_MOD was removed.

Andrey Shinkevich (10):
   copy-on-read: support preadv/pwritev_part functions
   block: add API function to insert a node
   copy-on-read: add filter drop function
   qapi: add filter-node-name to block-stream
   qapi: create BlockdevOptionsCor structure for COR driver
   iotests: add #310 to test bottom node in COR driver
   block: include supported_read_flags into BDS structure
   copy-on-read: skip non-guest reads if no copy needed
   stream: skip filters when writing backing file name to QCOW2 header
   block: apply COR-filter to block-stream jobs

  block.c|  25 +++
  block/copy-on-read.c   | 143 +
  block/copy-on-read.h   |  32 +
  block/io.c |  12 +++-
  block/monitor/block-hmp-cmds.c |   4 +-
  block/stream.c | 120 +++---
  blockdev.c |  12 ++--
  include/block/block.h  |  10 ++-
  include/block/block_int.h  |  11 +++-
  qapi/block-core.json   |  27 +++-
  tests/qemu-iotests/030 |  51 ++-
  tests/qemu-iotests/030.out |   4 +-
  tests/qemu-iotests/141.out |   2 +-
  tests/qemu-iotests/245 |  22 +--
  tests/qemu-iotests/310 | 114 
  tests/qemu-iotests/310.out |  15 +
  tests/qemu-iotests/group   |   1 +
  17 files changed, 484 insertions(+), 121 deletions(-)
  create mode 100644 block/copy-on-read.h
  create mode 100755 tests/qemu-iotests/310
  create mode 100644 tests/qemu-iotests/310.out




--
Best regards,
Vladimir

Re: [PATCH qemu v10] spapr: Implement Open Firmware client interface

2020-12-04 Thread Greg Kurz

On Fri, 4 Dec 2020 19:32:05 +0100
Greg Kurz  wrote:
> 
> That's all for now.
> 

Just one last item. I'm observing failures with nvram in the guest:

[root@vir76 ~]# nvram --print-config
[   88.179444] nvram[936]: unhandled signal 11 at 7fffc83a nip 00012d802110 lr 00012d802118 code 1
Segmentation fault (core dumped)

Haven't tried to figure out why yet.

> Cheers,
> 
> --
> Greg

Re: [PATCH 2/2] nbd/server: Quiesce coroutines on context switch

2020-12-04 Thread Eric Blake

On 12/4/20 10:53 AM, Sergio Lopez wrote:
> When switching between AIO contexts we need to make sure that both
> recv_coroutine and send_coroutine are not scheduled to run. Otherwise,
> QEMU may crash while attaching the new context with an error like
> this one:
> 
> aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'
> 
> To achieve this we need a local implementation of
> 'qio_channel_readv_all_eof' named 'nbd_read_eof' (a trick already done
> by 'nbd/client.c') that allows us to interrupt the operation and to
> know when recv_coroutine is yielding.
> 
> With this in place, we delegate detaching the AIO context to the
> owning context with a BH ('nbd_aio_detach_bh') scheduled using
> 'aio_wait_bh_oneshot'. This BH signals that we need to quiesce the
> channel by setting 'client->quiescing' to 'true', and either waits for
> the coroutine to finish using AIO_WAIT_WHILE or, if it's yielding in
> 'nbd_read_eof', actively enters the coroutine to interrupt it.
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900326
> Signed-off-by: Sergio Lopez 
> ---
>  nbd/server.c | 120 +--
>  1 file changed, 106 insertions(+), 14 deletions(-)

A complex patch, so I'd appreciate a second set of eyes.

> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 613ed2634a..7229f487d2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -132,6 +132,9 @@ struct NBDClient {
>  CoMutex send_lock;
>  Coroutine *send_coroutine;
>  
> +bool read_yielding;
> +bool quiescing;

Will either of these fields need to be accessed atomically once the
'yank' code is added, or are we still safe with direct access because
coroutines are not multithreaded?

> +
>  QTAILQ_ENTRY(NBDClient) next;
>  int nb_requests;
>  bool closing;
> @@ -1352,14 +1355,60 @@ static coroutine_fn int nbd_negotiate(NBDClient *client, Error **errp)
>  return 0;
>  }
>  
> -static int nbd_receive_request(QIOChannel *ioc, NBDRequest *request,
> +/* nbd_read_eof
> + * Tries to read @size bytes from @ioc. This is a local implementation of
> + * qio_channel_readv_all_eof. We have it here because we need it to be
> + * interruptible and to know when the coroutine is yielding.
> + * Returns 1 on success
> + * 0 on eof, when no data was read (errp is not set)
> + * negative errno on failure (errp is set)
> + */
> +static inline int coroutine_fn
> +nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
> +{
> +bool partial = false;
> +
> +assert(size);
> +while (size > 0) {
> +struct iovec iov = { .iov_base = buffer, .iov_len = size };
> +ssize_t len;
> +
> +len = qio_channel_readv(client->ioc, &iov, 1, errp);
> +if (len == QIO_CHANNEL_ERR_BLOCK) {
> +client->read_yielding = true;
> +qio_channel_yield(client->ioc, G_IO_IN);
> +client->read_yielding = false;

nbd/client.c:nbd_read_eof() uses bdrv_dec/inc_in_flight instead of
read_yielding...

> +if (client->quiescing) {
> +return -EAGAIN;
> +}

and the quiescing check is new; otherwise, these two functions look
identical.  Having two static functions with the same name makes gdb a
bit more annoying (which one of the two did you want your breakpoint
on?).  Is there any way we could write this code only once in
nbd/common.c for reuse by both client and server?  But I can live with
it as written.

> @@ -2151,20 +2223,23 @@ static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
>  
>  /* nbd_co_receive_request
>   * Collect a client request. Return 0 if request looks valid, -EIO to drop
> - * connection right away, and any other negative value to report an error to
> - * the client (although the caller may still need to disconnect after reporting
> - * the error).
> + * connection right away, -EAGAIN to indicate we were interrupted and the
> + * channel should be quiesced, and any other negative value to report an error
> + * to the client (although the caller may still need to disconnect after
> + * reporting the error).
>   */
>  static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
>Error **errp)
>  {
>  NBDClient *client = req->client;
>  int valid_flags;
> +int ret;
>  
>  g_assert(qemu_in_coroutine());
>  assert(client->recv_coroutine == qemu_coroutine_self());
> -if (nbd_receive_request(client->ioc, request, errp) < 0) {
> -return -EIO;
> +ret = nbd_receive_request(client, request, errp);
> +if (ret < 0) {
> +return  ret;

Why the double space?

The old code slams to EIO, you preserve errors.  Is that going to bite
us by causing us to see a different errno leaked through?

>  }
>  
>  trace_nbd_co_receive_request_decode_type(request->handle, request->type,
> @@ -2507,6 +2582,17 @@ static coroutine_fn void nbd_trip(void *opaque)
>
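
The return convention documented for nbd_read_eof() in the quoted patch
(1 on success, 0 on EOF before any data, negative errno on failure, with
-EAGAIN added for quiescing) can be sketched outside QEMU as follows. This is
only an illustration: read_eof(), read_chunk and is_quiescing are hypothetical
names, a would-block read is modelled by returning None instead of yielding a
coroutine, and returning -EIO for an EOF in the middle of the data is an
assumption, since that part of the function is not shown above.

```python
import errno


def read_eof(read_chunk, size, is_quiescing=lambda: False):
    """Try to read exactly `size` bytes.

    read_chunk(n) returns bytes (b"" on EOF) or None when it would block.
    Returns (1, data) on success, (0, b"") on EOF before any data,
    (-EAGAIN, partial) if interrupted for quiescing, and (-EIO, partial)
    on EOF in the middle of the data.
    """
    assert size > 0
    buf = bytearray()
    while len(buf) < size:
        chunk = read_chunk(size - len(buf))
        if chunk is None:
            # The C code yields here; after waking up it checks 'quiescing'.
            if is_quiescing():
                return -errno.EAGAIN, bytes(buf)
            continue
        if chunk == b"":
            if buf:
                return -errno.EIO, bytes(buf)  # unexpected EOF mid-read
            return 0, b""
        buf += chunk
    return 1, bytes(buf)
```

A caller that previously mapped every failure to -EIO now has to tell -EAGAIN
apart from the other negative values, which is the point raised in the review.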

Re: [PATCH v4 04/11] hvf: Introduce hvf vcpu struct

2020-12-04 Thread Alex Bennée



Alexander Graf  writes:

> We will need more than a single field for hvf going forward. To keep
> the global vcpu struct uncluttered, let's allocate a special hvf vcpu
> struct, similar to how hax does it.
>
> Signed-off-by: Alexander Graf 
> Reviewed-by: Roman Bolshakov 
> Tested-by: Roman Bolshakov 
> ---
>  accel/hvf/hvf-cpus.c|   8 +-
>  include/hw/core/cpu.h   |   3 +-
>  include/sysemu/hvf_int.h|   4 +
>  target/i386/hvf/hvf.c   | 102 +-
>  target/i386/hvf/vmx.h   |  24 +++--
>  target/i386/hvf/x86.c   |  28 ++---
>  target/i386/hvf/x86_descr.c |  26 ++---
>  target/i386/hvf/x86_emu.c   |  62 +--
>  target/i386/hvf/x86_mmu.c   |   4 +-
>  target/i386/hvf/x86_task.c  |  12 +--
>  target/i386/hvf/x86hvf.c| 210 ++--
>  11 files changed, 247 insertions(+), 236 deletions(-)
>
> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
> index 60f6d76bf3..1b0c868944 100644
> --- a/accel/hvf/hvf-cpus.c
> +++ b/accel/hvf/hvf-cpus.c
> @@ -312,10 +312,12 @@ static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>  
>  static void hvf_vcpu_destroy(CPUState *cpu)
>  {
> -hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
> +hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd);
>  assert_hvf_ok(ret);
>  
>  hvf_arch_vcpu_destroy(cpu);
> +free(cpu->hvf);

You should pair g_malloc0 with g_free.

> +cpu->hvf = NULL;
>  }
>  
>  static void dummy_signal(int sig)
> @@ -326,6 +328,8 @@ static int hvf_init_vcpu(CPUState *cpu)
>  {
>  int r;
>  
> +cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
> +

Otherwise so far, so mechanical ;-)

Reviewed-by: Alex Bennée 


-- 
Alex Bennée

Re: [PATCH v3 08/10] arm/hvf: Add a WFI handler

2020-12-04 Thread Roman Bolshakov

On Thu, Dec 03, 2020 at 10:18:14AM -0800, Peter Collingbourne wrote:
> On Thu, Dec 3, 2020 at 2:39 AM Roman Bolshakov  wrote:
> >
> > On Wed, Dec 02, 2020 at 08:04:06PM +0100, Alexander Graf wrote:
> > > From: Peter Collingbourne 
> > >
> > > Sleep on WFI until the VTIMER is due but allow ourselves to be woken
> > > up on IPI.
> > >
> > > Signed-off-by: Peter Collingbourne 
> > > [agraf: Remove unused 'set' variable, always advance PC on WFX trap]
> > > Signed-off-by: Alexander Graf 
> > > ---
> > > +static void hvf_wait_for_ipi(CPUState *cpu, struct timespec *ts)
> > > +{
> > > +/*
> > > + * Use pselect to sleep so that other threads can IPI us while we're
> > > + * sleeping.
> > > + */
> > > +qatomic_mb_set(&cpu->thread_kicked, false);
> > > +qemu_mutex_unlock_iothread();
> >
> > I raised a concern earlier, but I don't know for sure if a kick could be lost
> > right here. On x86 it could be lost.
> 
> If the signal is sent right before the pselect() it will be blocked
> i.e. left pending. With the pselect() we get an atomic unblock of
> SIG_IPI at the same time as we begin sleeping, which means that we
> will receive the signal and leave the pselect() immediately.
> 
> I think at some point macOS had an incorrect implementation of
> pselect() where the signal mask was non-atomically set in userspace
> which could lead to the signal being missed but I checked the latest
> XNU sources and it looks like the pselect() implementation has been
> moved to the kernel.
> 

Yeah, you're right here.

> > > +pselect(0, 0, 0, 0, ts, &cpu->hvf->unblock_ipi_mask);
> > > +qemu_mutex_lock_iothread();
> > > +}
> > > +
> > >  int hvf_vcpu_exec(CPUState *cpu)
> > >  {
> > >  ARMCPU *arm_cpu = ARM_CPU(cpu);
> > > @@ -579,6 +594,46 @@ int hvf_vcpu_exec(CPUState *cpu)
> > >  }
> > >  case EC_WFX_TRAP:
> > >  advance_pc = true;
> > > +if (!(syndrome & WFX_IS_WFE) && !(cpu->interrupt_request &
> > > +(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > > +
> > > +uint64_t ctl;
> > > +r = hv_vcpu_get_sys_reg(cpu->hvf->fd, HV_SYS_REG_CNTV_CTL_EL0,
> > > +&ctl);
> > > +assert_hvf_ok(r);
> > > +
> > > +if (!(ctl & 1) || (ctl & 2)) {
> > > +/* Timer disabled or masked, just wait for an IPI. */
> > > +hvf_wait_for_ipi(cpu, NULL);
> > > +break;
> > > +}
> > > +
> > > +uint64_t cval;
> > > +r = hv_vcpu_get_sys_reg(cpu->hvf->fd, HV_SYS_REG_CNTV_CVAL_EL0,
> > > +&cval);
> > > +assert_hvf_ok(r);
> > > +
> > > +int64_t ticks_to_sleep = cval - mach_absolute_time();
> >
> >
> > Apple's reference recommends using [1]:
> >
> >   clock_gettime_nsec_np(CLOCK_UPTIME_RAW)
> >
> > It, internally in Libc, invokes mach_absolute_time() [2].
> >
> > 1. https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time
> > 2. https://opensource.apple.com/source/Libc/Libc-1158.1.2/gen/clock_gettime.c.auto.html
> 
> I think that recommendation is because most people want to deal with
> seconds, not ticks. In our case we specifically want ticks because
> we're comparing against a ticks value from the guest, so I don't see
> the benefit of converting from ticks to seconds and back again.
> 

Thanks for the clarifications, Peter.

Regards,
Roman
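
To make the WFI decision discussed above concrete, here is a small sketch of
the same logic outside QEMU. It is an illustration under stated assumptions:
wfi_sleep_time() and its freq_hz parameter are hypothetical, the handling of
an already expired timer is a guess (the quoted hunk stops before that point),
and the bit names follow the architectural CNTV_CTL_EL0 layout (bit 0 ENABLE,
bit 1 IMASK) that the quoted "ctl & 1" and "ctl & 2" tests correspond to.

```python
NSEC_PER_SEC = 1_000_000_000

CNTV_CTL_ENABLE = 1 << 0  # timer enabled
CNTV_CTL_IMASK = 1 << 1   # timer interrupt masked


def wfi_sleep_time(ctl, cval, now_ticks, freq_hz):
    """Return None to wait for an IPI only, else (sec, nsec) to sleep."""
    if not (ctl & CNTV_CTL_ENABLE) or (ctl & CNTV_CTL_IMASK):
        return None  # timer disabled or masked, just wait for an IPI
    # Stay in ticks while comparing against the guest's compare value.
    ticks_to_sleep = cval - now_ticks
    if ticks_to_sleep < 0:
        return (0, 0)  # assumption: an expired timer means no sleep
    # Convert to a timespec-like pair only once, at the very end.
    sec, rem = divmod(ticks_to_sleep, freq_hz)
    return (sec, rem * NSEC_PER_SEC // freq_hz)
```

For example, with a hypothetical 24 MHz counter, a compare value 36,000,000
ticks in the future yields (1, 500000000), i.e. 1.5 seconds.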

Re: [RFC v7 15/22] cpu: Move tlb_fill to tcg_ops

2020-12-04 Thread Claudio Fontana

On 12/4/20 6:37 PM, Eduardo Habkost wrote:
> On Fri, Dec 04, 2020 at 06:14:07PM +0100, Philippe Mathieu-Daudé wrote:
>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
>>> From: Eduardo Habkost 
>>>
>>> Signed-off-by: Eduardo Habkost 
>>> ---
>>>  accel/tcg/cputlb.c  |  6 +++---
>>>  accel/tcg/user-exec.c   |  6 +++---
>>>  include/hw/core/cpu.h   |  9 -
>>>  include/hw/core/tcg-cpu-ops.h   | 12 
>>>  target/alpha/cpu.c  |  2 +-
>>>  target/arm/cpu.c|  2 +-
>>>  target/avr/cpu.c|  2 +-
>>>  target/cris/cpu.c   |  2 +-
>>>  target/hppa/cpu.c   |  2 +-
>>>  target/i386/tcg-cpu.c   |  2 +-
>>>  target/lm32/cpu.c   |  2 +-
>>>  target/m68k/cpu.c   |  2 +-
>>>  target/microblaze/cpu.c |  2 +-
>>>  target/mips/cpu.c   |  2 +-
>>>  target/moxie/cpu.c  |  2 +-
>>>  target/nios2/cpu.c  |  2 +-
>>>  target/openrisc/cpu.c   |  2 +-
>>>  target/ppc/translate_init.c.inc |  2 +-
>>>  target/riscv/cpu.c  |  2 +-
>>>  target/rx/cpu.c |  2 +-
>>>  target/s390x/cpu.c  |  2 +-
>>>  target/sh4/cpu.c|  2 +-
>>>  target/sparc/cpu.c  |  2 +-
>>>  target/tilegx/cpu.c |  2 +-
>>>  target/tricore/cpu.c|  2 +-
>>>  target/unicore32/cpu.c  |  2 +-
>>>  target/xtensa/cpu.c |  2 +-
>>>  27 files changed, 41 insertions(+), 38 deletions(-)
>>
>> With cc->tcg_ops.* guarded with #ifdef CONFIG_TCG:
>> Reviewed-by: Philippe Mathieu-Daudé 
> 
> Thanks!
> 
> Are the #ifdefs a hard condition for your Reviewed-by?
> 
> Even if we agree #ifdef CONFIG_TCG is the way to go, I don't
> think this should block a series that's a step in the right
> direction.  It can be done in a separate patch.
> 
> (Unless the lack of #ifdef introduces regressions, of course)
> 

Hi,

I would add ifdefs to all targets that are not TCG-only (for now).

If a target is tcg-only, there is of course no point in adding ifdefs.

For the others, the ifdefs help us reorganize the code into separate blocks,
which we can then move to separate .c files, removing the ifdefs.

Ciao,

Claudio

Re: [PATCH v4 08/11] arm: Add Hypervisor.framework build target

2020-12-04 Thread Alex Bennée



Alexander Graf  writes:

> Now that we have all logic in place that we need to handle Hypervisor.framework
> on Apple Silicon systems, let's add CONFIG_HVF for aarch64 as well so that we
> can build it.
>
> Signed-off-by: Alexander Graf 
>
> ---
>
> v1 -> v2:
>
>   - Fix build on 32bit arm
>
> v3 -> v4:
>
>   - Remove i386-softmmu target
> ---
>  meson.build| 11 ++-
>  target/arm/hvf/meson.build |  3 +++
>  target/arm/meson.build |  2 ++
>  3 files changed, 15 insertions(+), 1 deletion(-)
>  create mode 100644 target/arm/hvf/meson.build
>
> diff --git a/meson.build b/meson.build
> index 86d433c8a4..a2323e8d23 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -74,16 +74,25 @@ else
>  endif
>  
>  accelerator_targets = { 'CONFIG_KVM': kvm_targets }
> +
> +if cpu in ['x86', 'x86_64']
> +  hvf_targets = ['x86_64-softmmu']
> +elif cpu in ['aarch64']
> +  hvf_targets = ['aarch64-softmmu']
> +else
> +  hvf_targets = []
> +endif
> +
>  if cpu in ['x86', 'x86_64', 'arm', 'aarch64']
># i368 emulator provides xenpv machine type for multiple architectures
>accelerator_targets += {
>  'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu'],
> +'CONFIG_HVF': hvf_targets,

I can see this logic continuing to get messier as I just hit a merge
conflict with my Xen on qemu-system-aarch64 patches. Not sure if there
is a cleaner approach though.

>}
>  endif
>  if cpu in ['x86', 'x86_64']
>accelerator_targets += {
>  'CONFIG_HAX': ['i386-softmmu', 'x86_64-softmmu'],
> -'CONFIG_HVF': ['x86_64-softmmu'],
>  'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
>}
>  endif
> diff --git a/target/arm/hvf/meson.build b/target/arm/hvf/meson.build
> new file mode 100644
> index 00..855e6cce5a
> --- /dev/null
> +++ b/target/arm/hvf/meson.build
> @@ -0,0 +1,3 @@
> +arm_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
> +  'hvf.c',
> +))
> diff --git a/target/arm/meson.build b/target/arm/meson.build
> index f5de2a77b8..95bebae216 100644
> --- a/target/arm/meson.build
> +++ b/target/arm/meson.build
> @@ -56,5 +56,7 @@ arm_softmmu_ss.add(files(
>'psci.c',
>  ))
>  
> +subdir('hvf')
> +
>  target_arch += {'arm': arm_ss}
>  target_softmmu_arch += {'arm': arm_softmmu_ss}


-- 
Alex Bennée

Re: [PATCH v2] tests/acceptance: test hot(un)plug of ccw devices

2020-12-04 Thread Wainer dos Santos Moschetta




On 12/4/20 11:08 AM, Cornelia Huck wrote:

On Fri, 4 Dec 2020 11:05:34 -0300
Wainer dos Santos Moschetta  wrote:


Hi,

On 12/4/20 9:14 AM, Cornelia Huck wrote:

Hotplug a virtio-net-ccw device, and then hotunplug it again.

Signed-off-by: Cornelia Huck 
---

v1->v2:
- switch device id
- clear out dmesg before looking for CRW messages

---
   tests/acceptance/machine_s390_ccw_virtio.py | 16 
   1 file changed, 16 insertions(+)

diff --git a/tests/acceptance/machine_s390_ccw_virtio.py b/tests/acceptance/machine_s390_ccw_virtio.py
index 53b8484f8f9c..83c00190621b 100644
--- a/tests/acceptance/machine_s390_ccw_virtio.py
+++ b/tests/acceptance/machine_s390_ccw_virtio.py
@@ -97,3 +97,19 @@ class S390CCWVirtioMachine(Test):
   exec_command_and_wait_for_pattern(self,
 'cat /sys/bus/pci/devices/000a\:00\:00.0/function_id',
 '0x000c')
+# add another device
+exec_command_and_wait_for_pattern(self, 'dmesg -c', ' ')


The problem is that `dmesg -c` will fail if you run the test as an
unprivileged user.

Hm, why should that make a difference for a guest command?



Never mind, my brain mixes up host and guest very often.

Reviewed-by: Wainer dos Santos Moschetta 





- Wainer


+self.vm.command('device_add', driver='virtio-net-ccw',
+devno='fe.0.4711', id='net_4711')
+exec_command_and_wait_for_pattern(self, 'dmesg', 'CRW')
+exec_command_and_wait_for_pattern(self, 'ls /sys/bus/ccw/devices/',
+  '0.0.4711')
+# and detach it again
+exec_command_and_wait_for_pattern(self, 'dmesg -c', ' ')
+self.vm.command('device_del', id='net_4711')
+self.vm.event_wait(name='DEVICE_DELETED',
+   match={'data': {'device': 'net_4711'}})
+exec_command_and_wait_for_pattern(self, 'dmesg', 'CRW')
+exec_command_and_wait_for_pattern(self,
+  'ls /sys/bus/ccw/devices/0.0.4711',
+  'No such file or directory')

Re: [RFC v7 12/22] cpu: Introduce TCGCpuOperations struct

2020-12-04 Thread Claudio Fontana

On 12/4/20 6:10 PM, Philippe Mathieu-Daudé wrote:
> On 11/30/20 3:35 AM, Claudio Fontana wrote:
>> From: Eduardo Habkost 
>>
>> The TCG-specific CPU methods will be moved to a separate struct,
>> to make it easier to move accel-specific code outside generic CPU
>> code in the future.  Start by moving tcg_initialize().
> 
> Good idea! One minor comment below.
> 
>>
>> The new CPUClass.tcg_ops field may eventually become a pointer,
>> but keep it an embedded struct for now, to make code conversion
>> easier.
>>
>> Signed-off-by: Eduardo Habkost 
>> ---
>>  MAINTAINERS |  1 +
>>  cpu.c   |  2 +-
>>  include/hw/core/cpu.h   |  9 -
>>  include/hw/core/tcg-cpu-ops.h   | 25 +
>>  target/alpha/cpu.c  |  2 +-
>>  target/arm/cpu.c|  2 +-
>>  target/avr/cpu.c|  2 +-
>>  target/cris/cpu.c   | 12 ++--
>>  target/hppa/cpu.c   |  2 +-
>>  target/i386/tcg-cpu.c   |  2 +-
>>  target/lm32/cpu.c   |  2 +-
>>  target/m68k/cpu.c   |  2 +-
>>  target/microblaze/cpu.c |  2 +-
>>  target/mips/cpu.c   |  2 +-
>>  target/moxie/cpu.c  |  2 +-
>>  target/nios2/cpu.c  |  2 +-
>>  target/openrisc/cpu.c   |  2 +-
>>  target/ppc/translate_init.c.inc |  2 +-
>>  target/riscv/cpu.c  |  2 +-
>>  target/rx/cpu.c |  2 +-
>>  target/s390x/cpu.c  |  2 +-
>>  target/sh4/cpu.c|  2 +-
>>  target/sparc/cpu.c  |  2 +-
>>  target/tilegx/cpu.c |  2 +-
>>  target/tricore/cpu.c|  2 +-
>>  target/unicore32/cpu.c  |  2 +-
>>  target/xtensa/cpu.c |  2 +-
>>  27 files changed, 63 insertions(+), 30 deletions(-)
>>  create mode 100644 include/hw/core/tcg-cpu-ops.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index f53f2678d8..d876f504a6 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -1535,6 +1535,7 @@ F: qapi/machine.json
>>  F: qapi/machine-target.json
>>  F: include/hw/boards.h
>>  F: include/hw/core/cpu.h
>> +F: include/hw/core/tcg-cpu-ops.h
>>  F: include/hw/cpu/cluster.h
>>  F: include/sysemu/numa.h
>>  T: git https://github.com/ehabkost/qemu.git machine-next
>> diff --git a/cpu.c b/cpu.c
>> index 0be5dcb6f3..d02c2a17f1 100644
>> --- a/cpu.c
>> +++ b/cpu.c
>> @@ -180,7 +180,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
>>  
>>  if (tcg_enabled() && !tcg_target_initialized) {
>>  tcg_target_initialized = true;
>> -cc->tcg_initialize();
>> +cc->tcg_ops.initialize();
>>  }
>>  tlb_init(cpu);
>>  
>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>> index 3d92c967ff..c93b08a0fb 100644
>> --- a/include/hw/core/cpu.h
>> +++ b/include/hw/core/cpu.h
>> @@ -76,6 +76,10 @@ typedef struct CPUWatchpoint CPUWatchpoint;
>>  
>>  struct TranslationBlock;
>>  
>> +#ifdef CONFIG_TCG
>> +#include "tcg-cpu-ops.h"
>> +#endif /* CONFIG_TCG */
>> +
>>  /**
>>   * CPUClass:
>>   * @class_by_name: Callback to map -cpu command line model name to an
>> @@ -221,12 +225,15 @@ struct CPUClass {
>>  
>>  void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
>>  vaddr (*adjust_watchpoint_address)(CPUState *cpu, vaddr addr, int len);
>> -void (*tcg_initialize)(void);
>>  
>>  const char *deprecation_note;
>>  /* Keep non-pointer data at the end to minimize holes.  */
>>  int gdb_num_core_regs;
>>  bool gdb_stop_before_watchpoint;
>> +
>> +#ifdef CONFIG_TCG
>> +TcgCpuOperations tcg_ops;
>> +#endif /* CONFIG_TCG */
>>  };
>>  
>>  /*
>> diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
>> new file mode 100644
>> index 00..4475ef0996
>> --- /dev/null
>> +++ b/include/hw/core/tcg-cpu-ops.h
>> @@ -0,0 +1,25 @@
>> +/*
>> + * TCG-Specific operations that are not meaningful for hardware accelerators
>> + *
>> + * Copyright 2020 SUSE LLC
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef TCG_CPU_OPS_H
>> +#define TCG_CPU_OPS_H
>> +
>> +/**
>> + * struct TcgCpuOperations: TCG operations specific to a CPU class
>> + */
>> +typedef struct TcgCpuOperations {
>> +/**
>> + * @initialize: Initialize TCG state
>> + *
>> + * Called when the first CPU is realized.
>> + */
>> +void (*initialize)(void);
>> +} TcgCpuOperations;
>> +
>> +#endif /* TCG_CPU_OPS_H */
> ...
> 
>> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
>> index 07492e9f9a..1fa9382a7c 100644
>> --- a/target/arm/cpu.c
>> +++ b/target/arm/cpu.c
>> @@ -2261,7 +2261,7 @@ static void arm_cpu_class_init(ObjectClass *oc, void *data)
>>  cc->gdb_stop_before_watchpoint = true;
>>  cc->disas_set_info = arm_disas_set_info;
>>  #ifdef CONFIG_TCG
>> -cc->tcg_initialize = arm_translate_init;
>> +

Re: [RFC v7 12/22] cpu: Introduce TCGCpuOperations struct

2020-12-04 Thread Claudio Fontana

On 12/4/20 7:04 PM, Claudio Fontana wrote:
> On 12/4/20 6:28 PM, Eduardo Habkost wrote:
>> On Fri, Dec 04, 2020 at 06:10:49PM +0100, Philippe Mathieu-Daudé wrote:
>>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
 From: Eduardo Habkost 

 The TCG-specific CPU methods will be moved to a separate struct,
 to make it easier to move accel-specific code outside generic CPU
 code in the future.  Start by moving tcg_initialize().
>>>
>>> Good idea! One minor comment below.
>>>

 The new CPUClass.tcg_ops field may eventually become a pointer,
 but keep it an embedded struct for now, to make code conversion
 easier.

 Signed-off-by: Eduardo Habkost 
 ---
  MAINTAINERS |  1 +
  cpu.c   |  2 +-
  include/hw/core/cpu.h   |  9 -
  include/hw/core/tcg-cpu-ops.h   | 25 +
  target/alpha/cpu.c  |  2 +-
  target/arm/cpu.c|  2 +-
  target/avr/cpu.c|  2 +-
  target/cris/cpu.c   | 12 ++--
  target/hppa/cpu.c   |  2 +-
  target/i386/tcg-cpu.c   |  2 +-
  target/lm32/cpu.c   |  2 +-
  target/m68k/cpu.c   |  2 +-
  target/microblaze/cpu.c |  2 +-
  target/mips/cpu.c   |  2 +-
  target/moxie/cpu.c  |  2 +-
  target/nios2/cpu.c  |  2 +-
  target/openrisc/cpu.c   |  2 +-
  target/ppc/translate_init.c.inc |  2 +-
  target/riscv/cpu.c  |  2 +-
  target/rx/cpu.c |  2 +-
  target/s390x/cpu.c  |  2 +-
  target/sh4/cpu.c|  2 +-
  target/sparc/cpu.c  |  2 +-
  target/tilegx/cpu.c |  2 +-
  target/tricore/cpu.c|  2 +-
  target/unicore32/cpu.c  |  2 +-
  target/xtensa/cpu.c |  2 +-
  27 files changed, 63 insertions(+), 30 deletions(-)
  create mode 100644 include/hw/core/tcg-cpu-ops.h

 diff --git a/MAINTAINERS b/MAINTAINERS
 index f53f2678d8..d876f504a6 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -1535,6 +1535,7 @@ F: qapi/machine.json
  F: qapi/machine-target.json
  F: include/hw/boards.h
  F: include/hw/core/cpu.h
 +F: include/hw/core/tcg-cpu-ops.h
  F: include/hw/cpu/cluster.h
  F: include/sysemu/numa.h
  T: git https://github.com/ehabkost/qemu.git machine-next
 diff --git a/cpu.c b/cpu.c
 index 0be5dcb6f3..d02c2a17f1 100644
 --- a/cpu.c
 +++ b/cpu.c
 @@ -180,7 +180,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
  
  if (tcg_enabled() && !tcg_target_initialized) {
  tcg_target_initialized = true;
 -cc->tcg_initialize();
 +cc->tcg_ops.initialize();
  }
  tlb_init(cpu);
  
 diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
 index 3d92c967ff..c93b08a0fb 100644
 --- a/include/hw/core/cpu.h
 +++ b/include/hw/core/cpu.h
 @@ -76,6 +76,10 @@ typedef struct CPUWatchpoint CPUWatchpoint;
  
  struct TranslationBlock;
  
 +#ifdef CONFIG_TCG
 +#include "tcg-cpu-ops.h"
 +#endif /* CONFIG_TCG */
 +
  /**
   * CPUClass:
   * @class_by_name: Callback to map -cpu command line model name to an
 @@ -221,12 +225,15 @@ struct CPUClass {
  
  void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
  vaddr (*adjust_watchpoint_address)(CPUState *cpu, vaddr addr, int 
 len);
 -void (*tcg_initialize)(void);
  
  const char *deprecation_note;
  /* Keep non-pointer data at the end to minimize holes.  */
  int gdb_num_core_regs;
  bool gdb_stop_before_watchpoint;
 +
 +#ifdef CONFIG_TCG
 +TcgCpuOperations tcg_ops;
 +#endif /* CONFIG_TCG */
  };
>>
>> I'm not a fan of #ifdefs in struct definitions (especially in
>> generic code like hw/cpu), because there's a risk that the same header
>> generates a different struct layout when used by different .c files.
>> I would prefer to gradually refactor the code so that tcg_ops is
>> eventually removed from CPUClass.
>>
>> This is not a dealbreaker, because both approaches are steps in
>> the same direction.  But the #ifdef here makes review harder and
>> carries more risk of unwanted side effects.
>>
  
  /*
 diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
 new file mode 100644
 index 00..4475ef0996
 --- /dev/null
 +++ b/include/hw/core/tcg-cpu-ops.h
 @@ -0,0 +1,25 @@
 +/*
 + * TCG-Specific operations that are not meaningful for hardware accelerators
 + *
 + * Copyright 2020 SUSE LLC
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the

Re: [PATCH qemu v10] spapr: Implement Open Firmware client interface

2020-12-04 Thread Greg Kurz

On Tue, 13 Oct 2020 13:19:11 +1100
Alexey Kardashevskiy  wrote:

> The PAPR platform describes an OS environment that's presented by
> a combination of a hypervisor and firmware. The features it specifies
> require collaboration between the firmware and the hypervisor.
> 
> Since the beginning, the runtime component of the firmware (RTAS) has
> been implemented as a 20 byte shim which simply forwards it to
> a hypercall implemented in qemu. The boot time firmware component is
> SLOF - but a build that's specific to qemu, and has always needed to be
> updated in sync with it. Even though we've managed to limit the amount
> of runtime communication we need between qemu and SLOF, there's some,
> and it has become increasingly awkward to handle as we've implemented
> new features.
> 
> This implements a boot time OF client interface (CI) which is
> enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
> Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
> which implements Open Firmware Client Interface (OF CI). This allows
> using a smaller stateless firmware which does not have to manage
> the device tree.
> 
> The new "vof.bin" firmware image is included with source code under
> pc-bios/. It also includes RTAS blob.
> 
> This implements a handful of CI methods just to get -kernel/-initrd
> working. In particular, this implements the device tree fetching and
> simple memory allocator - "claim" (an OF CI memory allocator) and updates
> "/memory@0/available" to report available memory to the client.
> 
> This implements changing some device tree properties which we know how
> to deal with, the rest is ignored. To allow changes, this skips
> fdt_pack() when x-vof=on as not packing the blob leaves some room for
> appending.
> 
> In absence of SLOF, this assigns phandles to device tree nodes to make
> device tree traversing work.
> 
> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
> 
> This adds basic instances support which are managed by a hash map
> ihandle -> [phandle].
> 
> Before the guest starts, the memory in use is:
> 0..4000 - the initial firmware
> 1..18 - stack
> 
> This OF CI does not implement "interpret".
> 
> With this basic support, this can only boot into kernel directly.

Maybe worth erroring out if -kernel is missing then.

eg.

void spapr_of_client_machine_init(SpaprMachineState *spapr)
{
    if (!spapr->kernel_size) {
        error_report("The 'x-vof' machine property requires '-kernel'");
        exit(EXIT_FAILURE);
    }
    spapr_register_hypercall(KVMPPC_H_OF_CLIENT, spapr_h_of_client);
}

> However this is just enough for the petitboot kernel and initramdisk to
> boot from any possible source. Note this requires reasonably recent guest
> kernel with:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
> 

FWIW it worked flawlessly with the vmlinuz and initramfs of a recent
rhel8 guest.

The patch is huge and I never find time to do a full review...  so instead
of postponing again and again, I post what I have noted so far.

Please find some comments below.

> Signed-off-by: Alexey Kardashevskiy 
> ---

[...] 

> @@ -1646,22 +1650,36 @@ static void spapr_machine_reset(MachineState *machine)
>  
>  fdt = spapr_build_fdt(spapr, true, FDT_MAX_SIZE);
>  
> -rc = fdt_pack(fdt);
> -
> -/* Should only fail if we've built a corrupted tree */
> -assert(rc == 0);
> -
> -/* Load the fdt */
>  qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
> -cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
> +
>  g_free(spapr->fdt_blob);
>  spapr->fdt_size = fdt_totalsize(fdt);
>  spapr->fdt_initial_size = spapr->fdt_size;
>  spapr->fdt_blob = fdt;

It is a bit confusing that these are set here and...

>  
>  /* Set up the entry state */
> -spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, 0, fdt_addr, 0);
>  first_ppc_cpu->env.gpr[5] = 0;
> +if (spapr->vof) {
> +target_ulong stack_ptr = 0;
> +
> +spapr_setup_of_client(spapr, &stack_ptr);
> +spapr_of_client_dt_finalize(spapr);
> +spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
> +  stack_ptr, spapr->initrd_base,
> +  spapr->initrd_size);
> +} else {
> +/* Load the fdt */
> +rc = fdt_pack(spapr->fdt_blob);
> +/* Should only fail if we've built a corrupted tree */
> +assert(rc == 0);
> +
> +spapr->fdt_size = fdt_totalsize(spapr->fdt_blob);
> +spapr->fdt_initial_size = spapr->fdt_size;

... overwritten there. I guess this is because fdt_pack() has an
impact on fdt_totalsize(), right? Could this be consolidated
in a helper that optionally calls fdt_pack()?
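
A rough sketch of what such a helper could look like. Everything here is a hypothetical stand-in (DemoBlob, DemoMachine, demo_pack(), demo_set_fdt()) so that the example builds on its own; the real code would operate on the libfdt blob and call fdt_pack() and fdt_totalsize() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree blob (real code: libfdt). */
typedef struct DemoBlob {
    size_t total_size;  /* allocated size, including spare room for appends */
    size_t used_size;   /* bytes actually occupied by the tree */
} DemoBlob;

/* Hypothetical subset of the machine state touched by the reset path. */
typedef struct DemoMachine {
    DemoBlob *fdt_blob;
    size_t fdt_size;
    size_t fdt_initial_size;
} DemoMachine;

/* Stand-in for fdt_pack(): drop the spare room, return 0 on success. */
static int demo_pack(DemoBlob *blob)
{
    if (blob->used_size > blob->total_size) {
        return -1;  /* corrupted blob */
    }
    blob->total_size = blob->used_size;
    return 0;
}

/*
 * Record the blob and its sizes in one place. Packing is optional, so the
 * sizes are only ever computed once, after the blob has its final shape.
 */
static void demo_set_fdt(DemoMachine *m, DemoBlob *blob, bool pack)
{
    if (pack) {
        int rc = demo_pack(blob);
        /* Should only fail if we've built a corrupted tree */
        assert(rc == 0);
        (void)rc;
    }
    m->fdt_blob = blob;
    m->fdt_size = blob->total_size;
    m->fdt_initial_size = m->fdt_size;
}
```

With something along these lines, the x-vof path would pass pack=false and the regular path pack=true, and neither would need to set the size fields twice.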

> +cpu_physical_memory_write(fdt_addr, spapr->fdt_blob, spapr->fdt_size);
> +
> +spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
>

Re: [RFC v7 15/22] cpu: Move tlb_fill to tcg_ops

2020-12-04 Thread Philippe Mathieu-Daudé

On 12/4/20 6:37 PM, Eduardo Habkost wrote:
> On Fri, Dec 04, 2020 at 06:14:07PM +0100, Philippe Mathieu-Daudé wrote:
>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
>>> From: Eduardo Habkost 
>>>
>>> Signed-off-by: Eduardo Habkost 
>>> ---
>>>  accel/tcg/cputlb.c  |  6 +++---
>>>  accel/tcg/user-exec.c   |  6 +++---
>>>  include/hw/core/cpu.h   |  9 -
>>>  include/hw/core/tcg-cpu-ops.h   | 12 
>>>  target/alpha/cpu.c  |  2 +-
>>>  target/arm/cpu.c|  2 +-
>>>  target/avr/cpu.c|  2 +-
>>>  target/cris/cpu.c   |  2 +-
>>>  target/hppa/cpu.c   |  2 +-
>>>  target/i386/tcg-cpu.c   |  2 +-
>>>  target/lm32/cpu.c   |  2 +-
>>>  target/m68k/cpu.c   |  2 +-
>>>  target/microblaze/cpu.c |  2 +-
>>>  target/mips/cpu.c   |  2 +-
>>>  target/moxie/cpu.c  |  2 +-
>>>  target/nios2/cpu.c  |  2 +-
>>>  target/openrisc/cpu.c   |  2 +-
>>>  target/ppc/translate_init.c.inc |  2 +-
>>>  target/riscv/cpu.c  |  2 +-
>>>  target/rx/cpu.c |  2 +-
>>>  target/s390x/cpu.c  |  2 +-
>>>  target/sh4/cpu.c|  2 +-
>>>  target/sparc/cpu.c  |  2 +-
>>>  target/tilegx/cpu.c |  2 +-
>>>  target/tricore/cpu.c|  2 +-
>>>  target/unicore32/cpu.c  |  2 +-
>>>  target/xtensa/cpu.c |  2 +-
>>>  27 files changed, 41 insertions(+), 38 deletions(-)
>>
>> With cc->tcg_ops.* guarded with #ifdef CONFIG_TCG:
>> Reviewed-by: Philippe Mathieu-Daudé 
> 
> Thanks!
> 
> Are the #ifdefs a hard condition for your Reviewed-by?

No, as you said, this is fine as a first step, so you can
include them.

> Even if we agree #ifdef CONFIG_TCG is the way to go, I don't
> think this should block a series that's a step in the right
> direction.  It can be done in a separate patch.
> 
> (Unless the lack of #ifdef introduces regressions, of course)

I'm worried about the +system -tcg build configuration.

s390x is the only target testing for such regressions
(see "[s390x] Clang (disable-tcg)" on Travis-CI).

Re: x86 TCG helpers clobbered registers

2020-12-04 Thread Richard Henderson

On 12/4/20 9:36 AM, Stephane Duverger wrote:
> Hello,
> 
> While looking at tcg/i386/tcg-target.c.inc:tcg_out_qemu_st(), I
> discovered that the TCG generates a call to a store helper at the end
> of the TB which is executed on TLB miss and gets back to the remaining
> translated ops. I tried to mimic this behavior around the fast path
> (right between tcg_out_tlb_load() and tcg_out_qemu_st_direct()) to
> filter on memory store accesses.

There's your bug -- don't do that.

> I know there are now TCG plugins for that purpose at the TCG IR level,
> from which every tcg-target might benefit. FWIW, my design choice was more
> led by the fact that I always work on an x86 host and plugins did not
> exist at the time. Anyway, the point is more related to generating a
> call to a helper at the TCG IR level (classic scenario), or later
> during tcg-target code generation (slow path for instance).

You can't just inject a call anywhere you like.  If you add it at the IR level,
then the rest of the compiler will see it and work properly.  If you add the
call in the middle of another operation, the compiler doesn't get to see it and
Bad Things Happen.

> The TCG when calling a helper knows that some registers will be call
> clobbered and as such must free them. This is what I observed in
> tcg_reg_alloc_call():
> 
> /* clobber call registers */
> for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
>     if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
>         tcg_reg_free(s, i, allocated_regs);
>     }
> }
> 
> But in our case (ie. INDEX_op_qemu_st_i32), the TCG code path comes
> from:
> 
> tcg_reg_alloc_op()
>   tcg_out_op()
> tcg_out_qemu_st()
> 
> Then tcg_out_tlb_load() will inject a 'jmp' to the slow path, whose
> generated code does not seem to take care of every call clobbered
> registers, if we look at tcg_out_qemu_st_slow_path().

You missed

> if (def->flags & TCG_OPF_CALL_CLOBBER) {
>     /* XXX: permit generic clobber register list ? */
>     for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
>         if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
>             tcg_reg_free(s, i, i_allocated_regs);
>         }
>     }
> }

which handles this in tcg_reg_alloc_op.
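
As a rough model of what that loop does, here is a self-contained sketch. All names (DemoRegSet, demo_regset_test(), demo_free_clobbered()) are hypothetical, and only the bookkeeping is modelled: the real tcg_reg_free() also spills the temporary held in the register if necessary, which is left out here.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_NB_REGS = 8 };      /* hypothetical number of host registers */
typedef uint32_t DemoRegSet;    /* one bit per host register */

/* Non-zero if register 'reg' is a member of 'set'. */
static int demo_regset_test(DemoRegSet set, int reg)
{
    return (set >> reg) & 1u;
}

/*
 * Given the registers currently holding live values, return the set that
 * still holds them once an op flagged as call-clobbering has been handled.
 * Ops without the flag leave the live set untouched.
 */
static DemoRegSet demo_free_clobbered(DemoRegSet live,
                                      DemoRegSet call_clobber,
                                      int op_clobbers)
{
    if (op_clobbers) {
        for (int i = 0; i < DEMO_NB_REGS; i++) {
            if (demo_regset_test(call_clobber, i)) {
                /* "free" the register: its value no longer lives here */
                live &= ~((DemoRegSet)1 << i);
            }
        }
    }
    return live;
}
```

The point is that the register allocator does this before the op is emitted, so by the time the slow path runs no live value is expected to sit in a call-clobbered register.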


> First for an i386 (32bits) tcg-target, as expected, the helper
> arguments are injected into the stack. I noticed that 'esp' is not
> shifted down before stacking up the args, which might corrupt last
> stacked words.

No, we generate code for a constant esp, as if by gcc's -mno-push-args option.
 We have reserved TCG_STATIC_CALL_ARGS_SIZE bytes of stack for the arguments
(which is actually larger than necessary for any of the tcg targets).
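
A toy model of that convention, in case it helps. The names and the 128-byte size are assumptions made up for the example (the actual value of TCG_STATIC_CALL_ARGS_SIZE is not stated here): the argument area is reserved once, arguments are stored at fixed offsets from the stack pointer, and the stack pointer itself never moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    DEMO_CALL_ARGS_SIZE = 128,  /* assumed size of the reserved area */
    DEMO_FRAME_SIZE = 256       /* assumed total frame size */
};

/* Hypothetical frame: bytes [sp, sp + DEMO_CALL_ARGS_SIZE) are for args. */
typedef struct DemoStack {
    uint8_t mem[DEMO_FRAME_SIZE];
    size_t sp;                  /* fixed once the prologue has run */
} DemoStack;

/* Prologue: reserve the outgoing-argument area a single time. */
static void demo_prologue(DemoStack *s)
{
    memset(s->mem, 0, sizeof(s->mem));
    s->sp = 0;
}

/*
 * Store a 32-bit argument into slot 'slot' of the reserved area.
 * Returns 0 on success, -1 if the slot would overflow the area.
 * Note that s->sp is never adjusted (no push/pop).
 */
static int demo_store_arg(DemoStack *s, unsigned slot, uint32_t value)
{
    size_t off = (size_t)slot * sizeof(value);

    if (off + sizeof(value) > DEMO_CALL_ARGS_SIZE) {
        return -1;
    }
    memcpy(&s->mem[s->sp + off], &value, sizeof(value));
    return 0;
}
```

Because the area is reserved up front, writing the arguments cannot overwrite anything stored above it in the frame.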


r~

Re: [RFC v7 12/22] cpu: Introduce TCGCpuOperations struct

2020-12-04 Thread Claudio Fontana

On 12/4/20 6:28 PM, Eduardo Habkost wrote:
> On Fri, Dec 04, 2020 at 06:10:49PM +0100, Philippe Mathieu-Daudé wrote:
>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
>>> From: Eduardo Habkost 
>>>
>>> The TCG-specific CPU methods will be moved to a separate struct,
>>> to make it easier to move accel-specific code outside generic CPU
>>> code in the future.  Start by moving tcg_initialize().
>>
>> Good idea! One minor comment below.
>>
>>>
>>> The new CPUClass.tcg_ops field may eventually become a pointer,
>>> but keep it an embedded struct for now, to make code conversion
>>> easier.
>>>
>>> Signed-off-by: Eduardo Habkost 
>>> ---
>>>  MAINTAINERS |  1 +
>>>  cpu.c   |  2 +-
>>>  include/hw/core/cpu.h   |  9 -
>>>  include/hw/core/tcg-cpu-ops.h   | 25 +
>>>  target/alpha/cpu.c  |  2 +-
>>>  target/arm/cpu.c|  2 +-
>>>  target/avr/cpu.c|  2 +-
>>>  target/cris/cpu.c   | 12 ++--
>>>  target/hppa/cpu.c   |  2 +-
>>>  target/i386/tcg-cpu.c   |  2 +-
>>>  target/lm32/cpu.c   |  2 +-
>>>  target/m68k/cpu.c   |  2 +-
>>>  target/microblaze/cpu.c |  2 +-
>>>  target/mips/cpu.c   |  2 +-
>>>  target/moxie/cpu.c  |  2 +-
>>>  target/nios2/cpu.c  |  2 +-
>>>  target/openrisc/cpu.c   |  2 +-
>>>  target/ppc/translate_init.c.inc |  2 +-
>>>  target/riscv/cpu.c  |  2 +-
>>>  target/rx/cpu.c |  2 +-
>>>  target/s390x/cpu.c  |  2 +-
>>>  target/sh4/cpu.c|  2 +-
>>>  target/sparc/cpu.c  |  2 +-
>>>  target/tilegx/cpu.c |  2 +-
>>>  target/tricore/cpu.c|  2 +-
>>>  target/unicore32/cpu.c  |  2 +-
>>>  target/xtensa/cpu.c |  2 +-
>>>  27 files changed, 63 insertions(+), 30 deletions(-)
>>>  create mode 100644 include/hw/core/tcg-cpu-ops.h
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index f53f2678d8..d876f504a6 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -1535,6 +1535,7 @@ F: qapi/machine.json
>>>  F: qapi/machine-target.json
>>>  F: include/hw/boards.h
>>>  F: include/hw/core/cpu.h
>>> +F: include/hw/core/tcg-cpu-ops.h
>>>  F: include/hw/cpu/cluster.h
>>>  F: include/sysemu/numa.h
>>>  T: git https://github.com/ehabkost/qemu.git machine-next
>>> diff --git a/cpu.c b/cpu.c
>>> index 0be5dcb6f3..d02c2a17f1 100644
>>> --- a/cpu.c
>>> +++ b/cpu.c
>>> @@ -180,7 +180,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
>>>  
>>>  if (tcg_enabled() && !tcg_target_initialized) {
>>>  tcg_target_initialized = true;
>>> -cc->tcg_initialize();
>>> +cc->tcg_ops.initialize();
>>>  }
>>>  tlb_init(cpu);
>>>  
>>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>>> index 3d92c967ff..c93b08a0fb 100644
>>> --- a/include/hw/core/cpu.h
>>> +++ b/include/hw/core/cpu.h
>>> @@ -76,6 +76,10 @@ typedef struct CPUWatchpoint CPUWatchpoint;
>>>  
>>>  struct TranslationBlock;
>>>  
>>> +#ifdef CONFIG_TCG
>>> +#include "tcg-cpu-ops.h"
>>> +#endif /* CONFIG_TCG */
>>> +
>>>  /**
>>>   * CPUClass:
>>>   * @class_by_name: Callback to map -cpu command line model name to an
>>> @@ -221,12 +225,15 @@ struct CPUClass {
>>>  
>>>  void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
>>>  vaddr (*adjust_watchpoint_address)(CPUState *cpu, vaddr addr, int len);
>>> -void (*tcg_initialize)(void);
>>>  
>>>  const char *deprecation_note;
>>>  /* Keep non-pointer data at the end to minimize holes.  */
>>>  int gdb_num_core_regs;
>>>  bool gdb_stop_before_watchpoint;
>>> +
>>> +#ifdef CONFIG_TCG
>>> +TcgCpuOperations tcg_ops;
>>> +#endif /* CONFIG_TCG */
>>>  };
> 
> I'm not a fan of #ifdefs in struct definitions (especially in
> generic code like hw/cpu), because there's a risk that the same header
> generates a different struct layout when used by different .c files.
> I would prefer to gradually refactor the code so that tcg_ops is
> eventually removed from CPUClass.
> 
> This is not a dealbreaker, because both approaches are steps in
> the same direction.  But the #ifdef here makes review harder and
> carries more risk of unwanted side effects.
> 
>>>  
>>>  /*
>>> diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
>>> new file mode 100644
>>> index 00..4475ef0996
>>> --- /dev/null
>>> +++ b/include/hw/core/tcg-cpu-ops.h
>>> @@ -0,0 +1,25 @@
>>> +/*
>>> + * TCG-Specific operations that are not meaningful for hardware accelerators
>>> + *
>>> + * Copyright 2020 SUSE LLC
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#ifndef TCG_CPU_OPS_H
>>> +#define TCG_CPU_OPS_H
>>> +
>>> +/**
>>> + * struct TcgCpuOperations: TCG operations specific to a CPU class
>>> +

Re: [RFC v7 12/22] cpu: Introduce TCGCpuOperations struct

2020-12-04 Thread Eduardo Habkost

On Fri, Dec 04, 2020 at 07:07:09PM +0100, Claudio Fontana wrote:
> On 12/4/20 7:04 PM, Claudio Fontana wrote:
> > On 12/4/20 6:28 PM, Eduardo Habkost wrote:
> >> On Fri, Dec 04, 2020 at 06:10:49PM +0100, Philippe Mathieu-Daudé wrote:
> >>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
> >>>> From: Eduardo Habkost
> >>>>
> >>>> The TCG-specific CPU methods will be moved to a separate struct,
> >>>> to make it easier to move accel-specific code outside generic CPU
> >>>> code in the future.  Start by moving tcg_initialize().
> >>>
> >>> Good idea! One minor comment below.
> >>>
> >>>>
> >>>> The new CPUClass.tcg_ops field may eventually become a pointer,
> >>>> but keep it an embedded struct for now, to make code conversion
> >>>> easier.
> >>>>
> >>>> Signed-off-by: Eduardo Habkost
> >>>> ---
> >>>>  MAINTAINERS |  1 +
> >>>>  cpu.c   |  2 +-
> >>>>  include/hw/core/cpu.h   |  9 -
> >>>>  include/hw/core/tcg-cpu-ops.h   | 25 +
> >>>>  target/alpha/cpu.c  |  2 +-
> >>>>  target/arm/cpu.c|  2 +-
> >>>>  target/avr/cpu.c|  2 +-
> >>>>  target/cris/cpu.c   | 12 ++--
> >>>>  target/hppa/cpu.c   |  2 +-
> >>>>  target/i386/tcg-cpu.c   |  2 +-
> >>>>  target/lm32/cpu.c   |  2 +-
> >>>>  target/m68k/cpu.c   |  2 +-
> >>>>  target/microblaze/cpu.c |  2 +-
> >>>>  target/mips/cpu.c   |  2 +-
> >>>>  target/moxie/cpu.c  |  2 +-
> >>>>  target/nios2/cpu.c  |  2 +-
> >>>>  target/openrisc/cpu.c   |  2 +-
> >>>>  target/ppc/translate_init.c.inc |  2 +-
> >>>>  target/riscv/cpu.c  |  2 +-
> >>>>  target/rx/cpu.c |  2 +-
> >>>>  target/s390x/cpu.c  |  2 +-
> >>>>  target/sh4/cpu.c|  2 +-
> >>>>  target/sparc/cpu.c  |  2 +-
> >>>>  target/tilegx/cpu.c |  2 +-
> >>>>  target/tricore/cpu.c|  2 +-
> >>>>  target/unicore32/cpu.c  |  2 +-
> >>>>  target/xtensa/cpu.c |  2 +-
> >>>>  27 files changed, 63 insertions(+), 30 deletions(-)
> >>>>  create mode 100644 include/hw/core/tcg-cpu-ops.h
> >>>>
> >>>> diff --git a/MAINTAINERS b/MAINTAINERS
> >>>> index f53f2678d8..d876f504a6 100644
> >>>> --- a/MAINTAINERS
> >>>> +++ b/MAINTAINERS
> >>>> @@ -1535,6 +1535,7 @@ F: qapi/machine.json
> >>>>  F: qapi/machine-target.json
> >>>>  F: include/hw/boards.h
> >>>>  F: include/hw/core/cpu.h
> >>>> +F: include/hw/core/tcg-cpu-ops.h
> >>>>  F: include/hw/cpu/cluster.h
> >>>>  F: include/sysemu/numa.h
> >>>>  T: git https://github.com/ehabkost/qemu.git machine-next
> >>>> diff --git a/cpu.c b/cpu.c
> >>>> index 0be5dcb6f3..d02c2a17f1 100644
> >>>> --- a/cpu.c
> >>>> +++ b/cpu.c
> >>>> @@ -180,7 +180,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
> >>>>
> >>>>  if (tcg_enabled() && !tcg_target_initialized) {
> >>>>  tcg_target_initialized = true;
> >>>> -cc->tcg_initialize();
> >>>> +cc->tcg_ops.initialize();
> >>>>  }
> >>>>  tlb_init(cpu);
> >>>>
> >>>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> >>>> index 3d92c967ff..c93b08a0fb 100644
> >>>> --- a/include/hw/core/cpu.h
> >>>> +++ b/include/hw/core/cpu.h
> >>>> @@ -76,6 +76,10 @@ typedef struct CPUWatchpoint CPUWatchpoint;
> >>>>
> >>>>  struct TranslationBlock;
> >>>>
> >>>> +#ifdef CONFIG_TCG
> >>>> +#include "tcg-cpu-ops.h"
> >>>> +#endif /* CONFIG_TCG */
> >>>> +
> >>>>  /**
> >>>>   * CPUClass:
> >>>>   * @class_by_name: Callback to map -cpu command line model name to an
> >>>> @@ -221,12 +225,15 @@ struct CPUClass {
> >>>>
> >>>>  void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
> >>>>  vaddr (*adjust_watchpoint_address)(CPUState *cpu, vaddr addr, int len);
> >>>> -void (*tcg_initialize)(void);
> >>>>
> >>>>  const char *deprecation_note;
> >>>>  /* Keep non-pointer data at the end to minimize holes.  */
> >>>>  int gdb_num_core_regs;
> >>>>  bool gdb_stop_before_watchpoint;
> >>>> +
> >>>> +#ifdef CONFIG_TCG
> >>>> +TcgCpuOperations tcg_ops;
> >>>> +#endif /* CONFIG_TCG */
> >>>>  };
> >>
> >> I'm not a fan of #ifdefs in struct definitions (especially in
> >> generic code like hw/cpu), because there's a risk that the same header
> >> generates a different struct layout when used by different .c files.
> >> I would prefer to gradually refactor the code so that tcg_ops is
> >> eventually removed from CPUClass.
> >>
> >> This is not a dealbreaker, because both approaches are steps in
> >> the same direction.  But the #ifdef here makes review harder and
> >> carries more risk of unwanted side effects.
> >>
> >>>>
> >>>>  /*
> >>>> diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
> >>>> new file mode 100644
> >>>> index 00..4475ef0996
> >>>> --- /dev/null
> >>>> +++

Re: [RFC v7 15/22] cpu: Move tlb_fill to tcg_ops

2020-12-04 Thread Philippe Mathieu-Daudé

On 12/4/20 7:14 PM, Claudio Fontana wrote:
> On 12/4/20 7:00 PM, Philippe Mathieu-Daudé wrote:
>> On 12/4/20 6:37 PM, Eduardo Habkost wrote:
>>> On Fri, Dec 04, 2020 at 06:14:07PM +0100, Philippe Mathieu-Daudé wrote:
>>>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
>>>>> From: Eduardo Habkost
>>>>>
>>>>> Signed-off-by: Eduardo Habkost
>>>>> ---
>>>>>  accel/tcg/cputlb.c  |  6 +++---
>>>>>  accel/tcg/user-exec.c   |  6 +++---
>>>>>  include/hw/core/cpu.h   |  9 -
>>>>>  include/hw/core/tcg-cpu-ops.h   | 12 
>>>>>  target/alpha/cpu.c  |  2 +-
>>>>>  target/arm/cpu.c|  2 +-
>>>>>  target/avr/cpu.c|  2 +-
>>>>>  target/cris/cpu.c   |  2 +-
>>>>>  target/hppa/cpu.c   |  2 +-
>>>>>  target/i386/tcg-cpu.c   |  2 +-
>>>>>  target/lm32/cpu.c   |  2 +-
>>>>>  target/m68k/cpu.c   |  2 +-
>>>>>  target/microblaze/cpu.c |  2 +-
>>>>>  target/mips/cpu.c   |  2 +-
>>>>>  target/moxie/cpu.c  |  2 +-
>>>>>  target/nios2/cpu.c  |  2 +-
>>>>>  target/openrisc/cpu.c   |  2 +-
>>>>>  target/ppc/translate_init.c.inc |  2 +-
>>>>>  target/riscv/cpu.c  |  2 +-
>>>>>  target/rx/cpu.c |  2 +-
>>>>>  target/s390x/cpu.c  |  2 +-
>>>>>  target/sh4/cpu.c|  2 +-
>>>>>  target/sparc/cpu.c  |  2 +-
>>>>>  target/tilegx/cpu.c |  2 +-
>>>>>  target/tricore/cpu.c|  2 +-
>>>>>  target/unicore32/cpu.c  |  2 +-
>>>>>  target/xtensa/cpu.c |  2 +-
>>>>>  27 files changed, 41 insertions(+), 38 deletions(-)
>>>>
>>>> With cc->tcg_ops.* guarded with #ifdef CONFIG_TCG:
>>>> Reviewed-by: Philippe Mathieu-Daudé
>>>
>>> Thanks!
>>>
>>> Are the #ifdefs a hard condition for your Reviewed-by?
>>
>> No, as you said, this is fine as a first step, so you can
>> include them.
>>
>>> Even if we agree #ifdef CONFIG_TCG is the way to go, I don't
>>> think this should block a series that's a step in the right
>>> direction.  It can be done in a separate patch.
>>>
>>> (Unless the lack of #ifdef introduces regressions, of course)
>>
>> I'm worried about the +system -tcg build configuration.
>>
>> s390x is the only target testing for such regressions
>> (see "[s390x] Clang (disable-tcg)" on Travis-CI).
>>
> 
> which exact configure options are you concerned about?
> 
> --disable-tcg --enable-kvm --target="*-system"?
> 
> Or something else?

Basically --disable-tcg --enable-$ACCEL [--enable-$ACCEL]

> 
> this is something I am testing (and found the issues).
> 
> I am currently testing (and as a result fixing) for each patch:
> 
> --disable-tcg --enable-kvm

This one is meaningful to check the host, so I run it on:
- x86 [ok]
- s390x [ok]
- aarch64 [done, waiting for your effort before respinning]
- ppc64 [done, I was postponing the series submission waiting
 for aa64 to be merged, but I might go back to it as
 aa64 is taking too long].
- mips: no hardware access

> --enable-tcg --disable-kvm
> --enable-tcg --enable-kvm --enable-hax
> --disable-system

I also use:

* --disable-tcg --disable-kvm --enable-xen
  [x86 host works]
  [aa64 host needs Alex Bennée patches]

* --disable-tcg --disable-system --disable-user --enable-tools

* --disable-system --static --disable-capstone
(experimental, not supported, don't waste time with it).

The most useful is --enable-tools with all accelerators disabled,
as it quickly triggers linking errors when you misplace a
handler between #ifdefs.

> With targets (when compatible):
> TARGET_LIST="x86_64-softmmu,x86_64-linux-user,arm-softmmu,arm-linux-user,aarch64-softmmu,aarch64-linux-user,s390x-softmmu,s390x-linux-user"

"first class KVM users" include PPC64 too.

> 
> and yes, should offload much of this to CI..
> 
> Ciao,
> 
> Claudio
>

Re: [RFC v7 15/22] cpu: Move tlb_fill to tcg_ops

2020-12-04 Thread Claudio Fontana

On 12/4/20 7:00 PM, Philippe Mathieu-Daudé wrote:
> On 12/4/20 6:37 PM, Eduardo Habkost wrote:
>> On Fri, Dec 04, 2020 at 06:14:07PM +0100, Philippe Mathieu-Daudé wrote:
>>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
>>>> From: Eduardo Habkost
>>>>
>>>> Signed-off-by: Eduardo Habkost
>>>> ---
>>>>  accel/tcg/cputlb.c  |  6 +++---
>>>>  accel/tcg/user-exec.c   |  6 +++---
>>>>  include/hw/core/cpu.h   |  9 -
>>>>  include/hw/core/tcg-cpu-ops.h   | 12 
>>>>  target/alpha/cpu.c  |  2 +-
>>>>  target/arm/cpu.c|  2 +-
>>>>  target/avr/cpu.c|  2 +-
>>>>  target/cris/cpu.c   |  2 +-
>>>>  target/hppa/cpu.c   |  2 +-
>>>>  target/i386/tcg-cpu.c   |  2 +-
>>>>  target/lm32/cpu.c   |  2 +-
>>>>  target/m68k/cpu.c   |  2 +-
>>>>  target/microblaze/cpu.c |  2 +-
>>>>  target/mips/cpu.c   |  2 +-
>>>>  target/moxie/cpu.c  |  2 +-
>>>>  target/nios2/cpu.c  |  2 +-
>>>>  target/openrisc/cpu.c   |  2 +-
>>>>  target/ppc/translate_init.c.inc |  2 +-
>>>>  target/riscv/cpu.c  |  2 +-
>>>>  target/rx/cpu.c |  2 +-
>>>>  target/s390x/cpu.c  |  2 +-
>>>>  target/sh4/cpu.c|  2 +-
>>>>  target/sparc/cpu.c  |  2 +-
>>>>  target/tilegx/cpu.c |  2 +-
>>>>  target/tricore/cpu.c|  2 +-
>>>>  target/unicore32/cpu.c  |  2 +-
>>>>  target/xtensa/cpu.c |  2 +-
>>>>  27 files changed, 41 insertions(+), 38 deletions(-)
>>>
>>> With cc->tcg_ops.* guarded with #ifdef CONFIG_TCG:
>>> Reviewed-by: Philippe Mathieu-Daudé 
>>
>> Thanks!
>>
>> Are the #ifdefs a hard condition for your Reviewed-by?
> 
> No, as you said, this is fine as a first step, so you can
> include them.
> 
>> Even if we agree #ifdef CONFIG_TCG is the way to go, I don't
>> think this should block a series that's a step in the right
>> direction.  It can be done in a separate patch.
>>
>> (Unless the lack of #ifdef introduces regressions, of course)
> 
> I'm worried about the +system -tcg build configuration.
> 
> s390x is the only target testing for such regressions
> (see "[s390x] Clang (disable-tcg)" on Travis-CI).
> 

which exact configure options are you concerned about?

--disable-tcg --enable-kvm --target="*-system"?

Or something else?

this is something I am testing (and found the issues).

I am currently testing (and as a result fixing) for each patch:

--disable-tcg --enable-kvm
--enable-tcg --disable-kvm
--enable-tcg --enable-kvm --enable-hax
--disable-system

With targets (when compatible):
TARGET_LIST="x86_64-softmmu,x86_64-linux-user,arm-softmmu,arm-linux-user,aarch64-softmmu,aarch64-linux-user,s390x-softmmu,s390x-linux-user"

and yes, should offload much of this to CI..

Ciao,

Claudio

Re: [RFC v7 00/22] i386 cleanup [hw/core/cpu.c common]

2020-12-04 Thread Eduardo Habkost

On Fri, Dec 04, 2020 at 05:07:21PM +0100, Paolo Bonzini wrote:
> On Fri, 4 Dec 2020, 14:54 Claudio Fontana wrote:
> 
> > On 11/30/20 3:35 AM, Claudio Fontana wrote:
> > > Hi all, this is v7 of the i386 cleanup,
> >
> > This is fairly broken still and I am fixing it up,
> >
> > but a question arises while hunting bugs here.
> >
> > Silent bugs are introduced when trying to use code like
> >
> > #ifndef CONFIG_USER_ONLY
> >
> > in files that are built in "common" objects, since they are target
> > independent.
> >
> 
> That should be avoided by poison.h
> 
> > I wonder also about the rationale why the cpu code is split between
> >
> > hw/core/cpu.c and $(top_srcdir)/cpu.c
> >
> > with one part in common and one part in "target specific".
> >
> 
> Mostly historical, cpu.c used to have much more than CPU code (it was
> exec.c until a month ago, one of the "historical" core files in QEMU and it
> had all the dispatch side of the memory API). I wouldn't mind merging these
> two files into one.
> 
> Paolo
> 
> 
> > What do we gain by having part of the cpu in common?
> >
> > In some cases we end up going through all sort of hoops because we cannot
> > just code everything in hw/core/cpu.c due to the fact
> > that we do not see CONFIG_ there.
> >

I really prefer to have core files guaranteed to be generic, so
we don't risk introducing additional per-target differences in
base QOM classes (including TYPE_CPU).

The hoops we are having to go through are hoops we must go
through if we want a multi-architecture QEMU binary in the
future.  We don't need to fix everything at once (that's why
$(top_srcdir)/cpu.c still exists), but we don't need to make this
harder.

-- 
Eduardo

Re: [PATCH 7/9] target/mips: Extract msa_translate_init() from mips_tcg_init()

2020-12-04 Thread Richard Henderson

On 12/4/20 11:23 AM, Philippe Mathieu-Daudé wrote:
> On 12/4/20 5:30 PM, Richard Henderson wrote:
>> On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
>>> Extract the logic initialization of the MSA registers from
>>> the generic initialization.
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>  target/mips/translate.c | 35 ---
>>>  1 file changed, 20 insertions(+), 15 deletions(-)
>>
>> Why?
> 
> msa_wr_d[] registers are only used by MSA, so in the next series
> that allows me to move the 'static msa_wr_d[]' in msa_translate.c,
> without having to declare them global with extern.

Ah, sure.

Reviewed-by: Richard Henderson 

r~

Re: [RFC v7 12/22] cpu: Introduce TCGCpuOperations struct

2020-12-04 Thread Eduardo Habkost

On Fri, Dec 04, 2020 at 06:10:49PM +0100, Philippe Mathieu-Daudé wrote:
> On 11/30/20 3:35 AM, Claudio Fontana wrote:
> > From: Eduardo Habkost 
> > 
> > The TCG-specific CPU methods will be moved to a separate struct,
> > to make it easier to move accel-specific code outside generic CPU
> > code in the future.  Start by moving tcg_initialize().
> 
> Good idea! One minor comment below.
> 
> > 
> > The new CPUClass.tcg_ops field may eventually become a pointer,
> > but keep it an embedded struct for now, to make code conversion
> > easier.
> > 
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  MAINTAINERS |  1 +
> >  cpu.c   |  2 +-
> >  include/hw/core/cpu.h   |  9 -
> >  include/hw/core/tcg-cpu-ops.h   | 25 +
> >  target/alpha/cpu.c  |  2 +-
> >  target/arm/cpu.c|  2 +-
> >  target/avr/cpu.c|  2 +-
> >  target/cris/cpu.c   | 12 ++--
> >  target/hppa/cpu.c   |  2 +-
> >  target/i386/tcg-cpu.c   |  2 +-
> >  target/lm32/cpu.c   |  2 +-
> >  target/m68k/cpu.c   |  2 +-
> >  target/microblaze/cpu.c |  2 +-
> >  target/mips/cpu.c   |  2 +-
> >  target/moxie/cpu.c  |  2 +-
> >  target/nios2/cpu.c  |  2 +-
> >  target/openrisc/cpu.c   |  2 +-
> >  target/ppc/translate_init.c.inc |  2 +-
> >  target/riscv/cpu.c  |  2 +-
> >  target/rx/cpu.c |  2 +-
> >  target/s390x/cpu.c  |  2 +-
> >  target/sh4/cpu.c|  2 +-
> >  target/sparc/cpu.c  |  2 +-
> >  target/tilegx/cpu.c |  2 +-
> >  target/tricore/cpu.c|  2 +-
> >  target/unicore32/cpu.c  |  2 +-
> >  target/xtensa/cpu.c |  2 +-
> >  27 files changed, 63 insertions(+), 30 deletions(-)
> >  create mode 100644 include/hw/core/tcg-cpu-ops.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index f53f2678d8..d876f504a6 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -1535,6 +1535,7 @@ F: qapi/machine.json
> >  F: qapi/machine-target.json
> >  F: include/hw/boards.h
> >  F: include/hw/core/cpu.h
> > +F: include/hw/core/tcg-cpu-ops.h
> >  F: include/hw/cpu/cluster.h
> >  F: include/sysemu/numa.h
> >  T: git https://github.com/ehabkost/qemu.git machine-next
> > diff --git a/cpu.c b/cpu.c
> > index 0be5dcb6f3..d02c2a17f1 100644
> > --- a/cpu.c
> > +++ b/cpu.c
> > @@ -180,7 +180,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
> >  
> >  if (tcg_enabled() && !tcg_target_initialized) {
> >  tcg_target_initialized = true;
> > -cc->tcg_initialize();
> > +cc->tcg_ops.initialize();
> >  }
> >  tlb_init(cpu);
> >  
> > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > index 3d92c967ff..c93b08a0fb 100644
> > --- a/include/hw/core/cpu.h
> > +++ b/include/hw/core/cpu.h
> > @@ -76,6 +76,10 @@ typedef struct CPUWatchpoint CPUWatchpoint;
> >  
> >  struct TranslationBlock;
> >  
> > +#ifdef CONFIG_TCG
> > +#include "tcg-cpu-ops.h"
> > +#endif /* CONFIG_TCG */
> > +
> >  /**
> >   * CPUClass:
> >   * @class_by_name: Callback to map -cpu command line model name to an
> > @@ -221,12 +225,15 @@ struct CPUClass {
> >  
> >  void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
> >  vaddr (*adjust_watchpoint_address)(CPUState *cpu, vaddr addr, int len);
> > -void (*tcg_initialize)(void);
> >  
> >  const char *deprecation_note;
> >  /* Keep non-pointer data at the end to minimize holes.  */
> >  int gdb_num_core_regs;
> >  bool gdb_stop_before_watchpoint;
> > +
> > +#ifdef CONFIG_TCG
> > +TcgCpuOperations tcg_ops;
> > +#endif /* CONFIG_TCG */
> >  };

I'm not a fan of #ifdefs in struct definitions (especially in
generic code like hw/cpu), because there's a risk that the same header
generates a different struct layout when used by different .c files.
I would prefer to gradually refactor the code so that tcg_ops is
eventually removed from CPUClass.

This is not a dealbreaker, because both approaches are steps in
the same direction.  But the #ifdef here makes review harder and
carries more risk of unwanted side effects.
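
To make the layout concern concrete, here is a minimal sketch. All struct names are hypothetical stand-ins, not the real CPUClass or TcgCpuOperations; the two "parent" structs simply spell out what one header with an #ifdef'd member expands to in a .c file built with the macro and in one built without it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops table embedded in a class struct. */
typedef struct DemoTcgOps {
    void (*initialize)(void);
} DemoTcgOps;

/* The parent class as seen by a .c file compiled WITH the macro. */
struct DemoParentWith {
    int gdb_num_core_regs;
    DemoTcgOps tcg_ops;         /* the member inside the #ifdef */
};

/* The same parent class as seen by a .c file compiled WITHOUT it. */
struct DemoParentWithout {
    int gdb_num_core_regs;
};

/* Subclasses embed the parent as their first member, QOM-style. */
struct DemoChildWith {
    struct DemoParentWith parent;
    int child_field;
};

struct DemoChildWithout {
    struct DemoParentWithout parent;
    int child_field;
};

/* Offset of the subclass field under each view of the header. */
static size_t demo_child_offset_with(void)
{
    return offsetof(struct DemoChildWith, child_field);
}

static size_t demo_child_offset_without(void)
{
    return offsetof(struct DemoChildWithout, child_field);
}
```

If two objects that disagree on the macro are linked into one binary, they read and write child_field at different offsets, and nothing in the build flags it. Keeping the member unconditional (or making it a pointer that is NULL when the accelerator is absent) gives every file the same layout.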

> >  
> >  /*
> > diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
> > new file mode 100644
> > index 00..4475ef0996
> > --- /dev/null
> > +++ b/include/hw/core/tcg-cpu-ops.h
> > @@ -0,0 +1,25 @@
> > +/*
> > + * TCG-Specific operations that are not meaningful for hardware accelerators
> > + *
> > + * Copyright 2020 SUSE LLC
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef TCG_CPU_OPS_H
> > +#define TCG_CPU_OPS_H
> > +
> > +/**
> > + * struct TcgCpuOperations: TCG operations specific to a CPU class
> > + */
> > +typedef struct TcgCpuOperations {
> > +/**
> > +

Re: [PATCH 6/9] target/mips: Alias MSA vector registers on FPU scalar registers

2020-12-04 Thread Richard Henderson

On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
> Commits 863f264d10f ("add msa_reset(), global msa register") and
> cb269f273fd ("fix multiple TCG registers covering same data")
> removed the FPU scalar registers and replaced them by aliases to
> the MSA vector registers.
> While this might be the case for CPU implementing MSA, this makes
> QEMU code incoherent for CPU not implementing it. It is simpler
> to inverse the logic and alias the MSA vector registers on the
> FPU scalar ones.

How does it make things incoherent?  I'm missing how the logic has actually
changed, as opposed to an order of assignments.


r~

Re: [RFC v7 00/22] i386 cleanup [hw/core/cpu.c common]

2020-12-04 Thread Claudio Fontana

On 12/4/20 5:07 PM, Paolo Bonzini wrote:
> On Fri, 4 Dec 2020, 14:54 Claudio Fontana wrote:
> 
>> On 11/30/20 3:35 AM, Claudio Fontana wrote:
>>> Hi all, this is v7 of the i386 cleanup,
>>
>> This is fairly broken still and I am fixing it up,
>>
>> but a question arises while hunting bugs here.
>>
>> Silent bugs are introduced when trying to use code like
>>
>> #ifndef CONFIG_USER_ONLY
>>
>> in files that are built in "common" objects, since they are target
>> independent.
>>
> 
> That should be avoided by poison.h
> 
>> I wonder also about the rationale why the cpu code is split between
>>
>> hw/core/cpu.c and $(top_srcdir)/cpu.c
>>
>> with one part in common and one part in "target specific".
>>
> 
> Mostly historical, cpu.c used to have much more than CPU code (it was
> exec.c until a month ago, one of the "historical" core files in QEMU and it
> had all the dispatch side of the memory API). I wouldn't mind merging these
> two files into one.
> 
> Paolo
> 

Thanks Paolo!

Ciao,

Claudio
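
For readers following along, the poisoning Paolo refers to can be sketched
roughly as below. This assumes the GCC/Clang "poison" pragma as the
mechanism; DEMO_TARGET_ONLY and demo_is_user_only() are made-up names for
illustration, not identifiers from the tree.

```c
#include <stdbool.h>

/*
 * DEMO_TARGET_ONLY stands in for a target-specific macro such as
 * CONFIG_USER_ONLY.  After this pragma, any later use of the identifier
 * in this translation unit is a compile error.
 */
#pragma GCC poison DEMO_TARGET_ONLY

/*
 * Removing the comment markers around the three directives below would
 * make the build fail, instead of silently taking the "not defined"
 * branch in a file that is compiled only once for all targets:
 *
 * #ifndef DEMO_TARGET_ONLY
 * #define DEMO_SOFTMMU_PATH 1
 * #endif
 */

/* Common code has to get target-dependent facts at run time instead. */
bool demo_is_user_only(bool runtime_flag)
{
    return runtime_flag;
}
```

The point is that the silent bug becomes a build failure, but only in files
that actually include the poisoning header.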


> 
>> What do we gain by having part of the cpu in common?
>>
>> In some cases we end up going through all sort of hoops because we cannot
>> just code everything in hw/core/cpu.c due to the fact
>> that we do not see CONFIG_ there.
>>
>>
>>> with the most interesting patches at the end.
>>>
>>> v6 -> v7: integrate TCGCpuOperations, refactored cpu_exec_realizefn
>>>
>>> * integrate TCGCpuOperations (Eduardo)
>>>
>>> Taken some refactoring from Eduardo for Tcg-only operations on
>>> CPUClass.
>>>
>>> * refactored cpu_exec_realizefn
>>>
>>> The other main change is a refactoring of cpu_exec_realizefn,
>>> directly linked to the effort of making many cpu_exec operations
>>> TCG-only (Eduardo series above):
>>>
>>> cpu_exec_realizefn is actually a TCG-only thing, with the
>>> exception of a couple things that can be done in base cpu code.
>>>
>>> This changes all targets realizefn, so I guess I have to Cc:
>>> the Multiverse? (Universe was already CCed for all accelerators).
>>>
>>>
>>> v5 -> v6: remove MODULE_INIT_ACCEL_CPU
>>>
>>>
>>> instead, use a call to accel_init_interfaces().
>>>
>>> * The class lookups are now general and performed in accel/
>>>
>>>   new AccelCPUClass for new archs are supported as new
>>>   ones appear in the class hierarchy, no need for stubs.
>>>
>>> * Split the code a bit better
>>>
>>>
>>> v4 -> v5: centralized and simplified initializations
>>>
>>> I put in Cc: Emilio G. Cota, specifically because in patch 8
>>> I (re)moved for user-mode the call to tcg_regions_init().
>>>
>>> The call happens now inside the tcg AccelClass machine_init,
>>> (so earlier). This seems to work fine, but I wanted to get the
>>> author's opinion on this.
>>>
>>> Rebased on "tcg-cpus: split into 3 tcg variants" series
>>> (queued by Richard), to avoid some code churn:
>>>
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg04356.html
>>>
>>>
>>> * Extended AccelClass to user-mode.
>>>
>>> user-mode now does not call tcg_exec_init directly,
>>> instead it uses the tcg accel class, and its init_machine method.
>>>
>>> Since user-mode does not define or use a machine state,
>>> the machine is just passed as NULL.
>>>
>>> The immediate advantage is that now we can call current_accel()
>>> from both user mode and softmmu, so we can work out the correct
>>> class to use for accelerator initializations.
>>>
>>> * QOMification of CpusAccelOps
>>>
>>> simple QOMification of CpusAccelOps abstract class.
>>>
>>> * Centralized all accel_cpu_init, so only one per cpu-arch,
>>>   plus one for all accels will remain.
>>>
>>>   So we can expect accel_cpu_init() to be limited to:
>>>
>>>   softmmu/cpus.c - initializes the chosen softmmu accel ops for the cpus module.
>>>   target/ARCH/cpu.c - initializes the chosen arch-specific cpu accelerator.
>>>
>>> These changes are meant to address concerns/issues (Paolo):
>>>
>>> 1) the use of if (tcg_enabled()) and similar in the module_init call path
>>>
>>> 2) the excessive number of accel_cpu_init() to hunt down in the codebase.
>>>
>>>
>>> * Fixed wrong use of host_cpu_class_init (Eduardo)
>>>
>>>
>>> v3 -> v4: QOMification of X86CPUAccelClass
>>>
>>>
>>> In this version I basically QOMified X86CPUAccel, taking the
>>> suggestions from Eduardo as the starting point,
>>> but stopping just short of making it an actual QOM interface,
>>> using a plain abstract class, and then subclasses for the
>>> actual objects.
>>>
>>> Initialization is still using the existing qemu initialization
>>> framework (module_call_init), which I still think is better
>>> than the alternatives proposed, in the current state.
>>>
>>> Possibly some improvements could be developed in the future here.
>>> In this case, effort should be put in keeping things extendible,
>>> in order not to be blocked once accelerators also become modules.
>>>
>>> Motivation and higher level steps:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg04628.html
>>>
>>> Looking forward to your

Re: [PATCH for-6.0 00/11] target/arm: enforce alignment

2020-12-04 Thread Richard Henderson

On 12/4/20 12:17 AM, Pavel Dovgalyuk wrote:
> On 03.12.2020 19:14, Peter Maydell wrote:
>> On Thu, 3 Dec 2020 at 16:10, Pavel Dovgalyuk wrote:
>>>
>>> On 03.12.2020 15:30, Philippe Mathieu-Daudé wrote:
>>>> Cc'ing Pavel
>>>>
>>>> On 12/1/20 4:55 PM, Peter Maydell wrote:
>>>>> On Wed, 25 Nov 2020 at 04:06, Richard Henderson wrote:
>>>>>>
>>>>>> As reported in https://bugs.launchpad.net/bugs/1905356
>>>>>>
>>>>>> Not implementing SCTLR.A, but all of the other required
>>>>>> alignment for SCTLR.A=0 in Table A3-1.
>>>>>
>>>>> Something in this series breaks the 'make check-acceptance'
>>>>> record-and-replay test:
>>>>>
>>>>>    (30/40)
>>>>> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt:
>>>>> PASS (9.14 s)
>>>>>    (31/40)
>>>>> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_arm_virt:
>>>>> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
>>>>> Timeout reached\nOriginal status: ERROR\n{'name':
>>>>> '31-tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_arm_virt',
>>>>> 'logdir':
>>>>> '/home/petmay01/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/result...
>>>>> (90.19 s)
>>>>>
>>>>> The log shows the "recording execution" apparently hanging,
>>>>> with the last output from the guest
>>>>> [    3.183662] Registering SWP/SWPB emulation handler
>>>
>>> I looked through the patches and it does not seem that they can break
>>> anything.
>>> Could it be the same avocado/chardev socket glitch as in some previous
>>> failures?
>>> What happens when re-running this test?
>>
>> I ran it a couple of times with the patchset and it failed the same
>> way each time. Without is fine.
> 
> I applied the patches and got no failures on my local machine.
> 
> Do you have any ideas on debugging this bug?
> What does "arm-clang" mean? Is the host compiler clang?

I have reproduced it:

qemu-system-arm: /home/rth/qemu/qemu/include/tcg/tcg.h:339: get_alignment_bits:
Assertion `(((1 << (10 - 1)) | (1 << (10 - 2)) | (1 << (10 - 3)) | (1 << (10 -
4)) | (1 << (10 - 5)) | (1 << (10 - 6))) & ((1 << a) - 1)) == 0' failed.
Aborted (core dumped)

You need --enable-debug-tcg for this assert.
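
Reading the assertion text above literally: the mask covers bits 9 down
to 4, so the check fails as soon as the low-bit mask (1 << a) - 1 reaches
bit 4, that is for a >= 5.  A small sketch of that arithmetic follows.
The constant 10 and the six flag bits are copied from the message; the
helper names are made up, and treating "a" as the log2 of the requested
alignment is my reading of the expression rather than something the
message states.

```c
#include <stdbool.h>

/* Mask from the assertion text: bits (10 - 1) down to (10 - 6). */
unsigned demo_flags_mask(void)
{
    return (1u << (10 - 1)) | (1u << (10 - 2)) | (1u << (10 - 3)) |
           (1u << (10 - 4)) | (1u << (10 - 5)) | (1u << (10 - 6));
}

/* Same condition as the failed assertion, returned instead of asserted. */
bool demo_alignment_bits_ok(unsigned a)
{
    return (demo_flags_mask() & ((1u << a) - 1)) == 0;
}
```

Under that reading, a = 4 (16-byte alignment) still passes, and a = 5
(32-byte alignment) is the first value that trips the assert.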

It's incredibly stupid of avocado to report SIGABRT as a timeout.


r~

Re: [PATCH v4 07/11] hvf: Add Apple Silicon support

2020-12-04 Thread Roman Bolshakov

On Fri, Dec 04, 2020 at 12:48:53AM +0100, Alexander Graf wrote:
> With Apple Silicon available to the masses, it's a good time to add support
> for driving its virtualization extensions from QEMU.
> 
> This patch adds all necessary architecture specific code to get basic VMs
> working. It's still pretty raw, but definitely functional.
> 
> Known limitations:
> 
>   - Vtimer acknowledgement is hacky
>   - Should implement more sysregs and fault on invalid ones then
>   - WFI handling is missing, need to marry it with vtimer
> 
> Signed-off-by: Alexander Graf 
> 

For non-ARM specific bits,

Reviewed-by: Roman Bolshakov 

Can't set Tested-by because I have no ARM machine yet, but x86
build/execution is fine on Catalina and Big Sur :)

Thanks,
Roman

Re: [PATCH v5 0/4] Introducing QMP query-netdev command

2020-12-04 Thread Alexey Kirillov

ping again

Patchwork page: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=212983

09.11.2020, 03:02, "Alexey Kirillov":
> This patch series introduces a new QMP command "query-netdev" to get
> information about currently attached backend network devices (netdevs).
> Also, since the "info_str" field of "NetClientState" is now deprecated,
> we no longer use it for netdevs, only for NIC/hubports.
> The HMP command "info network" now also uses the new QMP command inside.
>
> Usage example:
>
> -> { "execute": "query-netdev" }
> <- { "return": [
>          {
>              "listen": "127.0.0.1:90",
>              "type": "socket",
>              "peer-id": "hub0port1",
>              "id": "__org.qemu.net1"
>          },
>          {
>              "script": "/etc/qemu-ifup",
>              "downscript": "/etc/qemu-ifdown",
>              "ifname": "tap0",
>              "type": "tap",
>              "peer-id": "net5",
>              "vnet_hdr": true,
>              "id": "tap0"
>          },
>          {
>              "ipv6": true,
>              "ipv4": true,
>              "host": "10.0.2.2",
>              "ipv6-dns": "fec0::3",
>              "ipv6-prefix": "fec0::",
>              "net": "10.0.2.0/255.255.255.0",
>              "ipv6-host": "fec0::2",
>              "type": "user",
>              "peer-id": "net0",
>              "dns": "10.0.2.3",
>              "hostfwd": [
>                  {
>                      "str": "tcp::20004-:22"
>                  }
>              ],
>              "ipv6-prefixlen": 64,
>              "id": "netdev0",
>              "restrict": false
>          }
>      ]
>    }
>
> v4->v5:
> - Enable qtest of query-netdevs for AVR and RX archs.
> - Bump "Since" version in QAPI to 6.0.
>
> v3->v4:
> - Rename "query-netdevs" to "query-netdev".
> - Copy netdev drivers to new QAPI enum "NetBackend".
>
> v2->v3:
> - Remove NIC and hubports from query-netdevs.
> - Remove several fields from NetdevInfo since they are unnecessary.
> - Rename field @peer to @peer-id.
> - Add support of vhost-vdpa.
> - Keep "info_str" for NIC/hubports, but remove it for netdevs.
>
> v1->v2:
> - Rewrite HMP "info network" to get information from results of QMP command.
> - Remove obsolete field "info_str" from "NetClientState".
>
> Alexey Kirillov (4):
>   qapi: net: Add query-netdev command
>   tests: Add tests for query-netdev command
>   hmp: Use QMP query-netdev in hmp_info_network
>   net: Do not use legacy info_str for backends
>
>  include/net/net.h               |   4 +-
>  net/clients.h                   |   1 +
>  net/hub.c                       |   4 +-
>  net/hub.h                       |   2 +-
>  net/l2tpv3.c                    |  21 +++-
>  net/net.c                       | 213 +++-
>  net/netmap.c                    |  13 ++
>  net/slirp.c                     | 128 ++-
>  net/socket.c                    |  91 ++
>  net/tap-win32.c                 |  10 +-
>  net/tap.c                       | 107 ++--
>  net/vde.c                       |  39 +-
>  net/vhost-user.c                |  20 ++-
>  net/vhost-vdpa.c                |  15 ++-
>  qapi/net.json                   |  80
>  tests/qtest/meson.build         |   3 +
>  tests/qtest/test-query-netdev.c | 120 ++
>  17 files changed, 812 insertions(+), 59 deletions(-)
>  create mode 100644 tests/qtest/test-query-netdev.c
>
> --
> 2.25.1

-- 
Alexey Kirillov
Yandex.Cloud

Re: [RFC v7 15/22] cpu: Move tlb_fill to tcg_ops

2020-12-04 Thread Eduardo Habkost

On Fri, Dec 04, 2020 at 06:14:07PM +0100, Philippe Mathieu-Daudé wrote:
> On 11/30/20 3:35 AM, Claudio Fontana wrote:
> > From: Eduardo Habkost 
> > 
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  accel/tcg/cputlb.c  |  6 +++---
> >  accel/tcg/user-exec.c   |  6 +++---
> >  include/hw/core/cpu.h   |  9 -
> >  include/hw/core/tcg-cpu-ops.h   | 12 
> >  target/alpha/cpu.c  |  2 +-
> >  target/arm/cpu.c|  2 +-
> >  target/avr/cpu.c|  2 +-
> >  target/cris/cpu.c   |  2 +-
> >  target/hppa/cpu.c   |  2 +-
> >  target/i386/tcg-cpu.c   |  2 +-
> >  target/lm32/cpu.c   |  2 +-
> >  target/m68k/cpu.c   |  2 +-
> >  target/microblaze/cpu.c |  2 +-
> >  target/mips/cpu.c   |  2 +-
> >  target/moxie/cpu.c  |  2 +-
> >  target/nios2/cpu.c  |  2 +-
> >  target/openrisc/cpu.c   |  2 +-
> >  target/ppc/translate_init.c.inc |  2 +-
> >  target/riscv/cpu.c  |  2 +-
> >  target/rx/cpu.c |  2 +-
> >  target/s390x/cpu.c  |  2 +-
> >  target/sh4/cpu.c|  2 +-
> >  target/sparc/cpu.c  |  2 +-
> >  target/tilegx/cpu.c |  2 +-
> >  target/tricore/cpu.c|  2 +-
> >  target/unicore32/cpu.c  |  2 +-
> >  target/xtensa/cpu.c |  2 +-
> >  27 files changed, 41 insertions(+), 38 deletions(-)
> 
> With cc->tcg_ops.* guarded with #ifdef CONFIG_TCG:
> Reviewed-by: Philippe Mathieu-Daudé 

Thanks!

Are the #ifdefs a hard condition for your Reviewed-by?

Even if we agree #ifdef CONFIG_TCG is the way to go, I don't
think this should block a series that's a step in the right
direction.  It can be done in a separate patch.

(Unless the lack of #ifdef introduces regressions, of course)

-- 
Eduardo

Re: [RFC v7 15/22] cpu: Move tlb_fill to tcg_ops

2020-12-04 Thread Philippe Mathieu-Daudé

On 11/30/20 3:35 AM, Claudio Fontana wrote:
> From: Eduardo Habkost 
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  accel/tcg/cputlb.c  |  6 +++---
>  accel/tcg/user-exec.c   |  6 +++---
>  include/hw/core/cpu.h   |  9 -
>  include/hw/core/tcg-cpu-ops.h   | 12 
>  target/alpha/cpu.c  |  2 +-
>  target/arm/cpu.c|  2 +-
>  target/avr/cpu.c|  2 +-
>  target/cris/cpu.c   |  2 +-
>  target/hppa/cpu.c   |  2 +-
>  target/i386/tcg-cpu.c   |  2 +-
>  target/lm32/cpu.c   |  2 +-
>  target/m68k/cpu.c   |  2 +-
>  target/microblaze/cpu.c |  2 +-
>  target/mips/cpu.c   |  2 +-
>  target/moxie/cpu.c  |  2 +-
>  target/nios2/cpu.c  |  2 +-
>  target/openrisc/cpu.c   |  2 +-
>  target/ppc/translate_init.c.inc |  2 +-
>  target/riscv/cpu.c  |  2 +-
>  target/rx/cpu.c |  2 +-
>  target/s390x/cpu.c  |  2 +-
>  target/sh4/cpu.c|  2 +-
>  target/sparc/cpu.c  |  2 +-
>  target/tilegx/cpu.c |  2 +-
>  target/tricore/cpu.c|  2 +-
>  target/unicore32/cpu.c  |  2 +-
>  target/xtensa/cpu.c |  2 +-
>  27 files changed, 41 insertions(+), 38 deletions(-)

With cc->tcg_ops.* guarded with #ifdef CONFIG_TCG:
Reviewed-by: Philippe Mathieu-Daudé
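
For clarity, this is the shape of guarding being asked for, as a minimal
sketch.  DEMO_CONFIG_TCG stands in for CONFIG_TCG (which would normally
come from the build system), and all Demo*/demo_* names are made up for
the example; the one-time init flag mirrors the tcg_target_initialized
check in cpu_exec_realizefn from patch 12.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: defined here only so the sketch builds stand-alone. */
#define DEMO_CONFIG_TCG 1

typedef struct DemoTcgOps {
    void (*initialize)(void);
} DemoTcgOps;

typedef struct DemoCPUClass {
    int gdb_num_core_regs;
#ifdef DEMO_CONFIG_TCG
    DemoTcgOps tcg_ops;
#endif
} DemoCPUClass;

static int demo_init_calls;
static bool demo_tcg_target_initialized;

static void demo_translate_init(void)
{
    demo_init_calls++;
}

/* Every access to tcg_ops sits under the same guard as the member. */
void demo_cpu_class_init(DemoCPUClass *cc)
{
    cc->gdb_num_core_regs = 16;
#ifdef DEMO_CONFIG_TCG
    cc->tcg_ops.initialize = demo_translate_init;
#endif
}

/* Runs the TCG init hook once, when the first CPU is realized. */
void demo_cpu_realize(const DemoCPUClass *cc)
{
#ifdef DEMO_CONFIG_TCG
    if (!demo_tcg_target_initialized) {
        demo_tcg_target_initialized = true;
        cc->tcg_ops.initialize();
    }
#else
    (void)cc;
#endif
}

int demo_init_call_count(void)
{
    return demo_init_calls;
}
```

With the guard in place, a build without the macro never references a
member that does not exist in that configuration.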

Re: [RFC v7 12/22] cpu: Introduce TCGCpuOperations struct

2020-12-04 Thread Philippe Mathieu-Daudé

On 11/30/20 3:35 AM, Claudio Fontana wrote:
> From: Eduardo Habkost 
> 
> The TCG-specific CPU methods will be moved to a separate struct,
> to make it easier to move accel-specific code outside generic CPU
> code in the future.  Start by moving tcg_initialize().

Good idea! One minor comment below.

> 
> The new CPUClass.tcg_ops field may eventually become a pointer,
> but keep it an embedded struct for now, to make code conversion
> easier.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  MAINTAINERS |  1 +
>  cpu.c   |  2 +-
>  include/hw/core/cpu.h   |  9 -
>  include/hw/core/tcg-cpu-ops.h   | 25 +
>  target/alpha/cpu.c  |  2 +-
>  target/arm/cpu.c|  2 +-
>  target/avr/cpu.c|  2 +-
>  target/cris/cpu.c   | 12 ++--
>  target/hppa/cpu.c   |  2 +-
>  target/i386/tcg-cpu.c   |  2 +-
>  target/lm32/cpu.c   |  2 +-
>  target/m68k/cpu.c   |  2 +-
>  target/microblaze/cpu.c |  2 +-
>  target/mips/cpu.c   |  2 +-
>  target/moxie/cpu.c  |  2 +-
>  target/nios2/cpu.c  |  2 +-
>  target/openrisc/cpu.c   |  2 +-
>  target/ppc/translate_init.c.inc |  2 +-
>  target/riscv/cpu.c  |  2 +-
>  target/rx/cpu.c |  2 +-
>  target/s390x/cpu.c  |  2 +-
>  target/sh4/cpu.c|  2 +-
>  target/sparc/cpu.c  |  2 +-
>  target/tilegx/cpu.c |  2 +-
>  target/tricore/cpu.c|  2 +-
>  target/unicore32/cpu.c  |  2 +-
>  target/xtensa/cpu.c |  2 +-
>  27 files changed, 63 insertions(+), 30 deletions(-)
>  create mode 100644 include/hw/core/tcg-cpu-ops.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f53f2678d8..d876f504a6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1535,6 +1535,7 @@ F: qapi/machine.json
>  F: qapi/machine-target.json
>  F: include/hw/boards.h
>  F: include/hw/core/cpu.h
> +F: include/hw/core/tcg-cpu-ops.h
>  F: include/hw/cpu/cluster.h
>  F: include/sysemu/numa.h
>  T: git https://github.com/ehabkost/qemu.git machine-next
> diff --git a/cpu.c b/cpu.c
> index 0be5dcb6f3..d02c2a17f1 100644
> --- a/cpu.c
> +++ b/cpu.c
> @@ -180,7 +180,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
>  
>  if (tcg_enabled() && !tcg_target_initialized) {
>  tcg_target_initialized = true;
> -cc->tcg_initialize();
> +cc->tcg_ops.initialize();
>  }
>  tlb_init(cpu);
>  
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 3d92c967ff..c93b08a0fb 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -76,6 +76,10 @@ typedef struct CPUWatchpoint CPUWatchpoint;
>  
>  struct TranslationBlock;
>  
> +#ifdef CONFIG_TCG
> +#include "tcg-cpu-ops.h"
> +#endif /* CONFIG_TCG */
> +
>  /**
>   * CPUClass:
>   * @class_by_name: Callback to map -cpu command line model name to an
> @@ -221,12 +225,15 @@ struct CPUClass {
>  
>  void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
>  vaddr (*adjust_watchpoint_address)(CPUState *cpu, vaddr addr, int len);
> -void (*tcg_initialize)(void);
>  
>  const char *deprecation_note;
>  /* Keep non-pointer data at the end to minimize holes.  */
>  int gdb_num_core_regs;
>  bool gdb_stop_before_watchpoint;
> +
> +#ifdef CONFIG_TCG
> +TcgCpuOperations tcg_ops;
> +#endif /* CONFIG_TCG */
>  };
>  
>  /*
> diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
> new file mode 100644
> index 00..4475ef0996
> --- /dev/null
> +++ b/include/hw/core/tcg-cpu-ops.h
> @@ -0,0 +1,25 @@
> +/*
> + * TCG-Specific operations that are not meaningful for hardware accelerators
> + *
> + * Copyright 2020 SUSE LLC
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef TCG_CPU_OPS_H
> +#define TCG_CPU_OPS_H
> +
> +/**
> + * struct TcgCpuOperations: TCG operations specific to a CPU class
> + */
> +typedef struct TcgCpuOperations {
> +/**
> + * @initialize: Initialize TCG state
> + *
> + * Called when the first CPU is realized.
> + */
> +void (*initialize)(void);
> +} TcgCpuOperations;
> +
> +#endif /* TCG_CPU_OPS_H */
...

> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 07492e9f9a..1fa9382a7c 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -2261,7 +2261,7 @@ static void arm_cpu_class_init(ObjectClass *oc, void *data)
>  cc->gdb_stop_before_watchpoint = true;
>  cc->disas_set_info = arm_disas_set_info;
>  #ifdef CONFIG_TCG
> -cc->tcg_initialize = arm_translate_init;
> +cc->tcg_ops.initialize = arm_translate_init;

This one is correctly guarded by '#ifdef CONFIG_TCG'.

For the other targets, can you either place it within
the '#ifdef CONFIG_TCG' block or if there is

Re: [RFC v7 14/22] cpu: Move cpu_exec_* to tcg_ops

2020-12-04 Thread Philippe Mathieu-Daudé

On 11/30/20 3:35 AM, Claudio Fontana wrote:
> From: Eduardo Habkost 
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  accel/tcg/cpu-exec.c| 12 ++--
>  include/hw/core/cpu.h   |  6 --
>  include/hw/core/tcg-cpu-ops.h   |  9 +
>  target/alpha/cpu.c  |  3 ++-
>  target/arm/cpu.c|  2 +-
>  target/arm/cpu64.c  |  2 +-
>  target/arm/cpu_tcg.c|  2 +-
>  target/avr/cpu.c|  2 +-
>  target/cris/cpu.c   |  2 +-
>  target/hppa/cpu.c   |  2 +-
>  target/i386/tcg-cpu.c   |  6 +++---
>  target/lm32/cpu.c   |  2 +-
>  target/m68k/cpu.c   |  2 +-
>  target/microblaze/cpu.c |  2 +-
>  target/mips/cpu.c   |  2 +-
>  target/nios2/cpu.c  |  2 +-
>  target/openrisc/cpu.c   |  2 +-
>  target/ppc/translate_init.c.inc |  6 +++---
>  target/riscv/cpu.c  |  2 +-
>  target/rx/cpu.c |  2 +-
>  target/s390x/cpu.c  |  2 +-
>  target/sh4/cpu.c|  2 +-
>  target/sparc/cpu.c  |  2 +-
>  target/tilegx/cpu.c |  2 +-
>  target/unicore32/cpu.c  |  2 +-
>  target/xtensa/cpu.c |  2 +-
>  26 files changed, 43 insertions(+), 39 deletions(-)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 816ef29f68..07ff1fa4dc 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -240,8 +240,8 @@ static void cpu_exec_enter(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
>  
> -if (cc->cpu_exec_enter) {
> -cc->cpu_exec_enter(cpu);
> +if (cc->tcg_ops.cpu_exec_enter) {
> +cc->tcg_ops.cpu_exec_enter(cpu);
>  }
>  }
>  
> @@ -249,8 +249,8 @@ static void cpu_exec_exit(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
>  
> -if (cc->cpu_exec_exit) {
> -cc->cpu_exec_exit(cpu);
> +if (cc->tcg_ops.cpu_exec_exit) {
> +cc->tcg_ops.cpu_exec_exit(cpu);
>  }
>  }
>  
> @@ -625,8 +625,8 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
> True when it is, and we should restart on a new TB,
> and via longjmp via cpu_loop_exit.  */
>  else {
> -if (cc->cpu_exec_interrupt &&
> -cc->cpu_exec_interrupt(cpu, interrupt_request)) {
> +if (cc->tcg_ops.cpu_exec_interrupt &&
> +cc->tcg_ops.cpu_exec_interrupt(cpu, interrupt_request)) {
>  if (need_replay_interrupt(interrupt_request)) {
>  replay_interrupt();
>  }
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 19211cb409..538f3e6cd3 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -146,9 +146,6 @@ struct TranslationBlock;
>   * @gdb_get_dynamic_xml: Callback to return dynamically generated XML for the
>   *   gdb stub. Returns a pointer to the XML contents for the specified XML file
>   *   or NULL if the CPU doesn't have a dynamically generated content for it.
> - * @cpu_exec_enter: Callback for cpu_exec preparation.
> - * @cpu_exec_exit: Callback for cpu_exec cleanup.
> - * @cpu_exec_interrupt: Callback for processing interrupts in cpu_exec.
>   * @disas_set_info: Setup architecture specific components of disassembly info
>   * @adjust_watchpoint_address: Perform a target-specific adjustment to an
>   * address before attempting to match it against watchpoints.
> @@ -211,9 +208,6 @@ struct CPUClass {
>  const char *gdb_core_xml_file;
>  gchar * (*gdb_arch_name)(CPUState *cpu);
>  const char * (*gdb_get_dynamic_xml)(CPUState *cpu, const char *xmlname);
> -void (*cpu_exec_enter)(CPUState *cpu);
> -void (*cpu_exec_exit)(CPUState *cpu);
> -bool (*cpu_exec_interrupt)(CPUState *cpu, int interrupt_request);
>  
>  void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
>  vaddr (*adjust_watchpoint_address)(CPUState *cpu, vaddr addr, int len);
> diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
> index 109291ac52..e12f32919b 100644
> --- a/include/hw/core/tcg-cpu-ops.h
> +++ b/include/hw/core/tcg-cpu-ops.h
> @@ -10,6 +10,9 @@
>  #ifndef TCG_CPU_OPS_H
>  #define TCG_CPU_OPS_H
>  
> +/**
> + * struct TcgCpuOperations: TCG operations specific to a CPU class
> + */
>  typedef struct TcgCpuOperations {
>  /**
>   * @initialize: Initialize TCG state
> @@ -28,6 +31,12 @@ typedef struct TcgCpuOperations {
>   * @set_pc(tb->pc).
>   */
>  void (*synchronize_from_tb)(CPUState *cpu, struct TranslationBlock *tb);
> +/** @cpu_exec_enter: Callback for cpu_exec preparation */
> +void (*cpu_exec_enter)(CPUState *cpu);
> +/** @cpu_exec_exit: Callback for cpu_exec cleanup */
> +void (*cpu_exec_exit)(CPUState *cpu);
> +/** @cpu_exec_interrupt: Callback for processing interrupts in cpu_exec */
> +bool (*cpu_exec_interrupt)(CPUState *cpu, int

Re: [PATCH 7/9] target/mips: Extract msa_translate_init() from mips_tcg_init()

2020-12-04 Thread Philippe Mathieu-Daudé

On 12/4/20 5:30 PM, Richard Henderson wrote:
> On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
>> Extract the logic initialization of the MSA registers from
>> the generic initialization.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  target/mips/translate.c | 35 ---
>>  1 file changed, 20 insertions(+), 15 deletions(-)
> 
> Why?

The msa_wr_d[] registers are only used by MSA, so this allows me, in
the next series, to move the static msa_wr_d[] array into
msa_translate.c without having to declare it global with extern.

> 
>> -fpu_f64[i] = tcg_global_mem_new_i64(cpu_env, off, msaregnames[i * 2]);
>> +fpu_f64[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
> 
> Maybe fold this back to the previous patch?

Certainly ;)

> 
> 
> r~
>

[PATCH 7/8] x86: ich9: factor out "guest_cpu_hotplug_features"

2020-12-04 Thread Igor Mammedov

It will be reused by the next patch to check the validity of the
unplug feature.

Signed-off-by: Igor Mammedov 
---
 hw/isa/lpc_ich9.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 087a18d04d..da80430144 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
 {
 ICH9LPCState *lpc = opaque;
 uint64_t guest_features;
+uint64_t guest_cpu_hotplug_features;
 
 if (lpc->smi_features_ok) {
 /* negotiation already complete, features locked */
@@ -378,9 +379,12 @@ static void smi_features_ok_callback(void *opaque)
 /* guest requests invalid features, leave @features_ok at zero */
 return;
 }
+
+guest_cpu_hotplug_features = guest_features &
+ (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
+  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
 if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
-guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
-  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
+guest_cpu_hotplug_features) {
 /*
  * cpu hot-[un]plug with SMI requires SMI broadcast,
  * leave @features_ok at zero
-- 
2.27.0
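
The check this patch factors out can be exercised in isolation with the
sketch below.  The DEMO_* bit positions and demo_smi_features_valid() are
assumptions made up for the example (they are not taken from the ICH9
headers); only the shape of the condition follows the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_SMI_F_BROADCAST_BIT      0
#define DEMO_SMI_F_CPU_HOTPLUG_BIT    1
#define DEMO_SMI_F_CPU_HOT_UNPLUG_BIT 2

#define DEMO_BIT_ULL(nr) (1ULL << (nr))

/*
 * Returns false when the guest asks for CPU hot-plug or hot-unplug
 * with SMI but did not also ask for SMI broadcast.
 */
bool demo_smi_features_valid(uint64_t guest_features)
{
    uint64_t guest_cpu_hotplug_features = guest_features &
        (DEMO_BIT_ULL(DEMO_SMI_F_CPU_HOTPLUG_BIT) |
         DEMO_BIT_ULL(DEMO_SMI_F_CPU_HOT_UNPLUG_BIT));

    if (!(guest_features & DEMO_BIT_ULL(DEMO_SMI_F_BROADCAST_BIT)) &&
        guest_cpu_hotplug_features) {
        return false;
    }
    return true;
}
```

Keeping the masked value in its own variable is what lets a follow-up
check look at the hot-plug/hot-unplug bits again without repeating the
mask expression.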

Re: [RFC v7 13/22] cpu: Move synchronize_from_tb() to tcg_ops

2020-12-04 Thread Philippe Mathieu-Daudé

On 11/30/20 3:35 AM, Claudio Fontana wrote:
> From: Eduardo Habkost 
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  accel/tcg/cpu-exec.c  |  4 ++--
>  include/hw/core/cpu.h |  8 
>  include/hw/core/tcg-cpu-ops.h | 14 +++---
>  target/arm/cpu.c  |  2 +-
>  target/avr/cpu.c  |  2 +-
>  target/hppa/cpu.c |  2 +-
>  target/i386/tcg-cpu.c |  2 +-
>  target/microblaze/cpu.c   |  2 +-
>  target/mips/cpu.c |  2 +-
>  target/riscv/cpu.c|  2 +-
>  target/rx/cpu.c   |  2 +-
>  target/sh4/cpu.c  |  2 +-
>  target/sparc/cpu.c|  2 +-
>  target/tricore/cpu.c  |  2 +-
>  14 files changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 64cba89356..816ef29f68 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -192,8 +192,8 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
> TARGET_FMT_lx "] %s\n",
> last_tb->tc.ptr, last_tb->pc,
> lookup_symbol(last_tb->pc));
> -if (cc->synchronize_from_tb) {
> -cc->synchronize_from_tb(cpu, last_tb);
> +if (cc->tcg_ops.synchronize_from_tb) {
> +cc->tcg_ops.synchronize_from_tb(cpu, last_tb);
>  } else {
>  assert(cc->set_pc);
>  cc->set_pc(cpu, last_tb->pc);
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index c93b08a0fb..19211cb409 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -110,13 +110,6 @@ struct TranslationBlock;
>   *   If the target behaviour here is anything other than "set
>   *   the PC register to the value passed in" then the target must
>   *   also implement the synchronize_from_tb hook.
> - * @synchronize_from_tb: Callback for synchronizing state from a TCG
> - *   #TranslationBlock. This is called when we abandon execution
> - *   of a TB before starting it, and must set all parts of the CPU
> - *   state which the previous TB in the chain may not have updated.
> - *   This always includes at least the program counter; some targets
> - *   will need to do more. If this hook is not implemented then the
> - *   default is to call @set_pc(tb->pc).
>   * @tlb_fill: Callback for handling a softmmu tlb miss or user-only
>   *   address fault.  For system mode, if the access is valid, call
>   *   tlb_set_page and return true; if the access is invalid, and
> @@ -193,7 +186,6 @@ struct CPUClass {
>  void (*get_memory_mapping)(CPUState *cpu, MemoryMappingList *list,
> Error **errp);
>  void (*set_pc)(CPUState *cpu, vaddr value);
> -void (*synchronize_from_tb)(CPUState *cpu, struct TranslationBlock *tb);
>  bool (*tlb_fill)(CPUState *cpu, vaddr address, int size,
>   MMUAccessType access_type, int mmu_idx,
>   bool probe, uintptr_t retaddr);
> diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
> index 4475ef0996..109291ac52 100644
> --- a/include/hw/core/tcg-cpu-ops.h
> +++ b/include/hw/core/tcg-cpu-ops.h
> @@ -10,9 +10,6 @@
>  #ifndef TCG_CPU_OPS_H
>  #define TCG_CPU_OPS_H
>  
> -/**
> - * struct TcgCpuOperations: TCG operations specific to a CPU class
> - */
>  typedef struct TcgCpuOperations {
>  /**
>   * @initialize: Initialize TCG state
> @@ -20,6 +17,17 @@ typedef struct TcgCpuOperations {
>   * Called when the first CPU is realized.
>   */
>  void (*initialize)(void);
> +/**
> + * @synchronize_from_tb: Synchronize state from a TCG #TranslationBlock
> + *
> + * This is called when we abandon execution of a TB before starting it,
> + * and must set all parts of the CPU state which the previous TB in the
> + * chain may not have updated. This always includes at least the program
> + * counter; some targets will need to do more. If this hook is not
> + * implemented then the default is to call
> + * @set_pc(tb->pc).
> + */
> +void (*synchronize_from_tb)(CPUState *cpu, struct TranslationBlock *tb);
>  } TcgCpuOperations;
>  
>  #endif /* TCG_CPU_OPS_H */
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 1fa9382a7c..e29601d7db 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -2242,7 +2242,7 @@ static void arm_cpu_class_init(ObjectClass *oc, void *data)
>  cc->cpu_exec_interrupt = arm_cpu_exec_interrupt;
>  cc->dump_state = arm_cpu_dump_state;
>  cc->set_pc = arm_cpu_set_pc;
> -cc->synchronize_from_tb = arm_cpu_synchronize_from_tb;
> +cc->tcg_ops.synchronize_from_tb = arm_cpu_synchronize_from_tb;

Same comment as on the previous patch: please keep cc->tcg_ops.*
guarded within #ifdef CONFIG_TCG.

With this change:
Reviewed-by: Philippe Mathieu-Daudé
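
The fallback described in the @synchronize_from_tb comment (use the hook
if the target provides one, otherwise just set the PC) can be sketched as
below.  All Demo*/demo_* types and functions are simplified stand-ins made
up for the example, not the real CPUState/TranslationBlock definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoTB {
    uint64_t pc;
    uint32_t flags;
} DemoTB;

typedef struct DemoCPU {
    uint64_t pc;
    uint32_t flags;
} DemoCPU;

typedef struct DemoTcgOps {
    void (*synchronize_from_tb)(DemoCPU *cpu, const DemoTB *tb);
} DemoTcgOps;

typedef struct DemoCPUClass {
    void (*set_pc)(DemoCPU *cpu, uint64_t value);
    DemoTcgOps tcg_ops;
} DemoCPUClass;

/* Default behaviour: only the program counter is restored. */
void demo_set_pc(DemoCPU *cpu, uint64_t value)
{
    cpu->pc = value;
}

/* A target that must restore more than the PC provides its own hook. */
void demo_sync_pc_and_flags(DemoCPU *cpu, const DemoTB *tb)
{
    cpu->pc = tb->pc;
    cpu->flags = tb->flags;
}

/* Same dispatch shape as the cpu_tb_exec hunk in this patch. */
void demo_restore_state(const DemoCPUClass *cc, DemoCPU *cpu,
                        const DemoTB *last_tb)
{
    if (cc->tcg_ops.synchronize_from_tb) {
        cc->tcg_ops.synchronize_from_tb(cpu, last_tb);
    } else {
        assert(cc->set_pc);
        cc->set_pc(cpu, last_tb->pc);
    }
}
```

A target whose state is fully described by the PC can leave the hook NULL;
one that also tracks, say, mode flags per TB needs the hook so that both
are restored together.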

Re: [PATCH for-6.0 00/11] target/arm: enforce alignment

2020-12-04 Thread Peter Maydell

On Fri, 4 Dec 2020 at 06:17, Pavel Dovgalyuk  wrote:
>
> On 03.12.2020 19:14, Peter Maydell wrote:
> > On Thu, 3 Dec 2020 at 16:10, Pavel Dovgalyuk wrote:
> >>
> >> On 03.12.2020 15:30, Philippe Mathieu-Daudé wrote:
> >>> Cc'ing Pavel
> >>>
> >>> On 12/1/20 4:55 PM, Peter Maydell wrote:
> >>>> On Wed, 25 Nov 2020 at 04:06, Richard Henderson wrote:
> >>>>>
> >>>>> As reported in https://bugs.launchpad.net/bugs/1905356
> >>>>>
> >>>>> Not implementing SCTLR.A, but all of the other required
> >>>>> alignment for SCTLR.A=0 in Table A3-1.
> >>>>
> >>>> Something in this series breaks the 'make check-acceptance'
> >>>> record-and-replay test:
> >>>>
> >>>>    (30/40)
> >>>> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt:
> >>>> PASS (9.14 s)
> >>>>    (31/40)
> >>>> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_arm_virt:
> >>>> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> >>>> Timeout reached\nOriginal status: ERROR\n{'name':
> >>>> '31-tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_arm_virt',
> >>>> 'logdir':
> >>>> '/home/petmay01/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/result...
> >>>> (90.19 s)
> >>>>
> >>>> The log shows the "recording execution" apparently hanging,
> >>>> with the last output from the guest
> >>>> [3.183662] Registering SWP/SWPB emulation handler
> >>
> >> I looked through the patches and it does not seem that they can break
> >> anything.
> >> Could it be the same avocado/chardev socket glitch as in some previous
> >> failures?
> >> What happens when re-running this test?
> >
> > I ran it a couple of times with the patchset and it failed the same
> > way each time. Without is fine.
>
> I applied the patches and got no failures on my local machine.
>
> Do you have any ideas on how to debug this?
> What does "arm-clang" mean? Is the host compiler clang?

Yes, it's a clang build (with the sanitizers enabled, though I didn't
see any output from the sanitizers in the logfile).

thanks
-- PMM

[PATCH 6/8] tests/acpi: update expected files

2020-12-04 Thread Igor Mammedov

Signed-off-by: Igor Mammedov 
---
 tests/qtest/bios-tables-test-allowed-diff.h |  21 
 tests/data/acpi/pc/DSDT | Bin 5060 -> 5067 bytes
 tests/data/acpi/pc/DSDT.acpihmat| Bin 6385 -> 6392 bytes
 tests/data/acpi/pc/DSDT.bridge  | Bin 6919 -> 6926 bytes
 tests/data/acpi/pc/DSDT.cphp| Bin 5524 -> 5531 bytes
 tests/data/acpi/pc/DSDT.dimmpxm | Bin 6714 -> 6721 bytes
 tests/data/acpi/pc/DSDT.hpbridge| Bin 5021 -> 5028 bytes
 tests/data/acpi/pc/DSDT.hpbrroot| Bin 3079 -> 3086 bytes
 tests/data/acpi/pc/DSDT.ipmikcs | Bin 5132 -> 5139 bytes
 tests/data/acpi/pc/DSDT.memhp   | Bin 6419 -> 6426 bytes
 tests/data/acpi/pc/DSDT.numamem | Bin 5066 -> 5073 bytes
 tests/data/acpi/pc/DSDT.roothp  | Bin 5256 -> 5263 bytes
 tests/data/acpi/q35/DSDT| Bin 7796 -> 7803 bytes
 tests/data/acpi/q35/DSDT.acpihmat   | Bin 9121 -> 9128 bytes
 tests/data/acpi/q35/DSDT.bridge | Bin 7814 -> 7821 bytes
 tests/data/acpi/q35/DSDT.cphp   | Bin 8260 -> 8267 bytes
 tests/data/acpi/q35/DSDT.dimmpxm| Bin 9450 -> 9457 bytes
 tests/data/acpi/q35/DSDT.ipmibt | Bin 7871 -> 7878 bytes
 tests/data/acpi/q35/DSDT.memhp  | Bin 9155 -> 9162 bytes
 tests/data/acpi/q35/DSDT.mmio64 | Bin 8927 -> 8934 bytes
 tests/data/acpi/q35/DSDT.numamem| Bin 7802 -> 7809 bytes
 tests/data/acpi/q35/DSDT.tis| Bin 8402 -> 8409 bytes
 22 files changed, 21 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index cc75f3fc46..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,22 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/pc/DSDT",
-"tests/data/acpi/q35/DSDT",
-"tests/data/acpi/q35/DSDT.tis",
-"tests/data/acpi/q35/DSDT.bridge",
-"tests/data/acpi/q35/DSDT.mmio64",
-"tests/data/acpi/q35/DSDT.ipmibt",
-"tests/data/acpi/q35/DSDT.cphp",
-"tests/data/acpi/q35/DSDT.memhp",
-"tests/data/acpi/q35/DSDT.numamem",
-"tests/data/acpi/q35/DSDT.dimmpxm",
-"tests/data/acpi/q35/DSDT.acpihmat",
-"tests/data/acpi/pc/DSDT.bridge",
-"tests/data/acpi/pc/DSDT.ipmikcs",
-"tests/data/acpi/pc/DSDT.cphp",
-"tests/data/acpi/pc/DSDT.memhp",
-"tests/data/acpi/pc/DSDT.numamem",
-"tests/data/acpi/pc/DSDT.dimmpxm",
-"tests/data/acpi/pc/DSDT.acpihmat",
-"tests/data/acpi/pc/DSDT.roothp",
-"tests/data/acpi/pc/DSDT.hpbridge",
-"tests/data/acpi/pc/DSDT.hpbrroot",
diff --git a/tests/data/acpi/pc/DSDT b/tests/data/acpi/pc/DSDT
index 4ca46e5a2bdb1dfab79dd8630aeeb9a386d8b30e..295819caa4a1bfafb7834b19649dac7e44d22839 100644
GIT binary patch
[binary delta]

diff --git a/tests/data/acpi/pc/DSDT.cphp b/tests/data/acpi/pc/DSDT.cphp
index 8bab2f506409f2b025a63d8b91c7bfdaa931e626..d62a3a3e39801348b9d590343355678c38e520f7 100644
GIT binary patch
[binary delta]

[...]

diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
index e7414e78563372fca4d2aab9d16c58c0ff8468f4..931afc6f626a022d368011c412f36211d1d34031 100644
GIT binary patch
[binary delta]

diff --git a/tests/data/acpi/q35/DSDT.cphp b/tests/data/acpi/q35/DSDT.cphp
index 69c5edf620529e995461ccba63b76a083f25b2b6..f5b411a54bb6942f59ed016d43c4c28a7f99eb6b 100644
GIT binary patch
[binary delta]

diff --git a/tests/data/acpi/q35/DSDT.dimmpxm b/tests/data/acpi/q35/DSDT.dimmpxm
index af41acba6e0117191ad8495a30ded7b0acc4d2ca..4bb12250f9dcdb34b2c2801186746a3693398cea 100644
GIT binary patch
[binary delta]

[...]

-- 
2.27.0

[PATCH 4/8] tests/acpi: allow expected files change

2020-12-04 Thread Igor Mammedov

Change that will be introduced by the following patch:

@@ -557,6 +557,8 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPCDSDT", 0x00000001)
 CINS,   1,
 CRMV,   1,
 CEJ0,   1,
+,   1,
+CEJF,   1,
 Offset (0x05),
 CCMD,   8
 }

Signed-off-by: Igor Mammedov 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..cc75f3fc46 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,22 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/DSDT",
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/q35/DSDT.tis",
+"tests/data/acpi/q35/DSDT.bridge",
+"tests/data/acpi/q35/DSDT.mmio64",
+"tests/data/acpi/q35/DSDT.ipmibt",
+"tests/data/acpi/q35/DSDT.cphp",
+"tests/data/acpi/q35/DSDT.memhp",
+"tests/data/acpi/q35/DSDT.numamem",
+"tests/data/acpi/q35/DSDT.dimmpxm",
+"tests/data/acpi/q35/DSDT.acpihmat",
+"tests/data/acpi/pc/DSDT.bridge",
+"tests/data/acpi/pc/DSDT.ipmikcs",
+"tests/data/acpi/pc/DSDT.cphp",
+"tests/data/acpi/pc/DSDT.memhp",
+"tests/data/acpi/pc/DSDT.numamem",
+"tests/data/acpi/pc/DSDT.dimmpxm",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/pc/DSDT.roothp",
+"tests/data/acpi/pc/DSDT.hpbridge",
+"tests/data/acpi/pc/DSDT.hpbrroot",
-- 
2.27.0

Re: [PATCH v4 11/11] hvf: arm: Implement -cpu host

2020-12-04 Thread Roman Bolshakov

On Fri, Dec 04, 2020 at 12:48:57AM +0100, Alexander Graf wrote:
> Now that we have working system register sync, we push more target CPU
> properties into the virtual machine. That might be useful in some
> situations, but is not the typical case that users want.
> 
> So let's add a -cpu host option that allows them to explicitly pass all
> CPU capabilities of their host CPU into the guest.
> 

Acked-by: Roman Bolshakov 

Thanks,
Roman

[PATCH 1/8] hw: add compat machines for 6.0

2020-12-04 Thread Igor Mammedov

From: Cornelia Huck 

Add 6.0 machine types for arm/i440fx/q35/s390x/spapr.
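
A rough standalone model of the chaining this patch sets up, for readers new
to the pattern: each older machine version first applies the options of the
next newer one and then appends its own compat properties. All Sketch* and
sketch_* names are hypothetical, not QEMU API. The two property entries are
borrowed from pc_compat_5_1 and from the last patch of this series purely as
sample data (in this patch the 5.2 arrays are still empty).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GlobalProperty. */
typedef struct SketchProp {
    const char *driver;
    const char *property;
    const char *value;
} SketchProp;

#define SKETCH_MAX_PROPS 8

/* Hypothetical stand-in for MachineClass: a name plus collected props. */
typedef struct SketchMachine {
    const char *name;
    SketchProp compat[SKETCH_MAX_PROPS];
    size_t ncompat;
} SketchMachine;

/* Models compat_props_add(): append an array of overrides. */
static void sketch_compat_add(SketchMachine *m, const SketchProp *props,
                              size_t n)
{
    for (size_t i = 0; i < n && m->ncompat < SKETCH_MAX_PROPS; i++) {
        m->compat[m->ncompat++] = props[i];
    }
}

static const SketchProp sketch_compat_5_2[] = {
    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
};

static const SketchProp sketch_compat_5_1[] = {
    { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
};

/* Latest version: no compat overrides at all. */
static void sketch_6_0_options(SketchMachine *m)
{
    m->name = "sketch-6.0";
    m->ncompat = 0;
}

/* 5.2 = 6.0 behaviour plus the 5.2 overrides. */
static void sketch_5_2_options(SketchMachine *m)
{
    sketch_6_0_options(m);
    m->name = "sketch-5.2";
    sketch_compat_add(m, sketch_compat_5_2,
                      sizeof(sketch_compat_5_2) / sizeof(sketch_compat_5_2[0]));
}

/* 5.1 = 5.2 behaviour plus the 5.1 overrides. */
static void sketch_5_1_options(SketchMachine *m)
{
    sketch_5_2_options(m);
    m->name = "sketch-5.1";
    sketch_compat_add(m, sketch_compat_5_1,
                      sizeof(sketch_compat_5_1) / sizeof(sketch_compat_5_1[0]));
}
```

The point of the chain is that a new default only needs one entry in the
newest compat array; every older machine type inherits it automatically.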

Signed-off-by: Cornelia Huck 
Signed-off-by: Igor Mammedov 
---
 include/hw/boards.h|  3 +++
 include/hw/i386/pc.h   |  3 +++
 hw/arm/virt.c  |  9 -
 hw/core/machine.c  |  3 +++
 hw/i386/pc.c   |  3 +++
 hw/i386/pc_piix.c  | 14 +-
 hw/i386/pc_q35.c   | 13 -
 hw/ppc/spapr.c | 15 +--
 hw/s390x/s390-virtio-ccw.c | 14 +-
 9 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index a49e3a6b44..f94f4ad5d8 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -310,6 +310,9 @@ struct MachineState {
 } \
 type_init(machine_initfn##_register_types)
 
+extern GlobalProperty hw_compat_5_2[];
+extern const size_t hw_compat_5_2_len;
+
 extern GlobalProperty hw_compat_5_1[];
 extern const size_t hw_compat_5_1_len;
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 911e460097..49dfa667de 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -191,6 +191,9 @@ void pc_system_firmware_init(PCMachineState *pcms, 
MemoryRegion *rom_memory);
 void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
const CPUArchIdList *apic_ids, GArray *entry);
 
+extern GlobalProperty pc_compat_5_2[];
+extern const size_t pc_compat_5_2_len;
+
 extern GlobalProperty pc_compat_5_1[];
 extern const size_t pc_compat_5_1_len;
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 27dbeb549e..d21dad4491 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2587,10 +2587,17 @@ static void machvirt_machine_init(void)
 }
 type_init(machvirt_machine_init);
 
+static void virt_machine_6_0_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(6, 0)
+
 static void virt_machine_5_2_options(MachineClass *mc)
 {
+virt_machine_6_0_options(mc);
+compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(5, 2)
+DEFINE_VIRT_MACHINE(5, 2)
 
 static void virt_machine_5_1_options(MachineClass *mc)
 {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index d0408049b5..9c41b94e0d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,6 +28,9 @@
 #include "hw/mem/nvdimm.h"
 #include "migration/vmstate.h"
 
+GlobalProperty hw_compat_5_2[] = {};
+const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
+
 GlobalProperty hw_compat_5_1[] = {
 { "vhost-scsi", "num_queues", "1"},
 { "vhost-user-blk", "num-queues", "1"},
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 17b514d1da..781523684c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -97,6 +97,9 @@
 #include "trace.h"
 #include CONFIG_DEVICES
 
+GlobalProperty pc_compat_5_2[] = {};
+const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2);
+
 GlobalProperty pc_compat_5_1[] = {
 { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
 };
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 13d1628f13..6188c3e97e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
 }
 
-static void pc_i440fx_5_2_machine_options(MachineClass *m)
+static void pc_i440fx_6_0_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_machine_options(m);
@@ -435,6 +435,18 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_I440FX_MACHINE(v6_0, "pc-i440fx-6.0", NULL,
+  pc_i440fx_6_0_machine_options);
+
+static void pc_i440fx_5_2_machine_options(MachineClass *m)
+{
+pc_i440fx_6_0_machine_options(m);
+m->alias = NULL;
+m->is_default = false;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
+}
+
 DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
   pc_i440fx_5_2_machine_options);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index a3f4959c43..0a212443aa 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -344,7 +344,7 @@ static void pc_q35_machine_options(MachineClass *m)
 m->max_cpus = 288;
 }
 
-static void pc_q35_5_2_machine_options(MachineClass *m)
+static void pc_q35_6_0_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_machine_options(m);
@@ -352,6 +352,17 @@ static void pc_q35_5_2_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_Q35_MACHINE(v6_0, "pc-q35-6.0", NULL,
+   pc_q35_6_0_machine_options);
+
+static void pc_q35_5_2_machine_options(MachineClass *m)
+{
+pc_q35_6_0_machine_options(m);
+m->alias = NULL;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);

Re: [PATCH v4 08/11] arm: Add Hypervisor.framework build target

2020-12-04 Thread Roman Bolshakov

On Fri, Dec 04, 2020 at 12:48:54AM +0100, Alexander Graf wrote:
> Now that we have all logic in place that we need to handle Hypervisor.framework
> on Apple Silicon systems, let's add CONFIG_HVF for aarch64 as well so that we
> can build it.
> 

Reviewed-by: Roman Bolshakov 
on x86:
Tested-by: Roman Bolshakov 

Thanks,
Roman

Re: [RFC 08/15] target/riscv: rvb: single-bit instructions

2020-12-04 Thread Frank Chang

On Fri, Nov 20, 2020 at 5:04 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 11/19/20 12:35 PM, Richard Henderson wrote:
> > On 11/18/20 12:29 AM, frank.ch...@sifive.com wrote:
> >> +static bool trans_sbset(DisasContext *ctx, arg_sbset *a)
> >> +{
> >> +REQUIRE_EXT(ctx, RVB);
> >> +return gen_arith(ctx, a, &gen_sbset);
> >> +}
> >> +
> >> +static bool trans_sbseti(DisasContext *ctx, arg_sbseti *a)
> >> +{
> >> +REQUIRE_EXT(ctx, RVB);
> >> +return gen_arith_shamt_tl(ctx, a, &gen_sbset);
> >> +}
> >> +
> >> +static bool trans_sbclr(DisasContext *ctx, arg_sbclr *a)
> >> +{
> >> +REQUIRE_EXT(ctx, RVB);
> >> +return gen_arith(ctx, a, &gen_sbclr);
> >> +}
> >
> > Coming back to my re-use of code thing, these should use gen_shift.  That
> > handles the truncate of source2 to the shift amount.
> >
> >> +static bool trans_sbclri(DisasContext *ctx, arg_sbclri *a)
> >> +{
> >> +REQUIRE_EXT(ctx, RVB);
> >> +return gen_arith_shamt_tl(ctx, a, &gen_sbclr);
> >> +}
> >> +
> >> +static bool trans_sbinv(DisasContext *ctx, arg_sbinv *a)
> >> +{
> >> +REQUIRE_EXT(ctx, RVB);
> >> +return gen_arith(ctx, a, &gen_sbinv);
> >> +}
> >> +
> >> +static bool trans_sbinvi(DisasContext *ctx, arg_sbinvi *a)
> >> +{
> >> +REQUIRE_EXT(ctx, RVB);
> >> +return gen_arith_shamt_tl(ctx, a, &gen_sbinv);
> >> +}
> >
> > I think there ought to be a gen_shifti for these.
>
> Hmm.  I just realized that gen_shifti would have a generator callback with a
> constant argument, a-la tcg_gen_shli_tl.
>
> I don't know if it's worth duplicating gen_sbclr et al for a constant argument.
> And the sloi/sroi insns besides.  Perhaps a gen_shifti_var helper instead?
>
> Let me know what you think, but at the moment we're left with an incoherent set
> of helpers that seem split along lines that are less than ideal.
>
>
> r~
>

Thanks Richard, and sorry for the late reply.

If we have gen_shift(), gen_shifti(), gen_shiftw() and gen_shiftiw(),
then we can eliminate the need for
gen_arith_shamt_tl(), gen_sbop_shamt(), gen_sbopw_shamt()
and gen_sbopw_common(),
and most of the *w version generators can be removed, too.

For the *w versions, we just need to call gen_shiftw() or gen_shiftiw()
and reuse the non-*w version of the generator.
For example:

  static bool trans_sbclrw(DisasContext *ctx, arg_sbclrw *a)
  {
  REQUIRE_EXT(ctx, RVB);
  return gen_shiftw(ctx, a, &gen_sbclr);
  }

  static bool trans_sbclriw(DisasContext *ctx, arg_sbclriw *a)
  {
  REQUIRE_EXT(ctx, RVB);
  return gen_shiftiw(ctx, a, &gen_sbclr);
  }

both of which can reuse the gen_sbclr() generator:

  static void gen_sbclr(TCGv ret, TCGv arg1, TCGv shamt)
  {
  TCGv t = tcg_temp_new();
  tcg_gen_movi_tl(t, 1);
  tcg_gen_shl_tl(t, t, shamt);
  tcg_gen_andc_tl(ret, arg1, t);
  tcg_temp_free(t);
  }

The gen_shift*() helpers I have now are as follows:

  static bool gen_shift(DisasContext *ctx, arg_r *a,
  void(*func)(TCGv, TCGv, TCGv))
  {
  TCGv source1 = tcg_temp_new();
  TCGv source2 = tcg_temp_new();

  gen_get_gpr(source1, a->rs1);
  gen_get_gpr(source2, a->rs2);

  tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);
  (*func)(source1, source1, source2);

  gen_set_gpr(a->rd, source1);
  tcg_temp_free(source1);
  tcg_temp_free(source2);
  return true;
  }

  static bool gen_shifti(DisasContext *ctx, arg_shift *a,
  void(*func)(TCGv, TCGv, TCGv))
  {
  TCGv source1 = tcg_temp_new();
  TCGv source2 = tcg_temp_new();

  gen_get_gpr(source1, a->rs1);
  tcg_gen_movi_tl(source2, a->shamt);

  tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);
  (*func)(source1, source1, source2);

  gen_set_gpr(a->rd, source1);
  tcg_temp_free(source1);
  tcg_temp_free(source2);
  return true;
  }

  static bool gen_shiftw(DisasContext *ctx, arg_r *a,
  void(*func)(TCGv, TCGv, TCGv))
  {
  TCGv source1 = tcg_temp_new();
  TCGv source2 = tcg_temp_new();

  gen_get_gpr(source1, a->rs1);
  gen_get_gpr(source2, a->rs2);

  tcg_gen_andi_tl(source2, source2, 31);
  (*func)(source1, source1, source2);
  tcg_gen_ext32s_tl(source1, source1);

  gen_set_gpr(a->rd, source1);
  tcg_temp_free(source1);
  tcg_temp_free(source2);
  return true;
  }

  static bool gen_shiftiw(DisasContext *ctx, arg_shift *a,
  void(*func)(TCGv, TCGv, TCGv))
  {
  TCGv source1 = tcg_temp_new();
  TCGv source2 = tcg_temp_new();

  gen_get_gpr(source1, a->rs1);
  tcg_gen_movi_tl(source2, a->shamt);

  tcg_gen_andi_tl(source2, source2, 31);
  (*func)(source1, source1, source2);
  tcg_gen_ext32s_tl(source1, source1);

  gen_set_gpr(a->rd, source1);
  tcg_temp_free(source1);
  tcg_temp_free(source2);
  return true;
  }
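
To make the overlap between the four helpers concrete, here is a plain C
model of what they compute. It works on ordinary integers rather than TCG
values and is not QEMU code. SKETCH_XLEN (RV64 assumed), sketch_shift and
the sketch_sb* callbacks are hypothetical names. As far as the code above
shows, the register and immediate forms differ only in where the shift
amount comes from, and the *w forms differ only in the mask (31) and the
final 32-bit sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit target, so TARGET_LONG_BITS is modelled as 64. */
#define SKETCH_XLEN 64

typedef uint64_t (*sketch_op)(uint64_t arg1, uint64_t shamt);

/* Plain-integer counterparts of the single-bit generators. */
static uint64_t sketch_sbclr(uint64_t arg1, uint64_t shamt)
{
    return arg1 & ~((uint64_t)1 << shamt);
}

static uint64_t sketch_sbset(uint64_t arg1, uint64_t shamt)
{
    return arg1 | ((uint64_t)1 << shamt);
}

/*
 * One helper for all four variants. The caller passes either the value
 * of rs2 or the immediate as shamt; is_w selects the 32-bit behaviour.
 */
static uint64_t sketch_shift(uint64_t rs1, uint64_t shamt, int is_w,
                             sketch_op op)
{
    uint64_t mask = is_w ? 31 : SKETCH_XLEN - 1;
    uint64_t result = op(rs1, shamt & mask);

    if (is_w) {
        /* Model of tcg_gen_ext32s_tl: sign-extend the low 32 bits. */
        result &= UINT64_C(0xffffffff);
        if (result & UINT64_C(0x80000000)) {
            result |= UINT64_C(0xffffffff00000000);
        }
    }
    return result;
}
```

For instance, sketch_shift(0xFF, 67, 0, sketch_sbclr) masks the shift amount
down to 3 and clears bit 3, while sketch_shift(0, 31, 1, sketch_sbset) sets
bit 31 and then sign-extends it into the upper half.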

They could be merged further, since most of them are duplicates that
differ only in:
gen_get_gpr(source2,

[PATCH 0/8] add support for cpu hot-unplug with SMI broadcast enabled

2020-12-04 Thread Igor Mammedov

Changelog:
 since RFC:
  - split one big patch into smaller chunks
  - clear bit #4 in CPU eject
  - drop bit #4 toggle semantics and let it set only to 1 from guest side
  - do not allow unplug without hotplug
  - update expected ACPI tables to let CI pass

This is the QEMU side of support for CPU hot-unplug when OVMF is used as
firmware with SMI broadcast enabled (the default). It adds a new bit to the
CPU hotplug hardware that marks a CPU as pending removal by firmware, and
passes control to the firmware so that it performs the CPU eject once it is
ready (i.e. it has forgotten the CPU and no longer uses it).

Patches 2-7 are preparatory, adding the necessary HW and ACPI bits for the
feature, and the last patch enables the feature by default starting with the
6.0 machine type.

Cornelia Huck (1):
  hw: add compat machines for 6.0

Igor Mammedov (7):
  acpi: cpuhp: introduce 'firmware performs eject' status/control bits
  x86: acpi: introduce AcpiPmInfo::smi_on_cpu_unplug
  tests/acpi: allow expected files change
  x86: acpi: let the firmware handle pending "CPU remove" events in SMM
  tests/acpi: update expected files
  x86: ich9: factor out "guest_cpu_hotplug_features"
  x86: ich9: let firmware negotiate 'CPU hot-unplug with SMI' feature

 include/hw/acpi/cpu.h |   2 ++
 include/hw/boards.h   |   3 +++
 include/hw/i386/pc.h  |   3 +++
 docs/specs/acpi_cpu_hotplug.txt   |  19 ++-
 hw/acpi/cpu.c |  24 ++--
 hw/acpi/trace-events  |   2 ++
 hw/arm/virt.c |   9 -
 hw/core/machine.c |   3 +++
 hw/i386/acpi-build.c  |   5 +
 hw/i386/pc.c  |   5 +
 hw/i386/pc_piix.c |  14 +-
 hw/i386/pc_q35.c  |  13 -
 hw/isa/lpc_ich9.c |  16 +---
 hw/ppc/spapr.c|  15 +--
 hw/s390x/s390-virtio-ccw.c|  14 +-
 tests/data/acpi/pc/DSDT   | Bin 5060 -> 5067 bytes
 tests/data/acpi/pc/DSDT.acpihmat  | Bin 6385 -> 6392 bytes
 tests/data/acpi/pc/DSDT.bridge| Bin 6919 -> 6926 bytes
 tests/data/acpi/pc/DSDT.cphp  | Bin 5524 -> 5531 bytes
 tests/data/acpi/pc/DSDT.dimmpxm   | Bin 6714 -> 6721 bytes
 tests/data/acpi/pc/DSDT.hpbridge  | Bin 5021 -> 5028 bytes
 tests/data/acpi/pc/DSDT.hpbrroot  | Bin 3079 -> 3086 bytes
 tests/data/acpi/pc/DSDT.ipmikcs   | Bin 5132 -> 5139 bytes
 tests/data/acpi/pc/DSDT.memhp | Bin 6419 -> 6426 bytes
 tests/data/acpi/pc/DSDT.numamem   | Bin 5066 -> 5073 bytes
 tests/data/acpi/pc/DSDT.roothp| Bin 5256 -> 5263 bytes
 tests/data/acpi/q35/DSDT  | Bin 7796 -> 7803 bytes
 tests/data/acpi/q35/DSDT.acpihmat | Bin 9121 -> 9128 bytes
 tests/data/acpi/q35/DSDT.bridge   | Bin 7814 -> 7821 bytes
 tests/data/acpi/q35/DSDT.cphp | Bin 8260 -> 8267 bytes
 tests/data/acpi/q35/DSDT.dimmpxm  | Bin 9450 -> 9457 bytes
 tests/data/acpi/q35/DSDT.ipmibt   | Bin 7871 -> 7878 bytes
 tests/data/acpi/q35/DSDT.memhp| Bin 9155 -> 9162 bytes
 tests/data/acpi/q35/DSDT.mmio64   | Bin 8927 -> 8934 bytes
 tests/data/acpi/q35/DSDT.numamem  | Bin 7802 -> 7809 bytes
 tests/data/acpi/q35/DSDT.tis  | Bin 8402 -> 8409 bytes
 36 files changed, 131 insertions(+), 16 deletions(-)

-- 
2.27.0

[PATCH 2/8] acpi: cpuhp: introduce 'firmware performs eject' status/control bits

2020-12-04 Thread Igor Mammedov

Add bit #4 to the status/control field of the CPU hotplug MMIO interface.
The new bit will be used by OSPM to mark CPUs as pending removal by firmware
when it calls the _EJ0 method on the CPU device node. Later on, when firmware
sees this bit set, it performs the CPU eject, which clears bit #4
as well.
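
As a reading aid, a simplified standalone model of the flags byte described
above. It covers only bits #3 and #4 on the write side; the
SketchCpuStatus type and the sketch_* functions are hypothetical, and the
guard on the eject path is an assumption made to mirror the guard the patch
adds for bit #4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for AcpiCpuStatus. */
typedef struct SketchCpuStatus {
    bool present;       /* models cdev->cpu != NULL */
    bool is_boot_cpu;   /* models cdev->cpu == first_cpu */
    bool is_inserting;
    bool is_removing;
    bool fw_remove;     /* new: pending removal by firmware */
} SketchCpuStatus;

enum {
    SKETCH_F_ENABLED  = 1 << 0,
    SKETCH_F_INSERT   = 1 << 1,
    SKETCH_F_REMOVE   = 1 << 2,
    SKETCH_F_EJECT    = 1 << 3,   /* write: perform device eject */
    SKETCH_F_FW_EJECT = 1 << 4,   /* new: hand the eject over to firmware */
};

/* Read side: bit #4 reports fw_remove, as in the cpu_hotplug_rd hunk. */
static uint8_t sketch_read_flags(const SketchCpuStatus *s)
{
    uint8_t val = 0;

    val |= s->present      ? SKETCH_F_ENABLED  : 0;
    val |= s->is_inserting ? SKETCH_F_INSERT   : 0;
    val |= s->is_removing  ? SKETCH_F_REMOVE   : 0;
    val |= s->fw_remove    ? SKETCH_F_FW_EJECT : 0;
    return val;
}

/* Write side, bits #3 and #4 only. */
static void sketch_write_flags(SketchCpuStatus *s, uint8_t data)
{
    if (data & SKETCH_F_EJECT) {
        if (!s->present || s->is_boot_cpu) {
            return;                 /* assumed guard, see lead-in */
        }
        s->present = false;         /* the CPU is gone ... */
        s->fw_remove = false;       /* ... and bit #4 is cleared with it */
    } else if (data & SKETCH_F_FW_EJECT) {
        if (!s->present || s->is_boot_cpu) {
            return;                 /* boot CPU cannot be removed */
        }
        s->fw_remove = true;        /* can only be set, never toggled */
    }
}
```

In this model OSPM writes bit #4, firmware later reads the flags, sees 0x11
for a present CPU pending removal, and writes bit #3 to complete the eject.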

Signed-off-by: Igor Mammedov 
---
v1:
  - rearrange status/control bits description (Laszlo)
  - add clear bit #4 on eject
  - drop toggling logic from bit #4; it can only be set by the guest
    and is cleared as part of CPU eject
  - exclude boot CPU from remove request
  - add trace events for new bit
---
 include/hw/acpi/cpu.h   |  1 +
 docs/specs/acpi_cpu_hotplug.txt | 19 ++-
 hw/acpi/cpu.c   |  9 +
 hw/acpi/trace-events|  2 ++
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index 0eeedaa491..d71edde456 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
 uint64_t arch_id;
 bool is_inserting;
 bool is_removing;
+bool fw_remove;
 uint32_t ost_event;
 uint32_t ost_status;
 } AcpiCpuStatus;
diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index 9bb22d1270..9bd59ae0da 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -56,8 +56,11 @@ read access:
   no device check event to OSPM was issued.
   It's valid only when bit 0 is set.
2: Device remove event, used to distinguish device for which
-  no device eject request to OSPM was issued.
-   3-7: reserved and should be ignored by OSPM
+  no device eject request to OSPM was issued. Firmware must
+  ignore this bit.
+   3: reserved and should be ignored by OSPM
+   4: if set to 1, OSPM requests firmware to perform device eject.
+   5-7: reserved and should be ignored by OSPM
 [0x5-0x7] reserved
 [0x8] Command data: (DWORD access)
   contains 0 unless value last stored in 'Command field' is one of:
@@ -79,10 +82,16 @@ write access:
selected CPU device
 2: if set to 1 clears device remove event, set by OSPM
after it has emitted device eject request for the
-   selected CPU device
+   selected CPU device.
 3: if set to 1 initiates device eject, set by OSPM when it
-   triggers CPU device removal and calls _EJ0 method
-4-7: reserved, OSPM must clear them before writing to register
+   triggers CPU device removal and calls _EJ0 method or by firmware
+   when bit #4 is set. In case bit #4 were set, it's cleared as
+   part of device eject.
+4: if set to 1, OSPM hands over device eject to firmware.
+   Firmware shall issue device eject request as described above
+   (bit #3) and OSPM should not touch device eject bit (#3) in case
+   it's asked firmware to perform CPU device eject.
+5-7: reserved, OSPM must clear them before writing to register
 [0x5] Command field: (1 byte access)
   value:
 0: selects a CPU device with inserting/removing events and
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index f099b50927..811218f673 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, 
unsigned size)
 val |= cdev->cpu ? 1 : 0;
 val |= cdev->is_inserting ? 2 : 0;
 val |= cdev->is_removing  ? 4 : 0;
+val |= cdev->fw_remove  ? 16 : 0;
 trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
 break;
 case ACPI_CPU_CMD_DATA_OFFSET_RW:
@@ -148,6 +149,14 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, 
uint64_t data,
 hotplug_ctrl = qdev_get_hotplug_handler(dev);
 hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
 object_unparent(OBJECT(dev));
+cdev->fw_remove = false;
+} else if (data & 16) {
+if (!cdev->cpu || cdev->cpu == first_cpu) {
+trace_cpuhp_acpi_fw_remove_invalid_cpu(cpu_st->selector);
+break;
+}
+trace_cpuhp_acpi_fw_remove_cpu(cpu_st->selector);
+cdev->fw_remove = true;
 }
 break;
 case ACPI_CPU_CMD_OFFSET_WR:
diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
index afbc77de1c..f91ced477d 100644
--- a/hw/acpi/trace-events
+++ b/hw/acpi/trace-events
@@ -29,6 +29,8 @@ cpuhp_acpi_clear_inserting_evt(uint32_t idx) 
"idx[0x%"PRIx32"]"
 cpuhp_acpi_clear_remove_evt(uint32_t idx) "idx[0x%"PRIx32"]"
 cpuhp_acpi_ejecting_invalid_cpu(uint32_t idx) "0x%"PRIx32
 cpuhp_acpi_ejecting_cpu(uint32_t idx) "0x%"PRIx32
+cpuhp_acpi_fw_remove_invalid_cpu(uint32_t idx) "0x%"PRIx32
+cpuhp_acpi_fw_remove_cpu(uint32_t idx) "0x%"PRIx32

[PATCH 3/8] x86: acpi: introduce AcpiPmInfo::smi_on_cpu_unplug

2020-12-04 Thread Igor Mammedov

Signed-off-by: Igor Mammedov 
---
 hw/i386/acpi-build.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1f5c211245..9036e5594c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
 bool s4_disabled;
 bool pcihp_bridge_en;
 bool smi_on_cpuhp;
+bool smi_on_cpu_unplug;
 bool pcihp_root_en;
 uint8_t s4_val;
 AcpiFadtData fadt;
@@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, 
AcpiPmInfo *pm)
 pm->pcihp_io_base = 0;
 pm->pcihp_io_len = 0;
 pm->smi_on_cpuhp = false;
+pm->smi_on_cpu_unplug = false;
 
 assert(obj);
     init_common_fadt_data(machine, obj, &pm->fadt);
@@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, 
AcpiPmInfo *pm)
 pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
 pm->smi_on_cpuhp =
 !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
+pm->smi_on_cpu_unplug =
+!!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
 }
 
 /* The above need not be conditional on machine type because the reset port
-- 
2.27.0

Re: [PATCH v4 00/11] hvf: Implement Apple Silicon Support

2020-12-04 Thread Roman Bolshakov

On Fri, Dec 04, 2020 at 12:48:46AM +0100, Alexander Graf wrote:
> Now that Apple Silicon is widely available, people are obviously excited
> to try and run virtualized workloads on them, such as Linux and Windows.
> 
> This patch set implements a fully functional version to get the ball
> going on that. With this applied, I can successfully run both Linux and
> Windows as guests. I am not aware of any limitations specific to
> Hypervisor.framework apart from:
> 
>   - Live migration / savevm
>   - gdbstub debugging (SP register)
> 
> 

Perhaps we need more eyes from ARM developers to review it.
Otherwise, it's a good Christmas present for QEMU users.

Thanks,
Roman

[PATCH 5/8] x86: acpi: let the firmware handle pending "CPU remove" events in SMM

2020-12-04 Thread Igor Mammedov

If firmware and QEMU have negotiated CPU hot-unplug support, generate an
_EJ0 method that marks the CPU for removal by firmware and passes control
to the firmware by triggering an SMI.

Signed-off-by: Igor Mammedov 
---
 include/hw/acpi/cpu.h |  1 +
 hw/acpi/cpu.c | 15 +--
 hw/i386/acpi-build.c  |  1 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index d71edde456..999caaf510 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -51,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
 typedef struct CPUHotplugFeatures {
 bool acpi_1_compatible;
 bool has_legacy_cphp;
+bool fw_unplugs_cpu;
 const char *smi_path;
 } CPUHotplugFeatures;
 
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 811218f673..bded2a837f 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -341,6 +341,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
 #define CPU_INSERT_EVENT  "CINS"
 #define CPU_REMOVE_EVENT  "CRMV"
 #define CPU_EJECT_EVENT   "CEJ0"
+#define CPU_FW_EJECT_EVENT "CEJF"
 
 void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 hwaddr io_base,
@@ -393,7 +394,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, 
CPUHotplugFeatures opts,
 aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
 /* initiates device eject, write only */
 aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
-aml_append(field, aml_reserved_field(4));
+aml_append(field, aml_reserved_field(1));
+/* tell firmware to do device eject, write only */
+aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
+aml_append(field, aml_reserved_field(2));
 aml_append(field, aml_named_field(CPU_COMMAND, 8));
 aml_append(cpu_ctrl_dev, field);
 
@@ -428,6 +432,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, 
CPUHotplugFeatures opts,
 Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
 Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
 Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
+Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
 
 aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
 aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
@@ -470,7 +475,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, 
CPUHotplugFeatures opts,
 
     aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
 aml_append(method, aml_store(idx, cpu_selector));
-aml_append(method, aml_store(one, ej_evt));
+if (opts.fw_unplugs_cpu) {
+aml_append(method, aml_store(one, fw_ej_evt));
+aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
+   aml_name("%s", opts.smi_path)));
+} else {
+aml_append(method, aml_store(one, ej_evt));
+}
 aml_append(method, aml_release(ctrl_lock));
 }
 aml_append(cpus_dev, method);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9036e5594c..475e76f514 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1586,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 CPUHotplugFeatures opts = {
 .acpi_1_compatible = true, .has_legacy_cphp = true,
 .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
+.fw_unplugs_cpu = pm->smi_on_cpu_unplug,
 };
 build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
"\\_SB.PCI0", "\\_GPE._E02");
-- 
2.27.0

[PATCH 8/8] x86: ich9: let firmware negotiate 'CPU hot-unplug with SMI' feature

2020-12-04 Thread Igor Mammedov

Keep CPU hotunplug with SMI disabled on 5.2 and older and enable
it by default on newer machine types.
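
A small standalone sketch of the resulting negotiation rule, assuming bit
positions 1 and 2 for the hotplug and hot-unplug features and assuming the
callback also rejects features the host does not offer. All SKETCH_* and
sketch_* names are hypothetical, and only the 5.2 and 6.0 cases are
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for the two SMI features. */
#define SKETCH_F_CPU_HOTPLUG_BIT    1
#define SKETCH_F_CPU_HOT_UNPLUG_BIT 2
#define SKETCH_BIT(n) (UINT64_C(1) << (n))

/* Host feature mask as seen by a 6.0 machine or by a 5.2 machine. */
static uint64_t sketch_host_features(bool is_5_2_machine)
{
    /* Both properties default to "on" after this patch. */
    uint64_t f = SKETCH_BIT(SKETCH_F_CPU_HOTPLUG_BIT) |
                 SKETCH_BIT(SKETCH_F_CPU_HOT_UNPLUG_BIT);

    if (is_5_2_machine) {
        /* Effect of the compat entry x-smi-cpu-hotunplug=off. */
        f &= ~SKETCH_BIT(SKETCH_F_CPU_HOT_UNPLUG_BIT);
    }
    return f;
}

/* Returns true if the guest's requested feature set is accepted. */
static bool sketch_features_ok(uint64_t host_features,
                               uint64_t guest_features)
{
    uint64_t cpuhp = guest_features &
                     (SKETCH_BIT(SKETCH_F_CPU_HOTPLUG_BIT) |
                      SKETCH_BIT(SKETCH_F_CPU_HOT_UNPLUG_BIT));

    if (guest_features & ~host_features) {
        return false;   /* assumed: guest asked for an unoffered feature */
    }
    if (cpuhp == SKETCH_BIT(SKETCH_F_CPU_HOT_UNPLUG_BIT)) {
        return false;   /* hot-unplug without hotplug is unsupported */
    }
    return true;
}
```

Under these assumptions a guest on a 6.0 machine can negotiate hotplug
alone or hotplug together with hot-unplug, but never hot-unplug alone, and
a 5.2 machine never offers hot-unplug.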

Signed-off-by: Igor Mammedov 
---
v1:
  - ensure that unplug can't be enabled without hotplug (Laszlo)
---
 hw/i386/pc.c  | 4 +++-
 hw/isa/lpc_ich9.c | 8 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 781523684c..6476d8d853 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -97,7 +97,9 @@
 #include "trace.h"
 #include CONFIG_DEVICES
 
-GlobalProperty pc_compat_5_2[] = {};
+GlobalProperty pc_compat_5_2[] = {
+{ "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
+};
 const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2);
 
 GlobalProperty pc_compat_5_1[] = {
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index da80430144..d3145bf014 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -392,6 +392,12 @@ static void smi_features_ok_callback(void *opaque)
 return;
 }
 
+if (guest_cpu_hotplug_features ==
+BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
+/* cpu hot-unplug is unsupported without cpu-hotplug */
+return;
+}
+
 /* valid feature subset requested, lock it down, report success */
 lpc->smi_negotiated_features = guest_features;
 lpc->smi_features_ok = 1;
@@ -774,7 +780,7 @@ static Property ich9_lpc_properties[] = {
 DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
   ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
 DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
-  ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
+  ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.27.0

Re: [PATCH v4 10/11] hvf: arm: Add support for GICv3

2020-12-04 Thread Roman Bolshakov

On Fri, Dec 04, 2020 at 12:48:56AM +0100, Alexander Graf wrote:
> We currently only support GICv2 emulation. To also support GICv3, we will
> need to pass a few system registers into their respective handler functions.
> 
> This patch adds handling for all of the required system registers, so that
> we can run with more than 8 vCPUs.
> 

Acked-by: Roman Bolshakov 

Thanks,
Roman

Re: [PATCH 9/9] target/mips: Explode gen_msa_branch() as gen_msa_BxZ_V/BxZ()

2020-12-04 Thread Richard Henderson

On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
> +static bool gen_msa_BxZ(DisasContext *ctx, int df, int wt, int s16, bool if_not)
> +{
> +check_msa_access(ctx);
> +
> +if (ctx->hflags & MIPS_HFLAG_BMASK) {
> +generate_exception_end(ctx, EXCP_RI);
> +return true;
> +}
> +
> +gen_check_zero_element(bcond, df, wt);
> +if (if_not) {
> +tcg_gen_setcondi_tl(TCG_COND_EQ, bcond, bcond, 0);
> +}

Since gen_check_zero_element already produces a boolean, this is better as

  tcg_gen_xori_tl(bcond, bcond, if_not);

where tcg_gen_xori_tl already contains the if.
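As a plain-C illustration of that equivalence (this is not TCG code, and both helper names are made up for the sketch): when the input is already 0 or 1, comparing it with zero and XOR-ing it with 1 give the same result, and XOR with 0 leaves it unchanged, so a single xor with if_not can stand in for the conditional setcond.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's "if (if_not) setcond EQ 0" sequence. */
static uint32_t invert_with_setcond(uint32_t bcond, bool if_not)
{
    if (if_not) {
        bcond = (bcond == 0);
    }
    return bcond;
}

/* Hypothetical stand-in for the suggested unconditional xor. */
static uint32_t invert_with_xor(uint32_t bcond, bool if_not)
{
    /* if_not == false xors with 0 (no change); true xors with 1 (invert). */
    return bcond ^ (uint32_t)if_not;
}
```

The two only agree for inputs restricted to 0 and 1, which is the property the review comment relies on for the output of gen_check_zero_element.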

>  case OPC_BNZ_D:
> -gen_check_zero_element(bcond, df, wt);
> -tcg_gen_setcondi_tl(TCG_COND_EQ, bcond, bcond, 0);
> +gen_msa_BxZ(ctx, df, wt, s16, true);

... oops, that'd be for a follow-up patch, to make this patch just code 
movement.

Reviewed-by: Richard Henderson 

r~

Re: [PATCH v4 09/11] arm/hvf: Add a WFI handler

2020-12-04 Thread Roman Bolshakov

On Fri, Dec 04, 2020 at 12:48:55AM +0100, Alexander Graf wrote:
> From: Peter Collingbourne 
> 
> Sleep on WFI until the VTIMER is due but allow ourselves to be woken
> up on IPI.
> 
> In this implementation IPI is blocked on the CPU thread at startup and
> pselect() is used to atomically unblock the signal and begin sleeping.
> The signal is sent unconditionally so there's no need to worry about
> races between actually sleeping and the "we think we're sleeping"
> state. It may lead to an extra wakeup but that's better than missing
> it entirely.
> 

Acked-by: Roman Bolshakov 

Thanks,
Roman

Re: [for-6.0 v5 12/13] securable guest memory: Alter virtio default properties for protected guests

2020-12-04 Thread Cornelia Huck

On Fri,  4 Dec 2020 16:44:14 +1100
David Gibson  wrote:

> The default behaviour for virtio devices is not to use the platforms normal
> DMA paths, but instead to use the fact that it's running in a hypervisor
> to directly access guest memory.  That doesn't work if the guest's memory
> is protected from hypervisor access, such as with AMD's SEV or POWER's PEF.
> 
> So, if a securable guest memory mechanism is enabled, then apply the
> iommu_platform=on option so it will go through normal DMA mechanisms.
> Those will presumably have some way of marking memory as shared with
> the hypervisor or hardware so that DMA will work.
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Dr. David Alan Gilbert 
> ---
>  hw/core/machine.c | 13 +
>  1 file changed, 13 insertions(+)

Reviewed-by: Cornelia Huck

[PATCH] block/nvme: Do not allow image creation with NVMe block driver

2020-12-04 Thread Philippe Mathieu-Daudé

The NVMe driver does not support image creation.
The full drive has to be passed to the guest.

Before:

  $ qemu-img create -f raw nvme://:04:00.0/1 20G
  Formatting 'nvme://:04:00.0/1', fmt=raw size=21474836480

  $ qemu-img info nvme://:04:00.0/1
  image: nvme://:04:00.0/1
  file format: raw
  virtual size: 349 GiB (375083606016 bytes)
  disk size: unavailable

After:

  $ qemu-img create -f raw nvme://:04:00.0/1 20G
  qemu-img: nvme://:04:00.0/1: Protocol driver 'nvme' does not support image creation

Fixes: 5a5e7f8cd86 ("block: trickle down the fallback image creation function use to the block drivers")
Reported-by: Xueqiang Wei 
Suggested-by: Max Reitz 
Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Maxim Levitsky 
---
 block/nvme.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index a06a188d530..73ddf837c2b 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1515,9 +1515,6 @@ static BlockDriver bdrv_nvme = {
 .protocol_name= "nvme",
 .instance_size= sizeof(BDRVNVMeState),
 
-.bdrv_co_create_opts  = bdrv_co_create_opts_simple,
-.create_opts  = &bdrv_create_opts_simple,
-
 .bdrv_parse_filename  = nvme_parse_filename,
 .bdrv_file_open   = nvme_file_open,
 .bdrv_close   = nvme_close,
-- 
2.26.2

Re: [PATCH 1/2] virtio-blk: Acquire context while switching them on dataplane start

2020-12-04 Thread Eric Blake

On 12/4/20 10:53 AM, Sergio Lopez wrote:
> On dataplane start, acquire the new AIO context before calling
> 'blk_set_aio_context', releasing it immediately afterwards. This
> prevents reaching the AIO context attach/detach notifier functions
> without having acquired it first.
> 
> It was also the only place where 'blk_set_aio_context' was called with
> an unprotected AIO context.
> 
> Signed-off-by: Sergio Lopez 
> ---
>  hw/block/dataplane/virtio-blk.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Eric Blake 

I'll queue through my NBD tree, but will wait a couple days to see if
other block developers want to add review comments.

> 
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 37499c5564..034e43cb1f 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -214,7 +214,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
>  vblk->dataplane_started = true;
>  trace_virtio_blk_data_plane_start(s);
>  
> +aio_context_acquire(s->ctx);
>  r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
> +aio_context_release(s->ctx);
>  if (r < 0) {
>  error_report_err(local_err);
>  goto fail_guest_notifiers;
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH v1 1/1] intc/ibex_plic: Clear interrupts that occur during claim process

2020-12-04 Thread Alistair Francis

Previously, if an interrupt occurred during the claim process (after the
interrupt was claimed but before it was completed), it would never be
cleared.
This patch ensures that we also clear the hidden_pending bits.

Signed-off-by: Alistair Francis 
---
 hw/intc/ibex_plic.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/intc/ibex_plic.c b/hw/intc/ibex_plic.c
index 341c9db405..c1b72fcab0 100644
--- a/hw/intc/ibex_plic.c
+++ b/hw/intc/ibex_plic.c
@@ -43,16 +43,23 @@ static void ibex_plic_irqs_set_pending(IbexPlicState *s, int irq, bool level)
 {
 int pending_num = irq / 32;
 
+if (!level) {
+/*
+ * If the level is low make sure we clear the hidden_pending.
+ */
+s->hidden_pending[pending_num] &= ~(1 << (irq % 32));
+}
+
 if (s->claimed[pending_num] & 1 << (irq % 32)) {
 /*
  * The interrupt has been claimed, but not completed.
  * The pending bit can't be set.
+ * Save the pending level for after the interrupt is completed.
  */
 s->hidden_pending[pending_num] |= level << (irq % 32);
-return;
+} else {
+s->pending[pending_num] |= level << (irq % 32);
 }
-
-s->pending[pending_num] |= level << (irq % 32);
 }
 
 static bool ibex_plic_irqs_pending(IbexPlicState *s, uint32_t context)
-- 
2.29.2
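
The behaviour described in the commit message can be sketched as a small standalone model. This is only an illustration under simplifying assumptions: PlicModel and plic_model_set_pending are hypothetical names, and the three register arrays are collapsed into single 32-bit words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-word model of the state touched by the patch. */
typedef struct {
    uint32_t pending;
    uint32_t hidden_pending;
    uint32_t claimed;
} PlicModel;

static void plic_model_set_pending(PlicModel *s, int irq, bool level)
{
    uint32_t mask = UINT32_C(1) << (irq % 32);

    if (!level) {
        /* A low level drops any pending state saved during a claim. */
        s->hidden_pending &= ~mask;
    }

    if (s->claimed & mask) {
        /* Claimed but not completed: remember the level for later. */
        if (level) {
            s->hidden_pending |= mask;
        }
    } else if (level) {
        s->pending |= mask;
    }
}
```

With this model, raising and then lowering a claimed interrupt leaves hidden_pending clear, which is the case the patch is meant to fix.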

Re: [PATCH 8/9] target/mips: Remove CPUMIPSState* argument from gen_msa*() methods

2020-12-04 Thread Richard Henderson

On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
> The gen_msa*() methods don't use the "CPUMIPSState *env"
> argument. Remove it to simplify.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/mips/translate.c | 57 -
>  1 file changed, 28 insertions(+), 29 deletions(-)

Reviewed-by: Richard Henderson 

r~

x86 TCG helpers clobbered registers

2020-12-04 Thread Stephane Duverger

Hello,

While looking at tcg/i386/tcg-target.c.inc:tcg_out_qemu_st(), I
discovered that TCG generates a call to a store helper at the end
of the TB, which is executed on a TLB miss and then jumps back to the
remaining translated ops. I tried to mimic this behavior around the
fast path (right between tcg_out_tlb_load() and
tcg_out_qemu_st_direct()) to filter memory store accesses.

I know there are now TCG plugins for that purpose at the TCG IR level,
from which every tcg-target can benefit. FWIW, my design choice was
driven more by the fact that I always work on an x86 host and plugins
did not exist at the time. Anyway, my question is about the difference
between generating a call to a helper at the TCG IR level (classic
scenario) and generating it later, during tcg-target code generation
(slow path for instance).

When calling a helper, TCG knows that some registers are
call-clobbered and must therefore free them. This is what I observed in
tcg_reg_alloc_call():

    /* clobber call registers */
    for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
        if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
            tcg_reg_free(s, i, allocated_regs);
        }
    }

But in our case (ie. INDEX_op_qemu_st_i32), the TCG code path comes
from:

tcg_reg_alloc_op()
  tcg_out_op()
    tcg_out_qemu_st()

Then tcg_out_tlb_load() will inject a 'jmp' to the slow path, whose
generated code does not seem to take care of all the call-clobbered
registers, if we look at tcg_out_qemu_st_slow_path().

First, for an i386 (32-bit) tcg-target, the helper arguments are, as
expected, placed on the stack. I noticed that 'esp' is not shifted
down before the args are stored, which might corrupt the last
stacked words.

Second, for both 32- and 64-bit tcg-targets, since the call-clobbered
registers are not all preserved, these registers may be clobbered
depending on the code executed by the helper (and so generated by
GCC), e.g. R10 on x86-64.

While this never happened for the slow path helper call, I observed
that my guest had trouble running when I filtered memory accesses in
the same fashion as the slow path helper is called. Conversely, if I
push/pop all of the call-clobbered regs around the call to the helper,
everything runs as expected.

Is this correct? Am I missing something?

Thanks a lot in advance for your eagle eye on this :)

Re: [PATCH 7/9] target/mips: Extract msa_translate_init() from mips_tcg_init()

2020-12-04 Thread Richard Henderson

On 12/2/20 12:44 PM, Philippe Mathieu-Daudé wrote:
> Extract the logic initialization of the MSA registers from
> the generic initialization.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/mips/translate.c | 35 ---
>  1 file changed, 20 insertions(+), 15 deletions(-)

Why?

> -fpu_f64[i] = tcg_global_mem_new_i64(cpu_env, off, msaregnames[i * 2]);
> +fpu_f64[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);

Maybe fold this back to the previous patch?


r~

[PATCH 2/2] nbd/server: Quiesce coroutines on context switch

2020-12-04 Thread Sergio Lopez

When switching between AIO contexts we need to make sure that both
recv_coroutine and send_coroutine are not scheduled to run. Otherwise,
QEMU may crash while attaching the new context with an error like
this one:

aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'

To achieve this we need a local implementation of
'qio_channel_readv_all_eof' named 'nbd_read_eof' (a trick already done
by 'nbd/client.c') that allows us to interrupt the operation and to
know when recv_coroutine is yielding.

With this in place, we delegate detaching the AIO context to the
owning context with a BH ('nbd_aio_detach_bh') scheduled using
'aio_wait_bh_oneshot'. This BH signals that we need to quiesce the
channel by setting 'client->quiescing' to 'true', and either waits for
the coroutine to finish using AIO_WAIT_WHILE or, if it's yielding in
'nbd_read_eof', actively enters the coroutine to interrupt it.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900326
Signed-off-by: Sergio Lopez 
---
 nbd/server.c | 120 +--
 1 file changed, 106 insertions(+), 14 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 613ed2634a..7229f487d2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -132,6 +132,9 @@ struct NBDClient {
 CoMutex send_lock;
 Coroutine *send_coroutine;
 
+bool read_yielding;
+bool quiescing;
+
 QTAILQ_ENTRY(NBDClient) next;
 int nb_requests;
 bool closing;
@@ -1352,14 +1355,60 @@ static coroutine_fn int nbd_negotiate(NBDClient *client, Error **errp)
 return 0;
 }
 
-static int nbd_receive_request(QIOChannel *ioc, NBDRequest *request,
+/* nbd_read_eof
+ * Tries to read @size bytes from @ioc. This is a local implementation of
+ * qio_channel_readv_all_eof. We have it here because we need it to be
+ * interruptible and to know when the coroutine is yielding.
+ * Returns 1 on success
+ * 0 on eof, when no data was read (errp is not set)
+ * negative errno on failure (errp is set)
+ */
+static inline int coroutine_fn
+nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
+{
+bool partial = false;
+
+assert(size);
+while (size > 0) {
+struct iovec iov = { .iov_base = buffer, .iov_len = size };
+ssize_t len;
+
+len = qio_channel_readv(client->ioc, &iov, 1, errp);
+if (len == QIO_CHANNEL_ERR_BLOCK) {
+client->read_yielding = true;
+qio_channel_yield(client->ioc, G_IO_IN);
+client->read_yielding = false;
+if (client->quiescing) {
+return -EAGAIN;
+}
+continue;
+} else if (len < 0) {
+return -EIO;
+} else if (len == 0) {
+if (partial) {
+error_setg(errp,
+   "Unexpected end-of-file before all bytes were read");
+return -EIO;
+} else {
+return 0;
+}
+}
+
+partial = true;
+size -= len;
+buffer = (uint8_t *) buffer + len;
+}
+return 1;
+}
+
+static int nbd_receive_request(NBDClient *client, NBDRequest *request,
Error **errp)
 {
 uint8_t buf[NBD_REQUEST_SIZE];
 uint32_t magic;
 int ret;
 
-ret = nbd_read(ioc, buf, sizeof(buf), "request", errp);
+ret = nbd_read_eof(client, buf, sizeof(buf), errp);
 if (ret < 0) {
 return ret;
 }
@@ -1480,11 +1529,37 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
 
 QTAILQ_FOREACH(client, &exp->clients, next) {
 qio_channel_attach_aio_context(client->ioc, ctx);
+
+assert(client->recv_coroutine == NULL);
+assert(client->send_coroutine == NULL);
+
+if (client->quiescing) {
+client->quiescing = false;
+nbd_client_receive_next_request(client);
+}
+}
+}
+
+static void nbd_aio_detach_bh(void *opaque)
+{
+NBDExport *exp = opaque;
+NBDClient *client;
+
+QTAILQ_FOREACH(client, &exp->clients, next) {
+qio_channel_detach_aio_context(client->ioc);
+client->quiescing = true;
+
 if (client->recv_coroutine) {
-aio_co_schedule(ctx, client->recv_coroutine);
+if (client->read_yielding) {
+qemu_aio_coroutine_enter(exp->common.ctx,
+ client->recv_coroutine);
+} else {
+AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
+}
 }
+
 if (client->send_coroutine) {
-aio_co_schedule(ctx, client->send_coroutine);
+AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
 }
 }
 }
@@ -1492,13 +1567,10 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
 static void blk_aio_detach(void *opaque)
 {
 NBDExport *exp = opaque;
-NBDClient *client;
 
 trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
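
The return convention documented for nbd_read_eof (1 on success, 0 on a clean end-of-file, negative errno on failure) can be tried outside QEMU with a blocking file descriptor. The sketch below is only an approximation under stated assumptions: read_all_eof is a hypothetical name, it uses POSIX read() instead of QIOChannel, and it leaves out the coroutine yield and the quiescing check that are the actual point of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly @size bytes from a blocking @fd.
 * Returns 1 on success, 0 on EOF before any byte was read,
 * and -EIO on a read error or on EOF in the middle of the buffer.
 */
static int read_all_eof(int fd, void *buffer, size_t size)
{
    int partial = 0;

    while (size > 0) {
        ssize_t len = read(fd, buffer, size);

        if (len < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal, retry */
            }
            return -EIO;
        }
        if (len == 0) {
            return partial ? -EIO : 0;
        }

        partial = 1;
        size -= (size_t)len;
        buffer = (uint8_t *)buffer + len;
    }
    return 1;
}
```

Distinguishing a clean EOF (0) from a truncated message (-EIO) lets a caller treat a peer that disconnects between requests differently from one that disconnects in the middle of a request.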

[PATCH 1/2] virtio-blk: Acquire context while switching them on dataplane start

2020-12-04 Thread Sergio Lopez

On dataplane start, acquire the new AIO context before calling
'blk_set_aio_context', releasing it immediately afterwards. This
prevents reaching the AIO context attach/detach notifier functions
without having acquired it first.

It was also the only place where 'blk_set_aio_context' was called with
an unprotected AIO context.

Signed-off-by: Sergio Lopez 
---
 hw/block/dataplane/virtio-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 37499c5564..034e43cb1f 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -214,7 +214,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 vblk->dataplane_started = true;
 trace_virtio_blk_data_plane_start(s);
 
+aio_context_acquire(s->ctx);
 r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
+aio_context_release(s->ctx);
 if (r < 0) {
 error_report_err(local_err);
 goto fail_guest_notifiers;
-- 
2.26.2

Re: [RFC v7 00/22] i386 cleanup [hw/core/cpu.c common]

2020-12-04 Thread Paolo Bonzini

On Fri, 4 Dec 2020 at 14:54, Claudio Fontana wrote:

> On 11/30/20 3:35 AM, Claudio Fontana wrote:
> > Hi all, this is v7 of the i386 cleanup,
>
> This is fairly broken still and I am fixing it up,
>
> but a question arises while hunting bugs here.
>
> Silent bugs are introduced when trying to use code like
>
> #ifndef CONFIG_USER_ONLY
>
> in files that are built in "common" objects, since they are target
> independent.
>

That should be avoided by poison.h
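
For readers unfamiliar with the mechanism, here is a minimal sketch of the idea, assuming a compiler that supports the GCC poison pragma. HYPOTHETICAL_USER_ONLY and built_for_user_mode are made-up names for the illustration, not QEMU identifiers.

```c
/*
 * In a file compiled once for all targets the per-target macro is never
 * defined, so the #else branch is silently taken. This is the kind of
 * silent bug described above.
 */
static int built_for_user_mode(void)
{
#ifdef HYPOTHETICAL_USER_ONLY
    return 1;
#else
    return 0;
#endif
}

/*
 * After this directive, any further mention of the identifier, including
 * in #ifdef or #ifndef, is rejected at compile time instead of being
 * silently evaluated as "not defined".
 */
#pragma GCC poison HYPOTHETICAL_USER_ONLY
```

Poisoning only affects code that appears after the directive, so the header carrying it has to be included before the common code it is meant to protect.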

> I wonder also about the rationale why the cpu code is split between
>
> hw/core/cpu.c and $(top_srcdir)/cpu.c
>
> with one part in common and one part in "target specific".
>

Mostly historical, cpu.c used to have much more than CPU code (it was
exec.c until a month ago, one of the "historical" core files in QEMU and it
had all the dispatch side of the memory API). I wouldn't mind merging these
two files into one.

Paolo


> What do we gain by having part of the cpu in common?
>
> In some cases we end up going through all sort of hoops because we cannot
> just code everything in hw/core/cpu.c due to the fact
> that we do not see CONFIG_ there.
>
>
> > with the most interesting patches at the end.
> >
> > v6 -> v7: integrate TCGCpuOperations, refactored cpu_exec_realizefn
> >
> > * integrate TCGCpuOperations (Eduardo)
> >
> > Taken some refactoring from Eduardo for Tcg-only operations on
> > CPUClass.
> >
> > * refactored cpu_exec_realizefn
> >
> > The other main change is a refactoring of cpu_exec_realizefn,
> > directly linked to the effort of making many cpu_exec operations
> > TCG-only (Eduardo series above):
> >
> > cpu_exec_realizefn is actually a TCG-only thing, with the
> > exception of a couple things that can be done in base cpu code.
> >
> > This changes all targets realizefn, so I guess I have to Cc:
> > the Multiverse? (Universe was already CCed for all accelerators).
> >
> >
> > v5 -> v6: remove MODULE_INIT_ACCEL_CPU
> >
> >
> > instead, use a call to accel_init_interfaces().
> >
> > * The class lookups are now general and performed in accel/
> >
> >   new AccelCPUClass for new archs are supported as new
> >   ones appear in the class hierarchy, no need for stubs.
> >
> > * Split the code a bit better
> >
> >
> > v4 -> v5: centralized and simplified initializations
> >
> > I put in Cc: Emilio G. Cota, specifically because in patch 8
> > I (re)moved for user-mode the call to tcg_regions_init().
> >
> > The call happens now inside the tcg AccelClass machine_init,
> > (so earlier). This seems to work fine, but thought to get the
> > author opinion on this.
> >
> > Rebased on "tcg-cpus: split into 3 tcg variants" series
> > (queued by Richard), to avoid some code churn:
> >
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg04356.html
> >
> >
> > * Extended AccelClass to user-mode.
> >
> > user-mode now does not call tcg_exec_init directly,
> > instead it uses the tcg accel class, and its init_machine method.
> >
> > Since user-mode does not define or use a machine state,
> > the machine is just passed as NULL.
> >
> > The immediate advantage is that now we can call current_accel()
> > from both user mode and softmmu, so we can work out the correct
> > class to use for accelerator initializations.
> >
> > * QOMification of CpusAccelOps
> >
> > simple QOMification of CpusAccelOps abstract class.
> >
> > * Centralized all accel_cpu_init, so only one per cpu-arch,
> >   plus one for all accels will remain.
> >
> >   So we can expect accel_cpu_init() to be limited to:
> >
> >   softmmu/cpus.c - initializes the chosen softmmu accel ops for the cpus module.
> >   target/ARCH/cpu.c - initializes the chosen arch-specific cpu accelerator.
> >
> > These changes are meant to address concerns/issues (Paolo):
> >
> > 1) the use of if (tcg_enabled()) and similar in the module_init call path
> >
> > 2) the excessive number of accel_cpu_init() to hunt down in the codebase.
> >
> >
> > * Fixed wrong use of host_cpu_class_init (Eduardo)
> >
> >
> > v3 -> v4: QOMification of X86CPUAccelClass
> >
> >
> > In this version I basically QOMified X86CPUAccel, taking the
> > suggestions from Eduardo as the starting point,
> > but stopping just short of making it an actual QOM interface,
> > using a plain abstract class, and then subclasses for the
> > actual objects.
> >
> > Initialization is still using the existing qemu initialization
> > framework (module_call_init), which I still think is better
> > than the alternatives proposed, in the current state.
> >
> > Possibly some improvements could be developed in the future here.
> > In this case, effort should be put in keeping things extendible,
> > in order not to be blocked once accelerators also become modules.
> >
> > Motivation and higher level steps:
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg04628.html
> >
> > Looking forward to your comments on this proposal,
> >
> > Ciao,
> >
> > Claudio
> >
> > Claudio Fontana (13):
> >   i386: move kvm accel files into kvm/
> >

Re: [PATCH v4 06/11] hvf: Simplify post reset/init/loadvm hooks

2020-12-04 Thread Roman Bolshakov

On Fri, Dec 04, 2020 at 12:48:52AM +0100, Alexander Graf wrote:
> The hooks we have that call us after reset, init and loadvm really all
> just want to say "The reference of all register state is in the QEMU
> vcpu struct, please push it".
> 
> We already have a working pushing mechanism though called cpu->vcpu_dirty,
> so we can just reuse that for all of the above, syncing state properly the
> next time we actually execute a vCPU.
> 
> This fixes PSCI resets on ARM, as they modify CPU state even after the
> post init call has completed, but before we execute the vCPU again.
> 
> To also make the scheme work for x86, we have to make sure we don't
> move stale eflags into our env when the vcpu state is dirty.
> 
> Signed-off-by: Alexander Graf 
> ---
>  accel/hvf/hvf-cpus.c | 27 +++
>  target/i386/hvf/x86hvf.c |  5 -
>  2 files changed, 11 insertions(+), 21 deletions(-)
> 
> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
> index 1b0c868944..71721e17de 100644
> --- a/accel/hvf/hvf-cpus.c
> +++ b/accel/hvf/hvf-cpus.c
> @@ -275,39 +275,26 @@ static void hvf_cpu_synchronize_state(CPUState *cpu)
>  }
>  }
>  
> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
> -  run_on_cpu_data arg)
> +static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu,
> + run_on_cpu_data arg)
>  {
> -hvf_put_registers(cpu);
> -cpu->vcpu_dirty = false;
> +/* QEMU state is the reference, push it to HVF now and on next entry */

It's only signalling now. The actual push is delayed until the next
entry.
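
To make the distinction concrete, a dirty-flag scheme of this kind can be modelled as below. This is a toy sketch, not HVF code: VcpuModel, model_set_dirty and model_vcpu_enter are hypothetical names, and a single int stands in for the whole register file.

```c
#include <stdbool.h>

/* Hypothetical model: one reference copy, one accelerator copy. */
typedef struct {
    int qemu_regs;   /* reference state kept in the QEMU vcpu struct */
    int accel_regs;  /* state as seen by the accelerator */
    bool dirty;      /* true when qemu_regs must be pushed before running */
} VcpuModel;

/* Only signals that the reference copy is authoritative; pushes nothing. */
static void model_set_dirty(VcpuModel *v)
{
    v->dirty = true;
}

/* The push itself happens here, right before the vCPU runs. */
static void model_vcpu_enter(VcpuModel *v)
{
    if (v->dirty) {
        v->accel_regs = v->qemu_regs;
        v->dirty = false;
    }
}
```

Because the copy is deferred, a change made to the reference state after the hook has run but before the next entry (the PSCI reset case in the commit message) is still picked up.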

It'd be good if Paolo or Eduardo would also peek at this change because
it makes HVF a bit different from other accels.

HVF's post_reset, post_init and pre_loadvm no longer result in QEMU
state being pushed to HVF. I'm not sure I fully grasp whether there are
undesired side effects of this, so it's worth a broader review.

If nobody raises objections:

Reviewed-by: Roman Bolshakov 
Tested-by: Roman Bolshakov 

Thanks,
Roman

> +cpu->vcpu_dirty = true;
>  }
>  
>  static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>  {
> -run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
> -}
> -
> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
> - run_on_cpu_data arg)
> -{
> -hvf_put_registers(cpu);
> -cpu->vcpu_dirty = false;
> +run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
>  }
>  
>  static void hvf_cpu_synchronize_post_init(CPUState *cpu)
>  {
> -run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
> -}
> -
> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
> -  run_on_cpu_data arg)
> -{
> -cpu->vcpu_dirty = true;
> +run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
>  }
>  
>  static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>  {
> -run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
> +run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
>  }
>  
>  static void hvf_vcpu_destroy(CPUState *cpu)
> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
> index 0f2aeb1cf8..3111c0be4c 100644
> --- a/target/i386/hvf/x86hvf.c
> +++ b/target/i386/hvf/x86hvf.c
> @@ -435,7 +435,10 @@ int hvf_process_events(CPUState *cpu_state)
>  X86CPU *cpu = X86_CPU(cpu_state);
>  CPUX86State *env = &cpu->env;
>  
> -env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
> +if (!cpu_state->vcpu_dirty) {
> +/* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */
> +env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
> +}
>  
>  if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>  cpu_synchronize_state(cpu_state);
> -- 
> 2.24.3 (Apple Git-128)
>

Re: [PATCH 3/4] block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()

2020-12-04 Thread Alberto Garcia

On Thu 03 Dec 2020 11:27:12 PM CET, Vladimir Sementsov-Ogievskiy wrote:
> Move bdrv_is_inserted() calls into callers.
>
> We are going to make bdrv_check_byte_request() a clean thing.
> bdrv_is_inserted() is not about checking the request, it's about
> checking the bs. So, it should be separate.
>
> With this patch we probably change error path for some failure
> scenarios. But depending on the fact that querying too big request on
> empty cdrom (or corrupted qcow2 node with no drv) will result in EIO
> and not ENOMEDIUM would be very strange. Moreover, we are going to
> move to 64bit requests, so larger requests will be allowed anyway.
>
> Moreover, keeping in mind that cdrom is the only driver that has
> .bdrv_is_inserted() handler it's strange that we should care so much
> about it in generic block layer, intuitively we should just do read and
> write, and cdrom driver should return correct errors if it is not
> inserted. But it's a work for another series.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto

Re: [PATCH 2/4] block/io: bdrv_refresh_limits(): use ERRP_GUARD

2020-12-04 Thread Alberto Garcia

On Thu 03 Dec 2020 11:27:11 PM CET, Vladimir Sementsov-Ogievskiy wrote:
> This simplifies following commit.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto
