Re: [PATCH 1/3] build: validate that system capstone works before using it

2021-06-27 Thread Thomas Huth

On 25/06/2021 19.22, Daniel P. Berrangé wrote:

Some versions of capstone have shipped a broken pkg-config file which
puts the -I path without the trailing '/capstone' suffix. This breaks
the ability to "#include ". Upstream and most distros have
fixed this, but a few stragglers remain, notably FreeBSD.

Signed-off-by: Daniel P. Berrangé 
---
  meson.build | 13 +
  1 file changed, 13 insertions(+)

diff --git a/meson.build b/meson.build
index d8a92666fb..9979ddae71 100644
--- a/meson.build
+++ b/meson.build
@@ -1425,6 +1425,19 @@ if capstone_opt in ['enabled', 'auto', 'system']
  kwargs: static_kwargs, method: 'pkg-config',
  required: capstone_opt == 'system' or
capstone_opt == 'enabled' and not 
have_internal)
+
+  # Some versions of capstone have broken pkg-config file
+  # that reports a wrong -I path, causing the #include to
+  # fail later. If the system has such a broken version
+  # do not use it.
+  if capstone.found() and not cc.compiles('#include ',
+  dependencies: [capstone])
+capstone = not_found
+if capstone_opt == 'system'
+  error('system capstone requested, it it does not appear to work')
+endif
+  endif


Reviewed-by: Thomas Huth 




[PATCH v5] docs/devel: Added cache plugin to the plugins docs

2021-06-27 Thread Mahmoud Mandour
Signed-off-by: Mahmoud Mandour 
---
v4 -> v5:
Fixed the illustrated command since it has path problems...

 docs/devel/tcg-plugins.rst | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index 18c6581d85..595b8e0ea4 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -319,3 +319,62 @@ the user to see what hardware is accessed how often. It 
has a number of options:
   off:001c, 1, 2
   off:0020, 1, 2
   ...
+
+- contrib/plugins/cache
+
+Cache modelling plugin that measures the performance of a given cache
+configuration when a given working set is run::
+
+./x86_64-linux-user/qemu-x86_64 -plugin ./contrib/plugins/libcache.so \
+-d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs
+
+will report the following::
+
+Data accesses: 996479, Misses: 507
+Miss rate: 0.050879%
+
+Instruction accesses: 2641737, Misses: 18617
+Miss rate: 0.704726%
+
+address, data misses, instruction
+0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx)
+0x41f395 (_IO_default_xsputn), 49, movb %dl, (%rdi, %rax)
+0x42584d (ptmalloc_init.part.0), 33, movaps %xmm0, (%rax)
+0x454d48 (__tunables_init), 20, cmpb $0, (%r8)
+...
+
+address, fetch misses, instruction
+0x4160a0 (__vfprintf_internal), 744, movl $1, %ebx
+0x41f0a0 (_IO_setb), 744, endbr64
+0x415882 (__vfprintf_internal), 744, movq %r12, %rdi
+0x4268a0 (__malloc), 696, andq $0xfff0, %rax
+...
+
+The plugin has a number of arguments, all of them are optional:
+
+  * arg="limit=N"
+
+  Print top N icache and dcache thrashing instructions along with their
+  address, number of misses, and its disassembly. (default: 32)
+
+  * arg="icachesize=N"
+  * arg="iblksize=B"
+  * arg="iassoc=A"
+
+  Instruction cache configuration arguments. They specify the cache size, block
+  size, and associativity of the instruction cache, respectively.
+  (default: N = 16384, B = 64, A = 8)
+
+  * arg="dcachesize=N"
+  * arg="dblksize=B"
+  * arg="dassoc=A"
+
+  Data cache configuration arguments. They specify the cache size, block size,
+  and associativity of the data cache, respectively.
+  (default: N = 16384, B = 64, A = 8)
+
+  * arg="evict=POLICY"
+
+  Sets the eviction policy to POLICY. Available policies are: :code:`lru`,
+  :code:`fifo`, and :code:`rand`. The plugin will use the specified policy for
+  both instruction and data caches. (default: POLICY = :code:`lru`)
-- 
2.25.1




[PATCH v4 0/3] Fuzzer pattern-matching, timeouts, and instrumentation-filtering

2021-06-27 Thread Alexander Bulekov
v4:
- Instead of changing the patterns in the AC97 and ES1370 configs,
  make the type/name pattern matching case-insensitive.
- Copy the instrumentation filter into the build-dir, so it can be
  adapted on-the-fly.
v3:
- Check in ./configure whether clang supports -fsanitize-coverage-allowlist
v2:
- Add the instrumentation filter to the instrumentation filter patch

These patches
1.) Change generic-fuzzer timeouts so they are reconfigured prior to
each individual IO command, to allow for longer-running inputs
2.) Add an instrumentation filter to prevent libfuzzer from tracking
noisy/irrelevant parts of the code.
3.) Make pattern-matching against types/names case-insensitive.


Alexander Bulekov (3):
  fuzz: adjust timeout to allow for longer inputs
  fuzz: add an instrumentation filter
  fuzz: make object-name matching case-insensitive

 configure | 13 +++
 .../oss-fuzz/instrumentation-filter-template  | 14 +++
 tests/qtest/fuzz/generic_fuzz.c   | 37 +++
 3 files changed, 56 insertions(+), 8 deletions(-)
 create mode 100644 scripts/oss-fuzz/instrumentation-filter-template

-- 
2.28.0




[PATCH v4 2/3] fuzz: add an instrumentation filter

2021-06-27 Thread Alexander Bulekov
By default, -fsanitize=fuzzer instruments all code with coverage
information. However, this means that libfuzzer will track coverage over
hundreds of source files that are unrelated to virtual-devices. This
means that libfuzzer will optimize inputs for coverage observed in timer
code, memory APIs etc. This slows down the fuzzer and stores many inputs
that are not relevant to the actual virtual-devices.

With this change, clang versions that support the
"-fsanitize-coverage-allowlist" will only instrument a subset of the
compiled code, that is directly related to virtual-devices.

Signed-off-by: Alexander Bulekov 
---
 configure| 13 +
 scripts/oss-fuzz/instrumentation-filter-template | 14 ++
 2 files changed, 27 insertions(+)
 create mode 100644 scripts/oss-fuzz/instrumentation-filter-template

diff --git a/configure b/configure
index debd50c085..c755f8f29e 100755
--- a/configure
+++ b/configure
@@ -5176,6 +5176,11 @@ if test "$fuzzing" = "yes" && test -z 
"${LIB_FUZZING_ENGINE+xxx}"; then
 error_exit "Your compiler doesn't support -fsanitize=fuzzer"
 exit 1
   fi
+  have_clang_coverage_filter=no
+  echo > $TMPTXT
+  if compile_prog "$CPU_CFLAGS -Werror -fsanitize=fuzzer 
-fsanitize-coverage-allowlist=$TMPTXT" ""; then
+  have_clang_coverage_filter=yes
+  fi
 fi
 
 # Thread sanitizer is, for now, much noisier than the other sanitizers;
@@ -6101,6 +6106,14 @@ if test "$fuzzing" = "yes" ; then
 # rule for the fuzzer adds these to the link_args. They need to be
 # configurable, to support OSS-Fuzz
 FUZZ_EXE_LDFLAGS="-fsanitize=fuzzer"
+
+# Specify a filter to only instrument code that is directly related to
+# virtual-devices.
+if test "$have_clang_coverage_filter" = "yes" ; then
+cp "$source_path/scripts/oss-fuzz/instrumentation-filter-template" \
+instrumentation-filter
+QEMU_CFLAGS="$QEMU_CFLAGS 
-fsanitize-coverage-allowlist=instrumentation-filter"
+fi
   else
 FUZZ_EXE_LDFLAGS="$LIB_FUZZING_ENGINE"
   fi
diff --git a/scripts/oss-fuzz/instrumentation-filter-template 
b/scripts/oss-fuzz/instrumentation-filter-template
new file mode 100644
index 00..44e853159c
--- /dev/null
+++ b/scripts/oss-fuzz/instrumentation-filter-template
@@ -0,0 +1,14 @@
+# Code that we actually want the fuzzer to target
+# See: 
https://clang.llvm.org/docs/SanitizerCoverage.html#disabling-instrumentation-without-source-modification
+#
+src:*/hw/*
+src:*/include/hw/*
+src:*/slirp/*
+
+# We don't care about coverage over fuzzer-specific code, however we should
+# instrument the fuzzer entry-point so libFuzzer always sees at least some
+# coverage - otherwise it will exit after the first input
+src:*/tests/qtest/fuzz/fuzz.c
+
+# Enable instrumentation for all functions in those files
+fun:*
-- 
2.28.0




[PATCH v4 3/3] fuzz: make object-name matching case-insensitive

2021-06-27 Thread Alexander Bulekov
We have some configs for devices such as the AC97 and ES1370 that were
not matching memory-regions correctly, because the configs provided
lowercase names. To resolve these problems and prevent them from
occurring again in the future, convert both the pattern and names to
lower-case, prior to checking for a match.

Suggested-by: Darren Kenny 
Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/generic_fuzz.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index 71d36e8f6f..0695a349b2 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -751,8 +751,13 @@ static int locate_fuzz_memory_regions(Object *child, void 
*opaque)
 
 static int locate_fuzz_objects(Object *child, void *opaque)
 {
+GString *type_name;
+GString *path_name;
 char *pattern = opaque;
-if (g_pattern_match_simple(pattern, object_get_typename(child))) {
+
+type_name = g_string_new(object_get_typename(child));
+g_string_ascii_down(type_name);
+if (g_pattern_match_simple(pattern, type_name->str)) {
 /* Find and save ptrs to any child MemoryRegions */
 object_child_foreach_recursive(child, locate_fuzz_memory_regions, 
NULL);
 
@@ -769,8 +774,9 @@ static int locate_fuzz_objects(Object *child, void *opaque)
 g_ptr_array_add(fuzzable_pci_devices, PCI_DEVICE(child));
 }
 } else if (object_dynamic_cast(OBJECT(child), TYPE_MEMORY_REGION)) {
-if (g_pattern_match_simple(pattern,
-object_get_canonical_path_component(child))) {
+path_name = g_string_new(object_get_canonical_path_component(child));
+g_string_ascii_down(path_name);
+if (g_pattern_match_simple(pattern, path_name->str)) {
 MemoryRegion *mr;
 mr = MEMORY_REGION(child);
 if ((memory_region_is_ram(mr) ||
@@ -779,7 +785,9 @@ static int locate_fuzz_objects(Object *child, void *opaque)
 g_hash_table_insert(fuzzable_memoryregions, mr, 
(gpointer)true);
 }
 }
+g_string_free(path_name, true);
 }
+g_string_free(type_name, true);
 return 0;
 }
 
@@ -807,6 +815,7 @@ static void generic_pre_fuzz(QTestState *s)
 MemoryRegion *mr;
 QPCIBus *pcibus;
 char **result;
+GString *pattern;
 
 if (!getenv("QEMU_FUZZ_OBJECTS")) {
 usage();
@@ -836,10 +845,17 @@ static void generic_pre_fuzz(QTestState *s)
 
 result = g_strsplit(getenv("QEMU_FUZZ_OBJECTS"), " ", -1);
 for (int i = 0; result[i] != NULL; i++) {
+pattern = g_string_new(result[i]);
+/*
+ * Make the pattern lowercase. We do the same for all the MemoryRegion
+ * and Type names so the configs are case-insensitive.
+ */
+g_string_ascii_down(pattern);
 printf("Matching objects by name %s\n", result[i]);
 object_child_foreach_recursive(qdev_get_machine(),
 locate_fuzz_objects,
-result[i]);
+pattern->str);
+g_string_free(pattern, true);
 }
 g_strfreev(result);
 printf("This process will try to fuzz the following MemoryRegions:\n");
-- 
2.28.0




[PATCH v4 1/3] fuzz: adjust timeout to allow for longer inputs

2021-06-27 Thread Alexander Bulekov
Using a custom timeout is useful to continue fuzzing complex devices,
even after we run into some slow code-path. However, simply adding a
fixed timeout to each input effectively caps the maximum input
length/number of operations at some artificial value. There are two
major problems with this:
1. Some code might only be reachable through long IO sequences.
2. Longer inputs can actually be _better_ for performance. While the
   raw number of fuzzer executions decreases with larger inputs, the
   number of MMIO/PIO/DMA operation/second actually increases, since
   were are speding proportionately less time fork()ing.

With this change, we keep the custom-timeout, but we renew it, prior to
each MMIO/PIO/DMA operation. Thus, we time-out only when a specific
operation takes a long time.

Reviewed-by: Darren Kenny 
Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/generic_fuzz.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index cea7d4058e..71d36e8f6f 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -661,15 +661,16 @@ static void generic_fuzz(QTestState *s, const unsigned 
char *Data, size_t Size)
 uint8_t op;
 
 if (fork() == 0) {
+struct sigaction sact;
+struct itimerval timer;
 /*
  * Sometimes the fuzzer will find inputs that take quite a long time to
  * process. Often times, these inputs do not result in new coverage.
  * Even if these inputs might be interesting, they can slow down the
- * fuzzer, overall. Set a timeout to avoid hurting performance, too 
much
+ * fuzzer, overall. Set a timeout for each command to avoid hurting
+ * performance, too much
  */
 if (timeout) {
-struct sigaction sact;
-struct itimerval timer;
 
 sigemptyset(&sact.sa_mask);
 sact.sa_flags   = SA_NODEFER;
@@ -679,13 +680,17 @@ static void generic_fuzz(QTestState *s, const unsigned 
char *Data, size_t Size)
 memset(&timer, 0, sizeof(timer));
 timer.it_value.tv_sec = timeout / USEC_IN_SEC;
 timer.it_value.tv_usec = timeout % USEC_IN_SEC;
-setitimer(ITIMER_VIRTUAL, &timer, NULL);
 }
 
 op_clear_dma_patterns(s, NULL, 0);
 pci_disabled = false;
 
 while (cmd && Size) {
+/* Reset the timeout, each time we run a new command */
+if (timeout) {
+setitimer(ITIMER_VIRTUAL, &timer, NULL);
+}
+
 /* Get the length until the next command or end of input */
 nextcmd = memmem(cmd, Size, SEPARATOR, strlen(SEPARATOR));
 cmd_len = nextcmd ? nextcmd - cmd : Size;
-- 
2.28.0




Re: [PATCH qemu v22] spapr: Implement Open Firmware client interface

2021-06-27 Thread Alexey Kardashevskiy




On 6/28/21 02:38, BALATON Zoltan wrote:

On Fri, 25 Jun 2021, Alexey Kardashevskiy wrote:

The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..1 - stack
40.. - kernel
3ea.. - initramdisk

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 



The immediate benefit is much faster booting time which especially
crucial with fully emulated early CPU bring up environments. Also this
may come handy when/if GRUB-in-the-userspace sees light of the day.

This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This assumes potential support for booting from QEMU backends
such as blockdev or netdev without devices/drivers used.

Signed-off-by: Alexey Kardashevskiy 


Reviewed-by: BALATON Zoltan 



Thanks a lot for thorough reviews, much appreciated!


--
Alexey



Re: tests: ReverseDebugging_AArch64.test_aarch64_virt -> InvalidPacketError

2021-06-27 Thread Pavel Dovgalyuk

On 25.06.2021 20:01, Philippe Mathieu-Daudé wrote:

Hi Pavel,

FYI as of commit 050cee12315 ("Merge remote-tracking branch
'remotes/stsquad/tags/pull-testing-updates-250621-1' into staging")


Doesn't it mean, that the real culprit is hidden and we should bisect?


the ReverseDebugging_AArch64 test is failing:


Shouldn't the merge be postponed in such cases?



  (28/37)
tests/acceptance/reverse_debugging.py:ReverseDebugging_AArch64.test_aarch64_virt:
  ERROR (0.96 s)

INFO | recorded log with 1690570+ steps
INFO | replaying the execution...
INFO | connecting to gdbstub
INFO | stepping forward
INFO | saving position 4000
ERROR|
ERROR| Reproduced traceback from:
lib/python3.8/site-packages/avocado/core/test.py:770
ERROR| Traceback (most recent call last):
ERROR|   File "acceptance/reverse_debugging.py", line 206, in
test_aarch64_virt
ERROR| self.reverse_debugging(
ERROR|   File "acceptance/reverse_debugging.py", line 140, in
reverse_debugging
ERROR| pc = self.get_pc(g)
ERROR|   File "acceptance/reverse_debugging.py", line 77, in get_pc
ERROR| return self.get_reg(g, self.REG_PC)
ERROR|   File "acceptance/reverse_debugging.py", line 72, in get_reg
ERROR| return self.get_reg_le(g, reg)
ERROR|   File "acceptance/reverse_debugging.py", line 58, in get_reg_le
ERROR| res = g.cmd(b'p%x' % reg)
ERROR|   File "lib/python3.8/site-packages/avocado/utils/gdb.py", line
783, in cmd
ERROR| response_payload = self.decode(result)
ERROR|   File "lib/python3.8/site-packages/avocado/utils/gdb.py", line
738, in decode
ERROR| raise InvalidPacketError
ERROR| avocado.utils.gdb.InvalidPacketError
ERROR|
DEBUG| Local variables:
DEBUG|  -> self :
28-tests/acceptance/reverse_debugging.py:ReverseDebugging_AArch64.test_aarch64_virt
DEBUG|  -> kernel_url :
https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz
DEBUG|  -> kernel_hash :
8c73e469fc6ea06a58dc83a628fc695b693b8493
DEBUG|  -> kernel_path :
avocado-cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz
DEBUG| >>> {'execute': 'quit'}
ERROR|
ERROR| Reproduced traceback from:
lib/python3.8/site-packages/avocado/core/test.py:796
ERROR| Traceback (most recent call last):
ERROR|   File "python/qemu/machine/machine.py", line 489, in _do_shutdown
ERROR| self._soft_shutdown(timeout, has_quit)
ERROR|   File "python/qemu/machine/machine.py", line 469, in _soft_shutdown
ERROR| self._qmp.cmd('quit')
ERROR|   File "python/qemu/qmp/__init__.py", line 325, in cmd
ERROR| return self.cmd_obj(qmp_cmd)
ERROR|   File "python/qemu/qmp/__init__.py", line 303, in cmd_obj
ERROR| self.__sock.sendall(json.dumps(qmp_cmd).encode('utf-8'))
ERROR| BrokenPipeError: [Errno 32] Broken pipe
ERROR|
ERROR| The above exception was the direct cause of the following exception:
ERROR|
ERROR| Traceback (most recent call last):
ERROR|   File "acceptance/avocado_qemu/__init__.py", line 244, in tearDown
ERROR| vm.shutdown()
ERROR|   File "python/qemu/machine/machine.py", line 519, in shutdown
ERROR| self._do_shutdown(timeout, has_quit)
ERROR|   File "python/qemu/machine/machine.py", line 492, in _do_shutdown
ERROR| raise AbnormalShutdown("Could not perform graceful shutdown") \
ERROR| qemu.machine.machine.AbnormalShutdown: Could not perform graceful
shutdown
ERROR| Traceback (most recent call last):
ERROR| ERROR
28-tests/acceptance/reverse_debugging.py:ReverseDebugging_AArch64.test_aarch64_virt
-> InvalidPacketError:






Re: [PATCH] arm/aspeed: rainier: Add i2c eeproms and muxes

2021-06-27 Thread Joel Stanley
On Mon, 28 Jun 2021 at 04:49, Joel Stanley  wrote:
>
> On Mon, 28 Jun 2021 at 04:33, Andrew Jeffery  wrote:
> >
> >
> >
> > On Fri, 25 Jun 2021, at 14:36, Joel Stanley wrote:
> > > These are the devices documented by the Rainier device tree. With this
> > > we can see the guest discovering the multiplexers and probing the eeprom
> > > devices:
> > >
> > >  i2c i2c-2: Added multiplexed i2c bus 16
> > >  i2c i2c-2: Added multiplexed i2c bus 17
> > >  i2c i2c-2: Added multiplexed i2c bus 18
> > >  i2c i2c-2: Added multiplexed i2c bus 19
> > >  i2c-mux-gpio i2cmux: 4 port mux on 1e78a180.i2c-bus adapter
> > >  at24 20-0050: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
> > >  i2c i2c-4: Added multiplexed i2c bus 20
> > >  at24 21-0051: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
> > >  i2c i2c-4: Added multiplexed i2c bus 21
> > >  at24 22-0052: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
> > >
> > > Signed-off-by: Joel Stanley 
> > > ---
> > >  hw/arm/aspeed.c | 56 +
> > >  1 file changed, 56 insertions(+)
> > >
> > > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> > > index 1301e8fdffb2..7ed22294c6eb 100644
> > > --- a/hw/arm/aspeed.c
> > > +++ b/hw/arm/aspeed.c
> > > @@ -677,6 +677,10 @@ static void g220a_bmc_i2c_init(AspeedMachineState 
> > > *bmc)
> > >  static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
> > >  {
> > >  AspeedSoCState *soc = &bmc->soc;
> > > +I2CSlave *i2c_mux;
> > > +
> > > +smbus_eeprom_init_one(aspeed_i2c_get_bus(&soc->i2c, 0), 0x51,
> > > +  g_malloc0(32 * 1024));
> > >
> > >  /* The rainier expects a TMP275 but a TMP105 is compatible */
> > >  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), 
> > > TYPE_TMP105,
> > > @@ -685,11 +689,25 @@ static void rainier_bmc_i2c_init(AspeedMachineState 
> > > *bmc)
> > >   0x49);
> > >  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), 
> > > TYPE_TMP105,
> > >   0x4a);
> > > +i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4),
> > > +  "pca9546", 0x70);
> > > +smbus_eeprom_init_one(pca954x_i2c_get_bus(i2c_mux, 0), 0x50,
> > > +  g_malloc0(64 * 1024));
> >
> > The EEPROMs described in the Rainier devicetree are Atmel AT2x devices and
> > not SMBus EEPROMs. The SMBus EEPROM model uses SMBus block reads and
> > writes which are not what the AT2x driver issues. If you try to read or
> > write data under Linux from the EEPROMs in this patch you just get
> > corrupt results. So as far as I can see the patch needs to be reworked.
>
> That's fine, these are just there so the drivers can probe. As you can
> see the devices have no data in them either, so it's nowhere near an
> incomplete model.

nowhere near a complete model.

>
> If you want to extend the modelling to have the correct VPD data, and
> allow writes where applicable, that would be welcome.
>
> Cheers,
>
> Joel



Re: [PATCH] arm/aspeed: rainier: Add i2c eeproms and muxes

2021-06-27 Thread Joel Stanley
On Mon, 28 Jun 2021 at 04:33, Andrew Jeffery  wrote:
>
>
>
> On Fri, 25 Jun 2021, at 14:36, Joel Stanley wrote:
> > These are the devices documented by the Rainier device tree. With this
> > we can see the guest discovering the multiplexers and probing the eeprom
> > devices:
> >
> >  i2c i2c-2: Added multiplexed i2c bus 16
> >  i2c i2c-2: Added multiplexed i2c bus 17
> >  i2c i2c-2: Added multiplexed i2c bus 18
> >  i2c i2c-2: Added multiplexed i2c bus 19
> >  i2c-mux-gpio i2cmux: 4 port mux on 1e78a180.i2c-bus adapter
> >  at24 20-0050: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
> >  i2c i2c-4: Added multiplexed i2c bus 20
> >  at24 21-0051: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
> >  i2c i2c-4: Added multiplexed i2c bus 21
> >  at24 22-0052: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
> >
> > Signed-off-by: Joel Stanley 
> > ---
> >  hw/arm/aspeed.c | 56 +
> >  1 file changed, 56 insertions(+)
> >
> > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> > index 1301e8fdffb2..7ed22294c6eb 100644
> > --- a/hw/arm/aspeed.c
> > +++ b/hw/arm/aspeed.c
> > @@ -677,6 +677,10 @@ static void g220a_bmc_i2c_init(AspeedMachineState *bmc)
> >  static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
> >  {
> >  AspeedSoCState *soc = &bmc->soc;
> > +I2CSlave *i2c_mux;
> > +
> > +smbus_eeprom_init_one(aspeed_i2c_get_bus(&soc->i2c, 0), 0x51,
> > +  g_malloc0(32 * 1024));
> >
> >  /* The rainier expects a TMP275 but a TMP105 is compatible */
> >  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), TYPE_TMP105,
> > @@ -685,11 +689,25 @@ static void rainier_bmc_i2c_init(AspeedMachineState 
> > *bmc)
> >   0x49);
> >  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), TYPE_TMP105,
> >   0x4a);
> > +i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4),
> > +  "pca9546", 0x70);
> > +smbus_eeprom_init_one(pca954x_i2c_get_bus(i2c_mux, 0), 0x50,
> > +  g_malloc0(64 * 1024));
>
> The EEPROMs described in the Rainier devicetree are Atmel AT2x devices and
> not SMBus EEPROMs. The SMBus EEPROM model uses SMBus block reads and
> writes which are not what the AT2x driver issues. If you try to read or
> write data under Linux from the EEPROMs in this patch you just get
> corrupt results. So as far as I can see the patch needs to be reworked.

That's fine, these are just there so the drivers can probe. As you can
see the devices have no data in them either, so it's nowhere near an
incomplete model.

If you want to extend the modelling to have the correct VPD data, and
allow writes where applicable, that would be welcome.

Cheers,

Joel



Re: [PATCH] arm/aspeed: rainier: Add i2c eeproms and muxes

2021-06-27 Thread Andrew Jeffery



On Fri, 25 Jun 2021, at 14:36, Joel Stanley wrote:
> These are the devices documented by the Rainier device tree. With this
> we can see the guest discovering the multiplexers and probing the eeprom
> devices:
> 
>  i2c i2c-2: Added multiplexed i2c bus 16
>  i2c i2c-2: Added multiplexed i2c bus 17
>  i2c i2c-2: Added multiplexed i2c bus 18
>  i2c i2c-2: Added multiplexed i2c bus 19
>  i2c-mux-gpio i2cmux: 4 port mux on 1e78a180.i2c-bus adapter
>  at24 20-0050: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
>  i2c i2c-4: Added multiplexed i2c bus 20
>  at24 21-0051: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
>  i2c i2c-4: Added multiplexed i2c bus 21
>  at24 22-0052: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
> 
> Signed-off-by: Joel Stanley 
> ---
>  hw/arm/aspeed.c | 56 +
>  1 file changed, 56 insertions(+)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 1301e8fdffb2..7ed22294c6eb 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -677,6 +677,10 @@ static void g220a_bmc_i2c_init(AspeedMachineState *bmc)
>  static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
>  {
>  AspeedSoCState *soc = &bmc->soc;
> +I2CSlave *i2c_mux;
> +
> +smbus_eeprom_init_one(aspeed_i2c_get_bus(&soc->i2c, 0), 0x51,
> +  g_malloc0(32 * 1024));
>  
>  /* The rainier expects a TMP275 but a TMP105 is compatible */
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), TYPE_TMP105,
> @@ -685,11 +689,25 @@ static void rainier_bmc_i2c_init(AspeedMachineState 
> *bmc)
>   0x49);
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), TYPE_TMP105,
>   0x4a);
> +i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4),
> +  "pca9546", 0x70);
> +smbus_eeprom_init_one(pca954x_i2c_get_bus(i2c_mux, 0), 0x50,
> +  g_malloc0(64 * 1024));

The EEPROMs described in the Rainier devicetree are Atmel AT2x devices and
not SMBus EEPROMs. The SMBus EEPROM model uses SMBus block reads and 
writes which are not what the AT2x driver issues. If you try to read or 
write data under Linux from the EEPROMs in this patch you just get 
corrupt results. So as far as I can see the patch needs to be reworked.

Andrew



Re: Qemu on Haiku

2021-06-27 Thread Thomas Huth

On 28/06/2021 02.38, BALATON Zoltan wrote:

On Sun, 27 Jun 2021, Richard Zak wrote:

Hopefully last questions:

[...]

2) Is it acceptable to have a patch for the configure script, or is that
generated? I found some Haiku-related issues there


The configure script is not generated, it's just a shell script so you can 
send patches for it I think.


Right, QEMU's "configure" script is handmade.


Regarding prior email:
Seems like the big tasks are:
1) Haiku VM for continuous integration. Is this hosted in Amazon or other
cloud infrastructure?


The QEMU project is using gitlab-CI, Travis-CI and Cirrus-CI, as CI systems 
that could be used by everybody. However, most of these are based on 
Containers, so it's not possible to run an OS there that is completely 
different from the ones that are offered by default.
Now that's where the tests in the tests/vm/ directory come in very handy. 
These are based on KVM, so they can run on all Linux hosts that have 
virtualization enabled. Thus if the tests/vm/haiku.x86_64 test would work 
more properly, it could get run on KVM-enabled machines, too.



3) Supporting aspects of the qemu code relevant to Haiku (found an issue in
slirp & configure script)


slirp is a separate project now, if you want to fix something in there, 
please report it here:


 https://gitlab.freedesktop.org/slirp/libslirp

 Thomas




[Bug 1295587] Re: Temporal freeze and slowdown while using emulated sb16

2021-06-27 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1295587

Title:
  Temporal freeze and slowdown while using emulated sb16

Status in QEMU:
  Expired

Bug description:
  I have been carrying around this bug since previous versions and on
  different machines: When I use the -soundhw sb16 option, while playing
  any sound on the virtual machine it temporally freezes the emulated
  machine and loops the last bit of such sound effect for 1-2 minutes,
  then goes back to normal speed (until a new sound is played).

  Console shows:

   sb16: warning: command 0xf9,1 is not truly understood yet
   sb16: warning: command 0xf9,1 is not truly understood yet
  (...)
  main-loop: WARNING: I/O thread spun for 1000 iterations

  -One of my emulated machines is Windows 3.11: I managed to overrun
  this bug by switching from the local 1.5 version of the sound blaster
  driver to the 1.0, although since I updated qemu it freezes that
  machine, so I can't test if it still works.

  I am using the 1.7.90 version, but I suffered this bug for over one
  year (confirmed in version 2.0.0-rc0 too)

  this bug happens anytime I use the -soundhw sb16 switch, but the full
  command I am using in this specific case is:

  qemu-system-i386 -localtime -cpu pentium -m 32 -display sdl -vga
  cirrus -hda c.img -cdrom win95stuff.iso -net nic,model=ne2k_pci -net
  user -soundhw sb16

  This bug appears on all my machines: Pentium III running Slackware
  13.0 and freeBSD 10; Dual core T2400, both in Arch, Gentoo and
  Slackware 14.1 (all 32 bits), and a Dual core T4400 64 bits with
  Gentoo and Slackware. Same problem in all of those systems after
  compiling instead of using the distro packages

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1295587/+subscriptions



Re: [PATCH v5 0/6] crypto: Make QCryptoTLSCreds* structures private

2021-06-27 Thread Akihiko Odaki

Hi,

qemu-nbd.c still refers to members of QCryptoTLSCreds* and can't be 
compiled.


Regards,
Akihiko Odaki

On 2021/06/17 21:18, Philippe Mathieu-Daudé wrote:

Missing review: 5 & 6

Follow Daniel suggestion to simplify qcrypto TLS implementations,
aiming to solve the OSX build failure.

Since v4:
- Do not introduce qcrypto_tls_session_check_role (Richard, Daniel)
- Added R-b tags

Since v3:
- Added missing @errp docstring description

Since v2:
- Add Error* argument (Daniel)
- Move structure definitions to "tlscredspriv.h"

Philippe Mathieu-Daudé (6):
   crypto/tlscreds: Introduce qcrypto_tls_creds_check_endpoint() helper
   block/nbd: Use qcrypto_tls_creds_check_endpoint()
   chardev/socket: Use qcrypto_tls_creds_check_endpoint()
   migration/tls: Use qcrypto_tls_creds_check_endpoint()
   ui/vnc: Use qcrypto_tls_creds_check_endpoint()
   crypto: Make QCryptoTLSCreds* structures private

  crypto/tlscredspriv.h  | 45 ++
  include/crypto/tls-cipher-suites.h |  6 
  include/crypto/tlscreds.h  | 30 ++--
  include/crypto/tlscredsanon.h  | 12 
  include/crypto/tlscredspsk.h   | 12 
  include/crypto/tlscredsx509.h  | 10 ---
  block/nbd.c|  6 ++--
  blockdev-nbd.c |  6 ++--
  chardev/char-socket.c  | 18 
  crypto/tls-cipher-suites.c |  7 +
  crypto/tlscreds.c  | 12 
  crypto/tlscredsanon.c  |  2 ++
  crypto/tlscredspsk.c   |  2 ++
  crypto/tlscredsx509.c  |  1 +
  crypto/tlssession.c|  1 +
  migration/tls.c|  6 +---
  ui/vnc.c   |  7 +++--
  17 files changed, 101 insertions(+), 82 deletions(-)





[PATCH v2] ui/cocoa: Use NSWindow's ability to resize

2021-06-27 Thread Akihiko Odaki
This change brings two new features:
- The window will be resizable if "Zoom To Fit" is eanbled
- The window can be made full screen by clicking full screen button
  provided by the platform. (The left-top green button.)

Signed-off-by: Akihiko Odaki 
---
 ui/cocoa.m | 542 -
 1 file changed, 249 insertions(+), 293 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 9f72844b079..091d9721f4d 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -93,12 +93,10 @@ static void cocoa_switch(DisplayChangeListener *dcl,
 static DisplayChangeListener dcl = {
 .ops = &dcl_ops,
 };
-static int last_buttons;
 static int cursor_hide = 1;
 
 static int gArgc;
 static char **gArgv;
-static bool stretch_video;
 static NSTextField *pauseLabel;
 static NSArray * supportedImageFileTypes;
 
@@ -301,19 +299,16 @@ static void handleAnyDeviceErrors(Error * err)
 */
 @interface QemuCocoaView : NSView
 {
+NSTrackingArea *trackingArea;
 QEMUScreen screen;
-NSWindow *fullScreenWindow;
-float cx,cy,cw,ch,cdx,cdy;
 pixman_image_t *pixman_image;
 QKbdState *kbd;
 BOOL isMouseGrabbed;
-BOOL isFullscreen;
 BOOL isAbsoluteEnabled;
 }
 - (void) switchSurface:(pixman_image_t *)image;
 - (void) grabMouse;
 - (void) ungrabMouse;
-- (void) toggleFullScreen:(id)sender;
 - (void) handleMonitorInput:(NSEvent *)event;
 - (bool) handleEvent:(NSEvent *)event;
 - (bool) handleEventLocked:(NSEvent *)event;
@@ -328,8 +323,6 @@ - (void) setAbsoluteEnabled:(BOOL)tIsAbsoluteEnabled;
  */
 - (BOOL) isMouseGrabbed;
 - (BOOL) isAbsoluteEnabled;
-- (float) cdx;
-- (float) cdy;
 - (QEMUScreen) gscreen;
 - (void) raiseAllKeys;
 @end
@@ -369,46 +362,43 @@ - (BOOL) isOpaque
 return YES;
 }
 
-- (BOOL) screenContainsPoint:(NSPoint) p
+- (void) removeTrackingRect
 {
-return (p.x > -1 && p.x < screen.width && p.y > -1 && p.y < screen.height);
+if (trackingArea) {
+[self removeTrackingArea:trackingArea];
+[trackingArea release];
+trackingArea = nil;
+}
 }
 
-/* Get location of event and convert to virtual screen coordinate */
-- (CGPoint) screenLocationOfEvent:(NSEvent *)ev
+- (void) frameUpdated
 {
-NSWindow *eventWindow = [ev window];
-// XXX: Use CGRect and -convertRectFromScreen: to support macOS 10.10
-CGRect r = CGRectZero;
-r.origin = [ev locationInWindow];
-if (!eventWindow) {
-if (!isFullscreen) {
-return [[self window] convertRectFromScreen:r].origin;
-} else {
-CGPoint locationInSelfWindow = [[self window] 
convertRectFromScreen:r].origin;
-CGPoint loc = [self convertPoint:locationInSelfWindow 
fromView:nil];
-if (stretch_video) {
-loc.x /= cdx;
-loc.y /= cdy;
-}
-return loc;
-}
-} else if ([[self window] isEqual:eventWindow]) {
-if (!isFullscreen) {
-return r.origin;
-} else {
-CGPoint loc = [self convertPoint:r.origin fromView:nil];
-if (stretch_video) {
-loc.x /= cdx;
-loc.y /= cdy;
-}
-return loc;
-}
-} else {
-return [[self window] convertRectFromScreen:[eventWindow 
convertRectToScreen:r]].origin;
+[self removeTrackingRect];
+
+if ([self window]) {
+NSTrackingAreaOptions options = NSTrackingActiveInKeyWindow |
+NSTrackingMouseEnteredAndExited |
+NSTrackingMouseMoved;
+trackingArea = [[NSTrackingArea alloc] initWithRect:[self frame]
+options:options
+  owner:self
+   userInfo:nil];
+[self addTrackingArea:trackingArea];
+[self updateUIInfo];
 }
 }
 
+- (void) viewDidMoveToWindow
+{
+[self resizeWindow];
+[self frameUpdated];
+}
+
+- (void) viewWillMoveToWindow:(NSWindow *)newWindow
+{
+[self removeTrackingRect];
+}
+
 - (void) hideCursor
 {
 if (!cursor_hide) {
@@ -471,13 +461,14 @@ - (void) drawRect:(NSRect) rect
 int i;
 CGImageRef clipImageRef;
 CGRect clipRect;
+CGFloat d = (CGFloat)h / [self frame].size.height;
 
 [self getRectsBeingDrawn:&rectList count:&rectCount];
 for (i = 0; i < rectCount; i++) {
-clipRect.origin.x = rectList[i].origin.x / cdx;
-clipRect.origin.y = (float)h - (rectList[i].origin.y + 
rectList[i].size.height) / cdy;
-clipRect.size.width = rectList[i].size.width / cdx;
-clipRect.size.height = rectList[i].size.height / cdy;
+clipRect.origin.x = rectList[i].origin.x * d;
+clipRect.origin.y = (float)h - (rectList[i].origin.y + 
rectList[i].size.height) * d;
+clipRect.size.width = rectList[i].size.width * d;
+

[PATCH] ui/cocoa: Use NSWindow's ability to resize

2021-06-27 Thread Akihiko Odaki
This change brings two new features:
- The window will be resizable if "Zoom To Fit" is eanbled
- The window can be made full screen by clicking full screen button
  provided by the platform. (The left-top green button.)

Signed-off-by: Akihiko Odaki 
---
 ui/cocoa.m | 542 -
 1 file changed, 249 insertions(+), 293 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 6cc1d15baba..e54c69e69f1 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -94,12 +94,10 @@ static void cocoa_switch(DisplayChangeListener *dcl,
 static DisplayChangeListener dcl = {
 .ops = &dcl_ops,
 };
-static int last_buttons;
 static int cursor_hide = 1;
 
 static int gArgc;
 static char **gArgv;
-static bool stretch_video;
 static NSTextField *pauseLabel;
 static NSArray * supportedImageFileTypes;
 
@@ -302,20 +300,17 @@ static void handleAnyDeviceErrors(Error * err)
 */
 @interface QemuCocoaView : NSView
 {
+NSTrackingArea *trackingArea;
 QEMUScreen screen;
-NSWindow *fullScreenWindow;
-float cx,cy,cw,ch,cdx,cdy;
 pixman_image_t *pixman_image;
 QKbdState *kbd;
 BOOL isMouseGrabbed;
-BOOL isFullscreen;
 BOOL isAbsoluteEnabled;
 CFMachPortRef eventsTap;
 }
 - (void) switchSurface:(pixman_image_t *)image;
 - (void) grabMouse;
 - (void) ungrabMouse;
-- (void) toggleFullScreen:(id)sender;
 - (void) setFullGrab:(id)sender;
 - (void) handleMonitorInput:(NSEvent *)event;
 - (bool) handleEvent:(NSEvent *)event;
@@ -332,8 +327,6 @@ - (void) setAbsoluteEnabled:(BOOL)tIsAbsoluteEnabled;
 - (BOOL) isMouseGrabbed;
 - (BOOL) isAbsoluteEnabled;
 - (BOOL) isSwapOptionCommandEnabled;
-- (float) cdx;
-- (float) cdy;
 - (QEMUScreen) gscreen;
 - (void) raiseAllKeys;
 @end
@@ -391,46 +384,43 @@ - (BOOL) isOpaque
 return YES;
 }
 
-- (BOOL) screenContainsPoint:(NSPoint) p
+- (void) removeTrackingRect
 {
-return (p.x > -1 && p.x < screen.width && p.y > -1 && p.y < screen.height);
+if (trackingArea) {
+[self removeTrackingArea:trackingArea];
+[trackingArea release];
+trackingArea = nil;
+}
 }
 
-/* Get location of event and convert to virtual screen coordinate */
-- (CGPoint) screenLocationOfEvent:(NSEvent *)ev
+- (void) frameUpdated
 {
-NSWindow *eventWindow = [ev window];
-// XXX: Use CGRect and -convertRectFromScreen: to support macOS 10.10
-CGRect r = CGRectZero;
-r.origin = [ev locationInWindow];
-if (!eventWindow) {
-if (!isFullscreen) {
-return [[self window] convertRectFromScreen:r].origin;
-} else {
-CGPoint locationInSelfWindow = [[self window] 
convertRectFromScreen:r].origin;
-CGPoint loc = [self convertPoint:locationInSelfWindow 
fromView:nil];
-if (stretch_video) {
-loc.x /= cdx;
-loc.y /= cdy;
-}
-return loc;
-}
-} else if ([[self window] isEqual:eventWindow]) {
-if (!isFullscreen) {
-return r.origin;
-} else {
-CGPoint loc = [self convertPoint:r.origin fromView:nil];
-if (stretch_video) {
-loc.x /= cdx;
-loc.y /= cdy;
-}
-return loc;
-}
-} else {
-return [[self window] convertRectFromScreen:[eventWindow 
convertRectToScreen:r]].origin;
+[self removeTrackingRect];
+
+if ([self window]) {
+NSTrackingAreaOptions options = NSTrackingActiveInKeyWindow |
+NSTrackingMouseEnteredAndExited |
+NSTrackingMouseMoved;
+trackingArea = [[NSTrackingArea alloc] initWithRect:[self frame]
+options:options
+  owner:self
+   userInfo:nil];
+[self addTrackingArea:trackingArea];
+[self updateUIInfo];
 }
 }
 
+- (void) viewDidMoveToWindow
+{
+[self resizeWindow];
+[self frameUpdated];
+}
+
+- (void) viewWillMoveToWindow:(NSWindow *)newWindow
+{
+[self removeTrackingRect];
+}
+
 - (void) hideCursor
 {
 if (!cursor_hide) {
@@ -493,13 +483,14 @@ - (void) drawRect:(NSRect) rect
 int i;
 CGImageRef clipImageRef;
 CGRect clipRect;
+CGFloat d = (CGFloat)h / [self frame].size.height;
 
 [self getRectsBeingDrawn:&rectList count:&rectCount];
 for (i = 0; i < rectCount; i++) {
-clipRect.origin.x = rectList[i].origin.x / cdx;
-clipRect.origin.y = (float)h - (rectList[i].origin.y + 
rectList[i].size.height) / cdy;
-clipRect.size.width = rectList[i].size.width / cdx;
-clipRect.size.height = rectList[i].size.height / cdy;
+clipRect.origin.x = rectList[i].origin.x * d;
+clipRect.origin.y = (float)h - (rectList[i].origin.y + 
rectList[i].size.height) * d;
+clipRect

Re: Qemu on Haiku

2021-06-27 Thread BALATON Zoltan

On Sun, 27 Jun 2021, Richard Zak wrote:

Hopefully last questions:

1) What's the format for the subject line for patches? I'm seeing things
like "[PATCH 2/4]" or "[PATCH v2 00/10]" what do those numbers mean in this
context? I didn't see anything about this mentioned in the SubmitAPatch
wiki.


These are added by git format-patch so check out the docs for that (git 
format-patch --help). The numbers are number of patch/total number in 
series, related patches are submitted in one series. The v2 is second 
revision of a patch. The first submission doesn't have a version, then 
when review asks for changes you make those and do git format-patch -v 2 
and so on. For changing commits you might want to learn about git rebase 
-i and git commit --amend. When making commits you don't have to add these 
[PATCH ...] headers just the subject line, then git will add these when 
formatting/sending patches.



2) Is it acceptable to have a patch for the configure script, or is that
generated? I found some Haiku-related issues there


The configure script is not generated, it's just a shell script so you can 
send patches for it I think.



3) Is there a way to specify that the patch is for a submodule, or is there
a separate place for that?


Submodules are separate projects so you'd have to get in patches to them 
separately then QEMU would pick it up from upstream. Each of these may 
have different process to send patches.


Regards,
BALATON Zoltan


Regarding prior email:
Seems like the big tasks are:
1) Haiku VM for continuous integration. Is this hosted in Amazon or other
cloud infrastructure?
2) Resolving issues with Haiku pertaining to testing, bringing it inline
with other OSes (and I see how the disk space error)
3) Supporting aspects of the qemu code relevant to Haiku (found an issue in
slirp & configure script)

Thank you for your help & patience!

În vin., 25 iun. 2021 la 23:03, Warner Losh  a scris:




On Fri, Jun 25, 2021 at 8:45 PM Richard Zak 
wrote:


Hello and thanks for the detailed response! I wasn't aware that a Linux
host could compile for Haiku as a target, that's interesting.

Seems like the big tasks are:
1) Haiku VM for continuous integration. Is this hosted in Amazon or other
cloud infrastructure?



Take a look at, for example, the make vm-build-freebsd target (see
tests/vm/Makefile.include). It downloads
the latest FreeBSD images, boots it with a serial console, walks through
the install of the base OS, then
installs the packages needed for qemu to build and kicks off a build and
runs some acceptance tests
afterwards. OpenBSD, NetBSD and several Linux distributions have similar
setups. I think it would be
useful for there to be one for Haiku as well, so any developer could run
these tests either in response to
a bug report in their code, or to make sure things work on/with Haiku. All
of this is done locally.

There's a separate issue for creating a Haiku runner for gitlab, but I
know little even about the FreeBSD
runner.



2) Supporting aspects of the qemu code relevant to Haiku.

I'll take a look at that Wiki page to get a feel for things, and I've
started with the compilation of the latest code from the repo on Haiku,
addressing some issues as they come up.

I am a huge fan of both projects, but also am doing this in my own time.
I'm a developer professionally, but working on Haiku & qemu during off
hours (though timely shouldn't be a problem). How are things communicated
for this project (in regard to your request for someone who can help in a
timely manner)? It seems that the vast majority of the mailing list is
patch information. What's the primary way for code to be contributed, a
merging code though Gitlab or via emailed


patches?




Emailed patches. https://wiki.qemu.org/Contribute/SubmitAPatch has all
the details, though the volume of patches means that you really want to
make sure that you CC the maintainers of the code listed in the MAINTAINERS
file when submitting patches to help ensure they do not get list.

Warner




În vin., 25 iun. 2021 la 03:09, Thomas Huth  a scris:


On 25/06/2021 06.12, Richard Zak wrote:

Hello there! I noticed the message which appears when building qemu on
Haiku. I'd hate for Haiku to lose qemu, so I would like to help!

What is needed in terms of a build system for continuous integration?

I'm

not familiar with CI systems, other than simply knowing what they do.


  Hi,

since a couple of month, we already have a Haiku VM in our VM tests, so
the
basics are already there - it's possible to run a Haiku build test on a
Linux host by typing:

  make vm-build-haiku.x86_64

However, it's still in a quite bad shape, the disk image that is used in
that VM is not big enough for compiling the whole QEMU sources. So
somebody
needs to add some additional logic there to either increase the disk
image
on the fly or to add a second free disk image to the VM that could be
used
for compilation instead. If you want to have a try, have a look at:
tests/vm/haiku

Re: [PATCH] target/riscv: pmp: Fix some typos

2021-06-27 Thread Alistair Francis
On Sun, Jun 27, 2021 at 9:57 PM Bin Meng  wrote:
>
> %s/CSP/CSR
> %s/thie/the
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  target/riscv/pmp.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> index 82ed020b10..54abf42583 100644
> --- a/target/riscv/pmp.c
> +++ b/target/riscv/pmp.c
> @@ -456,7 +456,7 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong 
> addr,
>  }
>
>  /*
> - * Handle a write to a pmpcfg CSP
> + * Handle a write to a pmpcfg CSR
>   */
>  void pmpcfg_csr_write(CPURISCVState *env, uint32_t reg_index,
>  target_ulong val)
> @@ -483,7 +483,7 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
> reg_index,
>
>
>  /*
> - * Handle a read from a pmpcfg CSP
> + * Handle a read from a pmpcfg CSR
>   */
>  target_ulong pmpcfg_csr_read(CPURISCVState *env, uint32_t reg_index)
>  {
> @@ -502,7 +502,7 @@ target_ulong pmpcfg_csr_read(CPURISCVState *env, uint32_t 
> reg_index)
>
>
>  /*
> - * Handle a write to a pmpaddr CSP
> + * Handle a write to a pmpaddr CSR
>   */
>  void pmpaddr_csr_write(CPURISCVState *env, uint32_t addr_index,
>  target_ulong val)
> @@ -540,7 +540,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
> addr_index,
>
>
>  /*
> - * Handle a read from a pmpaddr CSP
> + * Handle a read from a pmpaddr CSR
>   */
>  target_ulong pmpaddr_csr_read(CPURISCVState *env, uint32_t addr_index)
>  {
> @@ -593,7 +593,7 @@ target_ulong mseccfg_csr_read(CPURISCVState *env)
>
>  /*
>   * Calculate the TLB size if the start address or the end address of
> - * PMP entry is presented in thie TLB page.
> + * PMP entry is presented in the TLB page.
>   */
>  static target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
>   target_ulong tlb_sa, target_ulong 
> tlb_ea)
> --
> 2.25.1
>
>



Re: [PATCH 1/2] docs/system: riscv: Fix CLINT name in the sifive_u doc

2021-06-27 Thread Alistair Francis
On Mon, Jun 28, 2021 at 12:28 AM Bin Meng  wrote:
>
> It's Core *Local* Interruptor, not 'Level'.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  docs/system/riscv/sifive_u.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/docs/system/riscv/sifive_u.rst b/docs/system/riscv/sifive_u.rst
> index 32d0a1b85d..01108b5ecc 100644
> --- a/docs/system/riscv/sifive_u.rst
> +++ b/docs/system/riscv/sifive_u.rst
> @@ -11,7 +11,7 @@ The ``sifive_u`` machine supports the following devices:
>
>  * 1 E51 / E31 core
>  * Up to 4 U54 / U34 cores
> -* Core Level Interruptor (CLINT)
> +* Core Local Interruptor (CLINT)
>  * Platform-Level Interrupt Controller (PLIC)
>  * Power, Reset, Clock, Interrupt (PRCI)
>  * L2 Loosely Integrated Memory (L2-LIM)
> --
> 2.25.1
>
>



Re: Qemu on Haiku

2021-06-27 Thread Richard Zak
Hopefully last questions:

1) What's the format for the subject line for patches? I'm seeing things
like "[PATCH 2/4]" or "[PATCH v2 00/10]" what do those numbers mean in this
context? I didn't see anything about this mentioned in the SubmitAPatch
wiki.
2) Is it acceptable to have a patch for the configure script, or is that
generated? I found some Haiku-related issues there
3) Is there a way to specify that the patch is for a submodule, or is there
a separate place for that?

Regarding prior email:
Seems like the big tasks are:
1) Haiku VM for continuous integration. Is this hosted in Amazon or other
cloud infrastructure?
2) Resolving issues with Haiku pertaining to testing, bringing it inline
with other OSes (and I see how the disk space error)
3) Supporting aspects of the qemu code relevant to Haiku (found an issue in
slirp & configure script)

Thank you for your help & patience!

În vin., 25 iun. 2021 la 23:03, Warner Losh  a scris:

>
>
> On Fri, Jun 25, 2021 at 8:45 PM Richard Zak 
> wrote:
>
>> Hello and thanks for the detailed response! I wasn't aware that a Linux
>> host could compile for Haiku as a target, that's interesting.
>>
>> Seems like the big tasks are:
>> 1) Haiku VM for continuous integration. Is this hosted in Amazon or other
>> cloud infrastructure?
>>
>
> Take a look at, for example, the make vm-build-freebsd target (see
> tests/vm/Makefile.include). It downloads
> the latest FreeBSD images, boots it with a serial console, walks through
> the install of the base OS, then
> installs the packages needed for qemu to build and kicks off a build and
> runs some acceptance tests
> afterwards. OpenBSD, NetBSD and several Linux distributions have similar
> setups. I think it would be
> useful for there to be one for Haiku as well, so any developer could run
> these tests either in response to
> a bug report in their code, or to make sure things work on/with Haiku. All
> of this is done locally.
>
> There's a separate issue for creating a Haiku runner for gitlab, but I
> know little even about the FreeBSD
> runner.
>
>
>> 2) Supporting aspects of the qemu code relevant to Haiku.
>>
>> I'll take a look at that Wiki page to get a feel for things, and I've
>> started with the compilation of the latest code from the repo on Haiku,
>> addressing some issues as they come up.
>>
>> I am a huge fan of both projects, but also am doing this in my own time.
>> I'm a developer professionally, but working on Haiku & qemu during off
>> hours (though timely shouldn't be a problem). How are things communicated
>> for this project (in regard to your request for someone who can help in a
>> timely manner)? It seems that the vast majority of the mailing list is
>> patch information. What's the primary way for code to be contributed, a
>> merging code though Gitlab or via emailed
>>
> patches?
>>
>
> Emailed patches. https://wiki.qemu.org/Contribute/SubmitAPatch has all
> the details, though the volume of patches means that you really want to
> make sure that you CC the maintainers of the code listed in the MAINTAINERS
> file when submitting patches to help ensure they do not get list.
>
> Warner
>
>
>>
>> În vin., 25 iun. 2021 la 03:09, Thomas Huth  a scris:
>>
>>> On 25/06/2021 06.12, Richard Zak wrote:
>>> > Hello there! I noticed the message which appears when building qemu on
>>> > Haiku. I'd hate for Haiku to lose qemu, so I would like to help!
>>> >
>>> > What is needed in terms of a build system for continuous integration?
>>> I'm
>>> > not familiar with CI systems, other than simply knowing what they do.
>>>
>>>   Hi,
>>>
>>> since a couple of month, we already have a Haiku VM in our VM tests, so
>>> the
>>> basics are already there - it's possible to run a Haiku build test on a
>>> Linux host by typing:
>>>
>>>   make vm-build-haiku.x86_64
>>>
>>> However, it's still in a quite bad shape, the disk image that is used in
>>> that VM is not big enough for compiling the whole QEMU sources. So
>>> somebody
>>> needs to add some additional logic there to either increase the disk
>>> image
>>> on the fly or to add a second free disk image to the VM that could be
>>> used
>>> for compilation instead. If you want to have a try, have a look at:
>>> tests/vm/haiku.x86_64
>>>
>>> Also, I'm not sure whether Peter is using this VM already in his gating
>>> CI
>>> tests? I guess not, due to those size limitations...
>>>
>>> Finally, we'd also need somebody who's proficient with the Haiku APIs
>>> and
>>> who could help with problems in a timely manner, i.e. we'd need an entry
>>> in
>>> the "Hosts" section in the maintainers file. It should be someone who's
>>> basically familiar with the QEMU development process, so if you're
>>> interested, I'd suggest that you try to contribute some patches to QEMU
>>> first to get a basic understanding of the process (see e.g.
>>> https://wiki.qemu.org/Contribute/BiteSizedTasks for some easier tasks),
>>> and
>>> once you feel confident, you could add a Haiku entry to the MAINTAI

Re: [PATCH v8 18/19] arm: Enable Windows 10 trusted SMCCC boot call

2021-06-27 Thread Alexander Graf


On 15.06.21 13:02, Peter Maydell wrote:
> On Wed, 19 May 2021 at 21:23, Alexander Graf  wrote:
>> Windows 10 calls an SMCCC call via SMC unconditionally on boot. It lives
>> in the trusted application call number space, but its purpose is unknown.
>>
>> In our current SMC implementation, we inject a UDEF for unknown SMC calls,
>> including this one. However, Windows breaks on boot when we do this. Instead,
>> let's return an error code.
>>
>> With this and -M virt,virtualization=on I can successfully boot the current
>> Windows 10 Insider Preview in TCG.
>
> Same comments apply here and for patch 19.
>
> Either we can:
>  * find a spec for whatever this SMC ABI is and implement it
>consistently across TCG, KVM and HVF
>  * find whether we're misimplementing whatever the SMCCC spec says
>should happen for unknown SMC calls, and fix that bug
>
> But we're not adding random hacky workarounds for specific guest OSes.


Let's move the conversation to 19/19 then. My take on this is that TCG
is misinterpreting the SMCCC spec.


Alex





Re: [PATCH v8 19/19] hvf: arm: Handle Windows 10 SMC call

2021-06-27 Thread Alexander Graf


On 15.06.21 11:31, Peter Maydell wrote:
> On Wed, 19 May 2021 at 21:23, Alexander Graf  wrote:
>> Windows 10 calls an SMCCC call via SMC unconditionally on boot. It lives
>> in the trusted application call number space, but its purpose is unknown.
>>
>> In our current SMC implementation, we inject a UDEF for unknown SMC calls,
>> including this one. However, Windows breaks on boot when we do this. Instead,
>> let's return an error code.
>>
>> With this patch applied I can successfully boot the current Windows 10
>> Insider Preview in HVF.
>>
>> Signed-off-by: Alexander Graf 
>>
>> ---
>>
>> v7 -> v8:
>>
>>   - fix checkpatch
>> ---
>>  target/arm/hvf/hvf.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
>> index 65c33e2a14..be670af578 100644
>> --- a/target/arm/hvf/hvf.c
>> +++ b/target/arm/hvf/hvf.c
>> @@ -931,6 +931,10 @@ int hvf_vcpu_exec(CPUState *cpu)
>>  cpu_synchronize_state(cpu);
>>  if (!hvf_handle_psci_call(cpu)) {
>>  advance_pc = true;
>> +} else if (env->xregs[0] == QEMU_SMCCC_TC_WINDOWS10_BOOT) {
>> +/* This special SMC is called by Windows 10 on boot. Return 
>> error */
>> +env->xregs[0] = -1;
>> +advance_pc = true;
>>  } else {
>>  trace_hvf_unknown_smc(env->xregs[0]);
>>  hvf_raise_exception(env, EXCP_UDEF, syn_uncategorized());
> Where can I find documentation on what this SMC call is and what
> it's supposed to do ?


It's 0xc301 which according to the SMCCC spec [1] means OR'ed values
of the following:

0x8000 = Fast Call
0x4000 = SMC64
0x0300 = OEM Service Calls
0x0001 = Function number 1

So, uh. I'm not sure how to answer the question above. I don't have
source level access to Windows to read what the call is supposed to do
:). But it's definitely calling something OEM specific that it really
shouldn't be callling.

Reading the SMCCC spec section 5.2, unknown SMCCC calls should return
-1. It advises against probing by just calling them, but does not
specify any other fault behavior than the -1 return (such as the #UDEF
we inject in TCG).


Alex

[1] https://developer.arm.com/documentation/den0028/latest




Re: [PATCH v8 13/19] hvf: Add Apple Silicon support

2021-06-27 Thread Alexander Graf
Hi Peter,

On 15.06.21 12:21, Peter Maydell wrote:
> On Wed, 19 May 2021 at 21:23, Alexander Graf  wrote:
>> With Apple Silicon available to the masses, it's a good time to add support
>> for driving its virtualization extensions from QEMU.
>>
>> This patch adds all necessary architecture specific code to get basic VMs
>> working. It's still pretty raw, but definitely functional.
>>
>> Known limitations:
>>
>>   - Vtimer acknowledgement is hacky
>>   - Should implement more sysregs and fault on invalid ones then
>>   - WFI handling is missing, need to marry it with vtimer
>>
>> Signed-off-by: Alexander Graf 
>> Reviewed-by: Roman Bolshakov 
>> @@ -446,11 +454,17 @@ static void hvf_start_vcpu_thread(CPUState *cpu)
>> cpu, QEMU_THREAD_JOINABLE);
>>  }
>>
>> +__attribute__((weak)) void hvf_kick_vcpu_thread(CPUState *cpu)
>> +{
>> +cpus_kick_thread(cpu);
>> +}
> Why is this marked 'weak' ? If there's a reason for it then
> it ought to have a comment describing the reason. If we can avoid
> it then we should do so -- past experience is that 'weak' refs
> are rather non-portable, though at least this one is in a
> host-OS-specific file.


Mostly because I wanted to keep the kick function in the generic file
for the generic case. ARM is special in that it requires different kick
mechanisms depending on which context we're in (in-vcpu or in-QEMU).

However, I agree that with 2 architectures, there's not really a
"default". I'm happy to move it into the x86 specific file.


>
>> +static void hvf_raise_exception(CPUARMState *env, uint32_t excp,
>> +uint32_t syndrome)
>> +{
>> +unsigned int new_el = 1;
>> +unsigned int old_mode = pstate_read(env);
>> +unsigned int new_mode = aarch64_pstate_mode(new_el, true);
>> +target_ulong addr = env->cp15.vbar_el[new_el];
>> +
>> +env->cp15.esr_el[new_el] = syndrome;
>> +aarch64_save_sp(env, arm_current_el(env));
>> +env->elr_el[new_el] = env->pc;
>> +env->banked_spsr[aarch64_banked_spsr_index(new_el)] = old_mode;
>> +pstate_write(env, PSTATE_DAIF | new_mode);
>> +aarch64_restore_sp(env, new_el);
>> +env->pc = addr;
>> +}
> KVM does "raise an exception" by calling arm_cpu_do_interrupt()
> to do the "set ESR_ELx, save SPSR, etc etc" work (see eg
> kvm_arm_handle_debug()". Does that not work here ?


It works like a charm. I mostly did the dance because I was under the
impression you wanted to avoid me calling into any TCG code. And to me
arm_cpu_do_interrupt() seemed like TCG code. I'm absolutely happy to
change it though. Leaving things to generic code is good IMHO :).


>
>> +
>> +static uint64_t hvf_sysreg_read(CPUState *cpu, uint32_t reg)
>> +{
>> +ARMCPU *arm_cpu = ARM_CPU(cpu);
>> +uint64_t val = 0;
>> +
>> +switch (reg) {
>> +case SYSREG_CNTPCT_EL0:
>> +val = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) /
>> +  gt_cntfrq_period_ns(arm_cpu);
>> +break;
> Does hvf handle the "EL0 access which should be denied because
> CNTKCTL_EL1.EL0PCTEN is 0" case for us, or should we have
> an access-check here ?


A quick test where I tried to access it in a VM in EL0 shows that either
the CPU or HVF already generates the trap. So no check needed.


>
>> +case SYSREG_PMCCNTR_EL0:
>> +val = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> This is supposed to be a cycle counter, not a timestamp...


At 1Ghz cycle frequency, what's the difference? Or are you concerned
about the lack of overflow and PMCR logic?


>
>> +break;
>> +default:
>> +trace_hvf_unhandled_sysreg_read(reg,
>> +(reg >> 20) & 0x3,
>> +(reg >> 14) & 0x7,
>> +(reg >> 10) & 0xf,
>> +(reg >> 1) & 0xf,
>> +(reg >> 17) & 0x7);
>> +break;
>> +}
>> +
>> +return val;
>> +}
>> +
>> +static void hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
>> +{
>> +switch (reg) {
>> +case SYSREG_CNTPCT_EL0:
>> +break;
> CNTPCT_EL0 is read-only (ie writes should fault) but this
> makes it writes-ignored, doesn't it ?


It does indeed, let me fix that.


>
>> +default:
>> +trace_hvf_unhandled_sysreg_write(reg,
>> + (reg >> 20) & 0x3,
>> + (reg >> 14) & 0x7,
>> + (reg >> 10) & 0xf,
>> + (reg >> 1) & 0xf,
>> + (reg >> 17) & 0x7);
>> +break;
>> +}
>> +}
>> +switch (ec) {
>> +case EC_DATAABORT: {
>> +bool isv = syndrome & ARM_EL_ISV;
>> +bool iswrite = (syndrome >> 6) & 1;
>> +bool s1ptw = (syndrome >> 7) & 1;
>> +uint32_t sas = (syndrome >> 22) & 3;
>> +uint32_t len = 1 << sas;
>> 

Re: [PATCH V3 5/6] block/rbd: add write zeroes support

2021-06-27 Thread Peter Lieven



> Am 26.06.2021 um 17:57 schrieb Ilya Dryomov :
> 
> On Mon, Jun 21, 2021 at 10:49 AM Peter Lieven  wrote:
>> 
>>> Am 18.06.21 um 12:34 schrieb Ilya Dryomov:
>>> On Fri, Jun 18, 2021 at 11:00 AM Peter Lieven  wrote:
 Am 16.06.21 um 14:34 schrieb Ilya Dryomov:
> On Wed, May 19, 2021 at 4:28 PM Peter Lieven  wrote:
>> Signed-off-by: Peter Lieven 
>> ---
>>  block/rbd.c | 37 -
>>  1 file changed, 36 insertions(+), 1 deletion(-)
>> 
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 0d8612a988..ee13f08a74 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -63,7 +63,8 @@ typedef enum {
>>  RBD_AIO_READ,
>>  RBD_AIO_WRITE,
>>  RBD_AIO_DISCARD,
>> -RBD_AIO_FLUSH
>> +RBD_AIO_FLUSH,
>> +RBD_AIO_WRITE_ZEROES
>>  } RBDAIOCmd;
>> 
>>  typedef struct BDRVRBDState {
>> @@ -705,6 +706,10 @@ static int qemu_rbd_open(BlockDriverState *bs, 
>> QDict *options, int flags,
>>  }
>>  }
>> 
>> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
>> +bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
> I wonder if we should also set BDRV_REQ_NO_FALLBACK here since librbd
> does not really have a notion of non-efficient explicit zeroing.
 
 This is only true if thick provisioning is supported which is in Octopus 
 onwards, right?
>>> Since Pacific, I think.
>>> 
 So it would only be correct to set this if thick provisioning is supported 
 otherwise we could
 
 fail with ENOTSUP and then qemu emulates the zeroing with plain writes.
>>> I actually had a question about that.  Why are you returning ENOTSUP
>>> in case BDRV_REQ_MAY_UNMAP is not specified and that can't be fulfilled
>>> because librbd is too old for RBD_WRITE_ZEROES_FLAG_THICK_PROVISION?
>>> 
>>> My understanding has always been that BDRV_REQ_MAY_UNMAP is just
>>> a hint.  Deallocating if BDRV_REQ_MAY_UNMAP is specified is not nice
>>> but should be perfectly acceptable.  It is certainly better than
>>> returning ENOTSUP, particularly if ENOTSUP causes Qemu to do plain
>>> zeroing.
>> 
>> 
>> I think this was introduced to support different provisioning modes. If 
>> BDRV_REQ_MAY_UNMAP is not set
>> 
>> the caller of bdrv_write_zeroes expects that the driver does thick 
>> provisioning. If the driver cannot handle that (efficiently)
>> 
>> qemu does a plain zero write.
>> 
>> 
>> I am still not fully understanding the meaning of the BDRV_REQ_NO_FALLBACK 
>> flag. The original commit states that it was introduced for qemu-img to 
>> efficiently
>> 
>> zero out the target and avoid the slow fallback. When I last worked on 
>> qemu-img convert I remember that there was a call to zero out the target if 
>> bdrv_has_zero_init
>> 
>> is not 1. It seems hat meanwhile a target_is_zero cmdline switch for 
>> qemu-img convert was added to let the user assert that a preexisting target 
>> is zero.
>> 
>> Maybe someone can help here if it would be right to set BDRV_REQ_NO_FALLBACK 
>> for rbd in either of the 2 cases (thick provisioning is support or not)?
> 
> Since no one spoke up I think we should
> 
> a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
>   and as a consequence always unmap if librbd is too old
> 
>   It's not clear what qemu's expectation is but in general Write
>   Zeroes is allowed to unmap.  The only guarantee is that subsequent
>   reads return zeroes, everything else is a hint.  This is how it is
>   specified in the kernel and in the NVMe spec.
> 
>   In particular, block/nvme.c implements it as follows:
> 
>   if (flags & BDRV_REQ_MAY_UNMAP) {
>   cdw12 |= (1 << 25);
>   }
> 
>   This sets the Deallocate bit.  But if it's not set, the device may
>   still deallocate:
> 
>   """
>   If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
>   command, and the namespace supports clearing all bytes to 0h in the
>   values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
>   from a deallocated logical block and its metadata (excluding
>   protection information), then for each specified logical block, the
>   controller:
>   - should deallocate that logical block;
> 
>   ...
> 
>   If the Deallocate bit is cleared to '0' in a Write Zeroes command,
>   and the namespace supports clearing all bytes to 0h in the values
>   read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
>   a deallocated logical block and its metadata (excluding protection
>   information), then, for each specified logical block, the
>   controller:
>   - may deallocate that logical block;
>   """
> 
>   
> https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf
> 
> b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
> 
>   Again, it's not clear what qemu expects here, but without it we end
>   up in a ridiculous situation where specifying the "don't allow slow
>   fallback" sw

[Bug 1897568] Re: Strange keyboard behaviour in Vim editor

2021-06-27 Thread felix
Can someone explain why the patch keeps the incorrect behaviour as the
default? It’s silly.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1897568

Title:
  Strange keyboard behaviour in Vim editor

Status in QEMU:
  Fix Committed

Bug description:
  
  I'm running MS-DOS 7.10 in a QEMU virtual machine, and there is a problem 
with the keyboard in the Vim editor.  The arrow keys jump over a line, as if 
you had typed the key twice.  PgUp and PgDn are likewise affected.  Other 
applications are not affected, unless you shell out from Vim.

  The QEMU version is 5.0.0, and I'm using the "-k sv" option, but I've
  tried without it and it doesn't make a difference.

  I don't get this keyboard behaviour in the exact same VM under VMware
  Player or Bochs.

  -Albert.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1897568/+subscriptions



[PATCH 0/4] ppc/Pegasos2: Firmware replacement using VOF

2021-06-27 Thread BALATON Zoltan
Based-on: <20210625055155.2252896-1-...@ozlabs.ru>
^ That is v22 of Alexey's VOF patch

With this series on top of VOF v22 I can now boot Linux and MorphOS on
pegasos2 without a firmware blob so I hope this is enough to get this
board in 6.1 and also have it enabled so people can start using it
eventually (a lot of people don't compile their QEMU but rely on
binaries from distros and other sources). Provided that VOF will also
be merged by then. This gives VOF another use case that may help it
getting merged at last.

Further info and example command lines can be found at
https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2

Regards,
BALATON Zoltan

BALATON Zoltan (4):
  ppc/pegasos2: Introduce Pegasos2MachineState structure
  target/ppc: Allow virtual hypervisor on CPU without HV
  ppc/pegasos2: Use Virtual Open Firmware as firmware replacement
  ppc/pegasos2: Implement some RTAS functions with VOF

 default-configs/devices/ppc-softmmu.mak |   2 +-
 hw/ppc/Kconfig  |   1 +
 hw/ppc/pegasos2.c   | 783 +++-
 target/ppc/cpu.c|   2 +-
 4 files changed, 771 insertions(+), 17 deletions(-)

-- 
2.21.4




[PATCH 1/4] ppc/pegasos2: Introduce Pegasos2MachineState structure

2021-06-27 Thread BALATON Zoltan
Add own machine state structure which will be used to store state
needed for firmware emulation.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/ppc/pegasos2.c | 50 +++
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 0bfd0928aa..07971175c9 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -1,7 +1,7 @@
 /*
  * QEMU PowerPC CHRP (Genesi/bPlan Pegasos II) hardware System Emulator
  *
- * Copyright (c) 2018-2020 BALATON Zoltan
+ * Copyright (c) 2018-2021 BALATON Zoltan
  *
  * This work is licensed under the GNU GPL license version 2 or later.
  *
@@ -41,6 +41,15 @@
 
 #define BUS_FREQ_HZ 1
 
+#define TYPE_PEGASOS2_MACHINE  MACHINE_TYPE_NAME("pegasos2")
+OBJECT_DECLARE_TYPE(Pegasos2MachineState, MachineClass, PEGASOS2_MACHINE)
+
+struct Pegasos2MachineState {
+MachineState parent_obj;
+PowerPCCPU *cpu;
+DeviceState *mv;
+};
+
 static void pegasos2_cpu_reset(void *opaque)
 {
 PowerPCCPU *cpu = opaque;
@@ -51,9 +60,9 @@ static void pegasos2_cpu_reset(void *opaque)
 
 static void pegasos2_init(MachineState *machine)
 {
-PowerPCCPU *cpu = NULL;
+Pegasos2MachineState *pm = PEGASOS2_MACHINE(machine);
+CPUPPCState *env;
 MemoryRegion *rom = g_new(MemoryRegion, 1);
-DeviceState *mv;
 PCIBus *pci_bus;
 PCIDevice *dev;
 I2CBus *i2c_bus;
@@ -63,15 +72,16 @@ static void pegasos2_init(MachineState *machine)
 uint8_t *spd_data;
 
 /* init CPU */
-cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
-if (PPC_INPUT(&cpu->env) != PPC_FLAGS_INPUT_6xx) {
+pm->cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
+env = &pm->cpu->env;
+if (PPC_INPUT(env) != PPC_FLAGS_INPUT_6xx) {
 error_report("Incompatible CPU, only 6xx bus supported");
 exit(1);
 }
 
 /* Set time-base frequency */
-cpu_ppc_tb_init(&cpu->env, BUS_FREQ_HZ / 4);
-qemu_register_reset(pegasos2_cpu_reset, cpu);
+cpu_ppc_tb_init(env, BUS_FREQ_HZ / 4);
+qemu_register_reset(pegasos2_cpu_reset, pm->cpu);
 
 /* RAM */
 memory_region_add_subregion(get_system_memory(), 0, machine->ram);
@@ -96,16 +106,16 @@ static void pegasos2_init(MachineState *machine)
 g_free(filename);
 
 /* Marvell Discovery II system controller */
-mv = DEVICE(sysbus_create_simple(TYPE_MV64361, -1,
-((qemu_irq *)cpu->env.irq_inputs)[PPC6xx_INPUT_INT]));
-pci_bus = mv64361_get_pci_bus(mv, 1);
+pm->mv = DEVICE(sysbus_create_simple(TYPE_MV64361, -1,
+ ((qemu_irq *)env->irq_inputs)[PPC6xx_INPUT_INT]));
+pci_bus = mv64361_get_pci_bus(pm->mv, 1);
 
 /* VIA VT8231 South Bridge (multifunction PCI device) */
 /* VT8231 function 0: PCI-to-ISA Bridge */
 dev = pci_create_simple_multifunction(pci_bus, PCI_DEVFN(12, 0), true,
   TYPE_VT8231_ISA);
 qdev_connect_gpio_out(DEVICE(dev), 0,
-  qdev_get_gpio_in_named(mv, "gpp", 31));
+  qdev_get_gpio_in_named(pm->mv, "gpp", 31));
 
 /* VT8231 function 1: IDE Controller */
 dev = pci_create_simple(pci_bus, PCI_DEVFN(12, 1), "via-ide");
@@ -129,8 +139,10 @@ static void pegasos2_init(MachineState *machine)
 pci_vga_init(pci_bus);
 }
 
-static void pegasos2_machine(MachineClass *mc)
+static void pegasos2_machine_class_init(ObjectClass *oc, void *data)
 {
+MachineClass *mc = MACHINE_CLASS(oc);
+
 mc->desc = "Genesi/bPlan Pegasos II";
 mc->init = pegasos2_init;
 mc->block_default_type = IF_IDE;
@@ -141,4 +153,16 @@ static void pegasos2_machine(MachineClass *mc)
 mc->default_ram_size = 512 * MiB;
 }
 
-DEFINE_MACHINE("pegasos2", pegasos2_machine)
+static const TypeInfo pegasos2_machine_info = {
+.name  = TYPE_PEGASOS2_MACHINE,
+.parent= TYPE_MACHINE,
+.class_init= pegasos2_machine_class_init,
+.instance_size = sizeof(Pegasos2MachineState),
+};
+
+static void pegasos2_machine_register_types(void)
+{
+type_register_static(&pegasos2_machine_info);
+}
+
+type_init(pegasos2_machine_register_types)
-- 
2.21.4




[PATCH 4/4] ppc/pegasos2: Implement some RTAS functions with VOF

2021-06-27 Thread BALATON Zoltan
Linux uses RTAS functions to access PCI devices so we need to provide
these with VOF. Implement some of the most important functions to
allow booting Linux with VOF. With this the board is now usable
without a binary ROM image and we can enable it by default as other
boards.

Signed-off-by: BALATON Zoltan 
---
 default-configs/devices/ppc-softmmu.mak |   2 +-
 hw/ppc/pegasos2.c   | 131 
 2 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/default-configs/devices/ppc-softmmu.mak 
b/default-configs/devices/ppc-softmmu.mak
index c2d41198cd..4535993d8d 100644
--- a/default-configs/devices/ppc-softmmu.mak
+++ b/default-configs/devices/ppc-softmmu.mak
@@ -14,7 +14,7 @@ CONFIG_SAM460EX=y
 CONFIG_MAC_OLDWORLD=y
 CONFIG_MAC_NEWWORLD=y
 
-CONFIG_PEGASOS2=n
+CONFIG_PEGASOS2=y
 
 # For PReP
 CONFIG_PREP=y
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index f1741a4512..d482806edd 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -43,6 +43,7 @@
 #define PROM_SIZE 0x8
 
 #define KVMPPC_HCALL_BASE0xf000
+#define KVMPPC_H_RTAS(KVMPPC_HCALL_BASE + 0x0)
 #define KVMPPC_H_VOF_CLIENT  (KVMPPC_HCALL_BASE + 0x5)
 
 #define H_SUCCESS 0
@@ -195,6 +196,30 @@ static void pegasos2_init(MachineState *machine)
 }
 }
 
+static uint32_t pegasos2_pci_config_read(AddressSpace *as, int bus,
+ uint32_t addr, uint32_t len)
+{
+hwaddr pcicfg = (bus ? 0xf1000c78 : 0xf1000cf8);
+uint32_t val = 0x;
+
+stl_le_phys(as, pcicfg, addr | BIT(31));
+switch (len) {
+case 4:
+val = ldl_le_phys(as, pcicfg + 4);
+break;
+case 2:
+val = lduw_le_phys(as, pcicfg + 4);
+break;
+case 1:
+val = ldub_phys(as, pcicfg + 4);
+break;
+default:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid length\n", __func__);
+break;
+}
+return val;
+}
+
 static void pegasos2_pci_config_write(AddressSpace *as, int bus, uint32_t addr,
   uint32_t len, uint32_t val)
 {
@@ -304,6 +329,87 @@ static void pegasos2_machine_reset(MachineState *machine)
 pm->cpu->vhyp = PPC_VIRTUAL_HYPERVISOR(machine);
 }
 
+enum pegasos2_rtas_tokens {
+RTAS_RESTART_RTAS = 0,
+RTAS_NVRAM_FETCH = 1,
+RTAS_NVRAM_STORE = 2,
+RTAS_GET_TIME_OF_DAY = 3,
+RTAS_SET_TIME_OF_DAY = 4,
+RTAS_EVENT_SCAN = 6,
+RTAS_CHECK_EXCEPTION = 7,
+RTAS_READ_PCI_CONFIG = 8,
+RTAS_WRITE_PCI_CONFIG = 9,
+RTAS_DISPLAY_CHARACTER = 10,
+RTAS_SET_INDICATOR = 11,
+RTAS_POWER_OFF = 17,
+RTAS_SUSPEND = 18,
+RTAS_HIBERNATE = 19,
+RTAS_SYSTEM_REBOOT = 20,
+};
+
+static target_ulong pegasos2_rtas(PowerPCCPU *cpu, Pegasos2MachineState *pm,
+  target_ulong args_real)
+{
+AddressSpace *as = CPU(cpu)->as;
+uint32_t token = ldl_be_phys(as, args_real);
+uint32_t nargs = ldl_be_phys(as, args_real + 4);
+uint32_t nrets = ldl_be_phys(as, args_real + 8);
+uint32_t args = args_real + 12;
+uint32_t rets = args_real + 12 + nargs * 4;
+
+if (nrets < 1) {
+qemu_log_mask(LOG_GUEST_ERROR, "Too few return values in RTAS call\n");
+return H_PARAMETER;
+}
+switch (token) {
+case RTAS_READ_PCI_CONFIG:
+{
+uint32_t addr, len, val;
+
+if (nargs != 2 || nrets != 2) {
+stl_be_phys(as, rets, -1);
+return H_PARAMETER;
+}
+addr = ldl_be_phys(as, args);
+len = ldl_be_phys(as, args + 4);
+val = pegasos2_pci_config_read(as, !(addr >> 24),
+   addr & 0x0fff, len);
+stl_be_phys(as, rets, 0);
+stl_be_phys(as, rets + 4, val);
+return H_SUCCESS;
+}
+case RTAS_WRITE_PCI_CONFIG:
+{
+uint32_t addr, len, val;
+
+if (nargs != 3 || nrets != 1) {
+stl_be_phys(as, rets, -1);
+return H_PARAMETER;
+}
+addr = ldl_be_phys(as, args);
+len = ldl_be_phys(as, args + 4);
+val = ldl_be_phys(as, args + 8);
+pegasos2_pci_config_write(as, !(addr >> 24),
+  addr & 0x0fff, len, val);
+stl_be_phys(as, rets, 0);
+return H_SUCCESS;
+}
+case RTAS_DISPLAY_CHARACTER:
+if (nargs != 1 || nrets != 1) {
+stl_be_phys(as, rets, -1);
+return H_PARAMETER;
+}
+qemu_log_mask(LOG_UNIMP, "%c", ldl_be_phys(as, args));
+stl_be_phys(as, rets, 0);
+return H_SUCCESS;
+default:
+qemu_log_mask(LOG_UNIMP, "Unknown RTAS token %u (args=%u, rets=%u)\n",
+  token, nargs, nrets);
+stl_be_phys(as, rets, 0);
+return H_SUCCESS;
+}
+}
+
 static void pegasos2_hypercall(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
 {
 Pegasos2MachineState *pm = PEGASOS2_MACHINE(vhyp);
@@ -315,6 +421,8 @@ static void pegasos2_

[PATCH 3/4] ppc/pegasos2: Use Virtual Open Firmware as firmware replacement

2021-06-27 Thread BALATON Zoltan
The pegasos2 board comes with an Open Firmware compliant ROM based on
SmartFirmware but it has some changes that are not open source
therefore the ROM binary cannot be included in QEMU. Guests running on
the board however depend on services provided by the firmware. The
Virtual Open Firmware recently added to QEMU implements a minimal set
of these services to allow some guests to boot without the original
firmware. This patch adds VOF as the default firmware for pegasos2
which allows booting Linux and MorphOS via -kernel option while a ROM
image can still be used with -bios for guests that don't run with VOF.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/Kconfig|   1 +
 hw/ppc/pegasos2.c | 602 +-
 2 files changed, 601 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 67630f80e1..7fcafec60a 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -76,6 +76,7 @@ config PEGASOS2
 select VT82C686
 select IDE_VIA
 select SMBUS_EEPROM
+select VOF
 # This should come with VT82C686
 select ACPI_X86
 
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 07971175c9..f1741a4512 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -34,13 +34,33 @@
 #include "trace.h"
 #include "qemu/datadir.h"
 #include "sysemu/device_tree.h"
+#include "hw/ppc/vof.h"
 
-#define PROM_FILENAME "pegasos2.rom"
+#include 
+
+#define PROM_FILENAME "vof.bin"
 #define PROM_ADDR 0xfff0
 #define PROM_SIZE 0x8
 
+#define KVMPPC_HCALL_BASE0xf000
+#define KVMPPC_H_VOF_CLIENT  (KVMPPC_HCALL_BASE + 0x5)
+
+#define H_SUCCESS 0
+#define H_PRIVILEGE  -3  /* Caller not privileged */
+#define H_PARAMETER  -4  /* Parameter invalid, out-of-range or conflicting */
+
 #define BUS_FREQ_HZ 1
 
+#define PCI0_MEM_BASE 0xc000
+#define PCI0_MEM_SIZE 0x2000
+#define PCI0_IO_BASE  0xf800
+#define PCI0_IO_SIZE  0x1
+
+#define PCI1_MEM_BASE 0x8000
+#define PCI1_MEM_SIZE 0x4000
+#define PCI1_IO_BASE  0xfe00
+#define PCI1_IO_SIZE  0x1
+
 #define TYPE_PEGASOS2_MACHINE  MACHINE_TYPE_NAME("pegasos2")
 OBJECT_DECLARE_TYPE(Pegasos2MachineState, MachineClass, PEGASOS2_MACHINE)
 
@@ -48,14 +68,26 @@ struct Pegasos2MachineState {
 MachineState parent_obj;
 PowerPCCPU *cpu;
 DeviceState *mv;
+Vof *vof;
+void *fdt_blob;
+uint64_t kernel_addr;
+uint64_t kernel_entry;
+uint64_t kernel_size;
 };
 
+static void *build_fdt(MachineState *machine, int *fdt_size);
+
 static void pegasos2_cpu_reset(void *opaque)
 {
 PowerPCCPU *cpu = opaque;
+Pegasos2MachineState *pm = PEGASOS2_MACHINE(current_machine);
 
 cpu_reset(CPU(cpu));
 cpu->env.spr[SPR_HID1] = 7ULL << 28;
+if (pm->vof) {
+cpu->env.gpr[1] = 2 * VOF_STACK_SIZE - 0x20;
+cpu->env.nip = 0x100;
+}
 }
 
 static void pegasos2_init(MachineState *machine)
@@ -92,18 +124,24 @@ static void pegasos2_init(MachineState *machine)
 error_report("Could not find firmware '%s'", fwname);
 exit(1);
 }
+if (!machine->firmware && !pm->vof) {
+pm->vof = g_malloc0(sizeof(*pm->vof));
+}
 memory_region_init_rom(rom, NULL, "pegasos2.rom", PROM_SIZE, &error_fatal);
 memory_region_add_subregion(get_system_memory(), PROM_ADDR, rom);
 sz = load_elf(filename, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 1,
   PPC_ELF_MACHINE, 0, 0);
 if (sz <= 0) {
-sz = load_image_targphys(filename, PROM_ADDR, PROM_SIZE);
+sz = load_image_targphys(filename, pm->vof ? 0 : PROM_ADDR, PROM_SIZE);
 }
 if (sz <= 0 || sz > PROM_SIZE) {
 error_report("Could not load firmware '%s'", filename);
 exit(1);
 }
 g_free(filename);
+if (pm->vof) {
+pm->vof->fw_size = sz;
+}
 
 /* Marvell Discovery II system controller */
 pm->mv = DEVICE(sysbus_create_simple(TYPE_MV64361, -1,
@@ -137,20 +175,185 @@ static void pegasos2_init(MachineState *machine)
 
 /* other PC hardware */
 pci_vga_init(pci_bus);
+
+if (machine->kernel_filename) {
+sz = load_elf(machine->kernel_filename, NULL, NULL, NULL,
+  &pm->kernel_entry, &pm->kernel_addr, NULL, NULL, 1,
+  PPC_ELF_MACHINE, 0, 0);
+if (sz <= 0) {
+error_report("Could not load kernel '%s'",
+ machine->kernel_filename);
+exit(1);
+}
+pm->kernel_size = sz;
+if (!pm->vof) {
+warn_report("Option -kernel may be ineffective with -bios.");
+}
+}
+if (machine->kernel_cmdline && !pm->vof) {
+warn_report("Option -append may be ineffective with -bios.");
+}
+}
+
+static void pegasos2_pci_config_write(AddressSpace *as, int bus, uint32_t addr,
+  uint32_t len, uint32_t val)
+{
+hwaddr pcicfg = (bus ? 0xf1000c78 : 0xf1000cf8);
+
+stl_le_phys(as, pcicfg, addr | BIT(31));
+switch

Re: [PATCH qemu v22] spapr: Implement Open Firmware client interface

2021-06-27 Thread BALATON Zoltan

On Fri, 25 Jun 2021, Alexey Kardashevskiy wrote:

The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..1 - stack
40.. - kernel
3ea.. - initramdisk

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is much faster booting time which especially
crucial with fully emulated early CPU bring up environments. Also this
may come handy when/if GRUB-in-the-userspace sees light of the day.

This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This assumes potential support for booting from QEMU backends
such as blockdev or netdev without devices/drivers used.

Signed-off-by: Alexey Kardashevskiy 


Reviewed-by: BALATON Zoltan 

Regards,
BALATON Zoltan


---

The example command line is:

/home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 8G \
-machine 
pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
 \
-kernel pbuild/kernel-le-guest/vmlinux \
-initrd pb/rootfs.cpio.xz \
-drive 
id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
-global spapr-nvram.drive=DRIVE0 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
-mon chardev=SOCKET0,mode=control

---
Changes:
v22:
* dropped changes to ./configure and compile VOF always
* added various comments
* style fixes

v21:
* s/ld/lwz/ in entry.S
* moved CONFIG_VOF from default-configs/devices/ppc64-softmmu.mak to Kconfig
* made CONFIG_VOF optional
* s/l.lds/vof.lds/
* force 32 BE in spapr_machine_reset() instead of the firmware
* added checks for non-null methods of VofMachineIfClass
* moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a better comment
* added  path_offset wrapper for handling mixed case for addresses
after "@" in node names
* changed getprop() to check for actual "name" property in the fdt
* moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as (unlike similar
rtas_ld/ldl_be_*) they return error codes
* VOF_MEM_READ uses now address_space_read (it was address_space_read_full
before, not sure why)

v20:
* compile vof.bin with -mcpu=power4 for better compatibility
* s/std/stw/ in entry.S to make it work on ppc32
* fixed dt_available property to support both 32 and 64bit
* shuffled prom_args handling code
* do not e

[PATCH 2/4] target/ppc: Allow virtual hypervisor on CPU without HV

2021-06-27 Thread BALATON Zoltan
Change the assert in ppc_store_sdr1() to allow vhyp to be set on CPUs
without HV bit. This allows using the vhyp interface for firmware
emulation on pegasos2.

Signed-off-by: BALATON Zoltan 
---
 target/ppc/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index 19d67b5b07..a29299882a 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -72,7 +72,7 @@ void ppc_store_sdr1(CPUPPCState *env, target_ulong value)
 {
 PowerPCCPU *cpu = env_archcpu(env);
 qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value);
-assert(!cpu->vhyp);
+assert(!cpu->env.has_hv_mode || !cpu->vhyp);
 #if defined(TARGET_PPC64)
 if (mmu_is_64bit(env->mmu_model)) {
 target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE;
-- 
2.21.4




[PATCH v10 4/6] migration/dirtyrate: adjust order of registering thread

2021-06-27 Thread huangy81
From: Hyman Huang(黄勇) 

registering get_dirtyrate thread in advance so that both
page-sampling and dirty-ring mode can be covered.

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: 

Reviewed-by: Peter Xu 
---
 migration/dirtyrate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index e0a27a9..a9bdd60 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -352,7 +352,6 @@ static void calculate_dirtyrate(struct DirtyRateConfig 
config)
 int64_t msec = 0;
 int64_t initial_time;
 
-rcu_register_thread();
 rcu_read_lock();
 initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 if (!record_ramblock_hash_info(&block_dinfo, config, &block_count)) {
@@ -375,7 +374,6 @@ static void calculate_dirtyrate(struct DirtyRateConfig 
config)
 out:
 rcu_read_unlock();
 free_ramblock_dirty_info(block_dinfo, block_count);
-rcu_unregister_thread();
 }
 
 void *get_dirtyrate_thread(void *arg)
@@ -383,6 +381,7 @@ void *get_dirtyrate_thread(void *arg)
 struct DirtyRateConfig config = *(struct DirtyRateConfig *)arg;
 int ret;
 int64_t start_time;
+rcu_register_thread();
 
 ret = dirtyrate_set_state(&CalculatingState, DIRTY_RATE_STATUS_UNSTARTED,
   DIRTY_RATE_STATUS_MEASURING);
@@ -401,6 +400,8 @@ void *get_dirtyrate_thread(void *arg)
 if (ret == -1) {
 error_report("change dirtyrate state failed.");
 }
+
+rcu_unregister_thread();
 return NULL;
 }
 
-- 
1.8.3.1




[PATCH v10 6/6] migration/dirtyrate: implement dirty-ring dirtyrate calculation

2021-06-27 Thread huangy81
From: Hyman Huang(黄勇) 

use dirty ring feature to implement dirtyrate calculation.

introduce mode option in qmp calc_dirty_rate to specify what
method should be used when calculating dirtyrate, either
page-sampling or dirty-ring should be passed.

introduce "dirty_ring:-r" option in hmp calc_dirty_rate to
indicate dirty ring method should be used for calculation.

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: 
<7db445109bd18125ce8ec86816d14f6ab5de6a7d.1624040308.git.huang...@chinatelecom.cn>
Reviewed-by: Peter Xu 
---
 hmp-commands.hx|   7 +-
 migration/dirtyrate.c  | 207 ++---
 migration/trace-events |   2 +
 qapi/migration.json|  16 +++-
 4 files changed, 217 insertions(+), 15 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8e45bce..f7fc9d7 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1738,8 +1738,9 @@ ERST
 
 {
 .name   = "calc_dirty_rate",
-.args_type  = "second:l,sample_pages_per_GB:l?",
-.params = "second [sample_pages_per_GB]",
-.help   = "start a round of guest dirty rate measurement",
+.args_type  = "dirty_ring:-r,second:l,sample_pages_per_GB:l?",
+.params = "[-r] second [sample_pages_per_GB]",
+.help   = "start a round of guest dirty rate measurement (using -d 
to"
+  "\n\t\t\t specify dirty ring as the method of 
calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index b8f61cc..3c8c5e2 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -16,6 +16,7 @@
 #include "cpu.h"
 #include "exec/ramblock.h"
 #include "qemu/rcu_queue.h"
+#include "qemu/main-loop.h"
 #include "qapi/qapi-commands-migration.h"
 #include "ram.h"
 #include "trace.h"
@@ -23,9 +24,19 @@
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qdict.h"
+#include "sysemu/kvm.h"
+#include "sysemu/runstate.h"
+#include "exec/memory.h"
+
+typedef struct DirtyPageRecord {
+uint64_t start_pages;
+uint64_t end_pages;
+} DirtyPageRecord;
 
 static int CalculatingState = DIRTY_RATE_STATUS_UNSTARTED;
 static struct DirtyRateStat DirtyStat;
+static DirtyRateMeasureMode dirtyrate_mode =
+DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
 static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 {
@@ -70,18 +81,37 @@ static int dirtyrate_set_state(int *state, int old_state, 
int new_state)
 
 static struct DirtyRateInfo *query_dirty_rate_info(void)
 {
+int i;
 int64_t dirty_rate = DirtyStat.dirty_rate;
 struct DirtyRateInfo *info = g_malloc0(sizeof(DirtyRateInfo));
-
-if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
-info->has_dirty_rate = true;
-info->dirty_rate = dirty_rate;
-}
+DirtyRateVcpuList *head = NULL, **tail = &head;
 
 info->status = CalculatingState;
 info->start_time = DirtyStat.start_time;
 info->calc_time = DirtyStat.calc_time;
 info->sample_pages = DirtyStat.sample_pages;
+info->mode = dirtyrate_mode;
+
+if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
+info->has_dirty_rate = true;
+info->dirty_rate = dirty_rate;
+
+if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_DIRTY_RING) {
+/*
+ * set sample_pages with 0 to indicate page sampling
+ * isn't enabled
+ **/
+info->sample_pages = 0;
+info->has_vcpu_dirty_rate = true;
+for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
+DirtyRateVcpu *rate = g_malloc0(sizeof(DirtyRateVcpu));
+rate->id = DirtyStat.dirty_ring.rates[i].id;
+rate->dirty_rate = DirtyStat.dirty_ring.rates[i].dirty_rate;
+QAPI_LIST_APPEND(tail, rate);
+}
+info->vcpu_dirty_rate = head;
+}
+}
 
 trace_query_dirty_rate_info(DirtyRateStatus_str(CalculatingState));
 
@@ -111,6 +141,15 @@ static void init_dirtyrate_stat(int64_t start_time,
 }
 }
 
+static void cleanup_dirtyrate_stat(struct DirtyRateConfig config)
+{
+/* last calc-dirty-rate qmp use dirty ring mode */
+if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_DIRTY_RING) {
+free(DirtyStat.dirty_ring.rates);
+DirtyStat.dirty_ring.rates = NULL;
+}
+}
+
 static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
 {
 DirtyStat.page_sampling.total_dirty_samples += info->sample_dirty_count;
@@ -345,7 +384,96 @@ static bool compare_page_hash_info(struct 
RamblockDirtyInfo *info,
 return true;
 }
 
-static void calculate_dirtyrate(struct DirtyRateConfig config)
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty

[PATCH v10 0/6] support dirtyrate at the granualrity of vcpu

2021-06-27 Thread huangy81
From: Hyman Huang(黄勇) 

v10
- rebase on master

- pre-check if dirty log has started when calling
  memory_global_dirty_log_stop in the ram_save_cleanup.
  since it will stop dirty log unconditionally, so add if statement
  to ensure that memory_global_dirty_log_start/stop used in pairs.

- modify the memory_global_dirty_log_start/log in xen vitualization
  and make the qemu works in the old way.

v9
- rebase on master

- make global_dirty_tracking a bitmask:
  pass the casted "flags" to the callback in the right way, and drop
  the malloc/free step.

- make bitmask comments more accurate

please review, thanks.

v8
- drop the "none" type of DirtyRateMeasureMode

- make global_dirty_tracking a bitmask:
  1. code clean: pass the casted "flags" as input parameter in
 memory_global_dirty_log_stop, and then drop the free step.
  2. squash commit "rename global_dirty_log to global_dirty_tracking"
 into commit "make global_dirty_tracking a bitmask"
  3. move "GLOBAL_DIRTY_MASK" macro to the bits's definations.

- drop the "cleanup_dirtyrate_stat" in commit
  "move init step of calculation to main thread" so that each commit
  keeps the old way working.

- add dirty rate unit "MB/s" in the output of hmp

please review, may be this is the last version of this patchset, :)
thanks for Peter's patient instructions and reviews.

Hyman Huang(黄勇)

v7
- fix the code style problem, sorry about that

v6:
- pick up commit "KVM: introduce dirty_pages and kvm_dirty_ring_enabled"
  which has been dropped in verison 5

v5:
- rename global_dirty_log to global_dirty_tracking on Peter's advice

- make global_dirty_tracking a bitmask:
  1. add assert statement to ensure starting dirty tracking repeatly
 not allowed.
  2. add assert statement to ensure dirty tracking cannot be stopped
 without having been started.

- protecting dirty rate stat info:
  1. drop the mutext for protecting dirty rate introduced in version 4
  2. change the code block in query_dirty_rate_info so that requirements
 of "safe racing" to the dirty rate stat can be meet

- make the helper function "record_dirtypages" inline and change
  the global var dirty_pages  to local var

- free DirtyRateVcpuList in case of memory leak

please review, thanks a lot.

v4:
- make global_dirty_log a bitmask:
  1. add comments about dirty log bitmask
  2. use assert statement to check validity of flags
  3. add trace to log bitmask changes

- introduce mode option to show what method calculation should be used,
  also, export mode option in the as last commmit

- split cleanup and init of dirty rate stat and move it in the main
  thread

- change the fields of DirtyPageRecord to uint64_t type so that we
  can calculation the increased dirty pages with the formula
  as Peter's advice: dirty pages = end_pages - start_pages

- introduce mutex to protect dirty rate stat info

- adjust order of registering thread

- drop the memory free callback

this version modify some code on Peter's advice, reference to:
https://lore.kernel.org/qemu-devel/YL5nNYXmrqMlXF3v@t490s/

thanks again.

v3:
- pick up "migration/dirtyrate: make sample page count configurable" to
  make patchset apply master correctly

v2:
- rebase to "migration/dirtyrate: make sample page count configurable"

- rename "vcpu" to "per_vcpu" to show the per-vcpu method

- squash patch 5/6 into a single one, squash patch 1/2 also

- pick up "hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds"

- make global_dirty_log a bitmask to make sure both migration and dirty
  could not intefer with each other

- add memory free callback to prevent memory leaking

the most different of v2 fron v1 is that we make the global_dirty_log a
bitmask. the reason is dirty rate measurement may start or stop dirty
logging during calculation. this conflict with migration because stop
dirty log make migration leave dirty pages out then that'll be a
problem.

make global_dirty_log a bitmask can let both migration and dirty
rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and
GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty log aims
for, migration or dirty rate.

all references to global_dirty_log should be untouched because any bit
set there should justify that global dirty logging is enabled.

Please review, thanks !

v1:

Since the Dirty Ring on QEMU part has been merged recently, how to use
this feature is under consideration.

In the scene of migration, it is valuable to provide a more accurante
interface to track dirty memory than existing one, so that the upper
layer application can make a wise decision, or whatever. More
importantly,
dirtyrate info at the granualrity of vcpu could provide a possibility to
make migration convergent by imposing restriction on vcpu. With Dirty
Ring, we can calculate dirtyrate efficiently and cheaply.

The old interface implemented by sampling pages, it consumes cpu
resource, and the larger guest memory size become, the more cpu resource
it consumes, namely, hard to sc

[PATCH v10 5/6] migration/dirtyrate: move init step of calculation to main thread

2021-06-27 Thread huangy81
From: Hyman Huang(黄勇) 

since main thread may "query dirty rate" at any time, it's better
to move init step into main thead so that synchronization overhead
between "main" and "get_dirtyrate" can be reduced.

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: 
<109f8077518ed2f13068e3bfb10e625e964780f1.1624040308.git.huang...@chinatelecom.cn>
Reviewed-by: Peter Xu 
---
 migration/dirtyrate.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index a9bdd60..b8f61cc 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -380,7 +380,6 @@ void *get_dirtyrate_thread(void *arg)
 {
 struct DirtyRateConfig config = *(struct DirtyRateConfig *)arg;
 int ret;
-int64_t start_time;
 rcu_register_thread();
 
 ret = dirtyrate_set_state(&CalculatingState, DIRTY_RATE_STATUS_UNSTARTED,
@@ -390,9 +389,6 @@ void *get_dirtyrate_thread(void *arg)
 return NULL;
 }
 
-start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
-init_dirtyrate_stat(start_time, config);
-
 calculate_dirtyrate(config);
 
 ret = dirtyrate_set_state(&CalculatingState, DIRTY_RATE_STATUS_MEASURING,
@@ -411,6 +407,7 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 static struct DirtyRateConfig config;
 QemuThread thread;
 int ret;
+int64_t start_time;
 
 /*
  * If the dirty rate is already being measured, don't attempt to start.
@@ -451,6 +448,10 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 config.sample_period_seconds = calc_time;
 config.sample_pages_per_gigabytes = sample_pages;
 config.mode = DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
+
+start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+init_dirtyrate_stat(start_time, config);
+
 qemu_thread_create(&thread, "get_dirtyrate", get_dirtyrate_thread,
(void *)&config, QEMU_THREAD_DETACHED);
 }
-- 
1.8.3.1




[PATCH v10 3/6] migration/dirtyrate: introduce struct and adjust DirtyRateStat

2021-06-27 Thread huangy81
From: Hyman Huang(黄勇) 

introduce "DirtyRateMeasureMode" to specify what method should be
used to calculate dirty rate, introduce "DirtyRateVcpu" to store
dirty rate fore each vcpu.

use union to store stat data of specific mode

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: 
<661c98c40f40e163aa58334337af8f3ddf41316a.1624040308.git.huang...@chinatelecom.cn>
Reviewed-by: Peter Xu 
---
 migration/dirtyrate.c | 48 
 migration/dirtyrate.h | 19 ---
 qapi/migration.json   | 30 ++
 3 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 320c56b..e0a27a9 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -88,33 +88,44 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 return info;
 }
 
-static void init_dirtyrate_stat(int64_t start_time, int64_t calc_time,
-uint64_t sample_pages)
+static void init_dirtyrate_stat(int64_t start_time,
+struct DirtyRateConfig config)
 {
-DirtyStat.total_dirty_samples = 0;
-DirtyStat.total_sample_count = 0;
-DirtyStat.total_block_mem_MB = 0;
 DirtyStat.dirty_rate = -1;
 DirtyStat.start_time = start_time;
-DirtyStat.calc_time = calc_time;
-DirtyStat.sample_pages = sample_pages;
+DirtyStat.calc_time = config.sample_period_seconds;
+DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
+
+switch (config.mode) {
+case DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING:
+DirtyStat.page_sampling.total_dirty_samples = 0;
+DirtyStat.page_sampling.total_sample_count = 0;
+DirtyStat.page_sampling.total_block_mem_MB = 0;
+break;
+case DIRTY_RATE_MEASURE_MODE_DIRTY_RING:
+DirtyStat.dirty_ring.nvcpu = -1;
+DirtyStat.dirty_ring.rates = NULL;
+break;
+default:
+break;
+}
 }
 
 static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
 {
-DirtyStat.total_dirty_samples += info->sample_dirty_count;
-DirtyStat.total_sample_count += info->sample_pages_count;
+DirtyStat.page_sampling.total_dirty_samples += info->sample_dirty_count;
+DirtyStat.page_sampling.total_sample_count += info->sample_pages_count;
 /* size of total pages in MB */
-DirtyStat.total_block_mem_MB += (info->ramblock_pages *
- TARGET_PAGE_SIZE) >> 20;
+DirtyStat.page_sampling.total_block_mem_MB += (info->ramblock_pages *
+   TARGET_PAGE_SIZE) >> 20;
 }
 
 static void update_dirtyrate(uint64_t msec)
 {
 uint64_t dirtyrate;
-uint64_t total_dirty_samples = DirtyStat.total_dirty_samples;
-uint64_t total_sample_count = DirtyStat.total_sample_count;
-uint64_t total_block_mem_MB = DirtyStat.total_block_mem_MB;
+uint64_t total_dirty_samples = DirtyStat.page_sampling.total_dirty_samples;
+uint64_t total_sample_count = DirtyStat.page_sampling.total_sample_count;
+uint64_t total_block_mem_MB = DirtyStat.page_sampling.total_block_mem_MB;
 
 dirtyrate = total_dirty_samples * total_block_mem_MB *
 1000 / (total_sample_count * msec);
@@ -327,7 +338,7 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo 
*info,
 update_dirtyrate_stat(block_dinfo);
 }
 
-if (DirtyStat.total_sample_count == 0) {
+if (DirtyStat.page_sampling.total_sample_count == 0) {
 return false;
 }
 
@@ -372,8 +383,6 @@ void *get_dirtyrate_thread(void *arg)
 struct DirtyRateConfig config = *(struct DirtyRateConfig *)arg;
 int ret;
 int64_t start_time;
-int64_t calc_time;
-uint64_t sample_pages;
 
 ret = dirtyrate_set_state(&CalculatingState, DIRTY_RATE_STATUS_UNSTARTED,
   DIRTY_RATE_STATUS_MEASURING);
@@ -383,9 +392,7 @@ void *get_dirtyrate_thread(void *arg)
 }
 
 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
-calc_time = config.sample_period_seconds;
-sample_pages = config.sample_pages_per_gigabytes;
-init_dirtyrate_stat(start_time, calc_time, sample_pages);
+init_dirtyrate_stat(start_time, config);
 
 calculate_dirtyrate(config);
 
@@ -442,6 +449,7 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 
 config.sample_period_seconds = calc_time;
 config.sample_pages_per_gigabytes = sample_pages;
+config.mode = DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 qemu_thread_create(&thread, "get_dirtyrate", get_dirtyrate_thread,
(void *)&config, QEMU_THREAD_DETACHED);
 }
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index e1fd290..69d4c5b 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -43,6 +43,7 @@
 struct DirtyRateConfig {
 uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
 int64_t sample_period_seconds; /* time duration between two samplin

[PATCH v10 1/6] KVM: introduce dirty_pages and kvm_dirty_ring_enabled

2021-06-27 Thread huangy81
From: Hyman Huang(黄勇) 

dirty_pages is used to calculate dirtyrate via dirty ring, when
enabled, kvm-reaper will increase the dirty pages after gfns
being dirtied.

kvm_dirty_ring_enabled shows if kvm-reaper is working. dirtyrate
thread could use it to check if measurement can base on dirty
ring feature.

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: 

Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c   | 7 +++
 include/hw/core/cpu.h | 1 +
 include/sysemu/kvm.h  | 1 +
 3 files changed, 9 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c7ec538..bc012f0 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -469,6 +469,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_fd = ret;
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
+cpu->dirty_pages = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -743,6 +744,7 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, 
CPUState *cpu)
 count++;
 }
 cpu->kvm_fetch_index = fetch;
+cpu->dirty_pages += count;
 
 return count;
 }
@@ -2293,6 +2295,11 @@ bool kvm_vcpu_id_is_valid(int vcpu_id)
 return vcpu_id >= 0 && vcpu_id < kvm_max_vcpu_id(s);
 }
 
+bool kvm_dirty_ring_enabled(void)
+{
+return kvm_state->kvm_dirty_ring_size ? true : false;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 4e0ea68..80fcb1d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -374,6 +374,7 @@ struct CPUState {
 struct kvm_run *kvm_run;
 struct kvm_dirty_gfn *kvm_dirty_gfns;
 uint32_t kvm_fetch_index;
+uint64_t dirty_pages;
 
 /* Used for events with 'vcpu' and *without* the 'disabled' properties */
 DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee..7b22aeb 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -547,4 +547,5 @@ bool kvm_cpu_check_are_resettable(void);
 
 bool kvm_arch_cpu_check_are_resettable(void);
 
+bool kvm_dirty_ring_enabled(void);
 #endif
-- 
1.8.3.1




[PATCH v10 2/6] memory: make global_dirty_tracking a bitmask

2021-06-27 Thread huangy81
From: Hyman Huang(黄勇) 

since dirty ring has been introduced, there are two methods
to track dirty pages of vm. it seems that "logging" has
a hint on the method, so rename the global_dirty_log to
global_dirty_tracking would make description more accurate.

dirty rate measurement may start or stop dirty tracking during
calculation. this conflict with migration because stop dirty
tracking make migration leave dirty pages out then that'll be
a problem.

make global_dirty_tracking a bitmask can let both migration and
dirty rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION
and GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty
tracking aims for, migration or dirty rate.

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: 
<9c9388657cfa0301bd2c1cfa36e7cf6da4aeca19.1624040308.git.huang...@chinatelecom.cn>
Reviewed-by: Peter Xu 
---
 hw/i386/xen/xen-hvm.c   |  4 ++--
 include/exec/memory.h   | 20 +---
 include/exec/ram_addr.h |  4 ++--
 migration/ram.c | 15 +++
 softmmu/memory.c| 32 +---
 softmmu/trace-events|  1 +
 6 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 9b43277..d836d48 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1611,8 +1611,8 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t 
length)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 if (enable) {
-memory_global_dirty_log_start();
+memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
 } else {
-memory_global_dirty_log_stop();
+memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
 }
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index b116f7c..0ad0fe0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -55,7 +55,17 @@ static inline void fuzz_dma_read_cb(size_t addr,
 }
 #endif
 
-extern bool global_dirty_log;
+/* Possible bits for global_dirty_log_{start|stop} */
+
+/* Dirty tracking enabled because migration is running */
+#define GLOBAL_DIRTY_MIGRATION  (1U << 0)
+
+/* Dirty tracking enabled because measuring dirty rate */
+#define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
+
+#define GLOBAL_DIRTY_MASK  (0x3)
+
+extern unsigned int global_dirty_tracking;
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 
@@ -2105,13 +2115,17 @@ void memory_listener_unregister(MemoryListener 
*listener);
 
 /**
  * memory_global_dirty_log_start: begin dirty logging for all regions
+ *
+ * @flags: purpose of starting dirty log, migration or dirty rate
  */
-void memory_global_dirty_log_start(void);
+void memory_global_dirty_log_start(unsigned int flags);
 
 /**
  * memory_global_dirty_log_stop: end dirty logging for all regions
+ *
+ * @flags: purpose of stopping dirty log, migration or dirty rate
  */
-void memory_global_dirty_log_stop(void);
+void memory_global_dirty_log_stop(unsigned int flags);
 
 void mtree_info(bool flatview, bool dispatch_tree, bool owner, bool disabled);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 551876b..45c9132 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -369,7 +369,7 @@ static inline void 
cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 
 qatomic_or(&blocks[DIRTY_MEMORY_VGA][idx][offset], temp);
 
-if (global_dirty_log) {
+if (global_dirty_tracking) {
 qatomic_or(
 &blocks[DIRTY_MEMORY_MIGRATION][idx][offset],
 temp);
@@ -392,7 +392,7 @@ static inline void 
cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 } else {
 uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : 
DIRTY_CLIENTS_NOCODE;
 
-if (!global_dirty_log) {
+if (!global_dirty_tracking) {
 clients &= ~(1 << DIRTY_MEMORY_MIGRATION);
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 723af67..33d201f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2190,7 +2190,14 @@ static void ram_save_cleanup(void *opaque)
 /* caller have hold iothread lock or is in a bh, so there is
  * no writing race against the migration bitmap
  */
-memory_global_dirty_log_stop();
+if (global_dirty_tracking & GLOBAL_DIRTY_MIGRATION) {
+/*
+ * do not stop dirty log without starting it, since
+ * memory_global_dirty_log_stop will assert that
+ * memory_global_dirty_log_start/stop used in pairs
+ */
+memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
+}
 }
 
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
@@ -2652,7 +2659,7 @@ static void ram_init_bitmaps(RAMState *rs)
 ram_list_init_bitmaps();
 /* We don't use dirty log with background snapshots */
 if (!migrate_background_snapshot()) {
-memory_global_

Re: [RFC PATCH 00/11] RISC-V: support clic v0.9 specification

2021-06-27 Thread Frank Chang
LIU Zhiwei  於 2021年4月9日 週五 下午3:58寫道:

> This patch set gives an implementation of "RISC-V Core-Local Interrupt
> Controller(CLIC) Version 0.9-draft-20210217". It comes from [1], where
> you can find the pdf format or the source code.
>
> I take over the job from Michael Clark, who gave the first implementation
> of clic-v0.7 specification. If there is any copyright question, please
> let me know.
>
> Features:
> 1. support four kinds of trigger types.
> 2. Preserve the CSR WARL/WPRI semantics.
> 3. Option to select different modes, such as M/S/U or M/U.
> 4. At most 4096 interrupts.
> 5. At most 1024 apertures.
>
> Todo:
> 1. Encode the interrupt trigger information to exccode.
> 2. Support complete CSR mclicbase when its number is fixed.
> 3. Gave each aperture an independend address.
>
> It have passed my qtest case and freertos test. Welcome to have a try
> for your hardware.
>
> Any advice is welcomed. Thanks very much.
>
> Zhiwei
>
> [1] specification website: https://github.com/riscv/riscv-fast-interrupt.
> [2] Michael Clark origin work:
> https://github.com/sifive/riscv-qemu/tree/sifive-clic.
>
>
> LIU Zhiwei (11):
>   target/riscv: Add CLIC CSR mintstatus
>   target/riscv: Update CSR xintthresh in CLIC mode
>   hw/intc: Add CLIC device
>   target/riscv: Update CSR xie in CLIC mode
>   target/riscv: Update CSR xip in CLIC mode
>   target/riscv: Update CSR xtvec in CLIC mode
>   target/riscv: Update CSR xtvt in CLIC mode
>   target/riscv: Update CSR xnxti in CLIC mode
>   target/riscv: Update CSR mclicbase in CLIC mode
>   target/riscv: Update interrupt handling in CLIC mode
>   target/riscv: Update interrupt return in CLIC mode
>
>  default-configs/devices/riscv32-softmmu.mak |   1 +
>  default-configs/devices/riscv64-softmmu.mak |   1 +
>  hw/intc/Kconfig |   3 +
>  hw/intc/meson.build |   1 +
>  hw/intc/riscv_clic.c| 836 
>  include/hw/intc/riscv_clic.h| 103 +++
>  target/riscv/cpu.h  |   9 +
>  target/riscv/cpu_bits.h |  32 +
>  target/riscv/cpu_helper.c   | 117 ++-
>  target/riscv/csr.c  | 247 +-
>  target/riscv/op_helper.c|  25 +
>  11 files changed, 1363 insertions(+), 12 deletions(-)
>  create mode 100644 hw/intc/riscv_clic.c
>  create mode 100644 include/hw/intc/riscv_clic.h
>
> --
> 2.25.1
>
>
>
After reviewing this patchset.
I found that CLIC v0.8 spec is quite incomplete.
It lacks all S-mode related CSRs.

If you think that it's just the v0.8 spec issue for not covering
all the required S-mode related CSRs -- and we should include them
in CLIC v0.8 implementation even it's not documented explicitly.
You can just ignore my comments in regard to S-mode CSRs for v0.8 CLIC.
(Besides mintthresh, v0.8 spec does say that it holds the 8-bit field
interrupt thresholds for each privilege mode.)

Regards,
Frank Chang


Re: [RFC PATCH 10/11] target/riscv: Update interrupt handling in CLIC mode

2021-06-27 Thread Frank Chang
LIU Zhiwei  於 2021年4月9日 週五 下午3:54寫道:

> Decode CLIC interrupt information from exccode, includes interrupt
> priviledge mode, interrupt level, and irq number.
>
> Then update CSRs xcause, xstatus, xepc, xintstatus and jump to
> correct PC according to the CLIC specification.
>
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/cpu_bits.h   |   1 +
>  target/riscv/cpu_helper.c | 117 +++---
>  2 files changed, 111 insertions(+), 7 deletions(-)
>
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 494e41edc9..d8378d2384 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -557,6 +557,7 @@
>  #define RISCV_EXCP_VIRT_INSTRUCTION_FAULT0x16
>  #define RISCV_EXCP_STORE_GUEST_AMO_ACCESS_FAULT  0x17
>
> +#define RISCV_EXCP_INT_CLIC0x4000
>  #define RISCV_EXCP_INT_FLAG0x8000
>  #define RISCV_EXCP_INT_MASK0x7fff
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 21c54ef561..998d1a2742 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -26,6 +26,10 @@
>  #include "trace.h"
>  #include "semihosting/common-semi.h"
>
> +#if !defined(CONFIG_USER_ONLY)
> +#include "hw/intc/riscv_clic.h"
> +#endif
> +
>  int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
>  {
>  #ifdef CONFIG_USER_ONLY
> @@ -36,6 +40,20 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
>  }
>
>  #ifndef CONFIG_USER_ONLY
> +static int riscv_cpu_local_irq_mode_enabled(CPURISCVState *env, int mode)
> +{
> +switch (mode) {
> +case PRV_M:
> +return env->priv < PRV_M ||
> +   (env->priv == PRV_M && get_field(env->mstatus,
> MSTATUS_MIE));
> +case PRV_S:
> +return env->priv < PRV_S ||
> +   (env->priv == PRV_S && get_field(env->mstatus,
> MSTATUS_SIE));
> +default:
> +return false;
> +}
> +}
> +
>  static int riscv_cpu_local_irq_pending(CPURISCVState *env)
>  {
>  target_ulong irqs;
> @@ -90,6 +108,18 @@ bool riscv_cpu_exec_interrupt(CPUState *cs, int
> interrupt_request)
>  return true;
>  }
>  }
> +if (interrupt_request & CPU_INTERRUPT_CLIC) {
> +RISCVCPU *cpu = RISCV_CPU(cs);
> +CPURISCVState *env = &cpu->env;
> +int mode = (env->exccode >> 12) & 0b11;
> +int enabled = riscv_cpu_local_irq_mode_enabled(env, mode);
> +if (enabled) {
> +cs->exception_index = RISCV_EXCP_INT_CLIC | env->exccode;
> +cs->interrupt_request = cs->interrupt_request &
> ~CPU_INTERRUPT_CLIC;
> +riscv_cpu_do_interrupt(cs);
> +return true;
> +}
> +}
>  #endif
>  return false;
>  }
> @@ -884,6 +914,55 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address,
> int size,
>  #endif
>  }
>
> +
> +#if !defined(CONFIG_USER_ONLY)
> +static target_ulong riscv_intr_pc(CPURISCVState *env, target_ulong tvec,
> +  target_ulong tvt, bool async, bool clic,
> +  int cause, int mode)
> +{
> +int mode1 = tvec & 0b11, mode2 = tvec & 0b11;
> +CPUState *cs = env_cpu(env);
> +
> +if (!(async || clic)) {
> +return tvec & ~0b11;
>

In CLIC mode, synchronous exception traps always jump to NBASE.


> +}
> +/* bits [1:0] encode mode; 0 = direct, 1 = vectored, 2 >= reserved */
> +switch (mode1) {
> +case 0b00:
> +return tvec & ~0b11;
> +case 0b01:
> +return (tvec & ~0b11) + cause * 4;
> +default:
> +if (env->clic && (mode2 == 0b11)) {
> +/* Non-vectored, clicintattr[i].shv = 0 || cliccfg.nvbits = 0
> */
> +if (!riscv_clic_shv_interrupt(env->clic, mode, cs->cpu_index,
> +  cause)) {
> +/* NBASE = mtvec[XLEN-1:6]<<6 */
> +return tvec & ~0b11;
> +} else {
> +/*
> + * pc := M[TBASE + XLEN/8 * exccode)] & ~1,
> + * TBASE = mtvt[XLEN-1:6]<<6
> + */
> +int size = TARGET_LONG_BITS / 8;
> +target_ulong tbase = (tvt & ~0b11) + size * cause;
> +void *host = tlb_vaddr_to_host(env, tbase, MMU_DATA_LOAD,
> mode);
>

According to spec:
  For permissions-checking purposes, the memory access to retrieve the
  function pointer for vectoring is treated as a load with the privilege
mode
  and interrupt level of the interrupt handler. If there is an access
exception
  on the table load, xepc holds the faulting address. If this was a page
fault,
  the table load can be resumed by returning with xepc pointing to the
  table entry and the trap handler mode bit set.

So retreiving vector function pointer may raise an exception,
but tlb_vaddr_to_host() cannot reflect that.
We also need to include hardware vectoring bit: xinhv
to allow resumable traps on fetches to the trap

Re: [PATCH] target/riscv: pmp: Fix some typos

2021-06-27 Thread Philippe Mathieu-Daudé
On 6/27/21 1:57 PM, Bin Meng wrote:
> %s/CSP/CSR
> %s/thie/the
> 
> Signed-off-by: Bin Meng 
> ---
> 
>  target/riscv/pmp.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



[PATCH 2/2] docs/system: riscv: Add documentation for virt machine

2021-06-27 Thread Bin Meng
This adds detailed documentation for RISC-V `virt` machine,
including the following information:

  - Supported devices
  - Hardware configuration information
  - Boot options
  - Running Linux kernel
  - Running U-Boot

Signed-off-by: Bin Meng 
---

 docs/system/riscv/virt.rst   | 138 +++
 docs/system/target-riscv.rst |   1 +
 2 files changed, 139 insertions(+)
 create mode 100644 docs/system/riscv/virt.rst

diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
new file mode 100644
index 00..3709f05797
--- /dev/null
+++ b/docs/system/riscv/virt.rst
@@ -0,0 +1,138 @@
+'virt' Generic Virtual Platform (``virt``)
+==
+
+The `virt` board is a platform which does not correspond to any real hardware;
+it is designed for use in virtual machines. It is the recommended board type
+if you simply want to run a guest such as Linux and do not care about
+reproducing the idiosyncrasies and limitations of a particular bit of
+real-world hardware.
+
+Supported devices
+-
+
+The ``virt`` machine supports the following devices:
+
+* Up to 8 generic RV32GC/RV64GC cores, with optional extensions
+* Core Local Interruptor (CLINT)
+* Platform-Level Interrupt Controller (PLIC)
+* CFI parallel NOR flash memory
+* 1 NS16550 compatible UART
+* 1 Google Goldfish RTC
+* 1 SiFive Test device
+* 8 virtio-mmio transport devices
+* 1 generic PCIe host bridge
+* The fw_cfg device that allows a guest to obtain data from QEMU
+
+Note that the default CPU is a generic RV32GC/RV64GC. Optional extensions
+can be enabled via command line parameters, e.g.: ``-cpu rv64,x-h=true``
+enables the hypervisor extension for RV64.
+
+Hardware configuration information
+--
+
+The ``virt`` machine automatically generates a device tree blob ("dtb")
+which it passes to the guest, if there is no ``-dtb`` option. This provides
+information about the addresses, interrupt lines and other configuration of
+the various devices in the system. Guest software should discover the devices
+that are present in the generated DTB.
+
+If users want to provide their own DTB, they can use the ``-dtb`` option.
+These DTBs should have the following requirements:
+
+* The number of subnodes of the /cpus node should match QEMU's ``-smp`` option
+* The /memory reg size should match QEMU’s selected ram_size via ``-m``
+* Should contain a node for the CLINT device with a compatible string
+  "riscv,clint0" if using with OpenSBI BIOS images
+
+Boot options
+
+
+The ``virt`` machine can start using the standard -kernel functionality
+for loading a Linux kernel, a VxWorks kernel, an S-mode U-Boot bootloader
+with the default OpenSBI firmware image as the -bios. It also supports
+the recommended RISC-V bootflow: U-Boot SPL (M-mode) loads OpenSBI fw_dynamic
+firmware and U-Boot proper (S-mode), using the standard -bios functionality.
+
+Running Linux kernel
+
+
+Linux mainline v5.12 release is tested at the time of writing. To build a
+Linux mainline kernel that can be booted by the ``virt`` machine in
+64-bit mode, simply configure the kernel using the defconfig configuration:
+
+.. code-block:: bash
+
+  $ export ARCH=riscv
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ make defconfig
+  $ make
+
+To boot the newly built Linux kernel in QEMU with the ``virt`` machine:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M virt -smp 4 -m 2G \
+  -display none -serial stdio \
+  -kernel arch/riscv/boot/Image \
+  -initrd /path/to/rootfs.cpio \
+  -append "root=/dev/ram"
+
+To build a Linux mainline kernel that can be booted by the ``virt`` machine
+in 32-bit mode, use the rv32_defconfig configuration. A patch is required to
+fix the 32-bit boot issue for Linux kernel v5.12.
+
+.. code-block:: bash
+
+  $ export ARCH=riscv
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ curl 
https://patchwork.kernel.org/project/linux-riscv/patch/20210627135117.28641-1-bmeng...@gmail.com/mbox/
 > riscv.patch
+  $ git am riscv.patch
+  $ make rv32_defconfig
+  $ make
+
+Replace ``qemu-system-riscv64`` with ``qemu-system-riscv32`` in the command
+line above to boot the 32-bit Linux kernel. A rootfs image containing 32-bit
+applications shall be used in order for kernel to boot to user space.
+
+Running U-Boot
+--
+
+U-Boot mainline v2021.04 release is tested at the time of writing. To build an
+S-mode U-Boot bootloader that can be booted by the ``virt`` machine, use
+the qemu-riscv64_smode_defconfig with similar commands as described above for 
Linux:
+
+.. code-block:: bash
+
+  $ export CROSS_COMPILE=riscv64-linux-
+  $ make qemu-riscv64_smode_defconfig
+
+Boot the 64-bit U-Boot S-mode image directly:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M virt -smp 4 -m 2G \
+  -display none -serial stdio \
+  -kernel /path/to/u-boot.bin
+
+To test booting U-Boot SPL which in M-mode, which in turn load

[PATCH 1/2] docs/system: riscv: Fix CLINT name in the sifive_u doc

2021-06-27 Thread Bin Meng
It's Core *Local* Interruptor, not 'Level'.

Signed-off-by: Bin Meng 
---

 docs/system/riscv/sifive_u.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/system/riscv/sifive_u.rst b/docs/system/riscv/sifive_u.rst
index 32d0a1b85d..01108b5ecc 100644
--- a/docs/system/riscv/sifive_u.rst
+++ b/docs/system/riscv/sifive_u.rst
@@ -11,7 +11,7 @@ The ``sifive_u`` machine supports the following devices:
 
 * 1 E51 / E31 core
 * Up to 4 U54 / U34 cores
-* Core Level Interruptor (CLINT)
+* Core Local Interruptor (CLINT)
 * Platform-Level Interrupt Controller (PLIC)
 * Power, Reset, Clock, Interrupt (PRCI)
 * L2 Loosely Integrated Memory (L2-LIM)
-- 
2.25.1




Re: [PATCH v2] block/rbd: Add support for rbd image encryption

2021-06-27 Thread Ilya Dryomov
On Sun, Jun 27, 2021 at 1:46 PM Or Ozeri  wrote:
>
> Starting from ceph Pacific, RBD has built-in support for image-level 
> encryption.
> Currently supported formats are LUKS version 1 and 2.
>
> There are 2 new relevant librbd APIs for controlling encryption, both expect 
> an
> open image context:
>
> rbd_encryption_format: formats an image (i.e. writes the LUKS header)
> rbd_encryption_load: loads encryptor/decryptor to the image IO stack
>
> This commit extends the qemu rbd driver API to support the above.
>
> Signed-off-by: Or Ozeri 
> ---
> v2: handle encryption info only for the case where encryption is not loaded
> ---
>  block/rbd.c  | 361 ++-
>  qapi/block-core.json | 110 -
>  2 files changed, 465 insertions(+), 6 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index f098a89c7b..8edd1be49e 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -73,6 +73,18 @@
>  #define LIBRBD_USE_IOVEC 0
>  #endif
>
> +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
> +
> +static const char rbd_luks_header_verification[
> +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
> +};
> +
> +static const char rbd_luks2_header_verification[
> +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
> +};
> +
>  typedef enum {
>  RBD_AIO_READ,
>  RBD_AIO_WRITE,
> @@ -341,6 +353,203 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
>  }
>  }
>
> +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> +static int qemu_rbd_convert_luks_options(
> +RbdEncryptionOptionsLUKSBase *luks_opts,
> +char **passphrase,
> +size_t *passphrase_len,
> +Error **errp)
> +{
> +return qcrypto_secret_lookup(luks_opts->key_secret, (uint8_t 
> **)passphrase,
> + passphrase_len, errp);
> +}
> +
> +static int qemu_rbd_convert_luks_create_options(
> +RbdEncryptionCreateOptionsLUKSBase *luks_opts,
> +rbd_encryption_algorithm_t *alg,
> +char **passphrase,
> +size_t *passphrase_len,
> +Error **errp)
> +{
> +int r = 0;
> +
> +r = qemu_rbd_convert_luks_options(
> +qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
> +passphrase, passphrase_len, errp);
> +if (r < 0) {
> +return r;
> +}
> +
> +if (luks_opts->has_cipher_alg) {
> +switch (luks_opts->cipher_alg) {
> +case QCRYPTO_CIPHER_ALG_AES_128: {
> +*alg = RBD_ENCRYPTION_ALGORITHM_AES128;
> +break;
> +}
> +case QCRYPTO_CIPHER_ALG_AES_256: {
> +*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
> +break;
> +}
> +default: {
> +r = -ENOTSUP;
> +error_setg_errno(errp, -r, "unknown encryption algorithm: 
> %u",
> + luks_opts->cipher_alg);
> +return r;
> +}
> +}
> +} else {
> +/* default alg */
> +*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
> +}
> +
> +return 0;
> +}
> +
> +static int qemu_rbd_encryption_format(rbd_image_t image,
> +  RbdEncryptionCreateOptions *encrypt,
> +  Error **errp)
> +{
> +int r = 0;
> +g_autofree char *passphrase = NULL;
> +size_t passphrase_len;
> +rbd_encryption_format_t format;
> +rbd_encryption_options_t opts;
> +rbd_encryption_luks1_format_options_t luks_opts;
> +rbd_encryption_luks2_format_options_t luks2_opts;
> +size_t opts_size;
> +uint64_t raw_size, effective_size;
> +
> +r = rbd_get_size(image, &raw_size);
> +if (r < 0) {
> +error_setg_errno(errp, -r, "cannot get raw image size");
> +return r;
> +}
> +
> +switch (encrypt->format) {
> +case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
> +memset(&luks_opts, 0, sizeof(luks_opts));
> +format = RBD_ENCRYPTION_FORMAT_LUKS1;
> +opts = &luks_opts;
> +opts_size = sizeof(luks_opts);
> +r = qemu_rbd_convert_luks_create_options(
> +
> qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
> +&luks_opts.alg, &passphrase, &passphrase_len, errp);
> +if (r < 0) {
> +return r;
> +}
> +luks_opts.passphrase = passphrase;
> +luks_opts.passphrase_size = passphrase_len;
> +break;
> +}
> +case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
> +memset(&luks2_opts, 0, sizeof(luks2_opts));
> +format = RBD_ENCRYPTION_FORMAT_LUKS2;
> +opts = &luks2_opts;
> +opts_size = sizeof(luks2_opts);
> +r = qemu_rbd_convert_luks_create_options(
> +qapi_RbdEncryptionCreateOptionsLUKS2_base(
> 

Re: [RFC PATCH 11/11] target/riscv: Update interrupt return in CLIC mode

2021-06-27 Thread Frank Chang
LIU Zhiwei  於 2021年4月9日 週五 下午3:55寫道:

> When a vectored interrupt is selected and serviced, the hardware will
> automatically clear the corresponding pending bit in edge-triggered mode.
> This may lead to a lower priviledge interrupt pending forever.
>
> Therefore when interrupts return, pull a pending interrupt to service.
>
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/op_helper.c | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
> index 1eddcb94de..42563b22ba 100644
> --- a/target/riscv/op_helper.c
> +++ b/target/riscv/op_helper.c
> @@ -24,6 +24,10 @@
>  #include "exec/exec-all.h"
>  #include "exec/helper-proto.h"
>
> +#if !defined(CONFIG_USER_ONLY)
> +#include "hw/intc/riscv_clic.h"
> +#endif
> +
>  /* Exceptions processing helpers */
>  void QEMU_NORETURN riscv_raise_exception(CPURISCVState *env,
>uint32_t exception, uintptr_t
> pc)
> @@ -130,6 +134,17 @@ target_ulong helper_sret(CPURISCVState *env,
> target_ulong cpu_pc_deb)
>  mstatus = set_field(mstatus, MSTATUS_SPIE, 1);
>  mstatus = set_field(mstatus, MSTATUS_SPP, PRV_U);
>  env->mstatus = mstatus;
> +
> +if (riscv_clic_is_clic_mode(env)) {
> +CPUState *cs = env_cpu(env);
> +target_ulong spil = get_field(env->scause, SCAUSE_SPIL);
> +env->mintstatus = set_field(env->mintstatus, MINTSTATUS_SIL,
> spil);
> +env->scause = set_field(env->scause, SCAUSE_SPIE, 0);
>

Should scause.spie set to 1?


> +env->scause = set_field(env->scause, SCAUSE_SPP, PRV_U);
> +qemu_mutex_lock_iothread();
> +riscv_clic_get_next_interrupt(env->clic, cs->cpu_index);
> +qemu_mutex_unlock_iothread();
> +}
>  }
>
>  riscv_cpu_set_mode(env, prev_priv);
> @@ -172,6 +187,16 @@ target_ulong helper_mret(CPURISCVState *env,
> target_ulong cpu_pc_deb)
>  riscv_cpu_set_virt_enabled(env, prev_virt);
>  }
>
> +if (riscv_clic_is_clic_mode(env)) {
> +CPUState *cs = env_cpu(env);
> +target_ulong mpil = get_field(env->mcause, MCAUSE_MPIL);
> +env->mintstatus = set_field(env->mintstatus, MINTSTATUS_MIL,
> mpil);
> +env->mcause = set_field(env->mcause, MCAUSE_MPIE, 0);
>

Should mcause.mpie set to 1?
  The xcause.xpp and xcause.xpie fields are modified following the behavior
  previously defined for xstatus.xpp and xstatus.xpie respectively.

RISC-V Privilege spec:
  When executing an xRET instruction, xPIE is set to 1.

Regards,
Frank Chang


> +env->mcause = set_field(env->mcause, MCAUSE_MPP, PRV_U);
> +qemu_mutex_lock_iothread();
> +riscv_clic_get_next_interrupt(env->clic, cs->cpu_index);
> +qemu_mutex_unlock_iothread();
> +}
>  return retpc;
>  }
>
> --
> 2.25.1
>
>
>


[PATCH v2] target/riscv: csr: Remove redundant check in fp csr read/write routines

2021-06-27 Thread Bin Meng
The following check:

if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
return -RISCV_EXCP_ILLEGAL_INST;
}

is redundant in fflags/frm/fcsr read/write routines, as the check was
already done in fs().

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v2:
- rebase on qemu/master

 target/riscv/csr.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index fe5628fea6..62b968326c 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -215,11 +215,6 @@ static RISCVException epmp(CPURISCVState *env, int csrno)
 static RISCVException read_fflags(CPURISCVState *env, int csrno,
   target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-#endif
 *val = riscv_cpu_get_fflags(env);
 return RISCV_EXCP_NONE;
 }
@@ -228,9 +223,6 @@ static RISCVException write_fflags(CPURISCVState *env, int 
csrno,
target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 riscv_cpu_set_fflags(env, val & (FSR_AEXC >> FSR_AEXC_SHIFT));
@@ -240,11 +232,6 @@ static RISCVException write_fflags(CPURISCVState *env, int 
csrno,
 static RISCVException read_frm(CPURISCVState *env, int csrno,
target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-#endif
 *val = env->frm;
 return RISCV_EXCP_NONE;
 }
@@ -253,9 +240,6 @@ static RISCVException write_frm(CPURISCVState *env, int 
csrno,
 target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = val & (FSR_RD >> FSR_RD_SHIFT);
@@ -265,11 +249,6 @@ static RISCVException write_frm(CPURISCVState *env, int 
csrno,
 static RISCVException read_fcsr(CPURISCVState *env, int csrno,
 target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-#endif
 *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
 | (env->frm << FSR_RD_SHIFT);
 if (vs(env, csrno) >= 0) {
@@ -283,9 +262,6 @@ static RISCVException write_fcsr(CPURISCVState *env, int 
csrno,
  target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
-- 
2.25.1




[PATCH] target/riscv: pmp: Fix some typos

2021-06-27 Thread Bin Meng
%s/CSP/CSR
%s/thie/the

Signed-off-by: Bin Meng 
---

 target/riscv/pmp.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 82ed020b10..54abf42583 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -456,7 +456,7 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
 }
 
 /*
- * Handle a write to a pmpcfg CSP
+ * Handle a write to a pmpcfg CSR
  */
 void pmpcfg_csr_write(CPURISCVState *env, uint32_t reg_index,
 target_ulong val)
@@ -483,7 +483,7 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 
 
 /*
- * Handle a read from a pmpcfg CSP
+ * Handle a read from a pmpcfg CSR
  */
 target_ulong pmpcfg_csr_read(CPURISCVState *env, uint32_t reg_index)
 {
@@ -502,7 +502,7 @@ target_ulong pmpcfg_csr_read(CPURISCVState *env, uint32_t 
reg_index)
 
 
 /*
- * Handle a write to a pmpaddr CSP
+ * Handle a write to a pmpaddr CSR
  */
 void pmpaddr_csr_write(CPURISCVState *env, uint32_t addr_index,
 target_ulong val)
@@ -540,7 +540,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 
 
 /*
- * Handle a read from a pmpaddr CSP
+ * Handle a read from a pmpaddr CSR
  */
 target_ulong pmpaddr_csr_read(CPURISCVState *env, uint32_t addr_index)
 {
@@ -593,7 +593,7 @@ target_ulong mseccfg_csr_read(CPURISCVState *env)
 
 /*
  * Calculate the TLB size if the start address or the end address of
- * PMP entry is presented in thie TLB page.
+ * PMP entry is presented in the TLB page.
  */
 static target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
  target_ulong tlb_sa, target_ulong tlb_ea)
-- 
2.25.1




[PATCH v2] block/rbd: Add support for rbd image encryption

2021-06-27 Thread Or Ozeri
Starting from ceph Pacific, RBD has built-in support for image-level encryption.
Currently supported formats are LUKS version 1 and 2.

There are 2 new relevant librbd APIs for controlling encryption, both expect an
open image context:

rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack

This commit extends the qemu rbd driver API to support the above.

Signed-off-by: Or Ozeri 
---
v2: handle encryption info only for the case where encryption is not loaded
---
 block/rbd.c  | 361 ++-
 qapi/block-core.json | 110 -
 2 files changed, 465 insertions(+), 6 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f098a89c7b..8edd1be49e 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,6 +73,18 @@
 #define LIBRBD_USE_IOVEC 0
 #endif
 
+#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
+
+static const char rbd_luks_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_luks2_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
@@ -341,6 +353,203 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
 }
 }
 
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION
+static int qemu_rbd_convert_luks_options(
+RbdEncryptionOptionsLUKSBase *luks_opts,
+char **passphrase,
+size_t *passphrase_len,
+Error **errp)
+{
+return qcrypto_secret_lookup(luks_opts->key_secret, (uint8_t **)passphrase,
+ passphrase_len, errp);
+}
+
+static int qemu_rbd_convert_luks_create_options(
+RbdEncryptionCreateOptionsLUKSBase *luks_opts,
+rbd_encryption_algorithm_t *alg,
+char **passphrase,
+size_t *passphrase_len,
+Error **errp)
+{
+int r = 0;
+
+r = qemu_rbd_convert_luks_options(
+qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
+passphrase, passphrase_len, errp);
+if (r < 0) {
+return r;
+}
+
+if (luks_opts->has_cipher_alg) {
+switch (luks_opts->cipher_alg) {
+case QCRYPTO_CIPHER_ALG_AES_128: {
+*alg = RBD_ENCRYPTION_ALGORITHM_AES128;
+break;
+}
+case QCRYPTO_CIPHER_ALG_AES_256: {
+*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+break;
+}
+default: {
+r = -ENOTSUP;
+error_setg_errno(errp, -r, "unknown encryption algorithm: %u",
+ luks_opts->cipher_alg);
+return r;
+}
+}
+} else {
+/* default alg */
+*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+}
+
+return 0;
+}
+
+static int qemu_rbd_encryption_format(rbd_image_t image,
+  RbdEncryptionCreateOptions *encrypt,
+  Error **errp)
+{
+int r = 0;
+g_autofree char *passphrase = NULL;
+size_t passphrase_len;
+rbd_encryption_format_t format;
+rbd_encryption_options_t opts;
+rbd_encryption_luks1_format_options_t luks_opts;
+rbd_encryption_luks2_format_options_t luks2_opts;
+size_t opts_size;
+uint64_t raw_size, effective_size;
+
+r = rbd_get_size(image, &raw_size);
+if (r < 0) {
+error_setg_errno(errp, -r, "cannot get raw image size");
+return r;
+}
+
+switch (encrypt->format) {
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
+memset(&luks_opts, 0, sizeof(luks_opts));
+format = RBD_ENCRYPTION_FORMAT_LUKS1;
+opts = &luks_opts;
+opts_size = sizeof(luks_opts);
+r = qemu_rbd_convert_luks_create_options(
+qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
+&luks_opts.alg, &passphrase, &passphrase_len, errp);
+if (r < 0) {
+return r;
+}
+luks_opts.passphrase = passphrase;
+luks_opts.passphrase_size = passphrase_len;
+break;
+}
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
+memset(&luks2_opts, 0, sizeof(luks2_opts));
+format = RBD_ENCRYPTION_FORMAT_LUKS2;
+opts = &luks2_opts;
+opts_size = sizeof(luks2_opts);
+r = qemu_rbd_convert_luks_create_options(
+qapi_RbdEncryptionCreateOptionsLUKS2_base(
+&encrypt->u.luks2),
+&luks2_opts.alg, &passphrase, &passphrase_len, errp);
+if (r < 0) {
+return r;
+}
+luks2_opts.passphrase = passphrase;
+luks2_opts.passphrase_size = passphrase_len;
+break;
+}
+de

Re: [PATCH] block/rbd: Add support for rbd image encryption

2021-06-27 Thread Ilya Dryomov
On Sun, Jun 27, 2021 at 1:09 PM Or Ozeri  wrote:
>
> Should I still leave the encryption format part of the state, and just not 
> report it? or should I also remove it from the state?

I'd remove it, reverting to the previous version of the patch
except instead of fetching the size with rbd_get_size() and storing
it in a local variable keep using s->image_size.

Thanks,

Ilya



Re: [PATCH v3 1/1] yank: Unregister function when using TLS migration

2021-06-27 Thread Alexander Graf


On 14.06.21 13:57, Dr. David Alan Gilbert wrote:
> cc'ing in qemu-stable - I think we'd probably want this on 6.0
> (It's currently merged as 7de2e8565335c13fb3516cddbe2e40e366cce273 ).
> Although you'll probably also want the missing dependency fix
> Philippe is working (See: 
> Mathieu- ( 42) [RFC PATCH] migration: Add missing dependency on GNUTLS )


Current master does not compile for me anymore (on macOS) due to this
change. Can we please either disable yank support and revert this patch,
pick the GNUTLS dependency patch you refer to quickly and work on a real
fix afterwards or get the proposed fix in the "missing dependency on
GNUTLS" discussion done quickly?

Having a broken tree is going to make bisection super painful later.


Alex





RE: [PATCH] block/rbd: Add support for rbd image encryption

2021-06-27 Thread Or Ozeri
Should I still leave the encryption format part of the state, and just not report it? or should I also remove it from the state?-"Ilya Dryomov"  wrote: -To: "Or Ozeri" From: "Ilya Dryomov" Date: 06/27/2021 02:00PMCc: qemu-devel@nongnu.org, qemu-bl...@nongnu.org, kw...@redhat.com, "Mykola Golub" , "Danny Harnik" , "Daniel P. Berrangé" Subject: [EXTERNAL] Re: [PATCH] block/rbd: Add support for rbd image encryptionOn Sun, Jun 27, 2021 at 10:44 AM Or Ozeri  wrote:>> Ilya,>> I fixed according to your suggestions, except for the passphrase zeroing.> Looks like it's not a one-liner, but rather a long list of ifdefs to try and cover all possible platforms/compilers (this is what I've seen they do in k5-int.h).> I didn't want to copy this into rbd.c.> I guess that the right solution would be adding a qemu utility function (not sure where exactly), but anyways perhaps this, like the changes I previously made to raw_format.c, should be made in different patch.Hi Or,Yes, given that there doesn't seem to be an existing straightforwardAPI for it, I don't think it is a blocker.  Just something to keep inmind.You also implemented one change that I didn't suggest -- storingthe encryption status in BDRVRBDState.  BTW it is a good practiceto include the version in the subject (e.g. [PATCH v3] ...) anda per-version changelog in the description.The way the encryption format is reported in this patch actually begsmore questions because now I think we need to differentiate between anencrypted image for which the encryption profile has been loaded andone for which it hasn't been loaded and probably also report the rawsize...The previous approach of reporting the encryption format only for "raw"images was a good starting point IMO.  I'd keep the bit that switchesfrom rbd_get_size() to s->image_size and drop everything else for now.>> Thanks,> Or>> -"Or Ozeri"  wrote: -> To: qemu-devel@nongnu.org> From: "Or Ozeri" > Date: 06/27/2021 11:31AM> Cc: qemu-bl...@nongnu.org, o...@il.ibm.com, kw...@redhat.com, to.my.troc...@gmail.com, dan...@il.ibm.com, berra...@redhat.com, idryo...@gmail.com> Subject: [PATCH] block/rbd: Add support for rbd image encryption>> Starting from ceph Pacific, RBD has built-in support for image-level encryption.> Currently supported formats are LUKS version 1 and 2.>> There are 2 new relevant librbd APIs for controlling encryption, both expect an> open image context:>> rbd_encryption_format: formats an image (i.e. writes the LUKS header)> rbd_encryption_load: loads encryptor/decryptor to the image IO stack>> This commit extends the qemu rbd driver API to support the above.>> Signed-off-by: Or Ozeri > --->  block/rbd.c          | 380 ++->  qapi/block-core.json | 110 ->  2 files changed, 484 insertions(+), 6 deletions(-)>> diff --git a/block/rbd.c b/block/rbd.c> index f098a89c7b..dadecaf3da 100644> --- a/block/rbd.c> +++ b/block/rbd.c> @@ -73,6 +73,18 @@>  #define LIBRBD_USE_IOVEC 0>  #endif>> +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8> +> +static const char rbd_luks_header_verification[> +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {> +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1> +};> +> +static const char rbd_luks2_header_verification[> +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {> +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2> +};> +>  typedef enum {>      RBD_AIO_READ,>      RBD_AIO_WRITE,> @@ -106,6 +118,7 @@ typedef struct BDRVRBDState {>      char *snap;>      char *namespace;>      uint64_t image_size;> +    ImageInfoSpecificRbd image_info;>  } BDRVRBDState;>>  static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,> @@ -341,6 +354,207 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)>      }>  }>> +#ifdef LIBRBD_SUPPORTS_ENCRYPTION> +static int qemu_rbd_convert_luks_options(> +        RbdEncryptionOptionsLUKSBase *luks_opts,> +        char **passphrase,> +        size_t *passphrase_len,> +        Error **errp)> +{> +    return qcrypto_secret_lookup(luks_opts->key_secret, (uint8_t **)passphrase,> +                                 passphrase_len, errp);> +}> +> +static int qemu_rbd_convert_luks_create_options(> +        RbdEncryptionCreateOptionsLUKSBase *luks_opts,> +        rbd_encryption_algorithm_t *alg,> +        char **passphrase,> +        size_t *passphrase_len,> +        Error **errp)> +{> +    int r = 0;> +> +    r = qemu_rbd_convert_luks_options(> +            qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),> +            passphrase, passphrase_len, errp);> +    if (r < 0) {> +        return r;> +    }> +> +    if (luks_opts->has_cipher_alg) {> +        switch (luks_opts->cipher_alg) {> +            case QCRYPTO_CIPHER_ALG_AES_128: {> +                *alg = RBD_ENCRYPTION_ALGORITHM_AES

Re: [PATCH] block/rbd: Add support for rbd image encryption

2021-06-27 Thread Ilya Dryomov
On Sun, Jun 27, 2021 at 10:44 AM Or Ozeri  wrote:
>
> Ilya,
>
> I fixed according to your suggestions, except for the passphrase zeroing.
> Looks like it's not a one-liner, but rather a long list of ifdefs to try and 
> cover all possible platforms/compilers (this is what I've seen they do in 
> k5-int.h).
> I didn't want to copy this into rbd.c.
> I guess that the right solution would be adding a qemu utility function (not 
> sure where exactly), but anyways perhaps this, like the changes I previously 
> made to raw_format.c, should be made in different patch.

Hi Or,

Yes, given that there doesn't seem to be an existing straightforward
API for it, I don't think it is a blocker.  Just something to keep in
mind.

You also implemented one change that I didn't suggest -- storing
the encryption status in BDRVRBDState.  BTW it is a good practice
to include the version in the subject (e.g. [PATCH v3] ...) and
a per-version changelog in the description.

The way the encryption format is reported in this patch actually begs
more questions because now I think we need to differentiate between an
encrypted image for which the encryption profile has been loaded and
one for which it hasn't been loaded and probably also report the raw
size...

The previous approach of reporting the encryption format only for "raw"
images was a good starting point IMO.  I'd keep the bit that switches
from rbd_get_size() to s->image_size and drop everything else for now.

>
> Thanks,
> Or
>
> -"Or Ozeri"  wrote: -
> To: qemu-devel@nongnu.org
> From: "Or Ozeri" 
> Date: 06/27/2021 11:31AM
> Cc: qemu-bl...@nongnu.org, o...@il.ibm.com, kw...@redhat.com, 
> to.my.troc...@gmail.com, dan...@il.ibm.com, berra...@redhat.com, 
> idryo...@gmail.com
> Subject: [PATCH] block/rbd: Add support for rbd image encryption
>
> Starting from ceph Pacific, RBD has built-in support for image-level 
> encryption.
> Currently supported formats are LUKS version 1 and 2.
>
> There are 2 new relevant librbd APIs for controlling encryption, both expect 
> an
> open image context:
>
> rbd_encryption_format: formats an image (i.e. writes the LUKS header)
> rbd_encryption_load: loads encryptor/decryptor to the image IO stack
>
> This commit extends the qemu rbd driver API to support the above.
>
> Signed-off-by: Or Ozeri 
> ---
>  block/rbd.c  | 380 ++-
>  qapi/block-core.json | 110 -
>  2 files changed, 484 insertions(+), 6 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index f098a89c7b..dadecaf3da 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -73,6 +73,18 @@
>  #define LIBRBD_USE_IOVEC 0
>  #endif
>
> +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
> +
> +static const char rbd_luks_header_verification[
> +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
> +};
> +
> +static const char rbd_luks2_header_verification[
> +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
> +};
> +
>  typedef enum {
>  RBD_AIO_READ,
>  RBD_AIO_WRITE,
> @@ -106,6 +118,7 @@ typedef struct BDRVRBDState {
>  char *snap;
>  char *namespace;
>  uint64_t image_size;
> +ImageInfoSpecificRbd image_info;
>  } BDRVRBDState;
>
>  static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
> @@ -341,6 +354,207 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
>  }
>  }
>
> +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> +static int qemu_rbd_convert_luks_options(
> +RbdEncryptionOptionsLUKSBase *luks_opts,
> +char **passphrase,
> +size_t *passphrase_len,
> +Error **errp)
> +{
> +return qcrypto_secret_lookup(luks_opts->key_secret, (uint8_t 
> **)passphrase,
> + passphrase_len, errp);
> +}
> +
> +static int qemu_rbd_convert_luks_create_options(
> +RbdEncryptionCreateOptionsLUKSBase *luks_opts,
> +rbd_encryption_algorithm_t *alg,
> +char **passphrase,
> +size_t *passphrase_len,
> +Error **errp)
> +{
> +int r = 0;
> +
> +r = qemu_rbd_convert_luks_options(
> +qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
> +passphrase, passphrase_len, errp);
> +if (r < 0) {
> +return r;
> +}
> +
> +if (luks_opts->has_cipher_alg) {
> +switch (luks_opts->cipher_alg) {
> +case QCRYPTO_CIPHER_ALG_AES_128: {
> +*alg = RBD_ENCRYPTION_ALGORITHM_AES128;
> +break;
> +}
> +case QCRYPTO_CIPHER_ALG_AES_256: {
> +*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
> +break;
> +}
> +default: {
> +r = -ENOTSUP;
> +error_setg_errno(errp, -r, "unknown encryption algorithm: 
> %u",
> + luks_opts->cipher_alg);
> +return r;
> +}
> +}

Re: [RFC PATCH 08/11] target/riscv: Update CSR xnxti in CLIC mode

2021-06-27 Thread Frank Chang
LIU Zhiwei  於 2021年4月9日 週五 下午3:52寫道:

> The CSR can be used by software to service the next horizontal interrupt
> when it has greater level than the saved interrupt context
> (held in xcause`.pil`) and greater level than the interrupt threshold of
> the corresponding privilege mode,
>
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/cpu_bits.h |  16 ++
>  target/riscv/csr.c  | 114 
>  2 files changed, 130 insertions(+)
>
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 7922097776..494e41edc9 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -166,6 +166,7 @@
>  #define CSR_MCAUSE  0x342
>  #define CSR_MTVAL   0x343
>  #define CSR_MIP 0x344
> +#define CSR_MNXTI   0x345 /* clic-spec-draft */
>  #define CSR_MINTSTATUS  0x346 /* clic-spec-draft */
>  #define CSR_MINTTHRESH  0x347 /* clic-spec-draft */
>
> @@ -187,6 +188,7 @@
>  #define CSR_SCAUSE  0x142
>  #define CSR_STVAL   0x143
>  #define CSR_SIP 0x144
> +#define CSR_SNXTI   0x145 /* clic-spec-draft */
>  #define CSR_SINTSTATUS  0x146 /* clic-spec-draft */
>  #define CSR_SINTTHRESH  0x147 /* clic-spec-draft */
>
> @@ -596,10 +598,24 @@
>  #define MINTSTATUS_SIL 0xff00 /* sil[7:0] */
>  #define MINTSTATUS_UIL 0x00ff /* uil[7:0] */
>
> +/* mcause */
> +#define MCAUSE_MINHV   0x4000 /* minhv */
> +#define MCAUSE_MPP 0x3000 /* mpp[1:0] */
> +#define MCAUSE_MPIE0x0800 /* mpie */
> +#define MCAUSE_MPIL0x00ff /* mpil[7:0] */
> +#define MCAUSE_EXCCODE 0x0fff /* exccode[11:0] */
> +
>  /* sintstatus */
>  #define SINTSTATUS_SIL 0xff00 /* sil[7:0] */
>  #define SINTSTATUS_UIL 0x00ff /* uil[7:0] */
>
> +/* scause */
> +#define SCAUSE_SINHV   0x4000 /* sinhv */
> +#define SCAUSE_SPP 0x1000 /* spp */
> +#define SCAUSE_SPIE0x0800 /* spie */
> +#define SCAUSE_SPIL0x00ff /* spil[7:0] */
> +#define SCAUSE_EXCCODE 0x0fff /* exccode[11:0] */
> +
>  /* MIE masks */
>  #define MIE_SEIE   (1 << IRQ_S_EXT)
>  #define MIE_UEIE   (1 << IRQ_U_EXT)
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index e1b77f..72cba080bf 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -774,6 +774,80 @@ static int rmw_mip(CPURISCVState *env, int csrno,
> target_ulong *ret_value,
>  return 0;
>  }
>
> +static bool get_xnxti_status(CPURISCVState *env)
> +{
> +CPUState *cs = env_cpu(env);
> +int clic_irq, clic_priv, clic_il, pil;
> +
> +if (!env->exccode) { /* No interrupt */
> +return false;
> +}
> +/* The system is not in a CLIC mode */
> +if (!riscv_clic_is_clic_mode(env)) {
> +return false;
> +} else {
> +riscv_clic_decode_exccode(env->exccode, &clic_priv, &clic_il,
> +  &clic_irq);
> +
> +if (env->priv == PRV_M) {
> +pil = MAX(get_field(env->mcause, MCAUSE_MPIL),
> env->mintthresh);

+} else if (env->priv == PRV_S) {
> +pil = MAX(get_field(env->scause, SCAUSE_SPIL),
> env->sintthresh);
>

Same here, for v0.8 CLIC[1],
both mintthresh and sintthresh valuse should be retrieved from
CLIC mintthresh memory-mapped register.

+} else {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "CSR: rmw xnxti with unsupported mode\n");
> +exit(1);
> +}
> +
> +if ((clic_priv != env->priv) || /* No horizontal interrupt */
> +(clic_il <= pil) || /* No higher level interrupt */
> +(riscv_clic_shv_interrupt(env->clic, clic_priv, cs->cpu_index,
> +  clic_irq))) { /* CLIC vector mode */
> +return false;
> +} else {
> +return true;
> +}
> +}
> +}
> +
> +static int rmw_mnxti(CPURISCVState *env, int csrno, target_ulong
> *ret_value,
> + target_ulong new_value, target_ulong write_mask)
> +{
> +int clic_priv, clic_il, clic_irq;
> +bool ready;
> +CPUState *cs = env_cpu(env);
> +if (write_mask) {
> +env->mstatus |= new_value & (write_mask & 0b1);
> +}
> +
> +qemu_mutex_lock_iothread();
> +ready = get_xnxti_status(env);
> +if (ready) {
> +riscv_clic_decode_exccode(env->exccode, &clic_priv, &clic_il,
> +  &clic_irq);
> +if (write_mask) {
> +bool edge = riscv_clic_edge_triggered(env->clic, clic_priv,
> +  cs->cpu_index,
> clic_irq);
> +if (edge) {
>

Re: [RFC PATCH 06/11] target/riscv: Update CSR xtvec in CLIC mode

2021-06-27 Thread Frank Chang
LIU Zhiwei  於 2021年4月9日 週五 下午3:51寫道:

> The new CLIC interrupt-handling mode is encoded as a new state in the
> existing WARL xtvec register, where the low two bits of are 11.
>
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/csr.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index f6c84b9fe4..39ff72041a 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -637,9 +637,18 @@ static int read_mtvec(CPURISCVState *env, int csrno,
> target_ulong *val)
>
>  static int write_mtvec(CPURISCVState *env, int csrno, target_ulong val)
>  {
> -/* bits [1:0] encode mode; 0 = direct, 1 = vectored, 2 >= reserved */
> +/*
> + * bits [1:0] encode mode; 0 = direct, 1 = vectored, 3 = CLIC,
> + * others reserved
> + */
>  if ((val & 3) < 2) {
>  env->mtvec = val;
> +} else if ((val & 1) && env->clic) {
> +/*
> + * If only CLIC mode is supported, writes to bit 1 are also
> ignored and
> + * it is always set to one. CLIC mode hardwires xtvec bits 2-5 to
> zero.
> + */
> +env->mtvec = ((val & ~0x3f) << 6) | (0b11);
>

Why do we need to left-shift the value 6 bits here?


>  } else {
>  qemu_log_mask(LOG_UNIMP, "CSR_MTVEC: reserved mode not
> supported\n");
>  }
> @@ -837,9 +846,18 @@ static int read_stvec(CPURISCVState *env, int csrno,
> target_ulong *val)
>
>  static int write_stvec(CPURISCVState *env, int csrno, target_ulong val)
>  {
> -/* bits [1:0] encode mode; 0 = direct, 1 = vectored, 2 >= reserved */
> +/*
> + * bits [1:0] encode mode; 0 = direct, 1 = vectored, 3 = CLIC,
> + * others reserved
> + */
>  if ((val & 3) < 2) {
>  env->stvec = val;
> +} else if ((val & 1) && env->clic) {
> +/*
> + * If only CLIC mode is supported, writes to bit 1 are also
> ignored and
> + * it is always set to one. CLIC mode hardwires xtvec bits 2-5 to
> zero.
> + */
> +env->stvec = ((val & ~0x3f) << 6) | (0b11);
>

Same here, why do we need to left-shift the value 6 bits here?

Also, CLIC v0.8 spec[1] doesn't include the change for stvec.
I'm not sure if it's the same as v0.9 to check stvec
when the interrupt is delegated to S-mode in CLIC-mode.

[1] https://github.com/riscv/riscv-fast-interrupt/blob/74f86c3858/clic.adoc

Regards,
Frank Chang

 } else {
>  qemu_log_mask(LOG_UNIMP, "CSR_STVEC: reserved mode not
> supported\n");
>  }
> --
> 2.25.1
>
>
>


Re: [PATCH] block/rbd: Add support for rbd image encryption

2021-06-27 Thread Or Ozeri
Ilya,I fixed according to your suggestions, except for the passphrase zeroing.Looks like it's not a one-liner, but rather a long list of ifdefs to try and cover all possible platforms/compilers (this is what I've seen they do in k5-int.h).I didn't want to copy this into rbd.c.I guess that the right solution would be adding a qemu utility function (not sure where exactly), but anyways perhaps this, like the changes I previously made to raw_format.c, should be made in different patch.Thanks,Or-"Or Ozeri"  wrote: -To: qemu-devel@nongnu.orgFrom: "Or Ozeri" Date: 06/27/2021 11:31AMCc: qemu-bl...@nongnu.org, o...@il.ibm.com, kw...@redhat.com, to.my.troc...@gmail.com, dan...@il.ibm.com, berra...@redhat.com, idryo...@gmail.comSubject: [PATCH] block/rbd: Add support for rbd image encryptionStarting from ceph Pacific, RBD has built-in support for image-level encryption.Currently supported formats are LUKS version 1 and 2.There are 2 new relevant librbd APIs for controlling encryption, both expect anopen image context:rbd_encryption_format: formats an image (i.e. writes the LUKS header)rbd_encryption_load: loads encryptor/decryptor to the image IO stackThis commit extends the qemu rbd driver API to support the above.Signed-off-by: Or Ozeri --- block/rbd.c          | 380 ++- qapi/block-core.json | 110 - 2 files changed, 484 insertions(+), 6 deletions(-)diff --git a/block/rbd.c b/block/rbd.cindex f098a89c7b..dadecaf3da 100644--- a/block/rbd.c+++ b/block/rbd.c@@ -73,6 +73,18 @@ #define LIBRBD_USE_IOVEC 0 #endif +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8++static const char rbd_luks_header_verification[+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1+};++static const char rbd_luks2_header_verification[+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2+};+ typedef enum {     RBD_AIO_READ,     RBD_AIO_WRITE,@@ -106,6 +118,7 @@ typedef struct BDRVRBDState {     char *snap;     char *namespace;     uint64_t image_size;+    ImageInfoSpecificRbd image_info; } BDRVRBDState;  static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,@@ -341,6 +354,207 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)     } } +#ifdef LIBRBD_SUPPORTS_ENCRYPTION+static int qemu_rbd_convert_luks_options(+        RbdEncryptionOptionsLUKSBase *luks_opts,+        char **passphrase,+        size_t *passphrase_len,+        Error **errp)+{+    return qcrypto_secret_lookup(luks_opts->key_secret, (uint8_t **)passphrase,+                                 passphrase_len, errp);+}++static int qemu_rbd_convert_luks_create_options(+        RbdEncryptionCreateOptionsLUKSBase *luks_opts,+        rbd_encryption_algorithm_t *alg,+        char **passphrase,+        size_t *passphrase_len,+        Error **errp)+{+    int r = 0;++    r = qemu_rbd_convert_luks_options(+            qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),+            passphrase, passphrase_len, errp);+    if (r < 0) {+        return r;+    }++    if (luks_opts->has_cipher_alg) {+        switch (luks_opts->cipher_alg) {+            case QCRYPTO_CIPHER_ALG_AES_128: {+                *alg = RBD_ENCRYPTION_ALGORITHM_AES128;+                break;+            }+            case QCRYPTO_CIPHER_ALG_AES_256: {+                *alg = RBD_ENCRYPTION_ALGORITHM_AES256;+                break;+            }+            default: {+                r = -ENOTSUP;+                error_setg_errno(errp, -r, "unknown encryption algorithm: %u",+                                 luks_opts->cipher_alg);+                return r;+            }+        }+    } else {+        /* default alg */+        *alg = RBD_ENCRYPTION_ALGORITHM_AES256;+    }++    return 0;+}++static int qemu_rbd_encryption_format(rbd_image_t image,+                                      RbdEncryptionCreateOptions *encrypt,+                                      Error **errp)+{+    int r = 0;+    g_autofree char *passphrase = NULL;+    size_t passphrase_len;+    rbd_encryption_format_t format;+    rbd_encryption_options_t opts;+    rbd_encryption_luks1_format_options_t luks_opts;+    rbd_encryption_luks2_format_options_t luks2_opts;+    size_t opts_size;+    uint64_t raw_size, effective_size;++    r = rbd_get_size(image, &raw_size);+    if (r < 0) {+        error_setg_errno(errp, -r, "cannot get raw image size");+        return r;+    }++    switch (encrypt->format) {+        case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {+            memset(&luks_opts, 0, sizeof(luks_opts));+            format = RBD_ENCRYPTION_FORMAT_LUKS1;+            opts = &luks_opts;+            opts_size = sizeof(luks_opts);+            r = qemu_rbd_convert_luks_create_options(+                    qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),+                    &luks_opts.alg, &passphrase, &passphrase_len, errp);+            if (r < 

[PATCH] block/rbd: Add support for rbd image encryption

2021-06-27 Thread Or Ozeri
Starting from ceph Pacific, RBD has built-in support for image-level encryption.
Currently supported formats are LUKS version 1 and 2.

There are 2 new relevant librbd APIs for controlling encryption, both expect an
open image context:

rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack

This commit extends the qemu rbd driver API to support the above.

Signed-off-by: Or Ozeri 
---
 block/rbd.c  | 380 ++-
 qapi/block-core.json | 110 -
 2 files changed, 484 insertions(+), 6 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f098a89c7b..dadecaf3da 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,6 +73,18 @@
 #define LIBRBD_USE_IOVEC 0
 #endif
 
+#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
+
+static const char rbd_luks_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_luks2_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
@@ -106,6 +118,7 @@ typedef struct BDRVRBDState {
 char *snap;
 char *namespace;
 uint64_t image_size;
+ImageInfoSpecificRbd image_info;
 } BDRVRBDState;
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
@@ -341,6 +354,207 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
 }
 }
 
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION
+static int qemu_rbd_convert_luks_options(
+RbdEncryptionOptionsLUKSBase *luks_opts,
+char **passphrase,
+size_t *passphrase_len,
+Error **errp)
+{
+return qcrypto_secret_lookup(luks_opts->key_secret, (uint8_t **)passphrase,
+ passphrase_len, errp);
+}
+
+static int qemu_rbd_convert_luks_create_options(
+RbdEncryptionCreateOptionsLUKSBase *luks_opts,
+rbd_encryption_algorithm_t *alg,
+char **passphrase,
+size_t *passphrase_len,
+Error **errp)
+{
+int r = 0;
+
+r = qemu_rbd_convert_luks_options(
+qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
+passphrase, passphrase_len, errp);
+if (r < 0) {
+return r;
+}
+
+if (luks_opts->has_cipher_alg) {
+switch (luks_opts->cipher_alg) {
+case QCRYPTO_CIPHER_ALG_AES_128: {
+*alg = RBD_ENCRYPTION_ALGORITHM_AES128;
+break;
+}
+case QCRYPTO_CIPHER_ALG_AES_256: {
+*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+break;
+}
+default: {
+r = -ENOTSUP;
+error_setg_errno(errp, -r, "unknown encryption algorithm: %u",
+ luks_opts->cipher_alg);
+return r;
+}
+}
+} else {
+/* default alg */
+*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+}
+
+return 0;
+}
+
+static int qemu_rbd_encryption_format(rbd_image_t image,
+  RbdEncryptionCreateOptions *encrypt,
+  Error **errp)
+{
+int r = 0;
+g_autofree char *passphrase = NULL;
+size_t passphrase_len;
+rbd_encryption_format_t format;
+rbd_encryption_options_t opts;
+rbd_encryption_luks1_format_options_t luks_opts;
+rbd_encryption_luks2_format_options_t luks2_opts;
+size_t opts_size;
+uint64_t raw_size, effective_size;
+
+r = rbd_get_size(image, &raw_size);
+if (r < 0) {
+error_setg_errno(errp, -r, "cannot get raw image size");
+return r;
+}
+
+switch (encrypt->format) {
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
+memset(&luks_opts, 0, sizeof(luks_opts));
+format = RBD_ENCRYPTION_FORMAT_LUKS1;
+opts = &luks_opts;
+opts_size = sizeof(luks_opts);
+r = qemu_rbd_convert_luks_create_options(
+qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
+&luks_opts.alg, &passphrase, &passphrase_len, errp);
+if (r < 0) {
+return r;
+}
+luks_opts.passphrase = passphrase;
+luks_opts.passphrase_size = passphrase_len;
+break;
+}
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
+memset(&luks2_opts, 0, sizeof(luks2_opts));
+format = RBD_ENCRYPTION_FORMAT_LUKS2;
+opts = &luks2_opts;
+opts_size = sizeof(luks2_opts);
+r = qemu_rbd_convert_luks_create_options(
+qapi_RbdEncryptionCreateOptionsLUKS2_base(
+&encrypt->u.luks2),
+&luks2_opts.alg, &passphrase, &passphrase_len, errp);
+if (r < 0) {
+return r;
+ 

Re: [RFC PATCH 07/11] target/riscv: Update CSR xtvt in CLIC mode

2021-06-27 Thread Frank Chang
LIU Zhiwei  於 2021年4月9日 週五 下午3:52寫道:

> The xtvt WARL XLEN-bit CSR holds the base address of the trap vector table,
> aligned on a 64-byte or greater power-of-two boundary.
>
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/cpu.h  |  2 ++
>  target/riscv/cpu_bits.h |  2 ++
>  target/riscv/csr.c  | 28 
>  3 files changed, 32 insertions(+)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 9e389d7bbf..b5fd796f98 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -173,11 +173,13 @@ struct CPURISCVState {
>  target_ulong medeleg;
>
>  target_ulong stvec;
> +target_ulong stvt; /* clic-spec */
>  target_ulong sepc;
>  target_ulong scause;
>  target_ulong sintthresh; /* clic-spec */
>
>  target_ulong mtvec;
> +target_ulong mtvt; /* clic-spec */
>  target_ulong mepc;
>  target_ulong mcause;
>  target_ulong mtval;  /* since: priv-1.10.0 */
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 9447801d22..7922097776 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -149,6 +149,7 @@
>  #define CSR_MIE 0x304
>  #define CSR_MTVEC   0x305
>  #define CSR_MCOUNTEREN  0x306
> +#define CSR_MTVT0x307 /* clic-spec-draft */
>
>  /* 32-bit only */
>  #define CSR_MSTATUSH0x310
> @@ -178,6 +179,7 @@
>  #define CSR_SIE 0x104
>  #define CSR_STVEC   0x105
>  #define CSR_SCOUNTEREN  0x106
> +#define CSR_STVT0x107 /* clic-spec-draft */
>
>  /* Supervisor Trap Handling */
>  #define CSR_SSCRATCH0x140
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 39ff72041a..e1b77f 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -667,6 +667,18 @@ static int write_mcounteren(CPURISCVState *env, int
> csrno, target_ulong val)
>  return 0;
>  }
>
> +static int read_mtvt(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +*val = env->mtvt;
> +return 0;
> +}
> +
> +static int write_mtvt(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +env->mtvt = val & ~((1ULL << 6) - 1);
>

mtvt CSR has additional minimum alignment restriction in v0.8 CLIC spec[1]:
  2^ceiling(log2(N)) x 4 bytes, where N is the maximum number of interrupt
sources.


> +return 0;
> +}
> +
>  /* This regiser is replaced with CSR_MCOUNTINHIBIT in 1.11.0 */
>  static int read_mscounteren(CPURISCVState *env, int csrno, target_ulong
> *val)
>  {
> @@ -876,6 +888,18 @@ static int write_scounteren(CPURISCVState *env, int
> csrno, target_ulong val)
>  return 0;
>  }
>
> +static int read_stvt(CPURISCVState *env, int csrno, target_ulong *val)


stvt CSR seems not to exist in v0.8 CLIC spec[1].

[1] https://github.com/riscv/riscv-fast-interrupt/blob/74f86c3858/clic.adoc

Regards,
Frank Chang


>

+{
> +*val = env->stvt;
> +return 0;
> +}
> +
> +static int write_stvt(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +env->stvt = val & ~((1ULL << 6) - 1);
> +return 0;
> +}
> +
>  /* Supervisor Trap Handling */
>  static int read_sscratch(CPURISCVState *env, int csrno, target_ulong *val)
>  {
> @@ -1730,6 +1754,7 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>  [CSR_MHPMCOUNTER31H] = { "mhpmcounter31h", any32,  read_zero },
>
>  /* Machine Mode Core Level Interrupt Controller */
> +[CSR_MTVT] = { "mtvt", clic,  read_mtvt,  write_mtvt  },
>  [CSR_MINTSTATUS] = { "mintstatus", clic,  read_mintstatus },
>  [CSR_MINTTHRESH] = { "mintthresh", clic,  read_mintthresh,
>   write_mintthresh },
> @@ -1739,5 +1764,8 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>  [CSR_SINTTHRESH] = { "sintthresh", clic,  read_sintthresh,
>   write_sintthresh },
>
> +/* Supervisor Mode Core Level Interrupt Controller */
> +[CSR_STVT] = { "stvt", clic,  read_stvt, write_stvt   },
> +
>  #endif /* !CONFIG_USER_ONLY */
>  };
> --
> 2.25.1
>
>
>


Re: [RFC PATCH 02/11] target/riscv: Update CSR xintthresh in CLIC mode

2021-06-27 Thread Frank Chang
Frank Chang  於 2021年6月27日 週日 上午1:23寫道:

> LIU Zhiwei  於 2021年4月9日 週五 下午3:52寫道:
>
>> The interrupt-level threshold (xintthresh) CSR holds an 8-bit field
>> for the threshold level of the associated privilege mode.
>>
>> For horizontal interrupts, only the ones with higher interrupt levels
>> than the threshold level are allowed to preempt.
>>
>> Signed-off-by: LIU Zhiwei 
>>
>
> Reviewed-by: Frank Chang 
>

Sorry, recall that mintthresh description is vague in v0.8 CLIC spec[1].
If mintthresh is a CLIC memory-mapped register in v0.8 CLIC.
Then I think you should restrict the CSR accesses to mintthresh and
sintthresh when CLIC is v0.8.

[1] https://github.com/riscv/riscv-fast-interrupt/blob/74f86c3858/clic.adoc

Regards,
Frank Chang


>
>
>> ---
>>  target/riscv/cpu.h  |  2 ++
>>  target/riscv/cpu_bits.h |  2 ++
>>  target/riscv/csr.c  | 28 
>>  3 files changed, 32 insertions(+)
>>
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index 1a44ca62c7..a5eab26a69 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -160,6 +160,7 @@ struct CPURISCVState {
>>
>>  uint32_t miclaim;
>>  uint32_t mintstatus; /* clic-spec */
>> +target_ulong mintthresh; /* clic-spec */
>>
>>  target_ulong mie;
>>  target_ulong mideleg;
>> @@ -173,6 +174,7 @@ struct CPURISCVState {
>>  target_ulong stvec;
>>  target_ulong sepc;
>>  target_ulong scause;
>> +target_ulong sintthresh; /* clic-spec */
>>
>>  target_ulong mtvec;
>>  target_ulong mepc;
>> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
>> index c4ce6ec3d9..9447801d22 100644
>> --- a/target/riscv/cpu_bits.h
>> +++ b/target/riscv/cpu_bits.h
>> @@ -166,6 +166,7 @@
>>  #define CSR_MTVAL   0x343
>>  #define CSR_MIP 0x344
>>  #define CSR_MINTSTATUS  0x346 /* clic-spec-draft */
>> +#define CSR_MINTTHRESH  0x347 /* clic-spec-draft */
>>
>>  /* Legacy Machine Trap Handling (priv v1.9.1) */
>>  #define CSR_MBADADDR0x343
>> @@ -185,6 +186,7 @@
>>  #define CSR_STVAL   0x143
>>  #define CSR_SIP 0x144
>>  #define CSR_SINTSTATUS  0x146 /* clic-spec-draft */
>> +#define CSR_SINTTHRESH  0x147 /* clic-spec-draft */
>>
>>  /* Legacy Supervisor Trap Handling (priv v1.9.1) */
>>  #define CSR_SBADADDR0x143
>> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
>> index 320b18ab60..4c31364967 100644
>> --- a/target/riscv/csr.c
>> +++ b/target/riscv/csr.c
>> @@ -746,6 +746,18 @@ static int read_mintstatus(CPURISCVState *env, int
>> csrno, target_ulong *val)
>>  return 0;
>>  }
>>
>> +static int read_mintthresh(CPURISCVState *env, int csrno, target_ulong
>> *val)
>> +{
>> +*val = env->mintthresh;
>> +return 0;
>> +}
>> +
>> +static int write_mintthresh(CPURISCVState *env, int csrno, target_ulong
>> val)
>> +{
>> +env->mintthresh = val;
>> +return 0;
>> +}
>> +
>>  /* Supervisor Trap Setup */
>>  static int read_sstatus(CPURISCVState *env, int csrno, target_ulong *val)
>>  {
>> @@ -912,6 +924,18 @@ static int read_sintstatus(CPURISCVState *env, int
>> csrno, target_ulong *val)
>>  return 0;
>>  }
>>
>> +static int read_sintthresh(CPURISCVState *env, int csrno, target_ulong
>> *val)
>> +{
>> +*val = env->sintthresh;
>> +return 0;
>> +}
>> +
>> +static int write_sintthresh(CPURISCVState *env, int csrno, target_ulong
>> val)
>> +{
>> +env->sintthresh = val;
>> +return 0;
>> +}
>> +
>>  /* Supervisor Protection and Translation */
>>  static int read_satp(CPURISCVState *env, int csrno, target_ulong *val)
>>  {
>> @@ -1666,9 +1690,13 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>>
>>  /* Machine Mode Core Level Interrupt Controller */
>>  [CSR_MINTSTATUS] = { "mintstatus", clic,  read_mintstatus },
>> +[CSR_MINTTHRESH] = { "mintthresh", clic,  read_mintthresh,
>> + write_mintthresh },
>>
>>  /* Supervisor Mode Core Level Interrupt Controller */
>>  [CSR_SINTSTATUS] = { "sintstatus", clic,  read_sintstatus },
>> +[CSR_SINTTHRESH] = { "sintthresh", clic,  read_sintthresh,
>> + write_sintthresh },
>>
>>  #endif /* !CONFIG_USER_ONLY */
>>  };
>> --
>> 2.25.1
>>
>>
>>