Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-22 Thread Andy Lutomirski

On 7/21/22 14:19, Sean Christopherson wrote:

On Thu, Jul 21, 2022, Gupta, Pankaj wrote:





I view it as a performance problem because nothing stops KVM from copying from
userspace into the private fd during the SEV ioctl().  What's missing is the
ability for userspace to directly initialize the private fd, which may or may not
avoid an extra memcpy() depending on how clever userspace is.

Can you please elaborate more what you see as a performance problem? And
possible ways to solve it?


Oh, I'm not saying there actually _is_ a performance problem.  What I'm saying is
that in-place encryption is not a functional requirement, which means it's purely
an optimization, and thus we should only bother supporting in-place encryption
_if_ it would solve a performance bottleneck.


Even if we end up having a performance problem, I think we need to 
understand the workloads that we want to optimize before getting too 
excited about designing a speedup.


In particular, there's (depending on the specific technology, perhaps, 
and also architecture) a possible tradeoff between trying to reduce 
copying and trying to reduce unmapping and the associated flushes.  If a 
user program maps an fd, populates it, and then converts it in place 
into private memory (especially if it doesn't do it in a single shot), 
then that memory needs to get unmapped both from the user mm and 
probably from the kernel direct map.  On the flip side, it's possible to 
imagine an ioctl that does copy-and-add-to-private-fd that uses a 
private mm and doesn't need any TLB IPIs.


All of this is to say that trying to optimize right now seems quite 
premature to me.




[PING^2] linux-user: Passthrough MADV_DONTNEED for certain file mappings

2022-07-22 Thread Ilya Leoshkevich
On Fri, 2022-07-01 at 15:52 +0200, Ilya Leoshkevich wrote:
> This is a follow-up for commit 892a4f6a750a ("linux-user: Add partial
> support for MADV_DONTNEED"), which added passthrough for anonymous
> mappings. File mappings can be handled in a similar manner.
> 
> In order to do that, mark pages, for which mmap() was passed through,
> with PAGE_PASSTHROUGH, and then allow madvise() passthrough for these
> pages as well.
> 
> Signed-off-by: Ilya Leoshkevich 
> ---
>  include/exec/cpu-all.h |  6 ++
>  linux-user/mmap.c  | 25 +
>  2 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index f5bda2c3ca..fbdbc0fdec 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -262,6 +262,12 @@ extern const TargetPageBits target_page;
>  #define PAGE_TARGET_1  0x0200
>  #define PAGE_TARGET_2  0x0400
>  
> +/*
> + * For linux-user, indicates that the page is mapped with the same
> semantics
> + * in both guest and host.
> + */
> +#define PAGE_PASSTHROUGH 0x0080
> +
>  #if defined(CONFIG_USER_ONLY)
>  void page_dump(FILE *f);
>  
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index 4e7a6be6ee..58622a0c15 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -424,7 +424,8 @@ abi_ulong mmap_find_vma(abi_ulong start,
> abi_ulong size, abi_ulong align)
>  abi_long target_mmap(abi_ulong start, abi_ulong len, int
> target_prot,
>   int flags, int fd, abi_ulong offset)
>  {
> -    abi_ulong ret, end, real_start, real_end, retaddr, host_offset,
> host_len;
> +    abi_ulong ret, end, real_start, real_end, retaddr, host_offset,
> host_len,
> +  passthrough_start = -1, passthrough_end = -1;
>  int page_flags, host_prot;
>  
>  mmap_lock();
> @@ -537,6 +538,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong
> len, int target_prot,
>  host_start += offset - host_offset;
>  }
>  start = h2g(host_start);
> +    passthrough_start = start;
> +    passthrough_end = start + len;
>  } else {
>  if (start & ~TARGET_PAGE_MASK) {
>  errno = EINVAL;
> @@ -619,6 +622,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong
> len, int target_prot,
>   host_prot, flags, fd, offset1);
>  if (p == MAP_FAILED)
>  goto fail;
> +    passthrough_start = real_start;
> +    passthrough_end = real_end;
>  }
>  }
>   the_end1:
> @@ -626,7 +631,18 @@ abi_long target_mmap(abi_ulong start, abi_ulong
> len, int target_prot,
>  page_flags |= PAGE_ANON;
>  }
>  page_flags |= PAGE_RESET;
> -    page_set_flags(start, start + len, page_flags);
> +    if (passthrough_start == passthrough_end) {
> +    page_set_flags(start, start + len, page_flags);
> +    } else {
> +    if (start != passthrough_start) {
> +    page_set_flags(start, passthrough_start, page_flags);
> +    }
> +    page_set_flags(passthrough_start, passthrough_end,
> +   page_flags | PAGE_PASSTHROUGH);
> +    if (passthrough_end != start + len) {
> +    page_set_flags(passthrough_end, start + len,
> page_flags);
> +    }
> +    }
>   the_end:
>  trace_target_mmap_complete(start);
>  if (qemu_loglevel_mask(CPU_LOG_PAGE)) {
> @@ -845,7 +861,7 @@ static bool
> can_passthrough_madv_dontneed(abi_ulong start, abi_ulong end)
>  }
>  
>  for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
> -    if (!(page_get_flags(addr) & PAGE_ANON)) {
> +    if (!(page_get_flags(addr) & (PAGE_ANON |
> PAGE_PASSTHROUGH))) {
>  return false;
>  }
>  }
> @@ -888,7 +904,8 @@ abi_long target_madvise(abi_ulong start,
> abi_ulong len_in, int advice)
>   *
>   * This breaks MADV_DONTNEED, completely implementing which is
> quite
>   * complicated. However, there is one low-hanging fruit: host-
> page-aligned
> - * anonymous mappings. In this case passthrough is safe, so do
> it.
> + * anonymous mappings or mappings that are known to have the
> same semantics
> + * in the host and the guest. In this case passthrough is safe,
> so do it.
>   */
>  mmap_lock();
>  if ((advice & MADV_DONTNEED) &&

Ping^2:

https://patchew.org/QEMU/20220701135207.2710488-1-...@linux.ibm.com/

Is there still a chance that this can get into QEMU 7.1?
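
For readers following along, here is a minimal host-side sketch (not QEMU code) of the MADV_DONTNEED behaviour that makes passthrough interesting for file mappings on Linux: a private, modified page of a file mapping is dropped and the next access sees the file's contents again. The function name and the temporary file template are made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MADV_DONTNEED, mkstemp(), pwrite() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if, after MADV_DONTNEED, a privately modified file page shows
 * the file's contents again, 0 if it does not, and -1 on a setup error.
 */
int dontneed_restores_file_contents(void)
{
    char path[] = "/tmp/madv-demo-XXXXXX";  /* hypothetical temp file name */
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;
    int fd;
    char *p;

    assert(page > 0);
    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);               /* the file lives on through the open fd */

    if (ftruncate(fd, page) == 0 && pwrite(fd, "A", 1, 0) == 1) {
        p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'B';         /* private copy-on-write modification */
            if (madvise(p, (size_t)page, MADV_DONTNEED) == 0) {
                /* the private copy is gone; the page is re-read from the file */
                result = (*(volatile char *)p == 'A');
            }
            munmap(p, (size_t)page);
        }
    }
    close(fd);
    return result;
}
```

An emulator that fakes madvise() instead of passing it through would leave 'B' in place, which is the semantic difference the patch is concerned with.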



[PATCH v2 2/2] tests/tcg/s390x: Test unaligned accesses to lowcore

2022-07-22 Thread Ilya Leoshkevich
Add a small test to avoid regressions.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/s390x/Makefile.softmmu-target |  9 +
 tests/tcg/s390x/unaligned-lowcore.S | 19 +++
 2 files changed, 28 insertions(+)
 create mode 100644 tests/tcg/s390x/Makefile.softmmu-target
 create mode 100644 tests/tcg/s390x/unaligned-lowcore.S

diff --git a/tests/tcg/s390x/Makefile.softmmu-target 
b/tests/tcg/s390x/Makefile.softmmu-target
new file mode 100644
index 00..a34fa68473
--- /dev/null
+++ b/tests/tcg/s390x/Makefile.softmmu-target
@@ -0,0 +1,9 @@
+S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
+VPATH+=$(S390X_SRC)
+QEMU_OPTS=-action panic=exit-failure -kernel
+
+%: %.S
+   $(CC) -march=z13 -m64 -nostartfiles -static -Wl,-Ttext=0 \
+   -Wl,--build-id=none $< -o $@
+
+TESTS += unaligned-lowcore
diff --git a/tests/tcg/s390x/unaligned-lowcore.S 
b/tests/tcg/s390x/unaligned-lowcore.S
new file mode 100644
index 00..246b517f11
--- /dev/null
+++ b/tests/tcg/s390x/unaligned-lowcore.S
@@ -0,0 +1,19 @@
+.org 0x1D0 /* program new PSW */
+.quad 0x2, 0   /* disabled wait */
+.org 0x200 /* lowcore padding */
+
+.globl _start
+_start:
+lctlg %c0,%c0,_c0
+vst %v0,_unaligned
+lpswe quiesce_psw
+
+.align 8
+quiesce_psw:
+.quad 0x2,0xfff/* see is_special_wait_psw() */
+_c0:
+.quad 0x1006   /* lowcore protection, AFP, VX */
+
+.byte 0
+_unaligned:
+.octa 0
-- 
2.35.3




[PATCH v2 1/2] qapi: Add exit-failure PanicAction

2022-07-22 Thread Ilya Leoshkevich
Currently QEMU exits with code 0 on both panic and shutdown. For tests
it is useful to return 1 on panic, so that it counts as a test
failure.

Introduce a new exit-failure PanicAction that makes main() return
EXIT_FAILURE. Tests can use -action panic=exit-failure option to
activate this behavior.

Signed-off-by: Ilya Leoshkevich 
---
 include/sysemu/sysemu.h |  2 +-
 qapi/run-state.json |  4 +++-
 qemu-options.hx |  2 +-
 softmmu/main.c  |  6 --
 softmmu/runstate.c  | 17 +
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 812f66a31a..31aa45160b 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -103,7 +103,7 @@ void qemu_boot_set(const char *boot_order, Error **errp);
 bool defaults_enabled(void);
 
 void qemu_init(int argc, char **argv, char **envp);
-void qemu_main_loop(void);
+int qemu_main_loop(void);
 void qemu_cleanup(void);
 
 extern QemuOptsList qemu_legacy_drive_opts;
diff --git a/qapi/run-state.json b/qapi/run-state.json
index 6e2162d7b3..d42c370c4f 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -364,10 +364,12 @@
 #
 # @shutdown: Shutdown the VM and exit, according to the shutdown action
 #
+# @exit-failure: Shutdown the VM and exit with nonzero status
+#
 # Since: 6.0
 ##
 { 'enum': 'PanicAction',
-  'data': [ 'pause', 'shutdown', 'none' ] }
+  'data': [ 'pause', 'shutdown', 'exit-failure', 'none' ] }
 
 ##
 # @watchdog-set-action:
diff --git a/qemu-options.hx b/qemu-options.hx
index 79e00916a1..8e17c5064a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4239,7 +4239,7 @@ DEF("action", HAS_ARG, QEMU_OPTION_action,
 "   action when guest reboots [default=reset]\n"
 "-action shutdown=poweroff|pause\n"
 "   action when guest shuts down [default=poweroff]\n"
-"-action panic=pause|shutdown|none\n"
+"-action panic=pause|shutdown|exit-failure|none\n"
 "   action when guest panics [default=shutdown]\n"
 "-action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none\n"
 "   action when watchdog fires [default=reset]\n",
diff --git a/softmmu/main.c b/softmmu/main.c
index c00432ff09..1b675a8c03 100644
--- a/softmmu/main.c
+++ b/softmmu/main.c
@@ -32,11 +32,13 @@
 
 int qemu_main(int argc, char **argv, char **envp)
 {
+int status;
+
 qemu_init(argc, argv, envp);
-qemu_main_loop();
+status = qemu_main_loop();
 qemu_cleanup();
 
-return 0;
+return status;
 }
 
 #ifndef CONFIG_COCOA
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 168e1b78a0..1e68680b9d 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -482,7 +482,8 @@ void qemu_system_guest_panicked(GuestPanicInformation *info)
 qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE,
 !!info, info);
 vm_stop(RUN_STATE_GUEST_PANICKED);
-} else if (panic_action == PANIC_ACTION_SHUTDOWN) {
+} else if (panic_action == PANIC_ACTION_SHUTDOWN ||
+   panic_action == PANIC_ACTION_EXIT_FAILURE) {
 qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_POWEROFF,
!!info, info);
 vm_stop(RUN_STATE_GUEST_PANICKED);
@@ -662,7 +663,7 @@ void qemu_system_debug_request(void)
 qemu_notify_event();
 }
 
-static bool main_loop_should_exit(void)
+static bool main_loop_should_exit(int *status)
 {
 RunState r;
 ShutdownCause request;
@@ -680,6 +681,10 @@ static bool main_loop_should_exit(void)
 if (shutdown_action == SHUTDOWN_ACTION_PAUSE) {
 vm_stop(RUN_STATE_SHUTDOWN);
 } else {
+if (request == SHUTDOWN_CAUSE_GUEST_PANIC &&
+panic_action == PANIC_ACTION_EXIT_FAILURE) {
+*status = EXIT_FAILURE;
+}
 return true;
 }
 }
@@ -715,12 +720,14 @@ static bool main_loop_should_exit(void)
 return false;
 }
 
-void qemu_main_loop(void)
+int qemu_main_loop(void)
 {
+int status = EXIT_SUCCESS;
 #ifdef CONFIG_PROFILER
 int64_t ti;
 #endif
-while (!main_loop_should_exit()) {
+
+while (!main_loop_should_exit(&status)) {
 #ifdef CONFIG_PROFILER
 ti = profile_getclock();
 #endif
@@ -729,6 +736,8 @@ void qemu_main_loop(void)
 dev_time += profile_getclock() - ti;
 #endif
 }
+
+return status;
 }
 
 void qemu_add_exit_notifier(Notifier *notify)
-- 
2.35.3
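
The exit-status plumbing in the patch above can be modelled outside QEMU as a small sketch. The enums and function names below are simplified stand-ins invented for illustration, not QEMU's real definitions; the point is only that the loop-exit check receives an out-parameter and sets it to EXIT_FAILURE when a guest panic meets the exit-failure action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for QEMU's enums; names are illustrative only. */
enum sketch_panic_action {
    SKETCH_PANIC_PAUSE,
    SKETCH_PANIC_SHUTDOWN,
    SKETCH_PANIC_EXIT_FAILURE,
    SKETCH_PANIC_NONE
};

enum sketch_cause {
    SKETCH_CAUSE_NONE,
    SKETCH_CAUSE_GUEST_SHUTDOWN,
    SKETCH_CAUSE_GUEST_PANIC
};

/* Decide whether the loop should stop, and record the exit status. */
static bool sketch_should_exit(enum sketch_cause request,
                               enum sketch_panic_action action, int *status)
{
    assert(status != NULL);
    if (request == SKETCH_CAUSE_NONE) {
        return false;           /* nothing pending, keep running */
    }
    if (request == SKETCH_CAUSE_GUEST_PANIC &&
        action == SKETCH_PANIC_EXIT_FAILURE) {
        *status = EXIT_FAILURE; /* only this combination fails the run */
    }
    return true;
}

/* Walk a list of pending requests and return the process exit status. */
int sketch_main_loop(const enum sketch_cause *requests, size_t n,
                     enum sketch_panic_action action)
{
    int status = EXIT_SUCCESS;
    size_t i;

    for (i = 0; i < n; i++) {
        if (sketch_should_exit(requests[i], action, &status)) {
            break;
        }
    }
    return status;
}
```

With the plain shutdown action a panic still yields EXIT_SUCCESS in this model, which mirrors the behaviour the commit message describes as the status quo.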




[PATCH v2 0/2] accel/tcg: Test unaligned stores to s390x low-address-protected lowcore

2022-07-22 Thread Ilya Leoshkevich
Hi,

This is a follow-up series for [1].

The fix has been committed.

I asked Christian what might be a good alternative for the
mmio-debug-exit device for testing, and he suggested to look into
shutdown/panic actions.

Patch 1 adds a new panic action.
Patch 2 tests unaligned stores to s390x low-address-protected lowcore;
it performs a shutdown on success and panic on failure.

Best regards,
Ilya

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01876.html

Ilya Leoshkevich (2):
  qapi: Add exit-failure PanicAction
  tests/tcg/s390x: Test unaligned accesses to lowcore

 include/sysemu/sysemu.h |  2 +-
 qapi/run-state.json |  4 +++-
 qemu-options.hx |  2 +-
 softmmu/main.c  |  6 --
 softmmu/runstate.c  | 17 +
 tests/tcg/s390x/Makefile.softmmu-target |  9 +
 tests/tcg/s390x/unaligned-lowcore.S | 19 +++
 7 files changed, 50 insertions(+), 9 deletions(-)
 create mode 100644 tests/tcg/s390x/Makefile.softmmu-target
 create mode 100644 tests/tcg/s390x/unaligned-lowcore.S

-- 
2.35.3




Re: [PATCH for-7.2 00/10] add hmp 'save-fdt' and 'info fdt' commands

2022-07-22 Thread BALATON Zoltan

On Fri, 22 Jul 2022, Daniel Henrique Barboza wrote:

Hi,

After dealing with a FDT element that isn't being shown in the userspace
and having to shutdown the guest, dump the FDT using 'machine -dumpdtb' and
then using 'dtc' to see what was inside the FDT, I thought it was a good
idea to add extra support for FDT handling in QEMU.

This series introduces 2 commands. 'fdt-save' behaves similarly to what
'machine -dumpdtb' does, with the advantage of saving the FDT of a running
guest on demand. This command is implemented in patch 04.

The second command, 'info fdt ' is more sophisticated. This
command can print specific nodes and properties of the FDT. A few
examples:

- print the /cpus/cpu@0 from an ARM 'virt' machine:

(qemu) info fdt /cpus/cpu@0
/cpus/cpu@0 {
   phandle = <0x8001>
   reg = <0x0>
   compatible = 'arm,cortex-a57'
   device_type = 'cpu'
}
(qemu)

- print the device_type property of the interrupt-controller node of a
pSeries machine:

(qemu) info fdt /interrupt-controller/device_type
/interrupt-controller/device_type = 'PowerPC-External-Interrupt-Presentation'
(qemu)

Issuing 'info fdt /' will dump the whole FDT. 'info fdt' is implemented
in patches 5-10.

Both 'fdt-save' and 'info fdt' work across machines and archs based on
two premises: the FDT must be created using libfdt (which is the case of
all FDTs created with device_tree.c helpers and the _FDT macro) and the
FDT must be reachable via MachineState->fdt.

To meet the prerequisites for ARM machines, patch 1 makes a change in
arm_load_dtb(). Patches 2 and 3 make a similar change for two PowerPC
machines that weren't using machine->fdt.


There are some other machines that load a dtb with load_device_tree(). Do 
they need some patches too?


Regards,
BALATON Zoltan


Tests were done using the ARM machvirt machine and ppc64 pSeries
machine, but any machine that meets the aforementioned conditions will be
supported by these 2 new commands.


Daniel Henrique Barboza (10):
 hw/arm/boot.c: do not free machine->fdt in arm_load_dtb()
 hw/ppc/pegasos2.c: set machine->fdt in machine_reset()
 hw/ppc: set machine->fdt in spapr machine
 hmp, device_tree.c: introduce fdt-save
 hmp, device_tree.c: introduce 'info fdt' command
 device_tree.c: support printing of strings props
 device_tree.c: support remaining FDT prop types
 device_node.c: enable 'info fdt' to print subnodes
 device_tree.c: add fdt_print_property() helper
 hmp, device_tree.c: add 'info fdt ' support

hmp-commands-info.hx |  13 +++
hmp-commands.hx  |  13 +++
hw/arm/boot.c|   3 +-
hw/ppc/pegasos2.c|   3 +
hw/ppc/spapr.c   |   3 +
hw/ppc/spapr_hcall.c |   3 +
include/sysemu/device_tree.h |   3 +
monitor/misc.c   |  25 
softmmu/device_tree.c| 219 +++
9 files changed, 284 insertions(+), 1 deletion(-)






Re: [PATCH for-7.2 04/10] hmp, device_tree.c: introduce fdt-save

2022-07-22 Thread BALATON Zoltan

On Fri, 22 Jul 2022, Daniel Henrique Barboza wrote:

To save the FDT blob we have the '-machine dumpdtb=' property. With this
property set, the machine saves the FDT in  and exit. The created
file can then be converted to plain text dts format using 'dtc'.

There's nothing particularly sophisticated about saving the FDT that
can't be done with the machine in any state, as long as the machine has
a valid FDT to be saved.

The 'fdt-save' command receives a 'filename' parameter and, if a valid
FDT is available, it'll save it in a file 'filename'. In short, this is
a '-machine dumpdtb' that can be fired on demand via HMP.


If it does the same as -machine dumpdtb wouldn't it be more intuitive to 
call the HMP command the same, so either dumpdtb or machine-dumpdtb or 
similar? That way it's more obvious that these do the same.


Regards,
BALATON Zoltan


A valid FDT consists of a FDT that was created using libfdt being
retrieved via 'current_machine->fdt' in device_tree.c. This condition is
met by most FDT users in QEMU.

Cc: Dr. David Alan Gilbert 
Signed-off-by: Daniel Henrique Barboza 
---
hmp-commands.hx  | 13 +
include/sysemu/device_tree.h |  2 ++
monitor/misc.c   | 13 +
softmmu/device_tree.c| 18 ++
4 files changed, 46 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index c9d465735a..3c134cf652 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1768,3 +1768,16 @@ ERST
  "\n\t\t\t -b to specify dirty bitmap as method of 
calculation)",
.cmd= hmp_calc_dirty_rate,
},
+
+{
+.name   = "fdt-save",
+.args_type  = "filename:s",
+.params = "[filename] file to save the FDT",
+.help   = "save the FDT in the 'filename' file to be decoded using 
dtc",
+.cmd= hmp_fdt_save,
+},
+
+SRST
+``fdt-save`` *filename*
+  Save the FDT in the 'filename' file to be decoded using dtc
+ERST
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index ef060a9759..1397adb21c 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -123,6 +123,8 @@ int qemu_fdt_nop_node(void *fdt, const char *node_path);
int qemu_fdt_add_subnode(void *fdt, const char *name);
int qemu_fdt_add_path(void *fdt, const char *path);

+void fdt_save(const char *filename, Error **errp);
+
#define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
do {  \
uint32_t qdt_tmp[] = { __VA_ARGS__ }; \
diff --git a/monitor/misc.c b/monitor/misc.c
index 3d2312ba8d..145285cec0 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -78,6 +78,7 @@
#include "qapi/qmp-event.h"
#include "sysemu/cpus.h"
#include "qemu/cutils.h"
+#include "sysemu/device_tree.h"

#if defined(TARGET_S390X)
#include "hw/s390x/storage-keys.h"
@@ -936,6 +937,18 @@ static void hmp_boot_set(Monitor *mon, const QDict *qdict)
}
}

+static void hmp_fdt_save(Monitor *mon, const QDict *qdict)
+{
+const char *path = qdict_get_str(qdict, "filename");
+Error *local_err = NULL;
+
+fdt_save(path, &local_err);
+
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
{
bool flatview = qdict_get_try_bool(qdict, "flatview", false);
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 6ca3fad285..eeab6a5ef0 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -643,3 +643,21 @@ out:
g_free(propcells);
return ret;
}
+
+void fdt_save(const char *filename, Error **errp)
+{
+int size;
+
+if (!current_machine->fdt) {
+error_setg(errp, "Unable to find the machine FDT");
+return;
+}
+
+size = fdt_totalsize(current_machine->fdt);
+
+if (g_file_set_contents(filename, current_machine->fdt, size, NULL)) {
+return;
+}
+
+error_setg(errp, "Error when saving machine FDT to file %s", filename);
+}
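
As a rough illustration of what saving an FDT blob involves, here is a standalone sketch that does not use libfdt or GLib. It relies only on the flattened devicetree header layout (big-endian magic 0xd00dfeed at offset 0, big-endian totalsize at offset 4). The sketch_* names are invented for this example and are not part of QEMU or libfdt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_FDT_MAGIC 0xd00dfeedu

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t sketch_load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return the header's totalsize, or 0 if the blob does not look like an FDT. */
uint32_t sketch_fdt_totalsize(const uint8_t *blob, size_t len)
{
    if (blob == NULL || len < 8 ||
        sketch_load_be32(blob) != SKETCH_FDT_MAGIC) {
        return 0;
    }
    return sketch_load_be32(blob + 4);
}

/* Write the blob to 'filename'; returns 0 on success, -1 on failure. */
int sketch_fdt_save(const char *filename, const uint8_t *blob, size_t len)
{
    uint32_t size = sketch_fdt_totalsize(blob, len);
    FILE *f;
    int ok;

    if (size == 0 || size > len) {
        return -1;              /* not an FDT, or header claims too much */
    }
    f = fopen(filename, "wb");
    if (f == NULL) {
        return -1;
    }
    ok = fwrite(blob, 1, size, f) == size;
    if (fclose(f) != 0) {
        ok = 0;                 /* a failed close can hide a write error */
    }
    return ok ? 0 : -1;
}
```

The resulting file can be decoded with 'dtc' in the same way as the output of '-machine dumpdtb'.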





Re: [PATCH for-7.2 02/10] hw/ppc/pegasos2.c: set machine->fdt in machine_reset()

2022-07-22 Thread BALATON Zoltan

On Fri, 22 Jul 2022, Daniel Henrique Barboza wrote:

We'll introduce HMP commands that require machine->fdt to be set
properly.

Cc: BALATON Zoltan 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
hw/ppc/pegasos2.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 61f4263953..9827c3b4c2 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -329,6 +329,9 @@ static void pegasos2_machine_reset(MachineState *machine)
g_free(pm->fdt_blob);
pm->fdt_blob = fdt;

+/* Set common MachineState->fdt */
+machine->fdt = fdt;
+


Again, comment just states what the next line does but does not explain 
why. Either add a comment that explains why it's set or drop the trivial 
comment. Otherwise,


Acked-by: BALATON Zoltan 


vof_build_dt(fdt, pm->vof);
vof_client_open_store(fdt, pm->vof, "/chosen", "stdout", "/failsafe");
pm->cpu->vhyp = PPC_VIRTUAL_HYPERVISOR(machine);





Re: [PATCH for-7.2 01/10] hw/arm/boot.c: do not free machine->fdt in arm_load_dtb()

2022-07-22 Thread BALATON Zoltan

On Fri, 22 Jul 2022, Daniel Henrique Barboza wrote:

At this moment, arm_load_dtb() can free machine->fdt when
binfo->dtb_filename is NULL. If there's no 'dtb_filename', 'fdt' will be
retrieved by binfo->get_dtb(). If get_dtb() returns machine->fdt, as is
the case of machvirt_dtb() from hw/arm/virt.c, fdt now has a pointer to
machine->fdt. And, in that case, the existing g_free(fdt) at the end of
arm_load_dtb() will make machine->fdt point to an invalid memory region.

This is not an issue right now because there's no code that accesses
machine->fdt after arm_load_dtb(), but we're going to add a couple of
FDT HMP commands that will rely on machine->fdt being valid.

Instead of freeing 'fdt' at the end of arm_load_dtb(), assign it to
machine->fdt. This will allow the FDT of ARM machines that rely on
arm_load_dtb() to be accessed later on.

Since all ARM machines allocate the FDT only once, we don't need to
worry about leaking the existing FDT during a machine reset (which is
something that other machines have to look after, e.g. the ppc64 pSeries
machine).

Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
hw/arm/boot.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index ada2717f76..1d9c6047b1 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -684,7 +684,8 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 */
rom_add_blob_fixed_as("dtb", fdt, size, addr, as);

-g_free(fdt);
+/* Update ms->fdt pointer */
+ms->fdt = fdt;


Not sure this comment is useful as it just states what the assignment does 
so provides no further info.


Regards,
BALATON Zoltan



return size;






Re: [RFC 0/3] add snapshot/restore fuzzing device

2022-07-22 Thread Claudio Fontana
Hi Richard,

On 7/22/22 21:20, Richard Liu wrote:
> This RFC adds a virtual device for snapshot/restores within QEMU. I am working
> on this as a part of QEMU Google Summer of Code 2022. Fast snapshot/restores
> within QEMU is helpful for code fuzzing.
> 
> I reused the migration code for saving and restoring virtual device and CPU
> state. As for the RAM, I am using a simple COW mmaped file to do restores.
> 
> The loadvm migration function I used for doing restores only worked after I
> called it from a qemu_bh. I'm not sure if I should run the migration code in a
> separate thread (see patch 3), since currently it is running as a part of the
> device code in the vCPU thread.
> 
> This is a rough first revision and feedback on the cpu and device state
> restores is appreciated.

As I understand it, usually the save and restore of VM state in QEMU can best be
managed by libvirt APIs, for example using the libvirt command line tool virsh:

$ virsh save (or managedsave)

$ virsh restore (or start)

These commands start a QEMU migration using the QMP protocol to a file descriptor
previously opened by libvirt to contain the state file.

(getfd QMP command):
https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-2811

(migrate QMP command):
https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1947

This is unfortunately currently very slow.

Maybe you could help think through, or help implement, a solution?
I tried to push an approach that only involves libvirt, using the existing
QEMU multifd migration to a socket:

https://listman.redhat.com/archives/libvir-list/2022-June/232252.html

Performance is very good compared with what is possible today, but it won't be
upstreamable because it is not deemed optimal, and libvirt wants the code to be
in QEMU.

What about helping to think through what the QEMU-based solution could look like?

The requirements for now in my view seem to be:

* avoid kernel page cache thrashing for large transfers,
  which in my view currently requires changing QEMU to be able to migrate a
  stream to an fd that is opened with O_DIRECT.
  In practice this means somehow making all QEMU migration stream writes
  block-friendly (adding some buffering?).

* allow concurrent parallel transfers,
  to be able to use extra CPU resources to speed up the transfer if such
  resources are available.

* be able to transfer multiple GB/s with modern NVMe devices for super fast
  VM state save and restore (a few seconds even for a 30GB VM),
  and do no worse than the prototype fully implemented in libvirt,
  otherwise it would not make sense to implement it in QEMU.
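
To make the "block-friendly writes" requirement a bit more concrete, here is one possible sketch, under assumptions that are mine and not from the thread: the block size is a power of two (4096 in the usage below), and short writes are zero-padded up to a full block in an aligned bounce buffer. The function names are hypothetical, and a real migration stream would also need a way to record the unpadded length.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Round len up to the next multiple of block (a power of two). */
size_t block_round_up(size_t len, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return (len + block - 1) & ~(block - 1);
}

/* Open a state file, bypassing the page cache where O_DIRECT exists. */
int open_state_file(const char *path)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
#ifdef O_DIRECT
    flags |= O_DIRECT;
#endif
    return open(path, flags, 0600);
}

/*
 * Copy 'len' bytes into a block-aligned buffer, zero-pad up to a whole
 * number of blocks, and write the result. Returns the number of bytes
 * written (padding included) or -1 on error.
 */
ssize_t write_padded_block(int fd, const void *data, size_t len, size_t block)
{
    size_t padded = block_round_up(len, block);
    void *buf = NULL;
    ssize_t ret;

    if (padded == 0 || posix_memalign(&buf, block, padded) != 0) {
        return -1;
    }
    memcpy(buf, data, len);
    memset((char *)buf + len, 0, padded - len);
    ret = write(fd, buf, padded);
    free(buf);
    return ret;
}
```

For example, write_padded_block(fd, "abc", 3, 4096) issues a single aligned 4096-byte write, which is the shape of I/O that O_DIRECT expects.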

What do you think?

Ciao,

Claudio

> 
> To test locally, boot up any linux distro. I used the following C file to
> interact with the PCI snapshot device:
> 
> #include <fcntl.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <sys/mman.h>
> 
> int main(void) {
>     /* BAR 0 of the PCI snapshot device, exposed through sysfs */
>     int fd = open("/sys/bus/pci/devices/:00:04.0/resource0",
>                   O_RDWR | O_SYNC);
>     if (fd < 0) {
>         perror("open");
>         return 1;
>     }
> 
>     size_t size = 1024 * 1024;
>     uint32_t *memory = mmap(NULL, size, PROT_READ | PROT_WRITE,
>                             MAP_SHARED, fd, 0);
>     if (memory == MAP_FAILED) {
>         perror("mmap");
>         return 1;
>     }
> 
>     printf("%x\n", memory[0]);
> 
>     int a = 0;
>     memory[0] = 0x101; // save snapshot
>     printf("before: value of a = %d\n", a);
>     a = 1;
>     printf("middle: value of a = %d\n", a);
>     memory[0] = 0x102; // load snapshot
>     printf("after: value of a = %d\n", a);
> 
>     return 0;
> }
> 
> Richard Liu (3):
>   create skeleton snapshot device and add docs
>   implement ram save/restore
>   use migration code for cpu and device save/restore
> 
>  docs/devel/snapshot.rst |  26 +++
>  hw/i386/Kconfig |   1 +
>  hw/misc/Kconfig |   3 +
>  hw/misc/meson.build |   1 +
>  hw/misc/snapshot.c  | 164 
>  migration/savevm.c  |  84 
>  migration/savevm.h  |   3 +
>  7 files changed, 282 insertions(+)
>  create mode 100644 docs/devel/snapshot.rst
>  create mode 100644 hw/misc/snapshot.c
> 




[PATCH for-7.2 00/10] add hmp 'save-fdt' and 'info fdt' commands

2022-07-22 Thread Daniel Henrique Barboza
Hi,

After dealing with a FDT element that isn't being shown in the userspace
and having to shutdown the guest, dump the FDT using 'machine -dumpdtb' and
then using 'dtc' to see what was inside the FDT, I thought it was a good
idea to add extra support for FDT handling in QEMU.

This series introduces 2 commands. 'fdt-save' behaves similarly to what
'machine -dumpdtb' does, with the advantage of saving the FDT of a running
guest on demand. This command is implemented in patch 04.

The second command, 'info fdt ' is more sophisticated. This
command can print specific nodes and properties of the FDT. A few
examples:

- print the /cpus/cpu@0 from an ARM 'virt' machine:

(qemu) info fdt /cpus/cpu@0
/cpus/cpu@0 {
phandle = <0x8001>
reg = <0x0>
compatible = 'arm,cortex-a57'
device_type = 'cpu'
}
(qemu) 

- print the device_type property of the interrupt-controller node of a
pSeries machine:

(qemu) info fdt /interrupt-controller/device_type
/interrupt-controller/device_type = 'PowerPC-External-Interrupt-Presentation'
(qemu) 

Issuing 'info fdt /' will dump the whole FDT. 'info fdt' is implemented
in patches 5-10.

Both 'fdt-save' and 'info fdt' work across machines and archs based on
two premises: the FDT must be created using libfdt (which is the case of
all FDTs created with device_tree.c helpers and the _FDT macro) and the
FDT must be reachable via MachineState->fdt.

To meet the prerequisites for ARM machines, patch 1 makes a change in
arm_load_dtb(). Patches 2 and 3 make a similar change for two PowerPC
machines that weren't using machine->fdt.

Tests were done using the ARM machvirt machine and ppc64 pSeries
machine, but any machine that meets the aforementioned conditions will be
supported by these 2 new commands.


Daniel Henrique Barboza (10):
  hw/arm/boot.c: do not free machine->fdt in arm_load_dtb()
  hw/ppc/pegasos2.c: set machine->fdt in machine_reset()
  hw/ppc: set machine->fdt in spapr machine
  hmp, device_tree.c: introduce fdt-save
  hmp, device_tree.c: introduce 'info fdt' command
  device_tree.c: support printing of strings props
  device_tree.c: support remaining FDT prop types
  device_node.c: enable 'info fdt' to print subnodes
  device_tree.c: add fdt_print_property() helper
  hmp, device_tree.c: add 'info fdt ' support

 hmp-commands-info.hx |  13 +++
 hmp-commands.hx  |  13 +++
 hw/arm/boot.c|   3 +-
 hw/ppc/pegasos2.c|   3 +
 hw/ppc/spapr.c   |   3 +
 hw/ppc/spapr_hcall.c |   3 +
 include/sysemu/device_tree.h |   3 +
 monitor/misc.c   |  25 
 softmmu/device_tree.c| 219 +++
 9 files changed, 284 insertions(+), 1 deletion(-)

-- 
2.36.1




[PATCH for-7.2 10/10] hmp, device_tree.c: add 'info fdt ' support

2022-07-22 Thread Daniel Henrique Barboza
'info fdt' is only able to print full nodes so far. It would be good to
be able to also print single properties, since sometimes we just want
to verify a single value from the FDT.

libfdt does not have support to find a property given its full path, but
it does have a way to return a fdt_property given a prop name and its
subnode.

This is how we're going to support it:

- given the same fullpath parameter, assume it's a node. If we have a
match with an existing node, print it. If not, assume it's a property;

- in fdt_find_property() we're going to split 'fullpath' into node and
property. Unfortunately we can't use g_path_get_basename() to help us
because, although the device tree path format is similar to Linux, it'll
not work when trying to run QEMU under Windows where the path format is
different;

- after splitting into node + property, try to find the node in the FDT.
If we have a match, use fdt_get_property() to retrieve fdt_property.
Return it if found;

- using the fdt_print_property() created previously, print the property.

After this change, if a user wants to print just the value of 'cpu' inside
/cpu/cpu-map(...) from an ARM FDT, they can do it:

(qemu) info fdt /cpus/cpu-map/socket0/cluster0/core0/cpu
/cpus/cpu-map/socket0/cluster0/core0/cpu = <0x8001>
(qemu)

Or the 'ibm,my-dma-window' from the v-scsi device inside the pSeries
FDT:

(qemu) info fdt /vdevice/v-scsi@7103/ibm,my-dma-window
/vdevice/v-scsi@7103/ibm,my-dma-window = <0x7103 0x0 0x0 0x0 0x1000>
(qemu)
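
The node/property split described in the steps above can be sketched in plain C without GLib or libfdt. This is not the patch's implementation (which walks the string backwards and uses a GString); it is an equivalent illustration built on strrchr(), and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_fdt_path {
    int ok;             /* 1 if the split succeeded, 0 otherwise */
    size_t node_len;    /* leading bytes of fullpath that name the node */
    const char *prop;   /* property name, points into fullpath */
};

/*
 * Split "/node/subnode/prop" at the last '/'. A property of the root
 * node ("/prop") yields node_len == 1, i.e. the node is "/".
 */
struct sketch_fdt_path sketch_split_fdt_path(const char *fullpath)
{
    struct sketch_fdt_path out = { 0, 0, NULL };
    const char *slash;

    assert(fullpath != NULL);
    slash = strrchr(fullpath, '/');
    if (slash == NULL || slash[1] == '\0') {
        return out;     /* no '/' at all, or path ends with '/' */
    }
    out.ok = 1;
    out.node_len = (slash == fullpath) ? 1 : (size_t)(slash - fullpath);
    out.prop = slash + 1;
    return out;
}
```

A caller could print the node part with printf("%.*s", (int)parts.node_len, fullpath) and then look the property up inside that node, as the commit message describes.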

Cc: Dr. David Alan Gilbert 
Signed-off-by: Daniel Henrique Barboza 
---
 hmp-commands-info.hx  |  2 +-
 softmmu/device_tree.c | 79 ---
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index abf277be7d..8891c2918a 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -913,7 +913,7 @@ ERST
 .name   = "fdt",
 .args_type  = "fullpath:s",
 .params = "fullpath",
-.help   = "show firmware device tree node given its full path",
+.help   = "show firmware device tree node or property given its 
full path",
 .cmd= hmp_info_fdt,
 },
 
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index e41894fbef..f6eb060acc 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -774,9 +774,74 @@ static void fdt_print_node(int node, int depth, const char 
*fullpath)
 qemu_printf("%*s}\n", padding, "");
 }
 
+static const struct fdt_property *fdt_find_property(const char *fullpath,
+int *prop_size,
+Error **errp)
+{
+const struct fdt_property *prop = NULL;
+void *fdt = current_machine->fdt;
+g_autoptr(GString) nodename = NULL;
+const char *propname = NULL;
+int path_len = strlen(fullpath);
+int node = 0; /* default to root node '/' */
+int i, idx = -1;
+
+/*
+ * We'll assume that we're dealing with a property. libfdt
+ * does not have an API to find a property given the full
+ * path, but it does have an API to find a property inside
+ * a node.
+ */
+nodename = g_string_new("");
+
+for (i = path_len - 1; i >= 0; i--) {
+if (fullpath[i] == '/') {
+idx = i;
+break;
+}
+}
+
+if (idx == -1) {
+error_setg(errp, "FDT paths must contain at least one '/' character");
+return NULL;
+}
+
+if (idx == path_len - 1) {
+error_setg(errp, "FDT paths can't end with a '/' character");
+return NULL;
+}
+
+propname = &fullpath[idx + 1];
+
+if (idx != 0) {
+g_string_append_len(nodename, fullpath, idx);
+
+node = fdt_path_offset(fdt, nodename->str);
+if (node < 0) {
+error_setg(errp, "node '%s' of property '%s' not found in FDT",
+   nodename->str, propname);
+return NULL;
+}
+} else {
+/* idx = 0 means that it's a property of the root node */
+g_string_append(nodename, "/");
+}
+
+prop = fdt_get_property(fdt, node, propname, prop_size);
+if (!prop) {
+error_setg(errp, "property '%s' not found in node '%s' in FDT",
+   propname, nodename->str);
+return NULL;
+}
+
+return prop;
+}
+
 void fdt_info(const char *fullpath, Error **errp)
 {
-int node;
+const struct fdt_property *prop = NULL;
+Error *local_err = NULL;
+int node, prop_size;
 
 if (!current_machine->fdt) {
 error_setg(errp, "Unable to find the machine FDT");
@@ -784,10 +849,16 @@ void fdt_info(const char *fullpath, Error **errp)
 }
 
 node = fdt_path_offset(current_machine->fdt, fullpath);
-if (node < 0) {
-error_setg(errp, "node '%s' not found in FDT", fullpath);
+if (node >= 0) {
+fdt_print_node(node, 0, fullpath);
+return;
+  

[PATCH for-7.2 07/10] device_tree.c: support remaining FDT prop types

2022-07-22 Thread Daniel Henrique Barboza
When printing a blob with 'dtc' using the '-O dts' option there are 3
distinct data types being printed: strings, arrays of uint32s and
regular byte arrays.

The previous patch added support for printing strings. Let's add the
remaining formats. We want to resemble the format that 'dtc -O dts' uses,
so every uint32 array uses angle brackets (<>), and every regular byte
array uses square brackets ([]). For properties that have no value we keep
printing just the name.
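
To make the two new formats concrete, here is a minimal standalone
sketch. It is not the patch code: format_prop_value() and append() are
hypothetical names, the output goes to a caller-supplied buffer instead
of qemu_printf(), and the big-endian cells are decoded by hand rather
than with fdt32_to_cpu(). The string check from the previous patch is
assumed to have run already.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append 's' to the NUL-terminated buffer 'out', staying within 'cap'. */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);

    if (len < cap) {
        snprintf(out + len, cap - len, "%s", s);
    }
}

/*
 * Hypothetical helper: format raw property bytes in a dtc-like style.
 * Sizes that are a multiple of 4 are shown as big-endian uint32 cells
 * in <...>, anything else as individual bytes in [...].
 */
static void format_prop_value(char *out, size_t cap,
                              const uint8_t *data, int size)
{
    char tmp[16];
    int i;

    assert(out != NULL);

    if (cap == 0) {
        return;
    }
    out[0] = '\0';

    if (size <= 0) {
        return;     /* empty property: only the name is printed */
    }

    if (size % 4 == 0) {
        append(out, cap, "<");
        for (i = 0; i < size; i += 4) {
            /* FDT cells are stored big-endian */
            uint32_t cell = ((uint32_t)data[i] << 24) |
                            ((uint32_t)data[i + 1] << 16) |
                            ((uint32_t)data[i + 2] << 8) |
                            (uint32_t)data[i + 3];

            snprintf(tmp, sizeof(tmp), "%s0x%" PRIx32, i ? " " : "", cell);
            append(out, cap, tmp);
        }
        append(out, cap, ">");
    } else {
        append(out, cap, "[");
        for (i = 0; i < size; i++) {
            snprintf(tmp, sizeof(tmp), "%s%x", i ? " " : "",
                     (unsigned)data[i]);
            append(out, cap, tmp);
        }
        append(out, cap, "]");
    }
}
```

The sketch reads the data as unsigned bytes, so values of 0x80 and above
are formatted as two hex digits.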

The /chosen FDT node from the pSeries machine gives an example of all
property types 'info fdt' is now able to display:

(qemu) info fdt /chosen
chosen {
ibm,architecture-vec-5 = [0 0]
rng-seed = <0x5967a270 0x62b0fb4f 0x8262b46a 0xabf48423 0xcce9615 0xf9daae64 0x66564790 0x357d1604>
ibm,arch-vec-5-platform-support = <0x178018c0 0x19001a40>
linux,pci-probe-only = <0x0>
stdout-path = '/vdevice/vty@7100'
linux,stdout-path = '/vdevice/vty@7100'
qemu,graphic-depth = <0x20>
qemu,graphic-height = <0x258>
qemu,graphic-width = <0x320>
}

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 53 +--
 1 file changed, 51 insertions(+), 2 deletions(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 3c070acc0d..3a4d09483b 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -681,6 +681,46 @@ static bool fdt_prop_is_string(const void *data, int size)
 return true;
 }
 
+static bool fdt_prop_is_uint32_array(int size)
+{
+return size % 4 == 0;
+}
+
+static void fdt_prop_print_uint32_array(const char *propname, const void *data,
+int prop_size, int padding)
+{
+const fdt32_t *array = data;
+int array_len = prop_size / 4;
+int i;
+
+qemu_printf("%*s%s = <", padding, "", propname);
+for (i = 0; i < array_len; i++) {
+qemu_printf("0x%" PRIx32, fdt32_to_cpu(array[i]));
+
+if (i < array_len - 1) {
+qemu_printf(" ");
+}
+}
+qemu_printf(">\n");
+}
+
+static void fdt_prop_print_val(const char *propname, const void *data,
+   int prop_size, int padding)
+{
+const char *val = data;
+int i;
+
+qemu_printf("%*s%s = [", padding, "", propname);
+for (i = 0; i < prop_size; i++) {
+qemu_printf("%x", val[i]);
+
+if (i < prop_size - 1) {
+qemu_printf(" ");
+}
+}
+qemu_printf("]\n");
+}
+
 static void fdt_print_node(int node, int depth)
 {
 const struct fdt_property *prop = NULL;
@@ -698,10 +738,19 @@ static void fdt_print_node(int node, int depth)
 prop = fdt_get_property_by_offset(fdt, property, &prop_size);
 propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
 
+if (prop_size == 0) {
+qemu_printf("%*s%s;\n", padding, "", propname);
+continue;
+}
+
 if (fdt_prop_is_string(prop->data, prop_size)) {
-qemu_printf("%*s%s = '%s'\n", padding, "", propname, prop->data);
+qemu_printf("%*s%s = '%s'\n", padding, "",
+propname, (char *)prop->data);
+} else if (fdt_prop_is_uint32_array(prop_size)) {
+fdt_prop_print_uint32_array(propname, prop->data, prop_size,
+padding);
 } else {
-qemu_printf("%*s%s;\n", padding, "", propname);
+fdt_prop_print_val(propname, prop->data, prop_size, padding);
 }
 }
 
-- 
2.36.1




[PATCH for-7.2 09/10] device_tree.c: add fdt_print_property() helper

2022-07-22 Thread Daniel Henrique Barboza
We want to be able to also print properties with 'info fdt'.

Create a helper to print properties based on the already existing code
from fdt_print_node().

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 88b6a0c902..e41894fbef 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -721,6 +721,23 @@ static void fdt_prop_print_val(const char *propname, const void *data,
 qemu_printf("]\n");
 }
 
+static void fdt_print_property(const char *propname, const void *data,
+   int prop_size, int padding)
+{
+if (prop_size == 0) {
+qemu_printf("%*s%s;\n", padding, "", propname);
+return;
+}
+
+if (fdt_prop_is_string(data, prop_size)) {
+qemu_printf("%*s%s = '%s'\n", padding, "", propname, (char *)data);
+} else if (fdt_prop_is_uint32_array(prop_size)) {
+fdt_prop_print_uint32_array(propname, data, prop_size, padding);
+} else {
+fdt_prop_print_val(propname, data, prop_size, padding);
+}
+}
+
 static void fdt_print_node(int node, int depth, const char *fullpath)
 {
 const struct fdt_property *prop = NULL;
@@ -746,20 +763,7 @@ static void fdt_print_node(int node, int depth, const char *fullpath)
 prop = fdt_get_property_by_offset(fdt, property, &prop_size);
 propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
 
-if (prop_size == 0) {
-qemu_printf("%*s%s;\n", padding, "", propname);
-continue;
-}
-
-if (fdt_prop_is_string(prop->data, prop_size)) {
-qemu_printf("%*s%s = '%s'\n", padding, "",
-propname, (char *)prop->data);
-} else if (fdt_prop_is_uint32_array(prop_size)) {
-fdt_prop_print_uint32_array(propname, prop->data, prop_size,
-padding);
-} else {
-fdt_prop_print_val(propname, prop->data, prop_size, padding);
-}
+fdt_print_property(propname, prop->data, prop_size, padding);
 }
 
 fdt_for_each_subnode(node, fdt, parent) {
-- 
2.36.1




driver type raw-xz supports discard=unmap?

2022-07-22 Thread Chris Murphy
Is this valid?

`


`
`/>
`

I know type="raw" works fine, I'm wondering if there'd be any problem with type 
"raw-xz" combined with discards?

Thanks,

Chris Murphy

[PATCH for-7.2 02/10] hw/ppc/pegasos2.c: set machine->fdt in machine_reset()

2022-07-22 Thread Daniel Henrique Barboza
We'll introduce HMP commands that require machine->fdt to be set
properly.

Cc: BALATON Zoltan 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/pegasos2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 61f4263953..9827c3b4c2 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -329,6 +329,9 @@ static void pegasos2_machine_reset(MachineState *machine)
 g_free(pm->fdt_blob);
 pm->fdt_blob = fdt;
 
+/* Set common MachineState->fdt */
+machine->fdt = fdt;
+
 vof_build_dt(fdt, pm->vof);
 vof_client_open_store(fdt, pm->vof, "/chosen", "stdout", "/failsafe");
 pm->cpu->vhyp = PPC_VIRTUAL_HYPERVISOR(machine);
-- 
2.36.1




[PATCH for-7.2 04/10] hmp, device_tree.c: introduce fdt-save

2022-07-22 Thread Daniel Henrique Barboza
To save the FDT blob we have the '-machine dumpdtb=' property. With this
property set, the machine saves the FDT to the given file and exits. The
created file can then be converted to plain text dts format using 'dtc'.

There's nothing particularly sophisticated about saving the FDT that
can't be done with the machine in any state, as long as the machine has
a valid FDT to be saved.

The 'fdt-save' command receives a 'filename' parameter and, if a valid
FDT is available, saves it in the file 'filename'. In short, this is
a '-machine dumpdtb' that can be fired on demand via HMP.

A valid FDT is an FDT that was created using libfdt and can be
retrieved via 'current_machine->fdt' in device_tree.c. This condition is
met by most FDT users in QEMU.
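
For readers without a QEMU tree at hand, the save step can be sketched
with plain stdio. This is only an approximation under stated
assumptions: the patch itself uses fdt_totalsize() and
g_file_set_contents(), while this sketch reads the big-endian
'totalsize' field directly from the FDT header (magic 0xd00dfeed at
offset 0, totalsize at offset 4). save_fdt_blob() is a hypothetical
name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define FDT_MAGIC_VALUE 0xd00dfeedu

/* Read a big-endian 32-bit value. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: write a flattened device tree blob to 'filename'.
 * Returns 0 on success, -1 if the blob has no valid FDT magic or the
 * file could not be written completely.
 */
static int save_fdt_blob(const char *filename, const uint8_t *blob)
{
    uint32_t size;
    FILE *f;
    int ok;

    assert(filename != NULL);

    if (blob == NULL || be32(blob) != FDT_MAGIC_VALUE) {
        return -1;
    }

    size = be32(blob + 4);      /* 'totalsize' header field */

    f = fopen(filename, "wb");
    if (f == NULL) {
        return -1;
    }

    ok = fwrite(blob, 1, size, f) == size;
    if (fclose(f) != 0) {
        ok = 0;
    }

    return ok ? 0 : -1;
}
```

The resulting file can be decoded with 'dtc -I dtb -O dts', the same way
as a file produced by '-machine dumpdtb'.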

Cc: Dr. David Alan Gilbert 
Signed-off-by: Daniel Henrique Barboza 
---
 hmp-commands.hx  | 13 +
 include/sysemu/device_tree.h |  2 ++
 monitor/misc.c   | 13 +
 softmmu/device_tree.c| 18 ++
 4 files changed, 46 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index c9d465735a..3c134cf652 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1768,3 +1768,16 @@ ERST
   "\n\t\t\t -b to specify dirty bitmap as method of 
calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+{
+.name   = "fdt-save",
+.args_type  = "filename:s",
+.params = "[filename] file to save the FDT",
+.help   = "save the FDT in the 'filename' file to be decoded using dtc",
+.cmd= hmp_fdt_save,
+},
+
+SRST
+``fdt-save`` *filename*
+  Save the FDT in the 'filename' file to be decoded using dtc
+ERST
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index ef060a9759..1397adb21c 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -123,6 +123,8 @@ int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
 int qemu_fdt_add_path(void *fdt, const char *path);
 
+void fdt_save(const char *filename, Error **errp);
+
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
 uint32_t qdt_tmp[] = { __VA_ARGS__ }; \
diff --git a/monitor/misc.c b/monitor/misc.c
index 3d2312ba8d..145285cec0 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -78,6 +78,7 @@
 #include "qapi/qmp-event.h"
 #include "sysemu/cpus.h"
 #include "qemu/cutils.h"
+#include "sysemu/device_tree.h"
 
 #if defined(TARGET_S390X)
 #include "hw/s390x/storage-keys.h"
@@ -936,6 +937,18 @@ static void hmp_boot_set(Monitor *mon, const QDict *qdict)
 }
 }
 
+static void hmp_fdt_save(Monitor *mon, const QDict *qdict)
+{
+const char *path = qdict_get_str(qdict, "filename");
+Error *local_err = NULL;
+
+fdt_save(path, &local_err);
+
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
 static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 {
 bool flatview = qdict_get_try_bool(qdict, "flatview", false);
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 6ca3fad285..eeab6a5ef0 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -643,3 +643,21 @@ out:
 g_free(propcells);
 return ret;
 }
+
+void fdt_save(const char *filename, Error **errp)
+{
+int size;
+
+if (!current_machine->fdt) {
+error_setg(errp, "Unable to find the machine FDT");
+return;
+}
+
+size = fdt_totalsize(current_machine->fdt);
+
+if (g_file_set_contents(filename, current_machine->fdt, size, NULL)) {
+return;
+}
+
+error_setg(errp, "Error when saving machine FDT to file %s", filename);
+}
-- 
2.36.1




[PATCH for-7.2 06/10] device_tree.c: support printing of strings props

2022-07-22 Thread Daniel Henrique Barboza
To support printing string properties in 'info fdt' we need to determine
whether the untyped (void *) property data might contain a string.

We do that by casting the void data to a char array and checking that:

- the array ends with a null character
- all other characters are printable

If both conditions are met, we'll consider it to be a string data type
and print it accordingly. After this change, 'info fdt' is now able to
print string properties. Here's an example with the ARM 'virt' machine:

(qemu) info fdt /chosen
chosen {
stdout-path = '/pl011@900'
rng-seed;
kaslr-seed;
}

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 899c239c5c..3c070acc0d 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -663,6 +663,24 @@ void fdt_save(const char *filename, Error **errp)
 error_setg(errp, "Error when saving machine FDT to file %s", filename);
 }
 
+static bool fdt_prop_is_string(const void *data, int size)
+{
+const char *str = data;
+int i;
+
+if (size <= 0 || str[size - 1] != '\0') {
+return false;
+}
+
+for (i = 0; i < size - 1; i++) {
+if (!g_ascii_isprint(str[i])) {
+return false;
+}
+}
+
+return true;
+}
+
 static void fdt_print_node(int node, int depth)
 {
 const struct fdt_property *prop = NULL;
@@ -680,7 +698,11 @@ static void fdt_print_node(int node, int depth)
 prop = fdt_get_property_by_offset(fdt, property, &prop_size);
 propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
 
-qemu_printf("%*s%s;\n", padding, "", propname);
+if (fdt_prop_is_string(prop->data, prop_size)) {
+qemu_printf("%*s%s = '%s'\n", padding, "", propname, prop->data);
+} else {
+qemu_printf("%*s%s;\n", padding, "", propname);
+}
 }
 
 padding -= 4;
-- 
2.36.1




[PATCH for-7.2 05/10] hmp, device_tree.c: introduce 'info fdt' command

2022-07-22 Thread Daniel Henrique Barboza
Reading the FDT requires that the user save the fdt_blob and then use
'dtc' to read the contents. Saving the file and using 'dtc' is a strong
use case when we need to compare two FDTs, but it's a lot of steps if
you want to do a quick check on a certain attribute.

'info fdt' retrieves FDT nodes (and properties, later on) and prints them
to the user. This can be used to check the FDT on running machines
without having to save the blob and use 'dtc'.

The implementation is based on the premise that the machine has an FDT
created using libfdt and pointed to by 'machine->fdt'. As long as this
prerequisite is met the machine should be able to support it.

For now we're going to add the required HMP boilerplate and the
capability of printing the name of the properties of a given node. Next
patches will extend 'info fdt' to be able to print nodes recursively.

This is an example of 'info fdt' fetching the '/chosen' node of the
pSeries machine:

(qemu) info fdt /chosen
chosen {
ibm,architecture-vec-5;
rng-seed;
ibm,arch-vec-5-platform-support;
linux,pci-probe-only;
stdout-path;
linux,stdout-path;
qemu,graphic-depth;
qemu,graphic-height;
qemu,graphic-width;
}

And the same node for the aarch64 'virt' machine:

(qemu) info fdt /chosen
chosen {
stdout-path;
rng-seed;
kaslr-seed;
}

Cc: Dr. David Alan Gilbert 
Signed-off-by: Daniel Henrique Barboza 
---
 hmp-commands-info.hx | 13 +++
 include/sysemu/device_tree.h |  1 +
 monitor/misc.c   | 12 ++
 softmmu/device_tree.c| 43 
 4 files changed, 69 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 3ffa24bd67..abf277be7d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -908,3 +908,16 @@ SRST
   ``stats``
 Show runtime-collected statistics
 ERST
+
+{
+.name   = "fdt",
+.args_type  = "fullpath:s",
+.params = "fullpath",
+.help   = "show firmware device tree node given its full path",
+.cmd= hmp_info_fdt,
+},
+
+SRST
+  ``info fdt``
+Show firmware device tree (fdt).
+ERST
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 1397adb21c..c0f98b1c88 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -124,6 +124,7 @@ int qemu_fdt_add_subnode(void *fdt, const char *name);
 int qemu_fdt_add_path(void *fdt, const char *path);
 
 void fdt_save(const char *filename, Error **errp);
+void fdt_info(const char *fullpath, Error **errp);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
diff --git a/monitor/misc.c b/monitor/misc.c
index 145285cec0..e709a7de91 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -973,6 +973,18 @@ static void hmp_info_capture(Monitor *mon, const QDict *qdict)
 }
 }
 
+static void hmp_info_fdt(Monitor *mon, const QDict *qdict)
+{
+const char *fullpath = qdict_get_str(qdict, "fullpath");
+Error *local_err = NULL;
+
+fdt_info(fullpath, &local_err);
+
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
 static void hmp_stopcapture(Monitor *mon, const QDict *qdict)
 {
 int i;
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index eeab6a5ef0..899c239c5c 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -26,6 +26,7 @@
 #include "hw/loader.h"
 #include "hw/boards.h"
 #include "qemu/config-file.h"
+#include "qemu/qemu-print.h"
 
 #include 
 
@@ -661,3 +662,45 @@ void fdt_save(const char *filename, Error **errp)
 
 error_setg(errp, "Error when saving machine FDT to file %s", filename);
 }
+
+static void fdt_print_node(int node, int depth)
+{
+const struct fdt_property *prop = NULL;
+const char *propname = NULL;
+void *fdt = current_machine->fdt;
+int padding = depth * 4;
+int property = 0;
+int prop_size;
+
+qemu_printf("%*s%s {\n", padding, "", fdt_get_name(fdt, node, NULL));
+
+padding += 4;
+
+fdt_for_each_property_offset(property, fdt, node) {
+prop = fdt_get_property_by_offset(fdt, property, &prop_size);
+propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
+
+qemu_printf("%*s%s;\n", padding, "", propname);
+}
+
+padding -= 4;
+qemu_printf("%*s}\n", padding, "");
+}
+
+void fdt_info(const char *fullpath, Error **errp)
+{
+int node;
+
+if (!current_machine->fdt) {
+error_setg(errp, "Unable to find the machine FDT");
+return;
+}
+
+node = fdt_path_offset(current_machine->fdt, fullpath);
+if (node < 0) {
+error_setg(errp, "node '%s' not found in FDT", fullpath);
+return;
+}
+
+fdt_print_node(node, 0);
+}
-- 
2.36.1




[PATCH for-7.2 08/10] device_node.c: enable 'info fdt' to print subnodes

2022-07-22 Thread Daniel Henrique Barboza
Printing subnodes of a given node will allow us to show a whole subtree,
with the additional perk of 'info fdt /' being able to print the whole
FDT.

Since we're now printing more than one subnode, change 'fdt_info' to
print the full path of the first node. This small tweak helps
identify which node or subnode is being displayed.

To demonstrate this capability without printing the whole FDT, the
'/cpus/cpu-map' node from the ARM 'virt' machine has a lot of subnodes:

(qemu) info fdt /cpus/cpu-map
/cpus/cpu-map {
socket0 {
cluster0 {
core0 {
cpu = <0x8001>
}
}
}
}
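
The recursion itself is simple enough to sketch on a toy tree. The
struct toy_node type and print_tree() below are made up for this
example and stand in for libfdt's fdt_for_each_subnode() walk; only the
indentation scheme (4 spaces per depth level, full path for the first
node only) follows the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical in-memory stand-in for an FDT node. */
struct toy_node {
    const char *name;
    const struct toy_node *children;    /* array of child nodes */
    int nr_children;
};

/*
 * Print 'node' and all of its subnodes, indenting 4 spaces per level.
 * 'fullpath' replaces the node name for the top-level call only.
 * Returns the number of nodes printed.
 */
static int print_tree(FILE *out, const struct toy_node *node, int depth,
                      const char *fullpath)
{
    int padding = depth * 4;
    int count = 1;
    int i;

    assert(node != NULL);

    fprintf(out, "%*s%s {\n", padding, "",
            fullpath != NULL ? fullpath : node->name);

    for (i = 0; i < node->nr_children; i++) {
        count += print_tree(out, &node->children[i], depth + 1, NULL);
    }

    fprintf(out, "%*s}\n", padding, "");
    return count;
}
```

Calling print_tree(stdout, &root, 0, "/cpus/cpu-map") on a chain of
four nested nodes gives the same bracket layout as the example above,
minus the properties.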

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 3a4d09483b..88b6a0c902 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -721,16 +721,24 @@ static void fdt_prop_print_val(const char *propname, const void *data,
 qemu_printf("]\n");
 }
 
-static void fdt_print_node(int node, int depth)
+static void fdt_print_node(int node, int depth, const char *fullpath)
 {
 const struct fdt_property *prop = NULL;
+const char *nodename = NULL;
 const char *propname = NULL;
 void *fdt = current_machine->fdt;
 int padding = depth * 4;
 int property = 0;
+int parent = node;
 int prop_size;
 
-qemu_printf("%*s%s {\n", padding, "", fdt_get_name(fdt, node, NULL));
+if (fullpath != NULL) {
+nodename = fullpath;
+} else {
+nodename = fdt_get_name(fdt, node, NULL);
+}
+
+qemu_printf("%*s%s {\n", padding, "", nodename);
 
 padding += 4;
 
@@ -754,6 +762,10 @@ static void fdt_print_node(int node, int depth)
 }
 }
 
+fdt_for_each_subnode(node, fdt, parent) {
+fdt_print_node(node, depth + 1, NULL);
+}
+
 padding -= 4;
 qemu_printf("%*s}\n", padding, "");
 }
@@ -773,5 +785,5 @@ void fdt_info(const char *fullpath, Error **errp)
 return;
 }
 
-fdt_print_node(node, 0);
+fdt_print_node(node, 0, fullpath);
 }
-- 
2.36.1




[PATCH for-7.2 01/10] hw/arm/boot.c: do not free machine->fdt in arm_load_dtb()

2022-07-22 Thread Daniel Henrique Barboza
At this moment, arm_load_dtb() can free machine->fdt when
binfo->dtb_filename is NULL. If there's no 'dtb_filename', 'fdt' will be
retrieved by binfo->get_dtb(). If get_dtb() returns machine->fdt, as is
the case of machvirt_dtb() from hw/arm/virt.c, fdt now has a pointer to
machine->fdt. And, in that case, the existing g_free(fdt) at the end of
arm_load_dtb() will make machine->fdt point to an invalid memory region.

This is not an issue right now because there's no code that accesses
machine->fdt after arm_load_dtb(), but we're going to add a couple of
FDT HMP commands that will rely on machine->fdt being valid.

Instead of freeing 'fdt' at the end of arm_load_dtb(), assign it to
machine->fdt. This will allow the FDT of ARM machines that rely on
arm_load_dtb() to be accessed later on.

Since all ARM machines allocate the FDT only once, we don't need to
worry about leaking the existing FDT during a machine reset (which is
something that other machines have to look after, e.g. the ppc64 pSeries
machine).

Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
 hw/arm/boot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index ada2717f76..1d9c6047b1 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -684,7 +684,8 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
  */
 rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
 
-g_free(fdt);
+/* Update ms->fdt pointer */
+ms->fdt = fdt;
 
 return size;
 
-- 
2.36.1




[PATCH for-7.2 03/10] hw/ppc: set machine->fdt in spapr machine

2022-07-22 Thread Daniel Henrique Barboza
The pSeries machine never bothered with the common machine->fdt
attribute. We do all the FDT related work using spapr->fdt_blob.

We're going to introduce HMP commands to read and save the FDT, which
will rely on setting machine->fdt properly to work across all machine
archs/types.

Let's set machine->fdt in the two places where we manipulate the FDT:
spapr_machine_reset() and CAS. spapr->fdt_blob is left untouched: what
we want is a way to access the FDT from HMP, not replace
spapr->fdt_blob.

Cc: Cédric Le Goater 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c   | 3 +++
 hw/ppc/spapr_hcall.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bc9ba6e6dc..7279583a4d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1713,6 +1713,9 @@ static void spapr_machine_reset(MachineState *machine)
 spapr->fdt_initial_size = spapr->fdt_size;
 spapr->fdt_blob = fdt;
 
+/* Set common MachineState->fdt */
+machine->fdt = fdt;
+
 /* Set up the entry state */
 first_ppc_cpu->env.gpr[5] = 0;
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index a8d4a6bcf0..e6b960577d 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1256,6 +1256,9 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
 spapr->fdt_initial_size = spapr->fdt_size;
 spapr->fdt_blob = fdt;
 
+/* Set common MachineState->fdt */
+MACHINE(spapr)->fdt = fdt;
+
 return H_SUCCESS;
 }
 
-- 
2.36.1




[RFC 1/3] create skeleton snapshot device and add docs

2022-07-22 Thread Richard Liu
Added a simple skeleton PCI device for snapshot/restores. Added
documentation about the snapshot/restore functionality.

Signed-off-by: Richard Liu 
---
 docs/devel/snapshot.rst | 26 +
 hw/i386/Kconfig |  1 +
 hw/misc/Kconfig |  3 ++
 hw/misc/meson.build |  1 +
 hw/misc/snapshot.c  | 86 +
 5 files changed, 117 insertions(+)
 create mode 100644 docs/devel/snapshot.rst
 create mode 100644 hw/misc/snapshot.c

diff --git a/docs/devel/snapshot.rst b/docs/devel/snapshot.rst
new file mode 100644
index 00..a333de69b6
--- /dev/null
+++ b/docs/devel/snapshot.rst
@@ -0,0 +1,26 @@
+
+Snapshot/restore
+
+
+The ability to rapidly snapshot and restore guest VM state is a
+crucial component of fuzzing applications with QEMU. A special virtual
+device can be used by fuzzers to interface with snapshot/restores
+commands in QEMU. The virtual device should have the following
+commands supported that can be called by the guest:
+
+- snapshot: save a copy of the guest VM memory, registers, and virtual
+  device state
+- restore: restore the saved copy of guest VM state
+- coverage_location: given a location in guest memory, specifying
+  where the coverage data is to be passed to the fuzzer
+- input_location: specify where in the guest memory the fuzzing input
+  should be stored
+- done: indicates whether or not the run succeeded and that the
+  coverage data has been populated
+
+The first version of the virtual device will only accept snapshot and
+restore commands from the guest. Coverage data will be collected by
+code on the guest with source-based coverage tracking.
+
+Further expansions could include controlling the snapshot/restore from
+host and gathering code coverage information directly from TCG.
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d22ac4a4b9..55656eddf5 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -46,6 +46,7 @@ config PC
 select ACPI_VMGENID
 select VIRTIO_PMEM_SUPPORTED
 select VIRTIO_MEM_SUPPORTED
+select SNAPSHOT
 
 config PC_PCI
 bool
diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index cbabe9f78c..fe84f812f2 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -174,4 +174,7 @@ config VIRT_CTRL
 config LASI
 bool
 
+config SNAPSHOT
+bool
+
 source macio/Kconfig
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 95268eddc0..ac8fcc5f0b 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -10,6 +10,7 @@ softmmu_ss.add(when: 'CONFIG_UNIMP', if_true: files('unimp.c'))
 softmmu_ss.add(when: 'CONFIG_EMPTY_SLOT', if_true: files('empty_slot.c'))
 softmmu_ss.add(when: 'CONFIG_LED', if_true: files('led.c'))
 softmmu_ss.add(when: 'CONFIG_PVPANIC_COMMON', if_true: files('pvpanic.c'))
+softmmu_ss.add(when: 'CONFIG_SNAPSHOT', if_true: files('snapshot.c'))
 
 # ARM devices
 softmmu_ss.add(when: 'CONFIG_PL310', if_true: files('arm_l2x0.c'))
diff --git a/hw/misc/snapshot.c b/hw/misc/snapshot.c
new file mode 100644
index 00..2690b331fd
--- /dev/null
+++ b/hw/misc/snapshot.c
@@ -0,0 +1,86 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/pci/pci.h"
+#include "hw/hw.h"
+#include "hw/boards.h"
+#include "exec/ramblock.h"
+#include "qom/object.h"
+#include "qemu/module.h"
+#include "qapi/visitor.h"
+#include "io/channel-buffer.h"
+#include "migration/savevm.h"
+
+#define TYPE_PCI_SNAPSHOT_DEVICE "snapshot"
+typedef struct SnapshotState SnapshotState;
+DECLARE_INSTANCE_CHECKER(SnapshotState, SNAPSHOT,
+ TYPE_PCI_SNAPSHOT_DEVICE)
+
+struct SnapshotState {
+PCIDevice pdev;
+MemoryRegion mmio;
+};
+
+static uint64_t snapshot_mmio_read(void *opaque, hwaddr addr, unsigned size)
+{
+return 0;
+}
+
+static void snapshot_mmio_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned size)
+{
+}
+
+static const MemoryRegionOps snapshot_mmio_ops = {
+.read = snapshot_mmio_read,
+.write = snapshot_mmio_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+
+};
+
+static void pci_snapshot_realize(PCIDevice *pdev, Error **errp)
+{
+SnapshotState *snapshot = SNAPSHOT(pdev);
+
+memory_region_init_io(&snapshot->mmio, OBJECT(snapshot), &snapshot_mmio_ops, snapshot,
+"snapshot-mmio", 1 * MiB);
+pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &snapshot->mmio);
+}
+
+static void snapshot_class_init(ObjectClass *class, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(class);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(class);
+
+k->realize = pci_snapshot_realize;
+k->vendor_id = PCI_VENDOR_ID_QEMU;
+k->device_id = 0xf987;
+k->revision = 0x10;
+k->class_id = PCI_CLASS_OTHERS;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static void pci_snapshot_reg

[RFC 2/3] implement ram save/restore

2022-07-22 Thread Richard Liu
Use a file-backed copy-on-write mmap region for snapshots. Restores are
handled by remapping the fixed region. Currently, the snapshot file save
path (`filepath`) is hardcoded (to a path that is memory-backed on my
machine).
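
The copy-on-write idea can be shown in isolation with a small POSIX
sketch. It is not the code from this patch: map_snapshot() is a
hypothetical helper, and it relies on the fact that mmap() with
MAP_FIXED replaces whatever mapping already exists in the requested
range, so no explicit munmap() is done before the remap.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map 'len' bytes of the snapshot file 'path' as a
 * private (copy-on-write) mapping. If 'addr' is non-NULL the mapping is
 * placed exactly there with MAP_FIXED, replacing any mapping already
 * present in that range. Returns MAP_FAILED on error.
 */
static void *map_snapshot(const char *path, void *addr, size_t len)
{
    int flags = MAP_PRIVATE | (addr != NULL ? MAP_FIXED : 0);
    void *p;
    int fd;

    assert(path != NULL && len > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return MAP_FAILED;
    }

    /* Writes go to private pages and never reach the file. */
    p = mmap(addr, len, PROT_READ | PROT_WRITE, flags, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

A restore is then just a second map_snapshot() call with the address
returned by the first one: the dirtied private pages are dropped and the
file contents become visible again.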

Signed-off-by: Richard Liu 
---
 hw/misc/snapshot.c | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/hw/misc/snapshot.c b/hw/misc/snapshot.c
index 2690b331fd..510bf59dce 100644
--- a/hw/misc/snapshot.c
+++ b/hw/misc/snapshot.c
@@ -18,8 +18,63 @@ DECLARE_INSTANCE_CHECKER(SnapshotState, SNAPSHOT,
 struct SnapshotState {
 PCIDevice pdev;
 MemoryRegion mmio;
+
+// track saved stated to prevent re-saving
+bool is_saved;
+
+// saved cpu and devices state
+QIOChannelBuffer *ioc;
 };
 
+// memory save location (for better performance, use tmpfs)
+const char *filepath = "/Volumes/RAMDisk/snapshot_0";
+
+static void save_snapshot(struct SnapshotState *s) {
+if (s->is_saved) {
+return;
+}
+s->is_saved = true;
+
+// save memory state to file
+int fd = -1;
+uint8_t *guest_mem = current_machine->ram->ram_block->host;
+size_t guest_size = current_machine->ram->ram_block->max_length;
+
+fd = open(filepath, O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
+ftruncate(fd, guest_size);
+
+char *map = mmap(0, guest_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+memcpy(map, guest_mem, guest_size);
+msync(map, guest_size, MS_SYNC);
+munmap(map, guest_size);
+close(fd);
+
+// unmap the guest, we will now use a MAP_PRIVATE
+munmap(guest_mem, guest_size);
+
+// map as MAP_PRIVATE to avoid carrying writes back to the saved file
+fd = open(filepath, O_RDONLY);
+mmap(guest_mem, guest_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED, fd, 0);
+}
+
+static void restore_snapshot(struct SnapshotState *s) {
+int fd = -1;
+uint8_t *guest_mem = current_machine->ram->ram_block->host;
+size_t guest_size = current_machine->ram->ram_block->max_length;
+
+if (!s->is_saved) {
+fprintf(stderr, "[QEMU] ERROR: attempting to restore but state has not been saved!\n");
+return;
+}
+
+munmap(guest_mem, guest_size);
+
+// remap the snapshot at the same location
+fd = open(filepath, O_RDONLY);
+mmap(guest_mem, guest_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED, fd, 0);
+close(fd);
+}
+
 static uint64_t snapshot_mmio_read(void *opaque, hwaddr addr, unsigned size)
 {
 return 0;
@@ -28,6 +83,21 @@ static uint64_t snapshot_mmio_read(void *opaque, hwaddr addr, unsigned size)
 static void snapshot_mmio_write(void *opaque, hwaddr addr, uint64_t val,
 unsigned size)
 {
+SnapshotState *snapshot = opaque;
+(void)snapshot;
+
+switch (addr) {
+case 0x00:
+switch (val) {
+case 0x101:
+save_snapshot(snapshot);
+break;
+case 0x102:
+restore_snapshot(snapshot);
+break;
+}
+break;
+}
 }
 
 static const MemoryRegionOps snapshot_mmio_ops = {
@@ -48,6 +118,8 @@ static const MemoryRegionOps snapshot_mmio_ops = {
 static void pci_snapshot_realize(PCIDevice *pdev, Error **errp)
 {
 SnapshotState *snapshot = SNAPSHOT(pdev);
+snapshot->is_saved = false;
+snapshot->ioc = NULL;
 
 memory_region_init_io(&snapshot->mmio, OBJECT(snapshot), &snapshot_mmio_ops, snapshot,
 "snapshot-mmio", 1 * MiB);
-- 
2.35.1




[RFC 3/3] use migration code for cpu and device save/restore

2022-07-22 Thread Richard Liu
Reused device migration code for cpu and device state snapshots. In this
initial version, I used several hacks to get the device code working.

vm_stop doesn't have the intended effect (for qemu_save_device_state)
unless called outside the vcpu thread. I trick the function into
thinking it is outside the vcpu thread by temporarily setting
`current_cpu` to be null.

The restore code (qemu_loadvm_state in particular) needs to be called
in a bottom half or a coroutine. I am not sure why.

Signed-off-by: Richard Liu 
---
 hw/misc/snapshot.c |  6 
 migration/savevm.c | 84 ++
 migration/savevm.h |  3 ++
 3 files changed, 93 insertions(+)

diff --git a/hw/misc/snapshot.c b/hw/misc/snapshot.c
index 510bf59dce..afdc5b7f15 100644
--- a/hw/misc/snapshot.c
+++ b/hw/misc/snapshot.c
@@ -55,6 +55,9 @@ static void save_snapshot(struct SnapshotState *s) {
 // map as MAP_PRIVATE to avoid carrying writes back to the saved file
 fd = open(filepath, O_RDONLY);
 mmap(guest_mem, guest_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED, fd, 0);
+
+// save cpu and device state
+s->ioc = qemu_snapshot_save_cpu_state();
 }
 
 static void restore_snapshot(struct SnapshotState *s) {
@@ -73,6 +76,9 @@ static void restore_snapshot(struct SnapshotState *s) {
 fd = open(filepath, O_RDONLY);
 mmap(guest_mem, guest_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | 
MAP_FIXED, fd, 0);
 close(fd);
+
+// restore cpu and device state
+qemu_snapshot_load_cpu_state(s->ioc);
 }
 
 static uint64_t snapshot_mmio_read(void *opaque, hwaddr addr, unsigned size)
diff --git a/migration/savevm.c b/migration/savevm.c
index 48e85c052c..62e5e5b564 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3309,3 +3309,87 @@ void qmp_snapshot_delete(const char *job_id,
 
 job_start(&s->common);
 }
+
+// saves the cpu and devices state
+QIOChannelBuffer* qemu_snapshot_save_cpu_state(void)
+{
+QEMUFile *f;
+QIOChannelBuffer *ioc;
+MigrationState *ms = migrate_get_current();
+int ret;
+
+/* This is a hack to trick vm_stop() into thinking it is not in vcpu 
thread.
+ * This is needed to properly stop the VM for a snapshot.
+ */
+CPUState *cpu = current_cpu;
+current_cpu = NULL;
+vm_stop(RUN_STATE_SAVE_VM);
+current_cpu = cpu;
+
+global_state_store_running();
+
+ioc = qio_channel_buffer_new(0x1);
+qio_channel_set_name(QIO_CHANNEL(ioc), "snapshot-buffer");
+f = qemu_file_new_output(QIO_CHANNEL(ioc));
+
+/* We need to initialize migration otherwise qemu_save_device_state() will
+ * complain.
+ */
+migrate_init(ms);
+ms->state = MIGRATION_STATUS_NONE;
+ms->send_configuration = false;
+
+cpu_synchronize_all_states();
+
+ret = qemu_save_device_state(f);
+if (ret < 0) {
+fprintf(stderr, "[QEMU] save device err: %d\n", ret);
+}
+
+// clean up and restart vm
+qemu_fflush(f);
+g_free(f);
+
+vm_start();
+
+/* Needed so qemu_loadvm_state does not error with:
+ * qemu-system-x86_64: Expected vmdescription section, but got 0
+ */
+ms->state = MIGRATION_STATUS_POSTCOPY_ACTIVE;
+
+return ioc;
+}
+
+// loads the cpu and devices state
+static void do_snapshot_load(void* opaque) {
+QIOChannelBuffer *ioc = opaque;
+QEMUFile *f;
+int ret;
+
+vm_stop(RUN_STATE_RESTORE_VM);
+
+// seek back to beginning of file
+qio_channel_io_seek(QIO_CHANNEL(ioc), 0, 0, NULL);
+f = qemu_file_new_input(QIO_CHANNEL(ioc));
+
+ret = qemu_loadvm_state(f);
+if (ret < 0) {
+fprintf(stderr, "[QEMU] loadvm err: %d\n", ret);
+}
+
+vm_start();
+
+g_free(f);
+
+// print time to debug speed
+struct timespec ts;
+clock_gettime(CLOCK_MONOTONIC, &ts);
+fprintf(stderr, "loaded snapshot at %ld.%ld\n", ts.tv_sec, ts.tv_nsec);
+}
+
+void qemu_snapshot_load_cpu_state(QIOChannelBuffer *ioc) {
+/* Run in a bh because otherwise qemu_loadvm_state won't work
+ */
+QEMUBH *bh = qemu_bh_new(do_snapshot_load, ioc);
+qemu_bh_schedule(bh);
+}
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..990bcddd2f 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -68,4 +68,7 @@ int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 bool in_postcopy, bool inactivate_disks);
 
+QIOChannelBuffer* qemu_snapshot_save_cpu_state(void);
+void qemu_snapshot_load_cpu_state(QIOChannelBuffer *ioc);
+
 #endif
-- 
2.35.1




Re: [PATCH v6 00/14] qapi: net: add unix socket type support to netdev backend

2022-07-22 Thread Laurent Vivier

On 22/07/2022 20:56, Laurent Vivier wrote:
...

Please ignore this series, bad numbering and patch 11 breaks SMTP server...

Sent v7 with another SMTP server.

Thanks,
Laurent




[RFC 0/3] add snapshot/restore fuzzing device

2022-07-22 Thread Richard Liu
This RFC adds a virtual device for snapshot/restore within QEMU. I am working
on this as part of QEMU Google Summer of Code 2022. Fast snapshot/restore
within QEMU is helpful for code fuzzing.

I reused the migration code for saving and restoring virtual device and CPU
state. As for the RAM, I am using a simple copy-on-write (COW) mmapped file to
do restores.
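
A minimal, self-contained sketch of the copy-on-write idea, separate from the
device code in this series: a region is written to a file once, then remapped
over that file with MAP_PRIVATE | MAP_FIXED, so later writes stay in private
pages and a second remap discards them. The helper names (snapshot_save,
snapshot_restore, cow_demo) and the temporary file path are hypothetical and
chosen only for illustration; the region is a single page to keep it short.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: dump `len` bytes at `mem` to an unlinked temp file.
 * Returns an open fd on success, -1 on failure. */
static int snapshot_save(const void *mem, size_t len)
{
    char path[] = "/tmp/cow-snapshot-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path); /* the file stays alive as long as fd is open */
    if (write(fd, mem, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: remap the region over the saved file. MAP_PRIVATE
 * keeps later writes out of the file; MAP_FIXED replaces the old mapping,
 * which drops any pages dirtied since the previous remap. */
static int snapshot_restore(void *mem, size_t len, int fd)
{
    void *p = mmap(mem, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fd, 0);

    return p == MAP_FAILED ? -1 : 0;
}

/* Returns 0 if a write made after the snapshot is undone by a restore. */
static int cow_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int fd, ok;

    if (mem == MAP_FAILED) {
        return -1;
    }
    memset(mem, 0xAA, len);

    fd = snapshot_save(mem, len);
    if (fd < 0 || snapshot_restore(mem, len, fd) < 0) {
        return -1;
    }

    mem[0] = 0x55; /* dirty the page after the snapshot */
    if (snapshot_restore(mem, len, fd) < 0) {
        return -1;
    }
    ok = (mem[0] == 0xAA); /* the write is gone, the file is untouched */

    close(fd);
    munmap(mem, len);
    return ok ? 0 : 1;
}
```

Calling cow_demo() should return 0 on a POSIX system with a writable /tmp.
Only pages that were actually dirtied cost anything to throw away, which is
why this approach is attractive for repeated restores.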

The loadvm migration function I used for restores only worked after I called
it from a qemu_bh. I'm not sure whether I should run the migration code in a
separate thread (see patch 3), since it currently runs as part of the device
code in the vCPU thread.

This is a rough first revision and feedback on the cpu and device state restores
is appreciated.

To test locally, boot up any Linux distro. I used the following C program to
interact with the PCI snapshot device:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

int main() {
    int fd = open("/sys/bus/pci/devices/:00:04.0/resource0", O_RDWR | O_SYNC);
    size_t size = 1024 * 1024;
    uint32_t* memory = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                            fd, 0);

    printf("%x\n", memory[0]);

    int a = 0;
    memory[0] = 0x101; // save snapshot
    printf("before: value of a = %d\n", a);
    a = 1;
    printf("middle: value of a = %d\n", a);
    memory[0] = 0x102; // load snapshot
    printf("after: value of a = %d\n", a);

    return 0;
}

Richard Liu (3):
  create skeleton snapshot device and add docs
  implement ram save/restore
  use migration code for cpu and device save/restore

 docs/devel/snapshot.rst |  26 +++
 hw/i386/Kconfig |   1 +
 hw/misc/Kconfig |   3 +
 hw/misc/meson.build |   1 +
 hw/misc/snapshot.c  | 164 
 migration/savevm.c  |  84 
 migration/savevm.h  |   3 +
 7 files changed, 282 insertions(+)
 create mode 100644 docs/devel/snapshot.rst
 create mode 100644 hw/misc/snapshot.c

-- 
2.35.1




[PATCH v7 07/14] net: stream: add unix socket

2022-07-22 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/stream.c| 108 +---
 qapi/net.json   |   2 +-
 qemu-options.hx |   1 +
 3 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index e8afbaca50b6..0f91ff20df61 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -235,7 +235,7 @@ static NetStreamState 
*net_stream_fd_init_stream(NetClientState *peer,
 static void net_stream_accept(void *opaque)
 {
 NetStreamState *s = opaque;
-struct sockaddr_in saddr;
+struct sockaddr_storage saddr;
 socklen_t len;
 int fd;
 
@@ -253,9 +253,27 @@ static void net_stream_accept(void *opaque)
 s->fd = fd;
 s->nc.link_down = false;
 net_stream_connect(s);
-snprintf(s->nc.info_str, sizeof(s->nc.info_str),
- "connection from %s:%d",
- inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+switch (saddr.ss_family) {
+case AF_INET: {
+struct sockaddr_in *saddr_in = (struct sockaddr_in *)&saddr;
+
+snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+ "connection from %s:%d",
+ inet_ntoa(saddr_in->sin_addr), ntohs(saddr_in->sin_port));
+break;
+}
+case AF_UNIX: {
+struct sockaddr_un saddr_un;
+
+len = sizeof(saddr_un);
+getsockname(s->listen_fd, (struct sockaddr *)&saddr_un, &len);
+snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+ "connect from %s", saddr_un.sun_path);
+break;
+}
+default:
+g_assert_not_reached();
+}
 }
 
 static int net_stream_server_init(NetClientState *peer,
@@ -295,6 +313,43 @@ static int net_stream_server_init(NetClientState *peer,
 }
 break;
 }
+case SOCKET_ADDRESS_TYPE_UNIX: {
+struct sockaddr_un saddr_un;
+
+ret = unlink(addr->u.q_unix.path);
+if (ret < 0 && errno != ENOENT) {
+error_setg_errno(errp, errno, "failed to unlink socket %s",
+ addr->u.q_unix.path);
+return -1;
+}
+
+saddr_un.sun_family = PF_UNIX;
+ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s",
+   addr->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   addr->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(saddr_un.sun_path));
+return -1;
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create stream socket");
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+ret = bind(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
+if (ret < 0) {
+error_setg_errno(errp, errno, "can't create socket with path: %s",
+ saddr_un.sun_path);
+closesocket(fd);
+return -1;
+}
+break;
+}
 case SOCKET_ADDRESS_TYPE_FD:
 fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
 if (fd == -1) {
@@ -380,6 +435,49 @@ static int net_stream_client_init(NetClientState *peer,
ntohs(saddr_in.sin_port));
 break;
 }
+case SOCKET_ADDRESS_TYPE_UNIX: {
+struct sockaddr_un saddr_un;
+
+saddr_un.sun_family = PF_UNIX;
+ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s",
+   addr->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   addr->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(saddr_un.sun_path));
+return -1;
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create stream socket");
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+connected = 0;
+for (;;) {
+ret = connect(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
+if (ret < 0) {
+if (errno == EINTR || errno == EWOULDBLOCK) {
+/* continue */
+} else if (errno == EAGAIN ||
+   errno == EALREADY) {
+break;
+} else {
+error_setg_errno(errp, errno, "can't connect socket");
+closesocket(fd);
+return -1;
+}
+} else {
+connected = 1;
+break;
+}
+}
+info_str = g_strdup_printf(" connect to %s", saddr_un.sun_

[PATCH v7 01/14] net: introduce convert_host_port()

2022-07-22 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 include/qemu/sockets.h |  2 ++
 net/net.c  | 62 ++
 2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 038faa157f59..47194b9732f8 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -47,6 +47,8 @@ void socket_listen_cleanup(int fd, Error **errp);
 int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
 
 /* Old, ipv4 only bits.  Don't use for new code. */
+int convert_host_port(struct sockaddr_in *saddr, const char *host,
+  const char *port, Error **errp);
 int parse_host_port(struct sockaddr_in *saddr, const char *str,
 Error **errp);
 int socket_init(void);
diff --git a/net/net.c b/net/net.c
index 2db160e0634d..d2288bd3a929 100644
--- a/net/net.c
+++ b/net/net.c
@@ -66,55 +66,57 @@ static QTAILQ_HEAD(, NetClientState) net_clients;
 /***/
 /* network device redirectors */
 
-int parse_host_port(struct sockaddr_in *saddr, const char *str,
-Error **errp)
+int convert_host_port(struct sockaddr_in *saddr, const char *host,
+  const char *port, Error **errp)
 {
-gchar **substrings;
 struct hostent *he;
-const char *addr, *p, *r;
-int port, ret = 0;
+const char *r;
+long p;
 
 memset(saddr, 0, sizeof(*saddr));
 
-substrings = g_strsplit(str, ":", 2);
-if (!substrings || !substrings[0] || !substrings[1]) {
-error_setg(errp, "host address '%s' doesn't contain ':' "
-   "separating host from port", str);
-ret = -1;
-goto out;
-}
-
-addr = substrings[0];
-p = substrings[1];
-
 saddr->sin_family = AF_INET;
-if (addr[0] == '\0') {
+if (host[0] == '\0') {
 saddr->sin_addr.s_addr = 0;
 } else {
-if (qemu_isdigit(addr[0])) {
-if (!inet_aton(addr, &saddr->sin_addr)) {
+if (qemu_isdigit(host[0])) {
+if (!inet_aton(host, &saddr->sin_addr)) {
 error_setg(errp, "host address '%s' is not a valid "
-   "IPv4 address", addr);
-ret = -1;
-goto out;
+   "IPv4 address", host);
+return -1;
 }
 } else {
-he = gethostbyname(addr);
+he = gethostbyname(host);
 if (he == NULL) {
-error_setg(errp, "can't resolve host address '%s'", addr);
-ret = -1;
-goto out;
+error_setg(errp, "can't resolve host address '%s'", host);
+return -1;
 }
 saddr->sin_addr = *(struct in_addr *)he->h_addr;
 }
 }
-port = strtol(p, (char **)&r, 0);
-if (r == p) {
-error_setg(errp, "port number '%s' is invalid", p);
+if (qemu_strtol(port, &r, 0, &p) != 0) {
+error_setg(errp, "port number '%s' is invalid", port);
+return -1;
+}
+saddr->sin_port = htons(p);
+return 0;
+}
+
+int parse_host_port(struct sockaddr_in *saddr, const char *str,
+Error **errp)
+{
+gchar **substrings;
+int ret;
+
+substrings = g_strsplit(str, ":", 2);
+if (!substrings || !substrings[0] || !substrings[1]) {
+error_setg(errp, "host address '%s' doesn't contain ':' "
+   "separating host from port", str);
 ret = -1;
 goto out;
 }
-saddr->sin_port = htons(port);
+
+ret = convert_host_port(saddr, substrings[0], substrings[1], errp);
 
 out:
 g_strfreev(substrings);
-- 
2.37.1




[PATCH v7 06/14] net: stream: Don't ignore EINVAL on netdev socket connection

2022-07-22 Thread Laurent Vivier
From: Stefano Brivio 

Other errors are treated as failure by net_stream_client_init(),
but if connect() returns EINVAL, we'll fail silently. Remove the
related exception.
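
As a rough illustration of the resulting policy (a sketch only; the enum and
the function name classify_connect_errno are hypothetical and do not appear in
net/stream.c), the errno values seen after a non-blocking connect() fall into
three groups once EINVAL is no longer special-cased:

```c
#include <errno.h>

/* Hypothetical names, used only for this illustration. */
enum connect_outcome {
    CONNECT_RETRY,   /* call connect() again */
    CONNECT_PENDING, /* connection is being established, stop looping */
    CONNECT_FAILED   /* report the error to the caller */
};

static enum connect_outcome classify_connect_errno(int err)
{
    if (err == EINTR || err == EWOULDBLOCK) {
        return CONNECT_RETRY;
    }
    if (err == EINPROGRESS || err == EALREADY) {
        return CONNECT_PENDING;
    }
    /* Everything else, EINVAL included, is a real failure. */
    return CONNECT_FAILED;
}
```

With this grouping, classify_connect_errno(EINVAL) yields CONNECT_FAILED, so
the caller reports the error instead of leaving the loop silently.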

Signed-off-by: Stefano Brivio 
[lvivier: applied to net/stream.c]
Signed-off-by: Laurent Vivier 
Reviewed-by: Daniel P. Berrangé 
---
 net/stream.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index 0851e90becca..e8afbaca50b6 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -363,8 +363,7 @@ static int net_stream_client_init(NetClientState *peer,
 if (errno == EINTR || errno == EWOULDBLOCK) {
 /* continue */
 } else if (errno == EINPROGRESS ||
-   errno == EALREADY ||
-   errno == EINVAL) {
+   errno == EALREADY) {
 break;
 } else {
 error_setg_errno(errp, errno, "can't connect socket");
-- 
2.37.1




[PATCH v7 13/14] net: stream: move to QIO

2022-07-22 Thread Laurent Vivier
Use QIOChannel, QIOChannelSocket and QIONetListener.

Signed-off-by: Laurent Vivier 
---
 net/stream.c | 480 ++-
 1 file changed, 168 insertions(+), 312 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index 0f91ff20df61..c4ddb44fc3ed 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -34,48 +34,36 @@
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
 #include "qemu/cutils.h"
+#include "io/channel.h"
+#include "io/channel-socket.h"
+#include "io/net-listener.h"
 
 typedef struct NetStreamState {
 NetClientState nc;
-int listen_fd;
-int fd;
+QIOChannel *listen_ioc;
+QIONetListener *listener;
+QIOChannel *ioc;
+guint ioc_read_tag;
+guint ioc_write_tag;
 SocketReadState rs;
 unsigned int send_index;  /* number of bytes sent*/
-IOHandler *send_fn;
-bool read_poll;   /* waiting to receive data? */
-bool write_poll;  /* waiting to transmit data? */
 } NetStreamState;
 
-static void net_stream_accept(void *opaque);
-static void net_stream_writable(void *opaque);
+static void net_stream_listen(QIONetListener *listener,
+  QIOChannelSocket *cioc,
+  void *opaque);
 
-static void net_stream_update_fd_handler(NetStreamState *s)
+static gboolean net_stream_writable(QIOChannel *ioc,
+GIOCondition condition,
+gpointer data)
 {
-qemu_set_fd_handler(s->fd,
-s->read_poll ? s->send_fn : NULL,
-s->write_poll ? net_stream_writable : NULL,
-s);
-}
+NetStreamState *s = data;
 
-static void net_stream_read_poll(NetStreamState *s, bool enable)
-{
-s->read_poll = enable;
-net_stream_update_fd_handler(s);
-}
-
-static void net_stream_write_poll(NetStreamState *s, bool enable)
-{
-s->write_poll = enable;
-net_stream_update_fd_handler(s);
-}
-
-static void net_stream_writable(void *opaque)
-{
-NetStreamState *s = opaque;
-
-net_stream_write_poll(s, false);
+s->ioc_write_tag = 0;
 
 qemu_flush_queued_packets(&s->nc);
+
+return G_SOURCE_REMOVE;
 }
 
 static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf,
@@ -92,12 +80,15 @@ static ssize_t net_stream_receive(NetClientState *nc, const 
uint8_t *buf,
 .iov_len  = size,
 },
 };
+struct iovec local_iov[2];
+unsigned int nlocal_iov;
 size_t remaining;
 ssize_t ret;
 
-remaining = iov_size(iov, 2) - s->send_index;
-ret = iov_send(s->fd, iov, 2, s->send_index, remaining);
 
+remaining = iov_size(iov, 2) - s->send_index;
+nlocal_iov = iov_copy(local_iov, 2, iov, 2, s->send_index, remaining);
+ret = qio_channel_writev(s->ioc, local_iov, nlocal_iov, NULL);
 if (ret == -1 && errno == EAGAIN) {
 ret = 0; /* handled further down */
 }
@@ -107,19 +98,25 @@ static ssize_t net_stream_receive(NetClientState *nc, 
const uint8_t *buf,
 }
 if (ret < (ssize_t)remaining) {
 s->send_index += ret;
-net_stream_write_poll(s, true);
+s->ioc_write_tag = qio_channel_add_watch(s->ioc, G_IO_OUT,
+ net_stream_writable, s, NULL);
 return 0;
 }
 s->send_index = 0;
 return size;
 }
 
+static gboolean net_stream_send(QIOChannel *ioc,
+GIOCondition condition,
+gpointer data);
+
 static void net_stream_send_completed(NetClientState *nc, ssize_t len)
 {
 NetStreamState *s = DO_UPCAST(NetStreamState, nc, nc);
 
-if (!s->read_poll) {
-net_stream_read_poll(s, true);
+if (!s->ioc_read_tag) {
+s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN,
+net_stream_send, s, NULL);
 }
 }
 
@@ -130,19 +127,24 @@ static void net_stream_rs_finalize(SocketReadState *rs)
 if (qemu_send_packet_async(&s->nc, rs->buf,
rs->packet_len,
net_stream_send_completed) == 0) {
-net_stream_read_poll(s, false);
+if (s->ioc_read_tag) {
+g_source_remove(s->ioc_read_tag);
+s->ioc_read_tag = 0;
+}
 }
 }
 
-static void net_stream_send(void *opaque)
+static gboolean net_stream_send(QIOChannel *ioc,
+GIOCondition condition,
+gpointer data)
 {
-NetStreamState *s = opaque;
+NetStreamState *s = data;
 int size;
 int ret;
-uint8_t buf1[NET_BUFSIZE];
-const uint8_t *buf;
+char buf1[NET_BUFSIZE];
+const char *buf;
 
-size = recv(s->fd, buf1, sizeof(buf1), 0);
+size = qio_channel_read(s->ioc, buf1, sizeof(buf1), NULL);
 if (size < 0) {
 if (errno != EWOULDBLOCK) {
 goto eoc;
@@ -150,52 +152,63 @@ static void net_stream_se

[PATCH v7 05/14] qapi: net: add stream and dgram netdevs

2022-07-22 Thread Laurent Vivier
Copied from the socket netdev file and modified to use SocketAddress, so that
new features such as unix sockets can be introduced.

"udp" and "mcast" are squashed into the dgram netdev; multicast is detected
according to the IP address type.
"listen" and "connect" modes are managed by the stream netdev. An optional
parameter "server" defines the mode (server by default).
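
One way to detect multicast from the address alone is sketched below. This is
an assumption about how such a check can be written, not a quote from
net/dgram.c; the helper name is_ipv4_multicast is hypothetical. It relies on
the standard IN_MULTICAST() macro, which tests for the 224.0.0.0/4 range on a
host-byte-order address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical helper: true if `host` is a dotted-quad IPv4 address in the
 * multicast range, false for unicast addresses or unparsable input. */
static bool is_ipv4_multicast(const char *host)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, host, &addr) != 1) {
        return false;
    }
    /* IN_MULTICAST() expects host byte order. */
    return IN_MULTICAST(ntohl(addr.s_addr));
}
```

For example, "230.0.0.1" is classified as multicast while "127.0.0.1" is not,
so a single dgram backend can choose between the two setups without a separate
"mcast" option.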

The two new types need to be parsed the modern way with -netdev, because
with the traditional way, the "type" field of the netdev structure collides
with the "type" field of SocketAddress and prevents the correct evaluation of
the command line option. Moreover, the traditional way doesn't allow using
the same type (SocketAddress) several times with the -netdev option
(needed to specify "local" and "remote" addresses).

The previous commit paved the way for parsing the modern way, but
omitted one detail: how to pick modern vs. traditional, in
netdev_is_modern().

We want to pick based on the value of parameter "type".  But how to
extract it from the option argument?

Parsing the option argument, either the modern or the traditional way,
extracts it for us, but only if parsing succeeds.

If parsing fails, there is no good option.  No matter which parser we
pick, it'll be the wrong one for some arguments, and the error
reporting will be confusing.

Fortunately, the traditional parser accepts *anything* when called in
a certain way.  This maximizes our chance to extract the value of
"type", and in turn minimizes the risk of confusing error reporting.

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 hmp-commands.hx |   2 +-
 net/clients.h   |   6 +
 net/dgram.c | 631 
 net/hub.c   |   2 +
 net/meson.build |   2 +
 net/net.c   |  30 ++-
 net/stream.c| 423 
 qapi/net.json   |  63 -
 qemu-options.hx |  12 +
 9 files changed, 1167 insertions(+), 4 deletions(-)
 create mode 100644 net/dgram.c
 create mode 100644 net/stream.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 182e639d1498..83e8d45a2a8b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1276,7 +1276,7 @@ ERST
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
-.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user"
+.params = 
"[user|tap|socket|stream|dgram|vde|bridge|hubport|netmap|vhost-user"
 #ifdef CONFIG_VMNET
   "|vmnet-host|vmnet-shared|vmnet-bridged"
 #endif
diff --git a/net/clients.h b/net/clients.h
index c9157789f2ce..ed8bdfff1e7c 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -40,6 +40,12 @@ int net_init_hubport(const Netdev *netdev, const char *name,
 int net_init_socket(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
 
+int net_init_stream(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp);
+
+int net_init_dgram(const Netdev *netdev, const char *name,
+   NetClientState *peer, Error **errp);
+
 int net_init_tap(const Netdev *netdev, const char *name,
  NetClientState *peer, Error **errp);
 
diff --git a/net/dgram.c b/net/dgram.c
new file mode 100644
index ..dbe65102d174
--- /dev/null
+++ b/net/dgram.c
@@ -0,0 +1,631 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+#include "net/net.h"
+#include "clients.h"
+#include "monitor/monitor.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+
+typedef struct NetDgramState {
+NetClientState nc;
+int listen_fd;
+int fd;
+SocketReadState rs;
+  /* contains inet host and port destination iff connectionless (SOCK_DGRAM) */

[PATCH v7 03/14] net: simplify net_client_parse() error management

2022-07-22 Thread Laurent Vivier
All net_client_parse() callers exit in case of error.

Move exit(1) to net_client_parse() and remove error checking from
the callers.

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
---
 include/net/net.h |  2 +-
 net/net.c |  6 ++
 softmmu/vl.c  | 12 +++-
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index c53c64ac18c4..e755254443ea 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -214,7 +214,7 @@ extern NICInfo nd_table[MAX_NICS];
 extern const char *host_net_devices[];
 
 /* from net.c */
-int net_client_parse(QemuOptsList *opts_list, const char *str);
+void net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
 void net_init_clients(void);
 void net_check_clients(void);
diff --git a/net/net.c b/net/net.c
index 15958f881776..f056e8aebfb2 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1579,13 +1579,11 @@ void net_init_clients(void)
   &error_fatal);
 }
 
-int net_client_parse(QemuOptsList *opts_list, const char *optarg)
+void net_client_parse(QemuOptsList *opts_list, const char *optarg)
 {
 if (!qemu_opts_parse_noisily(opts_list, optarg, true)) {
-return -1;
+exit(1);
 }
-
-return 0;
 }
 
 /* From FreeBSD */
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 8f3f3bb74389..0478210f2e04 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2815,21 +2815,15 @@ void qemu_init(int argc, char **argv, char **envp)
 break;
 case QEMU_OPTION_netdev:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("netdev"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("netdev"), optarg);
 break;
 case QEMU_OPTION_nic:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("nic"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("nic"), optarg);
 break;
 case QEMU_OPTION_net:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("net"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("net"), optarg);
 break;
 #ifdef CONFIG_LIBISCSI
 case QEMU_OPTION_iscsi:
-- 
2.37.1




[PATCH v7 11/14] qemu-sockets: move and rename SocketAddress_to_str()

2022-07-22 Thread Laurent Vivier
Rename SocketAddress_to_str() to socket_uri() and move it to
util/qemu-sockets.c close to socket_parse().

socket_uri() generates a string from a SocketAddress while
socket_parse() generates a SocketAddress from a string.
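
The inverse relationship can be pictured with a simplified, standalone sketch.
The struct and the functions addr_to_uri(), uri_to_addr() and roundtrip_ok()
below are hypothetical stand-ins, not QEMU's SocketAddress API, and cover only
the "tcp:" and "unix:" forms.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a socket address. */
enum addr_type { ADDR_INET, ADDR_UNIX };

struct addr {
    enum addr_type type;
    char host[64];
    char port[16];
    char path[108];
};

/* Address -> string, in the spirit of socket_uri(). Returns 0 on success. */
static int addr_to_uri(const struct addr *a, char *buf, size_t len)
{
    int n;

    switch (a->type) {
    case ADDR_INET:
        n = snprintf(buf, len, "tcp:%s:%s", a->host, a->port);
        break;
    case ADDR_UNIX:
        n = snprintf(buf, len, "unix:%s", a->path);
        break;
    default:
        return -1;
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* String -> address, in the spirit of socket_parse(). Returns 0 on success. */
static int uri_to_addr(const char *uri, struct addr *a)
{
    memset(a, 0, sizeof(*a));

    if (strncmp(uri, "unix:", 5) == 0) {
        if (strlen(uri + 5) >= sizeof(a->path)) {
            return -1;
        }
        a->type = ADDR_UNIX;
        strcpy(a->path, uri + 5);
        return 0;
    }
    if (strncmp(uri, "tcp:", 4) == 0) {
        const char *host = uri + 4;
        const char *colon = strrchr(host, ':');
        size_t hlen;

        if (!colon) {
            return -1;
        }
        hlen = (size_t)(colon - host);
        if (hlen >= sizeof(a->host) || strlen(colon + 1) >= sizeof(a->port)) {
            return -1;
        }
        a->type = ADDR_INET;
        memcpy(a->host, host, hlen); /* buffer was zeroed, so it stays terminated */
        strcpy(a->port, colon + 1);
        return 0;
    }
    return -1;
}

/* Returns 1 if parsing `uri` and formatting it again yields the same text. */
static int roundtrip_ok(const char *uri)
{
    struct addr a;
    char buf[256];

    if (uri_to_addr(uri, &a) < 0 || addr_to_uri(&a, buf, sizeof(buf)) < 0) {
        return 0;
    }
    return strcmp(uri, buf) == 0;
}
```

In this sketch roundtrip_ok("unix:/tmp/sock") and
roundtrip_ok("tcp:localhost:1234") both hold, which is the property that
makes it natural to keep the two functions next to each other.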

Signed-off-by: Laurent Vivier 
---
 include/qemu/sockets.h |  2 +-
 monitor/hmp-cmds.c | 23 +--
 util/qemu-sockets.c| 20 
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 47194b9732f8..e5a06d2e3729 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -40,6 +40,7 @@ NetworkAddressFamily inet_netfamily(int family);
 int unix_listen(const char *path, Error **errp);
 int unix_connect(const char *path, Error **errp);
 
+char *socket_uri(SocketAddress *addr);
 SocketAddress *socket_parse(const char *str, Error **errp);
 int socket_connect(SocketAddress *addr, Error **errp);
 int socket_listen(SocketAddress *addr, int num, Error **errp);
@@ -123,5 +124,4 @@ SocketAddress *socket_address_flatten(SocketAddressLegacy 
*addr);
  * Return 0 on success.
  */
 int socket_address_parse_named_fd(SocketAddress *addr, Error **errp);
-
 #endif /* QEMU_SOCKETS_H */
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index c6cd6f91dde6..cb35059c2d45 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -197,27 +197,6 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict)
 qapi_free_MouseInfoList(mice_list);
 }
 
-static char *SocketAddress_to_str(SocketAddress *addr)
-{
-switch (addr->type) {
-case SOCKET_ADDRESS_TYPE_INET:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.inet.host,
-   addr->u.inet.port);
-case SOCKET_ADDRESS_TYPE_UNIX:
-return g_strdup_printf("unix:%s",
-   addr->u.q_unix.path);
-case SOCKET_ADDRESS_TYPE_FD:
-return g_strdup_printf("fd:%s", addr->u.fd.str);
-case SOCKET_ADDRESS_TYPE_VSOCK:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.vsock.cid,
-   addr->u.vsock.port);
-default:
-return g_strdup("unknown address type");
-}
-}
-
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 {
 MigrationInfo *info;
@@ -380,7 +359,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "socket address: [\n");
 
 for (addr = info->socket_address; addr; addr = addr->next) {
-char *s = SocketAddress_to_str(addr->value);
+char *s = socket_uri(addr->value);
 monitor_printf(mon, "\t%s\n", s);
 g_free(s);
 }
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 13b5b197f9ea..870a36eb0e93 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -1098,6 +1098,26 @@ int unix_connect(const char *path, Error **errp)
 return sock;
 }
 
+char *socket_uri(SocketAddress *addr)
+{
+switch (addr->type) {
+case SOCKET_ADDRESS_TYPE_INET:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.inet.host,
+   addr->u.inet.port);
+case SOCKET_ADDRESS_TYPE_UNIX:
+return g_strdup_printf("unix:%s",
+   addr->u.q_unix.path);
+case SOCKET_ADDRESS_TYPE_FD:
+return g_strdup_printf("fd:%s", addr->u.fd.str);
+case SOCKET_ADDRESS_TYPE_VSOCK:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.vsock.cid,
+   addr->u.vsock.port);
+default:
+return g_strdup("unknown address type");
+}
+}
 
 SocketAddress *socket_parse(const char *str, Error **errp)
 {
-- 
2.37.1




[PATCH v7 14/14] tests/qtest: netdev: test stream and dgram backends

2022-07-22 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
---
 tests/qtest/meson.build |   1 +
 tests/qtest/netdev-socket.c | 322 
 2 files changed, 323 insertions(+)
 create mode 100644 tests/qtest/netdev-socket.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 3a474010e49f..b0790873ecbf 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -26,6 +26,7 @@ qtests_generic = [
   'qom-test',
   'test-hmp',
   'qos-test',
+  'netdev-socket',
 ]
 if config_host.has_key('CONFIG_MODULES')
   qtests_generic += [ 'modules-test' ]
diff --git a/tests/qtest/netdev-socket.c b/tests/qtest/netdev-socket.c
new file mode 100644
index ..bceb30718812
--- /dev/null
+++ b/tests/qtest/netdev-socket.c
@@ -0,0 +1,322 @@
+/*
+ * QTest testcase for netdev stream and dgram
+ *
+ * Copyright (c) 2022 Red Hat, Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include "libqtest.h"
+
+#define CONNECTION_TIMEOUT 5
+
+#define EXPECT_STATE(q, e, t) \
+do {  \
+char *resp = qtest_hmp(q, "info network");\
+if (t) {  \
+strrchr(resp, t)[0] = 0;  \
+} \
+g_test_timer_start(); \
+while (g_test_timer_elapsed() < CONNECTION_TIMEOUT) { \
+if (strcmp(resp, e) == 0) {   \
+break;\
+} \
+g_free(resp); \
+resp = qtest_hmp(q, "info network");  \
+if (t) {  \
+strrchr(resp, t)[0] = 0;  \
+} \
+} \
+g_assert_cmpstr(resp, ==, e); \
+g_free(resp); \
+} while (0)
+
+static int inet_get_free_port(void)
+{
+int sock;
+struct sockaddr_in addr;
+socklen_t len;
+int port;
+
+sock = socket(AF_INET, SOCK_STREAM, 0);
+if (sock < 0) {
+return -1;
+}
+
+memset(&addr, 0, sizeof(addr));
+addr.sin_family = AF_INET;
+addr.sin_addr.s_addr = INADDR_ANY;
+addr.sin_port = 0;
+if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
+return -1;
+}
+
+len = sizeof(addr);
+if (getsockname(sock,  (struct sockaddr *)&addr, &len) < 0) {
+return -1;
+}
+
+port = ntohs(addr.sin_port);
+
+close(sock);
+
+return port;
+}
+
+static void test_stream_inet(void)
+{
+QTestState *qts0, *qts1;
+char *expect;
+int port;
+
+port = inet_get_free_port();
+qts0 = qtest_initf("-nodefaults "
+   "-netdev stream,id=st0,addr.type=inet,"
+   "addr.host=localhost,addr.port=%d", port);
+
+EXPECT_STATE(qts0, "st0: index=0,type=stream,\r\n", 0);
+
+qts1 = qtest_initf("-nodefaults "
+   "-netdev stream,server=false,id=st0,addr.type=inet,"
+   "addr.host=localhost,addr.port=%d", port);
+
+expect = g_strdup_printf("st0: index=0,type=stream,tcp:127.0.0.1:%d\r\n",
+ port);
+EXPECT_STATE(qts1, expect, 0);
+g_free(expect);
+
+/* the port is unknown, check only the address */
+EXPECT_STATE(qts0, "st0: index=0,type=stream,tcp:127.0.0.1", ':');
+
+qtest_quit(qts1);
+qtest_quit(qts0);
+}
+
+static void test_stream_unix(void)
+{
+QTestState *qts0, *qts1;
+char *expect;
+gchar *path;
+int ret;
+
+ret = g_file_open_tmp("netdev-XX", &path, NULL);
+g_assert_true(ret >= 0);
+close(ret);
+
+qts0 = qtest_initf("-nodefaults "
+   "-netdev stream,id=st0,addr.type=unix,addr.path=%s",
+   path);
+
+EXPECT_STATE(qts0, "st0: index=0,type=stream,\r\n", 0);
+
+qts1 = qtest_initf("-nodefaults "
+   "-netdev 
stream,id=st0,server=false,addr.type=unix,addr.path=%s",
+   path);
+
+expect = g_strdup_printf("st0: index=0,type=stream,unix:%s\r\n", path);
+EXPECT_STATE(qts1, expect, 0);
+EXPECT_STATE(qts0, expect, 0);
+g_free(expect);
+g_free(path);
+
+qtest_quit(qts1);
+qtest_quit(qts0);
+}
+
+static void test_stream_fd(void)
+{
+QTestState *qts0, *qts1;
+char *expect;
+int ret, sock0, sock1;
+struct sockaddr_un addr;
+gchar *path;
+
+ret = g_file_open_tmp("netdev-XX", &path, NULL);
+g_assert_true(ret >= 0);
+close(ret);
+addr.sun_family = AF_UNIX;
+strcpy(addr.sun_path, path);
+
+unlink(addr.sun_path);
+sock0 = socket(AF_LOCAL, SOCK_STRE

[PATCH v7 09/14] net: dgram: move mcast specific code from net_socket_fd_init_dgram()

2022-07-22 Thread Laurent Vivier
It is less complex to manage special cases directly in
net_dgram_mcast_init() and net_dgram_udp_init().

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/dgram.c | 143 +++-
 1 file changed, 73 insertions(+), 70 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index dcc2205305c5..16e2d909755c 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -302,52 +302,11 @@ static NetClientInfo net_dgram_socket_info = {
 static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
   const char *model,
   const char *name,
-  int fd, int is_fd,
-  SocketAddress *mcast,
+  int fd,
   Error **errp)
 {
-struct sockaddr_in *saddr = NULL;
-int newfd;
 NetClientState *nc;
 NetDgramState *s;
-SocketAddress *sa;
-SocketAddressType sa_type;
-
-sa = socket_local_address(fd, errp);
-if (!sa) {
-return NULL;
-}
-sa_type = sa->type;
-qapi_free_SocketAddress(sa);
-
-/*
- * fd passed: multicast: "learn" dgram_dst address from bound address and
- * save it. Because this may be "shared" socket from a "master" process,
- * datagrams would be recv() by ONLY ONE process: we must "clone" this
- * dgram socket --jjo
- */
-
-if (is_fd && mcast != NULL) {
-saddr = g_new(struct sockaddr_in, 1);
-
-if (convert_host_port(saddr, mcast->u.inet.host, mcast->u.inet.port,
-  errp) < 0) {
-goto err;
-}
-/* must be bound */
-if (saddr->sin_addr.s_addr == 0) {
-error_setg(errp, "can't setup multicast destination address");
-goto err;
-}
-/* clone dgram socket */
-newfd = net_dgram_mcast_create(saddr, NULL, errp);
-if (newfd < 0) {
-goto err;
-}
-/* clone newfd to fd, close newfd */
-dup2(newfd, fd);
-close(newfd);
-}
 
 nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name);
 
@@ -359,24 +318,7 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
 net_socket_rs_init(&s->rs, net_dgram_rs_finalize, false);
 net_dgram_read_poll(s, true);
 
-/* mcast: save bound address as dst */
-if (saddr) {
-g_assert(s->dgram_dst == NULL);
-s->dgram_dst = (struct sockaddr *)saddr;
-snprintf(nc->info_str, sizeof(nc->info_str),
- "fd=%d (cloned mcast=%s:%d)",
- fd, inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port));
-} else {
-snprintf(nc->info_str, sizeof(nc->info_str), "fd=%d %s", fd,
- SocketAddressType_str(sa_type));
-}
-
 return s;
-
-err:
-g_free(saddr);
-closesocket(fd);
-return NULL;
 }
 
 static void net_dgram_connect(void *opaque)
@@ -421,6 +363,7 @@ static int net_dgram_mcast_init(NetClientState *peer,
 NetDgramState *s;
 int fd, ret;
 struct sockaddr_in *saddr;
+gchar *info_str;
 
 if (remote->type != SOCKET_ADDRESS_TYPE_INET) {
 error_setg(errp, "multicast only support inet type");
@@ -440,6 +383,9 @@ static int net_dgram_mcast_init(NetClientState *peer,
 g_free(saddr);
 return -1;
 }
+info_str = g_strdup_printf("mcast=%s:%d",
+   inet_ntoa(saddr->sin_addr),
+   ntohs(saddr->sin_port));
 } else {
 switch (local->type) {
 case SOCKET_ADDRESS_TYPE_INET: {
@@ -457,9 +403,14 @@ static int net_dgram_mcast_init(NetClientState *peer,
 g_free(saddr);
 return -1;
 }
+info_str = g_strdup_printf("mcast=%s:%d",
+   inet_ntoa(saddr->sin_addr),
+   ntohs(saddr->sin_port));
 break;
 }
-case SOCKET_ADDRESS_TYPE_FD:
+case SOCKET_ADDRESS_TYPE_FD: {
+int newfd;
+
 fd = monitor_fd_param(monitor_cur(), local->u.fd.str, errp);
 if (fd == -1) {
 g_free(saddr);
@@ -472,7 +423,46 @@ static int net_dgram_mcast_init(NetClientState *peer,
  name, fd);
 return -1;
 }
+
+/*
+ * fd passed: multicast: "learn" dgram_dst address from bound
+ * address and save it. Because this may be "shared" socket from a
+ * "master" process, datagrams would be recv() by ONLY ONE process:
+ * we must "clone" this dgram socket --jjo
+ */
+
+saddr = g_new(struct sockaddr_in, 1);
+
+   

Re: [PULL 7/9] hw/guest-loader: pass random seed to fdt

2022-07-22 Thread Jason A. Donenfeld
Hi Alex,

On Fri, Jul 22, 2022 at 4:37 PM Alex Bennée  wrote:
> That sounds suspiciously like inventing a new ABI between QEMU and
> guests which we generally try to avoid.

Well the ABI is just the "rng-seed" param which is part of the DT
spec. But I can understand why you might find this use a bit "too
creative". So no qualms about dropping it.

Jason



[PATCH v7 00/14] qapi: net: add unix socket type support to netdev backend

2022-07-22 Thread Laurent Vivier
"-netdev socket" only supports inet sockets.

Adding support for unix sockets is not a complex task, but the existing
socket netdev parameters are not well suited to describing unix
socket parameters.

As discussed in:

  "socket.c added support for unix domain socket datagram transport"
  
https://lore.kernel.org/qemu-devel/1c0e1bc5-904f-46b0-8044-68e43e67b...@gmail.com/

This series adds support for the unix socket type using the SocketAddress
QAPI structure.

Two new netdev backends, "stream" and "dgram", are added. They are mostly a
copy of the "socket" backend, but they use the SocketAddress QAPI structure to
provide socket parameters. They also implement unix sockets (stream and
datagram).

Some examples of CLI syntax:

  for TCP:

  -netdev stream,id=socket0,addr.type=inet,addr.host=localhost,addr.port=1234
  -netdev stream,id=socket0,server=off,addr.type=inet,addr.host=localhost,addr.port=1234

  -netdev dgram,id=socket0,\
  local.type=inet,local.host=localhost,local.port=1234,\
  remote.type=inet,remote.host=localhost,remote.port=1235

  for UNIX:

  -netdev stream,id=socket0,addr.type=unix,addr.path=/tmp/qemu0
  -netdev stream,id=socket0,server=off,addr.type=unix,addr.path=/tmp/qemu0

  -netdev dgram,id=socket0,\
  local.type=unix,local.path=/tmp/qemu0,\
  remote.type=unix,remote.path=/tmp/qemu1

  for FD:

  -netdev stream,id=socket0,addr.type=fd,addr.str=4
  -netdev stream,id=socket0,server=off,addr.type=fd,addr.str=5

  -netdev dgram,id=socket0,local.type=fd,local.str=4

v7:
  - add qtests
  - update parameters table in net.json
  - update socket_uri() and socket_parse()

v6:
  - s/netdev option/-netdev option/ PATCH 4
  - s/ / /
  - update @NetdevStreamOptions and @NetdevDgramOptions comments
  - update PATCH 4 description message
  - add missing return in error case for unix stream socket
  - split socket_uri() patch: move and rename, then change content

v5:
  - remove RFC prefix
  - put the change of net_client_parse() into its own patch (exit() in the
function)
  - update comments regarding netdev_is_modern() and netdev_parse_modern()
  - update error case in net_stream_server_init()
  - update qemu-options.hx with unix type
  - fix HMP "info network" with unix protocol/server side.

v4:
  - net_client_parse() fails with exit() rather than with return.
  - keep "{ 'name': 'vmnet-host', 'if': 'CONFIG_VMNET' }" on its
own line in qapi/net.json
  - add a comment in qapi/net.json about parameters usage
  - move netdev_is_modern() check to qemu_init()
  - in netdev_is_modern(), check for JSON and use qemu_opts_do_parse()
to parse parameters and detect type value.
  - add a blank line after copyright comment

v3:
  - remove support of "-net" for dgram and stream. They are only
supported with "-netdev" option.
  - use &error_fatal directly in net_client_inits()
  - update qemu-options.hx
  - move to QIO for stream socket

v2:
  - use "stream" and "dgram" rather than "socket-ng,mode=stream"
and "socket-ng,mode=dgram"
  - extract code to bypass qemu_opts_parse_noisily() to
a new patch
  - do not ignore EINVAL (Stefano)
  - fix "-net" option

CC: Ralph Schmieder 
CC: Stefano Brivio 
CC: Daniel P. Berrangé 
CC: Markus Armbruster 

Laurent Vivier (13):
  net: introduce convert_host_port()
  net: remove the @errp argument of net_client_inits()
  net: simplify net_client_parse() error management
  qapi: net: introduce a way to bypass qemu_opts_parse_noisily()
  qapi: net: add stream and dgram netdevs
  net: stream: add unix socket
  net: dgram: make dgram_dst generic
  net: dgram: move mcast specific code from net_socket_fd_init_dgram()
  net: dgram: add unix socket
  qemu-sockets: move and rename SocketAddress_to_str()
  qemu-sockets: update socket_uri() and socket_parse()  to be consistent
  net: stream: move to QIO
  tests/qtest: netdev: test stream and dgram backends

Stefano Brivio (1):
  net: stream: Don't ignore EINVAL on netdev socket connection

 hmp-commands.hx |   2 +-
 include/net/net.h   |   6 +-
 include/qemu/sockets.h  |   4 +-
 monitor/hmp-cmds.c  |  23 +-
 net/clients.h   |   6 +
 net/dgram.c | 707 
 net/hub.c   |   2 +
 net/meson.build |   2 +
 net/net.c   | 169 ++---
 net/stream.c| 376 +++
 qapi/net.json   |  63 +++-
 qemu-options.hx |  14 +
 softmmu/vl.c|  16 +-
 tests/qtest/meson.build |   1 +
 tests/qtest/netdev-socket.c | 322 
 util/qemu-sockets.c |  25 ++
 16 files changed, 1656 insertions(+), 82 deletions(-)
 create mode 100644 net/dgram.c
 create mode 100644 net/stream.c
 create mode 100644 tests/qtest/netdev-socket.c

-- 
2.37.1




[PATCH v6 10/14] net: dgram: add unix socket

2022-07-22 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/dgram.c | 65 ++---
 qapi/net.json   |  2 +-
 qemu-options.hx |  1 +
 3 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index 16e2d909755c..9f3eafee3b67 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -86,8 +86,15 @@ static ssize_t net_dgram_receive_dgram(NetClientState *nc,
 
 do {
 if (s->dgram_dst) {
-ret = sendto(s->fd, buf, size, 0, s->dgram_dst,
- sizeof(struct sockaddr_in));
+socklen_t len;
+
+if (s->dgram_dst->sa_family == AF_INET) {
+len = sizeof(struct sockaddr_in);
+} else {
+len = sizeof(struct sockaddr_un);
+}
+
+ret = sendto(s->fd, buf, size, 0, s->dgram_dst, len);
 } else {
 ret = send(s->fd, buf, size, 0);
 }
@@ -509,7 +516,7 @@ static int net_dgram_udp_init(NetClientState *peer,
 }
 } else {
 if (local->type != SOCKET_ADDRESS_TYPE_FD) {
-error_setg(errp, "type=inet requires remote parameter");
+error_setg(errp, "type=inet or unix require remote parameter");
 return -1;
 }
 }
@@ -559,6 +566,58 @@ static int net_dgram_udp_init(NetClientState *peer,
 
 break;
 }
+case SOCKET_ADDRESS_TYPE_UNIX: {
+struct sockaddr_un laddr_un, raddr_un;
+
+ret = unlink(local->u.q_unix.path);
+if (ret < 0 && errno != ENOENT) {
+error_setg_errno(errp, errno, "failed to unlink socket %s",
+ local->u.q_unix.path);
+return -1;
+}
+
+laddr_un.sun_family = PF_UNIX;
+ret = snprintf(laddr_un.sun_path, sizeof(laddr_un.sun_path), "%s",
+   local->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(laddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   local->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(laddr_un.sun_path));
+}
+
+raddr_un.sun_family = PF_UNIX;
+ret = snprintf(raddr_un.sun_path, sizeof(raddr_un.sun_path), "%s",
+   remote->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(raddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   remote->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(raddr_un.sun_path));
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_DGRAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create datagram socket");
+return -1;
+}
+
+ret = bind(fd, (struct sockaddr *)&laddr_un, sizeof(laddr_un));
+if (ret < 0) {
+error_setg_errno(errp, errno, "can't bind unix=%s to socket",
+ laddr_un.sun_path);
+closesocket(fd);
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+dgram_dst = g_malloc(sizeof(raddr_un));
+memcpy(dgram_dst, &raddr_un, sizeof(raddr_un));
+
+info_str = g_strdup_printf("udp=%s:%s",
+   laddr_un.sun_path, raddr_un.sun_path);
+break;
+}
 case SOCKET_ADDRESS_TYPE_FD: {
 SocketAddress *sa;
 SocketAddressType sa_type;
diff --git a/qapi/net.json b/qapi/net.json
index 518f288758b0..894c06a82b1b 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -600,7 +600,7 @@
 # @remote: remote address
 # @local: local address
 #
-# Only SocketAddress types 'inet' and 'fd' are supported.
+# Only SocketAddress types 'unix', 'inet' and 'fd' are supported.
 #
 # The code checks there is at least one of these options and reports an error
 # if not. If remote address is present and it's a multicast address, local
diff --git a/qemu-options.hx b/qemu-options.hx
index 827a951a9ef2..33c5bd72af21 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2736,6 +2736,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 "configure a network backend to connect to a multicast maddr and port\n"
 "use ``local.host=addr`` to specify the host address to send packets from\n"
 "-netdev dgram,id=str,local.type=inet,local.host=addr,local.port=port[,remote.type=inet,remote.host=addr,remote.port=port]\n"
+"-netdev dgram,id=str,local.type=unix,local.path=path[,remote.type=unix,remote.path=path]\n"
 "-netdev dgram,id=str,local.type=fd,local.str=h\n"
 "configure a network backend to connect to another network\n"
 "using an UDP tunnel\n"
-- 
2.37.1
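
A note on the SOCKET_ADDRESS_TYPE_UNIX hunk above: as posted, the two
"path is too long" branches set an error but do not return, so execution
would continue with a truncated sun_path. Whether that is intended is not
stated in the patch. The following is only a minimal sketch of a bounded
copy that reports failure instead; fill_unix_addr() is a hypothetical helper
name, not something from the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper (not from the patch): fill a sockaddr_un from a path.
 * Returns 0 on success, -1 if the path does not fit in sun_path.
 */
int fill_unix_addr(struct sockaddr_un *sa, const char *path)
{
    int ret;

    assert(sa != NULL && path != NULL);
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    ret = snprintf(sa->sun_path, sizeof(sa->sun_path), "%s", path);
    if (ret < 0 || (size_t)ret >= sizeof(sa->sun_path)) {
        return -1; /* truncated: report failure instead of continuing */
    }
    return 0;
}
```

A caller in the style of the patch would set its error with error_setg()
and return -1 as soon as this helper fails, before creating the socket.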




[PATCH v7 04/14] qapi: net: introduce a way to bypass qemu_opts_parse_noisily()

2022-07-22 Thread Laurent Vivier
As qemu_opts_parse_noisily() flattens the QAPI structures (the "type" field
of the Netdev structure can collide with the "type" field of SocketAddress),
we introduce a way to bypass qemu_opts_parse_noisily() and use
visit_type_Netdev() directly to parse the backend parameters.

More details from Markus:

qemu_init() passes the argument of -netdev, -nic, and -net to
net_client_parse().

net_client_parse() parses with qemu_opts_parse_noisily(), passing
QemuOptsList qemu_netdev_opts for -netdev, qemu_nic_opts for -nic, and
qemu_net_opts for -net.  Their desc[] are all empty, which means any
keys are accepted.  The result of the parse (a QemuOpts) is stored in
the QemuOptsList.

Note that QemuOpts is flat by design.  In some places, we layer non-flat
on top using dotted keys convention, but not here.

net_init_clients() iterates over the stored QemuOpts, and passes them to
net_init_netdev(), net_param_nic(), or net_init_client(), respectively.

These functions pass the QemuOpts to net_client_init().  They also do
other things with the QemuOpts, which we can ignore here.

net_client_init() uses the opts visitor to convert the (flat) QemuOpts to
a (non-flat) QAPI object Netdev.  Netdev is also the argument of QMP
command netdev_add.

The opts visitor was an early attempt to support QAPI in
(QemuOpts-based) CLI.  It restricts QAPI types to a certain shape; see
commit eb7ee2cbeb "qapi: introduce OptsVisitor".

A more modern way to support QAPI is qobject_input_visitor_new_str().
It uses keyval_parse() instead of QemuOpts for KEY=VALUE,... syntax, and
it also supports JSON syntax.  The former isn't quite as expressive as
JSON, but it's a lot closer than QemuOpts + opts visitor.

This commit paves the way for using the modern approach instead.
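
To make the "type" collision concrete: in a flat namespace, Netdev's "type"
and SocketAddress's "type" are the same key, whereas with dotted keys (as in
addr.type=unix from the cover letter) they stay distinct. The sketch below
only illustrates the dotted-key idea. split_dotted_key() is a hypothetical
helper written for this note and is not how keyval_parse() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU code: split a dotted key such as
 * "addr.type" at its first dot.  On return, outer[] holds the part before
 * the dot (empty for a top-level key) and the result points at the rest.
 * Returns NULL if the outer part does not fit in outer[].
 */
const char *split_dotted_key(const char *key, char *outer, size_t outer_size)
{
    const char *dot;
    size_t n;

    assert(key != NULL && outer != NULL && outer_size > 0);
    dot = strchr(key, '.');
    if (!dot) {
        outer[0] = '\0';
        return key;             /* top-level key, e.g. "type" */
    }
    n = (size_t)(dot - key);
    if (n >= outer_size) {
        return NULL;
    }
    memcpy(outer, key, n);
    outer[n] = '\0';
    return dot + 1;             /* nested key, e.g. "type" inside "addr" */
}
```

With this, "type" and "addr.type" both end in the inner key "type", but only
the second one carries the outer member "addr", which is the information a
flat QemuOpts has no place to keep.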

Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
---
 include/net/net.h |  2 ++
 net/net.c | 57 +++
 softmmu/vl.c  |  6 -
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/net/net.h b/include/net/net.h
index e755254443ea..826e14a78734 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -214,6 +214,8 @@ extern NICInfo nd_table[MAX_NICS];
 extern const char *host_net_devices[];
 
 /* from net.c */
+bool netdev_is_modern(const char *optarg);
+void netdev_parse_modern(const char *optarg);
 void net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
 void net_init_clients(void);
diff --git a/net/net.c b/net/net.c
index f056e8aebfb2..ffe3e5a2cf1d 100644
--- a/net/net.c
+++ b/net/net.c
@@ -54,6 +54,7 @@
 #include "net/colo-compare.h"
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
+#include "qapi/qobject-input-visitor.h"
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -63,6 +64,16 @@
 static VMChangeStateEntry *net_change_state_entry;
 static QTAILQ_HEAD(, NetClientState) net_clients;
 
+typedef struct NetdevQueueEntry {
+Netdev *nd;
+Location loc;
+QSIMPLEQ_ENTRY(NetdevQueueEntry) entry;
+} NetdevQueueEntry;
+
+typedef QSIMPLEQ_HEAD(, NetdevQueueEntry) NetdevQueue;
+
+static NetdevQueue nd_queue = QSIMPLEQ_HEAD_INITIALIZER(nd_queue);
+
 /***/
 /* network device redirectors */
 
@@ -1562,6 +1573,20 @@ out:
 return ret;
 }
 
+static void netdev_init_modern(void)
+{
+while (!QSIMPLEQ_EMPTY(&nd_queue)) {
+NetdevQueueEntry *nd = QSIMPLEQ_FIRST(&nd_queue);
+
+QSIMPLEQ_REMOVE_HEAD(&nd_queue, entry);
+loc_push_restore(&nd->loc);
+net_client_init1(nd->nd, true, &error_fatal);
+loc_pop(&nd->loc);
+qapi_free_Netdev(nd->nd);
+g_free(nd);
+}
+}
+
 void net_init_clients(void)
 {
 net_change_state_entry =
@@ -1569,6 +1594,8 @@ void net_init_clients(void)
 
 QTAILQ_INIT(&net_clients);
 
+netdev_init_modern();
+
 qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL,
   &error_fatal);
 
@@ -1579,6 +1606,36 @@ void net_init_clients(void)
   &error_fatal);
 }
 
+/*
+ * Does this -netdev argument use modern rather than traditional syntax?
+ * Modern syntax is to be parsed with netdev_parse_modern().
+ * Traditional syntax is to be parsed with net_client_parse().
+ */
+bool netdev_is_modern(const char *optarg)
+{
+return false;
+}
+
+/*
+ * netdev_parse_modern() uses modern, more expressive syntax than
+ * net_client_parse(), but supports only the -netdev option.
+ * netdev_parse_modern() appends to @nd_queue, whereas net_client_parse()
+ * appends to @qemu_netdev_opts.
+ */
+void netdev_parse_modern(const char *optarg)
+{
+Visitor *v;
+NetdevQueueEntry *nd;
+
+v = qobject_input_visitor_new_str(optarg, "type", &error_fatal);
+nd = g_new(NetdevQueueEntry, 1);
+visit_type_Netdev(v, NULL, &nd->nd, &error_fatal);
+visit_free(v);
+loc_save(&nd->loc);
+
+QSIMPLEQ_INSERT_TAIL(&nd_queue, nd, entry);
+}
+
 void 

[PATCH v7 02/14] net: remove the @errp argument of net_client_inits()

2022-07-22 Thread Laurent Vivier
The only caller passes &error_fatal, so use this directly in the function.

It's what we do for -blockdev, -device, and -object.

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
---
 include/net/net.h |  2 +-
 net/net.c | 20 +++-
 softmmu/vl.c  |  2 +-
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 523136c7acba..c53c64ac18c4 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -216,7 +216,7 @@ extern const char *host_net_devices[];
 /* from net.c */
 int net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
-int net_init_clients(Error **errp);
+void net_init_clients(void);
 void net_check_clients(void);
 void net_cleanup(void);
 void hmp_host_net_add(Monitor *mon, const QDict *qdict);
diff --git a/net/net.c b/net/net.c
index d2288bd3a929..15958f881776 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1562,27 +1562,21 @@ out:
 return ret;
 }
 
-int net_init_clients(Error **errp)
+void net_init_clients(void)
 {
 net_change_state_entry =
 qemu_add_vm_change_state_handler(net_vm_change_state_handler, NULL);
 
 QTAILQ_INIT(&net_clients);
 
-if (qemu_opts_foreach(qemu_find_opts("netdev"),
-  net_init_netdev, NULL, errp)) {
-return -1;
-}
-
-if (qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL, errp)) {
-return -1;
-}
+qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL,
+  &error_fatal);
 
-if (qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL, errp)) {
-return -1;
-}
+qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL,
+  &error_fatal);
 
-return 0;
+qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL,
+  &error_fatal);
 }
 
 int net_client_parse(QemuOptsList *opts_list, const char *optarg)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index aabd82e09a20..8f3f3bb74389 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1904,7 +1904,7 @@ static void qemu_create_late_backends(void)
 qtest_server_init(qtest_chrdev, qtest_log, &error_fatal);
 }
 
-net_init_clients(&error_fatal);
+net_init_clients();
 
 object_option_foreach_add(object_create_late);
 
-- 
2.37.1




[PATCH v7 12/14] qemu-sockets: update socket_uri() and socket_parse() to be consistent

2022-07-22 Thread Laurent Vivier
To be consistent with socket_uri(), accept the 'tcp:' prefix for the inet type
in socket_parse(). By default, socket_parse() still uses tcp when no prefix is
provided (format is host:port).

In socket_uri(), use the 'vsock:' prefix for the vsock type rather than 'tcp:',
because 'tcp:' makes a vsock address look like an inet address with the CID
misinterpreted as the host.
This goes back to commit 9aca82ba31 "migration: Create socket-address parameter".
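
The prefix handling described above can be sketched as a small classifier.
This is an illustration under assumptions, not the code in
util/qemu-sockets.c: the enum, has_prefix() and classify_addr() are
hypothetical names, and the "unix:" and "fd:" branches are included only
because socket_uri() emits those prefixes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical address kinds for this sketch; QEMU uses SocketAddressType. */
enum addr_kind { ADDR_INET, ADDR_UNIX, ADDR_FD, ADDR_VSOCK };

/* Return nonzero and set *rest past the prefix when str starts with prefix. */
int has_prefix(const char *str, const char *prefix, const char **rest)
{
    size_t n = strlen(prefix);

    if (strncmp(str, prefix, n) != 0) {
        return 0;
    }
    *rest = str + n;
    return 1;
}

/* Classify an address string; *rest receives the part after the prefix. */
enum addr_kind classify_addr(const char *str, const char **rest)
{
    assert(str != NULL && rest != NULL);
    if (has_prefix(str, "unix:", rest)) {
        return ADDR_UNIX;
    }
    if (has_prefix(str, "fd:", rest)) {
        return ADDR_FD;
    }
    if (has_prefix(str, "vsock:", rest)) {
        return ADDR_VSOCK;
    }
    if (has_prefix(str, "tcp:", rest)) {
        return ADDR_INET;       /* explicit prefix, added by this patch */
    }
    *rest = str;                /* no prefix: treated as inet host:port */
    return ADDR_INET;
}
```

The point of the explicit "tcp:" branch is that a string produced by
socket_uri() for an inet address can be fed back to the parser and still be
classified as inet, with the same host:port remainder as the unprefixed form.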

Signed-off-by: Laurent Vivier 
---
 util/qemu-sockets.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 870a36eb0e93..4531b13dbdbf 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -,7 +,7 @@ char *socket_uri(SocketAddress *addr)
 case SOCKET_ADDRESS_TYPE_FD:
 return g_strdup_printf("fd:%s", addr->u.fd.str);
 case SOCKET_ADDRESS_TYPE_VSOCK:
-return g_strdup_printf("tcp:%s:%s",
+return g_strdup_printf("vsock:%s:%s",
addr->u.vsock.cid,
addr->u.vsock.port);
 default:
@@ -1145,6 +1145,11 @@ SocketAddress *socket_parse(const char *str, Error **errp)
 if (vsock_parse(&addr->u.vsock, str + strlen("vsock:"), errp)) {
 goto fail;
 }
+} else if (strstart(str, "tcp:", NULL)) {
+addr->type = SOCKET_ADDRESS_TYPE_INET;
+if (inet_parse(&addr->u.inet, str + strlen("tcp:"), errp)) {
+goto fail;
+}
 } else {
 addr->type = SOCKET_ADDRESS_TYPE_INET;
 if (inet_parse(&addr->u.inet, str, errp)) {
-- 
2.37.1




[PATCH v6 08/14] net: dgram: make dgram_dst generic

2022-07-22 Thread Laurent Vivier
dgram_dst is a sockaddr_in structure. To be able to use it with
unix socket, use a pointer to a generic sockaddr structure.
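
One consequence of holding the destination as a generic struct sockaddr
pointer is that the length passed to sendto() can no longer be a fixed
sizeof. The "add unix socket" patch in this series picks the length from
sa_family; the sketch below shows the same idea in isolation.
dgram_dst_len() is a hypothetical helper name, and returning 0 for other
families is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/*
 * Hypothetical helper (not part of the patch): return the address length
 * to pass to sendto() for a destination stored as a generic sockaddr.
 */
socklen_t dgram_dst_len(const struct sockaddr *dst)
{
    assert(dst != NULL);
    switch (dst->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_UNIX:
        return sizeof(struct sockaddr_un);
    default:
        return 0; /* unsupported family in this sketch */
    }
}
```

A send path would then call sendto(fd, buf, size, 0, dst, dgram_dst_len(dst))
when a destination is set, and plain send() otherwise.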

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/dgram.c | 76 +++--
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index dbe65102d174..dcc2205305c5 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -40,9 +40,8 @@ typedef struct NetDgramState {
 int listen_fd;
 int fd;
 SocketReadState rs;
-  /* contains inet host and port destination iff connectionless (SOCK_DGRAM) */
-struct sockaddr_in dgram_dst;
-IOHandler *send_fn;   /* differs between SOCK_STREAM/SOCK_DGRAM */
+struct sockaddr *dgram_dst; /* contains destination iff connectionless */
+IOHandler *send_fn;
 bool read_poll;   /* waiting to receive data? */
 bool write_poll;  /* waiting to transmit data? */
 } NetDgramState;
@@ -86,10 +85,9 @@ static ssize_t net_dgram_receive_dgram(NetClientState *nc,
 ssize_t ret;
 
 do {
-if (s->dgram_dst.sin_family != AF_UNIX) {
-ret = sendto(s->fd, buf, size, 0,
- (struct sockaddr *)&s->dgram_dst,
- sizeof(s->dgram_dst));
+if (s->dgram_dst) {
+ret = sendto(s->fd, buf, size, 0, s->dgram_dst,
+ sizeof(struct sockaddr_in));
 } else {
 ret = send(s->fd, buf, size, 0);
 }
@@ -290,6 +288,8 @@ static void net_dgram_cleanup(NetClientState *nc)
 closesocket(s->listen_fd);
 s->listen_fd = -1;
 }
+g_free(s->dgram_dst);
+s->dgram_dst = NULL;
 }
 
 static NetClientInfo net_dgram_socket_info = {
@@ -306,7 +306,7 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
   SocketAddress *mcast,
   Error **errp)
 {
-struct sockaddr_in saddr;
+struct sockaddr_in *saddr = NULL;
 int newfd;
 NetClientState *nc;
 NetDgramState *s;
@@ -328,24 +328,25 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
  */
 
 if (is_fd && mcast != NULL) {
-if (convert_host_port(&saddr, mcast->u.inet.host,
-  mcast->u.inet.port, errp) < 0) {
+saddr = g_new(struct sockaddr_in, 1);
+
+if (convert_host_port(saddr, mcast->u.inet.host, mcast->u.inet.port,
+  errp) < 0) {
 goto err;
 }
 /* must be bound */
-if (saddr.sin_addr.s_addr == 0) {
+if (saddr->sin_addr.s_addr == 0) {
 error_setg(errp, "can't setup multicast destination address");
 goto err;
 }
 /* clone dgram socket */
-newfd = net_dgram_mcast_create(&saddr, NULL, errp);
+newfd = net_dgram_mcast_create(saddr, NULL, errp);
 if (newfd < 0) {
 goto err;
 }
 /* clone newfd to fd, close newfd */
 dup2(newfd, fd);
 close(newfd);
-
 }
 
 nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name);
@@ -359,16 +360,13 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
 net_dgram_read_poll(s, true);
 
 /* mcast: save bound address as dst */
-if (is_fd && mcast != NULL) {
-s->dgram_dst = saddr;
+if (saddr) {
+g_assert(s->dgram_dst == NULL);
+s->dgram_dst = (struct sockaddr *)saddr;
 snprintf(nc->info_str, sizeof(nc->info_str),
  "fd=%d (cloned mcast=%s:%d)",
- fd, inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+ fd, inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port));
 } else {
-if (sa_type == SOCKET_ADDRESS_TYPE_UNIX) {
-s->dgram_dst.sin_family = AF_UNIX;
-}
-
 snprintf(nc->info_str, sizeof(nc->info_str), "fd=%d %s", fd,
  SocketAddressType_str(sa_type));
 }
@@ -376,6 +374,7 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
 return s;
 
 err:
+g_free(saddr);
 closesocket(fd);
 return NULL;
 }
@@ -421,21 +420,24 @@ static int net_dgram_mcast_init(NetClientState *peer,
 {
 NetDgramState *s;
 int fd, ret;
-struct sockaddr_in saddr;
+struct sockaddr_in *saddr;
 
 if (remote->type != SOCKET_ADDRESS_TYPE_INET) {
 error_setg(errp, "multicast only support inet type");
 return -1;
 }
 
-if (convert_host_port(&saddr, remote->u.inet.host, remote->u.inet.port,
+saddr = g_new(struct sockaddr_in, 1);
+if (convert_host_port(saddr, remote->u.inet.host, remote->u.inet.port,
   errp) < 0) {
+g_free(saddr);
 return -1;
 }
 
 if (!local) {
-fd = net_dgram_mcast_creat

[PATCH v7 08/14] net: dgram: make dgram_dst generic

2022-07-22 Thread Laurent Vivier
dgram_dst is a sockaddr_in structure. To be able to use it with
unix socket, use a pointer to a generic sockaddr structure.

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/dgram.c | 76 +++--
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index dbe65102d174..dcc2205305c5 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -40,9 +40,8 @@ typedef struct NetDgramState {
 int listen_fd;
 int fd;
 SocketReadState rs;
-  /* contains inet host and port destination iff connectionless (SOCK_DGRAM) */
-struct sockaddr_in dgram_dst;
-IOHandler *send_fn;   /* differs between SOCK_STREAM/SOCK_DGRAM */
+struct sockaddr *dgram_dst; /* contains destination iff connectionless */
+IOHandler *send_fn;
 bool read_poll;   /* waiting to receive data? */
 bool write_poll;  /* waiting to transmit data? */
 } NetDgramState;
@@ -86,10 +85,9 @@ static ssize_t net_dgram_receive_dgram(NetClientState *nc,
 ssize_t ret;
 
 do {
-if (s->dgram_dst.sin_family != AF_UNIX) {
-ret = sendto(s->fd, buf, size, 0,
- (struct sockaddr *)&s->dgram_dst,
- sizeof(s->dgram_dst));
+if (s->dgram_dst) {
+ret = sendto(s->fd, buf, size, 0, s->dgram_dst,
+ sizeof(struct sockaddr_in));
 } else {
 ret = send(s->fd, buf, size, 0);
 }
@@ -290,6 +288,8 @@ static void net_dgram_cleanup(NetClientState *nc)
 closesocket(s->listen_fd);
 s->listen_fd = -1;
 }
+g_free(s->dgram_dst);
+s->dgram_dst = NULL;
 }
 
 static NetClientInfo net_dgram_socket_info = {
@@ -306,7 +306,7 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
   SocketAddress *mcast,
   Error **errp)
 {
-struct sockaddr_in saddr;
+struct sockaddr_in *saddr = NULL;
 int newfd;
 NetClientState *nc;
 NetDgramState *s;
@@ -328,24 +328,25 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
  */
 
 if (is_fd && mcast != NULL) {
-if (convert_host_port(&saddr, mcast->u.inet.host,
-  mcast->u.inet.port, errp) < 0) {
+saddr = g_new(struct sockaddr_in, 1);
+
+if (convert_host_port(saddr, mcast->u.inet.host, mcast->u.inet.port,
+  errp) < 0) {
 goto err;
 }
 /* must be bound */
-if (saddr.sin_addr.s_addr == 0) {
+if (saddr->sin_addr.s_addr == 0) {
 error_setg(errp, "can't setup multicast destination address");
 goto err;
 }
 /* clone dgram socket */
-newfd = net_dgram_mcast_create(&saddr, NULL, errp);
+newfd = net_dgram_mcast_create(saddr, NULL, errp);
 if (newfd < 0) {
 goto err;
 }
 /* clone newfd to fd, close newfd */
 dup2(newfd, fd);
 close(newfd);
-
 }
 
 nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name);
@@ -359,16 +360,13 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
 net_dgram_read_poll(s, true);
 
 /* mcast: save bound address as dst */
-if (is_fd && mcast != NULL) {
-s->dgram_dst = saddr;
+if (saddr) {
+g_assert(s->dgram_dst == NULL);
+s->dgram_dst = (struct sockaddr *)saddr;
 snprintf(nc->info_str, sizeof(nc->info_str),
  "fd=%d (cloned mcast=%s:%d)",
- fd, inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+ fd, inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port));
 } else {
-if (sa_type == SOCKET_ADDRESS_TYPE_UNIX) {
-s->dgram_dst.sin_family = AF_UNIX;
-}
-
 snprintf(nc->info_str, sizeof(nc->info_str), "fd=%d %s", fd,
  SocketAddressType_str(sa_type));
 }
@@ -376,6 +374,7 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
 return s;
 
 err:
+g_free(saddr);
 closesocket(fd);
 return NULL;
 }
@@ -421,21 +420,24 @@ static int net_dgram_mcast_init(NetClientState *peer,
 {
 NetDgramState *s;
 int fd, ret;
-struct sockaddr_in saddr;
+struct sockaddr_in *saddr;
 
 if (remote->type != SOCKET_ADDRESS_TYPE_INET) {
 error_setg(errp, "multicast only support inet type");
 return -1;
 }
 
-if (convert_host_port(&saddr, remote->u.inet.host, remote->u.inet.port,
+saddr = g_new(struct sockaddr_in, 1);
+if (convert_host_port(saddr, remote->u.inet.host, remote->u.inet.port,
   errp) < 0) {
+g_free(saddr);
 return -1;
 }
 
 if (!local) {
-fd = net_dgram_mcast_creat

[PATCH v6 09/14] net: dgram: move mcast specific code from net_socket_fd_init_dgram()

2022-07-22 Thread Laurent Vivier
It is less complex to manage special cases directly in
net_dgram_mcast_init() and net_dgram_udp_init().

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/dgram.c | 143 +++-
 1 file changed, 73 insertions(+), 70 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index dcc2205305c5..16e2d909755c 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -302,52 +302,11 @@ static NetClientInfo net_dgram_socket_info = {
 static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
   const char *model,
   const char *name,
-  int fd, int is_fd,
-  SocketAddress *mcast,
+  int fd,
   Error **errp)
 {
-struct sockaddr_in *saddr = NULL;
-int newfd;
 NetClientState *nc;
 NetDgramState *s;
-SocketAddress *sa;
-SocketAddressType sa_type;
-
-sa = socket_local_address(fd, errp);
-if (!sa) {
-return NULL;
-}
-sa_type = sa->type;
-qapi_free_SocketAddress(sa);
-
-/*
- * fd passed: multicast: "learn" dgram_dst address from bound address and
- * save it. Because this may be "shared" socket from a "master" process,
- * datagrams would be recv() by ONLY ONE process: we must "clone" this
- * dgram socket --jjo
- */
-
-if (is_fd && mcast != NULL) {
-saddr = g_new(struct sockaddr_in, 1);
-
-if (convert_host_port(saddr, mcast->u.inet.host, mcast->u.inet.port,
-  errp) < 0) {
-goto err;
-}
-/* must be bound */
-if (saddr->sin_addr.s_addr == 0) {
-error_setg(errp, "can't setup multicast destination address");
-goto err;
-}
-/* clone dgram socket */
-newfd = net_dgram_mcast_create(saddr, NULL, errp);
-if (newfd < 0) {
-goto err;
-}
-/* clone newfd to fd, close newfd */
-dup2(newfd, fd);
-close(newfd);
-}
 
 nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name);
 
@@ -359,24 +318,7 @@ static NetDgramState *net_dgram_fd_init_dgram(NetClientState *peer,
 net_socket_rs_init(&s->rs, net_dgram_rs_finalize, false);
 net_dgram_read_poll(s, true);
 
-/* mcast: save bound address as dst */
-if (saddr) {
-g_assert(s->dgram_dst == NULL);
-s->dgram_dst = (struct sockaddr *)saddr;
-snprintf(nc->info_str, sizeof(nc->info_str),
- "fd=%d (cloned mcast=%s:%d)",
- fd, inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port));
-} else {
-snprintf(nc->info_str, sizeof(nc->info_str), "fd=%d %s", fd,
- SocketAddressType_str(sa_type));
-}
-
 return s;
-
-err:
-g_free(saddr);
-closesocket(fd);
-return NULL;
 }
 
 static void net_dgram_connect(void *opaque)
@@ -421,6 +363,7 @@ static int net_dgram_mcast_init(NetClientState *peer,
 NetDgramState *s;
 int fd, ret;
 struct sockaddr_in *saddr;
+gchar *info_str;
 
 if (remote->type != SOCKET_ADDRESS_TYPE_INET) {
 error_setg(errp, "multicast only support inet type");
@@ -440,6 +383,9 @@ static int net_dgram_mcast_init(NetClientState *peer,
 g_free(saddr);
 return -1;
 }
+info_str = g_strdup_printf("mcast=%s:%d",
+   inet_ntoa(saddr->sin_addr),
+   ntohs(saddr->sin_port));
 } else {
 switch (local->type) {
 case SOCKET_ADDRESS_TYPE_INET: {
@@ -457,9 +403,14 @@ static int net_dgram_mcast_init(NetClientState *peer,
 g_free(saddr);
 return -1;
 }
+info_str = g_strdup_printf("mcast=%s:%d",
+   inet_ntoa(saddr->sin_addr),
+   ntohs(saddr->sin_port));
 break;
 }
-case SOCKET_ADDRESS_TYPE_FD:
+case SOCKET_ADDRESS_TYPE_FD: {
+int newfd;
+
 fd = monitor_fd_param(monitor_cur(), local->u.fd.str, errp);
 if (fd == -1) {
 g_free(saddr);
@@ -472,7 +423,46 @@ static int net_dgram_mcast_init(NetClientState *peer,
  name, fd);
 return -1;
 }
+
+/*
+ * fd passed: multicast: "learn" dgram_dst address from bound
+ * address and save it. Because this may be "shared" socket from a
+ * "master" process, datagrams would be recv() by ONLY ONE process:
+ * we must "clone" this dgram socket --jjo
+ */
+
+saddr = g_new(struct sockaddr_in, 1);
+
+   

[PATCH v7 10/14] net: dgram: add unix socket

2022-07-22 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/dgram.c | 65 ++---
 qapi/net.json   |  2 +-
 qemu-options.hx |  1 +
 3 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index 16e2d909755c..9f3eafee3b67 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -86,8 +86,15 @@ static ssize_t net_dgram_receive_dgram(NetClientState *nc,
 
 do {
 if (s->dgram_dst) {
-ret = sendto(s->fd, buf, size, 0, s->dgram_dst,
- sizeof(struct sockaddr_in));
+socklen_t len;
+
+if (s->dgram_dst->sa_family == AF_INET) {
+len = sizeof(struct sockaddr_in);
+} else {
+len = sizeof(struct sockaddr_un);
+}
+
+ret = sendto(s->fd, buf, size, 0, s->dgram_dst, len);
 } else {
 ret = send(s->fd, buf, size, 0);
 }
@@ -509,7 +516,7 @@ static int net_dgram_udp_init(NetClientState *peer,
 }
 } else {
 if (local->type != SOCKET_ADDRESS_TYPE_FD) {
-error_setg(errp, "type=inet requires remote parameter");
+error_setg(errp, "type=inet or unix require remote parameter");
 return -1;
 }
 }
@@ -559,6 +566,58 @@ static int net_dgram_udp_init(NetClientState *peer,
 
 break;
 }
+case SOCKET_ADDRESS_TYPE_UNIX: {
+struct sockaddr_un laddr_un, raddr_un;
+
+ret = unlink(local->u.q_unix.path);
+if (ret < 0 && errno != ENOENT) {
+error_setg_errno(errp, errno, "failed to unlink socket %s",
+ local->u.q_unix.path);
+return -1;
+}
+
+laddr_un.sun_family = PF_UNIX;
+ret = snprintf(laddr_un.sun_path, sizeof(laddr_un.sun_path), "%s",
+   local->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(laddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   local->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(laddr_un.sun_path));
+}
+
+raddr_un.sun_family = PF_UNIX;
+ret = snprintf(raddr_un.sun_path, sizeof(raddr_un.sun_path), "%s",
+   remote->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(raddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   remote->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(raddr_un.sun_path));
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_DGRAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create datagram socket");
+return -1;
+}
+
+ret = bind(fd, (struct sockaddr *)&laddr_un, sizeof(laddr_un));
+if (ret < 0) {
+error_setg_errno(errp, errno, "can't bind unix=%s to socket",
+ laddr_un.sun_path);
+closesocket(fd);
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+dgram_dst = g_malloc(sizeof(raddr_un));
+memcpy(dgram_dst, &raddr_un, sizeof(raddr_un));
+
+info_str = g_strdup_printf("udp=%s:%s",
+   laddr_un.sun_path, raddr_un.sun_path);
+break;
+}
 case SOCKET_ADDRESS_TYPE_FD: {
 SocketAddress *sa;
 SocketAddressType sa_type;
diff --git a/qapi/net.json b/qapi/net.json
index 518f288758b0..894c06a82b1b 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -600,7 +600,7 @@
 # @remote: remote address
 # @local: local address
 #
-# Only SocketAddress types 'inet' and 'fd' are supported.
+# Only SocketAddress types 'unix', 'inet' and 'fd' are supported.
 #
 # The code checks there is at least one of these options and reports an error
 # if not. If remote address is present and it's a multicast address, local
diff --git a/qemu-options.hx b/qemu-options.hx
index 827a951a9ef2..33c5bd72af21 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2736,6 +2736,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 "configure a network backend to connect to a multicast maddr and port\n"
 "use ``local.host=addr`` to specify the host address to send packets from\n"
 "-netdev dgram,id=str,local.type=inet,local.host=addr,local.port=port[,remote.type=inet,remote.host=addr,remote.port=port]\n"
+"-netdev dgram,id=str,local.type=unix,local.path=path[,remote.type=unix,remote.path=path]\n"
 "-netdev dgram,id=str,local.type=fd,local.str=h\n"
 "configure a network backend to connect to another network\n"
 "using an UDP tunnel\n"
-- 
2.37.1




[PATCH v6 07/14] net: stream: add unix socket

2022-07-22 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 net/stream.c| 108 +---
 qapi/net.json   |   2 +-
 qemu-options.hx |   1 +
 3 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index e8afbaca50b6..0f91ff20df61 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -235,7 +235,7 @@ static NetStreamState *net_stream_fd_init_stream(NetClientState *peer,
 static void net_stream_accept(void *opaque)
 {
 NetStreamState *s = opaque;
-struct sockaddr_in saddr;
+struct sockaddr_storage saddr;
 socklen_t len;
 int fd;
 
@@ -253,9 +253,27 @@ static void net_stream_accept(void *opaque)
 s->fd = fd;
 s->nc.link_down = false;
 net_stream_connect(s);
-snprintf(s->nc.info_str, sizeof(s->nc.info_str),
- "connection from %s:%d",
- inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+switch (saddr.ss_family) {
+case AF_INET: {
+struct sockaddr_in *saddr_in = (struct sockaddr_in *)&saddr;
+
+snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+ "connection from %s:%d",
+ inet_ntoa(saddr_in->sin_addr), ntohs(saddr_in->sin_port));
+break;
+}
+case AF_UNIX: {
+struct sockaddr_un saddr_un;
+
+len = sizeof(saddr_un);
+getsockname(s->listen_fd, (struct sockaddr *)&saddr_un, &len);
+snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+ "connect from %s", saddr_un.sun_path);
+break;
+}
+default:
+g_assert_not_reached();
+}
 }
 
 static int net_stream_server_init(NetClientState *peer,
@@ -295,6 +313,43 @@ static int net_stream_server_init(NetClientState *peer,
 }
 break;
 }
+case SOCKET_ADDRESS_TYPE_UNIX: {
+struct sockaddr_un saddr_un;
+
+ret = unlink(addr->u.q_unix.path);
+if (ret < 0 && errno != ENOENT) {
+error_setg_errno(errp, errno, "failed to unlink socket %s",
+ addr->u.q_unix.path);
+return -1;
+}
+
+saddr_un.sun_family = PF_UNIX;
+ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s",
+   addr->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   addr->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(saddr_un.sun_path));
+return -1;
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create stream socket");
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+ret = bind(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
+if (ret < 0) {
+error_setg_errno(errp, errno, "can't create socket with path: %s",
+ saddr_un.sun_path);
+closesocket(fd);
+return -1;
+}
+break;
+}
 case SOCKET_ADDRESS_TYPE_FD:
 fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
 if (fd == -1) {
@@ -380,6 +435,49 @@ static int net_stream_client_init(NetClientState *peer,
ntohs(saddr_in.sin_port));
 break;
 }
+case SOCKET_ADDRESS_TYPE_UNIX: {
+struct sockaddr_un saddr_un;
+
+saddr_un.sun_family = PF_UNIX;
+ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s",
+   addr->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   addr->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(saddr_un.sun_path));
+return -1;
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create stream socket");
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+connected = 0;
+for (;;) {
+ret = connect(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
+if (ret < 0) {
+if (errno == EINTR || errno == EWOULDBLOCK) {
+/* continue */
+} else if (errno == EAGAIN ||
+   errno == EALREADY) {
+break;
+} else {
+error_setg_errno(errp, errno, "can't connect socket");
+closesocket(fd);
+return -1;
+}
+} else {
+connected = 1;
+break;
+}
+}
+info_str = g_strdup_printf(" connect to %s", saddr_un.sun_

[PATCH v6 05/14] qapi: net: add stream and dgram netdevs

2022-07-22 Thread Laurent Vivier
Copied from socket netdev file and modified to use SocketAddress
to be able to introduce new features like unix socket.

"udp" and "mcast" are squashed into dgram netdev, multicast is detected
according to the IP address type.
"listen" and "connect" modes are managed by stream netdev. An optional
parameter "server" defines the mode (server by default)

The two new types need to be parsed the modern way with -netdev, because
with the traditional way, the "type" field of netdev structure collides with
the "type" field of SocketAddress and prevents the correct evaluation of the
command line option. Moreover the traditional way doesn't allow to use
the same type (SocketAddress) several times with the -netdev option
(needed to specify "local" and "remote" addresses).

The previous commit paved the way for parsing the modern way, but
omitted one detail: how to pick modern vs. traditional, in
netdev_is_modern().

We want to pick based on the value of parameter "type".  But how to
extract it from the option argument?

Parsing the option argument, either the modern or the traditional way,
extracts it for us, but only if parsing succeeds.

If parsing fails, there is no good option.  No matter which parser we
pick, it'll be the wrong one for some arguments, and the error
reporting will be confusing.

Fortunately, the traditional parser accepts *anything* when called in
a certain way.  This maximizes our chance to extract the value of
"type", and in turn minimizes the risk of confusing error reporting.
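
The selection described above can be sketched roughly as follows. This is a
minimal illustration, not the code from this series: the names
sketch_get_type() and sketch_is_modern() are hypothetical, the scan ignores
the ",," escaping that real option strings allow, and it assumes that JSON
arguments (starting with '{') are always treated as modern, as the v4
changelog suggests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): copy the value of the top-level
 * "type" key of a "key=value,..." string into buf. A first token without
 * '=' is treated as the implied "type" value, as in "stream,id=n0".
 * Dotted keys such as "addr.type" are deliberately not matched.
 */
static bool sketch_get_type(const char *optarg, char *buf, size_t buflen)
{
    const char *p = optarg;

    while (*p) {
        size_t len = strcspn(p, ",");
        const char *eq = memchr(p, '=', len);
        const char *val = NULL;
        size_t vlen = 0;

        if (!eq && p == optarg) {
            /* implied key: the bare first token is the type */
            val = p;
            vlen = len;
        } else if (eq && eq - p == 4 && strncmp(p, "type", 4) == 0) {
            /* explicit "type=value" */
            val = eq + 1;
            vlen = len - 5;
        }
        if (val) {
            if (vlen >= buflen) {
                return false;
            }
            memcpy(buf, val, vlen);
            buf[vlen] = '\0';
            return true;
        }
        p += len;
        if (*p == ',') {
            p++;
        }
    }
    return false;
}

/* Hypothetical: only the two new backends need the modern parser. */
static bool sketch_is_modern(const char *optarg)
{
    char type[32];

    if (optarg[0] == '{') {
        return true; /* assumption: JSON is always parsed the modern way */
    }
    if (!sketch_get_type(optarg, type, sizeof(type))) {
        return false;
    }
    return strcmp(type, "stream") == 0 || strcmp(type, "dgram") == 0;
}
```

With this sketch, "stream,id=socket0,addr.type=unix,addr.path=/tmp/qemu0"
would be routed to the modern parser, while "socket,id=n0,listen=:1234"
would stay on the traditional one.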

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 hmp-commands.hx |   2 +-
 net/clients.h   |   6 +
 net/dgram.c | 631 
 net/hub.c   |   2 +
 net/meson.build |   2 +
 net/net.c   |  30 ++-
 net/stream.c| 423 
 qapi/net.json   |  63 -
 qemu-options.hx |  12 +
 9 files changed, 1167 insertions(+), 4 deletions(-)
 create mode 100644 net/dgram.c
 create mode 100644 net/stream.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 182e639d1498..83e8d45a2a8b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1276,7 +1276,7 @@ ERST
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
-.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user"
+.params = "[user|tap|socket|stream|dgram|vde|bridge|hubport|netmap|vhost-user"
 #ifdef CONFIG_VMNET
   "|vmnet-host|vmnet-shared|vmnet-bridged"
 #endif
diff --git a/net/clients.h b/net/clients.h
index c9157789f2ce..ed8bdfff1e7c 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -40,6 +40,12 @@ int net_init_hubport(const Netdev *netdev, const char *name,
 int net_init_socket(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
 
+int net_init_stream(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp);
+
+int net_init_dgram(const Netdev *netdev, const char *name,
+   NetClientState *peer, Error **errp);
+
 int net_init_tap(const Netdev *netdev, const char *name,
  NetClientState *peer, Error **errp);
 
diff --git a/net/dgram.c b/net/dgram.c
new file mode 100644
index ..dbe65102d174
--- /dev/null
+++ b/net/dgram.c
@@ -0,0 +1,631 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+#include "net/net.h"
+#include "clients.h"
+#include "monitor/monitor.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+
+typedef struct NetDgramState {
+NetClientState nc;
+int listen_fd;
+int fd;
+SocketReadState rs;
+  /* contains inet host and port destination iff connectionless (SOCK_DGRAM) */

[PATCH v6 04/14] qapi: net: introduce a way to bypass qemu_opts_parse_noisily()

2022-07-22 Thread Laurent Vivier
As qemu_opts_parse_noisily() flattens the QAPI structures (the "type" field
of the Netdev structure can collide with the "type" field of SocketAddress),
we introduce a way to bypass qemu_opts_parse_noisily() and use
visit_type_Netdev() directly to parse the backend parameters.

More details from Markus:

qemu_init() passes the argument of -netdev, -nic, and -net to
net_client_parse().

net_client_parse() parses with qemu_opts_parse_noisily(), passing
QemuOptsList qemu_netdev_opts for -netdev, qemu_nic_opts for -nic, and
qemu_net_opts for -net.  Their desc[] are all empty, which means any
keys are accepted.  The result of the parse (a QemuOpts) is stored in
the QemuOptsList.

Note that QemuOpts is flat by design.  In some places, we layer non-flat
on top using dotted keys convention, but not here.

net_init_clients() iterates over the stored QemuOpts, and passes them to
net_init_netdev(), net_param_nic(), or net_init_client(), respectively.

These functions pass the QemuOpts to net_client_init().  They also do
other things with the QemuOpts, which we can ignore here.

net_client_init() uses the opts visitor to convert the (flat) QemuOpts to
a (non-flat) QAPI object Netdev.  Netdev is also the argument of QMP
command netdev_add.

The opts visitor was an early attempt to support QAPI in
(QemuOpts-based) CLI.  It restricts QAPI types to a certain shape; see
commit eb7ee2cbeb "qapi: introduce OptsVisitor".

A more modern way to support QAPI is qobject_input_visitor_new_str().
It uses keyval_parse() instead of QemuOpts for KEY=VALUE,... syntax, and
it also supports JSON syntax.  The former isn't quite as expressive as
JSON, but it's a lot closer than QemuOpts + opts visitor.

This commit paves the way for using the modern way instead.
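
To make the collision concrete, here is a minimal sketch under a simplifying
assumption: in a flat key space, a nested member is addressed only by its
last name component, so "type" and "addr.type" end up looking for the same
key. The helpers sketch_leaf_key() and sketch_keys_collide() are
hypothetical and only illustrate the naming problem; they are not how
QemuOpts or the opts visitor are implemented.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): return the last component of a
 * dotted key, e.g. "addr.type" -> "type", "type" -> "type".
 */
static const char *sketch_leaf_key(const char *key)
{
    const char *dot = strrchr(key, '.');

    return dot ? dot + 1 : key;
}

/*
 * Two distinct dotted keys collide in a flat namespace when their
 * leaf names are identical.
 */
static bool sketch_keys_collide(const char *a, const char *b)
{
    return strcmp(a, b) != 0 &&
           strcmp(sketch_leaf_key(a), sketch_leaf_key(b)) == 0;
}
```

Under this assumption, sketch_keys_collide("type", "addr.type") is true,
which is the Netdev versus SocketAddress clash described above, whereas a
parser that keeps the dotted path (as keyval_parse() does) can tell the two
apart.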

Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
---
 include/net/net.h |  2 ++
 net/net.c | 57 +++
 softmmu/vl.c  |  6 -
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/net/net.h b/include/net/net.h
index e755254443ea..826e14a78734 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -214,6 +214,8 @@ extern NICInfo nd_table[MAX_NICS];
 extern const char *host_net_devices[];
 
 /* from net.c */
+bool netdev_is_modern(const char *optarg);
+void netdev_parse_modern(const char *optarg);
 void net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
 void net_init_clients(void);
diff --git a/net/net.c b/net/net.c
index f056e8aebfb2..ffe3e5a2cf1d 100644
--- a/net/net.c
+++ b/net/net.c
@@ -54,6 +54,7 @@
 #include "net/colo-compare.h"
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
+#include "qapi/qobject-input-visitor.h"
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -63,6 +64,16 @@
 static VMChangeStateEntry *net_change_state_entry;
 static QTAILQ_HEAD(, NetClientState) net_clients;
 
+typedef struct NetdevQueueEntry {
+Netdev *nd;
+Location loc;
+QSIMPLEQ_ENTRY(NetdevQueueEntry) entry;
+} NetdevQueueEntry;
+
+typedef QSIMPLEQ_HEAD(, NetdevQueueEntry) NetdevQueue;
+
+static NetdevQueue nd_queue = QSIMPLEQ_HEAD_INITIALIZER(nd_queue);
+
 /***/
 /* network device redirectors */
 
@@ -1562,6 +1573,20 @@ out:
 return ret;
 }
 
+static void netdev_init_modern(void)
+{
+while (!QSIMPLEQ_EMPTY(&nd_queue)) {
+NetdevQueueEntry *nd = QSIMPLEQ_FIRST(&nd_queue);
+
+QSIMPLEQ_REMOVE_HEAD(&nd_queue, entry);
+loc_push_restore(&nd->loc);
+net_client_init1(nd->nd, true, &error_fatal);
+loc_pop(&nd->loc);
+qapi_free_Netdev(nd->nd);
+g_free(nd);
+}
+}
+
 void net_init_clients(void)
 {
 net_change_state_entry =
@@ -1569,6 +1594,8 @@ void net_init_clients(void)
 
 QTAILQ_INIT(&net_clients);
 
+netdev_init_modern();
+
 qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL,
   &error_fatal);
 
@@ -1579,6 +1606,36 @@ void net_init_clients(void)
   &error_fatal);
 }
 
+/*
+ * Does this -netdev argument use modern rather than traditional syntax?
+ * Modern syntax is to be parsed with netdev_parse_modern().
+ * Traditional syntax is to be parsed with net_client_parse().
+ */
+bool netdev_is_modern(const char *optarg)
+{
+return false;
+}
+
+/*
+ * netdev_parse_modern() uses modern, more expressive syntax than
+ * net_client_parse(), but supports only the -netdev option.
+ * netdev_parse_modern() appends to @nd_queue, whereas net_client_parse()
+ * appends to @qemu_netdev_opts.
+ */
+void netdev_parse_modern(const char *optarg)
+{
+Visitor *v;
+NetdevQueueEntry *nd;
+
+v = qobject_input_visitor_new_str(optarg, "type", &error_fatal);
+nd = g_new(NetdevQueueEntry, 1);
+visit_type_Netdev(v, NULL, &nd->nd, &error_fatal);
+visit_free(v);
+loc_save(&nd->loc);
+
+QSIMPLEQ_INSERT_TAIL(&nd_queue, nd, entry);
+}
+
 void 

[PATCH v6 00/14] qapi: net: add unix socket type support to netdev backend

2022-07-22 Thread Laurent Vivier
"-netdev socket" only supports inet sockets.

It's not a complex task to add support for unix sockets, but
the socket netdev parameters are not well suited to describing
unix socket parameters.

As discussed in:

  "socket.c added support for unix domain socket datagram transport"
  
https://lore.kernel.org/qemu-devel/1c0e1bc5-904f-46b0-8044-68e43e67b...@gmail.com/

This series adds support of unix socket type using SocketAddress QAPI structure.

Two new netdev backends, "stream" and "dgram", are added. They are mostly a
copy of the "socket" backend, but they use the SocketAddress QAPI structure
to provide socket parameters. They also implement unix sockets (stream and
datagram).

Some examples of CLI syntax:

  for TCP:

  -netdev stream,id=socket0,addr.type=inet,addr.host=localhost,addr.port=1234
  -netdev stream,id=socket0,server=off,addr.type=inet,addr.host=localhost,addr.port=1234

  -netdev dgram,id=socket0,\
  local.type=inet,local.host=localhost,local.port=1234,\
  remote.type=inet,remote.host=localhost,remote.port=1235

  for UNIX:

  -netdev stream,id=socket0,addr.type=unix,addr.path=/tmp/qemu0
  -netdev stream,id=socket0,server=off,addr.type=unix,addr.path=/tmp/qemu0

  -netdev dgram,id=socket0,\
  local.type=unix,local.path=/tmp/qemu0,\
  remote.type=unix,remote.path=/tmp/qemu1

  for FD:

  -netdev stream,id=socket0,addr.type=fd,addr.str=4
  -netdev stream,id=socket0,server=off,addr.type=fd,addr.str=5

  -netdev dgram,id=socket0,local.type=fd,local.str=4

v7:
  - add qtests
  - update parameters table in net.json
  - update socket_uri() and socket_parse()

v6:
  - s/netdev option/-netdev option/ PATCH 4
  - s/ / /
  - update @NetdevStreamOptions and @NetdevDgramOptions comments
  - update PATCH 4 description message
  - add missing return in error case for unix stream socket
  - split socket_uri() patch: move and rename, then change content

v5:
  - remove RFC prefix
  - put the change of net_client_parse() into its own patch (exit() in the
function)
  - update comments regarding netdev_is_modern() and netdev_parse_modern()
  - update error case in net_stream_server_init()
  - update qemu-options.hx with unix type
  - fix HMP "info network" with unix protocol/server side.

v4:
  - net_client_parse() fails with exit() rather than with return.
  - keep "{ 'name': 'vmnet-host', 'if': 'CONFIG_VMNET' }" on its
own line in qapi/net.json
  - add a comment in qapi/net.json about parameters usage
  - move netdev_is_modern() check to qemu_init()
  - in netdev_is_modern(), check for JSON and use qemu_opts_do_parse()
to parse parameters and detect type value.
  - add a blank line after copyright comment

v3:
  - remove support of "-net" for dgram and stream. They are only
supported with "-netdev" option.
  - use &error_fatal directly in net_client_inits()
  - update qemu-options.hx
  - move to QIO for stream socket

v2:
  - use "stream" and "dgram" rather than "socket-ng,mode=stream"
and "socket-ng,mode=dgram"
  - extract code to bypass qemu_opts_parse_noisily() to
a new patch
  - do not ignore EINVAL (Stefano)
  - fix "-net" option

CC: Ralph Schmieder 
CC: Stefano Brivio 
CC: Daniel P. Berrangé 
CC: Markus Armbruster 

Laurent Vivier (13):
  net: introduce convert_host_port()
  net: remove the @errp argument of net_client_inits()
  net: simplify net_client_parse() error management
  qapi: net: introduce a way to bypass qemu_opts_parse_noisily()
  qapi: net: add stream and dgram netdevs
  net: stream: add unix socket
  net: dgram: make dgram_dst generic
  net: dgram: move mcast specific code from net_socket_fd_init_dgram()
  net: dgram: add unix socket
  qemu-sockets: move and rename SocketAddress_to_str()
  qemu-sockets: update socket_uri() and socket_parse()  to be consistent
  net: stream: move to QIO
  tests/qtest: netdev: test stream and dgram backends

Stefano Brivio (1):
  net: stream: Don't ignore EINVAL on netdev socket connection

 hmp-commands.hx |   2 +-
 include/net/net.h   |   6 +-
 include/qemu/sockets.h  |   4 +-
 monitor/hmp-cmds.c  |  23 +-
 net/clients.h   |   6 +
 net/dgram.c | 707 
 net/hub.c   |   2 +
 net/meson.build |   2 +
 net/net.c   | 169 ++---
 net/stream.c| 376 +++
 qapi/net.json   |  63 +++-
 qemu-options.hx |  14 +
 softmmu/vl.c|  16 +-
 tests/qtest/meson.build |   1 +
 tests/qtest/netdev-socket.c | 322 
 util/qemu-sockets.c |  25 ++
 16 files changed, 1656 insertions(+), 82 deletions(-)
 create mode 100644 net/dgram.c
 create mode 100644 net/stream.c
 create mode 100644 tests/qtest/netdev-socket.c

-- 
2.37.1




[PATCH v6 06/14] net: stream: Don't ignore EINVAL on netdev socket connection

2022-07-22 Thread Laurent Vivier
From: Stefano Brivio 

Other errors are treated as failure by net_stream_client_init(),
but if connect() returns EINVAL, we'll fail silently. Remove the
related exception.
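
As a rough illustration of the resulting policy, the errno handling after a
non-blocking connect() can be sketched as a small classifier. The enum and
function names below are hypothetical and not part of net/stream.c; the
errno groups mirror the ones visible in the hunk below, with EINVAL now
falling through to the failure case.

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum sketch_connect_action {
    SKETCH_RETRY,        /* call connect() again */
    SKETCH_IN_PROGRESS,  /* connection is being established asynchronously */
    SKETCH_FAIL          /* report the error and close the socket */
};

/* Classify the errno left by a failed non-blocking connect(). */
static enum sketch_connect_action sketch_classify_connect_errno(int err)
{
    if (err == EINTR || err == EWOULDBLOCK) {
        return SKETCH_RETRY;
    }
    if (err == EINPROGRESS || err == EALREADY) {
        return SKETCH_IN_PROGRESS;
    }
    /* Everything else, including EINVAL, is a hard failure. */
    return SKETCH_FAIL;
}
```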

Signed-off-by: Stefano Brivio 
[lvivier: applied to net/stream.c]
Signed-off-by: Laurent Vivier 
Reviewed-by: Daniel P. Berrangé 
---
 net/stream.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index 0851e90becca..e8afbaca50b6 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -363,8 +363,7 @@ static int net_stream_client_init(NetClientState *peer,
 if (errno == EINTR || errno == EWOULDBLOCK) {
 /* continue */
 } else if (errno == EINPROGRESS ||
-   errno == EALREADY ||
-   errno == EINVAL) {
+   errno == EALREADY) {
 break;
 } else {
 error_setg_errno(errp, errno, "can't connect socket");
-- 
2.37.1




[PATCH v6 03/14] net: simplify net_client_parse() error management

2022-07-22 Thread Laurent Vivier
All net_client_parse() callers exit in case of error.

Move exit(1) to net_client_parse() and remove error checking from
the callers.

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
---
 include/net/net.h |  2 +-
 net/net.c |  6 ++
 softmmu/vl.c  | 12 +++-
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index c53c64ac18c4..e755254443ea 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -214,7 +214,7 @@ extern NICInfo nd_table[MAX_NICS];
 extern const char *host_net_devices[];
 
 /* from net.c */
-int net_client_parse(QemuOptsList *opts_list, const char *str);
+void net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
 void net_init_clients(void);
 void net_check_clients(void);
diff --git a/net/net.c b/net/net.c
index 15958f881776..f056e8aebfb2 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1579,13 +1579,11 @@ void net_init_clients(void)
   &error_fatal);
 }
 
-int net_client_parse(QemuOptsList *opts_list, const char *optarg)
+void net_client_parse(QemuOptsList *opts_list, const char *optarg)
 {
 if (!qemu_opts_parse_noisily(opts_list, optarg, true)) {
-return -1;
+exit(1);
 }
-
-return 0;
 }
 
 /* From FreeBSD */
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 8f3f3bb74389..0478210f2e04 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2815,21 +2815,15 @@ void qemu_init(int argc, char **argv, char **envp)
 break;
 case QEMU_OPTION_netdev:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("netdev"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("netdev"), optarg);
 break;
 case QEMU_OPTION_nic:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("nic"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("nic"), optarg);
 break;
 case QEMU_OPTION_net:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("net"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("net"), optarg);
 break;
 #ifdef CONFIG_LIBISCSI
 case QEMU_OPTION_iscsi:
-- 
2.37.1




[PATCH v6 02/14] net: remove the @errp argument of net_client_inits()

2022-07-22 Thread Laurent Vivier
The only caller passes &error_fatal, so use this directly in the function.

It's what we do for -blockdev, -device, and -object.

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
---
 include/net/net.h |  2 +-
 net/net.c | 20 +++-
 softmmu/vl.c  |  2 +-
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 523136c7acba..c53c64ac18c4 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -216,7 +216,7 @@ extern const char *host_net_devices[];
 /* from net.c */
 int net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
-int net_init_clients(Error **errp);
+void net_init_clients(void);
 void net_check_clients(void);
 void net_cleanup(void);
 void hmp_host_net_add(Monitor *mon, const QDict *qdict);
diff --git a/net/net.c b/net/net.c
index d2288bd3a929..15958f881776 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1562,27 +1562,21 @@ out:
 return ret;
 }
 
-int net_init_clients(Error **errp)
+void net_init_clients(void)
 {
 net_change_state_entry =
 qemu_add_vm_change_state_handler(net_vm_change_state_handler, NULL);
 
 QTAILQ_INIT(&net_clients);
 
-if (qemu_opts_foreach(qemu_find_opts("netdev"),
-  net_init_netdev, NULL, errp)) {
-return -1;
-}
-
-if (qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL, errp)) {
-return -1;
-}
+qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL,
+  &error_fatal);
 
-if (qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL, errp)) {
-return -1;
-}
+qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL,
+  &error_fatal);
 
-return 0;
+qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL,
+  &error_fatal);
 }
 
 int net_client_parse(QemuOptsList *opts_list, const char *optarg)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index aabd82e09a20..8f3f3bb74389 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1904,7 +1904,7 @@ static void qemu_create_late_backends(void)
 qtest_server_init(qtest_chrdev, qtest_log, &error_fatal);
 }
 
-net_init_clients(&error_fatal);
+net_init_clients();
 
 object_option_foreach_add(object_create_late);
 
-- 
2.37.1




[PATCH v6 01/14] net: introduce convert_host_port()

2022-07-22 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
---
 include/qemu/sockets.h |  2 ++
 net/net.c  | 62 ++
 2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 038faa157f59..47194b9732f8 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -47,6 +47,8 @@ void socket_listen_cleanup(int fd, Error **errp);
 int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
 
 /* Old, ipv4 only bits.  Don't use for new code. */
+int convert_host_port(struct sockaddr_in *saddr, const char *host,
+  const char *port, Error **errp);
 int parse_host_port(struct sockaddr_in *saddr, const char *str,
 Error **errp);
 int socket_init(void);
diff --git a/net/net.c b/net/net.c
index 2db160e0634d..d2288bd3a929 100644
--- a/net/net.c
+++ b/net/net.c
@@ -66,55 +66,57 @@ static QTAILQ_HEAD(, NetClientState) net_clients;
 /***/
 /* network device redirectors */
 
-int parse_host_port(struct sockaddr_in *saddr, const char *str,
-Error **errp)
+int convert_host_port(struct sockaddr_in *saddr, const char *host,
+  const char *port, Error **errp)
 {
-gchar **substrings;
 struct hostent *he;
-const char *addr, *p, *r;
-int port, ret = 0;
+const char *r;
+long p;
 
 memset(saddr, 0, sizeof(*saddr));
 
-substrings = g_strsplit(str, ":", 2);
-if (!substrings || !substrings[0] || !substrings[1]) {
-error_setg(errp, "host address '%s' doesn't contain ':' "
-   "separating host from port", str);
-ret = -1;
-goto out;
-}
-
-addr = substrings[0];
-p = substrings[1];
-
 saddr->sin_family = AF_INET;
-if (addr[0] == '\0') {
+if (host[0] == '\0') {
 saddr->sin_addr.s_addr = 0;
 } else {
-if (qemu_isdigit(addr[0])) {
-if (!inet_aton(addr, &saddr->sin_addr)) {
+if (qemu_isdigit(host[0])) {
+if (!inet_aton(host, &saddr->sin_addr)) {
 error_setg(errp, "host address '%s' is not a valid "
-   "IPv4 address", addr);
-ret = -1;
-goto out;
+   "IPv4 address", host);
+return -1;
 }
 } else {
-he = gethostbyname(addr);
+he = gethostbyname(host);
 if (he == NULL) {
-error_setg(errp, "can't resolve host address '%s'", addr);
-ret = -1;
-goto out;
+error_setg(errp, "can't resolve host address '%s'", host);
+return -1;
 }
 saddr->sin_addr = *(struct in_addr *)he->h_addr;
 }
 }
-port = strtol(p, (char **)&r, 0);
-if (r == p) {
-error_setg(errp, "port number '%s' is invalid", p);
+if (qemu_strtol(port, &r, 0, &p) != 0) {
+error_setg(errp, "port number '%s' is invalid", port);
+return -1;
+}
+saddr->sin_port = htons(p);
+return 0;
+}
+
+int parse_host_port(struct sockaddr_in *saddr, const char *str,
+Error **errp)
+{
+gchar **substrings;
+int ret;
+
+substrings = g_strsplit(str, ":", 2);
+if (!substrings || !substrings[0] || !substrings[1]) {
+error_setg(errp, "host address '%s' doesn't contain ':' "
+   "separating host from port", str);
 ret = -1;
 goto out;
 }
-saddr->sin_port = htons(port);
+
+ret = convert_host_port(saddr, substrings[0], substrings[1], errp);
 
 out:
 g_strfreev(substrings);
-- 
2.37.1




[PATCH v3 0/1] python/machine: Fix AF_UNIX path too long on macOS

2022-07-22 Thread Peter Delevoryas
v1: https://lore.kernel.org/qemu-devel/20220705214659.73369-1-pe...@pjd.dev/
v2: https://lore.kernel.org/qemu-devel/20220716173434.17183-1-pe...@pjd.dev/
v3:
- Changed QEMUMachine._name to f"{id(self):x}". Suggestion was to do
  f"{id(self):02x}", but the id's look like they are probably just the
  object address (8-byte pointer), so the "02" had no effect.
- Changed QMP socket name suffix from "-monitor.sock" to ".qmp".
- Changed console socket name suffix from "-console.sock" to ".con".
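
To illustrate the first point: a minimum field width of 2 only pads values
shorter than two hex digits, so it never changes a pointer-sized id(). A
minimal sketch; the sample value is the id from the socket name quoted in the
patch, and default_name() is a hypothetical stand-in for the naming logic, not
the actual QEMUMachine code:

```python
def default_name(obj: object) -> str:
    """Hypothetical stand-in for the default machine name: hex of id()."""
    return f"{id(obj):x}"


sample_id = 0x10bacf110  # id taken from the socket name quoted in the patch

padded = f"{sample_id:02x}"  # minimum width 2, zero-padded
plain = f"{sample_id:x}"     # no minimum width

# The width only makes a difference for values below 0x10.
small_padded = f"{0x5:02x}"
small_plain = f"{0x5:x}"

print(padded, plain, small_padded, small_plain)
```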

Thanks for all the comments and suggestions! Glad to be fixing this.
Peter

Peter Delevoryas (1):
  python/machine: Fix AF_UNIX path too long on macOS

 python/qemu/machine/machine.py | 6 +++---
 tests/avocado/avocado_qemu/__init__.py | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.37.0




[PATCH v3 1/1] python/machine: Fix AF_UNIX path too long on macOS

2022-07-22 Thread Peter Delevoryas
On macOS, private $TMPDIR's are the default. These $TMPDIR's are
generated from a user's unix UID and UUID [1], which can create a
relatively long path:

/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/

QEMU's avocado tests create a temporary directory prefixed by
"avo_qemu_sock_", and create QMP sockets within _that_ as well.
The QMP socket is unnecessarily long, because a temporary directory
is created for every QEMUMachine object.

/avo_qemu_sock_uh3w_dgc/qemu-37331-10bacf110-monitor.sock

The path limit for unix sockets on macOS is 104: [2]

/*
 * [XSI] Definitions for UNIX IPC domain.
 */
struct  sockaddr_un {
    unsigned char   sun_len;        /* sockaddr len including null */
    sa_family_t     sun_family;     /* [XSI] AF_UNIX */
    char            sun_path[104];  /* [XSI] path name (gag) */
};

This results in avocado tests failing on macOS because the QMP unix
socket can't be created, because the path is too long:

ERROR| Failed to establish connection: OSError: AF_UNIX path too long

This change resolves the problem by shortening the socket directory prefix
and the suffixes on the QMP and console socket names.
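
As a rough sketch of the arithmetic, the snippet below counts how many bytes
the shorter prefix and suffix save. The limit comes from the sun_path[104]
definition above; the helper names are hypothetical, and the new directory
name simply reuses the random suffix from the old example, which is an
assumption made for comparison. Whether a given path actually overflows
depends on the lengths of $TMPDIR, the PID and the id().

```python
import os

SUN_PATH_MAX = 104  # sizeof(sun_path) in the macOS sockaddr_un shown above


def path_len(path: str) -> int:
    """Length in bytes of a socket path as the OS would see it."""
    return len(os.fsencode(path))


def fits(path: str, limit: int = SUN_PATH_MAX) -> bool:
    """True if the path plus its terminating NUL fits in sun_path."""
    return path_len(path) < limit


tmpdir = "/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T"

# Old and new layouts, built from the example names in this message.
old = os.path.join(tmpdir, "avo_qemu_sock_uh3w_dgc",
                   "qemu-37331-10bacf110-monitor.sock")
new = os.path.join(tmpdir, "qemu_uh3w_dgc", "10bacf110.qmp")

saved = path_len(old) - path_len(new)
print(f"bytes saved: {saved}, new path fits: {fits(new)}")
```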

The result is paths like this:

pdel@pdel-mbp:/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T
$ tree qemu*
qemu_df4evjeq
qemu_jbxel3gy
qemu_ml9s_gg7
qemu_oc7h7f3u
qemu_oqb1yf97
├── 10a004050.con
└── 10a004050.qmp

[1] https://apple.stackexchange.com/questions/353832/why-is-mac-osx-temp-directory-in-weird-path
[2] /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/usr/include/sys/un.h

Signed-off-by: Peter Delevoryas 
---
 python/qemu/machine/machine.py | 6 +++---
 tests/avocado/avocado_qemu/__init__.py | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 37191f433b..5df210c810 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -157,7 +157,7 @@ def __init__(self,
 self._wrapper = wrapper
 self._qmp_timer = qmp_timer
 
-self._name = name or f"qemu-{os.getpid()}-{id(self):02x}"
+self._name = name or f"{id(self):x}"
 self._temp_dir: Optional[str] = None
 self._base_temp_dir = base_temp_dir
 self._sock_dir = sock_dir
@@ -167,7 +167,7 @@ def __init__(self,
 self._monitor_address = monitor_address
 else:
 self._monitor_address = os.path.join(
-self.sock_dir, f"{self._name}-monitor.sock"
+self.sock_dir, f"{self._name}.qmp"
 )
 
 self._console_log_path = console_log
@@ -192,7 +192,7 @@ def __init__(self,
 self._console_set = False
 self._console_device_type: Optional[str] = None
 self._console_address = os.path.join(
-self.sock_dir, f"{self._name}-console.sock"
+self.sock_dir, f"{self._name}.con"
 )
 self._console_socket: Optional[socket.socket] = None
 self._remove_files: List[str] = []
diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index ed4853c805..43b8c8848c 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -296,7 +296,7 @@ def require_accelerator(self, accelerator):
 "available" % accelerator)
 
 def _new_vm(self, name, *args):
-self._sd = tempfile.TemporaryDirectory(prefix="avo_qemu_sock_")
+self._sd = tempfile.TemporaryDirectory(prefix="qemu_")
 vm = QEMUMachine(self.qemu_bin, base_temp_dir=self.workdir,
  sock_dir=self._sd.name, log_dir=self.logdir)
 self.log.debug('QEMUMachine "%s" created', name)
-- 
2.37.0




Re: [PATCH v2 1/1] python/machine: Fix AF_UNIX path too long on macOS

2022-07-22 Thread Peter Delevoryas
On Fri, Jul 22, 2022 at 08:20:11AM +0100, Daniel P. Berrangé wrote:
> On Thu, Jul 21, 2022 at 07:44:21PM -0700, Peter Delevoryas wrote:
> > On Mon, Jul 18, 2022 at 09:56:17AM +0100, Daniel P. Berrangé wrote:
> > > On Sat, Jul 16, 2022 at 10:34:34AM -0700, Peter Delevoryas wrote:
> > > > On macOS, private $TMPDIR's are the default. These $TMPDIR's are
> > > > generated from a user's unix UID and UUID [1], which can create a
> > > > relatively long path:
> > > > 
> > > > /var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/
> > > > 
> > > > QEMU's avocado tests create a temporary directory prefixed by
> > > > "avo_qemu_sock_", and create QMP sockets within _that_ as well.
> > > > The QMP socket is unnecessarily long, because a temporary directory
> > > > is created for every QEMUMachine object.
> > > > 
> > > > /avo_qemu_sock_uh3w_dgc/qemu-37331-10bacf110-monitor.sock
> > > 
> > > 
> > > Looking at this again, I realize my suggestion for dealing with the
> > > second part of the path was mistaken.
> > > 
> > > The "qemu-37331-10bacf110-monitor.sock" part is combining two
> > > pieces.
> > > 
> > > First the result of
> > > 
> > >   f"qemu-{os.getpid()}-{id(self):02x}"
> > > 
> > > is
> > > 
> > >   qemu-37331-10bacf110
> > > 
> > > and the code later then appends '-monitor.sock'
> > > 
> > > So...
> > > 
> > > > 
> > > > The path limit for unix sockets on macOS is 104: [2]
> > > > 
> > > > /*
> > > >  * [XSI] Definitions for UNIX IPC domain.
> > > >  */
> > > > struct  sockaddr_un {
> > > 
> > > >     unsigned char   sun_len;        /* sockaddr len including null */
> > > >     sa_family_t     sun_family;     /* [XSI] AF_UNIX */
> > > >     char            sun_path[104];  /* [XSI] path name (gag) */
> > > > };
> > > > 
> > > > This results in avocado tests failing on macOS because the QMP unix
> > > > socket can't be created, because the path is too long:
> > > > 
> > > > ERROR| Failed to establish connection: OSError: AF_UNIX path too long
> > > > 
> > > > This change reduces the size of both paths, and removes the unique
> > > > identification information from the socket name, since it seems to be
> > > > unnecessary.
> > > > 
> > > > This commit produces paths like the following:
> > > > 
> > > > pdel@pdel-mbp:/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T
> > > > $ tree qemu*
> > > > qemu_oc7h7f3u
> > > > ├── qmp-console.sock
> > > > └── qmp-monitor.sock
> > > > 
> > > > [1] https://apple.stackexchange.com/questions/353832/why-is-mac-osx-temp-directory-in-weird-path
> > > > [2] /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/usr/include/sys/un.h
> > > > 
> > > > Signed-off-by: Peter Delevoryas 
> > > > ---
> > > >  python/qemu/machine/machine.py | 2 +-
> > > >  tests/avocado/avocado_qemu/__init__.py | 2 +-
> > > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> > > > index 37191f433b..b1823966b3 100644
> > > > --- a/python/qemu/machine/machine.py
> > > > +++ b/python/qemu/machine/machine.py
> > > > @@ -157,7 +157,7 @@ def __init__(self,
> > > >  self._wrapper = wrapper
> > > >  self._qmp_timer = qmp_timer
> > > >  
> > > > -self._name = name or f"qemu-{os.getpid()}-{id(self):02x}"
> > > > +self._name = name or "qmp"
> > > 
> > > ...my suggestion here was wrong.
> > > 
> > > We don't need the os.getpid() uniqueness because the tmpdir already
> > > ensures that is safe, but keeping 'id(self)' is a good idea, if the
> > > test case creates multiple machines concurrently. Bearing in mind we
> > > later append '-monitor.sock' we don't need 'qmp' in the self._name.
> > > 
> > > So on reflection I think I should have suggested using:
> > > 
> > > self._name = name or f"{id(self):02x}"
> > > 
> > > And *in addition*, a few lines later change:
> > > 
> > > self._monitor_address = os.path.join(
> > > self.sock_dir, f"{self._name}-monitor.sock"
> > > )
> > > 
> > > To
> > > 
> > > self._monitor_address = os.path.join(
> > > self.sock_dir, f"{self._name}.qmp"
> > > )
> > >
> > 
> > Finally getting back to this (sorry, been working on other stuff), and I
> > noticed the console socket is just below this:
> > 
> > self._console_address = os.path.join(
> > self.sock_dir, f"{self._name}-console.sock"
> > )
> > 
> > So I probably shouldn't do the "-monitor.sock" change right?
> 
> I'd suggest changing this one to   f"{self._name}.con" at the
> same time.

Ohh ok, yeah that's nice and short. Ok I'll include the socket name suffix
changes then.

> 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.or

Python package qemu.qmp v0.0.1 released

2022-07-22 Thread John Snow
I'm pleased to announce the very first version of the standalone QMP
library for Python, "qemu.qmp".

PyPI: https://pypi.org/project/qemu.qmp/
Docs: https://qemu-project.gitlab.io/python-qemu-qmp/
Source: https://gitlab.com/qemu-project/python-qemu-qmp

This library is identical to the one currently in the QEMU repo, with
a generous helping of extra GitLab CI rules and PyPI packaging
scripts. In 2021, the QMP library originally written by Luiz
Capitulino was re-written from the ground up to include native support
for Python asyncio, which brings along with it some new features and
characteristics:

- Full, native support for Python asyncio
- True asynchronous event handling and full-duplex communication
- Arbitrary numbers of separate event-handling queues (EventListener objects)
- Support for out-of-band execution
- Continued support for QEMU Guest Agent
- Robust error detection and error reporting mechanisms
- Semantic error classes for simplified client writing
- Extensible callback hooks for logging and rewriting I/O messages
- Support for Python 3.6 through the upcoming Python 3.11
- Unit testing of the library itself provided by Avocado Framework
- Compatibility with Luiz's QMP library provided by a synchronous shim class
- Fully statically typed using Python 3.6 type hints and validated with mypy
- PEP 561 compliant export of static typing information for client packages
- Rigorously documented API, including all error pathways and exceptions
- *Zero* mandatory dependencies! This is all Python stdlib native.
- LGPLv2+ license for easier external integration.
- Fully self-contained repository with all publishing and packaging
scripts included.
- qmp-shell and qmp-shell-wrap convenience/debugging utilities are included
- An early alpha version of qmp-tui which has support for displaying
asynchronous events is also included. This is the version authored by
Niteesh for Summer 2021's GSoC session.
- Extensible "AsyncProtocol" class that can be used to implement QTEST
or other full-duplex, asynchronous message-based protocols.
- Native GitLab merge request contributor workflow for ease of
contribution by newer, casual contributors. Merge request
announcements are automatically relayed to the qemu-devel list via
GitLab webhook integrations so that core QEMU developers don't miss
out on development activity.

Cool, in my opinion! This library has been the provider of QMP support
for all of our Python tests in qemu.git for about two releases now, so
we've already been using it for some time. I would be flattered if you
didn't even notice.
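
For a flavour of the asyncio API, here is a minimal sketch along the lines of
the project README. It assumes qemu.qmp is installed and that a QEMU instance
is already listening on a QMP unix socket; the query_status() helper, the
client name and the socket path are placeholders chosen for illustration.

```python
import asyncio


async def query_status(sock_path: str) -> object:
    """Connect to a QMP unix socket, run query-status, then disconnect."""
    # Imported here so the sketch can be loaded without qemu.qmp installed.
    from qemu.qmp import QMPClient

    qmp = QMPClient("example-vm")  # nickname for this client (placeholder)
    await qmp.connect(sock_path)
    try:
        return await qmp.execute("query-status")
    finally:
        await qmp.disconnect()


# Usage, with QEMU started with e.g. -qmp unix:qmp.sock,server=on,wait=off:
#     print(asyncio.run(query_status("qmp.sock")))
```

Keeping the connection handling inside one coroutine means the disconnect
always runs, even if the command raises.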

Some other features that are in a draft state and need polish, docs,
and review, but do exist:

- A fully statically typed implementation of the qtest protocol built
on top of AsyncProtocol
- An extensible implementation of a complete QMP server, useful for
acting as a debugging proxy to a QMP server, unit testing of the QMP
library itself, or for more meticulous unit testing of other QMP
clients.

With the first official release of this library now published,
attention will shift to integrating this library back into qemu.git,
ensuring that both internal and external projects benefit from the
same library.

Please check out the README visible from PyPI, the source repository,
or the documentation site for more information; see the issue tracker
for pending issues and enhancements, or refer to the project
milestones on GitLab[1] for future roadmap information on this
mini-project.

Thanks to everyone that helped get to this point; in no particular
order: Luiz, Hanna, Kevin, Vladimir, Kashyap, Eduardo, Cleber, Daniel,
Niteesh, Willian, Beraldo, Wainer, Thomas, Eric, Paolo, Stefan,
Andrea, and Victor. Thanks to my friends outside of work for their
help, too; thanks Mike J.!

Enjoy!

--John Snow

[1] https://gitlab.com/qemu-project/python-qemu-qmp/-/milestones




[PULL 7/8] hw/rx: pass random seed to fdt

2022-07-22 Thread Paolo Bonzini
From: "Jason A. Donenfeld" 

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This FDT node is part of the DT specification.

Cc: Yoshinori Sato 
Signed-off-by: Jason A. Donenfeld 
Message-Id: <20220719122033.135902-1-ja...@zx2c4.com>
Signed-off-by: Paolo Bonzini 
---
 hw/rx/rx-gdbsim.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
index be147b4bd9..8ffe1b8035 100644
--- a/hw/rx/rx-gdbsim.c
+++ b/hw/rx/rx-gdbsim.c
@@ -19,6 +19,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
+#include "qemu/guest-random.h"
 #include "qapi/error.h"
 #include "hw/loader.h"
 #include "hw/rx/rx62n.h"
@@ -83,6 +84,7 @@ static void rx_gdbsim_init(MachineState *machine)
 MemoryRegion *sysmem = get_system_memory();
 const char *kernel_filename = machine->kernel_filename;
 const char *dtb_filename = machine->dtb;
+uint8_t rng_seed[32];
 
 if (machine->ram_size < mc->default_ram_size) {
 char *sz = size_to_str(mc->default_ram_size);
@@ -140,6 +142,8 @@ static void rx_gdbsim_init(MachineState *machine)
 error_report("Couldn't set /chosen/bootargs");
 exit(1);
 }
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+qemu_fdt_setprop(dtb, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed));
 /* DTB is located at the end of SDRAM space. */
 dtb_offset = ROUND_DOWN(machine->ram_size - dtb_size, 16);
 rom_add_blob_fixed("dtb", dtb, dtb_size,
-- 
2.36.1





[PULL 4/8] oss-fuzz: ensure base_copy is a generic-fuzzer

2022-07-22 Thread Paolo Bonzini
From: Alexander Bulekov 

Depending on how the target list is sorted by QEMU, the first target
(used as the base copy of the fuzzer, to which all others are linked)
might not be a generic-fuzzer. Since we only want to use generic-fuzz
targets on oss-fuzz, filter the list so that the base copy is always a
generic-fuzzer.

Signed-off-by: Alexander Bulekov 
Message-Id: <20220720180946.2264253-1-alx...@bu.edu>
Signed-off-by: Paolo Bonzini 
---
 scripts/oss-fuzz/build.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/oss-fuzz/build.sh b/scripts/oss-fuzz/build.sh
index 5ee9141e3e..3bda0d72c7 100755
--- a/scripts/oss-fuzz/build.sh
+++ b/scripts/oss-fuzz/build.sh
@@ -92,7 +92,7 @@ make install DESTDIR=$DEST_DIR/qemu-bundle
 rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/bin
 rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/libexec
 
-targets=$(./qemu-fuzz-i386 | awk '$1 ~ /\*/  {print $2}')
+targets=$(./qemu-fuzz-i386 | grep generic-fuzz | awk '$1 ~ /\*/  {print $2}')
 base_copy="$DEST_DIR/qemu-fuzz-i386-target-$(echo "$targets" | head -n 1)"
 
 cp "./qemu-fuzz-i386" "$base_copy"
-- 
2.36.1





[PULL 5/8] hw/nios2: virt: pass random seed to fdt

2022-07-22 Thread Paolo Bonzini
From: "Jason A. Donenfeld" 

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This FDT node is part of the DT specification.

Cc: Chris Wulff 
Cc: Marek Vasut 
Signed-off-by: Jason A. Donenfeld 
Message-Id: <20220719120113.118034-1-ja...@zx2c4.com>
Signed-off-by: Paolo Bonzini 
---
 hw/nios2/boot.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/nios2/boot.c b/hw/nios2/boot.c
index 07b8d87633..21cb47 100644
--- a/hw/nios2/boot.c
+++ b/hw/nios2/boot.c
@@ -34,6 +34,7 @@
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
+#include "qemu/guest-random.h"
 #include "sysemu/device_tree.h"
 #include "sysemu/reset.h"
 #include "hw/boards.h"
@@ -83,6 +84,7 @@ static int nios2_load_dtb(struct nios2_boot_info bi, const uint32_t ramsize,
 int fdt_size;
 void *fdt = NULL;
 int r;
+uint8_t rng_seed[32];
 
 if (dtb_filename) {
 fdt = load_device_tree(dtb_filename, &fdt_size);
@@ -91,6 +93,9 @@ static int nios2_load_dtb(struct nios2_boot_info bi, const uint32_t ramsize,
 return 0;
 }
 
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+qemu_fdt_setprop(fdt, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed));
+
 if (kernel_cmdline) {
 r = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
 kernel_cmdline);
-- 
2.36.1





[PULL 8/8] hw/i386: pass RNG seed via setup_data entry

2022-07-22 Thread Paolo Bonzini
From: "Jason A. Donenfeld" 

Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way. For this
purpose, Linux (≥5.20) supports passing a seed in the setup_data table
with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
specialized bootloaders. The linked commit shows the upstream kernel
implementation.

At Paolo's request, we don't pass these to versioned machine types ≤7.0.

Link: https://git.kernel.org/tip/tip/c/68b8e9713c8
Cc: Marcel Apfelbaum 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Peter Maydell 
Cc: Philippe Mathieu-Daudé 
Cc: Laurent Vivier 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Jason A. Donenfeld 
Message-Id: <20220721125636.446842-1-ja...@zx2c4.com>
Signed-off-by: Paolo Bonzini 
---
 hw/i386/microvm.c|  2 +-
 hw/i386/pc.c |  4 +--
 hw/i386/pc_piix.c|  2 ++
 hw/i386/pc_q35.c |  2 ++
 hw/i386/x86.c| 26 +---
 include/hw/i386/pc.h |  3 +++
 include/hw/i386/x86.h|  3 ++-
 include/standard-headers/asm-x86/bootparam.h |  1 +
 8 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index dc929727dc..7fe8cce03e 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -332,7 +332,7 @@ static void microvm_memory_init(MicrovmMachineState *mms)
 rom_set_fw(fw_cfg);
 
 if (machine->kernel_filename != NULL) {
-x86_load_linux(x86ms, fw_cfg, 0, true);
+x86_load_linux(x86ms, fw_cfg, 0, true, false);
 }
 
 if (mms->option_roms) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 774cb2bf07..d2b5823ffb 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -796,7 +796,7 @@ void xen_load_linux(PCMachineState *pcms)
 rom_set_fw(fw_cfg);
 
 x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
-   pcmc->pvh_enabled);
+   pcmc->pvh_enabled, pcmc->legacy_no_rng_seed);
 for (i = 0; i < nb_option_roms; i++) {
 assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
!strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
@@ -992,7 +992,7 @@ void pc_memory_init(PCMachineState *pcms,
 
 if (linux_boot) {
 x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
-   pcmc->pvh_enabled);
+   pcmc->pvh_enabled, pcmc->legacy_no_rng_seed);
 }
 
 for (i = 0; i < nb_option_roms; i++) {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a234989ac3..fbf9465318 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -438,9 +438,11 @@ DEFINE_I440FX_MACHINE(v7_1, "pc-i440fx-7.1", NULL,
 
 static void pc_i440fx_7_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_7_1_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
+pcmc->legacy_no_rng_seed = true;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index f96cbd04e2..12cc76aaf8 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -375,8 +375,10 @@ DEFINE_Q35_MACHINE(v7_1, "pc-q35-7.1", NULL,
 
 static void pc_q35_7_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_7_1_machine_options(m);
 m->alias = NULL;
+pcmc->legacy_no_rng_seed = true;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
 }
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 6003b4b2df..ecea25d249 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -26,6 +26,7 @@
 #include "qemu/cutils.h"
 #include "qemu/units.h"
 #include "qemu/datadir.h"
+#include "qemu/guest-random.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qapi-visit-common.h"
@@ -766,7 +767,8 @@ static bool load_elfboot(const char *kernel_filename,
 void x86_load_linux(X86MachineState *x86ms,
 FWCfgState *fw_cfg,
 int acpi_data_size,
-bool pvh_enabled)
+bool pvh_enabled,
+bool legacy_no_rng_seed)
 {
 bool linuxboot_dma_enabled = X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
@@ -774,7 +776,7 @@ void x86_load_linux(X86MachineState *x86ms,
 int dtb_size, setup_data_offset;
 uint32_t initrd_max;
 uint8_t header[8192], *setup, *kernel;
-hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
+hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0;
 FILE *f;
 char *vmode;
 MachineState *machine = MACHINE(x86ms);
@@ -784,6 +786,7 @@ void x86_load_linux(X86Machi

[PULL 3/8] oss-fuzz: remove binaries from qemu-bundle tree

2022-07-22 Thread Paolo Bonzini
oss-fuzz is finding possible fuzzing targets even under qemu-bundle/.../bin,
but they cannot be used because the required shared libraries are missing.
Since the fuzzing targets are already placed manually in $OUT, the bindir and
libexecdir subtrees are not needed; remove them.

Cc: Alexander Bulekov 
Signed-off-by: Paolo Bonzini 
---
 scripts/oss-fuzz/build.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/oss-fuzz/build.sh b/scripts/oss-fuzz/build.sh
index 2656a89aea..5ee9141e3e 100755
--- a/scripts/oss-fuzz/build.sh
+++ b/scripts/oss-fuzz/build.sh
@@ -87,8 +87,10 @@ if [ "$GITLAB_CI" != "true" ]; then
 make "-j$(nproc)" qemu-fuzz-i386 V=1
 fi
 
-# Prepare a preinstalled tree
+# Place data files in the preinstall tree
 make install DESTDIR=$DEST_DIR/qemu-bundle
+rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/bin
+rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/libexec
 
 targets=$(./qemu-fuzz-i386 | awk '$1 ~ /\*/  {print $2}')
 base_copy="$DEST_DIR/qemu-fuzz-i386-target-$(echo "$targets" | head -n 1)"
-- 
2.36.1





[PULL 6/8] hw/mips: boston: pass random seed to fdt

2022-07-22 Thread Paolo Bonzini
From: "Jason A. Donenfeld" 

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This FDT node is part of the DT specification.

I'd do the same for other MIPS platforms but boston is the only one that
seems to use FDT.

Cc: Paul Burton 
Cc: Aleksandar Rikalo 
Cc: Philippe Mathieu-Daudé 
Signed-off-by: Jason A. Donenfeld 
Message-Id: <20220719120843.134392-1-ja...@zx2c4.com>
Signed-off-by: Paolo Bonzini 
---
 hw/mips/boston.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index 1debca18ec..d2ab9da1a0 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -34,6 +34,7 @@
 #include "hw/qdev-properties.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "qemu/guest-random.h"
 #include "qemu/log.h"
 #include "chardev/char.h"
 #include "sysemu/device_tree.h"
@@ -363,6 +364,7 @@ static const void *boston_fdt_filter(void *opaque, const void *fdt_orig,
 size_t ram_low_sz, ram_high_sz;
 size_t fdt_sz = fdt_totalsize(fdt_orig) * 2;
 g_autofree void *fdt = g_malloc0(fdt_sz);
+uint8_t rng_seed[32];
 
 err = fdt_open_into(fdt_orig, fdt, fdt_sz);
 if (err) {
@@ -370,6 +372,9 @@ static const void *boston_fdt_filter(void *opaque, const void *fdt_orig,
 return NULL;
 }
 
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+qemu_fdt_setprop(fdt, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed));
+
 cmdline = (machine->kernel_cmdline && machine->kernel_cmdline[0])
 ? machine->kernel_cmdline : " ";
 err = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
-- 
2.36.1





[PULL v2 0/8] More fixes + random seed patches for QEMU 7.1

2022-07-22 Thread Paolo Bonzini
The following changes since commit 5288bee45fbd33203b61f8c76e41b15bb5913e6e:

  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2022-07-21 11:13:01 +0100)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream2

for you to fetch changes up to 9fa032885583a2f1cb9cacad2f717784ccea02a1:

  hw/i386: pass RNG seed via setup_data entry (2022-07-22 19:01:44 +0200)


* Bug fixes
* Pass random seed to x86 and other FDT platforms


Alexander Bulekov (1):
  oss-fuzz: ensure base_copy is a generic-fuzzer

Bin Meng (1):
  docs: Add caveats for Windows as the build platform

Jason A. Donenfeld (4):
  hw/nios2: virt: pass random seed to fdt
  hw/mips: boston: pass random seed to fdt
  hw/rx: pass random seed to fdt
  hw/i386: pass RNG seed via setup_data entry

Paolo Bonzini (1):
  oss-fuzz: remove binaries from qemu-bundle tree

Peter Maydell (1):
  accel/kvm: Avoid Coverity warning in query_stats()

 accel/kvm/kvm-all.c  |  2 +-
 docs/about/build-platforms.rst   | 10 +-
 hw/i386/microvm.c|  2 +-
 hw/i386/pc.c |  4 ++--
 hw/i386/pc_piix.c|  2 ++
 hw/i386/pc_q35.c |  2 ++
 hw/i386/x86.c| 26 ++
 hw/mips/boston.c |  5 +
 hw/nios2/boot.c  |  5 +
 hw/rx/rx-gdbsim.c|  4 
 include/hw/i386/pc.h |  3 +++
 include/hw/i386/x86.h|  3 ++-
 include/standard-headers/asm-x86/bootparam.h |  1 +
 scripts/oss-fuzz/build.sh|  6 --
 14 files changed, 63 insertions(+), 12 deletions(-)
-- 
2.36.1




[PULL 1/8] docs: Add caveats for Windows as the build platform

2022-07-22 Thread Paolo Bonzini
From: Bin Meng 

Commit cf60ccc3306c ("cutils: Introduce bundle mechanism") introduced
a Python script to populate a bundle directory using os.symlink() to
point to the binaries in the pc-bios directory of the source tree.
Commit 882084a04ae9 ("datadir: Use bundle mechanism") removed previous
logic in pc-bios/meson.build to create a link/copy of pc-bios binaries
in the build tree so os.symlink() is the way to go.

However, os.symlink() may fail [1] on Windows if an unprivileged Windows
user started the QEMU build process. The QEMU executables generated in the
build tree are then unable to load the default BIOS/firmware images,
because the symbolic links are not present in the bundle directory.

This commit updates the documentation by adding such caveats for users
who want to build QEMU on the Windows platform.
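
As an aside, a quick way to check whether the current account is affected is
to try creating a symlink in a scratch directory. This is only a sketch and
not part of the bundle script; can_symlink() is a hypothetical helper name.

```python
import os
import tempfile


def can_symlink() -> bool:
    """Return True if the current user can create a symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.bin")
        link = os.path.join(tmp, "link.bin")
        with open(target, "wb") as f:
            f.write(b"\0")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # On Windows, an unprivileged user without Developer Mode
            # is expected to end up here.
            return False
        return os.path.islink(link)


print("symlinks available:", can_symlink())
```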

[1] https://docs.python.org/3/library/os.html#os.symlink

Signed-off-by: Bin Meng 
Reviewed-by: Stefan Weil 
Reviewed-by: Akihiko Odaki 
Message-Id: <20220719135014.764981-1-bmeng...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 docs/about/build-platforms.rst | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/about/build-platforms.rst b/docs/about/build-platforms.rst
index ebde20f981..6b8496c430 100644
--- a/docs/about/build-platforms.rst
+++ b/docs/about/build-platforms.rst
@@ -94,8 +94,16 @@ not tested anymore, so it is recommended to use one of the latest versions of
 Windows instead.
 
 The project supports building QEMU with current versions of the MinGW
-toolchain, either hosted on Linux (Debian/Fedora) or via MSYS2 on Windows.
+toolchain, either hosted on Linux (Debian/Fedora) or via `MSYS2`_ on Windows.
+A more recent Windows version is always preferred as it is less likely to have
+problems with building via MSYS2. The building process of QEMU involves some
+Python scripts that call os.symlink() which needs special attention for the
+build process to successfully complete. On newer versions of Windows 10,
+unprivileged accounts can create symlinks if Developer Mode is enabled.
+When Developer Mode is not available/enabled, the SeCreateSymbolicLinkPrivilege
+privilege is required, or the process must be run as an administrator.
 
 .. _Homebrew: https://brew.sh/
 .. _MacPorts: https://www.macports.org/
+.. _MSYS2: https://www.msys2.org/
 .. _Repology: https://repology.org/
-- 
2.36.1





[PULL 2/8] accel/kvm: Avoid Coverity warning in query_stats()

2022-07-22 Thread Paolo Bonzini
From: Peter Maydell 

Coverity complains that there is a codepath in the query_stats()
function where it can leak the memory pointed to by stats_list.  This
can only happen if the caller passes something other than
STATS_TARGET_VM or STATS_TARGET_VCPU as the 'target', which no
callsite does.  Enforce this assumption using g_assert_not_reached(),
so that if we have a future bug we hit the assert rather than
silently leaking memory.

Resolves: Coverity CID 1490140
Fixes: cc01a3f4cadd91e6 ("kvm: Support for querying fd-based stats")
Signed-off-by: Peter Maydell 
Message-Id: <20220719134853.327059-1-peter.mayd...@linaro.org>
Signed-off-by: Paolo Bonzini 
---
 accel/kvm/kvm-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 99aede73b7..f165074e99 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -4014,7 +4014,7 @@ static void query_stats(StatsResultList **result, StatsTarget target,
 stats_list);
 break;
 default:
-break;
+g_assert_not_reached();
 }
 }
 
-- 
2.36.1





Re: Corrupted display changing screen colour depth in qemu-system-ppc/MacOS

2022-07-22 Thread Mark Cave-Ayland

On 22/07/2022 14:44, Marc-André Lureau wrote:


Hi

On Fri, Jul 22, 2022 at 4:28 PM Howard Spoelstra  wrote:




On Fri, Jun 17, 2022 at 2:38 PM Marc-André Lureau wrote:


Hi

On Fri, Jun 17, 2022 at 1:56 PM Gerd Hoffmann  wrote:


   Hi,


Can you try ditch the QEMU_ALLOCATED_FLAG check added by the commit?


Commit cb8962c146 drops the QEMU_ALLOCATED_FLAG check: if I add it back in
with the following diff on top then everything works again:


Ah, the other way around.


diff --git a/ui/console.c b/ui/console.c
index 365a2c14b8..decae4287f 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -2400,11 +2400,12 @@ static void vc_chr_open(Chardev *chr,

  void qemu_console_resize(QemuConsole *s, int width, int height)
  {
-DisplaySurface *surface;
+DisplaySurface *surface = qemu_console_surface(s);

  assert(s->console_type == GRAPHIC_CONSOLE);

-if (qemu_console_get_width(s, -1) == width &&
+if (surface && (surface->flags & QEMU_ALLOCATED_FLAG) &&
+qemu_console_get_width(s, -1) == width &&
  qemu_console_get_height(s, -1) == height) {
  return;
  }


Which depth changes triggers this?  Going from direct color to a
paletted mode?


A quick test suggests anything that isn't 32-bit colour is affected.


Hmm, I think the commit should simply be reverted.

Short-cutting the qemu_console_resize() call is only valid in case the
current surface was created by qemu_console_resize() too.  When it is
something else -- typically a surface backed by vga vram -- it's not.
Looking at the QEMU_ALLOCATED_FLAG checks exactly that ...


Oh ok, it might be worth adding a comment to clarify that. By
reverting, we are going back to the situation where
qemu_console_resize() will create a needless surface when rendering
with GL. As I tried to explain in the commit message, it will need
more changes to prevent that. I can take a look later.



Hi Marc-André,

I wondered whether you've had a chance to look at this?



No, it's not clear to me how to reproduce it. Someone that can
actually test it should send a patch with some comments to explain it.


Unfortunately I don't know anything about the host display code, but I think I should 
be able to come up with some Forth to run from the command line that will reproduce 
the issue with qemu-system-ppc. Let me see if I can come up with something...



ATB,

Mark.



Re: [PATCH v3] hw/pci/pci_bridge: ensure PCIe slots have only one slot

2022-07-22 Thread Mark Cave-Ayland

On 22/07/2022 08:28, Thomas Huth wrote:


On 21/07/2022 18.05, Mark Cave-Ayland wrote:

On 21/07/2022 16:56, Daniel P. Berrangé wrote:


On Thu, Jul 21, 2022 at 04:51:51PM +0100, Mark Cave-Ayland wrote:

On 21/07/2022 15:28, Roman Kagan wrote:

(lots cut)


In the guest (Fedora 34):

[root@test ~]# lspci -tv
-[:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
 +-01.0  Device 1234:
 +-02.0  Red Hat, Inc. QEMU XHCI Host Controller
 +-05.0-[01]00.0  Red Hat, Inc. Virtio block device
 +-05.1-[02]00.0  Red Hat, Inc. Virtio network device
 +-05.2-[03]--
 +-05.3-[04]--
 +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
 \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

Changing addr of the second disk from 4 to 0 makes it appear in the
guest.

What exactly do you find odd?


Thanks for this, the part I wasn't sure about was whether the device ids in
the command line matched the primary PCI bus or the secondary PCI bus.

In that case I suspect that the enumeration of non-zero PCIe devices fails
in Linux because of the logic here:
https://github.com/torvalds/linux/blob/master/drivers/pci/probe.c#L2622.


Just above that though is logic that handles 'pci=pcie_scan_all'
kernel parameter, to make it look for non-zero devices.


I don't have a copy of the PCIe specification, but assuming the comment is
true then your patch looks correct to me. I think it would be worth adding a
similar comment and reference to your patch to explain why the logic is
required, which should also help the PCI maintainers during review.


The docs above with the pci=pcie_scan_all suggest it is unusual but not
forbidden.


That's interesting as I read it completely the other way around, i.e. PCIe 
downstream ports should only have device 0 and the PCI_SCAN_ALL_PCIE_DEVS flag is 
there for broken/exotic hardware :)


Perhaps if someone has a copy of the PCIe specification they can check the wording 
in section 7.3.1 to see exactly what the correct behaviour should be?


I've got an older version here... it talks about the "Alternative Routing-ID 
Interpretation" (ARI) there:


"With non-ARI Devices, PCI Express components are restricted to implementing a
single Device Number on their primary interface (Upstream Port), but are
permitted to implement up to eight independent Functions within that Device
Number. [...] Downstream Ports that do not have ARI Forwarding enabled must
associate only Device 0 with the device [...]. With an ARI Device, its Device
Number is implied to be 0 rather than specified by a field within an ID. The
traditional 5-bit Device Number and 3-bit Function Number fields in its
associated Routing IDs, Requester IDs, and Completer IDs are interpreted as a
single 8-bit Function Number."
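
The two interpretations in the quoted text can be illustrated with a small
decoder. This is only a sketch: RoutingId and decode_routing_id() are
hypothetical names, and the layout assumed is the usual 16-bit ID with the
bus number in the upper byte.

```c
#include <stdint.h>

/* Hypothetical decoded form of a 16-bit routing ID. */
typedef struct RoutingId {
    uint8_t bus;
    uint8_t device;    /* always 0 when ARI is in use */
    uint8_t function;  /* 0..7 without ARI, 0..255 with ARI */
} RoutingId;

/*
 * Split a routing ID into bus/device/function.
 * Without ARI the low byte is a 5-bit device number followed by a 3-bit
 * function number; with ARI the whole low byte is the function number and
 * the device number is implied to be 0.
 */
static RoutingId decode_routing_id(uint16_t id, int ari)
{
    RoutingId r;

    r.bus = (uint8_t)(id >> 8);
    if (ari) {
        r.device = 0;
        r.function = (uint8_t)(id & 0xff);
    } else {
        r.device = (uint8_t)((id >> 3) & 0x1f);
        r.function = (uint8_t)(id & 0x07);
    }
    return r;
}
```

For example, the ID 0x0129 decodes to bus 1, device 5, function 1 in the
traditional scheme, and to bus 1, device 0, function 41 under ARI.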


There was also an older patch similar to yours here:


https://lore.kernel.org/all/33183cc9f5247a488a2544077af1902086d6b...@szxema503-mbs.china.huawei.com/T/ 



... but if I've got that right, it has never been merged?


(goes and looks)

Thanks! I see, so it appears the older patch wasn't merged because it wasn't possible 
to test the ARI logic (which is missing from Roman's patch) and I suspect 2014 
pre-dates the slot_reserved_mask functionality which I think is the better solution 
for recent QEMU.
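
As a rough illustration of the reserved-slot idea: assuming the mask holds
one bit per slot number and a set bit means "reserved" (my understanding of
slot_reserved_mask, not verified against the current tree), restricting a
bus to device 0 could look like the sketch below. SLOT_OF, ONLY_SLOT0_MASK
and slot_is_reserved() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot (device) number of a devfn: upper 5 bits of the 8-bit value. */
#define SLOT_OF(devfn) (((devfn) >> 3) & 0x1f)

/* Hypothetical mask: every slot except slot 0 is reserved. */
#define ONLY_SLOT0_MASK (~(uint32_t)1)

/* True if the slot that devfn belongs to is marked reserved in the mask. */
static bool slot_is_reserved(uint32_t reserved_mask, uint8_t devfn)
{
    return (reserved_mask & ((uint32_t)1 << SLOT_OF(devfn))) != 0;
}
```

With ONLY_SLOT0_MASK, all eight functions of slot 0 stay usable while any
devfn in slots 1 to 31 is rejected.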



ATB,

Mark.



Re: [PULL 7/9] hw/guest-loader: pass random seed to fdt

2022-07-22 Thread Paolo Bonzini
Ok I will resend the pull request. Apologies for overstepping.

Paolo

On Fri, 22 Jul 2022, 16:37 Alex Bennée  wrote:

>
> "Jason A. Donenfeld"  writes:
>
> > Hey Alex,
> >
> > On Fri, Jul 22, 2022 at 10:45:19AM +0100, Alex Bennée wrote:
> >> All the guest-loader does is add the information about where in memory a
> >> guest and/or its initrd have been placed to the DTB. It's
> >> entirely up to the initial booted code (usually a hypervisor in this
> >> case) to decide what gets passed up the chain to any subsequent guests.
> >
> > I think that's also my understanding, but let me tell you what I was
> > thinking with regards to rng-seed there, and you can tell me if I'm way
> > off.
> >
> > The guest-loader puts in memory various loaders in a multistage boot.
> > Let's call it stage0, stage1, stage2, and finally the kernel. Normally,
> > rng-seed is only given to one of these stages. That stage may or may not
> > pass it to the next one, and it most probably does not. And why should
> > it? The host is in a better position to generate these seeds, rather
> > than adding finicky and fragile crypto ratcheting code into each stage
> > bootloader. So, instead, QEMU can just give each stage its own seed, for
> > it to do whatever with. This way, if stage1 does nothing, at least
> > there's a fresh unused one available for the kernel when it finally gets
> > there.
>
> That sounds suspiciously like inventing a new ABI between QEMU and
> guests which we generally try to avoid. The DTB exposed to the first
> stage may never be made visible to the following stages or more likely a
> sanitised version is prepared by the previous stage. Generally QEMU just
> tries to get the emulation right so the firmware/software can get on
> with its thing. Indeed the dynamic DTB for -M virt and friends is an
> oddity compared to most of the other machine types which assume the user
> has a valid DTB.
>
> Either way given how close to release we are I'd rather drop this patch.
>
> > Does what I describe correspond at all with the use of guest-loader? If
> > so, maybe this patch should stay? If not, discard it as rubbish.
>
> The original intent of the guest-loader was to make testing of
> hypervisors easier because the alternative is getting a multi-stage boot
> chain of firmware, boot-loaders and distro specific integration working
> which can be quite opaque to debug (c.f. why -kernel/-initrd exist and
> not everyone boots via -bios/-pflash).
>
> >
> > Jason
>
>
> --
> Alex Bennée
>
>


Re: [PATCH v2 2/3] target/s390x: display deprecation status in '-cpu help'

2022-07-22 Thread Cornelia Huck
On Fri, Jul 22 2022, Daniel P. Berrangé  wrote:

> When the user queries CPU models via QMP there is a 'deprecated' flag
> present, however, this is not done for the CLI '-cpu help' command.
>
> Signed-off-by: Daniel P. Berrangé 
> ---
>  target/s390x/cpu_models.c | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)

Reviewed-by: Cornelia Huck 
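
For readers without the patch at hand, the idea is simply to annotate
deprecated models in the listing. The sketch below is hypothetical:
MockCpuModel, format_cpu_help_line() and the " (deprecated)" suffix are
assumptions for illustration, since the exact output format of the patch is
not shown in this message.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified description of one CPU model. */
typedef struct MockCpuModel {
    const char *name;
    bool deprecated;
} MockCpuModel;

/*
 * Format one line of a '-cpu help' style listing into buf, appending a
 * marker when the model is deprecated.  Returns what snprintf() returns.
 */
static int format_cpu_help_line(const MockCpuModel *model,
                                char *buf, size_t len)
{
    return snprintf(buf, len, "  %s%s", model->name,
                    model->deprecated ? " (deprecated)" : "");
}
```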




Re: [PATCH v2 3/3] target/arm: display deprecation status in '-cpu help'

2022-07-22 Thread Cornelia Huck
On Fri, Jul 22 2022, Daniel P. Berrangé  wrote:

> When the user queries CPU models via QMP there is a 'deprecated' flag
> present, however, this is not done for the CLI '-cpu help' command.
>
> Signed-off-by: Daniel P. Berrangé 
> ---
>  target/arm/helper.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Reviewed-by: Cornelia Huck 




Re: [PATCH v2 1/3] target/i386: display deprecation status in '-cpu help'

2022-07-22 Thread Cornelia Huck
On Fri, Jul 22 2022, Daniel P. Berrangé  wrote:

> When the user queries CPU models via QMP there is a 'deprecated' flag
> present, however, this is not done for the CLI '-cpu help' command.
>
> Signed-off-by: Daniel P. Berrangé 
> ---
>  target/i386/cpu.c | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Cornelia Huck 




Re: [PATCH] trivial: Fix duplicated words

2022-07-22 Thread Daniel P . Berrangé
On Fri, Jul 22, 2022 at 05:07:08PM +0200, Thomas Huth wrote:
> On 22/07/2022 17.03, Daniel P. Berrangé wrote:
> > On Fri, Jul 22, 2022 at 04:58:59PM +0200, Thomas Huth wrote:
> > > Some files wrongly contain the same word twice in a row.
> > > One of them should be removed or replaced.
> > > 
> > > Signed-off-by: Thomas Huth 
> > > ---
> > >   Removing duplicated words seems to be the new hip trend on the
> > >   Linux kernel mailing lists - so let's be hip in QEMU land, too! ;-)
> > 
> > I've got patches proposed for this, as well as a test to detect it:
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01405.html
> > https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01403.html
> > 
> > though I'm not checking 'this this' or 'a a'
> 
> Ah, ok! Sorry, I should have had a closer look at that series...
> 
> So never mind this patch here - but what do we do about "this" and "a" ?
> Shall I respin my patch limited to those two words, or do you want to
> include it in your series?

I don't mind if your patches merge now regardless actually, and I'll
rebase, since it'll likely take me longer to deal with the broader
review feedback on mine.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] trivial: Fix duplicated words

2022-07-22 Thread Thomas Huth

On 22/07/2022 17.03, Daniel P. Berrangé wrote:

On Fri, Jul 22, 2022 at 04:58:59PM +0200, Thomas Huth wrote:

Some files wrongly contain the same word twice in a row.
One of them should be removed or replaced.

Signed-off-by: Thomas Huth 
---
  Removing duplicated words seems to be the new hip trend on the
  Linux kernel mailing lists - so let's be hip in QEMU land, too! ;-)


I've got patches proposed for this, as well as a test to detect it:

https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01405.html
https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01403.html

though I'm not checking 'this this' or 'a a'


Ah, ok! Sorry, I should have had a closer look at that series...

So never mind this patch here - but what do we do about "this" and "a" ? 
Shall I respin my patch limited to those two words, or do you want to 
include it in your series?


 Thomas




Re: [PATCH v2 2/2] migration-test: Allow test to run without uffd

2022-07-22 Thread Daniel P . Berrangé
On Fri, Jul 22, 2022 at 10:56:54AM -0400, Peter Xu wrote:
> We used to stop running all tests if uffd is not detected.  However,
> logically that's only needed for the postcopy tests, not the rest.
> 
> Keep running the rest when still possible.
> 
> Signed-off-by: Peter Xu 
> ---
>  tests/qtest/migration-test.c | 48 +++-
>  1 file changed, 25 insertions(+), 23 deletions(-)

Reviewed-by: Daniel P. Berrangé 

With regards,
Daniel




Re: [PATCH] trivial: Fix duplicated words

2022-07-22 Thread Daniel P . Berrangé
On Fri, Jul 22, 2022 at 04:58:59PM +0200, Thomas Huth wrote:
> Some files wrongly contain the same word twice in a row.
> One of them should be removed or replaced.
> 
> Signed-off-by: Thomas Huth 
> ---
>  Removing duplicated words seems to be the new hip trend on the
>  Linux kernel mailing lists - so let's be hip in QEMU land, too! ;-)

I've got patches proposed for this, as well as a test to detect it:

https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01405.html
https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01403.html

though I'm not checking 'this this' or 'a a'
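
A check of this kind can be sketched in a few lines. The following is not
the test from the linked series; has_duplicated_word() is a hypothetical
helper that treats any run of non-whitespace as a word and reports the
first word that appears twice in a row, including across a line break.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if text contains the same whitespace-delimited word twice
 * in a row.  If dup is non-NULL and dup_len > 0, the duplicated word is
 * copied into it (truncated if necessary) and NUL-terminated.
 */
static bool has_duplicated_word(const char *text, char *dup, size_t dup_len)
{
    const char *prev = NULL;
    size_t prev_len = 0;
    const char *p = text;

    while (*p) {
        /* Skip whitespace (including newlines) between words. */
        while (*p && isspace((unsigned char)*p)) {
            p++;
        }
        const char *start = p;
        while (*p && !isspace((unsigned char)*p)) {
            p++;
        }
        size_t len = (size_t)(p - start);
        if (len == 0) {
            break;  /* only trailing whitespace was left */
        }
        if (prev && len == prev_len && memcmp(prev, start, len) == 0) {
            if (dup && dup_len > 0) {
                size_t n = len < dup_len - 1 ? len : dup_len - 1;
                memcpy(dup, start, n);
                dup[n] = '\0';
            }
            return true;
        }
        prev = start;
        prev_len = len;
    }
    return false;
}
```

A comparison this naive also flags repeated punctuation tokens and
legitimate repetitions, so a real checker would need an allow-list or
per-word rules.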

> 
>  docs/system/s390x/bootdevices.rst | 2 +-
>  hw/usb/u2f.h  | 2 +-
>  include/hw/qdev-core.h| 2 +-
>  block/linux-aio.c | 2 +-
>  contrib/plugins/cache.c   | 2 +-
>  hw/arm/omap2.c| 2 +-
>  hw/misc/mac_via.c | 2 +-
>  hw/s390x/s390-ccw.c   | 2 +-
>  linux-user/i386/cpu_loop.c| 2 +-
>  target/arm/helper.c   | 2 +-
>  tools/virtiofsd/fuse_virtio.c | 2 +-
>  ui/vdagent.c  | 2 +-
>  tests/docker/dockerfiles/debian-native.docker | 2 +-
>  13 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/docs/system/s390x/bootdevices.rst 
> b/docs/system/s390x/bootdevices.rst
> index 9e591cb9dc..b5950133e8 100644
> --- a/docs/system/s390x/bootdevices.rst
> +++ b/docs/system/s390x/bootdevices.rst
> @@ -65,7 +65,7 @@ you can specify it via the ``-global 
> s390-ipl.netboot_fw=filename``
>  command line option.
>  
>  The ``bootindex`` property is especially important for booting via the 
> network.
> -If you don't specify the the ``bootindex`` property here, the network 
> bootloader
> +If you don't specify the ``bootindex`` property here, the network bootloader
>  firmware code won't get loaded into the guest memory so that the network boot
>  will fail. For a successful network boot, try something like this::
>  
> diff --git a/hw/usb/u2f.h b/hw/usb/u2f.h
> index db30f3586b..a408a82927 100644
> --- a/hw/usb/u2f.h
> +++ b/hw/usb/u2f.h
> @@ -74,7 +74,7 @@ typedef struct U2FKeyState {
>  
>  /*
>   * API to be used by the U2F key device variants (i.e. hw/u2f-*.c)
> - * to interact with the the U2F key base device (i.e. hw/u2f.c)
> + * to interact with the U2F key base device (i.e. hw/u2f.c)
>   */
>  void u2f_send_to_guest(U2FKeyState *key,
> const uint8_t packet[U2FHID_PACKET_SIZE]);
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 98774e2835..785dd5a56e 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -386,7 +386,7 @@ bool qdev_realize_and_unref(DeviceState *dev, BusState 
> *bus, Error **errp);
>   *
>   *  - unrealize any child buses by calling qbus_unrealize()
>   *(this will recursively unrealize any devices on those buses)
> - *  - call the the unrealize method of @dev
> + *  - call the unrealize method of @dev
>   *
>   * The device can then be freed by causing its reference count to go
>   * to zero.
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index 9c2393a2f7..d2cfb7f523 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -461,7 +461,7 @@ LinuxAioState *laio_init(Error **errp)
>  s = g_malloc0(sizeof(*s));
>  rc = event_notifier_init(&s->e, false);
>  if (rc < 0) {
> -error_setg_errno(errp, -rc, "failed to to initialize event 
> notifier");
> +error_setg_errno(errp, -rc, "failed to initialize event notifier");
>  goto out_free_state;
>  }
>  
> diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
> index b9226e7c40..ac1510aaa1 100644
> --- a/contrib/plugins/cache.c
> +++ b/contrib/plugins/cache.c
> @@ -38,7 +38,7 @@ enum EvictionPolicy policy;
>   * put in any of the blocks inside the set. The number of block per set is
>   * called the associativity (assoc).
>   *
> - * Each block contains the the stored tag and a valid bit. Since this is not
> + * Each block contains the stored tag and a valid bit. Since this is not
>   * a functional simulator, the data itself is not stored. We only identify
>   * whether a block is in the cache or not by searching for its tag.
>   *
> diff --git a/hw/arm/omap2.c b/hw/arm/omap2.c
> index 02b1aa8c97..8571eedd73 100644
> --- a/hw/arm/omap2.c
> +++ b/hw/arm/omap2.c
> @@ -274,7 +274,7 @@ static void omap_eac_format_update(struct omap_eac_s *s)
>  fmt.freq = s->codec.rate;
>  /* TODO: signedness possibly depends on the CODEC hardware - or
>   * does I2S specify it?  */
> -/* All register writes are 16 bits so we we store 16-bit samples
> +/* All register writes are 16 bits so we store 16-bit samples
>   * in the buffers regardless of AGCFR[B8_16] value.  */
>  fmt.fmt = AUDIO_FORMAT_U16;
>  
> diff --git a/hw/misc/mac_via.c b/

[PATCH v2 1/2] migration-test: Use migrate_ensure_converge() for auto-converge

2022-07-22 Thread Peter Xu
Thomas reported that the auto-converge test times out on macOS CI gating.
Use the migrate_ensure_converge() helper in the auto-converge test too, as
Daniel did when he reworked the other test cases.

Since max_bandwidth and downtime_limit are then no longer used to decide
convergence, also drop the "remaining" check, which makes little sense once
migrate_ensure_converge() is used.  With that check gone, both variables
can be removed altogether.

Suggested-by: Daniel P. Berrange 
Reported-by: Thomas Huth 
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 71595a74fd..dd50aa600c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1776,14 +1776,6 @@ static void test_migrate_auto_converge(void)
  * so we need to decrease a bandwidth.
  */
 const int64_t init_pct = 5, inc_pct = 50, max_pct = 95;
-const int64_t max_bandwidth = 4; /* ~400Mb/s */
-const int64_t downtime_limit = 250; /* 250ms */
-/*
- * We migrate through unix-socket (> 500Mb/s).
- * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s).
- * So, we can predict expected_threshold
- */
-const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
 
 if (test_migrate_start(&from, &to, uri, &args)) {
 return;
@@ -1818,8 +1810,7 @@ static void test_migrate_auto_converge(void)
 /* The first percentage of throttling should be equal to init_pct */
 g_assert_cmpint(percentage, ==, init_pct);
 /* Now, when we tested that throttling works, let it converge */
-migrate_set_parameter_int(from, "downtime-limit", downtime_limit);
-migrate_set_parameter_int(from, "max-bandwidth", max_bandwidth);
+migrate_ensure_converge(from);
 
 /*
  * Wait for pre-switchover status to check last throttle percentage
@@ -1830,11 +1821,6 @@ static void test_migrate_auto_converge(void)
 /* The final percentage of throttling shouldn't be greater than max_pct */
 percentage = read_migrate_property_int(from, "cpu-throttle-percentage");
 g_assert_cmpint(percentage, <=, max_pct);
-
-remaining = read_ram_property_int(from, "remaining");
-g_assert_cmpint(remaining, <,
-(expected_threshold + expected_threshold / 100));
-
 migrate_continue(from, "pre-switchover");
 
 qtest_qmp_eventwait(to, "RESUME");
@@ -1842,7 +1828,6 @@ static void test_migrate_auto_converge(void)
 wait_for_serial("dest_serial");
 wait_for_migration_complete(from);
 
-
 test_migrate_end(from, to, true);
 }
 
-- 
2.32.0
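
For reference, the arithmetic behind the removed check can be written out
on its own. This is a sketch of the formula visible in the hunk above, not
code from the test. The unit of max_bandwidth is assumed to be bytes per
second, and the sample figures in the note below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Amount of data that can be sent within the downtime limit:
 * bandwidth (per second) times downtime (milliseconds) divided by 1000.
 */
static int64_t expected_threshold(int64_t max_bandwidth,
                                  int64_t downtime_limit_ms)
{
    return max_bandwidth * downtime_limit_ms / 1000;
}

/* The removed assertion: remaining must be below the threshold plus 1%. */
static bool remaining_within_threshold(int64_t remaining, int64_t threshold)
{
    return remaining < threshold + threshold / 100;
}
```

With a hypothetical cap of 400000000 per second and a 250 ms downtime
limit, the threshold is 100000000, so the old assertion accepted any
remaining value below 101000000.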




Re: [PATCH v2 1/2] migration-test: Use migrate_ensure_converge() for auto-converge

2022-07-22 Thread Daniel P . Berrangé
On Fri, Jul 22, 2022 at 10:56:53AM -0400, Peter Xu wrote:
> Thomas reported that the auto-converge test times out on macOS CI gating.
> Use the migrate_ensure_converge() helper in the auto-converge test too, as
> Daniel did when he reworked the other test cases.
> 
> Since max_bandwidth and downtime_limit are then no longer used to decide
> convergence, also drop the "remaining" check, which makes little sense once
> migrate_ensure_converge() is used.  With that check gone, both variables
> can be removed altogether.
> 
> Suggested-by: Daniel P. Berrange 
> Reported-by: Thomas Huth 
> Signed-off-by: Peter Xu 
> ---
>  tests/qtest/migration-test.c | 17 +
>  1 file changed, 1 insertion(+), 16 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel




[PATCH v2 0/2] migration-test: Allow test to run without uffd

2022-07-22 Thread Peter Xu
Compared to v1, this adds a new patch to address the issue reported by
Thomas and (hopefully) allow the auto-converge test to pass on some macOS
testbeds.

Please review, thanks.

Peter Xu (2):
  migration-test: Use migrate_ensure_converge() for auto-converge
  migration-test: Allow test to run without uffd

 tests/qtest/migration-test.c | 65 +++-
 1 file changed, 26 insertions(+), 39 deletions(-)

-- 
2.32.0




[PATCH] trivial: Fix duplicated words

2022-07-22 Thread Thomas Huth
Some files wrongly contain the same word twice in a row.
One of them should be removed or replaced.

Signed-off-by: Thomas Huth 
---
 Removing duplicated words seems to be the new hip trend on the
 Linux kernel mailing lists - so let's be hip in QEMU land, too! ;-)

 docs/system/s390x/bootdevices.rst | 2 +-
 hw/usb/u2f.h  | 2 +-
 include/hw/qdev-core.h| 2 +-
 block/linux-aio.c | 2 +-
 contrib/plugins/cache.c   | 2 +-
 hw/arm/omap2.c| 2 +-
 hw/misc/mac_via.c | 2 +-
 hw/s390x/s390-ccw.c   | 2 +-
 linux-user/i386/cpu_loop.c| 2 +-
 target/arm/helper.c   | 2 +-
 tools/virtiofsd/fuse_virtio.c | 2 +-
 ui/vdagent.c  | 2 +-
 tests/docker/dockerfiles/debian-native.docker | 2 +-
 13 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/system/s390x/bootdevices.rst b/docs/system/s390x/bootdevices.rst
index 9e591cb9dc..b5950133e8 100644
--- a/docs/system/s390x/bootdevices.rst
+++ b/docs/system/s390x/bootdevices.rst
@@ -65,7 +65,7 @@ you can specify it via the ``-global s390-ipl.netboot_fw=filename``
 command line option.
 
 The ``bootindex`` property is especially important for booting via the network.
-If you don't specify the the ``bootindex`` property here, the network bootloader
+If you don't specify the ``bootindex`` property here, the network bootloader
 firmware code won't get loaded into the guest memory so that the network boot
 will fail. For a successful network boot, try something like this::
 
diff --git a/hw/usb/u2f.h b/hw/usb/u2f.h
index db30f3586b..a408a82927 100644
--- a/hw/usb/u2f.h
+++ b/hw/usb/u2f.h
@@ -74,7 +74,7 @@ typedef struct U2FKeyState {
 
 /*
  * API to be used by the U2F key device variants (i.e. hw/u2f-*.c)
- * to interact with the the U2F key base device (i.e. hw/u2f.c)
+ * to interact with the U2F key base device (i.e. hw/u2f.c)
  */
 void u2f_send_to_guest(U2FKeyState *key,
const uint8_t packet[U2FHID_PACKET_SIZE]);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 98774e2835..785dd5a56e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -386,7 +386,7 @@ bool qdev_realize_and_unref(DeviceState *dev, BusState *bus, Error **errp);
  *
  *  - unrealize any child buses by calling qbus_unrealize()
  *(this will recursively unrealize any devices on those buses)
- *  - call the the unrealize method of @dev
+ *  - call the unrealize method of @dev
  *
  * The device can then be freed by causing its reference count to go
  * to zero.
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 9c2393a2f7..d2cfb7f523 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -461,7 +461,7 @@ LinuxAioState *laio_init(Error **errp)
 s = g_malloc0(sizeof(*s));
 rc = event_notifier_init(&s->e, false);
 if (rc < 0) {
-error_setg_errno(errp, -rc, "failed to to initialize event notifier");
+error_setg_errno(errp, -rc, "failed to initialize event notifier");
 goto out_free_state;
 }
 
diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
index b9226e7c40..ac1510aaa1 100644
--- a/contrib/plugins/cache.c
+++ b/contrib/plugins/cache.c
@@ -38,7 +38,7 @@ enum EvictionPolicy policy;
  * put in any of the blocks inside the set. The number of block per set is
  * called the associativity (assoc).
  *
- * Each block contains the the stored tag and a valid bit. Since this is not
+ * Each block contains the stored tag and a valid bit. Since this is not
  * a functional simulator, the data itself is not stored. We only identify
  * whether a block is in the cache or not by searching for its tag.
  *
diff --git a/hw/arm/omap2.c b/hw/arm/omap2.c
index 02b1aa8c97..8571eedd73 100644
--- a/hw/arm/omap2.c
+++ b/hw/arm/omap2.c
@@ -274,7 +274,7 @@ static void omap_eac_format_update(struct omap_eac_s *s)
 fmt.freq = s->codec.rate;
 /* TODO: signedness possibly depends on the CODEC hardware - or
  * does I2S specify it?  */
-/* All register writes are 16 bits so we we store 16-bit samples
+/* All register writes are 16 bits so we store 16-bit samples
  * in the buffers regardless of AGCFR[B8_16] value.  */
 fmt.fmt = AUDIO_FORMAT_U16;
 
diff --git a/hw/misc/mac_via.c b/hw/misc/mac_via.c
index fba85a53d7..f42c12755a 100644
--- a/hw/misc/mac_via.c
+++ b/hw/misc/mac_via.c
@@ -587,7 +587,7 @@ static void adb_via_poll(void *opaque)
 /*
  * For older Linux kernels that switch to IDLE mode after sending the
  * ADB command, detect if there is an existing response and return that
- * as a a "fake" autopoll reply or bus timeout accordingly
+ * as a "fake" autopoll reply or bus timeout accordingly
  */
 *data = v1s->adb_data_out[0];
 ol

[PATCH v2 2/2] migration-test: Allow test to run without uffd

2022-07-22 Thread Peter Xu
We used to stop running all tests if uffd is not detected.  However,
logically that's only needed for the postcopy tests, not the rest.

Keep running the rest when still possible.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 48 +++-
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index dd50aa600c..8826ee4be4 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2424,14 +2424,11 @@ int main(int argc, char **argv)
 {
 char template[] = "/tmp/migration-test-XX";
 const bool has_kvm = qtest_has_accel("kvm");
+const bool has_uffd = ufd_version_check();
 int ret;
 
 g_test_init(&argc, &argv, NULL);
 
-if (!ufd_version_check()) {
-return g_test_run();
-}
-
 /*
  * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
  * is touchy due to race conditions on dirty bits (especially on PPC for
@@ -2460,13 +2457,15 @@ int main(int argc, char **argv)
 
 module_call_init(MODULE_INIT_QOM);
 
-qtest_add_func("/migration/postcopy/unix", test_postcopy);
-qtest_add_func("/migration/postcopy/plain", test_postcopy);
-qtest_add_func("/migration/postcopy/recovery/plain",
-   test_postcopy_recovery);
-qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt);
-qtest_add_func("/migration/postcopy/preempt/recovery/plain",
-test_postcopy_preempt_recovery);
+if (has_uffd) {
+qtest_add_func("/migration/postcopy/unix", test_postcopy);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
+qtest_add_func("/migration/postcopy/recovery/plain",
+   test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt);
+qtest_add_func("/migration/postcopy/preempt/recovery/plain",
+   test_postcopy_preempt_recovery);
+}
 
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
@@ -2474,18 +2473,21 @@ int main(int argc, char **argv)
 #ifdef CONFIG_GNUTLS
 qtest_add_func("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
-/*
- * NOTE: psk test is enough for postcopy, as other types of TLS
- * channels are tested under precopy.  Here what we want to test is the
- * general postcopy path that has TLS channel enabled.
- */
-qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
-qtest_add_func("/migration/postcopy/recovery/tls/psk",
-   test_postcopy_recovery_tls_psk);
-qtest_add_func("/migration/postcopy/preempt/tls/psk",
-   test_postcopy_preempt_tls_psk);
-qtest_add_func("/migration/postcopy/preempt/recovery/tls/psk",
-   test_postcopy_preempt_all);
+
+if (has_uffd) {
+/*
+ * NOTE: psk test is enough for postcopy, as other types of TLS
+ * channels are tested under precopy.  Here what we want to test is the
+ * general postcopy path that has TLS channel enabled.
+ */
+qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
+qtest_add_func("/migration/postcopy/recovery/tls/psk",
+   test_postcopy_recovery_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/tls/psk",
+   test_postcopy_preempt_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/recovery/tls/psk",
+   test_postcopy_preempt_all);
+}
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.32.0




Re: Access target TranslatorOps

2022-07-22 Thread Alex Bennée


Kenneth Adam Miller  writes:

> Oh whoa, I thought I could have an architecture neutral way to
> interface with the TCG to find this out.

While the TCG intermediates are architecture neutral, there are enough
differences between the various guest architectures in the way exceptions
are raised that there is no common API. Most will generate an exception with
a front-end-specific helper. We only define a few common exception types
that all CPUs might generate:

  #define EXCP_INTERRUPT  0x10000 /* async interruption */
  #define EXCP_HLT        0x10001 /* hlt instruction reached */
  #define EXCP_DEBUG      0x10002 /* cpu stopped after a breakpoint or singlestep */
  #define EXCP_HALTED     0x10003 /* cpu is halted (waiting for external event) */
  #define EXCP_YIELD      0x10004 /* cpu wants to yield timeslice to another */
  #define EXCP_ATOMIC     0x10005 /* stop-the-world and emulate atomic */

with the front-ends free to generate any others as they see fit.
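
One consequence of this numbering can be sketched as a simple range test.
The sketch assumes EXCP_INTERRUPT is 0x10000, as in upstream QEMU's
headers, and that front-end-specific exception numbers stay below it.
is_common_excp() is a hypothetical helper, not a QEMU function.

```c
#include <stdbool.h>

/* Values as in upstream QEMU's headers (assumed, see lead-in). */
#define EXCP_INTERRUPT  0x10000 /* async interruption */
#define EXCP_ATOMIC     0x10005 /* stop-the-world and emulate atomic */

/*
 * True for the architecture-neutral exit codes, false for numbers that
 * only have meaning to a particular guest front end.
 */
static bool is_common_excp(int exception_index)
{
    return exception_index >= EXCP_INTERRUPT;
}
```

Anything below the common range still has to be interpreted by the target
code, which is why a generic "was this instruction illegal?" query is hard
to provide.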

>
> Yes, I do have to use the decode tree, and converting the script to output
> the codes would suffice for my case. However, I do not know how to do that
> at the moment. I've tried my best to understand the TCG documentation but
> this appears to not be too straightforward.

We've slowly been moving more stuff into the RST documentation which you
can see rendered here:

  https://qemu.readthedocs.io/en/latest/devel/index-tcg.html

but we could certainly do with adding some more to describe the general
flow of translation and execution. However if there are things that
aren't clear please do ask here and we can do our best to answer.

>
> On Fri, Jul 22, 2022 at 5:31 AM Alex Bennée  wrote:
>
>  Kenneth Adam Miller  writes:
>
>  > I need to determine the set of instruction encodings that the TCG can
>  > support for a given platform. I am not bothered whether the target runs
>  > at all, and in fact it is better if it doesn't, so runtime or translate
>  > time doesn't bother me.
>  Which architectures are you interested in? For the ones that have been
>  converted to use decode tree it should be easy enough to update the
>  script to emit the uncovered opcode space. However decode tree targets
>  regular encoding - I think it has gained support for multiple encoding
>  modes but I don't know if it can handle the irregular madness of x86.
>
>  > Imagine I were adding support for more instructions for a given platform.
>  > I would like to check that I'm using the API right. It's amazing that
>  > it's been so far and there's no way to check that the correct behavior
>  > occurs when a given encoding is encountered regarding the TCG. A boolean
>  > result from a can_translate called just when the target encounters the
>  > instruction would be good.
>
>  Generally when the translator encounters an instruction it can't
>  translate it would emit an illegal instruction exception. While you might
>  be able to peek into the TCG opcode stream to see such calls to the
>  relevant helpers I doubt it would be up-streamable as each front end
>  will deal with illegal instructions their own way (including
>  instructions that are illegal due to the current CPU operating mode).
>
>  > Additionally, the ability to force the translation of arbitrary encodings
>  > would be good. I would like to not have to engineer some binary file
>  > format.
>
>  You don't need a new binary file format - just to construct an ELF with
>  the stream you want. A possibly adjacent project you might want to look
>  at is RISU:
>
>https://git.linaro.org/people/peter.maydell/risu.git/about/
>
>  which we've used for testing the range of the translator for a number of
>  architectures.
>
>  >
>  > On Wed, Jul 20, 2022 at 1:37 PM Peter Maydell wrote:
>  >
>  >  On Wed, 20 Jul 2022 at 17:39, Kenneth Adam Miller
>  >   wrote:
>  >  > That I know of, the TCG plugins do not allow me to feed the
>  >  > QEMU instance dynamically changing opcodes. I wouldn't use
>  >  > TranslatorOps if I don't have to. I want to facilitate a
>  >  > use case in which the contents of the target being emulated
>  >  > are changing, but it is not a self modifying target. I have
>  >  > to query and interact with the TCG to find out what opcodes
>  >  > are supported or not.
>  >
>  >  I agree that feeding opcodes into the translator isn't what
>  >  TCG plugins are intended for.
>  >
>  >  I'm definitely not clear on what you're trying to do here,
>  >  so it's hard to suggest some other approach, but linux-user
>  >  code shouldn't be messing with the internals of the translator
>  >  by grabbing the TranslatorOps struct. Among other things,
>  >  linux-user code is runtime and TranslatorOps is for
>  >  translate-time.
>  >
>  >  Sometimes code in linux-user needs to be a bit over-familiar
>  >  with the CPU state, but we try to keep that to a minimum.
>  >  Generally that involves code in target/foo/ providing some
>  >  set of interface functions that code in linux-user/foo/
>  >  can work 

Re: [PULL 7/9] hw/guest-loader: pass random seed to fdt

2022-07-22 Thread Alex Bennée


"Jason A. Donenfeld"  writes:

> Hey Alex,
>
> On Fri, Jul 22, 2022 at 10:45:19AM +0100, Alex Bennée wrote:
>> All the guest-loader does is add the information about where in memory a
>> guest and/or its initrd have been placed to the DTB. It's
>> entirely up to the initial booted code (usually a hypervisor in this
>> case) to decide what gets passed up the chain to any subsequent guests.
>
> I think that's also my understanding, but let me tell you what I was
> thinking with regards to rng-seed there, and you can tell me if I'm way
> off.
>
> The guest-loader puts in memory various loaders in a multistage boot.
> Let's call it stage0, stage1, stage2, and finally the kernel. Normally,
> rng-seed is only given to one of these stages. That stage may or may not
> pass it to the next one, and it most probably does not. And why should
> it? The host is in a better position to generate these seeds, rather
> than adding finicky and fragile crypto ratcheting code into each stage
> bootloader. So, instead, QEMU can just give each stage its own seed, for
> it to do whatever with. This way, if stage1 does nothing, at least
> there's a fresh unused one available for the kernel when it finally gets
> there.

That sounds suspiciously like inventing a new ABI between QEMU and
guests, which we generally try to avoid. The DTB exposed to the first
stage may never be made visible to the following stages or, more likely, a
sanitised version is prepared by the previous stage. Generally QEMU just
tries to get the emulation right so the firmware/software can get on
with its thing. Indeed the dynamic DTB for -M virt and friends is an
oddity compared to most of the other machine types, which assume the user
has a valid DTB.

Either way, given how close to release we are, I'd rather drop this patch.

> Does what I describe correspond at all with the use of guest-loader? If
> so, maybe this patch should stay? If not, discard it as rubbish.

The original intent of the guest-loader was to make testing of
hypervisors easier, because the alternative is getting a multi-stage boot
chain of firmware, boot-loaders and distro-specific integration working,
which can be quite opaque to debug (c.f. why -kernel/-initrd exist and
not everyone boots via -bios/-pflash).

>
> Jason


-- 
Alex Bennée



Re: [PATCH v3 14/14] s390x: pv: Add dump support

2022-07-22 Thread Steffen Eiden

Hi Janosch,

looks good to me.
Have a look at my comments.

On 7/21/22 15:22, Janosch Frank wrote:

Sometimes dumping a guest from the outside is the only way to get the
data that is needed. This can be the case if a dumping mechanism like
KDUMP hasn't been configured or data needs to be fetched at a specific
point. Dumping a protected guest from the outside without help from
fw/hw doesn't yield sufficient data to be useful. Hence we now
introduce PV dump support.

The PV dump support works by integrating the firmware into the dump
process. New Ultravisor calls are used to initiate the dump process,
dump cpu data, dump memory state and lastly complete the dump process.
The UV calls are exposed by KVM via the new KVM_PV_DUMP command and
its subcommands. The guest's data is fully encrypted and can only be
decrypted by the entity that owns the customer communication key for
the dumped guest. Also dumping needs to be allowed via a flag in the
SE header.

On the QEMU side of things we store the PV dump data in the newly
introduced architecture ELF sections (storage state and completion
data) and the cpu notes (for cpu dump data).

Users can use the zgetdump tool to convert the encrypted QEMU dump to an
unencrypted one.

Signed-off-by: Janosch Frank 
---
  include/elf.h|   1 +
  target/s390x/arch_dump.c | 248 ++-
  2 files changed, 219 insertions(+), 30 deletions(-)

diff --git a/include/elf.h b/include/elf.h
index 3a4bcb646a..58f76fd5b4 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -1649,6 +1649,7 @@ typedef struct elf64_shdr {
  #define NT_TASKSTRUCT 4
  #define NT_AUXV   6
  #define NT_PRXFPREG 0x46e62b7f  /* copied from gdb5.1/include/elf/common.h */
+#define NT_S390_PV_DATA 0x30e   /* s390 protvirt cpu dump data */
  #define NT_S390_GS_CB   0x30b   /* s390 guarded storage registers */
  #define NT_S390_VXRS_HIGH 0x30a /* s390 vector registers 16-31 */
  #define NT_S390_VXRS_LOW  0x309 /* s390 vector registers 0-15 (lower half) */
diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index 08daf93ae1..e081aa9483 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -16,7 +16,8 @@
  #include "s390x-internal.h"
  #include "elf.h"
  #include "sysemu/dump.h"
-
+#include "hw/s390x/pv.h"
+#include "kvm/kvm_s390x.h"
  
  struct S390xUserRegsStruct {

  uint64_t psw[2];
@@ -76,9 +77,16 @@ typedef struct noteStruct {
  uint64_t todcmp;
  uint32_t todpreg;
  uint64_t ctrs[16];
+uint8_t dynamic[1];  /*
+  * Would be a flexible array member, if
+  * that was legal inside a union. Real
+  * size comes from PV info interface.
+  */
  } contents;
  } QEMU_PACKED Note;
  
+static bool pv_dump_initialized;

+
  static void s390x_write_elf64_prstatus(Note *note, S390CPU *cpu, int id)
  {
  int i;
@@ -177,52 +185,82 @@ static void s390x_write_elf64_prefix(Note *note, S390CPU *cpu, int id)
  note->contents.prefix = cpu_to_be32((uint32_t)(cpu->env.psa));
  }
  
+static void s390x_write_elf64_pv(Note *note, S390CPU *cpu, int id)
+{
+    note->hdr.n_type = cpu_to_be32(NT_S390_PV_DATA);
+    if (!pv_dump_initialized) {
+        return;
+    }
+    kvm_s390_dump_cpu(cpu, &note->contents.dynamic);
+}
  
  typedef struct NoteFuncDescStruct {

  int contents_size;
+uint64_t (*note_size_func)(void); /* NULL for non-dynamic sized contents */
  void (*note_contents_func)(Note *note, S390CPU *cpu, int id);
+bool pvonly;
  } NoteFuncDesc;
  
  static const NoteFuncDesc note_core[] = {
-    {sizeof_field(Note, contents.prstatus), s390x_write_elf64_prstatus},
-    {sizeof_field(Note, contents.fpregset), s390x_write_elf64_fpregset},
-    { 0, NULL}
+    {sizeof_field(Note, contents.prstatus), NULL, s390x_write_elf64_prstatus, false},
+    {sizeof_field(Note, contents.fpregset), NULL, s390x_write_elf64_fpregset, false},
+    { 0, NULL, NULL}
  };
  
  static const NoteFuncDesc note_linux[] = {
-    {sizeof_field(Note, contents.prefix),   s390x_write_elf64_prefix},
-    {sizeof_field(Note, contents.ctrs),     s390x_write_elf64_ctrs},
-    {sizeof_field(Note, contents.timer),    s390x_write_elf64_timer},
-    {sizeof_field(Note, contents.todcmp),   s390x_write_elf64_todcmp},
-    {sizeof_field(Note, contents.todpreg),  s390x_write_elf64_todpreg},
-    {sizeof_field(Note, contents.vregslo),  s390x_write_elf64_vregslo},
-    {sizeof_field(Note, contents.vregshi),  s390x_write_elf64_vregshi},
-    {sizeof_field(Note, contents.gscb),     s390x_write_elf64_gscb},
-    { 0, NULL}
+    {sizeof_field(Note, contents.prefix),   NULL, s390x_write_elf64_prefix,  false},
+    {sizeof_field(Note, contents.ctrs),     NULL, s390x_write_elf64_ctrs,    false},
+    {sizeof_field(Note, contents.timer),    NULL, s390x_write_elf64_timer,   false},
+{s
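
(Editorial aside, not part of the patch.) The NoteFuncDesc change above pairs
a fixed contents_size with an optional note_size_func. A minimal sketch of how
a caller might choose between the two, assuming only what the patch comment
states (a NULL callback means the contents are not dynamically sized). The
names note_desc, desc_size, table_size and fake_pv_cpu_size, the value 4096
and the table terminator convention are hypothetical and do not come from the
patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's NoteFuncDesc. */
typedef struct note_desc {
    int contents_size;                 /* static size, used when no callback */
    uint64_t (*note_size_func)(void);  /* NULL for non-dynamic sized contents */
} note_desc;

/* Made-up value; in the patch the real size comes from the PV info interface. */
static uint64_t fake_pv_cpu_size(void)
{
    return 4096;
}

/* Return the contents size for one descriptor. */
static uint64_t desc_size(const note_desc *d)
{
    assert(d != NULL);
    return d->note_size_func ? d->note_size_func()
                             : (uint64_t)d->contents_size;
}

/* Sum the sizes of a table terminated by a {0, NULL} entry. */
static uint64_t table_size(const note_desc *tbl)
{
    uint64_t total = 0;

    for (; tbl->contents_size != 0 || tbl->note_size_func != NULL; tbl++) {
        total += desc_size(tbl);
    }
    return total;
}
```

The point of the callback is that the size of the PV cpu note is only known
at run time, so it cannot be expressed with sizeof_field() like the other
entries.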

Re: [PATCH v3 11/14] s390x: Add protected dump cap

2022-07-22 Thread Steffen Eiden




On 7/21/22 15:22, Janosch Frank wrote:

Add a protected dump capability for later feature checking.

Signed-off-by: Janosch Frank 

Reviewed-by: Steffen Eiden 

---
  target/s390x/kvm/kvm.c   | 7 +++
  target/s390x/kvm/kvm_s390x.h | 1 +
  2 files changed, 8 insertions(+)

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 7bd8db0e7b..cbd8c91424 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -157,6 +157,7 @@ static int cap_ri;
  static int cap_hpage_1m;
  static int cap_vcpu_resets;
  static int cap_protected;
+static int cap_protected_dump;
  
  static bool mem_op_storage_key_support;
  
@@ -362,6 +363,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)

  cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
  cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
  cap_protected = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
+cap_protected_dump = kvm_check_extension(s, KVM_CAP_S390_PROTECTED_DUMP);
  
  kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);

  kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
@@ -2043,6 +2045,11 @@ int kvm_s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch,
  return kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
  }
  
+int kvm_s390_get_protected_dump(void)
+{
+    return cap_protected_dump;
+}
+
  int kvm_s390_get_ri(void)
  {
  return cap_ri;
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index 05a5e1e6f4..31a69f9ce2 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -26,6 +26,7 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state);
  void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu);
  int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu);
  int kvm_s390_get_hpage_1m(void);
+int kvm_s390_get_protected_dump(void);
  int kvm_s390_get_ri(void);
  int kvm_s390_get_clock(uint8_t *tod_high, uint64_t *tod_clock);
  int kvm_s390_get_clock_ext(uint8_t *tod_high, uint64_t *tod_clock);




Re: [PATCH v3 12/14] s390x: Introduce PV query interface

2022-07-22 Thread Steffen Eiden




On 7/21/22 15:22, Janosch Frank wrote:

Introduce an interface over which we can get information about UV data.

Signed-off-by: Janosch Frank 

Reviewed-by: Steffen Eiden 

---
  hw/s390x/pv.c  | 61 ++
  hw/s390x/s390-virtio-ccw.c |  5 
  include/hw/s390x/pv.h  | 10 +++
  3 files changed, 76 insertions(+)

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 401b63d6cb..a5af4ddf46 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -20,6 +20,11 @@
  #include "exec/confidential-guest-support.h"
  #include "hw/s390x/ipl.h"
  #include "hw/s390x/pv.h"
+#include "target/s390x/kvm/kvm_s390x.h"
+
+static bool info_valid;
+static struct kvm_s390_pv_info_vm info_vm;
+static struct kvm_s390_pv_info_dump info_dump;
  
  static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)

  {
@@ -56,6 +61,42 @@ static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
  }  \
  }
  
+int s390_pv_query_info(void)
+{
+    struct kvm_s390_pv_info info = {
+        .header.id = KVM_PV_INFO_VM,
+        .header.len_max = sizeof(info.header) + sizeof(info.vm),
+    };
+    int rc;
+
+    /* Info API's first user is dump so they are bundled */
+    if (!kvm_s390_get_protected_dump()) {
+        return 0;
+    }
+
+    rc = s390_pv_cmd(KVM_PV_INFO, &info);
+    if (rc) {
+        error_report("KVM PV INFO cmd %x failed: %s",
+                     info.header.id, strerror(rc));
+        return rc;
+    }
+    memcpy(&info_vm, &info.vm, sizeof(info.vm));
+
+    info.header.id = KVM_PV_INFO_DUMP;
+    info.header.len_max = sizeof(info.header) + sizeof(info.dump);
+    rc = s390_pv_cmd(KVM_PV_INFO, &info);
+    if (rc) {
+        error_report("KVM PV INFO cmd %x failed: %s",
+                     info.header.id, strerror(rc));
+        return rc;
+    }
+
+    memcpy(&info_dump, &info.dump, sizeof(info.dump));
+    info_valid = true;
+
+    return rc;
+}
+
  int s390_pv_vm_enable(void)
  {
  return s390_pv_cmd(KVM_PV_ENABLE, NULL);
@@ -114,6 +155,26 @@ void s390_pv_inject_reset_error(CPUState *cs)
  env->regs[r1 + 1] = DIAG_308_RC_INVAL_FOR_PV;
  }
  
+uint64_t kvm_s390_pv_dmp_get_size_cpu(void)
+{
+    return info_dump.dump_cpu_buffer_len;
+}
+
+uint64_t kvm_s390_pv_dmp_get_size_complete(void)
+{
+    return info_dump.dump_config_finalize_len;
+}
+
+uint64_t kvm_s390_pv_dmp_get_size_mem(void)
+{
+    return info_dump.dump_config_mem_buffer_per_1m;
+}
+
+bool kvm_s390_pv_info_basic_valid(void)
+{
+    return info_valid;
+}
+
  #define TYPE_S390_PV_GUEST "s390-pv-guest"
  OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
  
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c

index cc3097bfee..f9401e392b 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -366,6 +366,11 @@ static int s390_machine_protect(S390CcwMachineState *ms)
  
  ms->pv = true;
  
+    rc = s390_pv_query_info();
+    if (rc) {
+        goto out_err;
+    }
+
  /* Set SE header and unpack */
  rc = s390_ipl_prepare_pv_header();
  if (rc) {
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h
index 1f1f545bfc..6fa55bf70e 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -38,6 +38,7 @@ static inline bool s390_is_pv(void)
  return ccw->pv;
  }
  
+int s390_pv_query_info(void);

  int s390_pv_vm_enable(void);
  void s390_pv_vm_disable(void);
  int s390_pv_set_sec_parms(uint64_t origin, uint64_t length);
@@ -46,8 +47,13 @@ void s390_pv_prep_reset(void);
  int s390_pv_verify(void);
  void s390_pv_unshare(void);
  void s390_pv_inject_reset_error(CPUState *cs);
+uint64_t kvm_s390_pv_dmp_get_size_cpu(void);
+uint64_t kvm_s390_pv_dmp_get_size_mem(void);
+uint64_t kvm_s390_pv_dmp_get_size_complete(void);
+bool kvm_s390_pv_info_basic_valid(void);
  #else /* CONFIG_KVM */
  static inline bool s390_is_pv(void) { return false; }
+static inline int s390_pv_query_info(void) { return 0; }
  static inline int s390_pv_vm_enable(void) { return 0; }
  static inline void s390_pv_vm_disable(void) {}
  static inline int s390_pv_set_sec_parms(uint64_t origin, uint64_t length) { return 0; }
@@ -56,6 +62,10 @@ static inline void s390_pv_prep_reset(void) {}
  static inline int s390_pv_verify(void) { return 0; }
  static inline void s390_pv_unshare(void) {}
  static inline void s390_pv_inject_reset_error(CPUState *cs) {};
+static inline uint64_t kvm_s390_pv_dmp_get_size_cpu(void) { return 0; }
+static inline uint64_t kvm_s390_pv_dmp_get_size_mem(void) { return 0; }
+static inline uint64_t kvm_s390_pv_dmp_get_size_complete(void) { return 0; }
+static inline bool kvm_s390_pv_info_basic_valid(void) { return false; }
  #endif /* CONFIG_KVM */
  
  int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
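
(Editorial aside, not part of the patch.) s390_pv_query_info() above reuses
one request structure for two queries: it sets an id and the maximum length
it can accept, issues the command, copies the result out, then repeats with
the next id. A minimal self-contained sketch of that pattern follows. The
backend here is a mock, and every name and value in it (INFO_VM, INFO_DUMP,
struct info, mock_info_cmd, query_all, the numbers returned) is hypothetical;
it is not the KVM UAPI:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { INFO_VM = 0, INFO_DUMP = 1 };  /* hypothetical query ids */

struct info_header { uint32_t id, len_max, len_written; };
struct info_vm     { uint64_t max_cpus; };
struct info_dump   { uint64_t cpu_buffer_len, finalize_len; };

struct info {
    struct info_header header;
    union {                           /* one payload per id, sharing storage */
        struct info_vm vm;
        struct info_dump dump;
    };
};

/* Mock backend: fills the payload selected by header.id, or fails. */
static int mock_info_cmd(struct info *i)
{
    assert(i != NULL);
    switch (i->header.id) {
    case INFO_VM:
        if (i->header.len_max < sizeof(i->header) + sizeof(i->vm)) {
            return -1;
        }
        i->vm.max_cpus = 8;
        i->header.len_written = (uint32_t)(sizeof(i->header) + sizeof(i->vm));
        return 0;
    case INFO_DUMP:
        if (i->header.len_max < sizeof(i->header) + sizeof(i->dump)) {
            return -1;
        }
        i->dump.cpu_buffer_len = 4096;
        i->dump.finalize_len = 512;
        i->header.len_written = (uint32_t)(sizeof(i->header) + sizeof(i->dump));
        return 0;
    default:
        return -1;
    }
}

/* Issue both queries with one request struct, copying each result out
 * before the union is reused for the next query. */
static int query_all(struct info_vm *vm, struct info_dump *dump)
{
    struct info info = {
        .header.id = INFO_VM,
        .header.len_max = sizeof(info.header) + sizeof(info.vm),
    };
    int rc = mock_info_cmd(&info);

    if (rc) {
        return rc;
    }
    memcpy(vm, &info.vm, sizeof(*vm));

    info.header.id = INFO_DUMP;
    info.header.len_max = sizeof(info.header) + sizeof(info.dump);
    rc = mock_info_cmd(&info);
    if (rc) {
        return rc;
    }
    memcpy(dump, &info.dump, sizeof(*dump));
    return 0;
}
```

Copying each payload out before changing header.id matters because the
payloads share storage in the union.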




Re: [PATCH v3 13/14] s390x: Add KVM PV dump interface

2022-07-22 Thread Steffen Eiden

Hi Janosch,

looks good to me.
Have a look at my comments.

On 7/21/22 15:22, Janosch Frank wrote:

Let's add a few bits of code which hide the new KVM PV dump API from
us via new functions.

Signed-off-by: Janosch Frank 
---
  hw/s390x/pv.c | 51 +++
  include/hw/s390x/pv.h |  8 +++
  2 files changed, 59 insertions(+)

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index a5af4ddf46..48591c387d 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -175,6 +175,57 @@ bool kvm_s390_pv_info_basic_valid(void)
  return info_valid;
  }
  
+static int s390_pv_dump_cmd(uint64_t subcmd, uint64_t uaddr, uint64_t gaddr,
+                            uint64_t len)
+{
+    struct kvm_s390_pv_dmp dmp = {
+        .subcmd = subcmd,
+        .buff_addr = uaddr,
+        .buff_len = len,
+        .gaddr = gaddr,
+    };
+    int ret;
+
+    ret = s390_pv_cmd(KVM_PV_DUMP, (void *)&dmp);
+    if (ret) {
+        error_report("KVM DUMP command %ld failed", subcmd);
+    }
+    return ret;
+}
+
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff)
+{
+    struct kvm_s390_pv_dmp dmp = {
+        .subcmd = KVM_PV_DUMP_CPU,
+        .buff_addr = (uint64_t)buff,
+        .gaddr = 0,
+        .buff_len = info_dump.dump_cpu_buffer_len,
+    };
+    struct kvm_pv_cmd pv = {
+        .cmd = KVM_PV_DUMP,
+        .data = (uint64_t)&dmp,
+    };
+
+    return kvm_vcpu_ioctl(CPU(cpu), KVM_S390_PV_CPU_COMMAND, &pv);
+}
+
+int kvm_s390_dump_init(void)
+{
+    return s390_pv_dump_cmd(KVM_PV_DUMP_INIT, 0, 0, 0);
+}
+
+int kvm_s390_dump_mem(uint64_t gaddr, size_t len, void *dest)
+{
+    return s390_pv_dump_cmd(KVM_PV_DUMP_CONFIG_STATE, (uint64_t)dest,
+                            gaddr, len);
+}

Wouldn't kvm_s390_dump_mem_state() be a more precise name?
Or kvm_s390_dump_mem_meta(), as the corresponding section in the dump
has that name (pv_mem_meta).

The current name may lead to the conclusion that this function dumps the
guest's memory, which it does not.



+
+int kvm_s390_dump_finish(void *buff)
+{
+    return s390_pv_dump_cmd(KVM_PV_DUMP_COMPLETE, (uint64_t)buff, 0,
+                            info_dump.dump_config_finalize_len);
+}

IIRC this is the only place where you call "complete-dump"
"finish". In the next patch you call that function in
"get_data_complete()", which is the only reference to it.

Why not simply call it kvm_s390_dump_complete() to reduce confusion?




+
  #define TYPE_S390_PV_GUEST "s390-pv-guest"
  OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
  
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h

index 6fa55bf70e..f37021e189 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -51,6 +51,10 @@ uint64_t kvm_s390_pv_dmp_get_size_cpu(void);
  uint64_t kvm_s390_pv_dmp_get_size_mem(void);
  uint64_t kvm_s390_pv_dmp_get_size_complete(void);
  bool kvm_s390_pv_info_basic_valid(void);
+int kvm_s390_dump_init(void);
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff);
+int kvm_s390_dump_mem(uint64_t addr, size_t len, void *dest);
+int kvm_s390_dump_finish(void *buff);
  #else /* CONFIG_KVM */
  static inline bool s390_is_pv(void) { return false; }
  static inline int s390_pv_query_info(void) { return 0; }
@@ -66,6 +70,10 @@ static inline uint64_t kvm_s390_pv_dmp_get_size_cpu(void) { return 0; }
  static inline uint64_t kvm_s390_pv_dmp_get_size_mem(void) { return 0; }
  static inline uint64_t kvm_s390_pv_dmp_get_size_complete(void) { return 0; }
  static inline bool kvm_s390_pv_info_basic_valid(void) { return false; }
+static inline int kvm_s390_dump_init(void) { return 0; }
+static inline int kvm_s390_dump_cpu(S390CPU *cpu, void *buff) { return 0; }
+static inline int kvm_s390_dump_mem(uint64_t addr, size_t len, void *dest) { return 0; }
+static inline int kvm_s390_dump_finish(void *buff) { return 0; }
  #endif /* CONFIG_KVM */
  
  int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);




[PATCH v2 4/7] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap

2022-07-22 Thread Eugenio Pérez
So the caller can choose the destination ASID.

No need to update the batch functions as they will always be called from
memory listener updates at the moment. Memory listener updates will
always update ASID 0, as it's the passthrough ASID.

All vhost devices' ASIDs are 0 at the moment.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost-vdpa.h |  8 +---
 hw/virtio/vhost-vdpa.c | 26 --
 net/vhost-vdpa.c   |  6 +++---
 hw/virtio/trace-events |  4 ++--
 4 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d85643..6560bb9d78 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -29,6 +29,7 @@ typedef struct vhost_vdpa {
 int index;
 uint32_t msg_type;
 bool iotlb_batch_begin_sent;
+uint32_t address_space_id;
 MemoryListener listener;
 struct vhost_vdpa_iova_range iova_range;
 uint64_t acked_features;
@@ -42,8 +43,9 @@ typedef struct vhost_vdpa {
 VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-   void *vaddr, bool readonly);
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+   hwaddr size, void *vaddr, bool readonly);
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+ hwaddr size);
 
 #endif
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index e1ed56b26d..79623badf2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -72,22 +72,24 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 return false;
 }
 
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-   void *vaddr, bool readonly)
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+   hwaddr size, void *vaddr, bool readonly)
 {
 struct vhost_msg_v2 msg = {};
 int fd = v->device_fd;
 int ret = 0;
 
 msg.type = v->msg_type;
+msg.asid = asid;
 msg.iotlb.iova = iova;
 msg.iotlb.size = size;
 msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
 msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
 msg.iotlb.type = VHOST_IOTLB_UPDATE;
 
-   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
-msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
+trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
+ msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
+ msg.iotlb.type);
 
 if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
 error_report("failed to write, fd=%d, errno=%d (%s)",
@@ -98,18 +100,20 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
 return ret;
 }
 
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+ hwaddr size)
 {
 struct vhost_msg_v2 msg = {};
 int fd = v->device_fd;
 int ret = 0;
 
 msg.type = v->msg_type;
+msg.asid = asid;
 msg.iotlb.iova = iova;
 msg.iotlb.size = size;
 msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
 
-trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
+trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
msg.iotlb.size, msg.iotlb.type);
 
 if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
@@ -228,7 +232,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
 }
 
 vhost_vdpa_iotlb_batch_begin_once(v);
-ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
+ret = vhost_vdpa_dma_map(v, 0, iova, int128_get64(llsize),
  vaddr, section->readonly);
 if (ret) {
 error_report("vhost vdpa map fail!");
@@ -293,7 +297,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
 vhost_iova_tree_remove(v->iova_tree, result);
 }
 vhost_vdpa_iotlb_batch_begin_once(v);
-ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
+ret = vhost_vdpa_dma_unmap(v, 0, iova, int128_get64(llsize));
 if (ret) {
 error_report("vhost_vdpa dma unmap error!");
 }
@@ -884,7 +888,7 @@ static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
 }
 
 size = ROUND_UP(result->size, qemu_real_host_page_size());
-r = vhost_vdpa_dma_unmap(v, result->iova, size);
+r = vhost_vdpa_dma_unmap(v, v->address_space_id, result->iova, size);
 return r == 0;
 }
 
@@ -926,7 +930,8 @@ static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
 return false;

Re: [PATCH v2] Loading new machines and devices from external modules

2022-07-22 Thread Drap, Anton
Hi Guys,



Let me clarify my position on out-of-tree devices. Yes, I understand that
current QEMU policy is to keep all supported platforms inside the QEMU source
tree, but simulator core development, development of the standard device
library, and development of virtual platforms are really three different
tasks. Moreover, different people are interested in different parts of QEMU.
QEMU core developers are not interested in supporting and maintaining the tons
of platforms available on the market. Virtual platform developers are not
interested in merging their changes upstream, and usually don't have the
resources to do so. So we end up with lots of abandoned QEMU forks for
different platforms. For example, we're now working on a Raspberry Pi 4b
implementation for our internal needs, and we're planning to merge it
upstream. It's based on a QEMU fork whose author wasn't able to complete it
and commit it upstream, and it can't be used with a later QEMU without some
effort to port it to the newer version. Nobody supports or maintains it,
because constant effort is necessary to stay in sync with QEMU mainline. So my
opinion is that core development, the core device library and virtual platform
development should be separated to make life easier for everybody. And this
change is a first step towards that.



About legal reasons and GPL violations: the possibility of building a .so with
a machine separately and loading it without providing sources is a legal risk,
and it can't be completely solved by technical measures. A ban on external
modules just makes it more difficult for everybody to use non-upstream code
(including GPL violators, but not only them), and it doesn't block the ability
to distribute a full QEMU fork with closed models without providing sources.
So I don't see any reason to impose technical limitations that can't actually
solve the legal problem.



Best Regards,

Anton

Software engineer from Auriga LLC


From: Daniel P. Berrangé
Sent: July 19, 2022, 19:25
To: Drap Anton
Cc: qemu-devel@nongnu.org; Drap, Anton
Subject: Re: [PATCH v2] Loading new machines and devices from external modules

On Tue, Jul 19, 2022 at 04:59:22PM +0500, Drap Anton wrote:
> From: "Drap, Anton" 
>
> There is no mechanism to load external machines and classes from modules
> at the moment. This patch is to add two parameters `add_machine` and
> `add_modinfo` for it.
> `add_machine` is to add machines from external modules.
> `add_modinfo` is to add devices from external modules, needed for a new
> machine, for example.
> The main aim is to make it possible to develop independent models and
> use them with mainline QEMU. It will help with developing new models of
> proprietary boards, simplify the use of QEMU by hardware developers and
> extend the number of supported boards and devices in QEMU. It will be easier
> for small hardware manufacturers to use QEMU to develop their own board
> models and use them to shift FW/SW development left.

IIUC, this is suggesting QEMU load pre-built .so files created from
non-upstream code, to arbitrarily extend QEMU's functionality. Such
.so files will inherently have to be GPLd as they'll derive from
QEMU's internal APIs, which are GPL. Given the proposed use case is
to emulate non-released proprietary hardware, I struggle to see how
you'll fulfill the requirements for GPL licensing of the loaded .so
without revealing your proprietary hardware design to anyone who receives
the .so files.


More generally, QEMU's existing loadable module usage is explicitly
designed to try to *prevent* loading of non-upstream code. It aims
to only load code that was built as part of the integrated QEMU
build process. ie, QEMU's loadable module system is about making
it possible to build many QEMU features, but then selectively load
them at runtime to reduce footprint/attack surface. It is *not*
intended to allow non-upstream code to be loaded.


Aside from our goal to prevent/discourage GPL violation through
closed source loadable modules, QEMU also has a strong desire to
not lock ourselves into supporting a public API for loadable
modules. Maintainers wish to retain flexibility to change the
internal APIs at any time.


Partially related to this topic, there is some work taking place
with the goal of making it possible to define new machine types
in QEMU from a QAPI based JSON description.  The actual hardware
devices and CPUs would still need code to be built into QEMU
and upstream, but the way the hardware devices & CPUs are wired
together would be customizable via the JSON config.  That could
get some, but not all, of the benefits you seek without the
downsides the QEMU maintainers wish to avoid.  This isn't ready
to consume yet and we don't have any firm ETA either I'm
afraid.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|

Re: Access target TranslatorOps

2022-07-22 Thread Kenneth Adam Miller
Oh whoa, I thought I could have an architecture neutral way to interface
with the TCG to find this out.

Yes, I do have to use the decode tree, and converting the script to output
the codes would suffice for my case. However, I do not know how to do that
at the moment. I've tried my best to understand the TCG documentation but
this appears to not be too straightforward.


On Fri, Jul 22, 2022 at 5:31 AM Alex Bennée  wrote:

>
> Kenneth Adam Miller  writes:
>
> > I need to determine the set of instruction encodings that the TCG can
> > support for a given platform. I am not bothered
> > whether the target runs at all, and in fact it is better if it
> > doesn't, so runtime or translate time doesn't bother me.
>
> Which architectures are you interested in? For the ones that have been
> converted to use decode tree it should be easy enough to update the
> script to emit the uncovered opcode space. However decode tree targets
> regular encoding - I think it has gained support for multiple encoding
> modes but I don't know if it can handle the irregular madness of x86.
>
> > Imagine I were adding support for more instructions for a given
> > platform. I would like to check that I'm using the API
> > right. It's amazing that it's been so far and there's no way to check
> > that the correct behavior occurs when a given
> > encoding is encountered regarding the TCG. A boolean result from a
> > can_translate called just when the target encounters
> > the instruction would be good.
>
> Generally when the translator encounters an instruction it can't
> translate it would emit an illegal instruction exception. While you might
> be able to peek into the TCG opcode stream to see such calls to the
> relevant helpers I doubt it would be up-streamable as each front end
> will deal with illegal instructions their own way (including
> instructions that are illegal due to the current CPU operating mode).
>
> > Additionally, the ability to force the translation of arbitrary
> > encodings would be good. I
> > would like to not have to engineer some binary file format.
>
> You don't need a new binary file format - just to construct an ELF with
> the stream you want. A possibly adjacent project you might want to look
> at is RISU:
>
>   https://git.linaro.org/people/peter.maydell/risu.git/about/
>
> which we've used for testing the range of the translator for a number of
> architectures.
>
> >
> > On Wed, Jul 20, 2022 at 1:37 PM Peter Maydell wrote:
> >
> >  On Wed, 20 Jul 2022 at 17:39, Kenneth Adam Miller
> >   wrote:
> >  > That I know of, the TCG plugins do not allow me to feed the
> >  > QEMU instance dynamically changing opcodes. I wouldn't use
> >  > TranslatorOps if I don't have to. I want to facilitate a
> >  > use case in which the contents of the target being emulated
> >  > are changing, but it is not a self modifying target. I have
> >  > to query and interact with the TCG to find out what opcodes
> >  > are supported or not.
> >
> >  I agree that feeding opcodes into the translator isn't what
> >  TCG plugins are intended for.
> >
> >  I'm definitely not clear on what you're trying to do here,
> >  so it's hard to suggest some other approach, but linux-user
> >  code shouldn't be messing with the internals of the translator
> >  by grabbing the TranslatorOps struct. Among other things,
> >  linux-user code is runtime and TranslatorOps is for
> >  translate-time.
> >
> >  Sometimes code in linux-user needs to be a bit over-familiar
> >  with the CPU state, but we try to keep that to a minimum.
> >  Generally that involves code in target/foo/ providing some
> >  set of interface functions that code in linux-user/foo/
> >  can work with, typically passing it the CPU state struct.
> >
> >  thanks
> >  -- PMM
>
>
> --
> Alex Bennée
>


Re: Corrupted display changing screen colour depth in qemu-system-ppc/MacOS

2022-07-22 Thread Marc-André Lureau
Hi

On Fri, Jul 22, 2022 at 4:28 PM Howard Spoelstra  wrote:
>
>
>
> On Fri, Jun 17, 2022 at 2:38 PM Marc-André Lureau wrote:
>>
>> Hi
>>
>> On Fri, Jun 17, 2022 at 1:56 PM Gerd Hoffmann  wrote:
>> >
>> >   Hi,
>> >
>> > > > Can you try ditch the QEMU_ALLOCATED_FLAG check added by the commit?
>> > >
>> > > Commit cb8962c146 drops the QEMU_ALLOCATED_FLAG check: if I add it back
>> > > in with the following diff on top then everything works again:
>> >
>> > Ah, the other way around.
>> >
>> > > diff --git a/ui/console.c b/ui/console.c
>> > > index 365a2c14b8..decae4287f 100644
>> > > --- a/ui/console.c
>> > > +++ b/ui/console.c
>> > > @@ -2400,11 +2400,12 @@ static void vc_chr_open(Chardev *chr,
>> > >
>> > >  void qemu_console_resize(QemuConsole *s, int width, int height)
>> > >  {
>> > > -DisplaySurface *surface;
>> > > +DisplaySurface *surface = qemu_console_surface(s);
>> > >
>> > >  assert(s->console_type == GRAPHIC_CONSOLE);
>> > >
>> > > -if (qemu_console_get_width(s, -1) == width &&
>> > > +if (surface && (surface->flags & QEMU_ALLOCATED_FLAG) &&
>> > > +qemu_console_get_width(s, -1) == width &&
>> > >  qemu_console_get_height(s, -1) == height) {
>> > >  return;
>> > >  }
>> > >
>> > > > Which depth changes triggers this?  Going from direct color to a
>> > > > paletted mode?
>> > >
>> > > A quick test suggests anything that isn't 32-bit colour is affected.
>> >
>> > Hmm, I think the commit should simply be reverted.
>> >
>> > Short-cutting the qemu_console_resize() call is only valid in case the
>> > current surface was created by qemu_console_resize() too.  When it is
>> > something else -- typically a surface backed by vga vram -- it's not.
>> > Looking at the QEMU_ALLOCATED_FLAG checks exactly that ...
>>
>> Oh ok, it might be worth adding a comment to clarify that. By
>> reverting, we are going back to the situation where
>> qemu_console_resize() will create a needless surface when rendering
>> with GL. As I tried to explain in the commit message, it will need
>> more changes to prevent that. I can take a look later.
>>
>
> Hi Marc-André,
>
> I wondered whether you've had a chance to look at this?
>

No, it's not clear to me how to reproduce it. Someone that can
actually test it should send a patch with some comments to explain it.

thanks




[PATCH v2 3/7] vdpa: Allocate SVQ unconditionally

2022-07-22 Thread Eugenio Pérez
SVQ may run or not in a device depending on runtime conditions (for
example, if the device can move CVQ to its own group or not).

Allocate the resources unconditionally, and decide later whether to use
them.

Signed-off-by: Eugenio Pérez 
---
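A minimal sketch of the "allocate always, start conditionally" pattern
described above, outside QEMU. All demo_* names are hypothetical stand-ins
and not the real vhost-vdpa types; the point is only that cleanup no longer
needs a NULL check once allocation is unconditional.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QEMU's types; not the real definitions. */
struct demo_svq {
    bool running;
};

struct demo_vdpa {
    struct demo_svq *svqs;   /* always allocated by demo_init() */
    size_t nvqs;
    bool svq_enabled;        /* decided later, at start time */
};

/* Allocate one shadow queue per virtqueue, regardless of svq_enabled. */
static int demo_init(struct demo_vdpa *v, size_t nvqs)
{
    v->svqs = calloc(nvqs, sizeof(*v->svqs));
    if (!v->svqs) {
        return -1;
    }
    v->nvqs = nvqs;
    return 0;
}

/* Start the shadow queues only if enabled; return how many were started. */
static size_t demo_start(struct demo_vdpa *v)
{
    size_t started = 0;

    if (!v->svq_enabled) {
        return 0;
    }
    for (size_t i = 0; i < v->nvqs; ++i) {
        v->svqs[i].running = true;
        ++started;
    }
    return started;
}

/* No NULL check needed on a successfully initialized device. */
static void demo_cleanup(struct demo_vdpa *v)
{
    for (size_t i = 0; i < v->nvqs; ++i) {
        v->svqs[i].running = false;
    }
    free(v->svqs);
    v->svqs = NULL;
    v->nvqs = 0;
}
```
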
 hw/virtio/vhost-vdpa.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 897e1fdd47..e1ed56b26d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -400,6 +400,21 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 int r;
 bool ok;
 
+shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
+for (unsigned n = 0; n < hdev->nvqs; ++n) {
+g_autoptr(VhostShadowVirtqueue) svq;
+
+svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
+v->shadow_vq_ops_opaque);
+if (unlikely(!svq)) {
+error_setg(errp, "Cannot create svq %u", n);
+return -1;
+}
+g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
+}
+
+v->shadow_vqs = g_steal_pointer(&shadow_vqs);
+
 if (!v->shadow_vqs_enabled) {
 return 0;
 }
@@ -416,20 +431,6 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 return -1;
 }
 
-shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
-for (unsigned n = 0; n < hdev->nvqs; ++n) {
-g_autoptr(VhostShadowVirtqueue) svq;
-
-svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
-v->shadow_vq_ops_opaque);
-if (unlikely(!svq)) {
-error_setg(errp, "Cannot create svq %u", n);
-return -1;
-}
-g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
-}
-
-v->shadow_vqs = g_steal_pointer(&shadow_vqs);
 return 0;
 }
 
@@ -570,10 +571,6 @@ static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
 struct vhost_vdpa *v = dev->opaque;
 size_t idx;
 
-if (!v->shadow_vqs) {
-return;
-}
-
 for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
 vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
 }
-- 
2.31.1




[PATCH v2 7/7] vdpa: Always start CVQ in SVQ mode

2022-07-22 Thread Eugenio Pérez
Isolate the control virtqueue in its own group, so QEMU can intercept control
commands while the dataplane runs fully passthrough to the guest.

Signed-off-by: Eugenio Pérez 
---
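A minimal sketch of the isolation test this patch relies on, reduced to plain
arrays. The helper name and the array layout (one group id per virtqueue,
CVQ last) are assumptions made for illustration; the patch itself queries
each group through VHOST_VDPA_GET_VRING_GROUP.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: groups[i] is the vq group of virtqueue i, and the
 * last entry is assumed to be the CVQ. Returns true only if no data
 * virtqueue shares the CVQ's group.
 */
static bool demo_cvq_group_is_independent(const unsigned *groups, size_t nvqs)
{
    if (nvqs == 0) {
        return false;
    }

    unsigned cvq_group = groups[nvqs - 1];

    /* Compare every virtqueue except the last one against the CVQ group. */
    for (size_t i = 0; i + 1 < nvqs; ++i) {
        if (groups[i] == cvq_group) {
            return false;
        }
    }
    return true;
}
```

Only when this kind of check passes does it make sense to move the CVQ group
to its own address space and shadow it alone.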
 hw/virtio/vhost-vdpa.c |   3 +-
 net/vhost-vdpa.c   | 158 +++--
 2 files changed, 156 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 79623badf2..fe1c85b086 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -668,7 +668,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 {
 uint64_t features;
 uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
+0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
 int r;
 
 if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 6c1c64f9b1..f5075ef487 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
 /* Control commands shadow buffers */
 void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
 
+/* Number of address spaces supported by the device */
+unsigned address_space_num;
+
 /* The device always has SVQ enabled */
 bool always_svq;
 bool started;
@@ -100,6 +103,8 @@ static const uint64_t vdpa_svq_device_features =
 BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
 BIT_ULL(VIRTIO_NET_F_STANDBY);
 
+#define VHOST_VDPA_NET_CVQ_ASID 1
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -214,6 +219,109 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
 return 0;
 }
 
+static int vhost_vdpa_get_vring_group(int device_fd,
+  struct vhost_vring_state *state)
+{
+int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);
+return r < 0 ? -errno : 0;
+}
+
+/**
+ * Check if all the virtqueues of the virtio device are in a different vq group
+ * than the last vq. The vq group of the last vq is passed in cvq_group.
+ */
+static bool vhost_vdpa_cvq_group_is_independent(struct vhost_vdpa *v,
+struct vhost_vring_state cvq_group)
+{
+struct vhost_dev *dev = v->dev;
+int ret;
+
+for (int i = 0; i < (dev->vq_index_end - 1); ++i) {
+struct vhost_vring_state vq_group = {
+.index = i,
+};
+
+ret = vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
+if (unlikely(ret)) {
+goto call_err;
+}
+if (unlikely(vq_group.num == cvq_group.num)) {
+error_report("CVQ %u group is the same as VQ %u one (%u)",
+ cvq_group.index, vq_group.index, cvq_group.num);
+return false;
+}
+}
+
+return true;
+
+call_err:
+error_report("Can't read vq group, errno=%d (%s)", -ret, g_strerror(-ret));
+return false;
+}
+
+static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
+   unsigned vq_group,
+   unsigned asid_num)
+{
+struct vhost_vring_state asid = {
+.index = vq_group,
+.num = asid_num,
+};
+int ret;
+
+ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
+if (unlikely(ret < 0)) {
+error_report("Can't set vq group %u asid %u, errno=%d (%s)",
+asid.index, asid.num, errno, g_strerror(errno));
+}
+return ret;
+}
+
+static void vhost_vdpa_net_prepare(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_vdpa *v = &s->vhost_vdpa;
+struct vhost_dev *dev = v->dev;
+struct vhost_vring_state cvq_group = {
+.index = v->dev->vq_index_end - 1,
+};
+int r;
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (dev->nvqs != 1 || dev->vq_index + dev->nvqs != dev->vq_index_end) {
+/* Only interested in CVQ */
+return;
+}
+
+if (s->always_svq) {
+/* SVQ is already enabled */
+return;
+}
+
+if (s->address_space_num < 2) {
+v->shadow_vqs_enabled = false;
+return;
+}
+
+r = vhost_vdpa_get_vring_group(v->device_fd, &cvq_group);
+if (unlikely(r)) {
+error_report("Can't read cvq group, errno=%d (%s)", -r, g_strerror(-r));
+v->shadow_vqs_enabled = false;
+return;
+}
+
+if (!vhost_vdpa_cvq_group_is_independent(v, cvq_group)) {
+v->shadow_vqs_enabled = false;
+return;
+}
+
+r = vhost_vdpa_set_address_space_id(v, cvq_group.num,
+VHOST_VDPA_NET_CVQ_ASID);
+v->shadow_vqs_enabled = r == 0;
+s->vhost_vdpa.address_space_id = r == 0 ? 1 : 0;
+}
+
 static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
 {
 VhostIOVATree *tree = v->iova_tr

[PATCH v2 6/7] vhost_net: Add NetClientInfo prepare callback

2022-07-22 Thread Eugenio Pérez
This is used by the backend to perform actions before the device is
started.

In particular, vdpa will use it to isolate CVQ in its own ASID if
possible, and start SVQ unconditionally only in CVQ.

Signed-off-by: Eugenio Pérez 
---
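A minimal sketch of the optional-callback pattern this patch adds, with
hypothetical demo_* types rather than the real NetClientInfo: the hook may
be NULL, so the caller checks it before running it ahead of the start step.

```c
#include <stddef.h>

struct demo_client;

/* Hypothetical, reduced client-info table with an optional prepare hook. */
struct demo_client_info {
    void (*prepare)(struct demo_client *);   /* may be NULL */
    int (*start)(struct demo_client *);
};

struct demo_client {
    const struct demo_client_info *info;
    int prepared;
    int started;
};

static void demo_prepare(struct demo_client *c)
{
    c->prepared = 1;
}

static int demo_start(struct demo_client *c)
{
    c->started = 1;
    return 0;
}

/* Run the optional prepare hook, then start the client. */
static int demo_start_one(struct demo_client *c)
{
    if (c->info->prepare) {
        c->info->prepare(c);
    }
    return c->info->start(c);
}
```

Backends that do not set the hook keep their current behaviour, which is why
the check is on the pointer instead of on the backend type.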
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 4 
 2 files changed, 6 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index ad9e80083a..37aecff8f7 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef struct NICConf {
 
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
+typedef void (NetPrepare)(NetClientState *);
 typedef int (NetStart)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
@@ -72,6 +73,7 @@ typedef struct NetClientInfo {
 NetReceive *receive_raw;
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
+NetPrepare *prepare;
 NetStart *start;
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ddd9ee0441..0fc892c22b 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -244,6 +244,10 @@ static int vhost_net_start_one(struct vhost_net *net,
 struct vhost_vring_file file = { };
 int r;
 
+if (net->nc->info->prepare) {
+net->nc->info->prepare(net->nc);
+}
+
 r = vhost_dev_enable_notifiers(&net->dev, dev);
 if (r < 0) {
 goto fail_notifiers;
-- 
2.31.1




[PATCH v2 1/7] linux-headers: Update kernel headers to v5.19-rc1

2022-07-22 Thread Eugenio Pérez
The main reason is to make the new vhost_vdpa address space ioctls available.

Signed-off-by: Eugenio Pérez 
---
 include/standard-headers/asm-x86/bootparam.h |  1 +
 include/standard-headers/drm/drm_fourcc.h| 69 
 include/standard-headers/linux/ethtool.h |  1 +
 include/standard-headers/linux/input.h   |  1 +
 include/standard-headers/linux/pci_regs.h|  1 +
 include/standard-headers/linux/vhost_types.h | 11 +++-
 include/standard-headers/linux/virtio_ids.h  | 14 ++--
 linux-headers/asm-arm64/kvm.h| 27 
 linux-headers/asm-generic/unistd.h   |  4 +-
 linux-headers/asm-riscv/kvm.h| 20 ++
 linux-headers/asm-riscv/unistd.h |  3 +-
 linux-headers/asm-x86/kvm.h  | 11 ++--
 linux-headers/asm-x86/mman.h | 14 
 linux-headers/linux/kvm.h| 54 ++-
 linux-headers/linux/userfaultfd.h| 10 ++-
 linux-headers/linux/vfio.h   |  4 +-
 linux-headers/linux/vhost.h  | 26 ++--
 17 files changed, 229 insertions(+), 42 deletions(-)

diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index 072e2ed546..09811d90cf 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -10,6 +10,7 @@
 #define SETUP_EFI  4
 #define SETUP_APPLE_PROPERTIES 5
 #define SETUP_JAILHOUSE6
+#define SETUP_CC_BLOB  7
 
 #define SETUP_INDIRECT (1<<31)
 
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 4888f85f69..0b051545d3 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -571,6 +571,53 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
 
+/*
+ * Intel Tile 4 layout
+ *
+ * This is a tiled layout using 4KB tiles in a row-major layout. It has the same
+ * shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It
+ * only differs from Tile Y at the 256B granularity in between. At this
+ * granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape
+ * of 64B x 8 rows.
+ */
+#define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data is stored
+ * outside of the GEM object in a reserved memory area dedicated for the
+ * storage of the CCS data for all RC/RC_CC/MC compressible GEM objects. The
+ * main surface pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 media compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. For semi-planar formats
+ * like NV12, the Y and UV planes are Tile 4 and are located at plane indices
+ * 0 and 1, respectively. The CCS for all planes are stored outside of the
+ * GEM object in a reserved memory area dedicated for the storage of the
+ * CCS data for all RC/RC_CC/MC compressible GEM objects. The main surface
+ * pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+
+/*
+ * Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data is stored
+ * outside of the GEM object in a reserved memory area dedicated for the
+ * storage of the CCS data for all RC/RC_CC/MC compressible GEM objects. The
+ * main surface pitch is required to be a multiple of four Tile 4 widths. The
+ * clear color is stored at plane index 1 and the pitch should be ignored. The
+ * format of the 256 bits of clear color data matches the one used for the
+ * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description
+ * for details.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
@@ -608,6 +655,28 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_QCOM_COMPRESSED fourcc_mod_code(QCOM, 1)
 
+/*
+ * Qualcomm Tiled Format
+ *
+ * Similar to DRM_FORMAT_MOD_QCOM_COMPRESSED but not compressed.
+ * Implementation may be platform and base-format specific.
+ *
+ * Each macrotile consists of m x n (mostly 4 x 4) tiles.
+ * Pixel data pitch/stride is aligned with macrotile width.
+ * Pixel data height is aligned with macrotile height.
+ * Entire pixel data buffer is aligned with 4k(bytes).
+ */
+#define DRM_FORMAT_MOD_QCOM_TILED3 fourcc_mod_code(QCOM, 3)
+
+/*
+ * Qualcomm Alternate Tiled Format
+ *
+ * Alternate tiled format typically only used within GMEM.
+ * Implementation may be platform and base

[PATCH v2 5/7] vdpa: Store x-svq parameter in VhostVDPAState

2022-07-22 Thread Eugenio Pérez
CVQ can be shadowed two ways:
- Device has x-svq=on parameter (current way)
- The device can isolate CVQ in its own vq group

QEMU needs to check for the second condition dynamically, because the CVQ
index is not known at initialization time. Since this is dynamic, the
CVQ isolation could vary with different conditions, making it possible
to go from "not isolated group" to "isolated".

Save the cmdline parameter in an extra field so we never disable CVQ
SVQ if the device was started with x-svq=on on the cmdline.

Signed-off-by: Eugenio Pérez 
---
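A minimal sketch of why the extra field matters, assuming the two conditions
from the commit message are available as booleans. demo_cvq_shadowed() is a
hypothetical helper, not a function in this series.

```c
#include <stdbool.h>

/*
 * Hypothetical decision helper. always_svq mirrors the x-svq=on cmdline
 * parameter; cvq_isolated is the result of the runtime vq group check.
 */
static bool demo_cvq_shadowed(bool always_svq, bool cvq_isolated)
{
    if (always_svq) {
        /* Requested on the command line: never disable it. */
        return true;
    }
    /* Otherwise follow the dynamic isolation result. */
    return cvq_isolated;
}
```

Without the saved parameter, a run where the isolation check fails could not
be told apart from one where the user asked for SVQ explicitly.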
 net/vhost-vdpa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 8203200c2a..6c1c64f9b1 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -36,6 +36,9 @@ typedef struct VhostVDPAState {
 
 /* Control commands shadow buffers */
 void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
+
+/* The device always has SVQ enabled */
+bool always_svq;
 bool started;
 } VhostVDPAState;
 
@@ -565,6 +568,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
 s->vhost_vdpa.device_fd = vdpa_device_fd;
 s->vhost_vdpa.index = queue_pair_index;
+s->always_svq = svq;
 s->vhost_vdpa.shadow_vqs_enabled = svq;
 s->vhost_vdpa.iova_tree = iova_tree;
 if (!is_datapath) {
-- 
2.31.1




[PATCH v2 0/7] ASID support in vhost-vdpa net

2022-07-22 Thread Eugenio Pérez
The control virtqueue (CVQ) is how changes to the device state, like the number
of active queues or the MAC address, are sent to net devices.

QEMU needs to intercept this queue so it can track these changes and is able to
migrate the device. It has been able to do so since 1576dbb5bbc4 ("vdpa: Add
x-svq to NetdevVhostVDPAOptions"). However, enabling x-svq implies shadowing all
of the VirtIO device's virtqueues, which hurts performance.

This series adds address space isolation, so the device and the guest
communicate directly through the dataplane virtqueues (passthrough) and CVQ
communication is split in two: the guest communicates with QEMU and QEMU
forwards the commands to the device.

This series is based on [1] and needs to be applied on top of it.  Each
one of them adds a feature in isolation and could be merged individually once
conflicts are resolved.

Comments are welcome. Thanks!

v2:
- As commented on series [1], handle the vhost_net backend through
  NetClientInfo callbacks instead of directly.
- Fix not freeing SVQ properly when the device does not support CVQ.
- Add the BIT_ULL that was missing when checking the device's backend features
  for _F_ASID.
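
On the last item: a backend feature is a bit position, so it has to be turned
into a mask before it is tested against the features word. A minimal sketch,
where DEMO_BIT_ULL and DEMO_F_ASID are hypothetical names and the bit number
is chosen only for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT_ULL(nr) (1ULL << (nr))
#define DEMO_F_ASID 3   /* hypothetical bit position, for illustration only */

/* Test a feature given as a bit position, not as a mask. */
static bool demo_has_feature(uint64_t features, unsigned bit)
{
    return (features & DEMO_BIT_ULL(bit)) != 0;
}
```

Testing `features & DEMO_F_ASID` directly would instead match bits 0 and 1,
which is the kind of mistake the missing BIT_ULL causes.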

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg04009.html

Eugenio Pérez (7):
  linux-headers: Update kernel headers to v5.19-rc1
  vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop
  vdpa: Allocate SVQ unconditionally
  vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap
  vdpa: Store x-svq parameter in VhostVDPAState
  vhost_net: Add NetClientInfo prepare callback
  vdpa: Always start CVQ in SVQ mode

 include/hw/virtio/vhost-vdpa.h   |   8 +-
 include/net/net.h|   2 +
 include/standard-headers/asm-x86/bootparam.h |   1 +
 include/standard-headers/drm/drm_fourcc.h|  69 
 include/standard-headers/linux/ethtool.h |   1 +
 include/standard-headers/linux/input.h   |   1 +
 include/standard-headers/linux/pci_regs.h|   1 +
 include/standard-headers/linux/vhost_types.h |  11 +-
 include/standard-headers/linux/virtio_ids.h  |  14 +-
 linux-headers/asm-arm64/kvm.h|  27 +++
 linux-headers/asm-generic/unistd.h   |   4 +-
 linux-headers/asm-riscv/kvm.h|  20 +++
 linux-headers/asm-riscv/unistd.h |   3 +-
 linux-headers/asm-x86/kvm.h  |  11 +-
 linux-headers/asm-x86/mman.h |  14 --
 linux-headers/linux/kvm.h|  54 +-
 linux-headers/linux/userfaultfd.h|  10 +-
 linux-headers/linux/vfio.h   |   4 +-
 linux-headers/linux/vhost.h  |  26 ++-
 hw/net/vhost_net.c   |   4 +
 hw/virtio/vhost-vdpa.c   |  66 
 net/vhost-vdpa.c | 168 ++-
 hw/virtio/trace-events   |   4 +-
 23 files changed, 438 insertions(+), 85 deletions(-)

-- 
2.31.1





[PATCH v2 2/7] vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop

2022-07-22 Thread Eugenio Pérez
These functions used to rely on v->shadow_vqs != NULL to know whether they
must start or stop SVQ.

This is not going to be valid anymore, as QEMU is going to allocate SVQ
unconditionally (but it will only start them conditionally).

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 9a2daef7e3..897e1fdd47 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1019,7 +1019,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
 Error *err = NULL;
 unsigned i;
 
-if (!v->shadow_vqs) {
+if (!v->shadow_vqs_enabled) {
 return true;
 }
 
@@ -1072,7 +1072,7 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 {
 struct vhost_vdpa *v = dev->opaque;
 
-if (!v->shadow_vqs) {
+if (!v->shadow_vqs_enabled) {
 return true;
 }
 
-- 
2.31.1




Re: [PATCH v2 18/19] vdpa: Add device migration blocker

2022-07-22 Thread Eugenio Perez Martin
On Fri, Jul 15, 2022 at 10:51 AM Jason Wang  wrote:
>
> On Fri, Jul 15, 2022 at 1:40 PM Eugenio Perez Martin
>  wrote:
> >
> > On Fri, Jul 15, 2022 at 6:03 AM Jason Wang  wrote:
> > >
> > > On Fri, Jul 15, 2022 at 12:32 AM Eugenio Pérez  
> > > wrote:
> > > >
> > > > Since the vhost-vdpa device is exposing _F_LOG,
> > >
> > > I may miss something but I think it doesn't?
> > >
> >
> > It's at vhost_vdpa_get_features. As long as SVQ is enabled, it's
> > exposing VHOST_F_LOG_ALL.
>
> Ok, so this needs to be specified in the change log.

I tried to add the entry in the changelog but I don't have the
permission to do so.

Something like "Add new experimental x-svq option to migrate simple
vhost-vdpa net devices without CVQ"?

Thanks!



