Re: [PATCH] block-qdict: Fix -Werror=maybe-uninitialized build failure

2022-03-11 Thread Murilo Opsfelder Araújo

Hi, Markus.

On 3/11/22 06:33, Markus Armbruster wrote:

Murilo Opsfelder Araujo  writes:


Building QEMU on Fedora 37 (Rawhide Prerelease) ppc64le failed with the
following error:

 $ ../configure --prefix=/usr/local/qemu-disabletcg 
--target-list=ppc-softmmu,ppc64-softmmu --disable-tcg --disable-linux-user
 ...
 $ make -j$(nproc)
 ...
 FAILED: libqemuutil.a.p/qobject_block-qdict.c.o
 cc -m64 -mlittle-endian -Ilibqemuutil.a.p -I. -I.. 
-Isubprojects/libvhost-user -I../subprojects/libvhost-user -Iqapi -Itrace -Iui 
-Iui/shader -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
-I/usr/include/sysprof-4 -I/usr/include/lib
 mount -I/usr/include/blkid -I/usr/include/gio-unix-2.0 
-I/usr/include/p11-kit-1 -I/usr/include/pixman-1 -fdiagnostics-color=auto -Wall 
-Winvalid-pch -Werror -std=gnu11 -O2 -g -isystem /root/qemu/linux-headers 
-isystem linux-headers -iquote
  . -iquote /root/qemu -iquote /root/qemu/include -iquote 
/root/qemu/disas/libvixl -pthread -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wundef -Wwrite
 -strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
-Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs 
-Wendif-label
 s -Wexpansion-to-defined -Wimplicit-fallthrough=2 
-Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi 
-fstack-protector-strong -fPIE -MD -MQ libqemuutil.a.p/qobject_block-qdict.c.o 
-MF libqemuutil.a.p/qobject_block-qdict.c.o.d -
 o libqemuutil.a.p/qobject_block-qdict.c.o -c ../qobject/block-qdict.c
 In file included from /root/qemu/include/qapi/qmp/qdict.h:16,
  from /root/qemu/include/block/qdict.h:13,
  from ../qobject/block-qdict.c:11:
 /root/qemu/include/qapi/qmp/qobject.h: In function ‘qdict_array_split’:
 /root/qemu/include/qapi/qmp/qobject.h:49:17: error: ‘subqdict’ may be used 
uninitialized [-Werror=maybe-uninitialized]
49 | typeof(obj) _obj = (obj);   \
   | ^~~~
 ../qobject/block-qdict.c:227:16: note: ‘subqdict’ declared here
   227 | QDict *subqdict;
   |^~~~
 cc1: all warnings being treated as errors

Fix build failure by initializing the QDict variable with NULL.

Signed-off-by: Murilo Opsfelder Araujo 
Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: Markus Armbruster 
---
  qobject/block-qdict.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qobject/block-qdict.c b/qobject/block-qdict.c
index 1487cc5dd8..b26524429c 100644
--- a/qobject/block-qdict.c
+++ b/qobject/block-qdict.c
@@ -224,7 +224,7 @@ void qdict_array_split(QDict *src, QList **dst)
  for (i = 0; i < UINT_MAX; i++) {
  QObject *subqobj;
  bool is_subqdict;
-QDict *subqdict;
+QDict *subqdict = NULL;
  char indexstr[32], prefix[32];
  size_t snprintf_ret;


The compiler's warning is actually spurious.  Your patch is the
minimally invasive way to shut it up.  But I wonder whether we can
make the code clearer instead.  Let's have a look:

/*
 * There may be either a single subordinate object (named
 * "%u") or multiple objects (each with a key prefixed "%u."),
 * but not both.
 */
if (!subqobj == !is_subqdict) {
break;

Because of this, ...

}

if (is_subqdict) {

... subqobj is non-null here, and ...

qdict_extract_subqdict(src, , prefix);
assert(qdict_size(subqdict) > 0);
} else {

... null here.

qobject_ref(subqobj);
qdict_del(src, indexstr);
}

qlist_append_obj(*dst, subqobj ?: QOBJECT(subqdict));

What about this:

if (is_subqdict) {
qdict_extract_subqdict(src, , prefix);
assert(qdict_size(subqdict) > 0);
qlist_append_obj(*dst, subqobj);
} else {
qobject_ref(subqobj);
qdict_del(src, indexstr);
qlist_append_obj(*dst, QOBJECT(subqdict));
}



The logic looks inverted but I think I got what you meant.

I've sent a v2 with changes that made the compiler happy and also passed 
check-unit tests.

Thank you!

--
Murilo



[PATCH v2] block-qdict: Fix -Werror=maybe-uninitialized build failure

2022-03-11 Thread Murilo Opsfelder Araujo
Building QEMU on Fedora 37 (Rawhide Prerelease) ppc64le failed with the
following error:

$ ../configure --prefix=/usr/local/qemu-disabletcg 
--target-list=ppc-softmmu,ppc64-softmmu --disable-tcg --disable-linux-user
...
$ make -j$(nproc)
...
FAILED: libqemuutil.a.p/qobject_block-qdict.c.o
cc -m64 -mlittle-endian -Ilibqemuutil.a.p -I. -I.. 
-Isubprojects/libvhost-user -I../subprojects/libvhost-user -Iqapi -Itrace -Iui 
-Iui/shader -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
-I/usr/include/sysprof-4 -I/usr/include/lib
mount -I/usr/include/blkid -I/usr/include/gio-unix-2.0 
-I/usr/include/p11-kit-1 -I/usr/include/pixman-1 -fdiagnostics-color=auto -Wall 
-Winvalid-pch -Werror -std=gnu11 -O2 -g -isystem /root/qemu/linux-headers 
-isystem linux-headers -iquote
 . -iquote /root/qemu -iquote /root/qemu/include -iquote 
/root/qemu/disas/libvixl -pthread -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wundef -Wwrite
-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
-Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs 
-Wendif-label
s -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs 
-Wno-shift-negative-value -Wno-psabi -fstack-protector-strong -fPIE -MD -MQ 
libqemuutil.a.p/qobject_block-qdict.c.o -MF 
libqemuutil.a.p/qobject_block-qdict.c.o.d -
o libqemuutil.a.p/qobject_block-qdict.c.o -c ../qobject/block-qdict.c
In file included from /root/qemu/include/qapi/qmp/qdict.h:16,
 from /root/qemu/include/block/qdict.h:13,
 from ../qobject/block-qdict.c:11:
/root/qemu/include/qapi/qmp/qobject.h: In function ‘qdict_array_split’:
/root/qemu/include/qapi/qmp/qobject.h:49:17: error: ‘subqdict’ may be used 
uninitialized [-Werror=maybe-uninitialized]
   49 | typeof(obj) _obj = (obj);   \
  | ^~~~
../qobject/block-qdict.c:227:16: note: ‘subqdict’ declared here
  227 | QDict *subqdict;
  |^~~~
cc1: all warnings being treated as errors

Fix build failure by expanding the ternary operation.
Tested with `make check-unit` (the check-block-qdict test passed).

Signed-off-by: Murilo Opsfelder Araujo 
Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: Markus Armbruster 
---
v1: https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03224.html

 qobject/block-qdict.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qobject/block-qdict.c b/qobject/block-qdict.c
index 1487cc5dd8..4a83bda2c3 100644
--- a/qobject/block-qdict.c
+++ b/qobject/block-qdict.c
@@ -251,12 +251,12 @@ void qdict_array_split(QDict *src, QList **dst)
 if (is_subqdict) {
 qdict_extract_subqdict(src, , prefix);
 assert(qdict_size(subqdict) > 0);
+qlist_append_obj(*dst, QOBJECT(subqdict));
 } else {
 qobject_ref(subqobj);
 qdict_del(src, indexstr);
+qlist_append_obj(*dst, subqobj);
 }
-
-qlist_append_obj(*dst, subqobj ?: QOBJECT(subqdict));
 }
 }
 
-- 
2.35.1




Re: [PATCH V7 11/29] qapi: list utility functions

2022-03-11 Thread Marc-André Lureau
Hi

On Fri, Mar 11, 2022 at 8:46 PM Steven Sistare 
wrote:

> On 3/9/2022 9:11 AM, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Dec 22, 2021 at 11:42 PM Steve Sistare <
> steven.sist...@oracle.com > wrote:
> >
> > Generalize strList_from_comma_list() to take any delimiter
> character, rename
> > as strList_from_string(), and move it to qapi/util.c.  Also add
> > strv_from_strList() and QAPI_LIST_LENGTH().
> >
> > Looks like you could easily split, and add some tests.
>
> Will do.
> I don't see any tests that include qapi/util.h, so this will be a new test
> file.
>
> For the split, how about:
>   patch: qapi: strList_from_string
>   patch: qapi: strv_from_strList
>   patch: qapi: QAPI_LIST_LENGTH
>   patch: qapi: unit tests for lists
>
>
Sure, that's fine


> Or do you prefer that unit tests be pushed with each function's patch?
>

I don't have a strong preference. I usually prefer new code coming with its
own test, but if the resulting patch becomes too large, or if the test
touches other related aspects, might be better off as different patch. Up
to you!


> > No functional change.
> >
> > Signed-off-by: Steve Sistare  steven.sist...@oracle.com>>
> > ---
> >  include/qapi/util.h | 28 
> >  monitor/hmp-cmds.c  | 29 ++---
> >  qapi/qapi-util.c| 37 +
> >  3 files changed, 67 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/qapi/util.h b/include/qapi/util.h
> > index 81a2b13..c249108 100644
> > --- a/include/qapi/util.h
> > +++ b/include/qapi/util.h
> > @@ -22,6 +22,8 @@ typedef struct QEnumLookup {
> >  const int size;
> >  } QEnumLookup;
> >
> > +struct strList;
> > +
> >  const char *qapi_enum_lookup(const QEnumLookup *lookup, int val);
> >  int qapi_enum_parse(const QEnumLookup *lookup, const char *buf,
> >  int def, Error **errp);
> > @@ -31,6 +33,19 @@ bool qapi_bool_parse(const char *name, const char
> *value, bool *obj,
> >  int parse_qapi_name(const char *name, bool complete);
> >
> >  /*
> > + * Produce and return a NULL-terminated array of strings from @args.
> > + * All strings are g_strdup'd.
> > + */
> > +char **strv_from_strList(const struct strList *args);
> >
> > +
> >
> > I'd suggest to use the dedicated glib type GStrv
>
> Will do, here and in related code.
>

thanks


>
> - Steve
>
> > +/*
> > + * Produce a strList from the character delimited string @in.
> > + * All strings are g_strdup'd.
> > + * A NULL or empty input string returns NULL.
> > + */
> > +struct strList *strList_from_string(const char *in, char delim);
> > +
> > +/*
> >   * For any GenericList @list, insert @element at the front.
> >   *
> >   * Note that this macro evaluates @element exactly once, so it is
> safe
> > @@ -56,4 +71,17 @@ int parse_qapi_name(const char *name, bool
> complete);
> >  (tail) = &(*(tail))->next; \
> >  } while (0)
> >
> > +/*
> > + * For any GenericList @list, return its length.
> > + */
> > +#define QAPI_LIST_LENGTH(list) \
> > +({ \
> > +int len = 0; \
> > +typeof(list) elem; \
> > +for (elem = list; elem != NULL; elem = elem->next) { \
> > +len++; \
> > +} \
> > +len; \
> > +})
> > +
> >  #endif
> > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> > index b8c22da..5ca8b4b 100644
> > --- a/monitor/hmp-cmds.c
> > +++ b/monitor/hmp-cmds.c
> > @@ -43,6 +43,7 @@
> >  #include "qapi/qapi-commands-run-state.h"
> >  #include "qapi/qapi-commands-tpm.h"
> >  #include "qapi/qapi-commands-ui.h"
> > +#include "qapi/util.h"
> >  #include "qapi/qapi-visit-net.h"
> >  #include "qapi/qapi-visit-migration.h"
> >  #include "qapi/qmp/qdict.h"
> > @@ -70,32 +71,6 @@ bool hmp_handle_error(Monitor *mon, Error *err)
> >  return false;
> >  }
> >
> > -/*
> > - * Produce a strList from a comma separated list.
> > - * A NULL or empty input string return NULL.
> > - */
> > -static strList *strList_from_comma_list(const char *in)
> > -{
> > -strList *res = NULL;
> > -strList **tail = 
> > -
> > -while (in && in[0]) {
> > -char *comma = strchr(in, ',');
> > -char *value;
> > -
> > -if (comma) {
> > -value = g_strndup(in, comma - in);
> > -in = comma + 1; /* skip the , */
> > -} else {
> > -value = g_strdup(in);
> > -in = NULL;
> > -}
> > -QAPI_LIST_APPEND(tail, value);
> > -}
> > -
> > -return res;
> > -}
> > -
> >  void hmp_info_name(Monitor *mon, const QDict *qdict)
> >  {
> 

Re: [PATCH 3/3] tests/tcg/s390x: Test BRASL and BRCL with large negative offsets

2022-03-11 Thread Richard Henderson

On 3/11/22 10:49, Ilya Leoshkevich wrote:

+size_t length = 0x10006;
+unsigned char *buf;
+int i;
+
+buf = mmap(NULL, length, PROT_READ | PROT_WRITE | PROT_EXEC,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+assert(buf != MAP_FAILED);


I'm thinking exit success here, as such a large allocation may well fail depending on the 
host.



r~



Re: [PATCH 2/5] dump: Split write of section headers and data and add a prepare step

2022-03-11 Thread Richard Henderson

On 3/10/22 03:16, Janosch Frank wrote:

-static void write_elf_section(DumpState *s, int type, Error **errp)
+static size_t write_elf_section_hdr_zero(DumpState *s, void *buff)
  {
-Elf32_Shdr shdr32;
-Elf64_Shdr shdr64;
-int shdr_size;
-void *shdr;
-int ret;
+Elf32_Shdr *shdr32 = buff;
+Elf64_Shdr *shdr64 = buff;
  
-if (type == 0) {

-shdr_size = sizeof(Elf32_Shdr);
-memset(, 0, shdr_size);
-shdr32.sh_info = cpu_to_dump32(s, s->phdr_num);
-shdr = 
+if (dump_is_64bit(s)) {
+memset(buff, 0, sizeof(Elf64_Shdr));
+shdr64->sh_info = cpu_to_dump32(s, s->phdr_num);
  } else {
-shdr_size = sizeof(Elf64_Shdr);
-memset(, 0, shdr_size);
-shdr64.sh_info = cpu_to_dump32(s, s->phdr_num);
-shdr = 
+memset(buff, 0, sizeof(Elf32_Shdr));
+shdr32->sh_info = cpu_to_dump32(s, s->phdr_num);
  }


I'd move those shdr* variables into the if blocks.
It looks odd to have them both in scope at once.

You're re-zeroing the data -- it was allocated with g_malloc0.

  
-ret = fd_write_vmcore(shdr, shdr_size, s);

+return dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+}


Return value here...


+
+static void prepare_elf_section_hdrs(DumpState *s)
+{
+uint8_t *buff_hdr;
+size_t len, sizeof_shdr;
+
+/*
+ * Section ordering:
+ * - HDR zero (if needed)
+ */
+sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+len = sizeof_shdr * s->shdr_num ;
+s->elf_section_hdrs = g_malloc0(len);


Perhaps save to s->shdr_len?


+buff_hdr = s->elf_section_hdrs;


Why this extra variable?


+
+/* Write special section first */
+if (s->phdr_num == PN_XNUM) {
+write_elf_section_hdr_zero(s, buff_hdr);
+}


... but not used here.  Was there some other intended use?  You already know the size, per 
above...



+static void write_elf_section_headers(DumpState *s, Error **errp)
+{
+size_t sizeof_shdr;
+int ret;
+
+sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+
+ret = fd_write_vmcore(s->elf_section_hdrs, s->shdr_num * sizeof_shdr, s);


Use saved s->shdr_len?  Skip if 0?


+static void write_elf_sections(DumpState *s, Error **errp)
+{
+int ret;
+
+/* Write section zero */
+ret = fd_write_vmcore(s->elf_section_data, s->elf_section_data_size, s);


Again skip if 0?  Comment is misleading because section 0 should have no 
contents...


r~



Re: [PATCH 3/5] dump: Reorder struct DumpState

2022-03-11 Thread Richard Henderson

On 3/10/22 03:16, Janosch Frank wrote:

Let's move ELF related members into one block and guest memory related
ones into another to improve readability.

Signed-off-by: Janosch Frank
---
  include/sysemu/dump.h | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 1/5] dump: Allocate header

2022-03-11 Thread Richard Henderson

On 3/10/22 03:16, Janosch Frank wrote:

Allocating the header lets us write it at a later time and hence also
allows us to change section and segment table offsets until we
finally write it.

Signed-off-by: Janosch Frank 
---
  dump/dump.c   | 115 ++
  include/sysemu/dump.h |   1 +
  2 files changed, 62 insertions(+), 54 deletions(-)


You're collecting the important pieces that go into the header separately (phdr_offset, 
phdr_num, etc), and you can certainly change those as you go along.  But I don't see the 
point of allocating the ehdr early.  I think you should create it and write it out 
immediately.



r~



Re: [PATCH v2 9/9] dump: Consolidate elf note function

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

+static void write_elf_notes(DumpState *s, Error **errp)
+{
+ERRP_GUARD();
+
+if (dump_is_64bit(s)) {
+write_elf64_notes(fd_write_vmcore, s, errp);
+} else {
+write_elf32_notes(fd_write_vmcore, s, errp);
+}
+if (*errp) {
+return;
+}
+}


Are you anticipating adding more code to this function?
Otherwise the early return, and the guard are useless.


r~



Re: [PATCH v2 8/9] dump: Cleanup dump_begin write functions

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

There's no need to have a gigantic if in there let's move the elf
32/64 bit logic into the section, segment or note code.

Signed-off-by: Janosch Frank
---
  dump/dump.c | 42 +++---
  1 file changed, 11 insertions(+), 31 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 7/9] dump: Consolidate phdr note writes

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

There's no need to have two write functions. Let's rather have two
functions that set the data for elf 32/64 and then write it in a
common function.

Signed-off-by: Janosch Frank
---
  dump/dump.c | 94 +++--
  1 file changed, 48 insertions(+), 46 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 6/9] dump: Introduce dump_is_64bit() helper function

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

Checking d_class in dump_info leads to lengthy conditionals so let's
shorten things a bit by introducing a helper function.

Signed-off-by: Janosch Frank 
---
  dump/dump.c | 19 ---
  1 file changed, 12 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 


@@ -1007,7 +1012,7 @@ out:
  
  static void write_dump_header(DumpState *s, Error **errp)

  {
-if (s->dump_info.d_class == ELFCLASS32) {
+if (!dump_is_64bit(s)) {
  create_header32(s, errp);
  } else {
  create_header64(s, errp);
@@ -1697,7 +1702,7 @@ static void dump_init(DumpState *s, int fd, bool 
has_format,
  uint32_t size;
  uint16_t format;
  
-note_head_size = s->dump_info.d_class == ELFCLASS32 ?

+note_head_size = !dump_is_64bit(s) ?
  sizeof(Elf32_Nhdr) : sizeof(Elf64_Nhdr);


It would be nice to standardize on positive tests, which in this case would reverse these 
two conditionals.



r~



Re: [PATCH v2 5/9] dump: Add more offset variables

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

Offset calculations are easy enough to get wrong. Let's add a few
variables to make moving around elf headers and data sections easier.

Signed-off-by: Janosch Frank
Reviewed-by: Marc-André Lureau
---
  dump/dump.c   | 35 +++
  include/sysemu/dump.h |  4 
  2 files changed, 19 insertions(+), 20 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 4/9] dump: Remove the section if when calculating the memory offset

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

When s->shdr_num is 0 we'll add 0 bytes of section headers which is
equivalent to not adding section headers but with the multiplication
we can remove a if/else.

Signed-off-by: Janosch Frank
---
  dump/dump.c | 24 
  1 file changed, 8 insertions(+), 16 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 3/9] dump: Introduce shdr_num to decrease complexity

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

Let's move from a boolean to a int variable which will later enable us
to store the number of sections that are in the dump file.

Signed-off-by: Janosch Frank
---
  dump/dump.c   | 24 
  include/sysemu/dump.h |  2 +-
  2 files changed, 13 insertions(+), 13 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 2/9] dump: Remove the sh_info variable

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

There's no need to have phdr_num and sh_info at the same time. We can
make phdr_num 32 bit and set PN_XNUM when we write the header if
phdr_num >= PN_XNUM.

Signed-off-by: Janosch Frank
---
  dump/dump.c   | 34 ++
  include/sysemu/dump.h |  3 +--
  2 files changed, 15 insertions(+), 22 deletions(-)


Nice.

Reviewed-by: Richard Henderson 

r~



Re: [PATCH 3/3] tests/tcg/s390x: Test BRASL and BRCL with large negative offsets

2022-03-11 Thread Ilya Leoshkevich
On Fri, 2022-03-11 at 19:57 +0100, David Hildenbrand wrote:
> On 11.03.22 19:49, Ilya Leoshkevich wrote:
> > Add a small test in order to prevent regressions.
> > 
> > Signed-off-by: Ilya Leoshkevich 
> > ---
> >  tests/tcg/s390x/Makefile.target    |  1 +
> >  tests/tcg/s390x/branch-relative-long.c | 29
> > ++
> >  2 files changed, 30 insertions(+)
> >  create mode 100644 tests/tcg/s390x/branch-relative-long.c
> > 
> > diff --git a/tests/tcg/s390x/Makefile.target
> > b/tests/tcg/s390x/Makefile.target
> > index 257c568c58..fd34b130f7 100644
> > --- a/tests/tcg/s390x/Makefile.target
> > +++ b/tests/tcg/s390x/Makefile.target
> > @@ -15,6 +15,7 @@ TESTS+=mvc
> >  TESTS+=shift
> >  TESTS+=trap
> >  TESTS+=signals-s390x
> > +TESTS+=branch-relative-long
> >  
> >  ifneq ($(HAVE_GDB_BIN),)
> >  GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
> > diff --git a/tests/tcg/s390x/branch-relative-long.c
> > b/tests/tcg/s390x/branch-relative-long.c
> > new file mode 100644
> > index 00..b9fcee9873
> > --- /dev/null
> > +++ b/tests/tcg/s390x/branch-relative-long.c
> > @@ -0,0 +1,29 @@
> > +#include 
> > +#include 
> > +#include 
> > +
> > +int main(void)
> > +{
> > +    const unsigned short opcodes[] = {
> > +    0xc005,  /* brasl %r0 */
> > +    0xc0f4,  /* brcl 0xf */
> > +    };
> > +    size_t length = 0x10006;
> > +    unsigned char *buf;
> > +    int i;
> > +
> > +    buf = mmap(NULL, length, PROT_READ | PROT_WRITE | PROT_EXEC,
> > +   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > +    assert(buf != MAP_FAILED);
> > +
> > +    *(unsigned short *)[0] = 0x07fe;  /* br %r14 */
> > +    *(unsigned int *)[0x10002] = 0x8000;
> > +    for (i = 0; i < sizeof(opcodes) / sizeof(opcodes[0]); i++) {
> > +    *(unsigned short *)[0x1] = opcodes[i];
> > +    ((void (*)(void))[0x1])();
> > +    }
> 
> Hmmm, can't we write some "nice" inline asm instead?
> 
> 

If we do this in a straightforward way, then the resulting binary will
be 4G large.

But maybe there is a way to play games with sections, I'll need to
think about it.



Re: [PATCH 3/3] tests/tcg/s390x: Test BRASL and BRCL with large negative offsets

2022-03-11 Thread David Hildenbrand
On 11.03.22 19:49, Ilya Leoshkevich wrote:
> Add a small test in order to prevent regressions.
> 
> Signed-off-by: Ilya Leoshkevich 
> ---
>  tests/tcg/s390x/Makefile.target|  1 +
>  tests/tcg/s390x/branch-relative-long.c | 29 ++
>  2 files changed, 30 insertions(+)
>  create mode 100644 tests/tcg/s390x/branch-relative-long.c
> 
> diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
> index 257c568c58..fd34b130f7 100644
> --- a/tests/tcg/s390x/Makefile.target
> +++ b/tests/tcg/s390x/Makefile.target
> @@ -15,6 +15,7 @@ TESTS+=mvc
>  TESTS+=shift
>  TESTS+=trap
>  TESTS+=signals-s390x
> +TESTS+=branch-relative-long
>  
>  ifneq ($(HAVE_GDB_BIN),)
>  GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
> diff --git a/tests/tcg/s390x/branch-relative-long.c 
> b/tests/tcg/s390x/branch-relative-long.c
> new file mode 100644
> index 00..b9fcee9873
> --- /dev/null
> +++ b/tests/tcg/s390x/branch-relative-long.c
> @@ -0,0 +1,29 @@
> +#include 
> +#include 
> +#include 
> +
> +int main(void)
> +{
> +const unsigned short opcodes[] = {
> +0xc005,  /* brasl %r0 */
> +0xc0f4,  /* brcl 0xf */
> +};
> +size_t length = 0x10006;
> +unsigned char *buf;
> +int i;
> +
> +buf = mmap(NULL, length, PROT_READ | PROT_WRITE | PROT_EXEC,
> +   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +assert(buf != MAP_FAILED);
> +
> +*(unsigned short *)[0] = 0x07fe;  /* br %r14 */
> +*(unsigned int *)[0x10002] = 0x8000;
> +for (i = 0; i < sizeof(opcodes) / sizeof(opcodes[0]); i++) {
> +*(unsigned short *)[0x1] = opcodes[i];
> +((void (*)(void))[0x1])();
> +}

Hmmm, can't we write some "nice" inline asm instead?


-- 
Thanks,

David / dhildenb




Re: [PATCH 2/3] s390x/tcg: Fix BRCL with a large negative offset

2022-03-11 Thread David Hildenbrand
On 11.03.22 19:49, Ilya Leoshkevich wrote:
> When RI2 is 0x8000, qemu enters an infinite loop instead of jumping
> backwards. Fix by adding a missing cast, like in in2_ri2().
> 
> Fixes: 7233f2ed1717 ("target-s390: Convert BRANCH ON CONDITION")
> Signed-off-by: Ilya Leoshkevich 
> ---
>  target/s390x/tcg/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index 41c8696185..5acfc0ff9b 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -1201,7 +1201,7 @@ static DisasJumpType help_branch(DisasContext *s, 
> DisasCompare *c,
>   bool is_imm, int imm, TCGv_i64 cdest)
>  {
>  DisasJumpType ret;
> -uint64_t dest = s->base.pc_next + 2 * imm;
> +uint64_t dest = s->base.pc_next + (int64_t)imm * 2;
>  TCGLabel *lab;
>  
>  /* Take care of the special cases first.  */

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb




Re: [PATCH 1/3] s390x/tcg: Fix BRASL with a large negative offset

2022-03-11 Thread David Hildenbrand
On 11.03.22 19:49, Ilya Leoshkevich wrote:
> When RI2 is 0x8000, qemu enters an infinite loop instead of jumping
> backwards. Fix by adding a missing cast, like in in2_ri2().
> 
> Fixes: 8ac33cdb8bfb ("Convert BRANCH AND SAVE")
> Signed-off-by: Ilya Leoshkevich 
> ---
>  target/s390x/tcg/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index 904b51542f..41c8696185 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -1597,7 +1597,7 @@ static DisasJumpType op_bal(DisasContext *s, DisasOps 
> *o)
>  static DisasJumpType op_basi(DisasContext *s, DisasOps *o)
>  {
>  pc_to_link_info(o->out, s, s->pc_tmp);
> -return help_goto_direct(s, s->base.pc_next + 2 * get_field(s, i2));
> +return help_goto_direct(s, s->base.pc_next + (int64_t)get_field(s, i2) * 
> 2);
>  }
>  
>  static DisasJumpType op_bc(DisasContext *s, DisasOps *o)

Reviewed-by: David Hildenbrand 

Thanks!

-- 
Thanks,

David / dhildenb




Re: XIVE VFIO kernel resample failure in INTx mode under heavy load

2022-03-11 Thread Timothy Pearson
Correction -- the desynchronization appears to be on the DisINTx line.

Host:
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- SERR- TAbort- SERR-  From: "Timothy Pearson" 
> To: "qemu-devel" 
> Sent: Friday, March 11, 2022 12:35:45 PM
> Subject: XIVE VFIO kernel resample failure in INTx mode under heavy load

> All,
> 
> I've been struggling for some time with what is looking like a potential bug 
> in
> QEMU/KVM on the POWER9 platform.  It appears that in XIVE mode, when the
> in-kernel IRQ chip is enabled, an external device that rapidly asserts IRQs 
> via
> the legacy INTx level mechanism will only receive one interrupt in the KVM
> guest.
> 
> Changing any one of those items appears to avoid the glitch, e.g. XICS mode 
> with
> the in-kernel IRQ chip works (all interrupts are passed through), and XIVE 
> mode
> with the in-kernel IRQ chip disabled also works.  We are also not seeing any
> problems in XIVE mode with the in-kernel chip from MSI/MSI-X devices.
> 
> The device in question is a real time card that needs to raise an interrupt
> every 1ms.  It works perfectly on the host, but fails in the guest -- with the
> in-kernel IRQ chip and XIVE enabled, it receives exactly one interrupt, at
> which point the host continues to see INTx+ but the guest sees INTX-, and the
> IRQ handler in the guest kernel is never reentered.
> 
> We have also seen some very rare glitches where, over a long period of time, 
> we
> can enter a similar deadlock in XICS mode.  Disabling the in-kernel IRQ chip 
> in
> XIVE mode will also lead to the lockup with this device, since the userspace
> IRQ emulation cannot keep up with the rapid interrupt firing (measurements 
> show
> around 100ms required for processing each interrupt in the user mode).
> 
> My understanding is the resample mechanism does some clever tricks with level
> IRQs, but that QEMU needs to check if the IRQ is still asserted by the device
> on guest EOI.  Since a failure here would explain these symptoms I'm wondering
> if there is a bug in either QEMU or KVM for POWER / pSeries (SPAPr) where the
> IRQ is not resampled and therefore not re-fired in the guest?
> 
> Unfortunately I lack the resources at the moment to dig through the QEMU
> codebase and try to find the bug.  Any IBMers here that might be able to help
> out?  I can provide access to a test setup if desired.
> 
> Thanks!



[PATCH 1/3] s390x/tcg: Fix BRASL with a large negative offset

2022-03-11 Thread Ilya Leoshkevich
When RI2 is 0x8000, qemu enters an infinite loop instead of jumping
backwards. Fix by adding a missing cast, like in in2_ri2().

Fixes: 8ac33cdb8bfb ("Convert BRANCH AND SAVE")
Signed-off-by: Ilya Leoshkevich 
---
 target/s390x/tcg/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 904b51542f..41c8696185 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -1597,7 +1597,7 @@ static DisasJumpType op_bal(DisasContext *s, DisasOps *o)
 static DisasJumpType op_basi(DisasContext *s, DisasOps *o)
 {
 pc_to_link_info(o->out, s, s->pc_tmp);
-return help_goto_direct(s, s->base.pc_next + 2 * get_field(s, i2));
+return help_goto_direct(s, s->base.pc_next + (int64_t)get_field(s, i2) * 
2);
 }
 
 static DisasJumpType op_bc(DisasContext *s, DisasOps *o)
-- 
2.35.1




[PATCH 2/3] s390x/tcg: Fix BRCL with a large negative offset

2022-03-11 Thread Ilya Leoshkevich
When RI2 is 0x8000, qemu enters an infinite loop instead of jumping
backwards. Fix by adding a missing cast, like in in2_ri2().

Fixes: 7233f2ed1717 ("target-s390: Convert BRANCH ON CONDITION")
Signed-off-by: Ilya Leoshkevich 
---
 target/s390x/tcg/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 41c8696185..5acfc0ff9b 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -1201,7 +1201,7 @@ static DisasJumpType help_branch(DisasContext *s, 
DisasCompare *c,
  bool is_imm, int imm, TCGv_i64 cdest)
 {
 DisasJumpType ret;
-uint64_t dest = s->base.pc_next + 2 * imm;
+uint64_t dest = s->base.pc_next + (int64_t)imm * 2;
 TCGLabel *lab;
 
 /* Take care of the special cases first.  */
-- 
2.35.1




[PATCH 3/3] tests/tcg/s390x: Test BRASL and BRCL with large negative offsets

2022-03-11 Thread Ilya Leoshkevich
Add a small test in order to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/s390x/Makefile.target|  1 +
 tests/tcg/s390x/branch-relative-long.c | 29 ++
 2 files changed, 30 insertions(+)
 create mode 100644 tests/tcg/s390x/branch-relative-long.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 257c568c58..fd34b130f7 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -15,6 +15,7 @@ TESTS+=mvc
 TESTS+=shift
 TESTS+=trap
 TESTS+=signals-s390x
+TESTS+=branch-relative-long
 
 ifneq ($(HAVE_GDB_BIN),)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
diff --git a/tests/tcg/s390x/branch-relative-long.c 
b/tests/tcg/s390x/branch-relative-long.c
new file mode 100644
index 00..b9fcee9873
--- /dev/null
+++ b/tests/tcg/s390x/branch-relative-long.c
@@ -0,0 +1,29 @@
+#include 
+#include 
+#include 
+
+int main(void)
+{
+const unsigned short opcodes[] = {
+0xc005,  /* brasl %r0 */
+0xc0f4,  /* brcl 0xf */
+};
+size_t length = 0x10006;
+unsigned char *buf;
+int i;
+
+buf = mmap(NULL, length, PROT_READ | PROT_WRITE | PROT_EXEC,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+assert(buf != MAP_FAILED);
+
+*(unsigned short *)[0] = 0x07fe;  /* br %r14 */
+*(unsigned int *)[0x10002] = 0x8000;
+for (i = 0; i < sizeof(opcodes) / sizeof(opcodes[0]); i++) {
+*(unsigned short *)[0x1] = opcodes[i];
+((void (*)(void))[0x1])();
+}
+
+munmap(buf, length);
+
+return 0;
+}
-- 
2.35.1




[PATCH 0/3] Fix BRASL and BRCL with large negative offsets

2022-03-11 Thread Ilya Leoshkevich
Hi,

I noticed that sometimes jumping backwards leads to crashes or hangs.
The problem is a missing cast.
Patches 1 and 2 fix the problem, patch 3 adds a test.

Best regards,
Ilya

Ilya Leoshkevich (3):
  s390x/tcg: Fix BRASL with a large negative offset
  s390x/tcg: Fix BRCL with a large negative offset
  tests/tcg/s390x: Test BRASL and BRCL with large negative offsets

 target/s390x/tcg/translate.c   |  4 ++--
 tests/tcg/s390x/Makefile.target|  1 +
 tests/tcg/s390x/branch-relative-long.c | 29 ++
 3 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/s390x/branch-relative-long.c

-- 
2.35.1




Re: [PATCH v2 1/9] dump: Use ERRP_GUARD()

2022-03-11 Thread Richard Henderson

On 3/10/22 03:08, Janosch Frank wrote:

Let's move to the new way of handling errors before changing the dump
code. This patch has mostly been generated by the coccinelle script
scripts/coccinelle/errp-guard.cocci.

Signed-off-by: Janosch Frank
---
  dump/dump.c | 144 ++--
  1 file changed, 61 insertions(+), 83 deletions(-)


Reviewed-by: Richard Henderson 

r~



XIVE VFIO kernel resample failure in INTx mode under heavy load

2022-03-11 Thread Timothy Pearson
All,

I've been struggling for some time with what is looking like a potential bug in 
QEMU/KVM on the POWER9 platform.  It appears that in XIVE mode, when the 
in-kernel IRQ chip is enabled, an external device that rapidly asserts IRQs via 
the legacy INTx level mechanism will only receive one interrupt in the KVM 
guest.

Changing any one of those items appears to avoid the glitch, e.g. XICS mode 
with the in-kernel IRQ chip works (all interrupts are passed through), and XIVE 
mode with the in-kernel IRQ chip disabled also works.  We are also not seeing 
any problems in XIVE mode with the in-kernel chip from MSI/MSI-X devices.

The device in question is a real time card that needs to raise an interrupt 
every 1ms.  It works perfectly on the host, but fails in the guest -- with the 
in-kernel IRQ chip and XIVE enabled, it receives exactly one interrupt, at 
which point the host continues to see INTx+ but the guest sees INTX-, and the 
IRQ handler in the guest kernel is never reentered.

We have also seen some very rare glitches where, over a long period of time, we 
can enter a similar deadlock in XICS mode.  Disabling the in-kernel IRQ chip in 
XIVE mode will also lead to the lockup with this device, since the userspace 
IRQ emulation cannot keep up with the rapid interrupt firing (measurements show 
around 100ms required for processing each interrupt in the user mode).

My understanding is the resample mechanism does some clever tricks with level 
IRQs, but that QEMU needs to check if the IRQ is still asserted by the device 
on guest EOI.  Since a failure here would explain these symptoms I'm wondering 
if there is a bug in either QEMU or KVM for POWER / pSeries (SPAPr) where the 
IRQ is not resampled and therefore not re-fired in the guest?

Unfortunately I lack the resources at the moment to dig through the QEMU 
codebase and try to find the bug.  Any IBMers here that might be able to help 
out?  I can provide access to a test setup if desired.

Thanks!



Re: 'make check-acceptance' failing on s390 tests?

2022-03-11 Thread Thomas Huth

On 18/02/2022 16.04, Peter Maydell wrote:

Hi; is anybody else seeing 'make check-acceptance' fail on some of
the s390 tests?

  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
(900.20 s)

[...]

Not sure about the timeout on the boot test: the avocado log
shows it booting at least as far as
"Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
and then there's no further output until the timeout.


Now that I've finally been able to run the test again (after
manually tweaking that borked is_port_free() function in
avocado), I've had a closer look at the failing BootLinuxS390X
test: If you're looking at the output of the guest in the log,
you can see that it fails to init the cloud-init stuff and
thus fails to "phone home" at the end.

This used to work fine in older versions, so I just spent a
lot of time bisecting this issue and ended up here:

f83bcecb1ffe25a18367409eaf4ba1453c835c48 is the first bad commit
commit f83bcecb1ffe25a18367409eaf4ba1453c835c48
Author: Richard Henderson 
Date:   Tue Jul 27 07:48:55 2021 -1000

accel/tcg: Add cpu_{ld,st}*_mmu interfaces

Richard, could you please have a look at this one, too? ... it
causes the test to fail:

$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48~1
$ ./configure --target-list=s390x-softmmu --disable-docs
$ make -j8
$ make check-venv
$ cd build
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID : 0a6d287620d150d52c24417d0a672a1a826b3a82
JOB LOG: 
/home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
 (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: 
PASS (130.38 s)
RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 136.51 s
$ grep cloud-ini 
/home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
...
2022-03-11 18:31:52,745 datadrainer  L0193 DEBUG| [  OK  ] Started Initial 
cloud-init…ob (metadata service crawler).

$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48
$ make -j8
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID : cb143be36631515f74cb6de2b263dfe1bc0f9709
JOB LOG: 
/home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
 (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: 
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout 
reached\nOriginal status: ERROR\n{'name': 
'1-tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 
'logdir': 
'/home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/test-res... 
(900.97 s)
RESULTS: PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
JOB TIME   : 907.16 s
$ grep cloud-ini 
/home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
2022-03-11 18:35:15,106 datadrainer  L0193 DEBUG|  Starting Initial 
cloud-init job (pre-networking)...
2022-03-11 18:35:21,691 datadrainer  L0193 DEBUG| [FAILED] Failed to start 
Initial cloud-init job (pre-networking).
...

 Thomas




Re: [PATCH] tcg/arm: Don't emit UNPREDICTABLE LDRD with Rm == Rt or Rt+1

2022-03-11 Thread Richard Henderson

On 3/11/22 06:02, Alex Bennée wrote:

A wider question. Is this something that can be handled the constraints
done by the register allocator? I assume that avoid direct aliasing if
needed?


No.  We do have "allocate a non-overlapping register"; we don't have "allocate an aligned 
register pair", which *would* be helpful.


However, in this specific case "addend" is completely invisible to the register allocator, 
coming entirely from the backend's tlb implementation (or guest_base).



r~



Re: [PATCH 5/6] 9pfs: fix 'Twalk' to only send error if no component walked

2022-03-11 Thread Christian Schoenebeck
On Freitag, 11. März 2022 18:08:38 CET Greg Kurz wrote:
> On Fri, 11 Mar 2022 17:44:54 +0100
> 
> Christian Schoenebeck  wrote:
> > On Freitag, 11. März 2022 17:35:41 CET Greg Kurz wrote:
> > > On Thu, 10 Mar 2022 10:13:33 +0100
> > > 
> > > Christian Schoenebeck  wrote:
> > > > On Mittwoch, 9. März 2022 18:57:39 CET Christian Schoenebeck wrote:
> > > > > Current implementation of 'Twalk' request handling always sends an
> > > > > 'Rerror'
> > > > > 
> > > > > response if any error occured. The 9p2000 protocol spec sais though:
> > > > >   "
> > > > >   If the first element cannot be walked for any reason, Rerror is
> > > > >   returned.
> > > > >   Otherwise, the walk will return an Rwalk message containing nwqid
> > > > >   qids
> > > > >   corresponding, in order, to the files that are visited by the
> > > > >   nwqid
> > > > >   successful elementwise walks; nwqid is therefore either nwname or
> > > > >   the
> > > > > 
> > > > > index of the first elementwise walk that failed.
> > > > > 
> > > > >   "
> > > > >   
> > > > >   http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor33
> > > > > 
> > > > > For that reason we are no longer leaving from an error path in
> > > > > function
> > > > > v9fs_walk(), unless really no path component could be walked
> > > > > successfully or if the request has been interrupted.
> > > > > 
> > > > > Local variable 'nvalid' counts and reflects the number of path
> > > > > components
> > > > > successfully processed by background I/O thread, whereas local
> > > > > variable
> > > > > 'name_idx' subsequently counts and reflects the number of path
> > > > > components
> > > > > eventually accepted successfully by 9p server controller portion.
> > > > > 
> > > > > New local variable 'any_err' is an aggregate variable reflecting
> > > > > whether
> > > > > any error occurred at all, while already existing variable 'err'
> > > > > only
> > > > > reflects the last error.
> > > > > 
> > > > > Despite QIDs being delivered to client in a more relaxed way now, it
> > > > > is
> > > > > important to note though that fid still must remain uneffacted if
> > > > > any
> > > > > error
> > > > 
> > > > Typo: should be "unaffected".
> > > > 
> > > > > occurred.
> > > > > 
> > > > > Signed-off-by: Christian Schoenebeck 
> > > > > ---
> > > > > 
> > > > >  hw/9pfs/9p.c | 29 +
> > > > >  1 file changed, 21 insertions(+), 8 deletions(-)
> > > > > 
> > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > > > index 6cdc566866..8ccd180608 100644
> > > > > --- a/hw/9pfs/9p.c
> > > > > +++ b/hw/9pfs/9p.c
> > > > > @@ -1766,7 +1766,7 @@ static void coroutine_fn v9fs_walk(void
> > > > > *opaque)
> > > > > 
> > > > >  {
> > > > >  
> > > > >  int name_idx, nvalid;
> > > > >  g_autofree V9fsQID *qids = NULL;
> > > > > 
> > > > > -int i, err = 0;
> > > > > +int i, err = 0, any_err = 0;
> > > > > 
> > > > >  V9fsPath dpath, path;
> > > > >  P9ARRAY_REF(V9fsPath) pathes = NULL;
> > > > >  uint16_t nwnames;
> > > > > 
> > > > > @@ -1832,6 +1832,7 @@ static void coroutine_fn v9fs_walk(void
> > > > > *opaque)
> > > > > 
> > > > >   * driver code altogether inside the following block.
> > > > >   */
> > > > >  
> > > > >  v9fs_co_run_in_worker({
> > > > > 
> > > > > +nvalid = 0;
> > > > > 
> > > > >  if (v9fs_request_cancelled(pdu)) {
> > > > >  
> > > > >  err = -EINTR;
> > > > >  break;
> > > > > 
> > > > > @@ -1842,7 +1843,7 @@ static void coroutine_fn v9fs_walk(void
> > > > > *opaque)
> > > > > 
> > > > >  break;
> > > > >  
> > > > >  }
> > > > >  stbuf = fidst;
> > > > > 
> > > > > -for (nvalid = 0; nvalid < nwnames; nvalid++) {
> > > > > +for (; nvalid < nwnames; nvalid++) {
> > > > > 
> > > > >  if (v9fs_request_cancelled(pdu)) {
> > > > >  
> > > > >  err = -EINTR;
> > > > >  break;
> > > > > 
> > > > > @@ -1874,12 +1875,13 @@ static void coroutine_fn v9fs_walk(void
> > > > > *opaque)
> > > > > 
> > > > >  /*
> > > > >  
> > > > >   * Handle all the rest of this Twalk request on main thread ...
> > > > >   */
> > > > > 
> > > > > -if (err < 0) {
> > > > > +if ((err < 0 && !nvalid) || err == -EINTR) {
> > > > > 
> > > > >  goto out;
> > > > >  
> > > > >  }
> > > > > 
> > > > > +any_err |= err;
> > > > > 
> > > > >  err = stat_to_qid(pdu, , );
> > > > > 
> > > > > -if (err < 0) {
> > > > > +if (err < 0 && !nvalid) {
> > > > > 
> > > > >  goto out;
> > > > >  
> > > > >  }
> > > > >  stbuf = fidst;
> > > > > 
> > > > > @@ -1888,20 +1890,30 @@ static void coroutine_fn v9fs_walk(void
> > > > > *opaque)
> > > > > 
> > > > >  v9fs_path_copy(, >path);
> > > > >  v9fs_path_copy(, >path);
> > > > > 
> > > > > -for (name_idx = 0; name_idx < nwnames; name_idx++) {
> > > > > +for (name_idx = 0; 

Re: [PATCH 3/6] tests/9pfs: compare QIDs in fs_walk_none() test

2022-03-11 Thread Christian Schoenebeck
On Freitag, 11. März 2022 18:02:36 CET Greg Kurz wrote:
> On Fri, 11 Mar 2022 17:39:56 +0100
> 
> Christian Schoenebeck  wrote:
> > On Freitag, 11. März 2022 17:11:24 CET Greg Kurz wrote:
> > > On Thu, 10 Mar 2022 10:04:50 +0100
> > > 
> > > Christian Schoenebeck  wrote:
> > > > On Mittwoch, 9. März 2022 15:49:04 CET Christian Schoenebeck wrote:
> > > > > Extend previously added fs_walk_none() test by comparing the QID
> > > > > of the root fid with the QID of the cloned fid. They should be
> > > > > equal.
> > > 
> > > Ha, I understand your suggestion of changing the name now :-) but I'll
> > > personally leave it named according to the test scenario of "sending
> > > a Twalk with no names" and checking everything that is expected in this
> > > case.
> > 
> > NP
> > 
> > > > > Signed-off-by: Christian Schoenebeck 
> > > > > ---
> > > > > 
> > > > >  tests/qtest/virtio-9p-test.c | 70
> > > > >  
> > > > >  1 file changed, 70 insertions(+)
> > > > > 
> > > > > diff --git a/tests/qtest/virtio-9p-test.c
> > > > > b/tests/qtest/virtio-9p-test.c
> > > > > index 6c00da03f4..9098e21173 100644
> > > > > --- a/tests/qtest/virtio-9p-test.c
> > > > > +++ b/tests/qtest/virtio-9p-test.c
> > > > > @@ -146,6 +146,11 @@ static void v9fs_uint16_read(P9Req *req,
> > > > > uint16_t
> > > > > *val) le16_to_cpus(val);
> > > > > 
> > > > >  }
> > > > > 
> > > > > +static void v9fs_int16_read(P9Req *req, int16_t *val)
> > > > > +{
> > > > > +v9fs_uint16_read(req, (uint16_t *)val);
> > > > > +}
> > > > > +
> > > > > 
> > > > >  static void v9fs_uint32_write(P9Req *req, uint32_t val)
> > > > >  {
> > > > >  
> > > > >  uint32_t le_val = cpu_to_le32(val);
> > > > > 
> > > > > @@ -166,12 +171,22 @@ static void v9fs_uint32_read(P9Req *req,
> > > > > uint32_t
> > > > > *val) le32_to_cpus(val);
> > > > > 
> > > > >  }
> > > > > 
> > > > > +static void v9fs_int32_read(P9Req *req, int32_t *val)
> > > > > +{
> > > > > +v9fs_uint32_read(req, (uint32_t *)val);
> > > > > +}
> > > > > +
> > > > > 
> > > > >  static void v9fs_uint64_read(P9Req *req, uint64_t *val)
> > > > >  {
> > > > >  
> > > > >  v9fs_memread(req, val, 8);
> > > > >  le64_to_cpus(val);
> > > > >  
> > > > >  }
> > > > > 
> > > > > +static void v9fs_int64_read(P9Req *req, int64_t *val)
> > > > > +{
> > > > > +v9fs_uint64_read(req, (uint64_t *)val);
> > > > > +}
> > > > > +
> > > > > 
> > > > >  /* len[2] string[len] */
> > > > >  static uint16_t v9fs_string_size(const char *string)
> > > > >  {
> > > > > 
> > > > > @@ -425,6 +440,40 @@ static void v9fs_rwalk(P9Req *req, uint16_t
> > > > > *nwqid,
> > > > > v9fs_qid **wqid) v9fs_req_free(req);
> > > > > 
> > > > >  }
> > > > > 
> > > > > +/* size[4] Tstat tag[2] fid[4] */
> > > > > +static P9Req *v9fs_tstat(QVirtio9P *v9p, uint32_t fid, uint16_t
> > > > > tag)
> > > 
> > > Tstat/Rstat aren't part of 9p2000.L, you should use Tgetattr/Rgetattr
> > > instead (see https://github.com/chaos/diod/blob/master/protocol.md).
> > 
> > Ah right, I forgot.
> > 
> > > > > +{
> > > > > +P9Req *req;
> > > > > +
> > > > > +req = v9fs_req_init(v9p, 4, P9_TSTAT, tag);
> > > > > +v9fs_uint32_write(req, fid);
> > > > > +v9fs_req_send(req);
> > > > > +return req;
> > > > > +}
> > > > > +
> > > > > +/* size[4] Rstat tag[2] stat[n] */
> > > > > +static void v9fs_rstat(P9Req *req, struct V9fsStat *st)
> > > > > +{
> > > > > +v9fs_req_recv(req, P9_RSTAT);
> > > > > +
> > > 
> > > For the records, this is a stat[n], i.e. "n[2] followed by n bytes of
> > > data forming the parameter", so you should read an uint16_t first.
> > > 
> > > > > +v9fs_int16_read(req, >size);
> > 
> > Which I did here? --^
> 
> From the BUGS section of
> https://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 :
> 
>  BUGS
>   To make the contents of a directory, such as returned by
>   read(5), easy to parse, each directory entry begins with a
>   size field.  For consistency, the entries in Twstat and
>   Rstat messages also contain their size, which means the size
>   
>   appears twice.  For example, the Rstat message is formatted
>   ^
>   as ``(4+1+2+2+n)[4] Rstat tag[2] n[2] (n-2)[2] type[2]
>   dev[4]...,'' where n is the value returned by Styx->packdir.
> 
> I realized that when giving a try to convert a v9fs_qid to a V9fsQID on
> top of this patch.

Ouch, what a trap. Yeah, I didn't realize that.

> > > > > +v9fs_int16_read(req, >type);
> > > > > +v9fs_int32_read(req, >dev);
> > > > > +v9fs_uint8_read(req, >qid.type);
> > > > > +v9fs_uint32_read(req, >qid.version);
> > > > > +v9fs_uint64_read(req, >qid.path);
> > > > > +v9fs_int32_read(req, >mode);
> > > > > +v9fs_int32_read(req, >mtime);
> > > > > +v9fs_int32_read(req, >atime);
> > > > > +v9fs_int64_read(req, >length);
> > > > > +

Re: [PATCH 6/9] tests/avocado/virtiofs_submounts.py: shared_dir may not exist

2022-03-11 Thread Beraldo Leal
On Fri, Feb 25, 2022 at 04:01:53PM -0500, Cleber Rosa wrote:
> If the test is skipped because of their conditionals, the shared_dir
> attribute may not exist.
> 
> Check for its existence in the tearDown() method to avoid and
> AttributeError.
> 
> Signed-off-by: Cleber Rosa 
> ---
>  tests/avocado/virtiofs_submounts.py | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/avocado/virtiofs_submounts.py 
> b/tests/avocado/virtiofs_submounts.py
> index e6dc32ffd4..d9c2c9d9ef 100644
> --- a/tests/avocado/virtiofs_submounts.py
> +++ b/tests/avocado/virtiofs_submounts.py
> @@ -157,9 +157,10 @@ def tearDown(self):
>  except:
>  pass
>  
> -scratch_dir = os.path.join(self.shared_dir, 'scratch')
> -self.run(('bash', self.get_data('cleanup.sh'), scratch_dir),
> - ignore_error=True)
> +if hasattr(self, 'shared_dir'):
> +scratch_dir = os.path.join(self.shared_dir, 'scratch')
> +self.run(('bash', self.get_data('cleanup.sh'), scratch_dir),
> + ignore_error=True)
>  
>  def test_pre_virtiofsd_set_up(self):
>  self.set_up_shared_dir()

Reviewed-by: Beraldo Leal 

--
Beraldo




[PATCH v2] tests/avocado: starts PhoneServer upfront

2022-03-11 Thread Beraldo Leal
Race conditions can happen with the current code, because the port that
was available might not be anymore by the time the server is started.

By setting the port to 0, PhoneServer it will use the OS default
behavior to get a free port, then we save this information so we can
later configure the guest.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Beraldo Leal 
---
 tests/avocado/avocado_qemu/__init__.py | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py 
b/tests/avocado/avocado_qemu/__init__.py
index 9b056b5ce5..ac85e36a4d 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -18,7 +18,7 @@
 import uuid
 
 import avocado
-from avocado.utils import cloudinit, datadrainer, network, process, ssh, 
vmimage
+from avocado.utils import cloudinit, datadrainer, process, ssh, vmimage
 from avocado.utils.path import find_command
 
 #: The QEMU build root directory.  It may also be the source directory
@@ -602,9 +602,6 @@ def prepare_cloudinit(self, ssh_pubkey=None):
 self.log.info('Preparing cloudinit image')
 try:
 cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
-self.phone_home_port = network.find_free_port()
-if not self.phone_home_port:
-self.cancel('Failed to get a free port')
 pubkey_content = None
 if ssh_pubkey:
 with open(ssh_pubkey) as pubkey:
@@ -614,7 +611,7 @@ def prepare_cloudinit(self, ssh_pubkey=None):
   password=self.password,
   # QEMU's hard coded usermode router address
   phone_home_host='10.0.2.2',
-  phone_home_port=self.phone_home_port,
+  phone_home_port=self.phone_server.server_port,
   authorized_key=pubkey_content)
 except Exception:
 self.cancel('Failed to prepare the cloudinit image')
@@ -625,6 +622,8 @@ def set_up_boot(self):
 self.vm.add_args('-drive', 'file=%s' % path)
 
 def set_up_cloudinit(self, ssh_pubkey=None):
+self.phone_server = cloudinit.PhoneHomeServer(('0.0.0.0', 0),
+  self.name)
 cloudinit_iso = self.prepare_cloudinit(ssh_pubkey)
 self.vm.add_args('-drive', 'file=%s,format=raw' % cloudinit_iso)
 
@@ -635,7 +634,9 @@ def launch_and_wait(self, set_up_ssh_connection=True):
  
logger=self.log.getChild('console'))
 console_drainer.start()
 self.log.info('VM launched, waiting for boot confirmation from guest')
-cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), 
self.name)
+while not self.phone_server.instance_phoned_back:
+self.phone_server.handle_request()
+
 if set_up_ssh_connection:
 self.log.info('Setting up the SSH connection')
 self.ssh_connect(self.username, self.ssh_key)
-- 
2.31.1




Re: [PATCH 5/6] 9pfs: fix 'Twalk' to only send error if no component walked

2022-03-11 Thread Greg Kurz
On Fri, 11 Mar 2022 17:44:54 +0100
Christian Schoenebeck  wrote:

> On Freitag, 11. März 2022 17:35:41 CET Greg Kurz wrote:
> > On Thu, 10 Mar 2022 10:13:33 +0100
> > 
> > Christian Schoenebeck  wrote:
> > > On Mittwoch, 9. März 2022 18:57:39 CET Christian Schoenebeck wrote:
> > > > Current implementation of 'Twalk' request handling always sends an
> > > > 'Rerror'
> > > > 
> > > > response if any error occured. The 9p2000 protocol spec sais though:
> > > >   "
> > > >   If the first element cannot be walked for any reason, Rerror is
> > > >   returned.
> > > >   Otherwise, the walk will return an Rwalk message containing nwqid qids
> > > >   corresponding, in order, to the files that are visited by the nwqid
> > > >   successful elementwise walks; nwqid is therefore either nwname or the
> > > > 
> > > > index of the first elementwise walk that failed.
> > > > 
> > > >   "
> > > >   
> > > >   http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor33
> > > > 
> > > > For that reason we are no longer leaving from an error path in function
> > > > v9fs_walk(), unless really no path component could be walked
> > > > successfully or if the request has been interrupted.
> > > > 
> > > > Local variable 'nvalid' counts and reflects the number of path
> > > > components
> > > > successfully processed by background I/O thread, whereas local variable
> > > > 'name_idx' subsequently counts and reflects the number of path
> > > > components
> > > > eventually accepted successfully by 9p server controller portion.
> > > > 
> > > > New local variable 'any_err' is an aggregate variable reflecting whether
> > > > any error occurred at all, while already existing variable 'err' only
> > > > reflects the last error.
> > > > 
> > > > Despite QIDs being delivered to client in a more relaxed way now, it is
> > > > important to note though that fid still must remain uneffacted if any
> > > > error
> > > 
> > > Typo: should be "unaffected".
> > > 
> > > > occurred.
> > > > 
> > > > Signed-off-by: Christian Schoenebeck 
> > > > ---
> > > > 
> > > >  hw/9pfs/9p.c | 29 +
> > > >  1 file changed, 21 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > > index 6cdc566866..8ccd180608 100644
> > > > --- a/hw/9pfs/9p.c
> > > > +++ b/hw/9pfs/9p.c
> > > > @@ -1766,7 +1766,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > > 
> > > >  {
> > > >  
> > > >  int name_idx, nvalid;
> > > >  g_autofree V9fsQID *qids = NULL;
> > > > 
> > > > -int i, err = 0;
> > > > +int i, err = 0, any_err = 0;
> > > > 
> > > >  V9fsPath dpath, path;
> > > >  P9ARRAY_REF(V9fsPath) pathes = NULL;
> > > >  uint16_t nwnames;
> > > > 
> > > > @@ -1832,6 +1832,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > > 
> > > >   * driver code altogether inside the following block.
> > > >   */
> > > >  
> > > >  v9fs_co_run_in_worker({
> > > > 
> > > > +nvalid = 0;
> > > > 
> > > >  if (v9fs_request_cancelled(pdu)) {
> > > >  
> > > >  err = -EINTR;
> > > >  break;
> > > > 
> > > > @@ -1842,7 +1843,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > > 
> > > >  break;
> > > >  
> > > >  }
> > > >  stbuf = fidst;
> > > > 
> > > > -for (nvalid = 0; nvalid < nwnames; nvalid++) {
> > > > +for (; nvalid < nwnames; nvalid++) {
> > > > 
> > > >  if (v9fs_request_cancelled(pdu)) {
> > > >  
> > > >  err = -EINTR;
> > > >  break;
> > > > 
> > > > @@ -1874,12 +1875,13 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > > 
> > > >  /*
> > > >  
> > > >   * Handle all the rest of this Twalk request on main thread ...
> > > >   */
> > > > 
> > > > -if (err < 0) {
> > > > +if ((err < 0 && !nvalid) || err == -EINTR) {
> > > > 
> > > >  goto out;
> > > >  
> > > >  }
> > > > 
> > > > +any_err |= err;
> > > > 
> > > >  err = stat_to_qid(pdu, , );
> > > > 
> > > > -if (err < 0) {
> > > > +if (err < 0 && !nvalid) {
> > > > 
> > > >  goto out;
> > > >  
> > > >  }
> > > >  stbuf = fidst;
> > > > 
> > > > @@ -1888,20 +1890,30 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > > 
> > > >  v9fs_path_copy(, >path);
> > > >  v9fs_path_copy(, >path);
> > > > 
> > > > -for (name_idx = 0; name_idx < nwnames; name_idx++) {
> > > > +for (name_idx = 0; name_idx < nvalid; name_idx++) {
> > > > 
> > > >  if (!same_stat_id(>s->root_st, ) ||
> > > >  
> > > >  strcmp("..", wnames[name_idx].data))
> > > >  
> > > >  {
> > > >  
> > > >  stbuf = stbufs[name_idx];
> > > >  err = stat_to_qid(pdu, , );
> > > >  if (err < 0) {
> > > > 
> > > > -goto out;
> > > > +break;
> > > > 
> > > >  }
> > 

Re: [PATCH 3/6] tests/9pfs: compare QIDs in fs_walk_none() test

2022-03-11 Thread Greg Kurz
On Fri, 11 Mar 2022 17:39:56 +0100
Christian Schoenebeck  wrote:

> On Freitag, 11. März 2022 17:11:24 CET Greg Kurz wrote:
> > On Thu, 10 Mar 2022 10:04:50 +0100
> > 
> > Christian Schoenebeck  wrote:
> > > On Mittwoch, 9. März 2022 15:49:04 CET Christian Schoenebeck wrote:
> > > > Extend previously added fs_walk_none() test by comparing the QID
> > > > of the root fid with the QID of the cloned fid. They should be
> > > > equal.
> > 
> > Ha, I understand your suggestion of changing the name now :-) but I'll
> > personally leave it named according to the test scenario of "sending
> > a Twalk with no names" and checking everything that is expected in this
> > case.
> 
> NP
> 
> > > > Signed-off-by: Christian Schoenebeck 
> > > > ---
> > > > 
> > > >  tests/qtest/virtio-9p-test.c | 70 
> > > >  1 file changed, 70 insertions(+)
> > > > 
> > > > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> > > > index 6c00da03f4..9098e21173 100644
> > > > --- a/tests/qtest/virtio-9p-test.c
> > > > +++ b/tests/qtest/virtio-9p-test.c
> > > > @@ -146,6 +146,11 @@ static void v9fs_uint16_read(P9Req *req, uint16_t
> > > > *val) le16_to_cpus(val);
> > > > 
> > > >  }
> > > > 
> > > > +static void v9fs_int16_read(P9Req *req, int16_t *val)
> > > > +{
> > > > +v9fs_uint16_read(req, (uint16_t *)val);
> > > > +}
> > > > +
> > > > 
> > > >  static void v9fs_uint32_write(P9Req *req, uint32_t val)
> > > >  {
> > > >  
> > > >  uint32_t le_val = cpu_to_le32(val);
> > > > 
> > > > @@ -166,12 +171,22 @@ static void v9fs_uint32_read(P9Req *req, uint32_t
> > > > *val) le32_to_cpus(val);
> > > > 
> > > >  }
> > > > 
> > > > +static void v9fs_int32_read(P9Req *req, int32_t *val)
> > > > +{
> > > > +v9fs_uint32_read(req, (uint32_t *)val);
> > > > +}
> > > > +
> > > > 
> > > >  static void v9fs_uint64_read(P9Req *req, uint64_t *val)
> > > >  {
> > > >  
> > > >  v9fs_memread(req, val, 8);
> > > >  le64_to_cpus(val);
> > > >  
> > > >  }
> > > > 
> > > > +static void v9fs_int64_read(P9Req *req, int64_t *val)
> > > > +{
> > > > +v9fs_uint64_read(req, (uint64_t *)val);
> > > > +}
> > > > +
> > > > 
> > > >  /* len[2] string[len] */
> > > >  static uint16_t v9fs_string_size(const char *string)
> > > >  {
> > > > 
> > > > @@ -425,6 +440,40 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid,
> > > > v9fs_qid **wqid) v9fs_req_free(req);
> > > > 
> > > >  }
> > > > 
> > > > +/* size[4] Tstat tag[2] fid[4] */
> > > > +static P9Req *v9fs_tstat(QVirtio9P *v9p, uint32_t fid, uint16_t tag)
> > 
> > Tstat/Rstat aren't part of 9p2000.L, you should use Tgetattr/Rgetattr
> > instead (see https://github.com/chaos/diod/blob/master/protocol.md).
> 
> Ah right, I forgot.
> 
> > > > +{
> > > > +P9Req *req;
> > > > +
> > > > +req = v9fs_req_init(v9p, 4, P9_TSTAT, tag);
> > > > +v9fs_uint32_write(req, fid);
> > > > +v9fs_req_send(req);
> > > > +return req;
> > > > +}
> > > > +
> > > > +/* size[4] Rstat tag[2] stat[n] */
> > > > +static void v9fs_rstat(P9Req *req, struct V9fsStat *st)
> > > > +{
> > > > +v9fs_req_recv(req, P9_RSTAT);
> > > > +
> > 
> > For the records, this is a stat[n], i.e. "n[2] followed by n bytes of
> > data forming the parameter", so you should read an uint16_t first.
> > 
> > > > +v9fs_int16_read(req, >size);
> 
> Which I did here? --^
> 

>From the BUGS section of 
>https://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 :

 BUGS
  To make the contents of a directory, such as returned by
  read(5), easy to parse, each directory entry begins with a
  size field.  For consistency, the entries in Twstat and
  Rstat messages also contain their size, which means the size
  
  appears twice.  For example, the Rstat message is formatted
  ^
  as ``(4+1+2+2+n)[4] Rstat tag[2] n[2] (n-2)[2] type[2]
  dev[4]...,'' where n is the value returned by Styx->packdir.

I realized that when giving a try to convert a v9fs_qid to a V9fsQID on
top of this patch.

> > > > +v9fs_int16_read(req, >type);
> > > > +v9fs_int32_read(req, >dev);
> > > > +v9fs_uint8_read(req, >qid.type);
> > > > +v9fs_uint32_read(req, >qid.version);
> > > > +v9fs_uint64_read(req, >qid.path);
> > > > +v9fs_int32_read(req, >mode);
> > > > +v9fs_int32_read(req, >mtime);
> > > > +v9fs_int32_read(req, >atime);
> > > > +v9fs_int64_read(req, >length);
> > > > +v9fs_string_read(req, >name.size, >name.data);
> > > > +v9fs_string_read(req, >uid.size, >uid.data);
> > > > +v9fs_string_read(req, >gid.size, >gid.data);
> > > > +v9fs_string_read(req, >muid.size, >muid.data);
> > > > +
> > > > +v9fs_req_free(req);
> > > > +}
> > > > +
> > > > 
> > > >  /* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
> > > >  static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t
> > > >  

Re: [PATCH V7 11/29] qapi: list utility functions

2022-03-11 Thread Steven Sistare
On 3/9/2022 9:11 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Dec 22, 2021 at 11:42 PM Steve Sistare  > wrote:
> 
> Generalize strList_from_comma_list() to take any delimiter character, 
> rename
> as strList_from_string(), and move it to qapi/util.c.  Also add
> strv_from_strList() and QAPI_LIST_LENGTH().
> 
> Looks like you could easily split, and add some tests.

Will do.  
I don't see any tests that include qapi/util.h, so this will be a new test file.

For the split, how about:
  patch: qapi: strList_from_string
  patch: qapi: strv_from_strList
  patch: qapi: QAPI_LIST_LENGTH
  patch: qapi: unit tests for lists

Or do you prefer that unit tests be pushed with each function's patch?

> No functional change.
> 
> Signed-off-by: Steve Sistare  >
> ---
>  include/qapi/util.h | 28 
>  monitor/hmp-cmds.c  | 29 ++---
>  qapi/qapi-util.c    | 37 +
>  3 files changed, 67 insertions(+), 27 deletions(-)
> 
> diff --git a/include/qapi/util.h b/include/qapi/util.h
> index 81a2b13..c249108 100644
> --- a/include/qapi/util.h
> +++ b/include/qapi/util.h
> @@ -22,6 +22,8 @@ typedef struct QEnumLookup {
>      const int size;
>  } QEnumLookup;
> 
> +struct strList;
> +
>  const char *qapi_enum_lookup(const QEnumLookup *lookup, int val);
>  int qapi_enum_parse(const QEnumLookup *lookup, const char *buf,
>                      int def, Error **errp);
> @@ -31,6 +33,19 @@ bool qapi_bool_parse(const char *name, const char 
> *value, bool *obj,
>  int parse_qapi_name(const char *name, bool complete);
> 
>  /*
> + * Produce and return a NULL-terminated array of strings from @args.
> + * All strings are g_strdup'd.
> + */
> +char **strv_from_strList(const struct strList *args);
> 
> +
> 
> I'd suggest to use the dedicated glib type GStrv

Will do, here and in related code.

- Steve

> +/*
> + * Produce a strList from the character delimited string @in.
> + * All strings are g_strdup'd.
> + * A NULL or empty input string returns NULL.
> + */
> +struct strList *strList_from_string(const char *in, char delim);
> +
> +/*
>   * For any GenericList @list, insert @element at the front.
>   *
>   * Note that this macro evaluates @element exactly once, so it is safe
> @@ -56,4 +71,17 @@ int parse_qapi_name(const char *name, bool complete);
>      (tail) = &(*(tail))->next; \
>  } while (0)
> 
> +/*
> + * For any GenericList @list, return its length.
> + */
> +#define QAPI_LIST_LENGTH(list) \
> +    ({ \
> +        int len = 0; \
> +        typeof(list) elem; \
> +        for (elem = list; elem != NULL; elem = elem->next) { \
> +            len++; \
> +        } \
> +        len; \
> +    })
> +
>  #endif
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index b8c22da..5ca8b4b 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -43,6 +43,7 @@
>  #include "qapi/qapi-commands-run-state.h"
>  #include "qapi/qapi-commands-tpm.h"
>  #include "qapi/qapi-commands-ui.h"
> +#include "qapi/util.h"
>  #include "qapi/qapi-visit-net.h"
>  #include "qapi/qapi-visit-migration.h"
>  #include "qapi/qmp/qdict.h"
> @@ -70,32 +71,6 @@ bool hmp_handle_error(Monitor *mon, Error *err)
>      return false;
>  }
> 
> -/*
> - * Produce a strList from a comma separated list.
> - * A NULL or empty input string return NULL.
> - */
> -static strList *strList_from_comma_list(const char *in)
> -{
> -    strList *res = NULL;
> -    strList **tail = 
> -
> -    while (in && in[0]) {
> -        char *comma = strchr(in, ',');
> -        char *value;
> -
> -        if (comma) {
> -            value = g_strndup(in, comma - in);
> -            in = comma + 1; /* skip the , */
> -        } else {
> -            value = g_strdup(in);
> -            in = NULL;
> -        }
> -        QAPI_LIST_APPEND(tail, value);
> -    }
> -
> -    return res;
> -}
> -
>  void hmp_info_name(Monitor *mon, const QDict *qdict)
>  {
>      NameInfo *info;
> @@ -1103,7 +1078,7 @@ void hmp_announce_self(Monitor *mon, const QDict 
> *qdict)
>                                              migrate_announce_params());
> 
>      qapi_free_strList(params->interfaces);
> -    params->interfaces = strList_from_comma_list(interfaces_str);
> +    params->interfaces = strList_from_string(interfaces_str, ',');
>      params->has_interfaces = params->interfaces != NULL;
>      params->id = g_strdup(id);
>      params->has_id = !!params->id;
> diff --git a/qapi/qapi-util.c b/qapi/qapi-util.c
> 

Re: [PATCH] tests/avocado: starts PhoneServer upfront

2022-03-11 Thread Beraldo Leal
On Fri, Mar 11, 2022 at 11:18:38AM -0500, Cleber Rosa wrote:
> 
> Beraldo Leal  writes:
> 
> > On Fri, Mar 11, 2022 at 09:28:24AM -0500, Cleber Rosa wrote:
> >> 
> >> Beraldo Leal  writes:
> >> 
> >> > Race conditions can happen with the current code, because the port that
> >> > was available might not be anymore by the time the server is started.
> >> >
> >> > By setting the port to 0, PhoneServer it will use the OS default
> >> > behavior to get a free port, then we save this information so we can
> >> > later configure the guest.
> >> >
> >> > Suggested-by: Daniel P. Berrangé 
> >> > Signed-off-by: Beraldo Leal 
> >> > ---
> >> >  tests/avocado/avocado_qemu/__init__.py | 13 -
> >> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/tests/avocado/avocado_qemu/__init__.py 
> >> > b/tests/avocado/avocado_qemu/__init__.py
> >> > index 9b056b5ce5..e830d04b84 100644
> >> > --- a/tests/avocado/avocado_qemu/__init__.py
> >> > +++ b/tests/avocado/avocado_qemu/__init__.py
> >> > @@ -602,9 +602,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
> >> >  self.log.info('Preparing cloudinit image')
> >> >  try:
> >> >  cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
> >> > -self.phone_home_port = network.find_free_port()
> >> > -if not self.phone_home_port:
> >> > -self.cancel('Failed to get a free port')
> >> > +if not self.phone_server:
> >> > +self.cancel('Failed to get port used by the 
> >> > PhoneServer.')
> >> 
> >> Can you think of a condition where `self.phone_server` would not
> >> evaluate to True?  `network.find_free_port()` could return None, so this
> >> check was valid.  But now with `cloudinit.PhoneHomeServer`, I can not
> >> see how we'd end up with a similar condition.  Instantiating
> >> `cloudinit.PhoneHomeServer` where a port can not be alloccated, AFAICT,
> >> would raise a socket exception instead.
> >
> > Since this is a public method and could be called anytime before
> > set_up_cloudinit(), I decided to keep the check just for safety reasons.
> > Ideally, I would prefer not to have this dependency and add a new
> > argument, but I didn't want to change the method signature since it
> > would be required.
> >
> 
> I'm not sure I follow your point.  Let me try to rephrase mine, in case
> I failed to communicate it: I can't see how "if not self.phone_server"
> is a valid check given that it will either:
> 
> * Contain an instance with a port that is already allocated, OR
> * Not get assigned if cloudinit.PhoneHomeServer() fails (and raises an
>   exception).

You are right, makes sense. I will fix with a v2.

Thanks
Beraldo




Re: [PATCH V7 12/29] vl: helper to request re-exec

2022-03-11 Thread Steven Sistare
On 3/9/2022 9:16 AM, Marc-André Lureau wrote:
> On Wed, Dec 22, 2021 at 11:52 PM Steve Sistare  > wrote:
> 
> Add a qemu_system_exec_request() hook that causes the main loop to exit 
> and
> re-exec qemu using the specified arguments.
> 
> Signed-off-by: Steve Sistare  >
> ---
>  include/sysemu/runstate.h |  1 +
>  softmmu/runstate.c        | 21 +
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index b655c7b..198211b 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -57,6 +57,7 @@ void qemu_system_wakeup_enable(WakeupReason reason, 
> bool enabled);
>  void qemu_register_wakeup_notifier(Notifier *notifier);
>  void qemu_register_wakeup_support(void);
>  void qemu_system_shutdown_request(ShutdownCause reason);
> +void qemu_system_exec_request(const strList *args);
>  void qemu_system_powerdown_request(void);
>  void qemu_register_powerdown_notifier(Notifier *notifier);
>  void qemu_register_shutdown_notifier(Notifier *notifier);
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index 3d344c9..309a4bf 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -38,6 +38,7 @@
>  #include "monitor/monitor.h"
>  #include "net/net.h"
>  #include "net/vhost_net.h"
> +#include "qapi/util.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-run-state.h"
>  #include "qapi/qapi-events-run-state.h"
> @@ -355,6 +356,7 @@ static NotifierList wakeup_notifiers =
>  static NotifierList shutdown_notifiers =
>      NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
>  static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
> +static char **exec_argv;
> 
>  ShutdownCause qemu_shutdown_requested_get(void)
>  {
> @@ -371,6 +373,11 @@ static int qemu_shutdown_requested(void)
>      return qatomic_xchg(_requested, SHUTDOWN_CAUSE_NONE);
>  }
> 
> +static int qemu_exec_requested(void)
> +{
> +    return exec_argv != NULL;
> +}
> +
>  static void qemu_kill_report(void)
>  {
>      if (!qtest_driver() && shutdown_signal) {
> @@ -641,6 +648,13 @@ void qemu_system_shutdown_request(ShutdownCause 
> reason)
>      qemu_notify_event();
>  }
> 
> +void qemu_system_exec_request(const strList *args)
> +{
> +    exec_argv = strv_from_strList(args);
> 
> 
> I would rather make it take a GStrv, since that's what it actually uses.
> 
> I would also check if argv[0] is set (or document the expected behaviour).

Will do, thanks.

- Steve

> +    shutdown_requested = 1;
> +    qemu_notify_event();
> +}
> +
>  static void qemu_system_powerdown(void)
>  {
>      qapi_event_send_powerdown();
> @@ -689,6 +703,13 @@ static bool main_loop_should_exit(void)
>      }
>      request = qemu_shutdown_requested();
>      if (request) {
> +
> +        if (qemu_exec_requested()) {
> +            execvp(exec_argv[0], exec_argv);
> +            error_report("execvp %s failed: %s", exec_argv[0], 
> strerror(errno));
> +            g_strfreev(exec_argv);
> +            exec_argv = NULL;
> +        }
>          qemu_kill_report();
>          qemu_system_shutdown(request);
>          if (shutdown_action == SHUTDOWN_ACTION_PAUSE) {
> -- 
> 1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



Re: [PATCH 5/6] 9pfs: fix 'Twalk' to only send error if no component walked

2022-03-11 Thread Christian Schoenebeck
On Freitag, 11. März 2022 17:35:41 CET Greg Kurz wrote:
> On Thu, 10 Mar 2022 10:13:33 +0100
> 
> Christian Schoenebeck  wrote:
> > On Mittwoch, 9. März 2022 18:57:39 CET Christian Schoenebeck wrote:
> > > Current implementation of 'Twalk' request handling always sends an
> > > 'Rerror'
> > > 
> > > response if any error occured. The 9p2000 protocol spec sais though:
> > >   "
> > >   If the first element cannot be walked for any reason, Rerror is
> > >   returned.
> > >   Otherwise, the walk will return an Rwalk message containing nwqid qids
> > >   corresponding, in order, to the files that are visited by the nwqid
> > >   successful elementwise walks; nwqid is therefore either nwname or the
> > > 
> > > index of the first elementwise walk that failed.
> > > 
> > >   "
> > >   
> > >   http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor33
> > > 
> > > For that reason we are no longer leaving from an error path in function
> > > v9fs_walk(), unless really no path component could be walked
> > > successfully or if the request has been interrupted.
> > > 
> > > Local variable 'nvalid' counts and reflects the number of path
> > > components
> > > successfully processed by background I/O thread, whereas local variable
> > > 'name_idx' subsequently counts and reflects the number of path
> > > components
> > > eventually accepted successfully by 9p server controller portion.
> > > 
> > > New local variable 'any_err' is an aggregate variable reflecting whether
> > > any error occurred at all, while already existing variable 'err' only
> > > reflects the last error.
> > > 
> > > Despite QIDs being delivered to client in a more relaxed way now, it is
> > > important to note though that fid still must remain uneffacted if any
> > > error
> > 
> > Typo: should be "unaffected".
> > 
> > > occurred.
> > > 
> > > Signed-off-by: Christian Schoenebeck 
> > > ---
> > > 
> > >  hw/9pfs/9p.c | 29 +
> > >  1 file changed, 21 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > index 6cdc566866..8ccd180608 100644
> > > --- a/hw/9pfs/9p.c
> > > +++ b/hw/9pfs/9p.c
> > > @@ -1766,7 +1766,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > 
> > >  {
> > >  
> > >  int name_idx, nvalid;
> > >  g_autofree V9fsQID *qids = NULL;
> > > 
> > > -int i, err = 0;
> > > +int i, err = 0, any_err = 0;
> > > 
> > >  V9fsPath dpath, path;
> > >  P9ARRAY_REF(V9fsPath) pathes = NULL;
> > >  uint16_t nwnames;
> > > 
> > > @@ -1832,6 +1832,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > 
> > >   * driver code altogether inside the following block.
> > >   */
> > >  
> > >  v9fs_co_run_in_worker({
> > > 
> > > +nvalid = 0;
> > > 
> > >  if (v9fs_request_cancelled(pdu)) {
> > >  
> > >  err = -EINTR;
> > >  break;
> > > 
> > > @@ -1842,7 +1843,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > 
> > >  break;
> > >  
> > >  }
> > >  stbuf = fidst;
> > > 
> > > -for (nvalid = 0; nvalid < nwnames; nvalid++) {
> > > +for (; nvalid < nwnames; nvalid++) {
> > > 
> > >  if (v9fs_request_cancelled(pdu)) {
> > >  
> > >  err = -EINTR;
> > >  break;
> > > 
> > > @@ -1874,12 +1875,13 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > 
> > >  /*
> > >  
> > >   * Handle all the rest of this Twalk request on main thread ...
> > >   */
> > > 
> > > -if (err < 0) {
> > > +if ((err < 0 && !nvalid) || err == -EINTR) {
> > > 
> > >  goto out;
> > >  
> > >  }
> > > 
> > > +any_err |= err;
> > > 
> > >  err = stat_to_qid(pdu, , );
> > > 
> > > -if (err < 0) {
> > > +if (err < 0 && !nvalid) {
> > > 
> > >  goto out;
> > >  
> > >  }
> > >  stbuf = fidst;
> > > 
> > > @@ -1888,20 +1890,30 @@ static void coroutine_fn v9fs_walk(void *opaque)
> > > 
> > >  v9fs_path_copy(, >path);
> > >  v9fs_path_copy(, >path);
> > > 
> > > -for (name_idx = 0; name_idx < nwnames; name_idx++) {
> > > +for (name_idx = 0; name_idx < nvalid; name_idx++) {
> > > 
> > >  if (!same_stat_id(>s->root_st, ) ||
> > >  
> > >  strcmp("..", wnames[name_idx].data))
> > >  
> > >  {
> > >  
> > >  stbuf = stbufs[name_idx];
> > >  err = stat_to_qid(pdu, , );
> > >  if (err < 0) {
> > > 
> > > -goto out;
> > > +break;
> > > 
> > >  }
> > >  v9fs_path_copy(, [name_idx]);
> > >  v9fs_path_copy(, );
> > >  
> > >  }
> > >  memcpy([name_idx], , sizeof(qid));
> > >  
> > >  }
> > > 
> > > +any_err |= err;
> > > +if (any_err) {
> > 
> > Not sure if there is ever the case err > 0, but as we are already
> > comparing
> > for "if (err 

Re: [PATCH 3/6] tests/9pfs: compare QIDs in fs_walk_none() test

2022-03-11 Thread Christian Schoenebeck
On Freitag, 11. März 2022 17:11:24 CET Greg Kurz wrote:
> On Thu, 10 Mar 2022 10:04:50 +0100
> 
> Christian Schoenebeck  wrote:
> > On Mittwoch, 9. März 2022 15:49:04 CET Christian Schoenebeck wrote:
> > > Extend previously added fs_walk_none() test by comparing the QID
> > > of the root fid with the QID of the cloned fid. They should be
> > > equal.
> 
> Ha, I understand your suggestion of changing the name now :-) but I'll
> personally leave it named according to the test scenario of "sending
> a Twalk with no names" and checking everything that is expected in this
> case.

NP

> > > Signed-off-by: Christian Schoenebeck 
> > > ---
> > > 
> > >  tests/qtest/virtio-9p-test.c | 70 
> > >  1 file changed, 70 insertions(+)
> > > 
> > > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> > > index 6c00da03f4..9098e21173 100644
> > > --- a/tests/qtest/virtio-9p-test.c
> > > +++ b/tests/qtest/virtio-9p-test.c
> > > @@ -146,6 +146,11 @@ static void v9fs_uint16_read(P9Req *req, uint16_t
> > > *val) le16_to_cpus(val);
> > > 
> > >  }
> > > 
> > > +static void v9fs_int16_read(P9Req *req, int16_t *val)
> > > +{
> > > +v9fs_uint16_read(req, (uint16_t *)val);
> > > +}
> > > +
> > > 
> > >  static void v9fs_uint32_write(P9Req *req, uint32_t val)
> > >  {
> > >  
> > >  uint32_t le_val = cpu_to_le32(val);
> > > 
> > > @@ -166,12 +171,22 @@ static void v9fs_uint32_read(P9Req *req, uint32_t
> > > *val) le32_to_cpus(val);
> > > 
> > >  }
> > > 
> > > +static void v9fs_int32_read(P9Req *req, int32_t *val)
> > > +{
> > > +v9fs_uint32_read(req, (uint32_t *)val);
> > > +}
> > > +
> > > 
> > >  static void v9fs_uint64_read(P9Req *req, uint64_t *val)
> > >  {
> > >  
> > >  v9fs_memread(req, val, 8);
> > >  le64_to_cpus(val);
> > >  
> > >  }
> > > 
> > > +static void v9fs_int64_read(P9Req *req, int64_t *val)
> > > +{
> > > +v9fs_uint64_read(req, (uint64_t *)val);
> > > +}
> > > +
> > > 
> > >  /* len[2] string[len] */
> > >  static uint16_t v9fs_string_size(const char *string)
> > >  {
> > > 
> > > @@ -425,6 +440,40 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid,
> > > v9fs_qid **wqid) v9fs_req_free(req);
> > > 
> > >  }
> > > 
> > > +/* size[4] Tstat tag[2] fid[4] */
> > > +static P9Req *v9fs_tstat(QVirtio9P *v9p, uint32_t fid, uint16_t tag)
> 
> Tstat/Rstat aren't part of 9p2000.L, you should use Tgetattr/Rgetattr
> instead (see https://github.com/chaos/diod/blob/master/protocol.md).

Ah right, I forgot.

> > > +{
> > > +P9Req *req;
> > > +
> > > +req = v9fs_req_init(v9p, 4, P9_TSTAT, tag);
> > > +v9fs_uint32_write(req, fid);
> > > +v9fs_req_send(req);
> > > +return req;
> > > +}
> > > +
> > > +/* size[4] Rstat tag[2] stat[n] */
> > > +static void v9fs_rstat(P9Req *req, struct V9fsStat *st)
> > > +{
> > > +v9fs_req_recv(req, P9_RSTAT);
> > > +
> 
> For the records, this is a stat[n], i.e. "n[2] followed by n bytes of
> data forming the parameter", so you should read an uint16_t first.
> 
> > > +v9fs_int16_read(req, >size);

Which I did here? --^

> > > +v9fs_int16_read(req, >type);
> > > +v9fs_int32_read(req, >dev);
> > > +v9fs_uint8_read(req, >qid.type);
> > > +v9fs_uint32_read(req, >qid.version);
> > > +v9fs_uint64_read(req, >qid.path);
> > > +v9fs_int32_read(req, >mode);
> > > +v9fs_int32_read(req, >mtime);
> > > +v9fs_int32_read(req, >atime);
> > > +v9fs_int64_read(req, >length);
> > > +v9fs_string_read(req, >name.size, >name.data);
> > > +v9fs_string_read(req, >uid.size, >uid.data);
> > > +v9fs_string_read(req, >gid.size, >gid.data);
> > > +v9fs_string_read(req, >muid.size, >muid.data);
> > > +
> > > +v9fs_req_free(req);
> > > +}
> > > +
> > > 
> > >  /* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
> > >  static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t
> > >  offset,
> > >  
> > >  uint32_t count, uint16_t tag)
> > > 
> > > @@ -1009,6 +1058,8 @@ static void fs_walk_none(void *obj, void *data,
> > > QGuestAllocator *t_alloc) v9fs_qid root_qid;
> > > 
> > >  g_autofree v9fs_qid *wqid = NULL;
> > >  P9Req *req;
> > > 
> > > +struct V9fsStat st[2];
> > > +int i;
> > > 
> > >  do_version(v9p);
> > >  req = v9fs_tattach(v9p, 0, getuid(), 0);
> > > 
> > > @@ -1021,6 +1072,25 @@ static void fs_walk_none(void *obj, void *data,
> > > QGuestAllocator *t_alloc)
> > > 
> > >  /* special case: no QID is returned if nwname=0 was sent */
> > >  g_assert(wqid == NULL);
> > > 
> > > +
> > > +req = v9fs_tstat(v9p, 0, 0);
> > > +v9fs_req_wait_for_reply(req, NULL);
> > > +v9fs_rstat(req, [0]);
> > 
> > Probably stat-ing the root fid (0) should happen before sending Twalk, to
> > better counter the 1st fid (0) having become potentially mutated?
> 
> You already have the root qid from Rattach, no need to stat.

Yes, this was about easy comparison with qid.version in mind, 

Re: [PATCH 6/6] tests/9pfs: guard recent 'Twalk' behaviour fix

2022-03-11 Thread Greg Kurz
On Wed, 9 Mar 2022 19:21:18 +0100
Christian Schoenebeck  wrote:

> Previous 9p patch fixed 'Twalk' request handling, which was previously not
> behaving as specified by the 9p2000 protocol spec. This patch adds a new test
> case which guards the new 'Twalk' behaviour in question.
> 
> More specifically: it sends a 'Twalk' request where the 1st path component
> is valid, whereas the 2nd path component transmitted to server does not
> exist. The expected behaviour is that 9p server would respond by sending
> a 'Rwalk' response with exactly 1 QID (instead of 'Rlerror' response).
> 
> Signed-off-by: Christian Schoenebeck 
> ---
>  tests/qtest/virtio-9p-test.c | 42 +---
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> index 9098e21173..f29de1ca64 100644
> --- a/tests/qtest/virtio-9p-test.c
> +++ b/tests/qtest/virtio-9p-test.c
> @@ -638,8 +638,12 @@ static void do_version(QVirtio9P *v9p)
>  g_assert_cmpmem(server_version, server_len, version, strlen(version));
>  }
>  
> -/* utility function: walk to requested dir and return fid for that dir */
> -static uint32_t do_walk(QVirtio9P *v9p, const char *path)
> +/*
> + * utility function: walk to requested dir and return fid for that dir and
> + * the QIDs of server response
> + */
> +static uint32_t do_walk_rqids(QVirtio9P *v9p, const char *path, uint16_t 
> *nwqid,
> +  v9fs_qid **wqid)
>  {
>  char **wnames;
>  P9Req *req;
> @@ -649,12 +653,18 @@ static uint32_t do_walk(QVirtio9P *v9p, const char 
> *path)
>  
>  req = v9fs_twalk(v9p, 0, fid, nwnames, wnames, 0);
>  v9fs_req_wait_for_reply(req, NULL);
> -v9fs_rwalk(req, NULL, NULL);
> +v9fs_rwalk(req, nwqid, wqid);
>  
>  split_free();
>  return fid;
>  }
>  
> +/* utility function: walk to requested dir and return fid for that dir */
> +static uint32_t do_walk(QVirtio9P *v9p, const char *path)
> +{
> +return do_walk_rqids(v9p, path, NULL, NULL);
> +}
> +
>  /* utility function: walk to requested dir and expect passed error response 
> */
>  static void do_walk_expect_error(QVirtio9P *v9p, const char *path, uint32_t 
> err)
>  {
> @@ -1048,9 +1058,33 @@ static void fs_walk_nonexistent(void *obj, void *data, 
> QGuestAllocator *t_alloc)
>  alloc = t_alloc;
>  
>  do_attach(v9p);
> +/*
> + * The 9p2000 protocol spec sais: "If the first element cannot be walked

s/sais/says/ and elsewhere

LGTM.

Reviewed-by: Greg Kurz 

> + * for any reason, Rerror is returned."
> + */
>  do_walk_expect_error(v9p, "non-existent", ENOENT);
>  }
>  
> +static void fs_walk_2nd_nonexistent(void *obj, void *data,
> +QGuestAllocator *t_alloc)
> +{
> +QVirtio9P *v9p = obj;
> +alloc = t_alloc;
> +uint16_t nwqid;
> +g_autofree v9fs_qid *wqid = NULL;
> +g_autofree char *path = g_strdup_printf(
> +QTEST_V9FS_SYNTH_WALK_FILE "/non-existent", 0
> +);
> +
> +do_attach(v9p);
> +do_walk_rqids(v9p, path, , );
> +/*
> + * The 9p2000 protocol spec sais: "nwqid is therefore either nwname or 
> the
> + * index of the first elementwise walk that failed."
> + */
> +assert(nwqid == 1);
> +}
> +
>  static void fs_walk_none(void *obj, void *data, QGuestAllocator *t_alloc)
>  {
>  QVirtio9P *v9p = obj;
> @@ -1531,6 +1565,8 @@ static void register_virtio_9p_test(void)
>   fs_walk_dotdot,  );
>  qos_add_test("synth/walk/non_existent", "virtio-9p", fs_walk_nonexistent,
>);
> +qos_add_test("synth/walk/2nd_non_existent", "virtio-9p",
> + fs_walk_2nd_nonexistent, );
>  qos_add_test("synth/lopen/basic", "virtio-9p", fs_lopen,  );
>  qos_add_test("synth/write/basic", "virtio-9p", fs_write,  );
>  qos_add_test("synth/flush/success", "virtio-9p", fs_flush_success,




Re: [PATCH 5/6] 9pfs: fix 'Twalk' to only send error if no component walked

2022-03-11 Thread Greg Kurz
On Thu, 10 Mar 2022 10:13:33 +0100
Christian Schoenebeck  wrote:

> On Mittwoch, 9. März 2022 18:57:39 CET Christian Schoenebeck wrote:
> > Current implementation of 'Twalk' request handling always sends an 'Rerror'
> > response if any error occured. The 9p2000 protocol spec sais though:
> > 
> >   "
> >   If the first element cannot be walked for any reason, Rerror is returned.
> >   Otherwise, the walk will return an Rwalk message containing nwqid qids
> >   corresponding, in order, to the files that are visited by the nwqid
> >   successful elementwise walks; nwqid is therefore either nwname or the
> > index of the first elementwise walk that failed.
> >   "
> > 
> >   http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor33
> > 
> > For that reason we are no longer leaving from an error path in function
> > v9fs_walk(), unless really no path component could be walked successfully or
> > if the request has been interrupted.
> > 
> > Local variable 'nvalid' counts and reflects the number of path components
> > successfully processed by background I/O thread, whereas local variable
> > 'name_idx' subsequently counts and reflects the number of path components
> > eventually accepted successfully by 9p server controller portion.
> > 
> > New local variable 'any_err' is an aggregate variable reflecting whether any
> > error occurred at all, while already existing variable 'err' only reflects
> > the last error.
> > 
> > Despite QIDs being delivered to client in a more relaxed way now, it is
> > important to note though that fid still must remain uneffacted if any error
> 
> Typo: should be "unaffected".
> 
> > occurred.
> > 
> > Signed-off-by: Christian Schoenebeck 
> > ---
> >  hw/9pfs/9p.c | 29 +
> >  1 file changed, 21 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index 6cdc566866..8ccd180608 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -1766,7 +1766,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> >  {
> >  int name_idx, nvalid;
> >  g_autofree V9fsQID *qids = NULL;
> > -int i, err = 0;
> > +int i, err = 0, any_err = 0;
> >  V9fsPath dpath, path;
> >  P9ARRAY_REF(V9fsPath) pathes = NULL;
> >  uint16_t nwnames;
> > @@ -1832,6 +1832,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> >   * driver code altogether inside the following block.
> >   */
> >  v9fs_co_run_in_worker({
> > +nvalid = 0;
> >  if (v9fs_request_cancelled(pdu)) {
> >  err = -EINTR;
> >  break;
> > @@ -1842,7 +1843,7 @@ static void coroutine_fn v9fs_walk(void *opaque)
> >  break;
> >  }
> >  stbuf = fidst;
> > -for (nvalid = 0; nvalid < nwnames; nvalid++) {
> > +for (; nvalid < nwnames; nvalid++) {
> >  if (v9fs_request_cancelled(pdu)) {
> >  err = -EINTR;
> >  break;
> > @@ -1874,12 +1875,13 @@ static void coroutine_fn v9fs_walk(void *opaque)
> >  /*
> >   * Handle all the rest of this Twalk request on main thread ...
> >   */
> > -if (err < 0) {
> > +if ((err < 0 && !nvalid) || err == -EINTR) {
> >  goto out;
> >  }
> > 
> > +any_err |= err;
> >  err = stat_to_qid(pdu, , );
> > -if (err < 0) {
> > +if (err < 0 && !nvalid) {
> >  goto out;
> >  }
> >  stbuf = fidst;
> > @@ -1888,20 +1890,30 @@ static void coroutine_fn v9fs_walk(void *opaque)
> >  v9fs_path_copy(, >path);
> >  v9fs_path_copy(, >path);
> > 
> > -for (name_idx = 0; name_idx < nwnames; name_idx++) {
> > +for (name_idx = 0; name_idx < nvalid; name_idx++) {
> >  if (!same_stat_id(>s->root_st, ) ||
> >  strcmp("..", wnames[name_idx].data))
> >  {
> >  stbuf = stbufs[name_idx];
> >  err = stat_to_qid(pdu, , );
> >  if (err < 0) {
> > -goto out;
> > +break;
> >  }
> >  v9fs_path_copy(, [name_idx]);
> >  v9fs_path_copy(, );
> >  }
> >  memcpy([name_idx], , sizeof(qid));
> >  }
> > +any_err |= err;
> > +if (any_err) {
> 
> 
> Not sure if there is ever the case err > 0, but as we are already comparing 
> for "if (err < 0)" everywhere, we should probably also do the same comparison 
> for the aggregate error variable here, right?
> 

It seems that you could drop any_err and just check name_idx != nwnames ?

> if (any_err < 0) {
> ...
> 
> > +if (!name_idx) {
> > +/* don't send any QIDs, send Rlerror instead */
> > +goto out;
> > +} else {
> > +/* send QIDs (not Rlerror), but fid MUST remain unaffected */
> > +goto send_qids;
> > +}
> > +}
> >  if (fid == newfid) {
> >  if (fidp->fid_type != P9_FID_NONE) {
> >  err = -EINVAL;
> > @@ -1919,8 +1931,9 @@ static void coroutine_fn 

Re: [PATCH] tests/avocado: starts PhoneServer upfront

2022-03-11 Thread Cleber Rosa


Beraldo Leal  writes:

> On Fri, Mar 11, 2022 at 09:28:24AM -0500, Cleber Rosa wrote:
>> 
>> Beraldo Leal  writes:
>> 
>> > Race conditions can happen with the current code, because the port that
>> > was available might not be anymore by the time the server is started.
>> >
>> > By setting the port to 0, PhoneServer it will use the OS default
>> > behavior to get a free port, then we save this information so we can
>> > later configure the guest.
>> >
>> > Suggested-by: Daniel P. Berrangé 
>> > Signed-off-by: Beraldo Leal 
>> > ---
>> >  tests/avocado/avocado_qemu/__init__.py | 13 -
>> >  1 file changed, 8 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/tests/avocado/avocado_qemu/__init__.py 
>> > b/tests/avocado/avocado_qemu/__init__.py
>> > index 9b056b5ce5..e830d04b84 100644
>> > --- a/tests/avocado/avocado_qemu/__init__.py
>> > +++ b/tests/avocado/avocado_qemu/__init__.py
>> > @@ -602,9 +602,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
>> >  self.log.info('Preparing cloudinit image')
>> >  try:
>> >  cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
>> > -self.phone_home_port = network.find_free_port()
>> > -if not self.phone_home_port:
>> > -self.cancel('Failed to get a free port')
>> > +if not self.phone_server:
>> > +self.cancel('Failed to get port used by the PhoneServer.')
>> 
>> Can you think of a condition where `self.phone_server` would not
>> evaluate to True?  `network.find_free_port()` could return None, so this
>> check was valid.  But now with `cloudinit.PhoneHomeServer`, I can not
>> see how we'd end up with a similar condition.  Instantiating
>> `cloudinit.PhoneHomeServer` where a port can not be alloccated, AFAICT,
>> would raise a socket exception instead.
>
> Since this is a public method and could be called anytime before
> set_up_cloudinit(), I decided to keep the check just for safety reasons.
> Ideally, I would prefer not to have this dependency and add a new
> argument, but I didn't want to change the method signature since it
> would be required.
>

I'm not sure I follow your point.  Let me try to rephrase mine, in case
I failed to communicate it: I can't see how "if not self.phone_server"
is a valid check given that it will either:

* Contain an instance with a port that is already allocated, OR
* Not get assigned if cloudinit.PhoneHomeServer() fails (and raises an
  exception).

Instead of this check, it'd make sense to have a try/except block
protecting the PhoneHomeServer instantiation, and canceling the test if
it fails.

Or maybe you meant to check for self.phone_server.server_port instead?

Cheers,
- Cleber.




Re: [PATCH V7 19/29] vfio-pci: cpr part 1 (fd and dma)

2022-03-11 Thread Steven Sistare
On 3/10/2022 5:30 PM, Alex Williamson wrote:
> On Thu, 10 Mar 2022 14:55:50 -0500
> Steven Sistare  wrote:
> 
>> On 3/10/2022 1:35 PM, Alex Williamson wrote:
>>> On Thu, 10 Mar 2022 10:00:29 -0500
>>> Steven Sistare  wrote:
>>>   
 On 3/7/2022 5:16 PM, Alex Williamson wrote:  
> On Wed, 22 Dec 2021 11:05:24 -0800
> Steve Sistare  wrote:  
> [...]
>> diff --git a/migration/cpr.c b/migration/cpr.c
>> index 37eca66..cee82cf 100644
>> --- a/migration/cpr.c
>> +++ b/migration/cpr.c
>> @@ -7,6 +7,7 @@
>>  
>>  #include "qemu/osdep.h"
>>  #include "exec/memory.h"
>> +#include "hw/vfio/vfio-common.h"
>>  #include "io/channel-buffer.h"
>>  #include "io/channel-file.h"
>>  #include "migration.h"
>> @@ -101,7 +102,9 @@ void qmp_cpr_exec(strList *args, Error **errp)
>>  error_setg(errp, "cpr-exec requires cpr-save with restart 
>> mode");
>>  return;
>>  }
>> -
>> +if (cpr_vfio_save(errp)) {
>> +return;
>> +}
>
> Why is vfio so unique that it needs separate handlers versus other
> devices?  Thanks,

 In earlier patches these functions fiddled with more objects, but at this 
 point
 they are simple enough to convert to pre_save and post_load vmstate 
 handlers for
 the container and group objects.  However, we would still need to call 
 special 
 functons for vfio from qmp_cpr_exec:

   * validate all containers support CPR before we start blasting vaddrs
 However, I could validate all, in every call to pre_save for each 
 container.
 That would be less efficient, but fits the vmstate model.  
>>>
>>> Would it be a better option to mirror the migration blocker support, ie.
>>> any device that doesn't support cpr registers a blocker and generic
>>> code only needs to keep track of whether any blockers are registered.  
>>
>> We cannot specifically use migrate_add_blocker(), because it is checked in
>> the migration specific function migrate_prepare(), in a layer of functions 
>> above the simpler qemu_save_device_state() used in cpr.  But yes, we could
>> do something similar for vfio.  Increment a global counter in vfio_realize
>> if the container does not support cpr, and decrement it when the container is
>> destroyed.  pre_save could just check the counter.
> 
> Right, not suggesting to piggyback on migrate_add_blocker() only to use
> a similar mechanism.  Only drivers that can't support cpr need register
> a blocker but testing for blockers is done generically, not just for
> vfio devices.
> 
   * restore all vaddr's if qemu_save_device_state fails.
 However, I could recover for all containers inside pre_save when one 
 container fails.
 Feels strange touching all objects in a function for one, but there is 
 no real
 downside.  
>>>
>>> I'm not as familiar as I should be with migration callbacks, thanks to
>>> mostly not supporting it for vfio devices, but it seems strange to me
>>> that there's no existing callback or notifier per device to propagate
>>> save failure.  Do we not at least get some sort of resume callback in
>>> that case?  
>>
>> We do not:
>> struct VMStateDescription {
>> int (*pre_load)(void *opaque);
>> int (*post_load)(void *opaque, int version_id);
>> int (*pre_save)(void *opaque);
>> int (*post_save)(void *opaque);
>>
>> The handler returns an error, which stops further saves and is propagated 
>> back
>> to the top level caller qemu_save_device_state().
>>
>> The vast majority of handlers do not have side effects, with no need to 
>> unwind 
>> anything on failure.
>>
>> This raises another point.  If pre_save succeeds for all the containers,
>> but fails for some non-vfio object, then the overall operation is abandoned,
>> but we do not restore the vaddr's.  To plug that hole, we need to call the
>> unwind code from qmp_cpr_save, or implement your alternative below.
> 
> We're trying to reuse migration interfaces, are we also triggering
> migration state change notifiers?  ie.
> MIGRATION_STATUS_{CANCELLING,CANCELLED,FAILED}  

No. That happens in the migration layer which we do not use.

> We already hook vfio
> devices supporting migration into that notifier to tell the driver to
> move the device back to the running state on failure, which seems a bit
> unique to vfio devices.  Containers could maybe register their own
> callbacks.
> 
>>> As an alternative, maybe each container could register a vm change
>>> handler that would trigger reloading vaddrs if we move to a running
>>> state and a flag on the container indicates vaddrs were invalidated?
>>> Thanks,  
>>
>> That works and is modular, but I dislike that it adds checks on the
>> happy path for a case that will rarely happen, and it pushes recovery from
>> failure further away from the original failure, which would make debugging
>> cascading 

Re: [PATCH 4/6] 9pfs: refactor 'name_idx' -> 'nvalid' in v9fs_walk()

2022-03-11 Thread Greg Kurz
On Thu, 10 Mar 2022 10:07:04 +0100
Christian Schoenebeck  wrote:

> On Mittwoch, 9. März 2022 18:12:17 CET Christian Schoenebeck wrote:
> > The local variable 'name_idx' is used in two loops in function v9fs_walk().
> > Let the first loop use its own variable 'nvalid' instead, which we will use
> > in subsequent patches as the number of (requested) path components
> > successfully retrieved/walked by background I/O thread.

I think walked is clear enough.

> > 
> > Signed-off-by: Christian Schoenebeck 
> > ---
> >  hw/9pfs/9p.c | 16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index a6d6b3f835..6cdc566866 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -1764,7 +1764,7 @@ static bool same_stat_id(const struct stat *a, const
> > struct stat *b)
> > 
> >  static void coroutine_fn v9fs_walk(void *opaque)
> >  {
> > -int name_idx;
> > +int name_idx, nvalid;
> 
> Or rather renaming this nvalid -> nfetched?
> 

or simply nwalked ?

Anyway,

Reviewed-by: Greg Kurz 

> >  g_autofree V9fsQID *qids = NULL;
> >  int i, err = 0;
> >  V9fsPath dpath, path;
> > @@ -1842,17 +1842,17 @@ static void coroutine_fn v9fs_walk(void *opaque)
> >  break;
> >  }
> >  stbuf = fidst;
> > -for (name_idx = 0; name_idx < nwnames; name_idx++) {
> > +for (nvalid = 0; nvalid < nwnames; nvalid++) {
> >  if (v9fs_request_cancelled(pdu)) {
> >  err = -EINTR;
> >  break;
> >  }
> >  if (!same_stat_id(>s->root_st, ) ||
> > -strcmp("..", wnames[name_idx].data))
> > +strcmp("..", wnames[nvalid].data))
> >  {
> >  err = s->ops->name_to_path(>ctx, ,
> > -   wnames[name_idx].data,
> > -   [name_idx]);
> > +   wnames[nvalid].data,
> > +   [nvalid]);
> >  if (err < 0) {
> >  err = -errno;
> >  break;
> > @@ -1861,13 +1861,13 @@ static void coroutine_fn v9fs_walk(void *opaque)
> >  err = -EINTR;
> >  break;
> >  }
> > -err = s->ops->lstat(>ctx, [name_idx], );
> > +err = s->ops->lstat(>ctx, [nvalid], );
> >  if (err < 0) {
> >  err = -errno;
> >  break;
> >  }
> > -stbufs[name_idx] = stbuf;
> > -v9fs_path_copy(, [name_idx]);
> > +stbufs[nvalid] = stbuf;
> > +v9fs_path_copy(, [nvalid]);
> >  }
> >  }
> >  });
> 
> 




Re: [PATCH v2 04/11] edk2: .git can be a file

2022-03-11 Thread Eric Blake
On Fri, Mar 11, 2022 at 06:37:52AM +0100, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  roms/Makefile.edk2 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 3/6] tests/9pfs: compare QIDs in fs_walk_none() test

2022-03-11 Thread Greg Kurz
On Thu, 10 Mar 2022 10:04:50 +0100
Christian Schoenebeck  wrote:

> On Mittwoch, 9. März 2022 15:49:04 CET Christian Schoenebeck wrote:
> > Extend previously added fs_walk_none() test by comparing the QID
> > of the root fid with the QID of the cloned fid. They should be
> > equal.
> > 

Ha, I understand your suggestion of changing the name now :-) but I'll
personally leave it named according to the test scenario of "sending
a Twalk with no names" and checking everything that is expected in this
case.

> > Signed-off-by: Christian Schoenebeck 
> > ---
> >  tests/qtest/virtio-9p-test.c | 70 
> >  1 file changed, 70 insertions(+)
> > 
> > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> > index 6c00da03f4..9098e21173 100644
> > --- a/tests/qtest/virtio-9p-test.c
> > +++ b/tests/qtest/virtio-9p-test.c
> > @@ -146,6 +146,11 @@ static void v9fs_uint16_read(P9Req *req, uint16_t *val)
> > le16_to_cpus(val);
> >  }
> > 
> > +static void v9fs_int16_read(P9Req *req, int16_t *val)
> > +{
> > +v9fs_uint16_read(req, (uint16_t *)val);
> > +}
> > +
> >  static void v9fs_uint32_write(P9Req *req, uint32_t val)
> >  {
> >  uint32_t le_val = cpu_to_le32(val);
> > @@ -166,12 +171,22 @@ static void v9fs_uint32_read(P9Req *req, uint32_t
> > *val) le32_to_cpus(val);
> >  }
> > 
> > +static void v9fs_int32_read(P9Req *req, int32_t *val)
> > +{
> > +v9fs_uint32_read(req, (uint32_t *)val);
> > +}
> > +
> >  static void v9fs_uint64_read(P9Req *req, uint64_t *val)
> >  {
> >  v9fs_memread(req, val, 8);
> >  le64_to_cpus(val);
> >  }
> > 
> > +static void v9fs_int64_read(P9Req *req, int64_t *val)
> > +{
> > +v9fs_uint64_read(req, (uint64_t *)val);
> > +}
> > +
> >  /* len[2] string[len] */
> >  static uint16_t v9fs_string_size(const char *string)
> >  {
> > @@ -425,6 +440,40 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid,
> > v9fs_qid **wqid) v9fs_req_free(req);
> >  }
> > 
> > +/* size[4] Tstat tag[2] fid[4] */
> > +static P9Req *v9fs_tstat(QVirtio9P *v9p, uint32_t fid, uint16_t tag)

Tstat/Rstat aren't part of 9p2000.L, you should use Tgetattr/Rgetattr
instead (see https://github.com/chaos/diod/blob/master/protocol.md).

> > +{
> > +P9Req *req;
> > +
> > +req = v9fs_req_init(v9p, 4, P9_TSTAT, tag);
> > +v9fs_uint32_write(req, fid);
> > +v9fs_req_send(req);
> > +return req;
> > +}
> > +
> > +/* size[4] Rstat tag[2] stat[n] */
> > +static void v9fs_rstat(P9Req *req, struct V9fsStat *st)
> > +{
> > +v9fs_req_recv(req, P9_RSTAT);
> > +

For the records, this is a stat[n], i.e. "n[2] followed by n bytes of
data forming the parameter", so you should read an uint16_t first.

> > +v9fs_int16_read(req, >size);
> > +v9fs_int16_read(req, >type);
> > +v9fs_int32_read(req, >dev);
> > +v9fs_uint8_read(req, >qid.type);
> > +v9fs_uint32_read(req, >qid.version);
> > +v9fs_uint64_read(req, >qid.path);
> > +v9fs_int32_read(req, >mode);
> > +v9fs_int32_read(req, >mtime);
> > +v9fs_int32_read(req, >atime);
> > +v9fs_int64_read(req, >length);
> > +v9fs_string_read(req, >name.size, >name.data);
> > +v9fs_string_read(req, >uid.size, >uid.data);
> > +v9fs_string_read(req, >gid.size, >gid.data);
> > +v9fs_string_read(req, >muid.size, >muid.data);
> > +
> > +v9fs_req_free(req);
> > +}
> > +
> >  /* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
> >  static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t offset,
> >  uint32_t count, uint16_t tag)
> > @@ -1009,6 +1058,8 @@ static void fs_walk_none(void *obj, void *data,
> > QGuestAllocator *t_alloc) v9fs_qid root_qid;
> >  g_autofree v9fs_qid *wqid = NULL;
> >  P9Req *req;
> > +struct V9fsStat st[2];
> > +int i;
> > 
> >  do_version(v9p);
> >  req = v9fs_tattach(v9p, 0, getuid(), 0);
> > @@ -1021,6 +1072,25 @@ static void fs_walk_none(void *obj, void *data,
> > QGuestAllocator *t_alloc)
> > 
> >  /* special case: no QID is returned if nwname=0 was sent */
> >  g_assert(wqid == NULL);
> > +
> > +req = v9fs_tstat(v9p, 0, 0);
> > +v9fs_req_wait_for_reply(req, NULL);
> > +v9fs_rstat(req, [0]);
> 
> Probably stat-ing the root fid (0) should happen before sending Twalk, to 
> better counter the 1st fid (0) having become potentially mutated?
> 

You already have the root qid from Rattach, no need to stat.

> > +
> > +req = v9fs_tstat(v9p, 1, 0);
> > +v9fs_req_wait_for_reply(req, NULL);
> > +v9fs_rstat(req, [1]);
> > +
> > +/* don't compare QID version for checking for file ID equalness */
> > +g_assert(st[0].qid.type == st[1].qid.type);
> > +g_assert(st[0].qid.path == st[1].qid.path);
> 
> I could add a helper function is_same_qid() for this if desired.
> 

Rgetattr provides a qid[13] like Rattach. Since we control everything,
the version bits won't change and I think is_same_qid() could be
something as simple as:

static inline 

Re: [PATCH v2 1/7] target/s390x: vxeh2: vector convert short/32b

2022-03-11 Thread David Hildenbrand
On 07.03.22 03:03, David Miller wrote:
> Signed-off-by: David Miller 
> ---
>  target/s390x/helper.h   |  4 +++
>  target/s390x/tcg/translate_vx.c.inc | 44 ++---
>  target/s390x/tcg/vec_fpu_helper.c   | 31 
>  3 files changed, 75 insertions(+), 4 deletions(-)
> 

Reviewed-by: David Hildenbrand 


-- 
Thanks,

David / dhildenb




Re: [PATCH v2 07/11] tests/acpi: update expected data files

2022-03-11 Thread Michael S. Tsirkin
On Fri, Mar 11, 2022 at 06:37:55AM +0100, Gerd Hoffmann wrote:
> The switch to edk2 RELEASE builds changes the memory layout a bit,
> resulting in a acpi table change.
> 
>  DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
>  {
>  Scope (\_SB)
>  {
>  Device (NVDR)
>  {
>  Name (_HID, "ACPI0012" /* NVDIMM Root Device */)  // _HID: 
> Hardware ID
>  Method (NCAL, 5, Serialized)
>  {
>  Local6 = MEMA /* \MEMA */
>  {
>  Return (NCAL (Arg0, Arg1, Arg2, Arg3, 0x02))
>  }
>  }
> 
>  Device (NV02)
>  {
>  Name (_ADR, 0x03)  // _ADR: Address
>  Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific 
> Method
>  {
>  Return (NCAL (Arg0, Arg1, Arg2, Arg3, 0x03))
>  }
>  }
>  }
>  }
> 
> -Name (MEMA, 0x43D1)
> +Name (MEMA, 0x43F5)
>  }
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Michael S. Tsirkin 

> ---
>  tests/data/acpi/virt/SSDT.memhp | Bin 736 -> 736 bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/tests/data/acpi/virt/SSDT.memhp b/tests/data/acpi/virt/SSDT.memhp
> index 
> 375d7b6fc85a484f492a26ccd355c205f2c34473..4c363a6d95a7e2e826568c85f5719127748e7932
>  100644
> GIT binary patch
> delta 22
> dcmaFB`hb-yIM^lR0TTlQqx43uD@;sZodHo~2HXGu
> 
> delta 22
> dcmaFB`hb-yIM^lR0TTlQqy0v%D@;rmodHrj2HXGu
> 
> -- 
> 2.35.1




Re: [RFC PATCH] docs/devel: try and impose some organisation

2022-03-11 Thread Kashyap Chamarthy
On Wed, Mar 09, 2022 at 01:53:55PM +, Alex Bennée wrote:
> We have a growing set of developer docs but the index is currently in
> order of when stuff was added. Try and make things a bit easier to
> find my adding sub indexes and organising into themes.
> 
> Signed-off-by: Alex Bennée 
> ---
>  docs/devel/index-api.rst   | 15 +++
>  docs/devel/index-build.rst | 20 +++
>  docs/devel/index-internals.rst | 22 
>  docs/devel/index-process.rst   | 18 +
>  docs/devel/index-tcg.rst   | 17 +
>  docs/devel/index.rst   | 46 ++
>  6 files changed, 99 insertions(+), 39 deletions(-)
>  create mode 100644 docs/devel/index-api.rst
>  create mode 100644 docs/devel/index-build.rst
>  create mode 100644 docs/devel/index-internals.rst
>  create mode 100644 docs/devel/index-process.rst
>  create mode 100644 docs/devel/index-tcg.rst

Yeah, the sub-indexes idea looks good to me.  It's good to start this
before it gets out of hand :-)

As discussed on IRC, there seems to be some whitespace damage:

$> git describe
v6.2.0-2381-g034e818c93

$>  ~/Mail/patch-temp/cur/1647009097.180136_1.paraplu\:2\,S
Applying: docs/devel: try and impose some organisation
.git/rebase-apply/patch:61: trailing whitespace.
  
.git/rebase-apply/patch:89: trailing whitespace.
  
.git/rebase-apply/patch:113: trailing whitespace.
  
.git/rebase-apply/patch:136: trailing whitespace.
  
.git/rebase-apply/patch:61: new blank line at EOF.
+
warning: squelched 3 whitespace errors
warning: 8 lines add whitespace errors.

FWIW:

Reviewed-by: Kashyap Chamarthy 

[...]

-- 
/kashyap




Re: [PATCH v2 01/11] tests/acpi: allow virt memory hotplug changes

2022-03-11 Thread Michael S. Tsirkin
On Fri, Mar 11, 2022 at 06:37:49AM +0100, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> Reviewed-by: Alex Bennée 

Reviewed-by: Michael S. Tsirkin 

> ---
>  tests/qtest/bios-tables-test-allowed-diff.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
> b/tests/qtest/bios-tables-test-allowed-diff.h
> index dfb8523c8bf4..e569098abddc 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1 +1,2 @@
>  /* List of comma-separated changed AML files to ignore */
> +"tests/data/acpi/virt/SSDT.memhp",
> -- 
> 2.35.1




Re: [PATCH] tests/avocado: starts PhoneServer upfront

2022-03-11 Thread Beraldo Leal
On Fri, Mar 11, 2022 at 09:28:24AM -0500, Cleber Rosa wrote:
> 
> Beraldo Leal  writes:
> 
> > Race conditions can happen with the current code, because the port that
> > was available might not be anymore by the time the server is started.
> >
> > By setting the port to 0, PhoneServer it will use the OS default
> > behavior to get a free port, then we save this information so we can
> > later configure the guest.
> >
> > Suggested-by: Daniel P. Berrangé 
> > Signed-off-by: Beraldo Leal 
> > ---
> >  tests/avocado/avocado_qemu/__init__.py | 13 -
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/tests/avocado/avocado_qemu/__init__.py 
> > b/tests/avocado/avocado_qemu/__init__.py
> > index 9b056b5ce5..e830d04b84 100644
> > --- a/tests/avocado/avocado_qemu/__init__.py
> > +++ b/tests/avocado/avocado_qemu/__init__.py
> > @@ -602,9 +602,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
> >  self.log.info('Preparing cloudinit image')
> >  try:
> >  cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
> > -self.phone_home_port = network.find_free_port()
> > -if not self.phone_home_port:
> > -self.cancel('Failed to get a free port')
> > +if not self.phone_server:
> > +self.cancel('Failed to get port used by the PhoneServer.')
> 
> Can you think of a condition where `self.phone_server` would not
> evaluate to True?  `network.find_free_port()` could return None, so this
> check was valid.  But now with `cloudinit.PhoneHomeServer`, I can not
> see how we'd end up with a similar condition.  Instantiating
> `cloudinit.PhoneHomeServer` where a port can not be alloccated, AFAICT,
> would raise a socket exception instead.

Since this is a public method and could be called anytime before
set_up_cloudinit(), I decided to keep the check just for safety reasons.
Ideally, I would prefer not to have this dependency and add a new
argument, but I didn't want to change the method signature since it
would be required.

> Also, the name of the utility class is PhoneHomeServer.  Using a
> different name in the message will make cross references into the
> Avocado docs harder.
> 
> Finally, a nitpick: I'd drop the leading dot in such a test cancelation
> message.

Makes sense.

> Other than those points, the direction of those changes are indeed a
> great improvement.

Thank you all, I will also remove the unused 'network' import on a v2,
that I just notice after sending the patch.

--
Beraldo




Re: [PATCH] block: Fix BB.root changing across bdrv_next()

2022-03-11 Thread Hanna Reitz

On 01.03.22 18:39, Hanna Reitz wrote:

bdrv_next() has no guarantee that its caller has stopped all block graph
operations; for example, bdrv_flush_all() does not.

The latter can actually provoke such operations, because its
bdrv_flush() call, which runs a coroutine (bdrv_co_flush()), may run
this coroutine in a different AioContext than the main one, and then
when this coroutine is done and invokes aio_wait_kick(), the monitor may
get a chance to run and start executing some graph-modifying QMP
command.

One example for this is when the VM encounters an I/O error on a block
device and stops, triggering a bdrv_flush_all(), and a blockdev-mirror
is started simultaneously on a block node in an I/O thread.  When
bdrv_flush_all() comes to that node[1] and flushes it, the
aio_wait_kick() at the end of bdrv_co_flush_entry() may cause the
monitor to process the mirror request, and mirror_start_job() will then
replace the node by a mirror filter node, before bdrv_flush_all()
resumes and can invoke bdrv_next() again to continue iterating.

[1] Say there is a BlockBackend on top of the node in question, and so
bdrv_next() finds that BB and returns the node as the BB's blk_bs().
bdrv_next() will bdrv_ref() the node such that it remains valid through
bdrv_flush_all()'s iteration, and drop the reference when it is called
the next time.

The problem is that bdrv_next() does not store to which BDS it retains a
strong reference when the BDS is a BB's child, so on the subsequent
call, it will just invoke blk_bs() again and bdrv_unref() the returned
node -- but as the example above shows, this node might be a different
one than the one that was bdrv_ref()-ed before.  This can lead to a
use-after-free (for the mirror filter node in our example), because this
negligent bdrv_unref() would steal a strong reference from someone else.

We can solve this problem by always storing the returned (and strongly
referenced) BDS in BdrvNextIterator.bs.  When we want to drop the strong
reference of a BDS previously returned, always drop BdrvNextIterator.bs
instead of using other ways of trying to figure out what that BDS was
that we returned last time.


So a week ago, Kevin and me talked about this on IRC, and he was rather 
apprehensive of this approach, because (1) it fixes a probably 
high-level problem in one specific low-level place, and (2) it’s not 
even quite clear whether even this specific problem is really fixed.


(For (2): If bdrv_next() can cope with graph changes, then if such a 
change occurs during bdrv_flush_all(), it isn’t entirely clear whether 
we’ve truly iterated over all nodes and flushed them all.)


I’ve put a more detailed description of what I think is happening step 
by step here: https://bugzilla.redhat.com/show_bug.cgi?id=2058457#c7


So, the question came up whether we shouldn’t put bdrv_flush_all() into 
a drained section (there is a bdrv_drain_all() before, it’s just not a 
section), and ensure that no QMP commands can be executed in drained 
sections.  I fiddled around a bit, just wanting to send an extremely 
rough RFC to see whether the direction I’d be going in made any sense at 
all, but I’m not really making progress:


I wanted to basically introduce an Rwlock for QMP request processing, 
and take a read lock while we’re in a drained section. That doesn’t work 
so well, though, because when a QMP command (i.e. Rwlock is taken for a 
writer) uses drain (trying to take it as a reader), there’s a deadlock.  
I don’t really have a good idea to consolidate this, because if running 
QMP commands during drains is forbidden, then, well, a QMP command 
cannot use drain.  We’d need some way to identify that the drain is 
based in the currently running QMP command, and allow that, but I really 
don’t know how.



While looking into the STOP behavior further, something else came up.  
Summarizing part of my comment linked above, part of the problem is that 
vm_stop() runs, and concurrently the guest device requests another STOP; 
therefore, the next main loop iteration will try to STOP again.  That 
seems to be why the monitor makes some progress during one main loop 
iteration, and then the next unfortunate sequence of polling can lead to 
the monitor processing a QMP command.


So other things to consider could be whether we should ensure that the 
monitor is not in danger of processing QMP requests when 
main_loop_should_exit() runs, e.g. by polling until it’s back at the top 
of its main loop (in monitor_qmp_dispatcher_co()).


Or whether we could make qemu_system_vmstop_request() prevent double 
stop requests, by forbidding STOP requests while a previous STOP request 
has not yet been completely processed (i.e. accepting another one only 
once the VM has been stopped one time).


The simplest way to fix this very issue I’m seeing at least would be 
just to pull do_vm_stop()’s bdrv_drain_all()+bdrv_flush_all() into its 
conditional path; i.e. to only do this if the VM hadn’t been already 
stopped.  (I 

Re: [PATCH] x86: q35: require split irqchip for large CPU count

2022-03-11 Thread David Woodhouse
On Fri, 2022-03-11 at 09:39 -0500, Igor Mammedov wrote:
> if VM is started with:
> 
>-enable-kvm -smp 256
> 
> without specifying 'split' irqchip, VM might eventually boot
> but no more than 255 CPUs will be operational and following
> error messages in guest could be observed:
>...
>smpboot: native_cpu_up: bad cpu 256
>...
> It's a regression introduced by [1], which removed dependency
> on intremap=on that were implicitly requiring 'split' irqchip
> and forgot to check for 'split' irqchip.
> Instead of letting VM boot a broken VM, error out and tell
> user how to fix CLI.

Hm, wasn't that already fixed in the patches I posted in December?



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5

2022-03-11 Thread Fabiano Rosas
Daniel Henrique Barboza  writes:

> On 3/8/22 22:23, Fabiano Rosas wrote:
>> QEMU reports MMU support to the guest via the ibm,architecture-vec-5
>> property of the /chosen node. Byte number 26 specifies Radix Table
>> Expansions, currently only GTSE (Guest Translation Shootdown
>> Enable). This feature determines whether the tlbie instruction (and
>> others) are HV privileged.
>> 
>> Up until now, we always reported GTSE=1 to guests. Even after the
>> support for GTSE=0 was added. As part of that support, a kernel
>> command line radix_hcall_invalidate=on was introduced that overrides
>> the GTSE value received via CAS. So a guest can run with GTSE=0 and
>> use the H_RPT_INVALIDATE hcall instead of tlbie.
>> 
>> In this scenario, having GTSE always set to 1 by QEMU leads to a crash
>> when running nested KVM guests because KVM does not allow a nested
>> hypervisor to set GTSE support for its nested guests. So a nested
>> guest always uses the same value for LPCR_GTSE as its HV. Since the
>> nested HV disabled GTSE, but the L2 QEMU always reports GTSE=1, we run
>> into a crash when:
>> 
>> L1 LPCR_GTSE=0
>> L2 LPCR_GTSE=0
>> L2 CAS GTSE=1
>> 
>> The nested guest will run 'tlbie' and crash because the HW looks at
>> LPCR_GTSE, which is clear.
>> 
>> Having GTSE disabled in the L1 and enabled in the L2 is not an option
>> because the whole purpose of GTSE is to disallow access to tlbie and
>> we cannot allow L1 to spawn L2s that can access features that L1
>> itself cannot.
>> 
>> We also cannot have the guest check the LPCR bit, because LPCR is
>> HV-privileged.
>> 
>> So this patch goes through the most intuitive route which is to have
>> QEMU ask KVM about GTSE support and advertise the correct value to the
>> guest. A new KVM_CAP_PPC_GTSE capability is being added.
>> 
>> TCG continues to always enable GTSE.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>
>
> I'm not sure if I fully understood the situation, so let me recap. Once upon 
> a time,
> QEMU advertised GTSE=1 and the host would never advertise other value, and 
> everyone
> was happy.
>
> The host started to support GTSE=0, but QEMU kept advertising GTSE=1 
> regardless, and no
> KVM GTSE cap was added to reflect the host support. I'll then assume that:
>
>
> - all guests would break if running in a GTSE=0 host prior to rpt_invalidate 
> support (which
> is necessary to allow the guest to run in GTSE=0)
>
> - apparently no one ever tried to run a KVM guest in a GTSE=0 host, so no 
> bugs were opened

There's a slight misconception in the above statements which is the
separation of QEMU vs. the host. GTSE is advertised via CAS, so the
guest on one side and the HV on the other. QEMU is not merely
advertising what the host GTSE value is, QEMU *is* the host.

Now, of course we could have done this in a way that QEMU asked the
kernel what GTSE value to use, but since we always thought of GTSE as
required for Radix, that was would have been useless. No HV ever
reported GTSE=0 via CAS, either PowerVM or QEMU/KVM, so having the value
hardcoded in QEMU and in the kernel was never an issue.

> After commit 82123b756a1a2f1 (target/ppc: Support for H_RPT_INVALIDATE hcall) 
> we added
> cap-rpt-invalidate. I didn't follow the discussions of this cap but it seems 
> like, as with
> almost every other cap we have, there would be a migration problem for a 
> guest that was in
> a rpt_invalidate aware host to migrate to another where this wouldn't be 
> true, and the cap
> solves that.

Yes, cap-rpt-invalidate works just as we would expect. When I mentioned
to you in private about migration I meant the kernel-side change:

https://git.kernel.org/torvalds/c/bf6b7661f41615

What that change does is add a kernel cmdline option to allow the kernel
to disable GTSE even when running along with an HV that allows GTSE.

> What I'm not following is why, even after having cap-rpt-invalidate, we are 
> still "lying"
> about the GTSE=1 regardless of what the host supports. We could've added the 
> GTSE KVM cap
> at the same time rpt_invalidate was introduced, and guests that want to 
> ignore this setting
> can use the cap to bypass it.

We're still reporting GTSE=1 because that is a design decision from
Linux/KVM. The work to support GTSE=0 was just adding the support, not
deciding whether we should disable GTSE. So QEMU/kernel kept hardcoding
the value without issue.

> In the end this patch is a needed fix IMHO. My confusion is why we're doing 
> this just now.

The bug only surfaces when we run an L1 guest that decided to disable
GTSE via kernel cmdline and a nested guest on top of it. The QEMU inside
the L1 continues to force GTSE=1 as always. That is why the capability
now seem so compelling when previously it might have not.

> The patch itself LGTM.

Unfortunately, this patch as it is cannot work. We always ran with
GTSE=1 so any kernel that does not know about CAP_GTSE will report
GTSE=0 and break any guest that is older than the initial H_RPT
enablement. And the 

[PATCH] x86: q35: require split irqchip for large CPU count

2022-03-11 Thread Igor Mammedov
if VM is started with:

   -enable-kvm -smp 256

without specifying 'split' irqchip, VM might eventually boot
but no more than 255 CPUs will be operational and following
error messages in guest could be observed:
   ...
   smpboot: native_cpu_up: bad cpu 256
   ...
It's a regression introduced by [1], which removed dependency
on intremap=on that were implicitly requiring 'split' irqchip
and forgot to check for 'split' irqchip.
Instead of letting VM boot a broken VM, error out and tell
user how to fix CLI.

Fixes: c1bb5418e ("target/i386: Support up to 32768 CPUs without IRQ remapping")
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2060691
Signed-off-by: Igor Mammedov 
---
 hw/i386/pc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fd55fc725c..a612df5241 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -743,9 +743,9 @@ void pc_machine_done(Notifier *notifier, void *data)
 
 
 if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
-!kvm_irqchip_in_kernel()) {
-error_report("current -smp configuration requires kernel "
- "irqchip support.");
+!kvm_irqchip_is_split()) {
+error_report("current -smp configuration requires 'split' irqchip,"
+ "ensure that '-machine kernel-irqchip=split' is used.");
 exit(EXIT_FAILURE);
 }
 }
-- 
2.31.1




Re: [PATCH] tests/avocado: starts PhoneServer upfront

2022-03-11 Thread Cleber Rosa


Beraldo Leal  writes:

> Race conditions can happen with the current code, because the port that
> was available might not be anymore by the time the server is started.
>
> By setting the port to 0, PhoneServer it will use the OS default
> behavior to get a free port, then we save this information so we can
> later configure the guest.
>
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Beraldo Leal 
> ---
>  tests/avocado/avocado_qemu/__init__.py | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tests/avocado/avocado_qemu/__init__.py 
> b/tests/avocado/avocado_qemu/__init__.py
> index 9b056b5ce5..e830d04b84 100644
> --- a/tests/avocado/avocado_qemu/__init__.py
> +++ b/tests/avocado/avocado_qemu/__init__.py
> @@ -602,9 +602,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
>  self.log.info('Preparing cloudinit image')
>  try:
>  cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
> -self.phone_home_port = network.find_free_port()
> -if not self.phone_home_port:
> -self.cancel('Failed to get a free port')
> +if not self.phone_server:
> +self.cancel('Failed to get port used by the PhoneServer.')

Can you think of a condition where `self.phone_server` would not
evaluate to True?  `network.find_free_port()` could return None, so this
check was valid.  But now with `cloudinit.PhoneHomeServer`, I can not
see how we'd end up with a similar condition.  Instantiating
`cloudinit.PhoneHomeServer` where a port can not be alloccated, AFAICT,
would raise a socket exception instead.

Also, the name of the utility class is PhoneHomeServer.  Using a
different name in the message will make cross references into the
Avocado docs harder.

Finally, a nitpick: I'd drop the leading dot in such a test cancelation
message.

Other than those points, the direction of those changes are indeed a
great improvement.

Thanks,
- Cleber.




Re: [PATCH] tcg/arm: Don't emit UNPREDICTABLE LDRD with Rm == Rt or Rt+1

2022-03-11 Thread Alex Bennée


Richard Henderson  writes:

> The LDRD (register) instruction is UNPREDICTABLE if the Rm register
> is the same as either Rt or Rt+1 (the two registers being loaded to).
> We weren't making sure we avoided this, with the result that on some
> host CPUs like the Cortex-A7 we would get a SIGILL because the CPU
> chooses to UNDEF for this particular UNPREDICTABLE case.
>
> Since we've already checked that datalo is aligned, we can simplify
> the test vs the Rm operand by aligning it before comparison.  Check
> for the two orderings before falling back to two ldr instructions.
>
> We don't bother to do anything similar for tcg_out_ldrd_rwb(),
> because it is only used in tcg_out_tlb_read() with a fixed set of
> registers which don't overlap.
>
> There is no equivalent UNPREDICTABLE case for STRD.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/896
> Signed-off-by: Richard Henderson 

The fix looks sane to me (although I can't test because it seems my
aarch32 on the SynQuacer does try it's best). So:

Reviewed-by: Alex Bennée 

A wider question. Is this something that can be handled the constraints
done by the register allocator? I assume that avoid direct aliasing if
needed?

-- 
Alex Bennée



[RFC PATCH] mailmap/gitdm: more fixes for bad tags and authors

2022-03-11 Thread Alex Bennée
I was running some historical tags for the last 10 years and got the
following warnings:

  git log --use-mailmap --numstat --since "June 2010" | ~/src/gitdm.git/gitdm 
-n -l 5
  alar...@ddci.com is an author name, probably not what you want
  bad utf-8 ('utf-8' codec can't decode byte 0xe4 in position 552: invalid 
continuation byte) in patchm skipping
  bad utf-8 ('utf-8' codec can't decode byte 0xe4 in position 342: invalid 
continuation byte) in patchm skipping
  mich...@ozlabs.org  is an author name, probably not what you want
  Oops...funky email nicta.com.au
  bad utf-8 ('utf-8' codec can't decode byte 0xe9 in position 232: invalid 
continuation byte) in patchm skipping
  Oops...funky email andreas.faerber
  Grabbing changesets...done
  Processed 76422 csets from 1902 developers

The following fixes try and alleviate that although I still get a
warning for Aaron which I think is from 9743cd5736.

Signed-off-by: Alex Bennée 
Cc: Aaron Larson 
Cc: Andreas Färber 
Cc: Jason Wang 
Cc: Michael Ellerman 
Cc: Peter Chubb 
---
 .mailmap  | 6 ++
 contrib/gitdm/aliases | 5 -
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index 5113f55b3a..5dc168b199 100644
--- a/.mailmap
+++ b/.mailmap
@@ -28,7 +28,11 @@ Thiemo Seufer  ths 

 malc  malc 
 
 # Corrupted Author fields
+Aaron Larson  alar...@ddci.com
+Andreas Färber  Andreas Färber 
+Jason Wang  Jason Wang 
 Marek Dolata  mkdol...@us.ibm.com 
+Michael Ellerman  mich...@ozlabs.org 

 Nick Hudson  hn...@vmware.com 
 
 # There is also a:
@@ -70,6 +74,7 @@ Yongbok Kim  
 # Also list preferred name forms where people have changed their
 # git author config, or had utf8/latin1 encoding issues.
 Aaron Lindsay 
+Aaron Larson 
 Alexey Gerasimenko 
 Alex Chen 
 Alex Ivanov 
@@ -144,6 +149,7 @@ Pan Nengyuan 
 Pavel Dovgaluk 
 Pavel Dovgaluk 
 Pavel Dovgaluk 
+Peter Chubb 
 Peter Crosthwaite 
 Peter Crosthwaite 
 Peter Crosthwaite 
diff --git a/contrib/gitdm/aliases b/contrib/gitdm/aliases
index 4792413ce7..5b31635c15 100644
--- a/contrib/gitdm/aliases
+++ b/contrib/gitdm/aliases
@@ -34,8 +34,11 @@ malc@c046a42c-6fe2-441c-8c8c-71466251a162 av1...@comtv.ru
 # canonical emails
 liq...@163.com liq...@gmail.com
 
-# some broken tags
+# some broken DCO tags
 yuval.shaia.ml.gmail.com yuval.shaia...@gmail.com
+jasowang jasow...@redhat.com
+nicta.com.au peter.ch...@nicta.com.au
+alar...@ddci.com alar...@ddci.com
 
 # There is also a:
 #(no author) <(no author)@c046a42c-6fe2-441c-8c8c-71466251a162>
-- 
2.30.2




Re: [PATCH 2/6] tests/9pfs: Twalk with nwname=0

2022-03-11 Thread Christian Schoenebeck
On Freitag, 11. März 2022 12:41:32 CET Greg Kurz wrote:
> On Thu, 10 Mar 2022 09:57:25 +0100
> 
> Christian Schoenebeck  wrote:
> > On Mittwoch, 9. März 2022 14:24:24 CET Christian Schoenebeck wrote:
> > > Send Twalk request with nwname=0. In this case no QIDs should
> > > be returned by 9p server; this is equivalent to walking to dot.
> > > 
> > > Signed-off-by: Christian Schoenebeck 
> > > ---
> > > 
> > >  tests/qtest/virtio-9p-test.c | 22 ++
> > >  1 file changed, 22 insertions(+)
> > > 
> > > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> > > index 22bdd74bc1..6c00da03f4 100644
> > > --- a/tests/qtest/virtio-9p-test.c
> > > +++ b/tests/qtest/virtio-9p-test.c
> > > @@ -1002,6 +1002,27 @@ static void fs_walk_nonexistent(void *obj, void
> > > *data, QGuestAllocator *t_alloc) do_walk_expect_error(v9p,
> > > "non-existent",
> > > ENOENT);
> > > 
> > >  }
> > > 
> > > +static void fs_walk_none(void *obj, void *data, QGuestAllocator
> > > *t_alloc)
> > > +{
> > 
> > Or maybe calling this function fs_walk_clone_fid and the test case name
> > "synth/walk/clone_fid" respectively instead?
> 
> I agree that Twalk with nwname=0 does clone the fid in practice but
> the test doesn't explicitly check that. In its present form, I'd
> suggest a "no_names" wording but it is already fine as is, so:

It actually does; not with this patch 2 yet, but with patch 3 (which compares 
QIDs).

> Reviewed-by: Greg Kurz 

Thanks!

Best regards,
Christian Schoenebeck





Re: [PATCH v5 0/5] user creatable pnv-phb4 devices

2022-03-11 Thread Daniel Henrique Barboza




On 3/11/22 09:45, Cédric Le Goater wrote:

Hello,

In 3/11/22 03:18, Daniel Henrique Barboza wrote:



On 3/10/22 15:49, Thomas Huth wrote:

On 11/01/2022 14.10, Daniel Henrique Barboza wrote:

Hi,

This version implements Cedric's review suggestions from v4. No
drastic design changes were made.

Changes from v4:
- patches 1,3,5: unchanged
- patch 2:
   * renamed function to pnv_phb4_xscom_realize()
   * pnv4_phb4_xscom_realize() is now called at the end of phb4_realize()
- patch 4:
   * changed pnv_phb4_get_stack signature to use chip and phb
   * added a new helper called pnv_pec_stk_default_phb_realize() to
realize the default phb when running with defaults
- v4 link: https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg02148.html

Daniel Henrique Barboza (5):
   ppc/pnv: set phb4 properties in stk_realize()
   ppc/pnv: move PHB4 XSCOM init to phb4_realize()
   ppc/pnv: turn 'phb' into a pointer in struct PnvPhb4PecStack
   ppc/pnv: Introduce user creatable pnv-phb4 devices
   ppc/pnv: turn pnv_phb4_update_regions() into static


It's now possible to crash QEMU with the pnv-phb4 device:

$ ./qemu-system-ppc64 -nographic -M powernv9 -device pnv-phb4
Unexpected error in object_property_try_add() at 
../../devel/qemu/qom/object.c:1229:
qemu-system-ppc64: -device pnv-phb4: attempt to add duplicate property 
'pnv-phb4[0]' to object (type 'power9_v2.0-pnv-chip')
Aborted (core dumped)

Any ideas how to fix this?


Thanks for catching this up.

The issue here is that we're not handling the case where an user adds a 
pnv-phb4 device
when running default settings (no -nodefaults). With default settings we are 
adding all
pnv-phb4 devices that are available to the machine, having no room for any 
additional
user creatable pnv-phb4 devices.

A similar situation happens with the powernv8 machine which errors out with a 
different
error message:

$ ./qemu-system-ppc64 -nographic -M powernv8 -device pnv-phb3
qemu-system-ppc64: -device pnv-phb3: Can't add chassis slot, error -16


Adding all possible phbs by default is a behavior these machines had since they 
were introduced,
and I don't think we want to change it. Thus, a fix would be to forbid user 
created pnv-phb devices
when running with defaults.



On a real system with POWER{8,9,10} processors, PHBs are sub-units of
the processor, they can be deactivated by firmware but not plugged in
or out like a PCI adapter on a slot. Nevertheless, it seemed to be
good idea to have user-created PHBs for testing purposes.

Let's come back to the PowerNV requirements.

  1. having a limited set of PHBs speedups boot time.
  2. it is useful to be able to mimic a partially broken topology you
     some time have to deal with during bring-up.

PowerNV is also used for distro install tests and having libvirt
support eases these tasks. libvirt prefers to run the machine with
-nodefaults to be sure not to drag unexpected devices which would need
to be defined in the domain file without being specified on the QEMU
command line. For this reason :

  3. -nodefaults should not include default PHBs

The solution we came with was to introduce user-created PHB{3,4,5}
devices on the powernv{8,9,10} machines. Reality proves to be a bit
more complex, internally when modeling such devices, and externally
when dealing with the user interface.

So, to make sure that we don't ship a crappy feature in QEMU 7.0,
I think we should step back a little.

Req 1. and 2. can be simply addressed differently with a machine option:
"phb-mask=", which QEMU would use to enable/disable PHB device
nodes when creating the device tree.


Would this property only impact the device-tree?

Let's say that we're running a 2 socket pnv4 machine, with default settings. 
That
would give us 12 PHBs. So phb-mask= is kind of a no-op because you're adding
all PHBs, phb-mask=0001 would add just the first PHB (index=0 chip-id=0) and so
on. Is that correct?



For Req 3., we need to make sure we are taking the right approach. It
seems that we should expose a new type of user-created PHB device,
a generic virtualized one, that libvirt would use and not one depending
on the processor revision. This needs more thinking.


libvirt support isn't upstream yet. We have room to make changes.

I agree that creating generic, un-versioned pnv-phb and pnv-phb-root-port 
devices
that can be used by all pnv machines would be good for libvirt support.



Hence, I am proposing we drop user-created PHB{3,4,5} for QEMU-7.0.
All the cleanups we did are not lost and they will be useful for the
next steps in QEMU 7.1.



Seems like a good idea for now. Even if we decide to expose them in the end, if 
we
keep them user visible for the 7.0 release we won't be able to change our minds
in 7.1.


Thanks,


Daniel




Thanks,

C.





Re: [PATCH] tests/avocado: starts PhoneServer upfront

2022-03-11 Thread Daniel P . Berrangé
On Fri, Mar 11, 2022 at 10:09:19AM -0300, Beraldo Leal wrote:
> Race conditions can happen with the current code, because the port that
> was available might not be anymore by the time the server is started.
> 
> By setting the port to 0, PhoneServer it will use the OS default
> behavior to get a free port, then we save this information so we can
> later configure the guest.
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Beraldo Leal 
> ---
>  tests/avocado/avocado_qemu/__init__.py | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)

Great improvement !

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 1/3] qmp: Support for querying stats

2022-03-11 Thread Daniel P . Berrangé
On Fri, Mar 11, 2022 at 02:06:46PM +0100, Markus Armbruster wrote:
> Mark Kanda  writes:
> 
> > Introduce QMP support for querying stats. Provide a framework for adding new
> > stats and support for the following commands:
> >
> > - query-stats
> > Returns a list of all stats per target type (only VM and vCPU to start), 
> > with
> > additional options for specifying stat names, vCPU qom paths, and providers.
> >
> > - query-stats-schemas
> > Returns a list of stats included in each target type, with an option for
> > specifying the provider.
> >
> > The framework provides a method to register callbacks for these QMP 
> > commands.
> >
> > The first use-case will be for fd-based KVM stats (in an upcoming patch).
> >
> > Examples (with fd-based KVM stats):
> >
> > - Query all VM stats:
> >
> > { "execute": "query-stats", "arguments" : { "target": "vm" } }
> >
> > { "return": {
> >   "vm": [
> >  { "provider": "kvm",
> >"stats": [
> >   { "name": "max_mmu_page_hash_collisions", "value": 0 },
> >   { "name": "max_mmu_rmap_size", "value": 0 },
> >   { "name": "nx_lpage_splits", "value": 148 },
> >   ...
> >  { "provider": "xyz",
> >"stats": [ ...
> >  ...
> > ] } }
> >
> > - Query all vCPU stats:
> >
> > { "execute": "query-stats", "arguments" : { "target": "vcpu" } }
> >
> > { "return": {
> > "vcpus": [
> >   { "path": "/machine/unattached/device[0]"
> > "providers": [
> >   { "provider": "kvm",
> > "stats": [
> >   { "name": "guest_mode", "value": 0 },
> >   { "name": "directed_yield_successful", "value": 0 },
> >   { "name": "directed_yield_attempted", "value": 106 },
> >   ...
> >   { "provider": "xyz",
> > "stats": [ ...
> >...
> >   { "path": "/machine/unattached/device[1]"
> > "providers": [
> >   { "provider": "kvm",
> > "stats": [...
> >   ...
> > } ] } }
> >
> > - Query 'exits' and 'l1d_flush' KVM stats, and 'somestat' from provider 
> > 'xyz'
> > for vCPUs '/machine/unattached/device[2]' and 
> > '/machine/unattached/device[4]':
> >
> > { "execute": "query-stats",
> >   "arguments": {
> > "target": "vcpu",
> > "vcpus": [ "/machine/unattached/device[2]",
> >"/machine/unattached/device[4]" ],
> > "filters": [
> >   { "provider": "kvm",
> > "fields": [ "l1d_flush", "exits" ] },
> >   { "provider": "xyz",
> > "fields": [ "somestat" ] } ] } }
> 
> Are the stats bulky enough to justfify the extra complexity of
> filtering?

I viewed it more as a mechanism to cope with a scenario where
some stats are expensive to query. If the mgmt app only want
to get one specific cheap stat, we don't want to trigger code
that does expensive queries of other stats as a side effect.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] tests/avocado: starts PhoneServer upfront

2022-03-11 Thread Beraldo Leal
Race conditions can happen with the current code, because the port that
was available might not be anymore by the time the server is started.

By setting the port to 0, PhoneServer it will use the OS default
behavior to get a free port, then we save this information so we can
later configure the guest.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Beraldo Leal 
---
 tests/avocado/avocado_qemu/__init__.py | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py 
b/tests/avocado/avocado_qemu/__init__.py
index 9b056b5ce5..e830d04b84 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -602,9 +602,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
 self.log.info('Preparing cloudinit image')
 try:
 cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
-self.phone_home_port = network.find_free_port()
-if not self.phone_home_port:
-self.cancel('Failed to get a free port')
+if not self.phone_server:
+self.cancel('Failed to get port used by the PhoneServer.')
 pubkey_content = None
 if ssh_pubkey:
 with open(ssh_pubkey) as pubkey:
@@ -614,7 +613,7 @@ def prepare_cloudinit(self, ssh_pubkey=None):
   password=self.password,
   # QEMU's hard coded usermode router address
   phone_home_host='10.0.2.2',
-  phone_home_port=self.phone_home_port,
+  phone_home_port=self.phone_server.server_port,
   authorized_key=pubkey_content)
 except Exception:
 self.cancel('Failed to prepare the cloudinit image')
@@ -625,6 +624,8 @@ def set_up_boot(self):
 self.vm.add_args('-drive', 'file=%s' % path)
 
 def set_up_cloudinit(self, ssh_pubkey=None):
+self.phone_server = cloudinit.PhoneHomeServer(('0.0.0.0', 0),
+  self.name)
 cloudinit_iso = self.prepare_cloudinit(ssh_pubkey)
 self.vm.add_args('-drive', 'file=%s,format=raw' % cloudinit_iso)
 
@@ -635,7 +636,9 @@ def launch_and_wait(self, set_up_ssh_connection=True):
  
logger=self.log.getChild('console'))
 console_drainer.start()
 self.log.info('VM launched, waiting for boot confirmation from guest')
-cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), 
self.name)
+while not self.phone_server.instance_phoned_back:
+self.phone_server.handle_request()
+
 if set_up_ssh_connection:
 self.log.info('Setting up the SSH connection')
 self.ssh_connect(self.username, self.ssh_key)
-- 
2.31.1




Re: [PATCH v4 1/3] qmp: Support for querying stats

2022-03-11 Thread Markus Armbruster
Mark Kanda  writes:

> Introduce QMP support for querying stats. Provide a framework for adding new
> stats and support for the following commands:
>
> - query-stats
> Returns a list of all stats per target type (only VM and vCPU to start), with
> additional options for specifying stat names, vCPU qom paths, and providers.
>
> - query-stats-schemas
> Returns a list of stats included in each target type, with an option for
> specifying the provider.
>
> The framework provides a method to register callbacks for these QMP commands.
>
> The first use-case will be for fd-based KVM stats (in an upcoming patch).
>
> Examples (with fd-based KVM stats):
>
> - Query all VM stats:
>
> { "execute": "query-stats", "arguments" : { "target": "vm" } }
>
> { "return": {
>   "vm": [
>  { "provider": "kvm",
>"stats": [
>   { "name": "max_mmu_page_hash_collisions", "value": 0 },
>   { "name": "max_mmu_rmap_size", "value": 0 },
>   { "name": "nx_lpage_splits", "value": 148 },
>   ...
>  { "provider": "xyz",
>"stats": [ ...
>  ...
> ] } }
>
> - Query all vCPU stats:
>
> { "execute": "query-stats", "arguments" : { "target": "vcpu" } }
>
> { "return": {
> "vcpus": [
>   { "path": "/machine/unattached/device[0]"
> "providers": [
>   { "provider": "kvm",
> "stats": [
>   { "name": "guest_mode", "value": 0 },
>   { "name": "directed_yield_successful", "value": 0 },
>   { "name": "directed_yield_attempted", "value": 106 },
>   ...
>   { "provider": "xyz",
> "stats": [ ...
>...
>   { "path": "/machine/unattached/device[1]"
> "providers": [
>   { "provider": "kvm",
> "stats": [...
>   ...
> } ] } }
>
> - Query 'exits' and 'l1d_flush' KVM stats, and 'somestat' from provider 'xyz'
> for vCPUs '/machine/unattached/device[2]' and '/machine/unattached/device[4]':
>
> { "execute": "query-stats",
>   "arguments": {
> "target": "vcpu",
> "vcpus": [ "/machine/unattached/device[2]",
>"/machine/unattached/device[4]" ],
> "filters": [
>   { "provider": "kvm",
> "fields": [ "l1d_flush", "exits" ] },
>   { "provider": "xyz",
> "fields": [ "somestat" ] } ] } }

Are the stats bulky enough to justfify the extra complexity of
filtering?

>
> { "return": {
> "vcpus": [
>   { "path": "/machine/unattached/device[2]"
> "providers": [
>   { "provider": "kvm",
> "stats": [ { "name": "l1d_flush", "value": 41213 },
>{ "name": "exits", "value": 74291 } ] },
>   { "provider": "xyz",
> "stats": [ ... ] } ] },
>   { "path": "/machine/unattached/device[4]"
> "providers": [
>   { "provider": "kvm",
> "stats": [ { "name": "l1d_flush", "value": 16132 },
>{ "name": "exits", "value": 57922 } ] },
>   { "provider": "xyz",
> "stats": [ ... ] } ] } ] } }
>
> - Query stats schemas:
>
> { "execute": "query-stats-schemas" }
>
> { "return": {
> "vcpu": [
>   { "provider": "kvm",
> "stats": [
>{ "name": "guest_mode",
>  "unit": "none",
>  "base": 10,
>  "exponent": 0,
>  "type": "instant" },
>   { "name": "directed_yield_successful",
>  "unit": "none",
>  "base": 10,
>  "exponent": 0,
>  "type": "cumulative" },
>  ...
>   { "provider": "xyz",
> ...
>"vm": [
>   { "provider": "kvm",
> "stats": [
>{ "name": "max_mmu_page_hash_collisions",
>  "unit": "none",
>  "base": 10,
>  "exponent": 0,
>  "type": "peak" },
>   { "provider": "xyz",
>   ...

Can you give a use case for query-stats-schemas?

>
> Signed-off-by: Mark Kanda 
> ---
>  include/monitor/stats.h |  51 
>  monitor/qmp-cmds.c  | 219 +
>  qapi/meson.build|   1 +
>  qapi/qapi-schema.json   |   1 +
>  qapi/stats.json | 259 

That's a lot of schema code.

How much of it is for query-stats, and how much for query-stats-schemas?

How much of the query-stats part is for filtering?

>  5 files changed, 531 insertions(+)
>  create mode 100644 include/monitor/stats.h
>  create mode 100644 qapi/stats.json

[...]

> diff --git a/qapi/stats.json b/qapi/stats.json
> new file mode 100644
> index 00..ae5dc3ee2c
> --- /dev/null
> +++ b/qapi/stats.json
> @@ -0,0 +1,259 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +#
> +# Copyright (c) 2022 Oracle and/or its affiliates.
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or later.
> +# See the COPYING file in the top-level directory.
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +##
> +# 

Re: [PATCH 1/2] ui/cocoa: add option to disable left-command forwarding to guest

2022-03-11 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> On 18/2/22 19:55, Peter Maydell wrote:
>> On Sun, 2 Jan 2022 at 17:42, Carwyn Ellis  wrote:
>>>
>>> When switching between guest and host on a Mac using command-tab the
>>> command key is sent to the guest which can trigger functionality in the
>>> guest OS. Specifying left-command-key=off disables forwarding this key
>>> to the guest. Defaults to enabled.
>>>
>>> Also updated the cocoa display documentation to reference the new
>>> left-command-key option along with the existing show-cursor option.
>>>
>>> Signed-off-by: Carwyn Ellis 
>> Ccing the QAPI folks for review on the QAPI parts of this.
>> -- PMM
>> 
>>> ---
>>>   qapi/ui.json| 17 +
>>>   qemu-options.hx | 12 
>>>   ui/cocoa.m  |  8 +++-
>>>   3 files changed, 36 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qapi/ui.json b/qapi/ui.json
>>> index 2b4371da37..764480e145 100644
>>> --- a/qapi/ui.json
>>> +++ b/qapi/ui.json
>>> @@ -1107,6 +1107,22 @@
>>> 'data': { '*grab-on-hover' : 'bool',
>>>   '*zoom-to-fit'   : 'bool'  } }
>>>
>>> +##
>>> +# @DisplayCocoa:
>>> +#
>>> +# Cocoa display options.
>>> +#
>>> +# @left-command-key: Enable/disable forwarding of left command key to
>>> +#guest. Allows command-tab window switching on the
>>> +#host without sending this key to the guest when
>>> +#"off". Defaults to "on"
>>> +#
>>> +# Since: 6.2.50
>
> Not a QAPI folk, but LGTM using 7.0 here instead of 6.2.50 (this the
> number of the *released* QEMU version which contains this new type).

Yes.

>>> +#
>>> +##
>>> +{ 'struct'  : 'DisplayCocoa',
>>> +  'data': { '*left-command-key' : 'bool' } }
>>> +
>>>   ##
>>>   # @DisplayEGLHeadless:
>>>   #
>>> @@ -1254,6 +1270,7 @@
>>> 'discriminator' : 'type',
>>> 'data': {
>>> 'gtk': { 'type': 'DisplayGTK', 'if': 'CONFIG_GTK' },
>>> +  'cocoa': { 'type': 'DisplayCocoa', 'if': 'CONFIG_COCOA' },
>>> 'curses': { 'type': 'DisplayCurses', 'if': 'CONFIG_CURSES' },
>>> 'egl-headless': { 'type': 'DisplayEGLHeadless',
>>>   'if': { 'all': ['CONFIG_OPENGL', 'CONFIG_GBM'] } 
>>> },
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index fd1f8135fb..6fa9c38c83 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -1912,6 +1912,9 @@ DEF("display", HAS_ARG, QEMU_OPTION_display,
>>>   #if defined(CONFIG_DBUS_DISPLAY)
>>>   "-display dbus[,addr=]\n"
>>>   " [,gl=on|core|es|off][,rendernode=]\n"
>>> +#endif
>>> +#if defined(CONFIG_COCOA)
>>> +"-display cocoa[,show-cursor=on|off][,left-command-key=on|off]\n"
>>>   #endif
>>>   "-display none\n"
>>>   "select display backend type\n"
>>> @@ -1999,6 +2002,15 @@ SRST
>>>   ``charset=CP850`` for IBM CP850 encoding. The default is
>>>   ``CP437``.
>>>
>>> +``cocoa``
>>> +Display video output in a Cocoa window. Mac only. This interface
>>> +provides drop-down menus and other UI elements to configure and
>>> +control the VM during runtime. Valid parameters are:
>>> +
>>> +``show-cursor=on|off`` :  Force showing the mouse cursor
>>> +
>>> +``left-command-key=on|off`` : Disable forwarding left command key 
>>> to host
>>> +
>>>   ``egl-headless[,rendernode=]``
>>>   Offload all OpenGL operations to a local DRI device. For any
>>>   graphical display, this display needs to be paired with either

In the QAPI schema, @full-screen, @window-close and @show-cursor are
common to all display types.  Here, @full-screen is only documented with
gtk, @window-close with gtk and sdl, @show-cursor with gtk, sdl and now
cocoa.

What's going on here?

[...]




Re: [PATCH v5 0/5] user creatable pnv-phb4 devices

2022-03-11 Thread Cédric Le Goater

Hello,

In 3/11/22 03:18, Daniel Henrique Barboza wrote:



On 3/10/22 15:49, Thomas Huth wrote:

On 11/01/2022 14.10, Daniel Henrique Barboza wrote:

Hi,

This version implements Cedric's review suggestions from v4. No
drastic design changes were made.

Changes from v4:
- patches 1,3,5: unchanged
- patch 2:
   * renamed function to pnv_phb4_xscom_realize()
   * pnv4_phb4_xscom_realize() is now called at the end of phb4_realize()
- patch 4:
   * changed pnv_phb4_get_stack signature to use chip and phb
   * added a new helper called pnv_pec_stk_default_phb_realize() to
realize the default phb when running with defaults
- v4 link: https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg02148.html

Daniel Henrique Barboza (5):
   ppc/pnv: set phb4 properties in stk_realize()
   ppc/pnv: move PHB4 XSCOM init to phb4_realize()
   ppc/pnv: turn 'phb' into a pointer in struct PnvPhb4PecStack
   ppc/pnv: Introduce user creatable pnv-phb4 devices
   ppc/pnv: turn pnv_phb4_update_regions() into static


It's now possible to crash QEMU with the pnv-phb4 device:

$ ./qemu-system-ppc64 -nographic -M powernv9 -device pnv-phb4
Unexpected error in object_property_try_add() at 
../../devel/qemu/qom/object.c:1229:
qemu-system-ppc64: -device pnv-phb4: attempt to add duplicate property 
'pnv-phb4[0]' to object (type 'power9_v2.0-pnv-chip')
Aborted (core dumped)

Any ideas how to fix this?


Thanks for catching this up.

The issue here is that we're not handling the case where an user adds a 
pnv-phb4 device
when running default settings (no -nodefaults). With default settings we are 
adding all
pnv-phb4 devices that are available to the machine, having no room for any 
additional
user creatable pnv-phb4 devices.

A similar situation happens with the powernv8 machine which errors out with a 
different
error message:

$ ./qemu-system-ppc64 -nographic -M powernv8 -device pnv-phb3
qemu-system-ppc64: -device pnv-phb3: Can't add chassis slot, error -16


Adding all possible phbs by default is a behavior these machines had since they 
were introduced,
and I don't think we want to change it. Thus, a fix would be to forbid user 
created pnv-phb devices
when running with defaults.



On a real system with POWER{8,9,10} processors, PHBs are sub-units of
the processor, they can be deactivated by firmware but not plugged in
or out like a PCI adapter on a slot. Nevertheless, it seemed to be
good idea to have user-created PHBs for testing purposes.

Let's come back to the PowerNV requirements.

 1. having a limited set of PHBs speedups boot time.
 2. it is useful to be able to mimic a partially broken topology you
some time have to deal with during bring-up.

PowerNV is also used for distro install tests and having libvirt
support eases these tasks. libvirt prefers to run the machine with
-nodefaults to be sure not to drag unexpected devices which would need
to be defined in the domain file without being specified on the QEMU
command line. For this reason :

 3. -nodefaults should not include default PHBs

The solution we came with was to introduce user-created PHB{3,4,5}
devices on the powernv{8,9,10} machines. Reality proves to be a bit
more complex, internally when modeling such devices, and externally
when dealing with the user interface.

So, to make sure that we don't ship a crappy feature in QEMU 7.0,
I think we should step back a little.

Req 1. and 2. can be simply addressed differently with a machine option:
"phb-mask=", which QEMU would use to enable/disable PHB device
nodes when creating the device tree.

For Req 3., we need to make sure we are taking the right approach. It
seems that we should expose a new type of user-created PHB device,
a generic virtualized one, that libvirt would use and not one depending
on the processor revision. This needs more thinking.

Hence, I am proposing we drop user-created PHB{3,4,5} for QEMU-7.0.
All the cleanups we did are not lost and they will be useful for the
next steps in QEMU 7.1.


Thanks,

C.




Re: [PATCH v2 2/2] Added parameter to take screenshot with screendump as PNG.

2022-03-11 Thread Markus Armbruster
Kshitij Suri  writes:

> Currently screendump only supports PPM format, which is un-compressed and not
> standard. Added a "format" parameter to qemu monitor screendump capabilites
> to support PNG image capture using libpng. The param was added in QAPI schema
> of screendump present in ui.json along with png_save() function which converts
> pixman_image to PNG. HMP command equivalent was also modified to support the
> feature.
>
> Example usage:
> { "execute": "screendump", "arguments": { "filename": "/tmp/image",
> "format":"png" } }
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/718
>
> Signed-off-by: Kshitij Suri 
> ---

[...]

> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 8c384dc1b2..9a640146eb 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1677,9 +1677,26 @@ hmp_screendump(Monitor *mon, const QDict *qdict)
>  const char *filename = qdict_get_str(qdict, "filename");
>  const char *id = qdict_get_try_str(qdict, "device");
>  int64_t head = qdict_get_try_int(qdict, "head", 0);
> +const char *input_format  = qdict_get_str(qdict, "format");
>  Error *err = NULL;
> +ImageFormat format;
>  
> -qmp_screendump(filename, id != NULL, id, id != NULL, head, );
> +int val = qapi_enum_parse(_lookup, input_format, -1, );
> +if (val < 0) {
> +goto end;
> +}
> +
> +switch (val) {
> +case IMAGE_FORMAT_PNG:
> +format = IMAGE_FORMAT_PNG;
> +break;
> +default:
> +format = IMAGE_FORMAT_PPM;
> +}
> +
> +qmp_screendump(filename, id != NULL, id, id != NULL, head,
> +   input_format != NULL, format, );
> +end:
>  hmp_handle_error(mon, err);
>  }
>  
> diff --git a/qapi/ui.json b/qapi/ui.json
> index 9354f4c467..6aa0dd7c1b 100644
> --- a/qapi/ui.json
> +++ b/qapi/ui.json
> @@ -73,12 +73,27 @@
>  ##
>  { 'command': 'expire_password', 'data': {'protocol': 'str', 'time': 'str'} }
>  
> +##
> +# @ImageFormat:
> +#
> +# Available list of supported types.

This is just a hair better than "Lorem ipsum" :)

Suggest: Supported image format types.

> +#
> +# @png: PNG format
> +#
> +# @ppm: PPM format
> +#
> +# Since: 7.0
> +#
> +##
> +{ 'enum': 'ImageFormat',
> +  'data': ['ppm', 'png'] }
> +
>  ##
>  # @screendump:
>  #
> -# Write a PPM of the VGA screen to a file.
> +# Write a screenshot of the VGA screen to a file.

Is "VGA screen" accurate?  Or does this work for other displays, too?

>  #
> -# @filename: the path of a new PPM file to store the image
> +# @filename: the path of a new file to store the image
>  #
>  # @device: ID of the display device that should be dumped. If this parameter
>  #  is missing, the primary display will be used. (Since 2.12)
> @@ -87,6 +102,9 @@
>  #parameter is missing, head #0 will be used. Also note that the head
>  #can only be specified in conjunction with the device ID. (Since 
> 2.12)
>  #
> +# @format: image format for screendump is specified. ppm is the set as the
> +#  default format. (Since 7.0)

I figure you mean "is set as the default".  Suggest to replace the
sentence by "(default: ppm)".

> +#
>  # Returns: Nothing on success
>  #
>  # Since: 0.14
> @@ -99,7 +117,8 @@
>  #
>  ##
>  { 'command': 'screendump',
> -  'data': {'filename': 'str', '*device': 'str', '*head': 'int'},
> +  'data': {'filename': 'str', '*device': 'str', '*head': 'int',
> +   '*format': 'ImageFormat'},
>'coroutine': true }
>  
>  ##

[...]




Re: [PATCH v5 13/15] hw/nvme: Add support for the Virtualization Management command

2022-03-11 Thread Lukasz Maniak
On Wed, Mar 09, 2022 at 01:41:27PM +0100, Łukasz Gieryk wrote:
> On Tue, Mar 01, 2022 at 02:07:08PM +0100, Klaus Jensen wrote:
> > On Feb 17 18:45, Lukasz Maniak wrote:
> > > From: Łukasz Gieryk 
> > > 
> > > With the new command one can:
> > >  - assign flexible resources (queues, interrupts) to primary and
> > >secondary controllers,
> > >  - toggle the online/offline state of given controller.
> > > 
> > 
> > QEMU segfaults (or asserts depending on the wind blowing) if the SR-IOV
> > enabled device is hotplugged after being configured (i.e. follow the
> > docs for a simple setup and then do a `device_del ` in the
> > monitor. I suspect this is related to freeing the queues and something
> > getting double-freed.
> > 
> 
> I’ve finally found some time to look at the issue.
> 
> Long story short: the hot-plug mechanism deletes all VFs without the PF
> knowing, then PF tries to reset and delete all the already non-existing
> devices.
> 
> I have a solution for the problem, but there’s high a chance it’s not
> the correct one. I’m still reading through the specs, as my knowledge in
> the area of hot-plug/ACPI is quite limited.
> 
> Soon we will release the next patch set, with the fix included. I hope
> the ACPI maintainers will chime in then. Till that happens, this is the
> summary of my findings:
> 
> 1) The current SR-IOV implementation assumes it’s the PF that creates
>and deletes VFs.
> 2) It’s a design decision (the Nvme device at least) for the VFs to be
>of the same class as PF. Effectively, they share the dc->hotpluggable
>value.
> 3) When a VF is created, it’s added as a child node to PF’s PCI bus
>slot.
> 4) Monitor/device_del triggers the ACPI mechanism. The implementation is
>not aware of SR/IOV and ejects PF’s PCI slot, directly unrealizing all
>hot-pluggable (!acpi_pcihp_pc_no_hotplug) children nodes.
> 5) VFs are unrealized directly, and it doesn’t work well with (1).
>SR/IOV structures are not updated, so when it’s PF’s turn to be
>unrealized, it works on stale pointers to already-deleted VFs.
> 
> My proposed ‘fix’ is to make the PCI ACPI code aware of SR/IOV:
> 

CC'ing ACPI/SMBIOS maintainers/reviewers on the proposed fix.

> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index f4d706e47d..090bdb8e74 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -196,8 +196,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, 
> PCIDevice *dev)
>   * ACPI doesn't allow hotplug of bridge devices.  Don't allow
>   * hot-unplug of bridge devices unless they were added by hotplug
>   * (and so, not described by acpi).
> + *
> + * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
> + * will be removed implicitly, when Physical Function is unplugged.
>   */
> -return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
> +return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
> +   pci_is_vf(dev);
>  }
> 



Re: [PATCH v2 07/11] tests/acpi: update expected data files

2022-03-11 Thread Ani Sinha
On Fri, Mar 11, 2022 at 11:09 AM Gerd Hoffmann  wrote:
>
> The switch to edk2 RELEASE builds changes the memory layout a bit,
> resulting in a acpi table change.

I would actually also refer to the actual change (submodule update
etc) that results in this change. Otherwise its hard to track/bosect
stuff.

>
>  DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
>  {
>  Scope (\_SB)
>  {
>  Device (NVDR)
>  {
>  Name (_HID, "ACPI0012" /* NVDIMM Root Device */)  // _HID: 
> Hardware ID
>  Method (NCAL, 5, Serialized)
>  {
>  Local6 = MEMA /* \MEMA */
>  {
>  Return (NCAL (Arg0, Arg1, Arg2, Arg3, 0x02))
>  }
>  }
>
>  Device (NV02)
>  {
>  Name (_ADR, 0x03)  // _ADR: Address
>  Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific 
> Method
>  {
>  Return (NCAL (Arg0, Arg1, Arg2, Arg3, 0x03))
>  }
>  }
>  }
>  }
>
> -Name (MEMA, 0x43D1)
> +Name (MEMA, 0x43F5)
>  }
>
> Signed-off-by: Gerd Hoffmann 
> ---
>  tests/data/acpi/virt/SSDT.memhp | Bin 736 -> 736 bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
>
> diff --git a/tests/data/acpi/virt/SSDT.memhp b/tests/data/acpi/virt/SSDT.memhp
> index 
> 375d7b6fc85a484f492a26ccd355c205f2c34473..4c363a6d95a7e2e826568c85f5719127748e7932
>  100644
> GIT binary patch
> delta 22
> dcmaFB`hb-yIM^lR0TTlQqx43uD@;sZodHo~2HXGu
>
> delta 22
> dcmaFB`hb-yIM^lR0TTlQqy0v%D@;rmodHrj2HXGu
>
> --
> 2.35.1
>



Re: [PATCH experiment 00/35] stackless coroutine backend

2022-03-11 Thread Daniel P . Berrangé
On Fri, Mar 11, 2022 at 01:04:33PM +0100, Paolo Bonzini wrote:
> On 3/11/22 10:27, Stefan Hajnoczi wrote:
> > > Not quite voluntarily, but I noticed I had to add one 0 to make them run 
> > > for
> > > a decent amount of time.  So yeah, it's much faster than siglongjmp.
> > That's a nice first indication that performance will be good. I guess
> > that deep coroutine_fn stacks could be less efficient with stackless
> > coroutines compared to ucontext, but the cost of switching between
> > coroutines (enter/yield) will be lower with stackless coroutines.
> 
> Note that right now I'm not placing the coroutine_fn stack on the heap, it's
> still allocated from a contiguous area in virtual address space. The
> contiguous allocation is wrapped by coroutine_stack_alloc and
> coroutine_stack_free, so it's really easy to change them to malloc and free.
> 
> I also do not have to walk up the whole call stack on coroutine_fn yields,
> because calls from one coroutine_fn to the next are tail calls; in exchange
> for that, I have more indirect calls than if the code did
> 
> if (next_call() == COROUTINE_YIELD) {
> return COROUTINE_YIELD;
> }
> 
> For now the choice was again just the one that made the translation easiest.
> 
> Today I also managed to implement a QEMU-like API on top of C++ coroutines:
> 
> CoroutineFn return_int() {
> co_await qemu_coroutine_yield();
> co_return 30;
> }
> 
> CoroutineFn return_void() {
> co_await qemu_coroutine_yield();
> }
> 
> CoroutineFn co(void *) {
> co_await return_void();
> printf("%d\n", co_await return_int())
> co_await qemu_coroutine_yield();
> }
> 
> int main() {
> Coroutine *f = qemu_coroutine_create(co, NULL);
> printf("--- 0\n");
> qemu_coroutine_enter(f);
> printf("--- 1\n");
> qemu_coroutine_enter(f);
> printf("--- 2\n");
> qemu_coroutine_enter(f);
> printf("--- 3\n");
> qemu_coroutine_enter(f);
> printf("--- 4\n");
> }
> 
> The runtime code is absurdly obscure; my favorite bit is
> 
> Yield qemu_coroutine_yield()
> {
> return Yield();
> }
> 
> :) However, at 200 lines of code it's certainly smaller than a
> source-to-source translator.  It might be worth investigating a bit more.
> Only files that define or use a coroutine_fn (which includes callers of
> qemu_coroutine_create) would have to be compiled as C++.

Unless I'm misunderstanding what you mean, "define a coroutine_fn"
is a very large number of functions/files

  $ git grep coroutine_fn | wc -l
  806
  $ git grep -l coroutine_fn | wc -l
  132

Dominated by the block layer of course, but tentacles spreading
out into alot of other code.

Feels like identifying all callers would be tedious/unpleasant enough,
that practically speaking we would have to just compile all of QEMU
as C++.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH experiment 00/35] stackless coroutine backend

2022-03-11 Thread Paolo Bonzini

On 3/11/22 10:27, Stefan Hajnoczi wrote:

Not quite voluntarily, but I noticed I had to add one 0 to make them run for
a decent amount of time.  So yeah, it's much faster than siglongjmp.

That's a nice first indication that performance will be good. I guess
that deep coroutine_fn stacks could be less efficient with stackless
coroutines compared to ucontext, but the cost of switching between
coroutines (enter/yield) will be lower with stackless coroutines.


Note that right now I'm not placing the coroutine_fn stack on the heap, 
it's still allocated from a contiguous area in virtual address space. 
The contiguous allocation is wrapped by coroutine_stack_alloc and 
coroutine_stack_free, so it's really easy to change them to malloc and free.


I also do not have to walk up the whole call stack on coroutine_fn 
yields, because calls from one coroutine_fn to the next are tail calls; 
in exchange for that, I have more indirect calls than if the code did


if (next_call() == COROUTINE_YIELD) {
return COROUTINE_YIELD;
}

For now the choice was again just the one that made the translation easiest.

Today I also managed to implement a QEMU-like API on top of C++ coroutines:

CoroutineFn return_int() {
co_await qemu_coroutine_yield();
co_return 30;
}

CoroutineFn return_void() {
co_await qemu_coroutine_yield();
}

CoroutineFn co(void *) {
co_await return_void();
printf("%d\n", co_await return_int())
co_await qemu_coroutine_yield();
}

int main() {
Coroutine *f = qemu_coroutine_create(co, NULL);
printf("--- 0\n");
qemu_coroutine_enter(f);
printf("--- 1\n");
qemu_coroutine_enter(f);
printf("--- 2\n");
qemu_coroutine_enter(f);
printf("--- 3\n");
qemu_coroutine_enter(f);
printf("--- 4\n");
}

The runtime code is absurdly obscure; my favorite bit is

Yield qemu_coroutine_yield()
{
return Yield();
}

:) However, at 200 lines of code it's certainly smaller than a 
source-to-source translator.  It might be worth investigating a bit 
more.  Only files that define or use a coroutine_fn (which includes 
callers of qemu_coroutine_create) would have to be compiled as C++.


Paolo



Re: [PATCH 2/6] tests/9pfs: Twalk with nwname=0

2022-03-11 Thread Greg Kurz
On Thu, 10 Mar 2022 09:57:25 +0100
Christian Schoenebeck  wrote:

> On Mittwoch, 9. März 2022 14:24:24 CET Christian Schoenebeck wrote:
> > Send Twalk request with nwname=0. In this case no QIDs should
> > be returned by 9p server; this is equivalent to walking to dot.
> > 
> > Signed-off-by: Christian Schoenebeck 
> > ---
> >  tests/qtest/virtio-9p-test.c | 22 ++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> > index 22bdd74bc1..6c00da03f4 100644
> > --- a/tests/qtest/virtio-9p-test.c
> > +++ b/tests/qtest/virtio-9p-test.c
> > @@ -1002,6 +1002,27 @@ static void fs_walk_nonexistent(void *obj, void
> > *data, QGuestAllocator *t_alloc) do_walk_expect_error(v9p, "non-existent",
> > ENOENT);
> >  }
> > 
> > +static void fs_walk_none(void *obj, void *data, QGuestAllocator *t_alloc)
> > +{
> 
> Or maybe calling this function fs_walk_clone_fid and the test case name
> "synth/walk/clone_fid" respectively instead?
> 

I agree that Twalk with nwname=0 does clone the fid in practice but
the test doesn't explicitly check that. In its present form, I'd
suggest a "no_names" wording but it is already fine as is, so:

Reviewed-by: Greg Kurz 

> > +QVirtio9P *v9p = obj;
> > +alloc = t_alloc;
> > +v9fs_qid root_qid;
> > +g_autofree v9fs_qid *wqid = NULL;
> > +P9Req *req;
> > +
> > +do_version(v9p);
> > +req = v9fs_tattach(v9p, 0, getuid(), 0);
> > +v9fs_req_wait_for_reply(req, NULL);
> > +v9fs_rattach(req, _qid);
> > +
> > +req = v9fs_twalk(v9p, 0, 1, 0, NULL, 0);
> > +v9fs_req_wait_for_reply(req, NULL);
> > +v9fs_rwalk(req, NULL, );
> > +
> > +/* special case: no QID is returned if nwname=0 was sent */
> > +g_assert(wqid == NULL);
> > +}
> > +
> >  static void fs_walk_dotdot(void *obj, void *data, QGuestAllocator *t_alloc)
> > {
> >  QVirtio9P *v9p = obj;
> > @@ -1435,6 +1456,7 @@ static void register_virtio_9p_test(void)
> >  qos_add_test("synth/walk/basic", "virtio-9p", fs_walk,  );
> >  qos_add_test("synth/walk/no_slash", "virtio-9p", fs_walk_no_slash,
> >);
> > +qos_add_test("synth/walk/none", "virtio-9p", fs_walk_none, );
> >  qos_add_test("synth/walk/dotdot_from_root", "virtio-9p",
> >   fs_walk_dotdot,  );
> >  qos_add_test("synth/walk/non_existent", "virtio-9p", 
> > fs_walk_nonexistent,
> 
> 




Re: [PATCH 1/6] tests/9pfs: walk to non-existent dir

2022-03-11 Thread Greg Kurz
On Wed, 9 Mar 2022 13:18:50 +0100
Christian Schoenebeck  wrote:

> Expect ENOENT Rlerror response when trying to walk to a
> non-existent directory.
> 
> Signed-off-by: Christian Schoenebeck 
> ---

Reviewed-by: Greg Kurz 

>  tests/qtest/virtio-9p-test.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> index 01ca076afe..22bdd74bc1 100644
> --- a/tests/qtest/virtio-9p-test.c
> +++ b/tests/qtest/virtio-9p-test.c
> @@ -606,6 +606,25 @@ static uint32_t do_walk(QVirtio9P *v9p, const char *path)
>  return fid;
>  }
>  
> +/* utility function: walk to requested dir and expect passed error response 
> */
> +static void do_walk_expect_error(QVirtio9P *v9p, const char *path, uint32_t 
> err)
> +{
> +char **wnames;
> +P9Req *req;
> +uint32_t _err;
> +const uint32_t fid = genfid();
> +
> +int nwnames = split(path, "/", );
> +
> +req = v9fs_twalk(v9p, 0, fid, nwnames, wnames, 0);
> +v9fs_req_wait_for_reply(req, NULL);
> +v9fs_rlerror(req, &_err);
> +
> +g_assert_cmpint(_err, ==, err);
> +
> +split_free();
> +}
> +
>  static void fs_version(void *obj, void *data, QGuestAllocator *t_alloc)
>  {
>  alloc = t_alloc;
> @@ -974,6 +993,15 @@ static void fs_walk_no_slash(void *obj, void *data, 
> QGuestAllocator *t_alloc)
>  g_free(wnames[0]);
>  }
>  
> +static void fs_walk_nonexistent(void *obj, void *data, QGuestAllocator 
> *t_alloc)
> +{
> +QVirtio9P *v9p = obj;
> +alloc = t_alloc;
> +
> +do_attach(v9p);
> +do_walk_expect_error(v9p, "non-existent", ENOENT);
> +}
> +
>  static void fs_walk_dotdot(void *obj, void *data, QGuestAllocator *t_alloc)
>  {
>  QVirtio9P *v9p = obj;
> @@ -1409,6 +1437,8 @@ static void register_virtio_9p_test(void)
>);
>  qos_add_test("synth/walk/dotdot_from_root", "virtio-9p",
>   fs_walk_dotdot,  );
> +qos_add_test("synth/walk/non_existent", "virtio-9p", fs_walk_nonexistent,
> +  );
>  qos_add_test("synth/lopen/basic", "virtio-9p", fs_lopen,  );
>  qos_add_test("synth/write/basic", "virtio-9p", fs_write,  );
>  qos_add_test("synth/flush/success", "virtio-9p", fs_flush_success,




Re: [PATCH v2 4/4] util/event-loop-base: Introduce options to set the thread pool size

2022-03-11 Thread Nicolas Saenz Julienne
On Thu, 2022-03-10 at 10:45 +, Stefan Hajnoczi wrote:
> On Thu, Mar 03, 2022 at 04:13:07PM +0100, Nicolas Saenz Julienne wrote:
> > @@ -537,10 +546,19 @@
> >  # 0 means that the engine will use its default
> >  # (default:0, since 6.1)
> >  #
> > +# @thread-pool-min: minimum number of threads readily available in the 
> > thread
> > +#   pool
> > +#   (default:0, since 6.2)
> > +#
> > +# @thread-pool-max: maximum number of threads the thread pool can contain
> > +#   (default:64, since 6.2)
> 
> Here and elsewhere:
> s/6.2/7.1/

Yes, forgot to mention it was a placeholder, as I wasn't sure what version to
target.

> > @@ -294,6 +314,36 @@ void thread_pool_submit(ThreadPool *pool, 
> > ThreadPoolFunc *func, void *arg)
> >  thread_pool_submit_aio(pool, func, arg, NULL, NULL);
> >  }
> >  
> > +void thread_pool_update_params(ThreadPool *pool, AioContext *ctx)
> > +{
> > +qemu_mutex_lock(>lock);
> > +
> > +pool->min_threads = ctx->thread_pool_min;
> > +pool->max_threads = ctx->thread_pool_max;
> > +
> > +/*
> > + * We either have to:
> > + *  - Increase the number available of threads until over the 
> > min_threads
> > + *threshold.
> > + *  - Decrease the number of available threads until under the 
> > max_threads
> > + *threshold.
> > + *  - Do nothing. the current number of threads fall in between the 
> > min and
> > + *max thresholds. We'll let the pool manage itself.
> > + */
> > +for (int i = pool->cur_threads; i < pool->min_threads; i++) {
> > +spawn_thread(pool);
> > +}
> > +
> > +while (pool->cur_threads > pool->max_threads) {
> > +qemu_sem_post(>sem);
> > +qemu_mutex_unlock(>lock);
> > +qemu_cond_wait(>worker_stopped, >lock);
> > +qemu_mutex_lock(>lock);
> 
> Same question as Patch 1. This looks incorrect because qemu_cond_wait()
> already drops pool->lock if it needs to block.

Yes, I'll fix that.

> Also, why wait? If worker threads are blocked for some reason then our
> thread will block too.

Exiting thread_pool_update_params() before honoring the new constraints is a
source of potential race conditions (having to worry for situations where
cur_threads > max_threads), and on systems where determinism is important it's
crucial to have a clear boundary between 'unsafe' and 'safe' states.

Thanks,

-- 
Nicolás Sáenz




Re: [PATCH 6/6] tests/9pfs: guard recent 'Twalk' behaviour fix

2022-03-11 Thread Christian Schoenebeck
On Mittwoch, 9. März 2022 19:21:18 CET Christian Schoenebeck wrote:
> Previous 9p patch fixed 'Twalk' request handling, which was previously not
> behaving as specified by the 9p2000 protocol spec. This patch adds a new
> test case which guards the new 'Twalk' behaviour in question.
> 
> More specifically: it sends a 'Twalk' request where the 1st path component
> is valid, whereas the 2nd path component transmitted to server does not
> exist. The expected behaviour is that 9p server would respond by sending
> a 'Rwalk' response with exactly 1 QID (instead of 'Rlerror' response).
> 
> Signed-off-by: Christian Schoenebeck 
> ---
>  tests/qtest/virtio-9p-test.c | 42 +---
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> index 9098e21173..f29de1ca64 100644
> --- a/tests/qtest/virtio-9p-test.c
> +++ b/tests/qtest/virtio-9p-test.c
> @@ -638,8 +638,12 @@ static void do_version(QVirtio9P *v9p)
>  g_assert_cmpmem(server_version, server_len, version, strlen(version));
>  }
> 
> -/* utility function: walk to requested dir and return fid for that dir */
> -static uint32_t do_walk(QVirtio9P *v9p, const char *path)
> +/*
> + * utility function: walk to requested dir and return fid for that dir and
> + * the QIDs of server response
> + */
> +static uint32_t do_walk_rqids(QVirtio9P *v9p, const char *path, uint16_t
> *nwqid, +  v9fs_qid **wqid)
>  {
>  char **wnames;
>  P9Req *req;
> @@ -649,12 +653,18 @@ static uint32_t do_walk(QVirtio9P *v9p, const char
> *path)
> 
>  req = v9fs_twalk(v9p, 0, fid, nwnames, wnames, 0);
>  v9fs_req_wait_for_reply(req, NULL);
> -v9fs_rwalk(req, NULL, NULL);
> +v9fs_rwalk(req, nwqid, wqid);
> 
>  split_free();
>  return fid;
>  }
> 
> +/* utility function: walk to requested dir and return fid for that dir */
> +static uint32_t do_walk(QVirtio9P *v9p, const char *path)
> +{
> +return do_walk_rqids(v9p, path, NULL, NULL);
> +}
> +
>  /* utility function: walk to requested dir and expect passed error response
> */ static void do_walk_expect_error(QVirtio9P *v9p, const char *path,
> uint32_t err) {
> @@ -1048,9 +1058,33 @@ static void fs_walk_nonexistent(void *obj, void
> *data, QGuestAllocator *t_alloc) alloc = t_alloc;
> 
>  do_attach(v9p);
> +/*
> + * The 9p2000 protocol spec sais: "If the first element cannot be
> walked + * for any reason, Rerror is returned."
> + */
>  do_walk_expect_error(v9p, "non-existent", ENOENT);
>  }
> 
> +static void fs_walk_2nd_nonexistent(void *obj, void *data,
> +QGuestAllocator *t_alloc)
> +{
> +QVirtio9P *v9p = obj;
> +alloc = t_alloc;
> +uint16_t nwqid;
> +g_autofree v9fs_qid *wqid = NULL;
> +g_autofree char *path = g_strdup_printf(
> +QTEST_V9FS_SYNTH_WALK_FILE "/non-existent", 0
> +);
> +
> +do_attach(v9p);
> +do_walk_rqids(v9p, path, , );
> +/*
> + * The 9p2000 protocol spec sais: "nwqid is therefore either nwname or
> the + * index of the first elementwise walk that failed."
> + */
> +assert(nwqid == 1);
> +}

I should probably extend this test case with a separate 7th patch to check 
that fid was really unaffected by Twalk, by comparing the QID before vs. after 
the walk (similar to what patch 3 does).

> +
>  static void fs_walk_none(void *obj, void *data, QGuestAllocator *t_alloc)
>  {
>  QVirtio9P *v9p = obj;
> @@ -1531,6 +1565,8 @@ static void register_virtio_9p_test(void)
>   fs_walk_dotdot,  );
>  qos_add_test("synth/walk/non_existent", "virtio-9p",
> fs_walk_nonexistent, );
> +qos_add_test("synth/walk/2nd_non_existent", "virtio-9p",
> + fs_walk_2nd_nonexistent, );
>  qos_add_test("synth/lopen/basic", "virtio-9p", fs_lopen,  );
>  qos_add_test("synth/write/basic", "virtio-9p", fs_write,  );
>  qos_add_test("synth/flush/success", "virtio-9p", fs_flush_success,





Re: [PATCH V7 10/29] machine: memfd-alloc option

2022-03-11 Thread David Hildenbrand
On 03.03.22 18:21, Michael S. Tsirkin wrote:
> On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:
>> Allocate anonymous memory using memfd_create if the memfd-alloc machine
>> option is set.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  hw/core/machine.c   | 19 +++
>>  include/hw/boards.h |  1 +
>>  qemu-options.hx |  6 ++
>>  softmmu/physmem.c   | 47 ++-
>>  softmmu/vl.c|  1 +
>>  trace-events|  1 +
>>  util/qemu-config.c  |  4 
>>  7 files changed, 70 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index 53a99ab..7739d88 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj, bool 
>> value, Error **errp)
>>  ms->mem_merge = value;
>>  }
>>  
>> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
>> +{
>> +MachineState *ms = MACHINE(obj);
>> +
>> +return ms->memfd_alloc;
>> +}
>> +
>> +static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
>> +{
>> +MachineState *ms = MACHINE(obj);
>> +
>> +ms->memfd_alloc = value;
>> +}
>> +
>>  static bool machine_get_usb(Object *obj, Error **errp)
>>  {
>>  MachineState *ms = MACHINE(obj);
>> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass *oc, void 
>> *data)
>>  object_class_property_set_description(oc, "mem-merge",
>>  "Enable/disable memory merge support");
>>  
>> +object_class_property_add_bool(oc, "memfd-alloc",
>> +machine_get_memfd_alloc, machine_set_memfd_alloc);
>> +object_class_property_set_description(oc, "memfd-alloc",
>> +"Enable/disable allocating anonymous memory using memfd_create");
>> +
>>  object_class_property_add_bool(oc, "usb",
>>  machine_get_usb, machine_set_usb);
>>  object_class_property_set_description(oc, "usb",
>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>> index 9c1c190..a57d7a0 100644
>> --- a/include/hw/boards.h
>> +++ b/include/hw/boards.h
>> @@ -327,6 +327,7 @@ struct MachineState {
>>  char *dt_compatible;
>>  bool dump_guest_core;
>>  bool mem_merge;
>> +bool memfd_alloc;
>>  bool usb;
>>  bool usb_disabled;
>>  char *firmware;
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 7d47510..33c8173 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>  "vmport=on|off|auto controls emulation of vmport 
>> (default: auto)\n"
>>  "dump-guest-core=on|off include guest memory in a core 
>> dump (default=on)\n"
>>  "mem-merge=on|off controls memory merge support 
>> (default: on)\n"
>> +"memfd-alloc=on|off controls allocating anonymous guest 
>> RAM using memfd_create (default: off)\n"
> 
> Question: are there any disadvantages associated with using
> memfd_create? I guess we are using up an fd, but that seems minor.  Any
> reason not to set to on by default? maybe with a fallback option to
> disable that?
> 
> I am concerned that it's actually a kind of memory backend, this flag
> seems to instead be closer to the deprecated mem-prealloc. E.g.
> it does not work with a mem path, does it?

We had a RH-internal discssuion some time ago, here is my writeup (note
the TMPFS/SHMEM discussion):

--- snip ---

In QEMU, we specify the type of guest RAM via
* -object memory-backend-ram,...
* -object memory-backend-file,...
* -object memory-backend-memfd,...

We can specify whether to share the memory (share=on -- MAP_SHARED),
or whether to keep modifications local to QEMU (share=off -- MAP_PRIVATE).

Using "share=off" (or using the default) with files/memfd can have some
serious side-effects.

ALERT: "share=off" is the default in QEMU for memory-backend-ram and
memory-backend-file. "share=on" is the default in QEMU only for
memory-backend-memfd.


I. MAP_SHARED vs. MAP_PRIVATE

MAP_SHARED: when reading, read file content; when writing, modify file
 content.
MAP_PRIVATE: when reading, read file content, except if there was a
  local/private change. When writing, keep change
  local/private and don't modify file content.


MAP_PRIVATE sounds like a snapshot, however, in some cases it really
behaves differently -- especially with tmpfs/shmem and when QEMU
discards memory (e.g., with virtio-balloon or during postcopy live
migration).

There is some connection between MAP_PRIVATE and NUMA bindings that I
have yet to fully explore. We could have issues with some MAP_SHARED
mappings and NUMA bindings (IOW: policy getting ignored).


II Impact on different memory backends/types

II.1. Anonymous memory:

Usage: -object memory-backend-ram,...

We really want "share=off" in 99.99% of all cases. Shared anonymous RAM
-- i.e., sharing RAM with your child processes -- does not really apply

Re: [PATCH v2 2/4] Introduce event-loop-base abstract class

2022-03-11 Thread Nicolas Saenz Julienne
On Thu, 2022-03-10 at 10:25 +, Stefan Hajnoczi wrote:
> On Thu, Mar 03, 2022 at 03:58:20PM +0100, Nicolas Saenz Julienne wrote:
> > @@ -2935,13 +2947,6 @@ qemu_syms = custom_target('qemu.syms', output: 
> > 'qemu.syms',
> >   capture: true,
> >   command: [undefsym, nm, '@INPUT@'])
> >  
> > -qom_ss = qom_ss.apply(config_host, strict: false)
> > -libqom = static_library('qom', qom_ss.sources() + genh,
> > -dependencies: [qom_ss.dependencies()],
> > -name_suffix: 'fa')
> > -
> > -qom = declare_dependency(link_whole: libqom)
> > -
> 
> Why was it necessary to move qom_ss and subdir('hw') up? Can
> event_loop_base be defined down here instead?

The way I setup it up, qemuutil now depdens on event_loop_base which in turn
depends on qom. IIUC I can't declare dependencies without declaring first the
libraries and source sets. All has to happen sequencially. With this in mind,
almost all libraries depend on libqemuutil so moving it down isn't possible.

subdir('hw') was also moved up, as 'hw/nvram/meson.build' is introducing files
into qom_ss. This operation has to be performed before declaring libqom.

> (The benefit of less code churn is it reduces the risk of patch conflicts.)

Agree, I know how painful it can be for backports.

Thanks,

-- 
Nicolás Sáenz




Re: [PATCH V7 10/29] machine: memfd-alloc option

2022-03-11 Thread Daniel P . Berrangé
On Mon, Mar 07, 2022 at 09:41:44AM -0500, Steven Sistare wrote:
> On 3/4/2022 5:41 AM, Igor Mammedov wrote:
> > On Thu, 3 Mar 2022 12:21:15 -0500
> > "Michael S. Tsirkin"  wrote:
> > 
> >> On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:
> >>> Allocate anonymous memory using memfd_create if the memfd-alloc machine
> >>> option is set.
> >>>
> >>> Signed-off-by: Steve Sistare 
> >>> ---
> >>>  hw/core/machine.c   | 19 +++
> >>>  include/hw/boards.h |  1 +
> >>>  qemu-options.hx |  6 ++
> >>>  softmmu/physmem.c   | 47 ++-
> >>>  softmmu/vl.c|  1 +
> >>>  trace-events|  1 +
> >>>  util/qemu-config.c  |  4 
> >>>  7 files changed, 70 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >>> index 53a99ab..7739d88 100644
> >>> --- a/hw/core/machine.c
> >>> +++ b/hw/core/machine.c
> >>> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj, bool 
> >>> value, Error **errp)
> >>>  ms->mem_merge = value;
> >>>  }
> >>>  
> >>> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> >>> +{
> >>> +MachineState *ms = MACHINE(obj);
> >>> +
> >>> +return ms->memfd_alloc;
> >>> +}
> >>> +
> >>> +static void machine_set_memfd_alloc(Object *obj, bool value, Error 
> >>> **errp)
> >>> +{
> >>> +MachineState *ms = MACHINE(obj);
> >>> +
> >>> +ms->memfd_alloc = value;
> >>> +}
> >>> +
> >>>  static bool machine_get_usb(Object *obj, Error **errp)
> >>>  {
> >>>  MachineState *ms = MACHINE(obj);
> >>> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass *oc, void 
> >>> *data)
> >>>  object_class_property_set_description(oc, "mem-merge",
> >>>  "Enable/disable memory merge support");
> >>>  
> >>> +object_class_property_add_bool(oc, "memfd-alloc",
> >>> +machine_get_memfd_alloc, machine_set_memfd_alloc);
> >>> +object_class_property_set_description(oc, "memfd-alloc",
> >>> +"Enable/disable allocating anonymous memory using memfd_create");
> >>> +
> >>>  object_class_property_add_bool(oc, "usb",
> >>>  machine_get_usb, machine_set_usb);
> >>>  object_class_property_set_description(oc, "usb",
> >>> diff --git a/include/hw/boards.h b/include/hw/boards.h
> >>> index 9c1c190..a57d7a0 100644
> >>> --- a/include/hw/boards.h
> >>> +++ b/include/hw/boards.h
> >>> @@ -327,6 +327,7 @@ struct MachineState {
> >>>  char *dt_compatible;
> >>>  bool dump_guest_core;
> >>>  bool mem_merge;
> >>> +bool memfd_alloc;
> >>>  bool usb;
> >>>  bool usb_disabled;
> >>>  char *firmware;
> >>> diff --git a/qemu-options.hx b/qemu-options.hx
> >>> index 7d47510..33c8173 100644
> >>> --- a/qemu-options.hx
> >>> +++ b/qemu-options.hx
> >>> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> >>>  "vmport=on|off|auto controls emulation of vmport 
> >>> (default: auto)\n"
> >>>  "dump-guest-core=on|off include guest memory in a 
> >>> core dump (default=on)\n"
> >>>  "mem-merge=on|off controls memory merge support 
> >>> (default: on)\n"
> >>> +"memfd-alloc=on|off controls allocating anonymous 
> >>> guest RAM using memfd_create (default: off)\n"  
> >>
> >> Question: are there any disadvantages associated with using
> >> memfd_create? I guess we are using up an fd, but that seems minor.  Any
> >> reason not to set to on by default? maybe with a fallback option to
> >> disable that?
> 
> Old Linux host kernels, circa 4.1, do not support huge pages for shared 
> memory.

That doesn't matter, as we don't support any distros with kernels that old

   https://www.qemu.org/docs/master/about/build-platforms.html

We can assume something around kernel 4.18 I believe.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 06/18 for-7.0] migration: fix use of TLS PSK credentials with a UNIX socket

2022-03-11 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> The migration TLS code has a check mandating that a hostname be
> available when starting a TLS session. This is expected when using
> x509 credentials, but is bogus for PSK and anonymous credentials
> as neither involve hostname validation.
>
> The TLS crdentials object gained suitable error reporting in the
> case of TLS with x509 credentials, so there is no longer any need
> for the migration code to do its own (incorrect) validation.
>
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Juan Quintela 

Should be safe for current one (famous last words).




Re: [PATCH v2 04/18] tests: print newline after QMP response in qtest logs

2022-03-11 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> The QMP commands have a trailing newline, but the response does not.
> This makes the qtest logs hard to follow as the next QMP command
> appears in the same line as the previous QMP response.
>
> Reviewed-by: Thomas Huth 
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Juan Quintela 




Re: [PATCH v2 03/18] tests: support QTEST_TRACE env variable

2022-03-11 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> When debugging failing qtests it is useful to be able to turn on trace
> output to stderr. The QTEST_TRACE env variable contents get injected
> as a '-trace ' command line arg
>
> Reviewed-by: Peter Xu 
> Reviewed-by: Thomas Huth 
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Juan Quintela 

Great.  I have had a crude version of this (i.e. just changing the
command line for ages).




Re: [PATCH v2 08/11] tests/acpi: disallow virt memory hotplug changes

2022-03-11 Thread Igor Mammedov
On Fri, 11 Mar 2022 06:37:56 +0100
Gerd Hoffmann  wrote:

> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Igor Mammedov 

> ---
>  tests/qtest/bios-tables-test-allowed-diff.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
> b/tests/qtest/bios-tables-test-allowed-diff.h
> index e569098abddc..dfb8523c8bf4 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1,2 +1 @@
>  /* List of comma-separated changed AML files to ignore */
> -"tests/data/acpi/virt/SSDT.memhp",




Re: [PATCH v2 07/11] tests/acpi: update expected data files

2022-03-11 Thread Igor Mammedov
On Fri, 11 Mar 2022 06:37:55 +0100
Gerd Hoffmann  wrote:

> The switch to edk2 RELEASE builds changes the memory layout a bit,
> resulting in a acpi table change.
> 
>  DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
>  {
>  Scope (\_SB)
>  {
>  Device (NVDR)
>  {
>  Name (_HID, "ACPI0012" /* NVDIMM Root Device */)  // _HID: 
> Hardware ID
>  Method (NCAL, 5, Serialized)
>  {
>  Local6 = MEMA /* \MEMA */
>  {
>  Return (NCAL (Arg0, Arg1, Arg2, Arg3, 0x02))
>  }
>  }
> 
>  Device (NV02)
>  {
>  Name (_ADR, 0x03)  // _ADR: Address
>  Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific 
> Method
>  {
>  Return (NCAL (Arg0, Arg1, Arg2, Arg3, 0x03))
>  }
>  }
>  }
>  }
> 
> -Name (MEMA, 0x43D1)
> +Name (MEMA, 0x43F5)
>  }
> 
> Signed-off-by: Gerd Hoffmann 

Acked-by: Igor Mammedov 

> ---
>  tests/data/acpi/virt/SSDT.memhp | Bin 736 -> 736 bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/tests/data/acpi/virt/SSDT.memhp b/tests/data/acpi/virt/SSDT.memhp
> index 
> 375d7b6fc85a484f492a26ccd355c205f2c34473..4c363a6d95a7e2e826568c85f5719127748e7932
>  100644
> GIT binary patch
> delta 22
> dcmaFB`hb-yIM^lR0TTlQqx43uD@;sZodHo~2HXGu
> 
> delta 22
> dcmaFB`hb-yIM^lR0TTlQqy0v%D@;rmodHrj2HXGu
> 




Re: [PATCH v2 01/11] tests/acpi: allow virt memory hotplug changes

2022-03-11 Thread Igor Mammedov
On Fri, 11 Mar 2022 06:37:49 +0100
Gerd Hoffmann  wrote:

> Signed-off-by: Gerd Hoffmann 
> Reviewed-by: Alex Bennée 

Reviewed-by: Igor Mammedov 

> ---
>  tests/qtest/bios-tables-test-allowed-diff.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
> b/tests/qtest/bios-tables-test-allowed-diff.h
> index dfb8523c8bf4..e569098abddc 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1 +1,2 @@
>  /* List of comma-separated changed AML files to ignore */
> +"tests/data/acpi/virt/SSDT.memhp",




Re: [PATCH v3] i386/cpu: Remove the deprecated cpu model 'Icelake-Client'

2022-03-11 Thread Igor Mammedov
On Tue, 08 Feb 2022 16:38:27 +0800
Robert Hoo  wrote:

> Hi,
> 
> Can we remove the deprecated 'Icelake-Client' CPU model now? if so, I
> can rebase patch to latest and resend.

Please rebase and resend.

PS:
Also looking at deprecation commit 3e6a015cbd0f, it doesn't
have any reason behind deprecation so please add explanation
to commit message why it's being removed.

> Thanks.
> 
> On Sat, 2021-05-08 at 11:16 +0800, Robert Hoo wrote:
> > Hi,
> > 
> > Ping...
> > 
> > Thanks.
> > 
> > On Thu, 2021-04-29 at 09:35 +0800, Robert Hoo wrote:  
> > > As it's been marked deprecated since v5.2, now I think it's time
> > > remove it
> > > from code.
> > > 
> > > Signed-off-by: Robert Hoo 
> > > ---
> > > Changelog:
> > > v3:
> > >   Update deprecated.rst. (Sorry for my carelessness in last
> > > search. I
> > > sware I did search.)
> > > v2:
> > > Update removed-features.rst.
> > > ---
> > >  docs/system/deprecated.rst   |   6 --
> > >  docs/system/removed-features.rst |   5 ++
> > >  target/i386/cpu.c| 118 ---
> > > 
> > >  3 files changed, 5 insertions(+), 124 deletions(-)
> > > 
> > > diff --git a/docs/system/deprecated.rst
> > > b/docs/system/deprecated.rst
> > > index 80cae86..780b756 100644
> > > --- a/docs/system/deprecated.rst
> > > +++ b/docs/system/deprecated.rst
> > > @@ -222,12 +222,6 @@ a future version of QEMU. Support for this CPU
> > > was removed from the
> > >  upstream Linux kernel, and there is no available upstream
> > > toolchain
> > >  to build binaries for it.
> > >  
> > > -``Icelake-Client`` CPU Model (since 5.2.0)
> > > -''
> > > -
> > > -``Icelake-Client`` CPU Models are deprecated. Use ``Icelake-
> > > Server`` 
> > > CPU
> > > -Models instead.
> > > -
> > >  MIPS ``I7200`` CPU Model (since 5.2)
> > >  
> > >  
> > > diff --git a/docs/system/removed-features.rst
> > > b/docs/system/removed-
> > > features.rst
> > > index 29e9060..f1b5a16 100644
> > > --- a/docs/system/removed-features.rst
> > > +++ b/docs/system/removed-features.rst
> > > @@ -285,6 +285,11 @@ The RISC-V no MMU cpus have been removed. The
> > > two CPUs: ``rv32imacu-nommu`` and
> > >  ``rv64imacu-nommu`` can no longer be used. Instead the MMU status
> > > can be specified
> > >  via the CPU ``mmu`` option when using the ``rv32`` or ``rv64``
> > > CPUs.
> > >  
> > > +x86 Icelake-Client CPU (removed in 6.1)
> > > +'''
> > > +
> > > +``Icelake-Client`` cpu can no longer be used. Use ``Icelake-
> > > Server`` 
> > > instead.
> > > +
> > >  System emulator machines
> > >  
> > >  
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index ad99cad..75f2ad1 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -3338,124 +3338,6 @@ static X86CPUDefinition builtin_x86_defs[]
> > > =
> > > {
> > >  .model_id = "Intel Xeon Processor (Cooperlake)",
> > >  },
> > >  {
> > > -.name = "Icelake-Client",
> > > -.level = 0xd,
> > > -.vendor = CPUID_VENDOR_INTEL,
> > > -.family = 6,
> > > -.model = 126,
> > > -.stepping = 0,
> > > -.features[FEAT_1_EDX] =
> > > -CPUID_VME | CPUID_SSE2 | CPUID_SSE | CPUID_FXSR |
> > > CPUID_MMX |
> > > -CPUID_CLFLUSH | CPUID_PSE36 | CPUID_PAT | CPUID_CMOV |
> > > CPUID_MCA |
> > > -CPUID_PGE | CPUID_MTRR | CPUID_SEP | CPUID_APIC |
> > > CPUID_CX8 |
> > > -CPUID_MCE | CPUID_PAE | CPUID_MSR | CPUID_TSC |
> > > CPUID_PSE |
> > > -CPUID_DE | CPUID_FP87,
> > > -.features[FEAT_1_ECX] =
> > > -CPUID_EXT_AVX | CPUID_EXT_XSAVE | CPUID_EXT_AES |
> > > -CPUID_EXT_POPCNT | CPUID_EXT_X2APIC | CPUID_EXT_SSE42
> > > |
> > > -CPUID_EXT_SSE41 | CPUID_EXT_CX16 | CPUID_EXT_SSSE3 |
> > > -CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSE3 |
> > > -CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_FMA |
> > > CPUID_EXT_MOVBE |
> > > -CPUID_EXT_PCID | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
> > > -.features[FEAT_8000_0001_EDX] =
> > > -CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
> > > -CPUID_EXT2_SYSCALL,
> > > -.features[FEAT_8000_0001_ECX] =
> > > -CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM |
> > > CPUID_EXT3_3DNOWPREFETCH,
> > > -.features[FEAT_8000_0008_EBX] =
> > > -CPUID_8000_0008_EBX_WBNOINVD,
> > > -.features[FEAT_7_0_EBX] =
> > > -CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
> > > -CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 |
> > > CPUID_7_0_EBX_SMEP |
> > > -CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS |
> > > CPUID_7_0_EBX_INVPCID |
> > > -CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_RDSEED |
> > > CPUID_7_0_EBX_ADX |
> > > -CPUID_7_0_EBX_SMAP,
> > > -.features[FEAT_7_0_ECX] =
> > > 

Re: [PATCH V7 10/29] machine: memfd-alloc option

2022-03-11 Thread David Hildenbrand
On 22.12.21 20:05, Steve Sistare wrote:
> Allocate anonymous memory using memfd_create if the memfd-alloc machine
> option is set.

Hi,

late to the party (thanks Igor for CCing)

... in which case it's no longer anonymous memory (because it's now
MAP_SHARED). So you're converting all private memory to shared memory.

For example, memory ballooning will no longer work as expected. There is
no shared zeropage. KSM won't work. This brings a lot of "surprises".


This patch begs for a proper description why this is required and why we
cannot simply let the user handle that by properly using
memory-backend-memfd manually.

Especially the "memfd-alloc option" doesn't even express to a user
what's actually happening and what the implications are.


Long story short: this patch description has to be seriously extended.

> 
> Signed-off-by: Steve Sistare 
> ---
>  hw/core/machine.c   | 19 +++
>  include/hw/boards.h |  1 +
>  qemu-options.hx |  6 ++
>  softmmu/physmem.c   | 47 ++-
>  softmmu/vl.c|  1 +
>  trace-events|  1 +
>  util/qemu-config.c  |  4 
>  7 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 53a99ab..7739d88 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj, bool 
> value, Error **errp)
>  ms->mem_merge = value;
>  }
>  
> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +return ms->memfd_alloc;
> +}
> +
> +static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +ms->memfd_alloc = value;
> +}
> +
>  static bool machine_get_usb(Object *obj, Error **errp)
>  {
>  MachineState *ms = MACHINE(obj);
> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass *oc, void 
> *data)
>  object_class_property_set_description(oc, "mem-merge",
>  "Enable/disable memory merge support");
>  
> +object_class_property_add_bool(oc, "memfd-alloc",
> +machine_get_memfd_alloc, machine_set_memfd_alloc);
> +object_class_property_set_description(oc, "memfd-alloc",
> +"Enable/disable allocating anonymous memory using memfd_create");
> +
>  object_class_property_add_bool(oc, "usb",
>  machine_get_usb, machine_set_usb);
>  object_class_property_set_description(oc, "usb",
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 9c1c190..a57d7a0 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -327,6 +327,7 @@ struct MachineState {
>  char *dt_compatible;
>  bool dump_guest_core;
>  bool mem_merge;
> +bool memfd_alloc;
>  bool usb;
>  bool usb_disabled;
>  char *firmware;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 7d47510..33c8173 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>  "vmport=on|off|auto controls emulation of vmport 
> (default: auto)\n"
>  "dump-guest-core=on|off include guest memory in a core 
> dump (default=on)\n"
>  "mem-merge=on|off controls memory merge support 
> (default: on)\n"
> +"memfd-alloc=on|off controls allocating anonymous guest 
> RAM using memfd_create (default: off)\n"
>  "aes-key-wrap=on|off controls support for AES key 
> wrapping (default=on)\n"
>  "dea-key-wrap=on|off controls support for DEA key 
> wrapping (default=on)\n"
>  "suppress-vmdesc=on|off disables self-describing 
> migration (default=off)\n"
> @@ -76,6 +77,11 @@ SRST
>  supported by the host, de-duplicates identical memory pages
>  among VMs instances (enabled by default).
>  
> +``memfd-alloc=on|off``
> +Enables or disables allocation of anonymous guest RAM using
> +memfd_create.  Any associated memory-backend objects are created with
> +share=on.  The memfd-alloc default is off.
> +
>  ``aes-key-wrap=on|off``
>  Enables or disables AES key wrapping support on s390-ccw hosts.
>  This feature controls whether AES wrapping keys will be created
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3524c04..95e2b49 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -41,6 +41,7 @@
>  #include "qemu/config-file.h"
>  #include "qemu/error-report.h"
>  #include "qemu/qemu-print.h"
> +#include "qemu/memfd.h"
>  #include "exec/memory.h"
>  #include "exec/ioport.h"
>  #include "sysemu/dma.h"
> @@ -1964,35 +1965,63 @@ static void ram_block_add(RAMBlock *new_block, Error 
> **errp)
>  const bool shared = qemu_ram_is_shared(new_block);
>  RAMBlock *block;
>  RAMBlock *last_block = NULL;
> +struct MemoryRegion *mr = new_block->mr;
>  

[PATCH v2] target/riscv: write back unmodified value for csrrc/csrrs with rs1 is not x0 but holding zero

2022-03-11 Thread Weiwei Li
For csrrs and csrrc, if rs1 specifies a register other than x0, holding
a zero value, the instruction will still attempt to write the unmodified
value back to the csr and will cause side effects

v2:
* change to explictly pass "bool write_op" argument in riscv_csrrw*, do
  write permission check and write operation depend on it
* extend riscv_csr_predicate_fn to pass "write_op" argument which will
  be checked by seed csr or other future "write-only" csr

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu.c   |   3 +-
 target/riscv/cpu.h   |  15 +++---
 target/riscv/csr.c   | 101 +--
 target/riscv/gdbstub.c   |  15 +++---
 target/riscv/op_helper.c |  12 ++---
 5 files changed, 79 insertions(+), 67 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ddda4906ff..76ad9bffac 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -314,7 +314,8 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int 
flags)
 for (int i = 0; i < ARRAY_SIZE(dump_csrs); ++i) {
 int csrno = dump_csrs[i];
 target_ulong val = 0;
-RISCVException res = riscv_csrrw_debug(env, csrno, , 0, 0);
+RISCVException res = riscv_csrrw_debug(env, csrno, , 0, 0,
+   false);
 
 /*
  * Rely on the smode, hmode, etc, predicates within csr.c
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 9ba05042ed..971d193d61 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -610,27 +610,29 @@ void riscv_cpu_update_mask(CPURISCVState *env);
 
 RISCVException riscv_csrrw(CPURISCVState *env, int csrno,
target_ulong *ret_value,
-   target_ulong new_value, target_ulong write_mask);
+   target_ulong new_value, target_ulong write_mask,
+   bool write_op);
 RISCVException riscv_csrrw_debug(CPURISCVState *env, int csrno,
  target_ulong *ret_value,
  target_ulong new_value,
- target_ulong write_mask);
+ target_ulong write_mask, bool write_op);
 
 static inline void riscv_csr_write(CPURISCVState *env, int csrno,
target_ulong val)
 {
-riscv_csrrw(env, csrno, NULL, val, MAKE_64BIT_MASK(0, TARGET_LONG_BITS));
+riscv_csrrw(env, csrno, NULL, val, MAKE_64BIT_MASK(0, TARGET_LONG_BITS),
+true);
 }
 
 static inline target_ulong riscv_csr_read(CPURISCVState *env, int csrno)
 {
 target_ulong val = 0;
-riscv_csrrw(env, csrno, , 0, 0);
+riscv_csrrw(env, csrno, , 0, 0, false);
 return val;
 }
 
 typedef RISCVException (*riscv_csr_predicate_fn)(CPURISCVState *env,
- int csrno);
+ int csrno, bool write_op);
 typedef RISCVException (*riscv_csr_read_fn)(CPURISCVState *env, int csrno,
 target_ulong *ret_value);
 typedef RISCVException (*riscv_csr_write_fn)(CPURISCVState *env, int csrno,
@@ -642,7 +644,8 @@ typedef RISCVException (*riscv_csr_op_fn)(CPURISCVState 
*env, int csrno,
 
 RISCVException riscv_csrrw_i128(CPURISCVState *env, int csrno,
 Int128 *ret_value,
-Int128 new_value, Int128 write_mask);
+Int128 new_value, Int128 write_mask,
+bool write_op);
 
 typedef RISCVException (*riscv_csr_read128_fn)(CPURISCVState *env, int csrno,
Int128 *ret_value);
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index aea82dff4a..1907481fb1 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -36,7 +36,7 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 }
 
 /* Predicates */
-static RISCVException fs(CPURISCVState *env, int csrno)
+static RISCVException fs(CPURISCVState *env, int csrno, bool write_op)
 {
 #if !defined(CONFIG_USER_ONLY)
 if (!env->debugger && !riscv_cpu_fp_enabled(env) &&
@@ -47,7 +47,7 @@ static RISCVException fs(CPURISCVState *env, int csrno)
 return RISCV_EXCP_NONE;
 }
 
-static RISCVException vs(CPURISCVState *env, int csrno)
+static RISCVException vs(CPURISCVState *env, int csrno, bool write_op)
 {
 CPUState *cs = env_cpu(env);
 RISCVCPU *cpu = RISCV_CPU(cs);
@@ -64,7 +64,7 @@ static RISCVException vs(CPURISCVState *env, int csrno)
 return RISCV_EXCP_ILLEGAL_INST;
 }
 
-static RISCVException ctr(CPURISCVState *env, int csrno)
+static RISCVException ctr(CPURISCVState *env, int csrno, bool write_op)
 {
 #if !defined(CONFIG_USER_ONLY)
 CPUState *cs = env_cpu(env);
@@ -135,50 +135,50 @@ static RISCVException ctr(CPURISCVState *env, int csrno)
 return RISCV_EXCP_NONE;
 }
 

Re: [PATCH V7 10/29] machine: memfd-alloc option

2022-03-11 Thread Igor Mammedov
On Thu, 10 Mar 2022 13:18:35 -0500
Steven Sistare  wrote:

> On 3/10/2022 12:28 PM, Steven Sistare wrote:
> > On 3/10/2022 11:00 AM, Igor Mammedov wrote:  
> >> On Thu, 10 Mar 2022 10:36:08 -0500
> >> Steven Sistare  wrote:
> >>  
> >>> On 3/8/2022 2:20 AM, Igor Mammedov wrote:  
>  On Tue, 8 Mar 2022 01:50:11 -0500
>  "Michael S. Tsirkin"  wrote:
>  
> > On Mon, Mar 07, 2022 at 09:41:44AM -0500, Steven Sistare wrote:
> >> On 3/4/2022 5:41 AM, Igor Mammedov wrote:  
> >>> On Thu, 3 Mar 2022 12:21:15 -0500
> >>> "Michael S. Tsirkin"  wrote:
> >>>   
>  On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:  
> > Allocate anonymous memory using memfd_create if the memfd-alloc 
> > machine
> > option is set.
> >
> > Signed-off-by: Steve Sistare 
> > ---
> >  hw/core/machine.c   | 19 +++
> >  include/hw/boards.h |  1 +
> >  qemu-options.hx |  6 ++
> >  softmmu/physmem.c   | 47 
> > ++-
> >  softmmu/vl.c|  1 +
> >  trace-events|  1 +
> >  util/qemu-config.c  |  4 
> >  7 files changed, 70 insertions(+), 9 deletions(-)
> >
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index 53a99ab..7739d88 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj, 
> > bool value, Error **errp)
> >  ms->mem_merge = value;
> >  }
> >  
> > +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> > +{
> > +MachineState *ms = MACHINE(obj);
> > +
> > +return ms->memfd_alloc;
> > +}
> > +
> > +static void machine_set_memfd_alloc(Object *obj, bool value, Error 
> > **errp)
> > +{
> > +MachineState *ms = MACHINE(obj);
> > +
> > +ms->memfd_alloc = value;
> > +}
> > +
> >  static bool machine_get_usb(Object *obj, Error **errp)
> >  {
> >  MachineState *ms = MACHINE(obj);
> > @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass 
> > *oc, void *data)
> >  object_class_property_set_description(oc, "mem-merge",
> >  "Enable/disable memory merge support");
> >  
> > +object_class_property_add_bool(oc, "memfd-alloc",
> > +machine_get_memfd_alloc, machine_set_memfd_alloc);
> > +object_class_property_set_description(oc, "memfd-alloc",
> > +"Enable/disable allocating anonymous memory using 
> > memfd_create");
> > +
> >  object_class_property_add_bool(oc, "usb",
> >  machine_get_usb, machine_set_usb);
> >  object_class_property_set_description(oc, "usb",
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 9c1c190..a57d7a0 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -327,6 +327,7 @@ struct MachineState {
> >  char *dt_compatible;
> >  bool dump_guest_core;
> >  bool mem_merge;
> > +bool memfd_alloc;
> >  bool usb;
> >  bool usb_disabled;
> >  char *firmware;
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 7d47510..33c8173 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> >  "vmport=on|off|auto controls emulation of 
> > vmport (default: auto)\n"
> >  "dump-guest-core=on|off include guest memory 
> > in a core dump (default=on)\n"
> >  "mem-merge=on|off controls memory merge 
> > support (default: on)\n"
> > +"memfd-alloc=on|off controls allocating 
> > anonymous guest RAM using memfd_create (default: off)\n"
> 
>  Question: are there any disadvantages associated with using
>  memfd_create? I guess we are using up an fd, but that seems minor.  
>  Any
>  reason not to set to on by default? maybe with a fallback option to
>  disable that?  
> >>
> >> Old Linux host kernels, circa 4.1, do not support huge pages for 
> >> shared memory.
> >> Also, the tunable to enable huge pages for share memory is different 
> >> than for
> >> anon memory, so there could be performance loss if it is not set 
> >> correctly.
> >> /sys/kernel/mm/transparent_hugepage/enabled
> >> vs
> >> /sys/kernel/mm/transparent_hugepage/shmem_enabled 

Re: [PATCH] block-qdict: Fix -Werror=maybe-uninitialized build failure

2022-03-11 Thread Markus Armbruster
Murilo Opsfelder Araujo  writes:

> Building QEMU on Fedora 37 (Rawhide Prerelease) ppc64le failed with the
> following error:
>
> $ ../configure --prefix=/usr/local/qemu-disabletcg 
> --target-list=ppc-softmmu,ppc64-softmmu --disable-tcg --disable-linux-user
> ...
> $ make -j$(nproc)
> ...
> FAILED: libqemuutil.a.p/qobject_block-qdict.c.o
> cc -m64 -mlittle-endian -Ilibqemuutil.a.p -I. -I.. 
> -Isubprojects/libvhost-user -I../subprojects/libvhost-user -Iqapi -Itrace 
> -Iui -Iui/shader -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
> -I/usr/include/sysprof-4 -I/usr/include/lib
> mount -I/usr/include/blkid -I/usr/include/gio-unix-2.0 
> -I/usr/include/p11-kit-1 -I/usr/include/pixman-1 -fdiagnostics-color=auto 
> -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g -isystem 
> /root/qemu/linux-headers -isystem linux-headers -iquote
>  . -iquote /root/qemu -iquote /root/qemu/include -iquote 
> /root/qemu/disas/libvixl -pthread -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 
> -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
> -Wredundant-decls -Wundef -Wwrite
> -strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -Wold-style-declaration -Wold-style-definition -Wtype-limits 
> -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body 
> -Wnested-externs -Wendif-label
> s -Wexpansion-to-defined -Wimplicit-fallthrough=2 
> -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi 
> -fstack-protector-strong -fPIE -MD -MQ 
> libqemuutil.a.p/qobject_block-qdict.c.o -MF 
> libqemuutil.a.p/qobject_block-qdict.c.o.d -
> o libqemuutil.a.p/qobject_block-qdict.c.o -c ../qobject/block-qdict.c
> In file included from /root/qemu/include/qapi/qmp/qdict.h:16,
>  from /root/qemu/include/block/qdict.h:13,
>  from ../qobject/block-qdict.c:11:
> /root/qemu/include/qapi/qmp/qobject.h: In function ‘qdict_array_split’:
> /root/qemu/include/qapi/qmp/qobject.h:49:17: error: ‘subqdict’ may be 
> used uninitialized [-Werror=maybe-uninitialized]
>49 | typeof(obj) _obj = (obj);   \
>   | ^~~~
> ../qobject/block-qdict.c:227:16: note: ‘subqdict’ declared here
>   227 | QDict *subqdict;
>   |^~~~
> cc1: all warnings being treated as errors
>
> Fix build failure by initializing the QDict variable with NULL.
>
> Signed-off-by: Murilo Opsfelder Araujo 
> Cc: Kevin Wolf 
> Cc: Hanna Reitz 
> Cc: Markus Armbruster 
> ---
>  qobject/block-qdict.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/qobject/block-qdict.c b/qobject/block-qdict.c
> index 1487cc5dd8..b26524429c 100644
> --- a/qobject/block-qdict.c
> +++ b/qobject/block-qdict.c
> @@ -224,7 +224,7 @@ void qdict_array_split(QDict *src, QList **dst)
>  for (i = 0; i < UINT_MAX; i++) {
>  QObject *subqobj;
>  bool is_subqdict;
> -QDict *subqdict;
> +QDict *subqdict = NULL;
>  char indexstr[32], prefix[32];
>  size_t snprintf_ret;

The compiler's warning is actually spurious.  Your patch is the
minimally invasive way to shut it up.  But I wonder whether we can
make the code clearer instead.  Let's have a look:

   /*
* There may be either a single subordinate object (named
* "%u") or multiple objects (each with a key prefixed "%u."),
* but not both.
*/
   if (!subqobj == !is_subqdict) {
   break;

Because of this, ...

   }

   if (is_subqdict) {

... subqobj is non-null here, and ...

   qdict_extract_subqdict(src, , prefix);
   assert(qdict_size(subqdict) > 0);
   } else {

... null here.

   qobject_ref(subqobj);
   qdict_del(src, indexstr);
   }

   qlist_append_obj(*dst, subqobj ?: QOBJECT(subqdict));

What about this:

   if (is_subqdict) {
   qdict_extract_subqdict(src, , prefix);
   assert(qdict_size(subqdict) > 0);
   qlist_append_obj(*dst, subqobj);
   } else {
   qobject_ref(subqobj);
   qdict_del(src, indexstr);
   qlist_append_obj(*dst, QOBJECT(subqdict));
   }




Re: [PATCH experiment 00/35] stackless coroutine backend

2022-03-11 Thread Stefan Hajnoczi
On Thu, Mar 10, 2022 at 09:14:07PM +0100, Paolo Bonzini wrote:
> On 3/10/22 18:42, Stefan Hajnoczi wrote:
> > There are a lot of details to decide on in the translator tool and
> > runtime to optimize the code. I think the way the stack frames are
> > organized in this patch series is probably for convenience rather than
> > performance.
> 
> Yes, sometimes the optimizations are there but mostly because they made my
> job easier.
> 
> > Out of curiousity, did you run the perf tests and compare against
> > ucontext?
> 
> Not quite voluntarily, but I noticed I had to add one 0 to make them run for
> a decent amount of time.  So yeah, it's much faster than siglongjmp.

That's a nice first indication that performance will be good. I guess
that deep coroutine_fn stacks could be less efficient with stackless
coroutines compared to ucontext, but the cost of switching between
coroutines (enter/yield) will be lower with stackless coroutines.

Stefan


signature.asc
Description: PGP signature


Re: MAINTAINERS: macOS host support (was: MAINTAINERS: take edk2)

2022-03-11 Thread Daniel P . Berrangé
On Fri, Mar 11, 2022 at 10:13:24AM +0100, Christian Schoenebeck wrote:
> On Donnerstag, 10. März 2022 12:40:06 CET Philippe Mathieu-Daudé wrote:
> > +Stefan for overall project resources.
> > 
> > On 10/3/22 12:07, Daniel P. Berrangé wrote:
> > > On Thu, Mar 10, 2022 at 12:00:35PM +0100, Christian Schoenebeck wrote:
> > >> On Mittwoch, 9. März 2022 12:44:16 CET Daniel P. Berrangé wrote:
> > >>> On Wed, Mar 09, 2022 at 11:40:42AM +0100, Christian Schoenebeck wrote:
> >  On Mittwoch, 9. März 2022 11:05:02 CET Philippe Mathieu-Daudé wrote:
> > > Not sure what you have in mind. I'm totally new to the macOS/Darwin
> > > world, and have no choice but to use it as primary workstation and
> > > for CI builds, so I can help with overall testing / maintenance.
> > > 
> > > Peter, since you take some macOS patches, would you like to maintain
> > > this officially? Since I doubt you want to take yet another
> > > responsibility, what about having a co-maintained section, including
> > > technical expertise from Akihiko / Joelle / Christian? (Cc'ed)
> > > 
> > > Regards,
> >  
> >  Also CCing Cameron on this, just in case someone at Apple could spend
> >  some
> >  slices on QEMU macOS patches in general as well.
> >  
> >  As for my part: I try to help out more on the macOS front. As there's
> >  now
> >  macOS host support for 9p I have to start QEMU testing on macOS locally
> >  anyway. Too bad that macOS CI tests on Github are no longer available
> >  BTW.
> > >>> 
> > >>> Note QEMU gets macOS CI coverage in GitLab. We use a clever trick by
> > >>> which we use 'cirrus-run' from the GitLab job to trigger a build in
> > >>> Cirrus CI's macOS builders, and pull the results back when its done.
> > >>> 
> > >>> Any contributor can get this working on their QEMU fork too, if they
> > >>> configure the needed Cirrus CI API token. See the docs in
> > >>> 
> > >>> .gitlab-ci.d/cirrus/README.rst
> > >>> 
> > >>> This is enough for build + automated tests.
> > >> 
> > >> Does this mean that people no longer have to pull their credit card just
> > >> for running CI tests on Gitlab?
> > > 
> > > Not really. The CC validation is something GitLab have had to force
> > > onto all new accounts due to cryptominer abuse of their free shared
> > > CI runners :-( If you have VMs somewhere you could theoretically
> > > spin up your own CI runners instead of using the shared runners and
> > > that could avoid the CC validation need.
> > 
> > Not that trivial, first you need to figure out the list of dependencies
> > GitLab images come with, then you realize you need 50GiB+ of available
> > storage a single pipeline (due to all the Docker images pulled / built)
> > and you also need a decent internet link otherwise various jobs timeout
> > randomly, then you have to wait 20h+ with a quad-core CPU / 16GiB RAM,
> 
> Considering that CI jobs currently take about 1 hour on Gitlab, which 
> processor generation are you referring to that would take 20 hours?

You're not taking into account parallelism. The GitLab pipeline takes
1 hour wallclock time, which is not the same as 1 hour CPU time. We
probably have 20+ jobs running in parallel on gitlab, as they get
farmed out to many machines. If you have only a single machine at your
disposal, then you'll have much less prallelism, so overall time can
be much longer.

> > and eventually you realize you lost 3 days of your life to not register
> > your CC which you'll be forced to give anyway.
> 
> It's an obstacle. And that keeps people away. Plus the trend seems to be that 
> free CI services disappear one by one, so I am not so sure that giving your 
> credit card once solves this issue for good.

The CC requirement there is primarily to act as an identity check
on accounts, so they have some mechanism to discourage and/or trace
abusive users. You can use it to purchase extra CI time, but they've
stated multiple times their intention to continue to grant free CI
time to open source projects and their contributors. They are actively
discussing their plans with a number of open source project contributors
including myself on behalf of QEMU, to better understand our needs. I
outlined my current understanding of their intentions here:

 https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg03962.html

> > Long term maintainers don't realize that because they had the luxury to
> > open their GitLab account soon enough and are now privileged.
> 
> Would it be possible to deploy all CI jobs via Cirrus-CI?

Not unless you want to wait 10 hours for the pipeline to finish. Cirrus
CI only lets you run 2 jobs at a time. It also doesn't have any integrated
container registry which we rely on for creatnig our build env.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: 

Re: MAINTAINERS: macOS host support (was: MAINTAINERS: take edk2)

2022-03-11 Thread Christian Schoenebeck
On Donnerstag, 10. März 2022 12:40:06 CET Philippe Mathieu-Daudé wrote:
> +Stefan for overall project resources.
> 
> On 10/3/22 12:07, Daniel P. Berrangé wrote:
> > On Thu, Mar 10, 2022 at 12:00:35PM +0100, Christian Schoenebeck wrote:
> >> On Mittwoch, 9. März 2022 12:44:16 CET Daniel P. Berrangé wrote:
> >>> On Wed, Mar 09, 2022 at 11:40:42AM +0100, Christian Schoenebeck wrote:
>  On Mittwoch, 9. März 2022 11:05:02 CET Philippe Mathieu-Daudé wrote:
> > Not sure what you have in mind. I'm totally new to the macOS/Darwin
> > world, and have no choice but to use it as primary workstation and
> > for CI builds, so I can help with overall testing / maintenance.
> > 
> > Peter, since you take some macOS patches, would you like to maintain
> > this officially? Since I doubt you want to take yet another
> > responsibility, what about having a co-maintained section, including
> > technical expertise from Akihiko / Joelle / Christian? (Cc'ed)
> > 
> > Regards,
>  
>  Also CCing Cameron on this, just in case someone at Apple could spend
>  some
>  slices on QEMU macOS patches in general as well.
>  
>  As for my part: I try to help out more on the macOS front. As there's
>  now
>  macOS host support for 9p I have to start QEMU testing on macOS locally
>  anyway. Too bad that macOS CI tests on Github are no longer available
>  BTW.
> >>> 
> >>> Note QEMU gets macOS CI coverage in GitLab. We use a clever trick by
> >>> which we use 'cirrus-run' from the GitLab job to trigger a build in
> >>> Cirrus CI's macOS builders, and pull the results back when its done.
> >>> 
> >>> Any contributor can get this working on their QEMU fork too, if they
> >>> configure the needed Cirrus CI API token. See the docs in
> >>> 
> >>> .gitlab-ci.d/cirrus/README.rst
> >>> 
> >>> This is enough for build + automated tests.
> >> 
> >> Does this mean that people no longer have to pull their credit card just
> >> for running CI tests on Gitlab?
> > 
> > Not really. The CC validation is something GitLab have had to force
> > onto all new accounts due to cryptominer abuse of their free shared
> > CI runners :-( If you have VMs somewhere you could theoretically
> > spin up your own CI runners instead of using the shared runners and
> > that could avoid the CC validation need.
> 
> Not that trivial, first you need to figure out the list of dependencies
> GitLab images come with, then you realize you need 50GiB+ of available
> storage a single pipeline (due to all the Docker images pulled / built)
> and you also need a decent internet link otherwise various jobs timeout
> randomly, then you have to wait 20h+ with a quad-core CPU / 16GiB RAM,

Considering that CI jobs currently take about 1 hour on Gitlab, which 
processor generation are you referring to that would take 20 hours?

> and eventually you realize you lost 3 days of your life to not register
> your CC which you'll be forced to give anyway.

It's an obstacle. And that keeps people away. Plus the trend seems to be that 
free CI services disappear one by one, so I am not so sure that giving your 
credit card once solves this issue for good.

> Long term maintainers don't realize that because they had the luxury to
> open their GitLab account soon enough and are now privileged.

Would it be possible to deploy all CI jobs via Cirrus-CI?

> It is unfortunate the project strongly suggest new maintainers to pass
> by that hassle and doesn't provide access to project resources instead.
> 
> But then I know, while the project has access to FOSS hardware resources
> it doesn't have human resources to maintain them so can't use them, back
> to square one.
> 
> Regards,
> 
> Phil.





Re: [PATCH v5 03/13] mm/shmem: Support memfile_notifier

2022-03-11 Thread Chao Peng
On Fri, Mar 11, 2022 at 10:08:22AM +1100, Dave Chinner wrote:
> On Thu, Mar 10, 2022 at 10:09:01PM +0800, Chao Peng wrote:
> > From: "Kirill A. Shutemov" 
> > 
> > It maintains a memfile_notifier list in shmem_inode_info structure and
> > implements memfile_pfn_ops callbacks defined by memfile_notifier. It
> > then exposes them to memfile_notifier via
> > shmem_get_memfile_notifier_info.
> > 
> > We use SGP_NOALLOC in shmem_get_lock_pfn since the pages should be
> > allocated by userspace for private memory. If there is no pages
> > allocated at the offset then error should be returned so KVM knows that
> > the memory is not private memory.
> > 
> > Signed-off-by: Kirill A. Shutemov 
> > Signed-off-by: Chao Peng 
> > ---
> >  include/linux/shmem_fs.h |  4 +++
> >  mm/shmem.c   | 76 
> >  2 files changed, 80 insertions(+)
> > 
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index 2dde843f28ef..7bb16f2d2825 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -9,6 +9,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  /* inode in-kernel data */
> >  
> > @@ -28,6 +29,9 @@ struct shmem_inode_info {
> > struct simple_xattrsxattrs; /* list of xattrs */
> > atomic_tstop_eviction;  /* hold when working on inode */
> > unsigned intxflags; /* shmem extended flags */
> > +#ifdef CONFIG_MEMFILE_NOTIFIER
> > +   struct memfile_notifier_list memfile_notifiers;
> > +#endif
> > struct inodevfs_inode;
> >  };
> >  
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 9b31a7056009..7b43e274c9a2 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -903,6 +903,28 @@ static struct folio *shmem_get_partial_folio(struct 
> > inode *inode, pgoff_t index)
> > return page ? page_folio(page) : NULL;
> >  }
> >  
> > +static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t 
> > end)
> > +{
> > +#ifdef CONFIG_MEMFILE_NOTIFIER
> > +   struct shmem_inode_info *info = SHMEM_I(inode);
> > +
> > +   memfile_notifier_fallocate(>memfile_notifiers, start, end);
> > +#endif
> > +}
> 
> *notify_populate(), not fallocate.  This is a notification that a
> range has been populated, not that the fallocate() syscall was run
> to populate the backing store of a file.
> 
> i.e.  fallocate is the name of a userspace filesystem API that can
> be used to manipulate the backing store of a file in various ways.
> It can both populate and punch away the backing store of a file, and
> some operations that fallocate() can run will do both (e.g.
> FALLOC_FL_ZERO_RANGE) and so could generate both
> notify_invalidate() and a notify_populate() events.

Yes, I fully agreed fallocate syscall has both populating and hole
punching semantics so notify_fallocate can be misleading since we
actually mean populate here.

> 
> Hence "fallocate" as an internal mm namespace or operation does not
> belong anywhere in core MM infrastructure - it should never get used
> anywhere other than the VFS/filesystem layers that implement the
> fallocate() syscall or use it directly.

Will use your suggestion through the series where applied. Thanks for
your suggestion.

Chao
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> da...@fromorbit.com



Re: [PATCH] target/riscv: write back unmodified value for csrrc/csrrs with rs1 is not x0 but holding zero

2022-03-11 Thread Weiwei Li



在 2022/3/11 下午3:54, Alistair Francis 写道:

On Fri, Mar 11, 2022 at 2:58 PM Weiwei Li  wrote:


在 2022/3/11 上午10:58, Alistair Francis 写道:

On Wed, Mar 2, 2022 at 11:50 PM Weiwei Li  wrote:

   For csrrs and csrrc, if rs1 specifies a register other than x0, holding
   a zero value, the instruction will still attempt to write the unmodified
   value back to the csr and will cause side effects

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
   target/riscv/csr.c   | 46 
   target/riscv/op_helper.c |  7 +-
   2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index aea82dff4a..f4774ca07b 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -2872,7 +2872,7 @@ static RISCVException write_upmbase(CPURISCVState *env, 
int csrno,

   static inline RISCVException riscv_csrrw_check(CPURISCVState *env,
  int csrno,
-   bool write_mask,
+   bool write_csr,
  RISCVCPU *cpu)
   {
   /* check privileges and return RISCV_EXCP_ILLEGAL_INST if check fails */
@@ -2895,7 +2895,7 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
   return RISCV_EXCP_ILLEGAL_INST;
   }
   #endif
-if (write_mask && read_only) {
+if (write_csr && read_only) {
   return RISCV_EXCP_ILLEGAL_INST;
   }

@@ -2915,7 +2915,8 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
   static RISCVException riscv_csrrw_do64(CPURISCVState *env, int csrno,
  target_ulong *ret_value,
  target_ulong new_value,
-   target_ulong write_mask)
+   target_ulong write_mask,
+   bool write_csr)
   {
   RISCVException ret;
   target_ulong old_value;
@@ -2935,8 +2936,8 @@ static RISCVException riscv_csrrw_do64(CPURISCVState 
*env, int csrno,
   return ret;
   }

-/* write value if writable and write mask set, otherwise drop writes */
-if (write_mask) {
+/* write value if needed, otherwise drop writes */
+if (write_csr) {
   new_value = (old_value & ~write_mask) | (new_value & write_mask);
   if (csr_ops[csrno].write) {
   ret = csr_ops[csrno].write(env, csrno, new_value);
@@ -2960,18 +2961,27 @@ RISCVException riscv_csrrw(CPURISCVState *env, int 
csrno,
   {
   RISCVCPU *cpu = env_archcpu(env);

-RISCVException ret = riscv_csrrw_check(env, csrno, write_mask, cpu);
+/*
+ * write value when write_mask is set or rs1 is not x0 but holding zero
+ * value for csrrc(new_value is zero) and csrrs(new_value is all-ones)

I don't understand this. Won't write_mask also be zero and when reading?

Alistair


Yeah. It's true. To distinguish only-read operation with the special
write case(write_mask = 0), I also modified the new_value of riscv_csrrw
from 0 to 1 in helper_csrr :

   target_ulong helper_csrr(CPURISCVState *env, int csr)
   {
   target_ulong val = 0;
-RISCVException ret = riscv_csrrw(env, csr, , 0, 0);
+
+/*
+ * new_value here should be none-zero or none-all-ones here to
+ * distinguish with csrrc/csrrs with rs1 is not x0 but holding zero value
+ */
+RISCVException ret = riscv_csrrw(env, csr, , 1, 0);

This is confusing though and I worry a future change will break this.
I think we should be explicit instead of using special combinations of
masks. What if a write operation occurred that wanted to write 1 with
a mark of 0?


 When we write csr, if the mask is zero,  the new_value will be ignored 
and write back the original value.


So choose any none-zero and none-all-ones value here is OK. and a new 
instruction to write 1 with a mark of 0


seems  unnecessary. I have no idea what future change may break this 
currently.




The two options seem to either add a check for the seed CSR in
helper_csrr() to fault if the address matches. That's not the best as
then we have specific code, but this requirement seems pretty specific
as well so it's probably ok.


The side effect of writing CSR with original value is ignored in 
previous code.  So I think it's a missing function,


not only the requirement of seed csr.



The other option would be to modify riscv_csrrw() to explicitly pass
in a `bool write_op` that you check against.


I agree that it may be more intuitive and easy to understand if we 
explicitly pass a new "write_op" argument.


I'll try this. Maybe we can judge which one is better later.



Alistair


   if (ret != RISCV_EXCP_NONE) {
   riscv_raise_exception(env, ret, GETPC());


After modification, the cases for all csr related instructions is as follows:

index