Re: [PATCH v2 3/3] savevm: check RAM is pagesize aligned

2020-02-27 Thread Juan Quintela
Aleksandar Markovic  wrote:
> On Thursday, February 27, 2020, Juan Quintela  wrote:
>
>  Marc-André Lureau  wrote:
>  > Hi Juan
>  >
>  > On Wed, Jan 8, 2020 at 2:08 PM Juan Quintela  wrote:
>  >>
>  >> Marc-André Lureau  wrote:
>  >> > Check the host pointer is correctly aligned, otherwise we may fail
>  >> > during migration in ram_block_discard_range().
>  >> >
>  >> > Signed-off-by: Marc-André Lureau 
>  >>
>  >> Reviewed-by: Juan Quintela 
>  >>
>  >> queued
>  >>
>  >
>  > Did it get lost? thanks
>
>  I dropped it in the past, because it made "make check" for mips fail.
>  (I put it on my ToDo list to investigate and forgot about it)
>
> Thank you for caring for mips.

You are welcome.
But you need to thank "make check": it didn't pass.

> Do you perhaps remember what were the test and environment for the failing 
> test?

It was plain "make check" with everything under the sun compiled in.
Clearly it was one of the other patches, or an interaction between them,
that failed.  I don't remember the error, sorry.  I dropped the patch to
my ToDo list of things to investigate (the patch was "obviously" correct)
and forgot about it.

Later, Juan.




[PATCH] accel/tcg: increase default code gen buffer size for 64 bit

2020-02-27 Thread Alex Bennée
While 32mb is certainly usable, a full system boot ends up flushing the
codegen buffer nearly 100 times. Increase the default on 64 bit hosts
to take advantage of all that spare memory. After this change I can
boot my test system without any TB flushes.

As we usually run multiple CONFIG_USER binaries at a time in typical
usage, we aren't quite as profligate with user-mode code generation.
We also bring the static code gen defines to the same place to keep all
the reasoning in the comments together.

Signed-off-by: Alex Bennée 
Tested-by: Niek Linnenbank 

---
v3
  - 2gb->1gb for system emulation
  - split user and system emulation buffer sizes
---
 accel/tcg/translate-all.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4ce5d1b3931..78914154bfc 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -892,15 +892,6 @@ static void page_lock_pair(PageDesc **ret_p1, 
tb_page_addr_t phys1,
 }
 }
 
-#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
-/*
- * For user mode on smaller 32 bit systems we may run into trouble
- * allocating big chunks of data in the right place. On these systems
- * we utilise a static code generation buffer directly in the binary.
- */
-#define USE_STATIC_CODE_GEN_BUFFER
-#endif
-
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
but not so small that we can't have a fair number of TB's live.  */
 #define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
@@ -929,7 +920,33 @@ static void page_lock_pair(PageDesc **ret_p1, 
tb_page_addr_t phys1,
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
+#if TCG_TARGET_REG_BITS == 32
 #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
+#ifdef CONFIG_USER_ONLY
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
+#define USE_STATIC_CODE_GEN_BUFFER
+#endif
+#else /* TCG_TARGET_REG_BITS == 64 */
+#ifdef CONFIG_USER_ONLY
+/*
+ * As user-mode emulation typically means running multiple instances
+ * of the translator don't go too nuts with our default code gen
+ * buffer lest we make things too hard for the OS.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
+#else
+/*
+ * We expect most system emulation to run one or two guests per host.
+ * Users running large scale system emulation may want to tweak their
+ * runtime setup via the tb-size control on the command line.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
+#endif
+#endif
 
#define DEFAULT_CODE_GEN_BUFFER_SIZE \
  (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
   ? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
-- 
2.20.1
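The ifdef ladder in the patch reduces to a small decision table. A hedged Python sketch of that selection logic (sizes in MiB/GiB, mirroring the macros above; the function name is illustrative, not from the patch):

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def default_code_gen_buffer_size(reg_bits, user_only, max_size):
    """Mirror the DEFAULT_CODE_GEN_BUFFER_SIZE_1 selection in the patch:
    32 MiB on 32-bit hosts, 128 MiB for 64-bit user-mode emulation,
    1 GiB for 64-bit system emulation, clamped to MAX_CODE_GEN_BUFFER_SIZE."""
    if reg_bits == 32:
        size = 32 * MiB
    elif user_only:
        size = 128 * MiB
    else:
        size = 1 * GiB
    # The DEFAULT_CODE_GEN_BUFFER_SIZE macro clamps against the
    # host/target maximum.
    return min(size, max_size)
```

A quick sanity check against the values discussed in the thread: a 64-bit system-emulation host with an unrestricted maximum gets 1 GiB, while a target with a 256 MiB cap is clamped down to it.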




Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit

2020-02-27 Thread Igor Mammedov
On Thu, 27 Feb 2020 20:07:24 +0100
Niek Linnenbank  wrote:

> Hi Richard,
> 
> On Thu, Feb 27, 2020 at 1:57 PM Richard Henderson <
> richard.hender...@linaro.org> wrote:  
> 
> > On 2/27/20 4:31 AM, Alex Bennée wrote:  
> > >> It does not make sense for a linux-user chroot, running make -jN, on
> > >> just about any host.  For linux-user, I could be happy with a modest
> > >> increase, but not all the way out to 2GiB.
> > >>
> > >> Discuss.  
> > >
> > > Does it matter that much? Surely for small programs the kernel just
> > > never pages in the used portions of the mmap?  
> >
> > That's why I used the example of a build under the chroot, because the
> > compiler
> > is not a small program.
> >
> > Consider when the memory *is* used, and N * 2GB implies lots of paging,
> > where
> > the previous N * 32MB did not.
> >
> > I agree that a lower default value probably is safer until we have more  
> proof that a larger value does not give any issues.
> 
> 
> > I'm saying that we should consider a setting more like 128MB or so, since
> > the
> > value cannot be changed from the command-line, or through the environment.
> >  
> 
> Proposal: can we then introduce a new command line parameter for this?
> Maybe in a new patch?

linux-user currently uses a 32Mb static buffer, so it is probably fine to
leave it as is or bump it to 128Mb regardless of the 32/64-bit host.

For system emulation, we already have the tb-size option to set a
user-specified buffer size.

The issue with system emulation is that it sizes the buffer to 1/4 of
ram_size, and the dependency on ram_size is what we are trying to get
rid of. If we consider unit/acceptance tests as the main target/user,
then they mostly use the default ram_size value, which varies mostly
from 16Mb to 1Gb depending on the board. So the buffer size used is
in the 4-256Mb range.
Considering that the current CI runs fine with a 256Mb buffer at most,
it might make sense to use that as the new heuristic, which would not
regress our test infrastructure and might improve performance
for boards where a smaller default buffer was used.
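The 1/4-of-ram_size heuristic described above works out as follows. This is a sketch of the existing sizing rule under discussion, not of any proposed code; the function name is made up for illustration:

```python
MiB = 1024 * 1024


def tb_buffer_from_ram(ram_size):
    # Current system-emulation heuristic: the code generation buffer
    # is sized to 1/4 of guest RAM.
    return ram_size // 4


# Default board RAM in the test suite spans roughly 16 MiB to 1 GiB,
# so the derived buffer spans 4 MiB to 256 MiB, matching the
# "4-256Mb range" quoted in the thread.
low = tb_buffer_from_ram(16 * MiB)
high = tb_buffer_from_ram(1024 * MiB)
```

For reference, the user-controlled override already exists for system emulation via the command line, e.g. `-accel tcg,tb-size=<MiB>`.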


> Since the size of the code generation buffer appears to have an impact on
> performance,
> in my opinion it would make sense to make it configurable by the user.
> 
> Regards,
> 
> 
> >
> >
> > r~
> >
> >  
> 




Re: [PATCH v2 2/6] util: Replace fprintf(stderr, "*\n" with error_report()

2020-02-27 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> From: Alistair Francis 
>
> Replace a large number of the fprintf(stderr, "*\n" calls with
> error_report(). The functions were renamed with these commands, and then
> compiler issues were fixed manually.
>
> find ./* -type f -exec sed -i \
> 'N;N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N;N;N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N;N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N;N; {s|fprintf(stderr, 
> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' 
> \
> {} +
> find ./* -type f -exec sed -i \
> 'N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
> find ./* -type f -exec sed -i \
> 'N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
> {} +
>
> The error in aio_poll() was removed manually.
>
> Reviewed-by: Philippe Mathieu-Daudé 
> Message-Id: 
> 
> Signed-off-by: Alistair Francis 
> [PMD: Rebased]
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Alistair Francis 
> Cc: Alistair Francis 
> ---
>  util/coroutine-sigaltstack.c |  3 ++-
>  util/mmap-alloc.c| 11 ++-
>  util/module.c| 13 ++---
>  util/osdep.c |  4 ++--
>  util/oslib-posix.c   |  3 ++-
>  util/oslib-win32.c   |  3 ++-
>  util/qemu-coroutine.c| 10 +-
>  util/qemu-thread-posix.c |  5 +++--
>  util/qemu-thread-win32.c |  5 +++--
>  util/qemu-timer-common.c |  3 ++-
>  10 files changed, 33 insertions(+), 27 deletions(-)
>
> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
> index f6fc49a0e5..63decd4d1d 100644
> --- a/util/coroutine-sigaltstack.c
> +++ b/util/coroutine-sigaltstack.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include "qemu-common.h"
>  #include "qemu/coroutine_int.h"
> +#include "qemu/error-report.h"
>  
>  typedef struct {
>  Coroutine base;
> @@ -80,7 +81,7 @@ static void __attribute__((constructor)) 
> coroutine_init(void)
>  
>  ret = pthread_key_create(&thread_state_key, 
> qemu_coroutine_thread_cleanup);
>  if (ret != 0) {
> -fprintf(stderr, "unable to create leader key: %s\n", 
> strerror(errno));
> +error_report("unable to create leader key: %s", strerror(errno));
>  abort();
>  }
>  }
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 27dcccd8ec..3ac6e10404 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -18,6 +18,7 @@
>  #endif /* CONFIG_LINUX */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include "qemu/mmap-alloc.h"
>  #include "qemu/host-utils.h"
>  
> @@ -63,7 +64,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>  } while (ret != 0 && errno == EINTR);
>  
>  if (ret != 0) {
> -fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> +error_report("Couldn't statfs() memory path: %s",
>  strerror(errno));

Indentation is off.

>  exit(1);
>  }
> @@ -160,10 +161,10 @@ void *qemu_ram_mmap(int fd,
>  len = 0;
>  }
>  file_name[len] = '\0';
> -fprintf(stderr, "Warning: requesting persistence across crashes "
> -"for backend file %s failed. Proceeding without "
> -"persistence, data might become corrupted in case of 
> host "
> -"crash.\n", file_name);
> +error_report("Warning: requesting persistence across crashes "
> + "for backend file %s failed. Proceeding without "
> + "persistence, data might become corrupted in case "
> + "of host crash.", file_name);

This should be something like

   warn_report("requesting persistence across crashes"
   " for backend file %s failed",
   file_name);
   error_printf("Proceeding without persistence, data might"
" become corrupted in case of host crash.\n");

RE: [PATCH v2 02/13] block/iscsi:Remove redundant statement in iscsi_open()

2020-02-27 Thread Chenqun (kuhn)
>-Original Message-
>From: Kevin Wolf [mailto:kw...@redhat.com]
>Sent: Thursday, February 27, 2020 6:31 PM
>To: Chenqun (kuhn) 
>Cc: qemu-devel@nongnu.org; qemu-triv...@nongnu.org;
>peter.mayd...@linaro.org; Zhanghailiang ;
>Euler Robot ; Ronnie Sahlberg
>; Paolo Bonzini ; Peter
>Lieven ; Max Reitz 
>Subject: Re: [PATCH v2 02/13] block/iscsi:Remove redundant statement in
>iscsi_open()
>
>Am 27.02.2020 um 02:49 hat Chenqun (kuhn) geschrieben:
>> >-Original Message-
>> >From: Kevin Wolf [mailto:kw...@redhat.com]
>> >Sent: Wednesday, February 26, 2020 5:55 PM
>> >To: Chenqun (kuhn) 
>> >Cc: qemu-devel@nongnu.org; qemu-triv...@nongnu.org;
>> >peter.mayd...@linaro.org; Zhanghailiang
>> >; Euler Robot
>> >; Ronnie Sahlberg
>;
>> >Paolo Bonzini ; Peter Lieven ; Max
>> >Reitz 
>> >Subject: Re: [PATCH v2 02/13] block/iscsi:Remove redundant statement
>> >in
>> >iscsi_open()
>> >
>> >Am 26.02.2020 um 09:46 hat kuhn.chen...@huawei.com geschrieben:
>> >> From: Chen Qun 
>> >>
>> >> Clang static code analyzer show warning:
>> >>   block/iscsi.c:1920:9: warning: Value stored to 'flags' is never read
>> >> flags &= ~BDRV_O_RDWR;
>> >> ^
>> >>
>> >> Reported-by: Euler Robot 
>> >> Signed-off-by: Chen Qun 
>> >
>> >Hmm, I'm not so sure about this one because if we remove the line,
>> >flags will be inconsistent with bs->open_flags. It feels like setting
>> >a trap for anyone who wants to add code using flags in the future.
>> Hi Kevin,
>> I find it has existed since 8f3bf50d34037266.  :)
>
>Yes, it has existed from the start with auto-read-only.
>
>> It's not a big deal, it just upsets the clang static code analyzer.
>> As you said, it could be a trap for the future.
>
>What's interesting is that we do have one user of the flags later in the 
>function,
>but it uses bs->open_flags instead:
>
>ret = iscsi_allocmap_init(iscsilun, bs->open_flags);
>
>Maybe this should be using flags? (The value of the bits we're interested in is
>the same, but when flags is passed as a parameter, I would expect it to be
>used.)
>
Hi Kevin,
I have a question: is 'flags' exactly the same as 'bs->open_flags'?
In the function bdrv_open_common() in block.c, the statement
( open_flags = bdrv_open_flags(bs, bs->open_flags); ) makes them a
little different.
Could that make them inconsistent here?

Would it be safer if we assigned bs->open_flags to flags?
Modify just like:
@@ -1917,7 +1917,7 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 if (ret < 0) {
 goto out;
 }
-flags &= ~BDRV_O_RDWR;
+flags = bs->open_flags;
 }

 iscsi_readcapacity_sync(iscsilun, &local_err);
@@ -2002,7 +2002,7 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 iscsilun->cluster_size = iscsilun->bl.opt_unmap_gran *
 iscsilun->block_size;
 if (iscsilun->lbprz) {
-ret = iscsi_allocmap_init(iscsilun, bs->open_flags);
+ret = iscsi_allocmap_init(iscsilun, flags);
 }
 }

Thanks.





[PATCH v3 3/4] scripts/simplebench: add example usage of simplebench

2020-02-27 Thread Vladimir Sementsov-Ogievskiy
This example may be used as a template for a custom benchmark.
It illustrates the three things to prepare:
 - define bench_func
 - define test environments (columns)
 - define test cases (rows)
and the final call of the simplebench API.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/bench-example.py | 80 
 1 file changed, 80 insertions(+)
 create mode 100644 scripts/simplebench/bench-example.py

diff --git a/scripts/simplebench/bench-example.py 
b/scripts/simplebench/bench-example.py
new file mode 100644
index 00..c642a5b891
--- /dev/null
+++ b/scripts/simplebench/bench-example.py
@@ -0,0 +1,80 @@
+#!/usr/bin/env python3
+#
+# Benchmark example
+#
+# Copyright (c) 2019 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+
+import simplebench
+from bench_block_job import bench_block_copy, drv_file, drv_nbd
+
+
+def bench_func(env, case):
+""" Handle one "cell" of benchmarking table. """
+return bench_block_copy(env['qemu_binary'], env['cmd'],
+case['source'], case['target'])
+
+
+# You may set the following five variables to correct values, to turn this
+# example to real benchmark.
+ssd_source = '/path-to-raw-source-image-at-ssd'
+ssd_target = '/path-to-raw-target-image-at-ssd'
+hdd_target = '/path-to-raw-source-image-at-hdd'
+nbd_ip = 'nbd-ip-addr'
+nbd_port = 'nbd-port-number'
+
+# Test-cases are "rows" in benchmark resulting table, 'id' is a caption for
+# the row, other fields are handled by bench_func.
+test_cases = [
+{
+'id': 'ssd -> ssd',
+'source': drv_file(ssd_source),
+'target': drv_file(ssd_target)
+},
+{
+'id': 'ssd -> hdd',
+'source': drv_file(ssd_source),
+'target': drv_file(hdd_target)
+},
+{
+'id': 'ssd -> nbd',
+'source': drv_file(ssd_source),
+'target': drv_nbd(nbd_ip, nbd_port)
+},
+]
+
+# Test-envs are "columns" in benchmark resulting table, 'id is a caption for
+# the column, other fields are handled by bench_func.
+test_envs = [
+{
+'id': 'backup-1',
+'cmd': 'blockdev-backup',
+'qemu_binary': '/path-to-qemu-binary-1'
+},
+{
+'id': 'backup-2',
+'cmd': 'blockdev-backup',
+'qemu_binary': '/path-to-qemu-binary-2'
+},
+{
+'id': 'mirror',
+'cmd': 'blockdev-mirror',
+'qemu_binary': '/path-to-qemu-binary-1'
+}
+]
+
+result = simplebench.bench(bench_func, test_envs, test_cases, count=3)
+print(simplebench.ascii(result))
-- 
2.21.0
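The bench_func(env, case) contract shown in this example can be smoke-tested without QEMU by substituting a stub that returns the same {'seconds': ...} shape. The stub and the minimal table-builder below are hypothetical stand-ins that mirror the patch's interface, not code from the series:

```python
def stub_bench_func(env, case):
    # Stand-in for bench_block_copy(): pretend every cell takes 1 second.
    # A real bench_func would launch QEMU using env['qemu_binary'],
    # env['cmd'], case['source'] and case['target'].
    return {'seconds': 1.0}


def run_table(bench_func, test_envs, test_cases):
    # Minimal analogue of simplebench.bench(): rows are test cases,
    # columns are test environments.
    return {case['id']: {env['id']: bench_func(env, case)
                         for env in test_envs}
            for case in test_cases}


tab = run_table(stub_bench_func,
                [{'id': 'backup-1'}, {'id': 'mirror'}],
                [{'id': 'ssd -> ssd'}])
```

This makes it easy to verify the row/column wiring of a custom benchmark before pointing it at real images and binaries.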




[PATCH v3 0/4] benchmark util

2020-02-27 Thread Vladimir Sementsov-Ogievskiy
Hi all!

v3:
  move all to scripts/simplebench
  add myself as a maintainer of this thing

Here is simple benchmarking utility, to generate performance
comparison tables, like the following:

--  -  -  -
backup-1   backup-2   mirror
ssd -> ssd  0.43 +- 0.00   4.48 +- 0.06   4.38 +- 0.02
ssd -> hdd  10.60 +- 0.08  10.69 +- 0.18  10.57 +- 0.05
ssd -> nbd  33.81 +- 0.37  10.67 +- 0.17  10.07 +- 0.07
--  -  -  -

I'll use this benchmark in other series, hope someone
will like it.

Vladimir Sementsov-Ogievskiy (4):
  scripts/simplebench: add simplebench.py
  scripts/simplebench: add qemu/bench_block_job.py
  scripts/simplebench: add example usage of simplebench
  MAINTAINERS: add simplebench

 MAINTAINERS|   5 +
 scripts/simplebench/bench-example.py   |  80 
 scripts/simplebench/bench_block_job.py | 119 +++
 scripts/simplebench/simplebench.py | 128 +
 4 files changed, 332 insertions(+)
 create mode 100644 scripts/simplebench/bench-example.py
 create mode 100755 scripts/simplebench/bench_block_job.py
 create mode 100644 scripts/simplebench/simplebench.py

-- 
2.21.0




[PATCH v3 1/4] scripts/simplebench: add simplebench.py

2020-02-27 Thread Vladimir Sementsov-Ogievskiy
Add simple benchmark table creator.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/simplebench.py | 128 +
 1 file changed, 128 insertions(+)
 create mode 100644 scripts/simplebench/simplebench.py

diff --git a/scripts/simplebench/simplebench.py 
b/scripts/simplebench/simplebench.py
new file mode 100644
index 00..59e7314ff6
--- /dev/null
+++ b/scripts/simplebench/simplebench.py
@@ -0,0 +1,128 @@
+#!/usr/bin/env python
+#
+# Simple benchmarking framework
+#
+# Copyright (c) 2019 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+
+
+def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
+"""Benchmark one test-case
+
+test_func   -- benchmarking function with prototype
+   test_func(env, case), which takes test_env and test_case
+   arguments and returns {'seconds': int} (which is benchmark
+   result) on success and {'error': str} on error. Returned
+   dict may contain any other additional fields.
+test_env-- test environment - opaque first argument for test_func
+test_case   -- test case - opaque second argument for test_func
+count   -- how many times to call test_func, to calculate average
+initial_run -- do initial run of test_func, which don't get into result
+
+Returns dict with the following fields:
+'runs': list of test_func results
+'average':  average seconds per run (exists only if at least one run
+succeeded)
+'delta':maximum delta between test_func result and the average
+(exists only if at least one run succeeded)
+'n-failed': number of failed runs (exists only if at least one run
+failed)
+"""
+if initial_run:
+print('  #initial run:')
+print('   ', test_func(test_env, test_case))
+
+runs = []
+for i in range(count):
+print('  #run {}'.format(i+1))
+res = test_func(test_env, test_case)
+print('   ', res)
+runs.append(res)
+
+result = {'runs': runs}
+
+successed = [r for r in runs if ('seconds' in r)]
+if successed:
+avg = sum(r['seconds'] for r in successed) / len(successed)
+result['average'] = avg
+result['delta'] = max(abs(r['seconds'] - avg) for r in successed)
+
+if len(successed) < count:
+result['n-failed'] = count - len(successed)
+
+return result
+
+
+def ascii_one(result):
+"""Return ASCII representation of bench_one() returned dict."""
+if 'average' in result:
+s = '{:.2f} +- {:.2f}'.format(result['average'], result['delta'])
+if 'n-failed' in result:
+s += '\n({} failed)'.format(result['n-failed'])
+return s
+else:
+return 'FAILED'
+
+
+def bench(test_func, test_envs, test_cases, *args, **vargs):
+"""Fill benchmark table
+
+test_func -- benchmarking function, see bench_one for description
+test_envs -- list of test environments, see bench_one
+test_cases -- list of test cases, see bench_one
+args, vargs -- additional arguments for bench_one
+
+Returns dict with the following fields:
+'envs':  test_envs
+'cases': test_cases
+'tab':   filled 2D array, where cell [i][j] is bench_one result for
+ test_cases[i] for test_envs[j] (i.e., rows are test cases and
+ columns are test environments)
+"""
+tab = {}
+results = {
+'envs': test_envs,
+'cases': test_cases,
+'tab': tab
+}
+n = 1
+n_tests = len(test_envs) * len(test_cases)
+for env in test_envs:
+for case in test_cases:
+print('Testing {}/{}: {} :: {}'.format(n, n_tests,
+   env['id'], case['id']))
+if case['id'] not in tab:
+tab[case['id']] = {}
+tab[case['id']][env['id']] = bench_one(test_func, env, case,
+   *args, **vargs)
+n += 1
+
+print('Done')
+return results
+
+
+def ascii(results):
+"""Return ASCII representation of bench() returned dict."""
+from tabulate import tabulate
+
+tab = [[""] + [c['id'] for c in results['envs']]]
+for case in results[

[PATCH v3 2/4] scripts/simplebench: add qemu/bench_block_job.py

2020-02-27 Thread Vladimir Sementsov-Ogievskiy
Add block-job benchmarking helper functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/bench_block_job.py | 119 +
 1 file changed, 119 insertions(+)
 create mode 100755 scripts/simplebench/bench_block_job.py

diff --git a/scripts/simplebench/bench_block_job.py 
b/scripts/simplebench/bench_block_job.py
new file mode 100755
index 00..9808d696cf
--- /dev/null
+++ b/scripts/simplebench/bench_block_job.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python
+#
+# Benchmark block jobs
+#
+# Copyright (c) 2019 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+
+
+import sys
+import os
+import socket
+import json
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
+from qemu.machine import QEMUMachine
+from qemu.qmp import QMPConnectError
+
+
+def bench_block_job(cmd, cmd_args, qemu_args):
+"""Benchmark block-job
+
+cmd   -- qmp command to run block-job (like blockdev-backup)
+cmd_args  -- dict of qmp command arguments
+qemu_args -- list of Qemu command line arguments, including path to Qemu
+ binary
+
+Returns {'seconds': int} on success and {'error': str} on failure, dict may
+contain an additional 'vm-log' field. Return value is compatible with
+simplebench lib.
+"""
+
+vm = QEMUMachine(qemu_args[0], args=qemu_args[1:])
+
+try:
+vm.launch()
+except OSError as e:
+return {'error': 'popen failed: ' + str(e)}
+except (QMPConnectError, socket.timeout):
+return {'error': 'qemu failed: ' + str(vm.get_log())}
+
+try:
+res = vm.qmp(cmd, **cmd_args)
+if res != {'return': {}}:
+vm.shutdown()
+return {'error': '"{}" command failed: {}'.format(cmd, str(res))}
+
+e = vm.event_wait('JOB_STATUS_CHANGE')
+assert e['data']['status'] == 'created'
+start_ms = e['timestamp']['seconds'] * 1000000 + \
+e['timestamp']['microseconds']
+
+e = vm.events_wait((('BLOCK_JOB_READY', None),
+('BLOCK_JOB_COMPLETED', None),
+('BLOCK_JOB_FAILED', None)), timeout=True)
+if e['event'] not in ('BLOCK_JOB_READY', 'BLOCK_JOB_COMPLETED'):
+vm.shutdown()
+return {'error': 'block-job failed: ' + str(e),
+'vm-log': vm.get_log()}
+end_ms = e['timestamp']['seconds'] * 1000000 + \
+e['timestamp']['microseconds']
+finally:
+vm.shutdown()
+
+return {'seconds': (end_ms - start_ms) / 1000000.0}
+
+
+# Bench backup or mirror
+def bench_block_copy(qemu_binary, cmd, source, target):
+"""Helper to run bench_block_job() for mirror or backup"""
+assert cmd in ('blockdev-backup', 'blockdev-mirror')
+
+source['node-name'] = 'source'
+target['node-name'] = 'target'
+
+return bench_block_job(cmd,
+   {'job-id': 'job0', 'device': 'source',
+'target': 'target', 'sync': 'full'},
+   [qemu_binary,
+'-blockdev', json.dumps(source),
+'-blockdev', json.dumps(target)])
+
+
+def drv_file(filename):
+return {'driver': 'file', 'filename': filename,
+'cache': {'direct': True}, 'aio': 'native'}
+
+
+def drv_nbd(host, port):
+return {'driver': 'nbd',
+'server': {'type': 'inet', 'host': host, 'port': port}}
+
+
+if __name__ == '__main__':
+import sys
+
+if len(sys.argv) < 4:
+print('USAGE: {} <qmp block-job command name> '
+  '<json string of arguments for the command> '
+  '<qemu binary path and its arguments>'.format(sys.argv[0]))
+exit(1)
+
+res = bench_block_job(sys.argv[1], json.loads(sys.argv[2]), sys.argv[3:])
+if 'seconds' in res:
+print('{:.2f}'.format(res['seconds']))
+else:
+print(res)
-- 
2.21.0
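The elapsed-time arithmetic in bench_block_job() boils down to collapsing a QMP event timestamp (separate 'seconds' and 'microseconds' fields) into a single microsecond count, then taking the difference of two such counts. A self-contained sketch (helper name is made up):

```python
def qmp_ts_to_us(ts):
    # QMP events carry {'seconds': ..., 'microseconds': ...}; combine
    # them into one integer microsecond value, as the patch does with
    # e['timestamp'].
    return ts['seconds'] * 1000000 + ts['microseconds']


start = qmp_ts_to_us({'seconds': 10, 'microseconds': 500000})
end = qmp_ts_to_us({'seconds': 12, 'microseconds': 250000})
elapsed = (end - start) / 1000000.0
```

Keeping the intermediate values in integer microseconds avoids floating-point error until the final division back to seconds.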




[PATCH v3 4/4] MAINTAINERS: add simplebench

2020-02-27 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5e5e3e52d6..16d069adc5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2038,6 +2038,11 @@ F: python/qemu/*py
 F: scripts/*.py
 F: tests/*.py
 
+Benchmark util
+M: Vladimir Sementsov-Ogievskiy 
+S: Maintained
+F: scripts/simplebench/
+
 QAPI
 M: Markus Armbruster 
 M: Michael Roth 
-- 
2.21.0




Re: [PATCH v2] qapi/machine: Place the 'Notes' tag after the 'Since' tag

2020-02-27 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> On 2/27/20 3:55 PM, Philippe Mathieu-Daudé wrote:
>> On 2/27/20 3:52 PM, Markus Armbruster wrote:
>>> Philippe Mathieu-Daudé  writes:
>>>
 This fixes the following error when adding a 'Since' tag:

    In file included from qapi/qapi-schema.json:105:
    qapi/machine.json:25:1: '@arch:' can't follow 'Notes' section
>>>
>>> I'm confused.  This error is detected in scripts/qapi/parser.py, and it
>>> is fatal.  Is the build broken for you?  It isn't for me.  Moreover,
>>> where is @arch?  I can't see it anywhere close to the two spots the
>>> patch patches.
>>
>> I get the error after trying to fix what Eric commented here:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg682344.html
>
> Using:
> ---
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 6c11e3cf3a..40a36d6276 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -20,13 +20,15 @@
>  #prefix to produce the corresponding QEMU executable name. This
>  #is true even for "qemu-system-x86_64".
>  #
> +# @rx: since 5.0
> +#
>  # Since: 3.0
>  ##
>  { 'enum' : 'SysEmuTarget',
>'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
>   'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
>   'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
> - 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
> + 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
>   'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
>   'x86_64', 'xtensa', 'xtensaeb' ] }
> ---
>
> or
>
> ---
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 6c11e3cf3a..4b59e87b6f 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -21,12 +21,14 @@
>  #is true even for "qemu-system-x86_64".
>  #
>  # Since: 3.0
> +#
> +# @rx: since 5.0
>  ##
>  { 'enum' : 'SysEmuTarget',
>'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
>   'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
>   'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
> - 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
> + 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
>   'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
>   'x86_64', 'xtensa', 'xtensaeb' ] }
> ---
>
> I get:
>
>   GEN qapi-gen
>   GEN rx-softmmu/config-devices.mak
> In file included from qapi/qapi-schema.json:105:
> qapi/machine.json:23:1: '@rx:' can't follow 'Notes' section
> make: *** [Makefile:645: qapi-gen-timestamp] Error 1
>
> This works however:
>
> ---
>  ##
>  # @SysEmuTarget:
>  #
>  # The comprehensive enumeration of QEMU system emulation ("softmmu")
>  # targets. Run "./configure --help" in the project root directory, and
>  # look for the *-softmmu targets near the "--target-list" option. The
>  # individual target constants are not documented here, for the time
>  # being.
>  #
> +# @rx: since 5.0
> +#
>  # Notes: The resulting QMP strings can be appended to the "qemu-system-"
>  #prefix to produce the corresponding QEMU executable name. This
>  #is true even for "qemu-system-x86_64".
>  #
>  # Since: 3.0
>  ##
>  { 'enum' : 'SysEmuTarget',
>'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
>   'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
>   'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
> - 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
> + 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
>   'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
>   'x86_64', 'xtensa', 'xtensaeb' ] }
> ---

This one adds it to the correct spot.

qapi-code-gen.txt:

Definition documentation starts with a line naming the definition,
followed by an optional overview, a description of each argument (for
commands and events), member (for structs and unions), branch (for
alternates), or value (for enums), and finally optional tagged
sections.

Let's apply this to SysEmuTarget's doc comment:

##
# @SysEmuTarget:

Line naming the definition

#
# The comprehensive enumeration of QEMU system emulation ("softmmu")
# targets. Run "./configure --help" in the project root directory, and
# look for the *-softmmu targets near the "--target-list" option. The
# individual target constants are not documented here, for the time
# being.

Optional overview.

Missing here: a description of each value.  We should enforce such
descriptions.  We don't, mostly because we have a number of exceptions
where documentation would be bothersome, such as enum QKeyCode.

#
# Notes: The resulting QMP strings can be appended to the "qemu-system-"
#prefix to produce the corresponding QEMU executable name. This
#is true even for "qemu-system-x86_64".

A tagged section.

#
   

Re: ping Re: [PATCH for-5.0 v2 0/3] benchmark util

2020-02-27 Thread Vladimir Sementsov-Ogievskiy

27.02.2020 23:09, Eduardo Habkost wrote:

Sorry, this is due to a lack of bandwidth among maintainers who can
review those patches.

I have one suggestion: if you make your script self-contained
inside a scripts/ subdirectory, it would be simpler to merge it
without detailed reviews from others.


That works for me



The python/ subdirectory is supposed to appear on sys.path, so


Hmm. OK, I think we'll always be able to move the shareable parts of simplebench
into python/ if needed. I just thought that python/ was the new home for
Python scripts :) So it's OK to keep the whole thing at scripts/ for now.


maybe simplebench.py and qemu/bench_block_job.py can stay there,
but bench-example.py is not a loadable Python module and
shouldn't be there.

I see two possible options:

a) Moving everything to a scripts/simplebench subdirectory.
b) Moving only bench-example.py to scripts/, and do the sys.path
hacking the other scripts do.

On either case, please add your name to MAINTAINERS as the
maintainer of those new files.



OK, thanks!



On Thu, Feb 27, 2020 at 04:18:00PM +0300, Vladimir Sementsov-Ogievskiy wrote:

Hi!

Is problem in "S: Odd fixes" in Python section of MAINTAINERS?

Will it be correct, if I send a patch to MAINTAINERS, proposing
myself as maintainer of Python scripts and s/Odd fixes/Maintained/ ?

And then just send pull request with this series, as "nobody minds"?

08.02.2020 13:36, Vladimir Sementsov-Ogievskiy wrote:

ping..

Hi! Could it be merged at all?

20.01.2020 12:10, Vladimir Sementsov-Ogievskiy wrote:

ping

26.11.2019 18:48, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Here is simple benchmarking utility, to generate performance
comparison tables, like the following:

--  -  -  -
  backup-1   backup-2   mirror
ssd -> ssd  0.43 +- 0.00   4.48 +- 0.06   4.38 +- 0.02
ssd -> hdd  10.60 +- 0.08  10.69 +- 0.18  10.57 +- 0.05
ssd -> nbd  33.81 +- 0.37  10.67 +- 0.17  10.07 +- 0.07
--  -  -  -

This is a v2, as v1 was inside
   "[RFC 00/24] backup performance: block_status + async"

I'll use this benchmark in other series, hope someone
will like it.

Vladimir Sementsov-Ogievskiy (3):
    python: add simplebench.py
    python: add qemu/bench_block_job.py
    python: add example usage of simplebench

   python/bench-example.py    |  80 +
   python/qemu/bench_block_job.py | 115 +
   python/simplebench.py  | 128 +
   3 files changed, 323 insertions(+)
   create mode 100644 python/bench-example.py
   create mode 100755 python/qemu/bench_block_job.py
   create mode 100644 python/simplebench.py










--
Best regards,
Vladimir






--
Best regards,
Vladimir



Re: [PATCH v3 1/1] target/riscv: add vector integer operations

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 13:46, Richard Henderson wrote:

On 2/25/20 6:43 PM, LIU Zhiwei wrote:

Signed-off-by: LIU Zhiwei 
---
  target/riscv/helper.h   |  395 +++
  target/riscv/insn32.decode  |  127 +++
  target/riscv/insn_trans/trans_rvv.inc.c |  671 +++-
  target/riscv/vector_helper.c| 1308 ++-
  4 files changed, 2462 insertions(+), 39 deletions(-)

This patch is too large and needs splitting.

OK.

-static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm)
+static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
+bool widen)
  {
-return !(s->lmul > 1 && vm == 0 && vd == 0);
+return (vm != 0 || vd != 0) ? true : (!widen && (s->lmul == 0));
  }
  

Best to move the addition of widen back to the patch that introduced this 
function.

The "? true :" is a funny way to write ||.

Oh yes. I did not notice it.


r~





Re: [PATCH v4 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 11:33, Richard Henderson wrote:

On 2/27/20 5:50 PM, LIU Zhiwei wrote:

This is not what I had in mind, and looks wrong as well.

 int idx = (index * mlen) / 64;
 int pos = (index * mlen) % 64;
 return (((uint64_t *)v0)[idx] >> pos) & 1;

You also might consider passing log2(mlen), so the multiplication could be
strength-reduced to a shift.

I don't think so. For example, when mlen is 8 bits and index is 0, it will
reduce to

return (((uint64_t *)v0)[0]) & 1

And it's not right.

The right bit is the first bit in vector register 0. And in host big-endianness,
it will be the first bit of the seventh byte.

You've forgotten that we've just done an 8-byte big-endian load, which means
that we *are* looking at the first bit of the byte at offset 7.

It is right.

Yes, that's it.
  

You don't need to pass mlen, since it's

Yes.

I finally remembered all of the bits that go into mlen and thought I had
deleted that sentence -- apparently I only removed half.  ;-)


r~





Re: [PATCH v2] spapr: Fix Coverity warning while validating nvdimm options

2020-02-27 Thread David Gibson
On Thu, Feb 27, 2020 at 07:42:49AM -0600, Shivaprasad G Bhat wrote:
> Fixes Coverity issue,
>   CID 1419883: Error handling issues (CHECKED_RETURN)
>   Calling "qemu_uuid_parse" without checking return value
> 
> nvdimm_set_uuid() already verifies if the user-provided uuid is valid or
> not. So, no need to check the validity during pre-plug validation again.
> 
> As this is a false positive in this case, assert if not valid to be safe.
> Also, error_abort if the QOM accessor encounters an error while fetching
> the uuid property.
> 
> Reported-by: Coverity (CID 1419883)
> Signed-off-by: Shivaprasad G Bhat 

Applied to ppc-for-5.0, thanks.

> ---
>  hw/ppc/spapr_nvdimm.c |7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index 74eeb8bb74..25be8082d7 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -35,6 +35,7 @@ void spapr_nvdimm_validate_opts(NVDIMMDevice *nvdimm, 
> uint64_t size,
>  {
>  char *uuidstr = NULL;
>  QemuUUID uuid;
> +int ret;
>  
>  if (size % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
>  error_setg(errp, "NVDIMM memory size excluding the label area"
> @@ -43,8 +44,10 @@ void spapr_nvdimm_validate_opts(NVDIMMDevice *nvdimm, 
> uint64_t size,
>  return;
>  }
>  
> -uuidstr = object_property_get_str(OBJECT(nvdimm), NVDIMM_UUID_PROP, NULL);
> -qemu_uuid_parse(uuidstr, &uuid);
> +uuidstr = object_property_get_str(OBJECT(nvdimm), NVDIMM_UUID_PROP,
> +  &error_abort);
> +ret = qemu_uuid_parse(uuidstr, &uuid);
> +g_assert(!ret);
>  g_free(uuidstr);
>  
>  if (qemu_uuid_is_null(&uuid)) {
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v3 1/1] target/riscv: add vector integer operations

2020-02-27 Thread Richard Henderson
On 2/25/20 6:43 PM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/helper.h   |  395 +++
>  target/riscv/insn32.decode  |  127 +++
>  target/riscv/insn_trans/trans_rvv.inc.c |  671 +++-
>  target/riscv/vector_helper.c| 1308 ++-
>  4 files changed, 2462 insertions(+), 39 deletions(-)

This patch is too large and needs splitting.

> -static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm)
> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
> +bool widen)
>  {
> -return !(s->lmul > 1 && vm == 0 && vd == 0);
> +return (vm != 0 || vd != 0) ? true : (!widen && (s->lmul == 0));
>  }
>  

Best to move the addition of widen back to the patch that introduced this 
function.

The "? true :" is a funny way to write ||.


r~
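For reference, a minimal sketch of the rewrite Richard suggests (the helper names here are invented for illustration; the expression is the one from the patch under review) showing that the `? true :` form and the plain `||` form compute the same predicate:

```c
#include <stdbool.h>
#include <stdint.h>

/* Form as written in the patch under review. */
static bool check_ternary(uint32_t vd, bool vm, bool widen, int lmul)
{
    return (vm != 0 || vd != 0) ? true : (!widen && (lmul == 0));
}

/* Equivalent form using ||, as suggested in the review. */
static bool check_or(uint32_t vd, bool vm, bool widen, int lmul)
{
    return vm != 0 || vd != 0 || (!widen && lmul == 0);
}
```

Exhaustively comparing both functions over the small parameter space confirms they agree for every input.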



Re: [PATCH v4 5/5] target/riscv: add vector amo operations

2020-02-27 Thread Richard Henderson
On 2/25/20 2:35 AM, LIU Zhiwei wrote:
> +if (s->sew < 2) {
> +return false;
> +}

This could just as easily be in amo_check?

> +
> +if (tb_cflags(s->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC64
> +fn = fns[0][seq][s->sew - 2];
> +#else
> +gen_helper_exit_atomic(cpu_env);
> +s->base.is_jmp = DISAS_NORETURN;
> +return true;
> +#endif

Why are you raising exit_atomic without first checking that s->sew == 3?  We
can do 32-bit atomic operations always.

> +} else {
> +fn = fns[1][seq][s->sew - 2];
> +}
> +if (fn == NULL) {
> +return false;
> +}
> +
> +return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool amo_check(DisasContext *s, arg_rwdvm* a)
> +{
> +return (vext_check_isa_ill(s, RVV | RVA) &&
> +(a->wd ? vext_check_overlap_mask(s, a->rd, a->vm) : 1) &&
> +vext_check_reg(s, a->rd, false) &&
> +vext_check_reg(s, a->rs2, false));
> +}

I guess the "If SEW is greater than XLEN, an illegal instruction exception is
raised" requirement is currently in the column of NULLs in the !CONFIG_RISCV64
block.  But it might be better to have it explicit and save the column of NULLs.

It makes sense to me to do both sew checks together, whether in amo_check or in
amo_op.

> +#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ETYPE, MTYPE, H, DO_OP, SUF)  \
> +static void vext_##NAME##_noatomic_op(void *vs3, target_ulong addr,  \
> +uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr)\
> +{\
> +ETYPE ret;   \
> +target_ulong tmp;\
> +int mmu_idx = cpu_mmu_index(env, false); \
> +tmp = cpu_ld##SUF##_mmuidx_ra(env, addr, mmu_idx, retaddr);  \
> +ret = DO_OP((ETYPE)(MTYPE)tmp, *((ETYPE *)vs3 + H(idx)));\
> +cpu_st##SUF##_mmuidx_ra(env, addr, ret, mmu_idx, retaddr);   \
> +if (wd) {\
> +*((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;  \

The target_long cast is wrong; should be ETYPE.

You can use cpu_ldX/stX_data (no mmu_idx or retaddr argument).  There should be
no faults, since you've already checked for read+write.

> +/* atomic operation for vector atomic instructions */
> +#ifndef CONFIG_USER_ONLY
> +#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)   \
> +static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,\
> +uint32_t wd, uint32_t idx, CPURISCVState *env)   \
> +{\
> +target_ulong tmp;\
> +int mem_idx = cpu_mmu_index(env, false); \
> +tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx)),  \
> +make_memop_idx(MO_ALIGN | MOFLAG, mem_idx)); \
> +if (wd) {\
> +*((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;  \
> +}\
> +}
> +#else
> +#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)   \
> +static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,\
> +uint32_t wd, uint32_t idx, CPURISCVState *env)   \
> +{\
> +target_ulong tmp;\
> +tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx))); \
> +if (wd) {\
> +*((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;  \
> +}\
> +}
> +#endif

This is not right.  It is not legal to call these helpers from another helper
-- they will use the wrong GETPC() and will not unwind properly.

> +static inline void vext_amo_atomic(void *vs3, void *v0, target_ulong base,
> +void *vs2, CPURISCVState *env, uint32_t desc,
> +vext_get_index_addr get_index_addr,
> +vext_amo_atomic_fn atomic_op,
> +vext_ld_clear_elem clear_elem,
> +uint32_t esz, uint32_t msz, uintptr_t ra)
> +{
> +uint32_t i;
> +target_long addr;
> +uint32_t wd = vext_wd(desc);
> +uint32_t vm = vext_vm(desc);
> +uint32_t mlen = vext_mlen(desc);
> +uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +for (i = 0; i < env->vl; i++) {
> +if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +continue;
> +}
> +probe_read_access(env, get_index_addr(base, i, vs2), m

Re: [PATCH v4 2/5] generic vhost user server

2020-02-27 Thread Coiby Xu
> > +static coroutine_fn void vu_client_next_trip(VuClient *client);
> > +
> > +static coroutine_fn void vu_client_trip(void *opaque)
> > +{
> > +VuClient *client = opaque;
> > +
> > +vu_dispatch(&client->parent);
> > +client->co_trip = NULL;
> > +if (!client->closed) {
> > +vu_client_next_trip(client);
> > +}
> > +}

> > The last part is very untypical coroutine code: It says that we want to
spawn a new coroutine with vu_client_trip() as its entry point, and then
terminates the current one.

> > Why don't we just put the whole thing in a while (!client->closed) loop
and stay in the same coroutine instead of terminating the old one and
starting a new one all the time?

> > +static coroutine_fn void vu_client_next_trip(VuClient *client)
> > +{
> > +if (!client->co_trip) {
> > +client->co_trip = qemu_coroutine_create(vu_client_trip, client);
> > +aio_co_schedule(client->ioc->ctx, client->co_trip);
> > +}
> > +}
> > +
> > +static void vu_client_start(VuClient *client)
> > +{
> > +client->co_trip = qemu_coroutine_create(vu_client_trip, client);
> > +aio_co_enter(client->ioc->ctx, client->co_trip);
> > +}

> This is essentially a duplicate of vu_client_next_trip(). The only
place where it is called (vu_accept()) knows that client->co_trip is
already NULL, so it could just call vu_client_next_trip().

> Or in fact, if vu_client_trip() gets turned into a loop, it's
> vu_client_next_trip() that becomes unnecessary.

This part of the code imitates nbd_client_trip in nbd/server.c.
I think the reason to repeatedly create/start/terminate vu_client_trip
is to support BlockBackendAioNotifier. In v5, I will keep running the
spawned coroutine in a loop until it is informed of a change of the
AioContext of the block device backend, i.e. vu_client_trip will only
be restarted when the block device backend is attached to a different
AioContext.
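As a standalone illustration of the loop shape Kevin suggests (this is not QEMU code; the struct, the fake dispatch function, and the request counter are all stand-ins invented for the sketch), staying in one coroutine until the client closes avoids the create/terminate churn:

```c
#include <stdbool.h>

typedef struct VuClientModel {
    bool closed;
    int requests_left;   /* stand-in for pending vhost-user messages */
    int dispatched;      /* how many times we "dispatched" */
} VuClientModel;

/* Stand-in for vu_dispatch(): handle one message, close when drained. */
static void model_dispatch(VuClientModel *client)
{
    client->dispatched++;
    if (--client->requests_left == 0) {
        client->closed = true;
    }
}

/*
 * Loop form of vu_client_trip(): stay in one "coroutine" until the
 * client closes, instead of terminating and respawning per trip.
 */
static void model_client_trip(VuClientModel *client)
{
    while (!client->closed) {
        model_dispatch(client);
    }
}
```

In the real server the loop body would yield back to the AioContext between messages; the point here is only that the loop makes vu_client_next_trip() unnecessary.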

> > +if (rc != sizeof(eventfd_t)) {
> > +if (errno == EAGAIN) {
> > +qio_channel_yield(data->ioc, G_IO_IN);
> > +} else if (errno != EINTR) {
> > +data->co = NULL;
> > +return;
> > +}
> > +} else {
> > +vq->handler(dev, index);
> > +}
> > +data->co = NULL;
> > +vu_kick_cb_next(client, data);

> This can be a loop, too, instead of terminating the coroutine and
starting a new one for the same function.

In v5, I plan to use aio_set_fd_handler to set a read handler which is
a wrapper for vu_kick_cb to deal with kick events, since eventfd
doesn't have the short-read issue that sockets have. Thus vu_kick_cb in
libvhost-user can be reused. My only concern is whether this could lead
to worse performance compared to reading from the eventfd until
getting EAGAIN.

On Tue, Feb 25, 2020 at 11:44 PM Kevin Wolf  wrote:
>
> Am 18.02.2020 um 06:07 hat Coiby Xu geschrieben:
> > Sharing QEMU devices via vhost-user protocol
> >
> > Signed-off-by: Coiby Xu 
> > ---
> >  util/Makefile.objs   |   3 +
> >  util/vhost-user-server.c | 427 +++
> >  util/vhost-user-server.h |  56 +
> >  3 files changed, 486 insertions(+)
> >  create mode 100644 util/vhost-user-server.c
> >  create mode 100644 util/vhost-user-server.h
> >
> > diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
> > new file mode 100644
> > index 00..ff6d3145cd
> > --- /dev/null
> > +++ b/util/vhost-user-server.h
> > @@ -0,0 +1,56 @@
> > +#include "io/channel-socket.h"
> > +#include "io/channel-file.h"
> > +#include "io/net-listener.h"
> > +#include "contrib/libvhost-user/libvhost-user.h"
> > +#include "standard-headers/linux/virtio_blk.h"
> > +#include "qemu/error-report.h"
> > +
> > +typedef struct VuClient VuClient;
>
> I find the terminology a bit confusing here: VuClient is really the
> connection to a single client, but it's part of the server. The name
> gives the impression as if this were client-side code. (This is
> something that already tends to confuse me in the NBD code.)
>
> I'm not sure what a better name could be, though. Maybe
> VuServerConnection or VuExportClient or VuExportConnection?
>
> > +typedef struct VuServer {
> > +QIONetListener *listener;
> > +AioContext *ctx;
> > +QTAILQ_HEAD(, VuClient) clients;
> > +void (*device_panic_notifier)(struct VuClient *client) ;
> > +int max_queues;
> > +const VuDevIface *vu_iface;
> > +/*
> > + * @ptr_in_device: VuServer pointer memory location in vhost-user 
> > device
> > + * struct, so later container_of can be used to get device destruct
> > + */
> > +void *ptr_in_device;
> > +bool close;
> > +} VuServer;
> > +
> > +typedef struct kick_info {
> > +VuDev *vu_dev;
>
> I suppose this could specifically be VuClient?
>
> > +int fd; /*kick fd*/
> > +long index; /*queue index*/
> > +QIOChannel *ioc; /*I/O channel for kick fd*/
> > +QIOChannelFile *fioc; /*underlying data channel for kick fd*/
> > +Coroutin

[PATCH] mips/mips_malta: Allow more than 2G RAM

2020-02-27 Thread Jiaxun Yang
When Malta is coupled with a MIPS64 CPU, which has a 64-bit
address space, it is possible to have more than 2G of RAM.

So remove the ram_size check and overwrite the memory layout
for these targets.

Signed-off-by: Jiaxun Yang 
Suggested-by: Yunqiang Su 
---
 hw/mips/mips_malta.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index 6e7ba9235d..de89cdcfc1 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -98,7 +98,8 @@ typedef struct {
 } MaltaState;
 
 static struct _loaderparams {
-int ram_size, ram_low_size;
+unsigned int ram_low_size;
+ram_addr_t ram_size;
 const char *kernel_filename;
 const char *kernel_cmdline;
 const char *initrd_filename;
@@ -1023,6 +1024,7 @@ static int64_t load_kernel(void)
 {
 int64_t kernel_entry, kernel_high, initrd_size;
 long kernel_size;
+char mem_cmdline[128];
 ram_addr_t initrd_offset;
 int big_endian;
 uint32_t *prom_buf;
@@ -1099,20 +1101,28 @@ static int64_t load_kernel(void)
 prom_buf = g_malloc(prom_size);
 
 prom_set(prom_buf, prom_index++, "%s", loaderparams.kernel_filename);
+
+/*
+ * Always use the command line to overwrite the memory layout,
+ * as the kernel may reject a large ememsize.
+ */
+sprintf(&mem_cmdline[0],
+"mem=0x1000@0x mem=0x%" PRIx64 "@0x9000",
+loaderparams.ram_size - 0x1000);
 if (initrd_size > 0) {
 prom_set(prom_buf, prom_index++,
- "rd_start=0x%" PRIx64 " rd_size=%" PRId64 " %s",
- xlate_to_kseg0(NULL, initrd_offset),
+ "%s rd_start=0x%" PRIx64 " rd_size=%" PRId64 " %s",
+ &mem_cmdline[0], xlate_to_kseg0(NULL, initrd_offset),
  initrd_size, loaderparams.kernel_cmdline);
 } else {
-prom_set(prom_buf, prom_index++, "%s", loaderparams.kernel_cmdline);
+prom_set(prom_buf, prom_index++, "%s %s", &mem_cmdline[0],
+ loaderparams.kernel_cmdline);
 }
 
 prom_set(prom_buf, prom_index++, "memsize");
 prom_set(prom_buf, prom_index++, "%u", loaderparams.ram_low_size);
 
 prom_set(prom_buf, prom_index++, "ememsize");
-prom_set(prom_buf, prom_index++, "%u", loaderparams.ram_size);
+prom_set(prom_buf, prom_index++, "%lu", loaderparams.ram_size);
 
 prom_set(prom_buf, prom_index++, "modetty0");
 prom_set(prom_buf, prom_index++, "38400n8r");
@@ -1253,12 +1263,14 @@ void mips_malta_init(MachineState *machine)
 /* create CPU */
 mips_create_cpu(machine, s, &cbus_irq, &i8259_irq);
 
-/* allocate RAM */
+#ifdef TARGET_MIPS32
+/* MIPS32 won't accept more than 2GiB RAM due to limited address space */
 if (ram_size > 2 * GiB) {
 error_report("Too much memory for this machine: %" PRId64 "MB,"
  " maximum 2048MB", ram_size / MiB);
 exit(1);
 }
+#endif
 
 /* register RAM at high address where it is undisturbed by IO */
 memory_region_add_subregion(system_memory, 0x8000, machine->ram);
-- 
2.25.1
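On the mem_cmdline construction above: snprintf() would guard the fixed 128-byte buffer against overflow where the patch uses sprintf(). A standalone sketch of the idea (the function name and the address values below are illustrative assumptions, not the exact Malta layout, whose constants are truncated in the archived patch):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t ram_addr_t;

/* Build a Linux "mem=" command line for a split low/high RAM layout. */
static int build_mem_cmdline(char *buf, size_t buflen, ram_addr_t ram_size,
                             ram_addr_t low_size, ram_addr_t high_base)
{
    /* Low segment is fixed in size; the remainder sits at high_base. */
    return snprintf(buf, buflen,
                    "mem=0x%" PRIx64 "@0x0 mem=0x%" PRIx64 "@0x%" PRIx64,
                    low_size, ram_size - low_size, high_base);
}
```

Checking the return value against the buffer size also catches the case where a future layout change no longer fits in 128 bytes.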





Re: [edk2-devel] A problem with live migration of UEFI virtual machines

2020-02-27 Thread Andrew Fish via


> On Feb 26, 2020, at 1:42 AM, Laszlo Ersek  wrote:
> 
> Hi Andrew,
> 
> On 02/25/20 22:35, Andrew Fish wrote:
> 
>> Laszlo,
>> 
>> The FLASH offsets changing breaking things makes sense.
>> 
>> I now realize this is like updating the EFI ROM without rebooting the
>> system.  Thus changes in how the new EFI code works is not the issue.
>> 
>> Is this migration event visible to the firmware? Traditionally the
>> NVRAM is a region in the FD so if you update the FD you have to skip
>> NVRAM region or save and restore it. Is that activity happening in
>> this case? Even if the ROM layout does not change how do you not lose
>> the contents of the NVRAM store when the live migration happens? Sorry
>> if this is a remedial question but I'm trying to learn how this
>> migration works.
> 
> With live migration, the running guest doesn't notice anything. This is
> a general requirement for live migration (regardless of UEFI or flash).
> 
> You are very correct to ask about "skipping" the NVRAM region. With the
> approach that OvmfPkg originally supported, live migration would simply
> be unfeasible. The "build" utility would produce a single (unified)
> OVMF.fd file, which would contain both NVRAM and executable regions, and
> the guest's variable updates would modify the one file that would exist.
> This is inappropriate even without considering live migration, because
> OVMF binary upgrades (package updates) on the virtualization host would
> force guests to lose their private variable stores (NVRAMs).
> 
> Therefore, the "build" utility produces "split" files too, in addition
> to the unified OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.
> OVMF.fd is simply the concatenation of the latter two.
> 
> $ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd
> [prints nothing]


Laszlo,

Thanks for the detailed explanation. 

Maybe I was overcomplicating this. Given your explanation, I think the part I was
missing is that OVMF implies the FLASH layout, in this split model, based on the
sizes of OVMF_CODE.fd and OVMF_VARS.fd. Given that, if OVMF_CODE.fd gets
bigger, the variable address changes from QEMU's point of view. So basically it
is the QEMU API that is making assumptions about the relative layout of the FD
in the split model that makes migration to a larger ROM not work. Basically the
-pflash API does not support changing the size of the ROM without moving NVRAM,
given the way it is currently defined.

Given the above it seems like the 2 options are:
1) Pad OVMF_CODE.fd to be very large so there is room to grow.
2) Add some feature to QUEM that allows the variable store address to not be 
based on OVMF_CODE.fd size. 

I did see this [1] and, combined with your email, I either understand, or I'm
still confused? :)

I'm not saying we need to change anything; I'm just trying to make sure I
understand how OVMF and QEMU are tied together.

[1] https://www.redhat.com/archives/libvir-list/2019-January/msg01031.html

Thanks,

Andrew Fish




> 
> When you define a new domain (VM) on a virtualization host, the domain
> definition saves a reference (pathname) to the OVMF_CODE.fd file.
> However, the OVMF_VARS.fd file (the variable store *template*) is not
> directly referenced; instead, it is *copied* into a separate (private)
> file for the domain.
> 
> Furthermore, once booted, guest has two flash chips, one that maps the
> firmware executable OVMF_CODE.fd read-only, and another pflash chip that
> maps its private varstore file read-write.
> 
> This makes it possible to upgrade OVMF_CODE.fd and OVMF_VARS.fd (via
> package upgrades on the virt host) without messing with varstores that
> were earlier instantiated from OVMF_VARS.fd. What's important here is
> that the various constants in the new (upgraded) OVMF_CODE.fd file
> remain compatible with the *old* OVMF_VARS.fd structure, across package
> upgrades.
> 
> If that's not possible for introducing e.g. a new feature, then the
> package upgrade must not overwrite the OVMF_CODE.fd file in place, but
> must provide an additional firmware binary. This firmware binary can
> then only be used by freshly defined domains (old domains cannot be
> switched over). Old domains can be switched over manually -- and only if
> the sysadmin decides it is OK to lose the current variable store
> contents. Then the old varstore file for the domain is deleted
> (manually), the domain definition is updated, and then a new (logically
> empty, pristine) varstore can be created from the *new* OVMF_2_VARS.fd
> that matches the *new* OVMF_2_CODE.fd.
> 
> 
> During live migration, the "RAM-like" contents of both pflash chips are
> migrated (the guest-side view of both chips remains the same, including
> the case when the writeable chip happens to be in "programming mode",
> i.e., during a UEFI variable write through the Fault Tolerant Write and
> Firmware Volume Block(2) protocols).
> 
> Once live migration completes, QEMU dumps the full contents of the
> writeable chip to the backing fi

Re: [PATCH v4 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-27 Thread Richard Henderson
On 2/27/20 5:50 PM, LIU Zhiwei wrote:
>> This is not what I had in mind, and looks wrong as well.
>>
>> int idx = (index * mlen) / 64;
>> int pos = (index * mlen) % 64;
>> return (((uint64_t *)v0)[idx] >> pos) & 1;
>>
>> You also might consider passing log2(mlen), so the multiplication could be
>> strength-reduced to a shift.
> I don't think so. For example, when mlen is 8 bits and index is 0, it will
> reduce to
> 
> return (((uint64_t *)v0)[0]) & 1
> 
> And it's not right.
> 
> The right bit is the first bit in vector register 0. And in host big-endianness,
> it will be the first bit of the seventh byte.

You've forgotten that we've just done an 8-byte big-endian load, which means
that we *are* looking at the first bit of the byte at offset 7.

It is right.

>> You don't need to pass mlen, since it's
> Yes.

I finally remembered all of the bits that go into mlen and thought I had
deleted that sentence -- apparently I only removed half.  ;-)


r~
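A standalone sketch of the indexing Richard describes (this assumes the mask words have already been loaded into native uint64_t values as discussed above; the function names are illustrative, not the ones in the patch):

```c
#include <stdint.h>

/*
 * Extract mask bit `index` from mask register v0, where each element
 * occupies mlen bits: idx selects the 64-bit word, pos the bit in it.
 */
static int elem_mask(const uint64_t *v0, int mlen, int index)
{
    int idx = (index * mlen) / 64;
    int pos = (index * mlen) % 64;
    return (v0[idx] >> pos) & 1;
}

/*
 * Same computation with log2(mlen) passed instead, so the
 * multiplication strength-reduces to a shift.
 */
static int elem_mask_log2(const uint64_t *v0, int log2_mlen, int index)
{
    int bit = index << log2_mlen;
    return (v0[bit / 64] >> (bit % 64)) & 1;
}
```

With mlen = 8, element 1 lands at bit 8 of word 0 and element 8 at bit 0 of word 1, matching the big-endian-load discussion above.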



RE: [edk2-devel] A problem with live migration of UEFI virtual machines

2020-02-27 Thread Zhoujian (jay)
Hi Laszlo,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+jianjay.zhou=huawei@nongnu.org] On Behalf
> Of Laszlo Ersek
> Sent: Wednesday, February 26, 2020 5:42 PM
> To: Andrew Fish ; de...@edk2.groups.io
> Cc: berra...@redhat.com; qemu-devel@nongnu.org; Dr. David Alan Gilbert
> ; zhoujianjay ; discuss
> ; Alex Bennée ;
> wuchenye1995 
> Subject: Re: [edk2-devel] A problem with live migration of UEFI virtual 
> machines
> 
> Hi Andrew,
> 
> On 02/25/20 22:35, Andrew Fish wrote:
> 
> > Laszlo,
> >
> > The FLASH offsets changing breaking things makes sense.
> >
> > I now realize this is like updating the EFI ROM without rebooting the
> > system.  Thus changes in how the new EFI code works is not the issue.
> >
> > Is this migration event visible to the firmware? Traditionally the
> > NVRAM is a region in the FD so if you update the FD you have to skip
> > NVRAM region or save and restore it. Is that activity happening in
> > this case? Even if the ROM layout does not change how do you not lose
> > the contents of the NVRAM store when the live migration happens? Sorry
> > if this is a remedial question but I'm trying to learn how this
> > migration works.
> 
> With live migration, the running guest doesn't notice anything. This is a 
> general
> requirement for live migration (regardless of UEFI or flash).
> 
> You are very correct to ask about "skipping" the NVRAM region. With the
> approach that OvmfPkg originally supported, live migration would simply be
> unfeasible. The "build" utility would produce a single (unified) OVMF.fd 
> file, which
> would contain both NVRAM and executable regions, and the guest's variable
> updates would modify the one file that would exist.
> This is inappropriate even without considering live migration, because OVMF
> binary upgrades (package updates) on the virtualization host would force 
> guests
> to lose their private variable stores (NVRAMs).
> 
> Therefore, the "build" utility produces "split" files too, in addition to the 
> unified
> OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.
> OVMF.fd is simply the concatenation of the latter two.
> 
> $ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd [prints nothing]
> 
> When you define a new domain (VM) on a virtualization host, the domain
> definition saves a reference (pathname) to the OVMF_CODE.fd file.
> However, the OVMF_VARS.fd file (the variable store *template*) is not directly
> referenced; instead, it is *copied* into a separate (private) file for the 
> domain.
> 
> Furthermore, once booted, guest has two flash chips, one that maps the
> firmware executable OVMF_CODE.fd read-only, and another pflash chip that
> maps its private varstore file read-write.
> 
> This makes it possible to upgrade OVMF_CODE.fd and OVMF_VARS.fd (via
> package upgrades on the virt host) without messing with varstores that were
> earlier instantiated from OVMF_VARS.fd. What's important here is that the
> various constants in the new (upgraded) OVMF_CODE.fd file remain compatible
> with the *old* OVMF_VARS.fd structure, across package upgrades.
> 
> If that's not possible for introducing e.g. a new feature, then the package
> upgrade must not overwrite the OVMF_CODE.fd file in place, but must provide an
> additional firmware binary. This firmware binary can then only be used by 
> freshly
> defined domains (old domains cannot be switched over). Old domains can be
> switched over manually -- and only if the sysadmin decides it is OK to lose 
> the
> current variable store contents. Then the old varstore file for the domain is
> deleted (manually), the domain definition is updated, and then a new 
> (logically
> empty, pristine) varstore can be created from the *new* OVMF_2_VARS.fd that
> matches the *new* OVMF_2_CODE.fd.
> 
> 
> During live migration, the "RAM-like" contents of both pflash chips are 
> migrated
> (the guest-side view of both chips remains the same, including the case when 
> the
> writeable chip happens to be in "programming mode", i.e., during a UEFI 
> variable
> write through the Fault Tolerant Write and Firmware Volume Block(2) 
> protocols).
> 
> Once live migration completes, QEMU dumps the full contents of the writeable
> chip to the backing file (on the destination host). Going forward, flash 
> writes from
> within the guest are reflected to said host-side file on-line, just like it 
> happened
> on the source host before live migration. If the file backing the r/w pflash 
> chip is
> on NFS (shared by both src and dst hosts), then this one-time dumping when the
> migration completes is superfluous, but it's also harmless.
> 
> The interesting question is, what happens when you power down the VM on the
> destination host (= post migration), and launch it again there, from zero. In 
> that
> case, the firmware executable file comes from the *destination host* (it was
> never persistently migrated from the source host, i.e. never written out on 
> the
> dst). It simply comes from t

[PATCH v3 2/2] util: add util function buffer_zero_avx512()

2020-02-27 Thread Robert Hoo
And initialize buffer_is_zero() with it when Intel AVX512F is
available on the host.

This function utilizes Intel AVX512 fundamental instructions, which
are faster than the AVX2 implementation (in my unit test, with a
4K buffer on Cascade Lake SP, buffer_zero_avx512() is ~36% faster
than buffer_zero_avx2()).

Signed-off-by: Robert Hoo 
---
 include/qemu/cpuid.h |  3 +++
 util/bufferiszero.c  | 67 +---
 2 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 6930170..09fc245 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -45,6 +45,9 @@
 #ifndef bit_AVX2
 #define bit_AVX2 (1 << 5)
 #endif
+#ifndef bit_AVX512F
+#define bit_AVX512F (1 << 16)
+#endif
 #ifndef bit_BMI2
 #define bit_BMI2 (1 << 8)
 #endif
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index bfb2605..ce877b7 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -63,11 +63,11 @@ buffer_zero_int(const void *buf, size_t len)
 }
 }
 
-#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
+#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 /* Do not use push_options pragmas unnecessarily, because clang
  * does not support them.
  */
-#ifdef CONFIG_AVX2_OPT
+#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
 #pragma GCC push_options
 #pragma GCC target("sse2")
 #endif
@@ -104,7 +104,7 @@ buffer_zero_sse2(const void *buf, size_t len)
 
 return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0x;
 }
-#ifdef CONFIG_AVX2_OPT
+#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
 #pragma GCC pop_options
 #endif
 
@@ -187,18 +187,54 @@ buffer_zero_avx2(const void *buf, size_t len)
 #pragma GCC pop_options
 #endif /* CONFIG_AVX2_OPT */
 
+#ifdef CONFIG_AVX512F_OPT
+#pragma GCC push_options
+#pragma GCC target("avx512f")
+#include 
+
+static bool
+buffer_zero_avx512(const void *buf, size_t len)
+{
+/* Begin with an unaligned head of 64 bytes.  */
+__m512i t = _mm512_loadu_si512(buf);
+__m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
+__m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64);
+
+/* Loop over 64-byte aligned blocks of 256.  */
+while (p <= e) {
+__builtin_prefetch(p);
+if (unlikely(_mm512_test_epi64_mask(t, t))) {
+return false;
+}
+t = p[-4] | p[-3] | p[-2] | p[-1];
+p += 4;
+}
+
+t |= _mm512_loadu_si512(buf + len - 4 * 64);
+t |= _mm512_loadu_si512(buf + len - 3 * 64);
+t |= _mm512_loadu_si512(buf + len - 2 * 64);
+t |= _mm512_loadu_si512(buf + len - 1 * 64);
+
+return !_mm512_test_epi64_mask(t, t);
+
+}
+#pragma GCC pop_options
+#endif
+
+
 /* Note that for test_buffer_is_zero_next_accel, the most preferred
  * ISA must have the least significant bit.
  */
-#define CACHE_AVX21
-#define CACHE_SSE42
-#define CACHE_SSE24
+#define CACHE_AVX512F 1
+#define CACHE_AVX22
+#define CACHE_SSE44
+#define CACHE_SSE28
 
 /* Make sure that these variables are appropriately initialized when
  * SSE2 is enabled on the compiler command-line, but the compiler is
  * too old to support CONFIG_AVX2_OPT.
  */
-#ifdef CONFIG_AVX2_OPT
+#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
 # define INIT_CACHE 0
 # define INIT_ACCEL buffer_zero_int
 #else
@@ -211,25 +247,35 @@ buffer_zero_avx2(const void *buf, size_t len)
 
 static unsigned cpuid_cache = INIT_CACHE;
 static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL;
+static int length_to_accel;
 
 static void init_accel(unsigned cache)
 {
 bool (*fn)(const void *, size_t) = buffer_zero_int;
 if (cache & CACHE_SSE2) {
 fn = buffer_zero_sse2;
+length_to_accel = 64;
 }
 #ifdef CONFIG_AVX2_OPT
 if (cache & CACHE_SSE4) {
 fn = buffer_zero_sse4;
+length_to_accel = 64;
 }
 if (cache & CACHE_AVX2) {
 fn = buffer_zero_avx2;
+length_to_accel = 64;
+}
+#endif
+#ifdef CONFIG_AVX512F_OPT
+if (cache & CACHE_AVX512F) {
+fn = buffer_zero_avx512;
+length_to_accel = 256;
 }
 #endif
 buffer_accel = fn;
 }
 
-#ifdef CONFIG_AVX2_OPT
+#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
 #include "qemu/cpuid.h"
 
 static void __attribute__((constructor)) init_cpuid_cache(void)
@@ -255,6 +301,9 @@ static void __attribute__((constructor)) 
init_cpuid_cache(void)
 if ((bv & 6) == 6 && (b & bit_AVX2)) {
 cache |= CACHE_AVX2;
 }
+if ((bv & 6) == 6 && (b & bit_AVX512F)) {
+cache |= CACHE_AVX512F;
+}
 }
 }
 cpuid_cache = cache;
@@ -277,7 +326,7 @@ bool test_buffer_is_zero_next_accel(void)
 
 static bool select_accel_fn(const void *buf, size_t len)
 {
-if (likely(len >= 64)) {
+if (likely(len >= length_to_accel)) {
 return buffer_accel(buf, len);
 }
return buffer_zero_int(buf, len);

[PATCH v3 1/2] configure: add configure option avx512f_opt

2020-02-27 Thread Robert Hoo
If it is enabled, config-host.mak will have CONFIG_AVX512F_OPT defined.

The AVX512F instruction set has been available since Intel Skylake, and can be
enabled at compile time with -mavx512f.
More info:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Signed-off-by: Robert Hoo 
---
 configure | 41 +
 1 file changed, 41 insertions(+)

diff --git a/configure b/configure
index d57261e..a0b41ce 100755
--- a/configure
+++ b/configure
@@ -1395,6 +1395,11 @@ for opt do
   ;;
   --enable-avx2) avx2_opt="yes"
   ;;
+  --disable-avx512f) avx512f_opt="no"
+  ;;
+  --enable-avx512f) avx512f_opt="yes"
+  ;;
+
   --enable-glusterfs) glusterfs="yes"
   ;;
   --disable-virtio-blk-data-plane|--enable-virtio-blk-data-plane)
@@ -1825,6 +1830,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   tcmalloctcmalloc support
   jemallocjemalloc support
   avx2AVX2 optimization support
+  avx512f AVX512F optimization support
   replication replication support
   opengl  opengl support
   virglrenderer   virgl rendering support
@@ -5518,6 +5524,36 @@ EOF
   fi
 fi
 
+##
+# avx512f optimization requirement check
+#
+# There is no point enabling this if cpuid.h is not usable,
+# since we won't be able to select the new routines.
+# by default, it is turned off.
+# if the user explicitly wants to enable it, check the environment
+
+if test "$cpuid_h" = "yes" && test "$avx512f_opt" = "yes"; then
+  cat > $TMPC << EOF
+#pragma GCC push_options
+#pragma GCC target("avx512f")
+#include 
+#include 
+static int bar(void *a) {
+__m512i x = *(__m512i *)a;
+return _mm512_test_epi64_mask(x, x);
+}
+int main(int argc, char *argv[])
+{
+   return bar(argv[0]);
+}
+EOF
+  if ! compile_object "" ; then
+avx512f_opt="no"
+  fi
+else
+  avx512f_opt="no"
+fi
+
 
 # check if __[u]int128_t is usable.
 
@@ -6650,6 +6686,7 @@ echo "libxml2   $libxml2"
 echo "tcmalloc support  $tcmalloc"
 echo "jemalloc support  $jemalloc"
 echo "avx2 optimization $avx2_opt"
+echo "avx512f optimization $avx512f_opt"
 echo "replication support $replication"
 echo "VxHS block device $vxhs"
 echo "bochs support $bochs"
@@ -7200,6 +7237,10 @@ if test "$avx2_opt" = "yes" ; then
   echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
 fi
 
+if test "$avx512f_opt" = "yes" ; then
+  echo "CONFIG_AVX512F_OPT=y" >> $config_host_mak
+fi
+
 if test "$lzo" = "yes" ; then
   echo "CONFIG_LZO=y" >> $config_host_mak
 fi
-- 
1.8.3.1




[PATCH v3 0/2] Add AVX512F optimization option and buffer_zero_avx512()

2020-02-27 Thread Robert Hoo
1) Introduce {enable,disable}-avx512f configure option

2) Implement new buffer_zero_avx512() with AVX512F instructions

Changes in v3:
In init_accel(), initialize the length_to_accel value in every accel case,
because in the unit test it will be invoked several times with different
accel cases. (Thanks to Richard's careful review)

Changes in v2:
1. Fixes the wrong definition of CACHE_SSE2 in v1.
2. Fixes not handling the <256 length case in the buffer_zero_avx512() implementation.
(Following Richard's suggestion: handle the case in select_accel_fn(), and have a
global variable alongside buffer_accel.)
3. Changes avx512f configuration option's default status to disabled.
4. Ran 'make check-unit' on this patch, on both a Ivybridge machine and a
CascadeLake machine.


Robert Hoo (2):
  configure: add configure option avx512f_opt
  util: add util function buffer_zero_avx512()

 configure| 41 
 include/qemu/cpuid.h |  3 +++
 util/bufferiszero.c  | 67 +---
 3 files changed, 102 insertions(+), 9 deletions(-)

-- 
1.8.3.1




Re: [PATCH v4 4/5] target/riscv: add fault-only-first unit stride load

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 4:03, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+GEN_VEXT_LD_ELEM(vlbff_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlhff_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlhff_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlhff_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlwff_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwff_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vleff_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vleff_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vleff_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vleff_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlbuff_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlhuff_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlhuff_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlhuff_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlwuff_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwuff_v_d, uint32_t, uint64_t, H8, ldl)

We definitely should not have a 3rd copy of these.

Yes, I will remove it by adding a parameter to GEN_VEXT_LDFF.




+if (i == 0) {
+probe_read_access(env, addr, nf * msz, ra);
+} else {
+/* if it triggles an exception, no need to check watchpoint */

triggers.

Yes.



+offset = -(addr | TARGET_PAGE_MASK);
+remain = nf * msz;
+while (remain > 0) {
+host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmuidx);
+if (host) {
+#ifdef CONFIG_USER_ONLY
+if (page_check_range(addr, nf * msz, PAGE_READ) < 0) {
+vl = i;
+goto ProbeSuccess;
+}
+#else
+probe_read_access(env, addr, nf * msz, ra);
+#endif

Good job finding all of the corner cases.  I should invent a new cputlb
function that handles this better.  For now, this is the best we can do.

That will be better.

I learn a lot from SVE and some S390  code. Thanks a lot.

Best Regards,
Zhiwei


r~





Re: [PATCH v4 3/5] target/riscv: add vector index load and store instructions

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 3:49, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+vsxb_v ... 011 . . . 000 . 0100111 @r_nfvm
+vsxh_v ... 011 . . . 101 . 0100111 @r_nfvm
+vsxw_v ... 011 . . . 110 . 0100111 @r_nfvm
+vsxe_v ... 011 . . . 111 . 0100111 @r_nfvm
+vsuxb_v... 111 . . . 000 . 0100111 @r_nfvm
+vsuxh_v... 111 . . . 101 . 0100111 @r_nfvm
+vsuxw_v... 111 . . . 110 . 0100111 @r_nfvm
+vsuxe_v... 111 . . . 111 . 0100111 @r_nfvm

These can be merged, with a comment, like

# Vector ordered-indexed and unordered-indexed store insns.
vsxb_v ... -11 . . . 000 . 0100111 @r_nfvm

which means you don't need these:

Good.

+static bool trans_vsuxb_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxb_v(s, a);
+}
+
+static bool trans_vsuxh_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxh_v(s, a);
+}
+
+static bool trans_vsuxw_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxw_v(s, a);
+}
+
+static bool trans_vsuxe_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxe_v(s, a);
+}
+static inline void vext_ld_index(void *vd, void *v0, target_ulong base,
+void *vs2, CPURISCVState *env, uint32_t desc,
+vext_get_index_addr get_index_addr,
+vext_ld_elem_fn ld_elem,
+vext_ld_clear_elem clear_elem,
+uint32_t esz, uint32_t msz, uintptr_t ra)

Similar comment about merging vext_ld_index and vext_st_index.

Good idea. Thanks.

Zhiwei


r~





Re: [PATCH v4 2/5] target/riscv: add vector stride load and store instructions

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 3:36, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+GEN_VEXT_LD_ELEM(vlsb_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlsh_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlsw_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlsw_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlse_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlse_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlsbu_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlshu_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlswu_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlswu_v_d, uint32_t, uint64_t, H8, ldl)

Why do you need to define new functions identical to the old ones?
Are you
doing this just to make the names match up?

Yes, just to make the names match up. So I can use

GEN_VEXT_ST_STRIDE

to generate code.

Perhaps adding a parameter to GEN_VEXT_ST_STRIDE would be OK.




+GEN_VEXT_ST_ELEM(vssb_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vssb_v_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(vssb_v_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(vssb_v_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(vssh_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vssh_v_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(vssh_v_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(vssw_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vssw_v_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(vsse_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsse_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsse_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsse_v_d, int64_t, H8, stq)

Likewise.


+static void vext_st_stride(void *vd, void *v0, target_ulong base,
+target_ulong stride, CPURISCVState *env, uint32_t desc,
+vext_st_elem_fn st_elem, uint32_t esz, uint32_t msz, uintptr_t ra)
+{
+uint32_t i, k;
+uint32_t nf = vext_nf(desc);
+uint32_t vm = vext_vm(desc);
+uint32_t mlen = vext_mlen(desc);
+uint32_t vlmax = vext_maxsz(desc) / esz;
+
+/* probe every access*/
+for (i = 0; i < env->vl; i++) {
+if (!vm && !vext_elem_mask(v0, mlen, i)) {
+continue;
+}
+probe_write_access(env, base + stride * i, nf * msz, ra);
+}
+/* store bytes to guest memory */
+for (i = 0; i < env->vl; i++) {
+k = 0;
+if (!vm && !vext_elem_mask(v0, mlen, i)) {
+continue;
+}
+while (k < nf) {
+target_ulong addr = base + stride * i + k * msz;
+st_elem(env, addr, i + k * vlmax, vd, ra);
+k++;
+}
+}
+}

Similar comments wrt unifying the load and store helpers.

I'll also note that vext_st_stride and vext_st_us_mask could be unified by
passing sizeof(ETYPE) as stride, and vm = true as a parameter.

Good idea. Thanks.

Zhiwei



r~
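The unification discussed above can be sketched in stand-alone form: one store loop that serves unit-stride stores (stride == element size, vm == true) as well as strided stores. Guest memory is modeled as a plain byte buffer and the per-element store as a callback; the real helpers go through cpu_st*_data_ra(), so the names and signatures below are assumptions, not the patch's final API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-element store callback; stands in for cpu_st*_data_ra(). */
typedef void st_elem_fn(uint8_t *mem, uint64_t addr, void *vd, uint32_t idx);

static void st_b(uint8_t *mem, uint64_t addr, void *vd, uint32_t idx)
{
    mem[addr] = ((uint8_t *)vd)[idx];
}

/* One loop for both strided and unit-stride stores: pass the element
 * size as stride and vm == true to get the unit-stride behavior. */
static void vext_st_stride_sketch(void *vd, const uint8_t *v0, uint64_t base,
                                  uint64_t stride, uint8_t *mem, uint32_t vl,
                                  bool vm, st_elem_fn *st_elem)
{
    for (uint32_t i = 0; i < vl; i++) {
        /* one mask bit per element (mlen == 1 layout assumed) */
        if (!vm && !((v0[i / 8] >> (i % 8)) & 1)) {
            continue;
        }
        st_elem(mem, base + stride * i, vd, i);
    }
}
```

Calling it with stride equal to the element size and vm == true reproduces the unit-stride case, so the vext_st_us_* and vext_st_stride_* helpers could share this single loop.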





Re: [PATCH v4 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-27 Thread LIU Zhiwei



On 2020/2/28 3:17, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
+{
+int legal = widen ? 2 << s->lmul : 1 << s->lmul;
+
+return !((s->lmul == 0x3 && widen) || (reg % legal));
+}
+
+static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm)
+{
+return !(s->lmul > 1 && vm == 0 && vd == 0);
+}
+
+static bool vext_check_nf(DisasContext *s, uint32_t nf)
+{
+return s->lmul * (nf + 1) <= 8;
+}

Some commentary would be good here, quoting the rule being applied.  E.g. "The
destination vector register group for a masked vector instruction can only
overlap the source mask regis-
ter (v0) when LMUL=1. (Section 5.3)"

Good idea.

+static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+uint8_t nf = a->nf + 1;

Perhaps NF should have the +1 done during decode, so that it cannot be
forgotten here or elsewhere.


Perhaps not. It will not be used elsewhere, and it will need one more
bit in FIELD().

  E.g.

%nf  31:3  !function=ex_plus_1
@r2_nfvm ... ... vm:1 . . ... . ... \
  &r2nfvm %nf %rs1 %rd

Where ex_plus_1 is the obvious modification of ex_shift_1().
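For concreteness, a minimal stand-alone version of that transform: in QEMU, ex_shift_1() in target/riscv/translate.c returns (1 << imm) and takes a DisasContext pointer it never uses, so the sketch below stubs that argument as void * to stay self-contained.

```c
#include <assert.h>

/* Sketch of ex_plus_1: decodetree calls it on the raw nf field, so the
 * +1 is applied once, at decode time, and helpers use a->nf directly. */
static int ex_plus_1(void *ctx, int nf)
{
    (void)ctx; /* unused; kept to mirror the real translator signature */
    return nf + 1;
}
```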

+static inline uint32_t vext_nf(uint32_t desc)
+{
+return (simd_data(desc) >> 11) & 0xf;
+}
+
+static inline uint32_t vext_mlen(uint32_t desc)
+{
+return simd_data(desc) & 0xff;
+}
+
+static inline uint32_t vext_vm(uint32_t desc)
+{
+return (simd_data(desc) >> 8) & 0x1;
+}
+
+static inline uint32_t vext_lmul(uint32_t desc)
+{
+return (simd_data(desc) >> 9) & 0x3;
+}

You should use FIELD() to define the fields, and then use FIELD_EX32 and
FIELD_DP32 to reference them.

Nice, I will find some place to define the fields.
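A self-contained illustration of the FIELD() approach. The macros below are minimal stand-ins for QEMU's hw/registerfields.h, the bit layout mirrors the open-coded shifts quoted above, and the VDATA field names are an assumption; the real accessors would also first unwrap desc with simd_data().

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for FIELD()/FIELD_EX32 from hw/registerfields.h. */
#define FIELD(reg, field, shift, length) \
    enum { R_##reg##_##field##_SHIFT = (shift), \
           R_##reg##_##field##_LENGTH = (length) };
#define FIELD_EX32(v, reg, field) \
    (((v) >> R_##reg##_##field##_SHIFT) & \
     ((1u << R_##reg##_##field##_LENGTH) - 1))

/* Same layout as the open-coded shifts in the patch. */
FIELD(VDATA, MLEN, 0, 8)
FIELD(VDATA, VM, 8, 1)
FIELD(VDATA, LMUL, 9, 2)
FIELD(VDATA, NF, 11, 4)

static inline uint32_t vext_nf(uint32_t desc)
{
    return FIELD_EX32(desc, VDATA, NF);
}
```

With FIELD_DP32 on the encoding side, the shift/mask constants live in exactly one place.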

+/*
+ * This function checks watchpoint before real load operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
+ * In user mode, there is no watchpoint support now.
+ *
+ * It will triggle an exception if there is no mapping in TLB

trigger.

Yes.

+ * and page table walk can't fill the TLB entry. Then the guest
+ * software can return here after process the exception or never return.
+ */
+static void probe_read_access(CPURISCVState *env, target_ulong addr,
+target_ulong len, uintptr_t ra)
+{
+while (len) {
+const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+const target_ulong curlen = MIN(pagelen, len);
+
+probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
+addr += curlen;
+len -= curlen;
+}
+}
+
+static void probe_write_access(CPURISCVState *env, target_ulong addr,
+target_ulong len, uintptr_t ra)
+{
+while (len) {
+const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+const target_ulong curlen = MIN(pagelen, len);
+
+probe_write(env, addr, curlen, cpu_mmu_index(env, false), ra);
+addr += curlen;
+len -= curlen;
+}
+}

A loop is overkill -- the access cannot span to 3 pages.

Yes, I will just do as you suggest!

In the unit-stride load, without mask, the max access length is checked;
it is 512 bytes.

And the current target page size is 4096 bytes.

#define TARGET_PAGE_BITS 12 /* 4 KiB Pages */
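The -(addr | TARGET_PAGE_MASK) idiom used in these probe helpers yields the bytes remaining in the current page, which is why a 512-byte access on 4 KiB pages can cross at most one page boundary, i.e. span at most two pages. A small self-contained check of the arithmetic (64-bit addresses assumed):

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12 /* 4 KiB pages */
#define TARGET_PAGE_MASK ((uint64_t)-1 << TARGET_PAGE_BITS)

/* Bytes left in the page containing addr, computed with the
 * -(addr | TARGET_PAGE_MASK) idiom from the probe helpers. */
static uint64_t bytes_left_in_page(uint64_t addr)
{
    return -(addr | TARGET_PAGE_MASK);
}
```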


These two functions
can be merged using probe_access and MMU_DATA_{LOAD,STORE}.


+
+#ifdef HOST_WORDS_BIGENDIAN
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+/*
+ * Split the remaining range to two parts.
+ * The first part is in the last uint64_t unit.
+ * The second part start from the next uint64_t unit.
+ */
+int part1 = 0, part2 = tot - cnt;
+if (cnt % 64) {
+part1 = 64 - (cnt % 64);
+part2 = tot - cnt - part1;
+memset(tail & ~(63ULL), 0, part1);
+memset((tail + 64) & ~(63ULL), 0, part2);

You're confusing bit and byte offsets -- cnt and tot are both byte offsets.

Yes, I will fix it.

+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+
+int idx = (index * mlen) / 8;
+int pos = (index * mlen) % 8;
+
+switch (mlen) {
+case 8:
+return *((uint8_t *)v0 + H1(index)) & 0x1;
+case 16:
+return *((uint16_t *)v0 + H2(index)) & 0x1;
+case 32:
+return *((uint32_t *)v0 + H4(index)) & 0x1;
+case 64:
+return *((uint64_t *)v0 + index) & 0x1;
+default:









+return (*((uint8_t *)v0 + H1(idx)) >> pos) & 0x1;
+}

This is not what I had in mind, and looks wrong as well.

 int idx = (index * mlen) / 64;
 int pos = (index * mlen) % 64;
 return (((uint64_t *)v0)[idx] >> pos) & 1;

You also might consider passing log2(mlen), so the multiplication could be
strength-reduced to a shift.
I don't think so. For example, when mlen is 8 bits and index is 0, it 
will reduce to


return (((uint64_t *)v0)[0]) & 1

And it's not right.

The right bit is first bit in vector register 0. And in

[Bug 1865099] [NEW] cannot run x64 based system on x64 host with Intel Haxm

2020-02-27 Thread Nick
Public bug reported:

I am trying to run Windows 10 x64 on a Windows 10 x64 host with Intel HAXM
as the kernel accelerator, but the system never boots; as far as I can tell
from the documentation, everything should be fine...

The QEMU logs are:


`
D:\vm>qemu-system-x86_64 -d 
guest_errors,out_asm,in_asm,op,op_opt,op_ind,int,exec,cpu,fpu,mmu,pcall,cpu_reset,unimp,page,nochain
 -cpu core2duo -smp 4 -accel hax -drive 
file=w10.img,index=0,media=disk,format=raw -cdrom "E:\test\W10x64ProEn-UK.iso" 
-m 4G -L Bios -usbdevice mouse -usbdevice keyboard -boot menu=on -rtc 
base=localtime,clock=host -name windows
qemu-system-x86_64: -usbdevice mouse: '-usbdevice' is deprecated, please use 
'-device usb-...' instead
qemu-system-x86_64: -usbdevice keyboard: '-usbdevice' is deprecated, please use 
'-device usb-...' instead
HAX is working and emulator runs in fast virt mode.
CPU Reset (CPU 0)
EAX= EBX= ECX= EDX=
ESI= EDI= EBP= ESP=
EIP= EFL= [---] CPL=0 II=0 A20=0 SMM=0 HLT=0
ES =   
CS =   
SS =   
DS =   
FS =   
GS =   
LDT=   
TR =   
GDT=  
IDT=  
CR0= CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6= DR7=
CCS= CCD= CCO=DYNAMIC
EFER=
FCW= FSW= [ST=0] FTW=ff MXCSR=
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=
CR0 update: CR0=0x6010
CPU Reset (CPU 1)
EAX= EBX= ECX= EDX=
ESI= EDI= EBP= ESP=
EIP= EFL= [---] CPL=0 II=0 A20=0 SMM=0 HLT=0
ES =   
CS =   
SS =   
DS =   
FS =   
GS =   
LDT=   
TR =   
GDT=  
IDT=  
CR0= CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6= DR7=
CCS= CCD= CCO=DYNAMIC
EFER=
FCW= FSW= [ST=0] FTW=ff MXCSR=
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=
CR0 update: CR0=0x6010
CPU Reset (CPU 2)
EAX= EBX= ECX= EDX=
ESI= EDI= EBP= ESP=
EIP= EFL= [---] CPL=0 II=0 A20=0 SMM=0 HLT=0
ES =   
CS =   
SS =   
DS =   
FS =   
GS =   
LDT=   
TR =   
GDT=  
IDT=  
CR0= CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6= DR7=
CCS= CCD= CCO=DYNAMIC
EFER=
FCW= FSW= [ST=0] FTW=ff MXCSR=
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=
CR0 update: CR0=0x6010
CPU Reset (CPU 3)
EAX= EB

[PATCH V2] MAINTAINERS: Add entry for Guest X86 HAXM CPUs

2020-02-27 Thread Colin Xu
HAXM covers the files below:
include/sysemu/hax.h
target/i386/hax-*

V2: Add HAXM github page for wiki and issue tracking.

Cc: Wenchao Wang 
Cc: Hang Yuan 
Reviewed-by: Hang Yuan 
Signed-off-by: Colin Xu 
---
 MAINTAINERS | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index df1786db3207..c45f1421eab5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -435,6 +435,17 @@ F: include/hw/block/dataplane/xen*
 F: include/hw/xen/
 F: include/sysemu/xen-mapcache.h
 
+Guest CPU Cores (HAXM)
+-
+X86 HAXM CPUs
+M: Wenchao Wang 
+M: Colin Xu 
+L: haxm-t...@intel.com
+W: https://github.com/intel/haxm/issues
+S: Maintained
+F: include/sysemu/hax.h
+F: target/i386/hax-*
+
 Hosts
 -
 LINUX
-- 
2.25.1




FYI: The daily digest email for qemu-devel/qemu-arm didn't go out today

2020-02-27 Thread Ian Kelling
I hope to fix it tomorrow, but I can't be sure. Maybe it will fix itself
and I'll just abandon the digest for today. Very few people use the
digest feature, normally it goes out at noonish eastern time.

All I've had time to do is grab a stack trace; it seems like there are
some characters in an email that mailman doesn't like. If anyone wants
to lend a hand, this is from a lightly modified senddigests from mailman
to get a stack trace:

Traceback (most recent call last):
  File "/usr/lib/mailman/cron/sd", line 106, in 
main()
  File "/usr/lib/mailman/cron/sd", line 89, in main
mlist.send_digest_now()
  File "/var/lib/mailman/Mailman/Digester.py", line 60, in send_digest_now
ToDigest.send_digests(self, mboxfp)
  File "/var/lib/mailman/Mailman/Handlers/ToDigest.py", line 147, in 
send_digests
send_i18n_digests(mlist, mboxfp)
  File "/var/lib/mailman/Mailman/Handlers/ToDigest.py", line 347, in 
send_i18n_digests
payload = unicode(payload, mcset, 'replace'
  File "/usr/lib/python2.7/encodings/base64_codec.py", line 41, in base64_decode
assert errors == 'strict'
AssertionError

-- 
Ian Kelling | Senior Systems Administrator, Free Software Foundation
GPG Key: B125 F60B 7B28 7FF6 A2B7  DF8F 170A F0E2 9542 95DF
https://fsf.org | https://gnu.org



Re: [PATCH] MAINTAINERS: Add entry for Guest X86 HAXM CPUs

2020-02-27 Thread Colin Xu



On 2020-02-27 16:00, Philippe Mathieu-Daudé wrote:

On 2/27/20 8:56 AM, Philippe Mathieu-Daudé wrote:

Cc'ing qemu-trivial@

On 2/26/20 5:32 AM, Colin Xu wrote:

HAXM covers the files below:
include/sysemu/hax.h
target/i386/hax-*

Cc: Wenchao Wang 
Cc: Hang Yuan 
Signed-off-by: Colin Xu 


Please keep the Acked-by/Reviewed-by tags:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg623832.html

This patch already has:
Reviewed-by: Hang Yuan 


---
  MAINTAINERS | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 36d94c17a654..27727e2fac13 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -435,6 +435,16 @@ F: include/hw/block/dataplane/xen*
  F: include/hw/xen/
  F: include/sysemu/xen-mapcache.h
+Guest CPU Cores (HAXM)
+-
+X86 HAXM CPUs
+M: Wenchao Wang 
+M: Colin Xu 


Maybe you can also link where to report HAXM issues:

W: https://github.com/intel/haxm/issues

Indeed we need this. Thanks for pointing it out.



+L: haxm-t...@intel.com
+S: Maintained
+F: include/sysemu/hax.h
+F: target/i386/hax-*
+
  Hosts
  -
  LINUX




--
Best Regards,
Colin Xu




Re: [PATCH 3/3] target/i386: modify Icelake-Client and Icelake-Server CPU model number

2020-02-27 Thread Chenyi Qiang




On 2/27/2020 5:48 PM, Jack Wang wrote:

Chenyi Qiang  于2020年2月27日周四 上午10:07写道:


According to the Intel Icelake family list, Icelake-Client uses model
number 126(0x7D)

0x7D is 125 in decimal, so the commit message needs to be fixed.



Sorry, my mistake.



Cheers
Jack Wang





Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-27 Thread Stefan Ring
On Thu, Feb 27, 2020 at 10:12 PM Stefan Ring  wrote:
> Victory! I have a reproducer in the form of a plain C libgfapi client.
>
> However, I have not been able to trigger corruption by just executing
> the simple pattern in an artificial way. Currently, I need to feed my
> reproducer 2 GB of data that I streamed out of the qemu block driver.
> I get two possible end states out of my reproducer: The correct one or
> a corrupted one, where 48 KB are zeroed out. It takes no more than 10
> runs to get each of them at least once. The corrupted end state is
> exactly the same that I got from the real qemu process from where I
> obtained the streamed trace. This gives me a lot of confidence in the
> soundness of my reproducer.
>
> More details will follow.

Ok, so the exact sequence of activity around the corruption is this:

8700 and so on are the sequential request numbers. All of these
requests are writes. Blocks are 512 bytes.

8700
  grows the file to a certain size (2134144 blocks)

<8700 retires, nothing in flight>

8701
  writes 55 blocks inside currently allocated file range, close to the
end (7 blocks short)

8702
  writes 54 blocks from the end of 8701, growing the file by 47 blocks

<8702 retires, 8701 remains in flight>

8703
  writes from the end of 8702, growing the file by 81 blocks

<8703 retires, 8701 remains in flight>

8704
  writes 1623 blocks also from the end of 8702, growing the file by 1542 blocks

<8701 retires>
<8704 retires>

The exact range covered by 8703 ends up zeroed out.

If 8701 retires earlier (before 8702 is issued), everything is fine.



[PATCH v1 4/4] hw/arm/cubieboard: report error when using unsupported -bios argument

2020-02-27 Thread Niek Linnenbank
The Cubieboard machine does not support the -bios argument.
Report an error when -bios is used and exit immediately.

Signed-off-by: Niek Linnenbank 
---
 hw/arm/cubieboard.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index 6c55d9056f..871b1beef4 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -19,6 +19,7 @@
 #include "exec/address-spaces.h"
 #include "qapi/error.h"
 #include "cpu.h"
+#include "sysemu/sysemu.h"
 #include "hw/sysbus.h"
 #include "hw/boards.h"
 #include "hw/arm/allwinner-a10.h"
@@ -33,6 +34,12 @@ static void cubieboard_init(MachineState *machine)
 AwA10State *a10;
 Error *err = NULL;
 
+/* BIOS is not supported by this board */
+if (bios_name) {
+error_report("BIOS not supported for this machine");
+exit(1);
+}
+
 /* This board has fixed size RAM (512MiB or 1GiB) */
 if (machine->ram_size != 512 * MiB &&
 machine->ram_size != 1 * GiB) {
-- 
2.17.1




[PATCH v1 0/4] hw/arm/cubieboard: correct CPU type and add machine argument checks

2020-02-27 Thread Niek Linnenbank
These patches change the Cubieboard machine definition to use the
correct CPU type, which is ARM Cortex-A8 instead of ARM Cortex-A9.

Additionally, add some sanity checks for the machine input
arguments in the initialization function.

Niek Linnenbank (4):
  hw/arm/cubieboard: use ARM Cortex-A8 as the default CPU in machine
definition
  hw/arm/cubieboard: restrict allowed CPU type to ARM Cortex-A8
  hw/arm/cubieboard: restrict allowed RAM size to 512MiB and 1GiB
  hw/arm/cubieboard: report error when using unsupported -bios argument

 hw/arm/cubieboard.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

-- 
2.17.1




[PATCH v1 3/4] hw/arm/cubieboard: restrict allowed RAM size to 512MiB and 1GiB

2020-02-27 Thread Niek Linnenbank
The Cubieboard contains either 512MiB or 1GiB of onboard RAM [1].
Prevent changing RAM to a different size which could break user programs.

 [1] http://linux-sunxi.org/Cubieboard

Signed-off-by: Niek Linnenbank 
---
 hw/arm/cubieboard.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index 010375f0a8..6c55d9056f 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -33,6 +33,13 @@ static void cubieboard_init(MachineState *machine)
 AwA10State *a10;
 Error *err = NULL;
 
+/* This board has fixed size RAM (512MiB or 1GiB) */
+if (machine->ram_size != 512 * MiB &&
+machine->ram_size != 1 * GiB) {
+error_report("This machine can only be used with 512MiB or 1GiB RAM");
+exit(1);
+}
+
 /* Only allow Cortex-A8 for this board */
 if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a8")) != 0) {
 error_report("This board can only be used with cortex-a8 CPU");
@@ -78,6 +85,7 @@ static void cubieboard_machine_init(MachineClass *mc)
 {
 mc->desc = "cubietech cubieboard (Cortex-A8)";
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a8");
+mc->default_ram_size = 1 * GiB;
 mc->init = cubieboard_init;
 mc->block_default_type = IF_IDE;
 mc->units_per_default_bus = 1;
-- 
2.17.1




[PATCH v1 1/4] hw/arm/cubieboard: use ARM Cortex-A8 as the default CPU in machine definition

2020-02-27 Thread Niek Linnenbank
The Cubieboard is a singleboard computer with an Allwinner A10 System-on-Chip 
[1].
As documented in the Allwinner A10 User Manual V1.5 [2], the SoC has an ARM
Cortex-A8 processor. Currently the Cubieboard machine definition specifies the
ARM Cortex-A9 in its description and as the default CPU.

This patch corrects the Cubieboard machine definition to use the ARM Cortex-A8.

 [1] http://docs.cubieboard.org/products/start#cubieboard1
 [2] https://linux-sunxi.org/File:Allwinner_A10_User_manual_V1.5.pdf

Signed-off-by: Niek Linnenbank 
---
 hw/arm/cubieboard.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index 089f9a30c1..0195925c73 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -68,8 +68,8 @@ static void cubieboard_init(MachineState *machine)
 
 static void cubieboard_machine_init(MachineClass *mc)
 {
-mc->desc = "cubietech cubieboard (Cortex-A9)";
-mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a9");
+mc->desc = "cubietech cubieboard (Cortex-A8)";
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a8");
 mc->init = cubieboard_init;
 mc->block_default_type = IF_IDE;
 mc->units_per_default_bus = 1;
-- 
2.17.1




[PATCH v1 2/4] hw/arm/cubieboard: restrict allowed CPU type to ARM Cortex-A8

2020-02-27 Thread Niek Linnenbank
The Cubieboard has an ARM Cortex-A8. Prevent changing the CPU
to a different type which could break user programs.

Signed-off-by: Niek Linnenbank 
---
 hw/arm/cubieboard.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index 0195925c73..010375f0a8 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -30,9 +30,17 @@ static struct arm_boot_info cubieboard_binfo = {
 
 static void cubieboard_init(MachineState *machine)
 {
-AwA10State *a10 = AW_A10(object_new(TYPE_AW_A10));
+AwA10State *a10;
 Error *err = NULL;
 
+/* Only allow Cortex-A8 for this board */
+if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a8")) != 0) {
+error_report("This board can only be used with cortex-a8 CPU");
+exit(1);
+}
+
+a10 = AW_A10(object_new(TYPE_AW_A10));
+
 object_property_set_int(OBJECT(&a10->emac), 1, "phy-addr", &err);
 if (err != NULL) {
 error_reportf_err(err, "Couldn't set phy address: ");
-- 
2.17.1




Re: [PATCH v5 4/4] target/riscv: add vector configure instruction

2020-02-27 Thread Alistair Francis
On Wed, Feb 26, 2020 at 5:41 PM LIU Zhiwei  wrote:
>
>
>
> On 2020/2/27 3:20, Alistair Francis wrote:
> >   On Fri, Feb 21, 2020 at 1:45 AM LIU Zhiwei  wrote:
> >> vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
> >> should update after configure instructions. The (ill, lmul, sew ) of vtype
> >> and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.
> >>
> >> Signed-off-by: LIU Zhiwei 
> >> ---
> >>   MAINTAINERS |  1 +
> >>   target/riscv/Makefile.objs  |  2 +-
> >>   target/riscv/cpu.h  | 61 +++---
> >>   target/riscv/helper.h   |  2 +
> >>   target/riscv/insn32.decode  |  5 ++
> >>   target/riscv/insn_trans/trans_rvv.inc.c | 69 +
> >>   target/riscv/translate.c| 17 +-
> >>   target/riscv/vector_helper.c| 53 +++
> >>   8 files changed, 199 insertions(+), 11 deletions(-)
> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
> >>   create mode 100644 target/riscv/vector_helper.c
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 1740a4fddc..cd2e200db9 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -266,6 +266,7 @@ M: Palmer Dabbelt 
> >>   M: Alistair Francis 
> >>   M: Sagar Karandikar 
> >>   M: Bastian Koppelmann 
> >> +M: LIU Zhiwei 
> > I don't think you should add yourself here. MAINTAINERS is more for
> > people doing active patch review.
> OK.
> > RISC-V QEMU can really do with more maintainers though, so if you do
> > want to be involved you could help review patches.
> Actually my main job is to maintain and develop QEMU code,so I'd like to
> review target/riscv code,
> however the vector upstreaming takes a lot of time.

Great! I know reviewing code can be tough and time consuming but it
really helps the project. Just as upstreaming can be time consuming
but it's worth it.

Just try to help review what you can, every little bit helps a lot :)

Anyone can review code (you don't have to be a maintainer) so it's a
good place to start. Once you are actively reviewing patches we can
add you as a RISC-V maintainer.

> >>   L: qemu-ri...@nongnu.org
> >>   S: Supported
> >>   F: target/riscv/
> >> diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
> >> index ff651f69f6..ff38df6219 100644
> >> --- a/target/riscv/Makefile.objs
> >> +++ b/target/riscv/Makefile.objs
> >> @@ -1,4 +1,4 @@
> >> -obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
> >> gdbstub.o
> >> +obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
> >> vector_helper.o gdbstub.o
> >>   obj-$(CONFIG_SOFTMMU) += pmp.o
> >>
> >>   ifeq ($(CONFIG_SOFTMMU),y)
> >> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> >> index 748bd557f9..f7003edb86 100644
> >> --- a/target/riscv/cpu.h
> >> +++ b/target/riscv/cpu.h
> >> @@ -21,6 +21,7 @@
> >>   #define RISCV_CPU_H
> >>
> >>   #include "hw/core/cpu.h"
> >> +#include "hw/registerfields.h"
> >>   #include "exec/cpu-defs.h"
> >>   #include "fpu/softfloat-types.h"
> >>
> >> @@ -98,6 +99,12 @@ typedef struct CPURISCVState CPURISCVState;
> >>
> >>   #define RV_VLEN_MAX 512
> >>
> >> +FIELD(VTYPE, LMUL, 0, 2)
> > Shouldn't this be VLMUL?
> OK. The same with VSEW and VEDIV.
> >
> >> +FIELD(VTYPE, SEW, 2, 3)
> > VSEW?
> >
> >> +FIELD(VTYPE, EDIV, 5, 2)
> > VEDIV?
> >
> >> +FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
> >> +FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
> >> +
> >>   struct CPURISCVState {
> >>   target_ulong gpr[32];
> >>   uint64_t fpr[32]; /* assume both F and D extensions */
> >> @@ -302,16 +309,59 @@ void riscv_cpu_set_fflags(CPURISCVState *env, 
> >> target_ulong);
> >>   #define TB_FLAGS_MMU_MASK   3
> >>   #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
> >>
> >> +typedef CPURISCVState CPUArchState;
> >> +typedef RISCVCPU ArchCPU;
> >> +#include "exec/cpu-all.h"
> > Why do you need this? Shouldn't the TB_FLAGS fields work without this.
> Because env_archcpu in cpu_get_tb_cpu_state will use it.

Ah fair enough.

> >> +
> >> +FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
> >> +FIELD(TB_FLAGS, LMUL, 3, 2)
> >> +FIELD(TB_FLAGS, SEW, 5, 3)
> >> +FIELD(TB_FLAGS, VILL, 8, 1)
> > These should probably be defined with the other TB_FLAGS (or if you
> > need them here you can move the others up here).
> I'd like to put other TB_FLAGS in other separate patch.
> >
> >> +
> >> +/*
> >> + * A simplification for VLMAX
> >> + * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
> >> + * = (VLEN << LMUL) / (8 << SEW)
> >> + * = (VLEN << LMUL) >> (SEW + 3)
> >> + * = VLEN >> (SEW + 3 - LMUL)
> >> + */
> >> +static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
> >> +{
> >> +uint8_t sew, lmul;
> >> +
> >> +sew = FIELD_EX64(vtype, VTYPE, SEW);
> >> +lmul = FIELD_EX64(vtype, VTYPE, LMUL);
> >> +return cpu->cfg.vlen >> (sew + 3 - lmul);
> > Shouldn't we assert this isn't over

Re: [PATCH v6 11/18] target/ppc: Only calculate RMLS derived RMA limit on demand

2020-02-27 Thread David Gibson
On Wed, Feb 26, 2020 at 02:24:53PM +0100, Greg Kurz wrote:
> On Tue, 25 Feb 2020 10:37:17 +1100
> David Gibson  wrote:
> 
> > When the LPCR is written, we update the env->rmls field with the RMA limit
> > it implies.  Simplify things by just calculating the value directly from
> > the LPCR value when we need it.
> > 
> > It's possible this is a little slower, but it's unlikely to be significant,
> > since this is only for real mode accesses in a translation configuration
> > that's not used very often, and the whole thing is behind the qemu TLB
> > anyway.  Therefore, keeping the number of state variables down and not
> > having to worry about making sure it's always in sync seems the better
> > option.
> > 
> 
> This patch also refactors the code of ppc_hash64_update_vrma(), which
> is definitely an improvement, but seems a bit unrelated to the title...
> I'd personally make it a separate patch but you decide of course :)

Ah, dang it, botched rebase damage.  There are a couple of hunks here
that are supposed to be in the next patch.  I think I've sorted it out
now.

> 
> Also, a cosmetic remark. See below.
> 
> > Signed-off-by: David Gibson 
> > Reviewed-by: Cédric Le Goater 
> > ---
> >  target/ppc/cpu.h|  1 -
> >  target/ppc/mmu-hash64.c | 84 -
> >  2 files changed, 40 insertions(+), 45 deletions(-)
> > 
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index 8077fdb068..f9871b1233 100644
> > --- a/target/ppc/cpu.h
> > +++ b/target/ppc/cpu.h
> > @@ -1046,7 +1046,6 @@ struct CPUPPCState {
> >  uint64_t insns_flags2;
> >  #if defined(TARGET_PPC64)
> >  ppc_slb_t vrma_slb;
> > -target_ulong rmls;
> >  #endif
> >  
> >  int error_code;
> > diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> > index dd0df6fd01..ac21c14f68 100644
> > --- a/target/ppc/mmu-hash64.c
> > +++ b/target/ppc/mmu-hash64.c
> > @@ -791,6 +791,35 @@ static target_ulong rmls_limit(PowerPCCPU *cpu)
> >  }
> >  }
> >  
> > +static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
> > +{
> > +CPUPPCState *env = &cpu->env;
> > +target_ulong lpcr = env->spr[SPR_LPCR];
> > +uint32_t vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
> > +target_ulong vsid = SLB_VSID_VRMA | ((vrmasd << 4) & 
> > SLB_VSID_LLP_MASK);
> > +int i;
> > +
> > +for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
> > +const PPCHash64SegmentPageSizes *sps = &cpu->hash64_opts->sps[i];
> > +
> > +if (!sps->page_shift) {
> > +break;
> > +}
> > +
> > +if ((vsid & SLB_VSID_LLP_MASK) == sps->slb_enc) {
> > +slb->esid = SLB_ESID_V;
> > +slb->vsid = vsid;
> > +slb->sps = sps;
> > +return 0;
> > +}
> > +}
> > +
> > +error_report("Bad page size encoding in LPCR[VRMASD]; LPCR=0x"
> > + TARGET_FMT_lx"\n", lpcr);
> > +
> > +return -1;
> > +}
> > +
> >  int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
> >  int rwx, int mmu_idx)
> >  {
> > @@ -844,8 +873,10 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr 
> > eaddr,
> >  
> >  goto skip_slb_search;
> >  } else {
> > +target_ulong limit = rmls_limit(cpu);
> > +
> >  /* Emulated old-style RMO mode, bounds check against RMLS */
> > -if (raddr >= env->rmls) {
> > +if (raddr >= limit) {
> >  if (rwx == 2) {
> >  ppc_hash64_set_isi(cs, SRR1_PROTFAULT);
> >  } else {
> > @@ -1007,8 +1038,9 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU 
> > *cpu, target_ulong addr)
> >  return -1;
> >  }
> >  } else {
> > +target_ulong limit = rmls_limit(cpu);
> 
> Maybe add an empty line like you did above for consistency and better
> readability ?

Ok, done.

> 
> Anyway, feel free to add:
> 
> Reviewed-by: Greg Kurz 
> 
> >  /* Emulated old-style RMO mode, bounds check against RMLS */
> > -if (raddr >= env->rmls) {
> > +if (raddr >= limit) {
> >  return -1;
> >  }
> >  return raddr | env->spr[SPR_RMOR];
> > @@ -1043,53 +1075,18 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu, 
> > target_ulong ptex,
> >  static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
> >  {
> >  CPUPPCState *env = &cpu->env;
> > -const PPCHash64SegmentPageSizes *sps = NULL;
> > -target_ulong esid, vsid, lpcr;
> >  ppc_slb_t *slb = &env->vrma_slb;
> > -uint32_t vrmasd;
> > -int i;
> > -
> > -/* First clear it */
> > -slb->esid = slb->vsid = 0;
> > -slb->sps = NULL;
> >  
> >  /* Is VRMA enabled ? */
> > -if (!ppc_hash64_use_vrma(env)) {
> > -return;
> > -}
> > -
> > -/*
> > - * Make one up. Mostly ignore the ESID which will not be needed
> > - * for translation
> > - */
> > -l

Re: [PATCH v6 09/18] target/ppc: Streamline calculation of RMA limit from LPCR[RMLS]

2020-02-27 Thread David Gibson
On Wed, Feb 26, 2020 at 08:56:40AM +0100, Greg Kurz wrote:
> On Wed, 26 Feb 2020 12:04:13 +1100
> David Gibson  wrote:
> 
> > On Tue, Feb 25, 2020 at 11:47:25PM +0100, Greg Kurz wrote:
> > > On Tue, 25 Feb 2020 18:05:31 +0100
> > > Greg Kurz  wrote:
> > > 
> > > > On Tue, 25 Feb 2020 10:37:15 +1100
> > > > David Gibson  wrote:
> > > > 
> > > > > Currently we use a big switch statement in ppc_hash64_update_rmls() 
> > > > > to work
> > > > > out what the right RMA limit is based on the LPCR[RMLS] field.  
> > > > > There's no
> > > > > formula for this - it's just an arbitrary mapping defined by the 
> > > > > existing
> > > > > CPU implementations - but we can make it a bit more readable by using 
> > > > > a
> > > > > lookup table rather than a switch.  In addition we can use the MiB/GiB
> > > > > symbols to make it a bit clearer.
> > > > > 
> > > > > While there we add a bit of clarity and rationale to the comment about
> > > > > what happens if the LPCR[RMLS] doesn't contain a valid value.
> > > > > 
> > > > > Signed-off-by: David Gibson 
> > > > > Reviewed-by: Cédric Le Goater 
> > > > > ---
> > > > >  target/ppc/mmu-hash64.c | 71 
> > > > > -
> > > > >  1 file changed, 35 insertions(+), 36 deletions(-)
> > > > > 
> > > > > diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> > > > > index 0ef330a614..4f082d775d 100644
> > > > > --- a/target/ppc/mmu-hash64.c
> > > > > +++ b/target/ppc/mmu-hash64.c
> > > > > @@ -18,6 +18,7 @@
> > > > >   * License along with this library; if not, see 
> > > > > .
> > > > >   */
> > > > >  #include "qemu/osdep.h"
> > > > > +#include "qemu/units.h"
> > > > >  #include "cpu.h"
> > > > >  #include "exec/exec-all.h"
> > > > >  #include "exec/helper-proto.h"
> > > > > @@ -757,6 +758,39 @@ static void ppc_hash64_set_c(PowerPCCPU *cpu, 
> > > > > hwaddr ptex, uint64_t pte1)
> > > > >  stb_phys(CPU(cpu)->as, base + offset, (pte1 & 0xff) | 0x80);
> > > > >  }
> > > > >  
> > > > > +static target_ulong rmls_limit(PowerPCCPU *cpu)
> > > > > +{
> > > > > +CPUPPCState *env = &cpu->env;
> > > > > +/*
> > > > > + * This is the full 4 bits encoding of POWER8. Previous
> > > > > + * CPUs only support a subset of these but the filtering
> > > > > + * is done when writing LPCR
> > > > > + */
> > > > > +const target_ulong rma_sizes[] = {
> > > > > +[0] = 0,
> > > > > +[1] = 16 * GiB,
> > > > > +[2] = 1 * GiB,
> > > > > +[3] = 64 * MiB,
> > > > > +[4] = 256 * MiB,
> > > > > +[5] = 0,
> > > > > +[6] = 0,
> > > > > +[7] = 128 * MiB,
> > > > > +[8] = 32 * MiB,
> > > > > +};
> > > > > +target_ulong rmls = (env->spr[SPR_LPCR] & LPCR_RMLS) >> 
> > > > > LPCR_RMLS_SHIFT;
> > > > > +
> > > > > +if (rmls < ARRAY_SIZE(rma_sizes)) {
> > > > 
> > > > This condition is always true since the RMLS field is 4 bits long... 
> > > 
> > > Oops my mistake, I was already thinking about the suggestion I have
> > > for something that was puzzling me. See below.
> > > 
> > > > I guess you want to check that RMLS encodes a valid RMA size instead.
> > > > 
> > > > if (rma_sizes[rmls]) {
> > > > 
> > > > > +return rma_sizes[rmls];
> > > > > +} else {
> > > > > +/*
> > > > > + * Bad value, so the OS has shot itself in the foot.  Return 
> > > > > a
> > > > > + * 0-sized RMA which we expect to trigger an immediate DSI or
> > > > > + * ISI
> > > > > + */
> > > 
> > > It seems a bit weird to differentiate the case where the value is bad
> > > because it happens to be bigger than the highest supported one, compared
> > > to values that are declared bad in rma_sizes[], like 0, 5 or 6. They're
> > > all basically the same case of values not used to encode a valid
> > > size...
> > 
> > Right, but the result is the same either way - the function returns
> > 0.  This is basically just a small space optimization.
> > 
> > > 
> > > What about :
> > > 
> > > static const target_ulong rma_sizes[16] = {
> > > [1] = 16 * GiB,
> > > [2] = 1 * GiB,
> > > [3] = 64 * MiB,
> > > [4] = 256 * MiB,
> > > [7] = 128 * MiB,
> > > [8] = 32 * MiB,
> > > };
> > 
> > Eh, I guess?  I don't see much to pick between them.
> > 
> 
> This is what I had in mind actually.
> 
> static target_ulong rmls_limit(PowerPCCPU *cpu)
> {
> CPUPPCState *env = &cpu->env;
> /*
>  * This is the full 4 bits encoding of POWER8. Previous
>  * CPUs only support a subset of these but the filtering
>  * is done when writing LPCR.
>  *
>  * Unsupported values mean the OS has shot itself in the
>  * foot. Return a 0-sized RMA in this case, which we expect
>  * to trigger an immediate DSI or IS

Re: [PATCH v6 17/18] spapr: Clean up RMA size calculation

2020-02-27 Thread David Gibson
On Wed, Feb 26, 2020 at 02:37:51PM +0100, Greg Kurz wrote:
> On Tue, 25 Feb 2020 10:37:23 +1100
> David Gibson  wrote:
> 
> > Move the calculation of the Real Mode Area (RMA) size into a helper
> > function.  While we're there clean it up and correct it in a few ways:
> >   * Add comments making it clearer where the various constraints come from
> >   * Remove a pointless check that the RMA fits within Node 0 (we've just
> > clamped it so that it does)
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/spapr.c | 59 ++
> >  1 file changed, 35 insertions(+), 24 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 6e9f15f64d..f0354b699d 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2648,6 +2648,40 @@ static PCIHostState *spapr_create_default_phb(void)
> >  return PCI_HOST_BRIDGE(dev);
> >  }
> >  
> > +static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
> > +{
> > +MachineState *machine = MACHINE(spapr);
> > +hwaddr rma_size = machine->ram_size;
> > +hwaddr node0_size = spapr_node0_size(machine);
> > +
> > +/* RMA has to fit in the first NUMA node */
> > +rma_size = MIN(rma_size, node0_size);
> > +
> > +/*
> > + * VRMA access is via a special 1TiB SLB mapping, so the RMA can
> > + * never exceed that
> > + */
> > +rma_size = MIN(rma_size, TiB);
> > +
> > +/*
> > + * Clamp the RMA size based on machine type.  This is for
> > + * migration compatibility with older qemu versions, which limited
> > + * the RMA size for complicated and mostly bad reasons.
> > + */
> > +if (smc->rma_limit) {
> 
> /home/greg/Work/qemu/qemu-ppc/hw/ppc/spapr.c: In function ‘spapr_rma_size’:
> /home/greg/Work/qemu/qemu-ppc/hw/ppc/spapr.c:2671:9: error: ‘smc’ undeclared 
> (first use in this function)
>  if (smc->rma_limit) {

Oops.  Fixed.

> 
> > +spapr->rma_size = MIN(spapr->rma_size, smc->rma_limit);
> > +}
> > +
> > +if (rma_size < (MIN_RMA_SLOF * MiB)) {
> > +error_setg(errp,
> > +"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
> > +   MIN_RMA_SLOF);
> > +return -1;
> > +}
> > +
> > +return rma_size;
> > +}
> > +
> >  /* pSeries LPAR / sPAPR hardware init */
> >  static void spapr_machine_init(MachineState *machine)
> >  {
> > @@ -2660,7 +2694,6 @@ static void spapr_machine_init(MachineState *machine)
> >  int i;
> >  MemoryRegion *sysmem = get_system_memory();
> >  MemoryRegion *ram = g_new(MemoryRegion, 1);
> > -hwaddr node0_size = spapr_node0_size(machine);
> >  long load_limit, fw_size;
> >  char *filename;
> >  Error *resize_hpt_err = NULL;
> > @@ -2700,22 +2733,7 @@ static void spapr_machine_init(MachineState *machine)
> >  exit(1);
> >  }
> >  
> > -spapr->rma_size = node0_size;
> > -
> > -/*
> > - * Clamp the RMA size based on machine type.  This is for
> > - * migration compatibility with older qemu versions, which limited
> > - * the RMA size for complicated and mostly bad reasons.
> > - */
> > -if (smc->rma_limit) {
> > -spapr->rma_size = MIN(spapr->rma_size, smc->rma_limit);
> > -}
> > -
> > -if (spapr->rma_size > node0_size) {
> > -error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
> > - spapr->rma_size);
> > -exit(1);
> > -}
> > +spapr->rma_size = spapr_rma_size(spapr, &error_fatal);
> >  
> >  /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
> >  load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
> > @@ -2954,13 +2972,6 @@ static void spapr_machine_init(MachineState *machine)
> >  }
> >  }
> >  
> > -if (spapr->rma_size < MIN_RMA_SLOF) {
> > -error_report(
> > -"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode 
> > Area memory)",
> > -MIN_RMA_SLOF / MiB);
> > -exit(1);
> > -}
> > -
> >  if (kernel_filename) {
> >  uint64_t lowaddr = 0;
> >  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip

2020-02-27 Thread Peter Xu
On Thu, Feb 27, 2020 at 10:14:47PM +0100, Auger Eric wrote:
> Hi Peter,

Hi, Eric,

[...]

> > + * the KVM resample fd kick is skipped.  The userspace
> > + * needs to remember the resamplefd and kick it when we
> > + * receive EOI of this IRQ.
>  Practically we now talk about a VFIO ACTION_UNMASK classical eventfd
>  As such isn't it a bit weird to handle those normal UNMASK eventfds in
>  the KVM code?
> >>>
> >>> I'm not sure I completely get the question, but this should be
> >>> something general to KVM resamplefd support.  In other words, this
> >>> should also fix other devices (besides VFIO) when they're using the
> >>> KVM resamplefd, because IMHO it's the resamplefd and split irqchip
> >>> which is really broken here.
> >> Here is my understanding (& memories): the KVM resamplefd is an eventfd
> >> you register to KVM so that KVM triggers the resamplefd when KVM traps
> >> the EOI. Here I understand this is the userspace IOAPIC that traps the
> >> EOI and not the in-kernel virtual interrupt controller. So I would have
> >> expected you just need to signal the VFIO UNMASK eventfd to re-enable
> >> the physical IRQ (which was automasked). This is no more a KVM
> >> resamplefd strictly speaking as KVM is not involved anymore in the
> >> deactivation process.
> > 
> > Yes KVM kernel side should not be involed when we're using split
> > irqchip in this case.  However it should still belongs to the work of
> > the userspace KVM module (kvm-all.c) so that it can still "mimic" the
> > resamplefd feature that KVM_IRQFD provides.
> OK. So that was my actual question. Should this be handled by kvm-all.c?

It should fix KVM split irqchip with resamplefd, so I think it's
natural to do this in kvm-all.c (I'm a bit puzzled on where else we
can put this... :).  Or did I misunderstand your question?

> > 
> >>>
> >>> With that in mind, I think KVM should not need to even know what's
> >>> behind the resamplefd (in VFIO's case, it's the UNMASK eventfd).  It
> >>> just needs to kick it when IOAPIC EOI comes for the specific IRQ
> >> But above the userspace directly calls
> >> event_notifier_set(rfd->resample_event);
> >>
> >> This is not KVM anymore that "kicks it". Or maybe I miss something. So
> >> my comment was, why is it handled in the QEMU KVM layer?
> > 
> > It's my fault to be unclear on using "KVM" above.  I should really say
> > it as kvm-all.c, say, the QEMU layer for the kernel KVM module.
> > 
> > Indeed this problem is complicated... let me try to summarize.
> > 
> > Firstly KVM split irqchip and resamplefd is not really going to work
> > in the kernel (I think we just overlooked that when introducing the
> > 2nd feature, no matter which one comes first), because the resample
> > operation should be part of IOAPIC EOI, nevertheless when using split
> > irqchip IOAPIC is in userspace.
> > 
> > After we noticed this, Alex somewhere proposed to disable that in KVM,
> > which is actually the 1st kernel patch (654f1f13ea56).
> > 
> > We should (at the same time) propose patch 1 too in this series but I
> > guess everybody just forgot this afterwards (Paolo actually proposed
> > mostly the whole solution but I guess it got forgotten too)...
> > 
> > About the fast path speedup: the main logic should be to mimic the
> > same resamplefd feature as provided by KVM_IRQFD but this time only in
> > the userspace.  However now we're implementing the same logic only
> > within userspace kvm-all.c, and the kernel KVM should be totally not
> > aware of this.  Doing that benefits us in that the KVM interface in
> > QEMU does not need to be changed (majorly kvm_irqchip_assign_irqfd()).
> > What we need to do is just to wire up the userspace IOAPIC with these
> > resamplefds.  And the idea is actually the same too - someone (VFIO)
> > wants to have one fd (which is the resamplefd) kicked when EOI comes
> > when requesting for a KVM irqfd, no matter who's going to kick it
> > (kernel KVM or userspace).  That's all.
> 
> Yep I think it makes sense to accelerate the trigger path. And for the
> EOI path if you have means to trap this on the userspace irqchip it
> looks better than doing the map/unmap dance. So it looks like a good idea to
> me. Now shall it be in kvm-all.c or elsewhere, to me it is not the most
> important, as long as we reach a consensus and the scheme gets
> documented somewhere.

Sure.

For documentation: as mentioned above, I think the irqfd users will
always use the interface just like before, and the resamplefd should
work exactly like what KVM_IRQFD and kvm_irqchip_assign_irqfd() was
offering before this patch too.  IMO it'll just start to work even for
split irqchips which was silently broken without being noticed.

Thanks,

-- 
Peter Xu




Re: [PATCH v2 2/2] hw/arm/armv7m: Downgrade CPU reset handler priority

2020-02-27 Thread Alistair Francis
On Thu, Feb 27, 2020 at 1:44 PM Peter Maydell  wrote:
>
> On Thu, 27 Feb 2020 at 21:37, Alistair Francis  wrote:
> > I do hit this problem, Peter described a workaround in the previous
> > version of this patch, that is to link at address 0 instead of the
> > alias address.
>
> Do you happen to have a simple test case you can send me
> that demonstrates the bug? That will save me a bit of
> messing around when I come to try to fix it...

Yep!

This repo: https://github.com/alistair23/CSSE3010-QEMU-Examples

Run: np2_env.sh to setup variables

Then build an example in the examples directory.

That repo will hard fault on QEMU currently (it doesn't have the address fix).

Alistair

>
> thanks
> -- PMM



Re: [PATCH v2 2/2] hw/arm/armv7m: Downgrade CPU reset handler priority

2020-02-27 Thread Peter Maydell
On Thu, 27 Feb 2020 at 21:37, Alistair Francis  wrote:
> I do hit this problem, Peter described a workaround in the previous
> version of this patch, that is to link at address 0 instead of the
> alias address.

Do you happen to have a simple test case you can send me
that demonstrates the bug? That will save me a bit of
messing around when I come to try to fix it...

thanks
-- PMM



Re: [PATCH v2 2/2] hw/arm/armv7m: Downgrade CPU reset handler priority

2020-02-27 Thread Alistair Francis
On Thu, Feb 27, 2020 at 5:32 AM Philippe Mathieu-Daudé
 wrote:
>
> Hi Stephanos,
>
> On 2/27/20 12:51 PM, Stephanos Ioannidis wrote:
> > The ARMv7-M CPU reset handler, which loads the initial SP and PC
> > register values from the vector table, is currently executed before
> > the ROM reset handler (rom_reset), and this causes the devices that
> > alias low memory region (e.g. STM32F405 that aliases the flash memory
> > located at 0x800 to 0x0) to load an invalid reset vector of 0 when
> > the kernel image is linked to be loaded at the high memory address.
>
> So we have armv7m_load_kernel -> load_elf_as -> rom_add_blob_fixed_as ->
> rom_add_blob -> rom_insert.
>
> arm_cpu_reset is called before rom_reset, rom_ptr is NULL, we call
> initial_pc = ldl_phys(cpu_as) from an empty flash.
>
> Then later rom_reset -> address_space_write_rom.
>
> I think Alistair and myself use the 'loader' device with Cortex-M boards
> and never hit this problem.

I do hit this problem, Peter described a workaround in the previous
version of this patch, that is to link at address 0 instead of the
alias address.

Alistair

>
> >
> > For instance, it is the norm for the STM32F405 firmware ELF image to have
> > the text and rodata sections linked at 0x800, as this facilitates
> > proper image loading by the firmware burning utility, and the processor
> > can execute in place from the high flash memory address region as well.
> >
> > In order to resolve this issue, this commit downgrades the ARMCPU reset
> > handler invocation priority level to -1 such that it is always executed
> > after the ROM reset handler, which has a priority level of 0.
> >
> > Signed-off-by: Stephanos Ioannidis 
> > ---
> >   hw/arm/armv7m.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
> > index 7531b97ccd..8b7c4b12a6 100644
> > --- a/hw/arm/armv7m.c
> > +++ b/hw/arm/armv7m.c
> > @@ -352,7 +352,8 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
> > *kernel_filename, int mem_size)
> >* way A-profile does it. Note that this means that every M profile
> >* board must call this function!
> >*/
> > -qemu_register_reset(armv7m_reset, cpu);
> > +qemu_register_reset_with_priority(
> > +QEMU_RESET_PRIORITY_LEVEL(-1), armv7m_reset, cpu);
> >   }
> >
> >   static Property bitband_properties[] = {
> >
>
>



RE: [EXTERNAL] Re: [PATCH] WHPX: Use QEMU values for trapped CPUID

2020-02-27 Thread Sunil Muthuswamy
> -Original Message-
> From: Eduardo Habkost 
> Sent: Thursday, February 27, 2020 1:10 PM
> To: Sunil Muthuswamy 
> Cc: Paolo Bonzini ; Richard Henderson 
> ; qemu-devel@nongnu.org; Stefan Weil
> 
> Subject: [EXTERNAL] Re: [PATCH] WHPX: Use QEMU values for trapped CPUID
> 
> On Thu, Feb 27, 2020 at 09:01:04PM +, Sunil Muthuswamy wrote:
> > Currently, WHPX is using some default values for the trapped CPUID
> > functions. These were not in sync with the QEMU values because the
> > CPUID values were never set with WHPX during VCPU initialization.
> > Additionally, at the moment, WHPX doesn't support setting CPUID
> > values in the hypervisor at runtime (i.e. after the partition has
> > been setup). That is needed to be able to set the CPUID values in
> > the hypervisor during VCPU init.
> > Until that support comes, use the QEMU values for the trapped CPUIDs.
> >
> > Signed-off-by: Sunil Muthuswamy 
> 
> I like the change, but I wonder if any if your users would still
> prefer to use the default result chosen by WHPX instead of the
> ones chosen by QEMU.
> 

Note that the current patch only applies to the trapped CPUIDs, which for
WHPX are currently only {1, 0x8001}. WHPX will still provide most
of the values.

> On the KVM side I have always wondered if we should have a mode
> where all CPUID leaves are the ones chosen by KVM, making no
> KVM_SET_CPUID calls.  It would be useful for experimentation and
> debugging of KVM/QEMU defaults.
> 
Agreed. I think such an option could be a useful debugging tool.

> 
> > ---
> >  target/i386/whpx-all.c | 42 ++
> >  1 file changed, 18 insertions(+), 24 deletions(-)
> >
> > diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
> > index 35601b8176..4fe5a78b29 100644
> > --- a/target/i386/whpx-all.c
> > +++ b/target/i386/whpx-all.c
> > @@ -980,38 +980,32 @@ static int whpx_vcpu_run(CPUState *cpu)
> >  WHV_REGISTER_VALUE reg_values[5];
> >  WHV_REGISTER_NAME reg_names[5];
> >  UINT32 reg_count = 5;
> > -UINT64 rip, rax, rcx, rdx, rbx;
> > +UINT64 cpuid_fn, rip = 0, rax = 0, rcx = 0, rdx = 0, rbx = 0;
> > +X86CPU *x86_cpu = X86_CPU(cpu);
> > +CPUX86State *env = &x86_cpu->env;
> >
> >  memset(reg_values, 0, sizeof(reg_values));
> >
> >  rip = vcpu->exit_ctx.VpContext.Rip +
> >vcpu->exit_ctx.VpContext.InstructionLength;
> > -switch (vcpu->exit_ctx.CpuidAccess.Rax) {
> > -case 1:
> > -rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> > -/* Advertise that we are running on a hypervisor */
> > -rcx =
> > -vcpu->exit_ctx.CpuidAccess.DefaultResultRcx |
> > -CPUID_EXT_HYPERVISOR;
> > -
> > -rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> > -rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> > -break;
> > +cpuid_fn = vcpu->exit_ctx.CpuidAccess.Rax;
> > +
> > +/*
> > + * Ideally, these should be supplied to the hypervisor during 
> > VCPU
> > + * initialization and it should be able to satisfy this 
> > request.
> > + * But, currently, WHPX doesn't support setting CPUID values 
> > in the
> > + * hypervisor once the partition has been setup, which is too 
> > late
> > + * since VCPUs are realized later. For now, use the values from
> > + * QEMU to satisfy these requests, until WHPX adds support for
> > + * being able to set these values in the hypervisor at runtime.
> > + */
> > +cpu_x86_cpuid(env, cpuid_fn, 0, (UINT32 *)&rax, (UINT32 *)&rbx,
> > +(UINT32 *)&rcx, (UINT32 *)&rdx);
> > +switch (cpuid_fn) {
> >  case 0x8001:
> > -rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> >  /* Remove any support of OSVW */
> > -rcx =
> > -vcpu->exit_ctx.CpuidAccess.DefaultResultRcx &
> > -~CPUID_EXT3_OSVW;
> > -
> > -rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> > -rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> > +rcx &= ~CPUID_EXT3_OSVW;
> >  break;
> > -default:
> > -rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> > -rcx = vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
> > -rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> > -rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> >  }
> >
> >  reg_names[0] = WHvX64RegisterRip;
> > --
> > 2.17.1
> >
> 
> --
> Eduardo




Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip

2020-02-27 Thread Auger Eric
Hi Peter,

On 2/27/20 8:19 PM, Peter Xu wrote:
> On Thu, Feb 27, 2020 at 07:22:08PM +0100, Auger Eric wrote:
>> Hi Peter,
>>
>> On 2/27/20 7:00 PM, Peter Xu wrote:
>>> On Thu, Feb 27, 2020 at 06:42:09PM +0100, Auger Eric wrote:
 Hi Peter,

 On 2/27/20 6:00 PM, Peter Xu wrote:
> This is majorly only for X86 because that's the only one that supports
> split irqchip for now.
>
> When the irqchip is split, we face a dilemma that KVM irqfd will be
> enabled, however the slow irqchip is still running in the userspace.
> It means that the resamplefd in the kernel irqfds won't take any
> effect and it can miss to ack INTx interrupts on EOIs.
 Won't it always fail to ack INTx? With the above sentence I understand
 it can work sometimes?
>>>
>>> I wanted to mean that it will fail.  How about s/can/will/?  Or even
>>> better wordings that you'd suggest?
>> yes: s/can/will
>>>
>
> One example is split irqchip with VFIO INTx, which will break if we
> use the VFIO INTx fast path.
>
> This patch can potentially supports the VFIO fast path again for INTx,
> that the IRQ delivery will still use the fast path, while we don't
> need to trap MMIOs in QEMU for the device to emulate the EIOs (see the
> callers of vfio_eoi() hook).  However the EOI of the INTx will still
> need to be done from the userspace by caching all the resamplefds in
> QEMU and kick properly for IOAPIC EOI broadcast.
 If I understand correctly this is a one way fast path? Fast path is on
 the trigger side only: VFIO -> KVM but not on the deactivation side,
 trapped by the userspace IOAPIC where you directly notify the UNMASK
 eventfd from userspace. Is that correct?
>>>
>>> Right, the injection is still using the whole fast path.  However
>>> AFAIU even for the EOI path it should still be faster than the pure
>>> slow path of vfio INTx EOI.  From what I got from reading the code,
>>> the slow path will conditionally unmap MMIO regions (with a timer to
>>> delay the recovery) so all MMIOs will be slowed down.  For what this
>>> patch is doing, it will need to exit to userspace for sure for each
>>> EOI (after all IOAPIC is in userspace), however for the whole
>>> lifecycle of the device, the MMIO regions should always be mapped so
>>> no unwanted MMIO traps.
>> Yes the EOI is trapped on IOAPIC side and not at the BAR level. So it
>> should be more efficient and more precise.
> 
> Yes.
> 
>>>
>
> When the userspace is responsible for the resamplefd kickup, don't
> register it on the kvm_irqfd anymore, because on newer kernels (after
> commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail if with both split
> irqchip and resamplefd.  This will make sure that the fast path will
> work for all supported kernels.
>
> https://patchwork.kernel.org/patch/10738541/#22609933
>
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Peter Xu 
> ---
> v1.1 changelog:
> - when resamplefd is going to be kicked from userspace, don't register
>   it again in KVM_IRQFD.  Tested against upstream kernel.
>
>  accel/kvm/kvm-all.c| 74 --
>  accel/kvm/trace-events |  1 +
>  hw/intc/ioapic.c   | 11 +--
>  include/sysemu/kvm.h   |  4 +++
>  4 files changed, 86 insertions(+), 4 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index d49b74512a..b766b6e93c 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -159,9 +159,62 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
>  static NotifierList kvm_irqchip_change_notifiers =
>  NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
>  
> +struct KVMResampleFd {
> +int gsi;
> +EventNotifier *resample_event;
> +QLIST_ENTRY(KVMResampleFd) node;
> +};
> +typedef struct KVMResampleFd KVMResampleFd;
> +
> +/*
> + * Only used with split irqchip where we need to do the resample fd
> + * kick for the kernel from userspace.
> + */
> +static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
> +QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
> +
>  #define kvm_slots_lock(kml)  qemu_mutex_lock(&(kml)->slots_lock)
>  #define kvm_slots_unlock(kml)  qemu_mutex_unlock(&(kml)->slots_lock)
>  
> +static inline void kvm_resample_fd_remove(int gsi)
> +{
> +KVMResampleFd *rfd;
> +
> +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> +if (rfd->gsi == gsi) {
> +QLIST_REMOVE(rfd, node);
> +break;
> +}
> +}
> +}
> +
> +static inline void kvm_resample_fd_insert(int gsi, EventNotifier *event)
> +{
> +KVMResampleFd *rfd = g_new0(KVMResampleFd, 1);
> +
> +rfd->gsi = gsi;
> +rfd->resample_event = event;
> +
>>

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-27 Thread Stefan Ring
On Tue, Feb 25, 2020 at 3:12 PM Stefan Ring  wrote:
>
> I find many instances with the following pattern:
>
> current file length (= max position + size written): p
> write request n writes from (p + hole_size), thus leaving a hole
> request n+1 writes exactly hole_size, starting from p, thus completely
> filling the hole
> The two requests' in-flight times overlap.
> hole_size can be almost any value (7-127).

Victory! I have a reproducer in the form of a plain C libgfapi client.

However, I have not been able to trigger corruption by just executing
the simple pattern in an artificial way. Currently, I need to feed my
reproducer 2 GB of data that I streamed out of the qemu block driver.
I get two possible end states out of my reproducer: The correct one or
a corrupted one, where 48 KB are zeroed out. It takes no more than 10
runs to get each of them at least once. The corrupted end state is
exactly the same one I got from the real qemu process from which I
obtained the streamed trace. This gives me a lot of confidence in the
soundness of my reproducer.

More details will follow.



Re: [PATCH] WHPX: Use QEMU values for trapped CPUID

2020-02-27 Thread Eduardo Habkost
On Thu, Feb 27, 2020 at 09:01:04PM +, Sunil Muthuswamy wrote:
> Currently, WHPX is using some default values for the trapped CPUID
> functions. These were not in sync with the QEMU values because the
> CPUID values were never set with WHPX during VCPU initialization.
> Additionally, at the moment, WHPX doesn't support setting CPUID
> values in the hypervisor at runtime (i.e. after the partition has
> been setup). That is needed to be able to set the CPUID values in
> the hypervisor during VCPU init.
> Until that support comes, use the QEMU values for the trapped CPUIDs.
> 
> Signed-off-by: Sunil Muthuswamy 

I like the change, but I wonder if any of your users would still
prefer to use the default results chosen by WHPX instead of the
ones chosen by QEMU.

On the KVM side I have always wondered if we should have a mode
where all CPUID leaves are the ones chosen by KVM, making no
KVM_SET_CPUID calls.  It would be useful for experimentation and
debugging of KVM/QEMU defaults.


> ---
>  target/i386/whpx-all.c | 42 ++
>  1 file changed, 18 insertions(+), 24 deletions(-)
> 
> diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
> index 35601b8176..4fe5a78b29 100644
> --- a/target/i386/whpx-all.c
> +++ b/target/i386/whpx-all.c
> @@ -980,38 +980,32 @@ static int whpx_vcpu_run(CPUState *cpu)
>  WHV_REGISTER_VALUE reg_values[5];
>  WHV_REGISTER_NAME reg_names[5];
>  UINT32 reg_count = 5;
> -UINT64 rip, rax, rcx, rdx, rbx;
> +UINT64 cpuid_fn, rip = 0, rax = 0, rcx = 0, rdx = 0, rbx = 0;
> +X86CPU *x86_cpu = X86_CPU(cpu);
> +CPUX86State *env = &x86_cpu->env;
>  
>  memset(reg_values, 0, sizeof(reg_values));
>  
>  rip = vcpu->exit_ctx.VpContext.Rip +
>vcpu->exit_ctx.VpContext.InstructionLength;
> -switch (vcpu->exit_ctx.CpuidAccess.Rax) {
> -case 1:
> -rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> -/* Advertise that we are running on a hypervisor */
> -rcx =
> -vcpu->exit_ctx.CpuidAccess.DefaultResultRcx |
> -CPUID_EXT_HYPERVISOR;
> -
> -rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> -rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> -break;
> +cpuid_fn = vcpu->exit_ctx.CpuidAccess.Rax;
> +
> +/*
> + * Ideally, these should be supplied to the hypervisor during VCPU
> + * initialization and it should be able to satisfy this request.
> + * But, currently, WHPX doesn't support setting CPUID values in the
> + * hypervisor once the partition has been setup, which is too late
> + * since VCPUs are realized later. For now, use the values from
> + * QEMU to satisfy these requests, until WHPX adds support for
> + * being able to set these values in the hypervisor at runtime.
> + */
> +cpu_x86_cpuid(env, cpuid_fn, 0, (UINT32 *)&rax, (UINT32 *)&rbx,
> +(UINT32 *)&rcx, (UINT32 *)&rdx);
> +switch (cpuid_fn) {
>  case 0x80000001:
> -rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
>  /* Remove any support of OSVW */
> -rcx =
> -vcpu->exit_ctx.CpuidAccess.DefaultResultRcx &
> -~CPUID_EXT3_OSVW;
> -
> -rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> -rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> +rcx &= ~CPUID_EXT3_OSVW;
>  break;
> -default:
> -rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> -rcx = vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
> -rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> -rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
>  }
>  
>  reg_names[0] = WHvX64RegisterRip;
> -- 
> 2.17.1
> 

-- 
Eduardo




[PATCH] WHPX: Use QEMU values for trapped CPUID

2020-02-27 Thread Sunil Muthuswamy
Currently, WHPX is using some default values for the trapped CPUID
functions. These were not in sync with the QEMU values because the
CPUID values were never set with WHPX during VCPU initialization.
Additionally, at the moment, WHPX doesn't support setting CPUID
values in the hypervisor at runtime (i.e. after the partition has
been setup). That is needed to be able to set the CPUID values in
the hypervisor during VCPU init.
Until that support comes, use the QEMU values for the trapped CPUIDs.

Signed-off-by: Sunil Muthuswamy 
---
 target/i386/whpx-all.c | 42 ++
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
index 35601b8176..4fe5a78b29 100644
--- a/target/i386/whpx-all.c
+++ b/target/i386/whpx-all.c
@@ -980,38 +980,32 @@ static int whpx_vcpu_run(CPUState *cpu)
 WHV_REGISTER_VALUE reg_values[5];
 WHV_REGISTER_NAME reg_names[5];
 UINT32 reg_count = 5;
-UINT64 rip, rax, rcx, rdx, rbx;
+UINT64 cpuid_fn, rip = 0, rax = 0, rcx = 0, rdx = 0, rbx = 0;
+X86CPU *x86_cpu = X86_CPU(cpu);
+CPUX86State *env = &x86_cpu->env;
 
 memset(reg_values, 0, sizeof(reg_values));
 
 rip = vcpu->exit_ctx.VpContext.Rip +
   vcpu->exit_ctx.VpContext.InstructionLength;
-switch (vcpu->exit_ctx.CpuidAccess.Rax) {
-case 1:
-rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
-/* Advertise that we are running on a hypervisor */
-rcx =
-vcpu->exit_ctx.CpuidAccess.DefaultResultRcx |
-CPUID_EXT_HYPERVISOR;
-
-rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
-rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
-break;
+cpuid_fn = vcpu->exit_ctx.CpuidAccess.Rax;
+
+/*
+ * Ideally, these should be supplied to the hypervisor during VCPU
+ * initialization and it should be able to satisfy this request.
+ * But, currently, WHPX doesn't support setting CPUID values in the
+ * hypervisor once the partition has been setup, which is too late
+ * since VCPUs are realized later. For now, use the values from
+ * QEMU to satisfy these requests, until WHPX adds support for
+ * being able to set these values in the hypervisor at runtime.
+ */
+cpu_x86_cpuid(env, cpuid_fn, 0, (UINT32 *)&rax, (UINT32 *)&rbx,
+(UINT32 *)&rcx, (UINT32 *)&rdx);
+switch (cpuid_fn) {
 case 0x80000001:
-rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
 /* Remove any support of OSVW */
-rcx =
-vcpu->exit_ctx.CpuidAccess.DefaultResultRcx &
-~CPUID_EXT3_OSVW;
-
-rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
-rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
+rcx &= ~CPUID_EXT3_OSVW;
 break;
-default:
-rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
-rcx = vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
-rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
-rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
 }
 
 reg_names[0] = WHvX64RegisterRip;
-- 
2.17.1



Re: [PATCH v2 3/3] savevm: check RAM is pagesize aligned

2020-02-27 Thread Aleksandar Markovic
On Thursday, February 27, 2020, Juan Quintela  wrote:

> Marc-André Lureau  wrote:
> > Hi Juan
> >
> > On Wed, Jan 8, 2020 at 2:08 PM Juan Quintela 
> wrote:
> >>
> >> Marc-André Lureau  wrote:
> >> n> Check the host pointer is correctly aligned, otherwise we may fail
> >> > during migration in ram_block_discard_range().
> >> >
> >> > Signed-off-by: Marc-André Lureau 
> >>
> >> Reviewed-by: Juan Quintela 
> >>
> >> queued
> >>
> >
> > Did it get lost? thanks
>
> I dropped it in the past, because it made "make check" for mips fail.
> (I put it on my ToDo list to investigate and forgot about it)
>
>
Thank you for caring for mips.

Do you perhaps remember what the test and environment for the failing
test were?

Regards,
Aleksandar


> But now it pass, go figure.
>
> Included again.  Sorry.
>
> Later, Juan.
>
>
>


Re: [PATCH v6 7/8] multifd: Add multifd-zstd-level parameter

2020-02-27 Thread Peter Xu
On Thu, Feb 13, 2020 at 10:17:08PM +0100, Juan Quintela wrote:
> This parameter specifies the zstd compression level. The next patch
> will put it to use.
> 
> Signed-off-by: Juan Quintela 
> Acked-by: Markus Armbruster 

(I didn't look at the rest of patches, but this single patch looks
 sane to me...)

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH V2] vhost: correctly turn on VIRTIO_F_IOMMU_PLATFORM

2020-02-27 Thread Tom Lendacky
On 2/27/20 7:02 AM, Halil Pasic wrote:
> On Wed, 26 Feb 2020 11:52:26 -0500
> "Michael S. Tsirkin"  wrote:
> 
>> On Wed, Feb 26, 2020 at 04:36:18PM +0100, Halil Pasic wrote:
>>> On Wed, 26 Feb 2020 08:37:13 -0500
>>> "Michael S. Tsirkin"  wrote:
>>>
 On Wed, Feb 26, 2020 at 02:28:39PM +0100, Halil Pasic wrote:
> On Wed, 26 Feb 2020 17:43:57 +0800
> Jason Wang  wrote:
>
>> We turn on device IOTLB via VIRTIO_F_IOMMU_PLATFORM unconditionally on
>> platforms without IOMMU support. This can lead to unnecessary IOTLB
>> transactions, which will damage performance.
>>
>> Fix this by checking whether the device is backed by an IOMMU and
>> disabling device IOTLB otherwise.
>>
>> Reported-by: Halil Pasic 
>> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Jason Wang 
>
> Tested-by: Halil Pasic 
> Reviewed-by: Halil Pasic 
>
> Thank you very much for fixing this! BTW as I mentioned before it
> fixes vhost-vsock with iommu_platform=on as well.

 Fixes as in improves performance?
>>>
>>> No, fixes like one does not get something like:
>>> qemu-system-s390x: vhost_set_features failed: Operation not supported (95)
>>> qemu-system-s390x: Error starting vhost: 95
>>> any more.
>>>
>>> Regards,
>>> Halil
>>>
>>> [..]
>>
>> But can commit c471ad0e9bd46 actually boot a secure guest
>> where iommu_platform=on is required?
>>
> 
> No, of course it can not. But I'm not sure about AMD SEV. AFAIU without
> Jason's patch it does not work for AMD SEV. Tom already stated that with
> SEV they don't need the IOVA translation aspect of ACCESS_PLATFORM, but
> I have no idea if the condition vdev->dma_as == &address_space_memory
> catches them as well or not. They probably have !=.
> 
> CCing Tom. @Tom does vhost-vsock work for you with SEV and current qemu?

Adding Brijesh for this, too.

> 
> Also, one can specify iommu_platform=on on a device that ain't a part of
> a secure-capable VM, just for the fun of it. And that breaks
> vhost-vsock. Or is setting iommu_platform=on only valid if
> qemu-system-s390x is protected virtualization capable?
> 
> BTW, I don't have a strong opinion on the fixes tag. We currently do not
> recommend setting iommu_platform, and thus I don't think we care too
> much about past qemus having problems with it.
> 
> Regards,
> Halil
> 



Re: [PATCH v5 2/4] target/riscv: implementation-defined constant parameters

2020-02-27 Thread Richard Henderson
On 2/21/20 1:45 AM, LIU Zhiwei wrote:
> vlen is the vector register length in bits.
> elen is the max element size in bits.
> vext_spec is the vector specification version, default value is v0.7.1.
> 
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/cpu.c | 7 +++
>  target/riscv/cpu.h | 5 +
>  2 files changed, 12 insertions(+)

Reviewed-by: Richard Henderson 


r~



Re: [PATCH v5 1/4] target/riscv: add vector extension field in CPURISCVState

2020-02-27 Thread Richard Henderson
On 2/21/20 1:45 AM, LIU Zhiwei wrote:
> The 32 vector registers will be viewed as a continuous memory block.
> It avoids the convension between element index and (regno, offset).
> Thus elements can be directly accessed by offset from the first vector
> base address.
> 
> Signed-off-by: LIU Zhiwei 
> ---
>  target/riscv/cpu.h | 12 
>  1 file changed, 12 insertions(+)

Reviewed-by: Richard Henderson 


r~



Re: [PATCH v2] qapi/machine: Place the 'Notes' tag after the 'Since' tag

2020-02-27 Thread Eduardo Habkost
On Thu, Feb 27, 2020 at 04:21:56PM +0100, Philippe Mathieu-Daudé wrote:
> On 2/27/20 3:55 PM, Philippe Mathieu-Daudé wrote:
> > On 2/27/20 3:52 PM, Markus Armbruster wrote:
> > > Philippe Mathieu-Daudé  writes:
> > > 
> > > > This fixes when adding a 'Since' tag:
> > > > 
> > > >    In file included from qapi/qapi-schema.json:105:
> > > >    qapi/machine.json:25:1: '@arch:' can't follow 'Notes' section
> > > 
> > > I'm confused.  This error is detected in scripts/qapi/parser.py, and it
> > > is fatal.  Is the build broken for you?  It isn't for me.  Moreover,
> > > where is @arch?  I can't see it anywhere close to the two spots the
> > > patch patches.
> > 
> > I get the error after trying to fix what Eric commented here:
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg682344.html
> 
> Using:
> ---
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 6c11e3cf3a..40a36d6276 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -20,13 +20,15 @@
>  #prefix to produce the corresponding QEMU executable name. This
>  #is true even for "qemu-system-x86_64".
>  #
> +# @rx: since 5.0
> +#
>  # Since: 3.0
>  ##
>  { 'enum' : 'SysEmuTarget',
>'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
>   'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
>   'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
> - 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
> + 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
>   'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
>   'x86_64', 'xtensa', 'xtensaeb' ] }
> ---
> 
> or
> 
> ---
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 6c11e3cf3a..4b59e87b6f 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -21,12 +21,14 @@
>  #is true even for "qemu-system-x86_64".
>  #
>  # Since: 3.0
> +#
> +# @rx: since 5.0
>  ##
>  { 'enum' : 'SysEmuTarget',
>'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
>   'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
>   'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
> - 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
> + 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
>   'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
>   'x86_64', 'xtensa', 'xtensaeb' ] }
> ---
> 
> I get:
> 
>   GEN qapi-gen
>   GEN rx-softmmu/config-devices.mak
> In file included from qapi/qapi-schema.json:105:
> qapi/machine.json:23:1: '@rx:' can't follow 'Notes' section
> make: *** [Makefile:645: qapi-gen-timestamp] Error 1
> 
> This works however:
> 
> ---
>  ##
>  # @SysEmuTarget:
>  #
>  # The comprehensive enumeration of QEMU system emulation ("softmmu")
>  # targets. Run "./configure --help" in the project root directory, and
>  # look for the *-softmmu targets near the "--target-list" option. The
>  # individual target constants are not documented here, for the time
>  # being.
>  #
> +# @rx: since 5.0
> +#
>  # Notes: The resulting QMP strings can be appended to the "qemu-system-"
>  #prefix to produce the corresponding QEMU executable name. This
>  #is true even for "qemu-system-x86_64".
>  #
>  # Since: 3.0
>  ##
>  { 'enum' : 'SysEmuTarget',
>'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
>   'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
>   'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
> - 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
> + 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
>   'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
>   'x86_64', 'xtensa', 'xtensaeb' ] }

If this works, what exactly is the problem this patch is trying to fix?


-- 
Eduardo




Re: ping Re: [PATCH for-5.0 v2 0/3] benchmark util

2020-02-27 Thread Eduardo Habkost
Sorry, this is due to lack of bandwidth of maintainers who can
review those patches.

I have one suggestion: if you make your script self-contained
inside a scripts/ subdirectory, it would be simpler to merge it
without detailed reviews from others.

The python/ subdirectory is supposed to appear on sys.path, so
maybe simplebench.py and qemu/bench_block_job.py can stay there,
but bench-example.py is not a loadable Python module and
shouldn't be there.

I see two possible options:

a) Moving everything to a scripts/simplebench subdirectory.
b) Moving only bench-example.py to scripts/, and doing the sys.path
   hacking the other scripts do.

In either case, please add your name to MAINTAINERS as the
maintainer of those new files.


On Thu, Feb 27, 2020 at 04:18:00PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> Is problem in "S: Odd fixes" in Python section of MAINTAINERS?
> 
> Will it be correct, if I send a patch to MAINTAINERS, proposing
> myself as maintainer of Python scripts and s/Odd fixes/Maintained/ ?
> 
> And then just send pull request with this series, as "nobody minds"?
> 
> 08.02.2020 13:36, Vladimir Sementsov-Ogievskiy wrote:
> > pingg..
> > 
> > Hi! Could it be merged at all?
> > 
> > 20.01.2020 12:10, Vladimir Sementsov-Ogievskiy wrote:
> > > ping
> > > 
> > > 26.11.2019 18:48, Vladimir Sementsov-Ogievskiy wrote:
> > > > Hi all!
> > > > 
> > > > Here is simple benchmarking utility, to generate performance
> > > > comparison tables, like the following:
> > > > 
> > > > ----------  -------------  -------------  -------------
> > > >  backup-1   backup-2   mirror
> > > > ssd -> ssd  0.43 +- 0.00   4.48 +- 0.06   4.38 +- 0.02
> > > > ssd -> hdd  10.60 +- 0.08  10.69 +- 0.18  10.57 +- 0.05
> > > > ssd -> nbd  33.81 +- 0.37  10.67 +- 0.17  10.07 +- 0.07
> > > > ----------  -------------  -------------  -------------
> > > > 
> > > > This is a v2, as v1 was inside
> > > >   "[RFC 00/24] backup performance: block_status + async"
> > > > 
> > > > I'll use this benchmark in other series, hope someone
> > > > will like it.
> > > > 
> > > > Vladimir Sementsov-Ogievskiy (3):
> > > >    python: add simplebench.py
> > > >    python: add qemu/bench_block_job.py
> > > >    python: add example usage of simplebench
> > > > 
> > > >   python/bench-example.py    |  80 +
> > > >   python/qemu/bench_block_job.py | 115 +
> > > >   python/simplebench.py  | 128 +
> > > >   3 files changed, 323 insertions(+)
> > > >   create mode 100644 python/bench-example.py
> > > >   create mode 100755 python/qemu/bench_block_job.py
> > > >   create mode 100644 python/simplebench.py
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eduardo




Re: [PATCH v4 4/5] target/riscv: add fault-only-first unit stride load

2020-02-27 Thread Richard Henderson
On 2/25/20 2:35 AM, LIU Zhiwei wrote:
> +GEN_VEXT_LD_ELEM(vlbff_v_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(vlbff_v_h, int8_t,  int16_t, H2, ldsb)
> +GEN_VEXT_LD_ELEM(vlbff_v_w, int8_t,  int32_t, H4, ldsb)
> +GEN_VEXT_LD_ELEM(vlbff_v_d, int8_t,  int64_t, H8, ldsb)
> +GEN_VEXT_LD_ELEM(vlhff_v_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(vlhff_v_w, int16_t, int32_t, H4, ldsw)
> +GEN_VEXT_LD_ELEM(vlhff_v_d, int16_t, int64_t, H8, ldsw)
> +GEN_VEXT_LD_ELEM(vlwff_v_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(vlwff_v_d, int32_t, int64_t, H8, ldl)
> +GEN_VEXT_LD_ELEM(vleff_v_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(vleff_v_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(vleff_v_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(vleff_v_d, int64_t, int64_t, H8, ldq)
> +GEN_VEXT_LD_ELEM(vlbuff_v_b, uint8_t,  uint8_t,  H1, ldub)
> +GEN_VEXT_LD_ELEM(vlbuff_v_h, uint8_t,  uint16_t, H2, ldub)
> +GEN_VEXT_LD_ELEM(vlbuff_v_w, uint8_t,  uint32_t, H4, ldub)
> +GEN_VEXT_LD_ELEM(vlbuff_v_d, uint8_t,  uint64_t, H8, ldub)
> +GEN_VEXT_LD_ELEM(vlhuff_v_h, uint16_t, uint16_t, H2, lduw)
> +GEN_VEXT_LD_ELEM(vlhuff_v_w, uint16_t, uint32_t, H4, lduw)
> +GEN_VEXT_LD_ELEM(vlhuff_v_d, uint16_t, uint64_t, H8, lduw)
> +GEN_VEXT_LD_ELEM(vlwuff_v_w, uint32_t, uint32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(vlwuff_v_d, uint32_t, uint64_t, H8, ldl)

We definitely should not have a 3rd copy of these.


> +if (i == 0) {
> +probe_read_access(env, addr, nf * msz, ra);
> +} else {
> +/* if it triggles an exception, no need to check watchpoint */

triggers.

> +offset = -(addr | TARGET_PAGE_MASK);
> +remain = nf * msz;
> +while (remain > 0) {
> +host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmuidx);
> +if (host) {
> +#ifdef CONFIG_USER_ONLY
> +if (page_check_range(addr, nf * msz, PAGE_READ) < 0) {
> +vl = i;
> +goto ProbeSuccess;
> +}
> +#else
> +probe_read_access(env, addr, nf * msz, ra);
> +#endif

Good job finding all of the corner cases.  I should invent a new cputlb
function that handles this better.  For now, this is the best we can do.


r~



Re: [PULL v2 00/30] virtio, pc: fixes, features

2020-02-27 Thread Peter Maydell
On Thu, 27 Feb 2020 at 08:55, Michael S. Tsirkin  wrote:
>
> On Wed, Feb 26, 2020 at 04:01:02AM -0500, Michael S. Tsirkin wrote:
> > changes from v1:
> > dropped vhost changes, hope this fixes build on Mac OS.
> >
> > The following changes since commit 9a8abceb5f01d1066d3a1ac5a33aabcbaeec1860:
> >
> >   Merge remote-tracking branch 'remotes/pmaydell/tags/pull-docs-20200225' 
> > into staging (2020-02-25 11:03:47 +)
> >
> > are available in the Git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> >
> > for you to fetch changes up to deec824070e408b936e02883a1e2cb5af92448d0:
>
> I updated one of the commit logs to include CC stable, so the new
> hash is: b844a4c77b618acfba6b3f4ce12d2ad709f99279


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



Re: [PATCH v4 3/5] target/riscv: add vector index load and store instructions

2020-02-27 Thread Richard Henderson
On 2/25/20 2:35 AM, LIU Zhiwei wrote:
> +vsxb_v ... 011 . . . 000 . 0100111 @r_nfvm
> +vsxh_v ... 011 . . . 101 . 0100111 @r_nfvm
> +vsxw_v ... 011 . . . 110 . 0100111 @r_nfvm
> +vsxe_v ... 011 . . . 111 . 0100111 @r_nfvm
> +vsuxb_v... 111 . . . 000 . 0100111 @r_nfvm
> +vsuxh_v... 111 . . . 101 . 0100111 @r_nfvm
> +vsuxw_v... 111 . . . 110 . 0100111 @r_nfvm
> +vsuxe_v... 111 . . . 111 . 0100111 @r_nfvm

These can be merged, with a comment, like

# Vector ordered-indexed and unordered-indexed store insns.
vsxb_v ... -11 . . . 000 . 0100111 @r_nfvm

which means you don't need these:

> +static bool trans_vsuxb_v(DisasContext *s, arg_rnfvm* a)
> +{
> +return trans_vsxb_v(s, a);
> +}
> +
> +static bool trans_vsuxh_v(DisasContext *s, arg_rnfvm* a)
> +{
> +return trans_vsxh_v(s, a);
> +}
> +
> +static bool trans_vsuxw_v(DisasContext *s, arg_rnfvm* a)
> +{
> +return trans_vsxw_v(s, a);
> +}
> +
> +static bool trans_vsuxe_v(DisasContext *s, arg_rnfvm* a)
> +{
> +return trans_vsxe_v(s, a);
> +}

> +static inline void vext_ld_index(void *vd, void *v0, target_ulong base,
> +void *vs2, CPURISCVState *env, uint32_t desc,
> +vext_get_index_addr get_index_addr,
> +vext_ld_elem_fn ld_elem,
> +vext_ld_clear_elem clear_elem,
> +uint32_t esz, uint32_t msz, uintptr_t ra)

Similar comment about merging vext_ld_index and vext_st_index.


r~



Re: [PATCH v4 2/5] target/riscv: add vector stride load and store instructions

2020-02-27 Thread Richard Henderson
On 2/25/20 2:35 AM, LIU Zhiwei wrote:
> +GEN_VEXT_LD_ELEM(vlsb_v_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(vlsb_v_h, int8_t,  int16_t, H2, ldsb)
> +GEN_VEXT_LD_ELEM(vlsb_v_w, int8_t,  int32_t, H4, ldsb)
> +GEN_VEXT_LD_ELEM(vlsb_v_d, int8_t,  int64_t, H8, ldsb)
> +GEN_VEXT_LD_ELEM(vlsh_v_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(vlsh_v_w, int16_t, int32_t, H4, ldsw)
> +GEN_VEXT_LD_ELEM(vlsh_v_d, int16_t, int64_t, H8, ldsw)
> +GEN_VEXT_LD_ELEM(vlsw_v_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(vlsw_v_d, int32_t, int64_t, H8, ldl)
> +GEN_VEXT_LD_ELEM(vlse_v_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(vlse_v_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(vlse_v_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(vlse_v_d, int64_t, int64_t, H8, ldq)
> +GEN_VEXT_LD_ELEM(vlsbu_v_b, uint8_t,  uint8_t,  H1, ldub)
> +GEN_VEXT_LD_ELEM(vlsbu_v_h, uint8_t,  uint16_t, H2, ldub)
> +GEN_VEXT_LD_ELEM(vlsbu_v_w, uint8_t,  uint32_t, H4, ldub)
> +GEN_VEXT_LD_ELEM(vlsbu_v_d, uint8_t,  uint64_t, H8, ldub)
> +GEN_VEXT_LD_ELEM(vlshu_v_h, uint16_t, uint16_t, H2, lduw)
> +GEN_VEXT_LD_ELEM(vlshu_v_w, uint16_t, uint32_t, H4, lduw)
> +GEN_VEXT_LD_ELEM(vlshu_v_d, uint16_t, uint64_t, H8, lduw)
> +GEN_VEXT_LD_ELEM(vlswu_v_w, uint32_t, uint32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(vlswu_v_d, uint32_t, uint64_t, H8, ldl)

Why do you need to define new functions identical to the old ones?  Are you
doing this just to make the names match up?


> +GEN_VEXT_ST_ELEM(vssb_v_b, int8_t,  H1, stb)
> +GEN_VEXT_ST_ELEM(vssb_v_h, int16_t, H2, stb)
> +GEN_VEXT_ST_ELEM(vssb_v_w, int32_t, H4, stb)
> +GEN_VEXT_ST_ELEM(vssb_v_d, int64_t, H8, stb)
> +GEN_VEXT_ST_ELEM(vssh_v_h, int16_t, H2, stw)
> +GEN_VEXT_ST_ELEM(vssh_v_w, int32_t, H4, stw)
> +GEN_VEXT_ST_ELEM(vssh_v_d, int64_t, H8, stw)
> +GEN_VEXT_ST_ELEM(vssw_v_w, int32_t, H4, stl)
> +GEN_VEXT_ST_ELEM(vssw_v_d, int64_t, H8, stl)
> +GEN_VEXT_ST_ELEM(vsse_v_b, int8_t,  H1, stb)
> +GEN_VEXT_ST_ELEM(vsse_v_h, int16_t, H2, stw)
> +GEN_VEXT_ST_ELEM(vsse_v_w, int32_t, H4, stl)
> +GEN_VEXT_ST_ELEM(vsse_v_d, int64_t, H8, stq)

Likewise.

> +static void vext_st_stride(void *vd, void *v0, target_ulong base,
> +target_ulong stride, CPURISCVState *env, uint32_t desc,
> +vext_st_elem_fn st_elem, uint32_t esz, uint32_t msz, uintptr_t ra)
> +{
> +uint32_t i, k;
> +uint32_t nf = vext_nf(desc);
> +uint32_t vm = vext_vm(desc);
> +uint32_t mlen = vext_mlen(desc);
> +uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +/* probe every access*/
> +for (i = 0; i < env->vl; i++) {
> +if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +continue;
> +}
> +probe_write_access(env, base + stride * i, nf * msz, ra);
> +}
> +/* store bytes to guest memory */
> +for (i = 0; i < env->vl; i++) {
> +k = 0;
> +if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +continue;
> +}
> +while (k < nf) {
> +target_ulong addr = base + stride * i + k * msz;
> +st_elem(env, addr, i + k * vlmax, vd, ra);
> +k++;
> +}
> +}
> +}

Similar comments wrt unifying the load and store helpers.

I'll also note that vext_st_stride and vext_st_us_mask could be unified by
passing sizeof(ETYPE) as stride, and vm = true as a parameter.


r~



Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip

2020-02-27 Thread Peter Xu
On Thu, Feb 27, 2020 at 07:22:08PM +0100, Auger Eric wrote:
> Hi Peter,
> 
> On 2/27/20 7:00 PM, Peter Xu wrote:
> > On Thu, Feb 27, 2020 at 06:42:09PM +0100, Auger Eric wrote:
> >> Hi Peter,
> >>
> >> On 2/27/20 6:00 PM, Peter Xu wrote:
> >>> This is majorly only for X86 because that's the only one that supports
> >>> split irqchip for now.
> >>>
> >>> When the irqchip is split, we face a dilemma that KVM irqfd will be
> >>> enabled, however the slow irqchip is still running in the userspace.
> >>> It means that the resamplefd in the kernel irqfds won't take any
> >>> effect and it can miss to ack INTx interrupts on EOIs.
> >> Won't it always fail to ack INTx? With the above sentence I understand
> >> it can work sometimes?
> > 
> > I wanted to mean that it will fail.  How about s/can/will/?  Or even
> > better wordings that you'd suggest?
> yes: s/can/will
> > 
> >>>
> >>> One example is split irqchip with VFIO INTx, which will break if we
> >>> use the VFIO INTx fast path.
> >>>
> >>> This patch can potentially supports the VFIO fast path again for INTx,
> >>> that the IRQ delivery will still use the fast path, while we don't
> >>> need to trap MMIOs in QEMU for the device to emulate the EOIs (see the
> >>> callers of vfio_eoi() hook).  However the EOI of the INTx will still
> >>> need to be done from the userspace by caching all the resamplefds in
> >>> QEMU and kick properly for IOAPIC EOI broadcast.
> >> If I understand correctly this is a one way fast path? Fast path is on
> >> the trigger side only: VFIO -> KVM but not on the deactivation side,
> >> trapped by the userspace IOAPIC where you directly notify the UNMASK
> >> eventfd from userspace. Is that correct?
> > 
> > Right, the injection is still using the whole fast path.  However
> > AFAIU even for the EOI path it should still be faster than the pure
> > slow path of vfio INTx EOI.  From what I got from reading the code,
> > the slow path will conditionally unmap MMIO regions (with a timer to
> > delay the recovery) so all MMIOs will be slowed down.  For what this
> > patch is doing, it will need to exit to userspace for sure for each
> > EOI (after all IOAPIC is in userspace), however for the whole
> > lifecycle of the device, the MMIO regions should always be mapped so
> > no unwanted MMIO traps.
> Yes the EOI is trapped on IOAPIC side and not at the BAR level. So it
> should be more efficient and more precise.

Yes.

> > 
> >>>
> >>> When the userspace is responsible for the resamplefd kickup, don't
> >>> register it on the kvm_irqfd anymore, because on newer kernels (after
> >>> commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail if with both split
> >>> irqchip and resamplefd.  This will make sure that the fast path will
> >>> work for all supported kernels.
> >>>
> >>> https://patchwork.kernel.org/patch/10738541/#22609933
> >>>
> >>> Suggested-by: Paolo Bonzini 
> >>> Signed-off-by: Peter Xu 
> >>> ---
> >>> v1.1 changelog:
> >>> - when resamplefd is going to be kicked from userspace, don't register
> >>>   it again in KVM_IRQFD.  Tested against upstream kernel.
> >>>
> >>>  accel/kvm/kvm-all.c| 74 --
> >>>  accel/kvm/trace-events |  1 +
> >>>  hw/intc/ioapic.c   | 11 +--
> >>>  include/sysemu/kvm.h   |  4 +++
> >>>  4 files changed, 86 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> >>> index d49b74512a..b766b6e93c 100644
> >>> --- a/accel/kvm/kvm-all.c
> >>> +++ b/accel/kvm/kvm-all.c
> >>> @@ -159,9 +159,62 @@ static const KVMCapabilityInfo 
> >>> kvm_required_capabilites[] = {
> >>>  static NotifierList kvm_irqchip_change_notifiers =
> >>>  NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
> >>>  
> >>> +struct KVMResampleFd {
> >>> +int gsi;
> >>> +EventNotifier *resample_event;
> >>> +QLIST_ENTRY(KVMResampleFd) node;
> >>> +};
> >>> +typedef struct KVMResampleFd KVMResampleFd;
> >>> +
> >>> +/*
> >>> + * Only used with split irqchip where we need to do the resample fd
> >>> + * kick for the kernel from userspace.
> >>> + */
> >>> +static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
> >>> +QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
> >>> +
> >>>  #define kvm_slots_lock(kml)  qemu_mutex_lock(&(kml)->slots_lock)
> >>>  #define kvm_slots_unlock(kml)  qemu_mutex_unlock(&(kml)->slots_lock)
> >>>  
> >>> +static inline void kvm_resample_fd_remove(int gsi)
> >>> +{
> >>> +KVMResampleFd *rfd;
> >>> +
> >>> +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> >>> +if (rfd->gsi == gsi) {
> >>> +QLIST_REMOVE(rfd, node);
> >>> +break;
> >>> +}
> >>> +}
> >>> +}
> >>> +
> >>> +static inline void kvm_resample_fd_insert(int gsi, EventNotifier *event)
> >>> +{
> >>> +KVMResampleFd *rfd = g_new0(KVMResampleFd, 1);
> >>> +
> >>> +rfd->gsi = gsi;
> >>> +rfd->resample_event = event;
> >>> +
> >>> +QLIST_INSERT_HEAD(&kvm_resample_fd_list, rfd, node);
> >>> +}

Re: [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts

2020-02-27 Thread Niek Linnenbank
On Wed, Feb 26, 2020 at 7:12 PM Alex Bennée  wrote:

> There is no particular reason to use a static codegen buffer on 64 bit
> hosts as we have address space to burn. Allow the common CONFIG_USER
> case to use the mmap'ed buffers like SoftMMU.
>
> Signed-off-by: Alex Bennée 
> ---
>  accel/tcg/translate-all.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5b66af783b5..4ce5d1b3931 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -892,11 +892,12 @@ static void page_lock_pair(PageDesc **ret_p1,
> tb_page_addr_t phys1,
>  }
>  }
>
> -#if defined(CONFIG_USER_ONLY)
> -/* Currently it is not recommended to allocate big chunks of data in
> -   user mode. It will change when a dedicated libc will be used.  */
> -/* ??? 64-bit hosts ought to have no problem mmaping data outside the
> -   region in which the guest needs to run.  Revisit this.  */
> +#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
> +/*
> + * For user mode on smaller 32 bit systems we may run into trouble
> + * allocating big chunks of data in the right place. On these systems
> + * we utilise a static code generation buffer directly in the binary.
> + */
>  #define USE_STATIC_CODE_GEN_BUFFER
>  #endif
>
> --
> 2.20.1
>
>
> Reviewed-by: Niek Linnenbank 

-- 
Niek Linnenbank


Re: [PATCH v4 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-27 Thread Richard Henderson
On 2/25/20 2:35 AM, LIU Zhiwei wrote:
> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
> +{
> +int legal = widen ? 2 << s->lmul : 1 << s->lmul;
> +
> +return !((s->lmul == 0x3 && widen) || (reg % legal));
> +}
> +
> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm)
> +{
> +return !(s->lmul > 1 && vm == 0 && vd == 0);
> +}
> +
> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
> +{
> +return s->lmul * (nf + 1) <= 8;
> +}

Some commentary would be good here, quoting the rule being applied.  E.g. "The
destination vector register group for a masked vector instruction can only
overlap the source mask register (v0) when LMUL=1. (Section 5.3)"

> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> +{
> +uint8_t nf = a->nf + 1;

Perhaps NF should have the +1 done during decode, so that it cannot be
forgotten here or elsewhere.  E.g.

%nf  31:3  !function=ex_plus_1
@r2_nfvm ... ... vm:1 . . ... . ... \
 &r2nfvm %nf %rs1 %rd

Where ex_plus_1 is the obvious modification of ex_shift_1().

> +static inline uint32_t vext_nf(uint32_t desc)
> +{
> +return (simd_data(desc) >> 11) & 0xf;
> +}
> +
> +static inline uint32_t vext_mlen(uint32_t desc)
> +{
> +return simd_data(desc) & 0xff;
> +}
> +
> +static inline uint32_t vext_vm(uint32_t desc)
> +{
> +return (simd_data(desc) >> 8) & 0x1;
> +}
> +
> +static inline uint32_t vext_lmul(uint32_t desc)
> +{
> +return (simd_data(desc) >> 9) & 0x3;
> +}

You should use FIELD() to define the fields, and then use FIELD_EX32 and
FIELD_DP32 to reference them.

> +/*
> + * This function checks watchpoint before real load operation.
> + *
> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
> + * In user mode, there is no watchpoint support now.
> + *
> + * It will triggle an exception if there is no mapping in TLB

trigger.

> + * and page table walk can't fill the TLB entry. Then the guest
> + * software can return here after process the exception or never return.
> + */
> +static void probe_read_access(CPURISCVState *env, target_ulong addr,
> +target_ulong len, uintptr_t ra)
> +{
> +while (len) {
> +const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
> +const target_ulong curlen = MIN(pagelen, len);
> +
> +probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
> +addr += curlen;
> +len -= curlen;
> +}
> +}
> +
> +static void probe_write_access(CPURISCVState *env, target_ulong addr,
> +target_ulong len, uintptr_t ra)
> +{
> +while (len) {
> +const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
> +const target_ulong curlen = MIN(pagelen, len);
> +
> +probe_write(env, addr, curlen, cpu_mmu_index(env, false), ra);
> +addr += curlen;
> +len -= curlen;
> +}
> +}

A loop is overkill -- the access cannot span three pages.  These two functions
can be merged using probe_access and MMU_DATA_{LOAD,STORE}.

> +
> +#ifdef HOST_WORDS_BIGENDIAN
> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> +{
> +/*
> + * Split the remaining range to two parts.
> + * The first part is in the last uint64_t unit.
> + * The second part start from the next uint64_t unit.
> + */
> +int part1 = 0, part2 = tot - cnt;
> +if (cnt % 64) {
> +part1 = 64 - (cnt % 64);
> +part2 = tot - cnt - part1;
> +memset(tail & ~(63ULL), 0, part1);
> +memset((tail + 64) & ~(63ULL), 0, part2);

You're confusing bit and byte offsets -- cnt and tot are both byte offsets.

> +static inline int vext_elem_mask(void *v0, int mlen, int index)
> +{
> +
> +int idx = (index * mlen) / 8;
> +int pos = (index * mlen) % 8;
> +
> +switch (mlen) {
> +case 8:
> +return *((uint8_t *)v0 + H1(index)) & 0x1;
> +case 16:
> +return *((uint16_t *)v0 + H2(index)) & 0x1;
> +case 32:
> +return *((uint32_t *)v0 + H4(index)) & 0x1;
> +case 64:
> +return *((uint64_t *)v0 + index) & 0x1;
> +default:
> +return (*((uint8_t *)v0 + H1(idx)) >> pos) & 0x1;
> +}

This is not what I had in mind, and looks wrong as well.

int idx = (index * mlen) / 64;
int pos = (index * mlen) % 64;
return (((uint64_t *)v0)[idx] >> pos) & 1;

You also might consider passing log2(mlen), so the multiplication could be
strength-reduced to a shift.

> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)  \
> +static void vext_##NAME##_ld_elem(CPURISCVState *env, abi_ptr addr, \
> +uint32_t idx, void *vd, uintptr_t retaddr)  \
> +{   \
> +int mmu_idx = cpu_mmu_index(env, false);\
> +MTYPE data; \
> +ETYPE *cur = ((ETYPE *)vd + H(idx));

Re: [PATCH] migration/savevm: release gslist after dump_vmstate_json

2020-02-27 Thread Philippe Mathieu-Daudé

Correcting Zhang's email address.

On 2/19/20 10:47 AM, pannengy...@huawei.com wrote:

From: Pan Nengyuan 

'list' is never freed at the end of dump_vmstate_json_to_file(). Although the
function is called only once, freeing the list keeps the code clean.

Fix the leak as follow:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
 #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768)
 #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445)
 #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066)
 #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139)
 #4 0x5585db591581 in object_class_get_list_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1084
 #5 0x5585db590f66 in object_class_foreach_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1028
 #6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7)
 #7 0x5585db59110c in object_class_foreach 
/mnt/sdb/qemu-new/qemu/qom/object.c:1038
 #8 0x5585db5916b6 in object_class_get_list 
/mnt/sdb/qemu-new/qemu/qom/object.c:1092
 #9 0x5585db335ca0 in dump_vmstate_json_to_file 
/mnt/sdb/qemu-new/qemu/migration/savevm.c:638
 #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420
 #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308
 #12 0x5585da29420d in _start 
(/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d)

Indirect leak of 7472 byte(s) in 467 object(s) allocated from:
 #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768)
 #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445)
 #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066)
 #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139)
 #4 0x5585db591581 in object_class_get_list_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1084
 #5 0x5585db590f66 in object_class_foreach_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1028
 #6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7)
 #7 0x5585db59110c in object_class_foreach 
/mnt/sdb/qemu-new/qemu/qom/object.c:1038
 #8 0x5585db5916b6 in object_class_get_list 
/mnt/sdb/qemu-new/qemu/qom/object.c:1092
 #9 0x5585db335ca0 in dump_vmstate_json_to_file 
/mnt/sdb/qemu-new/qemu/migration/savevm.c:638
 #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420
 #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308
 #12 0x5585da29420d in _start 
(/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d)

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
---
  migration/savevm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index f19cb9ec7a..60e6ea8a8d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -665,6 +665,7 @@ void dump_vmstate_json_to_file(FILE *out_file)
  }
  fprintf(out_file, "\n}\n");
  fclose(out_file);
+g_slist_free(list);
  }
  
  static uint32_t calculate_new_instance_id(const char *idstr)







Re: [PULL 0/4] NBD patches for 2020-02-26

2020-02-27 Thread Peter Maydell
On Thu, 27 Feb 2020 at 01:56, Eric Blake  wrote:
>
> The following changes since commit db736e0437aa6fd7c1b7e4599c17f9619ab6b837:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2020-02-25 13:31:16 +)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/ericb.git tags/pull-nbd-2020-02-26
>
> for you to fetch changes up to 8198cf5ef0ef98118b4176970d1cd998d93ec849:
>
>   block/nbd: fix memory leak in nbd_open() (2020-02-26 17:29:00 -0600)
>
> 
> nbd patches for 2020-02-26
>
> - ensure multiple meta contexts work
> - allow leading / in export names
> - fix a failure path memory leak
>
> 
> Eric Blake (2):
>   nbd: Fix regression with multiple meta contexts
>   nbd-client: Support leading / in NBD URI
>
> Pan Nengyuan (2):
>   block/nbd: extract the common cleanup code
>   block/nbd: fix memory leak in nbd_open()
>
>  block/nbd.c  | 33 -
>  nbd/server.c | 12 ++--
>  2 files changed, 30 insertions(+), 15 deletions(-)


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



[PATCH] qom/object: Comment to use g_slist_free on object_class_get_list result

2020-02-27 Thread Philippe Mathieu-Daudé
Document the list returned by object_class_get_list() must be
released with g_slist_free() to avoid memory leaks.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qom/object.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index 29546496c1..5517b56508 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -984,6 +984,9 @@ void object_class_foreach(void (*fn)(ObjectClass *klass, 
void *opaque),
  * @include_abstract: Whether to include abstract classes.
  *
  * Returns: A singly-linked list of the classes in reverse hashtable order.
+ *
+ * The returned list must be released with g_slist_free()
+ * when no longer required.
  */
 GSList *object_class_get_list(const char *implements_type,
   bool include_abstract);
@@ -995,6 +998,9 @@ GSList *object_class_get_list(const char *implements_type,
  *
  * Returns: A singly-linked list of the classes in alphabetical
  * case-insensitive order.
+ *
+ * The returned list must be released with g_slist_free()
+ * when no longer required.
  */
 GSList *object_class_get_list_sorted(const char *implements_type,
   bool include_abstract);
-- 
2.21.1




Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit

2020-02-27 Thread Niek Linnenbank
Hi Richard,

On Thu, Feb 27, 2020 at 1:57 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 2/27/20 4:31 AM, Alex Bennée wrote:
> >> It does not make sense for a linux-user chroot, running make -jN, on
> just about
> >> any host.  For linux-user, I could be happy with a modest increase, but
> not all
> >> the way out to 2GiB.
> >>
> >> Discuss.
> >
> > Does it matter that much? Surely for small programs the kernel just
> > never pages in the used portions of the mmap?
>
> That's why I used the example of a build under the chroot, because the
> compiler
> is not a small program.
>
> Consider when the memory *is* used, and N * 2GB implies lots of paging,
> where
> the previous N * 32MB did not.
>
I agree that a lower default value probably is safer until we have more
proof that a larger value does not give any issues.


> I'm saying that we should consider a setting more like 128MB or so, since
> the
> value cannot be changed from the command-line, or through the environment.
>

Proposal: can we then introduce a new command line parameter for this?
Maybe in a new patch?
Since the size of the code generation buffer appears to have an impact on
performance,
in my opinion it would make sense to make it configurable by the user.

Regards,
Niek


>
>
> r~
>
>

-- 
Niek Linnenbank


Re: [PATCH] migration/savevm: release gslist after dump_vmstate_json

2020-02-27 Thread Philippe Mathieu-Daudé

On 2/19/20 10:59 AM, Dr. David Alan Gilbert wrote:

* pannengy...@huawei.com (pannengy...@huawei.com) wrote:

From: Pan Nengyuan 

'list' is never freed at the end of dump_vmstate_json_to_file(). Although the
function is called only once, freeing the list keeps the code clean.

Fix the leak as follow:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
 #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768)
 #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445)
 #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066)
 #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139)
 #4 0x5585db591581 in object_class_get_list_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1084
 #5 0x5585db590f66 in object_class_foreach_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1028
 #6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7)
 #7 0x5585db59110c in object_class_foreach 
/mnt/sdb/qemu-new/qemu/qom/object.c:1038
 #8 0x5585db5916b6 in object_class_get_list 
/mnt/sdb/qemu-new/qemu/qom/object.c:1092
 #9 0x5585db335ca0 in dump_vmstate_json_to_file 
/mnt/sdb/qemu-new/qemu/migration/savevm.c:638
 #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420
 #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308
 #12 0x5585da29420d in _start 
(/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d)

Indirect leak of 7472 byte(s) in 467 object(s) allocated from:
 #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768)
 #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445)
 #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066)
 #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139)
 #4 0x5585db591581 in object_class_get_list_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1084
 #5 0x5585db590f66 in object_class_foreach_tramp 
/mnt/sdb/qemu-new/qemu/qom/object.c:1028
 #6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7)
 #7 0x5585db59110c in object_class_foreach 
/mnt/sdb/qemu-new/qemu/qom/object.c:1038
 #8 0x5585db5916b6 in object_class_get_list 
/mnt/sdb/qemu-new/qemu/qom/object.c:1092
 #9 0x5585db335ca0 in dump_vmstate_json_to_file 
/mnt/sdb/qemu-new/qemu/migration/savevm.c:638
 #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420
 #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308
 #12 0x5585da29420d in _start 
(/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d)

Reported-by: Euler Robot 


Good robot!


Unfortunately it doesn't generate the documentation along...

Reviewed-by: Philippe Mathieu-Daudé 




Signed-off-by: Pan Nengyuan 
---
  migration/savevm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index f19cb9ec7a..60e6ea8a8d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -665,6 +665,7 @@ void dump_vmstate_json_to_file(FILE *out_file)
  }
  fprintf(out_file, "\n}\n");
  fclose(out_file);
+g_slist_free(list);


Reviewed-by: Dr. David Alan Gilbert 


  }
  
  static uint32_t calculate_new_instance_id(const char *idstr)

--
2.18.2


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK







Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit

2020-02-27 Thread Niek Linnenbank
Hi Alex,

On Thu, Feb 27, 2020 at 1:19 PM Alex Bennée  wrote:

>
> Niek Linnenbank  writes:
>
> > Hi Alex,
> >
> > On Wed, Feb 26, 2020 at 7:13 PM Alex Bennée 
> wrote:
> >
> >> While 32mb is certainly usable a full system boot ends up flushing the
> >> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
> >> to take advantage of all that spare memory. After this change I can
> >> boot my test system without any TB flushes.
> >>
> >
> > That's great: with this change I'm seeing a performance improvement when
> > running the avocado tests for cubieboard.
> > It runs about 4-5 seconds faster. My host is Ubuntu 18.04 on 64-bit.
> >
> > I don't know much about the internals of TCG nor how it actually uses the
> > cache,
> > but it seems logical to me that increasing the cache size would improve
> > performance.
> >
> > What I'm wondering is: will this also result in TCG translating larger
> > chunks in one shot, so potentially
> > taking more time to do the translation? If so, could it perhaps affect
> more
> > latency sensitive code?
>
> No - the size of the translation blocks is governed by the guest code
> and where it ends a basic block. In system mode we also care about
> crossing guest page boundaries.
>
> >> Signed-off-by: Alex Bennée 
> >>
> > Tested-by: Niek Linnenbank 
> >
> >
> >> ---
> >>  accel/tcg/translate-all.c | 4 
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >> index 4ce5d1b3931..f7baa512059 100644
> >> --- a/accel/tcg/translate-all.c
> >> +++ b/accel/tcg/translate-all.c
> >> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1,
> >> tb_page_addr_t phys1,
> >>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
> >>  #endif
> >>
> >> +#if TCG_TARGET_REG_BITS == 32
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
> >> +#else
> >> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
> >> +#endif
> >>
> >
> > The qemu process now takes up more virtual memory, about ~2.5GiB in my
> > test, which can be expected with this change.
> >
> > Is it very likely that the TCG cache will be filled quickly and
> completely?
> > I'm asking because I also use Qemu to do automated testing
> > where the nodes are 64-bit but each have only 2GiB physical RAM.
>
> Well so this is the interesting question and as ever it depends.
>
> For system emulation the buffer will just slowly fill-up over time until
> exhausted and which point it will flush and reset. Each time the guest
> needs to flush a page and load fresh code in we will generate more
> translated code. If the guest isn't under load and never uses all it's
> RAM for code then in theory the pages of the mmap that are never filled
> never need to be actualised by the host kernel.
>
> You can view the behaviour by running "info jit" from the HMP monitor in
> your tests. The "TB Flush" value shows the number of times this has
> happened along with other information about translation state.
>

Thanks for clarifying this, now it all starts to make more sense to me.

Regards,
Niek


>
> >
> > Regards,
> > Niek
> >
> >
> >>
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE \
> >>(DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
> >> --
> >> 2.20.1
> >>
> >>
> >>
>
>
> --
> Alex Bennée
>


-- 
Niek Linnenbank


Re: [PATCH 3/3] iotests/138: Test leaks/corruptions fixed report

2020-02-27 Thread Eric Blake

On 2/27/20 11:02 AM, Max Reitz wrote:

Test that qemu-img check reports the number of leaks and corruptions
fixed in its JSON report (after a successful run).

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/138 | 41 --
  tests/qemu-iotests/138.out | 14 +
  2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/138 b/tests/qemu-iotests/138
index 54b01046ad..25bfbd4cca 100755
--- a/tests/qemu-iotests/138
+++ b/tests/qemu-iotests/138
@@ -41,8 +41,10 @@ _supported_fmt qcow2
  _supported_proto file
  _supported_os Linux
  # With an external data file, data clusters are not refcounted
-# (and so qemu-img check does not check their refcount)
-_unsupported_imgopts data_file
+# (and so qemu-img check does not check their refcount);


Not this patch's problem, but is that a bug in 'qemu-img check' for not 
validating refcounts on an external data file?  Or is it merely this 
comment wording is not quite perfect?



+# we want to modify the refcounts, so we need them to have a specific
+# format (namely u16)
+_unsupported_imgopts data_file 'refcount_bits=\([^1]\|.\([^6]\|$\)\)'
  
  echo

  echo '=== Check on an image with a multiple of 2^32 clusters ==='
@@ -65,6 +67,41 @@ poke_file "$TEST_IMG" $((2048 + 8)) 
"\x00\x80\x00\x00\x00\x00\x00\x00"
  # allocate memory", we have an error showing that l2 entry is invalid.
  _check_test_img
  
+echo

+echo '=== Check leaks-fixed/corruptions-fixed report'
+echo
+
+# After leaks and corruptions were fixed, those numbers should be
+# reported by qemu-img check
+_make_test_img 64k
+
+# Allocate data cluster
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+reftable_ofs=$(peek_file_be "$TEST_IMG" 48 8)
+refblock_ofs=$(peek_file_be "$TEST_IMG" $reftable_ofs 8)
+
+# Introduce a leak: Make the image header's refcount 2
+poke_file "$TEST_IMG" "$refblock_ofs" "\x00\x02"


Why not use your brand-new poke_file_be "$TEST_IMG" "$refblock_ofs" 2 2


+
+l1_ofs=$(peek_file_be "$TEST_IMG" 40 8)
+
+# Introduce a corruption: Drop the COPIED flag from the (first) L1 entry
+l1_entry=$(peek_file_be "$TEST_IMG" $l1_ofs 8)
+l1_entry=$((l1_entry & ~(1 << 63)))
+poke_file_be "$TEST_IMG" $l1_ofs 8 $l1_entry


Yep, the new function makes this task easier.  (You could also just peek 
1 byte at $((l1_ofs+7)) then write it back out with poke_file 
"$TEST_IMG" $((l1_ofs + 7)) $(printf '\\x%02x' $((val & 0xfe)))", but 
that just doesn't look as nice)



+
+echo
+# Should print the number of corruptions and leaks fixed
+# (Filter out all JSON fields (recognizable by their four-space
+# indentation), but keep the "-fixed" fields (by removing two spaces
+# from their indentation))
+# (Also filter out the L1 entry, because why not)
+_check_test_img -r all --output=json \
+| sed -e 's/^  \(.*\)-fixed"/\1-fixed"/' \
+| grep -v '^    ' \
+| sed -e "s/\\<$(printf %x $l1_entry)\\>/L1_ENTRY_VALUE/"


sed | grep | sed can often be done with a single sed:

... | sed -e 's/^  \(.*\)-fixed"/\1-fixed"/' \
   -e '/^    /d' \
   -e "s/\\..."

Using \\< and \\> in the sed regex is a GNUism; do we want this test to 
run on BSD?


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 2/3] iotests: Add poke_file_[bl]e functions

2020-02-27 Thread Eric Blake

On 2/27/20 11:02 AM, Max Reitz wrote:

Similarly to peek_file_[bl]e, we may want to write binary integers into
a file.  Currently, this often means messing around with poke_file and
raw binary strings.  I hope these functions make it a bit more
comfortable.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/common.rc | 37 
  1 file changed, 37 insertions(+)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 4c246c0450..604f837668 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -53,6 +53,43 @@ poke_file()
  printf "$3" | dd "of=$1" bs=1 "seek=$2" conv=notrunc &>/dev/null
  }
  
+# poke_file_le 'test.img' 512 2 65534

+poke_file_le()
+{


I like the interface.  However, the implementation is a bit bloated (but 
then again, that's why you cc'd me for review ;)



+local img=$1 ofs=$2 len=$3 val=$4 str=''
+
+for i in $(seq 0 $((len - 1))); do


No need to fork seq, when we can let bash do the iteration for us:

while ((len--)); do


+byte=$((val & 0xff))
+if [ $byte != 0 ]; then
+chr="$(printf "\x$(printf %x $byte)")"


Why are we doing two printf command substitutions instead of 1?


+else
+chr="\0"


Why do we have to special-case 0?  printf '\x00' does the right thing.


+fi
+str+="$chr"


I'd go with the faster str+=$(printf '\\x%02x' $((val & 0xff))), 
completely skipping the byte and chr variables.



+val=$((val >> 8))
+done
+
+poke_file "$img" "$ofs" "$str"
+}


So my version:

poke_file_le()
{
local img=$1 ofs=$2 len=$3 val=$4 str=
while ((len--)); do
str+=$(printf '\\x%02x' $((val & 0xff)))
val=$((val >> 8))
done
poke_file "$img" "$ofs" "$str"
}


+
+# poke_file_be 'test.img' 512 2 65279
+poke_file_be()
+{
+local img=$1 ofs=$2 len=$3 val=$4 str=''


And this one's even easier: we get big-endian for free from printf 
output, with a sed post-processing to add \x:


poke_file_be()
{
local str="$(printf "%0$(($3 * 2))x\n" $4 | sed 's/\(..\)/\\x\1/g')"
poke_file "$1" "$2" "$str"
}
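
A quick way to sanity-check both suggested helpers outside the iotests harness (assuming GNU dd/od and mktemp) is:

```shell
#!/usr/bin/env bash
# Self-contained check of the suggested helpers, pasted from above.
poke_file() {
    printf "$3" | dd "of=$1" bs=1 "seek=$2" conv=notrunc &>/dev/null
}
poke_file_le() {
    local img=$1 ofs=$2 len=$3 val=$4 str=
    while ((len--)); do
        str+=$(printf '\\x%02x' $((val & 0xff)))
        val=$((val >> 8))
    done
    poke_file "$img" "$ofs" "$str"
}
poke_file_be() {
    local str="$(printf "%0$(($3 * 2))x\n" $4 | sed 's/\(..\)/\\x\1/g')"
    poke_file "$1" "$2" "$str"
}

img=$(mktemp)
poke_file_le "$img" 0 2 65534   # 0xFFFE -> bytes fe ff (little-endian)
poke_file_be "$img" 2 2 65279   # 0xFEFF -> bytes fe ff (big-endian)
out=$(od -An -tx1 "$img" | tr -d ' \n')
echo "$out"                     # fefffeff
rm -f "$img"
```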

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 3/3] savevm: check RAM is pagesize aligned

2020-02-27 Thread Juan Quintela
Marc-André Lureau  wrote:
> Hi Juan
>
> On Wed, Jan 8, 2020 at 2:08 PM Juan Quintela  wrote:
>>
>> Marc-André Lureau  wrote:
>> n> Check the host pointer is correctly aligned, otherwise we may fail
>> > during migration in ram_block_discard_range().
>> >
>> > Signed-off-by: Marc-André Lureau 
>>
>> Reviewed-by: Juan Quintela 
>>
>> queued
>>
>
> Did it get lost? thanks

I dropped it in the past, because it made "make check" for mips fail.
(I put it on my ToDo list to investigate and forgot about it)

But now it passes, go figure.

Included again.  Sorry.

Later, Juan.




Re: [PATCH V2 3/8] savevm: Don't call colo_init_ram_cache twice

2020-02-27 Thread Juan Quintela
zhanghailiang  wrote:
> This helper was called twice, which is wrong.
> Keep only the call made when the COLO enable message
> is received from the source side.
>
> Signed-off-by: zhanghailiang 

Reviewed-by: Juan Quintela 




Re: [PATCH V2 1/8] migration: fix COLO broken caused by a previous commit

2020-02-27 Thread Juan Quintela
zhanghailiang  wrote:
> The commit "migration: Create migration_is_running()" broke
> COLO by breaking the following process:
>
> colo_process_checkpoint
>  ->colo_do_checkpoint_transaction
>->migrate_set_block_enabled
>  ->qmp_migrate_set_capabilities
>
> It can be fixed by making the COLO process an exception;
> Maybe we need a better way to fix it.
>
> Cc: Juan Quintela 
> Signed-off-by: zhanghailiang 

oops sorry.

Reviewed-by: Juan Quintela 

queued.




[PATCH 1/2] qcow2: Make Qcow2AioTask store the full host offset

2020-02-27 Thread Alberto Garcia
The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
host offset. In practice this is not very useful because all users(*)
of this structure need the final host offset into the cluster, which
they calculate using

   host_offset = file_cluster_offset + offset_into_cluster(s, offset)

There is no reason why Qcow2AioTask cannot store host_offset directly
and that is what this patch does.

(*) compressed clusters are the exception: in this case what
file_cluster_offset was storing was the full compressed cluster
descriptor (offset + size). This does not change with this patch
but it is documented now.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 68 +--
 1 file changed, 33 insertions(+), 35 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3c754f616b..b2c7c8255e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -74,7 +74,7 @@ typedef struct {
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t cluster_descriptor,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2037,7 +2037,7 @@ out:
 
 static coroutine_fn int
 qcow2_co_preadv_encrypted(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2064,16 +2064,12 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-ret = bdrv_co_pread(s->data_file,
-file_cluster_offset + offset_into_cluster(s, offset),
-bytes, buf, 0);
+ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);
 if (ret < 0) {
 goto fail;
 }
 
-if (qcow2_co_decrypt(bs,
- file_cluster_offset + offset_into_cluster(s, offset),
- offset, buf, bytes) < 0)
+if (qcow2_co_decrypt(bs, host_offset, offset, buf, bytes) < 0)
 {
 ret = -EIO;
 goto fail;
@@ -2091,7 +2087,7 @@ typedef struct Qcow2AioTask {
 
 BlockDriverState *bs;
 QCow2ClusterType cluster_type; /* only for read */
-uint64_t file_cluster_offset;
+uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
 QEMUIOVector *qiov;
@@ -2104,7 +2100,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
AioTaskPool *pool,
AioTaskFunc func,
QCow2ClusterType cluster_type,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2119,7 +2115,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 .bs = bs,
 .cluster_type = cluster_type,
 .qiov = qiov,
-.file_cluster_offset = file_cluster_offset,
+.host_offset = host_offset,
 .offset = offset,
 .bytes = bytes,
 .qiov_offset = qiov_offset,
@@ -2128,7 +2124,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
 trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
  func == qcow2_co_preadv_task_entry ? "read" : "write",
- cluster_type, file_cluster_offset, offset, bytes,
+ cluster_type, host_offset, offset, bytes,
  qiov, qiov_offset);
 
 if (!pool) {
@@ -2142,13 +2138,12 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
  QCow2ClusterType cluster_type,
- uint64_t file_cluster_offset,
+ uint64_t host_offset,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov,
  size_t qiov_offset)
 {
 BDRVQcow2State *s = bs->opaque;
-int offset_in_cluster = offset_into_cluster(s, offset);
 
 switch (cluster_type) {
 case QCOW2_CLUSTER_ZERO_PLAIN:
@@ -2164,19 +2159,17 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
qiov, qiov_offset, 0);
 
 case QCOW2_CLUSTER_COMPRESSED:
-return qcow2_co_preadv_compressed(bs, file_cluster_offset,
+  

[PATCH 2/2] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()

2020-02-27 Thread Alberto Garcia
qcow2_get_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

There is a pre-existing exception with compressed clusters: in this
case the function returns the complete cluster descriptor (containing
the offset and size of the compressed data). This does not change with
this patch but it is now documented.
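To make the compressed-cluster exception concrete: the returned value there is a single 64-bit word packing both an offset and a size. A standalone sketch of that kind of packing (the 56/8 bit split below is invented for the example and is NOT qcow2's actual layout, which depends on cluster_bits):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative packed "compressed cluster descriptor": host offset in the
 * low bits, size field in the high bits. Bit positions are made up. */
#define EX_DESC_OFFSET_MASK ((1ULL << 56) - 1)

static uint64_t ex_desc_offset(uint64_t desc)
{
    return desc & EX_DESC_OFFSET_MASK;
}

static uint64_t ex_desc_size_field(uint64_t desc)
{
    return desc >> 56;
}
```

Callers that get such a descriptor must mask out the fields themselves, which is why the function's documentation now spells out the overloaded meaning.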

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h |  4 ++--
 block/qcow2-cluster.c | 38 ++
 block/qcow2.c | 24 +++-
 3 files changed, 31 insertions(+), 35 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 0942126232..f47ef6ca4e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -679,8 +679,8 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
  uint8_t *buf, int nb_sectors, bool enc, Error **errp);
 
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset);
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 78c95dfa16..498330bb09 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -496,10 +496,15 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
 
 
 /*
- * get_cluster_offset
+ * get_host_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk find the equivalent host
+ * offset in the qcow2 file and store it in *host_offset. Neither
+ * offset needs to be aligned to a cluster boundary.
+ *
+ * If the cluster is unallocated then *host_offset will be 0.
+ * If the cluster is compressed then *host_offset will contain the
+ * complete compressed cluster descriptor.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
@@ -511,12 +516,12 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
  * cases.
  */
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset)
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
-uint64_t l1_index, l2_offset, *l2_slice;
+uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
 int c;
 unsigned int offset_in_cluster;
 uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -537,7 +542,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 bytes_needed = bytes_available;
 }
 
-*cluster_offset = 0;
+*host_offset = 0;
 
 /* seek to the l2 offset in the l1 table */
 
@@ -570,7 +575,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -578,7 +583,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
  * true */
 assert(nb_clusters <= INT_MAX);
 
-type = qcow2_get_cluster_type(bs, *cluster_offset);
+type = qcow2_get_cluster_type(bs, l2_entry);
 if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
 qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -599,41 +604,42 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 }
 /* Compressed clusters can only be processed one by one */
 c = 1;
-*cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
+*host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
 break;
 case QCOW2_CLUSTER_ZERO_PLAIN:
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = count_contiguous_clusters_unallocated(bs, nb_clusters,

[PATCH 0/2] Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()

2020-02-27 Thread Alberto Garcia
Hi,

this is something I did while working on the subcluster series, but
it's independent of it, so I thought I'd send it out already.

In short: qcow2_get_cluster_offset() returns a host cluster offset, but
none of the callers actually wants the offset of the cluster itself;
they want the host offset into the cluster.

There's a pre-existing exception with compressed clusters. In this
case the returned value was overloaded to contain a cluster offset or
a compressed cluster descriptor, depending on the cluster type. This
is kind of ugly, and we could make it more explicit using a union or
something like that, but I don't think it's worth the effort here, so I
just documented it.

Berto

Alberto Garcia (2):
  qcow2: Make Qcow2AioTask store the full host offset
  qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()

 block/qcow2.h |  4 +--
 block/qcow2-cluster.c | 38 --
 block/qcow2.c | 74 ++-
 3 files changed, 55 insertions(+), 61 deletions(-)

-- 
2.20.1




Re: [PATCH 1/3] qemu-img: Fix check's leak/corruption fix report

2020-02-27 Thread Eric Blake

On 2/27/20 11:02 AM, Max Reitz wrote:

There are two problems with qemu-img check's report on how many leaks
and/or corruptions have been fixed:

(1) ImageCheck.has_leaks_fixed and ImageCheck.has_corruptions_fixed are
only true when ImageCheck.leaks or ImageCheck.corruptions (respectively)
are non-zero.  qcow2's check implementation will set the latter to zero
after it has fixed leaks and corruptions, though, so leaks-fixed and
corruptions-fixed are actually never reported after successful repairs.
We should always report them when they are non-zero, just like all the
other fields of ImageCheck.

(2) After something has been fixed and we run the check a second time,
leaks_fixed and corruptions_fixed are taken from the first run; but
has_leaks_fixed and has_corruptions_fixed are not.  The second run
actually cannot fix anything, so with (1) fixed, has_leaks_fixed and
has_corruptions_fixed will always be false here.  (With (1) unfixed,
they will at least be false on successful runs, because then the number
of leaks and corruptions found in the second run should be 0.)
We should save has_leaks_fixed and has_corruptions_fixed just like we
save leaks_fixed and corruptions_fixed.
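The intended predicate for point (1) can be sketched as follows (a hypothetical helper for illustration, not the actual qemu-img code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Point (1): whether a "fixed" counter is reported should depend only on
 * the fixed count itself, not on how many leaks or corruptions remain --
 * qcow2 zeroes the remaining counts after a successful repair, so keying
 * off them suppresses the report exactly when the repair worked. */
static bool report_fixed(int64_t remaining_after_repair, int64_t fixed)
{
    (void)remaining_after_repair;   /* deliberately not consulted */
    return fixed != 0;
}
```

With this, a fully successful repair (remaining == 0, fixed > 0) still reports the fixed counter, matching the behavior the patch establishes.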

Signed-off-by: Max Reitz 
---
  qemu-img.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip

2020-02-27 Thread Auger Eric
Hi Peter,

On 2/27/20 7:00 PM, Peter Xu wrote:
> On Thu, Feb 27, 2020 at 06:42:09PM +0100, Auger Eric wrote:
>> Hi Peter,
>>
>> On 2/27/20 6:00 PM, Peter Xu wrote:
>>> This is mainly for X86 because that's the only one that supports
>>> split irqchip for now.
>>>
>>> When the irqchip is split, we face a dilemma that KVM irqfd will be
>>> enabled, however the slow irqchip is still running in the userspace.
>>> It means that the resamplefd in the kernel irqfds won't take any
>>> effect and it can miss to ack INTx interrupts on EOIs.
>> Won't it always fail to ack INTx? With the above sentence I understand
>> it can work sometimes?
> 
> I wanted to mean that it will fail.  How about s/can/will/?  Or even
> better wordings that you'd suggest?
yes: s/can/will
> 
>>>
>>> One example is split irqchip with VFIO INTx, which will break if we
>>> use the VFIO INTx fast path.
>>>
>>> This patch can potentially support the VFIO fast path again for INTx,
>>> that the IRQ delivery will still use the fast path, while we don't
>>> need to trap MMIOs in QEMU for the device to emulate the EOIs (see the
>>> callers of vfio_eoi() hook).  However the EOI of the INTx will still
>>> need to be done from the userspace by caching all the resamplefds in
>>> QEMU and kick properly for IOAPIC EOI broadcast.
>> If I understand correctly this is a one way fast path? Fast path is on
>> the trigger side only: VFIO -> KVM but not on the deactivation side,
>> trapped by the userspace IOAPIC where you directly notify the UNMASK
>> eventfd from userspace. Is that correct?
> 
> Right, the injection is still using the whole fast path.  However
> AFAIU even for the EOI path it should still be faster than the pure
> slow path of vfio INTx EOI.  From what I got from reading the code,
> the slow path will conditionally unmap MMIO regions (with a timer to
> delay the recovery) so all MMIOs will be slowed down.  For what this
> patch is doing, it will need to exit to userspace for sure for each
> EOI (after all IOAPIC is in userspace), however for the whole
> lifecycle of the device, the MMIO regions should always be mapped so
> no unwanted MMIO traps.
Yes the EOI is trapped on IOAPIC side and not at the BAR level. So it
should be more efficient and more precise.
> 
>>>
>>> When the userspace is responsible for the resamplefd kickup, don't
>>> register it on the kvm_irqfd anymore, because on newer kernels (after
>>> commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail if with both split
>>> irqchip and resamplefd.  This will make sure that the fast path will
>>> work for all supported kernels.
>>>
>>> https://patchwork.kernel.org/patch/10738541/#22609933
>>>
>>> Suggested-by: Paolo Bonzini 
>>> Signed-off-by: Peter Xu 
>>> ---
>>> v1.1 changelog:
>>> - when resamplefd is going to be kicked from userspace, don't register
>>>   it again in KVM_IRQFD.  Tested against upstream kernel.
>>>
>>>  accel/kvm/kvm-all.c| 74 --
>>>  accel/kvm/trace-events |  1 +
>>>  hw/intc/ioapic.c   | 11 +--
>>>  include/sysemu/kvm.h   |  4 +++
>>>  4 files changed, 86 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index d49b74512a..b766b6e93c 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -159,9 +159,62 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
>>>  static NotifierList kvm_irqchip_change_notifiers =
>>>  NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
>>>  
>>> +struct KVMResampleFd {
>>> +int gsi;
>>> +EventNotifier *resample_event;
>>> +QLIST_ENTRY(KVMResampleFd) node;
>>> +};
>>> +typedef struct KVMResampleFd KVMResampleFd;
>>> +
>>> +/*
>>> + * Only used with split irqchip where we need to do the resample fd
>>> + * kick for the kernel from userspace.
>>> + */
>>> +static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
>>> +QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
>>> +
>>>  #define kvm_slots_lock(kml)  qemu_mutex_lock(&(kml)->slots_lock)
>>>  #define kvm_slots_unlock(kml)qemu_mutex_unlock(&(kml)->slots_lock)
>>>  
>>> +static inline void kvm_resample_fd_remove(int gsi)
>>> +{
>>> +KVMResampleFd *rfd;
>>> +
>>> +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
>>> +if (rfd->gsi == gsi) {
>>> +QLIST_REMOVE(rfd, node);
>>> +break;
>>> +}
>>> +}
>>> +}
>>> +
>>> +static inline void kvm_resample_fd_insert(int gsi, EventNotifier *event)
>>> +{
>>> +KVMResampleFd *rfd = g_new0(KVMResampleFd, 1);
>>> +
>>> +rfd->gsi = gsi;
>>> +rfd->resample_event = event;
>>> +
>>> +QLIST_INSERT_HEAD(&kvm_resample_fd_list, rfd, node);
>>> +}
>>> +
>>> +void kvm_resample_fd_notify(int gsi)
>>> +{
>>> +KVMResampleFd *rfd;
>>> +
>>> +if (!kvm_irqchip_is_split()) {
>>> +return;
>>> +}
>>> +
>>> +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
>>> +if (rfd->gsi == gsi) {
>>> +event_

[PATCH 2/2] block: bdrv_reopen() with backing file in different AioContext

2020-02-27 Thread Kevin Wolf
This patch allows bdrv_reopen() (and therefore the x-blockdev-reopen QMP
command) to attach a node as the new backing file even if the node is in
a different AioContext than the parent, provided that either of the two
nodes can be moved to the AioContext of the other.
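A rough sketch of the symmetric check this introduces (types and names are illustrative stand-ins for the block layer's, not the real bdrv_* API):

```c
#include <assert.h>
#include <stdbool.h>

/* Attach is allowed if either the new child can move to the parent's
 * AioContext or the parent can move to the child's -- mirroring the two
 * bdrv_can_set_aio_context() probes in bdrv_reopen_can_attach(). */
typedef int AioCtx;

static bool can_move(AioCtx node_ctx, AioCtx target_ctx, bool node_is_movable)
{
    /* Same context is always fine; otherwise the node must be movable. */
    return node_ctx == target_ctx || node_is_movable;
}

static bool reopen_can_attach(AioCtx parent_ctx, bool parent_movable,
                              AioCtx child_ctx, bool child_movable)
{
    return can_move(child_ctx, parent_ctx, child_movable) ||
           can_move(parent_ctx, child_ctx, parent_movable);
}
```

Only when neither node can follow the other does the reopen fail, which is exactly the remaining error case exercised by test_iothreads_error in iotest 245.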

Signed-off-by: Kevin Wolf 
---
 block.c| 36 +++-
 tests/qemu-iotests/245 |  8 +++-
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index 202c67e1e8..5dbba6cf31 100644
--- a/block.c
+++ b/block.c
@@ -3781,6 +3781,29 @@ static void bdrv_reopen_perm(BlockReopenQueue *q, BlockDriverState *bs,
 *shared = cumulative_shared_perms;
 }
 
+static bool bdrv_reopen_can_attach(BdrvChild *child,
+   BlockDriverState *new_child,
+   BlockDriverState *parent,
+   Error **errp)
+{
+AioContext *parent_ctx = bdrv_get_aio_context(parent);
+AioContext *child_ctx = bdrv_get_aio_context(new_child);
+GSList *ignore;
+bool ret;
+
+ignore = g_slist_prepend(NULL, child);
+ret = bdrv_can_set_aio_context(new_child, parent_ctx, &ignore, NULL);
+g_slist_free(ignore);
+if (ret) {
+return ret;
+}
+
+ignore = g_slist_prepend(NULL, child);
+ret = bdrv_can_set_aio_context(parent, child_ctx, &ignore, errp);
+g_slist_free(ignore);
+return ret;
+}
+
 /*
  * Take a BDRVReopenState and check if the value of 'backing' in the
  * reopen_state->options QDict is valid or not.
@@ -3832,16 +3855,11 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
 }
 
 /*
- * TODO: before removing the x- prefix from x-blockdev-reopen we
- * should move the new backing file into the right AioContext
- * instead of returning an error.
+ * Check AioContext compatibility so that the bdrv_set_backing_hd() call in
+ * bdrv_reopen_commit() won't fail.
  */
-if (new_backing_bs) {
-if (bdrv_get_aio_context(new_backing_bs) != bdrv_get_aio_context(bs)) {
-error_setg(errp, "Cannot use a new backing file "
-   "with a different AioContext");
-return -EINVAL;
-}
+if (!bdrv_reopen_can_attach(bs->backing, bs, new_backing_bs, errp)) {
+return -EINVAL;
 }
 
 /*
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 5a2cd5ed0e..d6135ec14d 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -1010,18 +1010,16 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 # neither of them can switch to the other AioContext
 def test_iothreads_error(self):
 self.run_test_iothreads('iothread0', 'iothread1',
-"Cannot use a new backing file with a different AioContext")
+"Cannot change iothread of active block backend")
 
 def test_iothreads_compatible_users(self):
 self.run_test_iothreads('iothread0', 'iothread0')
 
 def test_iothreads_switch_backing(self):
-self.run_test_iothreads('iothread0', None,
-"Cannot use a new backing file with a different AioContext")
+self.run_test_iothreads('iothread0', None)
 
 def test_iothreads_switch_overlay(self):
-self.run_test_iothreads(None, 'iothread0',
-"Cannot use a new backing file with a different AioContext")
+self.run_test_iothreads(None, 'iothread0')
 
 if __name__ == '__main__':
 iotests.main(supported_fmts=["qcow2"],
-- 
2.20.1




[PATCH 0/2] block: bdrv_reopen() with backing file in different AioContext

2020-02-27 Thread Kevin Wolf
Kevin Wolf (2):
  iotests: Refactor blockdev-reopen test for iothreads
  block: bdrv_reopen() with backing file in different AioContext

 block.c| 36 +-
 tests/qemu-iotests/245 | 40 --
 tests/qemu-iotests/245.out |  4 ++--
 3 files changed, 59 insertions(+), 21 deletions(-)

-- 
2.20.1




[PATCH 1/2] iotests: Refactor blockdev-reopen test for iothreads

2020-02-27 Thread Kevin Wolf
We'll want to test more than one successful case in the future, so
prepare the test for that by refactoring it to run each scenario in a
separate VM.

test_iothreads_switch_{backing,overlay} currently produce errors, but
these are cases that should actually work, by switching either the
backing file node or the overlay node to the AioContext of the other
node.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/245 | 42 +-
 tests/qemu-iotests/245.out |  4 ++--
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 489bf78bd0..5a2cd5ed0e 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -970,8 +970,7 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 self.assertEqual(self.get_node('hd1'), None)
 self.assert_qmp(self.get_node('hd2'), 'ro', True)
 
-# We don't allow setting a backing file that uses a different AioContext
-def test_iothreads(self):
+def run_test_iothreads(self, iothread_a, iothread_b, errmsg = None):
 opts = hd_opts(0)
 result = self.vm.qmp('blockdev-add', conv_keys = False, **opts)
 self.assert_qmp(result, 'return', {})
@@ -986,20 +985,43 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 result = self.vm.qmp('object-add', qom_type='iothread', id='iothread1')
 self.assert_qmp(result, 'return', {})
 
-result = self.vm.qmp('x-blockdev-set-iothread', node_name='hd0', iothread='iothread0')
+result = self.vm.qmp('device_add', driver='virtio-scsi', id='scsi0',
+ iothread=iothread_a)
 self.assert_qmp(result, 'return', {})
 
-self.reopen(opts, {'backing': 'hd2'}, "Cannot use a new backing file with a different AioContext")
-
-result = self.vm.qmp('x-blockdev-set-iothread', node_name='hd2', iothread='iothread1')
+result = self.vm.qmp('device_add', driver='virtio-scsi', id='scsi1',
+ iothread=iothread_b)
 self.assert_qmp(result, 'return', {})
 
-self.reopen(opts, {'backing': 'hd2'}, "Cannot use a new backing file with a different AioContext")
+if iothread_a:
+result = self.vm.qmp('device_add', driver='scsi-hd', drive='hd0',
+ share_rw=True, bus="scsi0.0")
+self.assert_qmp(result, 'return', {})
 
-result = self.vm.qmp('x-blockdev-set-iothread', node_name='hd2', iothread='iothread0')
-self.assert_qmp(result, 'return', {})
+if iothread_b:
+result = self.vm.qmp('device_add', driver='scsi-hd', drive='hd2',
+ share_rw=True, bus="scsi1.0")
+self.assert_qmp(result, 'return', {})
 
-self.reopen(opts, {'backing': 'hd2'})
+self.reopen(opts, {'backing': 'hd2'}, errmsg)
+self.vm.shutdown()
+
+# We don't allow setting a backing file that uses a different AioContext if
+# neither of them can switch to the other AioContext
+def test_iothreads_error(self):
+self.run_test_iothreads('iothread0', 'iothread1',
+"Cannot use a new backing file with a different AioContext")
+
+def test_iothreads_compatible_users(self):
+self.run_test_iothreads('iothread0', 'iothread0')
+
+def test_iothreads_switch_backing(self):
+self.run_test_iothreads('iothread0', None,
+"Cannot use a new backing file with a different AioContext")
+
+def test_iothreads_switch_overlay(self):
+self.run_test_iothreads(None, 'iothread0',
+"Cannot use a new backing file with a different AioContext")
 
 if __name__ == '__main__':
 iotests.main(supported_fmts=["qcow2"],
diff --git a/tests/qemu-iotests/245.out b/tests/qemu-iotests/245.out
index a19de5214d..682b93394d 100644
--- a/tests/qemu-iotests/245.out
+++ b/tests/qemu-iotests/245.out
@@ -1,6 +1,6 @@
-..................
+.....................
 ----------------------------------------------------------------------
-Ran 18 tests
+Ran 21 tests
 
 OK
 {"execute": "job-finalize", "arguments": {"id": "commit0"}}
-- 
2.20.1




Re: [PULL 00/19] testing and plugin updates

2020-02-27 Thread Peter Maydell
On Wed, 26 Feb 2020 at 07:39, Alex Bennée  wrote:
>
> The following changes since commit db736e0437aa6fd7c1b7e4599c17f9619ab6b837:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2020-02-25 13:31:16 +)
>
> are available in the Git repository at:
>
>   https://github.com/stsquad/qemu.git tags/pull-testing-and-plugins-250220-1
>
> for you to fetch changes up to bc97f9f64f8a4a84d0d06949749e9dbec143b9f5:
>
>   tests/tcg: take into account expected clashes pauth-4 (2020-02-25 20:20:23 
> +)
>
> 
> Testing and plugin updates:
>
>  - fix pauth TCG tests
>  - tweak away rcutorture failures
>  - various Travis updates
>  - relax iotest size check a little
>  - fix for -trace/-D clash
>  - fix cross compile detection for tcg tests
>  - document plugin query lifetime
>  - fix missing break in plugin core
>  - fix some plugin warnings
>  - better progressive instruction decode
>  - avoid trampling vaddr in plugins
>



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



Re: [PATCH] migration/savevm: release gslist after dump_vmstate_json

2020-02-27 Thread Juan Quintela
 wrote:
> From: Pan Nengyuan 
>
> 'list' is not freed at the end of dump_vmstate_json_to_file(); although the
> function is called only once, freeing the list makes for cleaner code.
>
> Fix the leak as follows:
> Direct leak of 16 byte(s) in 1 object(s) allocated from:
> #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768)
> #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445)
> #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066)
> #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139)
> #4 0x5585db591581 in object_class_get_list_tramp 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1084
> #5 0x5585db590f66 in object_class_foreach_tramp 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1028
> #6 0x7fb945eb35f7 in g_hash_table_foreach 
> (/lib64/libglib-2.0.so.0+0x3b5f7)
> #7 0x5585db59110c in object_class_foreach 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1038
> #8 0x5585db5916b6 in object_class_get_list 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1092
> #9 0x5585db335ca0 in dump_vmstate_json_to_file 
> /mnt/sdb/qemu-new/qemu/migration/savevm.c:638
> #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420
> #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308
> #12 0x5585da29420d in _start 
> (/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d)
>
> Indirect leak of 7472 byte(s) in 467 object(s) allocated from:
> #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768)
> #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445)
> #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066)
> #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139)
> #4 0x5585db591581 in object_class_get_list_tramp 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1084
> #5 0x5585db590f66 in object_class_foreach_tramp 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1028
> #6 0x7fb945eb35f7 in g_hash_table_foreach 
> (/lib64/libglib-2.0.so.0+0x3b5f7)
> #7 0x5585db59110c in object_class_foreach 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1038
> #8 0x5585db5916b6 in object_class_get_list 
> /mnt/sdb/qemu-new/qemu/qom/object.c:1092
> #9 0x5585db335ca0 in dump_vmstate_json_to_file 
> /mnt/sdb/qemu-new/qemu/migration/savevm.c:638
> #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420
> #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308
> #12 0x5585da29420d in _start 
> (/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d)
>
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 

Reviewed-by: Juan Quintela 




Re: [PATCH v2] test-vmstate: Fix memleaks in test_load_qlist

2020-02-27 Thread Juan Quintela
 wrote:
> From: Chen Qun 
>
> There is a memleak in test_load_qlist(). It's not a big deal,
> but test-vmstate will fail if sanitizers are enabled.
>
> In addition, "ret" is written twice with the same value
>  in test_gtree_load_iommu().
>
> Reported-by: Euler Robot 
> Signed-off-by: Chen Qun 

Reviewed-by: Juan Quintela 

queued




Re: [PATCH v2 10/13] migration/vmstate: Remove redundant statement in vmstate_save_state_v()

2020-02-27 Thread Juan Quintela
 wrote:
> From: Chen Qun 
>
> The "ret" has been assigned in all branches. It didn't need to be
>  assigned separately.
>
> Clang static code analyzer show warning:
>   migration/vmstate.c:365:17: warning: Value stored to 'ret' is never read
> ret = 0;
> ^ ~
>
> Reported-by: Euler Robot 
> Signed-off-by: Chen Qun 

I thought I had already reviewed it.

Reviewed-by: Juan Quintela 
queued




Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip

2020-02-27 Thread Peter Xu
On Thu, Feb 27, 2020 at 06:42:09PM +0100, Auger Eric wrote:
> Hi Peter,
> 
> On 2/27/20 6:00 PM, Peter Xu wrote:
> > This is mainly for X86 because that's the only one that supports
> > split irqchip for now.
> > 
> > When the irqchip is split, we face a dilemma that KVM irqfd will be
> > enabled, however the slow irqchip is still running in the userspace.
> > It means that the resamplefd in the kernel irqfds won't take any
> > effect and it can miss to ack INTx interrupts on EOIs.
> Won't it always fail to ack INTx? With the above sentence I understand
> it can work sometimes?

I wanted to mean that it will fail.  How about s/can/will/?  Or even
better wordings that you'd suggest?

> > 
> > One example is split irqchip with VFIO INTx, which will break if we
> > use the VFIO INTx fast path.
> > 
> > This patch can potentially support the VFIO fast path again for INTx,
> > that the IRQ delivery will still use the fast path, while we don't
> > need to trap MMIOs in QEMU for the device to emulate the EOIs (see the
> > callers of vfio_eoi() hook).  However the EOI of the INTx will still
> > need to be done from the userspace by caching all the resamplefds in
> > QEMU and kick properly for IOAPIC EOI broadcast.
> If I understand correctly this is a one way fast path? Fast path is on
> the trigger side only: VFIO -> KVM but not on the deactivation side,
> trapped by the userspace IOAPIC where you directly notify the UNMASK
> eventfd from userspace. Is that correct?

Right, the injection is still using the whole fast path.  However
AFAIU even for the EOI path it should still be faster than the pure
slow path of vfio INTx EOI.  From what I got from reading the code,
the slow path will conditionally unmap MMIO regions (with a timer to
delay the recovery) so all MMIOs will be slowed down.  For what this
patch is doing, it will need to exit to userspace for sure for each
EOI (after all IOAPIC is in userspace), however for the whole
lifecycle of the device, the MMIO regions should always be mapped so
no unwanted MMIO traps.
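For reference, the primitive that a userspace resamplefd "kick" boils down to can be shown in isolation (a Linux-only standalone illustration of what event_notifier_set() does under the hood; this is not QEMU code):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Notify an eventfd by writing a 64-bit counter; the consumer (here, the
 * same function, standing in for the kernel side) drains it with a read.
 * This is the mechanism the IOAPIC EOI broadcast uses to unmask INTx. */
static uint64_t kick_and_drain(void)
{
    int fd = eventfd(0, 0);
    uint64_t one = 1, val = 0;

    if (fd < 0) {
        return 0;
    }
    if (write(fd, &one, sizeof(one)) != sizeof(one)) {  /* the "kick" */
        val = 0;
    }
    if (read(fd, &val, sizeof(val)) != sizeof(val)) {   /* consumer side */
        val = 0;
    }
    close(fd);
    return val;
}
```

The cost per EOI is therefore one userspace exit plus one eventfd write, versus the slow path's conditional MMIO unmapping.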

> > 
> > When the userspace is responsible for the resamplefd kickup, don't
> > register it on the kvm_irqfd anymore, because on newer kernels (after
> > commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail if with both split
> > irqchip and resamplefd.  This will make sure that the fast path will
> > work for all supported kernels.
> > 
> > https://patchwork.kernel.org/patch/10738541/#22609933
> > 
> > Suggested-by: Paolo Bonzini 
> > Signed-off-by: Peter Xu 
> > ---
> > v1.1 changelog:
> > - when resamplefd is going to be kicked from userspace, don't register
> >   it again in KVM_IRQFD.  Tested against upstream kernel.
> > 
> >  accel/kvm/kvm-all.c| 74 --
> >  accel/kvm/trace-events |  1 +
> >  hw/intc/ioapic.c   | 11 +--
> >  include/sysemu/kvm.h   |  4 +++
> >  4 files changed, 86 insertions(+), 4 deletions(-)
> > 
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index d49b74512a..b766b6e93c 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -159,9 +159,62 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
> >  static NotifierList kvm_irqchip_change_notifiers =
> >  NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
> >  
> > +struct KVMResampleFd {
> > +int gsi;
> > +EventNotifier *resample_event;
> > +QLIST_ENTRY(KVMResampleFd) node;
> > +};
> > +typedef struct KVMResampleFd KVMResampleFd;
> > +
> > +/*
> > + * Only used with split irqchip where we need to do the resample fd
> > + * kick for the kernel from userspace.
> > + */
> > +static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
> > +QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
> > +
> >  #define kvm_slots_lock(kml)  qemu_mutex_lock(&(kml)->slots_lock)
> >  #define kvm_slots_unlock(kml)qemu_mutex_unlock(&(kml)->slots_lock)
> >  
> > +static inline void kvm_resample_fd_remove(int gsi)
> > +{
> > +KVMResampleFd *rfd;
> > +
> > +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> > +if (rfd->gsi == gsi) {
> > +QLIST_REMOVE(rfd, node);
> > +break;
> > +}
> > +}
> > +}
> > +
> > +static inline void kvm_resample_fd_insert(int gsi, EventNotifier *event)
> > +{
> > +KVMResampleFd *rfd = g_new0(KVMResampleFd, 1);
> > +
> > +rfd->gsi = gsi;
> > +rfd->resample_event = event;
> > +
> > +QLIST_INSERT_HEAD(&kvm_resample_fd_list, rfd, node);
> > +}
> > +
> > +void kvm_resample_fd_notify(int gsi)
> > +{
> > +KVMResampleFd *rfd;
> > +
> > +if (!kvm_irqchip_is_split()) {
> > +return;
> > +}
> > +
> > +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> > +if (rfd->gsi == gsi) {
> > +event_notifier_set(rfd->resample_event);
> > +trace_kvm_resample_fd_notify(gsi);
> > +break;
> > +}
> > +}
> > +}
> > +
> >  int kvm_get_max_memslots(void)
> >  {
> >  KVMSta

Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip

2020-02-27 Thread Auger Eric
Hi Peter,

On 2/27/20 6:00 PM, Peter Xu wrote:
> This is mainly for X86 because that's the only one that supports
> split irqchip for now.
> 
> When the irqchip is split, we face a dilemma that KVM irqfd will be
> enabled, however the slow irqchip is still running in the userspace.
> It means that the resamplefd in the kernel irqfds won't take any
> effect and it can miss to ack INTx interrupts on EOIs.
Won't it always fail to ack INTx? With the above sentence I understand
it can work sometimes?
> 
> One example is split irqchip with VFIO INTx, which will break if we
> use the VFIO INTx fast path.
> 
> This patch can potentially support the VFIO fast path again for INTx,
> that the IRQ delivery will still use the fast path, while we don't
> need to trap MMIOs in QEMU for the device to emulate the EOIs (see the
> callers of vfio_eoi() hook).  However the EOI of the INTx will still
> need to be done from the userspace by caching all the resamplefds in
> QEMU and kick properly for IOAPIC EOI broadcast.
If I understand correctly this is a one way fast path? Fast path is on
the trigger side only: VFIO -> KVM but not on the deactivation side,
trapped by the userspace IOAPIC where you directly notify the UNMASK
eventfd from userspace. Is that correct?
> 
> When the userspace is responsible for the resamplefd kickup, don't
> register it on the kvm_irqfd anymore, because on newer kernels (after
> commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail if with both split
> irqchip and resamplefd.  This will make sure that the fast path will
> work for all supported kernels.
> 
> https://patchwork.kernel.org/patch/10738541/#22609933
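The userspace bookkeeping the commit message describes can be sketched as follows. This is a Python model of the C helpers in the patch, not the patch itself: a dict keyed by GSI stands in for the QLIST of KVMResampleFd structs, and pipe write-ends stand in for the kernel's resample eventfds.

```python
import os

# gsi -> write-end of a pipe, standing in for a kernel resample eventfd.
resample_fds = {}

def resample_fd_insert(gsi, wfd):
    """Cache the resamplefd when the irqfd is assigned (split irqchip only)."""
    resample_fds[gsi] = wfd

def resample_fd_remove(gsi):
    """Drop the cached fd when the irqfd is deassigned."""
    resample_fds.pop(gsi, None)

def resample_fd_notify(gsi):
    """Kick the cached fd; called from the userspace IOAPIC's EOI broadcast."""
    wfd = resample_fds.get(gsi)
    if wfd is None:
        return False               # no resamplefd registered for this GSI
    os.write(wfd, b"\x01")         # the kernel side would now re-arm the irqfd
    return True

# Assign an "irqfd" with a resamplefd for GSI 10, then deliver an EOI.
r, w = os.pipe()
resample_fd_insert(10, w)
kicked = resample_fd_notify(10)    # EOI broadcast reaches GSI 10
payload = os.read(r, 1)            # the "eventfd" was signalled
resample_fd_remove(10)
missed = resample_fd_notify(10)    # after removal, nothing left to kick
```

The point of the model: because the slow irqchip runs in userspace, the EOI never reaches the kernel's resample path, so userspace itself must keep the gsi-to-fd mapping and perform the kick.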
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Peter Xu 
> ---
> v1.1 changelog:
> - when resamplefd is going to be kicked from userspace, don't register
>   it again in KVM_IRQFD.  Tested against upstream kernel.
> 
>  accel/kvm/kvm-all.c| 74 --
>  accel/kvm/trace-events |  1 +
>  hw/intc/ioapic.c   | 11 +--
>  include/sysemu/kvm.h   |  4 +++
>  4 files changed, 86 insertions(+), 4 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index d49b74512a..b766b6e93c 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -159,9 +159,62 @@ static const KVMCapabilityInfo 
> kvm_required_capabilites[] = {
>  static NotifierList kvm_irqchip_change_notifiers =
>  NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
>  
> +struct KVMResampleFd {
> +int gsi;
> +EventNotifier *resample_event;
> +QLIST_ENTRY(KVMResampleFd) node;
> +};
> +typedef struct KVMResampleFd KVMResampleFd;
> +
> +/*
> + * Only used with split irqchip where we need to do the resample fd
> + * kick for the kernel from userspace.
> + */
> +static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
> +QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
> +
>  #define kvm_slots_lock(kml)  qemu_mutex_lock(&(kml)->slots_lock)
>  #define kvm_slots_unlock(kml)qemu_mutex_unlock(&(kml)->slots_lock)
>  
> +static inline void kvm_resample_fd_remove(int gsi)
> +{
> +KVMResampleFd *rfd;
> +
> +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> +if (rfd->gsi == gsi) {
> +QLIST_REMOVE(rfd, node);
> +break;
> +}
> +}
> +}
> +
> +static inline void kvm_resample_fd_insert(int gsi, EventNotifier *event)
> +{
> +KVMResampleFd *rfd = g_new0(KVMResampleFd, 1);
> +
> +rfd->gsi = gsi;
> +rfd->resample_event = event;
> +
> +QLIST_INSERT_HEAD(&kvm_resample_fd_list, rfd, node);
> +}
> +
> +void kvm_resample_fd_notify(int gsi)
> +{
> +KVMResampleFd *rfd;
> +
> +if (!kvm_irqchip_is_split()) {
> +return;
> +}
> +
> +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> +if (rfd->gsi == gsi) {
> +event_notifier_set(rfd->resample_event);
> +trace_kvm_resample_fd_notify(gsi);
> +break;
> +}
> +}
> +}
> +
>  int kvm_get_max_memslots(void)
>  {
>  KVMState *s = KVM_STATE(current_accel());
> @@ -1642,8 +1695,25 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, 
> EventNotifier *event,
>  };
>  
>  if (rfd != -1) {
> -irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
> -irqfd.resamplefd = rfd;
> +assert(assign);
> +if (kvm_irqchip_is_split()) {
> +/*
> + * When the slow irqchip (e.g. IOAPIC) is in the
> + * userspace, resamplefd will not work because the EOI of
> + * the interrupt will be delivered to userspace instead,
s/delivered to userspace/handled in userspace
> + * the KVM resample fd kick is skipped.  The userspace
> + * needs to remember the resamplefd and kick it when we
> + * receive EOI of this IRQ.
Practically we now talk about a VFIO ACTION_UNMASK classical eventfd
As such isn't it a bit weird to handle those normal UNMASK eventfds in
the KVM code?


> + */
> +  

Re: [PATCH v2 6/6] qga: Improve error report by calling error_setg_win32()

2020-02-27 Thread Philippe Mathieu-Daudé

On 2/27/20 6:20 PM, Marc-André Lureau wrote:

Hi

On Thu, Feb 27, 2020 at 5:32 PM Philippe Mathieu-Daudé
 wrote:


Use error_setg_win32() which adds a hint similar to strerror(errno).

Signed-off-by: Philippe Mathieu-Daudé 
---
  qga/channel-win32.c  | 3 ++-
  qga/commands-win32.c | 6 +++---
  2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/qga/channel-win32.c b/qga/channel-win32.c
index 774205e017..4f04868a76 100644
--- a/qga/channel-win32.c
+++ b/qga/channel-win32.c
@@ -308,7 +308,8 @@ static gboolean ga_channel_open(GAChannel *c, 
GAChannelMethod method,
  }

  if (method == GA_CHANNEL_ISA_SERIAL && 
!SetCommTimeouts(c->handle,&comTimeOut)) {
-g_critical("error setting timeout for com port: %lu",GetLastError());
+g_autofree gchar *emsg = g_win32_error_message(GetLastError());
+g_critical("error setting timeout for com port: %s", emsg);
  CloseHandle(c->handle);
  return false;
  }
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 2461fd19bf..8e1f32ea23 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -315,8 +315,7 @@ void qmp_guest_shutdown(bool has_mode, const char *mode, 
Error **errp)
  }

  if (!ExitWindowsEx(shutdown_flag, SHTDN_REASON_FLAG_PLANNED)) {
-slog("guest-shutdown failed: %lu", GetLastError());
-error_setg(errp, QERR_UNDEFINED_ERROR);
+error_setg_win32(errp, GetLastError(), "guest-shutdown failed");


did you drop the slog() intentionally?


Oops no :(




  }
  }

@@ -1319,7 +1318,8 @@ static DWORD WINAPI do_suspend(LPVOID opaque)
  DWORD ret = 0;

  if (!SetSuspendState(*mode == GUEST_SUSPEND_MODE_DISK, TRUE, TRUE)) {
-slog("failed to suspend guest, %lu", GetLastError());
+g_autofree gchar *emsg = g_win32_error_message(GetLastError());
+slog("failed to suspend guest: %s", emsg);
  ret = -1;
  }
  g_free(mode);
--
2.21.1








[Bug 1865048] Re: qemu-img --force-share does not disable file locking

2020-02-27 Thread Max Reitz
Hi,

That’s intentional.  The man page says this:

   --force-share (-U)
   If specified, "qemu-img" will open the image in shared mode,
   allowing other QEMU processes to open it in write mode. For
   example, this can be used to get the image information (with
   'info' subcommand) when the image is used by a running guest.

It says nothing about not using file locks.  All it will do is cause
qemu-img to signal to other processes that it’s OK for them to use the
image in any way, or if there already is another process having opened
the image for any access, qemu-img will not complain.

For example, open a qemu-io process on some image:

$ qemu-io foo.qcow2

And in another shell, invoke qemu-img:

$ qemu-img info foo.qcow2
qemu-img: Could not open 'foo.qcow2': Failed to get shared "write" lock
Is another process using the image [foo.qcow2]?

$ qemu-img info --force-share foo.qcow2
image: foo.qcow2
file format: qcow2
[...]


force-share is evaluated in bdrv_child_perm in block.c.  The file locks qemu 
sets are an extension of the internal “permission” system we use for block 
nodes: For example, when some guest device wants write access to an image, it 
has to take the WRITE permission.  That will be disallowed if there is some 
other user of the image that does not allow taking the WRITE permission (we 
say: it “unshares” the WRITE permission).  force-share enforces sharing all 
permissions, but it does not disable the permission system.

The file locks are used to transmit that internal mechanism of
taking/sharing permissions across different processes.  Unshared
permissions are reflected by locks between offset 200 and 299. Taken
permissions are reflected by locks between offset 100 and 199.  As you
can see, qemu-img with --force-share does not unshare any permissions
(it does not take any locks after offset 200).  The only lock it takes
is offset 100, which corresponds to CONSISTENT_READ.  That permission
means “I want to read from the image and get back something sane”.  So
if any other process uses the image in such a way that this is
impossible (i.e., it has to unshare CONSISTENT_READ), qemu-img info will
complain, regardless of --force-share.
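The byte-offset scheme above can be demonstrated outside qemu, since POSIX advisory locks are per-byte: one process marks a permission as taken by locking its byte, and another probes the byte to see if it is held. The sketch below uses plain POSIX locks between two forked processes (qemu itself uses OFD locks from C in block/file-posix.c, so it also works across fds within one process), and the permission numbers are illustrative, not qemu's real enum values.

```python
import fcntl, os, tempfile

# Illustrative permission numbers; byte 100+p means "p is taken",
# byte 200+p means "p is unshared".
CONSISTENT_READ, WRITE = 0, 1

tfd, path = tempfile.mkstemp()
os.close(tfd)

r1, w1 = os.pipe()   # child -> parent: "my lock is in place"
r2, w2 = os.pipe()   # parent -> child: "you may release it"

pid = os.fork()
if pid == 0:                       # child: plays a process that has the image open
    fd = os.open(path, os.O_RDWR)
    # Shared lock on byte 100+p marks permission p as *taken*.
    fcntl.lockf(fd, fcntl.LOCK_SH, 1, 100 + CONSISTENT_READ)
    os.write(w1, b"x")
    os.read(r2, 1)                 # hold the lock until the parent has probed
    os._exit(0)

os.read(r1, 1)                     # wait until the child holds its lock
fd = os.open(path, os.O_RDWR)

def is_locked(offset):
    """Probe one byte: a non-blocking exclusive lock fails iff it is held."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset)
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset)
        return False
    except OSError:
        return True

taken = is_locked(100 + CONSISTENT_READ)   # the child signals CONSISTENT_READ is taken
unshared = is_locked(200 + WRITE)          # nobody unshares WRITE, so the byte is free
os.write(w2, b"x")
os.waitpid(pid, 0)
os.unlink(path)
```

On Linux this finds the CONSISTENT_READ "taken" byte held while the WRITE "unshared" byte is free, which is the same pattern the strace output in the bug report shows for qemu-img with --force-share: locks after offset 100, none after offset 200.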


File locks can only be completely disabled through file-posix’s @locking option 
(locking=false), e.g. like so:

$ qemu-img info --image-opts file.filename=foo.qcow2,file.locking=off

But that is strongly discouraged, and I have yet to see a case where
this would be the right thing to do.

Max

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1865048

Title:
  qemu-img --force-share does not disable file locking

Status in QEMU:
  Invalid

Bug description:
  The new option "--force-share" for qemu-img does not disable file
  locking.

  I tried it with version qemu-img version 2.11.1(Debian 1:2.11+dfsg-
  1ubuntu7.21~cloud0) and I traced the source code of the current git
  trunk.

  Sample to demonstrate:

  # strace qemu-img info --force-share testfile.qcow2   2>&1 | grep F_RDLCK
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0
  fcntl(11, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, 
l_len=1}) = 0

  I traced the passing of the --force-share option through the source
  code (I used commit 6c599282f8 as of Mon Feb 17 13:32:25 2020 +)

  qemu-img.c:img_info()
  force_share = true;
  qemu-img.c:collect_image_info_list(force_share)
  qemu-img.c:img_open(force_share)
  qemu-img.c:img_open_file(force_share)
  qdict_put_bool(options, BDRV_OPT_FORCE_SHARE, true);
  block/block-backend.c:blk_new_open(options)
  block.c:bdrv_open(options)
  block.c:bdrv_open_inherit(options)
  block.c:bdrv_open_common(options)
  bs->force_share = qemu_opt_get_bool(opts, BDRV_OPT_FORCE_SHARE, 
false);
  block.c:bdrv_open_driver(bs)
  include/block/block_int.h:int (*bdrv_file_open)(BlockDriverState *bs, QDict 
*options, int flags,
  block/file-posix.c:.bdrv_file_open = raw_open,
  block/file-posix.c:raw_open_common(bs)
  locking = qapi_enum_parse(&OnOffAuto_lookup,
qemu_opt_get(opts, "locking"),
ON_OFF_AUTO_AUTO, &local_err);
  ign

Re: [PATCH RISU] aarch64.risu: Add patterns for v8.3-RCPC and v8.4-RCPC insns

2020-02-27 Thread Peter Maydell
On Tue, 25 Feb 2020 at 20:32, Alex Bennée  wrote:
>
>
> Peter Maydell  writes:
>
> > Add patterns for the new instructions in the v8.3-RCPC and
> > v8.4-RCPC extensions.
> >
> > Signed-off-by: Peter Maydell 
> > ---
> > This is what I used for testing the RCPC QEMU patches I sent out
> > the other day. Did I get the @ section syntax here right?
>
> Yep ;-)
>
> Reviewed-by: Alex Bennée 

Thanks; applied to risu master.

-- PMM



Re: [PATCH v2 4/6] util/osdep: Improve error report by calling error_setg_win32()

2020-02-27 Thread Marc-André Lureau
On Thu, Feb 27, 2020 at 5:32 PM Philippe Mathieu-Daudé
 wrote:
>
> Use error_setg_win32() which adds a hint similar to strerror(errno).
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Marc-André Lureau 


> ---
>  util/osdep.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index ef40ae512a..144e217cb9 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -82,8 +82,8 @@ static int qemu_mprotect__osdep(void *addr, size_t size, 
> int prot)
>  DWORD old_protect;
>
>  if (!VirtualProtect(addr, size, prot, &old_protect)) {
> -error_report("%s: VirtualProtect failed with error code %ld",
> - __func__, GetLastError());
> +g_autofree gchar *emsg = g_win32_error_message(GetLastError());
> +error_report("%s: VirtualProtect failed: %s", __func__, emsg);
>  return -1;
>  }
>  return 0;
> @@ -506,12 +506,12 @@ int socket_init(void)
>  {
>  #ifdef _WIN32
>  WSADATA Data;
> -int ret, err;
> +int ret;
>
>  ret = WSAStartup(MAKEWORD(2, 2), &Data);
>  if (ret != 0) {
> -err = WSAGetLastError();
> -error_report("WSAStartup: %d", err);
> +g_autofree gchar *emsg = g_win32_error_message(WSAGetLastError());
> +error_report("WSAStartup: %s", emsg);
>  return -1;
>  }
>  atexit(socket_cleanup);
> --
> 2.21.1
>




Re: [PATCH v2 5/6] qga: Fix a memory leak

2020-02-27 Thread Marc-André Lureau
On Thu, Feb 27, 2020 at 5:32 PM Philippe Mathieu-Daudé
 wrote:
>
> The string returned by g_win32_error_message() has to be
> deallocated with g_free().
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Marc-André Lureau 


> ---
>  qga/channel-win32.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/qga/channel-win32.c b/qga/channel-win32.c
> index c86f4388db..774205e017 100644
> --- a/qga/channel-win32.c
> +++ b/qga/channel-win32.c
> @@ -302,8 +302,8 @@ static gboolean ga_channel_open(GAChannel *c, 
> GAChannelMethod method,
> OPEN_EXISTING,
> FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, 
> NULL);
>  if (c->handle == INVALID_HANDLE_VALUE) {
> -g_critical("error opening path %s: %s", newpath,
> -   g_win32_error_message(GetLastError()));
> +g_autofree gchar *emsg = g_win32_error_message(GetLastError());
> +g_critical("error opening path %s: %s", newpath, emsg);
>  return false;
>  }
>
> --
> 2.21.1
>




Re: [PATCH v2 6/6] qga: Improve error report by calling error_setg_win32()

2020-02-27 Thread Marc-André Lureau
Hi

On Thu, Feb 27, 2020 at 5:32 PM Philippe Mathieu-Daudé
 wrote:
>
> Use error_setg_win32() which adds a hint similar to strerror(errno).
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  qga/channel-win32.c  | 3 ++-
>  qga/commands-win32.c | 6 +++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/qga/channel-win32.c b/qga/channel-win32.c
> index 774205e017..4f04868a76 100644
> --- a/qga/channel-win32.c
> +++ b/qga/channel-win32.c
> @@ -308,7 +308,8 @@ static gboolean ga_channel_open(GAChannel *c, 
> GAChannelMethod method,
>  }
>
>  if (method == GA_CHANNEL_ISA_SERIAL && 
> !SetCommTimeouts(c->handle,&comTimeOut)) {
> -g_critical("error setting timeout for com port: %lu",GetLastError());
> +g_autofree gchar *emsg = g_win32_error_message(GetLastError());
> +g_critical("error setting timeout for com port: %s", emsg);
>  CloseHandle(c->handle);
>  return false;
>  }
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index 2461fd19bf..8e1f32ea23 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -315,8 +315,7 @@ void qmp_guest_shutdown(bool has_mode, const char *mode, 
> Error **errp)
>  }
>
>  if (!ExitWindowsEx(shutdown_flag, SHTDN_REASON_FLAG_PLANNED)) {
> -slog("guest-shutdown failed: %lu", GetLastError());
> -error_setg(errp, QERR_UNDEFINED_ERROR);
> +error_setg_win32(errp, GetLastError(), "guest-shutdown failed");

did you drop the slog() intentionally?

>  }
>  }
>
> @@ -1319,7 +1318,8 @@ static DWORD WINAPI do_suspend(LPVOID opaque)
>  DWORD ret = 0;
>
>  if (!SetSuspendState(*mode == GUEST_SUSPEND_MODE_DISK, TRUE, TRUE)) {
> -slog("failed to suspend guest, %lu", GetLastError());
> +g_autofree gchar *emsg = g_win32_error_message(GetLastError());
> +slog("failed to suspend guest: %s", emsg);
>  ret = -1;
>  }
>  g_free(mode);
> --
> 2.21.1
>




Re: [PATCH v5 48/50] multi-process: Validate incoming commands from Proxy

2020-02-27 Thread Stefan Hajnoczi
On Mon, Feb 24, 2020 at 03:55:39PM -0500, Jagannathan Raman wrote:
> From: Elena Ufimtseva 
> 
> Validate the incoming commands to confirm that they would not cause any
> errors in the remote process.
> 
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: Jagannathan Raman 
> Signed-off-by: John G Johnson 
> ---
>  hw/proxy/qemu-proxy.c|  6 +++-
>  include/io/mpqemu-link.h |  2 ++
>  io/mpqemu-link.c | 75 
> +++-
>  remote/remote-main.c |  4 +++
>  4 files changed, 85 insertions(+), 2 deletions(-)

Please squash this into the patch(es) that introduced the code.

Reviewers want to see a logical sequence of patches.  Introducing
unsafe code in an earlier patch and adding checks in a later patch makes
it impossible to review the patches in sequence (reviewers would waste
time pointing out bugs that end up getting fixed later).

> diff --git a/remote/remote-main.c b/remote/remote-main.c
> index 20d160e..c4aa3e0 100644
> --- a/remote/remote-main.c
> +++ b/remote/remote-main.c
> @@ -435,6 +435,10 @@ static void process_msg(GIOCondition cond, MPQemuChannel 
> *chan)
>  if (msg->id > MAX_REMOTE_DEVICES) {
>  error_setg(&err, "id of the device is larger than max number of "\
>   "devices per remote process.");
> +}

Was goto finalize_loop accidentally dropped?


signature.asc
Description: PGP signature


Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip

2020-02-27 Thread Peter Xu
On Thu, Feb 27, 2020 at 12:00:48PM -0500, Peter Xu wrote:
> +static inline void kvm_resample_fd_remove(int gsi)
> +{
> +KVMResampleFd *rfd;
> +
> +QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> +if (rfd->gsi == gsi) {
> +QLIST_REMOVE(rfd, node);

Oops, rfd is leaked...  Will fix that in v2.

> +break;
> +}
> +}
> +}

-- 
Peter Xu



