[Qemu-devel] [Bug 1824768] Re: Qemu ARMv7 TCG MultiThreading for i386 guest doesn't work

2019-06-21 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824768

Title:
  Qemu ARMv7 TCG MultiThreading for i386 guest doesn't work

Status in QEMU:
  Expired

Bug description:
  Using any Linux image (in this case an Alpine Linux ISO) I want to
  utilise all cores of my Raspberry Pi with --accel tcg,thread=multi. I
  know there is probably still a problem with the host's memory ordering,
  but I have also seen some very old commits which could potentially
  help with it.

  But anyway, with qemu-i386 version 3.1.0 (Debian 1:3.1+dfsg-7)
  I can see OpenRC starting up services and then the kernel crashes.

  With QEMU emulator version 3.1.93 (v4.0.0-rc3-dirty)
  the whole machine crashes with this error:
  Illegal instruction

  Using command:
  ./qemu-system-i386 -cdrom alpine.iso --accel tcg,thread=multi

  Full Console Output:
  qemu-system-i386: warning: Guest expects a stronger memory ordering than the 
host provides
  This may cause strange/hard to debug errors
  Illegal instruction

  Kernel:
  Linux raspberrypi 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l 
GNU/Linux

  CPU:
  ARMv7 Processor rev 5 (v7l)
  Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
vfpd32 lpae evtstrm
  4 cores

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824768/+subscriptions



Re: [Qemu-devel] [PATCH 0/2] target/i386: kvm: Fix treatment of AMD SVM in nested migration

2019-06-21 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20190621213712.16222-1-liran.a...@oracle.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH 0/2] target/i386: kvm: Fix treatment of AMD SVM in 
nested migration
Type: series
Message-id: 20190621213712.16222-1-liran.a...@oracle.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20190621213712.16222-1-liran.a...@oracle.com 
-> patchew/20190621213712.16222-1-liran.a...@oracle.com
Switched to a new branch 'test'
a5de9408a8 target/i386: kvm: Init nested-state in case of vCPU exposed with SVM
06ca99d907 target/i386: kvm: Block migration on vCPU exposed with SVM when 
kernel lacks caps to save/restore nested state

=== OUTPUT BEGIN ===
1/2 Checking commit 06ca99d907bc (target/i386: kvm: Block migration on vCPU 
exposed with SVM when kernel lacks caps to save/restore nested state)
ERROR: return is not a function, parentheses are not required
#46: FILE: target/i386/cpu.h:1877:
+return (cpu_has_vmx(env) || cpu_has_svm(env));

total: 1 errors, 0 warnings, 32 lines checked

Patch 1/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/2 Checking commit a5de9408a89a (target/i386: kvm: Init nested-state in case 
of vCPU exposed with SVM)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190621213712.16222-1-liran.a...@oracle.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] patch to swap SIGRTMIN + 1 and SIGRTMAX - 1

2019-06-21 Thread Marlies Ruck
Hi,

Attached is a patch to let guest programs use SIGRTMIN + 1 by swapping it with
SIGRTMAX - 1. Since QEMU links against glibc, glibc reserves that signal for
itself and attempts to use it return EINVAL (as noted in the commit message).
This means various applications that use SIGRTMIN + 1 cannot run on QEMU,
including the G-WAN web server and Open TFTP.

Thanks,
Marli


0001-Swap-SIGRTMIN-1-and-SIGRTMAX-1.patch
Description: Binary data


Re: [Qemu-devel] [PATCH 0/2] target/i386: kvm: Fix treatment of AMD SVM in nested migration

2019-06-21 Thread Liran Alon



> On 22 Jun 2019, at 2:59, Paolo Bonzini  wrote:
> 
> On 21/06/19 23:37, Liran Alon wrote:
>> However, during discussion after the merge, it was realised that since
>> QEMU commit 75d373ef9729 ("target-i386: Disable SVM by default in KVM
>> mode"), an AMD vCPU that is virtualized by KVM doesn't expose SVM by
>> default, even if you use "-cpu host". Therefore, it is unlikely that a
>> vCPU exposes the SVM CPUID flag when the user is not running an SVM
>> workload inside the guest.
> 
> libvirt has "host-model" mode, which constructs a "-cpu
> model,+feature,+feature" command line option that matches the host as
> closely as possible.  This lets libvirt check migratability while
> retaining a lot of the benefits of "-cpu host", and is the default for
> OpenStack, for example.  I need to check whether libvirt adds SVM to this
> configuration; if it does, the QEMU commit you mention is unfortunately
> not enough.
> 
> Paolo

Hmm, nice catch. I hadn't thought about it (I'm not very familiar with libvirt).
I agree that if libvirt adds SVM to this configuration then we must not block
migration for an AMD vCPU that is exposed with SVM… :(

Please update once you figure out libvirt behaviour regarding this,
-Liran




Re: [Qemu-devel] [PATCH v1 0/9] Update the RISC-V specification versions

2019-06-21 Thread Alistair Francis
On Thu, Jun 20, 2019 at 7:49 PM Palmer Dabbelt  wrote:
>
> On Wed, 19 Jun 2019 07:19:38 PDT (-0700), alistai...@gmail.com wrote:
> > On Wed, Jun 19, 2019 at 3:58 AM Palmer Dabbelt  wrote:
> >>
> >> On Mon, 17 Jun 2019 18:31:00 PDT (-0700), Alistair Francis wrote:
> >> > Based-on: 
> >> >
> >> > Now that the RISC-V spec has started to be ratified let's update our
> >> > QEMU implementation. There are a few things going on here:
> >> >  - Add priv version 1.11.0 to QEMU
> >> > - This is the ratified version of the Privilege spec
> >> > - There are almost no changes to 1.10
> >> >  - Mark the 1.09.1 privilege spec as deprecated
> >> >  - Let's aim to remove it in two releases
> >> >  - Set priv version 1.11.0 as the default
> >> >  - Remove the user_spec version
> >> >  - This doesn't really mean anything so let's remove it
> >> >  - Add support for the "Counters" extension
> >> >  - Add command line options for Zifencei and Zicsr
> >>
> >> Thanks!  I'll look at the code, but I've currently got this queued up 
> >> behind
> >> your hypervisor patches so it might take a bit.  LMK if you want me to 
> >> invert
> >> the priority on these.  I'll probably be buried until the start of July.
> >
> > Let's move the Hypervisor patches to the back then. There is a new
> > spec version now anyway so I'll have to update them for that.
>
> OK.  Do you want me to just drop them and wait for a v2 / draft 0.4?

I haven't looked at the 0.4 yet, but I think there are still lots of
similarities so let's just put Hypervisor patches at the back of the
list and see if you get there. It would still be nice to have comments
on the v1.

Alistair

>
> >
> > Alistair
> >
> >>
> >> > We can remove the spec version as it's unused and has never been exposed
> >> > to users. The idea is to match the specs in specifying the version. To
> >> > handle versions in the future we can extend the extension props to
> >> > handle version information.
> >> >
> >> > For example something like this: -cpu 
> >> > rv64,i=2.2,c=2.0,h=0.4,priv_spec=1.11
> >> >
> >> > NOTE: This isn't supported today as we only have one of each version.
> >> >
> >> > This will be a future change if we decide to support multiple versions
> >> > of extensions.
> >> >
> >> > The "priv_spec" string doesn't really match, but I don't have a better
> >> > way to say "Machine ISA" and "Supervisor ISA" which is what is included
> >> > in "priv_spec".
> >> >
> >> > For completeness I have also added the Counters, Zifencei and Zicsr
> >> > extensions.
> >> >
> >> > Everything else seems to match the spec names/style.
> >> >
> >> > Please let me know if I'm missing something. QEMU 4.1 is the first
> >> > release to support the extensions from the command line, so we can
> >> > easily change it until then. After that it'll take more work to change
> >> > the command line interface.
> >> >
> >> > Alistair Francis (9):
> >> >   target/riscv: Restructure deprecatd CPUs
> >> >   target/riscv: Add the privledge spec version 1.11.0
> >> >   target/riscv: Comment in the mcountinhibit CSR
> >> >   target/riscv: Set privledge spec 1.11.0 as default
> >> >   qemu-deprecated.texi: Deprecate the RISC-V privledge spec 1.09.1
> >> >   target/riscv: Require either I or E base extension
> >> >   target/riscv: Remove user version information
> >> >   target/riscv: Add support for disabling/enabling Counters
> >> >   target/riscv: Add Zifencei and Zicsr as command line options
> >> >
> >> >  qemu-deprecated.texi  |  8 +++
> >> >  target/riscv/cpu.c| 72 ++-
> >> >  target/riscv/cpu.h| 19 ++---
> >> >  target/riscv/cpu_bits.h   |  1 +
> >> >  target/riscv/csr.c| 13 +++-
> >> >  .../riscv/insn_trans/trans_privileged.inc.c   |  2 +-
> >> >  6 files changed, 71 insertions(+), 44 deletions(-)



[Qemu-devel] [PATCH v2] ioapic: use irq number instead of vector in ioapic_eoi_broadcast

2019-06-21 Thread Li Qiang
When emulating the irqchip in QEMU, as with the following command:

x86_64-softmmu/qemu-system-x86_64 -m 1024 -smp 4 -hda /home/test/test.img
-machine kernel-irqchip=off --enable-kvm -vnc :0 -device edu -monitor stdio

We will get a crash with the following ASan output:

(qemu) /home/test/qemu5/qemu/hw/intc/ioapic.c:266:27: runtime error: index 35 
out of bounds for type 'int [24]'
=
==113504==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x61b03114 at pc 0x5579e3c7a80f bp 0x7fd004bf8c10 sp 0x7fd004bf8c00
WRITE of size 4 at 0x61b03114 thread T4
#0 0x5579e3c7a80e in ioapic_eoi_broadcast 
/home/test/qemu5/qemu/hw/intc/ioapic.c:266
#1 0x5579e3c6f480 in apic_eoi /home/test/qemu5/qemu/hw/intc/apic.c:428
#2 0x5579e3c720a7 in apic_mem_write /home/test/qemu5/qemu/hw/intc/apic.c:802
#3 0x5579e3b1e31a in memory_region_write_accessor 
/home/test/qemu5/qemu/memory.c:503
#4 0x5579e3b1e6a2 in access_with_adjusted_size 
/home/test/qemu5/qemu/memory.c:569
#5 0x5579e3b28d77 in memory_region_dispatch_write 
/home/test/qemu5/qemu/memory.c:1497
#6 0x5579e3a1b36b in flatview_write_continue 
/home/test/qemu5/qemu/exec.c:3323
#7 0x5579e3a1b633 in flatview_write /home/test/qemu5/qemu/exec.c:3362
#8 0x5579e3a1bcb1 in address_space_write /home/test/qemu5/qemu/exec.c:3452
#9 0x5579e3a1bd03 in address_space_rw /home/test/qemu5/qemu/exec.c:3463
#10 0x5579e3b8b979 in kvm_cpu_exec 
/home/test/qemu5/qemu/accel/kvm/kvm-all.c:2045
#11 0x5579e3ae4499 in qemu_kvm_cpu_thread_fn 
/home/test/qemu5/qemu/cpus.c:1287
#12 0x5579e4cbdb9f in qemu_thread_start util/qemu-thread-posix.c:502
#13 0x7fd0146376da in start_thread 
(/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#14 0x7fd01436088e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e

This is because the ioapic_eoi_broadcast function uses 'vector' to
index 's->irq_eoi'. To fix this, we should use the irq number.

Signed-off-by: Li Qiang 
---
Change since v1:
remove auto-generated unnecessary message

 hw/intc/ioapic.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 7074489fdf..711775cc6f 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -245,8 +245,8 @@ void ioapic_eoi_broadcast(int vector)
 s->ioredtbl[n] = entry & ~IOAPIC_LVT_REMOTE_IRR;
 
 if (!(entry & IOAPIC_LVT_MASKED) && (s->irr & (1 << n))) {
-++s->irq_eoi[vector];
-if (s->irq_eoi[vector] >= SUCCESSIVE_IRQ_MAX_COUNT) {
+++s->irq_eoi[n];
+if (s->irq_eoi[n] >= SUCCESSIVE_IRQ_MAX_COUNT) {
 /*
  * Real hardware does not deliver the interrupt immediately
  * during eoi broadcast, and this lets a buggy guest make
@@ -254,16 +254,16 @@ void ioapic_eoi_broadcast(int vector)
  * level-triggered interrupt. Emulate this behavior if we
  * detect an interrupt storm.
  */
-s->irq_eoi[vector] = 0;
+s->irq_eoi[n] = 0;
 timer_mod_anticipate(s->delayed_ioapic_service_timer,
  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) 
+
  NANOSECONDS_PER_SECOND / 100);
-trace_ioapic_eoi_delayed_reassert(vector);
+trace_ioapic_eoi_delayed_reassert(n);
 } else {
 ioapic_service(s);
 }
 } else {
-s->irq_eoi[vector] = 0;
+s->irq_eoi[n] = 0;
 }
 }
 }
-- 
2.17.1





Re: [Qemu-devel] [PATCH] ioapic: use irq number instead of vector in ioapic_eoi_broadcast

2019-06-21 Thread Li Qiang
Li Qiang wrote on Sat, Jun 22, 2019 at 12:15 AM:

> When emulating the irqchip in QEMU, as with the following command:
>
> x86_64-softmmu/qemu-system-x86_64 -m 1024 -smp 4 -hda /home/test/test.img
> -machine kernel-irqchip=off --enable-kvm -vnc :0 -device edu -monitor stdio
>
> We will get a crash with the following ASan output:
>
> (qemu) /home/test/qemu5/qemu/hw/intc/ioapic.c:266:27: runtime error: index
> 35 out of bounds for type 'int [24]'
> =
> ==113504==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x61b03114 at pc 0x5579e3c7a80f bp 0x7fd004bf8c10 sp 0x7fd004bf8c00
> WRITE of size 4 at 0x61b03114 thread T4
> #0 0x5579e3c7a80e in ioapic_eoi_broadcast
> /home/test/qemu5/qemu/hw/intc/ioapic.c:266
> #1 0x5579e3c6f480 in apic_eoi /home/test/qemu5/qemu/hw/intc/apic.c:428
> #2 0x5579e3c720a7 in apic_mem_write
> /home/test/qemu5/qemu/hw/intc/apic.c:802
> #3 0x5579e3b1e31a in memory_region_write_accessor
> /home/test/qemu5/qemu/memory.c:503
> #4 0x5579e3b1e6a2 in access_with_adjusted_size
> /home/test/qemu5/qemu/memory.c:569
> #5 0x5579e3b28d77 in memory_region_dispatch_write
> /home/test/qemu5/qemu/memory.c:1497
> #6 0x5579e3a1b36b in flatview_write_continue
> /home/test/qemu5/qemu/exec.c:3323
> #7 0x5579e3a1b633 in flatview_write /home/test/qemu5/qemu/exec.c:3362
> #8 0x5579e3a1bcb1 in address_space_write
> /home/test/qemu5/qemu/exec.c:3452
> #9 0x5579e3a1bd03 in address_space_rw /home/test/qemu5/qemu/exec.c:3463
> #10 0x5579e3b8b979 in kvm_cpu_exec
> /home/test/qemu5/qemu/accel/kvm/kvm-all.c:2045
> #11 0x5579e3ae4499 in qemu_kvm_cpu_thread_fn
> /home/test/qemu5/qemu/cpus.c:1287
> #12 0x5579e4cbdb9f in qemu_thread_start util/qemu-thread-posix.c:502
> #13 0x7fd0146376da in start_thread
> (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
> #14 0x7fd01436088e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e
>
> This is because the ioapic_eoi_broadcast function uses 'vector' to
> index 's->irq_eoi'. To fix this, we should use the irq number.
>
> # Please enter the commit message for your changes. Lines starting
> # with '#' will be kept; you may remove them yourself if you want to.
> # An empty message aborts the commit.
> #
> # On branch master
> # Your branch is up to date with 'origin/master'.
> #
> # Changes to be committed:
> #   modified:   hw/intc/ioapic.c
> #
> # Untracked files:
> #   0001-migration-fix-a-typo.patch
> #   roms/vgabios/
> #   vhost-user-input
> #
>
>

Oops, I forgot to delete this auto-generated message.
I have sent out a revision; please review the v2 patch.

Thanks,
Li Qiang




> Signed-off-by: Li Qiang 
> ---
>  hw/intc/ioapic.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 7074489fdf..711775cc6f 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -245,8 +245,8 @@ void ioapic_eoi_broadcast(int vector)
>  s->ioredtbl[n] = entry & ~IOAPIC_LVT_REMOTE_IRR;
>
>  if (!(entry & IOAPIC_LVT_MASKED) && (s->irr & (1 << n))) {
> -++s->irq_eoi[vector];
> -if (s->irq_eoi[vector] >= SUCCESSIVE_IRQ_MAX_COUNT) {
> +++s->irq_eoi[n];
> +if (s->irq_eoi[n] >= SUCCESSIVE_IRQ_MAX_COUNT) {
>  /*
>   * Real hardware does not deliver the interrupt
> immediately
>   * during eoi broadcast, and this lets a buggy guest
> make
> @@ -254,16 +254,16 @@ void ioapic_eoi_broadcast(int vector)
>   * level-triggered interrupt. Emulate this behavior
> if we
>   * detect an interrupt storm.
>   */
> -s->irq_eoi[vector] = 0;
> +s->irq_eoi[n] = 0;
>  timer_mod_anticipate(s->delayed_ioapic_service_timer,
>
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>   NANOSECONDS_PER_SECOND / 100);
> -trace_ioapic_eoi_delayed_reassert(vector);
> +trace_ioapic_eoi_delayed_reassert(n);
>  } else {
>  ioapic_service(s);
>  }
>  } else {
> -s->irq_eoi[vector] = 0;
> +s->irq_eoi[n] = 0;
>  }
>  }
>  }
> --
> 2.17.1
>
>
>


Re: [Qemu-devel] [PULL SUBSYSTEM s390x 0/3] s390x/tcg: pending patches

2019-06-21 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190621134338.8425-1-da...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PULL SUBSYSTEM s390x 0/3] s390x/tcg: pending patches
Type: series
Message-id: 20190621134338.8425-1-da...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20190621134338.8425-1-da...@redhat.com -> 
patchew/20190621134338.8425-1-da...@redhat.com
Switched to a new branch 'test'
8141e7bee6 s390x/cpumodel: Prepend KDSA features with "KDSA"
022ac2dcb4 s390x/cpumodel: Rework CPU feature definition
1d587a2a49 tests/tcg/s390x: Fix alignment of csst parameter list

=== OUTPUT BEGIN ===
1/3 Checking commit 1d587a2a49b8 (tests/tcg/s390x: Fix alignment of csst 
parameter list)
2/3 Checking commit 022ac2dcb497 (s390x/cpumodel: Rework CPU feature definition)
ERROR: Macros with complex values should be enclosed in parenthesis
#407: FILE: target/s390x/cpu_features_def.h:18:
+#define DEF_FEAT(_FEAT, ...) S390_FEAT_##_FEAT,

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#762: 
new file mode 100644

WARNING: line over 80 characters
#792: FILE: target/s390x/cpu_features_def.inc.h:26:
+DEF_FEAT(IDTE_SEGMENT, "idtes", STFL, 4, "IDTE selective TLB segment-table 
clearing")

WARNING: line over 80 characters
#793: FILE: target/s390x/cpu_features_def.inc.h:27:
+DEF_FEAT(IDTE_REGION, "idter", STFL, 5, "IDTE selective TLB region-table 
clearing")

WARNING: line over 80 characters
#799: FILE: target/s390x/cpu_features_def.inc.h:33:
+DEF_FEAT(CONFIGURATION_TOPOLOGY, "ctop", STFL, 11, "Configuration-topology 
facility")

ERROR: line over 90 characters
#800: FILE: target/s390x/cpu_features_def.inc.h:34:
+DEF_FEAT(AP_QUERY_CONFIG_INFO, "apqci", STFL, 12, "Query AP Configuration 
Information facility")

WARNING: line over 80 characters
#802: FILE: target/s390x/cpu_features_def.inc.h:36:
+DEF_FEAT(NONQ_KEY_SETTING, "nonqks", STFL, 14, "Nonquiescing key-setting 
facility")

WARNING: line over 80 characters
#804: FILE: target/s390x/cpu_features_def.inc.h:38:
+DEF_FEAT(EXTENDED_TRANSLATION_2, "etf2", STFL, 16, "Extended-translation 
facility 2")

ERROR: line over 90 characters
#805: FILE: target/s390x/cpu_features_def.inc.h:39:
+DEF_FEAT(MSA, "msa-base", STFL, 17, "Message-security-assist facility 
(excluding subfunctions)")

ERROR: line over 90 characters
#807: FILE: target/s390x/cpu_features_def.inc.h:41:
+DEF_FEAT(LONG_DISPLACEMENT_FAST, "ldisphp", STFL, 19, "Long-displacement 
facility has high performance")

WARNING: line over 80 characters
#810: FILE: target/s390x/cpu_features_def.inc.h:44:
+DEF_FEAT(EXTENDED_TRANSLATION_3, "etf3", STFL, 22, "Extended-translation 
facility 3")

WARNING: line over 80 characters
#811: FILE: target/s390x/cpu_features_def.inc.h:45:
+DEF_FEAT(HFP_UNNORMALIZED_EXT, "hfpue", STFL, 23, "HFP-unnormalized-extension 
facility")

ERROR: line over 90 characters
#815: FILE: target/s390x/cpu_features_def.inc.h:49:
+DEF_FEAT(MOVE_WITH_OPTIONAL_SPEC, "mvcos", STFL, 27, 
"Move-with-optional-specification facility")

ERROR: line over 90 characters
#816: FILE: target/s390x/cpu_features_def.inc.h:50:
+DEF_FEAT(TOD_CLOCK_STEERING, "tods-base", STFL, 28, "TOD-clock-steering 
facility (excluding subfunctions)")

ERROR: line over 90 characters
#819: FILE: target/s390x/cpu_features_def.inc.h:53:
+DEF_FEAT(COMPARE_AND_SWAP_AND_STORE, "csst", STFL, 32, 
"Compare-and-swap-and-store facility")

ERROR: line over 90 characters
#820: FILE: target/s390x/cpu_features_def.inc.h:54:
+DEF_FEAT(COMPARE_AND_SWAP_AND_STORE_2, "csst2", STFL, 33, 
"Compare-and-swap-and-store facility 2")

ERROR: line over 90 characters
#821: FILE: target/s390x/cpu_features_def.inc.h:55:
+DEF_FEAT(GENERAL_INSTRUCTIONS_EXT, "ginste", STFL, 34, 
"General-instructions-extension facility")

WARNING: line over 80 characters
#824: FILE: target/s390x/cpu_features_def.inc.h:58:
+DEF_FEAT(FLOATING_POINT_EXT, "fpe", STFL, 37, "Floating-point extension 
facility")

ERROR: line over 90 characters
#825: FILE: target/s390x/cpu_features_def.inc.h:59:
+DEF_FEAT(ORDER_PRESERVING_COMPRESSION, "opc", STFL, 38, "Order Preserving 
Compression facility")

WARNING: line over 80 characters
#826: FILE: target/s390x/cpu_features_def.inc.h:60:
+DEF_FEAT(SET_PROGRAM_PARAMETERS, "sprogp", STFL, 40, "Set-program-parameters 
facility")

ERROR: line over 90 characters
#827: FILE: target/s390x/cpu_features_def.inc.h:61:
+DEF_FEAT(FLOATING_POINT_SUPPPORT_ENH, "fpseh", STFL, 41, 
"Floating-point-support-enhancement facilities")

ERROR: line over 90 characters
#829: FILE: target/s390x/cpu_features_def.inc.h:63:
+DEF_FEAT(DFP_FAST, "dfphp", STFL, 43, "DFP 

Re: [Qemu-devel] [PATCH 0/2] target/i386: kvm: Fix treatment of AMD SVM in nested migration

2019-06-21 Thread Paolo Bonzini
On 21/06/19 23:37, Liran Alon wrote:
> However, during discussion after the merge, it was realised that since
> QEMU commit 75d373ef9729 ("target-i386: Disable SVM by default in KVM
> mode"), an AMD vCPU that is virtualized by KVM doesn't expose SVM by
> default, even if you use "-cpu host". Therefore, it is unlikely that a
> vCPU exposes the SVM CPUID flag when the user is not running an SVM
> workload inside the guest.

libvirt has "host-model" mode, which constructs a "-cpu
model,+feature,+feature" command line option that matches the host as
closely as possible.  This lets libvirt check migratability while
retaining a lot of the benefits of "-cpu host", and is the default for
OpenStack, for example.  I need to check whether libvirt adds SVM to this
configuration; if it does, the QEMU commit you mention is unfortunately
not enough.

Paolo



[Qemu-devel] [RFC PATCH] Acceptance tests: run generic tests on all built targets

2019-06-21 Thread Cleber Rosa
The intention here is to discuss the validity of running the acceptance
tests that are not dependent on target-specific functionality on all
built targets.

Subtle but important questions that this topic brings:

 1) Should the QEMU binary target provide, as much as possible,
a similar experience across targets, or should upper layer
code deal with it?

 2) Are those tests exercising the same exact features and
implementation, which just happen to be linked to various
different binaries?  Or is the simple fact that they are
integrated into different code worth testing?

 3) Should the default target match the host target?  Or the
first binary found in the build tree?  Or something else?

An example of a Travis CI job based on this can be seen here:

 https://travis-ci.org/clebergnu/qemu/jobs/548905146

Signed-off-by: Cleber Rosa 
---
 .travis.yml   |  2 +-
 tests/Makefile.include| 18 +-
 tests/acceptance/avocado_qemu/__init__.py |  9 +
 tests/acceptance/cpu_queries.py   |  3 +++
 tests/acceptance/migration.py |  4 +++-
 5 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index aeb9b211cd..310febb866 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -232,7 +232,7 @@ matrix:
 # Acceptance (Functional) tests
 - env:
 - CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,aarch64-softmmu,arm-softmmu,s390x-softmmu,alpha-softmmu"
-- TEST_CMD="make check-acceptance"
+- TEST_CMD="make check-acceptance-all"
   after_failure:
 - cat tests/results/latest/job.log
   addons:
diff --git a/tests/Makefile.include b/tests/Makefile.include
index db750dd6d0..34126167a5 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -1154,7 +1154,23 @@ check-acceptance: check-venv $(TESTS_RESULTS_DIR)
 --filter-by-tags-include-empty --filter-by-tags-include-empty-key \
 $(AVOCADO_TAGS) \
 --failfast=on $(SRC_PATH)/tests/acceptance, \
-"AVOCADO", "tests/acceptance")
+"AVOCADO", "tests/acceptance (target arch based on test)")
+
+# Allows one to run tests that are generic (independent of target)
+# using a given target
+check-acceptance-generic-on-%: check-venv $(TESTS_RESULTS_DIR)
+   $(call quiet-command, \
+$(TESTS_VENV_DIR)/bin/python -m avocado \
+--show=$(AVOCADO_SHOW) run --job-results-dir=$(TESTS_RESULTS_DIR) \
+--filter-by-tags-include-empty --filter-by-tags-include-empty-key \
+--filter-by-tags=-arch -p arch=$* \
+-p add-qtest=yes -p set-arm-aarch64-machine=yes \
+--failfast=on $(SRC_PATH)/tests/acceptance, \
+"AVOCADO", "tests/acceptance (target arch set to $*)")
+
+check-acceptance-generic: $(patsubst %-softmmu,check-acceptance-generic-on-%, 
$(filter %-softmmu,$(TARGET_DIRS)))
+
+check-acceptance-all: check-acceptance check-acceptance-generic
 
 # Consolidated targets
 
diff --git a/tests/acceptance/avocado_qemu/__init__.py 
b/tests/acceptance/avocado_qemu/__init__.py
index 2b236a1cf0..7a47f0d514 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -69,6 +69,15 @@ class Test(avocado.Test):
 vm = QEMUMachine(self.qemu_bin)
 if args:
 vm.add_args(*args)
+if self.params.get('add-qtest', default=False):
+# mips and sh4 targets require either a bios or a kernel or
+# qtest enabled to not abort right at the command line
+if self.arch in ('mips', 'mipsel', 'mips64', 'mips64el', 'sh4'):
+vm.add_args('-accel', 'qtest')
+if self.params.get('set-arm-aarch64-machine', default=False):
+# arm and aarch64 require a machine type to be set
+if self.arch in ('arm', 'aarch64'):
+vm.set_machine('virt')
 return vm
 
 @property
diff --git a/tests/acceptance/cpu_queries.py b/tests/acceptance/cpu_queries.py
index e71edec39f..af47d2795a 100644
--- a/tests/acceptance/cpu_queries.py
+++ b/tests/acceptance/cpu_queries.py
@@ -18,6 +18,9 @@ class QueryCPUModelExpansion(Test):
 """
 
 def test(self):
+"""
+:avocado: tags=arch:x86_64
+"""
 self.vm.set_machine('none')
 self.vm.add_args('-S')
 self.vm.launch()
diff --git a/tests/acceptance/migration.py b/tests/acceptance/migration.py
index 6115cf6c24..c4ed87cd98 100644
--- a/tests/acceptance/migration.py
+++ b/tests/acceptance/migration.py
@@ -33,8 +33,10 @@ class Migration(Test):
 self.cancel('Failed to find a free port')
 return port
 
-
 def test_migration_with_tcp_localhost(self):
+blacklist_targets = ["arm", "ppc64", "sh4", "s390x"]
+if self.arch in blacklist_targets:
+self.cancel("Test failing on targets: %s" 

Re: [Qemu-devel] [PATCH 07/12] block/backup: add 'always' bitmap sync policy

2019-06-21 Thread John Snow



On 6/21/19 5:48 PM, Max Reitz wrote:
> On 21.06.19 22:58, John Snow wrote:
>>
>>
>> On 6/21/19 9:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
> [...]
> 
> Just chiming in on this:
> 
>>> "So on cancel and abort you synchronize bitmap too?"
>>
>> I will concede that this means that if you ask for a bitmap backup with
>> the 'always' policy and, for whatever reason change your mind about
>> this, there's no way to "cancel" the job in a manner that does not edit
>> the bitmap at this point.
>>
>> I do agree that this seems to go against the wishes of the user, because
>> we have different "kinds" of cancellations:
>>
>> A) Cancellations that actually represent failures in transactions
>> B) Cancellations that represent genuine user intention
>>
>> It might be nice to allow the user to say "No wait, please don't edit
>> that bitmap, I made a mistake!"
> 
> So that “always” doesn’t mean “always”?  To me, that seems like not so
> good an idea.
> 
> If the user uses always, they have to live with that.  I had to live
> with calling “rm” on the wrong file before.  Life’s tough.
> 

I actually agree, but I was making a concession in the ONE conceivable
case where you would theoretically want to abort "always".

> In all seriousness: “Always” is not something a user would use, is it?
> It’s something for management tools.  Why would they cancel because
> “They made a mistake”?
> 

A user might use it -- it's an attractive mode. It's basically
Incremental with retry ability. It is designed for use by a management
utility though, yes.

> Second, what’s the worst thing that may come out of such a mistake?
> Having to perform a full backup?  If so, that doesn’t seem so bad to me.
>  It certainly doesn’t seem so bad to make an unrelated mechanic have an
> influence on whether “always” means “always”.
> 

No, if you "accidentally" issue always (and you change your mind for
whatever reason), the correct way to fix this is:

(1) If the job completes successfully, nothing. Everything is situation
normal. This behaves exactly like "incremental" mode.

(2) If the job fails so hard you don't succeed in writing data anywhere
at all, nothing. Everything is fine. This behaves exactly like a failure
in "incremental" mode. The only way to reliably tell if this happened is
if the job never even succeeded in creating a target for you, or your target
is still verifiably empty. (Even so: good practice would be to never
delete a target if you used 'always' mode.)

(3) If the job fails after writing SOME data, you simply issue another
mode=bitmap policy=always against the same target. (Presumably after
fixing your network or clearing room on the target storage.)

The worst mistake you can make is this:

- Issue sync=bitmap policy=always
- Cancel the job because it's taking too long, and you are impatient
- Forget that you used "always", delete the incomplete backup target

Oops! That had data that QEMU was counting on having written out
already. Your bitmap is now garbage.

You fix this with a full backup, yes.

> Also, this cancel idea would only work for jobs where the bitmap mode
> does not come into play until the job is done, i.e. backup.  I suppose
> if we want to have bitmap modes other than 'always' for mirror, that too
> would have to make a copy of the user-supplied bitmap, so there the
> bitmap mode would make a difference only at the end of the job, too, but
> who knows.
> 

Reasonable point; at the moment I modeled bitmap support for mirror to
only do synchronization at the end of the job as well. In this case,
"soft cancels" are modeled (upstream, today) as ret == 0, so those won't
count as failures at all.

(And, actually, force cancels will count as real failures. So maybe it
IS best not to overload this already hacky semantic we have on cancel.)

> And if it only makes a difference at the end of the job, you might as
> well just add a way to change a running job’s bitmap-mode.
> 

This is prescient. I have wanted a "completion-mode" for mirror that you
can change during its runtime (and to deprecate cancel as a way to
"complete" the job) for a very long time.

It's just that the QAPI for it always seems ugly so I shy away from it.

> Max
> 

So, I will say this:

1) I think the implementation of "always" is perfectly correct, in
single, transaction, or grouped-completion transaction modes.

2) Some of these combinations don't make much practical sense, but it's
more work to disallow them, and past experience reminds me that it's not
my job to save the user from themselves at the primitive level.

3) Nicer features like "I want a different completion mode than when I
started this job" don't exist for any other mode or any other job right
now, and I don't think I will add them in this series.



Re: [Qemu-devel] [Qemu-riscv] [RFC v1 4/5] roms: Add OpenSBI version 0.3

2019-06-21 Thread Alistair Francis
On Thu, Jun 20, 2019 at 10:42 PM Bin Meng  wrote:
>
> On Thu, Jun 20, 2019 at 2:30 AM Alistair Francis  wrote:
> >
> > On Wed, Jun 19, 2019 at 8:18 AM Bin Meng  wrote:
> > >
> > > On Wed, Jun 19, 2019 at 1:14 PM Anup Patel  wrote:
> > > >
> > > > On Wed, Jun 19, 2019 at 6:24 AM Alistair Francis
> > > >  wrote:
> > > > >
> > > > > Add OpenSBI version 0.3 as a git submodule and as a prebuilt binary.
> > > > >
> > > > > Signed-off-by: Alistair Francis 
> > > > > ---
> > > > >  .gitmodules |   3 +++
> > > > >  Makefile|   3 ++-
> > > > >  configure   |   1 +
> > > > >  pc-bios/opensbi-riscv32-fw_jump.elf | Bin 0 -> 197988 bytes
> > > > >  pc-bios/opensbi-riscv64-fw_jump.elf | Bin 0 -> 200192 bytes
> > > > >  roms/Makefile   |  17 +
> > > > >  roms/opensbi|   1 +
> > > > >  7 files changed, 24 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 pc-bios/opensbi-riscv32-fw_jump.elf
> > > > >  create mode 100644 pc-bios/opensbi-riscv64-fw_jump.elf
> > > > >  create mode 16 roms/opensbi
> > > > >
> > > >
> > > > The OpenSBI firmwares are platform specific so we should have
> > > > machine directory under pc-bios/ directory
> > > >
> > > > So for virt machine we will have:
> > > > pc-bios/riscv32/virt/fw_jump.elf
> > > > pc-bios/riscv64/virt/fw_jump.elf
> >
> > I have updated the names to indicate the machine. The pc-bios directory
> > appears to be flat (at least for binaries) so I don't want to add
> > subdirectories.
> >
>
> Should we include pre-built OpenSBI "bios" for "sifive_u" machine too?

Yep, I am doing that now.

Alistair

>
> > >
> > > And we should only integrate plain binary image for "bios" images here.
> > >
> > > pc-bios/riscv32/virt/fw_jump.bin
> > > pc-bios/riscv64/virt/fw_jump.bin
> >
> > Yep, fixed.
> >
>
> Regards,
> Bin



Re: [Qemu-devel] [PATCH v4 08/13] vfio: Add save state functions to SaveVMHandlers

2019-06-21 Thread Alex Williamson
On Sat, 22 Jun 2019 02:35:02 +0530
Kirti Wankhede  wrote:

> On 6/22/2019 2:02 AM, Alex Williamson wrote:
> > On Sat, 22 Jun 2019 01:37:47 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 6/22/2019 1:32 AM, Alex Williamson wrote:  
> >>> On Sat, 22 Jun 2019 01:08:40 +0530
> >>> Kirti Wankhede  wrote:
> >>> 
>  On 6/21/2019 8:46 PM, Alex Williamson wrote:
> > On Fri, 21 Jun 2019 12:08:26 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 6/21/2019 12:55 AM, Alex Williamson wrote:  
> >>> On Thu, 20 Jun 2019 20:07:36 +0530
> >>> Kirti Wankhede  wrote:
> >>> 
>  Added .save_live_pending, .save_live_iterate and 
>  .save_live_complete_precopy
>  functions. These functions handle the pre-copy and stop-and-copy phases.
> 
>  In _SAVING|_RUNNING device state or pre-copy phase:
>  - read pending_bytes
>  - read data_offset - indicates kernel driver to write data to staging
>    buffer which is mmapped.
> >>>
> >>> Why is data_offset the trigger rather than data_size?  It seems that
> >>> data_offset can't really change dynamically since it might be mmap'd,
> >>> so it seems unnatural to bother re-reading it.
> >>> 
> >>
> >> The vendor driver can change data_offset; it can have a different
> >> data_offset for device data and for the dirty pages bitmap.
> >>  
>  - read data_size - amount of data in bytes written by vendor driver 
>  in migration
>    region.
>  - if data section is trapped, pread() number of bytes in data_size, 
>  from
>    data_offset.
>  - if data section is mmaped, read mmaped buffer of size data_size.
>  - Write data packet to file stream as below:
>  {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
>  VFIO_MIG_FLAG_END_OF_STATE }
> 
>  In _SAVING device state or stop-and-copy phase
>  a. read config space of device and save to migration file stream. 
>  This
> doesn't need to be from vendor driver. Any other special config 
>  state
> from driver can be saved as data in following iteration.
>  b. read pending_bytes - indicates kernel driver to write data to 
>  staging
> buffer which is mmapped.
> >>>
> >>> Is it pending_bytes or data_offset that triggers the write out of
> >>> data?  Why pending_bytes vs data_size?  I was interpreting
> >>> pending_bytes as the total data size while data_size is the size
> >>> available to read now, so assumed data_size would be more closely
> >>> aligned to making the data available.
> >>> 
> >>
> >> Sorry, that's my mistake while editing; it's 'read data_offset' as in
> >> the above case.
> >>  
>  c. read data_size - amount of data in bytes written by vendor driver 
>  in
> migration region.
>  d. if data section is trapped, pread() from data_offset of size 
>  data_size.
>  e. if data section is mmaped, read mmaped buffer of size data_size.  
>    
> >>>
> >>> Should this read as "pread() from data_offset of data_size, or
> >>> optionally if mmap is supported on the data area, read data_size from
> >>> start of mapped buffer"?  IOW, pread should always work.  Same in
> >>> previous section.
> >>> 
> >>
> >> ok. I'll update.
> >>  
>  f. Write data packet as below:
> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
>  g. iterate through steps b to f until (pending_bytes > 0)
> >>>
> >>> s/until/while/
> >>
> >> Ok.
> >>  
> >>> 
>  h. Write {VFIO_MIG_FLAG_END_OF_STATE}
> 
>  .save_live_iterate runs outside the iothread lock in the migration case,
>  which could race with an asynchronous call to get the dirty page list,
>  causing data corruption in the mapped migration region. A mutex is added
>  here to serialize migration buffer read operations.
> >>>
> >>> Would we be ahead to use different offsets within the region for 
> >>> device
> >>> data vs dirty bitmap to avoid this?
> >>>
> >>
> >> Lock will still be required to serialize the read/write operations on
> >> vfio_device_migration_info structure in the region.
> >>
> >>  
>  Signed-off-by: Kirti Wankhede 
>  Reviewed-by: Neo Jia 
>  ---
>   hw/vfio/migration.c | 212 
>  
>   1 file changed, 212 insertions(+)
> 
>  diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>  index fe0887c27664..0a2f30872316 100644
>  --- a/hw/vfio/migration.c
>  +++ 
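The pre-copy loop enumerated in the commit message (read pending_bytes, trigger staging via data_offset, read data_size bytes, emit a packet, repeat) can be sketched as a simulation. Region behaviour and flag values below are invented placeholders, not the actual vfio uAPI:

```python
# Simulation of the _SAVING|_RUNNING (pre-copy) loop described in the
# commit message.  FakeMigrationRegion stands in for the mmapped or
# trapped migration region; flag values are placeholders.

VFIO_MIG_FLAG_DEV_DATA_STATE = 0xf2   # placeholder value
VFIO_MIG_FLAG_END_OF_STATE = 0xf1     # placeholder value

class FakeMigrationRegion:
    def __init__(self, chunks):
        self.chunks = list(chunks)

    @property
    def pending_bytes(self):
        return sum(len(c) for c in self.chunks)

    def stage_next(self):
        # Reading data_offset tells the driver to stage the next
        # chunk; data_size then reports how much was staged.
        data = self.chunks.pop(0)
        data_offset, data_size = 4096, len(data)
        return data_offset, data_size, data

def save_iterate(region):
    stream = []
    while region.pending_bytes > 0:
        _offset, size, data = region.stage_next()
        stream.append((VFIO_MIG_FLAG_DEV_DATA_STATE, size, data))
    stream.append((VFIO_MIG_FLAG_END_OF_STATE,))
    return stream

stream = save_iterate(FakeMigrationRegion([b'x' * 8, b'y' * 4]))
# stream == [(0xf2, 8, b'xxxxxxxx'), (0xf2, 4, b'yyyy'), (0xf1,)]
```

The loop terminates when pending_bytes reaches zero, matching step g of the commit message (iterate while pending_bytes > 0), then writes the end-of-state marker.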

Re: [Qemu-devel] [RFC PATCH] tests/acceptance: Handle machine type for ARM target

2019-06-21 Thread Cleber Rosa
On Fri, Jun 21, 2019 at 11:38:06AM -0400, Wainer dos Santos Moschetta wrote:
> Hi all,
> 
> I'm still unsure this is the best solution. I tend to think that
> any arch-independent test case (i.e. not tagged 'arch') should
> be skipped on all arches except for x86_64. Opening up for
> discussion though.
>

I'm confused... if you're calling a test case "arch-independent", why
should it be skipped on all but one arch?  Anyway, I don't think we
should define such a broad policy... This line of thought is very
x86_64 centric, and quite honestly, doesn't map to QEMU's goals.

I agree that we're being a bit "dishonest" by not ensuring that tests
we send will work on all targets... but at least we're having that
discussion.  The next step would be to start triaging and discussing
whether it's worth running those against other targets, considering
the costs and benefits.

> Note: It was decided that ARM targets should not default to any
> machine type: 
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg625999.html
> 
> -- 8< --
> Some tests are meant to be arch-independent and as such they don't set
> the machine type (i.e. they rely on defaults) on launched VMs. The arm
> targets, however, don't provide any default machine, so those tests fail.
> 
> This patch adds a logic on the base Test class so that machine type
> is set to 'virt' when:
>a) The test case doesn't have arch:aarch64 or arch:arm tag. Here
>   I assume that if the test was tagged for a specific arch then
>   the writer took care of setting a machine type.
>b) The target binary arch is any of aarch64 or arm. Note:
>   self.target_arch can end up None if qemu_bin is passed by
>   Avocado parameter and the filename doesn't match expected
>   format. In this case the test will fail.
> 
> Signed-off-by: Wainer dos Santos Moschetta 
> ---
>  tests/acceptance/avocado_qemu/__init__.py | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 2b236a1cf0..fb3e0dc2bc 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -9,6 +9,7 @@
>  # later.  See the COPYING file in the top-level directory.
>  
>  import os
> +import re
>  import sys
>  import uuid
>  
> @@ -65,10 +66,21 @@ class Test(avocado.Test):
>  if self.qemu_bin is None:
>  self.cancel("No QEMU binary defined or found in the source tree")
>  
> +m = re.match('qemu-system-(.*)', self.qemu_bin.split('/').pop())
> +if m:
> +self.target_arch = m.group(1)
> +else:
> +self.target_arch = None
> +

The "arch" tag and parameter are actually related to the target that
should be used.  I don't see the need for a "target_arch" based on
that.

>  def _new_vm(self, *args):
>  vm = QEMUMachine(self.qemu_bin)
>  if args:
>  vm.add_args(*args)
> +# Handle lack of default machine type on some targets.
> +# Assume that arch tagged tests have machine type set properly.
> +if self.tags.get('arch') is None and \
> +   self.target_arch in ('aarch64', 'arm'):
> +vm.set_machine('virt')

This (considering it deals with "arch" instead of "target_arch") is
one of the very important points to be determined.  How much wrapping
around different QEMU behavior on different targets/machines/devices
should we do?  This will possibly be case-by-case discussions with
different outcomes, but hopefully we can come up with a general
direction.

Thanks,
- Cleber.

>  return vm
>  
>  @property
> -- 
> 2.18.1
> 
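The target-architecture guess in the patch boils down to parsing the binary name; the behaviour the commit message warns about (a custom binary passed via Avocado parameter yielding None) falls out of the same regex:

```python
import re

def target_arch(qemu_bin):
    # Mirrors the patch: take the basename of the QEMU binary and
    # strip the 'qemu-system-' prefix.  Returns None when the name
    # doesn't match the expected format, e.g. a custom binary path
    # passed in via an Avocado parameter.
    m = re.match('qemu-system-(.*)', qemu_bin.split('/').pop())
    return m.group(1) if m else None

print(target_arch('/usr/local/bin/qemu-system-aarch64'))  # aarch64
print(target_arch('./build/my-qemu'))                     # None
```

This is a standalone restatement of the patch's logic for clarity, not a replacement for it.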



[Qemu-devel] [PATCH] vfio-common.h: Remove inaccurate comment

2019-06-21 Thread Fabiano Rosas
This is a left-over from "f4ec5e26ed vfio: Add host side DMA window
capabilities", which added support for more than one DMA window.

Signed-off-by: Fabiano Rosas 
---
 include/hw/vfio/vfio-common.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a88b69b675..9107bd41c0 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -74,11 +74,6 @@ typedef struct VFIOContainer {
 int error;
 bool initialized;
 unsigned long pgsizes;
-/*
- * This assumes the host IOMMU can support only a single
- * contiguous IOVA window.  We may need to generalize that in
- * future
- */
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_HEAD(, VFIOGroup) group_list;
-- 
2.20.1




Re: [Qemu-devel] [PATCH v4 01/13] vfio: KABI for migration interface

2019-06-21 Thread Alex Williamson
On Sat, 22 Jun 2019 02:00:08 +0530
Kirti Wankhede  wrote:
> On 6/22/2019 1:30 AM, Alex Williamson wrote:
> > On Sat, 22 Jun 2019 01:05:48 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 6/21/2019 8:33 PM, Alex Williamson wrote:  
> >>> On Fri, 21 Jun 2019 11:22:15 +0530
> >>> Kirti Wankhede  wrote:
> >>> 
>  On 6/20/2019 10:48 PM, Alex Williamson wrote:
> > On Thu, 20 Jun 2019 20:07:29 +0530
> > Kirti Wankhede  wrote:
> >   
> >> - Defined MIGRATION region type and sub-type.
> >> - Used 3 bits to define VFIO device states.
> >> Bit 0 => _RUNNING
> >> Bit 1 => _SAVING
> >> Bit 2 => _RESUMING
> >> Combination of these bits defines VFIO device's state during 
> >> migration
> >> _STOPPED => All bits 0 indicates VFIO device stopped.
> >> _RUNNING => Normal VFIO device running state.
> >> _SAVING | _RUNNING => vCPUs are running, VFIO device is running 
> >> but start
> >>   saving state of device i.e. pre-copy state
> >> _SAVING  => vCPUs are stoppped, VFIO device should be stopped, and
> >>   save device state,i.e. stop-n-copy state
> >> _RESUMING => VFIO device resuming state.
> >> _SAVING | _RESUMING => Invalid state if _SAVING and _RESUMING bits 
> >> are set
> >> - Defined vfio_device_migration_info structure which will be placed at 
> >> 0th
> >>   offset of migration region to get/set VFIO device related 
> >> information.
> >>   Defined members of structure and usage on read/write access:
> >> * device_state: (read/write)
> >> To convey VFIO device state to be transitioned to. Only 3 bits 
> >> are used
> >> as of now.
> >> * pending bytes: (read only)
> >> To get pending bytes yet to be migrated for VFIO device.
> >> * data_offset: (read only)
> >> To get data offset in migration from where data exist during 
> >> _SAVING
> >> and from where data should be written by user space 
> >> application during
> >>  _RESUMING state
> >> * data_size: (read/write)
> >> To get and set size of data copied in migration region during 
> >> _SAVING
> >> and _RESUMING state.
> >> * start_pfn, page_size, total_pfns: (write only)
> >> To get bitmap of dirty pages from vendor driver from given
> >> start address for total_pfns.
> >> * copied_pfns: (read only)
> >> To get number of pfns bitmap copied in migration region.
> >> Vendor driver should copy the bitmap with bits set only for
> >> pages to be marked dirty in migration region. Vendor driver
> >> should return 0 if there are 0 pages dirty in requested
> >> range. Vendor driver should return -1 to mark all pages in the 
> >> section
> >> as dirty
> >>
> >> Migration region looks like:
> >>  --
> >> |vfio_device_migration_info|data section  |
> >> |  | ///  |
> >>  --
> >>  ^  ^  ^
> >>  offset 0-trapped partdata_offset data_size
> >>
> >> Data section is always followed by vfio_device_migration_info
> >> structure in the region, so data_offset will always be none-0.
> >> Offset from where data is copied is decided by kernel driver, data
> >> section can be trapped or mapped depending on how kernel driver
> >> defines data section. If mmapped, then data_offset should be page
> >> aligned, where as initial section which contain
> >> vfio_device_migration_info structure might not end at offset which
> >> is page aligned.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  linux-headers/linux/vfio.h | 71 
> >> ++
> >>  1 file changed, 71 insertions(+)
> >>
> >> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> >> index 24f505199f83..274ec477eb82 100644
> >> --- a/linux-headers/linux/vfio.h
> >> +++ b/linux-headers/linux/vfio.h
> >> @@ -372,6 +372,77 @@ struct vfio_region_gfx_edid {
> >>   */
> >>  #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD  (1)
> >>  
> >> +/* Migration region type and sub-type */
> >> +#define VFIO_REGION_TYPE_MIGRATION(2)
> >> +#define VFIO_REGION_SUBTYPE_MIGRATION (1)
> >> +
> >> +/**
> >> + * Structure vfio_device_migration_info is placed at 0th offset of
> >> + * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device 
> >> 
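The three-bit device_state encoding described in the commit message can be checked with a few lines. The bit positions and the validity rule (_SAVING together with _RESUMING is invalid) are taken from the text above; everything else is an illustrative sketch:

```python
# Bit assignments as described in the commit message above.
VFIO_DEVICE_STATE_RUNNING = 1 << 0
VFIO_DEVICE_STATE_SAVING = 1 << 1
VFIO_DEVICE_STATE_RESUMING = 1 << 2

def state_is_valid(state):
    # _SAVING and _RESUMING set together is explicitly invalid.
    return not (state & VFIO_DEVICE_STATE_SAVING and
                state & VFIO_DEVICE_STATE_RESUMING)

def describe(state):
    if state == 0:
        return 'stopped'            # all bits 0: device stopped
    names = []
    if state & VFIO_DEVICE_STATE_RUNNING:
        names.append('running')
    if state & VFIO_DEVICE_STATE_SAVING:
        names.append('saving')      # with 'running': pre-copy phase
    if state & VFIO_DEVICE_STATE_RESUMING:
        names.append('resuming')
    return '|'.join(names)

print(describe(VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_SAVING))
# running|saving
```

Note how the same _SAVING bit means pre-copy when combined with _RUNNING and stop-and-copy on its own, exactly as the table in the commit message lays out.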

Re: [Qemu-devel] [PATCH 07/12] block/backup: add 'always' bitmap sync policy

2019-06-21 Thread Max Reitz
On 21.06.19 22:58, John Snow wrote:
> 
> 
> On 6/21/19 9:44 AM, Vladimir Sementsov-Ogievskiy wrote:

[...]

Just chiming in on this:

>> "So on cancel and abort you synchronize bitmap too?"
> 
> I will concede that this means that if you ask for a bitmap backup with
> the 'always' policy and, for whatever reason change your mind about
> this, there's no way to "cancel" the job in a manner that does not edit
> the bitmap at this point.
> 
> I do agree that this seems to go against the wishes of the user, because
> we have different "kinds" of cancellations:
> 
> A) Cancellations that actually represent failures in transactions
> B) Cancellations that represent genuine user intention
> 
> It might be nice to allow the user to say "No wait, please don't edit
> that bitmap, I made a mistake!"

So that “always” doesn’t mean “always”?  To me, that seems like not so
good an idea.

If the user uses always, they have to live with that.  I had to live
with calling “rm” on the wrong file before.  Life’s tough.

In all seriousness: “Always” is not something a user would use, is it?
It’s something for management tools.  Why would they cancel because
“They made a mistake”?

Second, what’s the worst thing that may come out of such a mistake?
Having to perform a full backup?  If so, that doesn’t seem so bad to me.
 It certainly doesn’t seem so bad to make an unrelated mechanic have an
influence on whether “always” means “always”.

Also, this cancel idea would only work for jobs where the bitmap mode
does not come into play until the job is done, i.e. backup.  I suppose
if we want to have bitmap modes other than 'always' for mirror, that too
would have to make a copy of the user-supplied bitmap, so there the
bitmap mode would make a difference only at the end of the job, too, but
who knows.

And if it only makes a difference at the end of the job, you might as
well just add a way to change a running job’s bitmap-mode.

Max





[Qemu-devel] [PATCH 0/2] target/i386: kvm: Fix treatment of AMD SVM in nested migration

2019-06-21 Thread Liran Alon
Hi,

This patch series aims to fix the recently merged patch series on the
upstream QEMU master branch which adds support for nested migration.

That patch series was modified during merge to allow migration of a vCPU
exposed with SVM even though the kernel does not support save/restore of
the required nested state. This was done for backward compatibility.

However, during discussion after the merge, it was realised that since QEMU
commit 75d373ef9729 ("target-i386: Disable SVM by default in KVM mode"),
an AMD vCPU that is virtualized by KVM doesn't expose SVM by default, even
if you use "-cpu host". Therefore, it is unlikely that a vCPU exposes the
SVM CPUID flag when the user is not running an SVM workload inside the
guest.

Therefore, this patch series changes the code back to the original intent:
block migration for a vCPU exposed with SVM if the kernel does not support
the capabilities required to save/restore nested state.

Regards,
-Liran




[Qemu-devel] [PATCH 2/2] target/i386: kvm: Init nested-state in case of vCPU exposed with SVM

2019-06-21 Thread Liran Alon
Even though the current upstream kernel does not support save/restore
of nested state in the case of AMD SVM, prepare the QEMU code to init the
relevant nested-state struct fields.

Reviewed-by: Mark Kanda 
Reviewed-by: Karl Heubaum 
Signed-off-by: Liran Alon 
---
 target/i386/kvm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index c2bae6a3023a..be192e54a80b 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1714,13 +1714,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 env->nested_state->size = max_nested_state_len;
 
-if (IS_INTEL_CPU(env)) {
+if (cpu_has_vmx(env)) {
 struct kvm_vmx_nested_state_hdr *vmx_hdr =
 >nested_state->hdr.vmx;
-
 env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
 vmx_hdr->vmxon_pa = -1ull;
 vmx_hdr->vmcs12_pa = -1ull;
+} else if (cpu_has_svm(env)) {
+env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
 }
 }
 
-- 
2.20.1




[Qemu-devel] [PATCH 1/2] target/i386: kvm: Block migration on vCPU exposed with SVM when kernel lacks caps to save/restore nested state

2019-06-21 Thread Liran Alon
Commit 18ab37ba1cee ("target/i386: kvm: Block migration for vCPUs exposed
with nested virtualization") was originally supposed to block migration for
vCPUs exposed with nested virtualization, either Intel VMX or AMD SVM.
However, during the merge to upstream, the commit was changed such that it
doesn't even compile...

This was done unintentionally, in an attempt to modify the submitted patch
series such that commit 12604092e26c ("target/i386: kvm: Add nested
migration blocker only when kernel lacks required capabilities") would only
block migration of a vCPU exposed with VMX but still allow migration of a
vCPU exposed with SVM.

However, since QEMU commit 75d373ef9729 ("target-i386: Disable SVM by
default in KVM mode"), an AMD vCPU that is virtualized by KVM doesn't
expose SVM by default, even if you use "-cpu host". Therefore, it is
unlikely that a vCPU exposes the SVM CPUID flag when the user is not
running an SVM workload inside the guest.

Therefore, change the code back to the original intent: block migration
for a vCPU exposed with SVM if the kernel does not support the capabilities
required to save/restore nested state.

Fixes: 12604092e26c ("target/i386: kvm: Add nested migration blocker only when 
kernel lacks required capabilities")
Reviewed-by: Mark Kanda 
Reviewed-by: Karl Heubaum 
Signed-off-by: Liran Alon 
---
 target/i386/cpu.h | 10 ++
 target/i386/kvm.c |  2 +-
 target/i386/machine.c |  2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 93345792f4cb..cbe904beeb25 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1867,6 +1867,16 @@ static inline bool cpu_has_vmx(CPUX86State *env)
 return env->features[FEAT_1_ECX] & CPUID_EXT_VMX;
 }
 
+static inline bool cpu_has_svm(CPUX86State *env)
+{
+return env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_SVM;
+}
+
+static inline bool cpu_has_nested_virt(CPUX86State *env)
+{
+return (cpu_has_vmx(env) || cpu_has_svm(env));
+}
+
 /* fpu_helper.c */
 void update_fp_status(CPUX86State *env);
 void update_mxcsr_status(CPUX86State *env);
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index e4b4f5756a34..c2bae6a3023a 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1640,7 +1640,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
   !!(c->ecx & CPUID_EXT_SMX);
 }
 
-if (cpu_has_vmx(env) && !nested_virt_mig_blocker &&
+if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker &&
 ((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) {
 error_setg(_virt_mig_blocker,
"Kernel do not provide required capabilities for "
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 851b249d1a39..f4d502386df4 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -233,7 +233,7 @@ static int cpu_pre_save(void *opaque)
 
 #ifdef CONFIG_KVM
 /* Verify we have nested virtualization state from kernel if required */
-if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) {
+if (kvm_enabled() && cpu_has_nested_virt(env) && !env->nested_state) {
 error_report("Guest enabled nested virtualization but kernel "
 "does not support saving of nested state");
 return -EINVAL;
-- 
2.20.1
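The checks this patch adds are single-mask CPUID tests plus one blocker condition, and the logic can be sketched in isolation. Bit positions below (VMX: leaf 1 ECX bit 5; SVM: leaf 0x80000001 ECX bit 2) follow the usual x86 definitions but should be treated as illustrative here, as should the function names:

```python
# Sketch of cpu_has_vmx/cpu_has_svm/cpu_has_nested_virt and of the
# migration-blocker condition from the kvm.c hunk above.
CPUID_EXT_VMX = 1 << 5     # FEAT_1_ECX (illustrative position)
CPUID_EXT3_SVM = 1 << 2    # FEAT_8000_0001_ECX (illustrative position)

def cpu_has_vmx(feat_1_ecx):
    return bool(feat_1_ecx & CPUID_EXT_VMX)

def cpu_has_svm(feat_8000_0001_ecx):
    return bool(feat_8000_0001_ecx & CPUID_EXT3_SVM)

def needs_nested_migration_blocker(feat_1_ecx, feat_8000_0001_ecx,
                                   max_nested_state_len,
                                   has_exception_payload):
    # Mirrors the patched condition: any nested-virt flag exposed,
    # plus a kernel lacking nested-state or exception-payload support.
    nested = cpu_has_vmx(feat_1_ecx) or cpu_has_svm(feat_8000_0001_ecx)
    return nested and (max_nested_state_len <= 0
                       or not has_exception_payload)

# SVM exposed, kernel without nested-state support: blocked.
print(needs_nested_migration_blocker(0, CPUID_EXT3_SVM, 0, True))   # True
# No nested virt exposed: never blocked.
print(needs_nested_migration_blocker(0, 0, 0, False))               # False
```

This makes the fix's effect visible: before the patch only the VMX branch triggered the blocker; with cpu_has_nested_virt() the SVM flag triggers it too.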




Re: [Qemu-devel] [PATCH 06/12] block/dirty-bitmap: add bdrv_dirty_bitmap_claim

2019-06-21 Thread John Snow



On 6/21/19 7:58 AM, Vladimir Sementsov-Ogievskiy wrote:
> 20.06.2019 19:36, John Snow wrote:
>>
>>
>> On 6/20/19 12:02 PM, Max Reitz wrote:
>>> On 20.06.19 03:03, John Snow wrote:
 This function can claim an hbitmap to replace its own current hbitmap.
 In the case that the granularities do not match, it will use
 hbitmap_merge to copy the bit data instead.
>>>
>>> I really do not like this name because to me it implies a relationship
>>> to bdrv_reclaim_dirty_bitmap().  Maybe just bdrv_dirty_bitmap_take()?
>>> Or, if you want to get more fancy, bdrv_dirty_dirty_bitmap_steal().
>>> (Because references are taken or stolen.)
>>>
>>
>> take or steal is good. I just wanted to highlight that it truly takes
>> ownership. The double-pointer and erasing the caller's reference for
>> emphasis of this idea.
> 
> Didn't you consider bdrv_dirty_bitmap_set_hbitmap? Hmm, but your function
> actually eats pointer, so 'set' is not enough.. bdrv_dirty_bitmap_eat_hbitmap?
> 

:)

> And to be serious: this is the point where we should definitely drop the
> HBitmap:meta field, as it interacts badly with parent hbitmap stealing.
> 

You're right, I didn't consider how this would interact with that. Allow
me the time to re-audit how this feature works, there's clearly a lot of
problems with what I've proposed for cross-granularity merging.

>>
>>> The latter might fit in nicely with the abdication theme.  We’d just
>>> need to rename bdrv_reclaim_dirty_bitmap() to
>>> bdrv_dirty_bitmap_backstab(), and it’d be perfect.
>>>
>>
>> Don't tempt me; I do like my silly function names. You are lucky I don't
>> call
>>
>>> (On another note: bdrv_restore_dirty_bitmap() vs.
>>> bdrv_dirty_bitmap_restore() – really? :-/)
>>>
>>
>> I have done a terrible job at enforcing any kind of consistency here,
>> and it gets me all the time too. I had a big series that re-arranged and
>> re-named a ton of stuff just to make things a little more nicer, but I
>> let it bitrot because I didn't want to deal with the bike-shedding.
>>
>> I do agree I am in desperate need of a spring cleaning in here.
>>
>> One thing that does upset me quite often is that the canonical name for
>> the structure is "bdrv dirty bitmap", which makes the function names all
>> quite long.
>>
 Signed-off-by: John Snow 
 ---
   include/block/block_int.h |  1 +
   include/qemu/hbitmap.h|  8 
   block/dirty-bitmap.c  | 14 ++
   util/hbitmap.c|  5 +
   4 files changed, 28 insertions(+)
>>>
>>> The implementation looks good to me.
>>>
>>> Max
>>>
>>
> 
> 




Re: [Qemu-devel] [RFC PATCH 0/9] hw/acpi: make build_madt arch agnostic

2019-06-21 Thread Wei Yang
On Fri, Jun 21, 2019 at 10:11:31AM +0200, Igor Mammedov wrote:
>On Fri, 21 Jun 2019 08:56:44 +0800
>Wei Yang  wrote:
>
>> On Thu, Jun 20, 2019 at 05:04:29PM +0200, Igor Mammedov wrote:
>> >On Thu, 20 Jun 2019 14:18:42 +
>> >Wei Yang  wrote:
>> >  
>> >> On Wed, Jun 19, 2019 at 11:04:40AM +0200, Igor Mammedov wrote:  
>> >> >On Wed, 19 Jun 2019 14:20:50 +0800
>> >> >Wei Yang  wrote:
>> >> >
>> >> >> On Tue, Jun 18, 2019 at 05:59:56PM +0200, Igor Mammedov wrote:
>> >> >> >
>> >> >> >On Mon, 13 May 2019 14:19:04 +0800
>> >> >> >Wei Yang  wrote:
>> >> >> >  
>> >> >> >> Now MADT is highly dependent on architecture and machine type and
>> >> >> >> leaves duplicated code across architectures. This series tries to
>> >> >> >> generalize it.
>> >> >> >> 
>> >> >> >> MADT contains one main table and several sub tables. These sub 
>> >> >> >> tables are
>> >> >> >> highly related to architecture. Here we introduce one method to 
>> >> >> >> make it
>> >> >> >> architecture agnostic.
>> >> >> >> 
>> >> >> >>   * each architecture define its sub-table implementation function 
>> >> >> >> in madt_sub
>> >> >> >>   * introduces struct madt_input to collect sub table information 
>> >> >> >> and pass to
>> >> >> >> build_madt
>> >> >> >> 
>> >> >> >> By doing so, each architecture could prepare its own sub-table 
>> >> >> >> implementation
>> >> >> >> and madt_input. And keep build_madt architecture agnostic.  
>> >> >> >
>> >> >> >I've skimmed over patches, and to me it looks mostly as code movement
>> >> >> >without apparent benefits and probably a bit more complex than what 
>> >> >> >we have now
>> >> >> >(it might be ok cost if it simplifies MADT support for other boards).
>> >> >> >
>> >> >> >Before I do line by line review could you demonstrate what effect new 
>> >> >> >way
>> >> >> >to build MADT would have on arm/virt and i386/virt (from NEMU). So it 
>> >> >> >would be
>> >> >> >possible to estimate net benefits from new approach?
>> >> >> >(PS: it doesn't have to be patches ready for merging, just a dirty 
>> >> >> >hack
>> >> >> >that would demonstrate adding MADT for new board using mad_sub[])
>> >> >> >  
>> >> >> 
>> >> >> Per APIC spec 5.2.12, MADT contains a *main* table and several *sub* 
>> >> >> tables
>> >> >> (Interrupt Controllere), so the idea is give a callback hook in
>> >> >> AcpiDeviceIfClass for each table, including *main* and *sub* table.
>> >> >> 
>> >> >> Current AcpiDeviceIfClass has one callback pc_madt_cpu_entry for some 
>> >> >> *sub*
>> >> >> tables, after replacing the AcpiDeviceIfClass will look like this:
>> >> >> 
>> >> >> typedef struct AcpiDeviceIfClass {
>> >> >> /*  */
>> >> >> InterfaceClass parent_class;
>> >> >> 
>> >> >> /*  */
>> >> >> void (*ospm_status)(AcpiDeviceIf *adev, ACPIOSTInfoList ***list);
>> >> >> void (*send_event)(AcpiDeviceIf *adev, AcpiEventStatusBits ev);
>> >> >> -   void (*madt_cpu)(AcpiDeviceIf *adev, int uid,
>> >> >> -const CPUArchIdList *apic_ids, GArray *entry);
>> >> >> +   madt_operation madt_main;
>> >> >> +   madt_operation *madt_sub;
>> >> >> } AcpiDeviceIfClass;
>> >> >> 
>> >> >> By doing so, each arch could have its own implementation for MADT.
>> >> >> 
>> >> >> After this refactoring, build_madt could be simplified to:
>> >> >> 
>> >> >> build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState 
>> >> >> *pcms,
>> >> >>struct madt_input *input)
>> >> >> {
>> >> >> ...
>> >> >> 
>> >> >> if (adevc->madt_main) {
>> >> >> adevc->madt_main(table_data, madt);
>> >> >> }
>> >> >> 
>> >> >> for (i = 0; ; i++) {
>> >> >> sub_id = input[i].sub_id;
>> >> >> if (sub_id == ACPI_APIC_RESERVED) {
>> >> >> break;
>> >> >> }
>> >> >> opaque = input[i].opaque;
>> >> >> adevc->madt_sub[sub_id](table_data, opaque);
>> >> >> }
>> >> >> 
>> >> >> ...
>> >> >> }
>> >> >> 
>> >> >> input is a list of the data necessary to build each *sub* table. Its
>> >> >> details are also arch-dependent.
>> >> >I've got general idea reading patches in this series.
>> >> >As I've mentioned before it's hard to generalize MADT since it
>> >> >mostly contains entries unique for target/board.
>> >> >Goal here isn't generalizing at any cost, but rather find out
>> >> >if there is enough common code to justify generalization
>> >> >and if it allows us to reduce code duplication and simplify.
>> >> >
>> >> >> For following new arch, what it need to do is prepare the input array 
>> >> >> and
>> >> >> implement necessary *main*/*sub* table callbacks.
>> >> >What I'd like to see is the actual patch that does this,
>> >> >to see if it has any merit and to compare to the current
>> >> >approach.
>> >> 
>> >> I didn't quite get the idea of your approach. Would you mind shedding
>> >> more light on it?
>> >With current approach, 'each board' has its own MADT build routine.

Re: [Qemu-devel] [PATCH v4 08/13] vfio: Add save state functions to SaveVMHandlers

2019-06-21 Thread Kirti Wankhede



On 6/22/2019 2:02 AM, Alex Williamson wrote:
> On Sat, 22 Jun 2019 01:37:47 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/22/2019 1:32 AM, Alex Williamson wrote:
>>> On Sat, 22 Jun 2019 01:08:40 +0530
>>> Kirti Wankhede  wrote:
>>>   
 On 6/21/2019 8:46 PM, Alex Williamson wrote:  
> On Fri, 21 Jun 2019 12:08:26 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/21/2019 12:55 AM, Alex Williamson wrote:
>>> On Thu, 20 Jun 2019 20:07:36 +0530
>>> Kirti Wankhede  wrote:
>>>   
 Added .save_live_pending, .save_live_iterate and 
 .save_live_complete_precopy
 functions. These functions handle the pre-copy and stop-and-copy phases.

 In _SAVING|_RUNNING device state or pre-copy phase:
 - read pending_bytes
 - read data_offset - indicates kernel driver to write data to staging
   buffer which is mmapped.  
>>>
>>> Why is data_offset the trigger rather than data_size?  It seems that
>>> data_offset can't really change dynamically since it might be mmap'd,
>>> so it seems unnatural to bother re-reading it.
>>>   
>>
>> The vendor driver can change data_offset; it can have a different
>> data_offset for device data and for the dirty pages bitmap.
>>
 - read data_size - amount of data in bytes written by vendor driver in 
 migration
   region.
 - if data section is trapped, pread() number of bytes in data_size, 
 from
   data_offset.
 - if data section is mmaped, read mmaped buffer of size data_size.
 - Write data packet to file stream as below:
 {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
 VFIO_MIG_FLAG_END_OF_STATE }

 In _SAVING device state or stop-and-copy phase
 a. read config space of device and save to migration file stream. This
doesn't need to be from vendor driver. Any other special config 
 state
from driver can be saved as data in following iteration.
 b. read pending_bytes - indicates kernel driver to write data to 
 staging
buffer which is mmapped.  
>>>
>>> Is it pending_bytes or data_offset that triggers the write out of
>>> data?  Why pending_bytes vs data_size?  I was interpreting
>>> pending_bytes as the total data size while data_size is the size
>>> available to read now, so assumed data_size would be more closely
>>> aligned to making the data available.
>>>   
>>
>> Sorry, that's my mistake while editing, its read data_offset as in above
>> case.
>>
 c. read data_size - amount of data in bytes written by vendor driver in
migration region.
 d. if data section is trapped, pread() from data_offset of size 
 data_size.
 e. if data section is mmaped, read mmaped buffer of size data_size.
   
>>>
>>> Should this read as "pread() from data_offset of data_size, or
>>> optionally if mmap is supported on the data area, read data_size from
>>> start of mapped buffer"?  IOW, pread should always work.  Same in
>>> previous section.
>>>   
>>
>> ok. I'll update.
>>
 f. Write data packet as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
 g. iterate through steps b to f until (pending_bytes > 0)  
>>>
>>> s/until/while/  
>>
>> Ok.
>>
>>>   
 h. Write {VFIO_MIG_FLAG_END_OF_STATE}

 .save_live_iterate runs outside the iothread lock in the migration 
 case, which
 could race with asynchronous call to get dirty page list causing data 
 corruption
 in mapped migration region. Mutex added here to serial migration 
 buffer read
 operation.  
>>>
>>> Would we be ahead to use different offsets within the region for device
>>> data vs dirty bitmap to avoid this?
>>>  
>>
>> Lock will still be required to serialize the read/write operations on
>> vfio_device_migration_info structure in the region.
>>
>>
 Signed-off-by: Kirti Wankhede 
 Reviewed-by: Neo Jia 
 ---
  hw/vfio/migration.c | 212 
 
  1 file changed, 212 insertions(+)

 diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
 index fe0887c27664..0a2f30872316 100644
 --- a/hw/vfio/migration.c
 +++ b/hw/vfio/migration.c
 @@ -107,6 +107,111 @@ static int vfio_migration_set_state(VFIODevice 
 *vbasedev, uint32_t state)
  return 0;
  }
  
 +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
 +{
 +VFIOMigration *migration = vbasedev->migration;
 +VFIORegion *region = &migration->region.buffer;
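The pre-copy save sequence that this thread keeps iterating over (steps b through f, looping while pending_bytes > 0) can be sketched as a self-contained simulation. This is a hypothetical illustration only: the struct and helper names below echo the quoted patch but stand in for the real migration region and vendor driver, with an in-memory buffer replacing pread()/mmap of the region.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the migration region header.
 * Field names follow the quoted patch; this is NOT the kernel ABI. */
struct fake_migration_info {
    uint64_t pending_bytes;
    uint64_t data_offset;   /* where the staging data starts */
    uint64_t data_size;     /* bytes valid in this iteration */
};

/* Simulated device: 10 bytes of state, handed out 4 bytes at a time. */
static uint8_t device_state[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

/* Stand-in for the vendor driver filling the staging buffer when
 * user space reads data_offset. */
static void fake_driver_fill(struct fake_migration_info *info,
                             uint8_t *staging, uint64_t done)
{
    uint64_t chunk = info->pending_bytes < 4 ? info->pending_bytes : 4;
    memcpy(staging, device_state + done, chunk);
    info->data_size = chunk;
}

/* One save pass: iterate steps b..f while pending_bytes > 0. */
static uint64_t save_device_data(uint8_t *out)
{
    struct fake_migration_info info = {
        .pending_bytes = sizeof(device_state),
        .data_offset = 0,
    };
    uint8_t staging[4];
    uint64_t done = 0;

    while (info.pending_bytes > 0) {
        /* b. read pending_bytes; reading data_offset triggers the
         * driver to stage the next chunk. */
        fake_driver_fill(&info, staging, done);
        /* c. read data_size; d./e. copy data_size bytes from the
         * staging area (pread if trapped, memcpy if mmapped). */
        memcpy(out + done, staging, info.data_size);
        done += info.data_size;
        info.pending_bytes -= info.data_size;
        /* f. a real implementation would emit
         * {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, data} here. */
    }
    return done;  /* h. followed by VFIO_MIG_FLAG_END_OF_STATE */
}
```

With a 10-byte device and a 4-byte staging window this runs three iterations (4 + 4 + 2 bytes) and then stops, mirroring the loop termination condition debated above.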

Re: [Qemu-devel] [PATCH 07/12] block/backup: add 'always' bitmap sync policy

2019-06-21 Thread John Snow



On 6/21/19 9:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> 21.06.2019 16:08, Vladimir Sementsov-Ogievskiy wrote:
>> 21.06.2019 15:59, Vladimir Sementsov-Ogievskiy wrote:
>>> 21.06.2019 15:57, Vladimir Sementsov-Ogievskiy wrote:

^ Home Run!

I'm going to reply to all four of these mails at once below; I'm sorry for the wall of words, but I want to make sure I am being clear in my intent.

 20.06.2019 4:03, John Snow wrote:
> This adds an "always" policy for bitmap synchronization. Regardless of if
> the job succeeds or fails, the bitmap is *always* synchronized. This means
> that for backups that fail part-way through, the bitmap retains a record 
> of
> which sectors need to be copied out to accomplish a new backup using the
> old, partial result.
>
> In effect, this allows us to "resume" a failed backup; however the new 
> backup
> will be from the new point in time, so it isn't a "resume" as much as it 
> is
> an "incremental retry." This can be useful in the case of extremely large
> backups that fail considerably through the operation and we'd like to not 
> waste
> the work that was already performed.
>
> Signed-off-by: John Snow 
> ---
>   qapi/block-core.json |  5 -
>   block/backup.c   | 10 ++
>   2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0332dcaabc..58d267f1f5 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1143,6 +1143,9 @@
>   # An enumeration of possible behaviors for the synchronization of a 
> bitmap
>   # when used for data copy operations.
>   #
> +# @always: The bitmap is always synchronized with remaining blocks to 
> copy,
> +#  whether or not the operation has completed successfully or 
> not.

 Hmm, now I think that 'always' sounds a bit like 'really always' i.e. 
 during backup
 too, which is confusing.. But I don't have better suggestion.


I could probably clarify to say "at the conclusion of the operation",
but we should also keep in mind that bitmaps tied to an operation can't
be used during that timeframe anyway.

> +#
>   # @conditional: The bitmap is only synchronized when the operation is 
> successul.
>   #   This is useful for Incremental semantics.
>   #
> @@ -1153,7 +1156,7 @@
>   # Since: 4.1
>   ##
>   { 'enum': 'BitmapSyncMode',
> -  'data': ['conditional', 'never'] }
> +  'data': ['always', 'conditional', 'never'] }
>   ##
>   # @MirrorCopyMode:
> diff --git a/block/backup.c b/block/backup.c
> index 627f724b68..beb2078696 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -266,15 +266,17 @@ static void 
> backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>   BlockDriverState *bs = blk_bs(job->common.blk);
>   if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
> -    /* Failure, or we don't want to synchronize the bitmap.
> - * Merge the successor back into the parent, delete nothing. */
> +    /* Failure, or we don't want to synchronize the bitmap. */
> +    if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
> +    bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
> +    }
> +    /* Merge the successor back into the parent. */
>   bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);

 Hmm good, it should work. It's a lot more tricky, than just
 "synchronized with remaining blocks to copy", but I'm not sure the we need 
 more details in
 spec.


Right, it's complicated because backups involve two points in time; the
start and finish of the operation. The actual technical truth of what
happens is hard to phrase succinctly.

It was difficult to phrase for even the normal Incremental/conditional
mode that we have.

I can't help but feel I need to write a blog post that has some good
diagrams that can be used to explain the concept clearly.
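One way to state the policy decision in the quoted backup_cleanup_sync_bitmap() hunk compactly: the only case where the user's bitmap absorbs the not-yet-copied blocks is a failed job under 'always'. The sketch below is an illustration of that truth table, not QEMU code; the enum names merely mirror the QAPI values.

```c
#include <assert.h>

/* Illustrative mirror of the BitmapSyncMode QAPI enum. */
enum bitmap_sync_mode { SYNC_ALWAYS, SYNC_CONDITIONAL, SYNC_NEVER };

/* Returns 1 if the user's bitmap should keep a record of the blocks
 * still left to copy, i.e. the job's copy_bitmap gets merged back in
 * before the successor is reclaimed. */
static int keep_remaining_work(enum bitmap_sync_mode mode, int ret)
{
    if (ret < 0 || mode == SYNC_NEVER) {
        /* Failure, or the user never wants sync. Only 'always'
         * additionally folds the remaining work into the bitmap. */
        return mode == SYNC_ALWAYS;
    }
    /* Success with 'conditional' or 'always': the bitmap is simply
     * cleared of what was copied; nothing extra is merged back. */
    return 0;
}
```

Reading the table out: 'always' + failure keeps the remaining work (the "incremental retry" case); every other combination behaves as before this patch.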

 What we have in backup? So, from one hand we have an incremental backup, 
 and a bitmap, counting from it.
 On the other hand it's not normal incremental backup, as it don't 
 correspond to any valid state of vm disk,
 and it may be used only as a backing in a chain of further successful 
 incremental backup, yes?


You can also continue writing directly into it, which is likely the
smarter choice because it saves you the trouble of doing an intermediate
block commit later, and then you don't keep any image files that are
"meaningless" by themselves.

However, yes, iotest 257 uses them as backing images.

 And then I think: with this mode we can not stop on first error, but 
 ignore it, just leaving dirty bit for
 resulting bitmap.. We have BLOCKDEV_ON_ERROR_IGNORE, which may be 

Re: [Qemu-devel] [PATCH v4 08/13] vfio: Add save state functions to SaveVMHandlers

2019-06-21 Thread Alex Williamson
On Sat, 22 Jun 2019 01:37:47 +0530
Kirti Wankhede  wrote:

> On 6/22/2019 1:32 AM, Alex Williamson wrote:
> > On Sat, 22 Jun 2019 01:08:40 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 6/21/2019 8:46 PM, Alex Williamson wrote:  
> >>> On Fri, 21 Jun 2019 12:08:26 +0530
> >>> Kirti Wankhede  wrote:
> >>> 
>  On 6/21/2019 12:55 AM, Alex Williamson wrote:
> > On Thu, 20 Jun 2019 20:07:36 +0530
> > Kirti Wankhede  wrote:
> >   
> >> Added .save_live_pending, .save_live_iterate and 
> >> .save_live_complete_precopy
> >> functions. These functions handles pre-copy and stop-and-copy phase.
> >>
> >> In _SAVING|_RUNNING device state or pre-copy phase:
> >> - read pending_bytes
> >> - read data_offset - indicates kernel driver to write data to staging
> >>   buffer which is mmapped.  
> >
> > Why is data_offset the trigger rather than data_size?  It seems that
> > data_offset can't really change dynamically since it might be mmap'd,
> > so it seems unnatural to bother re-reading it.
> >   
> 
>  Vendor driver can change data_offset, he can have different data_offset
>  for device data and dirty pages bitmap.
> 
> >> - read data_size - amount of data in bytes written by vendor driver in 
> >> migration
> >>   region.
> >> - if data section is trapped, pread() number of bytes in data_size, 
> >> from
> >>   data_offset.
> >> - if data section is mmaped, read mmaped buffer of size data_size.
> >> - Write data packet to file stream as below:
> >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
> >> VFIO_MIG_FLAG_END_OF_STATE }
> >>
> >> In _SAVING device state or stop-and-copy phase
> >> a. read config space of device and save to migration file stream. This
> >>doesn't need to be from vendor driver. Any other special config 
> >> state
> >>from driver can be saved as data in following iteration.
> >> b. read pending_bytes - indicates kernel driver to write data to 
> >> staging
> >>buffer which is mmapped.  
> >
> > Is it pending_bytes or data_offset that triggers the write out of
> > data?  Why pending_bytes vs data_size?  I was interpreting
> > pending_bytes as the total data size while data_size is the size
> > available to read now, so assumed data_size would be more closely
> > aligned to making the data available.
> >   
> 
>  Sorry, that's my mistake while editing, its read data_offset as in above
>  case.
> 
> >> c. read data_size - amount of data in bytes written by vendor driver in
> >>migration region.
> >> d. if data section is trapped, pread() from data_offset of size 
> >> data_size.
> >> e. if data section is mmaped, read mmaped buffer of size data_size.
> >>   
> >
> > Should this read as "pread() from data_offset of data_size, or
> > optionally if mmap is supported on the data area, read data_size from
> > start of mapped buffer"?  IOW, pread should always work.  Same in
> > previous section.
> >   
> 
>  ok. I'll update.
> 
> >> f. Write data packet as below:
> >>{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
> >> g. iterate through steps b to f until (pending_bytes > 0)  
> >
> > s/until/while/  
> 
>  Ok.
> 
> >   
> >> h. Write {VFIO_MIG_FLAG_END_OF_STATE}
> >>
> >> .save_live_iterate runs outside the iothread lock in the migration 
> >> case, which
> >> could race with asynchronous call to get dirty page list causing data 
> >> corruption
> >> in mapped migration region. Mutex added here to serial migration 
> >> buffer read
> >> operation.  
> >
> > Would we be ahead to use different offsets within the region for device
> > data vs dirty bitmap to avoid this?
> >  
> 
>  Lock will still be required to serialize the read/write operations on
>  vfio_device_migration_info structure in the region.
> 
> 
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  hw/vfio/migration.c | 212 
> >> 
> >>  1 file changed, 212 insertions(+)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index fe0887c27664..0a2f30872316 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -107,6 +107,111 @@ static int vfio_migration_set_state(VFIODevice 
> >> *vbasedev, uint32_t state)
> >>  return 0;
> >>  }
> >>  
> >> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
> >> +{
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +VFIORegion *region = &migration->region.buffer;
> >> +uint64_t data_offset = 0, data_size = 0;
> 

Re: [Qemu-devel] [PATCH v4 01/13] vfio: KABI for migration interface

2019-06-21 Thread Kirti Wankhede



On 6/22/2019 1:30 AM, Alex Williamson wrote:
> On Sat, 22 Jun 2019 01:05:48 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/21/2019 8:33 PM, Alex Williamson wrote:
>>> On Fri, 21 Jun 2019 11:22:15 +0530
>>> Kirti Wankhede  wrote:
>>>   
 On 6/20/2019 10:48 PM, Alex Williamson wrote:  
> On Thu, 20 Jun 2019 20:07:29 +0530
> Kirti Wankhede  wrote:
> 
>> - Defined MIGRATION region type and sub-type.
>> - Used 3 bits to define VFIO device states.
>> Bit 0 => _RUNNING
>> Bit 1 => _SAVING
>> Bit 2 => _RESUMING
>> Combination of these bits defines VFIO device's state during 
>> migration
>> _STOPPED => All bits 0 indicates VFIO device stopped.
>> _RUNNING => Normal VFIO device running state.
>> _SAVING | _RUNNING => vCPUs are running, VFIO device is running but 
>> start
>>   saving state of device i.e. pre-copy state
>> _SAVING  => vCPUs are stoppped, VFIO device should be stopped, and
>>   save device state,i.e. stop-n-copy state
>> _RESUMING => VFIO device resuming state.
>> _SAVING | _RESUMING => Invalid state if _SAVING and _RESUMING bits 
>> are set
>> - Defined vfio_device_migration_info structure which will be placed at 
>> 0th
>>   offset of migration region to get/set VFIO device related information.
>>   Defined members of structure and usage on read/write access:
>> * device_state: (read/write)
>> To convey VFIO device state to be transitioned to. Only 3 bits 
>> are used
>> as of now.
>> * pending bytes: (read only)
>> To get pending bytes yet to be migrated for VFIO device.
>> * data_offset: (read only)
>> To get data offset in migration from where data exist during 
>> _SAVING
>> and from where data should be written by user space application 
>> during
>>  _RESUMING state
>> * data_size: (read/write)
>> To get and set size of data copied in migration region during 
>> _SAVING
>> and _RESUMING state.
>> * start_pfn, page_size, total_pfns: (write only)
>> To get bitmap of dirty pages from vendor driver from given
>> start address for total_pfns.
>> * copied_pfns: (read only)
>> To get number of pfns bitmap copied in migration region.
>> Vendor driver should copy the bitmap with bits set only for
>> pages to be marked dirty in migration region. Vendor driver
>> should return 0 if there are 0 pages dirty in requested
>> range. Vendor driver should return -1 to mark all pages in the 
>> section
>> as dirty
>>
>> Migration region looks like:
>>  --
>> |vfio_device_migration_info|data section  |
>> |  | ///  |
>>  --
>>  ^  ^  ^
>>  offset 0-trapped partdata_offset data_size
>>
>> Data section is always followed by vfio_device_migration_info
>> structure in the region, so data_offset will always be none-0.
>> Offset from where data is copied is decided by kernel driver, data
>> section can be trapped or mapped depending on how kernel driver
>> defines data section. If mmapped, then data_offset should be page
>> aligned, where as initial section which contain
>> vfio_device_migration_info structure might not end at offset which
>> is page aligned.
>>
>> Signed-off-by: Kirti Wankhede 
>> Reviewed-by: Neo Jia 
>> ---
>>  linux-headers/linux/vfio.h | 71 
>> ++
>>  1 file changed, 71 insertions(+)
>>
>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>> index 24f505199f83..274ec477eb82 100644
>> --- a/linux-headers/linux/vfio.h
>> +++ 
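The dirty-page query described in the quoted KABI (user space writes start_pfn, page_size and total_pfns; the vendor driver answers with a bitmap in the migration region, one bit per pfn) implies some simple sizing arithmetic. The helper below is hypothetical, just to make the bookkeeping concrete.

```c
#include <assert.h>
#include <stdint.h>

/* How many bytes a dirty-page bitmap for total_pfns pages occupies:
 * one bit per pfn, rounded up to a whole byte. (Illustrative helper,
 * not part of the proposed UAPI.) */
static uint64_t dirty_bitmap_bytes(uint64_t total_pfns)
{
    return (total_pfns + 7) / 8;
}
```

So a 1 GiB range of 4 KiB pages (262144 pfns) needs a 32 KiB bitmap, which bounds how much of the data section one query can consume.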

Re: [Qemu-devel] [PATCH v4 08/13] vfio: Add save state functions to SaveVMHandlers

2019-06-21 Thread Kirti Wankhede



On 6/22/2019 1:32 AM, Alex Williamson wrote:
> On Sat, 22 Jun 2019 01:08:40 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/21/2019 8:46 PM, Alex Williamson wrote:
>>> On Fri, 21 Jun 2019 12:08:26 +0530
>>> Kirti Wankhede  wrote:
>>>   
 On 6/21/2019 12:55 AM, Alex Williamson wrote:  
> On Thu, 20 Jun 2019 20:07:36 +0530
> Kirti Wankhede  wrote:
> 
>> Added .save_live_pending, .save_live_iterate and 
>> .save_live_complete_precopy
>> functions. These functions handles pre-copy and stop-and-copy phase.
>>
>> In _SAVING|_RUNNING device state or pre-copy phase:
>> - read pending_bytes
>> - read data_offset - indicates kernel driver to write data to staging
>>   buffer which is mmapped.
>
> Why is data_offset the trigger rather than data_size?  It seems that
> data_offset can't really change dynamically since it might be mmap'd,
> so it seems unnatural to bother re-reading it.
> 

 Vendor driver can change data_offset, he can have different data_offset
 for device data and dirty pages bitmap.
  
>> - read data_size - amount of data in bytes written by vendor driver in 
>> migration
>>   region.
>> - if data section is trapped, pread() number of bytes in data_size, from
>>   data_offset.
>> - if data section is mmaped, read mmaped buffer of size data_size.
>> - Write data packet to file stream as below:
>> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
>> VFIO_MIG_FLAG_END_OF_STATE }
>>
>> In _SAVING device state or stop-and-copy phase
>> a. read config space of device and save to migration file stream. This
>>doesn't need to be from vendor driver. Any other special config state
>>from driver can be saved as data in following iteration.
>> b. read pending_bytes - indicates kernel driver to write data to staging
>>buffer which is mmapped.
>
> Is it pending_bytes or data_offset that triggers the write out of
> data?  Why pending_bytes vs data_size?  I was interpreting
> pending_bytes as the total data size while data_size is the size
> available to read now, so assumed data_size would be more closely
> aligned to making the data available.
> 

 Sorry, that's my mistake while editing, its read data_offset as in above
 case.
  
>> c. read data_size - amount of data in bytes written by vendor driver in
>>migration region.
>> d. if data section is trapped, pread() from data_offset of size 
>> data_size.
>> e. if data section is mmaped, read mmaped buffer of size data_size.
>
> Should this read as "pread() from data_offset of data_size, or
> optionally if mmap is supported on the data area, read data_size from
> start of mapped buffer"?  IOW, pread should always work.  Same in
> previous section.
> 

 ok. I'll update.
  
>> f. Write data packet as below:
>>{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
>> g. iterate through steps b to f until (pending_bytes > 0)
>
> s/until/while/

 Ok.
  
> 
>> h. Write {VFIO_MIG_FLAG_END_OF_STATE}
>>
>> .save_live_iterate runs outside the iothread lock in the migration case, 
>> which
>> could race with asynchronous call to get dirty page list causing data 
>> corruption
>> in mapped migration region. Mutex added here to serial migration buffer 
>> read
>> operation.
>
> Would we be ahead to use different offsets within the region for device
> data vs dirty bitmap to avoid this?
>

 Lock will still be required to serialize the read/write operations on
 vfio_device_migration_info structure in the region.

  
>> Signed-off-by: Kirti Wankhede 
>> Reviewed-by: Neo Jia 
>> ---
>>  hw/vfio/migration.c | 212 
>> 
>>  1 file changed, 212 insertions(+)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index fe0887c27664..0a2f30872316 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -107,6 +107,111 @@ static int vfio_migration_set_state(VFIODevice 
>> *vbasedev, uint32_t state)
>>  return 0;
>>  }
>>  
>> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
>> +{
>> +VFIOMigration *migration = vbasedev->migration;
>> +VFIORegion *region = &migration->region.buffer;
>> +uint64_t data_offset = 0, data_size = 0;
>> +int ret;
>> +
>> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
>> +region->fd_offset + offsetof(struct 
>> vfio_device_migration_info,
>> + data_offset));
>> +if (ret != sizeof(data_offset)) {
>> +error_report("Failed to get migration buffer data 

Re: [Qemu-devel] [PATCH v4 08/13] vfio: Add save state functions to SaveVMHandlers

2019-06-21 Thread Alex Williamson
On Sat, 22 Jun 2019 01:08:40 +0530
Kirti Wankhede  wrote:

> On 6/21/2019 8:46 PM, Alex Williamson wrote:
> > On Fri, 21 Jun 2019 12:08:26 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 6/21/2019 12:55 AM, Alex Williamson wrote:  
> >>> On Thu, 20 Jun 2019 20:07:36 +0530
> >>> Kirti Wankhede  wrote:
> >>> 
>  Added .save_live_pending, .save_live_iterate and 
>  .save_live_complete_precopy
>  functions. These functions handles pre-copy and stop-and-copy phase.
> 
>  In _SAVING|_RUNNING device state or pre-copy phase:
>  - read pending_bytes
>  - read data_offset - indicates kernel driver to write data to staging
>    buffer which is mmapped.
> >>>
> >>> Why is data_offset the trigger rather than data_size?  It seems that
> >>> data_offset can't really change dynamically since it might be mmap'd,
> >>> so it seems unnatural to bother re-reading it.
> >>> 
> >>
> >> Vendor driver can change data_offset, he can have different data_offset
> >> for device data and dirty pages bitmap.
> >>  
>  - read data_size - amount of data in bytes written by vendor driver in 
>  migration
>    region.
>  - if data section is trapped, pread() number of bytes in data_size, from
>    data_offset.
>  - if data section is mmaped, read mmaped buffer of size data_size.
>  - Write data packet to file stream as below:
>  {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
>  VFIO_MIG_FLAG_END_OF_STATE }
> 
>  In _SAVING device state or stop-and-copy phase
>  a. read config space of device and save to migration file stream. This
> doesn't need to be from vendor driver. Any other special config state
> from driver can be saved as data in following iteration.
>  b. read pending_bytes - indicates kernel driver to write data to staging
> buffer which is mmapped.
> >>>
> >>> Is it pending_bytes or data_offset that triggers the write out of
> >>> data?  Why pending_bytes vs data_size?  I was interpreting
> >>> pending_bytes as the total data size while data_size is the size
> >>> available to read now, so assumed data_size would be more closely
> >>> aligned to making the data available.
> >>> 
> >>
> >> Sorry, that's my mistake while editing, its read data_offset as in above
> >> case.
> >>  
>  c. read data_size - amount of data in bytes written by vendor driver in
> migration region.
>  d. if data section is trapped, pread() from data_offset of size 
>  data_size.
>  e. if data section is mmaped, read mmaped buffer of size data_size.
> >>>
> >>> Should this read as "pread() from data_offset of data_size, or
> >>> optionally if mmap is supported on the data area, read data_size from
> >>> start of mapped buffer"?  IOW, pread should always work.  Same in
> >>> previous section.
> >>> 
> >>
> >> ok. I'll update.
> >>  
>  f. Write data packet as below:
> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
>  g. iterate through steps b to f until (pending_bytes > 0)
> >>>
> >>> s/until/while/
> >>
> >> Ok.
> >>  
> >>> 
>  h. Write {VFIO_MIG_FLAG_END_OF_STATE}
> 
>  .save_live_iterate runs outside the iothread lock in the migration case, 
>  which
>  could race with asynchronous call to get dirty page list causing data 
>  corruption
>  in mapped migration region. Mutex added here to serial migration buffer 
>  read
>  operation.
> >>>
> >>> Would we be ahead to use different offsets within the region for device
> >>> data vs dirty bitmap to avoid this?
> >>>
> >>
> >> Lock will still be required to serialize the read/write operations on
> >> vfio_device_migration_info structure in the region.
> >>
> >>  
>  Signed-off-by: Kirti Wankhede 
>  Reviewed-by: Neo Jia 
>  ---
>   hw/vfio/migration.c | 212 
>  
>   1 file changed, 212 insertions(+)
> 
>  diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>  index fe0887c27664..0a2f30872316 100644
>  --- a/hw/vfio/migration.c
>  +++ b/hw/vfio/migration.c
>  @@ -107,6 +107,111 @@ static int vfio_migration_set_state(VFIODevice 
>  *vbasedev, uint32_t state)
>   return 0;
>   }
>   
>  +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
>  +{
>  +VFIOMigration *migration = vbasedev->migration;
>  +VFIORegion *region = &migration->region.buffer;
>  +uint64_t data_offset = 0, data_size = 0;
>  +int ret;
>  +
>  +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
>  +region->fd_offset + offsetof(struct 
>  vfio_device_migration_info,
>  + data_offset));
>  +if (ret != sizeof(data_offset)) {
>  +error_report("Failed to get migration buffer data offset %d",
>  + ret);
>  +   

Re: [Qemu-devel] [PATCH v4 01/13] vfio: KABI for migration interface

2019-06-21 Thread Alex Williamson
On Sat, 22 Jun 2019 01:05:48 +0530
Kirti Wankhede  wrote:

> On 6/21/2019 8:33 PM, Alex Williamson wrote:
> > On Fri, 21 Jun 2019 11:22:15 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 6/20/2019 10:48 PM, Alex Williamson wrote:  
> >>> On Thu, 20 Jun 2019 20:07:29 +0530
> >>> Kirti Wankhede  wrote:
> >>> 
>  - Defined MIGRATION region type and sub-type.
>  - Used 3 bits to define VFIO device states.
>  Bit 0 => _RUNNING
>  Bit 1 => _SAVING
>  Bit 2 => _RESUMING
>  Combination of these bits defines VFIO device's state during 
>  migration
>  _STOPPED => All bits 0 indicates VFIO device stopped.
>  _RUNNING => Normal VFIO device running state.
>  _SAVING | _RUNNING => vCPUs are running, VFIO device is running but 
>  start
>    saving state of device i.e. pre-copy state
>  _SAVING  => vCPUs are stoppped, VFIO device should be stopped, and
>    save device state,i.e. stop-n-copy state
>  _RESUMING => VFIO device resuming state.
>  _SAVING | _RESUMING => Invalid state if _SAVING and _RESUMING bits 
>  are set
>  - Defined vfio_device_migration_info structure which will be placed at 
>  0th
>    offset of migration region to get/set VFIO device related information.
>    Defined members of structure and usage on read/write access:
>  * device_state: (read/write)
>  To convey VFIO device state to be transitioned to. Only 3 bits 
>  are used
>  as of now.
>  * pending bytes: (read only)
>  To get pending bytes yet to be migrated for VFIO device.
>  * data_offset: (read only)
>  To get data offset in migration from where data exist during 
>  _SAVING
>  and from where data should be written by user space application 
>  during
>   _RESUMING state
>  * data_size: (read/write)
>  To get and set size of data copied in migration region during 
>  _SAVING
>  and _RESUMING state.
>  * start_pfn, page_size, total_pfns: (write only)
>  To get bitmap of dirty pages from vendor driver from given
>  start address for total_pfns.
>  * copied_pfns: (read only)
>  To get number of pfns bitmap copied in migration region.
>  Vendor driver should copy the bitmap with bits set only for
>  pages to be marked dirty in migration region. Vendor driver
>  should return 0 if there are 0 pages dirty in requested
>  range. Vendor driver should return -1 to mark all pages in the 
>  section
>  as dirty
> 
>  Migration region looks like:
>   --
>  |vfio_device_migration_info|data section  |
>  |  | ///  |
>   --
>   ^  ^  ^
>   offset 0-trapped partdata_offset data_size
> 
>  Data section is always followed by vfio_device_migration_info
>  structure in the region, so data_offset will always be none-0.
>  Offset from where data is copied is decided by kernel driver, data
>  section can be trapped or mapped depending on how kernel driver
>  defines data section. If mmapped, then data_offset should be page
>  aligned, where as initial section which contain
>  vfio_device_migration_info structure might not end at offset which
>  is page aligned.
> 
>  Signed-off-by: Kirti Wankhede 
>  Reviewed-by: Neo Jia 
>  ---
>   linux-headers/linux/vfio.h | 71 
>  ++
>   1 file changed, 71 insertions(+)
> 
>  diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>  index 24f505199f83..274ec477eb82 100644
>  --- a/linux-headers/linux/vfio.h
>  +++ b/linux-headers/linux/vfio.h
>  @@ -372,6 +372,77 @@ struct vfio_region_gfx_edid {
>    */
>   #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD(1)
>   
>  +/* Migration region type and sub-type */
>  +#define VFIO_REGION_TYPE_MIGRATION  (2)
>  +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
>  +
>  +/**
>  + * Structure vfio_device_migration_info is placed at 0th offset of
>  + * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related 
>  migration
>  + * information. Field accesses from this structure are only supported 
>  at their
>  + * native width and alignment, otherwise should return error.
>  + *
>  + * device_state: (read/write)
>  + *  To indicate vendor driver the state VFIO device should be 
>  transitioned
> 

Re: [Qemu-devel] [PATCH 02/12] block/backup: Add mirror sync mode 'bitmap'

2019-06-21 Thread John Snow



On 6/21/19 7:29 AM, Vladimir Sementsov-Ogievskiy wrote:
> 20.06.2019 4:03, John Snow wrote:
>> We don't need or want a new sync mode for simple differences in
>> semantics.  Create a new mode simply named "BITMAP" that is designed to
>> make use of the new Bitmap Sync Mode field.
>>
>> Because the only bitmap mode is 'conditional', this adds no new
>> functionality to the backup job (yet). The old incremental backup mode
>> is maintained as a syntactic sugar for sync=bitmap, mode=conditional.
>>
>> Add all of the plumbing necessary to support this new instruction.
> 
> I don't follow, why you don't want to just add bitmap-mode optional parameter
> for incremental mode?
> 

Vocabulary reasons, see below.

> For this all looks similar to just two separate things:
> 1. add bitmap-mode parameter
> 2. rename incremental to bitmap

This is exactly correct!

> 
> Why do we need [2.] ?
> If we do only [1.], we'll avoid creating two similar modes, syntax sugar, a 
> bit
> of mess as it seems to me..
> 
> Hmm, about differential backups, as I understood, we call 'differential' an 
> incremental
> backup, but which considers difference not from latest incremental backup but 
> from some
> in the past.. Is it incorrect?
> 

The reason is because I have been treating "INCREMENTAL" as meaning
something very specific -- I gather from you and Max that you don't
consider this term to mean something specific.

So, by other prominent backup vendors, they use these terms in this way:

INCREMENTAL: This backup contains a delta from the last INCREMENTAL
backup made. In effect, this creates a chain of backups that must be
squashed together to recover data, but uses less info on copy.

DIFFERENTIAL: This backup contains a delta from the last FULL backup
made. In effect, each differential backup only requires a base image and
a single differential. This usually wastes more data during the backup
process, but makes restoration processes easier.


I *always* use these terms in these *exact* ways; you can see that the
bitmap behavior we use is exactly what MIRROR_SYNC_MODE_INCREMENTAL
does. Even when we are using bitmap manipulation techniques to get it to
do something else, the block job itself is engineered to think that it
is producing an "Incremental" backup.


In the early days of this feature, Fam actually proposed something like
what I am proposing here:

a BITMAP sync mode with an on_complete parameter for the backup job that
would either roll the bitmap forward or not (like my "conditional",
"never") based on the success of the job.

We removed that because at the time we wanted to target a simpler
feature. As part of that removal, I renamed the mode "INCREMENTAL" under
the premise that if we ever wanted to add a "DIFFERENTIAL" mode like
what Fam's original design allowed for, we could add
MIRROR_SYNC_MODE_DIFFERENTIAL and that would differentiate the two
modes. This rename was done with the specific knowledge and intent that
the mode was named after the exact specific backup paradigm it was
enabling. Otherwise, I would have left it "BITMAP" back then.

I've had patches in my branch to add a DIFFERENTIAL mode ever since
then! However, since we added bitmap merging, you'll notice that we
actually CAN do "Differential" backups by playing around with the
bitmaps ourselves, which has largely stopped me from wanting to
introduce the new mode.

You'll recall that recently Xie Changlong sent patches to add
"incremental" support to mirror, but what they ACTUALLY implemented was
"Differential" mode -- they didn't clear the bitmap afterwards. I
actually responded as such on-list -- that if we implement a
"Differential" mode that their patches would have been appropriate for
that mode.

As a result of that discussion, I went to add a "Differential" mode to
mirror, but in the process realized that it's much easier to make the
bitmap sync behavior its own parameter.

However, because the new parameters no longer mean the backup is
"Incremental" by that definition, I decided to rename the mode "BITMAP"
again to be *less specific* and, perhaps now ironically, avoid confusion.

Even given this confusion ... I actually still think that we should NOT
use "Incremental" to mean something generic, and I will continue to
enforce the idea that "Incremental" should mean a series of
non-overlapping time-sliced deltas.

--js



Re: [Qemu-devel] [PATCH v4 08/13] vfio: Add save state functions to SaveVMHandlers

2019-06-21 Thread Kirti Wankhede



On 6/21/2019 8:46 PM, Alex Williamson wrote:
> On Fri, 21 Jun 2019 12:08:26 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/21/2019 12:55 AM, Alex Williamson wrote:
>>> On Thu, 20 Jun 2019 20:07:36 +0530
>>> Kirti Wankhede  wrote:
>>>   
 Added .save_live_pending, .save_live_iterate and 
 .save_live_complete_precopy
 functions. These functions handle the pre-copy and stop-and-copy phases.

 In _SAVING|_RUNNING device state or pre-copy phase:
 - read pending_bytes
 - read data_offset - indicates kernel driver to write data to staging
   buffer which is mmapped.  
>>>
>>> Why is data_offset the trigger rather than data_size?  It seems that
>>> data_offset can't really change dynamically since it might be mmap'd,
>>> so it seems unnatural to bother re-reading it.
>>>   
>>
>> Vendor driver can change data_offset; it can have a different data_offset
>> for device data and for the dirty pages bitmap.
>>
 - read data_size - amount of data in bytes written by vendor driver in 
 migration
   region.
 - if data section is trapped, pread() number of bytes in data_size, from
   data_offset.
 - if data section is mmaped, read mmaped buffer of size data_size.
 - Write data packet to file stream as below:
 {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
 VFIO_MIG_FLAG_END_OF_STATE }

 In _SAVING device state or stop-and-copy phase
 a. read config space of device and save to migration file stream. This
doesn't need to be from vendor driver. Any other special config state
from driver can be saved as data in following iteration.
 b. read pending_bytes - indicates kernel driver to write data to staging
buffer which is mmapped.  
>>>
>>> Is it pending_bytes or data_offset that triggers the write out of
>>> data?  Why pending_bytes vs data_size?  I was interpreting
>>> pending_bytes as the total data size while data_size is the size
>>> available to read now, so assumed data_size would be more closely
>>> aligned to making the data available.
>>>   
>>
>> Sorry, that's my mistake while editing; it should read data_offset as in the
>> above case.
>>
 c. read data_size - amount of data in bytes written by vendor driver in
migration region.
 d. if data section is trapped, pread() from data_offset of size data_size.
 e. if data section is mmaped, read mmaped buffer of size data_size.  
>>>
>>> Should this read as "pread() from data_offset of data_size, or
>>> optionally if mmap is supported on the data area, read data_size from
>>> start of mapped buffer"?  IOW, pread should always work.  Same in
>>> previous section.
>>>   
>>
>> ok. I'll update.
>>
 f. Write data packet as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
 g. iterate through steps b to f until (pending_bytes > 0)  
>>>
>>> s/until/while/  
>>
>> Ok.
>>
>>>   
 h. Write {VFIO_MIG_FLAG_END_OF_STATE}

 .save_live_iterate runs outside the iothread lock in the migration case, 
 which
 could race with an asynchronous call to get the dirty page list, causing data
 corruption in the mapped migration region. A mutex is added here to serialize
 migration buffer read operations.
>>>
>>> Would we be ahead to use different offsets within the region for device
>>> data vs dirty bitmap to avoid this?
>>>  
>>
>> Lock will still be required to serialize the read/write operations on
>> vfio_device_migration_info structure in the region.
>>
>>
 Signed-off-by: Kirti Wankhede 
 Reviewed-by: Neo Jia 
 ---
  hw/vfio/migration.c | 212 
 
  1 file changed, 212 insertions(+)

 diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
 index fe0887c27664..0a2f30872316 100644
 --- a/hw/vfio/migration.c
 +++ b/hw/vfio/migration.c
 @@ -107,6 +107,111 @@ static int vfio_migration_set_state(VFIODevice 
 *vbasedev, uint32_t state)
  return 0;
  }
  
 +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
 +{
 +VFIOMigration *migration = vbasedev->migration;
 +VFIORegion *region = &migration->region.buffer;
 +uint64_t data_offset = 0, data_size = 0;
 +int ret;
 +
 +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
 +region->fd_offset + offsetof(struct 
 vfio_device_migration_info,
 + data_offset));
 +if (ret != sizeof(data_offset)) {
 +error_report("Failed to get migration buffer data offset %d",
 + ret);
 +return -EINVAL;
 +}
 +
 +ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
 +region->fd_offset + offsetof(struct 
 vfio_device_migration_info,
 + data_size));
 +if (ret != sizeof(data_size)) {
 +error_report("Failed to get 
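The flow in the patch above — read data_offset, then data_size, then the data itself with pread() — can be mocked in a few lines. This is an illustrative sketch only: the header layout, field offsets, and "region file" below are assumptions for the sketch, not the proposed KABI.

```python
# Mock of the trapped (pread) save path: two u64 header fields
# (data_offset, data_size) followed by the staging data.
import os
import struct
import tempfile

DATA_OFFSET_FIELD = 0   # assumed offset of data_offset within the region
DATA_SIZE_FIELD = 8     # assumed offset of data_size within the region

def save_buffer(fd, region_start):
    """One save iteration: learn where the staging data is and how big it is."""
    (data_offset,) = struct.unpack("<Q", os.pread(fd, 8, region_start + DATA_OFFSET_FIELD))
    (data_size,) = struct.unpack("<Q", os.pread(fd, 8, region_start + DATA_SIZE_FIELD))
    return os.pread(fd, data_size, region_start + data_offset)

with tempfile.TemporaryFile() as f:
    payload = b"vfio"
    f.write(struct.pack("<QQ", 16, len(payload)) + payload)  # data at offset 16
    f.flush()
    result = save_buffer(f.fileno(), 0)

assert result == b"vfio"
```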

Re: [Qemu-devel] [PATCH v4 01/13] vfio: KABI for migration interface

2019-06-21 Thread Kirti Wankhede



On 6/21/2019 8:33 PM, Alex Williamson wrote:
> On Fri, 21 Jun 2019 11:22:15 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/20/2019 10:48 PM, Alex Williamson wrote:
>>> On Thu, 20 Jun 2019 20:07:29 +0530
>>> Kirti Wankhede  wrote:
>>>   
 - Defined MIGRATION region type and sub-type.
 - Used 3 bits to define VFIO device states.
 Bit 0 => _RUNNING
 Bit 1 => _SAVING
 Bit 2 => _RESUMING
 Combination of these bits defines VFIO device's state during migration
 _STOPPED => All bits 0 indicates VFIO device stopped.
 _RUNNING => Normal VFIO device running state.
 _SAVING | _RUNNING => vCPUs are running, VFIO device is running but 
 start
   saving state of device i.e. pre-copy state
 _SAVING  => vCPUs are stopped, VFIO device should be stopped, and
   save device state,i.e. stop-n-copy state
 _RESUMING => VFIO device resuming state.
 _SAVING | _RESUMING => Invalid state if _SAVING and _RESUMING bits are 
 set
 - Defined vfio_device_migration_info structure which will be placed at 0th
   offset of migration region to get/set VFIO device related information.
   Defined members of structure and usage on read/write access:
 * device_state: (read/write)
 To convey VFIO device state to be transitioned to. Only 3 bits are 
 used
 as of now.
 * pending bytes: (read only)
 To get pending bytes yet to be migrated for VFIO device.
 * data_offset: (read only)
 To get data offset in migration from where data exist during 
 _SAVING
 and from where data should be written by user space application 
 during
  _RESUMING state
 * data_size: (read/write)
 To get and set size of data copied in migration region during 
 _SAVING
 and _RESUMING state.
 * start_pfn, page_size, total_pfns: (write only)
 To get bitmap of dirty pages from vendor driver from given
 start address for total_pfns.
 * copied_pfns: (read only)
 To get number of pfns bitmap copied in migration region.
 Vendor driver should copy the bitmap with bits set only for
 pages to be marked dirty in migration region. Vendor driver
 should return 0 if there are 0 pages dirty in requested
 range. Vendor driver should return -1 to mark all pages in the 
 section
 as dirty

 Migration region looks like:
  ----------------------------------------------------------------------
 |vfio_device_migration_info|             data section                  |
 |                          |     ///////////////////////////////       |
  ----------------------------------------------------------------------
  ^                              ^                              ^
  offset 0-trapped part        data_offset                  data_size

 The data section always follows the vfio_device_migration_info
 structure in the region, so data_offset will always be non-0.
 Offset from where data is copied is decided by kernel driver, data
 section can be trapped or mapped depending on how kernel driver
 defines data section. If mmapped, then data_offset should be page
 aligned, where as initial section which contain
 vfio_device_migration_info structure might not end at offset which
 is page aligned.

 Signed-off-by: Kirti Wankhede 
 Reviewed-by: Neo Jia 
 ---
  linux-headers/linux/vfio.h | 71 
 ++
  1 file changed, 71 insertions(+)

 diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
 index 24f505199f83..274ec477eb82 100644
 --- a/linux-headers/linux/vfio.h
 +++ b/linux-headers/linux/vfio.h
 @@ -372,6 +372,77 @@ struct vfio_region_gfx_edid {
   */
  #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD  (1)
  
 +/* Migration region type and sub-type */
 +#define VFIO_REGION_TYPE_MIGRATION (2)
 +#define VFIO_REGION_SUBTYPE_MIGRATION (1)
 +
 +/**
 + * Structure vfio_device_migration_info is placed at 0th offset of
 + * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related 
 migration
 + * information. Field accesses from this structure are only supported at 
 their
 + * native width and alignment, otherwise should return error.
 + *
 + * device_state: (read/write)
 + *  To indicate vendor driver the state VFIO device should be 
 transitioned
 + *  to. If device state transition fails, write to this field return 
 error.
 + *  It consists of 3 bits:
 + *  - If bit 0 set, indicates _RUNNING state. When its reset, that 
 indicates
 + *_STOPPED state. When device is changed to 
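The three device_state bits and the one combination the quoted header rejects can be sketched as a toy check (conceptual only, not the proposed KABI code):

```python
# Bit assignments as described in the quoted commit message.
RUNNING, SAVING, RESUMING = 1 << 0, 1 << 1, 1 << 2

def state_is_valid(state):
    # _SAVING | _RESUMING is the one combination called out as invalid.
    return not ((state & SAVING) and (state & RESUMING))

assert state_is_valid(0)                     # _STOPPED: all bits clear
assert state_is_valid(SAVING | RUNNING)      # pre-copy phase
assert state_is_valid(SAVING)                # stop-and-copy phase
assert state_is_valid(RESUMING)
assert not state_is_valid(SAVING | RESUMING)
```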

Re: [Qemu-devel] [RFC v2 PATCH] hw/arm/virt: makes virt a default machine type

2019-06-21 Thread Cleber Rosa
On Fri, Jun 21, 2019 at 11:33:10AM +0100, Peter Maydell wrote:
> On Thu, 20 Jun 2019 at 23:23, Wainer dos Santos Moschetta
>  wrote:
> > I came across this when running the acceptance tests in an aarch64 host.
> > The arch-independent tests fail because, in general, they don't set a
> > machine type. In order to avoid treating arm targets as special cases
> > on avocado_qemu framework I prefered to attempt to promote virt as
> > default for ARM emulation. Moreover since it represents a generic hardware
> > and its use is broadly advised [1], I found it the right choice.
> 
> Not providing a default machine type for Arm is a deliberate
> choice: there is no single right answer and the user has
> to decide what their preference is. We used to have a default
> machine type set, and it caused a lot of user confusion as
> they expected Arm to be like x86 where everything will run
> fine on the default machine type and it did not, which is
> why we switched to not having a default.
> 
> thanks
> -- PMM

The experience acquired here deserves the highest consideration, but I
can't help but wonder if this isn't one of the (conceptual)
reasons for parameters such as '-nodefaults'.  I know QEMU doesn't
promise the same behavior across different targets, but that could
improve considerably with very cheap actions.

You can consider me biased (I do consider myself), but trying to wear
the hat of a user first interacting with QEMU, I would expect a (any)
reasonably capable environment that can represent the given target.
That will probably be a different environment than the one I may need,
and I think that's fine.

Now on the functional testing side, this means less code adjusting to
the specifics of each target, and overall, more test code that could
be reused across different targets.  I believe the same to be true
for management layer code.

Anyway, it'd be nice to just double-check that keeping things as they
are in this specific aspect is a firm yes.  If so, tests (and
management layers) will (continue to) have to adapt.

Best,
- Cleber.



Re: [Qemu-devel] [SeaBIOS] [PATCH v3 3/4] geometry: Add boot_lchs_find_*() utility functions

2019-06-21 Thread Kevin O'Connor
On Fri, Jun 21, 2019 at 08:42:28PM +0300, Sam Eiderman wrote:
> Sounds reasonable, how do you propose to deal with:
> 
> config BIOS_GEOMETRY
> config BOOTORDER
> 
> precompiler optouts?

I think you can stick them both under BOOTORDER.  That option is only
there in case someone wants to reduce the size of the SeaBIOS binary.
I can't think of a reasonable situation where one cares that much
about binary size, yet wants to override legacy disk translations.

> If we don’t need any of them we also don’t need to call “get_scsi_devpath", 
> “get_ata_devpath”, “get_pci_dev_path”.
> 
> I’ll see what can be done. 

Thanks.
-Kevin



[Qemu-devel] [PATCH] ati-vga: Add DDC reg names for debug

2019-06-21 Thread BALATON Zoltan
Incremental patch to squash into last series

Signed-off-by: BALATON Zoltan 
---
 hw/display/ati_dbg.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/display/ati_dbg.c b/hw/display/ati_dbg.c
index b045f81d06..88b3a11315 100644
--- a/hw/display/ati_dbg.c
+++ b/hw/display/ati_dbg.c
@@ -19,6 +19,8 @@ static struct ati_regdesc ati_reg_names[] = {
 {"CRTC_GEN_CNTL", 0x0050},
 {"CRTC_EXT_CNTL", 0x0054},
 {"DAC_CNTL", 0x0058},
+{"GPIO_VGA_DDC", 0x0060},
+{"GPIO_DVI_DDC", 0x0064},
 {"GPIO_MONID", 0x0068},
 {"I2C_CNTL_1", 0x0094},
 {"PALETTE_INDEX", 0x00b0},
-- 
2.13.7




Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device

2019-06-21 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20190621144541.13770-1-skrtbht...@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Message-id: 20190621144541.13770-1-skrtbht...@gmail.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
21c2b30 hw/pvrdma: Add live migration support

=== OUTPUT BEGIN ===
ERROR: do not use C99 // comments
#50: FILE: hw/rdma/vmw/pvrdma_main.c:610:
+// Remap DSR

ERROR: do not use C99 // comments
#60: FILE: hw/rdma/vmw/pvrdma_main.c:620:
+// Remap cmd slot

WARNING: line over 80 characters
#62: FILE: hw/rdma/vmw/pvrdma_main.c:622:
+dev->dsr_info.req = rdma_pci_dma_map(pci_dev, 
dev->dsr_info.dsr->cmd_slot_dma,

ERROR: do not use C99 // comments
#70: FILE: hw/rdma/vmw/pvrdma_main.c:630:
+// Remap rsp slot

WARNING: line over 80 characters
#72: FILE: hw/rdma/vmw/pvrdma_main.c:632:
+dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, 
dev->dsr_info.dsr->resp_slot_dma,

total: 3 errors, 2 warnings, 77 lines checked

Commit 21c2b3077a6c (hw/pvrdma: Add live migration support) has style problems, 
please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190621144541.13770-1-skrtbht...@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities

2019-06-21 Thread Liran Alon



> On 21 Jun 2019, at 20:27, Paolo Bonzini  wrote:
> 
> On 21/06/19 17:07, Liran Alon wrote:
>>> So, overall I prefer not to block migration.
>> I’m not sure I agree.
>> It is quite likely that vCPU is currently in guest-mode while you are 
>> migrating…
>> A good hypervisor tries to maximise CPU time to be in guest-mode rather than 
>> host-mode. :)
> 
> True, but it is even more likely that they are not using KVM at all and
> just happen to have the CPUID flag set. :)
> 
> Paolo

Since QEMU commit 75d373ef9729 ("target-i386: Disable SVM by default in KVM
mode"), an AMD vCPU virtualized by KVM doesn't expose SVM by default, even if
you use "-cpu host". Therefore, it is unlikely that a vCPU exposes the SVM
CPUID flag when the user is not running any nSVM workload inside the guest.
Unless I'm missing something obvious. Otherwise, I would have agreed with you.

-Liran


Re: [Qemu-devel] [SeaBIOS] [PATCH v3 3/4] geometry: Add boot_lchs_find_*() utility functions

2019-06-21 Thread Sam Eiderman
Sounds reasonable, how do you propose to deal with:

config BIOS_GEOMETRY
config BOOTORDER

precompiler optouts?

If we don’t need any of them we also don’t need to call “get_scsi_devpath", 
“get_ata_devpath”, “get_pci_dev_path”.

I’ll see what can be done. 

> On 20 Jun 2019, at 17:37, Kevin O'Connor  wrote:
> 
> On Wed, Jun 19, 2019 at 12:23:51PM +0300, Sam Eiderman wrote:
>> Adding the following utility functions:
>> 
>>* boot_lchs_find_pci_device
>>* boot_lchs_find_scsi_device
>>* boot_lchs_find_ata_device
> 
> FWIW, this leads to a bit of code duplication.  I think it would be
> preferable to refactor the bootprio_find_XYZ() calls.  Instead of
> returning an 'int prio' they could return a znprintf'd 'char *devpath'
> instead.  Then the boot_add_XYZ() calls could directly call
> find_prio(devpath). The boot_add_hd() could then directly populate
> drive->lchs or call setup_translation().
> 
> -Kevin



Re: [Qemu-devel] [PATCH v3 00/50] tcg plugin support

2019-06-21 Thread Pranith Kumar
On Fri, Jun 21, 2019 at 1:21 AM Alex Bennée  wrote:

> > * Register and memory read/write API
> >
> >   It would be great to have register and memory read/write API i.e., ability
> >   to read/write to registers/memory from within the callback. This gives the
> >   plugin ability to do system introspection. (Not sure if the current 
> > patchset
> >   implements this already).
>
> Not currently. The trick is to have something flexible enough without
> exposing internals. I guess we could consider the gdb register
> enumeration or maybe hook into stuff declared as
> tcg_global_mem_new_i[64|32]. That won't get every "register" but it
> should certainly cover the main general purpose ones. Things like SVE
> and AdvSIMD vector registers wouldn't be seen though.

I guess general registers could be a good starting point. We can then
implement arch specific register access APIs.

>
> > * Register callbacks
> >
> >   A callback needs to be invoked whenever a specified registers is read or
> >   written to.
>
> Again tricky as not every register read/write is obvious from TCG -
> vector registers tweaked from helpers would be a good example.
>
> >
> > * Where do new plugins live in the tree?
> >
> >   The current source files in plugins (api, core etc.,) I think are better 
> > if
> >   moved to tcg/plugins/.  The various plugins we write would then live in 
> > the
> >   plugins/ folder instead of the current tests/plugin/ folder.
>
> The example plugins are really just toys for experimenting with the API
> - I don't see too much problem with them being in tests. However the
> howvec plugin is very guest architecture specific so we could consider a
> bit more of a hierarchy. Maybe these should all live in tests/tcg?
>

So where do you want 'real' plugins to live in the tree? It would be
good to think about the structure for those.

> >
> > * Timer interrupts
> >
> >   What I observed is that the system execution is affected by what you do in
> >   the callbacks because of timer interrupts. For example, if you spend time 
> > in
> >   the memory callback doing a bit of processing and writing to a file, you
> >   will see more timer interrupt instructions. One solution to this would be 
> > to
> >   use 'icount', but it does not work that well. I think we need to do
> >   something similar to what gdb does in debug mode. How would you handle 
> > MTTCG
> >   guests in that case?
>
> icount is going to be the best you can get for deterministic time -
> other efforts to pause/restart virtual time going in and out of plugins
> are just going to add a lot of overhead.

I wonder why using icount is not working in this case. Are there any
timers that fire non-deterministically even when icount is used?

>
> Remember QEMU doesn't even try to be a cycle accurate emulation so
> expecting to get reasonable timing information out of these plugins is a
> stretch. Maybe I should make that clearer in the design document?

It is less about being cycle accurate and more about being
deterministic. For example, when tracing using plugins+callbacks, you
will see a lot more interrupt code in the trace than when if you
execute without tracing. How do we get them to be more similar?

Another idea would be to provide an API for the plugin to generate the
timer interrupt. This allows the plugin to generate regular interrupts
irrespective of what is being done in the callbacks.
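The determinism point can be made concrete with a toy model: when the timer fires on executed-instruction count (icount-style) rather than wall-clock time, the cost of the tracing callback no longer changes where interrupts land in the trace. A conceptual sketch only, not QEMU's actual icount machinery:

```python
def run(n_insns, timer_period, callback_cost_ns):
    """Execute n_insns, firing a 'timer' every timer_period instructions."""
    trace = []
    for pc in range(n_insns):
        trace.append(("insn", pc))
        _ = callback_cost_ns * 2            # however slow the plugin callback is...
        if (pc + 1) % timer_period == 0:    # ...the timer fires by insn count
            trace.append(("irq", pc))
    return trace

# The trace is identical whether callbacks are cheap or very expensive.
assert run(10, 4, callback_cost_ns=1) == run(10, 4, callback_cost_ns=10_000)
assert sum(1 for kind, _ in run(10, 4, 1) if kind == "irq") == 2
```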

>
> The gdb behaviour is just a massive hack. When single-stepping in GDB we
> prevent timer IRQs from being delivered - they have still fired and are
> pending and will execute as soon as you hit continue.
>
> >   Another approach would be to offload callback generation to a separate
> >   plugin thread. The main thread will copy data required by a callback and
> >   invoke the callback asynchronously (std::async in C++ if you are
> >   familiar).
>
> This would complicate things - the current iteration I'm working on
> drops the haddr cb in favour of dynamically resolving the vaddr in the
> callback. But that approach is only valid during the callback before
> something else potentially pollutes the TLB.
>

> >
> > * Starting and stopping callback generation
> >
> >   It would be great if we have a mechanism to dynamically start/stop 
> > callbacks
> >   when a sequence of code (magic instruction) is executed. This would be
> >   useful to annotate region-of-interest (ROI) in benchmarks to
> > generate callbacks.
>
> Well we have that now. At each TB generation event the callback is free to 
> register
> as many or few callbacks as it likes dynamically.

But how does the plugin know that the TB being generated is the first
TB in the ROI?
Similarly, the plugin needs to know when the end of the ROI has been reached.

Also, please note that there can be multiple ROIs. It would be good to
know if we can assign ids to each ROI for the plugin.
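One possible shape for that bookkeeping, sketched outside QEMU: the plugin itself tracks begin/end marker addresses and hands out ROI ids. ROI_BEGIN and ROI_END are made-up marker PCs for illustration, not an existing plugin API:

```python
ROI_BEGIN, ROI_END = 0x1000, 0x2000   # hypothetical "magic" marker addresses

def assign_rois(pc_trace):
    """Return [(roi_id, [pcs inside the ROI]), ...] from a PC trace."""
    rois, current, next_id = [], None, 0
    for pc in pc_trace:
        if pc == ROI_BEGIN:
            current, next_id = (next_id, []), next_id + 1
        elif pc == ROI_END and current is not None:
            rois.append(current)
            current = None
        elif current is not None:
            current[1].append(pc)         # instrument only inside an ROI
    return rois

trace = [0x10, ROI_BEGIN, 0x11, 0x12, ROI_END, 0x13, ROI_BEGIN, 0x14, ROI_END]
assert assign_rois(trace) == [(0, [0x11, 0x12]), (1, [0x14])]
```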

Thanks,
-- 
Pranith



Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities

2019-06-21 Thread Paolo Bonzini
On 21/06/19 17:07, Liran Alon wrote:
>> So, overall I prefer not to block migration.
> I’m not sure I agree.
> It is quite likely that vCPU is currently in guest-mode while you are 
> migrating…
> A good hypervisor tries to maximise CPU time to be in guest-mode rather than 
> host-mode. :)

True, but it is even more likely that they are not using KVM at all and
just happen to have the CPUID flag set. :)

Paolo



Re: [Qemu-devel] [PATCH] configure: linux-user doesn't need neither fdt nor slirp

2019-06-21 Thread Philippe Mathieu-Daudé
On 6/21/19 3:05 PM, Laurent Vivier wrote:
> if softmmu is not enabled, we disable by default fdt and
> slirp as they are only used by -softmmu targets.
> 
> A side effect is the git submodules are not cloned
> if they are not needed.
> 
> Clone and build can be forced with --enable-fdt and
> --enable-slirp.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  configure | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/configure b/configure
> index b091b82cb371..4b3853298c79 100755
> --- a/configure
> +++ b/configure
> @@ -4066,6 +4066,11 @@ if test "$fdt_required" = "yes"; then
>fdt=yes
>  fi
>  
> +# linux-user doesn't need fdt

"fdt is only required when building softmmu targets"

(we don't need it to build tools such qemu-img)

> +if test -z "$fdt" -a "$softmmu" != "yes" ; then
> +fdt="no"
> +fi
> +
>  if test "$fdt" != "no" ; then
>fdt_libs="-lfdt"
># explicitly check for libfdt_env.h as it is missing in some stable 
> installs
> @@ -5923,6 +5928,11 @@ fi
>  ##
>  # check for slirp
>  
> +# linux-user doesn't need slirp

"slirp is only required when building softmmu targets"

> +if test -z "$slirp" -a "$softmmu" != "yes" ; then
> +slirp="no"
> +fi
> +
>  case "$slirp" in
>"" | yes)
>  if $pkg_config slirp; then
> 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 



Re: [Qemu-devel] [PATCH v2 06/14] target/arm: Allow SVE to be disabled via a CPU property

2019-06-21 Thread Andrew Jones
On Fri, Jun 21, 2019 at 06:55:02PM +0200, Philippe Mathieu-Daudé wrote:
> Hi Drew,
> 
> On 6/21/19 6:34 PM, Andrew Jones wrote:
> > Since 97a28b0eeac14 ("target/arm: Allow VFP and Neon to be disabled via
> > a CPU property") we can disable the 'max' cpu model's VFP and neon
> > features, but there's no way to disable SVE. Add the 'sve=on|off'
> > property to give it that flexibility. We also rename
> > cpu_max_get/set_sve_vq to cpu_max_get/set_sve_max_vq in order for them
> > to follow the typical *_get/set_ pattern.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  target/arm/cpu.c | 10 +-
> >  target/arm/cpu64.c   | 72 ++--
> >  target/arm/helper.c  |  8 +++--
> >  target/arm/monitor.c |  2 +-
> >  tests/arm-cpu-features.c |  1 +
> >  5 files changed, 78 insertions(+), 15 deletions(-)
> > 
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index 858f668d226e..f08e178fc84b 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -198,7 +198,7 @@ static void arm_cpu_reset(CPUState *s)
> >  env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
> >  env->cp15.cptr_el[3] |= CPTR_EZ;
> >  /* with maximum vector length */
> > -env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
> > +env->vfp.zcr_el[1] = cpu->sve_max_vq ? cpu->sve_max_vq - 1 : 0;
> >  env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
> >  env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
> >  /*
> > @@ -1129,6 +1129,14 @@ static void arm_cpu_realizefn(DeviceState *dev, 
> > Error **errp)
> >  cpu->isar.mvfr0 = u;
> >  }
> >  
> > +if (!cpu->sve_max_vq) {
> > +uint64_t t;
> > +
> > +t = cpu->isar.id_aa64pfr0;
> > +t = FIELD_DP64(t, ID_AA64PFR0, SVE, 0);
> > +cpu->isar.id_aa64pfr0 = t;
> > +}
> > +
> >  if (arm_feature(env, ARM_FEATURE_M) && !cpu->has_dsp) {
> >  uint32_t u;
> >  
> > diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> > index 946994838d8a..02ada65f240c 100644
> > --- a/target/arm/cpu64.c
> > +++ b/target/arm/cpu64.c
> > @@ -257,27 +257,75 @@ static void aarch64_a72_initfn(Object *obj)
> >  define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
> >  }
> >  
> > -static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
> > -   void *opaque, Error **errp)
> > +static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char 
> > *name,
> > +   void *opaque, Error **errp)
> >  {
> >  ARMCPU *cpu = ARM_CPU(obj);
> >  visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
> >  }
> >  
> > -static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
> > -   void *opaque, Error **errp)
> > +static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char 
> > *name,
> > +   void *opaque, Error **errp)
> >  {
> >  ARMCPU *cpu = ARM_CPU(obj);
> >  Error *err = NULL;
> > +uint32_t value;
> >  
> > -visit_type_uint32(v, name, &cpu->sve_max_vq, &err);
> > +visit_type_uint32(v, name, &value, &err);
> > +if (err) {
> > +error_propagate(errp, err);
> > +return;
> > +}
> >  
> > -if (!err && (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ)) {
> > -error_setg(&err, "unsupported SVE vector length");
> > -error_append_hint(&err, "Valid sve-max-vq in range [1-%d]\n",
> > +if (!cpu->sve_max_vq) {
> > +error_setg(errp, "cannot set sve-max-vq");
> > +error_append_hint(errp, "SVE has been disabled with sve=off\n");
> > +return;
> > +}
> > +
> > +cpu->sve_max_vq = value;
> > +
> > +if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
> > +error_setg(errp, "unsupported SVE vector length");
> > +error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
> >ARM_MAX_VQ);
> >  }
> > -error_propagate(errp, err);
> > +}
> > +
> > +static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
> > +void *opaque, Error **errp)
> > +{
> > +ARMCPU *cpu = ARM_CPU(obj);
> > +bool value = !!cpu->sve_max_vq;
> > +
> > +visit_type_bool(v, name, &value, errp);
> > +}
> > +
> > +static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
> > +void *opaque, Error **errp)
> > +{
> > +ARMCPU *cpu = ARM_CPU(obj);
> > +Error *err = NULL;
> > +bool value;
> > +
> > +visit_type_bool(v, name, &value, &err);
> > +if (err) {
> > +error_propagate(errp, err);
> > +return;
> > +}
> > +
> > +if (value) {
> > +/*
> > + * We handle the -cpu ,sve=off,sve=on case by reinitializing,
> > + * but otherwise we don't do anything as an sve=on could come after
> > + * a sve-max-vq setting.
> 
> I don't understand why someone would use that...

Command line 

Re: [Qemu-devel] [PATCH v3 2/6] target/arm: Allow setting M mode entry and sp

2019-06-21 Thread Philippe Mathieu-Daudé
Hi Alistair,

On 6/19/19 6:54 AM, Alistair Francis wrote:
> Add M mode initial entry PC and SP properties.
> 
> Signed-off-by: Alistair Francis 
> ---
>  target/arm/cpu.c | 47 +++
>  target/arm/cpu.h |  3 +++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 376db154f0..1d83972ab1 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -301,6 +301,9 @@ static void arm_cpu_reset(CPUState *s)
>   */
>  initial_msp = ldl_p(rom);
>  initial_pc = ldl_p(rom + 4);
> +} else if (cpu->init_sp || cpu->init_entry) {
> +initial_msp = cpu->init_sp;
> +initial_pc = cpu->init_entry;
>  } else {
>  /* Address zero not covered by a ROM blob, or the ROM blob
>   * is in non-modifiable memory and this is a second reset after
> @@ -801,6 +804,38 @@ static void arm_set_init_svtor(Object *obj, Visitor *v, 
> const char *name,
>  visit_type_uint32(v, name, >init_svtor, errp);
>  }
>  
> +static void arm_get_init_sp(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +
> +visit_type_uint32(v, name, &cpu->init_sp, errp);
> +}
> +
> +static void arm_set_init_sp(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +
> +visit_type_uint32(v, name, &cpu->init_sp, errp);
> +}
> +
> +static void arm_get_init_entry(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +
> +visit_type_uint32(v, name, &cpu->init_entry, errp);
> +}
> +
> +static void arm_set_init_entry(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +
> +visit_type_uint32(v, name, &cpu->init_entry, errp);
> +}
> +
>  void arm_cpu_post_init(Object *obj)
>  {
>  ARMCPU *cpu = ARM_CPU(obj);
> @@ -913,6 +948,18 @@ void arm_cpu_post_init(Object *obj)
>  object_property_add(obj, "init-svtor", "uint32",
>  arm_get_init_svtor, arm_set_init_svtor,
>  NULL, NULL, &error_abort);
> +} else {
> +/*
> + * M profile: initial value of the SP and entry. We can't just use
> + * a simple DEFINE_PROP_UINT32 for this because we want to permit
> + * the property to be set after realize.
> + */

This comment is mostly a copy of the other if() branch, maybe you can
extract one generic comment for the 2 cases.

> +object_property_add(obj, "init-sp", "uint32",
> +arm_get_init_sp, arm_set_init_sp,
> +NULL, NULL, &error_abort);
> +object_property_add(obj, "init-entry", "uint32",
> +arm_get_init_entry, arm_set_init_entry,
> +NULL, NULL, &error_abort);

I'm having difficulty testing your patch :( I tried:

$ arm-softmmu/qemu-system-arm -M emcraft-sf2 \
  -device loader,file=/networking.uImage,cpu-num=0 \
  -d in_asm,int,mmu \
  -global cpu.init-sp=0x2000fff0 \
  -global cpu.init-entry=0xa0008001
PMSA MPU lookup for execute at 0xa0008000 mmu_idx 65 -> Miss (prot rw-)
Taking exception 3 [Prefetch Abort]
...with CFSR.IACCVIOL
PMSA MPU lookup for writing at 0x2000ffd0 mmu_idx 65 -> Hit (prot rwx)
PMSA MPU lookup for writing at 0x2000ffd4 mmu_idx 65 -> Hit (prot rwx)
PMSA MPU lookup for writing at 0x2000ffd8 mmu_idx 65 -> Hit (prot rwx)
PMSA MPU lookup for writing at 0x2000ffdc mmu_idx 65 -> Hit (prot rwx)
PMSA MPU lookup for writing at 0x2000ffe0 mmu_idx 65 -> Hit (prot rwx)
PMSA MPU lookup for writing at 0x2000ffe4 mmu_idx 65 -> Hit (prot rwx)
PMSA MPU lookup for writing at 0x2000ffe8 mmu_idx 65 -> Hit (prot rwx)
PMSA MPU lookup for writing at 0x2000ffec mmu_idx 65 -> Hit (prot rwx)
...taking pending nonsecure exception 3
PMSA MPU lookup for execute at 0x mmu_idx 67 -> Hit (prot rwx)

IN:
PMSA MPU lookup for reading at 0x mmu_idx 67 -> Hit (prot rwx)
0x:    andeqr0, r0, r0

Taking exception 18 [v7M INVSTATE UsageFault]
qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)

R00= R01= R02= R03=
R04= R05= R06= R07=
R08= R09= R10= R11=
R12= R13=2000ffd0 R14=fff9 R15=
XPSR=4003 -Z-- A handler
FPSCR: 
Aborted (core dumped)

(same without setting cpu.init-entry).

Downloaded "Prebuilt Linux image ready to be loaded to the M2S-FG484
SOM" here: https://emcraft.com/products/255#software

$ file networking.uImage
networking.uImage: u-boot legacy uImage, Linux-2.6.33-cortexm-1.14.3,
Linux/ARM, OS Kernel Image (Not compressed), 2299232 bytes, Wed 

[Qemu-devel] [PATCH v2 14/14] target/arm/kvm: host cpu: Add support for sve properties

2019-06-21 Thread Andrew Jones
Allow cpu 'host' to enable SVE when it's available, unless the
user chooses to disable it with the added 'sve=off' cpu property.
Also give the user the ability to select vector lengths with the
sve properties. We don't adopt 'max' cpu's other sve
property, sve-max-vq, because that property is difficult to
use with KVM. That property assumes all vector lengths in the
range from 1 up to and including the specified maximum length are
supported, but there may be optional lengths not supported by the
host in that range. With KVM one must be more specific when
enabling vector lengths.
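The distinction drawn above can be made concrete with a small standalone sketch (the names and bitmap layout below are ours, not QEMU's): expanding a single maximum into "all lengths up to max" silently assumes every intermediate length exists, while naming lengths explicitly lets each one be validated against the host's map. Bit (vq - 1) of a map stands for vector length vq in quadwords.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only: why a bare maximum is a poor fit for
 * KVM.  A host may lack optional lengths inside the range, so a
 * request derived from "everything up to max" can include lengths
 * the host cannot provide.
 */
static uint32_t expand_max_vq(uint32_t max_vq)
{
    return (1u << max_vq) - 1;              /* vq 1 .. max_vq all set */
}

static bool host_supports_map(uint32_t requested, uint32_t host_map)
{
    return (requested & ~host_map) == 0;    /* no length the host lacks */
}
```

With a host supporting vq 1, 2 and 4 but not the optional vq 3, the expanded request fails while the explicit one succeeds.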

Signed-off-by: Andrew Jones 
---
 target/arm/cpu.c |  1 +
 target/arm/cpu.h |  2 ++
 target/arm/cpu64.c   | 47 ++--
 tests/arm-cpu-features.c | 21 +-
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index e060a0d9df0e..9d05291cb5f6 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2407,6 +2407,7 @@ static void arm_host_initfn(Object *obj)
 ARMCPU *cpu = ARM_CPU(obj);
 
 kvm_arm_set_cpu_features_from_host(cpu);
+aarch64_add_sve_properties(obj);
 arm_cpu_post_init(obj);
 }
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8a1c6c66a462..52a6b219b74a 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -974,11 +974,13 @@ int aarch64_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
 void aarch64_sve_change_el(CPUARMState *env, int old_el,
int new_el, bool el0_a64);
+void aarch64_add_sve_properties(Object *obj);
 #else
 static inline void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq) { }
 static inline void aarch64_sve_change_el(CPUARMState *env, int o,
  int n, bool a)
 { }
+static inline void aarch64_add_sve_properties(Object *obj) { }
 #endif
 
 target_ulong do_arm_semihosting(CPUARMState *env);
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 6e92aa54b9c8..89396a7729ec 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -753,6 +753,36 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
 }
 }
 
+void aarch64_add_sve_properties(Object *obj)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+uint32_t vq;
+
+object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
+cpu_arm_set_sve, NULL, NULL, &error_fatal);
+
+/*
+ * sve_max_vq is initially unspecified, but must be initialized to a
+ * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
+ * SVE. It will be finalized in arm_cpu_realizefn().
+ */
+assert(!cpu->sve_max_vq || cpu->sve_max_vq == ARM_SVE_INIT);
+cpu->sve_max_vq = ARM_SVE_INIT;
+
+/*
+ * sve_vq_map uses a special state while setting properties, so
+ * we initialize it here with its init function and finalize it
+ * in arm_cpu_realizefn().
+ */
+arm_cpu_vq_map_init(cpu);
+for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+char name[8];
+sprintf(name, "sve%d", vq * 128);
+object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
+cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
+}
+}
+
 /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
  * otherwise, a CPU with as many features enabled as our emulation supports.
  * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
@@ -761,7 +791,6 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
 static void aarch64_max_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
-uint32_t vq;
 
 if (kvm_enabled()) {
 kvm_arm_set_cpu_features_from_host(cpu);
@@ -847,9 +876,6 @@ static void aarch64_max_initfn(Object *obj)
 #endif
 }
 
-object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
-cpu_arm_set_sve, NULL, NULL, &error_fatal);
-
 /*
  * sve_max_vq is initially unspecified, but must be initialized to a
  * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
@@ -859,18 +885,7 @@ static void aarch64_max_initfn(Object *obj)
 object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
 cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
 
-/*
- * sve_vq_map uses a special state while setting properties, so
- * we initialize it here with its init function and finalize it
- * in arm_cpu_realizefn().
- */
-arm_cpu_vq_map_init(cpu);
-for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
-char name[8];
-sprintf(name, "sve%d", vq * 128);
-object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
-cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
-}
+aarch64_add_sve_properties(obj);
 }
 
 struct ARMCPUInfo {
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
index 349bd0dca6d1..dfe83f104b27 

[Qemu-devel] [PATCH v2 12/14] target/arm/kvm: scratch vcpu: Preserve input kvm_vcpu_init features

2019-06-21 Thread Andrew Jones
kvm_arm_create_scratch_host_vcpu() takes a struct kvm_vcpu_init
parameter. Rather than just using it as an output parameter to
pass back the preferred target, use it also as an input parameter,
allowing a caller to pass a selected target if they wish and to
also pass cpu features. If the caller doesn't want to select a
target they can pass -1 for the target which indicates they want
to use the preferred target and have it passed back like before.
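The in/out convention described above can be reduced to a minimal standalone sketch (the struct and constant below are hypothetical stand-ins, not QEMU's types): -1 on input selects a default and is overwritten, anything else passes through untouched.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the host's preferred target. */
enum { PREFERRED_TARGET = 2 };

struct scratch_init {
    int target;     /* -1 on input: "pick the preferred target" */
};

static bool scratch_vcpu_init(struct scratch_init *init)
{
    if (init->target == -1) {
        init->target = PREFERRED_TARGET;    /* output-parameter path */
    }
    /* input-parameter path: the caller's chosen target is used as-is */
    return true;
}
```

Either way the caller can read the target actually used out of the same parameter afterwards.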

Signed-off-by: Andrew Jones 
---
 target/arm/kvm.c   | 20 +++-
 target/arm/kvm32.c |  6 +-
 target/arm/kvm64.c |  6 +-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 60645a196d3d..66c0c198604a 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -64,7 +64,7 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
   int *fdarray,
   struct kvm_vcpu_init *init)
 {
-int ret, kvmfd = -1, vmfd = -1, cpufd = -1;
+int ret = 0, kvmfd = -1, vmfd = -1, cpufd = -1;
 
 kvmfd = qemu_open("/dev/kvm", O_RDWR);
 if (kvmfd < 0) {
@@ -84,7 +84,14 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
 goto finish;
 }
 
-ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
+if (init->target == -1) {
+struct kvm_vcpu_init preferred;
+
+ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, &preferred);
+if (!ret) {
+init->target = preferred.target;
+}
+}
 if (ret >= 0) {
 ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
 if (ret < 0) {
@@ -96,10 +103,12 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
  * creating one kind of guest CPU which is its preferred
  * CPU type.
  */
+struct kvm_vcpu_init try;
+
 while (*cpus_to_try != QEMU_KVM_ARM_TARGET_NONE) {
-init->target = *cpus_to_try++;
-memset(init->features, 0, sizeof(init->features));
-ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
+try.target = *cpus_to_try++;
+memcpy(try.features, init->features, sizeof(init->features));
+ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, &try);
 if (ret >= 0) {
 break;
 }
@@ -107,6 +116,7 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
 if (ret < 0) {
 goto err;
 }
+init->target = try.target;
 } else {
 /* Treat a NULL cpus_to_try argument the same as an empty
  * list, which means we will fail the call since this must
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 51f78f722b18..d007f6bd34d7 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -54,7 +54,11 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 QEMU_KVM_ARM_TARGET_CORTEX_A15,
 QEMU_KVM_ARM_TARGET_NONE
 };
-struct kvm_vcpu_init init;
+/*
+ * target = -1 informs kvm_arm_create_scratch_host_vcpu()
+ * to use the preferred target
+ */
+struct kvm_vcpu_init init = { .target = -1, };
 
 if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
 return false;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 9fc7f078cf68..2821135a4d0e 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -502,7 +502,11 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 KVM_ARM_TARGET_CORTEX_A57,
 QEMU_KVM_ARM_TARGET_NONE
 };
-struct kvm_vcpu_init init;
+/*
+ * target = -1 informs kvm_arm_create_scratch_host_vcpu()
+ * to use the preferred target
+ */
+struct kvm_vcpu_init init = { .target = -1, };
 
 if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
 return false;
-- 
2.20.1




[Qemu-devel] [PATCH v2 11/14] target/arm/kvm64: max cpu: Enable SVE when available

2019-06-21 Thread Andrew Jones
Enable SVE in the KVM guest when the 'max' cpu type is configured
and KVM supports it. KVM SVE requires use of the new finalize
vcpu ioctl, so we add that now too. For starters SVE can only be
turned on or off, getting all vector lengths the host CPU supports
when on. We'll add the other SVE CPU properties in later patches.

Signed-off-by: Andrew Jones 
---
 target/arm/cpu64.c   | 24 ++--
 target/arm/kvm.c |  5 +
 target/arm/kvm64.c   | 25 -
 target/arm/kvm_arm.h | 27 +++
 tests/arm-cpu-features.c |  1 +
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 5def82111dee..2e595ad53137 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -371,6 +371,11 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 return;
 }
 
+/* sve-max-vq and sve properties not yet implemented for KVM */
+if (kvm_enabled()) {
+return;
+}
+
 if (cpu->sve_max_vq == ARM_SVE_INIT) {
 object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
 if (err) {
@@ -632,6 +637,10 @@ static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
 ARMCPU *cpu = ARM_CPU(obj);
 bool value = !!cpu->sve_max_vq;
 
+if (kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
+value = false;
+}
+
 visit_type_bool(v, name, &value, errp);
 }
 
@@ -649,6 +658,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
 }
 
 if (value) {
+if (kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
+error_setg(errp, "'sve' feature not supported by KVM on this host");
+return;
+}
+
 /*
  * We handle the -cpu ,sve=off,sve=on case by reinitializing,
  * but otherwise we don't do anything as an sve=on could come after
@@ -675,6 +689,11 @@ static void aarch64_max_initfn(Object *obj)
 
 if (kvm_enabled()) {
 kvm_arm_set_cpu_features_from_host(cpu);
+/*
+ * KVM doesn't yet support the sve-max-vq property, but
+ * setting cpu->sve_max_vq is also used to turn SVE on.
+ */
+cpu->sve_max_vq = ARM_SVE_INIT;
 } else {
 uint64_t t;
 uint32_t u;
@@ -764,8 +783,6 @@ static void aarch64_max_initfn(Object *obj)
 cpu->sve_max_vq = ARM_SVE_INIT;
 object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
 cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
-object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
-cpu_arm_set_sve, NULL, NULL, &error_fatal);
 
 /*
  * sve_vq_map uses a special state while setting properties, so
@@ -780,6 +797,9 @@ static void aarch64_max_initfn(Object *obj)
 cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
 }
 }
+
+object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
+cpu_arm_set_sve, NULL, NULL, &error_fatal);
 }
 
 struct ARMCPUInfo {
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 69c961a4c62c..60645a196d3d 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -49,6 +49,11 @@ int kvm_arm_vcpu_init(CPUState *cs)
 return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, &init);
 }
 
+int kvm_arm_vcpu_finalize(CPUState *cs, int feature)
+{
+return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_FINALIZE, &feature);
+}
+
 void kvm_arm_init_serror_injection(CPUState *cs)
 {
 cap_has_inject_serror_esr = kvm_check_extension(cs->kvm_state,
 KVM_CAP_ARM_INJECT_SERROR_ESR);
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 706541327491..9fc7f078cf68 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -604,6 +604,15 @@ bool kvm_arm_aarch32_supported(CPUState *cpu)
 return ret > 0;
 }
 
+bool kvm_arm_sve_supported(CPUState *cpu)
+{
+KVMState *s = KVM_STATE(current_machine->accelerator);
+int ret;
+
+ret = kvm_check_extension(s, KVM_CAP_ARM_SVE);
+return ret > 0;
+}
+
 #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
 
 int kvm_arch_init_vcpu(CPUState *cs)
@@ -632,13 +641,20 @@ int kvm_arch_init_vcpu(CPUState *cs)
 cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
 }
 if (!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) {
-cpu->has_pmu = false;
+cpu->has_pmu = false;
 }
 if (cpu->has_pmu) {
 cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_PMU_V3;
 } else {
 unset_feature(&env->features, ARM_FEATURE_PMU);
 }
+if (cpu->sve_max_vq) {
+if (!kvm_arm_sve_supported(cs)) {
+cpu->sve_max_vq = 0;
+} else {
+cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_SVE;
+}
+}
 
 /* Do KVM_ARM_VCPU_INIT ioctl */
 ret = kvm_arm_vcpu_init(cs);
@@ -646,6 +662,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 return ret;
 }
 
+if (cpu->sve_max_vq) {
+ret = kvm_arm_vcpu_finalize(cs, 

Re: [Qemu-devel] [PATCH v2 06/14] target/arm: Allow SVE to be disabled via a CPU property

2019-06-21 Thread Philippe Mathieu-Daudé
Hi Drew,

On 6/21/19 6:34 PM, Andrew Jones wrote:
> Since 97a28b0eeac14 ("target/arm: Allow VFP and Neon to be disabled via
> a CPU property") we can disable the 'max' cpu model's VFP and neon
> features, but there's no way to disable SVE. Add the 'sve=on|off'
> property to give it that flexibility. We also rename
> cpu_max_get/set_sve_vq to cpu_max_get/set_sve_max_vq in order for them
> to follow the typical *_get/set_ pattern.
> 
> Signed-off-by: Andrew Jones 
> ---
>  target/arm/cpu.c | 10 +-
>  target/arm/cpu64.c   | 72 ++--
>  target/arm/helper.c  |  8 +++--
>  target/arm/monitor.c |  2 +-
>  tests/arm-cpu-features.c |  1 +
>  5 files changed, 78 insertions(+), 15 deletions(-)
> 
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 858f668d226e..f08e178fc84b 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -198,7 +198,7 @@ static void arm_cpu_reset(CPUState *s)
>  env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
>  env->cp15.cptr_el[3] |= CPTR_EZ;
>  /* with maximum vector length */
> -env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
> +env->vfp.zcr_el[1] = cpu->sve_max_vq ? cpu->sve_max_vq - 1 : 0;
>  env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
>  env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
>  /*
> @@ -1129,6 +1129,14 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
>  cpu->isar.mvfr0 = u;
>  }
>  
> +if (!cpu->sve_max_vq) {
> +uint64_t t;
> +
> +t = cpu->isar.id_aa64pfr0;
> +t = FIELD_DP64(t, ID_AA64PFR0, SVE, 0);
> +cpu->isar.id_aa64pfr0 = t;
> +}
> +
>  if (arm_feature(env, ARM_FEATURE_M) && !cpu->has_dsp) {
>  uint32_t u;
>  
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 946994838d8a..02ada65f240c 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -257,27 +257,75 @@ static void aarch64_a72_initfn(Object *obj)
>  define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
>  }
>  
> -static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
> -   void *opaque, Error **errp)
> +static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
> +   void *opaque, Error **errp)
>  {
>  ARMCPU *cpu = ARM_CPU(obj);
>  visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
>  }
>  
> -static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
> -   void *opaque, Error **errp)
> +static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
> +   void *opaque, Error **errp)
>  {
>  ARMCPU *cpu = ARM_CPU(obj);
>  Error *err = NULL;
> +uint32_t value;
>  
> -visit_type_uint32(v, name, &cpu->sve_max_vq, &err);
> +visit_type_uint32(v, name, &value, &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
>  
> -if (!err && (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ)) {
> -error_setg(&err, "unsupported SVE vector length");
> -error_append_hint(&err, "Valid sve-max-vq in range [1-%d]\n",
> +if (!cpu->sve_max_vq) {
> +error_setg(errp, "cannot set sve-max-vq");
> +error_append_hint(errp, "SVE has been disabled with sve=off\n");
> +return;
> +}
> +
> +cpu->sve_max_vq = value;
> +
> +if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
> +error_setg(errp, "unsupported SVE vector length");
> +error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
>ARM_MAX_VQ);
>  }
> -error_propagate(errp, err);
> +}
> +
> +static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +bool value = !!cpu->sve_max_vq;
> +
> +visit_type_bool(v, name, &value, errp);
> +}
> +
> +static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +Error *err = NULL;
> +bool value;
> +
> +visit_type_bool(v, name, &value, &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
> +if (value) {
> +/*
> + * We handle the -cpu ,sve=off,sve=on case by reinitializing,
> + * but otherwise we don't do anything as an sve=on could come after
> + * a sve-max-vq setting.

I don't understand why someone would use that...

For the rest:
Reviewed-by: Philippe Mathieu-Daudé 

> + */
> +if (!cpu->sve_max_vq) {
> +cpu->sve_max_vq = ARM_MAX_VQ;
> +}
> +} else {
> +cpu->sve_max_vq = 0;
> +}
>  }
>  
>  /* -cpu max: if KVM is enabled, like -cpu host (best possible with this 
> host);
> @@ -373,8 +421,10 @@ static void 

[Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_arch_get/put_sve

2019-06-21 Thread Andrew Jones
These are the SVE equivalents to kvm_arch_get/put_fpsimd. Note, the
swabbing is different than it is for fpsimd because the vector format
is a little-endian stream of words.
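The difference can be sketched in isolation (a hand-rolled helper, not QEMU's code): on a big-endian host each 64-bit word of the little-endian stream is byte-swapped in place, but, unlike the FPSIMD pair-swap, the words themselves keep their order.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable 64-bit byte swap (equivalent of bswap64). */
static uint64_t swap64(uint64_t v)
{
    v = ((v & 0x00ff00ff00ff00ffull) << 8)  | ((v >> 8)  & 0x00ff00ff00ff00ffull);
    v = ((v & 0x0000ffff0000ffffull) << 16) | ((v >> 16) & 0x0000ffff0000ffffull);
    return (v << 32) | (v >> 32);
}

/* Convert an SVE register image word by word: same index on both
 * sides, i.e. no word reversal, only per-word byte swapping. */
static void sve_words_to_le(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        dst[i] = swap64(src[i]);
    }
}
```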

Signed-off-by: Andrew Jones 
---
 target/arm/kvm64.c | 135 +++--
 1 file changed, 131 insertions(+), 4 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index a2485d447e6a..706541327491 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -673,11 +673,12 @@ int kvm_arch_destroy_vcpu(CPUState *cs)
 bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
 {
 /* Return true if the regidx is a register we should synchronize
- * via the cpreg_tuples array (ie is not a core reg we sync by
- * hand in kvm_arch_get/put_registers())
+ * via the cpreg_tuples array (ie is not a core or sve reg that
+ * we sync by hand in kvm_arch_get/put_registers())
  */
 switch (regidx & KVM_REG_ARM_COPROC_MASK) {
 case KVM_REG_ARM_CORE:
+case KVM_REG_ARM64_SVE:
 return false;
 default:
 return true;
@@ -763,6 +764,70 @@ static int kvm_arch_put_fpsimd(CPUState *cs)
 return 0;
 }
 
+/*
+ * If ARM_MAX_VQ is increased to be greater than 16, then we can no
+ * longer hard code slices to 1 in kvm_arch_put/get_sve().
+ */
+QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
+
+static int kvm_arch_put_sve(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+struct kvm_one_reg reg;
+int slices = 1;
+int i, n, ret;
+
+for (i = 0; i < slices; i++) {
+for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
+uint64_t *q = aa64_vfp_qreg(env, n);
+#ifdef HOST_WORDS_BIGENDIAN
+uint64_t d[ARM_MAX_VQ * 2];
+int j;
+for (j = 0; j < cpu->sve_max_vq * 2; j++) {
+d[j] = bswap64(q[j]);
+}
+reg.addr = (uintptr_t)d;
+#else
+reg.addr = (uintptr_t)q;
+#endif
+reg.id = KVM_REG_ARM64_SVE_ZREG(n, i);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+}
+
+for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; n++) {
+uint64_t *q = &env->vfp.pregs[n].p[0];
+#ifdef HOST_WORDS_BIGENDIAN
+uint64_t d[ARM_MAX_VQ * 2 / 8];
+int j;
+for (j = 0; j < cpu->sve_max_vq * 2 / 8; j++) {
+d[j] = bswap64(q[j]);
+}
+reg.addr = (uintptr_t)d;
+#else
+reg.addr = (uintptr_t)q;
+#endif
+reg.id = KVM_REG_ARM64_SVE_PREG(n, i);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+}
+
+reg.addr = (uintptr_t)&env->vfp.pregs[FFR_PRED_NUM].p[0];
+reg.id = KVM_REG_ARM64_SVE_FFR(i);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+}
+
+return 0;
+}
+
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
 struct kvm_one_reg reg;
@@ -857,7 +922,11 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 }
 }
 
-ret = kvm_arch_put_fpsimd(cs);
+if (!cpu->sve_max_vq) {
+ret = kvm_arch_put_fpsimd(cs);
+} else {
+ret = kvm_arch_put_sve(cs);
+}
 if (ret) {
 return ret;
 }
@@ -920,6 +989,60 @@ static int kvm_arch_get_fpsimd(CPUState *cs)
 return 0;
 }
 
+static int kvm_arch_get_sve(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+struct kvm_one_reg reg;
+int slices = 1;
+int i, n, ret;
+
+for (i = 0; i < slices; i++) {
+for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
+uint64_t *q = aa64_vfp_qreg(env, n);
+reg.id = KVM_REG_ARM64_SVE_ZREG(n, i);
+reg.addr = (uintptr_t)q;
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+if (ret) {
+return ret;
+} else {
+#ifdef HOST_WORDS_BIGENDIAN
+int j;
+for (j = 0; j < cpu->sve_max_vq * 2; j++) {
+q[j] = bswap64(q[j]);
+}
+#endif
+}
+}
+
+for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; n++) {
+uint64_t *q = &env->vfp.pregs[n].p[0];
+reg.id = KVM_REG_ARM64_SVE_PREG(n, i);
+reg.addr = (uintptr_t)q;
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+if (ret) {
+return ret;
+} else {
+#ifdef HOST_WORDS_BIGENDIAN
+int j;
+for (j = 0; j < cpu->sve_max_vq * 2 / 8; j++) {
+q[j] = bswap64(q[j]);
+}
+#endif
+}
+}
+
+reg.addr = (uintptr_t)&env->vfp.pregs[FFR_PRED_NUM].p[0];
+reg.id = KVM_REG_ARM64_SVE_FFR(i);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+}
+
+return 

Re: [Qemu-devel] [PULL SUBSYSTEM s390x 0/3] s390x/tcg: pending patches

2019-06-21 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190621134338.8425-1-da...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PULL SUBSYSTEM s390x 0/3] s390x/tcg: pending patches
Message-id: 20190621134338.8425-1-da...@redhat.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag] patchew/20190621134338.8425-1-da...@redhat.com -> 
patchew/20190621134338.8425-1-da...@redhat.com
Switched to a new branch 'test'
d18279e s390x/cpumodel: Prepend KDSA features with "KDSA"
59e1794 s390x/cpumodel: Rework CPU feature definition
3d46e94 tests/tcg/s390x: Fix alignment of csst parameter list

=== OUTPUT BEGIN ===
1/3 Checking commit 3d46e94140a6 (tests/tcg/s390x: Fix alignment of csst 
parameter list)
2/3 Checking commit 59e1794003e5 (s390x/cpumodel: Rework CPU feature definition)
ERROR: Macros with complex values should be enclosed in parenthesis
#407: FILE: target/s390x/cpu_features_def.h:18:
+#define DEF_FEAT(_FEAT, ...) S390_FEAT_##_FEAT,

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#762: 
new file mode 100644

WARNING: line over 80 characters
#792: FILE: target/s390x/cpu_features_def.inc.h:26:
+DEF_FEAT(IDTE_SEGMENT, "idtes", STFL, 4, "IDTE selective TLB segment-table 
clearing")

WARNING: line over 80 characters
#793: FILE: target/s390x/cpu_features_def.inc.h:27:
+DEF_FEAT(IDTE_REGION, "idter", STFL, 5, "IDTE selective TLB region-table 
clearing")

WARNING: line over 80 characters
#799: FILE: target/s390x/cpu_features_def.inc.h:33:
+DEF_FEAT(CONFIGURATION_TOPOLOGY, "ctop", STFL, 11, "Configuration-topology 
facility")

ERROR: line over 90 characters
#800: FILE: target/s390x/cpu_features_def.inc.h:34:
+DEF_FEAT(AP_QUERY_CONFIG_INFO, "apqci", STFL, 12, "Query AP Configuration 
Information facility")

WARNING: line over 80 characters
#802: FILE: target/s390x/cpu_features_def.inc.h:36:
+DEF_FEAT(NONQ_KEY_SETTING, "nonqks", STFL, 14, "Nonquiescing key-setting 
facility")

WARNING: line over 80 characters
#804: FILE: target/s390x/cpu_features_def.inc.h:38:
+DEF_FEAT(EXTENDED_TRANSLATION_2, "etf2", STFL, 16, "Extended-translation 
facility 2")

ERROR: line over 90 characters
#805: FILE: target/s390x/cpu_features_def.inc.h:39:
+DEF_FEAT(MSA, "msa-base", STFL, 17, "Message-security-assist facility 
(excluding subfunctions)")

ERROR: line over 90 characters
#807: FILE: target/s390x/cpu_features_def.inc.h:41:
+DEF_FEAT(LONG_DISPLACEMENT_FAST, "ldisphp", STFL, 19, "Long-displacement 
facility has high performance")

WARNING: line over 80 characters
#810: FILE: target/s390x/cpu_features_def.inc.h:44:
+DEF_FEAT(EXTENDED_TRANSLATION_3, "etf3", STFL, 22, "Extended-translation 
facility 3")

WARNING: line over 80 characters
#811: FILE: target/s390x/cpu_features_def.inc.h:45:
+DEF_FEAT(HFP_UNNORMALIZED_EXT, "hfpue", STFL, 23, "HFP-unnormalized-extension 
facility")

ERROR: line over 90 characters
#815: FILE: target/s390x/cpu_features_def.inc.h:49:
+DEF_FEAT(MOVE_WITH_OPTIONAL_SPEC, "mvcos", STFL, 27, 
"Move-with-optional-specification facility")

ERROR: line over 90 characters
#816: FILE: target/s390x/cpu_features_def.inc.h:50:
+DEF_FEAT(TOD_CLOCK_STEERING, "tods-base", STFL, 28, "TOD-clock-steering 
facility (excluding subfunctions)")

ERROR: line over 90 characters
#819: FILE: target/s390x/cpu_features_def.inc.h:53:
+DEF_FEAT(COMPARE_AND_SWAP_AND_STORE, "csst", STFL, 32, 
"Compare-and-swap-and-store facility")

ERROR: line over 90 characters
#820: FILE: target/s390x/cpu_features_def.inc.h:54:
+DEF_FEAT(COMPARE_AND_SWAP_AND_STORE_2, "csst2", STFL, 33, 
"Compare-and-swap-and-store facility 2")

ERROR: line over 90 characters
#821: FILE: target/s390x/cpu_features_def.inc.h:55:
+DEF_FEAT(GENERAL_INSTRUCTIONS_EXT, "ginste", STFL, 34, 
"General-instructions-extension facility")

WARNING: line over 80 characters
#824: FILE: target/s390x/cpu_features_def.inc.h:58:
+DEF_FEAT(FLOATING_POINT_EXT, "fpe", STFL, 37, "Floating-point extension 
facility")

ERROR: line over 90 characters
#825: FILE: target/s390x/cpu_features_def.inc.h:59:
+DEF_FEAT(ORDER_PRESERVING_COMPRESSION, "opc", STFL, 38, "Order Preserving 
Compression facility")

WARNING: line over 80 characters
#826: FILE: target/s390x/cpu_features_def.inc.h:60:
+DEF_FEAT(SET_PROGRAM_PARAMETERS, "sprogp", STFL, 40, "Set-program-parameters 
facility")

ERROR: line over 90 characters
#827: FILE: target/s390x/cpu_features_def.inc.h:61:
+DEF_FEAT(FLOATING_POINT_SUPPPORT_ENH, "fpseh", STFL, 41, 
"Floating-point-support-enhancement facilities")

ERROR: line over 90 characters
#829: FILE: target/s390x/cpu_features_def.inc.h:63:
+DEF_FEAT(DFP_FAST, "dfphp", STFL, 43, "DFP (decimal-floating-point) 

[Qemu-devel] [PATCH v2 07/14] target/arm/cpu64: max cpu: Introduce sve properties

2019-06-21 Thread Andrew Jones
Introduce cpu properties to give fine control over SVE vector lengths.
We introduce a property for each valid length up to the current
maximum supported, which is 2048-bits. The properties are named, e.g.
sve128, sve256, sve512, ..., where the number is the number of bits.

It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
to provide a guest the vector lengths 128, 256, and 512 bits. The resulting
set must conform to the architectural constraint of having all power-of-2
lengths smaller than the maximum length present. It's also possible to
only provide sve properties, e.g. -cpu max,sve512=on. That
example provides the machine with 128, 256, and 512 bit vector lengths.
It doesn't hurt to explicitly ask for all expected vector lengths,
which is what, for example, libvirt should do.

Note1, it is not possible to use sve properties before
sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
that overly complicates the user input validation.

Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
same as -cpu max,sve512=on, they are not. The former enables all vector
lengths 512 bits and smaller, while the latter only sets the 512-bit
length and its smaller power-of-2 lengths. It's probably best not to use
sve-max-vq with sve properties, but it can't be completely
forbidden as we want qmp_query_cpu_model_expansion to work with guests
launched with e.g. -cpu max,sve-max-vq=8 on their command line.
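The architectural constraint described above (every power-of-2 length below the maximum enabled length must itself be enabled) can be sketched as a standalone check; the helper below is illustrative only and uses its own bitmap convention, bit (vq - 1) set when vector length vq (in quadwords) is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check the power-of-2 constraint over an enabled-length map of
 * up to 16 quadword lengths (128..2048 bits). */
static bool sve_vq_map_is_valid(uint32_t map)
{
    uint32_t max_vq = 0, vq;

    for (vq = 1; vq <= 16; vq++) {
        if (map & (1u << (vq - 1))) {
            max_vq = vq;        /* remember the largest enabled length */
        }
    }
    if (max_vq == 0) {
        return false;           /* no length enabled at all */
    }
    for (vq = 1; vq < max_vq; vq <<= 1) {
        if (!(map & (1u << (vq - 1)))) {
            return false;       /* a power-of-2 length below max is missing */
        }
    }
    return true;
}
```

For example, sve128+sve256+sve512 (vq 1, 2, 4) is valid, while sve512 alone is not, since vq 1 and 2 are missing.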

Signed-off-by: Andrew Jones 
---
 target/arm/cpu.c |   6 +
 target/arm/cpu.h |  14 ++
 target/arm/cpu64.c   | 360 ++-
 target/arm/helper.c  |  11 +-
 target/arm/monitor.c |  16 ++
 tests/arm-cpu-features.c | 217 +++
 6 files changed, 620 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index f08e178fc84b..e060a0d9df0e 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1019,6 +1019,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 return;
 }
 
+arm_cpu_sve_finalize(cpu, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
 if (arm_feature(env, ARM_FEATURE_AARCH64) &&
 cpu->has_vfp != cpu->has_neon) {
 /*
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f9da672be575..cbb155cf72a5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -184,8 +184,13 @@ typedef struct {
 
 #ifdef TARGET_AARCH64
 # define ARM_MAX_VQ  16
+void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
+uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
 #else
 # define ARM_MAX_VQ  1
+static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
+static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
+{ return 0; }
 #endif
 
 typedef struct ARMVectorReg {
@@ -915,6 +920,15 @@ struct ARMCPU {
 
 /* Used to set the maximum vector length the cpu will support.  */
 uint32_t sve_max_vq;
+
+/*
+ * In sve_vq_map each set bit is a supported vector length of
+ * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
+ * length in quadwords. We need a map size twice the maximum
+ * quadword length though because we use two bits for each vector
+ * length in order to track three states: uninitialized, off, and on.
+ */
+DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
 };
 
 void arm_cpu_post_init(Object *obj);
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 02ada65f240c..5def82111dee 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -257,6 +257,149 @@ static void aarch64_a72_initfn(Object *obj)
 define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
 }
 
+/*
+ * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
+ * has only two states (off/on), until we've finalized the map at realize time
+ * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
+ * tracking of the uninitialized state. The arm_vq_state typedef and following
+ * functions allow us to more easily work with the bitmap. Also, while the map
+ * is still initializing, sve-max-vq has an additional three states, bringing
+ * the number of its states to five, which are the following:
+ *
+ * sve-max-vq:
+ *   0:SVE is disabled. The default value for a vq in the map is 'OFF'.
+ *  -1:SVE is enabled, but neither sve-max-vq nor sve properties
+ * have yet been specified by the user. The default value for a vq in
+ * the map is 'ON'.
+ *  -2:SVE is enabled and one or more sve properties have been
+ * set to 'OFF' by the user, but no sve properties have yet
+ * been set to 'ON'. The user is now blocked from setting sve-max-vq
+ * and the default value for a vq in the map is 'ON'.
+ *  -3:SVE is enabled and one or more sve properties have been
+ * set to 'ON' by the user. The user is blocked from setting 

[Qemu-devel] [PATCH v2 09/14] target/arm/kvm64: Move the get/put of fpsimd registers out

2019-06-21 Thread Andrew Jones
Move the getting/putting of the fpsimd registers out of
kvm_arch_get/put_registers() into their own helper functions
to prepare for alternatively getting/putting SVE registers.

No functional change.

Signed-off-by: Andrew Jones 
Reviewed-by: Eric Auger 
---
 target/arm/kvm64.c | 148 +++--
 1 file changed, 88 insertions(+), 60 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 9ca9a0ce821d..a2485d447e6a 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -721,13 +721,53 @@ int kvm_arm_cpreg_level(uint64_t regidx)
 #define AARCH64_SIMD_CTRL_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U32 | \
  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
+static int kvm_arch_put_fpsimd(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+struct kvm_one_reg reg;
+uint32_t fpr;
+int i, ret;
+
+for (i = 0; i < 32; i++) {
+uint64_t *q = aa64_vfp_qreg(env, i);
+#ifdef HOST_WORDS_BIGENDIAN
+uint64_t fp_val[2] = { q[1], q[0] };
+reg.addr = (uintptr_t)fp_val;
+#else
+reg.addr = (uintptr_t)q;
+#endif
+reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+}
+
+reg.addr = (uintptr_t)(&fpr);
+fpr = vfp_get_fpsr(env);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+
+reg.addr = (uintptr_t)(&fpr);
+fpr = vfp_get_fpcr(env);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+
+return 0;
+}
+
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
 struct kvm_one_reg reg;
-uint32_t fpr;
 uint64_t val;
-int i;
-int ret;
+int i, ret;
 unsigned int el;
 
 ARMCPU *cpu = ARM_CPU(cs);
@@ -817,33 +857,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 }
 }
 
-/* Advanced SIMD and FP registers. */
-for (i = 0; i < 32; i++) {
-uint64_t *q = aa64_vfp_qreg(env, i);
-#ifdef HOST_WORDS_BIGENDIAN
-uint64_t fp_val[2] = { q[1], q[0] };
-reg.addr = (uintptr_t)fp_val;
-#else
-reg.addr = (uintptr_t)q;
-#endif
-reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
-ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-if (ret) {
-return ret;
-}
-}
-
-reg.addr = (uintptr_t)(&fpr);
-fpr = vfp_get_fpsr(env);
-reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
-ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-if (ret) {
-return ret;
-}
-
-fpr = vfp_get_fpcr(env);
-reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
-ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+ret = kvm_arch_put_fpsimd(cs);
 if (ret) {
 return ret;
 }
@@ -864,14 +878,54 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
+static int kvm_arch_get_fpsimd(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+struct kvm_one_reg reg;
+uint32_t fpr;
+int i, ret;
+
+for (i = 0; i < 32; i++) {
+uint64_t *q = aa64_vfp_qreg(env, i);
+reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
+reg.addr = (uintptr_t)q;
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+if (ret) {
+return ret;
+} else {
+#ifdef HOST_WORDS_BIGENDIAN
+uint64_t t;
+t = q[0], q[0] = q[1], q[1] = t;
+#endif
+}
+}
+
+reg.addr = (uintptr_t)(&fpr);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+vfp_set_fpsr(env, fpr);
+
+reg.addr = (uintptr_t)(&fpr);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+if (ret) {
+return ret;
+}
+vfp_set_fpcr(env, fpr);
+
+return 0;
+}
+
 int kvm_arch_get_registers(CPUState *cs)
 {
 struct kvm_one_reg reg;
 uint64_t val;
-uint32_t fpr;
 unsigned int el;
-int i;
-int ret;
+int i, ret;
 
 ARMCPU *cpu = ARM_CPU(cs);
CPUARMState *env = &cpu->env;
@@ -960,36 +1014,10 @@ int kvm_arch_get_registers(CPUState *cs)
 env->spsr = env->banked_spsr[i];
 }
 
-/* Advanced SIMD and FP registers */
-for (i = 0; i < 32; i++) {
-uint64_t *q = aa64_vfp_qreg(env, i);
-reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
-reg.addr = (uintptr_t)q;
-ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
-if (ret) {
-return ret;
-} else {
-#ifdef HOST_WORDS_BIGENDIAN
-uint64_t t;
-t = q[0], q[0] = q[1], q[1] = t;
-#endif
-}
-}
-
-reg.addr = (uintptr_t)(&fpr);
-reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
-ret = 

[Qemu-devel] [PATCH v2 13/14] target/arm/cpu64: max cpu: Support sve properties with KVM

2019-06-21 Thread Andrew Jones
Extend the SVE vq map initialization and validation with KVM's
supported vector lengths when KVM is enabled. In order to determine
and select supported lengths we add two new KVM functions for getting
and setting the KVM_REG_ARM64_SVE_VLS pseudo-register.

Signed-off-by: Andrew Jones 
---
 target/arm/cpu.h |   3 +-
 target/arm/cpu64.c   | 171 +++
 target/arm/kvm64.c   | 117 +--
 target/arm/kvm_arm.h |  19 +
 target/arm/monitor.c |   2 +-
 tests/arm-cpu-features.c |  86 +---
 6 files changed, 331 insertions(+), 67 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cbb155cf72a5..8a1c6c66a462 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -926,7 +926,8 @@ struct ARMCPU {
  * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
  * length in quadwords. We need a map size twice the maximum
  * quadword length though because we use two bits for each vector
- * length in order to track three states: uninitialized, off, and on.
+ * length in order to track four states: uninitialized, uninitialized
+ * but supported by KVM, off, and on.
  */
 DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
 };
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 2e595ad53137..6e92aa54b9c8 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -261,10 +261,11 @@ static void aarch64_a72_initfn(Object *obj)
  * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
  * has only two states (off/on), until we've finalized the map at realize time
  * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
- * tracking of the uninitialized state. The arm_vq_state typedef and following
- * functions allow us to more easily work with the bitmap. Also, while the map
- * is still initializing, sve-max-vq has an additional three states, bringing
- * the number of its states to five, which are the following:
+ * tracking of the uninitialized state and the uninitialized but supported by
+ * KVM state. The arm_vq_state typedef and following functions allow us to more
+ * easily work with the bitmap. Also, while the map is still initializing,
+ * sve-max-vq has an additional three states, bringing the number of its states
+ * to five, which are the following:
  *
  * sve-max-vq:
  *   0:SVE is disabled. The default value for a vq in the map is 'OFF'.
@@ -296,6 +297,11 @@ typedef enum arm_vq_state {
 ARM_VQ_OFF,
 ARM_VQ_ON,
 ARM_VQ_UNINITIALIZED,
+ARM_VQ_UNINITIALIZED_KVM_SUPPORTED
+/*
+ * More states cannot be added without adding bits to sve_vq_map
+ * and modifying its supporting functions.
+ */
 } arm_vq_state;
 
 static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)
@@ -324,6 +330,23 @@ static void arm_cpu_vq_map_init(ARMCPU *cpu)
 {
 bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
 bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
+
+if (kvm_enabled()) {
+DECLARE_BITMAP(kvm_supported, ARM_MAX_VQ);
+uint32_t kvm_max_vq;
+
+bitmap_zero(kvm_supported, ARM_MAX_VQ);
+
+kvm_arm_sve_get_vls(CPU(cpu), kvm_supported, ARM_MAX_VQ, &kvm_max_vq);
+
+if (kvm_max_vq > ARM_MAX_VQ) {
+warn_report("KVM supports vector lengths larger than "
+"QEMU can enable");
+}
+
+bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map,
+  kvm_supported, ARM_MAX_VQ);
+}
 }
 
 static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
@@ -371,12 +394,7 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 return;
 }
 
-/* sve-max-vq and sve properties not yet implemented for KVM */
-if (kvm_enabled()) {
-return;
-}
-
-if (cpu->sve_max_vq == ARM_SVE_INIT) {
+if (!kvm_enabled() && cpu->sve_max_vq == ARM_SVE_INIT) {
+object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
 if (err) {
 error_propagate(errp, err);
@@ -431,6 +449,11 @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
 return;
 }
 
+if (kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
+error_setg(errp, "'sve' feature not supported by KVM on this host");
+return;
+}
+
 /*
  * It gets complicated trying to support both sve-max-vq and
  * sve properties together, so we mostly don't. We
@@ -460,6 +483,12 @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
 sprintf(name, "sve%d", vq * 128);
 object_property_set_bool(obj, true, name, );
 if (err) {
+if (kvm_enabled()) {
+error_append_hint(&err, "It is not possible to use "
+  "sve-max-vq with this KVM host. Try "
+  "using only sve "
+  

[Qemu-devel] [PATCH v2 08/14] target/arm/kvm64: Fix error returns

2019-06-21 Thread Andrew Jones
A couple return -EINVAL's forgot their '-'s.

Signed-off-by: Andrew Jones 
Reviewed-by: Eric Auger 
---
 target/arm/kvm64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 45ccda589903..9ca9a0ce821d 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -856,7 +856,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 write_cpustate_to_list(cpu, true);
 
 if (!write_list_to_kvmstate(cpu, level)) {
-return EINVAL;
+return -EINVAL;
 }
 
 kvm_arm_sync_mpstate_to_kvm(cpu);
@@ -997,7 +997,7 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 
 if (!write_kvmstate_to_list(cpu)) {
-return EINVAL;
+return -EINVAL;
 }
 /* Note that it's OK to have registers which aren't in CPUState,
  * so we can ignore a failure return here.
-- 
2.20.1




[Qemu-devel] [PATCH v2 06/14] target/arm: Allow SVE to be disabled via a CPU property

2019-06-21 Thread Andrew Jones
Since 97a28b0eeac14 ("target/arm: Allow VFP and Neon to be disabled via
a CPU property") we can disable the 'max' cpu model's VFP and neon
features, but there's no way to disable SVE. Add the 'sve=on|off'
property to give it that flexibility. We also rename
cpu_max_get/set_sve_vq to cpu_max_get/set_sve_max_vq in order for them
to follow the typical *_get/set_ pattern.

Signed-off-by: Andrew Jones 
---
 target/arm/cpu.c | 10 +-
 target/arm/cpu64.c   | 72 ++--
 target/arm/helper.c  |  8 +++--
 target/arm/monitor.c |  2 +-
 tests/arm-cpu-features.c |  1 +
 5 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 858f668d226e..f08e178fc84b 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -198,7 +198,7 @@ static void arm_cpu_reset(CPUState *s)
 env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
 env->cp15.cptr_el[3] |= CPTR_EZ;
 /* with maximum vector length */
-env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
+env->vfp.zcr_el[1] = cpu->sve_max_vq ? cpu->sve_max_vq - 1 : 0;
 env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
 env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
 /*
@@ -1129,6 +1129,14 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 cpu->isar.mvfr0 = u;
 }
 
+if (!cpu->sve_max_vq) {
+uint64_t t;
+
+t = cpu->isar.id_aa64pfr0;
+t = FIELD_DP64(t, ID_AA64PFR0, SVE, 0);
+cpu->isar.id_aa64pfr0 = t;
+}
+
 if (arm_feature(env, ARM_FEATURE_M) && !cpu->has_dsp) {
 uint32_t u;
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 946994838d8a..02ada65f240c 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -257,27 +257,75 @@ static void aarch64_a72_initfn(Object *obj)
 define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
 }
 
-static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
-   void *opaque, Error **errp)
+static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
+   void *opaque, Error **errp)
 {
 ARMCPU *cpu = ARM_CPU(obj);
visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
 }
 
-static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
-   void *opaque, Error **errp)
+static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
+   void *opaque, Error **errp)
 {
 ARMCPU *cpu = ARM_CPU(obj);
 Error *err = NULL;
+uint32_t value;
 
-visit_type_uint32(v, name, &cpu->sve_max_vq, &err);
+visit_type_uint32(v, name, &value, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
 
-if (!err && (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ)) {
-error_setg(&err, "unsupported SVE vector length");
-error_append_hint(&err, "Valid sve-max-vq in range [1-%d]\n",
+if (!cpu->sve_max_vq) {
+error_setg(errp, "cannot set sve-max-vq");
+error_append_hint(errp, "SVE has been disabled with sve=off\n");
+return;
+}
+
+cpu->sve_max_vq = value;
+
+if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
+error_setg(errp, "unsupported SVE vector length");
+error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
   ARM_MAX_VQ);
 }
-error_propagate(errp, err);
+}
+
+static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+bool value = !!cpu->sve_max_vq;
+
+visit_type_bool(v, name, &value, errp);
+}
+
+static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+Error *err = NULL;
+bool value;
+
+visit_type_bool(v, name, &value, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+if (value) {
+/*
+ * We handle the -cpu ,sve=off,sve=on case by reinitializing,
+ * but otherwise we don't do anything as an sve=on could come after
+ * a sve-max-vq setting.
+ */
+if (!cpu->sve_max_vq) {
+cpu->sve_max_vq = ARM_MAX_VQ;
+}
+} else {
+cpu->sve_max_vq = 0;
+}
 }
 
 /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
@@ -373,8 +421,10 @@ static void aarch64_max_initfn(Object *obj)
 #endif
 
 cpu->sve_max_vq = ARM_MAX_VQ;
-object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_vq,
-cpu_max_set_sve_vq, NULL, NULL, &error_fatal);
+object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
+cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
+object_property_add(obj, "sve", "bool", 

[Qemu-devel] [PATCH v2 03/14] target/arm/monitor: Introduce qmp_query_cpu_model_expansion

2019-06-21 Thread Andrew Jones
Add support for the query-cpu-model-expansion QMP command to Arm. We
do this selectively, only exposing CPU properties which represent
optional CPU features which the user may want to enable/disable. Also,
for simplicity, we restrict the list of queryable cpu models to 'max',
'host', or the current type when KVM is in use, even though there
may exist KVM hosts where other types would also work. For example on a
seattle you could use 'host' for the current type, but then attempt to
query 'cortex-a57', which is also a valid CPU type to use with KVM on
seattle hosts, but that query will fail with our simplifications. This
shouldn't be an issue though as management layers and users have been
preferring the 'host' CPU type for use with KVM for quite some time.
Additionally, if the KVM-enabled QEMU instance running on a seattle
host is using the cortex-a57 CPU type, then querying 'cortex-a57' will
work. Finally, we only implement expansion type 'full', as Arm does not
yet have a "base" CPU type. Below are some example calls and results
(to save character clutter they're not in json, but are still json-ish
to give the idea)

 # expand the 'max' CPU model
 query-cpu-model-expansion: type:full, model:{ name:max }

 return: model:{ name:max, props:{ 'aarch64': true, 'pmu': true }}

 # attempt to expand the 'max' CPU model with pmu=off
 query-cpu-model-expansion:
   type:full, model:{ name:max, props:{ 'pmu': false }}

 return: model:{ name:max, props:{ 'aarch64': true, 'pmu': false }}

 # attempt to expand the 'max' CPU model with aarch64=off
 query-cpu-model-expansion:
   type:full, model:{ name:max, props:{ 'aarch64': false }}

 error: "'aarch64' feature cannot be disabled unless KVM is enabled
 and 32-bit EL1 is supported"

In the last example KVM was not in use so an error was returned.

Note1: It's possible for features to have dependencies on other
features. I.e. it may be possible to change one feature at a time
without error, but when attempting to change all features at once
an error could occur depending on the order they are processed. It's
also possible changing all at once doesn't generate an error, because
a feature's dependencies are satisfied with other features, but the
same feature cannot be changed independently without error. For these
reasons callers should always attempt to make their desired changes
all at once in order to ensure the collection is valid.

Note2: Certainly more features may be added to the list of
advertised features, e.g. 'vfp' and 'neon'. The only requirement
is that their property set accessors fail when invalid
configurations are detected. For vfp we would need something like

 set_vfp()
 {
   if (arm_feature(env, ARM_FEATURE_AARCH64) &&
   cpu->has_vfp != cpu->has_neon)
   error("AArch64 CPUs must have both VFP and Neon or neither")

in its set accessor, and the same for neon, rather than doing that
check at realize time, which isn't executed at qmp query time.

Signed-off-by: Andrew Jones 
---
 qapi/target.json |   6 +-
 target/arm/monitor.c | 132 +++
 2 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/qapi/target.json b/qapi/target.json
index 1d4d54b6002e..edfa2f82b916 100644
--- a/qapi/target.json
+++ b/qapi/target.json
@@ -408,7 +408,7 @@
 ##
 { 'struct': 'CpuModelExpansionInfo',
   'data': { 'model': 'CpuModelInfo' },
-  'if': 'defined(TARGET_S390X) || defined(TARGET_I386)' }
+  'if': 'defined(TARGET_S390X) || defined(TARGET_I386) || defined(TARGET_ARM)' }
 
 ##
 # @query-cpu-model-expansion:
@@ -433,7 +433,7 @@
 #   query-cpu-model-expansion while using these is not advised.
 #
 # Some architectures may not support all expansion types. s390x supports
-# "full" and "static".
+# "full" and "static". Arm only supports "full".
 #
 # Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models is
#  not supported, if the model cannot be expanded, if the model contains
@@ -447,7 +447,7 @@
   'data': { 'type': 'CpuModelExpansionType',
 'model': 'CpuModelInfo' },
   'returns': 'CpuModelExpansionInfo',
-  'if': 'defined(TARGET_S390X) || defined(TARGET_I386)' }
+  'if': 'defined(TARGET_S390X) || defined(TARGET_I386) || defined(TARGET_ARM)' }
 
 ##
 # @CpuDefinitionInfo:
diff --git a/target/arm/monitor.c b/target/arm/monitor.c
index 41b32b94b258..19e3120eef95 100644
--- a/target/arm/monitor.c
+++ b/target/arm/monitor.c
@@ -23,7 +23,13 @@
 #include "qemu/osdep.h"
 #include "hw/boards.h"
 #include "kvm_arm.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-commands-target.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qdict.h"
+#include "qom/qom-qobject.h"
 
 static GICCapability *gic_cap_new(int version)
 {
@@ -82,3 +88,129 @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
 
 return head;
 }
+
+static const char *cpu_model_advertised_features[] = {
+"aarch64", "pmu",
+NULL

[Qemu-devel] [PATCH v2 02/14] target/arm/cpu: Ensure we can use the pmu with kvm

2019-06-21 Thread Andrew Jones
We first convert the pmu property from a static property to one with
its own accessors. Then we use the set accessor to check if the PMU is
supported when using KVM. Indeed a 32-bit KVM host does not support
the PMU, so this check will catch an attempt to use it at property-set
time.

Signed-off-by: Andrew Jones 
---
 target/arm/cpu.c | 30 +-
 target/arm/kvm.c |  9 +
 target/arm/kvm_arm.h | 13 +
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 376db154f008..858f668d226e 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -759,10 +759,6 @@ static Property arm_cpu_has_el3_property =
 static Property arm_cpu_cfgend_property =
 DEFINE_PROP_BOOL("cfgend", ARMCPU, cfgend, false);
 
-/* use property name "pmu" to match other archs and virt tools */
-static Property arm_cpu_has_pmu_property =
-DEFINE_PROP_BOOL("pmu", ARMCPU, has_pmu, true);
-
 static Property arm_cpu_has_vfp_property =
 DEFINE_PROP_BOOL("vfp", ARMCPU, has_vfp, true);
 
@@ -785,6 +781,29 @@ static Property arm_cpu_pmsav7_dregion_property =
pmsav7_dregion,
qdev_prop_uint32, uint32_t);
 
+static bool arm_get_pmu(Object *obj, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+
+return cpu->has_pmu;
+}
+
+static void arm_set_pmu(Object *obj, bool value, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+
+if (value) {
+if (kvm_enabled() && !kvm_arm_pmu_supported(CPU(cpu))) {
+error_setg(errp, "'pmu' feature not supported by KVM on this host");
+return;
+}
+set_feature(&cpu->env, ARM_FEATURE_PMU);
+} else {
+unset_feature(&cpu->env, ARM_FEATURE_PMU);
+}
+cpu->has_pmu = value;
+}
+
 static void arm_get_init_svtor(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
 {
@@ -859,7 +878,8 @@ void arm_cpu_post_init(Object *obj)
 }
 
 if (arm_feature(>env, ARM_FEATURE_PMU)) {
-qdev_property_add_static(DEVICE(obj), &arm_cpu_has_pmu_property,
+cpu->has_pmu = true;
+object_property_add_bool(obj, "pmu", arm_get_pmu, arm_set_pmu,
 &error_abort);
 }
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index fe4f461d4ef6..69c961a4c62c 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -162,6 +162,15 @@ void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
 env->features = arm_host_cpu_features.features;
 }
 
+bool kvm_arm_pmu_supported(CPUState *cpu)
+{
+KVMState *s = KVM_STATE(current_machine->accelerator);
+int ret;
+
+ret = kvm_check_extension(s, KVM_CAP_ARM_PMU_V3);
+return ret > 0;
+}
+
 int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
 {
 KVMState *s = KVM_STATE(ms->accelerator);
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 812125f805a1..e0ded3607996 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -216,6 +216,14 @@ void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu);
  */
 bool kvm_arm_aarch32_supported(CPUState *cs);
 
+/**
+ * bool kvm_arm_pmu_supported:
+ * @cs: CPUState
+ *
+ * Returns true if the KVM VCPU can enable its PMU and false otherwise.
+ */
+bool kvm_arm_pmu_supported(CPUState *cs);
+
 /**
  * kvm_arm_get_max_vm_ipa_size - Returns the number of bits in the
  * IPA address space supported by KVM
@@ -261,6 +269,11 @@ static inline bool kvm_arm_aarch32_supported(CPUState *cs)
 return false;
 }
 
+static inline bool kvm_arm_pmu_supported(CPUState *cs)
+{
+return false;
+}
+
 static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
 {
 return -ENOENT;
-- 
2.20.1




[Qemu-devel] [PATCH v2 00/14] target/arm/kvm: enable SVE in guests

2019-06-21 Thread Andrew Jones
Since Linux kernel v5.2-rc1 KVM has support for enabling SVE in guests.
This series provides the QEMU bits for that enablement. This is a v2
series; however, it looks completely different from v1. Thank you to
all who reviewed v1. I've included all input still relevant to this
new approach. And the new approach is thanks to Igor for suggesting
it. The new approach is to use a CPU property for each vector length
and then implement the preexisting qmp_query_cpu_model_expansion
query for Arm to expose them. Here's how the series goes:

First, we select existing CPU properties representing features we
want to advertise in addition to the SVE vector lengths and prepare
them for the qmp query. Then we introduce the qmp query, applying
it immediately to those selected features. We next add a qtest for
the selected CPU features that uses the qmp query for its tests - and
we continue to add tests as we add CPU features with the following
patches. So then, once we have the support we need for CPU feature
querying and testing, we add our first SVE CPU feature property, sve,
which just allows SVE to be completely enabled/disabled. Following
that feature property, we add all 16 vector length properties along
with the input validation they need and tests to prove the validation
works. At this point the SVE features are still only for TCG, so we
provide some patches to prepare for KVM and then a patch that allows
the 'max' CPU type to enable SVE with KVM, but at first without
vector length properties. After a bit more preparation we add the
SVE vector length properties to the KVM-enabled 'max' CPU type along
with the additional input validation and tests that that needs.
Finally we allow the 'host' CPU type to also enjoy these properties
by simply sharing them with it.

Phew, I think that's everything.

Thanks!
drew

Andrew Jones (14):
  target/arm/cpu64: Ensure kvm really supports aarch64=off
  target/arm/cpu: Ensure we can use the pmu with kvm
  target/arm/monitor: Introduce qmp_query_cpu_model_expansion
  tests: arm: Introduce cpu feature tests
  target/arm/helper: zcr: Add build bug next to value range assumption
  target/arm: Allow SVE to be disabled via a CPU property
  target/arm/cpu64: max cpu: Introduce sve properties
  target/arm/kvm64: Fix error returns
  target/arm/kvm64: Move the get/put of fpsimd registers out
  target/arm/kvm64: Add kvm_arch_get/put_sve
  target/arm/kvm64: max cpu: Enable SVE when available
  target/arm/kvm: scratch vcpu: Preserve input kvm_vcpu_init features
  target/arm/cpu64: max cpu: Support sve properties with KVM
  target/arm/kvm: host cpu: Add support for sve properties

 qapi/target.json |   6 +-
 target/arm/cpu.c |  47 +++-
 target/arm/cpu.h |  17 ++
 target/arm/cpu64.c   | 548 +--
 target/arm/helper.c  |  20 +-
 target/arm/kvm.c |  34 ++-
 target/arm/kvm32.c   |   6 +-
 target/arm/kvm64.c   | 428 +-
 target/arm/kvm_arm.h |  73 ++
 target/arm/monitor.c | 148 +++
 tests/Makefile.include   |   5 +-
 tests/arm-cpu-features.c | 509 
 12 files changed, 1738 insertions(+), 103 deletions(-)
 create mode 100644 tests/arm-cpu-features.c

-- 
2.20.1




[Qemu-devel] [PATCH v2 05/14] target/arm/helper: zcr: Add build bug next to value range assumption

2019-06-21 Thread Andrew Jones
Suggested-by: Dave Martin 
Signed-off-by: Andrew Jones 
---
 target/arm/helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index df4276f5f6ca..edba94004e0b 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5319,6 +5319,7 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 int new_len;
 
 /* Bits other than [3:0] are RAZ/WI.  */
+QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
 raw_write(env, ri, value & 0xf);
 
 /*
-- 
2.20.1




[Qemu-devel] [PATCH v2 04/14] tests: arm: Introduce cpu feature tests

2019-06-21 Thread Andrew Jones
Now that Arm CPUs have advertised features, let's add tests to ensure
we maintain their expected availability with and without KVM.

Signed-off-by: Andrew Jones 
---
 tests/Makefile.include   |   5 +-
 tests/arm-cpu-features.c | 221 +++
 2 files changed, 225 insertions(+), 1 deletion(-)
 create mode 100644 tests/arm-cpu-features.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index db750dd6d09b..d5f43fe03067 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -255,13 +255,15 @@ check-qtest-sparc64-$(CONFIG_ISA_TESTDEV) = tests/endianness-test$(EXESUF)
 check-qtest-sparc64-y += tests/prom-env-test$(EXESUF)
 check-qtest-sparc64-y += tests/boot-serial-test$(EXESUF)
 
+check-qtest-arm-y += tests/arm-cpu-features$(EXESUF)
 check-qtest-arm-y += tests/microbit-test$(EXESUF)
 check-qtest-arm-y += tests/m25p80-test$(EXESUF)
 check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF)
 check-qtest-arm-y += tests/boot-serial-test$(EXESUF)
 check-qtest-arm-y += tests/hexloader-test$(EXESUF)
 
-check-qtest-aarch64-y = tests/numa-test$(EXESUF)
+check-qtest-aarch64-y += tests/arm-cpu-features$(EXESUF)
+check-qtest-aarch64-y += tests/numa-test$(EXESUF)
 check-qtest-aarch64-y += tests/boot-serial-test$(EXESUF)
 check-qtest-aarch64-y += tests/migration-test$(EXESUF)
 # TODO: once aarch64 TCG is fixed on ARM 32 bit host, make test unconditional
@@ -822,6 +824,7 @@ tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
 tests/numa-test$(EXESUF): tests/numa-test.o
 tests/vmgenid-test$(EXESUF): tests/vmgenid-test.o tests/boot-sector.o tests/acpi-utils.o
 tests/cdrom-test$(EXESUF): tests/cdrom-test.o tests/boot-sector.o $(libqos-obj-y)
+tests/arm-cpu-features$(EXESUF): tests/arm-cpu-features.o
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@")
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
new file mode 100644
index ..31b1c15bb979
--- /dev/null
+++ b/tests/arm-cpu-features.c
@@ -0,0 +1,221 @@
+/*
+ * Arm CPU feature test cases
+ *
+ * Copyright (c) 2019 Red Hat Inc.
+ * Authors:
+ *  Andrew Jones 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qjson.h"
+
+#define MACHINE"-machine virt,gic-version=max "
+#define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
+ "'arguments': { 'type': 'full', "
+#define QUERY_TAIL "}}"
+
+static QDict *do_query_no_props(QTestState *qts, const char *cpu_type)
+{
+return qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s }"
+  QUERY_TAIL, cpu_type);
+}
+
+static const char *resp_get_error(QDict *resp)
+{
+QDict *qdict;
+
+g_assert(resp);
+qdict = qdict_get_qdict(resp, "error");
+if (qdict) {
+return qdict_get_str(qdict, "desc");
+}
+return NULL;
+}
+
+static char *get_error(QTestState *qts, const char *cpu_type,
+   const char *fmt, ...)
+{
+QDict *resp;
+char *error;
+
+if (fmt) {
+QDict *args;
+va_list ap;
+
+va_start(ap, fmt);
+args = qdict_from_vjsonf_nofail(fmt, ap);
+va_end(ap);
+
+resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
+"'props': %p }"
+  QUERY_TAIL, cpu_type, args);
+} else {
+resp = do_query_no_props(qts, cpu_type);
+}
+
+g_assert(resp);
+error = g_strdup(resp_get_error(resp));
+qobject_unref(resp);
+
+return error;
+}
+
+#define assert_error(qts, cpu_type, expected_error, fmt, ...)  \
+({ \
+char *_error = get_error(qts, cpu_type, fmt, ##__VA_ARGS__);   \
+g_assert(_error);  \
+g_assert(g_str_equal(_error, expected_error)); \
+g_free(_error);\
+})
+
+static QDict *resp_get_props(QDict *resp)
+{
+QDict *qdict;
+
+g_assert(resp);
+g_assert(qdict_haskey(resp, "return"));
+qdict = qdict_get_qdict(resp, "return");
+g_assert(qdict_haskey(qdict, "model"));
+qdict = qdict_get_qdict(qdict, "model");
+g_assert(qdict_haskey(qdict, "props"));
+qdict = qdict_get_qdict(qdict, "props");
+return qdict;
+}
+
+#define assert_has_feature(qts, cpu_type, feature) \
+({ \
+QDict *_resp = do_query_no_props(qts, cpu_type);   \
+g_assert(_resp);   \
+g_assert(qdict_get(resp_get_props(_resp), 

[Qemu-devel] [PATCH v2 01/14] target/arm/cpu64: Ensure kvm really supports aarch64=off

2019-06-21 Thread Andrew Jones
If -cpu ,aarch64=off is used then KVM must also be used, and it
and the host must support running the vcpu in 32-bit mode. Also, if
-cpu ,aarch64=on is used, then it doesn't matter if kvm is
enabled or not.

Signed-off-by: Andrew Jones 
---
 target/arm/cpu64.c   | 12 ++--
 target/arm/kvm64.c   | 11 +++
 target/arm/kvm_arm.h | 14 ++
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 1901997a0645..946994838d8a 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -407,13 +407,13 @@ static void aarch64_cpu_set_aarch64(Object *obj, bool value, Error **errp)
  * restriction allows us to avoid fixing up functionality that assumes a
  * uniform execution state like do_interrupt.
  */
-if (!kvm_enabled()) {
-error_setg(errp, "'aarch64' feature cannot be disabled "
- "unless KVM is enabled");
-return;
-}
-
 if (value == false) {
+if (!kvm_enabled() || !kvm_arm_aarch32_supported(CPU(cpu))) {
+error_setg(errp, "'aarch64' feature cannot be disabled "
+ "unless KVM is enabled and 32-bit EL1 "
+ "is supported");
+return;
+}
 unset_feature(&cpu->env, ARM_FEATURE_AARCH64);
 } else {
 set_feature(&cpu->env, ARM_FEATURE_AARCH64);
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 22d19c9aec6f..45ccda589903 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -24,7 +24,9 @@
 #include "exec/gdbstub.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
 #include "kvm_arm.h"
+#include "hw/boards.h"
 #include "internals.h"
 
 static bool have_guest_debug;
@@ -593,6 +595,15 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 return true;
 }
 
+bool kvm_arm_aarch32_supported(CPUState *cpu)
+{
+KVMState *s = KVM_STATE(current_machine->accelerator);
+int ret;
+
+ret = kvm_check_extension(s, KVM_CAP_ARM_EL1_32BIT);
+return ret > 0;
+}
+
 #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
 
 int kvm_arch_init_vcpu(CPUState *cs)
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 2a07333c615f..812125f805a1 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -207,6 +207,15 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
  */
 void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu);
 
+/**
+ * kvm_arm_aarch32_supported:
+ * @cs: CPUState
+ *
+ * Returns true if the KVM VCPU can enable AArch32 mode and false
+ * otherwise.
+ */
+bool kvm_arm_aarch32_supported(CPUState *cs);
+
 /**
  * kvm_arm_get_max_vm_ipa_size - Returns the number of bits in the
  * IPA address space supported by KVM
@@ -247,6 +256,11 @@ static inline void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
 cpu->host_cpu_probe_failed = true;
 }
 
+static inline bool kvm_arm_aarch32_supported(CPUState *cs)
+{
+return false;
+}
+
 static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
 {
 return -ENOENT;
-- 
2.20.1




Re: [Qemu-devel] [PATCH 2/4] libvhost-user: support many virtqueues

2019-06-21 Thread Stefan Hajnoczi
On Fri, Jun 21, 2019 at 03:48:36PM +0200, Marc-André Lureau wrote:
> On Fri, Jun 21, 2019 at 11:40 AM Stefan Hajnoczi  wrote:
> > diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
> > b/contrib/vhost-user-blk/vhost-user-blk.c
> > index 86a3987744..ae61034656 100644
> > --- a/contrib/vhost-user-blk/vhost-user-blk.c
> > +++ b/contrib/vhost-user-blk/vhost-user-blk.c
> > @@ -25,6 +25,10 @@
> >  #include 
> >  #endif
> >
> > +enum {
> > +VHOST_USER_BLK_MAX_QUEUES = 8,
> > +};
> 
> why do you use an enum (and not const int)? (similarly for other devices)
> 
> other than that,
> Reviewed-by: Marc-André Lureau 

This is how I was taught when I was a little boy.

With an actual variable there's a risk that the compiler reserves space
for a variable when you actually just need a constant.  Whether modern
compilers do that or not, I don't know.

The type is clearer when a variable is used instead of an enum.

Pros and cons...

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v3 1/6] armv7m: Allow entry information to be returned

2019-06-21 Thread Philippe Mathieu-Daudé
On 6/19/19 6:53 AM, Alistair Francis wrote:
> Allow the kernel's entry point information to be returned when loading a
> kernel.
> 
> Signed-off-by: Alistair Francis 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

> ---
>  hw/arm/armv7m.c   | 4 +++-
>  include/hw/arm/boot.h | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
> index b9efad6bac..8ee6291a47 100644
> --- a/hw/arm/armv7m.c
> +++ b/hw/arm/armv7m.c
> @@ -304,7 +304,7 @@ static void armv7m_reset(void *opaque)
>  cpu_reset(CPU(cpu));
>  }
>  
> -void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size)
> +uint64_t armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size)
>  {
>  int image_size;
>  uint64_t entry;
> @@ -351,6 +351,8 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
> *kernel_filename, int mem_size)
>   * board must call this function!
>   */
>  qemu_register_reset(armv7m_reset, cpu);
> +
> +return entry;
>  }
>  
>  static Property bitband_properties[] = {
> diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
> index c48cc4c2bc..4e4db0416c 100644
> --- a/include/hw/arm/boot.h
> +++ b/include/hw/arm/boot.h
> @@ -29,11 +29,13 @@ typedef enum {
>   * @kernel_filename: file to load
>   * @mem_size: mem_size: maximum image size to load
>   *
> + * returns: location of the kernel's entry point
> + *
>   * Load the guest image for an ARMv7M system. This must be called by
>   * any ARMv7M board. (This is necessary to ensure that the CPU resets
>   * correctly on system reset, as well as for kernel loading.)
>   */
> -void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size);
> +uint64_t armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size);
>  
>  /* arm_boot.c */
>  struct arm_boot_info {
> 



[Qemu-devel] [PATCH] ioapic: use irq number instead of vector in ioapic_eoi_broadcast

2019-06-21 Thread Li Qiang
When emulating irqchip in qemu, such as following command:

x86_64-softmmu/qemu-system-x86_64 -m 1024 -smp 4 -hda /home/test/test.img
-machine kernel-irqchip=off --enable-kvm -vnc :0 -device edu -monitor stdio

We will get a crash with following asan output:

(qemu) /home/test/qemu5/qemu/hw/intc/ioapic.c:266:27: runtime error: index 35 
out of bounds for type 'int [24]'
=
==113504==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x61b03114 at pc 0x5579e3c7a80f bp 0x7fd004bf8c10 sp 0x7fd004bf8c00
WRITE of size 4 at 0x61b03114 thread T4
#0 0x5579e3c7a80e in ioapic_eoi_broadcast 
/home/test/qemu5/qemu/hw/intc/ioapic.c:266
#1 0x5579e3c6f480 in apic_eoi /home/test/qemu5/qemu/hw/intc/apic.c:428
#2 0x5579e3c720a7 in apic_mem_write /home/test/qemu5/qemu/hw/intc/apic.c:802
#3 0x5579e3b1e31a in memory_region_write_accessor 
/home/test/qemu5/qemu/memory.c:503
#4 0x5579e3b1e6a2 in access_with_adjusted_size 
/home/test/qemu5/qemu/memory.c:569
#5 0x5579e3b28d77 in memory_region_dispatch_write 
/home/test/qemu5/qemu/memory.c:1497
#6 0x5579e3a1b36b in flatview_write_continue 
/home/test/qemu5/qemu/exec.c:3323
#7 0x5579e3a1b633 in flatview_write /home/test/qemu5/qemu/exec.c:3362
#8 0x5579e3a1bcb1 in address_space_write /home/test/qemu5/qemu/exec.c:3452
#9 0x5579e3a1bd03 in address_space_rw /home/test/qemu5/qemu/exec.c:3463
#10 0x5579e3b8b979 in kvm_cpu_exec 
/home/test/qemu5/qemu/accel/kvm/kvm-all.c:2045
#11 0x5579e3ae4499 in qemu_kvm_cpu_thread_fn 
/home/test/qemu5/qemu/cpus.c:1287
#12 0x5579e4cbdb9f in qemu_thread_start util/qemu-thread-posix.c:502
#13 0x7fd0146376da in start_thread 
(/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#14 0x7fd01436088e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e

This is because the ioapic_eoi_broadcast() function uses 'vector' to
index 's->irq_eoi'. To fix this, we should use the irq number instead.


Signed-off-by: Li Qiang 
---
 hw/intc/ioapic.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 7074489fdf..711775cc6f 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -245,8 +245,8 @@ void ioapic_eoi_broadcast(int vector)
 s->ioredtbl[n] = entry & ~IOAPIC_LVT_REMOTE_IRR;
 
 if (!(entry & IOAPIC_LVT_MASKED) && (s->irr & (1 << n))) {
-++s->irq_eoi[vector];
-if (s->irq_eoi[vector] >= SUCCESSIVE_IRQ_MAX_COUNT) {
+++s->irq_eoi[n];
+if (s->irq_eoi[n] >= SUCCESSIVE_IRQ_MAX_COUNT) {
 /*
  * Real hardware does not deliver the interrupt immediately
  * during eoi broadcast, and this lets a buggy guest make
@@ -254,16 +254,16 @@ void ioapic_eoi_broadcast(int vector)
  * level-triggered interrupt. Emulate this behavior if we
  * detect an interrupt storm.
  */
-s->irq_eoi[vector] = 0;
+s->irq_eoi[n] = 0;
 timer_mod_anticipate(s->delayed_ioapic_service_timer,
  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
  NANOSECONDS_PER_SECOND / 100);
-trace_ioapic_eoi_delayed_reassert(vector);
+trace_ioapic_eoi_delayed_reassert(n);
 } else {
 ioapic_service(s);
 }
 } else {
-s->irq_eoi[vector] = 0;
+s->irq_eoi[n] = 0;
 }
 }
 }
-- 
2.17.1





Re: [Qemu-devel] [PULL 20/25] target/i386: kvm: Add support for save and restore nested state

2019-06-21 Thread Liran Alon



> On 21 Jun 2019, at 18:44, Liran Alon  wrote:
> 
> 
> 
>> On 21 Jun 2019, at 18:39, Paolo Bonzini  wrote:
>> 
>> On 21/06/19 17:00, Liran Alon wrote:
>>> Cool.
>>> Are you planning to make those changes when applying/merging or
>>> do you need me to submit a new patch-series version?
>>> Also note my comment on the other patch regarding block migration on an
>>> AMD vCPU which exposes SVM.
>> 
>> It's already merged, but it's not a big deal since it's only AMD.  We
>> can follow up before release.
>> 
>> Paolo
> 
> Ok then. We at least now have nVMX migration working in QEMU! :)
> I will just submit additional separate patches on top of QEMU master.
> 
> Thanks,
> -Liran
> 

Oh the applied patch-series is not very nice actually…
It seems that some of the commits cannot even compile, such as "target/i386:
kvm: Block migration for vCPUs exposed with nested virtualization".
You have removed cpu_has_nested_virt(env) from that commit even though it is 
used…
But as the author of the commit I will be blamed for this broken bisection :(
LOL. Oh well… Mistakes happen. :)

-Liran




Re: [Qemu-devel] [PULL 20/25] target/i386: kvm: Add support for save and restore nested state

2019-06-21 Thread Liran Alon



> On 21 Jun 2019, at 18:39, Paolo Bonzini  wrote:
> 
> On 21/06/19 17:00, Liran Alon wrote:
>> Cool.
>> Are you planning to make those changes when applying/merging or
>> do you need me to submit a new patch-series version?
>> Also note my comment on the other patch regarding block migration on an
>> AMD vCPU which exposes SVM.
> 
> It's already merged, but it's not a big deal since it's only AMD.  We
> can follow up before release.
> 
> Paolo

Ok then. We at least now have nVMX migration working in QEMU! :)
I will just submit additional separate patches on top of QEMU master.

Thanks,
-Liran




Re: [Qemu-devel] [PULL 20/25] target/i386: kvm: Add support for save and restore nested state

2019-06-21 Thread Paolo Bonzini
On 21/06/19 17:00, Liran Alon wrote:
> Cool.
> Are you planning to make those changes when applying/merging or
> do you need me to submit a new patch-series version?
> Also note my comment on the other patch regarding block migration on an
> AMD vCPU which exposes SVM.

It's already merged, but it's not a big deal since it's only AMD.  We
can follow up before release.

Paolo



[Qemu-devel] [RFC PATCH] tests/acceptance: Handle machine type for ARM target

2019-06-21 Thread Wainer dos Santos Moschetta
Hi all,

I'm still unsure this is the best solution. I tend to think that
any arch-independent test case (i.e. not tagged 'arch') should
be skipped on all arches except for x86_64. Opening up for
discussion though.

Note: It was decided that ARM targets should not default to any
machine type: https://www.mail-archive.com/qemu-devel@nongnu.org/msg625999.html

-- 8< --
Some tests are meant to be arch-independent and as such they don't set
the machine type (i.e. they rely on defaults) on launched VMs. The arm
targets, however, don't provide any default machine, so these tests fail.

This patch adds a logic on the base Test class so that machine type
is set to 'virt' when:
   a) The test case doesn't have arch:aarch64 or arch:arm tag. Here
  I assume that if the test was tagged for a specific arch then
  the writer took care of setting a machine type.
   b) The target binary arch is any of aarch64 or arm. Note:
  self.target_arch can end up None if qemu_bin is passed by
  Avocado parameter and the filename doesn't match expected
  format. In this case the test will fail.

Signed-off-by: Wainer dos Santos Moschetta 
---
 tests/acceptance/avocado_qemu/__init__.py | 12 
 1 file changed, 12 insertions(+)

diff --git a/tests/acceptance/avocado_qemu/__init__.py 
b/tests/acceptance/avocado_qemu/__init__.py
index 2b236a1cf0..fb3e0dc2bc 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -9,6 +9,7 @@
 # later.  See the COPYING file in the top-level directory.
 
 import os
+import re
 import sys
 import uuid
 
@@ -65,10 +66,21 @@ class Test(avocado.Test):
 if self.qemu_bin is None:
 self.cancel("No QEMU binary defined or found in the source tree")
 
+m = re.match('qemu-system-(.*)', self.qemu_bin.split('/').pop())
+if m:
+self.target_arch = m.group(1)
+else:
+self.target_arch = None
+
 def _new_vm(self, *args):
 vm = QEMUMachine(self.qemu_bin)
 if args:
 vm.add_args(*args)
+# Handle lack of default machine type on some targets.
+# Assume that arch tagged tests have machine type set properly.
+if self.tags.get('arch') is None and \
+   self.target_arch in ('aarch64', 'arm'):
+vm.set_machine('virt')
 return vm
 
 @property
-- 
2.18.1




Re: [Qemu-devel] [PATCH v2] blockjob: drain all job nodes in block_job_drain

2019-06-21 Thread Vladimir Sementsov-Ogievskiy
21.06.2019 18:15, Vladimir Sementsov-Ogievskiy wrote:
> Instead of draining additional nodes in each job code, let's do it in
> common block_job_drain, draining just all job's children.
> BlockJobDriver.drain becomes unused, so, drop it at all.
> 
> It's also a first step to finally get rid of blockjob->blk.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy

Ugh, double sending again, I feel like an idiot :(

Certainly, I've just pressed Ctrl+R and fixed the last argument to v2, sorry.

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v4 08/13] vfio: Add save state functions to SaveVMHandlers

2019-06-21 Thread Alex Williamson
On Fri, 21 Jun 2019 12:08:26 +0530
Kirti Wankhede  wrote:

> On 6/21/2019 12:55 AM, Alex Williamson wrote:
> > On Thu, 20 Jun 2019 20:07:36 +0530
> > Kirti Wankhede  wrote:
> >   
> >> Added .save_live_pending, .save_live_iterate and 
> >> .save_live_complete_precopy
> >> functions. These functions handles pre-copy and stop-and-copy phase.
> >>
> >> In _SAVING|_RUNNING device state or pre-copy phase:
> >> - read pending_bytes
> >> - read data_offset - indicates kernel driver to write data to staging
> >>   buffer which is mmapped.  
> > 
> > Why is data_offset the trigger rather than data_size?  It seems that
> > data_offset can't really change dynamically since it might be mmap'd,
> > so it seems unnatural to bother re-reading it.
> >   
> 
> The vendor driver can change data_offset; it can have a different
> data_offset for device data and for the dirty pages bitmap.
> 
> >> - read data_size - amount of data in bytes written by vendor driver in 
> >> migration
> >>   region.
> >> - if data section is trapped, pread() number of bytes in data_size, from
> >>   data_offset.
> >> - if data section is mmaped, read mmaped buffer of size data_size.
> >> - Write data packet to file stream as below:
> >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
> >> VFIO_MIG_FLAG_END_OF_STATE }
> >>
> >> In _SAVING device state or stop-and-copy phase
> >> a. read config space of device and save to migration file stream. This
> >>doesn't need to be from vendor driver. Any other special config state
> >>from driver can be saved as data in following iteration.
> >> b. read pending_bytes - indicates kernel driver to write data to staging
> >>buffer which is mmapped.  
> > 
> > Is it pending_bytes or data_offset that triggers the write out of
> > data?  Why pending_bytes vs data_size?  I was interpreting
> > pending_bytes as the total data size while data_size is the size
> > available to read now, so assumed data_size would be more closely
> > aligned to making the data available.
> >   
> 
> Sorry, that's my mistake while editing, its read data_offset as in above
> case.
> 
> >> c. read data_size - amount of data in bytes written by vendor driver in
> >>migration region.
> >> d. if data section is trapped, pread() from data_offset of size data_size.
> >> e. if data section is mmaped, read mmaped buffer of size data_size.  
> > 
> > Should this read as "pread() from data_offset of data_size, or
> > optionally if mmap is supported on the data area, read data_size from
> > start of mapped buffer"?  IOW, pread should always work.  Same in
> > previous section.
> >   
> 
> ok. I'll update.
> 
> >> f. Write data packet as below:
> >>{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
> >> g. iterate through steps b to f until (pending_bytes > 0)  
> > 
> > s/until/while/  
> 
> Ok.
> 
> >   
> >> h. Write {VFIO_MIG_FLAG_END_OF_STATE}
> >>
> >> .save_live_iterate runs outside the iothread lock in the migration case, 
> >> which
> >> could race with asynchronous call to get dirty page list causing data 
> >> corruption
> >> in mapped migration region. Mutex added here to serial migration buffer 
> >> read
> >> operation.  
> > 
> > Would we be ahead to use different offsets within the region for device
> > data vs dirty bitmap to avoid this?
> >  
> 
> Lock will still be required to serialize the read/write operations on
> vfio_device_migration_info structure in the region.
> 
> 
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  hw/vfio/migration.c | 212 
> >> 
> >>  1 file changed, 212 insertions(+)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index fe0887c27664..0a2f30872316 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -107,6 +107,111 @@ static int vfio_migration_set_state(VFIODevice 
> >> *vbasedev, uint32_t state)
> >>  return 0;
> >>  }
> >>  
> >> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
> >> +{
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +VFIORegion *region = &migration->region.buffer;
> >> +uint64_t data_offset = 0, data_size = 0;
> >> +int ret;
> >> +
> >> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> >> +region->fd_offset + offsetof(struct vfio_device_migration_info,
> >> + data_offset));
> >> +if (ret != sizeof(data_offset)) {
> >> +error_report("Failed to get migration buffer data offset %d",
> >> + ret);
> >> +return -EINVAL;
> >> +}
> >> +
> >> +ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
> >> +region->fd_offset + offsetof(struct vfio_device_migration_info,
> >> + data_size));
> >> +if (ret != sizeof(data_size)) {
> >> +error_report("Failed to get migration buffer data size %d",
> >> + 

[Qemu-devel] [PATCH v2] blockjob: drain all job nodes in block_job_drain

2019-06-21 Thread Vladimir Sementsov-Ogievskiy
Instead of draining additional nodes in each job code, let's do it in
common block_job_drain, draining just all job's children.
BlockJobDriver.drain becomes unused, so, drop it at all.

It's also a first step to finally get rid of blockjob->blk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

v2: apply Max's suggestions:
 - drop BlockJobDriver.drain
 - first do a loop of bdrv_drained_begin() and then a separate loop
   of bdrv_drained_end().

   Hmm, a question here: should I call bdrv_drained_end in reverse
   order? Or is it OK as is?

 include/block/blockjob_int.h | 11 ---
 block/backup.c   | 18 +-
 block/mirror.c   | 26 +++---
 blockjob.c   | 13 -
 4 files changed, 12 insertions(+), 56 deletions(-)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index e4a318dd15..e1abf4ee85 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -52,17 +52,6 @@ struct BlockJobDriver {
  * besides job->blk to the new AioContext.
  */
 void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
-
-/*
- * If the callback is not NULL, it will be invoked when the job has to be
- * synchronously cancelled or completed; it should drain BlockDriverStates
- * as required to ensure progress.
- *
- * Block jobs must use the default implementation for job_driver.drain,
- * which will in turn call this callback after doing generic block job
- * stuff.
- */
-void (*drain)(BlockJob *job);
 };
 
 /**
diff --git a/block/backup.c b/block/backup.c
index 715e1d3be8..7930004bbd 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -320,21 +320,6 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
 }
 
-static void backup_drain(BlockJob *job)
-{
-BackupBlockJob *s = container_of(job, BackupBlockJob, common);
-
-/* Need to keep a reference in case blk_drain triggers execution
- * of backup_complete...
- */
-if (s->target) {
-BlockBackend *target = s->target;
-blk_ref(target);
-blk_drain(target);
-blk_unref(target);
-}
-}
-
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
 bool read, int error)
 {
@@ -493,8 +478,7 @@ static const BlockJobDriver backup_job_driver = {
 .commit = backup_commit,
 .abort  = backup_abort,
 .clean  = backup_clean,
-},
-.drain  = backup_drain,
+}
 };
 
 static int64_t backup_calculate_cluster_size(BlockDriverState *target,
diff --git a/block/mirror.c b/block/mirror.c
index d17be4cdbc..6bea99558f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -644,14 +644,11 @@ static int mirror_exit_common(Job *job)
 bdrv_ref(mirror_top_bs);
 bdrv_ref(target_bs);
 
-/* Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
+/*
+ * Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
  * inserting target_bs at s->to_replace, where we might not be able to get
  * these permissions.
- *
- * Note that blk_unref() alone doesn't necessarily drop permissions because
- * we might be running nested inside mirror_drain(), which takes an extra
- * reference, so use an explicit blk_set_perm() first. */
-blk_set_perm(s->target, 0, BLK_PERM_ALL, &error_abort);
+ */
 blk_unref(s->target);
 s->target = NULL;
 
@@ -1143,21 +1140,6 @@ static bool mirror_drained_poll(BlockJob *job)
 return !!s->in_flight;
 }
 
-static void mirror_drain(BlockJob *job)
-{
-MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
-
-/* Need to keep a reference in case blk_drain triggers execution
- * of mirror_complete...
- */
-if (s->target) {
-BlockBackend *target = s->target;
-blk_ref(target);
-blk_drain(target);
-blk_unref(target);
-}
-}
-
 static const BlockJobDriver mirror_job_driver = {
 .job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
@@ -1172,7 +1154,6 @@ static const BlockJobDriver mirror_job_driver = {
 .complete   = mirror_complete,
 },
 .drained_poll   = mirror_drained_poll,
-.drain  = mirror_drain,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
@@ -1189,7 +1170,6 @@ static const BlockJobDriver commit_active_job_driver = {
 .complete   = mirror_complete,
 },
 .drained_poll   = mirror_drained_poll,
-.drain  = mirror_drain,
 };
 
 static void coroutine_fn
diff --git a/blockjob.c b/blockjob.c
index 458ae76f51..059dc199ba 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -92,12 +92,15 @@ void block_job_free(Job *job)
 void block_job_drain(Job *job)
 {
 BlockJob *bjob = container_of(job, BlockJob, job);

[Qemu-devel] [PATCH] blockjob: drain all job nodes in block_job_drain

2019-06-21 Thread Vladimir Sementsov-Ogievskiy
Instead of draining additional nodes in each job code, let's do it in
common block_job_drain, draining just all job's children.

It's also a first step to finally get rid of blockjob->blk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all!

As a follow-up to the recently merged "block: drop bs->job", I'm now trying
to drop the BlockJob.blk pointer: jobs really work with several nodes, there
is no reason to keep a special blk for one of the children, and no reason to
handle nodes differently in, for example, the backup code.

And as a first step I need to sort out block_job_drain, and here is my
suggestion on it.

 block/backup.c | 18 +-
 block/mirror.c | 26 +++---
 blockjob.c |  7 ++-
 3 files changed, 10 insertions(+), 41 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 715e1d3be8..7930004bbd 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -320,21 +320,6 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
 }
 
-static void backup_drain(BlockJob *job)
-{
-BackupBlockJob *s = container_of(job, BackupBlockJob, common);
-
-/* Need to keep a reference in case blk_drain triggers execution
- * of backup_complete...
- */
-if (s->target) {
-BlockBackend *target = s->target;
-blk_ref(target);
-blk_drain(target);
-blk_unref(target);
-}
-}
-
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
 bool read, int error)
 {
@@ -493,8 +478,7 @@ static const BlockJobDriver backup_job_driver = {
 .commit = backup_commit,
 .abort  = backup_abort,
 .clean  = backup_clean,
-},
-.drain  = backup_drain,
+}
 };
 
 static int64_t backup_calculate_cluster_size(BlockDriverState *target,
diff --git a/block/mirror.c b/block/mirror.c
index d17be4cdbc..6bea99558f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -644,14 +644,11 @@ static int mirror_exit_common(Job *job)
 bdrv_ref(mirror_top_bs);
 bdrv_ref(target_bs);
 
-/* Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
+/*
+ * Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
  * inserting target_bs at s->to_replace, where we might not be able to get
  * these permissions.
- *
- * Note that blk_unref() alone doesn't necessarily drop permissions because
- * we might be running nested inside mirror_drain(), which takes an extra
- * reference, so use an explicit blk_set_perm() first. */
-blk_set_perm(s->target, 0, BLK_PERM_ALL, &error_abort);
+ */
 blk_unref(s->target);
 s->target = NULL;
 
@@ -1143,21 +1140,6 @@ static bool mirror_drained_poll(BlockJob *job)
 return !!s->in_flight;
 }
 
-static void mirror_drain(BlockJob *job)
-{
-MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
-
-/* Need to keep a reference in case blk_drain triggers execution
- * of mirror_complete...
- */
-if (s->target) {
-BlockBackend *target = s->target;
-blk_ref(target);
-blk_drain(target);
-blk_unref(target);
-}
-}
-
 static const BlockJobDriver mirror_job_driver = {
 .job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
@@ -1172,7 +1154,6 @@ static const BlockJobDriver mirror_job_driver = {
 .complete   = mirror_complete,
 },
 .drained_poll   = mirror_drained_poll,
-.drain  = mirror_drain,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
@@ -1189,7 +1170,6 @@ static const BlockJobDriver commit_active_job_driver = {
 .complete   = mirror_complete,
 },
 .drained_poll   = mirror_drained_poll,
-.drain  = mirror_drain,
 };
 
 static void coroutine_fn
diff --git a/blockjob.c b/blockjob.c
index 458ae76f51..0cabdc867d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -94,8 +94,13 @@ void block_job_drain(Job *job)
 BlockJob *bjob = container_of(job, BlockJob, job);
 const JobDriver *drv = job->driver;
 BlockJobDriver *bjdrv = container_of(drv, BlockJobDriver, job_driver);
+GSList *l;
+
+for (l = bjob->nodes; l; l = l->next) {
+BdrvChild *c = l->data;
+bdrv_drain(c->bs);
+}
 
-blk_drain(bjob->blk);
 if (bjdrv->drain) {
 bjdrv->drain(bjob);
 }
-- 
2.18.0




Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities

2019-06-21 Thread Liran Alon



> On 21 Jun 2019, at 18:02, Paolo Bonzini  wrote:
> 
> On 21/06/19 14:39, Liran Alon wrote:
>>> On 21 Jun 2019, at 14:30, Paolo Bonzini  wrote:
>>> 
>>> From: Liran Alon 
>>> 
>>> Previous commits have added support for migration of nested virtualization
>>> workloads. This was done by utilising two new KVM capabilities:
>>> KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD. Both which are
>>> required in order to correctly migrate such workloads.
>>> 
>>> Therefore, change code to add a migration blocker for vCPUs exposed with
>>> Intel VMX or AMD SVM in case one of these kernel capabilities is
>>> missing.
>>> 
>>> Signed-off-by: Liran Alon 
>>> Reviewed-by: Maran Wilson 
>>> Message-Id: <20190619162140.133674-11-liran.a...@oracle.com>
>>> Signed-off-by: Paolo Bonzini 
>>> ---
>>> target/i386/kvm.c | 9 +++--
>>> target/i386/machine.c | 2 +-
>>> 2 files changed, 8 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>>> index c931e9d..e4b4f57 100644
>>> --- a/target/i386/kvm.c
>>> +++ b/target/i386/kvm.c
>>> @@ -1640,9 +1640,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>  !!(c->ecx & CPUID_EXT_SMX);
>>>}
>>> 
>>> -if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) {
>>> +if (cpu_has_vmx(env) && !nested_virt_mig_blocker &&
>> 
>> The change here from cpu_has_nested_virt(env) to cpu_has_vmx(env) is not 
>> clear.
>> We should explicitly explain that it’s because we still wish to preserve
>> backwards-compatibility for migrating an AMD vCPU exposed with SVM.
>> 
>> In addition, commit ("target/i386: kvm: Block migration for vCPUs exposed 
>> with nested virtualization")
>> doesn’t make sense in case we still want to allow migrating AMD vCPU exposed 
>> with SVM.
>> 
>> Since QEMU commit 75d373ef9729 ("target-i386: Disable SVM by default in KVM 
>> mode"),
>> machine-types since v2.2 don’t expose AMD SVM by default.
>> Therefore, my personal opinion on this is that it’s fine to block migration 
>> in this case.
> 
> I totally missed that commit.  My bad.
> 
> Actually, now that I think about it SVM *will* have some state while
> running in guest mode, namely:
> 
> - the NPT page table root
> 
> - the L1 CR4.PAE, EFER.LMA and EFER.NXE bits, which determine the format
> of the NPT12 page tables
> 
> These are covered by the existing vmstate_svm_npt subsection.
> 
> On the other hand, the lack of something like VMXON/VMCS12 state means
> that AMD already sorta works unless you're migrating while in guest
> mode.  For Intel, just execute VMXON before migration, and starting any
> VM after migration is doomed.

True.

> 
> So, overall I prefer not to block migration.

I’m not sure I agree.
It is quite likely that the vCPU is currently in guest-mode while you are
migrating… A good hypervisor tries to maximise the CPU time spent in
guest-mode rather than host-mode. :)

I prefer to block migration, and once we formally complete the implementation
of SVM nested state, we can modify the QEMU code such that migration of a vCPU
exposed with SVM will work when the nested state indicates that the guest is
in host-mode.

> 
>>> +((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) {
>>>error_setg(&nested_virt_mig_blocker,
>>> -   "Nested virtualization does not support live migration yet");
>>> +   "Kernel do not provide required capabilities for "
>> 
>> As Maran have noted, we should change this “do not” to “does not”.
>> Sorry for my bad English grammer. :)
>> 
>>> +   "nested virtualization migration. "
>>> +   "(CAP_NESTED_STATE=%d, CAP_EXCEPTION_PAYLOAD=%d)",
>>> +   kvm_max_nested_state_length() > 0,
>>> +   has_exception_payload);
>>>r = migrate_add_blocker(nested_virt_mig_blocker, &local_err);
>>>if (local_err) {
>>>error_report_err(local_err);
>>> diff --git a/target/i386/machine.c b/target/i386/machine.c
>>> index fc49e5a..851b249 100644
>>> --- a/target/i386/machine.c
>>> +++ b/target/i386/machine.c
>>> @@ -233,7 +233,7 @@ static int cpu_pre_save(void *opaque)
>>> 
>>> #ifdef CONFIG_KVM
>>>/* Verify we have nested virtualization state from kernel if required */
>>> -if (cpu_has_nested_virt(env) && !env->nested_state) {
>>> +if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) {
>> 
>> Good catch regarding adding kvm_enabled() here.
> 
> Caught by "make check", not by me!

Ah nice to know :) Thanks for the tip.

> 
>> But I think this should have been added to commit ("target/i386: kvm: Add 
>> support for save and restore nested state”).
> 
> This commit is where bisection broke. :)
> 
> Paolo




Re: [Qemu-devel] [PATCH] spapr/xive: H_INT_ESB is used for LSIs only

2019-06-21 Thread Cédric Le Goater
On 21/06/2019 16:52, Greg Kurz wrote:
> As indicated in the function header, this "hcall is only supported for
> LISNs that have the ESB hcall flag set to 1 when returned from hcall()
> H_INT_GET_SOURCE_INFO". We only set that flag for LSIs actually.
> 
> Check that in h_int_esb().

Indeed. H_INT_ESB should work on any IRQ, but I think it's better 
to check that the HCALL is only used with the IRQ requiring it.   

> Signed-off-by: Greg Kurz 

Reviewed-by: Cédric Le Goater 

Thanks,

C.

> ---
>  hw/intc/spapr_xive.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 58c2e5d890bd..01dd47ad5b02 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -1408,6 +1408,12 @@ static target_ulong h_int_esb(PowerPCCPU *cpu,
>  return H_P2;
>  }
>  
> +if (!xive_source_irq_is_lsi(xsrc, lisn)) {
> +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: LISN " TARGET_FMT_lx "isn't 
> LSI\n",
> +  lisn);
> +return H_P2;
> +}
> +
>  if (offset > (1ull << xsrc->esb_shift)) {
>  return H_P3;
>  }
> 




Re: [Qemu-devel] [PULL 20/25] target/i386: kvm: Add support for save and restore nested state

2019-06-21 Thread Paolo Bonzini
On 21/06/19 14:48, Liran Alon wrote:
> 
> 
>> On 21 Jun 2019, at 15:45, Paolo Bonzini  wrote:
>>
>> On 21/06/19 14:29, Liran Alon wrote:
 +max_nested_state_len = kvm_max_nested_state_length();
 +if (max_nested_state_len > 0) {
 +assert(max_nested_state_len >= offsetof(struct kvm_nested_state, 
 data));
 +env->nested_state = g_malloc0(max_nested_state_len);
 +
 +env->nested_state->size = max_nested_state_len;
 +
 +if (IS_INTEL_CPU(env)) {
>>> I think it’s better to change this to: “if (cpu_has_vmx(env))” {
>>>
 +struct kvm_vmx_nested_state_hdr *vmx_hdr =
 +            &env->nested_state->hdr.vmx;
 +
 +env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
 +vmx_hdr->vmxon_pa = -1ull;
 +vmx_hdr->vmcs12_pa = -1ull;
 +}
 +}
>>> I think we should add here:
>>> } else if (cpu_has_svm(env)) {
>>>env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
>>> }
>>
>> Or even force max_nested_state_len to 0 for AMD hosts, so that
>> kvm_get/put_nested_state are dropped completely.
>>
>> Paolo
>>
> 
> On AMD hosts, KVM returns 0 for KVM_CAP_NESTED_STATE because
> kvm-amd.ko has kvm_x86_ops->get_nested_state set to NULL.
> See kvm_vm_ioctl_check_extension().

Yes, now it does, my idea was to force that behavior in QEMU until we
know what SVM support actually looks like.

In principle I don't like the idea of crossing fingers, even though
there's an actual possibility that it could work.  But it couldn't be
worse than what we have now, so maybe it *is* actually a good idea, just
with some comment that explains the rationale.

Paolo


> I just thought it will be nicer to add what I proposed above as when kernel 
> adds support
> for nested state on AMD host, QEMU would maybe just work.
> (Because maybe all state required for AMD nSVM is just flags in 
> env->nested_state->flags).
> 
> -Liran
> 




Re: [Qemu-devel] [PATCH v4 01/13] vfio: KABI for migration interface

2019-06-21 Thread Alex Williamson
On Fri, 21 Jun 2019 11:22:15 +0530
Kirti Wankhede  wrote:

> On 6/20/2019 10:48 PM, Alex Williamson wrote:
> > On Thu, 20 Jun 2019 20:07:29 +0530
> > Kirti Wankhede  wrote:
> >   
> >> - Defined MIGRATION region type and sub-type.
> >> - Used 3 bits to define VFIO device states.
> >> Bit 0 => _RUNNING
> >> Bit 1 => _SAVING
> >> Bit 2 => _RESUMING
> >> Combination of these bits defines VFIO device's state during migration
> >> _STOPPED => All bits 0 indicates VFIO device stopped.
> >> _RUNNING => Normal VFIO device running state.
> >> _SAVING | _RUNNING => vCPUs are running, VFIO device is running but 
> >> start
> >>   saving state of device i.e. pre-copy state
> >> _SAVING  => vCPUs are stopped, VFIO device should be stopped, and
> >>   save device state, i.e. stop-n-copy state
> >> _RESUMING => VFIO device resuming state.
> >> _SAVING | _RESUMING => Invalid state if _SAVING and _RESUMING bits are 
> >> set
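[Editor's note: the bit combinations described above can be sketched as a few C macros plus a validity check. The macro names below are illustrative only, not the proposed uapi names from the patch.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit definitions mirroring the proposal above. */
#define STATE_RUNNING  (1u << 0)
#define STATE_SAVING   (1u << 1)
#define STATE_RESUMING (1u << 2)
#define STATE_STOPPED  0u   /* all bits clear */

/* _SAVING | _RESUMING is the one combination called out as invalid. */
static bool device_state_valid(uint32_t state)
{
    return (state & (STATE_SAVING | STATE_RESUMING)) !=
           (STATE_SAVING | STATE_RESUMING);
}

/* Pre-copy phase: vCPUs running, device running and saving state. */
static bool in_precopy(uint32_t state)
{
    return (state & (STATE_SAVING | STATE_RUNNING)) ==
           (STATE_SAVING | STATE_RUNNING);
}
```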
> >> - Defined vfio_device_migration_info structure which will be placed at 0th
> >>   offset of migration region to get/set VFIO device related information.
> >>   Defined members of structure and usage on read/write access:
> >> * device_state: (read/write)
> >> To convey VFIO device state to be transitioned to. Only 3 bits are 
> >> used
> >> as of now.
> >> * pending bytes: (read only)
> >> To get pending bytes yet to be migrated for VFIO device.
> >> * data_offset: (read only)
> >> To get data offset in migration from where data exist during 
> >> _SAVING
> >> and from where data should be written by user space application 
> >> during
> >>  _RESUMING state
> >> * data_size: (read/write)
> >> To get and set size of data copied in migration region during 
> >> _SAVING
> >> and _RESUMING state.
> >> * start_pfn, page_size, total_pfns: (write only)
> >> To get bitmap of dirty pages from vendor driver from given
> >> start address for total_pfns.
> >> * copied_pfns: (read only)
> >> To get number of pfns bitmap copied in migration region.
> >> Vendor driver should copy the bitmap with bits set only for
> >> pages to be marked dirty in migration region. Vendor driver
> >> should return 0 if there are 0 pages dirty in requested
> >> range. Vendor driver should return -1 to mark all pages in the 
> >> section
> >> as dirty
> >>
> >> Migration region looks like:
> >>  ------------------------------------------------------------------
> >> |vfio_device_migration_info|           data section                |
> >> |                          |     //////////////////////////////    |
> >>  ------------------------------------------------------------------
> >>  ^                          ^                                  ^
> >>  offset 0-trapped part      data_offset                   data_size
> >>
> >> Data section is always followed by vfio_device_migration_info
> >> structure in the region, so data_offset will always be non-0.
> >> Offset from where data is copied is decided by kernel driver, data
> >> section can be trapped or mapped depending on how kernel driver
> >> defines data section. If mmapped, then data_offset should be page
> >> aligned, whereas the initial section which contains the
> >> vfio_device_migration_info structure might not end at an offset which
> >> is page aligned.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  linux-headers/linux/vfio.h | 71 
> >> ++
> >>  1 file changed, 71 insertions(+)
> >>
> >> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> >> index 24f505199f83..274ec477eb82 100644
> >> --- a/linux-headers/linux/vfio.h
> >> +++ b/linux-headers/linux/vfio.h
> >> @@ -372,6 +372,77 @@ struct vfio_region_gfx_edid {
> >>   */
> >>  #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD  (1)
> >>  
> >> +/* Migration region type and sub-type */
> >> +#define VFIO_REGION_TYPE_MIGRATION(2)
> >> +#define VFIO_REGION_SUBTYPE_MIGRATION (1)
> >> +
> >> +/**
> >> + * Structure vfio_device_migration_info is placed at 0th offset of
> >> + * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related 
> >> migration
> >> + * information. Field accesses from this structure are only supported at 
> >> their
> >> + * native width and alignment, otherwise should return error.
> >> + *
> >> + * device_state: (read/write)
> >> + *  To indicate vendor driver the state VFIO device should be 
> >> transitioned
> >> + *  to. If device state transition fails, write to this field return 
> >> error.
> >> + *  It consists of 3 bits:
> >> + *  - If bit 0 set, indicates _RUNNING state. When its reset, that 
> >> indicates
> >> + *_STOPPED state. When device is changed to _STOPPED, driver 
> >> should stop
> >> + *

Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities

2019-06-21 Thread Paolo Bonzini
On 21/06/19 14:39, Liran Alon wrote:
>> On 21 Jun 2019, at 14:30, Paolo Bonzini  wrote:
>>
>> From: Liran Alon 
>>
>> Previous commits have added support for migration of nested virtualization
>> workloads. This was done by utilising two new KVM capabilities:
>> KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD. Both which are
>> required in order to correctly migrate such workloads.
>>
>> Therefore, change code to add a migration blocker for vCPUs exposed with
>> Intel VMX or AMD SVM in case one of these kernel capabilities is
>> missing.
>>
>> Signed-off-by: Liran Alon 
>> Reviewed-by: Maran Wilson 
>> Message-Id: <20190619162140.133674-11-liran.a...@oracle.com>
>> Signed-off-by: Paolo Bonzini 
>> ---
>> target/i386/kvm.c | 9 +++--
>> target/i386/machine.c | 2 +-
>> 2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index c931e9d..e4b4f57 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -1640,9 +1640,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>   !!(c->ecx & CPUID_EXT_SMX);
>> }
>>
>> -if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) {
>> +if (cpu_has_vmx(env) && !nested_virt_mig_blocker &&
> 
> The change here from cpu_has_nested_virt(env) to cpu_has_vmx(env) is not 
> clear.
> We should explicitly explain that it’s because we still wish to preserve 
> backwards-compatibility
> to migrating AMD vCPU exposed with SVM.
> 
> In addition, commit ("target/i386: kvm: Block migration for vCPUs exposed 
> with nested virtualization")
> doesn’t make sense in case we still want to allow migrating AMD vCPU exposed 
> with SVM.
> 
> Since QEMU commit 75d373ef9729 ("target-i386: Disable SVM by default in KVM 
> mode"),
> machine-types since v2.2 don’t expose AMD SVM by default.
> Therefore, my personal opinion on this is that it’s fine to block migration 
> in this case.

I totally missed that commit.  My bad.

Actually, now that I think about it SVM *will* have some state while
running in guest mode, namely:

- the NPT page table root

- the L1 CR4.PAE, EFER.LMA and EFER.NXE bits, which determine the format
of the NPT12 page tables

These are covered by the existing vmstate_svm_npt subsection.

On the other hand, the lack of something like VMXON/VMCS12 state means
that AMD already sorta works unless you're migrating while in guest
mode.  For Intel, just execute VMXON before migration, and starting any
VM after migration is doomed.

So, overall I prefer not to block migration.

>> +((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) {
>>         error_setg(&nested_virt_mig_blocker,
>> -   "Nested virtualization does not support live migration 
>> yet");
>> +   "Kernel do not provide required capabilities for "
> 
> As Maran has noted, we should change this “do not” to “does not”.
> Sorry for my bad English grammar. :)
> 
>> +   "nested virtualization migration. "
>> +   "(CAP_NESTED_STATE=%d, CAP_EXCEPTION_PAYLOAD=%d)",
>> +   kvm_max_nested_state_length() > 0,
>> +   has_exception_payload);
>>     r = migrate_add_blocker(nested_virt_mig_blocker, &local_err);
>> if (local_err) {
>> error_report_err(local_err);
>> diff --git a/target/i386/machine.c b/target/i386/machine.c
>> index fc49e5a..851b249 100644
>> --- a/target/i386/machine.c
>> +++ b/target/i386/machine.c
>> @@ -233,7 +233,7 @@ static int cpu_pre_save(void *opaque)
>>
>> #ifdef CONFIG_KVM
>> /* Verify we have nested virtualization state from kernel if required */
>> -if (cpu_has_nested_virt(env) && !env->nested_state) {
>> +if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) {
> 
> Good catch regarding adding kvm_enabled() here.

Caught by "make check", not by me!

> But I think this should have been added to commit ("target/i386: kvm: Add 
> support for save and restore nested state”).

This commit is where bisection broke. :)

Paolo



Re: [Qemu-devel] [PULL 20/25] target/i386: kvm: Add support for save and restore nested state

2019-06-21 Thread Liran Alon



> On 21 Jun 2019, at 17:55, Paolo Bonzini  wrote:
> 
> On 21/06/19 14:48, Liran Alon wrote:
>> 
>> 
>>> On 21 Jun 2019, at 15:45, Paolo Bonzini  wrote:
>>> 
>>> On 21/06/19 14:29, Liran Alon wrote:
> +max_nested_state_len = kvm_max_nested_state_length();
> +if (max_nested_state_len > 0) {
> +assert(max_nested_state_len >= offsetof(struct kvm_nested_state, 
> data));
> +env->nested_state = g_malloc0(max_nested_state_len);
> +
> +env->nested_state->size = max_nested_state_len;
> +
> +if (IS_INTEL_CPU(env)) {
 I think it’s better to change this to: “if (cpu_has_vmx(env))” {
 
> +struct kvm_vmx_nested_state_hdr *vmx_hdr =
> +            &env->nested_state->hdr.vmx;
> +
> +env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
> +vmx_hdr->vmxon_pa = -1ull;
> +vmx_hdr->vmcs12_pa = -1ull;
> +}
> +}
 I think we should add here:
 } else if (cpu_has_svm(env)) {
   env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
 }
>>> 
>>> Or even force max_nested_state_len to 0 for AMD hosts, so that
>>> kvm_get/put_nested_state are dropped completely.
>>> 
>>> Paolo
>>> 
>> 
>> On AMD hosts, KVM returns 0 for KVM_CAP_NESTED_STATE because
>> kvm-amd.ko has kvm_x86_ops->get_nested_state set to NULL.
>> See kvm_vm_ioctl_check_extension().
> 
> Yes, now it does, my idea was to force that behavior in QEMU until we
> know what SVM support actually looks like.
> 
> In principle I don't like the idea of crossing fingers, even though
> there's an actual possibility that it could work.  But it couldn't be
> worse than what we have now, so maybe it *is* actually a good idea, just
> with some comment that explains the rationale.
> 
> Paolo

Cool.
Are you planning to make those changes when applying/merging or
do you need me to submit a new patch-series version?
Also note my comment on the other patch regarding blocking migration on AMD 
vCPUs which expose SVM.

-Liran

> 
> 
>> I just thought it will be nicer to add what I proposed above as when kernel 
>> adds support
>> for nested state on AMD host, QEMU would maybe just work.
>> (Because maybe all state required for AMD nSVM is just flags in 
>> env->nested_state->flags).
>> 
>> -Liran
>> 
> 




[Qemu-devel] [PATCH] spapr/xive: H_INT_ESB is used for LSIs only

2019-06-21 Thread Greg Kurz
As indicated in the function header, this "hcall is only supported for
LISNs that have the ESB hcall flag set to 1 when returned from hcall()
H_INT_GET_SOURCE_INFO". We only set that flag for LSIs actually.

Check that in h_int_esb().

Signed-off-by: Greg Kurz 
---
 hw/intc/spapr_xive.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 58c2e5d890bd..01dd47ad5b02 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -1408,6 +1408,12 @@ static target_ulong h_int_esb(PowerPCCPU *cpu,
 return H_P2;
 }
 
+if (!xive_source_irq_is_lsi(xsrc, lisn)) {
+qemu_log_mask(LOG_GUEST_ERROR, "XIVE: LISN " TARGET_FMT_lx "isn't 
LSI\n",
+  lisn);
+return H_P2;
+}
+
 if (offset > (1ull << xsrc->esb_shift)) {
 return H_P3;
 }




Re: [Qemu-devel] [PATCH v2 04/20] hw/i386/pc: Add the E820Type enum type

2019-06-21 Thread Philippe Mathieu-Daudé
On 6/20/19 5:31 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 13, 2019 at 04:34:30PM +0200, Philippe Mathieu-Daudé wrote:
>> This ensure we won't use an incorrect value.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
> 
> It doesn't actually ensure anything: compiler does not check IIUC.
> 
> And OTOH it's stored in type field in struct e820_entry.

I totally missed that... Thanks!

>> ---
>> v2: Do not cast the enum (Li)
>> ---
>>  hw/i386/pc.c |  4 ++--
>>  include/hw/i386/pc.h | 16 ++--
>>  2 files changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 5a7cffbb1a..86ba554439 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -872,7 +872,7 @@ static void handle_a20_line_change(void *opaque, int 
>> irq, int level)
>>  x86_cpu_set_a20(cpu, level);
>>  }
>>  
>> -ssize_t e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
>> +ssize_t e820_add_entry(uint64_t address, uint64_t length, E820Type type)
>>  {
>>  unsigned int index = le32_to_cpu(e820_reserve.count);
>>  struct e820_entry *entry;
>> @@ -906,7 +906,7 @@ size_t e820_get_num_entries(void)
>>  return e820_entries;
>>  }
>>  
>> -bool e820_get_entry(unsigned int idx, uint32_t type,
>> +bool e820_get_entry(unsigned int idx, E820Type type,
>>  uint64_t *address, uint64_t *length)
>>  {
>>  if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index c56116e6f6..7c07185dd5 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -282,12 +282,16 @@ void pc_system_firmware_init(PCMachineState *pcms, 
>> MemoryRegion *rom_memory);
>>  void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>> const CPUArchIdList *apic_ids, GArray *entry);
>>  
>> -/* e820 types */
>> -#define E820_RAM1
>> -#define E820_RESERVED   2
>> -#define E820_ACPI   3
>> -#define E820_NVS4
>> -#define E820_UNUSABLE   5
>> +/**
>> + * E820Type: Type of the e820 address range.
>> + */
>> +typedef enum {
>> +E820_RAM= 1,
>> +E820_RESERVED   = 2,
>> +E820_ACPI   = 3,
>> +E820_NVS= 4,
>> +E820_UNUSABLE   = 5
>> +} E820Type;
>>  
>>  ssize_t e820_add_entry(uint64_t, uint64_t, uint32_t);
>>  size_t e820_get_num_entries(void);
>> -- 
>> 2.20.1



[Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device

2019-06-21 Thread Sukrit Bhatnagar
Hi,

I am a GSoC participant, trying to implement live migration for the
pvrdma device with help from my mentors Marcel and Yuval.
My current task is to save and load the various addresses that the
device uses for DMA mapping. We will be adding the device state into
live migration, incrementally. As the first step in the implementation,
we are performing migration to the same host. This will save us from
many complexities, such as GID change, at this stage, and we will
address migration across hosts at a later point when same-host migration
works.

Currently, the save and load logic uses SaveVMHandlers, which is the
legacy way, and will be ported to VMStateDescription once the
existing issues are solved.

This RFC is meant to request suggestions on the things which are
working and for help on the things which are not.


What is working:

* pvrdma device is getting initialized in a VM, its GID entry is
  getting added to the host, and rc_pingpong is successful between
  two such VMs. This is when libvirt is used to launch the VMs.

* The dma, cmd_slot_dma and resp_slot_dma addresses are saved at the
  source and loaded properly in the destination upon migration. That is,
  the values loaded at the dest during migration are the same as the
  ones saved.

  `dma` is provided by the guest device when it writes to BAR1, stored
  in dev->dsr_info.dma. A DSR is created on mapping to this address.
  `cmd_slot_dma` and `resp_slot_dma` are the dma addresses of the command
  and response buffers, respectively, which are provided by the guest
  through the DSR.

* The DSR successfully (re)maps to the dma address loaded from
  migration at the dest.


What is not working:

* In the pvrdma_load() logic, the mapping to DSR is successful at dest.
  But the mapping for cmd and resp slots fails.
  rdma_pci_dma_map() eventually calls address_space_map(). Inside the
  latter, a global BounceBuffer bounce is checked to see if it is in use
  (the atomic_xchg() primitive).
  At the dest, it is in use and the dma remapping fails there, which
  fails the whole migration process. Essentially, I am looking for a
  way to remap guest physical address after a live migration (to the
  same host). Any tips on avoiding the BounceBuffer will also be great.

  I have also tried unmapping the cmd and resp slots at the source before
  saving the dma addresses in pvrdma_save(), but the mapping fails anyway.
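[Editor's note: for readers unfamiliar with the mechanism described above — address_space_map() falls back to a single global bounce buffer when a region cannot be mapped directly, claiming it with an atomic exchange; a caller that finds it already in use gets NULL back, which is exactly the failure seen here. A minimal C11 sketch of that claim/release handshake follows; QEMU's real BounceBuffer carries more state, and the names are illustrative.]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_bool in_use;
    void *buffer;
} BounceBuffer;

static BounceBuffer bounce; /* one global instance, as in QEMU's exec.c */

/* Returns the buffer on success, NULL if someone else already holds it. */
static void *bounce_claim(void)
{
    /* atomic_xchg-style claim: only the caller that flips false->true wins. */
    if (atomic_exchange(&bounce.in_use, true)) {
        return NULL;    /* already in use: the mapping must fail or retry */
    }
    return &bounce;
}

static void bounce_release(void)
{
    atomic_store(&bounce.in_use, false);
}
```

Until the buffer is released by the first mapping, every further bounce-backed map attempt fails — which is why unmapping at the source before saving does not help if the destination's first mapping never completes.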

* It seems that vmxnet3 migration itself is not working properly, at least
  for me. The pvrdma device depends on it, vmxnet3 is function 0 and pvrdma
  is function 1. This is happening even for a build of unmodified code from
  the master branch.
  After migration, the network connectivity is lost at destination.
  Things are fine at the source before migration.
  This is the command I am using at src:

  sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-enable-kvm \
-m 2G -smp cpus=2 \
-hda /home/skrtbhtngr/fedora.img \
-netdev tap,id=hostnet0 \
-device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
-monitor telnet:127.0.0.1:,server,nowait \
-trace events=/home/skrtbhtngr/trace-events \
-vnc 0.0.0.0:0

  Similar command is used for the dest. Currently, I am trying
  same-host migration for testing purpose, without the pvrdma device.
  Two tap interfaces, for src and dest were created successfully at
  the host. Kernel logs:
  ...
  br0: port 2(tap0) entered forwarding state
  ...
  br0: port 3(tap1) entered forwarding state

  tcpdump at the dest reports only outgoing ARP packets, which ask
  for gateway: "ARP, Request who-has _gateway tell guest1".

  Tried using user (slirp) as the network backend, but no luck.
  
  Also tried git bisect to find the issue using a working commit (given
  by Marcel), but it turns out that it is very old and I faced build
  errors one after another.

  Please note that e1000 live migration is working fine in the same setup.

* Since we are aiming at trying on same-host migration first, I cannot
  use libvirt as it does not allow this. Currently, I am running the
  VMs using qemu-system commands. But libvirt is needed to add the GID
  entry of the guest device in the host. I am looking for a workaround,
  if that is possible at all.
  I started a thread few days ago for the same on libvirt-users:
  https://www.redhat.com/archives/libvirt-users/2019-June/msg00011.html


Sukrit Bhatnagar (1):
  hw/pvrdma: Add live migration support

 hw/rdma/vmw/pvrdma_main.c | 56 +++
 1 file changed, 56 insertions(+)

-- 
2.21.0




Re: [Qemu-devel] [PATCH 1/2] Acceptance tests: exclude "flaky" tests

2019-06-21 Thread Cleber Rosa
On Fri, Jun 21, 2019 at 09:03:33AM +0200, Philippe Mathieu-Daudé wrote:
> On 6/21/19 8:09 AM, Cleber Rosa wrote:
> > It's a fact that some tests may not be 100% reliable in all
> > environments.  While it's a tough call to remove a useful test that
> > from the tree because it may fail every 1/100th time (or so), having
> > human attention drawn to known issues is very bad for humans and for
> > the projects they manage.
> > 
> > As a compromise solution, this marks tests that are known to have
> > issues, or that exercises known issues in QEMU or other components,
> > and excludes them from the entry point.  As a consequence, tests
> > marked as "flaky" will not be executed as part of "make
> > check-acceptance".
> > 
> > Because such tests should be forgiven but never be forgotten, it's
> > possible to list them with (assuming "make check-venv" or "make
> > check-acceptance" has already initialized the venv):
> > 
> >   $ ./tests/venv/bin/avocado list -t flaky tests/acceptance
> > 
> > The current list of tests marked as flaky are a result of running
> > the entire set of acceptance tests around 20 times.  The results
> > were then processed with a helper script[1].  That either confirmed
> > known issues (in the case of aarch64 and arm)[2] or revealed new
> > ones (mips).
> > 
> > This also bumps the Avocado version to one that includes a fix to the
> > parsing of multiple and mix "key:val" and simple tag values.
> > 
> > [1] 
> > https://raw.githubusercontent.com/avocado-framework/avocado/master/contrib/scripts/summarize-job-failures.py
> > [2] https://bugs.launchpad.net/qemu/+bug/1829779
> > 
> > Signed-off-by: Cleber Rosa 
> > ---
> >  docs/devel/testing.rst   | 17 +
> >  tests/Makefile.include   |  6 +-
> >  tests/acceptance/boot_linux_console.py   |  2 ++
> >  tests/acceptance/linux_ssh_mips_malta.py |  2 ++
> >  tests/requirements.txt   |  2 +-
> >  5 files changed, 27 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
> > index da2d0fc964..ff4d8e2e1c 100644
> > --- a/docs/devel/testing.rst
> > +++ b/docs/devel/testing.rst
> > @@ -574,6 +574,23 @@ may be invoked by running:
> >  
> >tests/venv/bin/avocado run $OPTION1 $OPTION2 tests/acceptance/
> >  
> > +Tagging tests
> > +-------------
> > +
> > +flaky
> > +~~~~~
> > +
> > +If a test is known to fail intermittently, even if only every one
> > +hundredth time, it's highly advisable to mark it as a flaky test.
> > +This will prevent these individual tests from failing much larger
> > +jobs, will avoid human interaction and time wasted to verify a known
> > issue, and worst of all, can lead to the discrediting of automated
> > +testing.
> > +
> > +To mark a test as flaky, add to its docstring.::
> > +
> > +  :avocado: tags=flaky
> 
> I certainly disagree with this patch, failing tests have to be fixed.
> Why not tag all the codebase flaky and sing "happy coding"?
>

That's a great idea! :)

Now, seriously, I also resisted this for quite a long time.  The
reality, though, is that intermittent failures will continue to
appear, and letting tests (and jobs, and CI pipelines, and whatnot)
fail is a very bad idea.  We all agree that real fixes are better than
this, but many times they don't come quickly.

> Anyway if this get accepted, 'flaky' tags must have the intermittent
> failure well described, and a Launchpad/Bugzilla tracking ticket referenced.
>

And here you have a key point that I absolutely agree with.  The
"flaky" approach can either poison a lot of tests, and be seen as
quick way out of a difficult issue revealed by a test.  Or, it can
serve as an effective tool to keep track of these very important
issues.

If we add:

    # https://bugs.launchpad.net/qemu/+bug/1829779
    :avocado: tags=flaky

Topped with some human attention, I believe this can be very effective.  This goes
without saying, but comments here are very much welcome.

- Cleber.

> > +
> >  Manual Installation
> >  -------------------
> >  
> > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > index db750dd6d0..4c97da2878 100644
> > --- a/tests/Makefile.include
> > +++ b/tests/Makefile.include
> > @@ -1125,7 +1125,11 @@ TESTS_RESULTS_DIR=$(BUILD_DIR)/tests/results
> >  # Any number of command separated loggers are accepted.  For more
> >  # information please refer to "avocado --help".
> >  AVOCADO_SHOW=app
> > -AVOCADO_TAGS=$(patsubst %-softmmu,-t arch:%, $(filter 
> > %-softmmu,$(TARGET_DIRS)))
> > +
> > +# Additional tags that are added to each occurrence of "--filter-by-tags"
> > +AVOCADO_EXTRA_TAGS := ,-flaky
> > +
> > +AVOCADO_TAGS=$(patsubst 
> > %-softmmu,--filter-by-tags=arch:%$(AVOCADO_EXTRA_TAGS), $(filter 
> > %-softmmu,$(TARGET_DIRS)))
> >  
> >  ifneq ($(findstring v2,"v$(PYTHON_VERSION)"),v2)
> >  $(TESTS_VENV_DIR): $(TESTS_VENV_REQ)
> > diff --git a/tests/acceptance/boot_linux_console.py 
> > b/tests/acceptance/boot_linux_console.py
> > index 

Re: [Qemu-devel] [PATCH v2 02/20] hw/i386/pc: Use size_t type to hold/return a size of array

2019-06-21 Thread Philippe Mathieu-Daudé
On 6/20/19 5:28 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 13, 2019 at 04:34:28PM +0200, Philippe Mathieu-Daudé wrote:
>> Reviewed-by: Li Qiang 
>> Signed-off-by: Philippe Mathieu-Daudé 
> 
> Motivation? do you expect more than 2^31 entries?

Building with -Wsign-compare:

hw/i386/pc.c:973:36: warning: comparison of integers of different signs:
'unsigned int' and 'int' [-Wsign-compare]
for (i = 0, array_count = 0; i < e820_get_num_entries(); i++) {
 ~ ^ ~~

>> ---
>>  hw/i386/pc.c | 4 ++--
>>  include/hw/i386/pc.h | 2 +-
>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index bb3c74f4ca..ff0f63 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -105,7 +105,7 @@ struct e820_table {
>>  
>>  static struct e820_table e820_reserve;
>>  static struct e820_entry *e820_table;
>> -static unsigned e820_entries;
>> +static size_t e820_entries;
>>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>>  
>>  /* Physical Address of PVH entry point read from kernel ELF NOTE */
>> @@ -901,7 +901,7 @@ int e820_add_entry(uint64_t address, uint64_t length, 
>> uint32_t type)
>>  return e820_entries;
>>  }
>>  
>> -int e820_get_num_entries(void)
>> +size_t e820_get_num_entries(void)
>>  {
>>  return e820_entries;
>>  }
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 3b3a0d6e59..fc29893624 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -290,7 +290,7 @@ void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>>  #define E820_UNUSABLE   5
>>  
>>  int e820_add_entry(uint64_t, uint64_t, uint32_t);
>> -int e820_get_num_entries(void);
>> +size_t e820_get_num_entries(void);
>>  bool e820_get_entry(unsigned int, uint32_t, uint64_t *, uint64_t *);
>>  
>>  extern GlobalProperty pc_compat_4_0_1[];
>> -- 
>> 2.20.1
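[Editor's note: the effect of the size_t change above is easy to see in a standalone sketch — once the entry count is a size_t, the `idx < count` comparison is unsigned-vs-unsigned and -Wsign-compare stays quiet. The table handling below is simplified (no endian conversion, fixed capacity); names follow the patch but this is not the QEMU code.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

#define E820_MAX 16
static struct e820_entry e820_table[E820_MAX];
static size_t e820_entries;

static size_t e820_get_num_entries(void)
{
    return e820_entries;
}

/* size_t idx vs size_t e820_entries: no signedness mismatch to warn about. */
static bool e820_get_entry(size_t idx, uint32_t type,
                           uint64_t *address, uint64_t *length)
{
    if (idx < e820_entries && e820_table[idx].type == type) {
        *address = e820_table[idx].address;
        *length = e820_table[idx].length;
        return true;
    }
    return false;
}

static bool e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
{
    if (e820_entries >= E820_MAX) {
        return false;
    }
    e820_table[e820_entries++] = (struct e820_entry){ address, length, type };
    return true;
}
```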



[Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support

2019-06-21 Thread Sukrit Bhatnagar
Define and register SaveVMHandlers pvrdma_save and
pvrdma_load for saving and loading the device state,
which currently includes only the dma, command slot
and response slot addresses.

Remap the DSR, command slot and response slot upon
loading the addresses in the pvrdma_load function.

Cc: Marcel Apfelbaum 
Cc: Yuval Shaia 
Signed-off-by: Sukrit Bhatnagar 
---
 hw/rdma/vmw/pvrdma_main.c | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index adcf79cd63..cd8573173c 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -28,6 +28,7 @@
 #include "sysemu/sysemu.h"
 #include "monitor/monitor.h"
 #include "hw/rdma/rdma.h"
+#include "migration/register.h"
 
 #include "../rdma_rm.h"
 #include "../rdma_backend.h"
@@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void 
*opaque)
 pvrdma_fini(pci_dev);
 }
 
+static void pvrdma_save(QEMUFile *f, void *opaque)
+{
+PVRDMADev *dev = PVRDMA_DEV(opaque);
+
+qemu_put_be64(f, dev->dsr_info.dma);
+qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
+qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
+}
+
+static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
+{
+PVRDMADev *dev = PVRDMA_DEV(opaque);
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+// Remap DSR
+dev->dsr_info.dma = qemu_get_be64(f);
+dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
+sizeof(struct 
pvrdma_device_shared_region));
+if (!dev->dsr_info.dsr) {
+rdma_error_report("Failed to map to DSR");
+return -1;
+}
+qemu_log("pvrdma_load: successfully remapped to DSR\n");
+
+// Remap cmd slot
+dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
+dev->dsr_info.req = rdma_pci_dma_map(pci_dev, 
dev->dsr_info.dsr->cmd_slot_dma,
+ sizeof(union pvrdma_cmd_req));
+if (!dev->dsr_info.req) {
+rdma_error_report("Failed to map to command slot address");
+return -1;
+}
+qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
+
+// Remap rsp slot
+dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
+dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, 
dev->dsr_info.dsr->resp_slot_dma,
+ sizeof(union pvrdma_cmd_resp));
+if (!dev->dsr_info.rsp) {
+rdma_error_report("Failed to map to response slot address");
+return -1;
+}
+qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
+
+return 0;
+}
+
+static SaveVMHandlers savevm_pvrdma = {
+.save_state = pvrdma_save,
+.load_state = pvrdma_load,
+};
+
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
 int rc = 0;
+DeviceState *s = DEVICE(pdev);
 PVRDMADev *dev = PVRDMA_DEV(pdev);
 Object *memdev_root;
 bool ram_shared = false;
@@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
 qemu_register_shutdown_notifier(&dev->shutdown_notifier);
 
+    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
+
 out:
 if (rc) {
 pvrdma_fini(pdev);
-- 
2.21.0
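[Editor's note: the qemu_put_be64()/qemu_get_be64() pairs in the patch above write each DMA address to the migration stream as a big-endian 64-bit value, so save and load agree regardless of host endianness. A self-contained sketch of that framing, buffer-based instead of QEMUFile-based:]

```c
#include <stdint.h>

/* Write v into buf as 8 big-endian bytes, like qemu_put_be64() does. */
static void put_be64(uint8_t *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Read 8 big-endian bytes back into a host value, like qemu_get_be64(). */
static uint64_t get_be64(const uint8_t *buf)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return v;
}
```

As long as load reads the fields back in the same order save wrote them (dma, cmd_slot_dma, resp_slot_dma), the round trip is exact.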




Re: [Qemu-devel] [PATCH v2 01/20] hw/i386/pc: Use unsigned type to index arrays

2019-06-21 Thread Philippe Mathieu-Daudé
On 6/20/19 5:27 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 13, 2019 at 04:34:27PM +0200, Philippe Mathieu-Daudé wrote:
>> Reviewed-by: Li Qiang 
>> Signed-off-by: Philippe Mathieu-Daudé 
> 
> Motivation?  Is this a bugfix?

Apparently I started to work on this series after "chardev: Convert
qemu_chr_write() to take a size_t argument" [*] for which I had these
extra warnings:

  --extra-cflags=-Wtype-limits\
 -Wsign-compare\
 -Wno-error=sign-compare

[*] https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg05229.html

>> ---
>>  hw/i386/pc.c | 5 +++--
>>  include/hw/i386/pc.h | 2 +-
>>  2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 2c5446b095..bb3c74f4ca 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -874,7 +874,7 @@ static void handle_a20_line_change(void *opaque, int irq, int level)
>>  
>>  int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
>>  {
>> -    int index = le32_to_cpu(e820_reserve.count);
>> +    unsigned int index = le32_to_cpu(e820_reserve.count);
>>      struct e820_entry *entry;
>>  
>>      if (type != E820_RAM) {
>> @@ -906,7 +906,8 @@ int e820_get_num_entries(void)
>>      return e820_entries;
>>  }
>>  
>> -bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
>> +bool e820_get_entry(unsigned int idx, uint32_t type,
>> +                    uint64_t *address, uint64_t *length)
>>  {
>>      if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
>>          *address = le64_to_cpu(e820_table[idx].address);

And here I wanted to fix:

hw/i386/pc.c:911:13: warning: comparison of integers of different signs:
'int' and 'unsigned int' [-Wsign-compare]
if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
~~~ ^ 
hw/i386/pc.c:972:36: warning: comparison of integers of different signs:
'unsigned int' and 'int' [-Wsign-compare]
for (i = 0, array_count = 0; i < e820_get_num_entries(); i++) {
 ~ ^ ~~
Is it worthwhile?

>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index a7d0b87166..3b3a0d6e59 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -291,7 +291,7 @@ void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>>  
>>  int e820_add_entry(uint64_t, uint64_t, uint32_t);
>>  int e820_get_num_entries(void);
>> -bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>> +bool e820_get_entry(unsigned int, uint32_t, uint64_t *, uint64_t *);
>>  
>>  extern GlobalProperty pc_compat_4_0_1[];
>>  extern const size_t pc_compat_4_0_1_len;
>> -- 
>> 2.20.1



Re: [Qemu-devel] [PULL v2 00/25] Misc (mostly x86) patches for 2019-06-21

2019-06-21 Thread Peter Maydell
On Fri, 21 Jun 2019 at 12:34, Paolo Bonzini  wrote:
>
> The following changes since commit 33d609990621dea6c7d056c86f707b8811320ac1:
>
>   Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
> (2019-06-18 17:00:52 +0100)
>
> are available in the git repository at:
>
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 8e8cbed09ad9d577955691b4c061b61b602406d1:
>
>   hw: Nuke hw_compat_4_0_1 and pc_compat_4_0_1 (2019-06-21 13:25:29 +0200)
>
> 
> * Nuke hw_compat_4_0_1 and pc_compat_4_0_1 (Greg)
> * Static analysis fixes (Igor, Lidong)
> * X86 Hyper-V CPUID improvements (Vitaly)
> * X86 nested virt migration (Liran)
> * New MSR-based features (Xiaoyao)
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.1
for any user-visible changes.

-- PMM



[Qemu-devel] [PATCH] migrtion: define MigrationState/MigrationIncomingState.state as MigrationStatus

2019-06-21 Thread Wei Yang
No functional change. Add default case to fix warning.

Signed-off-by: Wei Yang 
---
 migration/migration.c | 8 +++-
 migration/migration.h | 6 +++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 2865ae3fa9..0fd2364961 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -946,6 +946,8 @@ static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
         break;
+    default:
+        return;
     }
     info->status = s->state;
 }
@@ -1054,6 +1056,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
         info->has_status = true;
         fill_destination_postcopy_migration_info(info);
         break;
+    default:
+        return;
     }
     info->status = mis->state;
 }
@@ -1446,7 +1450,7 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 /* shared migration helpers */
 
-void migrate_set_state(int *state, int old_state, int new_state)
+void migrate_set_state(MigrationStatus *state, int old_state, int new_state)
 {
     assert(new_state < MIGRATION_STATUS__MAX);
     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
@@ -1683,6 +1687,8 @@ bool migration_is_idle(void)
         return false;
     case MIGRATION_STATUS__MAX:
         g_assert_not_reached();
+    default:
+        g_assert_not_reached();
     }
 
 return false;
diff --git a/migration/migration.h b/migration/migration.h
index 5e8f09c6db..418ee00053 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -65,7 +65,7 @@ struct MigrationIncomingState {
 
     QEMUBH *bh;
 
-    int state;
+    MigrationStatus state;
 
     bool have_colo_incoming_thread;
     QemuThread colo_incoming_thread;
@@ -151,7 +151,7 @@ struct MigrationState
     /* params from 'migrate-set-parameters' */
     MigrationParameters parameters;
 
-    int state;
+    MigrationStatus state;
 
     /* State related to return path */
     struct {
@@ -234,7 +234,7 @@ struct MigrationState
     bool decompress_error_check;
 };
 
-void migrate_set_state(int *state, int old_state, int new_state);
+void migrate_set_state(MigrationStatus *state, int old_state, int new_state);
 
 void migration_fd_process_incoming(QEMUFile *f);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
-- 
2.19.1




Re: [Qemu-devel] [PATCH v8 01/10] escc: introduce a selector for the register bit

2019-06-21 Thread Philippe Mathieu-Daudé
On 6/20/19 12:19 AM, Laurent Vivier wrote:
> On Sparc and PowerMac, the bit 0 of the address
> selects the register type (control or data) and
> bit 1 selects the channel (B or A).
> 
> On m68k Macintosh, the bit 0 selects the channel and
> bit 1 the register type.
> 
> This patch introduces a new parameter (bit_swap) to
> the device interface to indicate bits usage must
> be swapped between registers and channels.
> 
> For the moment all the machines use the bit 0,
> but this change will be needed to emulate Quadra 800.

I feel we are missing something and this model slowly becomes another
Frankenstein. The SCC core is a monster anyway.
I'm glad you could resolve your issue with this easy fix.


> Signed-off-by: Laurent Vivier 
> Reviewed-by: Hervé Poussineau 
> Reviewed-by: Thomas Huth 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  hw/char/escc.c | 30 --
>  include/hw/char/escc.h |  1 +
>  2 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/char/escc.c b/hw/char/escc.c
> index 8ddbb4be4f..2748bd62c3 100644
> --- a/hw/char/escc.c
> +++ b/hw/char/escc.c
> @@ -43,14 +43,21 @@
>   * mouse and keyboard ports don't implement all functions and they are
>   * only asynchronous. There is no DMA.
>   *
> - * Z85C30 is also used on PowerMacs. There are some small differences
> - * between Sparc version (sunzilog) and PowerMac (pmac):
> + * Z85C30 is also used on PowerMacs and m68k Macs.
> + *
> + * There are some small differences between Sparc version (sunzilog)
> + * and PowerMac (pmac):
>   *  Offset between control and data registers
>   *  There is some kind of lockup bug, but we can ignore it
>   *  CTS is inverted
>   *  DMA on pmac using DBDMA chip
>   *  pmac can do IRDA and faster rates, sunzilog can only do 38400
>   *  pmac baud rate generator clock is 3.6864 MHz, sunzilog 4.9152 MHz
> + *
> + * Linux driver for m68k Macs is the same as for PowerMac (pmac_zilog),
> + * but registers are grouped by type and not by channel:
> + * channel is selected by bit 0 of the address (instead of bit 1)
> + * and register is selected by bit 1 of the address (instead of bit 0).
>   */
>  
>  /*
> @@ -170,6 +177,16 @@ static void handle_kbd_command(ESCCChannelState *s, int val);
>  static int serial_can_receive(void *opaque);
>  static void serial_receive_byte(ESCCChannelState *s, int ch);
>  
> +static int reg_shift(ESCCState *s)
> +{
> +    return s->bit_swap ? s->it_shift + 1 : s->it_shift;
> +}
> +
> +static int chn_shift(ESCCState *s)
> +{
> +    return s->bit_swap ? s->it_shift : s->it_shift + 1;
> +}
> +
>  static void clear_queue(void *opaque)
>  {
>      ESCCChannelState *s = opaque;
> @@ -434,8 +451,8 @@ static void escc_mem_write(void *opaque, hwaddr addr,
>      int newreg, channel;
>  
>      val &= 0xff;
> -    saddr = (addr >> serial->it_shift) & 1;
> -    channel = (addr >> (serial->it_shift + 1)) & 1;
> +    saddr = (addr >> reg_shift(serial)) & 1;
> +    channel = (addr >> chn_shift(serial)) & 1;
>      s = &serial->chn[channel];
>      switch (saddr) {
>      case SERIAL_CTRL:
> @@ -545,8 +562,8 @@ static uint64_t escc_mem_read(void *opaque, hwaddr addr,
>      uint32_t ret;
>      int channel;
>  
> -    saddr = (addr >> serial->it_shift) & 1;
> -    channel = (addr >> (serial->it_shift + 1)) & 1;
> +    saddr = (addr >> reg_shift(serial)) & 1;
> +    channel = (addr >> chn_shift(serial)) & 1;
>      s = &serial->chn[channel];
>      switch (saddr) {
>      case SERIAL_CTRL:
> @@ -830,6 +847,7 @@ static void escc_realize(DeviceState *dev, Error **errp)
>  static Property escc_properties[] = {
>      DEFINE_PROP_UINT32("frequency", ESCCState, frequency,   0),
>      DEFINE_PROP_UINT32("it_shift",  ESCCState, it_shift,    0),
> +    DEFINE_PROP_BOOL("bit_swap",    ESCCState, bit_swap,    false),
>      DEFINE_PROP_UINT32("disabled",  ESCCState, disabled,    0),
>      DEFINE_PROP_UINT32("chnBtype",  ESCCState, chn[0].type, 0),
>      DEFINE_PROP_UINT32("chnAtype",  ESCCState, chn[1].type, 0),
> diff --git a/include/hw/char/escc.h b/include/hw/char/escc.h
> index 42aca83611..8762f61c14 100644
> --- a/include/hw/char/escc.h
> +++ b/include/hw/char/escc.h
> @@ -50,6 +50,7 @@ typedef struct ESCCState {
>  
>      struct ESCCChannelState chn[2];
>      uint32_t it_shift;
> +    bool bit_swap;
>      MemoryRegion mmio;
>      uint32_t disabled;
>      uint32_t frequency;
> 



Re: [Qemu-devel] [PATCH 2/4] libvhost-user: support many virtqueues

2019-06-21 Thread Marc-André Lureau
Hi

On Fri, Jun 21, 2019 at 11:40 AM Stefan Hajnoczi  wrote:
>
> Currently libvhost-user is hardcoded to at most 8 virtqueues.  The
> device backend should decide the number of virtqueues, not
> libvhost-user.  This is important for multiqueue device backends where
> the guest driver needs an accurate number of virtqueues.
>
> This change breaks libvhost-user and libvhost-user-glib API stability.
> There is no stability guarantee yet, so make this change now and update
> all in-tree library users.
>
> This patch touches up vhost-user-blk, vhost-user-gpu, vhost-user-input,
> vhost-user-scsi, and vhost-user-bridge.  If the device has a fixed
> number of queues that exact number is used.  Otherwise the previous
> default of 8 virtqueues is used.
>
> vu_init() and vug_init() can now fail if malloc() returns NULL.  I
> considered aborting with an error in libvhost-user but it should be safe
> to instantiate new vhost-user instances at runtime without risk of
> terminating the process.  Therefore callers need to handle the vu_init()
> failure now.
>
> vhost-user-blk and vhost-user-scsi duplicate virtqueue index checks that
> are already performed by libvhost-user.  This code would need to be
> modified to use max_queues but remove it completely instead since it's
> redundant.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  contrib/libvhost-user/libvhost-user-glib.h |  2 +-
>  contrib/libvhost-user/libvhost-user.h  | 10 --
>  contrib/libvhost-user/libvhost-user-glib.c | 12 +--
>  contrib/libvhost-user/libvhost-user.c  | 32 -
>  contrib/vhost-user-blk/vhost-user-blk.c| 16 +
>  contrib/vhost-user-gpu/main.c  |  9 -
>  contrib/vhost-user-input/main.c| 11 +-
>  contrib/vhost-user-scsi/vhost-user-scsi.c  | 21 +--
>  tests/vhost-user-bridge.c  | 42 ++
>  9 files changed, 104 insertions(+), 51 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user-glib.h b/contrib/libvhost-user/libvhost-user-glib.h
> index d3200f3afc..64d539d93a 100644
> --- a/contrib/libvhost-user/libvhost-user-glib.h
> +++ b/contrib/libvhost-user/libvhost-user-glib.h
> @@ -25,7 +25,7 @@ typedef struct VugDev {
>      GSource *src;
>  } VugDev;
>
> -void vug_init(VugDev *dev, int socket,
> +bool vug_init(VugDev *dev, uint16_t max_queues, int socket,
>                vu_panic_cb panic, const VuDevIface *iface);
>  void vug_deinit(VugDev *dev);
>
> diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
> index 3b888ff0a5..46b600799b 100644
> --- a/contrib/libvhost-user/libvhost-user.h
> +++ b/contrib/libvhost-user/libvhost-user.h
> @@ -25,7 +25,6 @@
>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  #define VHOST_LOG_PAGE 4096
>
> -#define VHOST_MAX_NR_VIRTQUEUE 8
>  #define VIRTQUEUE_MAX_SIZE 1024
>
>  #define VHOST_MEMORY_MAX_NREGIONS 8
> @@ -353,7 +352,7 @@ struct VuDev {
>      int sock;
>      uint32_t nregions;
>      VuDevRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> -    VuVirtq vq[VHOST_MAX_NR_VIRTQUEUE];
> +    VuVirtq *vq;
>      VuDevInflightInfo inflight_info;
>      int log_call_fd;
>      int slave_fd;
> @@ -362,6 +361,7 @@ struct VuDev {
>      uint64_t features;
>      uint64_t protocol_features;
>      bool broken;
> +    uint16_t max_queues;
>
>      /* @set_watch: add or update the given fd to the watch set,
>       * call cb when condition is met */
> @@ -391,6 +391,7 @@ typedef struct VuVirtqElement {
>  /**
>   * vu_init:
>   * @dev: a VuDev context
> + * @max_queues: maximum number of virtqueues
>   * @socket: the socket connected to vhost-user master
>   * @panic: a panic callback
>   * @set_watch: a set_watch callback
> @@ -398,8 +399,11 @@ typedef struct VuVirtqElement {
>   * @iface: a VuDevIface structure with vhost-user device callbacks
>   *
>   * Intializes a VuDev vhost-user context.
> + *
> + * Returns: true on success, false on failure.
>   **/
> -void vu_init(VuDev *dev,
> +bool vu_init(VuDev *dev,
> +             uint16_t max_queues,
>               int socket,
>               vu_panic_cb panic,
>               vu_set_watch_cb set_watch,
> diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c
> index 42660a1b36..99edd2f3de 100644
> --- a/contrib/libvhost-user/libvhost-user-glib.c
> +++ b/contrib/libvhost-user/libvhost-user-glib.c
> @@ -131,18 +131,24 @@ static void vug_watch(VuDev *dev, int condition, void *data)
>  }
>  }
>
> -void
> -vug_init(VugDev *dev, int socket,
> +bool
> +vug_init(VugDev *dev, uint16_t max_queues, int socket,
>   vu_panic_cb panic, const VuDevIface *iface)
>  {
>      g_assert(dev);
>      g_assert(iface);
>
> -    vu_init(&dev->parent, socket, panic, set_watch, remove_watch, iface);
> +    if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch,
> +                 remove_watch, iface)) {
> +        return false;
> +    }
> +
>  dev->fdmap = 

[Qemu-devel] [PULL SUBSYSTEM s390x 3/3] s390x/cpumodel: Prepend KDSA features with "KDSA"

2019-06-21 Thread David Hildenbrand
Let's handle it just like for other crypto features.

Reviewed-by: Janosch Frank 
Acked-by: Cornelia Huck 
Signed-off-by: David Hildenbrand 
---
 target/s390x/cpu_features_def.inc.h | 30 ++---
 target/s390x/gen-features.c | 30 ++---
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/target/s390x/cpu_features_def.inc.h b/target/s390x/cpu_features_def.inc.h
index 5d65fa..c20c780f2e 100644
--- a/target/s390x/cpu_features_def.inc.h
+++ b/target/s390x/cpu_features_def.inc.h
@@ -339,21 +339,21 @@ DEF_FEAT(KMA_GCM_EAES_192, "kma-gcm-eaes-192", KMA, 27, "KMA GCM-Encrypted-AES-192")
 DEF_FEAT(KMA_GCM_EAES_256, "kma-gcm-eaes-256", KMA, 28, "KMA GCM-Encrypted-AES-256")
 
 /* Features exposed via the KDSA instruction. */
-DEF_FEAT(ECDSA_VERIFY_P256, "kdsa-ecdsa-verify-p256", KDSA, 1, "KDSA ECDSA-Verify-P256")
-DEF_FEAT(ECDSA_VERIFY_P384, "kdsa-ecdsa-verify-p384", KDSA, 2, "KDSA ECDSA-Verify-P384")
-DEF_FEAT(ECDSA_VERIFY_P512, "kdsa-ecdsa-verify-p521", KDSA, 3, "KDSA ECDSA-Verify-P521")
-DEF_FEAT(ECDSA_SIGN_P256, "kdsa-ecdsa-sign-p256", KDSA, 9, "KDSA ECDSA-Sign-P256")
-DEF_FEAT(ECDSA_SIGN_P384, "kdsa-ecdsa-sign-p384", KDSA, 10, "KDSA ECDSA-Sign-P384")
-DEF_FEAT(ECDSA_SIGN_P512, "kdsa-ecdsa-sign-p521", KDSA, 11, "KDSA ECDSA-Sign-P521")
-DEF_FEAT(EECDSA_SIGN_P256, "kdsa-eecdsa-sign-p256", KDSA, 17, "KDSA Encrypted-ECDSA-Sign-P256")
-DEF_FEAT(EECDSA_SIGN_P384, "kdsa-eecdsa-sign-p384", KDSA, 18, "KDSA Encrypted-ECDSA-Sign-P384")
-DEF_FEAT(EECDSA_SIGN_P512, "kdsa-eecdsa-sign-p521", KDSA, 19, "KDSA Encrypted-ECDSA-Sign-P521")
-DEF_FEAT(EDDSA_VERIFY_ED25519, "kdsa-eddsa-verify-ed25519", KDSA, 32, "KDSA EdDSA-Verify-Ed25519")
-DEF_FEAT(EDDSA_VERIFY_ED448, "kdsa-eddsa-verify-ed448", KDSA, 36, "KDSA EdDSA-Verify-Ed448")
-DEF_FEAT(EDDSA_SIGN_ED25519, "kdsa-eddsa-sign-ed25519", KDSA, 40, "KDSA EdDSA-Sign-Ed25519")
-DEF_FEAT(EDDSA_SIGN_ED448, "kdsa-eddsa-sign-ed448", KDSA, 44, "KDSA EdDSA-Sign-Ed448")
-DEF_FEAT(EEDDSA_SIGN_ED25519, "kdsa-eeddsa-sign-ed25519", KDSA, 48, "KDSA Encrypted-EdDSA-Sign-Ed25519")
-DEF_FEAT(EEDDSA_SIGN_ED448, "kdsa-eeddsa-sign-ed448", KDSA, 52, "KDSA Encrypted-EdDSA-Sign-Ed448")
+DEF_FEAT(KDSA_ECDSA_VERIFY_P256, "kdsa-ecdsa-verify-p256", KDSA, 1, "KDSA ECDSA-Verify-P256")
+DEF_FEAT(KDSA_ECDSA_VERIFY_P384, "kdsa-ecdsa-verify-p384", KDSA, 2, "KDSA ECDSA-Verify-P384")
+DEF_FEAT(KDSA_ECDSA_VERIFY_P512, "kdsa-ecdsa-verify-p521", KDSA, 3, "KDSA ECDSA-Verify-P521")
+DEF_FEAT(KDSA_ECDSA_SIGN_P256, "kdsa-ecdsa-sign-p256", KDSA, 9, "KDSA ECDSA-Sign-P256")
+DEF_FEAT(KDSA_ECDSA_SIGN_P384, "kdsa-ecdsa-sign-p384", KDSA, 10, "KDSA ECDSA-Sign-P384")
+DEF_FEAT(KDSA_ECDSA_SIGN_P512, "kdsa-ecdsa-sign-p521", KDSA, 11, "KDSA ECDSA-Sign-P521")
+DEF_FEAT(KDSA_EECDSA_SIGN_P256, "kdsa-eecdsa-sign-p256", KDSA, 17, "KDSA Encrypted-ECDSA-Sign-P256")
+DEF_FEAT(KDSA_EECDSA_SIGN_P384, "kdsa-eecdsa-sign-p384", KDSA, 18, "KDSA Encrypted-ECDSA-Sign-P384")
+DEF_FEAT(KDSA_EECDSA_SIGN_P512, "kdsa-eecdsa-sign-p521", KDSA, 19, "KDSA Encrypted-ECDSA-Sign-P521")
+DEF_FEAT(KDSA_EDDSA_VERIFY_ED25519, "kdsa-eddsa-verify-ed25519", KDSA, 32, "KDSA EdDSA-Verify-Ed25519")
+DEF_FEAT(KDSA_EDDSA_VERIFY_ED448, "kdsa-eddsa-verify-ed448", KDSA, 36, "KDSA EdDSA-Verify-Ed448")
+DEF_FEAT(KDSA_EDDSA_SIGN_ED25519, "kdsa-eddsa-sign-ed25519", KDSA, 40, "KDSA EdDSA-Sign-Ed25519")
+DEF_FEAT(KDSA_EDDSA_SIGN_ED448, "kdsa-eddsa-sign-ed448", KDSA, 44, "KDSA EdDSA-Sign-Ed448")
+DEF_FEAT(KDSA_EEDDSA_SIGN_ED25519, "kdsa-eeddsa-sign-ed25519", KDSA, 48, "KDSA Encrypted-EdDSA-Sign-Ed25519")
+DEF_FEAT(KDSA_EEDDSA_SIGN_ED448, "kdsa-eeddsa-sign-ed448", KDSA, 52, "KDSA Encrypted-EdDSA-Sign-Ed448")
 
 /* Features exposed via the SORTL instruction. */
 DEF_FEAT(SORTL_SFLR, "sortl-sflr", SORTL, 1, "SORTL SFLR")
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index dc320a06c2..af06be3e3b 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -216,21 +216,21 @@
 
 #define S390_FEAT_GROUP_MSA_EXT_9 \
     S390_FEAT_MSA_EXT_9, \
-    S390_FEAT_ECDSA_VERIFY_P256, \
-    S390_FEAT_ECDSA_VERIFY_P384, \
-    S390_FEAT_ECDSA_VERIFY_P512, \
-    S390_FEAT_ECDSA_SIGN_P256, \
-    S390_FEAT_ECDSA_SIGN_P384, \
-    S390_FEAT_ECDSA_SIGN_P512, \
-    S390_FEAT_EECDSA_SIGN_P256, \
-    S390_FEAT_EECDSA_SIGN_P384, \
-    S390_FEAT_EECDSA_SIGN_P512, \
-    S390_FEAT_EDDSA_VERIFY_ED25519, \
-    S390_FEAT_EDDSA_VERIFY_ED448, \
-    S390_FEAT_EDDSA_SIGN_ED25519, \
-    S390_FEAT_EDDSA_SIGN_ED448, \
-    S390_FEAT_EEDDSA_SIGN_ED25519, \
-    S390_FEAT_EEDDSA_SIGN_ED448, \
+    S390_FEAT_KDSA_ECDSA_VERIFY_P256, \
+    S390_FEAT_KDSA_ECDSA_VERIFY_P384, \
+    S390_FEAT_KDSA_ECDSA_VERIFY_P512, \
+    S390_FEAT_KDSA_ECDSA_SIGN_P256, \
+    S390_FEAT_KDSA_ECDSA_SIGN_P384, \
+    S390_FEAT_KDSA_ECDSA_SIGN_P512, \
+    S390_FEAT_KDSA_EECDSA_SIGN_P256, \
+    S390_FEAT_KDSA_EECDSA_SIGN_P384, \
+    S390_FEAT_KDSA_EECDSA_SIGN_P512, \
+

[Qemu-devel] [PULL SUBSYSTEM s390x 0/3] s390x/tcg: pending patches

2019-06-21 Thread David Hildenbrand
This pull request is not for master.

Hi Conny,

The following changes since commit 33d609990621dea6c7d056c86f707b8811320ac1:

  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
(2019-06-18 17:00:52 +0100)

are available in the Git repository at:

  https://github.com/davidhildenbrand/qemu.git tags/s390x-tcg-2019-06-21

for you to fetch changes up to ef506f804a48a6fb7983d721092f1b32a4543f3e:

  s390x/cpumodel: Prepend KDSA features with "KDSA" (2019-06-21 15:26:53 +0200)


One fix for a tcg test case and two cleanups/refactorings of cpu feature
definitions.


David Hildenbrand (2):
  s390x/cpumodel: Rework CPU feature definition
  s390x/cpumodel: Prepend KDSA features with "KDSA"

Richard Henderson (1):
  tests/tcg/s390x: Fix alignment of csst parameter list

 target/s390x/cpu_features.c | 352 +-
 target/s390x/cpu_features_def.h | 352 +-
 target/s390x/cpu_features_def.inc.h | 369 
 target/s390x/gen-features.c |  30 +--
 tests/tcg/s390x/csst.c  |   2 +-
 5 files changed, 402 insertions(+), 703 deletions(-)
 create mode 100644 target/s390x/cpu_features_def.inc.h

-- 
2.21.0




Re: [Qemu-devel] [PATCH 4/4] docs: avoid vhost-user-net specifics in multiqueue section

2019-06-21 Thread Marc-André Lureau
On Fri, Jun 21, 2019 at 11:41 AM Stefan Hajnoczi  wrote:
>
> The "Multiple queue support" section makes references to vhost-user-net
> "queue pairs".  This is confusing for two reasons:
> 1. This actually applies to all device types, not just vhost-user-net.
> 2. VHOST_USER_GET_QUEUE_NUM returns the number of virtqueues, not the
>number of queue pairs.
>
> Reword the section so that the vhost-user-net specific part is relegated
> to the very end: we acknowledge that vhost-user-net historically
> automatically enabled the first queue pair.
>
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Marc-André Lureau 

> ---
>  docs/interop/vhost-user.rst | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index dc0ff9211f..5750668aba 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -324,19 +324,20 @@ must support changing some configuration aspects on the fly.
>  Multiple queue support
>  --
>
> -Multiple queue is treated as a protocol extension, hence the slave has
> -to implement protocol features first. The multiple queues feature is
> -supported only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ``
> -(bit 0) is set.
> +Multiple queue support allows the slave to advertise the maximum number of
> +queues.  This is treated as a protocol extension, hence the slave has to
> +implement protocol features first. The multiple queues feature is supported
> +only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.
>
> -The max number of queue pairs the slave supports can be queried with
> -message ``VHOST_USER_GET_QUEUE_NUM``. Master should stop when the
> -number of requested queues is bigger than that.
> +The max number of queues the slave supports can be queried with message
> +``VHOST_USER_GET_QUEUE_NUM``. Master should stop when the number of requested
> +queues is bigger than that.
>
>  As all queues share one connection, the master uses a unique index for each
> -queue in the sent message to identify a specified queue. One queue pair
> -is enabled initially. More queues are enabled dynamically, by sending
> -message ``VHOST_USER_SET_VRING_ENABLE``.
> +queue in the sent message to identify a specified queue.
> +
> +The master enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
> +vhost-user-net has historically automatically enabled the first queue pair.
>
>  Migration
>  -
> --
> 2.21.0
>



Re: [Qemu-devel] [PULL v2 00/25] Misc (mostly x86) patches for 2019-06-21

2019-06-21 Thread no-reply
Patchew URL: https://patchew.org/QEMU/1561116620-22245-1-git-send-email-pbonz...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PULL v2 00/25] Misc (mostly x86) patches for 2019-06-21
Message-id: 1561116620-22245-1-git-send-email-pbonz...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20190621132324.2165-1-mre...@redhat.com -> 
patchew/20190621132324.2165-1-mre...@redhat.com
Switched to a new branch 'test'
cf66e55 hw: Nuke hw_compat_4_0_1 and pc_compat_4_0_1
6c88e3a util/main-loop: Fix incorrect assertion
13371b1 sd: Fix out-of-bounds assertions
0b837d9 target/i386: kvm: Add nested migration blocker only when kernel lacks 
required capabilities
1583048 target/i386: kvm: Add support for KVM_CAP_EXCEPTION_PAYLOAD
4e4d392 target/i386: kvm: Add support for save and restore nested state
c952fd2 vmstate: Add support for kernel integer types
be0bc2a linux-headers: sync with latest KVM headers from Linux 5.2
fcc123c target/i386: kvm: Block migration for vCPUs exposed with nested 
virtualization
ff0e7df target/i386: kvm: Re-inject #DB to guest with updated DR6
b03c3ff target/i386: kvm: Use symbolic constant for #DB/#BP exception constants
67ea33b KVM: Introduce kvm_arch_destroy_vcpu()
9e26b32 target/i386: kvm: Delete VMX migration blocker on vCPU init failure
7020f89 target/i386: define a new MSR based feature word - FEAT_CORE_CAPABILITY
3a0d495 i386/kvm: add support for Direct Mode for Hyper-V synthetic timers
6a7127a i386/kvm: hv-evmcs requires hv-vapic
c7514b4 i386/kvm: hv-tlbflush/ipi require hv-vpindex
51cdfac i386/kvm: hv-stimer requires hv-time and hv-synic
1f02e73 i386/kvm: implement 'hv-passthrough' mode
aeaf6f8 i386/kvm: document existing Hyper-V enlightenments
82d1c81 i386/kvm: move Hyper-V CPUID filling to hyperv_handle_properties()
f5e0932 i386/kvm: add support for KVM_GET_SUPPORTED_HV_CPUID
b0b89da i386/kvm: convert hyperv enlightenments properties from bools to bits
d49efb1 hax: Honor CPUState::halted
113cfe4 kvm-all: Add/update fprintf's for kvm_*_ioeventfd_del

=== OUTPUT BEGIN ===
1/25 Checking commit 113cfe429a07 (kvm-all: Add/update fprintf's for 
kvm_*_ioeventfd_del)
2/25 Checking commit d49efb131de1 (hax: Honor CPUState::halted)
WARNING: Block comments use a leading /* on a separate line
#77: FILE: target/i386/hax-all.c:479:
+/* After a vcpu is halted (either because it is an AP and has just been

WARNING: Block comments use a leading /* on a separate line
#109: FILE: target/i386/hax-all.c:519:
+/* If this vcpu is halted, we must not ask HAXM to run it. Instead, we

total: 0 errors, 2 warnings, 60 lines checked

Patch 2/25 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/25 Checking commit b0b89daa3c59 (i386/kvm: convert hyperv enlightenments 
properties from bools to bits)
4/25 Checking commit f5e09320671b (i386/kvm: add support for 
KVM_GET_SUPPORTED_HV_CPUID)
5/25 Checking commit 82d1c8147a54 (i386/kvm: move Hyper-V CPUID filling to 
hyperv_handle_properties())
6/25 Checking commit aeaf6f88598f (i386/kvm: document existing Hyper-V 
enlightenments)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#18: 
new file mode 100644

total: 0 errors, 1 warnings, 181 lines checked

Patch 6/25 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/25 Checking commit 1f02e738d953 (i386/kvm: implement 'hv-passthrough' mode)
8/25 Checking commit 51cdfac4fca3 (i386/kvm: hv-stimer requires hv-time and 
hv-synic)
9/25 Checking commit c7514b44e2d0 (i386/kvm: hv-tlbflush/ipi require hv-vpindex)
10/25 Checking commit 6a7127a97552 (i386/kvm: hv-evmcs requires hv-vapic)
11/25 Checking commit 3a0d49524898 (i386/kvm: add support for Direct Mode for 
Hyper-V synthetic timers)
12/25 Checking commit 7020f89f2f08 (target/i386: define a new MSR based feature 
word - FEAT_CORE_CAPABILITY)
13/25 Checking commit 9e26b329c385 (target/i386: kvm: Delete VMX migration 
blocker on vCPU init failure)
14/25 Checking commit 67ea33bcc49e (KVM: Introduce kvm_arch_destroy_vcpu())
ERROR: code indent should never use tabs
#61: FILE: target/arm/kvm32.c:245:
+^Ireturn 0;$

ERROR: g_free(NULL) is safe this check is probably not required
#96: FILE: target/i386/kvm.c:1687:
+if (cpu->kvm_msr_buf) {
+g_free(cpu->kvm_msr_buf);

total: 2 errors, 0 warnings, 96 lines checked

Patch 14/25 has style problems, please review.  If any of these errors
are false positives 

[Qemu-devel] [Bug 1831225] Re: guest migration 100% cpu freeze bug

2019-06-21 Thread Frank Schreuder
An update on our further research. We tried bumping the hypervisor
kernel form 4.19.43 to 4.19.50 which included the following patch, which
we hoped to be related to our issue:

https://lore.kernel.org/lkml/20190520115253.743557...@linuxfoundation.org/#t

Sadly after a few thousand migrations we encountered two freezes again,
and halted further migrations. Again the affected VMs seem to run
pre-4.x kernels and all but one freezes occurred with Gold 6126 CPUs.

While analyzing memory dumps of the VMs with the crash utility we found
a peculiar similarity. They all seem to have jumped ~584 years, which
led me to this bug report from 2011:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/805341 . Does this
provide any insight into what the issue might be on host-level?

It is very well possible that the issue lies in the guest kernel, but as
the service we provide is unmanaged we have no control over it.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1831225

Title:
  guest migration 100% cpu freeze bug

Status in QEMU:
  New

Bug description:
  # Investigate migration cpu hog(100%) bug

  I have some issues when migrating from kernel 4.14.63 running qemu 2.11.2 to 
kernel 4.19.43 running qemu 2.11.2.
  The hypervisors are running on debian jessie with libvirt v5.3.0.
  Linux, libvirt and qemu are all custom compiled.

  I migrated around 10.000 vms and every once in a while a vm is stuck
  at 100% cpu after what we can see right now is that the target
  hypervisor runs on linux 4.19.53. This happened with 4 vms so far. It
  is not that easy to debug, we found this out pretty quickly because we
  are running monitoring on frozen vms after migrations.

  Last year we were having the same "kind of" bug https://bugs.launchpad.net/qemu/+bug/177 when trying to upgrade qemu 2.6 to 2.11.
  This bug was fixed after applying the following patch: http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00820.html

  This patch is still applied as you can see because of the available pre_load var on the kvmclock_vmsd struct:
  (gdb) ptype kvmclock_vmsd
  type = const struct VMStateDescription {
  const char *name;
  int unmigratable;
  int version_id;
  int minimum_version_id;
  int minimum_version_id_old;
  MigrationPriority priority;
  LoadStateHandler *load_state_old;
  int (*pre_load)(void *);
  int (*post_load)(void *, int);
  int (*pre_save)(void *);
  _Bool (*needed)(void *);
  VMStateField *fields;
  const VMStateDescription **subsections;
  }

  I attached gdb to a vcpu thread of one stuck vm, and a bt showed the 
following info:
  Thread 4 (Thread 0x7f3a431a4700 (LWP 37799)):
  #0  0x7f3a576f5017 in ioctl () at ../sysdeps/unix/syscall-template.S:84
  #1  0x55d84d15de57 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55d84fca78d0, type=type@entry=44672) at /home/dbosschieter/src/qemu-pkg/src/accel/kvm/kvm-all.c:2050
  #2  0x55d84d15dfc6 in kvm_cpu_exec (cpu=cpu@entry=0x55d84fca78d0) at /home/dbosschieter/src/qemu-pkg/src/accel/kvm/kvm-all.c:1887
  #3  0x55d84d13ab64 in qemu_kvm_cpu_thread_fn (arg=0x55d84fca78d0) at /home/dbosschieter/src/qemu-pkg/src/cpus.c:1136
  #4  0x7f3a579ba4a4 in start_thread (arg=0x7f3a431a4700) at pthread_create.c:456
  #5  0x7f3a576fcd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

  Thread 3 (Thread 0x7f3a439a5700 (LWP 37798)):
  #0  0x7f3a576f5017 in ioctl () at ../sysdeps/unix/syscall-template.S:84
  #1  0x55d84d15de57 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55d84fc5cbb0, 
type=type@entry=44672) at 
/home/dbosschieter/src/qemu-pkg/src/accel/kvm/kvm-all.c:2050
  #2  0x55d84d15dfc6 in kvm_cpu_exec (cpu=cpu@entry=0x55d84fc5cbb0) at 
/home/dbosschieter/src/qemu-pkg/src/accel/kvm/kvm-all.c:1887
  #3  0x55d84d13ab64 in qemu_kvm_cpu_thread_fn (arg=0x55d84fc5cbb0) at 
/home/dbosschieter/src/qemu-pkg/src/cpus.c:1136
  #4  0x7f3a579ba4a4 in start_thread (arg=0x7f3a439a5700) at 
pthread_create.c:456
  #5  0x7f3a576fcd0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97

  The ioctl call is ioctl(18, KVM_RUN, ...), so it looks like the guest is
  looping inside the VM itself.

  I saved the state of the VM (with `virsh save`) after I found it hanging on
  its vcpu threads. Then I restored the VM in a test environment running the
  same kernel, QEMU and libvirt versions. After the restore, the VM was still
  hanging at 100% CPU usage on all vcpus.
  I tried the perf kvm guest option to trace the guest VM, with a copy of the
  kernel, modules and kallsyms files from inside the guest, and got the
  following perf stat:

   Event          Total   %Total   CurAvg/s
   kvm_entry    5198993     23.1     277007
   kvm_exit     5198976     23.1

Re: [Qemu-devel] [PATCH 1/4] libvhost-user: add vmsg_set_reply_u64() helper

2019-06-21 Thread Marc-André Lureau
On Fri, Jun 21, 2019 at 11:40 AM Stefan Hajnoczi  wrote:
>
> The VhostUserMsg request is reused as the reply by message processing
> functions.  This is risky since request fields may corrupt the reply if
> the vhost-user message handler function forgets to re-initialize them.
>
> Changing this practice would be very invasive but we can introduce a
> helper function to make u64 payload replies safe.  This also eliminates
> code duplication in message processing functions.
>
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Marc-André Lureau 


> ---
>  contrib/libvhost-user/libvhost-user.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c 
> b/contrib/libvhost-user/libvhost-user.c
> index 443b7e08c3..a8657c7af2 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -216,6 +216,15 @@ vmsg_close_fds(VhostUserMsg *vmsg)
>  }
>  }
>
> +/* Set reply payload.u64 and clear request flags and fd_num */
> +static void vmsg_set_reply_u64(VhostUserMsg *vmsg, uint64_t val)
> +{
> +    vmsg->flags = 0; /* defaults will be set by vu_send_reply() */
> +    vmsg->size = sizeof(vmsg->payload.u64);
> +    vmsg->payload.u64 = val;
> +    vmsg->fd_num = 0;
> +}
> +
>  /* A test to see if we have userfault available */
>  static bool
>  have_userfault(void)
> @@ -1168,10 +1177,7 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>          features |= dev->iface->get_protocol_features(dev);
>      }
>
> -    vmsg->payload.u64 = features;
> -    vmsg->size = sizeof(vmsg->payload.u64);
> -    vmsg->fd_num = 0;
> -
> +    vmsg_set_reply_u64(vmsg, features);
>      return true;
>  }
>
> @@ -1307,17 +1313,14 @@ out:
>  static bool
>  vu_set_postcopy_listen(VuDev *dev, VhostUserMsg *vmsg)
>  {
> -    vmsg->payload.u64 = -1;
> -    vmsg->size = sizeof(vmsg->payload.u64);
> -
>      if (dev->nregions) {
>          vu_panic(dev, "Regions already registered at postcopy-listen");
> +        vmsg_set_reply_u64(vmsg, -1);
>          return true;
>      }
>      dev->postcopy_listening = true;
>
> -    vmsg->flags = VHOST_USER_VERSION |  VHOST_USER_REPLY_MASK;
> -    vmsg->payload.u64 = 0; /* Success */
> +    vmsg_set_reply_u64(vmsg, 0);
>      return true;
>  }
>
> @@ -1332,10 +1335,7 @@ vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
>          DPRINT("%s: Done close\n", __func__);
>      }
>
> -    vmsg->fd_num = 0;
> -    vmsg->payload.u64 = 0;
> -    vmsg->size = sizeof(vmsg->payload.u64);
> -    vmsg->flags = VHOST_USER_VERSION |  VHOST_USER_REPLY_MASK;
> +    vmsg_set_reply_u64(vmsg, 0);
>      DPRINT("%s: exit\n", __func__);
>      return true;
>  }
> --
> 2.21.0
>



Re: [Qemu-devel] [PATCH 3/4] libvhost-user: implement VHOST_USER_PROTOCOL_F_MQ

2019-06-21 Thread Marc-André Lureau
On Fri, Jun 21, 2019 at 11:40 AM Stefan Hajnoczi  wrote:
>
> Existing vhost-user device backends, including vhost-user-scsi and
> vhost-user-blk, support multiqueue but libvhost-user currently does not
> advertise this.
>
> VHOST_USER_PROTOCOL_F_MQ enables the VHOST_USER_GET_QUEUE_NUM request
> needed for a vhost-user master to query the number of queues.  For
> example, QEMU's vhost-user-net master depends on
> VHOST_USER_PROTOCOL_F_MQ for multiqueue.
>
> If you're wondering how any device backend with more than one virtqueue
> functions today, it's because device types with a fixed number of
> virtqueues do not require querying the number of queues.  Therefore the
> vhost-user master for vhost-user-input with 2 virtqueues, for example,
> doesn't actually depend on VHOST_USER_PROTOCOL_F_MQ.  It just enables
> virtqueues 0 and 1 without asking.
>
> Let there be multiqueue!
>
> Suggested-by: Sebastien Boeuf 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Marc-André Lureau 

> ---
>  contrib/libvhost-user/libvhost-user.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c 
> b/contrib/libvhost-user/libvhost-user.c
> index 0c88431e8f..312c54f260 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -1160,7 +1160,8 @@ vu_set_vring_err_exec(VuDev *dev, VhostUserMsg *vmsg)
>  static bool
>  vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
> -    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
> +    uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_MQ |
> +                        1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
>                          1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ |
>                          1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER |
>                          1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD;
> @@ -1200,8 +1201,8 @@ vu_set_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>  static bool
>  vu_get_queue_num_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
> -    DPRINT("Function %s() not implemented yet.\n", __func__);
> -    return false;
> +    vmsg_set_reply_u64(vmsg, dev->max_queues);
> +    return true;
>  }
>
>  static bool
> --
> 2.21.0
>



[Qemu-devel] [PATCH v14 6/7] ext4: disable map_sync for async flush

2019-06-21 Thread Pankaj Gupta
Don't support 'MAP_SYNC' with non-DAX files, or with DAX files whose
dax_device is asynchronous. Virtio pmem provides an asynchronous host page
cache flush mechanism, so we don't support 'MAP_SYNC' with virtio pmem
and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct dax_device *dax_dev = sbi->s_daxdev;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(file);
-- 
2.20.1




Re: [Qemu-devel] [PATCH RFC] checkpatch: do not warn for multiline parenthesized returned value

2019-06-21 Thread Eric Blake
On 6/21/19 6:28 AM, Paolo Bonzini wrote:
> While indeed we do not want to have
> 
> return (a);
> 
> it is less clear that this applies to
> 
> return (a &&
> b);
> 
> Some editors indent more nicely if you have parentheses, and some people's
> eyes may appreciate that as well.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  scripts/checkpatch.pl | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

I'm certainly in favor of this (as I've been known to use this style).

> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index c2aaf42..2f81371 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2296,7 +2296,8 @@ sub process {
>  			       $value =~ s/\([^\(\)]*\)/1/) {
>  			}
>  #print "value<$value>\n";
> -			if ($value =~ /^\s*(?:$Ident|-?$Constant)\s*$/) {
> +			if ($value =~ /^\s*(?:$Ident|-?$Constant)\s*$/ &&
> +			    $line =~ /;$/) {

So the diagnosis now checks for a trailing ';' as its witness of whether
this is a one-liner return statement, leaving multi-liners undiagnosed.
Easy enough to understand.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org





[Qemu-devel] [PULL SUBSYSTEM s390x 2/3] s390x/cpumodel: Rework CPU feature definition

2019-06-21 Thread David Hildenbrand
Let's define features at a single spot and make it less error prone to
define new features.

Acked-by: Janosch Frank 
Acked-by: Cornelia Huck 
Signed-off-by: David Hildenbrand 
---
 target/s390x/cpu_features.c | 352 +-
 target/s390x/cpu_features_def.h | 352 +-
 target/s390x/cpu_features_def.inc.h | 369 
 3 files changed, 386 insertions(+), 687 deletions(-)
 create mode 100644 target/s390x/cpu_features_def.inc.h

diff --git a/target/s390x/cpu_features.c b/target/s390x/cpu_features.c
index f64f581c86..9f817e3cfa 100644
--- a/target/s390x/cpu_features.c
+++ b/target/s390x/cpu_features.c
@@ -2,8 +2,9 @@
  * CPU features/facilities for s390x
  *
  * Copyright IBM Corp. 2016, 2018
+ * Copyright Red Hat, Inc. 2019
  *
- * Author(s): David Hildenbrand 
+ * Author(s): David Hildenbrand 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or (at
  * your option) any later version. See the COPYING file in the top-level
@@ -14,346 +15,17 @@
 #include "qemu/module.h"
 #include "cpu_features.h"
 
-#define FEAT_INIT(_name, _type, _bit, _desc) \
-    {                                        \
-        .name = _name,                       \
-        .type = _type,                       \
-        .bit = _bit,                         \
-        .desc = _desc,                       \
-    }
-
-/* S390FeatDef.bit is not applicable as there is no feature block. */
-#define FEAT_INIT_MISC(_name, _desc) \
-    FEAT_INIT(_name, S390_FEAT_TYPE_MISC, 0, _desc)
-
-/* indexed by feature number for easy lookup */
-static const S390FeatDef s390_features[] = {
-    FEAT_INIT("esan3", S390_FEAT_TYPE_STFL, 0, "Instructions marked as n3"),
-    FEAT_INIT("zarch", S390_FEAT_TYPE_STFL, 1, "z/Architecture architectural mode"),
-    FEAT_INIT("dateh", S390_FEAT_TYPE_STFL, 3, "DAT-enhancement facility"),
-    FEAT_INIT("idtes", S390_FEAT_TYPE_STFL, 4, "IDTE selective TLB segment-table clearing"),
-    FEAT_INIT("idter", S390_FEAT_TYPE_STFL, 5, "IDTE selective TLB region-table clearing"),
-    FEAT_INIT("asnlxr", S390_FEAT_TYPE_STFL, 6, "ASN-and-LX reuse facility"),
-    FEAT_INIT("stfle", S390_FEAT_TYPE_STFL, 7, "Store-facility-list-extended facility"),
-    FEAT_INIT("edat", S390_FEAT_TYPE_STFL, 8, "Enhanced-DAT facility"),
-    FEAT_INIT("srs", S390_FEAT_TYPE_STFL, 9, "Sense-running-status facility"),
-    FEAT_INIT("csske", S390_FEAT_TYPE_STFL, 10, "Conditional-SSKE facility"),
-    FEAT_INIT("ctop", S390_FEAT_TYPE_STFL, 11, "Configuration-topology facility"),
-    FEAT_INIT("apqci", S390_FEAT_TYPE_STFL, 12, "Query AP Configuration Information facility"),
-    FEAT_INIT("ipter", S390_FEAT_TYPE_STFL, 13, "IPTE-range facility"),
-    FEAT_INIT("nonqks", S390_FEAT_TYPE_STFL, 14, "Nonquiescing key-setting facility"),
-    FEAT_INIT("apft", S390_FEAT_TYPE_STFL, 15, "AP Facilities Test facility"),
-    FEAT_INIT("etf2", S390_FEAT_TYPE_STFL, 16, "Extended-translation facility 2"),
-    FEAT_INIT("msa-base", S390_FEAT_TYPE_STFL, 17, "Message-security-assist facility (excluding subfunctions)"),
-    FEAT_INIT("ldisp", S390_FEAT_TYPE_STFL, 18, "Long-displacement facility"),
-    FEAT_INIT("ldisphp", S390_FEAT_TYPE_STFL, 19, "Long-displacement facility has high performance"),
-    FEAT_INIT("hfpm", S390_FEAT_TYPE_STFL, 20, "HFP-multiply-add/subtract facility"),
-    FEAT_INIT("eimm", S390_FEAT_TYPE_STFL, 21, "Extended-immediate facility"),
-    FEAT_INIT("etf3", S390_FEAT_TYPE_STFL, 22, "Extended-translation facility 3"),
-    FEAT_INIT("hfpue", S390_FEAT_TYPE_STFL, 23, "HFP-unnormalized-extension facility"),
-    FEAT_INIT("etf2eh", S390_FEAT_TYPE_STFL, 24, "ETF2-enhancement facility"),
-    FEAT_INIT("stckf", S390_FEAT_TYPE_STFL, 25, "Store-clock-fast facility"),
-    FEAT_INIT("parseh", S390_FEAT_TYPE_STFL, 26, "Parsing-enhancement facility"),
-    FEAT_INIT("mvcos", S390_FEAT_TYPE_STFL, 27, "Move-with-optional-specification facility"),
-    FEAT_INIT("tods-base", S390_FEAT_TYPE_STFL, 28, "TOD-clock-steering facility (excluding subfunctions)"),
-    FEAT_INIT("etf3eh", S390_FEAT_TYPE_STFL, 30, "ETF3-enhancement facility"),
-    FEAT_INIT("ectg", S390_FEAT_TYPE_STFL, 31, "Extract-CPU-time facility"),
-    FEAT_INIT("csst", S390_FEAT_TYPE_STFL, 32, "Compare-and-swap-and-store facility"),
-    FEAT_INIT("csst2", S390_FEAT_TYPE_STFL, 33, "Compare-and-swap-and-store facility 2"),
-    FEAT_INIT("ginste", S390_FEAT_TYPE_STFL, 34, "General-instructions-extension facility"),
-    FEAT_INIT("exrl", S390_FEAT_TYPE_STFL, 35, "Execute-extensions facility"),
-    FEAT_INIT("emon", S390_FEAT_TYPE_STFL, 36, "Enhanced-monitor facility"),
-    FEAT_INIT("fpe", S390_FEAT_TYPE_STFL, 37, "Floating-point extension facility"),
-    FEAT_INIT("opc", S390_FEAT_TYPE_STFL, 38, "Order Preserving Compression facility"),
-    FEAT_INIT("sprogp", S390_FEAT_TYPE_STFL, 40,

Re: [Qemu-devel] [PATCH 07/12] block/backup: add 'always' bitmap sync policy

2019-06-21 Thread Vladimir Sementsov-Ogievskiy
21.06.2019 16:08, Vladimir Sementsov-Ogievskiy wrote:
> 21.06.2019 15:59, Vladimir Sementsov-Ogievskiy wrote:
>> 21.06.2019 15:57, Vladimir Sementsov-Ogievskiy wrote:
>>> 20.06.2019 4:03, John Snow wrote:
 This adds an "always" policy for bitmap synchronization. Regardless of if
 the job succeeds or fails, the bitmap is *always* synchronized. This means
 that for backups that fail part-way through, the bitmap retains a record
 of which sectors need to be copied out to accomplish a new backup using
 the old, partial result.

 In effect, this allows us to "resume" a failed backup; however the new
 backup will be from the new point in time, so it isn't a "resume" as much
 as it is an "incremental retry." This can be useful in the case of
 extremely large backups that fail considerably through the operation and
 we'd like to not waste the work that was already performed.

 Signed-off-by: John Snow 
 ---
   qapi/block-core.json |  5 -
   block/backup.c   | 10 ++
   2 files changed, 10 insertions(+), 5 deletions(-)

 diff --git a/qapi/block-core.json b/qapi/block-core.json
 index 0332dcaabc..58d267f1f5 100644
 --- a/qapi/block-core.json
 +++ b/qapi/block-core.json
 @@ -1143,6 +1143,9 @@
  # An enumeration of possible behaviors for the synchronization of a bitmap
  # when used for data copy operations.
   #
 +# @always: The bitmap is always synchronized with remaining blocks to copy,
 +#          whether or not the operation has completed successfully or not.
>>>
>>> Hmm, now I think that 'always' sounds a bit like 'really always', i.e.
>>> during the backup too, which is confusing. But I don't have a better
>>> suggestion.
>>>
 +#
  # @conditional: The bitmap is only synchronized when the operation is
  #               successful.  This is useful for Incremental semantics.
   #
 @@ -1153,7 +1156,7 @@
   # Since: 4.1
   ##
   { 'enum': 'BitmapSyncMode',
 -  'data': ['conditional', 'never'] }
 +  'data': ['always', 'conditional', 'never'] }
   ##
   # @MirrorCopyMode:
 diff --git a/block/backup.c b/block/backup.c
 index 627f724b68..beb2078696 100644
 --- a/block/backup.c
 +++ b/block/backup.c
 @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
      BlockDriverState *bs = blk_bs(job->common.blk);
 
      if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
 -        /* Failure, or we don't want to synchronize the bitmap.
 -         * Merge the successor back into the parent, delete nothing. */
 +        /* Failure, or we don't want to synchronize the bitmap. */
 +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
 +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
 +        }
 +        /* Merge the successor back into the parent. */
          bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
>>>
>>> Hmm, good, it should work. It's a lot more tricky than just "synchronized
>>> with remaining blocks to copy", but I'm not sure we need more details in
>>> the spec.
>>>
>>> What do we have in backup? On one hand we have an incremental backup, and
>>> a bitmap counting from it. On the other hand it's not a normal incremental
>>> backup, as it doesn't correspond to any valid state of the VM disk, and it
>>> may be used only as a backing image in a chain of further successful
>>> incremental backups, yes?
>>>
>>> And then I think: with this mode we could not stop on the first error, but
>>> ignore it, just leaving the dirty bit set in the resulting bitmap. We have
>>> BLOCKDEV_ON_ERROR_IGNORE, which could be used to achieve that, but it
>>> seems it doesn't work as expected, since in backup_loop() we retry the
>>> operation if ret < 0 and action != BLOCK_ERROR_ACTION_REPORT.
>>>
>>> And another thought: can the user decide whether to discard a failed
>>> backup result (like CONDITIONAL) or keep it in the backing chain (like
>>> ALWAYS) _after_ the backup job completes? For example, for a small
>>> resulting backup it may be better to discard it, and for a large one to
>>> keep it.
>>> Will it work if we start the job with ALWAYS mode and autocomplete =
>>> false, then on failure look at the job progress: if it is small we cancel
>>> the job, otherwise we call complete? Or wait, will block-job-complete
>>> even work in failure scenarios? Then we would have to set
>>> BLOCKDEV_ON_ERROR_IGNORE and decide on the first error event whether to
>>> cancel or not. But we can only cancel or continue..
>>>
>>> Hmm. Cancel. So on cancel and abort you synchronize the bitmap too? That
>>> seems at odds with what cancel should do, and with transactions in
>>> general...
>>
>> I mean the grouped transaction mode: how should it work with this?
> 
> Actually the problem is that you want to implement partial success, and the
> block jobs API and transactions API 

Re: [Qemu-devel] [PATCH v5 34/42] block: Inline bdrv_co_block_status_from_*()

2019-06-21 Thread Vladimir Sementsov-Ogievskiy
13.06.2019 1:09, Max Reitz wrote:
> With bdrv_filtered_rw_bs(), we can easily handle this default filter
> behavior in bdrv_co_block_status().
> 
> blkdebug wants to have an additional assertion, so it keeps its own
> implementation, except bdrv_co_block_status_from_file() needs to be
> inlined there.
> 
> Suggested-by: Eric Blake
> Signed-off-by: Max Reitz

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir

