Re: [PATCH v7 1/3] hw/ppc: Add pnv nest pervasive common chiplet model

2023-12-06 Thread Cédric Le Goater

On 12/7/23 03:43, Chalapathi V wrote:

A POWER10 chip is divided into logical pieces called chiplets. Chiplets
are broadly divided into "core chiplets" (with the processor cores) and
"nest chiplets" (with everything else). Each chiplet has an attachment
to the pervasive bus (PIB) and chiplet-specific registers. All nest
chiplets have a common basic set of registers, and this model provides
the register functionality for the common registers of the nest chiplets
(Pervasive Chiplet, PB Chiplet, PCI Chiplets, MC Chiplet, PAU Chiplets).

This commit implements the read/write functions of the chiplet control registers.

Signed-off-by: Chalapathi V 


Some minor comments below which would be nice to fix. Anyhow,

Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  include/hw/ppc/pnv_nest_pervasive.h |  32 +
  include/hw/ppc/pnv_xscom.h  |   3 +
  hw/ppc/pnv_nest_pervasive.c | 208 
  hw/ppc/meson.build  |   1 +
  4 files changed, 244 insertions(+)
  create mode 100644 include/hw/ppc/pnv_nest_pervasive.h
  create mode 100644 hw/ppc/pnv_nest_pervasive.c

diff --git a/include/hw/ppc/pnv_nest_pervasive.h 
b/include/hw/ppc/pnv_nest_pervasive.h
new file mode 100644
index 00..270d84ff9d
--- /dev/null
+++ b/include/hw/ppc/pnv_nest_pervasive.h
@@ -0,0 +1,32 @@
+/*
+ * QEMU PowerPC nest pervasive common chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef PPC_PNV_NEST_CHIPLET_PERVASIVE_H
+#define PPC_PNV_NEST_CHIPLET_PERVASIVE_H
+
+#define TYPE_PNV_NEST_CHIPLET_PERVASIVE "pnv-nest-chiplet-pervasive"
+#define PNV_NEST_CHIPLET_PERVASIVE(obj) OBJECT_CHECK(PnvNestChipletPervasive, 
(obj), TYPE_PNV_NEST_CHIPLET_PERVASIVE)
+
+typedef struct PnvPervasiveCtrlRegs {
+#define CPLT_CTRL_SIZE 6
+uint64_t cplt_ctrl[CPLT_CTRL_SIZE];
+uint64_t cplt_cfg0;
+uint64_t cplt_cfg1;
+uint64_t cplt_stat0;
+uint64_t cplt_mask0;
+uint64_t ctrl_protect_mode;
+uint64_t ctrl_atomic_lock;
+} PnvPervasiveCtrlRegs;
+
+typedef struct PnvNestChipletPervasive {
+DeviceState parent;
+MemoryRegion xscom_ctrl_regs_mr;
+PnvPervasiveCtrlRegs control_regs;
+} PnvNestChipletPervasive;
+
+#endif /*PPC_PNV_NEST_CHIPLET_PERVASIVE_H */
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index f5becbab41..3e15706dec 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -170,6 +170,9 @@ struct PnvXScomInterfaceClass {
  #define PNV10_XSCOM_XIVE2_BASE 0x2010800
  #define PNV10_XSCOM_XIVE2_SIZE 0x400
  
+#define PNV10_XSCOM_N1_CHIPLET_CTRL_REGS_BASE  0x300
+#define PNV10_XSCOM_CHIPLET_CTRL_REGS_SIZE 0x400
+
  #define PNV10_XSCOM_PEC_NEST_BASE  0x3011800 /* index goes downwards ... */
  #define PNV10_XSCOM_PEC_NEST_SIZE  0x100
  
diff --git a/hw/ppc/pnv_nest_pervasive.c b/hw/ppc/pnv_nest_pervasive.c

new file mode 100644
index 00..886e0d2491
--- /dev/null
+++ b/hw/ppc/pnv_nest_pervasive.c
@@ -0,0 +1,208 @@
+/*
+ * QEMU PowerPC nest pervasive common chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_nest_pervasive.h"
+
+/*
+ * Status, configuration, and control units in POWER chips is provided
+ * by the pervasive subsystem, which connects registers to the SCOM bus,
+ * which can be programmed by processor cores, other units on the chip,
+ * BMCs, or other POWER chips.
+ *
+ * A POWER10 chip is divided into logical pieces called chiplets. Chiplets


pieces -> units ?


+ * are broadly divided into "core chiplets" (with the processor cores) and
+ * "nest chiplets" (with everything else). Each chiplet has an attachment
+ * to the nest_pervasiveasive bus (PIB) and with chiplet-specific registers.


nest_pervasive


+ * All nest chiplets have a common basic set of registers.
+ *
+ * This model will provide the registers fuctionality for common registers of
+ * nest unit (PB Chiplet, PCI Chiplets, MC Chiplet, PAU Chiplets)
+ *
+ * Currently this model provide the read/write fuctionality of chiplet control


functionality


+ * scom registers.
+ */
+
+#define CPLT_CONF0   0x08
+#define CPLT_CONF0_OR0x18
+#define CPLT_CONF0_CLEAR 0x28
+#define CPLT_CONF1   0x09
+#define CPLT_CONF1_OR0x19
+#define CPLT_CONF1_CLEAR 0x29
+#define CPLT_STAT0   0x100
+#define CPLT_MASK0   0x101
+#define CPLT_PROTECT_MODE0x3FE
+#define CPLT_ATOMIC_CLOCK0x3FF
+
+static uint64_t pnv_chiplet_ctrl_read(void *opaque, hwaddr addr, unsigned size)
+{
+PnvNestChipletPervasive *nest_pervasive = PNV_NEST_CHIPLET_PERVASIVE(
+  opaque);
+uint32_t reg = addr >> 3;
+

Re: [PATCH v3 52/70] i386/tdx: handle TDG.VP.VMCALL

2023-12-06 Thread Xiaoyao Li

On 12/1/2023 7:02 PM, Markus Armbruster wrote:

Xiaoyao Li  writes:


From: Isaku Yamahata 

For GetQuote, delegate a request to Quote Generation Service.
Add property "quote-generation-socket" to tdx-guest, whihc is a property
of type SocketAddress to specify Quote Generation Service(QGS).

On request, connect to the QGS, read the request buffer from shared guest
memory, send the request buffer to the server, store the response into
shared guest memory, and notify the TD guest by interrupt.

command line example:
   qemu-system-x86_64 \
 -object '{"qom-type":"tdx-guest","id":"tdx0","quote-generation-socket":{"type": "vsock", 
"cid":"2","port":"1234"}}' \
 -machine confidential-guest-support=tdx0
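
For readers unfamiliar with the flow described above, a minimal sketch of the
request/response handling follows. This is not the actual target/i386/kvm/tdx.c
code from this patch; the function name, parameters, error handling and the
assumption that the response fits into the request buffer are illustrative only
(it assumes io/channel-socket.h, exec/cpu-common.h and qapi/error.h are
available):

static void tdx_get_quote_sketch(hwaddr gpa, size_t len, SocketAddress *qgs)
{
    QIOChannelSocket *ioc = qio_channel_socket_new();
    g_autofree uint8_t *buf = g_malloc(len);
    Error *err = NULL;

    /* read the request buffer from shared guest memory */
    cpu_physical_memory_read(gpa, buf, len);

    /* connect to the Quote Generation Service and forward the request */
    if (qio_channel_socket_connect_sync(ioc, qgs, &err) < 0 ||
        qio_channel_write_all(QIO_CHANNEL(ioc), (char *)buf, len, &err) < 0) {
        error_report_err(err);
        goto out;
    }

    /* store the response into shared guest memory */
    if (qio_channel_read_all(QIO_CHANNEL(ioc), (char *)buf, len, &err) == 0) {
        cpu_physical_memory_write(gpa, buf, len);
        /* the real code then injects an interrupt to notify the TD guest */
    } else {
        error_report_err(err);
    }
out:
    object_unref(OBJECT(ioc));
}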

Signed-off-by: Isaku Yamahata 
Codeveloped-by: Chenyi Qiang 
Signed-off-by: Chenyi Qiang 
Signed-off-by: Xiaoyao Li 
---
Changes in v3:
- rename property "quote-generation-service" to "quote-generation-socket";
- change the type of "quote-generation-socket" from str to
   SocketAddress;
- squash next patch into this one;
---
  qapi/qom.json |   5 +-
  target/i386/kvm/tdx.c | 430 ++
  target/i386/kvm/tdx.h |   6 +
  3 files changed, 440 insertions(+), 1 deletion(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index fd99aa1ff8cc..cf36a1832ddd 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -894,13 +894,16 @@
  #
  # @mrownerconfig: base64 MROWNERCONFIG SHA384 digest
  #
+# @quote-generation-socket: socket address for Quote Generation Service(QGS)
+#


Long line.  Better:

# @quote-generation-socket: socket address for Quote Generation
# Service(QGS)


May I ask what the line-length limit for qom.json is, if the 80-column
limit doesn't apply to it?



  # Since: 8.2
  ##
  { 'struct': 'TdxGuestProperties',
'data': { '*sept-ve-disable': 'bool',
  '*mrconfigid': 'str',
  '*mrowner': 'str',
-'*mrownerconfig': 'str' } }
+'*mrownerconfig': 'str',
+'*quote-generation-socket': 'SocketAddress' } }
  
  ##

  # @ThreadContextProperties:







Re: [PATCH v3 18/70] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES

2023-12-06 Thread Xiaoyao Li

On 11/15/2023 6:54 PM, Daniel P. Berrangé wrote:

+static int tdx_ioctl_internal(void *state, enum tdx_ioctl_level level, int 
cmd_id,
+__u32 flags, void *data)
+{
+struct kvm_tdx_cmd tdx_cmd;

Add   ' = {}'  to initialize to all-zeros, avoiding the explicit
memset call


thanks for the suggestion. Will do it in the next version.
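
For reference, the suggested change amounts to zero-initializing the struct at
its declaration instead of calling memset, roughly:

struct kvm_tdx_cmd tdx_cmd = {};    /* all fields start zeroed */

tdx_cmd.id = cmd_id;
tdx_cmd.flags = flags;
tdx_cmd.data = (__u64)(unsigned long)data;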



Re: [PATCH v3 18/70] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES

2023-12-06 Thread Xiaoyao Li

On 11/18/2023 5:18 AM, Isaku Yamahata wrote:

On Wed, Nov 15, 2023 at 02:14:27AM -0500,
Xiaoyao Li  wrote:


KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
TDX context. It will be used to validate user's setting later.

Since there is no interface reporting how many cpuid configs are contained
in KVM_TDX_CAPABILITIES, QEMU chooses to start with a known number and
aborts when it exceeds KVM_MAX_CPUID_ENTRIES.

Besides, introduce the interfaces to invoke TDX "ioctls" at different
scopes (KVM, VM and VCPU) in preparation.

Signed-off-by: Xiaoyao Li 
---
Changes in v3:
- rename __tdx_ioctl() to tdx_ioctl_internal()
- Pass errp in get_tdx_capabilities();

changes in v2:
   - Make the error message more clear;

changes in v1:
   - start from nr_cpuid_configs = 6 for the loop;
   - stop the loop when nr_cpuid_configs exceeds KVM_MAX_CPUID_ENTRIES;
---
  target/i386/kvm/kvm.c  |   2 -
  target/i386/kvm/kvm_i386.h |   2 +
  target/i386/kvm/tdx.c  | 102 -
  3 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7abcdebb1452..28e60c5ea4a7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1687,8 +1687,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
  
  static Error *invtsc_mig_blocker;
  
-#define KVM_MAX_CPUID_ENTRIES  100

-
  static void kvm_init_xsave(CPUX86State *env)
  {
  if (has_xsave2) {
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 55fb25fa8e2e..c3ef46a97a7b 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -13,6 +13,8 @@
  
  #include "sysemu/kvm.h"
  
+#define KVM_MAX_CPUID_ENTRIES  100

+
  #ifdef CONFIG_KVM
  
  #define kvm_pit_in_kernel() \

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 621a05beeb4e..cb0040187b27 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,17 +12,117 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/error-report.h"
  #include "qapi/error.h"
  #include "qom/object_interfaces.h"
+#include "sysemu/kvm.h"
  
  #include "hw/i386/x86.h"

+#include "kvm_i386.h"
  #include "tdx.h"
  
+static struct kvm_tdx_capabilities *tdx_caps;

+
+enum tdx_ioctl_level{
+TDX_PLATFORM_IOCTL,
+TDX_VM_IOCTL,
+TDX_VCPU_IOCTL,
+};
+
+static int tdx_ioctl_internal(void *state, enum tdx_ioctl_level level, int 
cmd_id,
+__u32 flags, void *data)
+{
+struct kvm_tdx_cmd tdx_cmd;
+int r;
+
+memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
+
+tdx_cmd.id = cmd_id;
+tdx_cmd.flags = flags;
+tdx_cmd.data = (__u64)(unsigned long)data;
+
+switch (level) {
+case TDX_PLATFORM_IOCTL:
+r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+break;
+case TDX_VM_IOCTL:
+r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+break;
+case TDX_VCPU_IOCTL:
+r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+break;
+default:
+error_report("Invalid tdx_ioctl_level %d", level);
+exit(1);
+}
+
+return r;
+}
+
+static inline int tdx_platform_ioctl(int cmd_id, __u32 flags, void *data)
+{
+return tdx_ioctl_internal(NULL, TDX_PLATFORM_IOCTL, cmd_id, flags, data);
+}
+
+static inline int tdx_vm_ioctl(int cmd_id, __u32 flags, void *data)
+{
+return tdx_ioctl_internal(NULL, TDX_VM_IOCTL, cmd_id, flags, data);
+}
+
+static inline int tdx_vcpu_ioctl(void *vcpu_fd, int cmd_id, __u32 flags,
+ void *data)
+{
+return  tdx_ioctl_internal(vcpu_fd, TDX_VCPU_IOCTL, cmd_id, flags, data);
+}


As not all of the ioctl variants are used yet, we can split them out.


No. tdx_vm_ioctl() is used right below.

I can remove the tdx_platform_ioctl() because its sole user, 
KVM_TDX_CAPABILITIES, changed to vm scope.



An independent
patch to define ioctl functions.



+
+static int get_tdx_capabilities(Error **errp)
+{
+struct kvm_tdx_capabilities *caps;
+/* 1st generation of TDX reports 6 cpuid configs */
+int nr_cpuid_configs = 6;
+size_t size;
+int r;
+
+do {
+size = sizeof(struct kvm_tdx_capabilities) +
+   nr_cpuid_configs * sizeof(struct kvm_tdx_cpuid_config);
+caps = g_malloc0(size);
+caps->nr_cpuid_configs = nr_cpuid_configs;
+
+r = tdx_vm_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
+if (r == -E2BIG) {
+g_free(caps);
+nr_cpuid_configs *= 2;


g_realloc()?  Maybe a matter of preference.


I would like to keep the current code unless there is a strong objection
from the maintainers.
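
For comparison, a rough sketch of the g_realloc() variant being discussed; this
is illustrative only and not code from the series (it grows and reuses the same
buffer instead of freeing and reallocating it):

caps = NULL;
do {
    size = sizeof(struct kvm_tdx_capabilities) +
           nr_cpuid_configs * sizeof(struct kvm_tdx_cpuid_config);
    caps = g_realloc(caps, size);
    memset(caps, 0, size);
    caps->nr_cpuid_configs = nr_cpuid_configs;

    r = tdx_vm_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
    if (r == -E2BIG) {
        nr_cpuid_configs *= 2;    /* grow the same buffer and retry */
    }
} while (r == -E2BIG);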



Other than this, it looks good to me.





Re: [PATCH v3] crypto: Introduce SM4 symmetric cipher algorithm

2023-12-06 Thread Yong Huang
On Thu, Dec 7, 2023 at 1:41 AM Daniel P. Berrangé 
wrote:

> On Wed, Nov 29, 2023 at 11:17:49PM +0800, Hyman Huang wrote:
> > Introduce the SM4 cipher algorithms (OSCCA GB/T 32907-2016).
> >
> > SM4 (GBT.32907-2016) is a cryptographic standard issued by the
> > Organization of State Commercial Administration of China (OSCCA)
> > as an authorized cryptographic algorithm for use within China.
> >
> > Use the crypto-sm4 meson build option to explicitly control the
> > feature, which would be detected by default.
> >
> > Signed-off-by: Hyman Huang 
> > ---
> >  crypto/block-luks.c | 11 
> >  crypto/cipher-gcrypt.c.inc  |  8 ++
> >  crypto/cipher-nettle.c.inc  | 49 +
> >  crypto/cipher.c |  6 
> >  meson.build | 42 
> >  meson_options.txt   |  2 ++
> >  qapi/crypto.json|  5 +++-
> >  scripts/meson-buildoptions.sh   |  3 ++
> >  tests/unit/test-crypto-cipher.c | 13 +
> >  9 files changed, 138 insertions(+), 1 deletion(-)
> >
>
> > diff --git a/meson.build b/meson.build
> > index ec01f8b138..765f9c9f50 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1480,6 +1480,7 @@ endif
> >  gcrypt = not_found
> >  nettle = not_found
> >  hogweed = not_found
> > +crypto_sm4 = not_found
> >  xts = 'none'
> >
> >  if get_option('nettle').enabled() and get_option('gcrypt').enabled()
> > @@ -1505,6 +1506,28 @@ if not gnutls_crypto.found()
> >   cc.find_library('gpg-error', required: true)],
> >  version: gcrypt.version())
> >  endif
> > +crypto_sm4 = gcrypt
> > +# SM4 ALG is available in libgcrypt >= 1.9
> > +if gcrypt.found() and not cc.links('''
> > +  #include 
> > +  int main(void) {
> > +gcry_cipher_hd_t handler;
> > +gcry_cipher_open(&handler, GCRY_CIPHER_SM4,
> GCRY_CIPHER_MODE_ECB, 0);
> > +return 0;
> > +  }''', dependencies: gcrypt)
> > +  crypto_sm4 = not_found
> > +  if get_option('crypto_sm4').enabled()
> > +error('could not link sm4')
> > +  else
> > +warning('could not link sm4, disabling')
> > +  endif
>
> IMHO we don't need to have an option for 'crypto_sm4', just
> silently disable it if not present in the host provided
> library.


OK, I don't insist on that and I'll drop the option in the next version.

>
>
> +endif
> > +if crypto_sm4.found() and get_option('prefer_static')
> > +  crypto_sm4 = declare_dependency(dependencies: [
> > +gcrypt,
> > +cc.find_library('gpg-error', required: true)],
> > +version: gcrypt.version())
> > +endif
>
> This last if/endif block is redundant. We already have earlier
> logic that detects gpg-error, and we never use the 'crypto_sm4'
> object after this point anyway
>
> >endif
> >if (not get_option('nettle').auto() or have_system) and not
> gcrypt.found()
> >  nettle = dependency('nettle', version: '>=3.4',
> > @@ -1513,6 +1536,23 @@ if not gnutls_crypto.found()
> >  if nettle.found() and not cc.has_header('nettle/xts.h',
> dependencies: nettle)
> >xts = 'private'
> >  endif
> > +crypto_sm4 = nettle
> > +# SM4 ALG is available in nettle >= 3.9
> > +if nettle.found() and not cc.links('''
> > +  #include 
> > +  int main(void) {
> > +struct sm4_ctx ctx;
> > +unsigned char key[16] = {0};
> > +sm4_set_encrypt_key(&ctx, key);
> > +return 0;
> > +  }''', dependencies: nettle)
> > +  crypto_sm4 = not_found
> > +  if get_option('crypto_sm4').enabled()
> > +error('could not link sm4')
> > +  else
> > +warning('could not link sm4, disabling')
> > +  endif
>
> Likewise no need for an option, just silently disable it.
>
> > +endif
> >endif
> >  endif
> >
> > @@ -2199,6 +2239,7 @@ config_host_data.set('CONFIG_GNUTLS_CRYPTO',
> gnutls_crypto.found())
> >  config_host_data.set('CONFIG_TASN1', tasn1.found())
> >  config_host_data.set('CONFIG_GCRYPT', gcrypt.found())
> >  config_host_data.set('CONFIG_NETTLE', nettle.found())
> > +config_host_data.set('CONFIG_CRYPTO_SM4', crypto_sm4.found())
> >  config_host_data.set('CONFIG_HOGWEED', hogweed.found())
> >  config_host_data.set('CONFIG_QEMU_PRIVATE_XTS', xts == 'private')
> >  config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim)
> > @@ -4273,6 +4314,7 @@ summary_info += {'nettle':nettle}
> >  if nettle.found()
> > summary_info += {'  XTS': xts != 'private'}
> >  endif
> > +summary_info += {'SM4 ALG support':   crypto_sm4}
> >  summary_info += {'AF_ALG support':have_afalg}
> >  summary_info += {'rng-none':  get_option('rng_none')}
> >  summary_info += {'Linux keyring': have_keyring}
> > diff --git a/meson_options.txt b/meson_options.txt
> > index c9baeda639..db8de4ec5b 100644
> > --- a/meson_options.txt
> > +++ b/meson_options.txt
> > @@ -172,6 +172,8 @@ option('nettle', 

Re: [PATCH v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

2023-12-06 Thread Shaoqin Huang

Hi Gavin,

On 12/1/23 13:37, Gavin Shan wrote:

Hi Shaoqin,

On 11/29/23 14:08, Shaoqin Huang wrote:

The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide
which PMU events are provided to the guest. Add a new option
`pmu-filter` as -accel sub-option to set the PMU Event Filtering.
Without the filter, the KVM will expose all events from the host to
guest by default.

The `pmu-filter` has such format:

   pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"

The A means "allow" and D means "deny", start is the first event of the
range and the end is the last one. The first registered range defines
the global policy (global ALLOW if the first @action is DENY, global DENY
if the first @action is ALLOW). The start and end only support hex
format now. For example:

   pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"

Since the first action is allow, we have a global deny policy. Event 0x11
(the cycle counter) is allowed, events 0x23 to 0x3a are also allowed except
event 0x30, which is denied, and all the other events are disallowed.

Here is a real example that shows how to use PMU Event Filtering: when
we launch a guest with KVM, add a command line such as:

   # qemu-system-aarch64 \
-accel kvm,pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This
disables the cycle counter (event 0x11 being CPU_CYCLES).


And then in guest, use the perf to count the cycle:

   # perf stat sleep 1

    Performance counter stats for 'sleep 1':

   1.22 msec task-clock   #    0.001 
CPUs utilized

  1  context-switches #  820.695 /sec
  0  cpu-migrations   #    0.000 /sec
 55  page-faults  #   45.138 
K/sec

      cycles
    1128954  instructions
 227031  branches #  186.323 
M/sec
   8686  branch-misses    #    3.83% 
of all branches


    1.002492480 seconds time elapsed

    0.001752000 seconds user
    0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but the
other pmu events still work.
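
As a rough illustration of how one parsed "{A,D}:start-end" range maps onto the
KVM interface (this is not the target/arm/kvm64.c code from this series; the
helper name and error handling are made up for the example, and it assumes the
kernel's arm64 struct kvm_pmu_event_filter layout):

static int pmu_filter_add_range(CPUState *cs, char action,
                                uint16_t start, uint16_t end)
{
    struct kvm_pmu_event_filter filter = {
        .base_event = start,
        .nevents    = end - start + 1,
        .action     = (action == 'A') ? KVM_PMU_EVENT_ALLOW
                                      : KVM_PMU_EVENT_DENY,
    };
    struct kvm_device_attr attr = {
        .group = KVM_ARM_VCPU_PMU_V3_CTRL,
        .attr  = KVM_ARM_VCPU_PMU_V3_FILTER,
        .addr  = (uint64_t)&filter,
    };

    /* each range from the pmu-filter string is registered with one call */
    return kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, &attr);
}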

Signed-off-by: Shaoqin Huang 
---
v2->v3:
   - Improve commit message, use kernel doc wording, add more explanation on
 the filter example, fix some typos.    [Eric]
   - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. 
[Eric]

   - Add more precise error message report.  [Eric]
   - In options doc, add pmu-filter rely on KVM_ARM_VCPU_PMU_V3_FILTER 
support in

 KVM.    [Eric]

v1->v2:
   - Add more description for allow and deny meaning in
 commit message. [Sebastian]
   - Small improvement.  [Sebastian]

v2: 
https://lore.kernel.org/all/20231117060838.39723-1-shahu...@redhat.com/
v1: 
https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/

---
  include/sysemu/kvm_int.h |  1 +
  qemu-options.hx  | 21 +
  target/arm/kvm.c | 23 ++
  target/arm/kvm64.c   | 68 
  4 files changed, 113 insertions(+)

diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index fd846394be..8f4601474f 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -120,6 +120,7 @@ struct KVMState
  uint32_t xen_caps;
  uint16_t xen_gnttab_max_frames;
  uint16_t xen_evtchn_max_pirq;
+    char *kvm_pmu_filter;
  };
  void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/qemu-options.hx b/qemu-options.hx
index 42fd09e4de..8b721d6668 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
  "    tb-size=n (TCG translation block cache size)\n"
  "    dirty-ring-size=n (KVM dirty ring GFN count, 
default 0)\n"
  "    eager-split-size=n (KVM Eager Page Split chunk 
size, default 0, disabled. ARM only)\n"
+    "    pmu-filter={A,D}:start-end[;...] (KVM PMU Event 
Filter, default no filter. ARM only)\n"

   ^^^

Potential alignment issue, or the email isn't shown for me correctly.
Besides, why not follow the pattern in the commit log, which is nicer
than what is currently there:

pmu-filter={A,D}:start-end[;...]

to

pmu-filter="{A,D}:start-end[;{A,D}:start-end...]



Ok. I can replace it with the better format.

  "
notify-vmexit=run|internal-error|disable,notify-window=n (enable 
notify VM exit and set notify window, x86 only)\n"
  "    thread=single|multi (enable multi-threaded 
TCG)\n", QEMU_ARCH_ALL)

  SRST
@@ -259,6 +260,26 @@ SRST
  impact on the memory. By default, this feature is disabled
  (eager-split-size=0).
+    

Re: [PATCH v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

2023-12-06 Thread Shaoqin Huang




On 12/1/23 00:55, Sebastian Ott wrote:

On Tue, 28 Nov 2023, Shaoqin Huang wrote:

+static void kvm_arm_pmu_filter_init(CPUState *cs)
+{
+    static bool pmu_filter_init = false;
+    struct kvm_pmu_event_filter filter;
+    struct kvm_device_attr attr = {
+    .group  = KVM_ARM_VCPU_PMU_V3_CTRL,
+    .attr   = KVM_ARM_VCPU_PMU_V3_FILTER,
+    .addr   = (uint64_t)&filter,
+    };
+    KVMState *kvm_state = cs->kvm_state;
+    char *tmp;
+    char *str, act;
+
+    if (!kvm_state->kvm_pmu_filter)
+    return;
+
+    if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) {
+    error_report("The kernel doesn't support the pmu event 
filter!\n");

+    abort();
+    }
+
+    /* The filter only needs to be initialized for 1 vcpu. */
+    if (!pmu_filter_init)
+    pmu_filter_init = true;


Imho this is missing an else to bail out. Or the shorter version

 if (pmu_filter_init)
     return;

 pmu_filter_init = true;



Yes. This is what I want to do. Thanks for fixing it.


which could also move above the other tests.

Sebastian



--
Shaoqin




[PATCH v7 1/3] hw/ppc: Add pnv nest pervasive common chiplet model

2023-12-06 Thread Chalapathi V
A POWER10 chip is divided into logical pieces called chiplets. Chiplets
are broadly divided into "core chiplets" (with the processor cores) and
"nest chiplets" (with everything else). Each chiplet has an attachment
to the pervasive bus (PIB) and chiplet-specific registers. All nest
chiplets have a common basic set of registers, and this model provides
the register functionality for the common registers of the nest chiplets
(Pervasive Chiplet, PB Chiplet, PCI Chiplets, MC Chiplet, PAU Chiplets).

This commit implements the read/write functions of the chiplet control registers.

Signed-off-by: Chalapathi V 
---
 include/hw/ppc/pnv_nest_pervasive.h |  32 +
 include/hw/ppc/pnv_xscom.h  |   3 +
 hw/ppc/pnv_nest_pervasive.c | 208 
 hw/ppc/meson.build  |   1 +
 4 files changed, 244 insertions(+)
 create mode 100644 include/hw/ppc/pnv_nest_pervasive.h
 create mode 100644 hw/ppc/pnv_nest_pervasive.c

diff --git a/include/hw/ppc/pnv_nest_pervasive.h 
b/include/hw/ppc/pnv_nest_pervasive.h
new file mode 100644
index 00..270d84ff9d
--- /dev/null
+++ b/include/hw/ppc/pnv_nest_pervasive.h
@@ -0,0 +1,32 @@
+/*
+ * QEMU PowerPC nest pervasive common chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef PPC_PNV_NEST_CHIPLET_PERVASIVE_H
+#define PPC_PNV_NEST_CHIPLET_PERVASIVE_H
+
+#define TYPE_PNV_NEST_CHIPLET_PERVASIVE "pnv-nest-chiplet-pervasive"
+#define PNV_NEST_CHIPLET_PERVASIVE(obj) OBJECT_CHECK(PnvNestChipletPervasive, 
(obj), TYPE_PNV_NEST_CHIPLET_PERVASIVE)
+
+typedef struct PnvPervasiveCtrlRegs {
+#define CPLT_CTRL_SIZE 6
+uint64_t cplt_ctrl[CPLT_CTRL_SIZE];
+uint64_t cplt_cfg0;
+uint64_t cplt_cfg1;
+uint64_t cplt_stat0;
+uint64_t cplt_mask0;
+uint64_t ctrl_protect_mode;
+uint64_t ctrl_atomic_lock;
+} PnvPervasiveCtrlRegs;
+
+typedef struct PnvNestChipletPervasive {
+DeviceState parent;
+MemoryRegion xscom_ctrl_regs_mr;
+PnvPervasiveCtrlRegs control_regs;
+} PnvNestChipletPervasive;
+
+#endif /*PPC_PNV_NEST_CHIPLET_PERVASIVE_H */
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index f5becbab41..3e15706dec 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -170,6 +170,9 @@ struct PnvXScomInterfaceClass {
 #define PNV10_XSCOM_XIVE2_BASE 0x2010800
 #define PNV10_XSCOM_XIVE2_SIZE 0x400
 
+#define PNV10_XSCOM_N1_CHIPLET_CTRL_REGS_BASE  0x300
+#define PNV10_XSCOM_CHIPLET_CTRL_REGS_SIZE 0x400
+
 #define PNV10_XSCOM_PEC_NEST_BASE  0x3011800 /* index goes downwards ... */
 #define PNV10_XSCOM_PEC_NEST_SIZE  0x100
 
diff --git a/hw/ppc/pnv_nest_pervasive.c b/hw/ppc/pnv_nest_pervasive.c
new file mode 100644
index 00..886e0d2491
--- /dev/null
+++ b/hw/ppc/pnv_nest_pervasive.c
@@ -0,0 +1,208 @@
+/*
+ * QEMU PowerPC nest pervasive common chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_nest_pervasive.h"
+
+/*
+ * Status, configuration, and control units in POWER chips is provided
+ * by the pervasive subsystem, which connects registers to the SCOM bus,
+ * which can be programmed by processor cores, other units on the chip,
+ * BMCs, or other POWER chips.
+ *
+ * A POWER10 chip is divided into logical pieces called chiplets. Chiplets
+ * are broadly divided into "core chiplets" (with the processor cores) and
+ * "nest chiplets" (with everything else). Each chiplet has an attachment
+ * to the nest_pervasiveasive bus (PIB) and with chiplet-specific registers.
+ * All nest chiplets have a common basic set of registers.
+ *
+ * This model will provide the registers fuctionality for common registers of
+ * nest unit (PB Chiplet, PCI Chiplets, MC Chiplet, PAU Chiplets)
+ *
+ * Currently this model provide the read/write fuctionality of chiplet control
+ * scom registers.
+ */
+
+#define CPLT_CONF0   0x08
+#define CPLT_CONF0_OR0x18
+#define CPLT_CONF0_CLEAR 0x28
+#define CPLT_CONF1   0x09
+#define CPLT_CONF1_OR0x19
+#define CPLT_CONF1_CLEAR 0x29
+#define CPLT_STAT0   0x100
+#define CPLT_MASK0   0x101
+#define CPLT_PROTECT_MODE0x3FE
+#define CPLT_ATOMIC_CLOCK0x3FF
+
+static uint64_t pnv_chiplet_ctrl_read(void *opaque, hwaddr addr, unsigned size)
+{
+PnvNestChipletPervasive *nest_pervasive = PNV_NEST_CHIPLET_PERVASIVE(
+  opaque);
+uint32_t reg = addr >> 3;
+uint64_t val = ~0ull;
+
+/* CPLT_CTRL0 to CPLT_CTRL5 */
+for (int i = 0; i < CPLT_CTRL_SIZE; i++) {
+if (reg == i) {
+return nest_pervasive->control_regs.cplt_ctrl[i];
+} else if ((reg == (i 

[PATCH v7 3/3] hw/ppc: N1 chiplet wiring

2023-12-06 Thread Chalapathi V
This part of the patchset connects the nest1 chiplet model to the p10 chip.

Signed-off-by: Chalapathi V 
---
 include/hw/ppc/pnv_chip.h |  2 ++
 hw/ppc/pnv.c  | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/hw/ppc/pnv_chip.h b/include/hw/ppc/pnv_chip.h
index 0ab5c42308..9b06c8d87c 100644
--- a/include/hw/ppc/pnv_chip.h
+++ b/include/hw/ppc/pnv_chip.h
@@ -4,6 +4,7 @@
 #include "hw/pci-host/pnv_phb4.h"
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_homer.h"
+#include "hw/ppc/pnv_n1_chiplet.h"
 #include "hw/ppc/pnv_lpc.h"
 #include "hw/ppc/pnv_occ.h"
 #include "hw/ppc/pnv_psi.h"
@@ -113,6 +114,7 @@ struct Pnv10Chip {
 PnvOCC   occ;
 PnvSBE   sbe;
 PnvHomer homer;
+PnvN1Chiplet n1_chiplet;
 
 uint32_t nr_quads;
 PnvQuad  *quads;
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0297871bdd..be3e922644 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1680,6 +1680,8 @@ static void pnv_chip_power10_instance_init(Object *obj)
 object_initialize_child(obj, "occ",  >occ, TYPE_PNV10_OCC);
 object_initialize_child(obj, "sbe",  >sbe, TYPE_PNV10_SBE);
 object_initialize_child(obj, "homer", >homer, TYPE_PNV10_HOMER);
+object_initialize_child(obj, "n1-chiplet", >n1_chiplet,
+TYPE_PNV_N1_CHIPLET);
 
 chip->num_pecs = pcc->num_pecs;
 
@@ -1849,6 +1851,19 @@ static void pnv_chip_power10_realize(DeviceState *dev, 
Error **errp)
 memory_region_add_subregion(get_system_memory(), PNV10_HOMER_BASE(chip),
 &chip10->homer.regs);
 
+/* N1 chiplet */
+if (!qdev_realize(DEVICE(&chip10->n1_chiplet), NULL, errp)) {
+return;
+}
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_N1_CHIPLET_CTRL_REGS_BASE,
+ &chip10->n1_chiplet.nest_pervasive.xscom_ctrl_regs_mr);
+
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_N1_PB_SCOM_EQ_BASE,
+   &chip10->n1_chiplet.xscom_pb_eq_mr);
+
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_N1_PB_SCOM_ES_BASE,
+   &chip10->n1_chiplet.xscom_pb_es_mr);
+
 /* PHBs */
 pnv_chip_power10_phb_realize(chip, &local_err);
 if (local_err) {
-- 
2.31.1




[PATCH v7 2/3] hw/ppc: Add N1 chiplet model

2023-12-06 Thread Chalapathi V
The N1 chiplet handles the high-speed I/O traffic over PCIe and others.
The N1 chiplet consists of the PowerBus Fabric controller,
the nest Memory Management Unit, the chiplet control unit and more.

This commit creates an N1 chiplet model and initializes and realizes the
pervasive chiplet model where the chiplet control registers are implemented.

This commit also implements the read/write methods for the PowerBus scom
registers.

Signed-off-by: Chalapathi V 
---
 include/hw/ppc/pnv_n1_chiplet.h |  33 ++
 include/hw/ppc/pnv_xscom.h  |   6 ++
 hw/ppc/pnv_n1_chiplet.c | 173 
 hw/ppc/meson.build  |   1 +
 4 files changed, 213 insertions(+)
 create mode 100644 include/hw/ppc/pnv_n1_chiplet.h
 create mode 100644 hw/ppc/pnv_n1_chiplet.c

diff --git a/include/hw/ppc/pnv_n1_chiplet.h b/include/hw/ppc/pnv_n1_chiplet.h
new file mode 100644
index 00..d5177bebce
--- /dev/null
+++ b/include/hw/ppc/pnv_n1_chiplet.h
@@ -0,0 +1,33 @@
+/*
+ * QEMU PowerPC N1 chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef PPC_PNV_N1_CHIPLET_H
+#define PPC_PNV_N1_CHIPLET_H
+
+#include "hw/ppc/pnv_nest_pervasive.h"
+
+#define TYPE_PNV_N1_CHIPLET "pnv-N1-chiplet"
+#define PNV_N1_CHIPLET(obj) OBJECT_CHECK(PnvN1Chiplet, (obj), 
TYPE_PNV_N1_CHIPLET)
+
+typedef struct PnvPbScom {
+uint64_t mode;
+uint64_t hp_mode2_curr;
+} PnvPbScom;
+
+typedef struct PnvN1Chiplet {
+DeviceState parent;
+MemoryRegion xscom_pb_eq_mr;
+MemoryRegion xscom_pb_es_mr;
+/* common pervasive chiplet unit */
+PnvNestChipletPervasive nest_pervasive;
+#define PB_SCOM_EQ_SIZE 8
+PnvPbScom eq[PB_SCOM_EQ_SIZE];
+#define PB_SCOM_ES_SIZE 4
+PnvPbScom es[PB_SCOM_ES_SIZE];
+} PnvN1Chiplet;
+#endif /*PPC_PNV_N1_CHIPLET_H */
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index 3e15706dec..535ae1dab0 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -173,6 +173,12 @@ struct PnvXScomInterfaceClass {
 #define PNV10_XSCOM_N1_CHIPLET_CTRL_REGS_BASE  0x300
 #define PNV10_XSCOM_CHIPLET_CTRL_REGS_SIZE 0x400
 
+#define PNV10_XSCOM_N1_PB_SCOM_EQ_BASE  0x3011000
+#define PNV10_XSCOM_N1_PB_SCOM_EQ_SIZE  0x200
+
+#define PNV10_XSCOM_N1_PB_SCOM_ES_BASE  0x3011300
+#define PNV10_XSCOM_N1_PB_SCOM_ES_SIZE  0x100
+
 #define PNV10_XSCOM_PEC_NEST_BASE  0x3011800 /* index goes downwards ... */
 #define PNV10_XSCOM_PEC_NEST_SIZE  0x100
 
diff --git a/hw/ppc/pnv_n1_chiplet.c b/hw/ppc/pnv_n1_chiplet.c
new file mode 100644
index 00..03ff9fbad0
--- /dev/null
+++ b/hw/ppc/pnv_n1_chiplet.c
@@ -0,0 +1,173 @@
+/*
+ * QEMU PowerPC N1 chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_n1_chiplet.h"
+#include "hw/ppc/pnv_nest_pervasive.h"
+
+/*
+ * The n1 chiplet contains chiplet control unit,
+ * PowerBus/RaceTrack/Bridge logic, nest Memory Management Unit(nMMU)
+ * and more.
+ *
+ * In this model Nest1 chiplet control registers are modelled via common
+ * nest pervasive model and few PowerBus racetrack registers are modelled.
+ */
+
+#define PB_SCOM_EQ0_HP_MODE2_CURR  0xe
+#define PB_SCOM_ES3_MODE   0x8a
+
+static uint64_t pnv_n1_chiplet_pb_scom_eq_read(void *opaque, hwaddr addr,
+  unsigned size)
+{
+PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque);
+uint32_t reg = addr >> 3;
+uint64_t val = ~0ull;
+
+switch (reg) {
+case PB_SCOM_EQ0_HP_MODE2_CURR:
+val = n1_chiplet->eq[0].hp_mode2_curr;
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom read at 0x%" PRIx32 "\n",
+  __func__, reg);
+}
+return val;
+}
+
+static void pnv_n1_chiplet_pb_scom_eq_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size)
+{
+PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque);
+uint32_t reg = addr >> 3;
+
+switch (reg) {
+case PB_SCOM_EQ0_HP_MODE2_CURR:
+n1_chiplet->eq[0].hp_mode2_curr = val;
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom write at 0x%" PRIx32 "\n",
+  __func__, reg);
+}
+}
+
+static const MemoryRegionOps pnv_n1_chiplet_pb_scom_eq_ops = {
+.read = pnv_n1_chiplet_pb_scom_eq_read,
+.write = pnv_n1_chiplet_pb_scom_eq_write,
+.valid.min_access_size = 8,
+.valid.max_access_size = 8,
+.impl.min_access_size = 8,
+.impl.max_access_size = 8,
+.endianness = DEVICE_BIG_ENDIAN,
+};
+
+static uint64_t pnv_n1_chiplet_pb_scom_es_read(void *opaque, hwaddr addr,
+  unsigned size)
+{
+PnvN1Chiplet *n1_chiplet = 

[PATCH v7 0/3] pnv N1 chiplet model

2023-12-06 Thread Chalapathi V
Hello,

Thank you for the review and suggestions on V6.

The suggestions and changes requested from V6 are addressed in V7.

Updates in Version 7 of this series are: 
There are no major changes done in revision 7 from revision 6. 
1. The property "parent-obj-name" is removed from nest pervasive chiplet
   model as it is not required.
2. nest pervasive chiplet model is initialized from instance_init handler.

The new qom-tree looks like below.
(qemu) info qom-tree 
/machine (powernv10-machine)
  /chip[0] (power10_v2.0-pnv-chip)
/n1-chiplet (pnv-N1-chiplet)
  /nest-pervasive-common (pnv-nest-chiplet-pervasive)
/pervasive-control[0] (memory-region)
  /xscom-n1-chiplet-pb-scom-eq[0] (memory-region)
  /xscom-n1-chiplet-pb-scom-es[0] (memory-region)

Patches overview in V7.
PATCH1: Create a common nest pervasive chiplet model with control chiplet scom
registers.
PATCH2: Create a N1 chiplet model and implement powerbus scom registers.
Connect common nest pervasive model to N1 chiplet model to define
chiplet control scoms for N1 chiplet.
PATCH3: Connect N1 chiplet model to p10 chip.

Test covered:
Ran make check && make check-avocado and found no obvious issues.

Thank You,
Chalapathi

Chalapathi V (3):
  hw/ppc: Add pnv nest pervasive common chiplet model
  hw/ppc: Add N1 chiplet model
  hw/ppc: N1 chiplet wiring

 include/hw/ppc/pnv_chip.h   |   2 +
 include/hw/ppc/pnv_n1_chiplet.h |  33 +
 include/hw/ppc/pnv_nest_pervasive.h |  32 +
 include/hw/ppc/pnv_xscom.h  |   9 ++
 hw/ppc/pnv.c|  15 ++
 hw/ppc/pnv_n1_chiplet.c | 173 +++
 hw/ppc/pnv_nest_pervasive.c | 208 
 hw/ppc/meson.build  |   2 +
 8 files changed, 474 insertions(+)
 create mode 100644 include/hw/ppc/pnv_n1_chiplet.h
 create mode 100644 include/hw/ppc/pnv_nest_pervasive.h
 create mode 100644 hw/ppc/pnv_n1_chiplet.c
 create mode 100644 hw/ppc/pnv_nest_pervasive.c

-- 
2.31.1




[PULL 0/2] Pull request for some I2C changes

2023-12-06 Thread minyard
From: Corey Minyard 

Some minor I2C changes.  One has been sitting in my queue forgotten for
a while, but still needs to go in.  The other is fairly recent.  Both
are for BMC related stuff.

These are available at:
   https://github.com/cminyard/qemu.git i2c-for-release-2023-12-06
and signed by me.

Potin Lai (1):
  hw/i2c: add pca9543 i2c-mux switch

Titus Rwantare (1):
  hw/sensor: enable setting adm1272 temperature with qmp

 hw/i2c/i2c_mux_pca954x.c | 12 
 hw/sensor/adm1272.c  | 27 ++-
 include/hw/i2c/i2c_mux_pca954x.h |  1 +
 tests/qtest/adm1272-test.c   | 28 +++-
 4 files changed, 66 insertions(+), 2 deletions(-)

-- 
2.34.1




[PULL 1/2] hw/sensor: enable setting adm1272 temperature with qmp

2023-12-06 Thread minyard
From: Titus Rwantare 

Reviewed-by: Patrick Venture 
Reviewed-by: Chris Rauer 
Reviewed-by: Hao Wu 
Signed-off-by: Titus Rwantare 
Message-Id: <20220106173814.3580141-1-vent...@google.com>
Signed-off-by: Corey Minyard 
---
 hw/sensor/adm1272.c| 27 ++-
 tests/qtest/adm1272-test.c | 28 +++-
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/hw/sensor/adm1272.c b/hw/sensor/adm1272.c
index 8f4a1c2cd4..6e5ae6e63b 100644
--- a/hw/sensor/adm1272.c
+++ b/hw/sensor/adm1272.c
@@ -65,6 +65,7 @@
 #define ADM1272_VOLTAGE_COEFF_DEFAULT   1
 #define ADM1272_CURRENT_COEFF_DEFAULT   3
 #define ADM1272_PWR_COEFF_DEFAULT   7
+#define ADM1272_TEMP_COEFF_DEFAULT  8
 #define ADM1272_IOUT_OFFSET 0x5000
 #define ADM1272_IOUT_OFFSET 0x5000
 
@@ -185,6 +186,22 @@ static uint32_t adm1272_direct_to_watts(uint16_t value)
 return pmbus_direct_mode2data(c, value);
 }
 
+static uint16_t adm1272_millidegrees_to_direct(uint32_t value)
+{
+PMBusCoefficients c = adm1272_coefficients[ADM1272_TEMP_COEFF_DEFAULT];
+c.b = c.b * 1000;
+c.R = c.R - 3;
+return pmbus_data2direct_mode(c, value);
+}
+
+static uint32_t adm1272_direct_to_millidegrees(uint16_t value)
+{
+PMBusCoefficients c = adm1272_coefficients[ADM1272_TEMP_COEFF_DEFAULT];
+c.b = c.b * 1000;
+c.R = c.R - 3;
+return pmbus_direct_mode2data(c, value);
+}
+
 static void adm1272_exit_reset(Object *obj)
 {
 ADM1272State *s = ADM1272(obj);
@@ -219,7 +236,7 @@ static void adm1272_exit_reset(Object *obj)
 = adm1272_millivolts_to_direct(ADM1272_VOLT_DEFAULT);
 pmdev->pages[0].read_iout
 = adm1272_milliamps_to_direct(ADM1272_IOUT_DEFAULT);
-pmdev->pages[0].read_temperature_1 = 0;
+pmdev->pages[0].read_temperature_1 = adm1272_millidegrees_to_direct(3);
 pmdev->pages[0].read_pin = adm1272_watts_to_direct(ADM1272_PWR_DEFAULT);
 pmdev->pages[0].revision = ADM1272_PMBUS_REVISION_DEFAULT;
 pmdev->pages[0].mfr_id = ADM1272_MFR_ID_DEFAULT;
@@ -422,6 +439,8 @@ static void adm1272_get(Object *obj, Visitor *v, const char 
*name, void *opaque,
 value = adm1272_direct_to_milliamps(*(uint16_t *)opaque);
 } else if (strcmp(name, "pin") == 0) {
 value = adm1272_direct_to_watts(*(uint16_t *)opaque);
+} else if (strcmp(name, "temperature") == 0) {
+value = adm1272_direct_to_millidegrees(*(uint16_t *)opaque);
 } else {
 value = *(uint16_t *)opaque;
 }
@@ -446,6 +465,8 @@ static void adm1272_set(Object *obj, Visitor *v, const char 
*name, void *opaque,
 *internal = adm1272_milliamps_to_direct(value);
 } else if (strcmp(name, "pin") == 0) {
 *internal = adm1272_watts_to_direct(value);
+} else if (strcmp(name, "temperature") == 0) {
+*internal = adm1272_millidegrees_to_direct(value);
 } else {
 *internal = value;
 }
@@ -509,6 +530,10 @@ static void adm1272_init(Object *obj)
 adm1272_get,
 adm1272_set, NULL, &pmdev->pages[0].read_pin);
 
+object_property_add(obj, "temperature", "uint16",
+adm1272_get,
+adm1272_set, NULL, &pmdev->pages[0].read_temperature_1);
+
 }
 
 static void adm1272_class_init(ObjectClass *klass, void *data)
diff --git a/tests/qtest/adm1272-test.c b/tests/qtest/adm1272-test.c
index 63f8514801..98134aabd2 100644
--- a/tests/qtest/adm1272-test.c
+++ b/tests/qtest/adm1272-test.c
@@ -65,6 +65,7 @@
 #define ADM1272_VOLTAGE_COEFF_DEFAULT   1
 #define ADM1272_CURRENT_COEFF_DEFAULT   3
 #define ADM1272_PWR_COEFF_DEFAULT   7
+#define ADM1272_TEMP_COEFF_DEFAULT  8
 #define ADM1272_IOUT_OFFSET 0x5000
 #define ADM1272_IOUT_OFFSET 0x5000
 
@@ -144,6 +145,22 @@ static uint32_t adm1272_direct_to_watts(uint16_t value)
 return pmbus_direct_mode2data(c, value);
 }
 
+static uint16_t adm1272_millidegrees_to_direct(uint32_t value)
+{
+PMBusCoefficients c = adm1272_coefficients[ADM1272_TEMP_COEFF_DEFAULT];
+c.b = c.b * 1000;
+c.R = c.R - 3;
+return pmbus_data2direct_mode(c, value);
+}
+
+static uint32_t adm1272_direct_to_millidegrees(uint16_t value)
+{
+PMBusCoefficients c = adm1272_coefficients[ADM1272_TEMP_COEFF_DEFAULT];
+c.b = c.b * 1000;
+c.R = c.R - 3;
+return pmbus_direct_mode2data(c, value);
+}
+
 static uint16_t qmp_adm1272_get(const char *id, const char *property)
 {
 QDict *response;
@@ -248,7 +265,7 @@ static void test_defaults(void *obj, void *data, 
QGuestAllocator *alloc)
 /* test qmp access */
 static void test_tx_rx(void *obj, void *data, QGuestAllocator *alloc)
 {
-uint16_t i2c_value, value, i2c_voltage, i2c_pwr, lossy_value;
+uint16_t i2c_value, value, i2c_voltage, i2c_pwr, i2c_temp, lossy_value;
 QI2CDevice *i2cdev = (QI2CDevice *)obj;
 
 /* converting to direct mode is lossy - we generate the same loss here */
@@ -287,6 +304,15 @@ static void 

[PULL 2/2] hw/i2c: add pca9543 i2c-mux switch

2023-12-06 Thread minyard
From: Potin Lai 

Add pca9543 2-channel i2c-mux switch support.

Signed-off-by: Potin Lai 
Reviewed-by: Patrick Venture 
Message-Id: <20231113063156.2264941-1-potin.lai...@gmail.com>
Signed-off-by: Corey Minyard 
---
 hw/i2c/i2c_mux_pca954x.c | 12 
 include/hw/i2c/i2c_mux_pca954x.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
index db5db956a6..6aace0fc47 100644
--- a/hw/i2c/i2c_mux_pca954x.c
+++ b/hw/i2c/i2c_mux_pca954x.c
@@ -30,6 +30,7 @@
 
 #define PCA9548_CHANNEL_COUNT 8
 #define PCA9546_CHANNEL_COUNT 4
+#define PCA9543_CHANNEL_COUNT 2
 
 /*
  * struct Pca954xState - The pca954x state object.
@@ -172,6 +173,12 @@ I2CBus *pca954x_i2c_get_bus(I2CSlave *mux, uint8_t channel)
 return pca954x->bus[channel];
 }
 
+static void pca9543_class_init(ObjectClass *klass, void *data)
+{
+Pca954xClass *s = PCA954X_CLASS(klass);
+s->nchans = PCA9543_CHANNEL_COUNT;
+}
+
 static void pca9546_class_init(ObjectClass *klass, void *data)
 {
 Pca954xClass *s = PCA954X_CLASS(klass);
@@ -246,6 +253,11 @@ static const TypeInfo pca954x_info[] = {
 .class_init= pca954x_class_init,
 .abstract  = true,
 },
+{
+.name  = TYPE_PCA9543,
+.parent= TYPE_PCA954X,
+.class_init= pca9543_class_init,
+},
 {
 .name  = TYPE_PCA9546,
 .parent= TYPE_PCA954X,
diff --git a/include/hw/i2c/i2c_mux_pca954x.h b/include/hw/i2c/i2c_mux_pca954x.h
index 3dd25ec983..1da5508ed5 100644
--- a/include/hw/i2c/i2c_mux_pca954x.h
+++ b/include/hw/i2c/i2c_mux_pca954x.h
@@ -3,6 +3,7 @@
 
 #include "hw/i2c/i2c.h"
 
+#define TYPE_PCA9543 "pca9543"
 #define TYPE_PCA9546 "pca9546"
 #define TYPE_PCA9548 "pca9548"
 
-- 
2.34.1
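
For context, a minimal usage sketch of wiring a device behind the new pca9543
from a board model; the parent_bus variable, the 0x70 mux address and the
downstream tmp105 sensor are assumptions for the example, not part of this
patch:

I2CSlave *mux = i2c_slave_create_simple(parent_bus, TYPE_PCA9543, 0x70);

/* the two channels are exposed as separate child buses */
I2CBus *chan0 = pca954x_i2c_get_bus(mux, 0);
i2c_slave_create_simple(chan0, "tmp105", 0x48);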




Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-06 Thread Philippe Mathieu-Daudé

On 6/12/23 21:48, Philippe Mathieu-Daudé wrote:

On 6/12/23 20:19, Steven Sistare wrote:

On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote:

Hi Steve,

On 6/12/23 18:23, Steve Sistare wrote:
Currently, a vm in the suspended state is not completely stopped.  
The VCPUs
have been paused, but the cpu clock still runs, and runstate 
notifiers for
the transition to stopped have not been called.  This causes 
problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is 
suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was 
suspended.

Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

  (qemu) info status
  VM status: paused (suspended)

  (qemu) stop
  (qemu) info status
  VM status: paused

  (qemu) system_wakeup
  Error: Unable to wake up: guest is not in suspended state

  (qemu) cont
  (qemu) info status
  VM status: paused (suspended)

  (qemu) system_wakeup
  (qemu) info status
  VM status: running

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
   include/sysemu/runstate.h |  5 +
   qapi/misc.json    | 10 --
   system/cpus.c | 23 +++
   system/runstate.c |  3 +++
   4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 88a67e2..867e53c 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool 
shutdown_caused_by_guest(ShutdownCause cause)

   return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
   }
   +static inline bool runstate_is_live(RunState state)
+{
+    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}


Not being familiar with (live) migration, from a generic vCPU PoV
I don't get what "runstate_is_live" means. Can we add a comment
explaining what this helper is for?


Sure.  "live" means the cpu clock is ticking, and the runstate 
notifiers think
we are running.  It is everything that is enabled in vm_start except 
for starting

the vcpus.


What about runstate_is_vcpu_clock_ticking() then?


Since this is a migration particular case, maybe we can be verbose
in vm_resume() and keep runstate_is_live() -- eventually undocumented
-- in migration/migration.c.


runstate_is_live is about cpu and vm state, not migration state (the 
"live" is not
live migration), and is used in multiple places in cpus code and 
elsewhere, so I would
like to keep it in runstate.h.  It has a specific meaning, and it is 
useful to search
for it to see who handles "liveness", and distinguish it from code 
that checks the

running and suspended states for other reasons.


OK then, no objection, but please add a comment describing.

Thanks,

Phil.
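
For reference, the helper with such a comment added might read as follows (the
comment wording below is only a suggestion based on Steve's explanation in this
thread, not text from the series):

/*
 * "Live" refers to the vm clock and the runstate notifiers, not to live
 * migration: in both RUNNING and SUSPENDED the cpu clock is ticking and
 * the runstate notifiers consider the machine running; only the vcpus
 * themselves are paused in SUSPENDED.
 */
static inline bool runstate_is_live(RunState state)
{
    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
}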





Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-06 Thread Philippe Mathieu-Daudé

On 6/12/23 20:19, Steven Sistare wrote:

On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote:

Hi Steve,

On 6/12/23 18:23, Steve Sistare wrote:

Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

  (qemu) info status
  VM status: paused (suspended)

  (qemu) stop
  (qemu) info status
  VM status: paused

  (qemu) system_wakeup
  Error: Unable to wake up: guest is not in suspended state

  (qemu) cont
  (qemu) info status
  VM status: paused (suspended)

  (qemu) system_wakeup
  (qemu) info status
  VM status: running

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
   include/sysemu/runstate.h |  5 +
   qapi/misc.json    | 10 --
   system/cpus.c | 23 +++
   system/runstate.c |  3 +++
   4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 88a67e2..867e53c 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
cause)
   return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
   }
   +static inline bool runstate_is_live(RunState state)
+{
+    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}


Not being familiar with (live) migration, from a generic vCPU PoV
I don't get what "runstate_is_live" means. Can we add a comment
explaining what this helper is for?


Sure.  "live" means the cpu clock is ticking, and the runstate notifiers think
we are running.  It is everything that is enabled in vm_start except for 
starting
the vcpus.


Since this is a migration particular case, maybe we can be verbose
in vm_resume() and keep runstate_is_live() -- eventually undocumented
-- in migration/migration.c.


runstate_is_live is about cpu and vm state, not migration state (the "live" is 
not
live migration), and is used in multiple places in cpus code and elsewhere, so 
I would
like to keep it in runstate.h.  It has a specific meaning, and it is useful to 
search
for it to see who handles "liveness", and distinguish it from code that checks 
the
running and suspended states for other reasons.


OK then, no objection, but please add a comment describing.

Thanks,

Phil.



Re: [PATCH v3 for-8.2] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

2023-12-06 Thread Paolo Bonzini

On 12/6/23 16:58, Michael Roth wrote:

Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
exposed a long-running bug in current KVM support for SEV-ES where the
kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
kernel, in which case EFER write traps would result in KVM eventually
seeing MSR_EFER_LMA get set and recording it in such a way that it would
be subsequently visible when accessing it via KVM_GET_SREGS/etc.

However, guest kernels currently rely on MSR_EFER_LMA getting set
automatically when MSR_EFER_LME is set and paging is enabled via
CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
MSR_EFER_LMA bit, even though it is set internally, and when QEMU
subsequently tries to pass this EFER value back to KVM via
KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
which is now considered fatal due to the aforementioned QEMU commit.

This can be addressed by inferring the MSR_EFER_LMA bit being set when
paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
the expected bits are all present in subsequent handling on the host
side.

Ultimately, this handling will be implemented in the host kernel, but to
avoid breaking QEMU's SEV-ES support when using older host kernels, the
same handling can be done in QEMU just after fetching the register
values via KVM_GET_SREGS*. Implement that here.

Cc: Paolo Bonzini 
Cc: Marcelo Tosatti 
Cc: Tom Lendacky 
Cc: Akihiko Odaki 
Cc: Philippe Mathieu-Daudé 
Cc: Lara Lazier 
Cc: Vitaly Kuznetsov 
Cc: Maxim Levitsky 
Cc: k...@vger.kernel.org
Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
Signed-off-by: Michael Roth 
---
  target/i386/kvm/kvm.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 11b8177eff..4ce80555b4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3643,6 +3643,10 @@ static int kvm_get_sregs(X86CPU *cpu)
  env->cr[4] = sregs.cr4;
  
  env->efer = sregs.efer;

+if (sev_es_enabled() && env->efer & MSR_EFER_LME &&
+env->cr[0] & CR0_PG_MASK) {
+env->efer |= MSR_EFER_LMA;
+}
  
  /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */

  x86_update_hflags(env);
@@ -3682,6 +3686,10 @@ static int kvm_get_sregs2(X86CPU *cpu)
  env->cr[4] = sregs.cr4;
  
  env->efer = sregs.efer;

+if (sev_es_enabled() && env->efer & MSR_EFER_LME &&
+env->cr[0] & CR0_PG_MASK) {
+env->efer |= MSR_EFER_LMA;
+}
  
  env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
  


Acked-by: Paolo Bonzini 

Thanks.  We can change it for 9.0 (especially adding a comment, because 
long term having it only in the commit message isn't great) but for now 
it's safe.


Paolo
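
For reference, the check with an explanatory comment, as suggested for 9.0,
could look roughly like this (the comment wording is illustrative, based on the
commit message above):

/*
 * SEV-ES guests rely on MSR_EFER_LMA being set implicitly by hardware
 * when MSR_EFER_LME is set and paging is enabled, so the EFER write
 * traps never expose LMA to KVM.  Synthesize it here so that the
 * KVM_SET_SREGS* sanity checks do not fail with -EINVAL.
 */
if (sev_es_enabled() && env->efer & MSR_EFER_LME &&
    env->cr[0] & CR0_PG_MASK) {
    env->efer |= MSR_EFER_LMA;
}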




Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-06 Thread Steven Sistare
On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote:
> Hi Steve,
> 
> On 6/12/23 18:23, Steve Sistare wrote:
>> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
>> have been paused, but the cpu clock still runs, and runstate notifiers for
>> the transition to stopped have not been called.  This causes problems for
>> live migration.  Stale cpu timers_state is saved to the migration stream,
>> causing time errors in the guest when it wakes from suspend, and state that
>> would have been modified by runstate notifiers is wrong.
>>
>> Modify vm_stop to completely stop the vm if the current state is suspended,
>> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
>> Modify vm_start to restore the suspended state.
>>
>> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
>> cont commands.  For example:
>>
>>  (qemu) info status
>>  VM status: paused (suspended)
>>
>>  (qemu) stop
>>  (qemu) info status
>>  VM status: paused
>>
>>  (qemu) system_wakeup
>>  Error: Unable to wake up: guest is not in suspended state
>>
>>  (qemu) cont
>>  (qemu) info status
>>  VM status: paused (suspended)
>>
>>  (qemu) system_wakeup
>>  (qemu) info status
>>  VM status: running
>>
>> Suggested-by: Peter Xu 
>> Signed-off-by: Steve Sistare 
>> Reviewed-by: Peter Xu 
>> ---
>>   include/sysemu/runstate.h |  5 +
>>   qapi/misc.json    | 10 --
>>   system/cpus.c | 23 +++
>>   system/runstate.c |  3 +++
>>   4 files changed, 31 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>> index 88a67e2..867e53c 100644
>> --- a/include/sysemu/runstate.h
>> +++ b/include/sysemu/runstate.h
>> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
>> cause)
>>   return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>   }
>>   +static inline bool runstate_is_live(RunState state)
>> +{
>> +    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>> +}
> 
> Not being familiar with (live) migration, from a generic vCPU PoV
> I don't get what "runstate_is_live" means. Can we add a comment
> explaining what this helper is for?

Sure.  "live" means the cpu clock is ticking, and the runstate notifiers think
we are running.  It is everything that is enabled in vm_start except for 
starting
the vcpus.

> Since this is a migration particular case, maybe we can be verbose
> in vm_resume() and keep runstate_is_live() -- eventually undocumented
> -- in migration/migration.c.

runstate_is_live is about cpu and vm state, not migration state (the "live" is 
not 
live migration), and is used in multiple places in cpus code and elsewhere, so 
I would 
like to keep it in runstate.h.  It has a specific meaning, and it is useful to 
search 
for it to see who handles "liveness", and distinguish it from code that checks 
the
running and suspended states for other reasons.

- Steve

>  void vm_resume(RunState state)
>  {
>  switch (state) {
>  case RUN_STATE_RUNNING:
>  case RUN_STATE_SUSPENDED:
>  vm_start();
>  break;
>  default:
>  runstate_set(state);
>  }
>  }




Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching

2023-12-06 Thread Peter Zijlstra
On Wed, Dec 06, 2023 at 10:37:33AM -0600, Madhavan T. Venkataraman wrote:
> 
> 
> On 11/30/23 05:33, Peter Zijlstra wrote:
> > On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote:
> > 
> >> Kernel Lockdown
> >> ---
> >>
> >> But, we must provide at least some security in V2. Otherwise, it is 
> >> useless.
> >>
> >> So, we have implemented what we call a kernel lockdown. At the end of 
> >> kernel
> >> boot, Heki establishes permissions in the extended page table as mentioned
> >> before. Also, it adds an immutable attribute for kernel text and kernel RO 
> >> data.
> >> Beyond that point, guest requests that attempt to modify permissions on 
> >> any of
> >> the immutable pages will be denied.
> >>
> >> This means that features like FTrace and KProbes will not work on kernel 
> >> text
> >> in V2. This is a temporary limitation. Once authentication is in place, the
> >> limitation will go away.
> > 
> > So either you're saying your patch 17 / text_poke is broken (so why
> > include it ?!?) or your statement above is incorrect. Pick one.
> > 
> 
> It has been included so that people can be aware of the changes.
> 
> I will remove the text_poke() changes from the patchset and send it later when
> I have some authentication in place. It will make sense then.

If you know its broken then fucking say so in the Changelog instead of
wasting everybody's time.. OMG.



Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-06 Thread Philippe Mathieu-Daudé

Hi Steve,

On 6/12/23 18:23, Steve Sistare wrote:

Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

 (qemu) info status
 VM status: paused (suspended)

 (qemu) stop
 (qemu) info status
 VM status: paused

 (qemu) system_wakeup
 Error: Unable to wake up: guest is not in suspended state

 (qemu) cont
 (qemu) info status
 VM status: paused (suspended)

 (qemu) system_wakeup
 (qemu) info status
 VM status: running

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
  include/sysemu/runstate.h |  5 +
  qapi/misc.json| 10 --
  system/cpus.c | 23 +++
  system/runstate.c |  3 +++
  4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 88a67e2..867e53c 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
cause)
  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
  }
  
+static inline bool runstate_is_live(RunState state)
+{
+return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}


Not being familiar with (live) migration, from a generic vCPU PoV
I don't get what "runstate_is_live" means. Can we add a comment
explaining what this helper is for?

Since this is a migration particular case, maybe we can be verbose
in vm_resume() and keep runstate_is_live() -- eventually undocumented
-- in migration/migration.c.

 void vm_resume(RunState state)
 {
 switch (state) {
 case RUN_STATE_RUNNING:
 case RUN_STATE_SUSPENDED:
 vm_start();
 break;
 default:
 runstate_set(state);
 }
 }

Regards,

Phil.



Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

2023-12-06 Thread Michael Roth
On Wed, Dec 06, 2023 at 07:20:14PM +0200, Maxim Levitsky wrote:
> On Tue, 2023-12-05 at 16:28 -0600, Michael Roth wrote:
> > Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
> > exposed a long-running bug in current KVM support for SEV-ES where the
> > kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
> > kernel, in which case EFER write traps would result in KVM eventually
> > seeing MSR_EFER_LMA get set and recording it in such a way that it would
> > be subsequently visible when accessing it via KVM_GET_SREGS/etc.
> > 
> > However, guests kernels currently rely on MSR_EFER_LMA getting set
> > automatically when MSR_EFER_LME is set and paging is enabled via
> > CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
> > MSR_EFER_LMA even though it is set internally, and when QEMU
> > subsequently tries to pass this EFER value back to KVM via
> > KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
> > which is now considered fatal due to the aforementioned QEMU commit.
> > 
> > This can be addressed by inferring the MSR_EFER_LMA bit being set when
> > paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
> > the expected bits are all present in subsequent handling on the host
> > side.
> > 
> > Ultimately, this handling will be implemented in the host kernel, but to
> > avoid breaking QEMU's SEV-ES support when using older host kernels, the
> > same handling can be done in QEMU just after fetching the register
> > values via KVM_GET_SREGS*. Implement that here.
> > 
> > Cc: Paolo Bonzini 
> > Cc: Marcelo Tosatti 
> > Cc: Tom Lendacky 
> > Cc: Akihiko Odaki 
> > Cc: k...@vger.kernel.org
> > Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > Signed-off-by: Michael Roth 
> > ---
> > v2:
> >   - Add handling for KVM_GET_SREGS, not just KVM_GET_SREGS2
> > 
> >  target/i386/kvm/kvm.c | 14 ++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 11b8177eff..8721c1bf8f 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -3610,6 +3610,7 @@ static int kvm_get_sregs(X86CPU *cpu)
> >  {
> >  CPUX86State *env = &cpu->env;
> >  struct kvm_sregs sregs;
> > +target_ulong cr0_old;
> >  int ret;
> >  
> >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs);
> > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu)
> >  env->gdt.limit = sregs.gdt.limit;
> >  env->gdt.base = sregs.gdt.base;
> >  
> > +cr0_old = env->cr[0];
> >  env->cr[0] = sregs.cr0;
> >  env->cr[2] = sregs.cr2;
> >  env->cr[3] = sregs.cr3;
> >  env->cr[4] = sregs.cr4;
> >  
> >  env->efer = sregs.efer;
> > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > +env->efer |= MSR_EFER_LMA;
> > +}
> > +}
> 
> I think that we should not check that CR0_PG has changed, and just blindly 
> assume
> that if EFER.LME is set and CR0.PG is set, then EFER.LMA must be set as 
> defined in x86 spec.
> 
> Otherwise, suppose qemu calls kvm_get_sregs twice: First time it will work,
> but second time CR0.PG will match one that is stored in the env, and thus the 
> workaround
> will not be executed, and instead we will revert back to wrong EFER value 
> reported by the kernel.
> 
> How about something like that:
> 
> 
> if (sev_es_enabled() && env->efer & MSR_EFER_LME && env->cr[0] & CR0_PG_MASK) 
> {
>   /* 
>  * Workaround KVM bug, because of which KVM might not be aware of the 
>  * fact that EFER.LMA was toggled by the hardware 
>  */
>   env->efer |= MSR_EFER_LMA;
> }

Hi Maxim,

I'd already sent a v3 based on a similar suggestion from Paolo:

  https://lists.gnu.org/archive/html/qemu-devel/2023-12/msg00751.html

Does that one look okay to you?

Thanks,

Mike

> 
> 
> Best regards,
>   Maxim Levitsky
> 
> >  
> >  /* changes to apic base and cr8/tpr are read back via 
> > kvm_arch_post_run */
> >  x86_update_hflags(env);
> > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu)
> >  {
> >  CPUX86State *env = &cpu->env;
> >  struct kvm_sregs2 sregs;
> > +target_ulong cr0_old;
> >  int i, ret;
> >  
> >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
> > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu)
> >  env->gdt.limit = sregs.gdt.limit;
> >  env->gdt.base = sregs.gdt.base;
> >  
> > +cr0_old = env->cr[0];
> >  env->cr[0] = sregs.cr0;
> >  env->cr[2] = sregs.cr2;
> >  env->cr[3] = sregs.cr3;
> >  env->cr[4] = sregs.cr4;
> >  
> >  env->efer = sregs.efer;
> > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > +

Re: [PATCH v3] crypto: Introduce SM4 symmetric cipher algorithm

2023-12-06 Thread Daniel P . Berrangé
On Wed, Nov 29, 2023 at 11:17:49PM +0800, Hyman Huang wrote:
> Introduce the SM4 cipher algorithms (OSCCA GB/T 32907-2016).
> 
> SM4 (GBT.32907-2016) is a cryptographic standard issued by the
> Organization of State Commercial Administration of China (OSCCA)
> as an authorized cryptographic algorithms for the use within China.
> 
> Use the crypto-sm4 meson build option to explicitly control the
> feature, which would be detected by default.
> 
> Signed-off-by: Hyman Huang 
> ---
>  crypto/block-luks.c | 11 
>  crypto/cipher-gcrypt.c.inc  |  8 ++
>  crypto/cipher-nettle.c.inc  | 49 +
>  crypto/cipher.c |  6 
>  meson.build | 42 
>  meson_options.txt   |  2 ++
>  qapi/crypto.json|  5 +++-
>  scripts/meson-buildoptions.sh   |  3 ++
>  tests/unit/test-crypto-cipher.c | 13 +
>  9 files changed, 138 insertions(+), 1 deletion(-)
> 

> diff --git a/meson.build b/meson.build
> index ec01f8b138..765f9c9f50 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1480,6 +1480,7 @@ endif
>  gcrypt = not_found
>  nettle = not_found
>  hogweed = not_found
> +crypto_sm4 = not_found
>  xts = 'none'
>  
>  if get_option('nettle').enabled() and get_option('gcrypt').enabled()
> @@ -1505,6 +1506,28 @@ if not gnutls_crypto.found()
>   cc.find_library('gpg-error', required: true)],
>  version: gcrypt.version())
>  endif
> +crypto_sm4 = gcrypt
> +# SM4 ALG is available in libgcrypt >= 1.9
> +if gcrypt.found() and not cc.links('''
> +  #include 
> +  int main(void) {
> +gcry_cipher_hd_t handler;
> +gcry_cipher_open(&handler, GCRY_CIPHER_SM4, GCRY_CIPHER_MODE_ECB, 0);
> +return 0;
> +  }''', dependencies: gcrypt)
> +  crypto_sm4 = not_found
> +  if get_option('crypto_sm4').enabled()
> +error('could not link sm4')
> +  else
> +warning('could not link sm4, disabling')
> +  endif

IMHO we don't need to have an option for 'crypto_sm4', just
silently disable it if not present in the host provided
library.

> +endif
> +if crypto_sm4.found() and get_option('prefer_static')
> +  crypto_sm4 = declare_dependency(dependencies: [
> +gcrypt,
> +cc.find_library('gpg-error', required: true)],
> +version: gcrypt.version())
> +endif

This last if/endif block is redundant. We already have earlier
logic that detects gpg-error, and we never use the 'crypto_sm4'
object after this point anyway

>endif
>if (not get_option('nettle').auto() or have_system) and not gcrypt.found()
>  nettle = dependency('nettle', version: '>=3.4',
> @@ -1513,6 +1536,23 @@ if not gnutls_crypto.found()
>  if nettle.found() and not cc.has_header('nettle/xts.h', dependencies: 
> nettle)
>xts = 'private'
>  endif
> +crypto_sm4 = nettle
> +# SM4 ALG is available in nettle >= 3.9
> +if nettle.found() and not cc.links('''
> +  #include 
> +  int main(void) {
> +struct sm4_ctx ctx;
> +unsigned char key[16] = {0};
> +sm4_set_encrypt_key(&ctx, key);
> +return 0;
> +  }''', dependencies: nettle)
> +  crypto_sm4 = not_found
> +  if get_option('crypto_sm4').enabled()
> +error('could not link sm4')
> +  else
> +warning('could not link sm4, disabling')
> +  endif

Likewise no need for an option, just silently disable it.

> +endif
>endif
>  endif
>  
> @@ -2199,6 +2239,7 @@ config_host_data.set('CONFIG_GNUTLS_CRYPTO', 
> gnutls_crypto.found())
>  config_host_data.set('CONFIG_TASN1', tasn1.found())
>  config_host_data.set('CONFIG_GCRYPT', gcrypt.found())
>  config_host_data.set('CONFIG_NETTLE', nettle.found())
> +config_host_data.set('CONFIG_CRYPTO_SM4', crypto_sm4.found())
>  config_host_data.set('CONFIG_HOGWEED', hogweed.found())
>  config_host_data.set('CONFIG_QEMU_PRIVATE_XTS', xts == 'private')
>  config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim)
> @@ -4273,6 +4314,7 @@ summary_info += {'nettle':nettle}
>  if nettle.found()
> summary_info += {'  XTS': xts != 'private'}
>  endif
> +summary_info += {'SM4 ALG support':   crypto_sm4}
>  summary_info += {'AF_ALG support':have_afalg}
>  summary_info += {'rng-none':  get_option('rng_none')}
>  summary_info += {'Linux keyring': have_keyring}
> diff --git a/meson_options.txt b/meson_options.txt
> index c9baeda639..db8de4ec5b 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -172,6 +172,8 @@ option('nettle', type : 'feature', value : 'auto',
> description: 'nettle cryptography support')
>  option('gcrypt', type : 'feature', value : 'auto',
> description: 'libgcrypt cryptography support')
> +option('crypto_sm4', type : 'feature', value : 'auto',
> +   description: 'SM4 symmetric cipher algorithm support')

Drop this.

>  option('crypto_afalg', type : 

Re: [PATCH V7 00/12] fix migration of suspended runstate

2023-12-06 Thread Steven Sistare
FYI, these patches still need RB:
  migration: propagate suspended runstate
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

This has RB, but the interaction between vm_start and vm_prepare_start
changed, so needs another look.
cpus: stop vm in suspended runstate

- Steve

On 12/6/2023 12:23 PM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>   cpus: refactor vm_stop
>   cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>   cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
> vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
> migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Changes in V7:
>   * rebase to tip, add RB-s
>   * fix backwards compatibility for global_state.vm_was_suspended
>   * delete vm_prepare_start state argument, and rename patch
> "pass runstate to vm_prepare_start" to
> "check running not RUN_STATE_RUNNING"
>   * drop patches:
>   tests/qtest: bootfile per vm
>   tests/qtest: background migration with suspend
>   * rename runstate_is_started to runstate_is_live
>   * move wait_for_suspend in tests
> 
> Steve Sistare (12):
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: check running not RUN_STATE_RUNNING
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
> 
>  backends/tpm/tpm_emulator.c  |   2 +-
>  hw/usb/hcd-ehci.c|   2 +-
>  hw/usb/redirect.c|   2 +-
>  hw/xen/xen-hvm-common.c  |   2 +-
>  include/migration/snapshot.h |   7 ++
>  include/sysemu/runstate.h|  16 
>  migration/global_state.c |  35 ++-
>  migration/migration-hmp-cmds.c   |   8 +-
>  migration/migration.c|  15 +--
>  migration/savevm.c   |  23 +++--
>  qapi/misc.json   |  10 +-
>  system/cpus.c|  47 +++--
>  system/runstate.c|   9 ++
>  system/vl.c  |   2 +
>  tests/migration/i386/Makefile|   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +-
>  tests/migration/i386/a-b-bootblock.h |  26 +++--
>  tests/qtest/migration-helpers.c  |  27 ++
>  tests/qtest/migration-helpers.h  |  11 ++-
>  tests/qtest/migration-test.c | 181 
> +--
>  20 files changed, 354 insertions(+), 126 deletions(-)
> 



[PATCH V7 08/12] migration: preserve suspended for bg_migration

2023-12-06 Thread Steve Sistare
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
---
 migration/migration.c | 7 +--
 system/runstate.c | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8124811..2cc7e2a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3390,7 +3390,7 @@ static void bg_migration_vm_start_bh(void *opaque)
 qemu_bh_delete(s->vm_start_bh);
 s->vm_start_bh = NULL;
 
-vm_start();
+vm_resume(s->vm_old_state);
 migration_downtime_end(s);
 }
 
@@ -3462,11 +3462,6 @@ static void *bg_migration_thread(void *opaque)
 
 qemu_mutex_lock_iothread();
 
-/*
- * If VM is currently in suspended state, then, to make a valid runstate
- * transition in vm_stop_force_state() we need to wakeup it up.
- */
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 s->vm_old_state = runstate_get();
 
 global_state_store();
diff --git a/system/runstate.c b/system/runstate.c
index ca9eb54..621a023 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
 { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM },
 { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM },
+{ RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN },
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1




[PATCH V7 05/12] migration: propagate suspended runstate

2023-12-06 Thread Steve Sistare
If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, reclaim
some space from the runstate member.

Signed-off-by: Steve Sistare 
---
 migration/global_state.c | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8..d4f61a1 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -22,7 +22,16 @@
 
 typedef struct {
 uint32_t size;
-uint8_t runstate[100];
+
+/*
+ * runstate was 100 bytes, zero padded, but we trimmed it to add a
+ * few fields and maintain backwards compatibility.
+ */
+uint8_t runstate[32];
+uint8_t has_vm_was_suspended;
+uint8_t vm_was_suspended;
+uint8_t unused[66];
+
 RunState state;
 bool received;
 } GlobalState;
@@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
 assert(strlen(state_str) < sizeof(global_state.runstate));
 strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
   state_str, '\0');
+global_state.has_vm_was_suspended = true;
+global_state.vm_was_suspended = vm_get_suspended();
+
+memset(global_state.unused, 0, sizeof(global_state.unused));
 }
 
 void global_state_store(void)
@@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
 return true;
 }
 
+/* If the suspended state must be remembered, it is needed */
+
+if (vm_get_suspended()) {
+return true;
+}
+
 /* If state is running or paused, it is not needed */
 
 if (strcmp(runstate, "running") == 0 ||
@@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int 
version_id)
 Error *local_err = NULL;
 int r;
 char *runstate = (char *)s->runstate;
+bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended;
 
 s->received = true;
 trace_migrate_global_state_post_load(runstate);
@@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int 
version_id)
 sizeof(s->runstate)) == sizeof(s->runstate)) {
 /*
  * This condition should never happen during migration, because
- * all runstate names are shorter than 100 bytes (the size of
+ * all runstate names are shorter than 32 bytes (the size of
  * s->runstate). However, a malicious stream could overflow
  * the qapi_enum_parse() call, so we force the last character
  * to a NUL byte.
@@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int 
version_id)
 }
 s->state = r;
 
+/*
+ * global_state is saved on the outgoing side before forcing a stopped
+ * state, so it may have saved state=suspended and vm_was_suspended=0.
+ * Now we are in a paused state, and when we later call vm_start, it must
+ * restore the suspended state, so we must set vm_was_suspended=1 here.
+ */
+vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);
+
 return 0;
 }
 
@@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = {
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(size, GlobalState),
 VMSTATE_BUFFER(runstate, GlobalState),
+VMSTATE_UINT8(has_vm_was_suspended, GlobalState),
+VMSTATE_UINT8(vm_was_suspended, GlobalState),
+VMSTATE_BUFFER(unused, GlobalState),
 VMSTATE_END_OF_LIST()
 },
 };
-- 
1.8.3.1




[PATCH V7 06/12] migration: preserve suspended runstate

2023-12-06 Thread Steve Sistare
A guest that is migrated in the suspended state automatically wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 migration/migration.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3ce04b2..8124811 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -604,7 +604,7 @@ static void process_incoming_migration_bh(void *opaque)
  */
 if (!migrate_late_block_activate() ||
  (autostart && (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING))) {
+runstate_is_live(global_state_get_runstate())))) {
 /* Make sure all file formats throw away their mutable metadata.
  * If we get an error here, just don't restart the VM yet. */
  bdrv_activate_all(&local_err);
@@ -628,7 +628,7 @@ static void process_incoming_migration_bh(void *opaque)
 dirty_bitmap_mig_before_vm_start();
 
 if (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING) {
+runstate_is_live(global_state_get_runstate())) {
 if (autostart) {
 vm_start();
 } else {
@@ -2416,7 +2416,6 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
 
 migration_downtime_start(ms);
 
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 global_state_store();
 ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
 if (ret < 0) {
@@ -2615,7 +2614,6 @@ static int migration_completion_precopy(MigrationState *s,
 
 qemu_mutex_lock_iothread();
 migration_downtime_start(s);
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
 s->vm_old_state = runstate_get();
 global_state_store();
@@ -3136,7 +3134,7 @@ static void migration_iteration_finish(MigrationState *s)
 case MIGRATION_STATUS_FAILED:
 case MIGRATION_STATUS_CANCELLED:
 case MIGRATION_STATUS_CANCELLING:
-if (s->vm_old_state == RUN_STATE_RUNNING) {
+if (runstate_is_live(s->vm_old_state)) {
 if (!runstate_check(RUN_STATE_SHUTDOWN)) {
 vm_start();
 }
-- 
1.8.3.1




[PATCH V7 01/12] cpus: vm_was_suspended

2023-12-06 Thread Steve Sistare
Add a state variable to remember if a vm previously transitioned into a
suspended state.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h |  2 ++
 system/cpus.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index c8c2bd8..88a67e2 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -51,6 +51,8 @@ int vm_prepare_start(bool step_pending);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
+void vm_set_suspended(bool suspended);
+bool vm_get_suspended(void);
 
 typedef enum WakeupReason {
 /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
diff --git a/system/cpus.c b/system/cpus.c
index a444a74..9f631ab 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -259,6 +259,21 @@ void cpu_interrupt(CPUState *cpu, int mask)
 }
 }
 
+/*
+ * True if the vm was previously suspended, and has not been woken or reset.
+ */
+static int vm_was_suspended;
+
+void vm_set_suspended(bool suspended)
+{
+vm_was_suspended = suspended;
+}
+
+bool vm_get_suspended(void)
+{
+return vm_was_suspended;
+}
+
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
-- 
1.8.3.1




[PATCH V7 11/12] tests/qtest: precopy migration with suspend

2023-12-06 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly
during live migration precopy.  The test suspends the src, migrates, then
wakes the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-helpers.c |  3 ++
 tests/qtest/migration-helpers.h |  2 ++
 tests/qtest/migration-test.c| 62 +++--
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index fd3b94e..37e8e81 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char 
*name,
 if (g_str_equal(name, "STOP")) {
 state->stop_seen = true;
 return true;
+} else if (g_str_equal(name, "SUSPEND")) {
+state->suspend_seen = true;
+return true;
 } else if (g_str_equal(name, "RESUME")) {
 state->resume_seen = true;
 return true;
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 3d32699..b478549 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -18,6 +18,8 @@
 typedef struct QTestMigrationState {
 bool stop_seen;
 bool resume_seen;
+bool suspend_seen;
+bool suspend_me;
 } QTestMigrationState;
 
 bool migrate_watch_for_events(QTestState *who, const char *name,
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e10d5a4..f57a978 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -178,7 +178,7 @@ static void bootfile_delete(void)
 /*
  * Wait for some output in the serial output file,
  * we get an 'A' followed by an endless string of 'B's
- * but on the destination we won't have the A.
+ * but on the destination we won't have the A (unless we enabled 
suspend/resume)
  */
 static void wait_for_serial(const char *side)
 {
@@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, 
QTestMigrationState *state)
 }
 }
 
+static void wait_for_suspend(QTestState *who, QTestMigrationState *state)
+{
+if (state->suspend_me && !state->suspend_seen) {
+qtest_qmp_eventwait(who, "SUSPEND");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who)
 {
 uint64_t pass, prev_pass = 0, changes = 0;
 
-while (changes < 2 && !src_state.stop_seen) {
+while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
 changes += (pass != prev_pass);
@@ -584,6 +591,12 @@ static void migrate_wait_for_dirty_mem(QTestState *from,
 usleep(1000 * 10);
 } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
 
+
+/* If suspended, src only iterates once, and watch_byte may never change */
+if (src_state.suspend_me) {
+return;
+}
+
 /*
  * Now ensure that already transferred bytes are
  * dirty again from the guest workload. Note the
@@ -771,6 +784,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 dst_state = (QTestMigrationState) { };
 src_state = (QTestMigrationState) { };
 bootfile_create(tmpfs, args->suspend_me);
+src_state.suspend_me = args->suspend_me;
 
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
@@ -1717,6 +1731,7 @@ static void test_precopy_common(MigrateCommon *args)
 /* Wait for the first serial output from the source */
 if (args->result == MIG_TEST_SUCCEED) {
 wait_for_serial("src_serial");
+wait_for_suspend(from, &src_state);
 }
 
 if (args->live) {
@@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args)
 
  wait_for_resume(to, &dst_state);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 wait_for_serial("dest_serial");
 }
 
@@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void)
  test_precopy_common(&args);
 }
 
+static void test_precopy_unix_suspend_live(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+/*
+ * despite being live, the test is fast because the src
+ * suspends immediately.
+ */
+.live = true,
+.start.suspend_me = true,
+};
+
+test_precopy_common(&args);
+}
+
+static void test_precopy_unix_suspend_notlive(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+.start.suspend_me = true,
+};
+
+

[PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING

2023-12-06 Thread Steve Sistare
When a vm transitions from running to suspended, runstate notifiers are
not called, so the notifiers still think the vm is running.  Hence, when
we call vm_start to restore the suspended state, we call vm_state_notify
with running=1.  However, some notifiers check for RUN_STATE_RUNNING.
They must check the running boolean instead.

No functional change.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 backends/tpm/tpm_emulator.c | 2 +-
 hw/usb/hcd-ehci.c   | 2 +-
 hw/usb/redirect.c   | 2 +-
 hw/xen/xen-hvm-common.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index f7f1b4a..254fce7 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool 
running,
 
 trace_tpm_emulator_vm_state_change(running, state);
 
-if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) {
+if (!running || !tpm_emu->relock_storage) {
 return;
 }
 
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 19b4534..10c82ce 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool 
running, RunState state)
  * USB-devices which have async handled packages have a packet in the
  * ep queue to match the completion with.
  */
-if (state == RUN_STATE_RUNNING) {
+if (running) {
 ehci_advance_async_state(ehci);
 }
 
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index c9893df..3785bb0 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool 
running, RunState state)
 {
 USBRedirDevice *dev = priv;
 
-if (state == RUN_STATE_RUNNING && dev->parser != NULL) {
+if (running && dev->parser != NULL) {
 usbredirparser_do_write(dev->parser); /* Flush any pending writes */
 }
 }
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 565dc39..47e6cb1 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool 
running,
 
 xen_set_ioreq_server_state(xen_domid,
state->ioservid,
-   (rstate == RUN_STATE_RUNNING));
+   running);
 }
 
 void xen_exit_notifier(Notifier *n, void *data)
-- 
1.8.3.1




[PATCH V7 09/12] tests/qtest: migration events

2023-12-06 Thread Steve Sistare
Define a state object to capture events seen by migration tests, to allow
more events to be captured in a subsequent patch, and simplify event
checking in wait_for_migration_pass.  No functional change.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Daniel P. Berrangé 
---
 tests/qtest/migration-helpers.c | 24 ---
 tests/qtest/migration-helpers.h |  9 ++--
 tests/qtest/migration-test.c| 91 +++--
 3 files changed, 56 insertions(+), 68 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 24fb7b3..fd3b94e 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -24,26 +24,16 @@
  */
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque)
-{
-bool *seen = opaque;
-
-if (g_str_equal(name, "STOP")) {
-*seen = true;
-return true;
-}
-
-return false;
-}
-
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque)
 {
-bool *seen = opaque;
+QTestMigrationState *state = opaque;
 
-if (g_str_equal(name, "RESUME")) {
-*seen = true;
+if (g_str_equal(name, "STOP")) {
+state->stop_seen = true;
+return true;
+} else if (g_str_equal(name, "RESUME")) {
+state->resume_seen = true;
 return true;
 }
 
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index e31dc85..3d32699 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -15,9 +15,12 @@
 
 #include "libqtest.h"
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque);
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+typedef struct QTestMigrationState {
+bool stop_seen;
+bool resume_seen;
+} QTestMigrationState;
+
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque);
 
 G_GNUC_PRINTF(3, 4)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0fbaa6a..05c0740 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -43,8 +43,8 @@
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
-static bool got_src_stop;
-static bool got_dst_resume;
+static QTestMigrationState src_state;
+static QTestMigrationState dst_state;
 
 /*
  * An initial 3 MB offset is used as that corresponds
@@ -230,6 +230,20 @@ static void wait_for_serial(const char *side)
 } while (true);
 }
 
+static void wait_for_stop(QTestState *who, QTestMigrationState *state)
+{
+if (!state->stop_seen) {
+qtest_qmp_eventwait(who, "STOP");
+}
+}
+
+static void wait_for_resume(QTestState *who, QTestMigrationState *state)
+{
+if (!state->resume_seen) {
+qtest_qmp_eventwait(who, "RESUME");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who)
 qobject_unref(rsp_return);
 }
 
+/*
+ * Wait for two changes in the migration pass count, but bail if we stop.
+ */
 static void wait_for_migration_pass(QTestState *who)
 {
-uint64_t initial_pass = get_migration_pass(who);
-uint64_t pass;
+uint64_t pass, prev_pass = 0, changes = 0;
 
-/* Wait for the 1st sync */
-while (!got_src_stop && !initial_pass) {
-usleep(1000);
-initial_pass = get_migration_pass(who);
-}
-
-do {
+while (changes < 2 && !src_state.stop_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
-} while (pass == initial_pass && !got_src_stop);
+changes += (pass != prev_pass);
+prev_pass = pass;
+}
 }
 
 static void check_guests_ram(QTestState *who)
@@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, 
QTestState *to)
 {
 qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
 
-if (!got_src_stop) {
-qtest_qmp_eventwait(from, "STOP");
-}
-
+wait_for_stop(from, &src_state);
 qtest_qmp_eventwait(to, "RESUME");
 }
 
@@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 }
 }
 
-got_src_stop = false;
-got_dst_resume = false;
+dst_state = (QTestMigrationState) { };
+src_state = (QTestMigrationState) { };
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
 
@@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 if (!args->only_target) {
 *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
 

[PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-06 Thread Steve Sistare
Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

(qemu) info status
VM status: paused (suspended)

(qemu) stop
(qemu) info status
VM status: paused

(qemu) system_wakeup
Error: Unable to wake up: guest is not in suspended state

(qemu) cont
(qemu) info status
VM status: paused (suspended)

(qemu) system_wakeup
(qemu) info status
VM status: running

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h |  5 +
 qapi/misc.json| 10 --
 system/cpus.c | 23 +++
 system/runstate.c |  3 +++
 4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 88a67e2..867e53c 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
cause)
 return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
 }
 
+static inline bool runstate_is_live(RunState state)
+{
+return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}
+
 void vm_start(void);
 
 /**
diff --git a/qapi/misc.json b/qapi/misc.json
index cda2eff..efb8d44 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -134,7 +134,7 @@
 ##
 # @stop:
 #
-# Stop all guest VCPU execution.
+# Stop all guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -143,6 +143,9 @@
 # the guest remains paused once migration finishes, as if the -S
 # option was passed on the command line.
 #
+# In the "suspended" state, it will completely stop the VM and
+# cause a transition to the "paused" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "stop" }
@@ -153,7 +156,7 @@
 ##
 # @cont:
 #
-# Resume guest VCPU execution.
+# Resume guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -165,6 +168,9 @@
 # guest starts once migration finishes, removing the effect of the
 # -S command line option if it was passed.
 #
+# If the VM was previously suspended, and not been reset or woken,
+# this command will transition back to the "suspended" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "cont" }
diff --git a/system/cpus.c b/system/cpus.c
index 9f631ab..f162435 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -277,11 +277,15 @@ bool vm_get_suspended(void)
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
+RunState oldstate = runstate_get();
 
-if (runstate_is_running()) {
+if (runstate_is_live(oldstate)) {
+vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
 runstate_set(state);
 cpu_disable_ticks();
-pause_all_vcpus();
+if (oldstate == RUN_STATE_RUNNING) {
+pause_all_vcpus();
+}
 vm_state_notify(0, state);
 if (send_stop) {
 qapi_event_send_stop();
@@ -694,11 +698,13 @@ int vm_stop(RunState state)
 
 /**
  * Prepare for (re)starting the VM.
- * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
- * running or in case of an error condition), 0 otherwise.
+ * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
+ * and 1 otherwise.
  */
 int vm_prepare_start(bool step_pending)
 {
+int ret = vm_was_suspended ? 1 : 0;
+RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : 
RUN_STATE_RUNNING;
 RunState requested;
 
 qemu_vmstop_requested();
@@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending)
 qapi_event_send_resume();
 
 cpu_enable_ticks();
-runstate_set(RUN_STATE_RUNNING);
-vm_state_notify(1, RUN_STATE_RUNNING);
-return 0;
+runstate_set(state);
+vm_state_notify(1, state);
+vm_was_suspended = false;
+return ret;
 }
 
 void vm_start(void)
@@ -745,7 +752,7 @@ void vm_start(void)
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
 {
-if (runstate_is_running()) {
+if (runstate_is_live(runstate_get())) {
 return vm_stop(state);
 } else {
 int ret;
diff --git a/system/runstate.c b/system/runstate.c
index ea9d6c2..e2fa204 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { 

[PATCH V7 07/12] migration: preserve suspended for snapshot

2023-12-06 Thread Steve Sistare
Restoring a snapshot can break a suspended guest.  Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.

To save, the existing vm_stop call now completely stops the suspended
state.  Finish with vm_resume to leave the vm in the state it had prior
to the save, correctly restoring the suspended state.

To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load.  However, if the snapshot is suspended, restoration is
trickier.  First, call vm_resume to restore the state to suspended so the
current state matches the saved state.  Then, if the pre-load state is
running, call wakeup to resume running.

Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
suspended, but now it does, so allow these transitions.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/migration/snapshot.h   |  7 +++
 migration/migration-hmp-cmds.c |  8 +---
 migration/savevm.c | 23 +--
 system/runstate.c  |  5 +
 system/vl.c|  2 ++
 5 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index e72083b..9e4dcaa 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -16,6 +16,7 @@
 #define QEMU_MIGRATION_SNAPSHOT_H
 
 #include "qapi/qapi-builtin-types.h"
+#include "qapi/qapi-types-run-state.h"
 
 /**
  * save_snapshot: Save an internal snapshot.
@@ -61,4 +62,10 @@ bool delete_snapshot(const char *name,
 bool has_devices, strList *devices,
 Error **errp);
 
+/**
+ * load_snapshot_resume: Restore runstate after loading snapshot.
+ * @state: state to restore
+ */
+void load_snapshot_resume(RunState state);
+
 #endif
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 86ae832..c8d70bc 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
QDict *qdict)
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-int saved_vm_running  = runstate_is_running();
+RunState saved_state = runstate_get();
+
 const char *name = qdict_get_str(qdict, "name");
 Error *err = NULL;
 
 vm_stop(RUN_STATE_RESTORE_VM);
 
-if (load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) {
-vm_start();
+if (load_snapshot(name, NULL, false, NULL, &err)) {
+load_snapshot_resume(saved_state);
 }
+
 hmp_handle_error(mon, err);
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index eec5503..26866f0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
  QEMUSnapshotInfo sn1, *sn = &sn1;
 int ret = -1, ret2;
 QEMUFile *f;
-int saved_vm_running;
+RunState saved_state = runstate_get();
 uint64_t vm_state_size;
 g_autoptr(GDateTime) now = g_date_time_new_now_local();
 AioContext *aio_context;
@@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 }
 aio_context = bdrv_get_aio_context(bs);
 
-saved_vm_running = runstate_is_running();
-
 global_state_store();
 vm_stop(RUN_STATE_SAVE_VM);
 
@@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 
 bdrv_drain_all_end();
 
-if (saved_vm_running) {
-vm_start();
-}
+vm_resume(saved_state);
 return ret == 0;
 }
 
@@ -3339,6 +3335,14 @@ err_drain:
 return false;
 }
 
+void load_snapshot_resume(RunState state)
+{
+vm_resume(state);
+if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) {
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort);
+}
+}
+
 bool delete_snapshot(const char *name, bool has_devices,
  strList *devices, Error **errp)
 {
@@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque)
 {
 Job *job = opaque;
 SnapshotJob *s = container_of(job, SnapshotJob, common);
-int orig_vm_running;
+RunState orig_state = runstate_get();
 
  job_progress_set_remaining(&s->common, 1);
 
-orig_vm_running = runstate_is_running();
 vm_stop(RUN_STATE_RESTORE_VM);
 
 s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
-if (s->ret && orig_vm_running) {
-vm_start();
+if (s->ret) {
+load_snapshot_resume(orig_state);
 }
 
  job_progress_update(&s->common, 1);
diff --git a/system/runstate.c b/system/runstate.c
index e2fa204..ca9eb54 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ 

[PATCH V7 12/12] tests/qtest: postcopy migration with suspend

2023-12-06 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly by
live migration postcopy.  The test suspends the src, migrates, then wakes
the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-test.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index f57a978..b7c094e 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1347,6 +1347,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
+wait_for_suspend(from, &src_state);
 
 g_autofree char *uri = migrate_get_socket_address(to, "socket-address");
 migrate_qmp(from, uri, "{}");
@@ -1364,6 +1365,11 @@ static void migrate_postcopy_complete(QTestState *from, 
QTestState *to,
 {
 wait_for_migration_complete(from);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 /* Make sure we get at least one "B" on destination */
 wait_for_serial("dest_serial");
 
@@ -1397,6 +1403,15 @@ static void test_postcopy(void)
  test_postcopy_common(&args);
 }
 
+static void test_postcopy_suspend(void)
+{
+MigrateCommon args = {
+.start.suspend_me = true,
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_compress(void)
 {
 MigrateCommon args = {
@@ -3412,7 +3427,10 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/recovery/double-failures",
test_postcopy_recovery_double_fail);
 #endif /* _WIN32 */
-
+if (is_x86) {
+qtest_add_func("/migration/postcopy/suspend",
+   test_postcopy_suspend);
+}
 }
 
 qtest_add_func("/migration/bad_dest", test_baddest);
-- 
1.8.3.1




[PATCH V7 04/12] cpus: vm_resume

2023-12-06 Thread Steve Sistare
Define the vm_resume helper, for use in subsequent patches.
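
For context, a minimal caller sketch (an illustration only, not code taken
from this series): save the runstate before stopping, do the work while the
vm is fully stopped, then hand the saved state back to vm_resume() so a
suspended guest is restored as suspended rather than unconditionally
restarted:

    RunState saved_state = runstate_get();

    vm_stop(RUN_STATE_SAVE_VM);
    /* ... operate on the fully stopped vm ... */
    vm_resume(saved_state);   /* starts the vm only if saved_state was live */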

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h | 9 +
 system/cpus.c | 9 +
 2 files changed, 18 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 867e53c..315046b 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -53,6 +53,15 @@ void vm_start(void);
  * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
  */
 int vm_prepare_start(bool step_pending);
+
+/**
+ * vm_resume: If @state is a live state, start the vm and set the state,
+ * else just set the state.
+ *
+ * @state: the state to restore
+ */
+void vm_resume(RunState state);
+
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/system/cpus.c b/system/cpus.c
index f162435..7d2c28b 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -748,6 +748,15 @@ void vm_start(void)
 }
 }
 
+void vm_resume(RunState state)
+{
+if (runstate_is_live(state)) {
+vm_start();
+} else {
+runstate_set(state);
+}
+}
+
 /* does a state transition even if the VM is already stopped,
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
-- 
1.8.3.1




[PATCH V7 10/12] tests/qtest: option to suspend during migration

2023-12-06 Thread Steve Sistare
Add an option to suspend the src in a-b-bootblock.S, which puts the guest
in S3 state after one round of writing to memory.  The option is enabled by
poking a 1 into the suspend_me word in the boot block prior to starting the
src vm.  Generate symbol offsets in a-b-bootblock.h so that the suspend_me
offset is known.  Generate the bootblock for each test, because suspend_me
may differ for each.
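
As an illustration of how the generated offsets can be consumed, a
hypothetical helper (not taken from this patch; SYM_suspend_me and SYM_start
are the nm-generated symbols, and the boot block is assembled at SYM_start,
i.e. 0x7c00) might poke the word like this:

    #include <stdbool.h>
    #include "a-b-bootblock.h"   /* x86_bootsect[], SYM_start, SYM_suspend_me */

    static void bootsect_set_suspend_me(unsigned char *sect, bool suspend_me)
    {
        /* the word's offset within the 512-byte sector is its address
         * relative to the load address of the boot block */
        sect[SYM_suspend_me - SYM_start] = suspend_me ? 1 : 0;
    }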

Signed-off-by: Steve Sistare 
Acked-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
---
 tests/migration/i386/Makefile|  5 ++--
 tests/migration/i386/a-b-bootblock.S | 50 +---
 tests/migration/i386/a-b-bootblock.h | 26 +--
 tests/qtest/migration-test.c | 12 ++---
 4 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
index 5c03241..37a72ae 100644
--- a/tests/migration/i386/Makefile
+++ b/tests/migration/i386/Makefile
@@ -4,9 +4,10 @@
 .PHONY: all clean
 all: a-b-bootblock.h
 
-a-b-bootblock.h: x86.bootsect
+a-b-bootblock.h: x86.bootsect x86.o
echo "$$__note" > header.tmp
xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
+   nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
mv header.tmp $@
 
 x86.bootsect: x86.boot
@@ -16,7 +17,7 @@ x86.boot: x86.o
$(CROSS_PREFIX)objcopy -O binary $< $@
 
 x86.o: a-b-bootblock.S
-   $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
+   $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
 
 clean:
@rm -rf *.boot *.o *.bootsect
diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 6bb..6f39eb6 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -9,6 +9,23 @@
 #
 # Author: dgilb...@redhat.com
 
+#include "migration-test.h"
+
+#define ACPI_ENABLE 0xf1
+#define ACPI_PORT_SMI_CMD   0xb2
+#define ACPI_PM_BASE0x600
+#define PM1A_CNT_OFFSET 4
+
+#define ACPI_SCI_ENABLE 0x0001
+#define ACPI_SLEEP_TYPE 0x0400
+#define ACPI_SLEEP_ENABLE   0x2000
+#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
+
+#define LOW_ADDRX86_TEST_MEM_START
+#define HIGH_ADDR   X86_TEST_MEM_END
+
+/* Save the suspended status at an address that is not written in the loop. */
+#define suspended   (X86_TEST_MEM_START + 4)
 
 .code16
 .org 0x7c00
@@ -35,8 +52,8 @@ start: # at 0x7c00 ?
 mov %eax,%ds
 
 # Start from 1MB
-.set TEST_MEM_START, (1024*1024)
-.set TEST_MEM_END, (100*1024*1024)
+.set TEST_MEM_START, X86_TEST_MEM_START
+.set TEST_MEM_END, X86_TEST_MEM_END
 
 mov $65,%ax
 mov $0x3f8,%dx
@@ -69,7 +86,30 @@ innerloop:
 mov $0x3f8,%dx
 outb %al,%dx
 
-jmp mainloop
+# should this test suspend?
+mov (suspend_me),%eax
+cmp $0,%eax
+je mainloop
+
+# are we waking after suspend?  do not suspend again.
+mov $suspended,%eax
+mov (%eax),%eax
+cmp $1,%eax
+je mainloop
+
+# enable acpi
+mov $ACPI_ENABLE,%al
+outb %al,$ACPI_PORT_SMI_CMD
+
+# suspend to ram
+mov $suspended,%eax
+movl $1,(%eax)
+mov $SLEEP,%ax
+mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
+outw %ax,%dx
+# not reached.  The wakeup causes reset and restart at 0x7c00, and we
+# do not save and restore registers as a real kernel would do.
+
 
 # GDT magic from old (GPLv2)  Grub startup.S
 .p2align2   /* force 4-byte alignment */
@@ -95,6 +135,10 @@ gdtdesc:
 .word   0x27/* limit */
 .long   gdt /* addr */
 
+/* test launcher can poke a 1 here to exercise suspend */
+suspend_me:
+.int  0
+
 /* I'm a bootable disk */
 .org 0x7dfe
 .byte 0x55
diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index 5b52391..c83f871 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,7 +4,7 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
@@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = {
   0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
   0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
   0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
-  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 

[PATCH V7 00/12] fix migration of suspended runstate

2023-12-06 Thread Steve Sistare
Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
after saving a snapshot in the suspended state and loading it, the vm_start
fails.  The runstate is RUNNING, but the guest is not.

See the commit messages for the details.

Changes in V2:
  * simplify "start on wakeup request"
  * fix postcopy, snapshot, and background migration
  * refactor fixes for each type of migration
  * explicitly handled suspended events and runstate in tests
  * add test for postcopy and background migration

Changes in V3:
  * rebase to tip
  * fix hang in new function migrate_wait_for_dirty_mem

Changes in V4:
  * rebase to tip
  * add patch for vm_prepare_start (thanks Peter)
  * add patch to preserve cpu ticks

Changes in V5:
  * rebase to tip
  * added patches to completely stop vm in suspended state:
  cpus: refactor vm_stop
  cpus: stop vm in suspended state
  * added patch to partially resume vm in suspended state:
  cpus: start vm in suspended state
  * modified "preserve suspended ..." patches to use the above.
  * deleted patch "preserve cpu ticks if suspended".  stop ticks in
vm_stop_force_state instead.
  * deleted patch "add runstate function".  defined new helper function
migrate_new_runstate in "preserve suspended runstate"
  * Added some RB's, but removed other RB's because the patches changed.

Changes in V6:
  * all vm_stop calls completely stop the suspended state
  * refactored and updated the "cpus" patches
  * simplified the "preserve suspended" patches
  * added patch "bootfile per vm"

Changes in V7:
  * rebase to tip, add RB-s
  * fix backwards compatibility for global_state.vm_was_suspended
  * delete vm_prepare_start state argument, and rename patch
"pass runstate to vm_prepare_start" to
"check running not RUN_STATE_RUNNING"
  * drop patches:
  tests/qtest: bootfile per vm
  tests/qtest: background migration with suspend
  * rename runstate_is_started to runstate_is_live
  * move wait_for_suspend in tests

Steve Sistare (12):
  cpus: vm_was_suspended
  cpus: stop vm in suspended runstate
  cpus: check running not RUN_STATE_RUNNING
  cpus: vm_resume
  migration: propagate suspended runstate
  migration: preserve suspended runstate
  migration: preserve suspended for snapshot
  migration: preserve suspended for bg_migration
  tests/qtest: migration events
  tests/qtest: option to suspend during migration
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

 backends/tpm/tpm_emulator.c  |   2 +-
 hw/usb/hcd-ehci.c|   2 +-
 hw/usb/redirect.c|   2 +-
 hw/xen/xen-hvm-common.c  |   2 +-
 include/migration/snapshot.h |   7 ++
 include/sysemu/runstate.h|  16 
 migration/global_state.c |  35 ++-
 migration/migration-hmp-cmds.c   |   8 +-
 migration/migration.c|  15 +--
 migration/savevm.c   |  23 +++--
 qapi/misc.json   |  10 +-
 system/cpus.c|  47 +++--
 system/runstate.c|   9 ++
 system/vl.c  |   2 +
 tests/migration/i386/Makefile|   5 +-
 tests/migration/i386/a-b-bootblock.S |  50 +-
 tests/migration/i386/a-b-bootblock.h |  26 +++--
 tests/qtest/migration-helpers.c  |  27 ++
 tests/qtest/migration-helpers.h  |  11 ++-
 tests/qtest/migration-test.c | 181 +--
 20 files changed, 354 insertions(+), 126 deletions(-)

-- 
1.8.3.1




Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

2023-12-06 Thread Maxim Levitsky
On Tue, 2023-12-05 at 16:28 -0600, Michael Roth wrote:
> Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
> exposed a long-running bug in current KVM support for SEV-ES where the
> kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
> kernel, in which case EFER write traps would result in KVM eventually
> seeing MSR_EFER_LMA get set and recording it in such a way that it would
> be subsequently visible when accessing it via KVM_GET_SREGS/etc.
> 
> However, guests kernels currently rely on MSR_EFER_LMA getting set
> automatically when MSR_EFER_LME is set and paging is enabled via
> CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
> MSR_EFER_LMA even though it is set internally, and when QEMU
> subsequently tries to pass this EFER value back to KVM via
> KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
> which is now considered fatal due to the aforementioned QEMU commit.
> 
> This can be addressed by inferring the MSR_EFER_LMA bit being set when
> paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
> the expected bits are all present in subsequent handling on the host
> side.
> 
> Ultimately, this handling will be implemented in the host kernel, but to
> avoid breaking QEMU's SEV-ES support when using older host kernels, the
> same handling can be done in QEMU just after fetching the register
> values via KVM_GET_SREGS*. Implement that here.
> 
> Cc: Paolo Bonzini 
> Cc: Marcelo Tosatti 
> Cc: Tom Lendacky 
> Cc: Akihiko Odaki 
> Cc: k...@vger.kernel.org
> Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> Signed-off-by: Michael Roth 
> ---
> v2:
>   - Add handling for KVM_GET_SREGS, not just KVM_GET_SREGS2
> 
>  target/i386/kvm/kvm.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 11b8177eff..8721c1bf8f 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -3610,6 +3610,7 @@ static int kvm_get_sregs(X86CPU *cpu)
>  {
>  CPUX86State *env = &cpu->env;
>  struct kvm_sregs sregs;
> +target_ulong cr0_old;
>  int ret;
>  
>  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs);
> @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu)
>  env->gdt.limit = sregs.gdt.limit;
>  env->gdt.base = sregs.gdt.base;
>  
> +cr0_old = env->cr[0];
>  env->cr[0] = sregs.cr0;
>  env->cr[2] = sregs.cr2;
>  env->cr[3] = sregs.cr3;
>  env->cr[4] = sregs.cr4;
>  
>  env->efer = sregs.efer;
> +if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> +env->efer |= MSR_EFER_LMA;
> +}
> +}

I think we should not check whether CR0_PG has changed, and instead simply
assume that if EFER.LME is set and CR0.PG is set, then EFER.LMA must be set,
as the x86 spec defines.

Otherwise, suppose QEMU calls kvm_get_sregs twice: the first call will work,
but on the second call CR0.PG will already match the value stored in env, so
the workaround will not run and we will revert to the wrong EFER value
reported by the kernel.

How about something like this:


if (sev_es_enabled() && env->efer & MSR_EFER_LME && env->cr[0] & CR0_PG_MASK) {
/*
 * Work around a KVM bug: KVM might not be aware that EFER.LMA
 * was toggled by the hardware.
 */
env->efer |= MSR_EFER_LMA;
}


Best regards,
Maxim Levitsky

>  
>  /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run 
> */
>  x86_update_hflags(env);
> @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu)
>  {
>  CPUX86State *env = &cpu->env;
>  struct kvm_sregs2 sregs;
> +target_ulong cr0_old;
>  int i, ret;
>  
>  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
> @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu)
>  env->gdt.limit = sregs.gdt.limit;
>  env->gdt.base = sregs.gdt.base;
>  
> +cr0_old = env->cr[0];
>  env->cr[0] = sregs.cr0;
>  env->cr[2] = sregs.cr2;
>  env->cr[3] = sregs.cr3;
>  env->cr[4] = sregs.cr4;
>  
>  env->efer = sregs.efer;
> +if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> +env->efer |= MSR_EFER_LMA;
> +}
> +}
>  
>  env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
>  





[PATCH V7 06/12] migration: preserve suspended runstate

2023-12-06 Thread Steve Sistare
A guest that is migrated in the suspended state automatically wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 migration/migration.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3ce04b2..8124811 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -604,7 +604,7 @@ static void process_incoming_migration_bh(void *opaque)
  */
 if (!migrate_late_block_activate() ||
  (autostart && (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING))) {
+runstate_is_live(global_state_get_runstate())))) {
 /* Make sure all file formats throw away their mutable metadata.
  * If we get an error here, just don't restart the VM yet. */
 bdrv_activate_all(&local_err);
@@ -628,7 +628,7 @@ static void process_incoming_migration_bh(void *opaque)
 dirty_bitmap_mig_before_vm_start();
 
 if (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING) {
+runstate_is_live(global_state_get_runstate())) {
 if (autostart) {
 vm_start();
 } else {
@@ -2416,7 +2416,6 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
 
 migration_downtime_start(ms);
 
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 global_state_store();
 ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
 if (ret < 0) {
@@ -2615,7 +2614,6 @@ static int migration_completion_precopy(MigrationState *s,
 
 qemu_mutex_lock_iothread();
 migration_downtime_start(s);
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
 s->vm_old_state = runstate_get();
 global_state_store();
@@ -3136,7 +3134,7 @@ static void migration_iteration_finish(MigrationState *s)
 case MIGRATION_STATUS_FAILED:
 case MIGRATION_STATUS_CANCELLED:
 case MIGRATION_STATUS_CANCELLING:
-if (s->vm_old_state == RUN_STATE_RUNNING) {
+if (runstate_is_live(s->vm_old_state)) {
 if (!runstate_check(RUN_STATE_SHUTDOWN)) {
 vm_start();
 }
-- 
1.8.3.1




[PATCH 13/14] tests/qtest: precopy migration with suspend

2023-12-06 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly
during live migration precopy.  The test suspends the src, migrates, then
wakes the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-helpers.c |  3 ++
 tests/qtest/migration-helpers.h |  2 ++
 tests/qtest/migration-test.c| 62 +++--
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index fd3b94e..37e8e81 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char 
*name,
 if (g_str_equal(name, "STOP")) {
 state->stop_seen = true;
 return true;
+} else if (g_str_equal(name, "SUSPEND")) {
+state->suspend_seen = true;
+return true;
 } else if (g_str_equal(name, "RESUME")) {
 state->resume_seen = true;
 return true;
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 3d32699..b478549 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -18,6 +18,8 @@
 typedef struct QTestMigrationState {
 bool stop_seen;
 bool resume_seen;
+bool suspend_seen;
+bool suspend_me;
 } QTestMigrationState;
 
 bool migrate_watch_for_events(QTestState *who, const char *name,
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 84c78e3..03da984 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -178,7 +178,7 @@ static void bootfile_delete(void)
 /*
  * Wait for some output in the serial output file,
  * we get an 'A' followed by an endless string of 'B's
- * but on the destination we won't have the A.
+ * but on the destination we won't have the A (unless we enabled 
suspend/resume)
  */
 static void wait_for_serial(const char *side)
 {
@@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, 
QTestMigrationState *state)
 }
 }
 
+static void wait_for_suspend(QTestState *who, QTestMigrationState *state)
+{
+if (state->suspend_me && !state->suspend_seen) {
+qtest_qmp_eventwait(who, "SUSPEND");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who)
 {
 uint64_t pass, prev_pass = 0, changes = 0;
 
-while (changes < 2 && !src_state.stop_seen) {
+while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
 changes += (pass != prev_pass);
@@ -584,6 +591,12 @@ static void migrate_wait_for_dirty_mem(QTestState *from,
 usleep(1000 * 10);
 } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
 
+
+/* If suspended, src only iterates once, and watch_byte may never change */
+if (src_state.suspend_me) {
+return;
+}
+
 /*
  * Now ensure that already transferred bytes are
  * dirty again from the guest workload. Note the
@@ -771,6 +784,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 dst_state = (QTestMigrationState) { };
 src_state = (QTestMigrationState) { };
 bootfile_create(tmpfs, args->suspend_me);
+src_state.suspend_me = args->suspend_me;
 
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
@@ -1717,6 +1731,7 @@ static void test_precopy_common(MigrateCommon *args)
 /* Wait for the first serial output from the source */
 if (args->result == MIG_TEST_SUCCEED) {
 wait_for_serial("src_serial");
+wait_for_suspend(from, &src_state);
 }
 
 if (args->live) {
@@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args)
 
 wait_for_resume(to, &dst_state);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 wait_for_serial("dest_serial");
 }
 
@@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void)
 test_precopy_common(&args);
 }
 
+static void test_precopy_unix_suspend_live(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+/*
+ * despite being live, the test is fast because the src
+ * suspends immediately.
+ */
+.live = true,
+.start.suspend_me = true,
+};
+
+test_precopy_common(&args);
+}
+
+static void test_precopy_unix_suspend_notlive(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+.start.suspend_me = true,
+};
+
+

[PATCH 06/14] cpus: vm_resume

2023-12-06 Thread Steve Sistare
Define the vm_resume helper, for use in subsequent patches.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h | 9 +
 system/cpus.c | 9 +
 2 files changed, 18 insertions(+)
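
For context, a minimal sketch of the intended call pattern, as used by the
snapshot and bg_migration patches later in this series (illustrative only,
not part of this patch):

    RunState saved_state = runstate_get();

    vm_stop(RUN_STATE_SAVE_VM);
    /* ... work on the fully stopped vm ... */

    /* restores RUNNING or SUSPENDED; for other states it only sets them */
    vm_resume(saved_state);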

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 867e53c..315046b 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -53,6 +53,15 @@ void vm_start(void);
  * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
  */
 int vm_prepare_start(bool step_pending);
+
+/**
+ * vm_resume: If @state is a live state, start the vm and set the state,
+ * else just set the state.
+ *
+ * @state: the state to restore
+ */
+void vm_resume(RunState state);
+
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/system/cpus.c b/system/cpus.c
index 73b0e3e..45e4b3e 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -741,6 +741,15 @@ void vm_start(void)
 }
 }
 
+void vm_resume(RunState state)
+{
+if (runstate_is_live(state)) {
+vm_start();
+} else {
+runstate_set(state);
+}
+}
+
 /* does a state transition even if the VM is already stopped,
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
-- 
1.8.3.1




[PATCH 08/14] migration: preserve suspended runstate

2023-12-06 Thread Steve Sistare
A guest that is migrated in the suspended state automatically wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 migration/migration.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 28a34c9..52ba0bc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -603,7 +603,7 @@ static void process_incoming_migration_bh(void *opaque)
  */
 if (!migrate_late_block_activate() ||
  (autostart && (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING))) {
+runstate_is_live(global_state_get_runstate())))) {
 /* Make sure all file formats throw away their mutable metadata.
  * If we get an error here, just don't restart the VM yet. */
 bdrv_activate_all(&local_err);
@@ -627,7 +627,7 @@ static void process_incoming_migration_bh(void *opaque)
 dirty_bitmap_mig_before_vm_start();
 
 if (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING) {
+runstate_is_live(global_state_get_runstate())) {
 if (autostart) {
 vm_start();
 } else {
@@ -2415,7 +2415,6 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
 
 migration_downtime_start(ms);
 
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 global_state_store();
 ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
 if (ret < 0) {
@@ -2614,7 +2613,6 @@ static int migration_completion_precopy(MigrationState *s,
 
 qemu_mutex_lock_iothread();
 migration_downtime_start(s);
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
 s->vm_old_state = runstate_get();
 global_state_store();
@@ -3135,7 +3133,7 @@ static void migration_iteration_finish(MigrationState *s)
 case MIGRATION_STATUS_FAILED:
 case MIGRATION_STATUS_CANCELLED:
 case MIGRATION_STATUS_CANCELLING:
-if (s->vm_old_state == RUN_STATE_RUNNING) {
+if (runstate_is_live(s->vm_old_state)) {
 if (!runstate_check(RUN_STATE_SHUTDOWN)) {
 vm_start();
 }
-- 
1.8.3.1




[PATCH V7 10/12] tests/qtest: option to suspend during migration

2023-12-06 Thread Steve Sistare
Add an option to suspend the src in a-b-bootblock.S, which puts the guest
in S3 state after one round of writing to memory.  The option is enabled by
poking a 1 into the suspend_me word in the boot block prior to starting the
src vm.  Generate symbol offsets in a-b-bootblock.h so that the suspend_me
offset is known.  Generate the bootblock for each test, because suspend_me
may differ for each.
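
To make the mechanism concrete, a sketch of what the generated header and the
poke look like; the SYM_ prefix comes from the nm rule below, but the addresses
and the file-offset computation are illustrative assumptions, not taken from
the patch:

    /* appended to a-b-bootblock.h by "nm x86.o | awk ..." (addresses made up) */
    #define SYM_start      0x00007c00
    #define SYM_suspend_me 0x00007c8c

    /* launcher side: poke a 1 into the boot-block image before starting src.
     * Assumes the block is linked at 0x7c00, so the file offset of the word
     * is SYM_suspend_me - SYM_start. */
    if (suspend_me) {
        uint8_t one = 1;
        FILE *f = fopen(bootpath, "r+b");   /* bootpath: hypothetical name */
        fseek(f, SYM_suspend_me - SYM_start, SEEK_SET);
        fwrite(&one, 1, 1, f);
        fclose(f);
    }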

Signed-off-by: Steve Sistare 
Acked-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
---
 tests/migration/i386/Makefile|  5 ++--
 tests/migration/i386/a-b-bootblock.S | 50 +---
 tests/migration/i386/a-b-bootblock.h | 26 +--
 tests/qtest/migration-test.c | 12 ++---
 4 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
index 5c03241..37a72ae 100644
--- a/tests/migration/i386/Makefile
+++ b/tests/migration/i386/Makefile
@@ -4,9 +4,10 @@
 .PHONY: all clean
 all: a-b-bootblock.h
 
-a-b-bootblock.h: x86.bootsect
+a-b-bootblock.h: x86.bootsect x86.o
echo "$$__note" > header.tmp
xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
+   nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
mv header.tmp $@
 
 x86.bootsect: x86.boot
@@ -16,7 +17,7 @@ x86.boot: x86.o
$(CROSS_PREFIX)objcopy -O binary $< $@
 
 x86.o: a-b-bootblock.S
-   $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
+   $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
 
 clean:
@rm -rf *.boot *.o *.bootsect
diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 6bb..6f39eb6 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -9,6 +9,23 @@
 #
 # Author: dgilb...@redhat.com
 
+#include "migration-test.h"
+
+#define ACPI_ENABLE 0xf1
+#define ACPI_PORT_SMI_CMD   0xb2
+#define ACPI_PM_BASE        0x600
+#define PM1A_CNT_OFFSET 4
+
+#define ACPI_SCI_ENABLE 0x0001
+#define ACPI_SLEEP_TYPE 0x0400
+#define ACPI_SLEEP_ENABLE   0x2000
+#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
+
+#define LOW_ADDR    X86_TEST_MEM_START
+#define HIGH_ADDR   X86_TEST_MEM_END
+
+/* Save the suspended status at an address that is not written in the loop. */
+#define suspended   (X86_TEST_MEM_START + 4)
 
 .code16
 .org 0x7c00
@@ -35,8 +52,8 @@ start: # at 0x7c00 ?
 mov %eax,%ds
 
 # Start from 1MB
-.set TEST_MEM_START, (1024*1024)
-.set TEST_MEM_END, (100*1024*1024)
+.set TEST_MEM_START, X86_TEST_MEM_START
+.set TEST_MEM_END, X86_TEST_MEM_END
 
 mov $65,%ax
 mov $0x3f8,%dx
@@ -69,7 +86,30 @@ innerloop:
 mov $0x3f8,%dx
 outb %al,%dx
 
-jmp mainloop
+# should this test suspend?
+mov (suspend_me),%eax
+cmp $0,%eax
+je mainloop
+
+# are we waking after suspend?  do not suspend again.
+mov $suspended,%eax
+mov (%eax),%eax
+cmp $1,%eax
+je mainloop
+
+# enable acpi
+mov $ACPI_ENABLE,%al
+outb %al,$ACPI_PORT_SMI_CMD
+
+# suspend to ram
+mov $suspended,%eax
+movl $1,(%eax)
+mov $SLEEP,%ax
+mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
+outw %ax,%dx
+# not reached.  The wakeup causes reset and restart at 0x7c00, and we
+# do not save and restore registers as a real kernel would do.
+
 
 # GDT magic from old (GPLv2)  Grub startup.S
 .p2align    2   /* force 4-byte alignment */
@@ -95,6 +135,10 @@ gdtdesc:
 .word   0x27/* limit */
 .long   gdt /* addr */
 
+/* test launcher can poke a 1 here to exercise suspend */
+suspend_me:
+.int  0
+
 /* I'm a bootable disk */
 .org 0x7dfe
 .byte 0x55
diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index 5b52391..c83f871 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,7 +4,7 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
@@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = {
   0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
   0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
   0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
-  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 

[PATCH V7 11/12] tests/qtest: precopy migration with suspend

2023-12-06 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly
during live migration precopy.  The test suspends the src, migrates, then
wakes the dest.
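
For completeness, the new cases are registered with qtest_add_func in main()
roughly as follows; the test path names here are assumptions, not copied from
the patch:

    qtest_add_func("/migration/precopy/unix/suspend/live",
                   test_precopy_unix_suspend_live);
    qtest_add_func("/migration/precopy/unix/suspend/notlive",
                   test_precopy_unix_suspend_notlive);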

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-helpers.c |  3 ++
 tests/qtest/migration-helpers.h |  2 ++
 tests/qtest/migration-test.c| 62 +++--
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index fd3b94e..37e8e81 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char 
*name,
 if (g_str_equal(name, "STOP")) {
 state->stop_seen = true;
 return true;
+} else if (g_str_equal(name, "SUSPEND")) {
+state->suspend_seen = true;
+return true;
 } else if (g_str_equal(name, "RESUME")) {
 state->resume_seen = true;
 return true;
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 3d32699..b478549 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -18,6 +18,8 @@
 typedef struct QTestMigrationState {
 bool stop_seen;
 bool resume_seen;
+bool suspend_seen;
+bool suspend_me;
 } QTestMigrationState;
 
 bool migrate_watch_for_events(QTestState *who, const char *name,
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e10d5a4..f57a978 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -178,7 +178,7 @@ static void bootfile_delete(void)
 /*
  * Wait for some output in the serial output file,
  * we get an 'A' followed by an endless string of 'B's
- * but on the destination we won't have the A.
+ * but on the destination we won't have the A (unless we enabled 
suspend/resume)
  */
 static void wait_for_serial(const char *side)
 {
@@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, 
QTestMigrationState *state)
 }
 }
 
+static void wait_for_suspend(QTestState *who, QTestMigrationState *state)
+{
+if (state->suspend_me && !state->suspend_seen) {
+qtest_qmp_eventwait(who, "SUSPEND");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who)
 {
 uint64_t pass, prev_pass = 0, changes = 0;
 
-while (changes < 2 && !src_state.stop_seen) {
+while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
 changes += (pass != prev_pass);
@@ -584,6 +591,12 @@ static void migrate_wait_for_dirty_mem(QTestState *from,
 usleep(1000 * 10);
 } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
 
+
+/* If suspended, src only iterates once, and watch_byte may never change */
+if (src_state.suspend_me) {
+return;
+}
+
 /*
  * Now ensure that already transferred bytes are
  * dirty again from the guest workload. Note the
@@ -771,6 +784,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 dst_state = (QTestMigrationState) { };
 src_state = (QTestMigrationState) { };
 bootfile_create(tmpfs, args->suspend_me);
+src_state.suspend_me = args->suspend_me;
 
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
@@ -1717,6 +1731,7 @@ static void test_precopy_common(MigrateCommon *args)
 /* Wait for the first serial output from the source */
 if (args->result == MIG_TEST_SUCCEED) {
 wait_for_serial("src_serial");
+wait_for_suspend(from, &src_state);
 }
 
 if (args->live) {
@@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args)
 
 wait_for_resume(to, &dst_state);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 wait_for_serial("dest_serial");
 }
 
@@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void)
 test_precopy_common(&args);
 }
 
+static void test_precopy_unix_suspend_live(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+/*
+ * despite being live, the test is fast because the src
+ * suspends immediately.
+ */
+.live = true,
+.start.suspend_me = true,
+};
+
+test_precopy_common(&args);
+}
+
+static void test_precopy_unix_suspend_notlive(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+.start.suspend_me = true,
+};
+
+

[PATCH V7 07/12] migration: preserve suspended for snapshot

2023-12-06 Thread Steve Sistare
Restoring a snapshot can break a suspended guest.  Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.

To save, the existing vm_stop call now completely stops the suspended
state.  Finish with vm_resume to leave the vm in the state it had prior
to the save, correctly restoring the suspended state.

To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load.  However, if the snapshot is suspended, restoration is
trickier.  First, call vm_resume to restore the state to suspended so the
current state matches the saved state.  Then, if the pre-load state is
running, call wakeup to resume running.

Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
suspended, but now it does, so allow these transitions.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/migration/snapshot.h   |  7 +++
 migration/migration-hmp-cmds.c |  8 +---
 migration/savevm.c | 23 +--
 system/runstate.c  |  5 +
 system/vl.c|  2 ++
 5 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index e72083b..9e4dcaa 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -16,6 +16,7 @@
 #define QEMU_MIGRATION_SNAPSHOT_H
 
 #include "qapi/qapi-builtin-types.h"
+#include "qapi/qapi-types-run-state.h"
 
 /**
  * save_snapshot: Save an internal snapshot.
@@ -61,4 +62,10 @@ bool delete_snapshot(const char *name,
 bool has_devices, strList *devices,
 Error **errp);
 
+/**
+ * load_snapshot_resume: Restore runstate after loading snapshot.
+ * @state: state to restore
+ */
+void load_snapshot_resume(RunState state);
+
 #endif
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 86ae832..c8d70bc 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
QDict *qdict)
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-int saved_vm_running  = runstate_is_running();
+RunState saved_state = runstate_get();
+
 const char *name = qdict_get_str(qdict, "name");
 Error *err = NULL;
 
 vm_stop(RUN_STATE_RESTORE_VM);
 
-if (load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) {
-vm_start();
+if (load_snapshot(name, NULL, false, NULL, &err)) {
+load_snapshot_resume(saved_state);
 }
+
 hmp_handle_error(mon, err);
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index eec5503..26866f0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 QEMUSnapshotInfo sn1, *sn = &sn1;
 int ret = -1, ret2;
 QEMUFile *f;
-int saved_vm_running;
+RunState saved_state = runstate_get();
 uint64_t vm_state_size;
 g_autoptr(GDateTime) now = g_date_time_new_now_local();
 AioContext *aio_context;
@@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 }
 aio_context = bdrv_get_aio_context(bs);
 
-saved_vm_running = runstate_is_running();
-
 global_state_store();
 vm_stop(RUN_STATE_SAVE_VM);
 
@@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 
 bdrv_drain_all_end();
 
-if (saved_vm_running) {
-vm_start();
-}
+vm_resume(saved_state);
 return ret == 0;
 }
 
@@ -3339,6 +3335,14 @@ err_drain:
 return false;
 }
 
+void load_snapshot_resume(RunState state)
+{
+vm_resume(state);
+if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) {
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort);
+}
+}
+
 bool delete_snapshot(const char *name, bool has_devices,
  strList *devices, Error **errp)
 {
@@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque)
 {
 Job *job = opaque;
 SnapshotJob *s = container_of(job, SnapshotJob, common);
-int orig_vm_running;
+RunState orig_state = runstate_get();
 
 job_progress_set_remaining(&s->common, 1);
 
-orig_vm_running = runstate_is_running();
 vm_stop(RUN_STATE_RESTORE_VM);
 
 s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
-if (s->ret && orig_vm_running) {
-vm_start();
+if (s->ret) {
+load_snapshot_resume(orig_state);
 }
 
 job_progress_update(&s->common, 1);
diff --git a/system/runstate.c b/system/runstate.c
index e2fa204..ca9eb54 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ 

[PATCH 14/14] tests/qtest: postcopy migration with suspend

2023-12-06 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly by
live migration postcopy.  The test suspends the src, migrates, then wakes
the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-test.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 03da984..2033085 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1347,6 +1347,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
+wait_for_suspend(from, &src_state);
 
 g_autofree char *uri = migrate_get_socket_address(to, "socket-address");
 migrate_qmp(from, uri, "{}");
@@ -1364,6 +1365,11 @@ static void migrate_postcopy_complete(QTestState *from, 
QTestState *to,
 {
 wait_for_migration_complete(from);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 /* Make sure we get at least one "B" on destination */
 wait_for_serial("dest_serial");
 
@@ -1397,6 +1403,15 @@ static void test_postcopy(void)
 test_postcopy_common(&args);
 }
 
+static void test_postcopy_suspend(void)
+{
+MigrateCommon args = {
+.start.suspend_me = true,
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_compress(void)
 {
 MigrateCommon args = {
@@ -3412,7 +3427,10 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/recovery/double-failures",
test_postcopy_recovery_double_fail);
 #endif /* _WIN32 */
-
+if (is_x86) {
+qtest_add_func("/migration/postcopy/suspend",
+   test_postcopy_suspend);
+}
 }
 
 qtest_add_func("/migration/bad_dest", test_baddest);
-- 
1.8.3.1




[PATCH V7 12/12] tests/qtest: postcopy migration with suspend

2023-12-06 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly by
live migration postcopy.  The test suspends the src, migrates, then wakes
the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-test.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index f57a978..b7c094e 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1347,6 +1347,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
+wait_for_suspend(from, &src_state);
 
 g_autofree char *uri = migrate_get_socket_address(to, "socket-address");
 migrate_qmp(from, uri, "{}");
@@ -1364,6 +1365,11 @@ static void migrate_postcopy_complete(QTestState *from, 
QTestState *to,
 {
 wait_for_migration_complete(from);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 /* Make sure we get at least one "B" on destination */
 wait_for_serial("dest_serial");
 
@@ -1397,6 +1403,15 @@ static void test_postcopy(void)
 test_postcopy_common(&args);
 }
 
+static void test_postcopy_suspend(void)
+{
+MigrateCommon args = {
+.start.suspend_me = true,
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_compress(void)
 {
 MigrateCommon args = {
@@ -3412,7 +3427,10 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/recovery/double-failures",
test_postcopy_recovery_double_fail);
 #endif /* _WIN32 */
-
+if (is_x86) {
+qtest_add_func("/migration/postcopy/suspend",
+   test_postcopy_suspend);
+}
 }
 
 qtest_add_func("/migration/bad_dest", test_baddest);
-- 
1.8.3.1




Re: [PATCH V7 00/12] fix migration of suspended runstate

2023-12-06 Thread Steven Sistare
Gack, there was cruft in my send mail directory.  Please ignore this threaded 
series,
and I will resend.  Sorry for the noise.

- Steve

On 12/6/2023 12:12 PM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>   cpus: refactor vm_stop
>   cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>   cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
> vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
> migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Changes in V7:
>   * rebase to tip, add RB-s
>   * fix backwards compatibility for global_state.vm_was_suspended
>   * delete vm_prepare_start state argument, and rename patch
> "pass runstate to vm_prepare_start" to
> "check running not RUN_STATE_RUNNING"
>   * drop patches:
>   tests/qtest: bootfile per vm
>   tests/qtest: background migration with suspend
>   * rename runstate_is_started to runstate_is_live
>   * move wait_for_suspend in tests
> 
> Steve Sistare (12):
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: check running not RUN_STATE_RUNNING
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
> 
>  backends/tpm/tpm_emulator.c  |   2 +-
>  hw/usb/hcd-ehci.c|   2 +-
>  hw/usb/redirect.c|   2 +-
>  hw/xen/xen-hvm-common.c  |   2 +-
>  include/migration/snapshot.h |   7 ++
>  include/sysemu/runstate.h|  16 
>  migration/global_state.c |  35 ++-
>  migration/migration-hmp-cmds.c   |   8 +-
>  migration/migration.c|  15 +--
>  migration/savevm.c   |  23 +++--
>  qapi/misc.json   |  10 +-
>  system/cpus.c|  47 +++--
>  system/runstate.c|   9 ++
>  system/vl.c  |   2 +
>  tests/migration/i386/Makefile|   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +-
>  tests/migration/i386/a-b-bootblock.h |  26 +++--
>  tests/qtest/migration-helpers.c  |  27 ++
>  tests/qtest/migration-helpers.h  |  11 ++-
>  tests/qtest/migration-test.c | 181 
> +--
>  20 files changed, 354 insertions(+), 126 deletions(-)
> 



[PATCH 12/14] tests/qtest: option to suspend during migration

2023-12-06 Thread Steve Sistare
Add an option to suspend the src in a-b-bootblock.S, which puts the guest
in S3 state after one round of writing to memory.  The option is enabled by
poking a 1 into the suspend_me word in the boot block prior to starting the
src vm.  Generate symbol offsets in a-b-bootblock.h so that the suspend_me
offset is known.  Generate the bootblock for each test, because suspend_me
may differ for each.

Signed-off-by: Steve Sistare 
Acked-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
---
 tests/migration/i386/Makefile|  5 ++--
 tests/migration/i386/a-b-bootblock.S | 50 +---
 tests/migration/i386/a-b-bootblock.h | 26 +--
 tests/qtest/migration-test.c | 12 ++---
 4 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
index 5c03241..37a72ae 100644
--- a/tests/migration/i386/Makefile
+++ b/tests/migration/i386/Makefile
@@ -4,9 +4,10 @@
 .PHONY: all clean
 all: a-b-bootblock.h
 
-a-b-bootblock.h: x86.bootsect
+a-b-bootblock.h: x86.bootsect x86.o
echo "$$__note" > header.tmp
xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
+   nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
mv header.tmp $@
 
 x86.bootsect: x86.boot
@@ -16,7 +17,7 @@ x86.boot: x86.o
$(CROSS_PREFIX)objcopy -O binary $< $@
 
 x86.o: a-b-bootblock.S
-   $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
+   $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
 
 clean:
@rm -rf *.boot *.o *.bootsect
diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 6bb..6f39eb6 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -9,6 +9,23 @@
 #
 # Author: dgilb...@redhat.com
 
+#include "migration-test.h"
+
+#define ACPI_ENABLE 0xf1
+#define ACPI_PORT_SMI_CMD   0xb2
+#define ACPI_PM_BASE        0x600
+#define PM1A_CNT_OFFSET 4
+
+#define ACPI_SCI_ENABLE 0x0001
+#define ACPI_SLEEP_TYPE 0x0400
+#define ACPI_SLEEP_ENABLE   0x2000
+#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
+
+#define LOW_ADDR    X86_TEST_MEM_START
+#define HIGH_ADDR   X86_TEST_MEM_END
+
+/* Save the suspended status at an address that is not written in the loop. */
+#define suspended   (X86_TEST_MEM_START + 4)
 
 .code16
 .org 0x7c00
@@ -35,8 +52,8 @@ start: # at 0x7c00 ?
 mov %eax,%ds
 
 # Start from 1MB
-.set TEST_MEM_START, (1024*1024)
-.set TEST_MEM_END, (100*1024*1024)
+.set TEST_MEM_START, X86_TEST_MEM_START
+.set TEST_MEM_END, X86_TEST_MEM_END
 
 mov $65,%ax
 mov $0x3f8,%dx
@@ -69,7 +86,30 @@ innerloop:
 mov $0x3f8,%dx
 outb %al,%dx
 
-jmp mainloop
+# should this test suspend?
+mov (suspend_me),%eax
+cmp $0,%eax
+je mainloop
+
+# are we waking after suspend?  do not suspend again.
+mov $suspended,%eax
+mov (%eax),%eax
+cmp $1,%eax
+je mainloop
+
+# enable acpi
+mov $ACPI_ENABLE,%al
+outb %al,$ACPI_PORT_SMI_CMD
+
+# suspend to ram
+mov $suspended,%eax
+movl $1,(%eax)
+mov $SLEEP,%ax
+mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
+outw %ax,%dx
+# not reached.  The wakeup causes reset and restart at 0x7c00, and we
+# do not save and restore registers as a real kernel would do.
+
 
 # GDT magic from old (GPLv2)  Grub startup.S
 .p2align    2   /* force 4-byte alignment */
@@ -95,6 +135,10 @@ gdtdesc:
 .word   0x27/* limit */
 .long   gdt /* addr */
 
+/* test launcher can poke a 1 here to exercise suspend */
+suspend_me:
+.int  0
+
 /* I'm a bootable disk */
 .org 0x7dfe
 .byte 0x55
diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index 5b52391..c83f871 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,7 +4,7 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
@@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = {
   0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
   0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
   0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
-  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 

[PATCH 03/14] cpus: vm_was_suspended

2023-12-06 Thread Steve Sistare
Add a state variable to remember if a vm previously transitioned into a
suspended state.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h |  2 ++
 system/cpus.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index c8c2bd8..88a67e2 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -51,6 +51,8 @@ int vm_prepare_start(bool step_pending);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
+void vm_set_suspended(bool suspended);
+bool vm_get_suspended(void);
 
 typedef enum WakeupReason {
 /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
diff --git a/system/cpus.c b/system/cpus.c
index 0848e0d..9baf096 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -252,6 +252,21 @@ void cpu_interrupt(CPUState *cpu, int mask)
 }
 }
 
+/*
+ * True if the vm was previously suspended, and has not been woken or reset.
+ */
+static int vm_was_suspended;
+
+void vm_set_suspended(bool suspended)
+{
+vm_was_suspended = suspended;
+}
+
+bool vm_get_suspended(void)
+{
+return vm_was_suspended;
+}
+
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
-- 
1.8.3.1




[PATCH V7 01/12] cpus: vm_was_suspended

2023-12-06 Thread Steve Sistare
Add a state variable to remember if a vm previously transitioned into a
suspended state.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h |  2 ++
 system/cpus.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index c8c2bd8..88a67e2 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -51,6 +51,8 @@ int vm_prepare_start(bool step_pending);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
+void vm_set_suspended(bool suspended);
+bool vm_get_suspended(void);
 
 typedef enum WakeupReason {
 /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
diff --git a/system/cpus.c b/system/cpus.c
index a444a74..9f631ab 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -259,6 +259,21 @@ void cpu_interrupt(CPUState *cpu, int mask)
 }
 }
 
+/*
+ * True if the vm was previously suspended, and has not been woken or reset.
+ */
+static int vm_was_suspended;
+
+void vm_set_suspended(bool suspended)
+{
+vm_was_suspended = suspended;
+}
+
+bool vm_get_suspended(void)
+{
+return vm_was_suspended;
+}
+
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
-- 
1.8.3.1




[PATCH 04/14] cpus: stop vm in suspended runstate

2023-12-06 Thread Steve Sistare
Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

(qemu) info status
VM status: paused (suspended)

(qemu) stop
(qemu) info status
VM status: paused

(qemu) system_wakeup
Error: Unable to wake up: guest is not in suspended state

(qemu) cont
(qemu) info status
VM status: paused (suspended)

(qemu) system_wakeup
(qemu) info status
VM status: running

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h |  5 +
 qapi/misc.json| 10 --
 system/cpus.c | 23 +++
 system/runstate.c |  3 +++
 4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 88a67e2..867e53c 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
cause)
 return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
 }
 
+static inline bool runstate_is_live(RunState state)
+{
+return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}
+
 void vm_start(void);
 
 /**
diff --git a/qapi/misc.json b/qapi/misc.json
index cda2eff..efb8d44 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -134,7 +134,7 @@
 ##
 # @stop:
 #
-# Stop all guest VCPU execution.
+# Stop all guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -143,6 +143,9 @@
 # the guest remains paused once migration finishes, as if the -S
 # option was passed on the command line.
 #
+# In the "suspended" state, it will completely stop the VM and
+# cause a transition to the "paused" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "stop" }
@@ -153,7 +156,7 @@
 ##
 # @cont:
 #
-# Resume guest VCPU execution.
+# Resume guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -165,6 +168,9 @@
 # guest starts once migration finishes, removing the effect of the
 # -S command line option if it was passed.
 #
+# If the VM was previously suspended, and not been reset or woken,
+# this command will transition back to the "suspended" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "cont" }
diff --git a/system/cpus.c b/system/cpus.c
index 9baf096..73b0e3e 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -270,11 +270,15 @@ bool vm_get_suspended(void)
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
+RunState oldstate = runstate_get();
 
-if (runstate_is_running()) {
+if (runstate_is_live(oldstate)) {
+vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
 runstate_set(state);
 cpu_disable_ticks();
-pause_all_vcpus();
+if (oldstate == RUN_STATE_RUNNING) {
+pause_all_vcpus();
+}
 vm_state_notify(0, state);
 if (send_stop) {
 qapi_event_send_stop();
@@ -687,11 +691,13 @@ int vm_stop(RunState state)
 
 /**
  * Prepare for (re)starting the VM.
- * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
- * running or in case of an error condition), 0 otherwise.
+ * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
+ * and 1 otherwise.
  */
 int vm_prepare_start(bool step_pending)
 {
+int ret = vm_was_suspended ? 1 : 0;
+RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : 
RUN_STATE_RUNNING;
 RunState requested;
 
 qemu_vmstop_requested(&requested);
@@ -722,9 +728,10 @@ int vm_prepare_start(bool step_pending)
 qapi_event_send_resume();
 
 cpu_enable_ticks();
-runstate_set(RUN_STATE_RUNNING);
-vm_state_notify(1, RUN_STATE_RUNNING);
-return 0;
+runstate_set(state);
+vm_state_notify(1, state);
+vm_was_suspended = false;
+return ret;
 }
 
 void vm_start(void)
@@ -738,7 +745,7 @@ void vm_start(void)
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
 {
-if (runstate_is_running()) {
+if (runstate_is_live(runstate_get())) {
 return vm_stop(state);
 } else {
 int ret;
diff --git a/system/runstate.c b/system/runstate.c
index ea9d6c2..e2fa204 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { 

[PATCH V7 08/12] migration: preserve suspended for bg_migration

2023-12-06 Thread Steve Sistare
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
---
 migration/migration.c | 7 +--
 system/runstate.c | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8124811..2cc7e2a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3390,7 +3390,7 @@ static void bg_migration_vm_start_bh(void *opaque)
 qemu_bh_delete(s->vm_start_bh);
 s->vm_start_bh = NULL;
 
-vm_start();
+vm_resume(s->vm_old_state);
 migration_downtime_end(s);
 }
 
@@ -3462,11 +3462,6 @@ static void *bg_migration_thread(void *opaque)
 
 qemu_mutex_lock_iothread();
 
-/*
- * If VM is currently in suspended state, then, to make a valid runstate
- * transition in vm_stop_force_state() we need to wakeup it up.
- */
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 s->vm_old_state = runstate_get();
 
 global_state_store();
diff --git a/system/runstate.c b/system/runstate.c
index ca9eb54..621a023 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
 { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM },
 { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM },
+{ RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN },
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1




[PATCH 07/14] migration: propagate suspended runstate

2023-12-06 Thread Steve Sistare
If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, reclaim
some space from the runstate member.
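
A quick byte-accounting sketch of the trimmed layout (not part of the patch):

    /*
     * runstate[32]            32 bytes
     * has_vm_was_suspended     1 byte
     * vm_was_suspended         1 byte
     * unused[66]              66 bytes
     *                        ---------
     *                        100 bytes == the old runstate[100]
     *
     * so the vmstate blob keeps the same size and field offset on the wire,
     * and an older QEMU simply sees the new bytes as part of the zero padding.
     */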

Signed-off-by: Steve Sistare 
---
 migration/global_state.c | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8..d4f61a1 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -22,7 +22,16 @@
 
 typedef struct {
 uint32_t size;
-uint8_t runstate[100];
+
+/*
+ * runstate was 100 bytes, zero padded, but we trimmed it to add a
+ * few fields and maintain backwards compatibility.
+ */
+uint8_t runstate[32];
+uint8_t has_vm_was_suspended;
+uint8_t vm_was_suspended;
+uint8_t unused[66];
+
 RunState state;
 bool received;
 } GlobalState;
@@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
 assert(strlen(state_str) < sizeof(global_state.runstate));
 strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
   state_str, '\0');
+global_state.has_vm_was_suspended = true;
+global_state.vm_was_suspended = vm_get_suspended();
+
+memset(global_state.unused, 0, sizeof(global_state.unused));
 }
 
 void global_state_store(void)
@@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
 return true;
 }
 
+/* If the suspended state must be remembered, it is needed */
+
+if (vm_get_suspended()) {
+return true;
+}
+
 /* If state is running or paused, it is not needed */
 
 if (strcmp(runstate, "running") == 0 ||
@@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int 
version_id)
 Error *local_err = NULL;
 int r;
 char *runstate = (char *)s->runstate;
+bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended;
 
 s->received = true;
 trace_migrate_global_state_post_load(runstate);
@@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int 
version_id)
 sizeof(s->runstate)) == sizeof(s->runstate)) {
 /*
  * This condition should never happen during migration, because
- * all runstate names are shorter than 100 bytes (the size of
+ * all runstate names are shorter than 32 bytes (the size of
  * s->runstate). However, a malicious stream could overflow
  * the qapi_enum_parse() call, so we force the last character
  * to a NUL byte.
@@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int 
version_id)
 }
 s->state = r;
 
+/*
+ * global_state is saved on the outgoing side before forcing a stopped
+ * state, so it may have saved state=suspended and vm_was_suspended=0.
+ * Now we are in a paused state, and when we later call vm_start, it must
+ * restore the suspended state, so we must set vm_was_suspended=1 here.
+ */
+vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);
+
 return 0;
 }
 
@@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = {
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(size, GlobalState),
 VMSTATE_BUFFER(runstate, GlobalState),
+VMSTATE_UINT8(has_vm_was_suspended, GlobalState),
+VMSTATE_UINT8(vm_was_suspended, GlobalState),
+VMSTATE_BUFFER(unused, GlobalState),
 VMSTATE_END_OF_LIST()
 },
 };
-- 
1.8.3.1




[PATCH 10/14] migration: preserve suspended for bg_migration

2023-12-06 Thread Steve Sistare
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
---
 migration/migration.c | 7 +--
 system/runstate.c | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 52ba0bc..b30313a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3389,7 +3389,7 @@ static void bg_migration_vm_start_bh(void *opaque)
 qemu_bh_delete(s->vm_start_bh);
 s->vm_start_bh = NULL;
 
-vm_start();
+vm_resume(s->vm_old_state);
 migration_downtime_end(s);
 }
 
@@ -3461,11 +3461,6 @@ static void *bg_migration_thread(void *opaque)
 
 qemu_mutex_lock_iothread();
 
-/*
- * If VM is currently in suspended state, then, to make a valid runstate
- * transition in vm_stop_force_state() we need to wakeup it up.
- */
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 s->vm_old_state = runstate_get();
 
 global_state_store();
diff --git a/system/runstate.c b/system/runstate.c
index ca9eb54..621a023 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
 { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM },
 { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM },
+{ RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN },
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1




[PATCH 01/14] temporary: define CLONE_NEWCGROUP qemu-8.2.0

2023-12-06 Thread Steve Sistare
---
 system/qemu-seccomp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/system/qemu-seccomp.c b/system/qemu-seccomp.c
index 4d7439e..9e1ff85 100644
--- a/system/qemu-seccomp.c
+++ b/system/qemu-seccomp.c
@@ -22,6 +22,7 @@
 #include 
 #include "sysemu/seccomp.h"
 #include 
+#define CLONE_NEWCGROUP 0x02000000
 
 /* For some architectures (notably ARM) cacheflush is not supported until
  * libseccomp 2.2.3, but configure enforces that we are using a more recent
-- 
1.8.3.1




[PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-06 Thread Steve Sistare
Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

(qemu) info status
VM status: paused (suspended)

(qemu) stop
(qemu) info status
VM status: paused

(qemu) system_wakeup
Error: Unable to wake up: guest is not in suspended state

(qemu) cont
(qemu) info status
VM status: paused (suspended)

(qemu) system_wakeup
(qemu) info status
VM status: running
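
Expressed with the helpers used in this series, the invariant is roughly
(illustrative only, not part of the patch):

    /* guest previously suspended itself */
    assert(runstate_get() == RUN_STATE_SUSPENDED);

    vm_stop(RUN_STATE_PAUSED);    /* "stop": fully stopped, suspension remembered */
    assert(!runstate_is_running());

    vm_start();                   /* "cont": returns to the suspended state */
    assert(runstate_get() == RUN_STATE_SUSPENDED);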

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h |  5 +
 qapi/misc.json| 10 --
 system/cpus.c | 23 +++
 system/runstate.c |  3 +++
 4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 88a67e2..867e53c 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
cause)
 return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
 }
 
+static inline bool runstate_is_live(RunState state)
+{
+return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}
+
 void vm_start(void);
 
 /**
diff --git a/qapi/misc.json b/qapi/misc.json
index cda2eff..efb8d44 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -134,7 +134,7 @@
 ##
 # @stop:
 #
-# Stop all guest VCPU execution.
+# Stop all guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -143,6 +143,9 @@
 # the guest remains paused once migration finishes, as if the -S
 # option was passed on the command line.
 #
+# In the "suspended" state, it will completely stop the VM and
+# cause a transition to the "paused" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "stop" }
@@ -153,7 +156,7 @@
 ##
 # @cont:
 #
-# Resume guest VCPU execution.
+# Resume guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -165,6 +168,9 @@
 # guest starts once migration finishes, removing the effect of the
 # -S command line option if it was passed.
 #
+# If the VM was previously suspended, and has not been reset or woken,
+# this command will transition back to the "suspended" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "cont" }
diff --git a/system/cpus.c b/system/cpus.c
index 9f631ab..f162435 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -277,11 +277,15 @@ bool vm_get_suspended(void)
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
+RunState oldstate = runstate_get();
 
-if (runstate_is_running()) {
+if (runstate_is_live(oldstate)) {
+vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
 runstate_set(state);
 cpu_disable_ticks();
-pause_all_vcpus();
+if (oldstate == RUN_STATE_RUNNING) {
+pause_all_vcpus();
+}
 vm_state_notify(0, state);
 if (send_stop) {
 qapi_event_send_stop();
@@ -694,11 +698,13 @@ int vm_stop(RunState state)
 
 /**
  * Prepare for (re)starting the VM.
- * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
- * running or in case of an error condition), 0 otherwise.
+ * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
+ * and 1 otherwise.
  */
 int vm_prepare_start(bool step_pending)
 {
+int ret = vm_was_suspended ? 1 : 0;
+RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : 
RUN_STATE_RUNNING;
 RunState requested;
 
 qemu_vmstop_requested();
@@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending)
 qapi_event_send_resume();
 
 cpu_enable_ticks();
-runstate_set(RUN_STATE_RUNNING);
-vm_state_notify(1, RUN_STATE_RUNNING);
-return 0;
+runstate_set(state);
+vm_state_notify(1, state);
+vm_was_suspended = false;
+return ret;
 }
 
 void vm_start(void)
@@ -745,7 +752,7 @@ void vm_start(void)
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
 {
-if (runstate_is_running()) {
+if (runstate_is_live(runstate_get())) {
 return vm_stop(state);
 } else {
 int ret;
diff --git a/system/runstate.c b/system/runstate.c
index ea9d6c2..e2fa204 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { 

[PATCH 09/14] migration: preserve suspended for snapshot

2023-12-06 Thread Steve Sistare
Restoring a snapshot can break a suspended guest.  Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.

To save, the existing vm_stop call now completely stops the suspended
state.  Finish with vm_resume to leave the vm in the state it had prior
to the save, correctly restoring the suspended state.

To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load.  However, if the snapshot is suspended, restoration is
trickier.  First, call vm_resume to restore the state to suspended so the
current state matches the saved state.  Then, if the pre-load state is
running, call wakeup to resume running.

Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
suspended, but now it does, so allow these transitions.
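
In code, the load path described above amounts to roughly the following
(a sketch of the logic added below, using hmp_loadvm's names):

    RunState saved_state = runstate_get();

    vm_stop(RUN_STATE_RESTORE_VM);
    if (load_snapshot(name, NULL, false, NULL, &err)) {
        vm_resume(saved_state);        /* a suspended snapshot stays suspended */
        if (saved_state == RUN_STATE_RUNNING &&
            runstate_get() == RUN_STATE_SUSPENDED) {
            qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort);
        }
    }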

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/migration/snapshot.h   |  7 +++
 migration/migration-hmp-cmds.c |  8 +---
 migration/savevm.c | 23 +--
 system/runstate.c  |  5 +
 system/vl.c|  2 ++
 5 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index e72083b..9e4dcaa 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -16,6 +16,7 @@
 #define QEMU_MIGRATION_SNAPSHOT_H
 
 #include "qapi/qapi-builtin-types.h"
+#include "qapi/qapi-types-run-state.h"
 
 /**
  * save_snapshot: Save an internal snapshot.
@@ -61,4 +62,10 @@ bool delete_snapshot(const char *name,
 bool has_devices, strList *devices,
 Error **errp);
 
+/**
+ * load_snapshot_resume: Restore runstate after loading snapshot.
+ * @state: state to restore
+ */
+void load_snapshot_resume(RunState state);
+
 #endif
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 86ae832..c8d70bc 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
QDict *qdict)
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-int saved_vm_running  = runstate_is_running();
+RunState saved_state = runstate_get();
+
 const char *name = qdict_get_str(qdict, "name");
 Error *err = NULL;
 
 vm_stop(RUN_STATE_RESTORE_VM);
 
-if (load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) {
-vm_start();
+if (load_snapshot(name, NULL, false, NULL, &err)) {
+load_snapshot_resume(saved_state);
 }
+
 hmp_handle_error(mon, err);
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index eec5503..26866f0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 QEMUSnapshotInfo sn1, *sn = &sn1;
 int ret = -1, ret2;
 QEMUFile *f;
-int saved_vm_running;
+RunState saved_state = runstate_get();
 uint64_t vm_state_size;
 g_autoptr(GDateTime) now = g_date_time_new_now_local();
 AioContext *aio_context;
@@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 }
 aio_context = bdrv_get_aio_context(bs);
 
-saved_vm_running = runstate_is_running();
-
 global_state_store();
 vm_stop(RUN_STATE_SAVE_VM);
 
@@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 
 bdrv_drain_all_end();
 
-if (saved_vm_running) {
-vm_start();
-}
+vm_resume(saved_state);
 return ret == 0;
 }
 
@@ -3339,6 +3335,14 @@ err_drain:
 return false;
 }
 
+void load_snapshot_resume(RunState state)
+{
+vm_resume(state);
+if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) {
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort);
+}
+}
+
 bool delete_snapshot(const char *name, bool has_devices,
  strList *devices, Error **errp)
 {
@@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque)
 {
 Job *job = opaque;
 SnapshotJob *s = container_of(job, SnapshotJob, common);
-int orig_vm_running;
+RunState orig_state = runstate_get();
 
 job_progress_set_remaining(&s->common, 1);
 
-orig_vm_running = runstate_is_running();
 vm_stop(RUN_STATE_RESTORE_VM);
 
 s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
-if (s->ret && orig_vm_running) {
-vm_start();
+if (s->ret) {
+load_snapshot_resume(orig_state);
 }
 
 job_progress_update(&s->common, 1);
diff --git a/system/runstate.c b/system/runstate.c
index e2fa204..ca9eb54 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ 

[PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING

2023-12-06 Thread Steve Sistare
When a vm transitions from running to suspended, runstate notifiers are
not called, so the notifiers still think the vm is running.  Hence, when
we call vm_start to restore the suspended state, we call vm_state_notify
with running=1.  However, some notifiers check for RUN_STATE_RUNNING.
They must check the running boolean instead.

No functional change.
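
A minimal sketch of the pattern the affected handlers should follow (the
handler name is only for illustration):

    static void my_vm_state_change(void *opaque, bool running, RunState state)
    {
        if (running) {    /* not: state == RUN_STATE_RUNNING */
            /* resume device activity */
        }
    }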

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 backends/tpm/tpm_emulator.c | 2 +-
 hw/usb/hcd-ehci.c   | 2 +-
 hw/usb/redirect.c   | 2 +-
 hw/xen/xen-hvm-common.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index f7f1b4a..254fce7 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool 
running,
 
 trace_tpm_emulator_vm_state_change(running, state);
 
-if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) {
+if (!running || !tpm_emu->relock_storage) {
 return;
 }
 
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 19b4534..10c82ce 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool 
running, RunState state)
  * USB-devices which have async handled packages have a packet in the
  * ep queue to match the completion with.
  */
-if (state == RUN_STATE_RUNNING) {
+if (running) {
 ehci_advance_async_state(ehci);
 }
 
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index c9893df..3785bb0 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool 
running, RunState state)
 {
 USBRedirDevice *dev = priv;
 
-if (state == RUN_STATE_RUNNING && dev->parser != NULL) {
+if (running && dev->parser != NULL) {
 usbredirparser_do_write(dev->parser); /* Flush any pending writes */
 }
 }
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 565dc39..47e6cb1 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool 
running,
 
 xen_set_ioreq_server_state(xen_domid,
state->ioservid,
-   (rstate == RUN_STATE_RUNNING));
+   running);
 }
 
 void xen_exit_notifier(Notifier *n, void *data)
-- 
1.8.3.1




[PATCH 11/14] tests/qtest: migration events

2023-12-06 Thread Steve Sistare
Define a state object to capture events seen by migration tests, to allow
more events to be captured in a subsequent patch, and simplify event
checking in wait_for_migration_pass.  No functional change.
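
For illustration, a test wires up the combined watcher roughly like this
(assuming libqtest's qtest_qmp_set_event_callback, as used in
migration-test.c):

    static QTestMigrationState src_state;

    qtest_qmp_set_event_callback(from, migrate_watch_for_events, &src_state);
    /* ... later ... */
    wait_for_stop(from, &src_state);   /* waits only if STOP was not yet seen */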

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Daniel P. Berrangé 
---
 tests/qtest/migration-helpers.c | 24 ---
 tests/qtest/migration-helpers.h |  9 ++--
 tests/qtest/migration-test.c| 91 +++--
 3 files changed, 56 insertions(+), 68 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 24fb7b3..fd3b94e 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -24,26 +24,16 @@
  */
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque)
-{
-bool *seen = opaque;
-
-if (g_str_equal(name, "STOP")) {
-*seen = true;
-return true;
-}
-
-return false;
-}
-
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque)
 {
-bool *seen = opaque;
+QTestMigrationState *state = opaque;
 
-if (g_str_equal(name, "RESUME")) {
-*seen = true;
+if (g_str_equal(name, "STOP")) {
+state->stop_seen = true;
+return true;
+} else if (g_str_equal(name, "RESUME")) {
+state->resume_seen = true;
 return true;
 }
 
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index e31dc85..3d32699 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -15,9 +15,12 @@
 
 #include "libqtest.h"
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque);
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+typedef struct QTestMigrationState {
+bool stop_seen;
+bool resume_seen;
+} QTestMigrationState;
+
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque);
 
 G_GNUC_PRINTF(3, 4)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5752412..59fbbef 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -43,8 +43,8 @@
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
-static bool got_src_stop;
-static bool got_dst_resume;
+static QTestMigrationState src_state;
+static QTestMigrationState dst_state;
 
 /*
  * An initial 3 MB offset is used as that corresponds
@@ -230,6 +230,20 @@ static void wait_for_serial(const char *side)
 } while (true);
 }
 
+static void wait_for_stop(QTestState *who, QTestMigrationState *state)
+{
+if (!state->stop_seen) {
+qtest_qmp_eventwait(who, "STOP");
+}
+}
+
+static void wait_for_resume(QTestState *who, QTestMigrationState *state)
+{
+if (!state->resume_seen) {
+qtest_qmp_eventwait(who, "RESUME");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who)
 qobject_unref(rsp_return);
 }
 
+/*
+ * Wait for two changes in the migration pass count, but bail if we stop.
+ */
 static void wait_for_migration_pass(QTestState *who)
 {
-uint64_t initial_pass = get_migration_pass(who);
-uint64_t pass;
+uint64_t pass, prev_pass = 0, changes = 0;
 
-/* Wait for the 1st sync */
-while (!got_src_stop && !initial_pass) {
-usleep(1000);
-initial_pass = get_migration_pass(who);
-}
-
-do {
+while (changes < 2 && !src_state.stop_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
-} while (pass == initial_pass && !got_src_stop);
+changes += (pass != prev_pass);
+prev_pass = pass;
+}
 }
 
 static void check_guests_ram(QTestState *who)
@@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, 
QTestState *to)
 {
 qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
 
-if (!got_src_stop) {
-qtest_qmp_eventwait(from, "STOP");
-}
-
+wait_for_stop(from, &src_state);
 qtest_qmp_eventwait(to, "RESUME");
 }
 
@@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 }
 }
 
-got_src_stop = false;
-got_dst_resume = false;
+dst_state = (QTestMigrationState) { };
+src_state = (QTestMigrationState) { };
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
 
@@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 if (!args->only_target) {
 *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
 

[PATCH V7 09/12] tests/qtest: migration events

2023-12-06 Thread Steve Sistare
Define a state object to capture events seen by migration tests, to allow
more events to be captured in a subsequent patch, and simplify event
checking in wait_for_migration_pass.  No functional change.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Daniel P. Berrangé 
---
 tests/qtest/migration-helpers.c | 24 ---
 tests/qtest/migration-helpers.h |  9 ++--
 tests/qtest/migration-test.c| 91 +++--
 3 files changed, 56 insertions(+), 68 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 24fb7b3..fd3b94e 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -24,26 +24,16 @@
  */
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque)
-{
-bool *seen = opaque;
-
-if (g_str_equal(name, "STOP")) {
-*seen = true;
-return true;
-}
-
-return false;
-}
-
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque)
 {
-bool *seen = opaque;
+QTestMigrationState *state = opaque;
 
-if (g_str_equal(name, "RESUME")) {
-*seen = true;
+if (g_str_equal(name, "STOP")) {
+state->stop_seen = true;
+return true;
+} else if (g_str_equal(name, "RESUME")) {
+state->resume_seen = true;
 return true;
 }
 
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index e31dc85..3d32699 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -15,9 +15,12 @@
 
 #include "libqtest.h"
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque);
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+typedef struct QTestMigrationState {
+bool stop_seen;
+bool resume_seen;
+} QTestMigrationState;
+
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque);
 
 G_GNUC_PRINTF(3, 4)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0fbaa6a..05c0740 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -43,8 +43,8 @@
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
-static bool got_src_stop;
-static bool got_dst_resume;
+static QTestMigrationState src_state;
+static QTestMigrationState dst_state;
 
 /*
  * An initial 3 MB offset is used as that corresponds
@@ -230,6 +230,20 @@ static void wait_for_serial(const char *side)
 } while (true);
 }
 
+static void wait_for_stop(QTestState *who, QTestMigrationState *state)
+{
+if (!state->stop_seen) {
+qtest_qmp_eventwait(who, "STOP");
+}
+}
+
+static void wait_for_resume(QTestState *who, QTestMigrationState *state)
+{
+if (!state->resume_seen) {
+qtest_qmp_eventwait(who, "RESUME");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who)
 qobject_unref(rsp_return);
 }
 
+/*
+ * Wait for two changes in the migration pass count, but bail if we stop.
+ */
 static void wait_for_migration_pass(QTestState *who)
 {
-uint64_t initial_pass = get_migration_pass(who);
-uint64_t pass;
+uint64_t pass, prev_pass = 0, changes = 0;
 
-/* Wait for the 1st sync */
-while (!got_src_stop && !initial_pass) {
-usleep(1000);
-initial_pass = get_migration_pass(who);
-}
-
-do {
+while (changes < 2 && !src_state.stop_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
-} while (pass == initial_pass && !got_src_stop);
+changes += (pass != prev_pass);
+prev_pass = pass;
+}
 }
 
 static void check_guests_ram(QTestState *who)
@@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, 
QTestState *to)
 {
 qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
 
-if (!got_src_stop) {
-qtest_qmp_eventwait(from, "STOP");
-}
-
+wait_for_stop(from, &src_state);
 qtest_qmp_eventwait(to, "RESUME");
 }
 
@@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 }
 }
 
-got_src_stop = false;
-got_dst_resume = false;
+dst_state = (QTestMigrationState) { };
+src_state = (QTestMigrationState) { };
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
 
@@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 if (!args->only_target) {
 *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
 

[PATCH V7 05/12] migration: propagate suspended runstate

2023-12-06 Thread Steve Sistare
If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, reclaim
some space from the runstate member.
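
A sketch of the compatibility argument: the fields replacing the old
100-byte runstate buffer still total 100 bytes (32 + 1 + 1 + 66), so an
older QEMU reads the same bytes into its runstate[100] buffer and still
finds a NUL-terminated runstate string at the start. If desired, this
could be asserted at build time, e.g.:

    QEMU_BUILD_BUG_ON(sizeof(global_state.runstate) +
                      sizeof(global_state.has_vm_was_suspended) +
                      sizeof(global_state.vm_was_suspended) +
                      sizeof(global_state.unused) != 100);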

Signed-off-by: Steve Sistare 
---
 migration/global_state.c | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8..d4f61a1 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -22,7 +22,16 @@
 
 typedef struct {
 uint32_t size;
-uint8_t runstate[100];
+
+/*
+ * runstate was 100 bytes, zero padded, but we trimmed it to add a
+ * few fields and maintain backwards compatibility.
+ */
+uint8_t runstate[32];
+uint8_t has_vm_was_suspended;
+uint8_t vm_was_suspended;
+uint8_t unused[66];
+
 RunState state;
 bool received;
 } GlobalState;
@@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
 assert(strlen(state_str) < sizeof(global_state.runstate));
 strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
   state_str, '\0');
+global_state.has_vm_was_suspended = true;
+global_state.vm_was_suspended = vm_get_suspended();
+
+memset(global_state.unused, 0, sizeof(global_state.unused));
 }
 
 void global_state_store(void)
@@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
 return true;
 }
 
+/* If the suspended state must be remembered, it is needed */
+
+if (vm_get_suspended()) {
+return true;
+}
+
 /* If state is running or paused, it is not needed */
 
 if (strcmp(runstate, "running") == 0 ||
@@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int 
version_id)
 Error *local_err = NULL;
 int r;
 char *runstate = (char *)s->runstate;
+bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended;
 
 s->received = true;
 trace_migrate_global_state_post_load(runstate);
@@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int 
version_id)
 sizeof(s->runstate)) == sizeof(s->runstate)) {
 /*
  * This condition should never happen during migration, because
- * all runstate names are shorter than 100 bytes (the size of
+ * all runstate names are shorter than 32 bytes (the size of
  * s->runstate). However, a malicious stream could overflow
  * the qapi_enum_parse() call, so we force the last character
  * to a NUL byte.
@@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int 
version_id)
 }
 s->state = r;
 
+/*
+ * global_state is saved on the outgoing side before forcing a stopped
+ * state, so it may have saved state=suspended and vm_was_suspended=0.
+ * Now we are in a paused state, and when we later call vm_start, it must
+ * restore the suspended state, so we must set vm_was_suspended=1 here.
+ */
+vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);
+
 return 0;
 }
 
@@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = {
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(size, GlobalState),
 VMSTATE_BUFFER(runstate, GlobalState),
+VMSTATE_UINT8(has_vm_was_suspended, GlobalState),
+VMSTATE_UINT8(vm_was_suspended, GlobalState),
+VMSTATE_BUFFER(unused, GlobalState),
 VMSTATE_END_OF_LIST()
 },
 };
-- 
1.8.3.1




[PATCH V7 00/12] fix migration of suspended runstate

2023-12-06 Thread Steve Sistare
Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
after saving a snapshot in the suspended state and loading it, the vm_start
fails.  The runstate is RUNNING, but the guest is not.

See the commit messages for the details.

Changes in V2:
  * simplify "start on wakeup request"
  * fix postcopy, snapshot, and background migration
  * refactor fixes for each type of migration
  * explicitly handled suspended events and runstate in tests
  * add test for postcopy and background migration

Changes in V3:
  * rebase to tip
  * fix hang in new function migrate_wait_for_dirty_mem

Changes in V4:
  * rebase to tip
  * add patch for vm_prepare_start (thanks Peter)
  * add patch to preserve cpu ticks

Changes in V5:
  * rebase to tip
  * added patches to completely stop vm in suspended state:
  cpus: refactor vm_stop
  cpus: stop vm in suspended state
  * added patch to partially resume vm in suspended state:
  cpus: start vm in suspended state
  * modified "preserve suspended ..." patches to use the above.
  * deleted patch "preserve cpu ticks if suspended".  stop ticks in
vm_stop_force_state instead.
  * deleted patch "add runstate function".  defined new helper function
migrate_new_runstate in "preserve suspended runstate"
  * Added some RB's, but removed other RB's because the patches changed.

Changes in V6:
  * all vm_stop calls completely stop the suspended state
  * refactored and updated the "cpus" patches
  * simplified the "preserve suspended" patches
  * added patch "bootfile per vm"

Changes in V7:
  * rebase to tip, add RB-s
  * fix backwards compatibility for global_state.vm_was_suspended
  * delete vm_prepare_start state argument, and rename patch
"pass runstate to vm_prepare_start" to
"check running not RUN_STATE_RUNNING"
  * drop patches:
  tests/qtest: bootfile per vm
  tests/qtest: background migration with suspend
  * rename runstate_is_started to runstate_is_live
  * move wait_for_suspend in tests

Steve Sistare (12):
  cpus: vm_was_suspended
  cpus: stop vm in suspended runstate
  cpus: check running not RUN_STATE_RUNNING
  cpus: vm_resume
  migration: propagate suspended runstate
  migration: preserve suspended runstate
  migration: preserve suspended for snapshot
  migration: preserve suspended for bg_migration
  tests/qtest: migration events
  tests/qtest: option to suspend during migration
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

 backends/tpm/tpm_emulator.c  |   2 +-
 hw/usb/hcd-ehci.c|   2 +-
 hw/usb/redirect.c|   2 +-
 hw/xen/xen-hvm-common.c  |   2 +-
 include/migration/snapshot.h |   7 ++
 include/sysemu/runstate.h|  16 
 migration/global_state.c |  35 ++-
 migration/migration-hmp-cmds.c   |   8 +-
 migration/migration.c|  15 +--
 migration/savevm.c   |  23 +++--
 qapi/misc.json   |  10 +-
 system/cpus.c|  47 +++--
 system/runstate.c|   9 ++
 system/vl.c  |   2 +
 tests/migration/i386/Makefile|   5 +-
 tests/migration/i386/a-b-bootblock.S |  50 +-
 tests/migration/i386/a-b-bootblock.h |  26 +++--
 tests/qtest/migration-helpers.c  |  27 ++
 tests/qtest/migration-helpers.h  |  11 ++-
 tests/qtest/migration-test.c | 181 +--
 20 files changed, 354 insertions(+), 126 deletions(-)

-- 
1.8.3.1




[PATCH 05/14] cpus: check running not RUN_STATE_RUNNING

2023-12-06 Thread Steve Sistare
When a vm transitions from running to suspended, runstate notifiers are
not called, so the notifiers still think the vm is running.  Hence, when
we call vm_start to restore the suspended state, we call vm_state_notify
with running=1.  However, some notifiers check for RUN_STATE_RUNNING.
They must check the running boolean instead.

No functional change.

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 backends/tpm/tpm_emulator.c | 2 +-
 hw/usb/hcd-ehci.c   | 2 +-
 hw/usb/redirect.c   | 2 +-
 hw/xen/xen-hvm-common.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index f7f1b4a..254fce7 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool 
running,
 
 trace_tpm_emulator_vm_state_change(running, state);
 
-if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) {
+if (!running || !tpm_emu->relock_storage) {
 return;
 }
 
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 19b4534..10c82ce 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool 
running, RunState state)
  * USB-devices which have async handled packages have a packet in the
  * ep queue to match the completion with.
  */
-if (state == RUN_STATE_RUNNING) {
+if (running) {
 ehci_advance_async_state(ehci);
 }
 
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index c9893df..3785bb0 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool 
running, RunState state)
 {
 USBRedirDevice *dev = priv;
 
-if (state == RUN_STATE_RUNNING && dev->parser != NULL) {
+if (running && dev->parser != NULL) {
 usbredirparser_do_write(dev->parser); /* Flush any pending writes */
 }
 }
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 565dc39..47e6cb1 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool 
running,
 
 xen_set_ioreq_server_state(xen_domid,
state->ioservid,
-   (rstate == RUN_STATE_RUNNING));
+   running);
 }
 
 void xen_exit_notifier(Notifier *n, void *data)
-- 
1.8.3.1




[PATCH V7 04/12] cpus: vm_resume

2023-12-06 Thread Steve Sistare
Define the vm_resume helper, for use in subsequent patches.
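
A minimal sketch of the intended usage in later patches of this series:

    RunState saved_state = runstate_get();

    vm_stop(RUN_STATE_SAVE_VM);
    /* ... save the vmstate ... */
    vm_resume(saved_state);    /* restarts vCPUs only if saved_state was live */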

Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 include/sysemu/runstate.h | 9 +
 system/cpus.c | 9 +
 2 files changed, 18 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 867e53c..315046b 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -53,6 +53,15 @@ void vm_start(void);
  * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
  */
 int vm_prepare_start(bool step_pending);
+
+/**
+ * vm_resume: If @state is a live state, start the vm and set the state,
+ * else just set the state.
+ *
+ * @state: the state to restore
+ */
+void vm_resume(RunState state);
+
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/system/cpus.c b/system/cpus.c
index f162435..7d2c28b 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -748,6 +748,15 @@ void vm_start(void)
 }
 }
 
+void vm_resume(RunState state)
+{
+if (runstate_is_live(state)) {
+vm_start();
+} else {
+runstate_set(state);
+}
+}
+
 /* does a state transition even if the VM is already stopped,
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
-- 
1.8.3.1




Re: [PATCH v4 2/6] xen: backends: don't overwrite XenStore nodes created by toolstack

2023-12-06 Thread Anthony PERARD
On Sat, Dec 02, 2023 at 01:41:21AM +, Volodymyr Babchuk wrote:
> Xen PV devices in QEMU can be created in two ways: either by QEMU
> itself, if they were passed via command line, or by Xen toolstack. In
> the latter case, QEMU scans XenStore entries and configures devices
> accordingly.
> 
> In the second case we don't want QEMU to write/delete front-end
> entries for two reasons: it might have no access to those entries if
> it is running in an unprivileged domain, and it is just incorrect to
> overwrite entries already provided by Xen toolstack, because toolstack
> manages those nodes. For example, it might read backend- or frontend-
> state to be sure that they are both disconnected and it is safe to
> destroy a domain.
> 
> This patch checks the presence of xendev->backend to determine whether a
> Xen PV device was configured by the Xen toolstack, and hence whether QEMU
> should touch frontend entries in XenStore. Likewise, XenStore entries are
> removed during device teardown only if they weren't created by the Xen
> toolstack. If they were created by the toolstack, then it is the
> toolstack's job to do proper clean-up.
> 
> Suggested-by: Paul Durrant 
> Suggested-by: David Woodhouse 
> Co-Authored-by: Oleksandr Tyshchenko 
> Signed-off-by: Volodymyr Babchuk 
> Reviewed-by: David Woodhouse 
> 

Hi Volodymyr,

There's something wrong with this patch. The block backend doesn't work
when creating a guest via libxl, an x86 hvm guest with qdisk.

Error from guest kernel:
"2 reading backend fields at /local/domain/0/backend/qdisk/23/768"

It seems that "sector-size" is missing for the disk.

Thanks,

-- 
Anthony PERARD



Re: [PATCH] meson: sort C warning flags alphabetically

2023-12-06 Thread Philippe Mathieu-Daudé

On 6/12/23 17:44, Daniel P. Berrangé wrote:

When scanning the list of warning flags to see if one is present, it is
helpful if they are in alphabetical order. It is further helpful to
separate out the 'no-' prefixed warnings.


I was sure this was already done eh...

Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Daniel P. Berrangé 
---

The diff looks horrendous, so look at the resulting meson.build to see
the benefit.

  meson.build | 40 +---
  1 file changed, 21 insertions(+), 19 deletions(-)




Re: [PATCH V7 3/8] hw/acpi: Update ACPI GED framework to support vCPU Hotplug

2023-12-06 Thread Zhao Liu
Hi Salil,

On Mon, Nov 13, 2023 at 08:12:31PM +, Salil Mehta via wrote:
> Date: Mon, 13 Nov 2023 20:12:31 +
> From: Salil Mehta via 
> Subject: [PATCH V7 3/8] hw/acpi: Update ACPI GED framework to support vCPU
>  Hotplug
> X-Mailer: git-send-email 2.8.3
> 

[snip]

> @@ -400,6 +411,12 @@ static void acpi_ged_initfn(Object *obj)
>  memory_region_init_io(&ged_st->regs, obj, &ged_regs_ops, ged_st,
>TYPE_ACPI_GED "-regs", ACPI_GED_REG_COUNT);
>  sysbus_init_mmio(sbd, &ged_st->regs);
> +
> +memory_region_init(&s->container_cpuhp, OBJECT(dev), "cpuhp container",
> +   ACPI_CPU_HOTPLUG_REG_LEN);
> +sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->container_cpuhp);
> +cpu_hotplug_hw_init(&s->container_cpuhp, OBJECT(dev),
> +&s->cpuhp_state, 0);

I find this cpu_hotplug_hw_init() can still cause qtest errors on x86
platforms as you mentioned in v6: 
https://lore.kernel.org/qemu-devel/15e70616-6abb-63a4-17d0-820f4a254...@opnsrc.net/T/#m108f102b2fe92b7dd7218f2f942f7b233a9d6af3

IIUC, microvm machine has its own 'possible_cpus_arch_ids' and that is
inherited from its parent x86 machine.

The above error occurs because device-introspect-test uses "-machine none":

# starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3094820.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-3094820.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -audio none -nodefaults -machine none 
-accel qtest

So what about just checking mc->possible_cpu_arch_ids instead of an
assert in cpu_hotplug_hw_init()?

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 4b24a2500361..303f1f1f57bc 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -221,7 +221,10 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
 const CPUArchIdList *id_list;
 int i;

-assert(mc->possible_cpu_arch_ids);
+if (!mc->possible_cpu_arch_ids) {
+return;
+}
+
 id_list = mc->possible_cpu_arch_ids(machine);
 state->dev_count = id_list->len;
 state->devs = g_new0(typeof(*state->devs), state->dev_count);

This check seems to be acceptable in the general code path? Not all machines
have possible_cpu_arch_ids, after all.

Thanks,
Zhao

>  }
>  
>  static void acpi_ged_class_init(ObjectClass *class, void *data)
> diff --git a/include/hw/acpi/generic_event_device.h 
> b/include/hw/acpi/generic_event_device.h
> index ba84ce0214..90fc41cbb8 100644
> --- a/include/hw/acpi/generic_event_device.h
> +++ b/include/hw/acpi/generic_event_device.h
> @@ -60,6 +60,7 @@
>  #define HW_ACPI_GENERIC_EVENT_DEVICE_H
>  
>  #include "hw/sysbus.h"
> +#include "hw/acpi/cpu_hotplug.h"
>  #include "hw/acpi/memory_hotplug.h"
>  #include "hw/acpi/ghes.h"
>  #include "qom/object.h"
> @@ -95,6 +96,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
>  #define ACPI_GED_MEM_HOTPLUG_EVT   0x1
>  #define ACPI_GED_PWR_DOWN_EVT  0x2
>  #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
> +#define ACPI_GED_CPU_HOTPLUG_EVT0x8
>  
>  typedef struct GEDState {
>  MemoryRegion evt;
> @@ -106,6 +108,8 @@ struct AcpiGedState {
>  SysBusDevice parent_obj;
>  MemHotplugState memhp_state;
>  MemoryRegion container_memhp;
> +CPUHotplugState cpuhp_state;
> +MemoryRegion container_cpuhp;
>  GEDState ged_state;
>  uint32_t ged_event_bitmap;
>  qemu_irq irq;
> -- 
> 2.34.1
> 
> 



Re: [PATCH 09/11] replay: stop us hanging in rr_wait_io_event

2023-12-06 Thread Richard Henderson

On 12/5/23 12:41, Alex Bennée wrote:

A lot of the hangs I see are when we end up spinning in
rr_wait_io_event for an event that will never come in playback. Add a
new check function which can see if we are in PLAY mode and kick us
out of the wait function so the event can be processed.

This fixes most of the failures in replay_kernel.py

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2013
Signed-off-by: Alex Bennée 
Cc: Pavel Dovgalyuk 
---
  include/sysemu/replay.h  |  5 +
  accel/tcg/tcg-accel-ops-rr.c |  2 +-
  replay/replay.c  | 24 
  3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 08aae5869f..83995ae4bd 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -70,6 +70,11 @@ int replay_get_instructions(void);
  /*! Updates instructions counter in replay mode. */
  void replay_account_executed_instructions(void);
  
+/**

+ * replay_can_wait: check if we should pause for wait-io
+ */
+bool replay_can_wait(void);
+
  /* Processing clocks and other time sources */
  
  /*! Save the specified clock */

diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 611932f3c3..825e35b3dc 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
  {
  CPUState *cpu;
  
-while (all_cpu_threads_idle()) {

+while (all_cpu_threads_idle() && replay_can_wait()) {
  rr_stop_kick_timer();
  qemu_cond_wait_iothread(first_cpu->halt_cond);
  }
diff --git a/replay/replay.c b/replay/replay.c
index e83c01285c..042a6a9636 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -347,6 +347,30 @@ void replay_start(void)
  replay_enable_events();
  }
  
+/*

+ * For none/record the answer is yes.
+ */
+bool replay_can_wait(void)
+{
+if (replay_mode == REPLAY_MODE_PLAY) {
+/*
+ * For playback we shouldn't ever be at a point we wait. If
+ * the instruction count has reached zero and we have an
+ * unconsumed event we should go around again and consume it.
+ */
+if (replay_state.instruction_count == 0 && 
replay_state.has_unread_data) {
+return false;
+} else {
+fprintf(stderr, "Error: Invalid replay state\n");
+fprintf(stderr,"instruction_count = %d, has = %d, event_kind = 
%d\n",
+replay_state.instruction_count, 
replay_state.has_unread_data, replay_state.data_kind);
+abort();


error_report.
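
i.e. something along these lines (a sketch of the suggestion, untested):

    error_report("Invalid replay state while waiting for an I/O event");
    error_report("instruction_count = %d, has_unread_data = %d, event_kind = %u",
                 replay_state.instruction_count,
                 replay_state.has_unread_data,
                 replay_state.data_kind);
    abort();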


r~


+}
+}
+return true;
+}
+
+
  void replay_finish(void)
  {
  if (replay_mode == REPLAY_MODE_NONE) {





Re: [PATCH 08/11] replay: introduce a central report point for sync errors

2023-12-06 Thread Richard Henderson

On 12/5/23 12:41, Alex Bennée wrote:

Figuring out why replay has failed is tricky at the best of times.
Let's centralise the reporting of a replay sync error and add a little
bit of extra information to help with debugging.

Signed-off-by: Alex Bennée 
---
  replay/replay-internal.h | 12 
  replay/replay-char.c |  6 ++
  replay/replay-internal.c |  1 +
  replay/replay.c  |  9 +
  4 files changed, 24 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH 08/11] replay: introduce a central report point for sync errors

2023-12-06 Thread Richard Henderson

On 12/6/23 03:35, Philippe Mathieu-Daudé wrote:

Hi Alex,

On 5/12/23 21:41, Alex Bennée wrote:

Figuring out why replay has failed is tricky at the best of times.
Let's centralise the reporting of a replay sync error and add a little
bit of extra information to help with debugging.

Signed-off-by: Alex Bennée 
---
  replay/replay-internal.h | 12 
  replay/replay-char.c |  6 ++
  replay/replay-internal.c |  1 +
  replay/replay.c  |  9 +
  4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 1bc8fd5086..709e2eb4cb 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -74,6 +74,7 @@ enum ReplayEvents {
   * @cached_clock: Cached clocks values
   * @current_icount: number of processed instructions
   * @instruction_count: number of instructions until next event
+ * @current_event: current event index
   * @data_kind: current event
   * @has_unread_data: true if event not yet processed
   * @file_offset: offset into replay log at replay snapshot
@@ -84,6 +85,7 @@ typedef struct ReplayState {
  int64_t cached_clock[REPLAY_CLOCK_COUNT];
  uint64_t current_icount;
  int instruction_count;
+    unsigned int current_event;
  unsigned int data_kind;
  bool has_unread_data;
  uint64_t file_offset;

Shouldn't this field be migrated?


No, it's for diagnostic use only.


r~




[PATCH] meson: sort C warning flags alphabetically

2023-12-06 Thread Daniel P . Berrangé
When scanning the list of warning flags to see if one is present, it is
helpful if they are in alphabetical order. It is further helpful to
separate out the 'no-' prefixed warnings.

Signed-off-by: Daniel P. Berrangé 
---

The diff looks horrendous, so look at the resulting meson.build to see
the benefit.

 meson.build | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/meson.build b/meson.build
index d2c4c2adb3..5c884cf7ce 100644
--- a/meson.build
+++ b/meson.build
@@ -433,36 +433,38 @@ endif
 add_global_arguments(qemu_common_flags, native: false, language: all_languages)
 add_global_link_arguments(qemu_ldflags, native: false, language: all_languages)
 
-# Collect warnings that we want to enable
-
+# Collect warning flags we want to set, sorted alphabetically
 warn_flags = [
-  '-Wundef',
-  '-Wwrite-strings',
-  '-Wmissing-prototypes',
-  '-Wstrict-prototypes',
-  '-Wredundant-decls',
-  '-Wold-style-declaration',
-  '-Wold-style-definition',
-  '-Wtype-limits',
-  '-Wformat-security',
-  '-Wformat-y2k',
-  '-Winit-self',
-  '-Wignored-qualifiers',
+  # First enable interesting warnings
   '-Wempty-body',
-  '-Wnested-externs',
   '-Wendif-labels',
   '-Wexpansion-to-defined',
+  '-Wformat-security',
+  '-Wformat-y2k',
+  '-Wignored-qualifiers',
   '-Wimplicit-fallthrough=2',
+  '-Winit-self',
   '-Wmissing-format-attribute',
+  '-Wmissing-prototypes',
+  '-Wnested-externs',
+  '-Wold-style-declaration',
+  '-Wold-style-definition',
+  '-Wredundant-decls',
+  '-Wshadow=local',
+  '-Wstrict-prototypes',
+  '-Wtype-limits',
+  '-Wundef',
+  '-Wwrite-strings',
+
+  # Then disable some undesirable warnings
+  '-Wno-gnu-variable-sized-type-not-at-end',
   '-Wno-initializer-overrides',
   '-Wno-missing-include-dirs',
+  '-Wno-psabi',
   '-Wno-shift-negative-value',
   '-Wno-string-plus-int',
-  '-Wno-typedef-redefinition',
   '-Wno-tautological-type-limit-compare',
-  '-Wno-psabi',
-  '-Wno-gnu-variable-sized-type-not-at-end',
-  '-Wshadow=local',
+  '-Wno-typedef-redefinition',
 ]
 
 if targetos != 'darwin'
-- 
2.43.0




Re: [PATCH 07/11] replay: make has_unread_data a bool

2023-12-06 Thread Richard Henderson

On 12/5/23 12:41, Alex Bennée wrote:

For clarity given it only has two states.

Signed-off-by: Alex Bennée 
---
  replay/replay-internal.h | 4 ++--
  replay/replay-internal.c | 4 ++--
  replay/replay-snapshot.c | 6 +++---
  replay/replay.c  | 2 +-
  4 files changed, 8 insertions(+), 8 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching

2023-12-06 Thread Madhavan T. Venkataraman



On 11/30/23 18:45, Edgecombe, Rick P wrote:
> On Wed, 2023-11-29 at 15:07 -0600, Madhavan T. Venkataraman wrote:
>> Threat Model
>> 
>>
>> In the threat model in Heki, the attacker is a user space attacker
>> who exploits
>> a kernel vulnerability to gain more privileges or bypass the kernel's
>> access
>> control and self-protection mechanisms. 
>>
>> In the context of the guest page table, one of the things that the
>> threat model translates
>> to is a hacker gaining access to a guest page with RWX permissions.
>> E.g., by adding execute
>> permissions to a writable page or by adding write permissions to an
>> executable page.
>>
>> Today, the permissions for a guest page in the extended page table
>> are RWX by
>> default. So, if a hacker manages to establish RWX for a page in the
>> guest page
>> table, then that is all he needs to do some damage.
> 
> I had a few random comments from watching the plumbers talk online:
> 
> Is there really a big difference between a page that is RWX, and a RW
> page that is about to become RX? I realize that there is an addition of
> timing, but when executable code is getting loaded it can be written to
> then and later executed. I think that gap could be addressed in two
> different ways, both pretty difficult:
>  1. Verifying the loaded code before it gets marked 
> executable. This is difficult because the kernel does lots of 
> tweaks on the code it is loading (alternatives, etc). It can't 
> just check a signature.
>  2. Loading the code in a protected environment. In this model the 
> (for example) module signature would be checked, then the code 
> would be loaded in some sort of protected environment. This way 
> integrity of the loaded code would be enforced. But extracting 
> module loading into a separate domain would be difficult. 
> Various scattered features all have their hands in the loading.
> 
> Secondly, I wonder if another way to look at the memory parts of HEKI
> could be that this is a way to protect certain page table bits from
> stray writes. The RWX bits in the EPT are not directly writable, so more
> steps are needed to change things than just a stray write (instead the
> helpers involved in the operations need to be called). If that is a
> fair way of looking at it, then I wonder how HEKI compares to a
> solution like this security-wise:
> https://lore.kernel.org/lkml/20210830235927.6443-1-rick.p.edgeco...@intel.com/
> 
> Functional-wise it had the benefit of working on bare metal and
> supporting the normal kernel features.

Thanks for the comments. I will think about what you have said and will respond
soon.

Madhavan



[v2 0/4] Support generic Luks encryption

2023-12-06 Thread Hyman Huang
v2:
- Simplify the design by reusing the LUKS driver to implement
  the generic LUKS encryption; thanks to Daniel for the insightful
  advice.
- rebase on master. 

This functionality was motivated by the following to-do list seen
in crypto documents:
https://wiki.qemu.org/Features/Block/Crypto 

The last chapter says we should support a "separate header volume": 

The LUKS format has ability to store the header in a separate volume
from the payload. We should extend the LUKS driver in QEMU to support
this use case.

By enhancing the LUKS driver, it is possible to enable
the detachable LUKS header and, as a result, achieve
general encryption for any disk format that QEMU
supports.

Take qcow2 as an example; the usage of the generic
LUKS encryption is as follows:

1. add a protocol blockdev node of data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node of LUKS header as above.
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. add the secret for decrypting the cipher stored in LUKS
   header above
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-storage-secret0", "data":"abc123"}}'

4. add the qcow2-driven blockdev format node
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. add the luks-driven blockdev to link the qcow2 disk with
   the LUKS header by specifying the field "header"
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-format-secret0"}}'

6. add the virtio-blk device finally
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a virtual machine (VM) with generic LUKS encryption
is somewhat similar to hot-plug: both use the same JSON
arguments, except that when starting the VM the
"blockdev-add"/"device_add" commands become the
"blockdev"/"device" command-line parameters.

Please review, thanks

Best regards,

Yong

Hyman Huang (4):
  crypto: Introduce option and structure for detached LUKS header
  crypto: Introduce payload offset set function
  crypto: Support generic LUKS encryption
  block: Support detached LUKS header creation for blockdev-create

 block/crypto.c | 47 --
 crypto/block.c |  4 
 include/crypto/block.h |  1 +
 qapi/block-core.json   | 11 --
 4 files changed, 59 insertions(+), 4 deletions(-)

-- 
2.39.1




[v2 2/4] crypto: Introduce payload offset set function

2023-12-06 Thread Hyman Huang
Signed-off-by: Hyman Huang 
---
 crypto/block.c | 4 
 include/crypto/block.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/crypto/block.c b/crypto/block.c
index 7bb4b74a37..3dcf22a69f 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -319,6 +319,10 @@ QCryptoHashAlgorithm 
qcrypto_block_get_kdf_hash(QCryptoBlock *block)
 return block->kdfhash;
 }
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset)
+{
+block->payload_offset = offset;
+}
 
 uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block)
 {
diff --git a/include/crypto/block.h b/include/crypto/block.h
index 4f63a37872..b47a90c529 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -312,4 +312,5 @@ void qcrypto_block_free(QCryptoBlock *block);
 
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free)
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset);
 #endif /* QCRYPTO_BLOCK_H */
-- 
2.39.1




[v2 3/4] crypto: Support generic LUKS encryption

2023-12-06 Thread Hyman Huang
By enhancing the LUKS driver, it is possible to enable
the detachable LUKS header and, as a result, achieve
general encryption for any disk format that QEMU
supports.

Take qcow2 as an example; the usage of the generic
LUKS encryption is as follows:

1. add a protocol blockdev node of data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node of LUKS header as above.
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. add the secret for decrypting the cipher stored in LUKS
   header above
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-storage-secret0", "data":"abc123"}}'

4. add the qcow2-driven blockdev format node
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. add the luks-driven blockdev to link the qcow2 disk with
   the LUKS header by specifying the field "header"
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-format-secret0"}}'

6. add the virtio-blk device finally
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a virtual machine (VM) with generic LUKS encryption
is somewhat similar to hot-plug: both use the same JSON
arguments, except that when starting the VM the
"blockdev-add"/"device_add" commands become the
"blockdev"/"device" command-line parameters.

Signed-off-by: Hyman Huang 
---
 block/crypto.c | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index f82b13d32b..7d70349463 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -40,6 +40,7 @@ struct BlockCrypto {
 QCryptoBlock *block;
 bool updating_keys;
 BdrvChild *header;  /* Reference to the detached LUKS header */
+bool detached_mode; /* If true, LUKS plays a detached header role */
 };
 
 
@@ -64,12 +65,16 @@ static int block_crypto_read_func(QCryptoBlock *block,
   Error **errp)
 {
 BlockDriverState *bs = opaque;
+BlockCrypto *crypto = bs->opaque;
 ssize_t ret;
 
 GLOBAL_STATE_CODE();
 GRAPH_RDLOCK_GUARD_MAINLOOP();
 
-ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
+if (crypto->detached_mode)
+ret = bdrv_pread(crypto->header, offset, buflen, buf, 0);
+else
+ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not read encryption header");
 return ret;
@@ -269,6 +274,8 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
 QCryptoBlockOpenOptions *open_opts = NULL;
 unsigned int cflags = 0;
 QDict *cryptoopts = NULL;
+const char *header_bdref =
+qdict_get_try_str(options, "header");
 
 GLOBAL_STATE_CODE();
 
@@ -277,6 +284,16 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
 return ret;
 }
 
+if (header_bdref) {
+crypto->detached_mode = true;
+crypto->header = bdrv_open_child(NULL, options, "header", bs,
+ &child_of_bds, BDRV_CHILD_METADATA,
+ false, errp);
+if (!crypto->header) {
+return -EINVAL;
+}
+}
+
 GRAPH_RDLOCK_GUARD_MAINLOOP();
 
 bs->supported_write_flags = BDRV_REQ_FUA &
@@ -312,6 +329,14 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
 goto cleanup;
 }
 
+if (crypto->detached_mode) {
+/*
+ * Set payload offset to zero as the file bdref has no LUKS
+ * header under detached mode.
+ */
+qcrypto_block_set_payload_offset(crypto->block, 0);
+}
+
 bs->encrypted = true;
 
 ret = 0;
@@ -903,6 +928,17 @@ block_crypto_child_perms(BlockDriverState *bs, BdrvChild 
*c,
 
 BlockCrypto *crypto = bs->opaque;
 
+if (role == (role & BDRV_CHILD_METADATA)) {
+/* Assign read permission only */
+perm |= BLK_PERM_CONSISTENT_READ;
+/* Share all permissions */
+shared |= BLK_PERM_ALL;
+
+*nperm = perm;
+*nshared = shared;
+return;
+}
+
 bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, 
nshared);
 
 /*
-- 
2.39.1




[v2 4/4] block: Support detached LUKS header creation for blockdev-create

2023-12-06 Thread Hyman Huang
Provide the "detached-mode" option for detached LUKS header
formatting.

To format the LUKS header on a pre-created disk, an example
is as follows:

1. add a protocol blockdev node of LUKS header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/cipher.gluks" }}'

2. add the secret for encrypting the cipher stored in LUKS
   header above
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type": "secret", "id":
> "libvirt-1-storage-secret0", "data": "abc123"}}'

3. format the disk node
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job0", "options":{"driver":"luks",
> "size":0, "file":"libvirt-1-storage", "detached-mode":true,
> "cipher-alg":"aes-256",
> "key-secret":"libvirt-3-storage-encryption-secret0"}}}'

Signed-off-by: Hyman Huang 
---
 block/crypto.c   | 8 +++-
 qapi/block-core.json | 5 -
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 7d70349463..e77c49bd0c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -667,10 +667,12 @@ block_crypto_co_create_luks(BlockdevCreateOptions 
*create_options, Error **errp)
 BlockDriverState *bs = NULL;
 QCryptoBlockCreateOptions create_opts;
 PreallocMode preallocation = PREALLOC_MODE_OFF;
+int64_t size;
 int ret;
 
 assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
 luks_opts = &create_options->u.luks;
+size = luks_opts->size;
 
 bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
 if (bs == NULL) {
@@ -686,7 +688,11 @@ block_crypto_co_create_luks(BlockdevCreateOptions 
*create_options, Error **errp)
 preallocation = luks_opts->preallocation;
 }
 
-ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
+if (luks_opts->detached_mode) {
+size = 0;
+}
+
+ret = block_crypto_co_create_generic(bs, size, &create_opts,
  preallocation, errp);
 if (ret < 0) {
 goto fail;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 10be08d08f..1e7a7e1b05 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4952,13 +4952,16 @@
 # @preallocation: Preallocation mode for the new image (since: 4.2)
 # (default: off; allowed values: off, metadata, falloc, full)
 #
+# @detached-mode: create a detached LUKS header. (since 9.0)
+#
 # Since: 2.12
 ##
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
   'data': { 'file': 'BlockdevRef',
 'size': 'size',
-'*preallocation':   'PreallocMode' } }
+'*preallocation':   'PreallocMode',
+'*detached-mode':   'bool'}}
 
 ##
 # @BlockdevCreateOptionsNfs:
-- 
2.39.1




[v2 1/4] crypto: Introduce option and structure for detached LUKS header

2023-12-06 Thread Hyman Huang
Add the "header" option for the LUKS format. This field would be
used to identify the blockdev's position where a detachable LUKS
header is stored.

In addition, introduce header field in struct BlockCrypto

Signed-off-by: Hyman Huang 
---
 block/crypto.c   | 1 +
 qapi/block-core.json | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 921933a5e5..f82b13d32b 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto;
 struct BlockCrypto {
 QCryptoBlock *block;
 bool updating_keys;
+BdrvChild *header;  /* Reference to the detached LUKS header */
 };
 
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..10be08d08f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3352,11 +3352,15 @@
 # decryption key (since 2.6). Mandatory except when doing a
 # metadata-only probe of the image.
 #
+# @header: optional reference to the location of a blockdev
+# storing a detached LUKS header. (since 9.0)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsLUKS',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { '*key-secret': 'str' } }
+  'data': { '*key-secret': 'str',
+'*header': 'BlockdevRef'} }
 
 ##
 # @BlockdevOptionsGenericCOWFormat:
-- 
2.39.1




Re: [PATCH 05/11] replay: remove host_clock_last

2023-12-06 Thread Richard Henderson

On 12/5/23 12:41, Alex Bennée wrote:

Fixes: a02fe2ca70 (replay: Remove host_clock_last)
Signed-off-by: Alex Bennée 
---
  replay/replay-internal.h | 2 --
  1 file changed, 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 04/11] scripts/replay_dump: track total number of instructions

2023-12-06 Thread Richard Henderson

On 12/5/23 12:40, Alex Bennée wrote:

This will help in tracking where we are in the stream when debugging.

Signed-off-by: Alex Bennée 
---
  scripts/replay-dump.py | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching

2023-12-06 Thread Madhavan T. Venkataraman



On 11/30/23 05:33, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote:
> 
>> Kernel Lockdown
>> ---
>>
>> But, we must provide at least some security in V2. Otherwise, it is useless.
>>
>> So, we have implemented what we call a kernel lockdown. At the end of kernel
>> boot, Heki establishes permissions in the extended page table as mentioned
>> before. Also, it adds an immutable attribute for kernel text and kernel RO 
>> data.
>> Beyond that point, guest requests that attempt to modify permissions on any 
>> of
>> the immutable pages will be denied.
>>
>> This means that features like FTrace and KProbes will not work on kernel text
>> in V2. This is a temporary limitation. Once authentication is in place, the
>> limitation will go away.
> 
> So either you're saying your patch 17 / text_poke is broken (so why
> include it ?!?) or your statement above is incorrect. Pick one.
> 

It has been included so that people can be aware of the changes.

I will remove the text_poke() changes from the patchset and send it later when
I have some authentication in place. It will make sense then.

> 
>> __text_poke()
>>  This function is called by various features to patch text.
>>  This calls heki_text_poke_start() and heki_text_poke_end().
>>
>>  heki_text_poke_start() is called to add write permissions to the
>>  extended page table so that text can be patched. heki_text_poke_end()
>>  is called to revert write permissions in the extended page table.
> 
> This, if text_poke works, then static_call / jump_label / ftrace and
> everything else should work, they all rely on this.
> 
> 
>> Peter mentioned the following:
>>
>> "if you want to mirror the native PTEs why don't you hook into the
>> paravirt page-table muck and get all that for free?"
>>
>> We did consider using a shadow page table kind of approach so that guest 
>> page table
>> modifications can be intercepted and reflected in the page table entry. We 
>> did not
>> do this for two reasons:
>>
>> - there are bits in the page table entry that are not permission bits. We 
>> would like
>>   the guest kernel to be able to modify them directly.
> 
> This statement makes no sense.
> 
>> - we cannot tell a genuine request from an attack.
> 
> Why not? How is an explicit call different from an explicit call in a
> paravirt hook?
> 
>>From a maintenance pov we already hate paravirt with a passion, but it
> is ever so much better than sprinkling yet another pile of crap
> (heki_*) around.

I only said that the idea was considered.

We can resume the discussion on this topic when I send the text_poke() changes 
in a later
version of the Heki patchset.

Madhavan
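
A minimal C sketch of the start/patch/end ordering described in the quoted
RFC above; the heki_* prototypes, their arguments, and the memcpy-based
patching are assumptions for illustration only, not the actual Heki code:

#include <stddef.h>
#include <string.h>

/* Assumed hooks: open and later close a temporary write window for the
 * patched range in the extended page table (names/signatures assumed). */
static void heki_text_poke_start(void *addr, size_t len) { (void)addr; (void)len; }
static void heki_text_poke_end(void *addr, size_t len)   { (void)addr; (void)len; }

static void patch_text_sketch(void *addr, const void *opcode, size_t len)
{
    heki_text_poke_start(addr, len);   /* grant temporary write permission */
    memcpy(addr, opcode, len);         /* write the new instruction bytes */
    heki_text_poke_end(addr, len);     /* revert to read/execute-only */
}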



Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-06 Thread Michal Suchánek
On Wed, Dec 06, 2023 at 03:49:28PM +, Miguel Luis wrote:
> 
> 
> > On 6 Dec 2023, at 14:25, Michal Suchánek  wrote:
> > 
> > On Wed, Dec 06, 2023 at 01:17:08PM -0100, Miguel Luis wrote:
> >> Hi!
> >> 
> >> On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote:
> >>> Unplugging vCPU triggers the following assertion in
> >>> tcg_register_thread():
> >>> 
> >>> 796 void tcg_register_thread(void)
> >>> 797 {
> >>> ...
> >>> 812 /* Claim an entry in tcg_ctxs */
>>> 813 n = qatomic_fetch_inc(&tcg_cur_ctxs);
> >>> 814 g_assert(n < tcg_max_ctxs);
> >>> 
> >>> Implement and use tcg_unregister_thread() so when a
> >>> vCPU is unplugged, the tcg_cur_ctxs refcount is
> >>> decremented.
> >> 
> >> 
> >> I've had addressed this issue before (posted at [1]) and had exercised
> >> it with vCPU hotplug/unplug patches for ARM although I was not sure about 
> >> what
> >> would be needed to be done regarding plugins on the context of
> >> tcg_unregister_thread. I guess they would need to be freed also?
> > 
> > Doesn't it have the same problem that it will randomly free some context
> > which is not necessarily associated with the unplugged CPU?
> > 
> > Consider machine with 4 CPUs, they are likely added in order - cpu0
> > getting context0, cpu1 context1, etc.
> > 
> Unplug CPU 1. Given that context 3 is on top, that would be the one
> unallocated by the decrement, or am I missing something?
> > 
> 
I think you’re right and I share the same opinion that matching a tcg thread
> to a vCPU would be handy to solve this and maybe sorting tcg_ctxs after
> unregistering the thread.

Tried to apply the patch. It does not crash right away, and due to virsh
limitation I get only one (8-thread) core to hotplug so it did survive a few
hotplug cycles. After a while of hotplugging it crashed, anyway.

Given the atomic_dec there is probably no expectation that the
processing is sequential.

Thanks

Michal



Re: [PATCH 03/11] scripts/replay-dump: update to latest format

2023-12-06 Thread Richard Henderson

On 12/5/23 12:40, Alex Bennée wrote:

@@ -268,6 +279,49 @@ def decode_clock(eid, name, dumpfile):
Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
  ]
  
+# Shutdown cause added

+v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),


This comment applied to the v7 changes.

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH 1/4] scsi: only access SCSIDevice->requests from one thread

2023-12-06 Thread Stefan Hajnoczi
On Tue, 5 Dec 2023 at 05:01, Kevin Wolf  wrote:
>
> Am 04.12.2023 um 17:30 hat Stefan Hajnoczi geschrieben:
> > On Fri, Dec 01, 2023 at 05:03:13PM +0100, Kevin Wolf wrote:
> > > Am 23.11.2023 um 20:49 hat Stefan Hajnoczi geschrieben:
> > > > Stop depending on the AioContext lock and instead access
> > > > SCSIDevice->requests from only one thread at a time:
> > > > - When the VM is running only the BlockBackend's AioContext may access
> > > >   the requests list.
> > > > - When the VM is stopped only the main loop may access the requests
> > > >   list.
> > > >
> > > > These constraints protect the requests list without the need for locking
> > > > in the I/O code path.
> > > >
> > > > Note that multiple IOThreads are not supported yet because the code
> > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > that as future work.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi 
> > > > ---
> > > >  include/hw/scsi/scsi.h |   7 +-
> > > >  hw/scsi/scsi-bus.c | 174 -
> > > >  2 files changed, 124 insertions(+), 57 deletions(-)
> > > >
> > > > diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> > > > index 3692ca82f3..10c4e8288d 100644
> > > > --- a/include/hw/scsi/scsi.h
> > > > +++ b/include/hw/scsi/scsi.h
> > > > @@ -69,14 +69,19 @@ struct SCSIDevice
> > > >  {
> > > >  DeviceState qdev;
> > > >  VMChangeStateEntry *vmsentry;
> > > > -QEMUBH *bh;
> > > >  uint32_t id;
> > > >  BlockConf conf;
> > > >  SCSISense unit_attention;
> > > >  bool sense_is_ua;
> > > >  uint8_t sense[SCSI_SENSE_BUF_SIZE];
> > > >  uint32_t sense_len;
> > > > +
> > > > +/*
> > > > + * The requests list is only accessed from the AioContext that 
> > > > executes
> > > > + * requests or from the main loop when IOThread processing is 
> > > > stopped.
> > > > + */
> > > >  QTAILQ_HEAD(, SCSIRequest) requests;
> > > > +
> > > >  uint32_t channel;
> > > >  uint32_t lun;
> > > >  int blocksize;
> > > > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> > > > index fc4b77fdb0..b8bfde9565 100644
> > > > --- a/hw/scsi/scsi-bus.c
> > > > +++ b/hw/scsi/scsi-bus.c
> > > > @@ -85,6 +85,82 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int 
> > > > channel, int id, int lun)
> > > >  return d;
> > > >  }
> > > >
> > > > +/*
> > > > + * Invoke @fn() for each enqueued request in device @s. Must be called 
> > > > from the
> > > > + * main loop thread while the guest is stopped. This is only suitable 
> > > > for
> > > > + * vmstate ->put(), use scsi_device_for_each_req_async() for other 
> > > > cases.
> > > > + */
> > > > +static void scsi_device_for_each_req_sync(SCSIDevice *s,
> > > > +  void (*fn)(SCSIRequest *, 
> > > > void *),
> > > > +  void *opaque)
> > > > +{
> > > > +SCSIRequest *req;
> > > > +SCSIRequest *next_req;
> > > > +
> > > > +assert(!runstate_is_running());
> > > > +assert(qemu_in_main_thread());
> > > > +
> > > > +QTAILQ_FOREACH_SAFE(req, &s->requests, next, next_req) {
> > > > +fn(req, opaque);
> > > > +}
> > > > +}
> > > > +
> > > > +typedef struct {
> > > > +SCSIDevice *s;
> > > > +void (*fn)(SCSIRequest *, void *);
> > > > +void *fn_opaque;
> > > > +} SCSIDeviceForEachReqAsyncData;
> > > > +
> > > > +static void scsi_device_for_each_req_async_bh(void *opaque)
> > > > +{
> > > > +g_autofree SCSIDeviceForEachReqAsyncData *data = opaque;
> > > > +SCSIDevice *s = data->s;
> > > > +SCSIRequest *req;
> > > > +SCSIRequest *next;
> > > > +
> > > > +/*
> > > > + * It is unlikely that the AioContext will change before this BH 
> > > > is called,
> > > > + * but if it happens then ->requests must not be accessed from this
> > > > + * AioContext.
> > > > + */
> > >
> > > What is the scenario where this happens? I would have expected that
> > > switching the AioContext of a node involves draining the node first,
> > > which would execute this BH before the context changes.
> >
> > I don't think aio_poll() is invoked by bdrv_drained_begin() when there
> > are no requests in flight. In that case the BH could remain pending
> > across bdrv_drained_begin()/bdrv_drained_end().
>
> Hm, I wonder if that is actually a bug. Without executing pending BHs,
> you can't be sure that nothing touches the node any more.
>
> Before commit 5e8ac217, we always polled at least once, though I think
> that was an unintentional side effect.

It makes the programming model easier and safer if aio_bh_poll() is
guaranteed to be called by bdrv_drained_begin().

Then I could convert this conditional into an assertion and assume it
never happens.

> > > The other option I see is an empty BlockBackend, which can change its
> > > AioContext without polling BHs, but in that case there is no connection
> > > to other users, so the only change could come from 

Re: [PATCH 02/11] tests/avocado: fix typo in replay_linux

2023-12-06 Thread Richard Henderson

On 12/5/23 12:40, Alex Bennée wrote:

Signed-off-by: Alex Bennée 
---
  tests/avocado/replay_linux.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test

2023-12-06 Thread Richard Henderson

On 12/5/23 12:40, Alex Bennée wrote:

There are a number of bugs against 32-bit x86 on the tracker. Let's at
least establish a baseline that a pure kernel boot can do record/replay
before we start looking at the devices.

Signed-off-by: Alex Bennée 
---
  tests/avocado/replay_kernel.py | 16 
  1 file changed, 16 insertions(+)


Acked-by: Richard Henderson 


r~



Re: [PATCH qemu 1/2] hw/arm: Add minimal support for the STM32L4x5 SoC

2023-12-06 Thread Alex Bennée
Alistair Francis  writes:

> On Mon, Nov 27, 2023 at 12:44 AM ~inesvarhol  wrote:
>>
>> From: Inès Varhol 
>>
>> This patch adds a new STM32L4x5 SoC; it is needed to add support for
>> the B-L475E-IOT01A board.
>> The implementation is derived from the STM32F405 SoC.
>> The implementation contains no peripherals, only memory regions are
>> implemented.
>>
>> Reviewed-by: Philippe Mathieu-Daudé 
>>
>> Signed-off-by: Arnaud Minier 
>> Signed-off-by: Inès Varhol 
>> ---
>>  MAINTAINERS|   8 +
>>  hw/arm/Kconfig |   5 +
>>  hw/arm/meson.build |   1 +
>>  hw/arm/stm32l4x5_soc.c | 277 +
>>  include/hw/arm/stm32l4x5_soc.h |  68 
>>  5 files changed, 359 insertions(+)
>>  create mode 100644 hw/arm/stm32l4x5_soc.c
>>  create mode 100644 include/hw/arm/stm32l4x5_soc.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index ff1238bb98..32458d41dd 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -1122,6 +1122,14 @@ L: qemu-...@nongnu.org
>>  S: Maintained
>>  F: hw/arm/olimex-stm32-h405.c
>>
>> +STM32L4x5 SoC Family
>> +M: Arnaud Minier 
>> +M: Inès Varhol 
>> +L: qemu-...@nongnu.org
>> +S: Maintained
>> +F: hw/arm/stm32l4x5_soc.c
>> +F: include/hw/arm/stm32l4x5_soc.h
>> +
>>  SmartFusion2
>>  M: Subbaraya Sundeep 
>>  M: Peter Maydell 
>> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
>> index 3ada335a24..d2b94d9a47 100644
>> --- a/hw/arm/Kconfig
>> +++ b/hw/arm/Kconfig
>> @@ -448,6 +448,11 @@ config STM32F405_SOC
>>  select STM32F4XX_SYSCFG
>>  select STM32F4XX_EXTI
>>
>> +config STM32L4X5_SOC
>> +bool
>> +select ARM_V7M
>> +select OR_IRQ
>> +
>>  config XLNX_ZYNQMP_ARM
>>  bool
>>  default y if PIXMAN
>> diff --git a/hw/arm/meson.build b/hw/arm/meson.build
>> index 68245d3ad1..9766da10c4 100644
>> --- a/hw/arm/meson.build
>> +++ b/hw/arm/meson.build
>> @@ -42,6 +42,7 @@ arm_ss.add(when: 'CONFIG_RASPI', if_true: 
>> files('bcm2836.c', 'raspi.c'))
>>  arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c'))
>>  arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c'))
>>  arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c'))
>> +arm_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_soc.c'))
>>  arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', 
>> 'xlnx-zcu102.c'))
>>  arm_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal.c', 
>> 'xlnx-versal-virt.c'))
>>  arm_ss.add(when: 'CONFIG_FSL_IMX25', if_true: files('fsl-imx25.c', 
>> 'imx25_pdk.c'))
>> diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
>> new file mode 100644
>> index 00..f476878b2c
>> --- /dev/null
>> +++ b/hw/arm/stm32l4x5_soc.c
>> @@ -0,0 +1,277 @@
>> +/*
>> + * STM32L4x5 SoC family
>> + *
>> + * SPDX-License-Identifier: MIT
>
> I'm pretty sure this must be GPL to be accepted

Does it? A quick grep of the code shows we have quite a lot of hw
emulation files that are MIT licensed. Although IANAL, MIT is very
permissive and easily combined with GPL as long as the final product is
under GPL.

I think there is some LGPL code about for TCG but I'm not sure how well
tagged that is.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH v3 for-8.2] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

2023-12-06 Thread Michael Roth
Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
exposed a long-running bug in current KVM support for SEV-ES where the
kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
kernel, in which case EFER write traps would result in KVM eventually
seeing MSR_EFER_LMA get set and recording it in such a way that it would
be subsequently visible when accessing it via KVM_GET_SREGS/etc.

However, guest kernels currently rely on MSR_EFER_LMA getting set
automatically when MSR_EFER_LME is set and paging is enabled via
CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
MSR_EFER_LMA bit, even though it is set internally, and when QEMU
subsequently tries to pass this EFER value back to KVM via
KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
which is now considered fatal due to the aforementioned QEMU commit.

This can be addressed by inferring the MSR_EFER_LMA bit being set when
paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
the expected bits are all present in subsequent handling on the host
side.

Ultimately, this handling will be implemented in the host kernel, but to
avoid breaking QEMU's SEV-ES support when using older host kernels, the
same handling can be done in QEMU just after fetching the register
values via KVM_GET_SREGS*. Implement that here.

Cc: Paolo Bonzini 
Cc: Marcelo Tosatti 
Cc: Tom Lendacky 
Cc: Akihiko Odaki 
Cc: Philippe Mathieu-Daudé 
Cc: Lara Lazier 
Cc: Vitaly Kuznetsov 
Cc: Maxim Levitsky 
Cc: k...@vger.kernel.org
Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
Signed-off-by: Michael Roth 
---
 target/i386/kvm/kvm.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 11b8177eff..4ce80555b4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3643,6 +3643,10 @@ static int kvm_get_sregs(X86CPU *cpu)
 env->cr[4] = sregs.cr4;
 
 env->efer = sregs.efer;
+if (sev_es_enabled() && env->efer & MSR_EFER_LME &&
+env->cr[0] & CR0_PG_MASK) {
+env->efer |= MSR_EFER_LMA;
+}
 
 /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */
 x86_update_hflags(env);
@@ -3682,6 +3686,10 @@ static int kvm_get_sregs2(X86CPU *cpu)
 env->cr[4] = sregs.cr4;
 
 env->efer = sregs.efer;
+if (sev_es_enabled() && env->efer & MSR_EFER_LME &&
+env->cr[0] & CR0_PG_MASK) {
+env->efer |= MSR_EFER_LMA;
+}
 
 env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
 
-- 
2.25.1
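
As a side note, the inference described above (and simplified further in the
v2 review later in this digest) boils down to one rule: LMA follows from LME
plus CR0.PG. A hedged, self-contained sketch of that rule, with the
architectural bit positions spelled out locally rather than taken from QEMU
headers; this is not the code that was merged:

#include <stdint.h>

/* x86 architectural bit positions: EFER.LME = bit 8, EFER.LMA = bit 10,
 * CR0.PG = bit 31. */
#define EFER_LME (1ULL << 8)
#define EFER_LMA (1ULL << 10)
#define CR0_PG   (1ULL << 31)

/* EFER.LMA is implied whenever long mode is enabled and paging is on. */
static inline uint64_t synthesize_efer_lma(uint64_t efer, uint64_t cr0)
{
    if ((efer & EFER_LME) && (cr0 & CR0_PG)) {
        efer |= EFER_LMA;
    }
    return efer;
}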




Re: [PATCH qemu v3 02/20] Fixing the basic functionality of STM32 timers

2023-12-06 Thread Alex Bennée
~lbryndza  writes:

> From: Lucjan Bryndza 
>
> The current implementation of timers does not work properly
> even in basic functionality. A counter configured to report
> an interrupt every 10ms reports the first interrupts after a
> few seconds.  There are also no properly implemented count-up and
> count-down modes. This commit fixes bugs with interrupt
> reporting and implements the basic modes of the counter's
> time-base block.
>
> Remove wrong qemu timer implementation

I suspect this breaks bisectability of the series. Each point in the
series should still be able to compile and at least function as well as
it did before. So in this case I think this patch needs to be merged
with the patch that brings in the replacement functionality.

>
> Signed-off-by: Lucjan Bryndza 
> ---
>  hw/timer/stm32f2xx_timer.c | 55 --
>  1 file changed, 5 insertions(+), 50 deletions(-)
>
> diff --git a/hw/timer/stm32f2xx_timer.c b/hw/timer/stm32f2xx_timer.c
> index ba8694dcd3..f03f594a17 100644
> --- a/hw/timer/stm32f2xx_timer.c
> +++ b/hw/timer/stm32f2xx_timer.c
> @@ -23,12 +23,17 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qapi/error.h"
>  #include "hw/irq.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/timer/stm32f2xx_timer.h"
>  #include "migration/vmstate.h"
>  #include "qemu/log.h"
>  #include "qemu/module.h"
> +#include "qemu/typedefs.h"
> +#include "qemu/timer.h"
> +#include "qemu/main-loop.h"
> +#include "sysemu/dma.h"

Seems odd to increase the includes needed when the rest of the patch
just deletes code.



-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-06 Thread Miguel Luis


> On 6 Dec 2023, at 14:25, Michal Suchánek  wrote:
> 
> On Wed, Dec 06, 2023 at 01:17:08PM -0100, Miguel Luis wrote:
>> Hi!
>> 
>> On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote:
>>> Unplugging vCPU triggers the following assertion in
>>> tcg_register_thread():
>>> 
>>> 796 void tcg_register_thread(void)
>>> 797 {
>>> ...
>>> 812 /* Claim an entry in tcg_ctxs */
>>> 813 n = qatomic_fetch_inc(&tcg_cur_ctxs);
>>> 814 g_assert(n < tcg_max_ctxs);
>>> 
>>> Implement and use tcg_unregister_thread() so when a
>>> vCPU is unplugged, the tcg_cur_ctxs refcount is
>>> decremented.
>> 
>> 
>> I've had addressed this issue before (posted at [1]) and had exercised
>> it with vCPU hotplug/unplug patches for ARM although I was not sure about 
>> what
>> would be needed to be done regarding plugins on the context of
>> tcg_unregister_thread. I guess they would need to be freed also?
> 
> Doesn't it have the same problem that it will randomly free some context
> which is not necessarily associated with the unplugged CPU?
> 
> Consider machine with 4 CPUs, they are likely added in order - cpu0
> getting context0, cpu1 context1, etc.
> 
> Unplug CPU 1. Given that context 3 is on top, that would be the one
> unallocated by the decrement, or am I missing something?
> 

I think you’re right and I share the same opinion that matching a tcg thread
to a vCPU would be handy to solve this and maybe sorting tcg_ctxs after
unregistering the thread.

Thank you

Miguel

> Thanks
> 
> Michal
> 
>> 
>> Thank you
>> 
>> Miguel
>> 
>> [1]: 
>> https://lore.kernel.org/qemu-devel/20230926103654.34424-5-salil.me...@huawei.com/



Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-06 Thread Alex Bennée
Miguel Luis  writes:

> Hi!
>
> On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote:
>> Unplugging vCPU triggers the following assertion in
>> tcg_register_thread():
>>
>>  796 void tcg_register_thread(void)
>>  797 {
>>  ...
>>  812 /* Claim an entry in tcg_ctxs */
>>  813 n = qatomic_fetch_inc(_cur_ctxs);
>>  814 g_assert(n < tcg_max_ctxs);
>>
>> Implement and use tcg_unregister_thread() so when a
>> vCPU is unplugged, the tcg_cur_ctxs refcount is
>> decremented.
>
>
> I've had addressed this issue before (posted at [1]) and had exercised
> it with vCPU hotplug/unplug patches for ARM although I was not sure about what
> would be needed to be done regarding plugins on the context of
> tcg_unregister_thread. I guess they would need to be freed also?

Good catch. They should indeed.

>
>
>> Reported-by: Michal Suchánek 
>> Suggested-by: Stefan Hajnoczi 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---

>>  void tcg_register_thread(void)
>>  {
>>  TCGContext *s = g_malloc(sizeof(*s));
>> @@ -814,6 +821,16 @@ void tcg_register_thread(void)
>>  
>>  tcg_ctx = s;
>>  }
>> +
>> +void tcg_unregister_thread(void)
>> +{
>> +unsigned int n;
>> +
>> +n = qatomic_fetch_dec(&tcg_cur_ctxs);
>> +g_free(tcg_ctxs[n]);

Perhaps a bit of re-factoring and we could have a tcg_alloc_context and
tcg_free_context to keep everything together?

>
>
> Is the above off-by-one?
>
>
>> +qatomic_set(&tcg_ctxs[n], NULL);
>> +}
>> +
>
> Thank you
>
> Miguel
>
> [1]: 
> https://lore.kernel.org/qemu-devel/20230926103654.34424-5-salil.me...@huawei.com/
>
>
>>  #endif /* !CONFIG_USER_ONLY */
>>  
>>  /* pool based memory allocation */

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
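
A rough, hedged sketch of the slot-based bookkeeping floated in this thread
(a free-slot bitmap plus alloc/free helpers, so an unplugged vCPU releases
exactly its own slot rather than the topmost one); all names, the mutex, and
the fixed slot count are assumptions, not QEMU code:

#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CTXS 64                        /* assumed upper bound (cf. tcg_max_ctxs) */

typedef struct CtxSketch { int unused; } CtxSketch;

static CtxSketch *ctx_slots[MAX_CTXS];
static uint64_t used_bitmap;               /* bit n set => slot n is in use */
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim the lowest free slot; the caller (the vCPU thread) remembers 'n'. */
static int ctx_slot_alloc(CtxSketch **out)
{
    pthread_mutex_lock(&slots_lock);
    for (int n = 0; n < MAX_CTXS; n++) {
        if (!(used_bitmap & (UINT64_C(1) << n))) {
            used_bitmap |= UINT64_C(1) << n;
            ctx_slots[n] = calloc(1, sizeof(*ctx_slots[n]));
            *out = ctx_slots[n];
            pthread_mutex_unlock(&slots_lock);
            return n;
        }
    }
    pthread_mutex_unlock(&slots_lock);
    return -1;                             /* no free slot left */
}

/* Release exactly the slot the unplugged vCPU owned, not the last-added one. */
static void ctx_slot_free(int n)
{
    pthread_mutex_lock(&slots_lock);
    assert(used_bitmap & (UINT64_C(1) << n));
    free(ctx_slots[n]);
    ctx_slots[n] = NULL;
    used_bitmap &= ~(UINT64_C(1) << n);
    pthread_mutex_unlock(&slots_lock);
}

Re-using a still-allocated slot on re-plug, as also suggested in this thread,
would simply drop the calloc/free pair.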



Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-06 Thread Michal Suchánek
On Wed, Dec 06, 2023 at 01:17:08PM -0100, Miguel Luis wrote:
> Hi!
> 
> On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote:
> > Unplugging vCPU triggers the following assertion in
> > tcg_register_thread():
> >
> >  796 void tcg_register_thread(void)
> >  797 {
> >  ...
> >  812 /* Claim an entry in tcg_ctxs */
> >  813 n = qatomic_fetch_inc(&tcg_cur_ctxs);
> >  814 g_assert(n < tcg_max_ctxs);
> >
> > Implement and use tcg_unregister_thread() so when a
> > vCPU is unplugged, the tcg_cur_ctxs refcount is
> > decremented.
> 
> 
> I've had addressed this issue before (posted at [1]) and had exercised
> it with vCPU hotplug/unplug patches for ARM although I was not sure about what
> would be needed to be done regarding plugins on the context of
> tcg_unregister_thread. I guess they would need to be freed also?

Doesn't it have the same problem that it will randomly free some context
which is not necessarily associated with the unplugged CPU?

Consider machine with 4 CPUs, they are likely added in order - cpu0
getting context0, cpu1 context1, etc.

Unplug CPU 1. Given that context 3 is on top, that would be the one
unallocated by the decrement, or am I missing something?

Thanks

Michal

> 
> Thank you
> 
> Miguel
> 
> [1]: 
> https://lore.kernel.org/qemu-devel/20230926103654.34424-5-salil.me...@huawei.com/



Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

2023-12-06 Thread Michael Roth
On Wed, Dec 06, 2023 at 04:04:43PM +0100, Paolo Bonzini wrote:
> On Wed, Dec 6, 2023 at 3:46 PM Michael Roth  wrote:
> > > There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is
> > > simply EFER.LME && CR0.PG.
> >
> > Yah, I originally had it like that, but svm_set_cr0() in the kernel only
> > sets it in vcpu->arch.efer when setting CR0.PG, so I thought it might
> > be safer to be more selective and mirror that handling on the QEMU side
> > so we can leave as much of any other sanity checks on kernel/QEMU side
> > intact as possible. E.g., if some other bug in the kernel ends up
> > unsetting EFER.LMA while paging is still enabled, we'd still notice that
> > when passing it back in via KVM_SET_SREGS*.
> >
> > But agree it's simpler to just always set it based on CR0.PG and EFER.LMA
> > and can send a v3 if that's preferred.
> 
> Yeah, in this case I think the chance of something breaking is really,
> really small.
> 
> The behavior of svm_set_cr0() is more due to how the surrounding code
> looks like, than anything else.

Ok, seems reasonable. I'll plan to send a v3 with the old_cr0 stuff
dropped.

> 
> > > Alternatively, sev_es_enabled() could be an assertion, that is:
> > >
> > > if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) &&
> > >!(env->efer & MSR_EFER_LMA)) {
> > > /* Workaround for... */
> > > assert(sev_es_enabled());
> > > env->efer |= MSR_EFER_LMA;
> > > }
> > >
> > > What do you think?
> >
> > I'm a little apprehensive about this approach for similar reasons as
> > above
> 
> I agree on this. I think it's worth in general to have clear
> expectations, though. If you think it's worrisome, we can commit it
> without assertion now and add it in 9.0.

I think that seems like a good approach. That would give us more time to
discuss the fix/handling on the kernel side, and then as a follow-up we
can tighten down the QEMU handling/expectations based on that.

Thanks,

Mike

> 
> Paolo
> 
> 



Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-06 Thread Stefan Hajnoczi
On Wed, 6 Dec 2023 at 09:29, Philippe Mathieu-Daudé  wrote:
>
> Hi Stefan,
>
> On 6/12/23 13:56, Michal Suchánek wrote:
> > On Tue, Dec 05, 2023 at 11:09:59AM +0100, Michal Suchánek wrote:
> >> On Mon, Dec 04, 2023 at 03:02:45PM -0800, Richard Henderson wrote:
> >>> On 12/4/23 12:09, Michal Suchánek wrote:
>  On Mon, Dec 04, 2023 at 02:50:17PM -0500, Stefan Hajnoczi wrote:
> > On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé  
> > wrote:
> >> +void tcg_unregister_thread(void)
> >> +{
> >> +unsigned int n;
> >> +
> >> +n = qatomic_fetch_dec(&tcg_cur_ctxs);
> >> +g_free(tcg_ctxs[n]);
> >> +qatomic_set(&tcg_ctxs[n], NULL);
> >> +}
> >
> > tcg_ctxs[n] may not be our context, so this looks like it could free
> > another thread's context and lead to undefined behavior.
> >>>
> >>> Correct.
> >>>
>  There is cpu->thread_id so perhaps cpu->thread_ctx could be added as
>  well. That would require a bitmap of used threads contexts rather than a
>  counter, though.
> >>>
> >>> Or don't free the context at all, but re-use it when incrementing and
> >>> tcg_ctxs[n] != null (i.e. plugging in a repacement vcpu).  After all, 
> >>> there
> >>> can only be tcg_max_ctxs contexts.
> >>
> >> But you would not know which contexts are in use and which aren't without
> >> tracking the association of contexts to CPUs.
> >>
> >> Unless there is a cpu array somewhere and you can use the same index for
> >> both to create the association.
> >
> > I tried to use cpu_index for correlating the tcg_ctx with cpu. I added
> > some asserts that only null contexts are allocated and non-null contexts
> > released but qemu crashes somewhere in tcg sometime after the guest gets
> > to switch root.
>
> Since this isn't trivial and is a long standing issue, let's not
> block the 8.2 release with it.

Sounds good.

Thanks,
Stefan



Re: [PATCH] target/i386: Add new CPU model SierraForest

2023-12-06 Thread Zhao Liu
On Wed, Dec 06, 2023 at 09:19:23PM +0800, Tao Su wrote:
> Date: Wed,  6 Dec 2023 21:19:23 +0800
> From: Tao Su 
> Subject: [PATCH] target/i386: Add new CPU model SierraForest
> X-Mailer: git-send-email 2.34.1
> 
> SierraForest is Intel's first generation E-core based Xeon server
> processor, which will be released in the first half of 2024.
> 
> SierraForest mainly adds the following new features based on
> GraniteRapids:
> 
> - CMPCCXADD CPUID.(EAX=7,ECX=1):EAX[bit 7]
> - AVX-IFMA CPUID.(EAX=7,ECX=1):EAX[bit 23]
> - AVX-VNNI-INT8 CPUID.(EAX=7,ECX=1):EDX[bit 4]
> - AVX-NE-CONVERT CPUID.(EAX=7,ECX=1):EDX[bit 5]
> - LAM CPUID.(EAX=7,ECX=1):EAX[bit 26]
> - LASS CPUID.(EAX=7,ECX=1):EAX[bit 6]
> 
> and removes the following features based on GraniteRapids:
> 
> - HLE CPUID.(EAX=7,ECX=0):EBX[bit 4]
> - RTM CPUID.(EAX=7,ECX=0):EBX[bit 11]
> - AVX512F CPUID.(EAX=7,ECX=0):EBX[bit 16]
> - AVX512DQ CPUID.(EAX=7,ECX=0):EBX[bit 17]
> - AVX512_IFMA CPUID.(EAX=7,ECX=0):EBX[bit 21]
> - AVX512CD CPUID.(EAX=7,ECX=0):EBX[bit 28]
> - AVX512BW CPUID.(EAX=7,ECX=0):EBX[bit 30]
> - AVX512VL CPUID.(EAX=7,ECX=0):EBX[bit 31]
> - AVX512_VBMI CPUID.(EAX=7,ECX=0):ECX[bit 1]
> - AVX512_VBMI2 CPUID.(EAX=7,ECX=0):ECX[bit 6]
> - AVX512_VNNI CPUID.(EAX=7,ECX=0):ECX[bit 11]
> - AVX512_BITALG CPUID.(EAX=7,ECX=0):ECX[bit 12]
> - AVX512_VPOPCNTDQ CPUID.(EAX=7,ECX=0):ECX[bit 14]
> - LA57 CPUID.(EAX=7,ECX=0):ECX[bit 16]
> - TSXLDTRK CPUID.(EAX=7,ECX=0):EDX[bit 16]
> - AMX-BF16 CPUID.(EAX=7,ECX=0):EDX[bit 22]
> - AVX512_FP16 CPUID.(EAX=7,ECX=0):EDX[bit 23]
> - AMX-TILE CPUID.(EAX=7,ECX=0):EDX[bit 24]
> - AMX-INT8 CPUID.(EAX=7,ECX=0):EDX[bit 25]
> - AVX512_BF16 CPUID.(EAX=7,ECX=1):EAX[bit 5]
> - fast zero-length MOVSB CPUID.(EAX=7,ECX=1):EAX[bit 10]
> - fast short CMPSB, SCASB CPUID.(EAX=7,ECX=1):EAX[bit 12]
> - AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21]
> - PREFETCHI CPUID.(EAX=7,ECX=1):EDX[bit 14]
> - XFD CPUID.(EAX=0xD,ECX=1):EAX[bit 4]
> - EPT_PAGE_WALK_LENGTH_5 VMX_EPT_VPID_CAP(0x48c)[bit 7]
> 
> SierraForest doesn't support TSX, so TSX Async Abort (TAA) vulnerabilities
> don't exist on SierraForest. On the KVM side, if the host doesn't enumerate
> RTM or RTM gets disabled, ARCH_CAP_TAA_NO is reported as unsupported. To
> avoid the confusing warning:
> warning: host doesn't support requested feature: MSR(10AH).taa-no
>  [bit 8]
> 
> just don't include TAA_NO in SierraForest CPU model.
> 
> Currently LAM and LASS are not enabled in KVM mainline yet; they will be
> added after those changes are merged.
> 
> Signed-off-by: Tao Su 
> ---

Reviewed-by: Zhao Liu 
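
For reference, the new leaf-7/subleaf-1 bits listed in the commit message can
be probed from userspace; a small hedged sketch using GCC's <cpuid.h> (the
bit positions come from the list above, everything else is illustrative):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.(EAX=7,ECX=1), as referenced in the feature list above. */
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 7 not supported");
        return 1;
    }
    printf("CMPCCXADD      : %s\n", (eax & (1u << 7))  ? "yes" : "no");
    printf("AVX-IFMA       : %s\n", (eax & (1u << 23)) ? "yes" : "no");
    printf("AVX-VNNI-INT8  : %s\n", (edx & (1u << 4))  ? "yes" : "no");
    printf("AVX-NE-CONVERT : %s\n", (edx & (1u << 5))  ? "yes" : "no");
    return 0;
}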




Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

2023-12-06 Thread Paolo Bonzini
On Wed, Dec 6, 2023 at 3:46 PM Michael Roth  wrote:
> > There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is
> > simply EFER.LME && CR0.PG.
>
> Yah, I originally had it like that, but svm_set_cr0() in the kernel only
> sets it in vcpu->arch.efer when setting CR0.PG, so I thought it might
> be safer to be more selective and mirror that handling on the QEMU side
> so we can leave as much of any other sanity checks on kernel/QEMU side
> intact as possible. E.g., if some other bug in the kernel ends up
> unsetting EFER.LMA while paging is still enabled, we'd still notice that
> when passing it back in via KVM_SET_SREGS*.
>
> But agree it's simpler to just always set it based on CR0.PG and EFER.LME
> and can send a v3 if that's preferred.

Yeah, in this case I think the chance of something breaking is really,
really small.

The behavior of svm_set_cr0() is more due to how the surrounding code
looks like, than anything else.

> > Alternatively, sev_es_enabled() could be an assertion, that is:
> >
> > if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) &&
> >!(env->efer & MSR_EFER_LMA)) {
> > /* Workaround for... */
> > assert(sev_es_enabled());
> > env->efer |= MSR_EFER_LMA;
> > }
> >
> > What do you think?
>
> I'm a little apprehensive about this approach for similar reasons as
> above

I agree on this. I think it's worth in general to have clear
expectations, though. If you think it's worrisome, we can commit it
without assertion now and add it in 9.0.

Paolo




Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

2023-12-06 Thread Michael Roth
On Wed, Dec 06, 2023 at 02:41:13PM +0100, Paolo Bonzini wrote:
> On Tue, Dec 5, 2023 at 11:28 PM Michael Roth  wrote:
> > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu)
> >  env->gdt.limit = sregs.gdt.limit;
> >  env->gdt.base = sregs.gdt.base;
> >
> > +cr0_old = env->cr[0];
> >  env->cr[0] = sregs.cr0;
> >  env->cr[2] = sregs.cr2;
> >  env->cr[3] = sregs.cr3;
> >  env->cr[4] = sregs.cr4;
> >
> >  env->efer = sregs.efer;
> > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > +env->efer |= MSR_EFER_LMA;
> > +}
> > +}
> 
> There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is
> simply EFER.LME && CR0.PG.

Yah, I originally had it like that, but svm_set_cr0() in the kernel only
sets it in vcpu->arch.efer when setting CR0.PG, so I thought it might
be safer to be more selective and mirror that handling on the QEMU side
so we can leave as much of any other sanity checks on kernel/QEMU side
intact as possible. E.g., if some other bug in the kernel ends up
unsetting EFER.LMA while paging is still enabled, we'd still notice that
when passing it back in via KVM_SET_SREGS*.

But agree it's simpler to just always set it based on CR0.PG and EFER.LME
and can send a v3 if that's preferred.

> 
> Alternatively, sev_es_enabled() could be an assertion, that is:
> 
> if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) &&
>!(env->efer & MSR_EFER_LMA)) {
> /* Workaround for... */
> assert(sev_es_enabled());
> env->efer |= MSR_EFER_LMA;
> }
> 
> What do you think?

I'm a little apprehensive about this approach for similar reasons as
above. The current patch is guaranteed to only affect SEV-ES, whereas
this approach could trigger assertions for other edge-cases we aren't
aware of that could further impact the release. For instance, "in
theory", QEMU or KVM might have some handling where EFER.LMA is set
somewhere after (or outside of) KVM_GET_SREGS, but now with this
proposed change QEMU would become more restrictive and generate an
assert for those cases.

I don't think that's actually the case, but in the off-chance that such
a case exists there could be more fall-out such as further delays, or
the need for a stable fix. But no strong opinion there either if that
ends up being the preferred approach, just trying to consider the
pros/cons.

Thanks,

Mike

> 
> Thanks,
> 
> Paolo
> 
> >  /* changes to apic base and cr8/tpr are read back via 
> > kvm_arch_post_run */
> >  x86_update_hflags(env);
> > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu)
> >  {
> >  CPUX86State *env = &cpu->env;
> >  struct kvm_sregs2 sregs;
> > +target_ulong cr0_old;
> >  int i, ret;
> >
> >  ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
> > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu)
> >  env->gdt.limit = sregs.gdt.limit;
> >  env->gdt.base = sregs.gdt.base;
> >
> > +cr0_old = env->cr[0];
> >  env->cr[0] = sregs.cr0;
> >  env->cr[2] = sregs.cr2;
> >  env->cr[3] = sregs.cr3;
> >  env->cr[4] = sregs.cr4;
> >
> >  env->efer = sregs.efer;
> > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > +env->efer |= MSR_EFER_LMA;
> > +}
> > +}
> >
> >  env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
> >
> > --
> > 2.25.1
> >
> 



Re: [PATCH 1/1] hw/i2c: add pca9543 i2c-mux switch

2023-12-06 Thread Corey Minyard
On Tue, Dec 05, 2023 at 11:05:33AM -0800, Patrick Venture wrote:
> On Tue, Nov 14, 2023 at 3:30 PM Corey Minyard  wrote:
> 
> > On Mon, Nov 13, 2023 at 02:31:56PM +0800, Potin Lai wrote:
> > > Add pca9543 2-channel i2c-mux switch support.
> > >
> > > Signed-off-by: Potin Lai 
> >
> > Looks good to me.
> >
> > Acked-by: Corey Minyard 
> >
> > > ---
> > >  hw/i2c/i2c_mux_pca954x.c | 12 
> > >  include/hw/i2c/i2c_mux_pca954x.h |  1 +
> > >  2 files changed, 13 insertions(+)
> > >
> > > diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
> > > index db5db956a6..6aace0fc47 100644
> > > --- a/hw/i2c/i2c_mux_pca954x.c
> > > +++ b/hw/i2c/i2c_mux_pca954x.c
> > > @@ -30,6 +30,7 @@
> > >
> > >  #define PCA9548_CHANNEL_COUNT 8
> > >  #define PCA9546_CHANNEL_COUNT 4
> > > +#define PCA9543_CHANNEL_COUNT 2
> > >
> > >  /*
> > >   * struct Pca954xState - The pca954x state object.
> > > @@ -172,6 +173,12 @@ I2CBus *pca954x_i2c_get_bus(I2CSlave *mux, uint8_t
> > channel)
> > >  return pca954x->bus[channel];
> > >  }
> > >
> > > +static void pca9543_class_init(ObjectClass *klass, void *data)
> > > +{
> > > +Pca954xClass *s = PCA954X_CLASS(klass);
> > > +s->nchans = PCA9543_CHANNEL_COUNT;
> > > +}
> > > +
> > >  static void pca9546_class_init(ObjectClass *klass, void *data)
> > >  {
> > >  Pca954xClass *s = PCA954X_CLASS(klass);
> > > @@ -246,6 +253,11 @@ static const TypeInfo pca954x_info[] = {
> > >  .class_init= pca954x_class_init,
> > >  .abstract  = true,
> > >  },
> > > +{
> > > +.name  = TYPE_PCA9543,
> > > +.parent= TYPE_PCA954X,
> > > +.class_init= pca9543_class_init,
> > > +},
> > >  {
> > >  .name  = TYPE_PCA9546,
> > >  .parent= TYPE_PCA954X,
> > > diff --git a/include/hw/i2c/i2c_mux_pca954x.h
> > b/include/hw/i2c/i2c_mux_pca954x.h
> > > index 3dd25ec983..1da5508ed5 100644
> > > --- a/include/hw/i2c/i2c_mux_pca954x.h
> > > +++ b/include/hw/i2c/i2c_mux_pca954x.h
> > > @@ -3,6 +3,7 @@
> > >
> > >  #include "hw/i2c/i2c.h"
> > >
> > > +#define TYPE_PCA9543 "pca9543"
> > >  #define TYPE_PCA9546 "pca9546"
> > >  #define TYPE_PCA9548 "pca9548"
> > >
> > > --
> > > 2.31.1
> > >
> > >
> 
> 
> Corey, can you include this in a pull email? I don't quite have that set up
> at present, Titus is going to help me figure it out.

Ok, I'm running tests now, I'll get a pull request in assuming
everything is ok.

-corey



Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-06 Thread Philippe Mathieu-Daudé

Hi Stefan,

On 6/12/23 13:56, Michal Suchánek wrote:

On Tue, Dec 05, 2023 at 11:09:59AM +0100, Michal Suchánek wrote:

On Mon, Dec 04, 2023 at 03:02:45PM -0800, Richard Henderson wrote:

On 12/4/23 12:09, Michal Suchánek wrote:

On Mon, Dec 04, 2023 at 02:50:17PM -0500, Stefan Hajnoczi wrote:

On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé  wrote:

+void tcg_unregister_thread(void)
+{
+unsigned int n;
+
+n = qatomic_fetch_dec(&tcg_cur_ctxs);
+g_free(tcg_ctxs[n]);
+qatomic_set(&tcg_ctxs[n], NULL);
+}


tcg_ctxs[n] may not be our context, so this looks like it could free
another thread's context and lead to undefined behavior.


Correct.


There is cpu->thread_id so perhaps cpu->thread_ctx could be added as
well. That would require a bitmap of used threads contexts rather than a
counter, though.


Or don't free the context at all, but re-use it when incrementing and
tcg_ctxs[n] != null (i.e. plugging in a repacement vcpu).  After all, there
can only be tcg_max_ctxs contexts.


But you would not know which contexts are in use and which aren't without
tracking the association of contexts to CPUs.

Unless there is a cpu array somewhere and you can use the same index for
both to create the association.


I tried to use cpu_index for correlating the tcg_ctx with cpu. I added
some asserts that only null contexts are allocated and non-null contexts
released but qemu crashes somewhere in tcg sometime after the guest gets
to switch root.


Since this isn't trivial and is a long standing issue, let's not
block the 8.2 release with it.

Regards,

Phil.



Re: [PATCH 10/11] chardev: force write all when recording replay logs

2023-12-06 Thread Philippe Mathieu-Daudé

On 5/12/23 21:41, Alex Bennée wrote:

This is mostly a problem within avocado as serial generally isn't busy
enough to overfill pipes. However, the consequences of recording a
failed write will haunt us on replay by causing the log to go out of
sync.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2010
Signed-off-by: Alex Bennée 
Cc: Pavel Dovgalyuk 
---
  chardev/char.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/chardev/char.c b/chardev/char.c
index 996a024c7a..6e5b4d7345 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -171,7 +171,8 @@ int qemu_chr_write(Chardev *s, const uint8_t *buf, int len, 
bool write_all)
  return res;
  }


Please add a comment explaining why in the code. Maybe simpler
using:

   if (replay_mode == REPLAY_MODE_RECORD) {
   /* $explanation */
   write_all = true;
   }


-res = qemu_chr_write_buffer(s, buf, len, &offset, write_all);
+res = qemu_chr_write_buffer(s, buf, len, &offset,
+                            replay_mode == REPLAY_MODE_RECORD ? true : write_all);

With a comment:
Reviewed-by: Philippe Mathieu-Daudé 




  1   2   >