Re: [PATCH v1] KVM: s390x: selftests: Add shared zeropage test
Am 15.04.24 um 16:19 schrieb Alexander Gordeev:
On Fri, Apr 12, 2024 at 10:43:29AM +0200, David Hildenbrand wrote:
Hi David,

 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/s390x/shared_zeropage_test.c          | 110 ++
 2 files changed, 111 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/s390x/shared_zeropage_test.c

Tested-by: Alexander Gordeev

@Janosch, Christian, I think this patch should go via s390 tree together with
https://lore.kernel.org/r/20240411161441.910170-3-da...@redhat.com
Do you agree?

Yes, thank you

Acked-by: Christian Borntraeger
Re: [PATCH] target/s390x: improve cpu compatibility check error message
Am 14.03.24 um 20:00 schrieb Claudio Fontana:
some users were confused by this message showing under TCG:

Selected CPU generation is too new. Maximum supported model in the configuration: 'xyz'

Try to clarify that the maximum can depend on the accel by adding also the current accelerator to the message as such:

Selected CPU generation is too new. Maximum supported model in the accelerator 'tcg' configuration: 'xyz'

Signed-off-by: Claudio Fontana

Independent of which message we end up with (see comments in this mail thread), I agree that improving the error message is helpful.

Acked-by: Christian Borntraeger

---
 target/s390x/cpu_models.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 1a1c096122..0d6d8fc727 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -508,14 +508,14 @@ static void check_compatibility(const S390CPUModel *max_model,
     if (model->def->gen > max_model->def->gen) {
         error_setg(errp, "Selected CPU generation is too new. Maximum "
-                   "supported model in the configuration: \'%s\'",
-                   max_model->def->name);
+                   "supported model in the accelerator \'%s\' configuration: \'%s\'",
+                   current_accel_name(), max_model->def->name);
         return;
     } else if (model->def->gen == max_model->def->gen &&
                model->def->ec_ga > max_model->def->ec_ga) {
         error_setg(errp, "Selected CPU GA level is too new. Maximum "
-                   "supported model in the configuration: \'%s\'",
-                   max_model->def->name);
+                   "supported model in the accelerator \'%s\' configuration: \'%s\'",
+                   current_accel_name(), max_model->def->name);
         return;
     }
@@ -537,7 +537,8 @@ static void check_compatibility(const S390CPUModel *max_model,
     error_setg(errp, " ");
     s390_feat_bitmap_to_ascii(missing, errp, error_prepend_missing_feat);
     error_prepend(errp, "Some features requested in the CPU model are not "
-                  "available in the configuration: ");
+                  "available in the accelerator \'%s\' configuration: ",
+                  current_accel_name());
 }

 S390CPUModel *get_max_cpu_model(Error **errp)
Re: [PATCH v3 2/2] KVM: s390: selftests: memop: add a simple AR test
Am 16.02.24 um 22:36 schrieb Eric Farman:
There is a selftest that checks for an (expected) error when an invalid AR is specified, but not one that exercises the AR path. Add a simple test that mirrors the vanilla write/read test while providing an AR. An AR that contains zero will direct the CPU to use the primary address space normally used anyway. AR[1] is selected for this test because the host AR[1] is usually non-zero, and KVM needs to correctly swap those values.

Signed-off-by: Eric Farman

Acked-by: Christian Borntraeger
Re: [PATCH v3 1/2] KVM: s390: fix access register usage in ioctls
Am 16.02.24 um 22:36 schrieb Eric Farman:
The routine ar_translation() can be reached by both the instruction intercept path (where the access registers had been loaded with the guest register contents), and the MEM_OP ioctls (which hadn't). Since this routine saves the current registers to vcpu->run, this routine erroneously saves host registers into the guest space.

Introduce a boolean in the kvm_vcpu_arch struct to indicate whether the registers contain guest contents. If they do (the instruction intercept path), the save can be performed and the AR translation is done just as it is today. If they don't (the MEM_OP path), the AR can be read from vcpu->run without stashing the current contents.

Signed-off-by: Eric Farman

Acked-by: Christian Borntraeger
Reviewed-by: Christian Borntraeger

---
 arch/s390/include/asm/kvm_host.h | 1 +
 arch/s390/kvm/gaccess.c          | 3 ++-
 arch/s390/kvm/kvm-s390.c         | 3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 52664105a473..c86215eb4ca7 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -765,6 +765,7 @@ struct kvm_vcpu_arch {
 	__u64 cputm_start;
 	bool gs_enabled;
 	bool skey_enabled;
+	bool acrs_loaded;
 	struct kvm_s390_pv_vcpu pv;
 	union diag318_info diag318_info;
 };
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 5bfcc50c1a68..b4c805092021 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -391,7 +391,8 @@ static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, u8 ar,
 	if (ar >= NUM_ACRS)
 		return -EINVAL;

-	save_access_regs(vcpu->run->s.regs.acrs);
+	if (vcpu->arch.acrs_loaded)
+		save_access_regs(vcpu->run->s.regs.acrs);
 	alet.val = vcpu->run->s.regs.acrs[ar];

 	if (ar == 0 || alet.val == 0) {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ea63ac769889..61092f0e0a66 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3951,6 +3951,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 				    KVM_SYNC_ARCH0 |
 				    KVM_SYNC_PFAULT |
 				    KVM_SYNC_DIAG318;
+	vcpu->arch.acrs_loaded = false;
 	kvm_s390_set_prefix(vcpu, 0);
 	if (test_kvm_facility(vcpu->kvm, 64))
 		vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
@@ -4951,6 +4952,7 @@ static void sync_regs(struct kvm_vcpu *vcpu)
 	}
 	save_access_regs(vcpu->arch.host_acrs);
 	restore_access_regs(vcpu->run->s.regs.acrs);
+	vcpu->arch.acrs_loaded = true;
 	/* save host (userspace) fprs/vrs */
 	save_fpu_regs();
 	vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc;
@@ -5021,6 +5023,7 @@ static void store_regs(struct kvm_vcpu *vcpu)
 	kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
 	save_access_regs(vcpu->run->s.regs.acrs);
 	restore_access_regs(vcpu->arch.host_acrs);
+	vcpu->arch.acrs_loaded = false;
 	/* Save guest register state */
 	save_fpu_regs();
 	vcpu->run->s.regs.fpc = current->thread.fpu.fpc;
Re: [PULL v2 00/39] tcg patch queue
Am 06.02.24 um 22:24 schrieb Peter Maydell:
[..]
Christian: this is on the s390x machine we have. Does the VM setup for that share IO or CPU with other VMs somehow?

Yes it does, this is a shared system. I will talk to the team if there is anything that we can do about this.
Re: [PULL 2/7] s390x: do a subsystem reset before the unprotect on reboot
Am 11.01.24 um 10:43 schrieb Cédric Le Goater:
[...]
On a side note, I am also seeing :

Michael?

[   73.989688] [ cut here ]
[   73.989696] unexpected non zero alert.mask 0x20
[   73.989748] WARNING: CPU: 9 PID: 4503 at arch/s390/kvm/interrupt.c:3214 kvm_s390_gisa_destroy+0xd4/0xe8 [kvm]
[   73.989791] Modules linked in: vfio_pci vfio_pci_core irqbypass vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink 8021q garp mrp rfkill sunrpc ext4 mbcache jbd2 vfio_ap zcrypt_cex4 vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c dm_service_time mlx5_core sd_mod t10_pi ghash_s390 sg prng des_s390 libdes sha3_512_s390 sha3_256_s390 mlxfw tls scm_block psample eadm_sch qeth_l2 bridge stp llc dasd_eckd_mod zfcp qeth dasd_mod scsi_transport_fc ccwgroup qdio dm_multipath dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt kvm aes_s390
[   73.989825] CPU: 9 PID: 4503 Comm: worker Kdump: loaded Not tainted 6.7.0-clg-dirty #52
[   73.989827] Hardware name: IBM 3931 LA1 400 (LPAR)
[   73.989829] Krnl PSW : 0704c0018000 03ff7fcd2198 (kvm_s390_gisa_destroy+0xd8/0xe8 [kvm])
[   73.989845]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   73.989847] Krnl GPRS: c000fffe 00070027 0023 0007df4249c8
[   73.989849]            03800649b858 03800649b850 0007fcb9db00
[   73.989851]            8ebae8c8 83a8c4f0 00b69900 8ebac000
[   73.989853]            03ff903aef68 03800649bd98 03ff7fcd2194 03800649b9f8
[   73.989859] Krnl Code: 03ff7fcd2188: c0224f88         larl  %r2,03ff7fd1c098
                          03ff7fcd218e: c0e5fffea360     brasl %r14,03ff7fca684e
                         #03ff7fcd2194: af00             mc    0,0
                         >03ff7fcd2198: e310b7680204     lg    %r1,10088(%r11)
                          03ff7fcd219e: a7f4ffae         brc   15,03ff7fcd20fa
                          03ff7fcd21a2: 0707             bcr   0,%r7
                          03ff7fcd21a4: 0707             bcr   0,%r7
                          03ff7fcd21a6: 0707             bcr   0,%r7
[   73.989929] Call Trace:
[   73.989931]  [<03ff7fcd2198>] kvm_s390_gisa_destroy+0xd8/0xe8 [kvm]
[   73.989946] ([<03ff7fcd2194>] kvm_s390_gisa_destroy+0xd4/0xe8 [kvm])
[   73.989960]  [<03ff7fcc1578>] kvm_arch_destroy_vm+0x50/0x118 [kvm]
[   73.989974]  [<03ff7fcb00a2>] kvm_destroy_vm+0x15a/0x260 [kvm]
[   73.989985]  [<03ff7fcb021e>] kvm_vm_release+0x36/0x48 [kvm]
[   73.989996]  [<0007de4f830c>] __fput+0x94/0x2d0
[   73.990009]  [<0007de20d838>] task_work_run+0x88/0xe8
[   73.990013]  [<0007de1e75e0>] do_exit+0x2e0/0x4e0
[   73.990016]  [<0007de1e79c0>] do_group_exit+0x40/0xb8
[   73.990017]  [<0007de1f96e8>] send_sig_info+0x0/0xa8
[   73.990021]  [<0007de194b26>] arch_do_signal_or_restart+0x56/0x318
[   73.990025]  [<0007de28bf12>] exit_to_user_mode_prepare+0x10a/0x1a0
[   73.990028]  [<0007deb607d2>] __do_syscall+0x152/0x1f8
[   73.990032]  [<0007deb70ac8>] system_call+0x70/0x98
[   73.990036] Last Breaking-Event-Address:
[   73.990037]  [<0007de1e0c58>] __warn_printk+0x78/0xe8
Re: [PATCH 3/4] hw/s390x/ipl: Remove unused 'exec/exec-all.h' included header
Am 12.12.23 um 16:28 schrieb Eric Farman:
So why do you think exec-all.h is unused?

I think because that got moved out of exec-all.h a few months ago, via

commit 3549118b498873c84b442bc280a5edafbb61e0a4
Author: Philippe Mathieu-Daudé
Date:   Thu Sep 14 20:57:08 2023 +0200

    exec: Move cpu_loop_foo() target agnostic functions to 'cpu-common.h'

    While these functions are not TCG specific, they are not target specific. Move them to "exec/cpu-common.h" so their callers don't have to be tainted as target specific.

Ah right, I was looking at an old QEMU version
Re: [PATCH 3/4] hw/s390x/ipl: Remove unused 'exec/exec-all.h' included header
Am 12.12.23 um 12:36 schrieb Philippe Mathieu-Daudé:
Signed-off-by: Philippe Mathieu-Daudé
---
 hw/s390x/ipl.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 515dcf51b5..62182d81a0 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -35,7 +35,6 @@
 #include "qemu/cutils.h"
 #include "qemu/option.h"
 #include "standard-headers/linux/virtio_ids.h"
-#include "exec/exec-all.h"

Philippe,

This include came with

commit a30fb811cbe940020a498d2cdac9326cac38b4d9
Author:     David Hildenbrand
AuthorDate: Tue Apr 24 12:18:59 2018 +0200
Commit:     Cornelia Huck
CommitDate: Mon May 14 17:10:02 2018 +0200

    s390x: refactor reset/reipl handling

And I think one reason was cpu_loop_exit. This is still part of ipl.c:

a30fb811cbe (David Hildenbrand 2018-04-24 12:18:59 +0200 664)     /* as this is triggered by a CPU, make sure to exit the loop */
a30fb811cbe (David Hildenbrand 2018-04-24 12:18:59 +0200 665)     if (tcg_enabled()) {
a30fb811cbe (David Hildenbrand 2018-04-24 12:18:59 +0200 666)         cpu_loop_exit(cs);
a30fb811cbe (David Hildenbrand 2018-04-24 12:18:59 +0200 667)     }

So why do you think exec-all.h is unused?
Re: [PATCH 00/32] kmsan: Enable on s390
Am 16.11.23 um 11:13 schrieb Ilya Leoshkevich: It's also possible to get a free s390 machine at [1]. [1] https://linuxone.cloud.marist.edu/oss I think the URL for registration is this one https://linuxone.cloud.marist.edu/#/register?flag=VM
Re: [PATCH v2 06/13] migration: Use VMSTATE_INSTANCE_ID_ANY for s390 devices
Am 20.10.23 um 11:07 schrieb Juan Quintela:
Just with make check I can see that we can have more than one of these devices, so use ANY.

ok 5 /s390x/device/introspect/abstract-interfaces
...
Broken pipe
../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
Aborted (core dumped)

Reviewed-by: Stefan Berger
Signed-off-by: Juan Quintela
---
 hw/s390x/s390-skeys.c    | 3 ++-
 hw/s390x/s390-stattrib.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

Actually both devices should be there only once - I think.

diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
index 5024faf411..ef089e1967 100644
--- a/hw/s390x/s390-skeys.c
+++ b/hw/s390x/s390-skeys.c
@@ -22,6 +22,7 @@
 #include "sysemu/kvm.h"
 #include "migration/qemu-file-types.h"
 #include "migration/register.h"
+#include "migration/vmstate.h"

 #define S390_SKEYS_BUFFER_SIZE (128 * KiB) /* Room for 128k storage keys */
 #define S390_SKEYS_SAVE_FLAG_EOS 0x01
@@ -457,7 +458,7 @@ static inline void s390_skeys_set_migration_enabled(Object *obj, bool value,
     ss->migration_enabled = value;

     if (ss->migration_enabled) {
-        register_savevm_live(TYPE_S390_SKEYS, 0, 1,
+        register_savevm_live(TYPE_S390_SKEYS, VMSTATE_INSTANCE_ID_ANY, 1,
                              &savevm_s390_storage_keys, ss);
     } else {
         unregister_savevm(VMSTATE_IF(ss), TYPE_S390_SKEYS, ss);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 220e845d12..055d382c3c 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -13,6 +13,7 @@
 #include "qemu/units.h"
 #include "migration/qemu-file.h"
 #include "migration/register.h"
+#include "migration/vmstate.h"
 #include "hw/s390x/storage-attributes.h"
 #include "qemu/error-report.h"
 #include "exec/ram_addr.h"
@@ -380,7 +381,7 @@ static void s390_stattrib_instance_init(Object *obj)
 {
     S390StAttribState *sas = S390_STATTRIB(obj);

-    register_savevm_live(TYPE_S390_STATTRIB, 0, 0,
+    register_savevm_live(TYPE_S390_STATTRIB, VMSTATE_INSTANCE_ID_ANY, 0,
                          &savevm_s390_stattrib_handlers, sas);

     object_property_add_bool(obj, "migration-enabled",
Re: [PATCH v2 2/2] target/s390x/kvm: Simplify the GPRs, ACRs, CRs and prefix synchronization code
Am 09.10.23 um 19:07 schrieb Thomas Huth:
[...]

@@ -483,20 +482,14 @@ int kvm_arch_put_registers(CPUState *cs, int level)
     cs->kvm_run->psw_addr = env->psw.addr;
     cs->kvm_run->psw_mask = env->psw.mask;

-    if (can_sync_regs(cs, KVM_SYNC_GPRS)) {
-        for (i = 0; i < 16; i++) {
-            cs->kvm_run->s.regs.gprs[i] = env->regs[i];
-            cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_GPRS;
-        }
-    } else {
-        for (i = 0; i < 16; i++) {
-            regs.gprs[i] = env->regs[i];
-        }
-        r = kvm_vcpu_ioctl(cs, KVM_SET_REGS, &regs);
-        if (r < 0) {
-            return r;
-        }
-    }
+    g_assert((cs->kvm_run->kvm_valid_regs & KVM_SYNC_REQUIRED_BITS) ==
+             KVM_SYNC_REQUIRED_BITS);
+    cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_REQUIRED_BITS;
+    memcpy(cs->kvm_run->s.regs.gprs, env->regs, sizeof(cs->kvm_run->s.regs.gprs));
+    memcpy(cs->kvm_run->s.regs.acrs, env->aregs, sizeof(cs->kvm_run->s.regs.acrs));
+    memcpy(cs->kvm_run->s.regs.crs, env->cregs, sizeof(cs->kvm_run->s.regs.crs));
+
+    cs->kvm_run->s.regs.prefix = env->psa;

These 3 have only been saved for level > KVM_PUT_RUNTIME_STATE. By moving the memcpy into this position you would always write them. Probably ok but a change in behaviour. Was this intentional?
Re: [PATCH v2 1/2] target/s390x/kvm: Turn KVM_CAP_SYNC_REGS into a hard requirement
Am 10.10.23 um 13:12 schrieb Thomas Huth:
On 10/10/2023 13.02, Christian Borntraeger wrote:
Am 09.10.23 um 19:07 schrieb Thomas Huth:
Since we already require at least kernel 3.15 in the s390x KVM code, we can assume that the KVM_CAP_SYNC_REGS capability is always there. Thus turn this into a hard requirement now.

Signed-off-by: Thomas Huth
---
 target/s390x/kvm/kvm.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index bc5c56a305..b3e2eaa2eb 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -337,21 +337,29 @@ int kvm_arch_get_default_type(MachineState *ms)

 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
+    int required_caps[] = {
+        KVM_CAP_DEVICE_CTRL,
+        KVM_CAP_SYNC_REGS,
+    };
+
+    for (int i = 0; i < ARRAY_SIZE(required_caps); i++) {
+        if (!kvm_check_extension(s, required_caps[i])) {
+            error_report("KVM is missing capability #%d - "
+                         "please use kernel 3.15 or newer", required_caps[i]);
+            return -1;
+        }
+    }
+
     object_class_foreach(ccw_machine_class_foreach, TYPE_S390_CCW_MACHINE,
                          false, NULL);

-    if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
-        error_report("KVM is missing capability KVM_CAP_DEVICE_CTRL - "
-                     "please use kernel 3.15 or newer");
-        return -1;
-    }
     if (!kvm_check_extension(s, KVM_CAP_S390_COW)) {
         error_report("KVM is missing capability KVM_CAP_S390_COW - "
                      "unsupported environment");
         return -1;
     }

Not sure if we also want to move KVM_CAP_S390_COW somehow. The message would be different.

IIRC that error could happen when you ran KVM within an older version of z/VM, so the "please use kernel 3.15 or newer" message would be completely misleading there.

Yes, that's what I was trying to say, we would need a different message. Let's go with this patch.

Apart from that:
Reviewed-by: Christian Borntraeger

Thanks,
 Thomas
Re: [PATCH v2 1/2] target/s390x/kvm: Turn KVM_CAP_SYNC_REGS into a hard requirement
Am 09.10.23 um 19:07 schrieb Thomas Huth:
Since we already require at least kernel 3.15 in the s390x KVM code, we can assume that the KVM_CAP_SYNC_REGS capability is always there. Thus turn this into a hard requirement now.

Signed-off-by: Thomas Huth
---
 target/s390x/kvm/kvm.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index bc5c56a305..b3e2eaa2eb 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -337,21 +337,29 @@ int kvm_arch_get_default_type(MachineState *ms)

 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
+    int required_caps[] = {
+        KVM_CAP_DEVICE_CTRL,
+        KVM_CAP_SYNC_REGS,
+    };
+
+    for (int i = 0; i < ARRAY_SIZE(required_caps); i++) {
+        if (!kvm_check_extension(s, required_caps[i])) {
+            error_report("KVM is missing capability #%d - "
+                         "please use kernel 3.15 or newer", required_caps[i]);
+            return -1;
+        }
+    }
+
     object_class_foreach(ccw_machine_class_foreach, TYPE_S390_CCW_MACHINE,
                          false, NULL);

-    if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
-        error_report("KVM is missing capability KVM_CAP_DEVICE_CTRL - "
-                     "please use kernel 3.15 or newer");
-        return -1;
-    }
     if (!kvm_check_extension(s, KVM_CAP_S390_COW)) {
         error_report("KVM is missing capability KVM_CAP_S390_COW - "
                      "unsupported environment");
         return -1;
     }

Not sure if we also want to move KVM_CAP_S390_COW somehow. The message would be different. Apart from that:

Reviewed-by: Christian Borntraeger

-    cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
+    cap_sync_regs = true;
     cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
     cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
     cap_mem_op_extension = kvm_check_extension(s, KVM_CAP_S390_MEM_OP_EXTENSION);
Re: [PATCH] s390x: do a subsystem reset before the unprotect on reboot
Am 01.09.23 um 13:48 schrieb Janosch Frank:
Bound APQNs have to be reset before tearing down the secure config via s390_machine_unprotect(). Otherwise the Ultravisor will return an error code. So let's do a subsystem_reset() which includes an AP reset before the unprotect call.

We'll do a full device_reset() afterwards which will reset some devices twice. That's ok since we can't move the device_reset() before the unprotect as it includes a CPU clear reset which the Ultravisor does not expect at that point in time.

Signed-off-by: Janosch Frank

Makes sense.
Acked-by: Christian Borntraeger

---
I'm not able to test this for the PV AP case right now, that has to wait to early next week. However Marc told me that the non-AP PV test works fine now.
---
 hw/s390x/s390-virtio-ccw.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3dd0b2372d..2d75f2131f 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -438,10 +438,20 @@ static void s390_machine_reset(MachineState *machine, ShutdownCause reason)
     switch (reset_type) {
     case S390_RESET_EXTERNAL:
     case S390_RESET_REIPL:
+        /*
+         * Reset the subsystem which includes a AP reset. If a PV
+         * guest had APQNs attached the AP reset is a prerequisite to
+         * unprotecting since the UV checks if all APQNs are reset.
+         */
+        subsystem_reset();
         if (s390_is_pv()) {
             s390_machine_unprotect(ms);
         }

+        /*
+         * Device reset includes CPU clear resets so this has to be
+         * done AFTER the unprotect call above.
+         */
         qemu_devices_reset(reason);
         s390_crypto_reset();
Re: SLES15SP4 IPL failure: stage2 boot loader
Am 03.08.23 um 18:31 schrieb Clovis Pereira:
Looks like a problem with the host (KVM) not with the guest (SLES).

Right. There was at least one bug in QEMU where in (seldom) cases the boot loader failed:
https://gitlab.com/qemu-project/qemu/-/commit/5f97ba0c74ccace0a4014460de9751ff3c6f454a
And maybe the guest update to SP4 just triggered that corner case and the update to SP5 fixed it. In any case qemu 5.2 from SLES15SP3 (or newer) should contain that fix so if you see that again, updating the host should fix it.

Christian

Clovis Pereira
IT Mainframe Specialist - SW services
TLS - IBM Technology Lifecycle Services
55 11 99925-6242 (Mobile)
gclo...@br.ibm.com
IBM

From: Linux on 390 Port on behalf of Alan Haff
Sent: Thursday, August 3, 2023 12:31
To: LINUX-390@VM.MARIST.EDU
Subject: [EXTERNAL] Re: SLES15SP4 IPL failure: stage2 boot loader

What QEMU do you have in the host? (rpm -qa | grep qemu). Is that up2date?

qemu-s390-4.2.1-11.16.3.s390x

I know, I'm pretty far backlevel. Strange that it worked ok with SP3, though. The good news is that I've been able to work around the problem for now by extracting the kernel and initrd from the guest disk and storing it on the host to IPL the guest with a direct kernel boot.

-----Original Message-----
From: Linux on 390 Port On Behalf Of Christian Borntraeger
Sent: Thursday, August 3, 2023 03:08
To: LINUX-390@VM.MARIST.EDU
Subject: [EXTERNAL] - Re: SLES15SP4 IPL failure: stage2 boot loader

Am 02.08.23 um 21:11 schrieb Alan Haff:
I have a SLES15 SP4 system running under KVM on a z13s that refuses to IPL after upgrading from SP3. The failure message is "! Cannot read stage2 boot loader !". I've IPLed into a rescue shell from the SP4 ISO, set up a chroot environment, and re-ran grub2-install with no obvious errors. Still no joy. Not sure what to try next. Any ideas/suggestions/assistance would be greatly appreciated.

This looks like an error of the QEMU loader not being able to find the right things on the disk. What QEMU do you have in the host? (rpm -qa | grep qemu). Is that up2date?

--
For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www2.marist.edu/htbin/wlvindex?LINUX-390
Re: SLES15SP4 IPL failure: stage2 boot loader
Am 02.08.23 um 21:11 schrieb Alan Haff:
I have a SLES15 SP4 system running under KVM on a z13s that refuses to IPL after upgrading from SP3. The failure message is "! Cannot read stage2 boot loader !". I've IPLed into a rescue shell from the SP4 ISO, set up a chroot environment, and re-ran grub2-install with no obvious errors. Still no joy. Not sure what to try next. Any ideas/suggestions/assistance would be greatly appreciated.

This looks like an error of the QEMU loader not being able to find the right things on the disk. What QEMU do you have in the host? (rpm -qa | grep qemu). Is that up2date?
Re: s390 intermittent test failure in qemu:block / io-qcow2-copy-before-write
Am 25.07.23 um 18:45 schrieb Peter Maydell:
There seems to be an intermittent failure on the s390 host in the qemu:block / io-qcow2-copy-before-write test:
https://gitlab.com/qemu-project/qemu/-/jobs/4737819873
The log says the test was expecting to do some reading and writing but got an unexpected 'permission denied' error on the read. Any idea why this might happen ?

768/835 qemu:block / io-qcow2-copy-before-write ERROR 12.05s exit status 1
PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=101 /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3 /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/../tests/qemu-iotests/check -tap -qcow2 copy-before-write --source-dir /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/qemu-iotests --build-dir /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qemu-iotests
― ✀ ―
stderr:
--- /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/qemu-iotests/tests/copy-before-write.out
+++ /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
@@ -1,5 +1,21 @@
-
+...F
+==
+FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
+--
+Traceback (most recent call last):
+  File "/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/qemu-iotests/tests/copy-before-write", line 210, in test_timeout_break_snapshot
+    self.assertEqual(log, """\
+AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
+  wrote 524288/524288 bytes at offset 0
+  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+  wrote 524288/524288 bytes at offset 524288
+  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
++ read failed: Permission denied
+- read 1048576/1048576 bytes at offset 0
+- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+--
 Ran 4 tests
-OK
+FAILED (failures=1)
(test program exited with status code 1)
――

Same failure, previous job:
https://gitlab.com/qemu-project/qemu/-/jobs/4736463062
This one's a "Failed to get write lock" in io-qcow2-161:
https://gitlab.com/qemu-project/qemu/-/jobs/4734846533

See https://lore.kernel.org/qemu-devel/028059df-eaf4-9e65-a195-4733b708a...@linux.ibm.com/
I was not yet able to understand that. And I can only reproduce with Ubuntu (RHEL,SLES,Fedora are fine). Upstream kernel on Ubuntu does not help, so I assume it must be something in the libraries or config.
Re: [PATCH] pc-bios/s390-ccw: Get rid of the the __u* types
Am 28.06.23 um 10:32 schrieb Thomas Huth:
On 27/06/2023 14.19, Christian Borntraeger wrote:
Am 27.06.23 um 13:41 schrieb Thomas Huth:
Using types starting with double underscores should be avoided since these names are marked as reserved by the C standard. The corresponding Linux

In general I think this change is fine, but this is kind of interesting, as /usr/include/linux/types.h does have __u64 and friends. In fact there is __u64 but not u64 in /usr/include. And yes a google search for double underscore has:

"The use of two underscores (`__') in identifiers is reserved for the compiler's internal use according to the ANSI-C standard. Underscores (`_') are often used in names of library functions (such as "_main" and "_exit"). In order to avoid collisions, do not begin an identifier with an underscore."

kernel header file has also been changed accordingly a long time ago:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/s390/cio/cio.h?id=cd6b4f27b9bb2a

but IIRC from a kernel perspective u64 is for kernel internal uint64_t and __u64 is for uapi, e.g. see
https://lkml.indiana.edu/hypermail/linux/kernel/1401.2/02851.html

So in essence we (QEMU/s390-ccw) have to decide what to use for our internal purposes. And yes, u64 and this patch is certainly ok. But we might need to change the patch description.

Ok, agreed, the patch description could be better. Maybe just something like this:

"The types starting with double underscores have likely been introduced into the s390-ccw bios to be able to re-use structs from the Linux kernel in the past, but the corresponding structs in cio.h have been changed there a long time ago already to not use the variants with the double underscores anymore:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/s390/cio/cio.h?id=cd6b4f27b9bb2a
So it would be good to replace these in the s390-ccw bios now, too."

Yes, looks good.
Re: [PATCH] pc-bios/s390-ccw: Get rid of the __u* types
Am 27.06.23 um 13:41 schrieb Thomas Huth: Using types starting with double underscores should be avoided since these names are marked as reserved by the C standard. The corresponding Linux In general I think this change is fine, but this is kind of interesting, as /usr/include/linux/types.h does have __u64 and friends. In fact there is __u64 but not u64 in /usr/include. And yes a google search for double underscore has The use of two underscores (` __ ') in identifiers is reserved for the compiler's internal use according to the ANSI-C standard. Underscores (` _ ') are often used in names of library functions (such as " _main " and " _exit "). In order to avoid collisions, do not begin an identifier with an underscore. kernel header file has also been changed accordingly a long time ago: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/s390/cio/cio.h?id=cd6b4f27b9bb2a but IIRC from a kernel perspective u64 is for kernel internal uint64_t and __u64 is for uapi, e.g. see https://lkml.indiana.edu/hypermail/linux/kernel/1401.2/02851.html So in essence we (QEMU/s390-ccw) have to decide what to use for our internal purposes. And yes, u64 and this patch is certainly ok. But we might need to change the patch description So we should get rid of the __u* in the s390-ccw bios now finally, too. 
Signed-off-by: Thomas Huth --- Based-on: <20230510143925.4094-4-quint...@redhat.com> pc-bios/s390-ccw/cio.h | 232 ++-- pc-bios/s390-ccw/s390-ccw.h | 4 - 2 files changed, 116 insertions(+), 120 deletions(-) diff --git a/pc-bios/s390-ccw/s390-ccw.h b/pc-bios/s390-ccw/s390-ccw.h index efeb449572..c977a52b50 100644 --- a/pc-bios/s390-ccw/s390-ccw.h +++ b/pc-bios/s390-ccw/s390-ccw.h @@ -17,10 +17,6 @@ typedef unsigned char u8; typedef unsigned short u16; typedef unsigned int u32; typedef unsigned long long u64; -typedef unsigned char __u8; -typedef unsigned short __u16; -typedef unsigned int __u32; -typedef unsigned long long __u64; #define true 1 #define false 0 diff --git a/pc-bios/s390-ccw/cio.h b/pc-bios/s390-ccw/cio.h index 88a88adfd2..8b18153deb 100644 --- a/pc-bios/s390-ccw/cio.h +++ b/pc-bios/s390-ccw/cio.h @@ -17,32 +17,32 @@ * path management control word */ struct pmcw { -__u32 intparm; /* interruption parameter */ -__u32 qf:1; /* qdio facility */ -__u32 w:1; -__u32 isc:3;/* interruption subclass */ -__u32 res5:3; /* reserved zeros */ -__u32 ena:1;/* enabled */ -__u32 lm:2; /* limit mode */ -__u32 mme:2;/* measurement-mode enable */ -__u32 mp:1; /* multipath mode */ -__u32 tf:1; /* timing facility */ -__u32 dnv:1;/* device number valid */ -__u32 dev:16; /* device number */ -__u8 lpm; /* logical path mask */ -__u8 pnom; /* path not operational mask */ -__u8 lpum; /* last path used mask */ -__u8 pim; /* path installed mask */ -__u16 mbi; /* measurement-block index */ -__u8 pom; /* path operational mask */ -__u8 pam; /* path available mask */ -__u8 chpid[8]; /* CHPID 0-7 (if available) */ -__u32 unused1:8;/* reserved zeros */ -__u32 st:3; /* subchannel type */ -__u32 unused2:18; /* reserved zeros */ -__u32 mbfc:1; /* measurement block format control */ -__u32 xmwme:1; /* extended measurement word mode enable */ -__u32 csense:1; /* concurrent sense; can be enabled ...*/ +u32 intparm;/* interruption parameter */ +u32 qf:1; /* qdio facility */ +u32 w:1; +u32 isc:3; 
/* interruption subclass */ +u32 res5:3; /* reserved zeros */ +u32 ena:1; /* enabled */ +u32 lm:2; /* limit mode */ +u32 mme:2; /* measurement-mode enable */ +u32 mp:1; /* multipath mode */ +u32 tf:1; /* timing facility */ +u32 dnv:1; /* device number valid */ +u32 dev:16; /* device number */ +u8 lpm;/* logical path mask */ +u8 pnom; /* path not operational mask */ +u8 lpum; /* last path used mask */ +u8 pim;/* path installed mask */ +u16 mbi;/* measurement-block index */ +u8 pom;/* path operational mask */ +u8 pam;/* path available mask */ +u8 chpid[8]; /* CHPID 0-7 (if available) */ +u32 unused1:8; /* reserved zeros */ +u32 st:3; /* subchannel type */ +u32 unused2:18; /* reserved zeros */ +u32 mbfc:1; /* measurement block format control */ +u32 xmwme:1;/* extended measurement word mode enable */ +u32 csense:1; /* concurrent sense; can be enabled ...*/ /* ... per MSCH,
Re: [PATCH 3/4] pc-bios/s390-ccw: Move the stack array into start.S
Am 26.06.23 um 15:21 schrieb Thomas Huth: diff --git a/pc-bios/s390-ccw/start.S b/pc-bios/s390-ccw/start.S index 29b0a9ece0..47ef6e8aa8 100644 --- a/pc-bios/s390-ccw/start.S +++ b/pc-bios/s390-ccw/start.S @@ -120,3 +120,8 @@ external_new_mask: .quad 0x00018000 io_new_mask: .quad 0x00018000 + +.bss + +.align 16 +.lcomm stack,STACK_SIZE IIRC, the ELF ABI defines the stack to be 8 byte aligned, but 16 certainly does not hurt. Acked-by: Christian Borntraeger
Re: [PATCH 2/4] pc-bios/s390-ccw: Provide space for initial stack frame in start.S
Am 26.06.23 um 15:21 schrieb Thomas Huth: Providing the space of a stack frame is the duty of the caller, so we should reserve 160 bytes before jumping into the main function. Otherwise the main() function might write past the stack array. While we're at it, add a proper STACK_SIZE macro for the stack size instead of using magic numbers (this is also required for the following patch). Signed-off-by: Thomas Huth --- pc-bios/s390-ccw/start.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/pc-bios/s390-ccw/start.S b/pc-bios/s390-ccw/start.S index d29de09cc6..29b0a9ece0 100644 --- a/pc-bios/s390-ccw/start.S +++ b/pc-bios/s390-ccw/start.S @@ -10,10 +10,12 @@ * directory. */ +#define STACK_SIZE 0x8000 + .globl _start _start: -larl%r15,stack + 0x8000 /* Set up stack */ +larl%r15,stack + STACK_SIZE - 160 /* Set up stack */ Reviewed-by: Christian Borntraeger
Re: [PATCH] pc-bios/s390-ccw/Makefile: Use -z noexecstack to silence linker warning
Am 22.06.23 um 15:08 schrieb Thomas Huth: Recent versions of ld complain when linking the s390-ccw bios: /usr/bin/ld: warning: start.o: missing .note.GNU-stack section implies executable stack /usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker We can silence the warning by telling the linker to mark the stack as not executable. Signed-off-by: Thomas Huth --- pc-bios/s390-ccw/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile index 2e8cc015aa..acfcd1e71a 100644 --- a/pc-bios/s390-ccw/Makefile +++ b/pc-bios/s390-ccw/Makefile @@ -55,7 +55,7 @@ config-cc.mak: Makefile $(call cc-option,-march=z900,-march=z10)) 3> config-cc.mak -include config-cc.mak -LDFLAGS += -Wl,-pie -nostdlib +LDFLAGS += -Wl,-pie -nostdlib -z noexecstack build-all: s390-ccw.img s390-netboot.img In the end this should not matter as the resulting binary is not loaded by an elf loader so this should be fine Acked-by: Christian Borntraeger
Re: s390 private runner CI job timing out
Am 06.04.23 um 14:39 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 13:30, Thomas Huth wrote: The thing is: it shouldn't take that long to build QEMU and run the tests here, theoretically. Some days ago, the job was finishing in 39 minutes: https://gitlab.com/qemu-project/qemu/-/jobs/3973481571 The recent run took 74 minutes: https://gitlab.com/qemu-project/qemu/-/jobs/4066136770 That's almost a factor of two! So there is definitely something strange going on. So that 39 minute run was about 22 minutes compile, 17 minutes test. The 74 minute run was 45 minutes compile, 30 minutes test. The number of compile steps in meson was pretty much the same (10379 vs 10384) in each case. So the most plausible conclusion seems like "the VM mysteriously got slower by nearly a factor of 2", given that the slowdown seems to affect the compile and test stages about equally. The VM has been up for 44 days, so we can rule out "rebooted into a new kernel with some kind of perf bug". Looks like the team is looking into some storage delays at the moment. So it would be good to get some numbers for io wait and steal on the next run to see if this is related to storage or CPU consumption. Maybe collect /proc/stat before and after a run. Or login and use top.
Re: s390 private runner CI job timing out
Am 06.04.23 um 14:39 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 13:30, Thomas Huth wrote: The thing is: it shouldn't take that long to build QEMU and run the tests here, theoretically. Some days ago, the job was finishing in 39 minutes: https://gitlab.com/qemu-project/qemu/-/jobs/3973481571 The recent run took 74 minutes: https://gitlab.com/qemu-project/qemu/-/jobs/4066136770 That's almost a factor of two! So there is definitely something strange going on. So that 39 minute run was about 22 minutes compile, 17 minutes test. The 74 minute run was 45 minutes compile, 30 minutes test. The number of compile steps in meson was pretty much the same (10379 vs 10384) in each case. So the most plausible conclusion seems like "the VM mysteriously got slower by nearly a factor of 2", given that the slowdown seems to affect the compile and test stages about equally. The VM has been up for 44 days, so we can rule out "rebooted into a new kernel with some kind of perf bug". I will ask around if we can get a higher share of the system.
Re: s390 private runner CI job timing out
Am 06.04.23 um 14:30 schrieb Thomas Huth: On 06/04/2023 14.13, Christian Borntraeger wrote: Am 06.04.23 um 14:05 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 12:17, Christian Borntraeger wrote: Am 06.04.23 um 12:44 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 11:40, Christian Borntraeger wrote: Am 06.04.23 um 11:21 schrieb Peter Maydell: Christian, does our S390X machine get a guaranteed amount of CPU, or does it depend on what else is running on the hardware? I think its a shared system with shared CPUs. Can you check the steal time in top or proc? If this is far too high we could ask to give you more weight for that VM. It's idle at the moment and steal time seems to be low (0.0 .. 0.3); I'll try to remember to check next time it's running a job. Do you have /proc/stat ? Yes; hopefully it means more to you than it does to me :-) linux1@qemu01:~$ cat /proc/stat cpu 60904459 604975 15052194 1435958176 17128179 351949 758578 22218760 0 0 cpu0 15022535 146734 3786909 358774818 4283172 98313 237156 5894809 0 0 cpu1 15306890 151164 3746024 358968957 4378864 85629 172492 5434255 0 0 cpu2 15307709 157180 3762691 359141276 4138714 85736 176367 5474594 0 0 cpu3 15267324 149895 3756569 359073124 4327428 82269 172562 5415101 0 0 This is user,nice,system,idle,iowait,irq,softirq,steal,guest,guest_nice So overall there is some (20-30% since the last reboot) steal going on. Not sure if this is the real problem since it is only Ubuntu 20.04. Does a higher timeout make the problem go away? The thing is: it shouldn't take that long to build QEMU and run the tests here, theoretically. Some days ago, the job was finishing in 39 minutes: https://gitlab.com/qemu-project/qemu/-/jobs/3973481571 The recent run took 74 minutes: https://gitlab.com/qemu-project/qemu/-/jobs/4066136770 That's almost a factor of two! So there is definitely something strange going on. But this has 786 instead of 659 tests, no?
Re: s390 private runner CI job timing out
Am 06.04.23 um 14:05 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 12:17, Christian Borntraeger wrote: Am 06.04.23 um 12:44 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 11:40, Christian Borntraeger wrote: Am 06.04.23 um 11:21 schrieb Peter Maydell: Christian, does our S390X machine get a guaranteed amount of CPU, or does it depend on what else is running on the hardware? I think its a shared system with shared CPUs. Can you check the steal time in top or proc? If this is far too high we could ask to give you more weight for that VM. It's idle at the moment and steal time seems to be low (0.0 .. 0.3); I'll try to remember to check next time it's running a job. Do you have /proc/stat ? Yes; hopefully it means more to you than it does to me :-) linux1@qemu01:~$ cat /proc/stat cpu 60904459 604975 15052194 1435958176 17128179 351949 758578 22218760 0 0 cpu0 15022535 146734 3786909 358774818 4283172 98313 237156 5894809 0 0 cpu1 15306890 151164 3746024 358968957 4378864 85629 172492 5434255 0 0 cpu2 15307709 157180 3762691 359141276 4138714 85736 176367 5474594 0 0 cpu3 15267324 149895 3756569 359073124 4327428 82269 172562 5415101 0 0 This is user,nice,system,idle,iowait,irq,softirq,steal,guest,guest_nice So overall there is some (20-30% since the last reboot) steal going on. Not sure if this is the real problem since it is only Ubuntu 20.04. Does a higher timeout make the problem go away?
Re: s390 private runner CI job timing out
Am 06.04.23 um 12:44 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 11:40, Christian Borntraeger wrote: Am 06.04.23 um 11:21 schrieb Peter Maydell: Christian, does our S390X machine get a guaranteed amount of CPU, or does it depend on what else is running on the hardware? I think its a shared system with shared CPUs. Can you check the steal time in top or proc? If this is far too high we could ask to give you more weight for that VM. It's idle at the moment and steal time seems to be low (0.0 .. 0.3); I'll try to remember to check next time it's running a job. Do you have /proc/stat ?
Re: s390 private runner CI job timing out
Am 06.04.23 um 11:21 schrieb Peter Maydell: On Thu, 6 Apr 2023 at 07:57, Thomas Huth wrote: On 05/04/2023 17.15, Peter Maydell wrote: The s390 private runner CI job ubuntu-20.04-s390x-all seems to have started timing out a lot recently. Here's an example where it passed, but with only 53 seconds left on the clock before it would have been killed: https://gitlab.com/qemu-project/qemu/-/jobs/4066136770 It looks like 'make check' was about 30 minutes of the 75 minutes total, and compilation was 45 minutes. Any suggestions for how we can trim this down? (Presumably we could also raise the time limit given that this is a private runner job...) I don't have access to that system, so I can only guess: Did you check whether there are any other processes still running (leftovers from earlier test runs)? I did check for that, as it's been a problem in the past, but no, in this case no other jobs are running in the VM. If not, it's maybe because it is a VM that is running with other VMs in parallel that hog the CPU? In that case, you could contact the owner of the machine and ask whether there is anything that could be done about it. Or simply increase the timeout a little bit more... (our highest timeout in another job is 90 minutes, so I think that would still be OK?). Christian, does our S390X machine get a guaranteed amount of CPU, or does it depend on what else is running on the hardware? I think its a shared system with shared CPUs. Can you check the steal time in top or proc? If this is far too high we could ask to give you more weight for that VM.
Re: [PATCH] s390x/gdb: Split s390-virt.xml
Am 13.03.23 um 22:16 schrieb Ilya Leoshkevich: TCG emulates ckc, cputm, last_break and prefix, and it's quite useful to have them during debugging. KVM provides those as well so I dont get what you are trying to do here. (I would understand moving out the pfault things into a KVM section) So move them into the new s390-virt-tcg.xml file. pp, pfault_token, pfault_select and pfault_compare are not emulated, so keep them in s390-virt.xml. Signed-off-by: Ilya Leoshkevich --- configs/targets/s390x-linux-user.mak | 2 +- configs/targets/s390x-softmmu.mak| 2 +- gdb-xml/s390-virt-tcg.xml| 14 + gdb-xml/s390-virt.xml| 4 -- target/s390x/gdbstub.c | 82 ++-- 5 files changed, 69 insertions(+), 35 deletions(-) create mode 100644 gdb-xml/s390-virt-tcg.xml diff --git a/configs/targets/s390x-linux-user.mak b/configs/targets/s390x-linux-user.mak index e2978248ede..fb3e2b73be7 100644 --- a/configs/targets/s390x-linux-user.mak +++ b/configs/targets/s390x-linux-user.mak @@ -2,4 +2,4 @@ TARGET_ARCH=s390x TARGET_SYSTBL_ABI=common,64 TARGET_SYSTBL=syscall.tbl TARGET_BIG_ENDIAN=y -TARGET_XML_FILES= gdb-xml/s390x-core64.xml gdb-xml/s390-acr.xml gdb-xml/s390-fpr.xml gdb-xml/s390-vx.xml gdb-xml/s390-cr.xml gdb-xml/s390-virt.xml gdb-xml/s390-gs.xml +TARGET_XML_FILES= gdb-xml/s390x-core64.xml gdb-xml/s390-acr.xml gdb-xml/s390-fpr.xml gdb-xml/s390-vx.xml gdb-xml/s390-cr.xml gdb-xml/s390-virt.xml gdb-xml/s390-virt-tcg.xml gdb-xml/s390-gs.xml diff --git a/configs/targets/s390x-softmmu.mak b/configs/targets/s390x-softmmu.mak index 258b4cf3582..554330d7c85 100644 --- a/configs/targets/s390x-softmmu.mak +++ b/configs/targets/s390x-softmmu.mak @@ -1,4 +1,4 @@ TARGET_ARCH=s390x TARGET_BIG_ENDIAN=y TARGET_SUPPORTS_MTTCG=y -TARGET_XML_FILES= gdb-xml/s390x-core64.xml gdb-xml/s390-acr.xml gdb-xml/s390-fpr.xml gdb-xml/s390-vx.xml gdb-xml/s390-cr.xml gdb-xml/s390-virt.xml gdb-xml/s390-gs.xml +TARGET_XML_FILES= gdb-xml/s390x-core64.xml gdb-xml/s390-acr.xml gdb-xml/s390-fpr.xml gdb-xml/s390-vx.xml 
gdb-xml/s390-cr.xml gdb-xml/s390-virt.xml gdb-xml/s390-virt-tcg.xml gdb-xml/s390-gs.xml diff --git a/gdb-xml/s390-virt-tcg.xml b/gdb-xml/s390-virt-tcg.xml new file mode 100644 index 000..0f77c9b48c6 --- /dev/null +++ b/gdb-xml/s390-virt-tcg.xml @@ -0,0 +1,14 @@ + + + + + + + + + + diff --git a/gdb-xml/s390-virt.xml b/gdb-xml/s390-virt.xml index e2e9a7ad3cc..a79c0307682 100644 --- a/gdb-xml/s390-virt.xml +++ b/gdb-xml/s390-virt.xml @@ -7,10 +7,6 @@ - - - - diff --git a/target/s390x/gdbstub.c b/target/s390x/gdbstub.c index a5d69d0e0bc..111b695dc85 100644 --- a/target/s390x/gdbstub.c +++ b/target/s390x/gdbstub.c @@ -200,61 +200,81 @@ static int cpu_write_c_reg(CPUS390XState *env, uint8_t *mem_buf, int n) } } -/* the values represent the positions in s390-virt.xml */ -#define S390_VIRT_CKC_REGNUM0 -#define S390_VIRT_CPUTM_REGNUM 1 -#define S390_VIRT_BEA_REGNUM2 -#define S390_VIRT_PREFIX_REGNUM 3 -#define S390_VIRT_PP_REGNUM 4 -#define S390_VIRT_PFT_REGNUM5 -#define S390_VIRT_PFS_REGNUM6 -#define S390_VIRT_PFC_REGNUM7 -/* total number of registers in s390-virt.xml */ -#define S390_NUM_VIRT_REGS 8 +/* the values represent the positions in s390-virt-tcg.xml */ +#define S390_VIRT_TCG_CKC_REGNUM0 +#define S390_VIRT_TCG_CPUTM_REGNUM 1 +#define S390_VIRT_TCG_BEA_REGNUM2 +#define S390_VIRT_TCG_PREFIX_REGNUM 3 +/* total number of registers in s390-virt-tcg.xml */ +#define S390_NUM_VIRT_TCG_REGS 4 -static int cpu_read_virt_reg(CPUS390XState *env, GByteArray *mem_buf, int n) +static int cpu_read_virt_tcg_reg(CPUS390XState *env, GByteArray *mem_buf, int n) { switch (n) { -case S390_VIRT_CKC_REGNUM: +case S390_VIRT_TCG_CKC_REGNUM: return gdb_get_regl(mem_buf, env->ckc); -case S390_VIRT_CPUTM_REGNUM: +case S390_VIRT_TCG_CPUTM_REGNUM: return gdb_get_regl(mem_buf, env->cputm); -case S390_VIRT_BEA_REGNUM: +case S390_VIRT_TCG_BEA_REGNUM: return gdb_get_regl(mem_buf, env->gbea); -case S390_VIRT_PREFIX_REGNUM: +case S390_VIRT_TCG_PREFIX_REGNUM: return gdb_get_regl(mem_buf, env->psa); 
-case S390_VIRT_PP_REGNUM: -return gdb_get_regl(mem_buf, env->pp); -case S390_VIRT_PFT_REGNUM: -return gdb_get_regl(mem_buf, env->pfault_token); -case S390_VIRT_PFS_REGNUM: -return gdb_get_regl(mem_buf, env->pfault_select); -case S390_VIRT_PFC_REGNUM: -return gdb_get_regl(mem_buf, env->pfault_compare); default: return 0; } } -static int cpu_write_virt_reg(CPUS390XState *env, uint8_t *mem_buf, int n) +static int cpu_write_virt_tcg_reg(CPUS390XState *env, uint8_t *mem_buf, int n) { switch (n) { -case S390_VIRT_CKC_REGNUM: +case S390_VIRT_TCG_CKC_REGNUM: env->ckc = ldtul_p(mem_buf);
Re: Taking some time
Thank you Mark for all your work! I think we met first at a zExpo or SHARE (around 2005?) and you have always been a person that did a lot of work helping others. Very much appreciated! All the best for your future life :-) Christian Am 15.02.23 um 16:55 schrieb Mark Post: All, I'm retiring from SUSE. Today is my last day on the job. I intend to stay involved with Linux on the mainframe, including this mailing list, but I'm not sure just how involved. I'll know better after I've had some time to adjust to my new life. When I left EDS in 2007, the environment at the company had become very toxic, with layoffs every quarter of good people with decades of experience. Joining Novell/SUSE at that time was the best possible thing that could have happened to me professionally. It's been very satisfying to contribute to the creation of SUSE's products. It has literally been a dream job for me. I'm thankful to the many people that have helped me over the years. Without them, I might never have been able to make working with mainframe Linux my career for 20+ years. Mark Post -- For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www2.marist.edu/htbin/wlvindex?LINUX-390
Re: [PATCH] Makefile: allow 'make uninstall'
Am 10.01.23 um 16:13 schrieb Peter Maydell: Meson supports an "uninstall", so we can easily allow it to work by not suppressing the forwarding of it from Make to meson. We originally suppressed this because Meson's 'uninstall' has a hole in it: it will remove everything that is installed by a mechanism meson knows about, but not things installed by "custom install scripts", and there is no "custom uninstall script" mechanism. For QEMU, though, the only thing that was being installed by a custom install script was the LC_MESSAGES files handled by Meson's i18n module, and that code was fixed in Meson commit 487d45c1e5bfff0fbdb4, which is present in Meson 0.60.0 and later. Since we already require a Meson version newer than that, we're now safe to enable 'uninstall', as it will now correctly uninstall everything that was installed. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/109 Signed-off-by: Peter Maydell Always missed that functionality. Thanks.
Re: [PATCH v13 0/7] s390x: CPU Topology
Am 13.12.22 um 14:50 schrieb Christian Borntraeger: Am 12.12.22 um 11:01 schrieb Pierre Morel: On 12/9/22 15:45, Cédric Le Goater wrote: On 12/8/22 10:44, Pierre Morel wrote: Hi, Implementation discussions == CPU models -- Since the S390_FEAT_CONFIGURATION_TOPOLOGY is already in the CPU model for old QEMU we could not activate it as usual from KVM but needed a KVM capability: KVM_CAP_S390_CPU_TOPOLOGY. Checking and enabling this capability enables S390_FEAT_CONFIGURATION_TOPOLOGY. Migration - Once the S390_FEAT_CONFIGURATION_TOPOLOGY is enabled in the source host the STFL(11) is provided to the guest. Since the feature is already in the CPU model of older QEMU, a migration from a new QEMU enabling the topology to an old QEMU will keep STFL(11) enabled making the guest get an exception for illegal operation as soon as it uses the PTF instruction. A VMState keeping track of the S390_FEAT_CONFIGURATION_TOPOLOGY allows to forbid the migration in such a case. Note that the VMState will be used to hold information on the topology once we implement topology change for a running guest. Topology Until we introduce bookss and drawers, polarization and dedication the topology is kept very simple and is specified uniquely by the core_id of the vCPU which is also the vCPU address. Testing === To use the QEMU patches, you will need Linux V6-rc1 or newer, or use the following Linux mainline patches: f5ecfee94493 2022-07-20 KVM: s390: resetting the Topology-Change-Report 24fe0195bc19 2022-07-20 KVM: s390: guest support for topology function 0130337ec45b 2022-07-20 KVM: s390: Cleanup ipte lock access and SIIF fac.. Currently this code is for KVM only, I have no idea if it is interesting to provide a TCG patch. If ever it will be done in another series. Documentation = To have a better understanding of the S390x CPU Topology and its implementation in QEMU you can have a look at the documentation in the last patch of this series. 
The admin will want to match the host and the guest topology, taking into account that the guest does not recognize multithreading. Consequently, two vCPU assigned to threads of the same real CPU should preferably be assigned to the same socket of the guest machine. Future developments === Two series are actively prepared: - Adding drawers, book, polarization and dedication to the vCPU. - changing the topology with a running guest Since we have time before the next QEMU 8.0 release, could you please send the whole patchset ? Having the full picture would help in taking decisions for downstream also. I am still uncertain about the usefulness of S390Topology object because, as for now, the state can be computed on the fly, the reset can be handled at the machine level directly under s390_machine_reset() and so could migration if the machine had a vmstate (not the case today but quite easy to add). So before merging anything that could be difficult to maintain and/or backport, I would prefer to see it all ! The goal of dedicating an object for topology was to ease the maintenance and portability by using the QEMU object framework. If on contrary it is a problem for maintenance or backport we surely better not use it. Any other opinion? I agree with Cedric. There is no point in a topology object. The state is calculated on the fly depending on the command line. This would change if we ever implement the PTF horizontal/vertical state. But this can then be another CPU state. So I think we could go forward with this patch as a simple patch set that allows to sets a static topology. This makes sense on its own, e.g. if you plan to pin your vCPUs to given host CPUs. Dynamic changes do come with CPU hotplug, not sure what libvirt does with new CPUs in that case during migration. I assume those are re-generated on the target with whatever topology was created on the source. So I guess this will just work, but it would be good if we could test that. 
A more fine-grained topology (drawer, book) could be added later or upfront. It does require common code and libvirt enhancements, though. Now I have discussed with Pierre and there IS a state that we want to migrate. The topology change state is a guest-wide bit that might still be set at migration time, e.g. when the topology is changed (so the bit is set) but the guest has not yet been started. Two options: 1. add a vmstate with that bit and migrate the state, or 2. always set the topology change bit after migration.
Re: [PATCH v13 0/7] s390x: CPU Topology
Am 13.12.22 um 14:57 schrieb Janis Schoetterl-Glausch: On Tue, 2022-12-13 at 14:41 +0100, Christian Borntraeger wrote: Am 12.12.22 um 11:17 schrieb Thomas Huth: On 12/12/2022 11.10, Pierre Morel wrote: On 12/12/22 10:07, Thomas Huth wrote: On 12/12/2022 09.51, Pierre Morel wrote: On 12/9/22 14:32, Thomas Huth wrote: On 08/12/2022 10.44, Pierre Morel wrote: Hi, Implementation discussions == CPU models -- Since the S390_FEAT_CONFIGURATION_TOPOLOGY is already in the CPU model for old QEMU we could not activate it as usual from KVM but needed a KVM capability: KVM_CAP_S390_CPU_TOPOLOGY. Checking and enabling this capability enables S390_FEAT_CONFIGURATION_TOPOLOGY. Migration - Once the S390_FEAT_CONFIGURATION_TOPOLOGY is enabled in the source host the STFL(11) is provided to the guest. Since the feature is already in the CPU model of older QEMU, a migration from a new QEMU enabling the topology to an old QEMU will keep STFL(11) enabled making the guest get an exception for illegal operation as soon as it uses the PTF instruction. I now thought that it is not possible to enable "ctop" on older QEMUs since they don't enable the KVM capability? ... or is it still somehow possible? What did I miss? Thomas Enabling ctop with ctop=on on old QEMU is not possible, this is right. But, if STFL(11) is enabled in the source KVM by a new QEMU, I can see that even with -ctop=off the STFL(11) is migrated to the destination. This does not make sense. The cpu model and stfle values are not migrated. This is re-created during startup depending on the command line parameters of -cpu. That's why source and host have the same command lines for -cpu. And STFLE.11 must not be set on the SOURCE if ctop is off. What about linux? I didn't look too thoroughly at it, but it caches the stfle bits, doesn't it? So if the migration succeeds, even though it should not, it will look to the guest like the facility is enabled.
That is true, but the migration should not succeed in that case unless you use an unsafe way of migrating. And the cpu model was added exactly to block those unsafe ways. One thing: -cpu host is unsafe (host-passthrough in libvirt speak). Either use host-model or a fully specified model like z14.2,ctop=on
Re: [PATCH v13 0/7] s390x: CPU Topology
Am 12.12.22 um 11:01 schrieb Pierre Morel: On 12/9/22 15:45, Cédric Le Goater wrote: On 12/8/22 10:44, Pierre Morel wrote: Hi, Implementation discussions == CPU models -- Since the S390_FEAT_CONFIGURATION_TOPOLOGY is already in the CPU model for old QEMU we could not activate it as usual from KVM but needed a KVM capability: KVM_CAP_S390_CPU_TOPOLOGY. Checking and enabling this capability enables S390_FEAT_CONFIGURATION_TOPOLOGY. Migration - Once the S390_FEAT_CONFIGURATION_TOPOLOGY is enabled in the source host the STFL(11) is provided to the guest. Since the feature is already in the CPU model of older QEMU, a migration from a new QEMU enabling the topology to an old QEMU will keep STFL(11) enabled making the guest get an exception for illegal operation as soon as it uses the PTF instruction. A VMState keeping track of the S390_FEAT_CONFIGURATION_TOPOLOGY allows to forbid the migration in such a case. Note that the VMState will be used to hold information on the topology once we implement topology change for a running guest. Topology Until we introduce bookss and drawers, polarization and dedication the topology is kept very simple and is specified uniquely by the core_id of the vCPU which is also the vCPU address. Testing === To use the QEMU patches, you will need Linux V6-rc1 or newer, or use the following Linux mainline patches: f5ecfee94493 2022-07-20 KVM: s390: resetting the Topology-Change-Report 24fe0195bc19 2022-07-20 KVM: s390: guest support for topology function 0130337ec45b 2022-07-20 KVM: s390: Cleanup ipte lock access and SIIF fac.. Currently this code is for KVM only, I have no idea if it is interesting to provide a TCG patch. If ever it will be done in another series. Documentation = To have a better understanding of the S390x CPU Topology and its implementation in QEMU you can have a look at the documentation in the last patch of this series. 
The admin will want to match the host and the guest topology, taking into account that the guest does not recognize multithreading. Consequently, two vCPU assigned to threads of the same real CPU should preferably be assigned to the same socket of the guest machine. Future developments === Two series are actively prepared: - Adding drawers, book, polarization and dedication to the vCPU. - changing the topology with a running guest Since we have time before the next QEMU 8.0 release, could you please send the whole patchset ? Having the full picture would help in taking decisions for downstream also. I am still uncertain about the usefulness of S390Topology object because, as for now, the state can be computed on the fly, the reset can be handled at the machine level directly under s390_machine_reset() and so could migration if the machine had a vmstate (not the case today but quite easy to add). So before merging anything that could be difficult to maintain and/or backport, I would prefer to see it all ! The goal of dedicating an object for topology was to ease the maintenance and portability by using the QEMU object framework. If on contrary it is a problem for maintenance or backport we surely better not use it. Any other opinion? I agree with Cedric. There is no point in a topology object. The state is calculated on the fly depending on the command line. This would change if we ever implement the PTF horizontal/vertical state. But this can then be another CPU state. So I think we could go forward with this patch as a simple patch set that allows to sets a static topology. This makes sense on its own, e.g. if you plan to pin your vCPUs to given host CPUs. Dynamic changes do come with CPU hotplug, not sure what libvirt does with new CPUs in that case during migration. I assume those are re-generated on the target with whatever topology was created on the source. So I guess this will just work, but it would be good if we could test that. 
A more fine-grained topology (drawer, book) could be added later or upfront. It does require common code and libvirt enhancements, though.
Re: [PATCH v13 0/7] s390x: CPU Topology
Am 12.12.22 um 11:17 schrieb Thomas Huth: On 12/12/2022 11.10, Pierre Morel wrote: On 12/12/22 10:07, Thomas Huth wrote: On 12/12/2022 09.51, Pierre Morel wrote: On 12/9/22 14:32, Thomas Huth wrote: On 08/12/2022 10.44, Pierre Morel wrote:

Hi,

Implementation discussions
==

CPU models
--

Since S390_FEAT_CONFIGURATION_TOPOLOGY is already in the CPU model for old QEMU, we could not activate it as usual from KVM but needed a KVM capability: KVM_CAP_S390_CPU_TOPOLOGY. Checking and enabling this capability enables S390_FEAT_CONFIGURATION_TOPOLOGY.

Migration
-

Once S390_FEAT_CONFIGURATION_TOPOLOGY is enabled on the source host, STFL(11) is provided to the guest. Since the feature is already in the CPU model of older QEMU, a migration from a new QEMU enabling the topology to an old QEMU will keep STFL(11) enabled, making the guest get an exception for an illegal operation as soon as it uses the PTF instruction.

I now thought that it is not possible to enable "ctop" on older QEMUs since they don't enable the KVM capability? ... or is it still somehow possible? What did I miss? Thomas

Enabling ctop with ctop=on on old QEMU is not possible, this is right. But if STFL(11) is enabled in the source KVM by a new QEMU, I can see that even with ctop=off the STFL(11) is migrated to the destination.

This does not make sense. The CPU model and STFLE values are not migrated; they are re-created during startup depending on the command line parameters of -cpu. That's why source and destination have the same command lines for -cpu. And STFLE.11 must not be set on the SOURCE if ctop is off.

Is this with the "host" CPU model or another one? And did you explicitly specify "ctop=off" at the command line, or are you just using the default setting by not specifying it?

With explicit cpumodel and using ctop=off like in

sudo /usr/local/bin/qemu-system-s390x_master \
 -m 512M \
 -enable-kvm -smp 4,sockets=4,cores=2,maxcpus=8 \
 -cpu z14,ctop=off \
 -machine s390-ccw-virtio-7.2,accel=kvm \
 ...

Ok ...
that sounds like a bug somewhere in your patches or in the kernel code ... the guest should never see STFL bit 11 if ctop=off, should it? Correct. If ctop=off then QEMU should disable STFLE.11 for the CPU model.
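One way to check what the guest actually observes is the facility list that Linux exposes in /proc/cpuinfo on s390x: if STFLE bit 11 shows up there despite ctop=off, the bit leaked through. A minimal parser sketch, run here on a hypothetical sample line so it works on any host (illustrative only, not part of the patches):

```python
def has_facility(cpuinfo_text: str, bit: int) -> bool:
    """Return True if the given STFLE bit appears in a /proc/cpuinfo
    'facilities' line (s390x format: 'facilities : 0 1 2 ...')."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("facilities"):
            bits = {int(b) for b in line.split(":", 1)[1].split()}
            return bit in bits
    return False

# Hypothetical sample; on a real guest, read open('/proc/cpuinfo').read()
sample = "facilities : 0 1 2 3 7 9 11 16 17"
print(has_facility(sample, 11))  # -> True (the sample line contains bit 11)
```

With ctop=off on both ends of the migration, bit 11 should never be reported; seeing True in that setup would reproduce the bug discussed here.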
Re: [PATCH v13 4/7] s390x/cpu_topology: CPU topology migration
Am 08.12.22 um 10:44 schrieb Pierre Morel: The migration can only take place if source and destination both use, or both do not use, the CPU topology facility. We indicate a change in topology during migration postload for the case the topology changed between source and destination.

I don't get why we need this. If the target QEMU has topology, it should already create this according to the configuration. Why do we need a trigger?

Signed-off-by: Pierre Morel
---
 target/s390x/cpu.h        |  1 +
 hw/s390x/cpu-topology.c   | 49 +++
 target/s390x/cpu-sysemu.c |  8 +++
 3 files changed, 58 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index bc1a7de932..284c708a6c 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -854,6 +854,7 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
 int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
                                 int vq, bool assign);
 void s390_cpu_topology_reset(void);
+int s390_cpu_topology_mtcr_set(void);
 #ifndef CONFIG_USER_ONLY
 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
 #else
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index f54afcf550..8a2fe041d4 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -18,6 +18,7 @@
 #include "target/s390x/cpu.h"
 #include "hw/s390x/s390-virtio-ccw.h"
 #include "hw/s390x/cpu-topology.h"
+#include "migration/vmstate.h"

 /**
  * s390_has_topology
@@ -129,6 +130,53 @@ static void s390_topology_reset(DeviceState *dev)
     s390_cpu_topology_reset();
 }

+/**
+ * cpu_topology_postload
+ * @opaque: a pointer to the S390Topology
+ * @version_id: version identifier
+ *
+ * We check that the topology is used or is not used
+ * on both sides identically.
+ *
+ * If the topology is in use we set the Modified Topology Change Report
+ * on the destination host.
+ */
+static int cpu_topology_postload(void *opaque, int version_id)
+{
+    int ret;
+
+    /* We do not support CPU Topology, all is good */
+    if (!s390_has_topology()) {
+        return 0;
+    }
+
+    /* We support CPU Topology, set the MTCR unconditionally */
+    ret = s390_cpu_topology_mtcr_set();
+    if (ret) {
+        error_report("Failed to set MTCR: %s", strerror(-ret));
+    }
+    return ret;
+}
+
+/**
+ * cpu_topology_needed:
+ * @opaque: The pointer to the S390Topology
+ *
+ * We always need to know if source and destination use the topology.
+ */
+static bool cpu_topology_needed(void *opaque)
+{
+    return s390_has_topology();
+}
+
+const VMStateDescription vmstate_cpu_topology = {
+    .name = "cpu_topology",
+    .version_id = 1,
+    .post_load = cpu_topology_postload,
+    .minimum_version_id = 1,
+    .needed = cpu_topology_needed,
+};
+
 /**
  * topology_class_init:
  * @oc: Object class
@@ -145,6 +193,7 @@ static void topology_class_init(ObjectClass *oc, void *data)
     device_class_set_props(dc, s390_topology_properties);
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     dc->reset = s390_topology_reset;
+    dc->vmsd = &vmstate_cpu_topology;
 }

 static const TypeInfo cpu_topology_info = {
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index e27864c5f5..a8e3e6219d 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -319,3 +319,11 @@ void s390_cpu_topology_reset(void)
         }
     }
 }
+
+int s390_cpu_topology_mtcr_set(void)
+{
+    if (kvm_enabled()) {
+        return kvm_s390_topology_set_mtcr(1);
+    }
+    return -ENOENT;
+}
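The intended effect of the vmstate subsection can be sketched with a toy model (illustrative Python, not QEMU code): the subsection is only emitted when the source uses the topology facility, a destination that does not know the subsection aborts the migration, and a topology-aware destination sets MTCR in post_load so the guest re-reads the topology after migration:

```python
def migrate(source_has_topology: bool, dest_has_topology: bool) -> str:
    """Toy model of subsection-gated migration (not QEMU's real machinery)."""
    # cpu_topology_needed(): the subsection is only sent when the
    # source actually uses the topology facility.
    subsection_sent = source_has_topology
    if not subsection_sent:
        return "ok"            # nothing topology-related to check
    if not dest_has_topology:
        return "fail"          # unknown subsection -> migration aborts
    # cpu_topology_postload(): the destination sets MTCR unconditionally
    # so the guest notices a (possibly changed) topology.
    return "ok, MTCR set"

print(migrate(True, False))    # -> 'fail'
print(migrate(True, True))     # -> 'ok, MTCR set'
```

This also illustrates the question raised in the reply: the "trigger" is not about recreating the topology (the command line does that) but about forcing the guest to re-query it via MTCR.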
Re: [PATCH 1/1] qemu-iotests/stream-under-throttle: do not shutdown QEMU
Am 07.12.22 um 14:14 schrieb Christian Borntraeger: Without a kernel or boot disk, a QEMU on s390 will exit (usually with a disabled wait state). This breaks the stream-under-throttle test case. Do not exit qemu if on s390.

Signed-off-by: Christian Borntraeger
---
 tests/qemu-iotests/tests/stream-under-throttle | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/tests/stream-under-throttle b/tests/qemu-iotests/tests/stream-under-throttle
index 8d2d9e16840d..c24dfbcaa2f2 100755
--- a/tests/qemu-iotests/tests/stream-under-throttle
+++ b/tests/qemu-iotests/tests/stream-under-throttle
@@ -88,6 +88,8 @@ class TestStreamWithThrottle(iotests.QMPTestCase):
                                  'x-iops-total=1,x-bps-total=104857600')
         self.vm.add_blockdev(self.vm.qmp_to_opts(blockdev))
         self.vm.add_device('virtio-blk,iothread=iothr0,drive=throttled-node')
+        if iotests.qemu_default_machine == 's390-ccw-virtio':
+            self.vm.add_args('-no-shutdown')
         self.vm.launch()

     def tearDown(self) -> None:

Ping. I guess this will come after the release?
Re: z/OS equivalent to z/VM's Link Aggregation
Am 08.12.22 um 17:44 schrieb Jim Elliott: I am sure that z/OS must provide a function equivalent to z/VM's Link Aggregation (where you can share physical OSA ports in a VSwitch), but lots of Google searches are not helping me. Any suggestions?

Link aggregation is on Layer 2 (Ethernet) and used by Linux/z/VM, while z/OS talks to OSA in Layer 3. So I guess the equivalent would be VIPA.

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: [PATCH v3 02/13] tcg/s390x: Remove TCG_REG_TB
Am 07.12.22 um 21:40 schrieb Ilya Leoshkevich: On Wed, 2022-12-07 at 08:55 -0600, Richard Henderson wrote: On 12/7/22 01:45, Thomas Huth wrote: On 06/12/2022 23.22, Richard Henderson wrote: On 12/6/22 13:29, Ilya Leoshkevich wrote:

This change doesn't seem to affect that, but what is the minimum supported s390x qemu host? z900?

Possibly z990, if I'm reading the gcc processor_flags_table[] correctly; the long-displacement facility is definitely a minimum. We probably should revisit what the minimum for TCG should be, assert those features at startup, and drop the corresponding runtime tests.

If we consider the official IBM support statement: https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Mainframe%20Life%20Cycle%20History%20V2.10%20-%20Sept%2013%202022_1.pdf ... that would mean that the z10 and all older machines are not supported anymore.

Thanks for the pointer. It would appear that z114 exits support at the end of this month, which would leave z12 as the minimum supported CPU. Even assuming z196 gets us extended-immediate, general-insn-extension, load-on-condition, and distinct-operands, which are all quite important to TCG, and constitute almost all of the current runtime checks. The other metric would be matching the set of supported CPUs from the set of supported OS distributions, but I would be ready to believe z196 is below the minimum there too. r~

I think it should be safe to raise the minimum required hardware for TCG to z196: We recently raised the minimum hardware for the Linux kernel to be z10, so that would be super-safe, but z196 is certainly a sane minimum.
* The oldest supported RHEL is v7, it requires z196:
  https://access.redhat.com/product-life-cycles/
  https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/installation_guide/chap-installation-planning-s390
* The oldest supported SLES is v12, it requires z196:
  https://www.suse.com/de-de/lifecycle/
  https://documentation.suse.com/sles/12-SP5/html/SLES-all/cha-zseries.html
* The oldest supported Ubuntu is v16.04, it requires zEC12+:
  https://wiki.ubuntu.com/S390X

Best regards,
Ilya
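The conclusion follows from taking the weakest requirement across the still-supported distros: any of them must keep running under TCG, so the oldest distro minimum wins. A throwaway sketch of that reasoning (the generation ordering and table are transcribed from this thread, not from an official source):

```python
# Machine generations in release order (illustrative subset).
GEN_ORDER = ["z10", "z114", "z196", "zEC12", "z13", "z14"]

# Minimum machine required by the oldest supported release of each
# distro, as listed in the thread.
DISTRO_MINIMUM = {
    "RHEL 7": "z196",
    "SLES 12": "z196",
    "Ubuntu 16.04": "zEC12",
}

def tcg_safe_minimum(mins: dict) -> str:
    # TCG may only assume hardware that *every* supported distro still
    # runs on, i.e. the oldest (smallest) of the distro minimums.
    return min(mins.values(), key=GEN_ORDER.index)

print(tcg_safe_minimum(DISTRO_MINIMUM))  # -> z196
```

With this table the answer is z196, matching the proposal above to raise the TCG minimum to z196.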
Re: [PATCH 1/1] qemu-iotests/stream-under-throttle: do not shutdown QEMU
Am 07.12.22 um 14:23 schrieb Thomas Huth: On 07/12/2022 14.14, Christian Borntraeger wrote: Without a kernel or boot disk, a QEMU on s390 will exit (usually with a disabled wait state). This breaks the stream-under-throttle test case. Do not exit qemu if on s390.

Signed-off-by: Christian Borntraeger
---
 tests/qemu-iotests/tests/stream-under-throttle | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/tests/stream-under-throttle b/tests/qemu-iotests/tests/stream-under-throttle
index 8d2d9e16840d..c24dfbcaa2f2 100755
--- a/tests/qemu-iotests/tests/stream-under-throttle
+++ b/tests/qemu-iotests/tests/stream-under-throttle
@@ -88,6 +88,8 @@ class TestStreamWithThrottle(iotests.QMPTestCase):
                                  'x-iops-total=1,x-bps-total=104857600')
         self.vm.add_blockdev(self.vm.qmp_to_opts(blockdev))
         self.vm.add_device('virtio-blk,iothread=iothr0,drive=throttled-node')
+        if iotests.qemu_default_machine == 's390-ccw-virtio':
+            self.vm.add_args('-no-shutdown')
         self.vm.launch()

I guess you could even add that unconditionally for all architectures?

Maybe. It might even fix other architectures with the same problem, but I don't know if that's the case. So we can start with this fix and then remove the if at a later point in time if necessary/useful. Anyway:

Reviewed-by: Thomas Huth
[PATCH 1/1] qemu-iotests/stream-under-throttle: do not shutdown QEMU
Without a kernel or boot disk a QEMU on s390 will exit (usually with a disabled wait state). This breaks the stream-under-throttle test case. Do not exit qemu if on s390.

Signed-off-by: Christian Borntraeger
---
 tests/qemu-iotests/tests/stream-under-throttle | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/tests/stream-under-throttle b/tests/qemu-iotests/tests/stream-under-throttle
index 8d2d9e16840d..c24dfbcaa2f2 100755
--- a/tests/qemu-iotests/tests/stream-under-throttle
+++ b/tests/qemu-iotests/tests/stream-under-throttle
@@ -88,6 +88,8 @@ class TestStreamWithThrottle(iotests.QMPTestCase):
                                  'x-iops-total=1,x-bps-total=104857600')
         self.vm.add_blockdev(self.vm.qmp_to_opts(blockdev))
         self.vm.add_device('virtio-blk,iothread=iothr0,drive=throttled-node')
+        if iotests.qemu_default_machine == 's390-ccw-virtio':
+            self.vm.add_args('-no-shutdown')
         self.vm.launch()

     def tearDown(self) -> None:
--
2.38.1
Re: [PATCH v2] tests/stream-under-throttle: New test
Am 07.12.22 um 13:56 schrieb Christian Borntraeger: Am 07.12.22 um 11:31 schrieb Christian Borntraeger: Am 14.11.22 um 10:52 schrieb Hanna Reitz: Test streaming a base image into the top image underneath two throttle nodes. This was reported to make qemu 7.1 hang (https://gitlab.com/qemu-project/qemu/-/issues/1215), so this serves as a regression test. Signed-off-by: Hanna Reitz --- Based-on: <20221107151321.211175-1-hre...@redhat.com> v1: https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00368.html v2: - Replace `asyncio.exceptions.TimeoutError` by `asyncio.TimeoutError`: Stefan reported that the CI does not recognize the former: https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00424.html As far as I understand, the latter was basically moved to become the former in Python 3.11, and an alias remains, so both are basically equivalent. I only have 3.10, though, where the documentation says that both are different, even though using either seems to work fine (i.e. both catch the timeout there). Not sure about previous versions, but the CI seems pretty certain about not knowing `asyncio.exceptions.TimeoutError`, so use `asyncio.TimeoutError` instead. (Even though that is deprecated in 3.11, but this is not the first place in the tree to use it, so it should not be too bad.) --- .../qemu-iotests/tests/stream-under-throttle | 121 ++ .../tests/stream-under-throttle.out | 5 + 2 files changed, 126 insertions(+) create mode 100755 tests/qemu-iotests/tests/stream-under-throttle create mode 100644 tests/qemu-iotests/tests/stream-under-throttle.out As a heads up, I do get the following on s390. I have not yet looked into that: +EE +== +ERROR: test_stream (__main__.TestStreamWithThrottle) +Do a simple stream beneath the two throttle nodes. 
Should complete +-- +Traceback (most recent call last): + File "qemu/tests/qemu-iotests/tests/stream-under-throttle", line 110, in test_stream + self.vm.run_job('stream') + File "qemu/tests/qemu-iotests/iotests.py", line 986, in run_job + result = self.qmp('query-jobs') + File "qemu/python/qemu/machine/machine.py", line 646, in qmp + ret = self._qmp.cmd(cmd, args=qmp_args) + File "qemu/python/qemu/qmp/legacy.py", line 204, in cmd + return self.cmd_obj(qmp_cmd) + File "qemu/python/qemu/qmp/legacy.py", line 184, in cmd_obj + self._qmp._raw(qmp_cmd, assign_id=False), + File "qemu/python/qemu/qmp/protocol.py", line 154, in _wrapper + raise StateError(emsg, proto.runstate, required_state) +qemu.qmp.protocol.StateError: QMPClient is disconnecting. Call disconnect() to return to IDLE state. + +== +ERROR: test_stream (__main__.TestStreamWithThrottle) +Do a simple stream beneath the two throttle nodes. Should complete +-- +Traceback (most recent call last): + File "qemu/python/qemu/machine/machine.py", line 533, in _soft_shutdown + self.qmp('quit') + File "qemu/python/qemu/machine/machine.py", line 646, in qmp + ret = self._qmp.cmd(cmd, args=qmp_args) + File "qemu/python/qemu/qmp/legacy.py", line 204, in cmd + return self.cmd_obj(qmp_cmd) + File "qemu/python/qemu/qmp/legacy.py", line 184, in cmd_obj + self._qmp._raw(qmp_cmd, assign_id=False), + File "qemu/python/qemu/qmp/protocol.py", line 154, in _wrapper + raise StateError(emsg, proto.runstate, required_state) +qemu.qmp.protocol.StateError: QMPClient is disconnecting. Call disconnect() to return to IDLE state. 
+ +During handling of the above exception, another exception occurred: + +Traceback (most recent call last): + File "qemu/python/qemu/machine/machine.py", line 554, in _do_shutdown + self._soft_shutdown(timeout) + File "qemu/python/qemu/machine/machine.py", line 536, in _soft_shutdown + self._close_qmp_connection() + File "qemu/python/qemu/machine/machine.py", line 476, in _close_qmp_connection + self._qmp.close() + File "qemu/python/qemu/qmp/legacy.py", line 277, in close + self._sync( + File "qemu/python/qemu/qmp/legacy.py", line 94, in _sync + return self._aloop.run_until_complete( + File "/usr/lib64/python3.10/asyncio/base_events.py", line 649, in run_until_complete + return future.result() + File "/usr/lib64/python3.10/asyncio/tasks.py", line 408, in wait_for + return await fut + File "qemu/python/qemu/qmp/protocol.py", line 398, in disconnect + await self._wait_disconnect() + File "qemu/python/qemu/qmp/protocol.py", line 710, in _wait_disconnect + await al
Re: [PATCH v2] tests/stream-under-throttle: New test
Am 07.12.22 um 11:31 schrieb Christian Borntraeger: Am 14.11.22 um 10:52 schrieb Hanna Reitz: Test streaming a base image into the top image underneath two throttle nodes. This was reported to make qemu 7.1 hang (https://gitlab.com/qemu-project/qemu/-/issues/1215), so this serves as a regression test. Signed-off-by: Hanna Reitz --- Based-on: <20221107151321.211175-1-hre...@redhat.com> v1: https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00368.html v2: - Replace `asyncio.exceptions.TimeoutError` by `asyncio.TimeoutError`: Stefan reported that the CI does not recognize the former: https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00424.html As far as I understand, the latter was basically moved to become the former in Python 3.11, and an alias remains, so both are basically equivalent. I only have 3.10, though, where the documentation says that both are different, even though using either seems to work fine (i.e. both catch the timeout there). Not sure about previous versions, but the CI seems pretty certain about not knowing `asyncio.exceptions.TimeoutError`, so use `asyncio.TimeoutError` instead. (Even though that is deprecated in 3.11, but this is not the first place in the tree to use it, so it should not be too bad.) --- .../qemu-iotests/tests/stream-under-throttle | 121 ++ .../tests/stream-under-throttle.out | 5 + 2 files changed, 126 insertions(+) create mode 100755 tests/qemu-iotests/tests/stream-under-throttle create mode 100644 tests/qemu-iotests/tests/stream-under-throttle.out As a heads up, I do get the following on s390. I have not yet looked into that: +EE +== +ERROR: test_stream (__main__.TestStreamWithThrottle) +Do a simple stream beneath the two throttle nodes. 
Should complete +-- +Traceback (most recent call last): + File "qemu/tests/qemu-iotests/tests/stream-under-throttle", line 110, in test_stream + self.vm.run_job('stream') + File "qemu/tests/qemu-iotests/iotests.py", line 986, in run_job + result = self.qmp('query-jobs') + File "qemu/python/qemu/machine/machine.py", line 646, in qmp + ret = self._qmp.cmd(cmd, args=qmp_args) + File "qemu/python/qemu/qmp/legacy.py", line 204, in cmd + return self.cmd_obj(qmp_cmd) + File "qemu/python/qemu/qmp/legacy.py", line 184, in cmd_obj + self._qmp._raw(qmp_cmd, assign_id=False), + File "qemu/python/qemu/qmp/protocol.py", line 154, in _wrapper + raise StateError(emsg, proto.runstate, required_state) +qemu.qmp.protocol.StateError: QMPClient is disconnecting. Call disconnect() to return to IDLE state. + +== +ERROR: test_stream (__main__.TestStreamWithThrottle) +Do a simple stream beneath the two throttle nodes. Should complete +-- +Traceback (most recent call last): + File "qemu/python/qemu/machine/machine.py", line 533, in _soft_shutdown + self.qmp('quit') + File "qemu/python/qemu/machine/machine.py", line 646, in qmp + ret = self._qmp.cmd(cmd, args=qmp_args) + File "qemu/python/qemu/qmp/legacy.py", line 204, in cmd + return self.cmd_obj(qmp_cmd) + File "qemu/python/qemu/qmp/legacy.py", line 184, in cmd_obj + self._qmp._raw(qmp_cmd, assign_id=False), + File "qemu/python/qemu/qmp/protocol.py", line 154, in _wrapper + raise StateError(emsg, proto.runstate, required_state) +qemu.qmp.protocol.StateError: QMPClient is disconnecting. Call disconnect() to return to IDLE state. 
+ +During handling of the above exception, another exception occurred: + +Traceback (most recent call last): + File "qemu/python/qemu/machine/machine.py", line 554, in _do_shutdown + self._soft_shutdown(timeout) + File "qemu/python/qemu/machine/machine.py", line 536, in _soft_shutdown + self._close_qmp_connection() + File "qemu/python/qemu/machine/machine.py", line 476, in _close_qmp_connection + self._qmp.close() + File "qemu/python/qemu/qmp/legacy.py", line 277, in close + self._sync( + File "qemu/python/qemu/qmp/legacy.py", line 94, in _sync + return self._aloop.run_until_complete( + File "/usr/lib64/python3.10/asyncio/base_events.py", line 649, in run_until_complete + return future.result() + File "/usr/lib64/python3.10/asyncio/tasks.py", line 408, in wait_for + return await fut + File "qemu/python/qemu/qmp/protocol.py", line 398, in disconnect + await self._wait_disconnect() + File "qemu/python/qemu/qmp/protocol.py", line 710, in _wait_disconnect + await all_defined_tasks # Raise Exceptions from the bottom half. +
Re: [PATCH v2] tests/stream-under-throttle: New test
Am 14.11.22 um 10:52 schrieb Hanna Reitz: Test streaming a base image into the top image underneath two throttle nodes. This was reported to make qemu 7.1 hang (https://gitlab.com/qemu-project/qemu/-/issues/1215), so this serves as a regression test. Signed-off-by: Hanna Reitz --- Based-on: <20221107151321.211175-1-hre...@redhat.com> v1: https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00368.html v2: - Replace `asyncio.exceptions.TimeoutError` by `asyncio.TimeoutError`: Stefan reported that the CI does not recognize the former: https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00424.html As far as I understand, the latter was basically moved to become the former in Python 3.11, and an alias remains, so both are basically equivalent. I only have 3.10, though, where the documentation says that both are different, even though using either seems to work fine (i.e. both catch the timeout there). Not sure about previous versions, but the CI seems pretty certain about not knowing `asyncio.exceptions.TimeoutError`, so use `asyncio.TimeoutError` instead. (Even though that is deprecated in 3.11, but this is not the first place in the tree to use it, so it should not be too bad.) --- .../qemu-iotests/tests/stream-under-throttle | 121 ++ .../tests/stream-under-throttle.out | 5 + 2 files changed, 126 insertions(+) create mode 100755 tests/qemu-iotests/tests/stream-under-throttle create mode 100644 tests/qemu-iotests/tests/stream-under-throttle.out As a heads up, I do get the following on s390. I have not yet looked into that: +EE +== +ERROR: test_stream (__main__.TestStreamWithThrottle) +Do a simple stream beneath the two throttle nodes. 
Should complete +-- +Traceback (most recent call last): + File "qemu/tests/qemu-iotests/tests/stream-under-throttle", line 110, in test_stream +self.vm.run_job('stream') + File "qemu/tests/qemu-iotests/iotests.py", line 986, in run_job +result = self.qmp('query-jobs') + File "qemu/python/qemu/machine/machine.py", line 646, in qmp +ret = self._qmp.cmd(cmd, args=qmp_args) + File "qemu/python/qemu/qmp/legacy.py", line 204, in cmd +return self.cmd_obj(qmp_cmd) + File "qemu/python/qemu/qmp/legacy.py", line 184, in cmd_obj +self._qmp._raw(qmp_cmd, assign_id=False), + File "qemu/python/qemu/qmp/protocol.py", line 154, in _wrapper +raise StateError(emsg, proto.runstate, required_state) +qemu.qmp.protocol.StateError: QMPClient is disconnecting. Call disconnect() to return to IDLE state. + +== +ERROR: test_stream (__main__.TestStreamWithThrottle) +Do a simple stream beneath the two throttle nodes. Should complete +-- +Traceback (most recent call last): + File "qemu/python/qemu/machine/machine.py", line 533, in _soft_shutdown +self.qmp('quit') + File "qemu/python/qemu/machine/machine.py", line 646, in qmp +ret = self._qmp.cmd(cmd, args=qmp_args) + File "qemu/python/qemu/qmp/legacy.py", line 204, in cmd +return self.cmd_obj(qmp_cmd) + File "qemu/python/qemu/qmp/legacy.py", line 184, in cmd_obj +self._qmp._raw(qmp_cmd, assign_id=False), + File "qemu/python/qemu/qmp/protocol.py", line 154, in _wrapper +raise StateError(emsg, proto.runstate, required_state) +qemu.qmp.protocol.StateError: QMPClient is disconnecting. Call disconnect() to return to IDLE state. 
+ +During handling of the above exception, another exception occurred: + +Traceback (most recent call last): + File "qemu/python/qemu/machine/machine.py", line 554, in _do_shutdown +self._soft_shutdown(timeout) + File "qemu/python/qemu/machine/machine.py", line 536, in _soft_shutdown +self._close_qmp_connection() + File "qemu/python/qemu/machine/machine.py", line 476, in _close_qmp_connection +self._qmp.close() + File "qemu/python/qemu/qmp/legacy.py", line 277, in close +self._sync( + File "qemu/python/qemu/qmp/legacy.py", line 94, in _sync +return self._aloop.run_until_complete( + File "/usr/lib64/python3.10/asyncio/base_events.py", line 649, in run_until_complete +return future.result() + File "/usr/lib64/python3.10/asyncio/tasks.py", line 408, in wait_for +return await fut + File "qemu/python/qemu/qmp/protocol.py", line 398, in disconnect +await self._wait_disconnect() + File "qemu/python/qemu/qmp/protocol.py", line 710, in _wait_disconnect +await all_defined_tasks # Raise Exceptions from the bottom half. + File "qemu/python/qemu/qmp/protocol.py", line 861, in _bh_loop_forever +await async_fn() + File "qemu/python/qemu/qmp/protocol.py", line 899, in _bh_recv_message +msg = await self._recv() + File "qemu/python/qemu/qmp/protocol.py", line 1000, in _recv +
Re: qemu iotest 161 and make check
Am 27.10.22 um 07:54 schrieb Christian Borntraeger: [...] diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu index 0f1fecc68e..01bdb05575 100644 --- a/tests/qemu-iotests/common.qemu +++ b/tests/qemu-iotests/common.qemu @@ -388,7 +388,7 @@ _cleanup_qemu() kill -KILL ${QEMU_PID} 2>/dev/null fi if [ -n "${QEMU_PID}" ]; then - wait ${QEMU_PID} 2>/dev/null # silent kill + wait 2>/dev/null # silent kill fi fi And this also helps. Still trying to find out what clone/fork happens here. As new information: the problem only exists on Ubuntu; I cannot reproduce it with Fedora or RHEL. I also changed the kernel; it's not the reason. As soon as I add tracing, the different timing makes the problem go away.
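The change above drops the PID argument from `wait`. A minimal sketch (not the QEMU test itself) of why that matters: POSIX `wait PID` returns 127 when the PID is not a child of the invoking shell, e.g. when the process was forked off inside a subshell, while a bare `wait` simply reaps whatever children the shell actually has. The temp-file plumbing below is only illustration.

```shell
#!/bin/sh
# Process started inside a subshell: its parent is the subshell,
# not this script, so this script cannot `wait` on it by PID.
pidfile=$(mktemp)
( sleep 0.2 & echo $! > "$pidfile" )
QEMU_PID=$(cat "$pidfile"); rm -f "$pidfile"

wait "$QEMU_PID" 2>/dev/null   # not a child of this shell
rc_pid=$?

sleep 0.2 &                    # a real child of this shell
wait                           # bare wait: reaps our own children
rc_bare=$?

echo "wait \$PID -> $rc_pid, bare wait -> $rc_bare"
```

Under this reading, the bare `wait` sidesteps the question of which shell actually forked QEMU, which would fit the clone/fork timing dependence described above.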
Re: [PATCH] virtiofsd: Add `sigreturn` to the seccomp whitelist
Am 29.11.22 um 10:42 schrieb Dr. David Alan Gilbert: * Marc Hartmayer (mhart...@linux.ibm.com) wrote: "Dr. David Alan Gilbert" writes: * Marc Hartmayer (mhart...@linux.ibm.com) wrote: The virtiofsd currently crashes on s390x. This is because of a `sigreturn` system call. See audit log below: type=SECCOMP msg=audit(1669382477.611:459): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 pid=6649 comm="virtiofsd" exe="/usr/libexec/virtiofsd" sig=31 arch=8016 syscall=119 compat=0 ip=0x3fff15f748a code=0x8000AUID="unset" UID="root" GID="root" ARCH=s390x SYSCALL=sigreturn I'm curious; doesn't that mean that some signal is being delivered and you're returning? Which one? code=0x8000 means that the seccomp action SECCOMP_RET_KILL_PROCESS is taken => process is killed by a SIGSYS signal (31) [1]. At least, that’s my understanding of this log message. [1] https://man7.org/linux/man-pages/man2/seccomp.2.html But isn't that the fallout rather than the cause ? i.e. seccomp is sending a SIGSYS because the process used sigreturn, my question is why did the process call sigreturn in the first place - it must have received a signal to return from? Good question. virtiofsd seems to prepare itself for int fuse_set_signal_handlers(struct fuse_session *se) { /* * If we used SIG_IGN instead of the do_nothing function, * then we would be unable to tell if we set SIG_IGN (and * thus should reset to SIG_DFL in fuse_remove_signal_handlers) * or if it was already set to SIG_IGN (and should be left * untouched. */ if (set_one_signal_handler(SIGHUP, exit_handler, 0) == -1 || set_one_signal_handler(SIGINT, exit_handler, 0) == -1 || set_one_signal_handler(SIGTERM, exit_handler, 0) == -1 || set_one_signal_handler(SIGPIPE, do_nothing, 0) == -1) { return -1; } Given that rt_sigreturn was already on the seccomp list it seems to be expected that those handlers are called.
Re: [PATCH] virtiofsd: Add `sigreturn` to the seccomp whitelist
Am 29.11.22 um 10:52 schrieb Christian Borntraeger: Am 29.11.22 um 10:42 schrieb Dr. David Alan Gilbert: * Marc Hartmayer (mhart...@linux.ibm.com) wrote: "Dr. David Alan Gilbert" writes: * Marc Hartmayer (mhart...@linux.ibm.com) wrote: The virtiofsd currently crashes on s390x. This is because of a `sigreturn` system call. See audit log below: type=SECCOMP msg=audit(1669382477.611:459): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 pid=6649 comm="virtiofsd" exe="/usr/libexec/virtiofsd" sig=31 arch=8016 syscall=119 compat=0 ip=0x3fff15f748a code=0x8000AUID="unset" UID="root" GID="root" ARCH=s390x SYSCALL=sigreturn I'm curious; doesn't that mean that some signal is being delivered and you're returning? Which one? code=0x8000 means that the seccomp action SECCOMP_RET_KILL_PROCESS is taken => process is killed by a SIGSYS signal (31) [1]. At least, that’s my understanding of this log message. [1] https://man7.org/linux/man-pages/man2/seccomp.2.html But isn't that the fallout rather than the cause ? i.e. seccomp is sending a SIGSYS because the process used sigreturn, my question is why did the process call sigreturn in the first place - it must have received a signal to return from? Good question. virtiofsd seems to prepare itself for int fuse_set_signal_handlers(struct fuse_session *se) { /* * If we used SIG_IGN instead of the do_nothing function, * then we would be unable to tell if we set SIG_IGN (and * thus should reset to SIG_DFL in fuse_remove_signal_handlers) * or if it was already set to SIG_IGN (and should be left * untouched. */ if (set_one_signal_handler(SIGHUP, exit_handler, 0) == -1 || set_one_signal_handler(SIGINT, exit_handler, 0) == -1 || set_one_signal_handler(SIGTERM, exit_handler, 0) == -1 || set_one_signal_handler(SIGPIPE, do_nothing, 0) == -1) { return -1; } Given that rt_sigreturn was already on the seccomp list it seems to be expected that those handlers are called. 
For me, it seems to happen on shutdown: Stack trace of thread 1: #0 0x03ffc06f348a __kernel_sigreturn (linux-vdso64.so.1 + 0x48a) #1 0x03ffc06f3488 __kernel_sigreturn (linux-vdso64.so.1 + 0x488) #2 0x03ff9af1be96 __GI___futex_abstimed_wait_cancelable64 (libc.so.6 + 0x9be96) #3 0x03ff9af211b4 __pthread_clockjoin_ex (libc.so.6 + 0xa11b4) #4 0x03ff9af2106e pthread_join@GLIBC_2.2 (libc.so.6 + 0xa106e) #5 0x02aa35d2fe36 fv_queue_cleanup_thread (virtiofsd + 0x2fe36) #6 0x02aa35d3152c stop_all_queues (virtiofsd + 0x3152c) #7 0x02aa35d2869c main (virtiofsd + 0x2869c) #8 0x03ff9aeb4872 __libc_start_call_main (libc.so.6 + 0x34872) #9 0x03ff9aeb4950 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x34950) #10 0x02aa35d290a0 .annobin_libvhost_user.c_end.startup (virtiofsd + 0x290a0)
Re: [Virtio-fs] [PATCH] virtiofsd: Add `sigreturn` to the seccomp whitelist
Am 25.11.22 um 17:32 schrieb German Maglione: On Fri, Nov 25, 2022 at 3:40 PM Marc Hartmayer wrote: The virtiofsd currently crashes on s390x. This is because of a `sigreturn` system call. See audit log below: type=SECCOMP msg=audit(1669382477.611:459): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 pid=6649 comm="virtiofsd" exe="/usr/libexec/virtiofsd" sig=31 arch=8016 syscall=119 compat=0 ip=0x3fff15f748a code=0x8000AUID="unset" UID="root" GID="root" ARCH=s390x SYSCALL=sigreturn Signed-off-by: Marc Hartmayer --- tools/virtiofsd/passthrough_seccomp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c index 888295c073de..0033dab4939e 100644 --- a/tools/virtiofsd/passthrough_seccomp.c +++ b/tools/virtiofsd/passthrough_seccomp.c @@ -110,6 +110,7 @@ static const int syscall_allowlist[] = { #endif SCMP_SYS(set_robust_list), SCMP_SYS(setxattr), +SCMP_SYS(sigreturn), SCMP_SYS(symlinkat), SCMP_SYS(syncfs), SCMP_SYS(time), /* Rarely needed, except on static builds */ -- 2.34.1 ___ Virtio-fs mailing list virtio...@redhat.com https://listman.redhat.com/mailman/listinfo/virtio-fs Reviewed-by: German Maglione Should we add this also in the rust version?, I see we don't have it enabled either. this is probably a good idea.
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 21.11.22 um 23:37 schrieb Michael S. Tsirkin: [...] qemu-system-x86_64: ../hw/virtio/vhost-vsock-common.c:203: vhost_vsock_common_pre_save: Assertion `!vhost_dev_is_started(>vhost_dev)' failed. 2022-11-15 16:38:46.096+: shutting down, reason=crashed Alex were you able to replicate? Just curious. Ping?
Re: [PATCH v9 00/10] s390x: CPU Topology
Am 02.09.22 um 09:55 schrieb Pierre Morel: Hi, The implementation of the CPU Topology in QEMU has been drastically modified since the last patch series and the number of LOCs has been greatly reduced. Unnecessary objects have been removed, only a single S390Topology object is created to support migration and reset. Also a documentation has been added to the series. To use these patches, you will need Linux V6-rc1 or newer. Mainline patches needed are: f5ecfee94493 2022-07-20 KVM: s390: resetting the Topology-Change-Report 24fe0195bc19 2022-07-20 KVM: s390: guest support for topology function 0130337ec45b 2022-07-20 KVM: s390: Cleanup ipte lock access and SIIF fac.. Currently this code is for KVM only, I have no idea if it is interesting to provide a TCG patch. If ever it will be done in another series. To have a better understanding of the S390x CPU Topology and its implementation in QEMU you can have a look at the documentation in the last patch. New in this series == s390x/cpus: Make absence of multithreading clear This patch makes clear that CPU-multithreading is not supported in the guest. s390x/cpu topology: core_id sets s390x CPU topology This patch uses the core_id to build the container topology and the placement of the CPU inside the container. s390x/cpu topology: reporting the CPU topology to the guest This patch is based on the fact that the CPU type for guests is always IFL, CPUs are always dedicated and the polarity is always horizontal. This may change in the future. hw/core: introducing drawer and books for s390x s390x/cpu: reporting drawers and books topology to the guest These two patches extend the topology handling to add two new containers levels above sockets: books and drawers. The subject of the last patches is clear enough (I hope). 
Regards, Pierre Pierre Morel (10): s390x/cpus: Make absence of multithreading clear s390x/cpu topology: core_id sets s390x CPU topology s390x/cpu topology: reporting the CPU topology to the guest hw/core: introducing drawer and books for s390x s390x/cpu: reporting drawers and books topology to the guest s390x/cpu_topology: resetting the Topology-Change-Report s390x/cpu_topology: CPU topology migration target/s390x: interception of PTF instruction s390x/cpu_topology: activating CPU topology Do we really need a machine property? As far as I can see, old QEMU cannot activate the ctop facility with old and new kernel unless it enables CAP_S390_CPU_TOPOLOGY. I do get oldqemu -cpu z14,ctop=on qemu-system-s390x: Some features requested in the CPU model are not available in the configuration: ctop With the newer QEMU we can. So maybe we can simply have a topology (and then a cpu model feature) in new QEMUs and not in old ones. The cpu model would then also fence migration from enabled to disabled.
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 15.11.22 um 17:40 schrieb Christian Borntraeger: Am 15.11.22 um 17:05 schrieb Alex Bennée: Christian Borntraeger writes: Am 15.11.22 um 15:31 schrieb Alex Bennée: "Michael S. Tsirkin" writes: On Mon, Nov 14, 2022 at 06:15:30PM +0100, Christian Borntraeger wrote: Am 14.11.22 um 18:10 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:55:09PM +0100, Christian Borntraeger wrote: Am 14.11.22 um 17:37 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian Borntraeger wrote: Am 08.11.22 um 10:23 schrieb Alex Bennée: The previous fix to virtio_device_started revealed a problem in its use by both the core and the device code. The core code should be able to handle the device "starting" while the VM isn't running to handle the restoration of migration state. To solve this dual use introduce a new helper for use by the vhost-user backends who all use it to feed a should_start variable. We can also pick up a change vhost_user_blk_set_status while we are at it which follows the same pattern. Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device) Signed-off-by: Alex Bennée Cc: "Michael S. Tsirkin" Hmmm, is this commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35 Author: Alex Bennée AuthorDate: Mon Nov 7 12:14:07 2022 + Commit: Michael S. Tsirkin CommitDate: Mon Nov 7 14:08:18 2022 -0500 hw/virtio: introduce virtio_device_should_start and older version? This is what got merged: https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org This patch was sent after I merged the RFC. I think the only difference is the commit log but I might be missing something. This does not seem to fix the regression that I have reported. This was applied on top of 9f6bcfd99f which IIUC does, right? 
QEMU master still fails for me for suspend/resume to disk: #0 0x03ff8e3980a6 in __pthread_kill_implementation () at /lib64/libc.so.6 #1 0x03ff8e348580 in raise () at /lib64/libc.so.6 #2 0x03ff8e32b5c0 in abort () at /lib64/libc.so.6 #3 0x03ff8e3409da in __assert_fail_base () at /lib64/libc.so.6 #4 0x03ff8e340a4e in () at /lib64/libc.so.6 #5 0x02aa1ffa8966 in vhost_vsock_common_pre_save (opaque=) at ../hw/virtio/vhost-vsock-common.c:203 #6 0x02aa1fe5e0ee in vmstate_save_state_v (f=f@entry=0x2aa21bdc170, vmsd=0x2aa204ac5f0 , opaque=0x2aa21bac9f8, vmdesc=vmdesc@entry=0x3fddc08eb30, version_id=version_id@entry=0) at ../migration/vmstate.c:329 #7 0x02aa1fe5ebf8 in vmstate_save_state (f=f@entry=0x2aa21bdc170, vmsd=, opaque=, vmdesc_id=vmdesc_id@entry=0x3fddc08eb30) at ../migration/vmstate.c:317 #8 0x02aa1fe75bd0 in vmstate_save (f=f@entry=0x2aa21bdc170, se=se@entry=0x2aa21bdbe90, vmdesc=vmdesc@entry=0x3fddc08eb30) at ../migration/savevm.c:908 #9 0x02aa1fe79584 in qemu_savevm_state_complete_precopy_non_iterable (f=f@entry=0x2aa21bdc170, in_postcopy=in_postcopy@entry=false, inactivate_disks=inactivate_disks@entry=true) at ../migration/savevm.c:1393 #10 0x02aa1fe79a96 in qemu_savevm_state_complete_precopy (f=0x2aa21bdc170, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true) at ../migration/savevm.c:1459 #11 0x02aa1fe6d6ee in migration_completion (s=0x2aa218ef600) at ../migration/migration.c:3314 #12 migration_iteration_run (s=0x2aa218ef600) at ../migration/migration.c:3761 #13 migration_thread (opaque=opaque@entry=0x2aa218ef600) at ../migration/migration.c:3989 #14 0x02aa201f0b8c in qemu_thread_start (args=) at ../util/qemu-thread-posix.c:505 #15 0x03ff8e396248 in start_thread () at /lib64/libc.so.6 #16 0x03ff8e41183e in thread_start () at /lib64/libc.so.6 Michael, your previous branch did work if I recall correctly. 
That one was failing under github CI though (for reasons we didn't really address, such as disconnect during stop causing a recursive call to stop, but there you are). Even the double revert of everything? I don't remember at this point. So how do we proceed now? I'm hopeful Alex will come up with a fix. I need to replicate the failing test for that. Which test is failing? Pretty much the same as before. guest with vsock, managedsave and restore. If this isn't in our test suite I'm going to need exact steps. Just get any libvirt guest, add to your libvirt xml. Start the guest (with the new xml). Run virsh managedsave - qemu crashes. On x86 and s390. the libvirt log: /home/cborntra/REPOS/qemu/build/x86_64-softmmu/qemu-system-x86_64 \ -name guest=f36,debug-threads=on \ -S \ -object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-f36/master-key.aes"}' \ -machine pc-i440fx-7.2,usb=off,dump-guest-core=off,memory-backend=pc.ra
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
On 15.11.22 at 17:40, Christian Borntraeger wrote:
On 15.11.22 at 17:05, Alex Bennée wrote:
Christian Borntraeger writes:
On 15.11.22 at 15:31, Alex Bennée wrote:
"Michael S. Tsirkin" writes:
On Mon, Nov 14, 2022 at 06:15:30PM +0100, Christian Borntraeger wrote:
On 14.11.22 at 18:10, Michael S. Tsirkin wrote:
On Mon, Nov 14, 2022 at 05:55:09PM +0100, Christian Borntraeger wrote:
On 14.11.22 at 17:37, Michael S. Tsirkin wrote:
On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian Borntraeger wrote:
On 08.11.22 at 10:23, Alex Bennée wrote:

The previous fix to virtio_device_started revealed a problem in its use by both the core and the device code. The core code should be able to handle the device "starting" while the VM isn't running, to handle the restoration of migration state. To solve this dual use, introduce a new helper for use by the vhost-user backends, who all use it to feed a should_start variable. We can also pick up a change to vhost_user_blk_set_status while we are at it, which follows the same pattern.

Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device)
Signed-off-by: Alex Bennée
Cc: "Michael S. Tsirkin"

Hmmm, is this commit
259d69c00b67c02a67f3bdbeeea71c2c0af76c35
Author: Alex Bennée
AuthorDate: Mon Nov 7 12:14:07 2022 +
Commit: Michael S. Tsirkin
CommitDate: Mon Nov 7 14:08:18 2022 -0500
hw/virtio: introduce virtio_device_should_start
an older version?

This is what got merged: https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org

This patch was sent after I merged the RFC. I think the only difference is the commit log, but I might be missing something.

This does not seem to fix the regression that I have reported.

This was applied on top of 9f6bcfd99f, which IIUC does, right?
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 15.11.22 um 12:25 schrieb Michael S. Tsirkin: On Tue, Nov 15, 2022 at 09:18:27AM +0100, Christian Borntraeger wrote: Am 14.11.22 um 18:20 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 06:15:30PM +0100, Christian Borntraeger wrote: Am 14.11.22 um 18:10 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:55:09PM +0100, Christian Borntraeger wrote: Am 14.11.22 um 17:37 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian Borntraeger wrote: Am 08.11.22 um 10:23 schrieb Alex Bennée: The previous fix to virtio_device_started revealed a problem in its use by both the core and the device code. The core code should be able to handle the device "starting" while the VM isn't running to handle the restoration of migration state. To solve this dual use introduce a new helper for use by the vhost-user backends who all use it to feed a should_start variable. We can also pick up a change vhost_user_blk_set_status while we are at it which follows the same pattern. Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device) Signed-off-by: Alex Bennée Cc: "Michael S. Tsirkin" Hmmm, is this commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35 Author: Alex Bennée AuthorDate: Mon Nov 7 12:14:07 2022 + Commit: Michael S. Tsirkin CommitDate: Mon Nov 7 14:08:18 2022 -0500 hw/virtio: introduce virtio_device_should_start and older version? This is what got merged: https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org This patch was sent after I merged the RFC. I think the only difference is the commit log but I might be missing something. This does not seem to fix the regression that I have reported. This was applied on top of 9f6bcfd99f which IIUC does, right? 
QEMU master still fails for me for suspend/resume to disk: #0 0x03ff8e3980a6 in __pthread_kill_implementation () at /lib64/libc.so.6 #1 0x03ff8e348580 in raise () at /lib64/libc.so.6 #2 0x03ff8e32b5c0 in abort () at /lib64/libc.so.6 #3 0x03ff8e3409da in __assert_fail_base () at /lib64/libc.so.6 #4 0x03ff8e340a4e in () at /lib64/libc.so.6 #5 0x02aa1ffa8966 in vhost_vsock_common_pre_save (opaque=) at ../hw/virtio/vhost-vsock-common.c:203 #6 0x02aa1fe5e0ee in vmstate_save_state_v (f=f@entry=0x2aa21bdc170, vmsd=0x2aa204ac5f0 , opaque=0x2aa21bac9f8, vmdesc=vmdesc@entry=0x3fddc08eb30, version_id=version_id@entry=0) at ../migration/vmstate.c:329 #7 0x02aa1fe5ebf8 in vmstate_save_state (f=f@entry=0x2aa21bdc170, vmsd=, opaque=, vmdesc_id=vmdesc_id@entry=0x3fddc08eb30) at ../migration/vmstate.c:317 #8 0x02aa1fe75bd0 in vmstate_save (f=f@entry=0x2aa21bdc170, se=se@entry=0x2aa21bdbe90, vmdesc=vmdesc@entry=0x3fddc08eb30) at ../migration/savevm.c:908 #9 0x02aa1fe79584 in qemu_savevm_state_complete_precopy_non_iterable (f=f@entry=0x2aa21bdc170, in_postcopy=in_postcopy@entry=false, inactivate_disks=inactivate_disks@entry=true) at ../migration/savevm.c:1393 #10 0x02aa1fe79a96 in qemu_savevm_state_complete_precopy (f=0x2aa21bdc170, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true) at ../migration/savevm.c:1459 #11 0x02aa1fe6d6ee in migration_completion (s=0x2aa218ef600) at ../migration/migration.c:3314 #12 migration_iteration_run (s=0x2aa218ef600) at ../migration/migration.c:3761 #13 migration_thread (opaque=opaque@entry=0x2aa218ef600) at ../migration/migration.c:3989 #14 0x02aa201f0b8c in qemu_thread_start (args=) at ../util/qemu-thread-posix.c:505 #15 0x03ff8e396248 in start_thread () at /lib64/libc.so.6 #16 0x03ff8e41183e in thread_start () at /lib64/libc.so.6 Michael, your previous branch did work if I recall correctly. 
That one was failing under github CI though (for reasons we didn't really address, such as disconnect during stop causing a recursive call to stop, but there you are). Even the double revert of everything? I don't remember at this point. So how do we proceed now? I'm hopeful Alex will come up with a fix. The initial fix changed to qemu/master does still work for me diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index a973811cbfc6..fb3072838119 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -411,14 +411,14 @@ static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status) */ static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status) { -if (vdev->use_started) { -return vdev->started; -} - if (!vdev->vm_running) { return false; } +if (vdev->use_started) { +return vdev->started; +} + return status & VIRTIO_CONFIG_S_DRIVER_OK; } Triggers failure on gitlab unfortunately: https://gitlab.com/mstredhat/qemu/-/jobs/3323768122 S
Re: zLinux 3270 console support
Am 14.11.22 um 22:11 schrieb Alan Altmark: Sven asked about this on an internal IBM discussion channel and I mentioned that there are two issues that I can see in the Linux 3270 support: 1. Linux is definitely doing it wrong on z/VM. Linux infers the dimensions of the screen from a model number. (Bzzt!) If you have a standard-size screen for model 2, 3, 4, or 5, then the model number from DIAG 0x210 will be correct. Otherwise it will reflect model 2. I'm not sure why CP does that or why Linux is using the model number at all. Even at the turn of the century it was irrelevant. 2. Linux doesn't detect a reconnect event, so it doesn't know to re-query the device and adjust any 'views' that have been created. To be honest, while someone could invest time here, I think the proper solution for Linux would be to improve z/VM's ASCII console support (e.g. make it virtual instead of attaching the LPAR one, and provide a console server). -- For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www2.marist.edu/htbin/wlvindex?LINUX-390
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 14.11.22 um 18:20 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 06:15:30PM +0100, Christian Borntraeger wrote: Am 14.11.22 um 18:10 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:55:09PM +0100, Christian Borntraeger wrote: Am 14.11.22 um 17:37 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian Borntraeger wrote: Am 08.11.22 um 10:23 schrieb Alex Bennée: The previous fix to virtio_device_started revealed a problem in its use by both the core and the device code. The core code should be able to handle the device "starting" while the VM isn't running to handle the restoration of migration state. To solve this dual use introduce a new helper for use by the vhost-user backends who all use it to feed a should_start variable. We can also pick up a change vhost_user_blk_set_status while we are at it which follows the same pattern. Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device) Signed-off-by: Alex Bennée Cc: "Michael S. Tsirkin" Hmmm, is this commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35 Author: Alex Bennée AuthorDate: Mon Nov 7 12:14:07 2022 + Commit: Michael S. Tsirkin CommitDate: Mon Nov 7 14:08:18 2022 -0500 hw/virtio: introduce virtio_device_should_start and older version? This is what got merged: https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org This patch was sent after I merged the RFC. I think the only difference is the commit log but I might be missing something. This does not seem to fix the regression that I have reported. This was applied on top of 9f6bcfd99f which IIUC does, right? 
QEMU master still fails for me for suspend/resume to disk: #0 0x03ff8e3980a6 in __pthread_kill_implementation () at /lib64/libc.so.6 #1 0x03ff8e348580 in raise () at /lib64/libc.so.6 #2 0x03ff8e32b5c0 in abort () at /lib64/libc.so.6 #3 0x03ff8e3409da in __assert_fail_base () at /lib64/libc.so.6 #4 0x03ff8e340a4e in () at /lib64/libc.so.6 #5 0x02aa1ffa8966 in vhost_vsock_common_pre_save (opaque=) at ../hw/virtio/vhost-vsock-common.c:203 #6 0x02aa1fe5e0ee in vmstate_save_state_v (f=f@entry=0x2aa21bdc170, vmsd=0x2aa204ac5f0 , opaque=0x2aa21bac9f8, vmdesc=vmdesc@entry=0x3fddc08eb30, version_id=version_id@entry=0) at ../migration/vmstate.c:329 #7 0x02aa1fe5ebf8 in vmstate_save_state (f=f@entry=0x2aa21bdc170, vmsd=, opaque=, vmdesc_id=vmdesc_id@entry=0x3fddc08eb30) at ../migration/vmstate.c:317 #8 0x02aa1fe75bd0 in vmstate_save (f=f@entry=0x2aa21bdc170, se=se@entry=0x2aa21bdbe90, vmdesc=vmdesc@entry=0x3fddc08eb30) at ../migration/savevm.c:908 #9 0x02aa1fe79584 in qemu_savevm_state_complete_precopy_non_iterable (f=f@entry=0x2aa21bdc170, in_postcopy=in_postcopy@entry=false, inactivate_disks=inactivate_disks@entry=true) at ../migration/savevm.c:1393 #10 0x02aa1fe79a96 in qemu_savevm_state_complete_precopy (f=0x2aa21bdc170, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true) at ../migration/savevm.c:1459 #11 0x02aa1fe6d6ee in migration_completion (s=0x2aa218ef600) at ../migration/migration.c:3314 #12 migration_iteration_run (s=0x2aa218ef600) at ../migration/migration.c:3761 #13 migration_thread (opaque=opaque@entry=0x2aa218ef600) at ../migration/migration.c:3989 #14 0x02aa201f0b8c in qemu_thread_start (args=) at ../util/qemu-thread-posix.c:505 #15 0x03ff8e396248 in start_thread () at /lib64/libc.so.6 #16 0x03ff8e41183e in thread_start () at /lib64/libc.so.6 Michael, your previous branch did work if I recall correctly. 
That one was failing under github CI though (for reasons we didn't really address, such as disconnect during stop causing a recursive call to stop, but there you are). Even the double revert of everything? I don't remember at this point. So how do we proceed now? I'm hopeful Alex will come up with a fix. The initial fix changed to qemu/master does still work for me diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index a973811cbfc6..fb3072838119 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -411,14 +411,14 @@ static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status) */ static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status) { -if (vdev->use_started) { -return vdev->started; -} - if (!vdev->vm_running) { return false; } +if (vdev->use_started) { +return vdev->started; +} + return status & VIRTIO_CONFIG_S_DRIVER_OK; }
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 14.11.22 um 18:20 schrieb Michael S. Tsirkin: [...] This does not seem to fix the regression that I have reported. This was applied on top of 9f6bcfd99f which IIUC does, right? Just double-checked: 9f6bcfd99f was the patch that created the original problem, no?
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 08.11.22 um 10:23 schrieb Alex Bennée: The previous fix to virtio_device_started revealed a problem in its use by both the core and the device code. The core code should be able to handle the device "starting" while the VM isn't running to handle the restoration of migration state. To solve this dual use introduce a new helper for use by the vhost-user backends who all use it to feed a should_start variable. We can also pick up a change vhost_user_blk_set_status while we are at it which follows the same pattern. Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device) Signed-off-by: Alex Bennée Cc: "Michael S. Tsirkin"

Hmmm, is this

commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35
Author: Alex Bennée
AuthorDate: Mon Nov 7 12:14:07 2022 +
Commit: Michael S. Tsirkin
CommitDate: Mon Nov 7 14:08:18 2022 -0500

    hw/virtio: introduce virtio_device_should_start

an older version? This does not seem to fix the regression that I have reported.
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 14.11.22 um 18:10 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:55:09PM +0100, Christian Borntraeger wrote: Am 14.11.22 um 17:37 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian Borntraeger wrote: Am 08.11.22 um 10:23 schrieb Alex Bennée: The previous fix to virtio_device_started revealed a problem in its use by both the core and the device code. The core code should be able to handle the device "starting" while the VM isn't running to handle the restoration of migration state. To solve this dual use introduce a new helper for use by the vhost-user backends who all use it to feed a should_start variable. We can also pick up a change vhost_user_blk_set_status while we are at it which follows the same pattern. Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device) Signed-off-by: Alex Bennée Cc: "Michael S. Tsirkin" Hmmm, is this commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35 Author: Alex Bennée AuthorDate: Mon Nov 7 12:14:07 2022 + Commit: Michael S. Tsirkin CommitDate: Mon Nov 7 14:08:18 2022 -0500 hw/virtio: introduce virtio_device_should_start an older version? This is what got merged: https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org This patch was sent after I merged the RFC. I think the only difference is the commit log but I might be missing something. This does not seem to fix the regression that I have reported. This was applied on top of 9f6bcfd99f which IIUC does, right? 
Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 14.11.22 um 17:37 schrieb Michael S. Tsirkin: On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian Borntraeger wrote: Am 08.11.22 um 10:23 schrieb Alex Bennée: The previous fix to virtio_device_started revealed a problem in its use by both the core and the device code. The core code should be able to handle the device "starting" while the VM isn't running to handle the restoration of migration state. To solve this dual use introduce a new helper for use by the vhost-user backends who all use it to feed a should_start variable. We can also pick up a change vhost_user_blk_set_status while we are at it which follows the same pattern. Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) Fixes: 27ba7b027f (hw/virtio: add boilerplate for vhost-user-gpio device) Signed-off-by: Alex Bennée Cc: "Michael S. Tsirkin" Hmmm, is this commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35 Author: Alex Bennée AuthorDate: Mon Nov 7 12:14:07 2022 + Commit: Michael S. Tsirkin CommitDate: Mon Nov 7 14:08:18 2022 -0500 hw/virtio: introduce virtio_device_should_start an older version? This is what got merged: https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org This patch was sent after I merged the RFC. I think the only difference is the commit log but I might be missing something. This does not seem to fix the regression that I have reported. This was applied on top of 9f6bcfd99f which IIUC does, right? 
Re: [PATCH/RFC] virtio_test: A module for testing virtio via userspace
Am 12.11.22 um 17:19 schrieb Dmitry Vyukov: Hi, The original email is from 2009, so I assume you don't have it in your inboxes already. Here is the original email: https://lore.kernel.org/all/200906190927.34831.borntrae...@de.ibm.com/ This patch introduces a prototype for a virtio_test module. This module can be bound to any virtio device via the sysfs bind/unbind feature, e.g.:

$ echo virtio1 > /sys/bus/virtio/drivers/virtio_rng/unbind
$ modprobe virtio_test

On probe this module registers to all virtqueues and creates a character device for every virtio device (/dev/viotest). The character device offers ioctls to allow a userspace application to submit virtio operations like addbuf, kick and getbuf. It also offers ioctls to get information about the device and to query the amount of occurred callbacks (or wait synchronously on callbacks). As far as I understand the test driver was never merged and I can't find any similar testing drivers. I am looking for a test module that allows to create a transient virtio device that can be used to activate a virtio driver and communicate with it as if from the host. Does such a thing exist already? Or how are virtio transports/drivers tested/fuzzed nowadays? Right, the driver was never merged. Adding Michael as today's virtio maintainer for ideas how to proceed.
Re: [RFC PATCH] virtio: re-order vm_running and use_started checks
Am 04.11.22 um 17:51 schrieb Christian Borntraeger: Am 04.11.22 um 17:14 schrieb Michael S. Tsirkin: On Fri, Nov 04, 2022 at 04:59:35PM +0100, Christian Borntraeger wrote: Am 04.11.22 um 16:56 schrieb Michael S. Tsirkin: On Fri, Oct 14, 2022 at 02:21:08PM +0100, Alex Bennée wrote: During migration the virtio device state can be restored before we restart the VM. As no devices can be running while the VM is paused it makes sense to bail out early in that case. This returns the order introduced in: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) to what virtio-sock was doing longhand. Signed-off-by: Alex Bennée Cc: Christian Borntraeger What happens now: with this applied I get: https://gitlab.com/mitsirkin/qemu/-/pipelines/685829158/failures ― ✀ ― stderr: qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-reconnect,path=/tmp/vhost-test-QLKXU1/reconnect.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-QLKXU1/reconnect.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-connect-fail,path=/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: Failed to read msg header. Read 0 instead of 12. Original request 1. 
qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: vhost_backend_init failed: Protocol error qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: failed to init vhost_net for queue 0 qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-flags-mismatch,path=/tmp/vhost-test-3MO5U1/flags-mismatch.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-3MO5U1/flags-mismatch.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 52. qemu-system-arm: vhost_set_mem_table failed: Invalid argument (22) qemu-system-arm: unable to start vhost net: 22: falling back on userspace virtio vhost lacks feature mask 0x4000 for backend qemu-system-arm: failed to init vhost_net for queue 0 qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 2 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 3 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. 
qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed
Re: [RFC PATCH] virtio: re-order vm_running and use_started checks
Am 04.11.22 um 17:14 schrieb Michael S. Tsirkin: On Fri, Nov 04, 2022 at 04:59:35PM +0100, Christian Borntraeger wrote: Am 04.11.22 um 16:56 schrieb Michael S. Tsirkin: On Fri, Oct 14, 2022 at 02:21:08PM +0100, Alex Bennée wrote: During migration the virtio device state can be restored before we restart the VM. As no devices can be running while the VM is paused it makes sense to bail out early in that case. This returns the order introduced in: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) to what virtio-sock was doing longhand. Signed-off-by: Alex Bennée Cc: Christian Borntraeger What happens now: with this applied I get: https://gitlab.com/mitsirkin/qemu/-/pipelines/685829158/failures ― ✀ ― stderr: qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-reconnect,path=/tmp/vhost-test-QLKXU1/reconnect.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-QLKXU1/reconnect.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-connect-fail,path=/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: Failed to read msg header. Read 0 instead of 12. Original request 1. 
qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: vhost_backend_init failed: Protocol error qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: failed to init vhost_net for queue 0 qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-flags-mismatch,path=/tmp/vhost-test-3MO5U1/flags-mismatch.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-3MO5U1/flags-mismatch.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 52. qemu-system-arm: vhost_set_mem_table failed: Invalid argument (22) qemu-system-arm: unable to start vhost net: 22: falling back on userspace virtio vhost lacks feature mask 0x4000 for backend qemu-system-arm: failed to init vhost_net for queue 0 qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 2 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 3 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. 
qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -5: Input/output error (5) qemu-system-arm: ../hw
Re: [RFC PATCH] virtio: re-order vm_running and use_started checks
Am 04.11.22 um 16:56 schrieb Michael S. Tsirkin: On Fri, Oct 14, 2022 at 02:21:08PM +0100, Alex Bennée wrote: During migration the virtio device state can be restored before we restart the VM. As no devices can be running while the VM is paused it makes sense to bail out early in that case. This returns the order introduced in: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started) to what virtio-sock was doing longhand. Signed-off-by: Alex Bennée Cc: Christian Borntraeger What happens now: with this applied I get: https://gitlab.com/mitsirkin/qemu/-/pipelines/685829158/failures ― ✀ ― stderr: qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-reconnect,path=/tmp/vhost-test-QLKXU1/reconnect.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-QLKXU1/reconnect.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-connect-fail,path=/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: Failed to read msg header. Read 0 instead of 12. Original request 1. 
qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: vhost_backend_init failed: Protocol error qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: failed to init vhost_net for queue 0 qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-L9Q6U1/connect-fail.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: -chardev socket,id=chr-flags-mismatch,path=/tmp/vhost-test-3MO5U1/flags-mismatch.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-3MO5U1/flags-mismatch.sock,server=on qemu-system-arm: Failed to write msg. Wrote -1 instead of 52. qemu-system-arm: vhost_set_mem_table failed: Invalid argument (22) qemu-system-arm: unable to start vhost net: 22: falling back on userspace virtio vhost lacks feature mask 0x4000 for backend qemu-system-arm: failed to init vhost_net for queue 0 qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 2 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 3 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. 
qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22) qemu-system-arm: Failed to set msg fds. qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22) qemu-system-arm: Failed to write msg. Wrote -1 instead of 20. qemu-system-arm: vhost VQ 0 ring restore failed: -5: Input/output error (5) qemu-system-arm: ../hw/virtio/virtio-bus.c:211: void virtio_bus_release_ioeventfd(VirtioBusState *): Assertion `bus->ioeventfd_grabbed != 0' fai
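For readers following along, the check ordering being debated can be sketched roughly as follows. This is an editor's illustration, not the actual QEMU virtio_device_started() code: the struct layout, field names, and the DRIVER_OK constant are invented stand-ins for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DRIVER_OK 0x04  /* stand-in for VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical device state; QEMU's real VirtIODevice looks different. */
struct vdev_state {
    bool vm_running;   /* is the VM currently executing? */
    bool use_started;  /* does this device track an explicit started flag? */
    bool started;
    uint8_t status;
};

/* The patch's ordering: bail out while the VM is paused *before*
 * consulting use_started, so device state that was restored during
 * migration is not reported as started while the VM is still stopped. */
static bool device_started(const struct vdev_state *d)
{
    if (!d->vm_running) {
        return false;  /* early bail-out: nothing runs while paused */
    }
    if (d->use_started) {
        return d->started;
    }
    return (d->status & DRIVER_OK) != 0;
}
```

The test failures quoted above suggest that some vhost paths depend on the old ordering, which is presumably why the re-order is an RFC rather than a straightforward revert.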
Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling
Am 03.11.22 um 00:18 schrieb Sean Christopherson: Non-x86 folks, please test on hardware when possible. I made a _lot_ of mistakes when moving code around. Thankfully, x86 was the trickiest code to deal with, and I'm fairly confident that I found all the bugs I introduced via testing. But the number of mistakes I made and found on x86 makes me more than a bit worried that I screwed something up in other arch code. This is a continuation of Chao's series to do x86 CPU compatibility checks during virtualization hardware enabling[1], and of Isaku's series to try and clean up the hardware enabling paths so that x86 (Intel specifically) can temporarily enable hardware during module initialization without causing undue pain for other architectures[2]. It also includes one patch from another mini-series from Isaku that provides the less controversial patches[3]. The main theme of this series is to kill off kvm_arch_init(), kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which all originated in x86 code from way back when, and needlessly complicate both common KVM code and architecture code. E.g. many architectures don't mark functions/data as __init/__ro_after_init purely because kvm_init() isn't marked __init to support x86's separate vendor modules. The idea/hope is that with those hooks gone (moved to arch code), it will be easier for x86 (and other architectures) to modify their module init sequences as needed without having to fight common KVM code. E.g. I'm hoping that ARM can build on this to simplify its hardware enabling logic, especially the pKVM side of things. There are bug fixes throughout this series. They are more scattered than I would usually prefer, but getting the sequencing correct was a gigantic pain for many of the x86 fixes due to needing to fix common code in order for the x86 fix to have any meaning. 
And while the bugs are often fatal, they aren't all that interesting for most users as they either require a malicious admin or broken hardware, i.e. aren't likely to be encountered by the vast majority of KVM users. So unless someone _really_ wants a particular fix isolated for backporting, I'm not planning on shuffling patches. Tested on x86. Lightly tested on arm64. Compile tested only on all other architectures. Some sniff tests seem to work ok on s390.
Re: [PATCH 1/1] tcg: add perfmap and jitdump
Am 12.10.22 um 07:18 schrieb Ilya Leoshkevich: Add ability to dump /tmp/perf-<pid>.map and jit-<pid>.dump. The first one allows the perf tool to map samples to each individual translation block. The second one adds the ability to resolve symbol names, line numbers and inspect JITed code. Example of use: perf record qemu-x86_64 -perfmap ./a.out perf report or perf record -k 1 qemu-x86_64 -jitdump ./a.out perf inject -j -i perf.data -o perf.data.jitted perf report -i perf.data.jitted Co-developed-by: Vanderson M. do Rosario Co-developed-by: Alex Bennée Signed-off-by: Ilya Leoshkevich I think this would be awesome to have. I know our performance people do use this for Java a lot.
Re: qemu iotest 161 and make check
Am 31.03.22 um 10:25 schrieb Christian Borntraeger: Am 31.03.22 um 09:44 schrieb Christian Borntraeger: Am 21.02.22 um 11:27 schrieb Christian Borntraeger: Am 10.02.22 um 18:44 schrieb Vladimir Sementsov-Ogievskiy: 10.02.2022 20:13, Thomas Huth wrote: On 10/02/2022 15.51, Christian Borntraeger wrote: Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy: 10.02.2022 10:57, Christian Borntraeger wrote: Hello, I do see spurious failures of 161 in our CI, but only when I use make check with parallelism (-j). I have not yet figured out which other testcase could interfere @@ -34,6 +34,8 @@ *** Commit and then change an option on the backing file Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576 +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock FWIW, qemu_lock_fd_test returns -11 (EAGAIN) and raw_check_lock_bytes spits this error. And it's coming from here (ret is 0) int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive) { int ret; struct flock fl = { .l_whence = SEEK_SET, .l_start = start, .l_len = len, .l_type = exclusive ? F_WRLCK : F_RDLCK, }; qemu_probe_lock_ops(); ret = fcntl(fd, fcntl_op_getlk, &fl); if (ret == -1) { return -errno; } else { -> return fl.l_type == F_UNLCK ? 0 : -EAGAIN; } } Is this just some overload situation that we do not recover from because we do not handle EAGAIN specially? Restarted my investigation. Looks like the file lock from qemu is not fully cleaned up when the process is gone. Something like diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu index 0f1fecc68e..b28a6c187c 100644 --- a/tests/qemu-iotests/common.qemu +++ b/tests/qemu-iotests/common.qemu @@ -403,4 +403,5 @@ _cleanup_qemu() unset QEMU_IN[$i] unset QEMU_OUT[$i] done +sleep 0.5 } makes the problem go away. Looks like we do use the OFD variant of the file lock, so any clone, fork etc will keep the lock.
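The observation above — that OFD (open-file-description) locks survive fork()/clone() because they belong to the description rather than the process — can be demonstrated with a small standalone Linux program. This is an editor's illustration, not QEMU code; the temp-file path template is arbitrary.

```c
#define _GNU_SOURCE  /* for F_OFD_SETLK / F_OFD_GETLK (Linux-specific) */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if an OFD lock taken by the parent is still visible through
 * a *separate* open() after fork — i.e. the lock lives on the shared
 * open file description, exactly the behaviour discussed above. */
static int ofd_lock_survives_fork(void)
{
    char path[] = "/tmp/ofd-demo-XXXXXX";  /* arbitrary template */
    int fd = mkstemp(path);
    if (fd < 0) {
        return 0;
    }

    /* Whole-file write lock (l_len = 0); l_pid must be 0 for OFD locks. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd, F_OFD_SETLK, &fl) != 0) {
        unlink(path);
        return 0;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* The child inherited fd, so it shares the locked description.
         * Probe through an independent descriptor: the lock conflicts. */
        int fd2 = open(path, O_RDWR);
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        fcntl(fd2, F_OFD_GETLK, &probe);
        _exit(probe.l_type == F_UNLCK ? 1 : 0);  /* 0: lock still held */
    }

    int status;
    waitpid(pid, &status, 0);
    unlink(path);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

This is why a forked helper that keeps the QEMU image fd open keeps the "write" lock alive after QEMU itself has exited, matching the spurious iotest 161 failures.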
So I tested the following: diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu index 0f1fecc68e..01bdb05575 100644 --- a/tests/qemu-iotests/common.qemu +++ b/tests/qemu-iotests/common.qemu @@ -388,7 +388,7 @@ _cleanup_qemu() kill -KILL ${QEMU_PID} 2>/dev/null fi if [ -n "${QEMU_PID}" ]; then -wait ${QEMU_PID} 2>/dev/null # silent kill +wait 2>/dev/null # silent kill fi fi And this also helps. Still trying to find out what clone/fork happens here.
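The shell-level fix above (bare wait instead of wait ${QEMU_PID}) has a direct C analogue: waiting on a single PID reaps only that child, while looping on wait() reaps every outstanding child before proceeding. A minimal sketch, with the child count and sleep duration chosen arbitrarily:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a few short-lived children, then reap them all the way the
 * bare shell `wait` does, instead of waiting on a single PID. */
static int reap_all_children(void)
{
    for (int i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            usleep(50000);  /* child lingers briefly, like a stray clone */
            _exit(0);
        }
        if (pid < 0) {
            return -1;
        }
    }
    int reaped = 0;
    while (wait(NULL) > 0) {  /* loop until ECHILD: no children left */
        reaped++;
    }
    return reaped;
}
```

Only once every child is gone are all the file descriptors — and with them any OFD locks — guaranteed to be released, which is presumably why the broader wait makes the lock error disappear.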