Re: [PVE-User] Proxmox VM hard resets

Roland Tue, 17 Jan 2023 09:58:32 -0800

Can you reproduce this with a Debian 11 or Ubuntu 22 VM (and create some
load there)? I don't think this is a Proxmox problem that can be solved at
the Proxmox/VM-guest level.


See, for example:
https://www.theregister.com/2017/11/28/stunning_antistun_vm_stun_problem_fix/

roland

On 17.01.23 at 16:04, Adam Weremczuk wrote:

Hi all,

My environment is quite unusual as I run PVE 7.2-11 as a VM on VMware
7.0.2. It runs several LXC containers and generally things are working
fine.

Recently the Proxmox VM (called "jaguar") started resetting itself
(and all containers) shortly after Altaro VM Backup kicked off a
scheduled VM backup over the network.
Each time a hard reset was requested by the OS itself (Proxmox
hypervisor).

The duration of the "stun/unstun" operation seems to be causing the issue
here. Normally a stun/unstun cycle should take a very short amount of
time. In my case, however, depending on the load on both the hypervisor
and the guest VM (the nested hypervisor), that time varies and can take
longer. Snippet below from various stun/unstun operations:

2023-01-12T23:00:55.407Z| vcpu-0| | I005: CPT: vm was stunned for 32142467 us
2023-01-12T23:01:12.848Z| vcpu-0| | I005: CPT: vm was stunned for 14942070 us
2023-01-12T23:11:35.984Z| vcpu-0| opID=1487b0d5| I005: CPT: vm was stunned for 277986 us
2023-01-12T23:11:39.431Z| vcpu-0| | I005: CPT: vm was stunned for 122089 us

As you can see, the stun time differs from disk to disk. What I think is
happening is that the watchdog of the virtualized hypervisor notices that
the OS has been frozen for some amount of time and issues a hard reset.
My guess is that the guest OS issues the hard reset when the stun time
exceeds 30 seconds (the 32142467 us entry above is about 32 s).

2023-01-12T23:00:55.407Z| vcpu-0| | I005: CPT: vm was stunned for 32142467 us
2023-01-12T23:00:55.407Z| vcpu-0| | I005: SnapshotVMXTakeSnapshotWork: Transition to mode 1.
2023-01-12T23:00:55.407Z| vcpu-0| | I005: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'ALTAROTEMPSNAPSHOTDONOTDELETE463b73a7-f363-4daf-acf3-b0322fe84429': 95
2023-01-12T23:00:55.407Z| vcpu-0| | I005: VigorTransport_ServerSendResponse opID=1487b008 seq=887616: Completed Snapshot request.
2023-01-12T23:00:55.409Z| vcpu-8| | I005: HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/61364720-e494cfe4-6cff-b083fed97d91/jaguar/jaguar-000001.vmdk'
2023-01-12T23:00:55.409Z| vcpu-8| | I005: DDB: "longContentID" = "08bf301ae8e75c151d2f273571a4ea9f" (was "2a6fd4c33a60f8d724ccc100a666f0d7")
2023-01-12T23:00:57.906Z| vcpu-8| | I005: DISKLIB-CHAIN : DiskChainUpdateContentID: old=0xa666f0d7, new=0x71a4ea9f (08bf301ae8e75c151d2f273571a4ea9f)
2023-01-12T23:00:57.906Z| vcpu-9| | I005: Chipset: The guest has requested that the virtual machine be hard reset.
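
To compare the stun times in the excerpts above against the suspected 30 s limit, the microsecond values can be pulled out of the log and converted to seconds. The following is only a rough sketch: the function name `stun_events` and the 30 s default threshold are my own assumptions (the threshold is the guess from the text, not a confirmed watchdog setting), and it expects each log entry on a single line, as in the original vmware.log rather than as wrapped by a mail client.

```python
import re

# Pattern for "vm was stunned for N us" entries, as seen in the excerpts above.
# Group 1 is the timestamp (everything before the first "|"), group 2 the duration.
STUN_RE = re.compile(r"^([^|]+)\|.*CPT: vm was stunned for\s+(\d+) us")

def stun_events(lines, threshold_s=30.0):
    """Return a list of (timestamp, seconds, over_threshold) tuples.

    threshold_s is an assumption (the ~30 s guess), not a known setting.
    """
    events = []
    for line in lines:
        m = STUN_RE.search(line)
        if m:
            seconds = int(m.group(2)) / 1_000_000  # microseconds -> seconds
            events.append((m.group(1), seconds, seconds > threshold_s))
    return events

# The four stun entries quoted in this message, one per line.
sample = [
    "2023-01-12T23:00:55.407Z| vcpu-0| | I005: CPT: vm was stunned for 32142467 us",
    "2023-01-12T23:01:12.848Z| vcpu-0| | I005: CPT: vm was stunned for 14942070 us",
    "2023-01-12T23:11:35.984Z| vcpu-0| opID=1487b0d5| I005: CPT: vm was stunned for 277986 us",
    "2023-01-12T23:11:39.431Z| vcpu-0| | I005: CPT: vm was stunned for 122089 us",
]

if __name__ == "__main__":
    for ts, seconds, over in stun_events(sample):
        flag = "OVER THRESHOLD" if over else "ok"
        print(f"{ts}  {seconds:10.3f} s  {flag}")
```

With the four entries above, only the first one (about 32 s) exceeds the assumed 30 s threshold, which matches the snapshot that was followed by the hard reset.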

I'm struggling to establish how the watchdog timer (or equivalent) is
configured :( Maybe increasing its trigger time would solve the issue?
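
One possible starting point for finding the configured timeout, assuming a Linux watchdog device is involved at all (which is not confirmed here): kernels built with CONFIG_WATCHDOG_SYSFS expose registered watchdog devices under /sys/class/watchdog, with attributes such as `identity`, `timeout` and `state`. The sketch below just reads those files; the function name `read_watchdogs` is made up for illustration, and an empty result only means that no device is exposed through sysfs.

```python
from pathlib import Path

def read_watchdogs(base="/sys/class/watchdog"):
    """Return {device_name: {attribute: value}} for watchdog devices under base.

    Only attributes that exist and are readable are included; the sysfs
    attributes are present only if the kernel has CONFIG_WATCHDOG_SYSFS.
    """
    result = {}
    root = Path(base)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        info = {}
        for attr in ("identity", "timeout", "state"):
            try:
                info[attr] = (dev / attr).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this system; skip it.
                pass
        result[dev.name] = info
    return result

if __name__ == "__main__":
    devices = read_watchdogs()
    if not devices:
        print("no watchdog devices visible in sysfs")
    for name, info in devices.items():
        print(name, info)
```

Run inside the Proxmox VM, this would show whether a watchdog is registered and what timeout (in seconds) it reports.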

Any other ideas / similar experiences?

Regards,
Adam


_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

