I am running Proxmox 7.3-4 with a VM that now runs Debian 11.

I have local ZFS storage in each server in the cluster. Every 15 minutes the VM is replicated to the other server(s). I recently upgraded this server from Debian 9 to Debian 11 and it started locking up. There is no obvious pattern: the lockups don't come after a fixed amount of time or a fixed number of replications.

Through some debugging I found that the QEMU guest agent is not unfreezing (thawing) the OS after the replication. My understanding is that this should happen in under 100 ms, and from what I can see it works fine on all my other VMs running Ubuntu or RHEL.

I compared the agent on the Debian 11 server with the one on the Ubuntu servers: Debian has 5.2.0 versus 6.2.0 on Ubuntu. I compiled the agent from the QEMU 7.2.0 sources (statically too, if anyone wants a copy) and ran it under screen on the Debian 11 VM. It still locked up hard after 2-4 hours.

Debian is using the stock kernel:
Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux

I read some things online suggesting it might be related to VirtIO, so I changed that to VirtIO single, which made no difference.

I've reverted to the old kernel and am going to let it run:
4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux

Complicating this, the box is my Observium install and I don't have another device watching it, so when it locks up it takes my monitoring offline :-D

On the working Ubuntu boxes I'm running:
5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Below is the agent log from the point where it locks up. There is no more output after the last line (I have verbose logging enabled):

1673846104.535376: debug: received EOF
1673846104.635560: debug: received EOF
1673846104.735735: debug: received EOF
1673846104.835868: debug: received EOF
1673846104.936067: debug: read data, count: 104, data: 
{"execute":"guest-sync-delimited","arguments":{"id":371290701}}
{"arguments":{},"execute":"guest-ping"}

1673846104.936136: debug: process_event: called
1673846104.936144: debug: processing command
1673846104.936216: debug: sending data, count: 23
1673846104.936257: debug: process_event: called
1673846104.936272: debug: processing command
1673846104.936350: debug: sending data, count: 15
1673846104.936833: debug: received EOF
1673846105.37003: debug: received EOF
1673846105.137190: debug: received EOF
1673846105.237344: debug: received EOF
1673846105.337525: debug: received EOF
1673846105.437693: debug: received EOF
1673846105.537907: debug: received EOF
1673846105.638096: debug: received EOF
1673846105.738307: debug: received EOF
1673846105.838495: debug: received EOF
1673846105.938652: debug: received EOF
1673846106.38813: debug: received EOF
1673846106.139011: debug: received EOF
1673846106.239210: debug: received EOF
1673846106.339403: debug: received EOF
1673846106.439583: debug: received EOF
1673846106.539782: debug: received EOF
1673846106.639990: debug: received EOF
1673846106.740190: debug: received EOF
1673846106.840388: debug: read data, count: 115, data: 
{"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
{"execute":"guest-fsfreeze-freeze","arguments":{}}

1673846106.840450: debug: process_event: called
1673846106.840465: debug: processing command
1673846106.840497: debug: sending data, count: 23
1673846106.840545: debug: process_event: called
1673846106.840563: debug: processing command
1673846106.841114: debug: disabling command: guest-get-time
1673846106.841131: debug: disabling command: guest-set-time
1673846106.841138: debug: disabling command: guest-shutdown
1673846106.841145: debug: disabling command: guest-file-open
1673846106.841151: debug: disabling command: guest-file-close
1673846106.841157: debug: disabling command: guest-file-read
1673846106.841164: debug: disabling command: guest-file-write
1673846106.841171: debug: disabling command: guest-file-seek
1673846106.841179: debug: disabling command: guest-file-flush
1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
1673846106.841202: debug: disabling command: guest-fstrim
1673846106.841209: debug: disabling command: guest-suspend-disk
1673846106.841217: debug: disabling command: guest-suspend-ram
1673846106.841225: debug: disabling command: guest-suspend-hybrid
1673846106.841232: debug: disabling command: guest-network-get-interfaces
1673846106.841239: debug: disabling command: guest-get-vcpus
1673846106.841245: debug: disabling command: guest-set-vcpus
1673846106.841251: debug: disabling command: guest-get-disks
1673846106.841257: debug: disabling command: guest-get-fsinfo
1673846106.841265: debug: disabling command: guest-set-user-password
1673846106.841272: debug: disabling command: guest-get-memory-blocks
1673846106.841278: debug: disabling command: guest-set-memory-blocks
1673846106.841286: debug: disabling command: guest-get-memory-block-info
1673846106.841294: debug: disabling command: guest-exec-status
1673846106.841303: debug: disabling command: guest-exec
1673846106.841311: debug: disabling command: guest-get-host-name
1673846106.841319: debug: disabling command: guest-get-users
1673846106.841326: debug: disabling command: guest-get-timezone
1673846106.841334: debug: disabling command: guest-get-osinfo
1673846106.841343: debug: disabling command: guest-get-devices
1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
1673846106.841371: warning: disabling logging due to filesystem freeze
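
For anyone comparing timings in a log like the one above, here is a minimal Python sketch that parses the timestamps and measures how long the agent took from reading the last host request to announcing the freeze. It assumes the part after the dot is a microsecond count printed without zero padding (so 1673846105.37003 would mean .037003), which is inferred only from the roughly 100 ms spacing of the neighbouring lines. The names parse_log, time_to_freeze and SAMPLE are hypothetical, not part of qemu-ga.

```python
import re

# "<seconds>.<microseconds>: <level>: <message>"
LINE_RE = re.compile(r"^(\d+)\.(\d+): (\w+): (.*)$")

# A few lines copied from the log above.
SAMPLE = """\
1673846104.936833: debug: received EOF
1673846105.37003: debug: received EOF
1673846105.137190: debug: received EOF
1673846106.840388: debug: read data, count: 115, data:
{"execute":"guest-fsfreeze-freeze","arguments":{}}
1673846106.841371: warning: disabling logging due to filesystem freeze
"""


def parse_log(text):
    """Return (microseconds_since_epoch, level, message) for each timestamped line.

    Assumption: the fractional field is an unpadded microsecond count.
    Lines without a timestamp (the JSON payloads) are skipped.
    """
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            sec, frac, level, msg = match.groups()
            events.append((int(sec) * 1_000_000 + int(frac), level, msg))
    return events


def time_to_freeze(events):
    """Microseconds between the last 'read data' line and the freeze warning.

    Returns None if the log contains no freeze warning after a read.
    """
    last_read = None
    for usec, _level, msg in events:
        if msg.startswith("read data"):
            last_read = usec
        elif "filesystem freeze" in msg and last_read is not None:
            return usec - last_read
    return None


if __name__ == "__main__":
    print(time_to_freeze(parse_log(SAMPLE)), "microseconds from request to freeze")
```

Integer microseconds are used instead of floats so that differences between epoch timestamps stay exact.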


Other than disabling the agent, is there any way to fix this, or any known reason why it is happening? I can't believe Debian 11 ships with a broken kernel, and running 'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152 fsfreeze-thaw' manually from the host works fine. Could this be something with the VirtIO pipe/IPC?
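
As a diagnostic aid rather than a fix, a rough Python sketch of a host-side check might look like the following: it asks the agent for 'fsfreeze-status' through 'qm guest cmd' and issues 'fsfreeze-thaw' if the guest still reports frozen. The helper names qm_guest_cmd and thaw_if_frozen are hypothetical, and matching on the word "frozen" in the command output is an assumption about how the status is printed. If the guest is locked up hard, the agent may simply not answer, which this sketch reports as "no-response".

```python
import subprocess


def qm_guest_cmd(vmid, command, timeout=10):
    """Run `qm guest cmd <vmid> <command>` on the Proxmox host and return stdout."""
    result = subprocess.run(
        ["qm", "guest", "cmd", str(vmid), command],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


def thaw_if_frozen(vmid, run=qm_guest_cmd):
    """Check the guest's freeze state and thaw it if it is still frozen.

    Returns "thawed" if nothing had to be done, "was-frozen" if a thaw was
    issued, and "no-response" if the agent did not answer or qm failed.
    `run` is injectable so the logic can be exercised without a real host.
    """
    try:
        status = run(vmid, "fsfreeze-status")
        # Assumption: the output contains the word "frozen" when frozen.
        if "frozen" in status:
            run(vmid, "fsfreeze-thaw")
            return "was-frozen"
        return "thawed"
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return "no-response"
```

Run from cron on the host a minute or so after each replication window, something like thaw_if_frozen(152) would at least show whether the guest was left frozen or whether the agent had stopped responding altogether.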

Anyone else seeing this or have any ideas?

--
Bryan Fields

727-409-1194 - Voice
http://bryanfields.net

_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
