Dear Proxmox users,
I'm trying to set up a 3-node cluster (latest Proxmox/Ceph) and am
experiencing random freezes. A node is either completely frozen (no
blinking cursor on the console, no response to ping) or partially
blocked and very slow.
This happens most often on node 2 (approx. 3-4 times per day). Node 1
froze once, and node 3 has never frozen in 14 days of runtime.
Unfortunately I have not found a way to trigger this behaviour reliably.
However, I *think* it happens most often when I stress the machine in
some way (e.g. a performance test within a virtual machine) and then let
it idle.
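To test the stress-then-idle suspicion more systematically, something
like the following could be run on a node. This is only a rough sketch,
not something I have validated: the function names, the log path and the
durations are all made up for illustration. It loads all cores for a
while, then idles, and writes a timestamp to disk every second (with
fsync), so that after a hard freeze the last line shows roughly when the
node stopped.

```python
import multiprocessing
import os
import time


def burn(seconds):
    """Busy-loop on one core for roughly `seconds` seconds."""
    end = time.monotonic() + seconds
    n = 0
    while time.monotonic() < end:
        n += 1
    return n


def heartbeat(path, note):
    """Append a timestamped line and force it to disk."""
    with open(path, "a") as f:
        f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {note}\n")
        f.flush()
        os.fsync(f.fileno())


def stress_then_idle(path, stress_s, idle_s, step_s=1.0):
    """Load every core for stress_s seconds, then idle for idle_s seconds.

    All parameters are hypothetical defaults chosen for illustration.
    """
    cores = os.cpu_count() or 1
    heartbeat(path, "stress start")
    with multiprocessing.Pool(cores) as pool:
        pool.map(burn, [stress_s] * cores)
    heartbeat(path, "stress end, idling")
    end = time.monotonic() + idle_s
    while time.monotonic() < end:
        time.sleep(step_s)
        heartbeat(path, "idle")
```

For example, stress_then_idle("/root/freeze-heartbeat.log", 300, 1800)
would stress for 5 minutes and then idle for 30 minutes (path and times
are arbitrary).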
When the machine freezes completely, nothing is written to the logs.
However, if it is only partially frozen, some information can be
acquired via dmesg (see attached file). The "device=2b:00.0" in that
output is an Intel 10 GBit Ethernet adapter (X550T), so perhaps there is
a driver issue with this adapter?
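To pull only the kernel messages that mention this adapter out of the
dmesg output, a small filter along these lines could be used. It is a
minimal sketch: the function names are invented for illustration, and it
simply matches the PCI address as a substring, so it assumes the address
appears literally in each relevant line, as it does in
"device=2b:00.0".

```python
import subprocess


def lines_for_device(log_text, pci_address="2b:00.0"):
    """Return the log lines that contain the given PCI address."""
    return [line for line in log_text.splitlines() if pci_address in line]


def read_dmesg():
    """Return the kernel ring buffer as text (usually needs root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling print("\n".join(lines_for_device(read_dmesg()))) on a partially
frozen node would then list only the lines for that device.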
The system consists of the following components:
- AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
- ASRock Rack X470D4U2-2T (Mainboard)
- Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (built-in SSD for OS)
- 2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
- be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
- 2 * Micron 5300 PRO - Read Intensive 960GB, SATA
(MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
- LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
#1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
What I did so far (without success):
- Disabled C6, as I read that this CPU state can lead to unstable systems
(via "python zenstates.py --c6-disable"; the freezes still occur).
- Updated my BIOS to the latest version (3.30)
- Checked that the CPU + RAM are compatible with the mainboard (they are
listed as compatible on the ASRock website)
- Checked logs in IPMI (undervoltage, temperature etc., nothing is
logged)
- Memory test (memtest86, no errors)
Do you have any idea what could be causing these freezes? Should I
suspect a hardware fault? Or is this a known Linux bug that can be
fixed?
Best Regards,
Hermann
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user