Hi Rainer,

On 12/7/21 at 8:53, Rainer Krienke wrote:
Hello,

I run a 5-node PVE cluster with pve-manager/6.4-8/185e14db (running kernel: 5.4.119-1-pve). The storage backend is an HDD-based "external" Ceph cluster running Ceph 14.2.16 with 144 OSDs on 9 hosts. Currently there are about 70 VMs running on this PVE cluster, all Linux (Ubuntu, SLES).

The problem I have is that writing on VMs has become slower and slower over time, and running Linux updates (e.g. apt upgrade) on the VMs takes longer and longer. The reason seems to be a steadily rising write IOPS rate on the storage side. Of course, the number of VMs has also increased over time up to the current count, which by itself causes higher numbers.

During weekdays I can see rates on the Ceph side of up to 1000 write IOPS and about 300 read IOPS. The really strange thing, however, is that even at weekends, when the services the VMs offer are hardly used at all, there is still a quite high write rate of about 400 IOPS, whereas the read rate is only about 50 IOPS then. The bytes read/written are minimal at that time, with only about 100 KBytes/sec read and about 5 MBytes/sec written.

I don't think you should have I/O problems with a healthy Ceph cluster of 144 OSDs on 9 hosts; it should be able to handle much more than that. I'd suspect some host or some OSDs performing poorly and dragging down the whole cluster's performance...
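
If you want to rule that out, a few commands on the Ceph side can help spot outlier OSDs (just a sketch, assuming you run them on a node with admin access):

    # Per-OSD commit/apply latencies; a single OSD with much higher values is suspect
    ceph osd perf
    # Utilization and placement per OSD; look for uneven or near-full OSDs
    ceph osd df tree
    # Overall health, including any slow ops / slow request warnings
    ceph -s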


So what I am looking for is what could be causing this "always there" write rate of about 400 IOPS. My guess is that it could be caused by file time (mtime, ctime, atime) write updates to the VMs' filesystems. If this were true, then using lazytime in /etc/fstab on all VMs could help to avoid this behaviour.
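
For example, a line like the following in a VM's /etc/fstab would enable it (just a sketch; device, mount point and filesystem are placeholders for whatever the VM actually uses):

    # lazytime keeps pure timestamp updates (atime/mtime/ctime) in memory and only
    # flushes them occasionally, instead of dirtying the inode on disk every time
    /dev/vda1  /  ext4  defaults,lazytime  0  1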

But on the other hand, all VMs use the (safe) "writeback" cache setting. So shouldn't this cache mode also cache the writes caused by file time updates?
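
For reference, this is where the cache mode shows up on the PVE side; VMID 100 and the volume name below are only placeholders:

    # Show a VM's disk lines, including the cache= option
    qm config 100 | grep -E '^(scsi|virtio|sata|ide)'
    # Setting writeback on an existing disk; keep the volume spec exactly as
    # printed by "qm config" and only add/change the cache=writeback option
    qm set 100 --scsi0 ceph-rbd:vm-100-disk-0,cache=writeback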

If yes, then I have to look for other reasons for my write IOPS problem, although I have no idea what they could be at the moment. Any suggestions?

We have a cluster with 62 VMs running (mostly Linux but also some Windows). Right now I'm seeing 5-15 MB/s reads and 5-35 MB/s writes, with about 500 read IOPS and about 200 write IOPS. This is with two pools, one backed by 4 SSD OSDs and the other by 11 HDD OSDs. The HDD pool has 45 VMs running on it and apt upgrade performance is good...
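
If you want to see which VMs are actually generating that constant write load, you can break the I/O down per pool and per RBD image on the Ceph side (a sketch; I'm assuming your pool is called "rbd" here and that the rbd_support mgr module is enabled):

    # Per-pool client I/O rates
    ceph osd pool stats
    # Per-image read/write IOPS and bandwidth, refreshed periodically
    rbd perf image iostat rbd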

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/


