On 27.12.2022 at 18:54, Óscar de Arriba wrote:
Hello all,

Since about a week ago, the data LVM on one of my Proxmox nodes has been doing 
strange things.

For storage, I'm using a consumer Crucial MX500 SATA SSD connected directly to 
the motherboard controller (no PCIe HBA for the system+data disk). It is brand 
new, S.M.A.R.T. checks are passing, and it reports only 4% wearout. I have set 
up Proxmox in a cluster with LVM-thin and make backups to an external NFS 
location.
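
(In case it is useful, this is roughly how I read those values on the host; 
/dev/sda is just the device name in my setup:

   # overall health and all SMART attributes of the SSD
   smartctl -a /dev/sda
   # the wearout figure shows up as a lifetime/percentage attribute
   # (the exact attribute name depends on the drive and smartctl drivedb)
   smartctl -A /dev/sda | grep -i -e percent -e wear
)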

Last week I tried to migrate a stopped VM of ~64 GiB from one server to 
another, and found that *the SSD started to underperform (~5 MB/s) after 
roughly 55 GiB copied* (this pattern repeated several times).
It was so bad that *even after cancelling the migration, the SSD stayed busy 
writing at that speed and I needed to reboot the node, as it was completely 
unusable* (it is in my homelab, not running mission-critical workloads, so that 
was acceptable). After the reboot, I could remove the half-copied VM disk.

After that (and several retries, including making a backup to external storage 
and restoring it, just in case the bottleneck was in the migration process), I 
ended up creating the VM from scratch and migrating the data from one VM to the 
other, so the VM was created brand new and no bottleneck was hit.

The problem is that *now the 377 GiB pve/data thin pool is showing 96% used, 
but the total size of the stored VM disks (even if they were 100% provisioned) 
is 168 GiB*. I checked, and neither VM has snapshots.

I don't know if rebooting while writing to the disk (always having cancelled 
the migration first) damaged the LV in some way, but thinking about it, it does 
not even make sense that an SSD of this type ends up writing at 5 MB/s, even 
with the write cache full. It should write far faster than that even without 
the cache.

Some information about the storage:

`root@venom:~# lvs -a
   LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
   data            pve twi-aotz-- 377.55g             96.13  1.54
   [data_tdata]    pve Twi-ao---- 377.55g
   [data_tmeta]    pve ewi-ao----  <3.86g
   [lvol0_pmspare] pve ewi-------  <3.86g
   root            pve -wi-ao----  60.00g
   swap            pve -wi-ao----   4.00g
   vm-150-disk-0   pve Vwi-a-tz--   4.00m data        14.06
   vm-150-disk-1   pve Vwi-a-tz-- 128.00g data        100.00
   vm-201-disk-0   pve Vwi-aotz--   4.00m data        14.06
   vm-201-disk-1   pve Vwi-aotz--  40.00g data        71.51`
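
(To make the mismatch explicit, working from the Data% column above:

   thin volumes: 128.00 GiB * 100.00% + 40.00 GiB * 71.51% ≈ 156.6 GiB
   thin pool:    377.55 GiB *  96.13%                      ≈ 363.0 GiB

so the pool reports roughly 200 GiB more used data than the volumes account 
for.)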

It can also be seen in this post I made on the forum a couple of days ago: 
https://forum.proxmox.com/threads/thin-lvm-showing-more-used-space-than-expected.120051/

Any ideas, aside from making a backup and reinstalling from scratch?

Thanks in advance!


Hi,

I've never used lvm-thin, so beware, this is just guessing, but to me this 
looks like something filled up your pool at some point (probably the 
migration?). Consumer SSDs don't perform well once all of their space has been 
allocated (at least to my knowledge): even if there is still free space in the 
pool, there are no free blocks as far as the SSD's controller is concerned. The 
low speed may come from this, because the controller needs to erase blocks 
before writing them again, due to the lack of (known) free space.

Did you try running fstrim in the VMs to reclaim the allocated space? At least 
on Linux, something like "fstrim -av" should do the trick. Note that the 
"discard" option needs to be enabled for all volumes you want to trim, so check 
the VM config first.
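
Roughly like this, just as a sketch (VM ID 201, the "local-lvm" storage name 
and the scsi0 slot are only examples; take the real values from "qm config"):

   # on the Proxmox host: see which disks the VM has and whether discard is set
   qm config 201 | grep -i disk
   # enable discard on that disk (re-specify the volume as shown by qm config)
   qm set 201 --scsi0 local-lvm:vm-201-disk-1,discard=on
   # inside the (Linux) guest: trim all mounted filesystems
   fstrim -av
   # back on the host: check whether Data% of the thin pool went down
   lvs pve/data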

hth
Martin



_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
