Hi
After I recreated one OSD and increased the PG count of my erasure-coded (2+1) pool
(which was way too low, only 100 PGs for 9 OSDs), the cluster started to consume
additional disk space.
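The pg_num/pgp_num increase was done with the usual pool-set commands, roughly like
this (the target value below is just a placeholder, not the exact number I used):

ceph osd pool set ecpool_hdd pg_num 400
ceph osd pool set ecpool_hdd pgp_num 400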
At first I thought this was caused by the moved PGs occupying extra space while their
backfills were unfinished. I pinned most of the new PG mappings back to the old OSDs
via `pg-upmap`, and that did free some space in the cluster.
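The pins were created with commands of this form (the PG id and OSD numbers below are
only an illustration, not my exact mappings):

ceph osd pg-upmap-items 13.1a 7 2   # keep the shard CRUSH wants on osd.7 on osd.2 (an old OSD) instead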
Then I reduced osd_max_backfills to 1 and started removing the upmap pins in small
batches, which allowed Ceph to finish the backfills for those PGs.
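In other words, roughly:

ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph osd rm-pg-upmap-items 13.1a   # drop the pin for one PG at a time and let it backfill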
HOWEVER, used capacity still keeps growing! It drops after each PG finishes moving,
but grows overall.
It grew by 1.3 TB yesterday. In the same period clients wrote only ~200 new objects
(~800 MB; the pool contains RBD images only).
Why? What is using such a large amount of additional space?
Graphs from our Prometheus are attached: only ~200 objects were created by RBD clients
yesterday, but raw used space increased by 1.3 TB.
An additional question: why do ceph df / rados df report only 16 TB of actual data
written, while 29.8 TB (now 31 TB) of raw disk space is used? Shouldn't it be
16 / 2 * 3 = 24 TB?
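Spelling out that math for k=2, m=1: expected raw usage = 16 TiB * (k+m)/k = 16 * 3/2
= 24 TiB, yet the cluster reports ~31-32 TiB raw used, i.e. roughly 7 TiB more than
the 1.5x EC overhead explains.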
ceph df output:
[root@sill-01 ~]# ceph df
GLOBAL:
    SIZE        AVAIL       RAW USED     %RAW USED
    38 TiB      6.9 TiB       32 TiB         82.03
POOLS:
    NAME           ID     USED        %USED     MAX AVAIL     OBJECTS
    ecpool_hdd     13      16 TiB     93.94       1.0 TiB     7611672
    rpool_hdd      15     9.2 MiB         0       515 GiB          92
    fs_meta        44      20 KiB         0       515 GiB          23
    fs_data        45         0 B         0       1.0 TiB           0
How can I fix this?
--
With best regards,
Vitaliy Filippov