Hi

After I recreated one OSD and increased the pg count of my erasure-coded (2+1)
pool (which was way too low, only 100 PGs for 9 OSDs), the cluster started to
consume additional disk space.
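
For reference, the pg count was bumped with commands of this form (the target
value of 512 is only a placeholder here, not necessarily what I used):

ceph osd pool set ecpool_hdd pg_num 512    # split PGs of the EC pool
ceph osd pool set ecpool_hdd pgp_num 512   # then let placement follow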

At first I thought this was caused by the moved PGs using additional space
during unfinished backfills. I pinned most of the new PGs back to their old
OSDs via `pg-upmap`, and that did indeed free some space in the cluster.
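
The pinning was done with commands of this form (the PG ID and OSD numbers
below are placeholders, not my real values):

# keep PG 13.2a on its old OSD: replace the newly chosen OSD 3 with OSD 7
ceph osd pg-upmap-items 13.2a 3 7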

Then I reduced osd_max_backfills to 1 and started removing the upmap pins in
small batches, which allowed Ceph to finish backfilling those PGs.
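
Roughly like this (the PG ID again is only a placeholder):

ceph tell 'osd.*' injectargs '--osd-max-backfills 1'   # throttle backfill
ceph osd rm-pg-upmap-items 13.2a                       # release one pin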

HOWEVER, used capacity still grows! It drops after each PG finishes moving,
but the overall trend is upward.

It grew by 1.3 TB yesterday. In the same period of time, clients wrote only
~200 new objects (~800 MB; the pool holds RBD images only).

What is using such a large amount of additional space?

Graphs from our Prometheus are attached: only ~200 objects were created by
RBD clients yesterday, yet used raw space increased by 1.3 TB.

An additional question: why do ceph df / rados df report only 16 TB of actual
data written, while 29.8 TB (now 31 TB) of raw disk space is used? With a 2+1
EC profile, shouldn't that be 16 TB * 3/2 = 24 TB?
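
(Sanity check of my arithmetic: a 2+1 profile should give a (k+m)/k = 3/2
raw expansion factor.)

echo '16 * (2 + 1) / 2' | bc    # 16 TB of data * 3/2 -> 24 TB raw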

ceph df output:

[root@sill-01 ~]# ceph df
GLOBAL:
    SIZE       AVAIL       RAW USED     %RAW USED 
    38 TiB     6.9 TiB       32 TiB         82.03 
POOLS:
    NAME           ID     USED        %USED     MAX AVAIL     OBJECTS 
    ecpool_hdd     13      16 TiB     93.94       1.0 TiB     7611672 
    rpool_hdd      15     9.2 MiB         0       515 GiB          92 
    fs_meta        44      20 KiB         0       515 GiB          23 
    fs_data        45         0 B         0       1.0 TiB           0 

How can I fix this?
-- 
With best regards,
  Vitaliy Filippov
