Hi ceph users, I am using CephFS for file storage and I have noticed that the data gets distributed very unevenly across OSDs.
* I have about 90 OSDs across 8 hosts and 4096 PGs for the cephfs_data pool with 2 replicas, which is in line with the total-PG recommendation of "Total PGs = (OSDs * 100) / pool_size" from the docs.
* CephFS distributes the data pretty much evenly across the PGs, as shown by 'ceph pg dump'.
* However, the number of PGs assigned to the various OSDs (per weight unit/terabyte) varies quite a lot: the fullest OSD has as many as 44 PGs per terabyte (weight unit), while the emptiest have as few as 19 or 20 (rough measurement script in the P.S. below).
* Even if I count the total number of PGs across all pools per OSD, the numbers vary just as wildly as for the cephfs_data pool alone.

As a result, when the CephFS file system as a whole is only 60% full, some OSDs already hit the 95% full condition and no more data can be written to the system.

Is there any way to force a more even distribution of PGs across OSDs? I am using the default CRUSH map with two levels (root/host). Could any changes to the CRUSH map help? I would really like to get disk utilization above 60% without one of the 90 disks filling up so early.

Thanks,
Andras
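P.S. For reference, this is roughly how I am measuring the spread. It is just a quick sketch that counts acting PGs per OSD from 'ceph pg dump' and divides by the CRUSH weight from 'ceph osd tree'; the JSON field names match my release's output and may differ on yours.

#!/usr/bin/env python
# Quick sketch: PGs per CRUSH weight unit (~per TB) for every OSD.
import json
import subprocess
from collections import Counter

def ceph_json(*args):
    # Run a ceph command and parse its JSON output.
    out = subprocess.check_output(("ceph",) + args + ("--format", "json"))
    return json.loads(out)

# Count how many PGs each OSD appears in (acting set). This counts all
# pools; filter on the pool-id prefix of "pgid" to look at cephfs_data only.
pg_dump = ceph_json("pg", "dump")
pg_stats = pg_dump.get("pg_stats") or pg_dump["pg_map"]["pg_stats"]
pgs_per_osd = Counter()
for pg in pg_stats:
    for osd in pg["acting"]:
        pgs_per_osd[osd] += 1

# CRUSH weight per OSD (roughly 1.0 per TB with the default weighting).
tree = ceph_json("osd", "tree")
weights = {n["id"]: n["crush_weight"]
           for n in tree["nodes"] if n["type"] == "osd"}

# For 90 OSDs and 2 replicas the docs' formula gives (90 * 100) / 2 = 4500,
# rounded to the nearest power of two, 4096.
ratios = {osd: pgs_per_osd[osd] / w for osd, w in weights.items() if w > 0}
print("min PGs per weight unit:", min(ratios.values()))
print("max PGs per weight unit:", max(ratios.values()))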