Hello!

We have run into a deadlock situation in our cluster.
It is a standard cluster for OpenStack without RadosGW: the usual block-storage
pools plus one pool for metrics from Gnocchi.
The amount of data in the Gnocchi pool is small, but the number of objects is huge.

When planning the distribution of PGs between pools, PGs are allocated in
proportion to each pool's estimated data size. Accordingly, pgcalc suggests
allocating only a small number of PGs to the Gnocchi pool.
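To illustrate why pgcalc comes out so low: here is a rough sketch of its sizing logic as I understand it (not the official tool; the function name and the example numbers are made up). PGs per OSD are split between pools in proportion to each pool's expected data share, then rounded up to a power of two:

```python
def suggested_pg_count(num_osds, pool_data_percent, replica_count,
                       target_pgs_per_osd=100):
    """Return a power-of-two PG count for one pool (pgcalc-style sketch)."""
    raw = (num_osds * target_pgs_per_osd * pool_data_percent
           / 100 / replica_count)
    # round up to the next power of two, with a floor of 1
    pg = 1
    while pg < raw:
        pg *= 2
    return pg

# Hypothetical cluster: 24 OSDs, size=3, gnocchi holds ~1% of the data
print(suggested_pg_count(24, 1, 3))    # -> 8  (tiny pool, tiny PG count)
print(suggested_pg_count(24, 70, 3))   # -> 1024 (big RBD pool)
```

So a pool holding 1% of the data ends up with single-digit PG counts, no matter how many objects it contains.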

As a result, the cluster constantly sits in the warning state "1 pools have
many more objects per pg than average", which is understandable: Gnocchi
produces a lot of small objects, so its objects-per-PG ratio is tens of times
higher than that of the other pools.
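For context, here is a minimal sketch of the check behind that warning as I understand it (the function and the pool numbers are illustrative, not Ceph code): the monitor compares each pool's objects-per-PG ratio against the cluster-wide average and warns when it exceeds the average by more than mon_pg_warn_max_object_skew (default 10):

```python
def pools_with_object_skew(pools, max_object_skew=10.0):
    """pools: {name: (num_objects, pg_num)} -> names of pools that would warn."""
    total_objects = sum(objs for objs, _ in pools.values())
    total_pgs = sum(pgs for _, pgs in pools.values())
    avg_objects_per_pg = total_objects / total_pgs
    return [name for name, (objs, pgs) in pools.items()
            if pgs and objs / pgs > max_object_skew * avg_objects_per_pg]

# Made-up numbers resembling our situation: few PGs, millions of objects
pools = {
    "volumes": (200_000, 512),
    "gnocchi": (5_000_000, 8),
}
print(pools_with_object_skew(pools))  # -> ['gnocchi']
```

With a handful of PGs and millions of objects, the Gnocchi pool trips the threshold no matter how the rest of the cluster looks.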

And here we are at a deadlock:
1. We cannot increase the number of PGs in the Gnocchi pool, since its data
size is very small.
2. Even if we did increase it, we could exceed the recommended limit of 200
PGs per OSD.
3. Keeping the cluster permanently in HEALTH_WARN is a bad idea.
4. We could raise the "mon pg warn max object skew" parameter, but we do not
know how Ceph will behave with one pool that has a huge objects-per-PG skew.
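For reference, if raising the threshold from option 4 turns out to be acceptable, it can be set on the monitors in ceph.conf (10 is the documented default; the value 20 below is just an illustrative number, not a recommendation):

```
[mon]
mon pg warn max object skew = 20
```

That would only silence the warning, though; it would not change how the objects are actually spread across PGs, which is the part we are unsure about.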

There is no obvious solution.

How can this be solved correctly?
— 
Mike, runs!
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
