Answers inline.

2018-05-25 17:57 GMT+02:00 Jesus Cea <j...@jcea.es>:

> Hi there.
>
> I have configured a pool with an 8+2 erasure code. My target, by space
> usage and OSD configuration, would be 128 PGs, but since each configured
> PG will be using 10 actual "PGs", I have created the pool with only 8 PGs
> (80 real PGs). Since I can increase the PG count but not decrease it,
> this decision seems sensible.
>
> Some questions:
>
> 1. Documentation insists everywhere that the PG count should be a power
> of two. It would be nice to know the consequences of not following this
> recommendation, whether being "close" to a power of two is better than
> being far away, and whether it is better to be slightly below or slightly
> above. If the ideal value is 128 but I can only have 120 or 130, what
> should I choose? 120 or 130? Why?
>

Go for the next larger power of two under the assumption that your cluster
will grow.
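
For what it's worth, a rough sketch of the usual sizing rule of thumb
(the target of ~100 PGs per OSD is an assumption, and in practice the
budget has to be shared across all pools on the cluster; the numbers
below are just your 25 OSDs plugged in):

    # Rough sketch of the common PG sizing rule of thumb; tune
    # target_pgs_per_osd for your cluster and remember the per-OSD
    # budget is shared across all pools.
    def suggested_pg_num(num_osds, k, m, target_pgs_per_osd=100):
        raw = num_osds * target_pgs_per_osd / (k + m)
        pg_num = 1
        while pg_num < raw:          # round up to the next power of two
            pg_num *= 2
        return pg_num

    print(suggested_pg_num(25, 8, 2))   # 25 * 100 / 10 = 250 -> 256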


>
> 2. As I understand it, the PG count that should be a "power of two" is
> "8" in this case (80 real PGs underneath). Good. In that case, the next
> step would be 16 (160 real PGs). I would rather increase it to 12 or 13
> (120/130 real PGs). Would that be reasonable? What are the consequences
> of increasing the PG count to 12 or 13 instead of choosing 16 (the next
> power of two)?
>

Data will be unevenly balanced across the PGs if their count is not a
power of two.
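
To make the imbalance concrete, here is a small Python sketch of the
stable-mod mapping used to fold an object's hash onto pg_num PGs (the
function names are mine, but the logic mirrors ceph_stable_mod as I
understand it): with 12 PGs, a third of the PGs cover twice as much
hash space as the rest, so they end up with roughly twice the objects
and data.

    from collections import Counter

    def stable_mod(x, pg_num, mask):
        # mask = next power of two >= pg_num, minus one
        return x & mask if (x & mask) < pg_num else x & (mask >> 1)

    def hash_shares(pg_num):
        mask = (1 << (pg_num - 1).bit_length()) - 1
        return Counter(stable_mod(h, pg_num, mask) for h in range(mask + 1))

    print(hash_shares(12))  # PGs 4..7 cover 2/16 of the hash space, the rest 1/16
    print(hash_shares(16))  # every PG covers exactly 1/16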


>
> 3. Is there any negative effect on CRUSH from using an 8+2 erasure code
> instead of 6+2 or 14+2 (power of two)? I have 25 OSDs, so requiring 16
> of them for a single operation seems a bad idea, even more so when my
> OSD capacities vary widely (from 150 GB to 1 TB) and filling a small OSD
> would block writes in the entire pool.
>

EC rules don't have to use powers of two. And yes, too many chunks for
an EC pool is a bad idea; it's rarely advisable to have a total k + m
larger than 8 or so.

Also, you should have at least k + m + 1 servers, otherwise full server
failures cannot be handled properly.

A large spread between the OSD capacities within one CRUSH rule is also
usually a bad idea; 150 GB to 1 TB is typically too big.
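
With only 8 PGs the spread hurts even more: 8 PGs x 10 shards is just 80
shards spread over 25 OSDs, so a weight-proportional share for a small
OSD is one or two shards and its fill level is essentially luck. A
back-of-the-envelope sketch with a made-up capacity layout (the sizes
are assumptions, the point is the shard counts):

    # Hypothetical 25-OSD layout; placement assumed roughly
    # proportional to CRUSH weight.
    osds_gb = [150] * 10 + [500] * 10 + [1000] * 5
    pg_shards = 8 * (8 + 2)          # pg_num * (k + m) = 80

    total = sum(osds_gb)
    for size in sorted(set(osds_gb)):
        expected = pg_shards * size / total
        print(f"{size:>5} GB OSD: ~{expected:.1f} PG shards expected")
    # -> 150 GB: ~1.0 shard, 500 GB: ~3.5, 1000 GB: ~7.0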


>
> 4. Since I created an erasure-coded pool with 8 PGs, I am getting
> warnings of "x pools have many more objects per pg than average". The
> data I am copying comes from a legacy pool with 512 PGs; the new pool
> has 8. That creates ~30,000 objects per PG, far above the average (616
> objects). What can I do? Moving to 16 or 32 PGs is not going to improve
> the situation, but will consume PGs (32*10). Advice?
>

Well, you reduced the number of PGs by a factor of 64, so you'll of course
see a large skew here. The option mon_pg_warn_max_object_skew
controls when this warning is shown; the default is 10.
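
For the numbers you quoted, the arithmetic behind the warning looks
roughly like this (my understanding is that the monitor compares a
pool's objects-per-PG against the cluster-wide average and warns when
the ratio exceeds mon_pg_warn_max_object_skew):

    objects_per_pg_in_pool = 30_000
    cluster_average = 616
    skew = objects_per_pg_in_pool / cluster_average
    print(f"skew ~{skew:.0f}x, warning threshold is 10x by default")  # ~49x

Raising mon_pg_warn_max_object_skew only hides the warning; increasing
pg_num on the new pool is what actually reduces the objects per PG.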


>
> 5. I understand the advice of having <300 PGs per OSD because of memory
> usage, but I am wondering about the impact of the number of objects in
> each PG. I wonder whether, memory- and resource-wise, having 100 PGs
> with 10,000 objects each is far more demanding than 1,000 PGs with 50
> objects each. Since I have PGs with 300 objects and PGs with 30,000
> objects, I wonder about the memory impact of each. What is the actual
> memory-hungry factor in an OSD: PGs, or objects per PG?
>

PGs typically impose the bigger overhead, but PGs with a large number of
objects can become annoying...


Paul


>
> Thanks for your time and knowledge :).
>
> --
> Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
> j...@jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
> Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
> jabber / xmpp:j...@jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
> "Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
> "My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
