This may be less of an issue now. The most traumatic experience for us,
back around Hammer, was memory usage under recovery + client load ending in
OOM kills of OSDs, which in turn meant more recovery - a pretty vicious cycle.
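
If that memory-under-recovery failure mode is what worries you, here is a
rough sketch of the knobs and checks I'd start with (the option and command
names are standard Ceph ones, the values are purely illustrative rather than
tuned recommendations, and this assumes BlueStore with the cache autotuner -
if your build doesn't have osd_memory_target, bluestore_cache_size is the
older knob):

  # Cap what each OSD's cache autotuner aims for (default is ~4 GiB).
  ceph config set osd osd_memory_target 4294967296

  # Throttle how much recovery/backfill work (and the memory that comes
  # with it) can pile on top of client IO at once.
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1

  # Things worth watching: PGs per OSD and per-daemon memory breakdown.
  ceph osd df tree                  # PGS column = PGs per OSD
  ceph daemon osd.0 dump_mempools   # run on the OSD's host; osd.0 is just an example

In dump_mempools, osd_pglog is the pool that grows with PG count and log
length, so with ~400 PGs per OSD that's a reasonable one to keep an eye on
during recovery.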

-KJ

On Wed, Nov 14, 2018 at 11:45 AM Vladimir Brik <
vladimir.b...@icecube.wisc.edu> wrote:

> Hello
>
> I have a ceph 13.2.2 cluster comprised of 5 hosts, each with 16 HDDs and
> 4 SSDs. HDD OSDs have about 50 PGs each, while SSD OSDs have about 400
> PGs each (a lot more pools use SSDs than HDDs). Servers are fairly
> powerful: 48 HT cores, 192GB of RAM, and 2x25Gbps Ethernet.
>
> The impression I got from the docs is that having more than 200 PGs per
> OSD is not a good thing, but justifications were vague (no concrete
> numbers), like increased peering time, increased resource consumption,
> and possibly decreased recovery performance. None of these appeared to
> be a significant problem in my testing, but the tests were very basic
> and done on a pretty empty cluster under minimal load, so I worry I'll
> run into trouble down the road.
>
> Here are the questions I have:
> - In practice, is it a big deal that some OSDs have ~400 PGs?
> - In what situations would our cluster most likely fare significantly
> better if I went through the trouble of re-creating pools so that no OSD
> would have more than, say, ~100 PGs?
> - What performance metrics could I monitor to detect possible issues due
> to having too many PGs?
>
> Thanks,
>
> Vlad
>


-- 
Kjetil Joergensen <kje...@medallia.com>
SRE, Medallia Inc
