Hello Yoann,

On Fri, Oct 7, 2022 at 10:51 AM Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>
> Hello,
>
> >> Is 256 a good value in our case? We have 80TB of data with more than 300M 
> >> files.
> >
> > You want at least enough PGs that each of the OSDs hosts a portion of the 
> > OMAP data. You want to spread out OMAP to as many _fast_ OSDs as possible.
> >
> > I have tried to find an answer to your question: are more metadata PGs 
> > better? I haven't found a definitive answer. This would ideally be tested 
> > in a non-prod / pre-prod environment and tuned
> > to individual requirements (type of workload). For now, I would not blindly 
> > trust the PG autoscaler. I have seen it advise settings that would 
> > definitely not be OK. You can skew things in the autoscaler with the "bias" 
> > parameter to compensate for this. But as far as I know, the current 
> > heuristics to determine a good value do not take into account the 
> > importance of OMAP (RocksDB) spread across OSDs. See a blog post about 
> > autoscaler tuning [1].
> >
> > It would be great if tuning metadata PGs for CephFS / RGW could be 
> > performed during the "large scale tests" the devs are planning to perform 
> > in the future, with use cases that cover "a lot of small files / objects" 
> > versus "loads of large files / objects", to get a feeling for how this 
> > tuning impacts performance for different workloads.
> >
> > Gr. Stefan
> >
> > [1]: https://ceph.io/en/news/blog/2022/autoscaler_tuning/
>
> Thanks for the information, I agree that the autoscaler does not seem able 
> to handle my use case.
> (thanks to icepic...@gmail.com too)
>
> By the way, since I set PG=256, I have far fewer SLOW requests than before; 
> I still get some, but the impact on my users has been greatly reduced.
>
> > # zgrep -c -E 'WRN.*(SLOW_OPS|SLOW_REQUEST|MDS_SLOW_METADATA_IO)' \
> > floki.log.4.gz floki.log.3.gz floki.log.2.gz floki.log.1.gz floki.log
> > floki.log.4.gz:6883
> > floki.log.3.gz:11794
> > floki.log.2.gz:3391
> > floki.log.1.gz:1180
> > floki.log:122
>
> If I have the opportunity, I will try to run some benchmarks with multiple 
> values of pg_num on the cephfs_metadata pool.

256 sounds like a good number to me. Maybe even 128. If you do some
experiments, please do share the results.
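
If you do experiment, this is roughly the set of knobs involved (a sketch
only, assuming your metadata pool really is named cephfs_metadata as above;
adjust the pool name and numbers to your setup):

  ceph osd pool set cephfs_metadata pg_autoscale_mode off
  ceph osd pool set cephfs_metadata pg_num 256

Or, if you prefer to keep the autoscaler in the loop, nudge it with the bias
Stefan mentioned (the value below is just an example) and check what it
intends to do before it acts:

  ceph osd pool set cephfs_metadata pg_autoscale_bias 4
  ceph osd pool autoscale-status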

Also, you mentioned you're using 7 active MDS. How's that working out
for you? Do you use pinning?
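
(For anyone else reading along: static pinning is just an extended attribute
on a directory. A minimal sketch, assuming a CephFS mount at /mnt/cephfs with
per-project top-level directories, both of which are placeholders here:

  setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projectB

That pins those subtrees to MDS ranks 0 and 1; setting the value to -1 removes
the pin again. Recent releases also offer ephemeral distributed pinning via
the ceph.dir.pin.distributed attribute, which can be less work to maintain
when there are many top-level directories.)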


--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
