Hi Robert,
This sounds very much like a big problem we had 2 weeks back.

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EWPPEMPAJQT6GGYSHM7GIM3BZWS2PSUY/

Are you running EC? Which version are you on? It would fit our experience if 
you are using EC and recently upgraded to 14.2.11 or later.

For some reason this memory use started growing a day after we upgraded to 
14.2.13; another case I read about was on 14.2.11, I think. We don't know 
whether the pg_logs just hadn't been filled this much before, or whether each 
entry somehow got much larger after the upgrade. We don't see this in our 
replicated pools.

We significantly reduced the pg_log length from the default of 3000 entries 
down to 500. If your cluster is still up and the PGs are healthy, this should 
be doable online. Sadly we could not sustain the memory usage and OSD processes 
started getting OOM-killed, so we had to trim the logs offline, which 
unfortunately affected our production.
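For reference, the online change was just lowering the limits in the config 
database (500 is what we picked, tune it to your own memory budget):

    # new limits for all OSDs; existing logs only get trimmed
    # as writes come in, as far as we saw
    ceph config set osd osd_max_pg_log_entries 500
    ceph config set osd osd_min_pg_log_entries 500

The offline trim we did per PG with ceph-objectstore-tool while the OSD was 
stopped, roughly like this (the data path and pgid are placeholders, and be 
careful, this works directly on the OSD's store):

    systemctl stop ceph-osd@<id>
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --op trim-pg-log --pgid <pgid>
    systemctl start ceph-osd@<id>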

Cheers,
Kalle


----- Original Message -----
> From: "Robert Brooks" <robert.bro...@riskiq.net>
> To: "ceph-users" <ceph-users@ceph.io>
> Sent: Wednesday, 25 November, 2020 20:23:05
> Subject: [ceph-users] high memory usage in osd_pglog

> We are seeing very high osd_pglog usage in mempools for ceph osds. For
> example...
> 
>    "mempool": {
>        "bloom_filter_bytes": 0,
>        "bloom_filter_items": 0,
>        "bluestore_alloc_bytes": 41857200,
>        "bluestore_alloc_items": 523215,
>        "bluestore_cache_data_bytes": 50876416,
>        "bluestore_cache_data_items": 1326,
>        "bluestore_cache_onode_bytes": 6814080,
>        "bluestore_cache_onode_items": 13104,
>        "bluestore_cache_other_bytes": 57793850,
>        "bluestore_cache_other_items": 2599669,
>        "bluestore_fsck_bytes": 0,
>        "bluestore_fsck_items": 0,
>        "bluestore_txc_bytes": 29904,
>        "bluestore_txc_items": 42,
>        "bluestore_writing_deferred_bytes": 733191,
>        "bluestore_writing_deferred_items": 96,
>        "bluestore_writing_bytes": 0,
>        "bluestore_writing_items": 0,
>        "bluefs_bytes": 101400,
>        "bluefs_items": 1885,
>        "buffer_anon_bytes": 21505818,
>        "buffer_anon_items": 14949,
>        "buffer_meta_bytes": 1161512,
>        "buffer_meta_items": 13199,
>        "osd_bytes": 1962920,
>        "osd_items": 167,
>        "osd_mapbl_bytes": 825079,
>        "osd_mapbl_items": 17,
>        "osd_pglog_bytes": 14099381936,
>        "osd_pglog_items": 134285429,
>        "osdmap_bytes": 734616,
>        "osdmap_items": 26508,
>        "osdmap_mapping_bytes": 0,
>        "osdmap_mapping_items": 0,
>        "pgmap_bytes": 0,
>        "pgmap_items": 0,
>        "mds_co_bytes": 0,
>        "mds_co_items": 0,
>        "unittest_1_bytes": 0,
>        "unittest_1_items": 0,
>        "unittest_2_bytes": 0,
>        "unittest_2_items": 0
>    },
> 
> Here roughly 14 GB is taken by the pg_log on a single OSD. The cluster has
> 106 OSDs and 2432 placement groups.
> 
> The per-PG pg_log entry counts are much lower than the 134285429 items
> reported above.
> 
> Top counts are...
> 
> 1486 1.41c
> 883 7.3
> 834 7.f
> 683 7.13
> 669 7.a
> 623 7.5
> 565 7.8
> 560 7.1c
> 546 7.16
> 544 7.19
> 
> Summing these gives 21594 pg logs.
> 
> Overall the performance of the cluster is poor, OSD memory usage is high
> (20-30 GB resident), and with a moderate workload we are seeing iowait on the
> OSD hosts. The memory allocated to caches appears to be low, I believe because
> osd_pglog is taking most of the available memory.
> 
> Regards,
> 
> Rob
> 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
