Hi Yuval,

Thanks for the info. So this is a side effect of pubsub sitting on top of the
RGW sync mechanism? I've re-included the ceph-users mailing list on this email
in case anyone has ideas on how to alleviate this.

Some good news on my part: I've managed to clear 16 of the large OMAP objects
by following the instructions here [1], i.e. trimming the bilogs and running a
deep scrub on the affected PGs.
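
For anyone hitting the same thing, the commands involved look roughly like this
(a sketch based on [1]; the bucket name and PG ID are placeholders for our
actual values):

$ # trim the bucket index log for the bucket behind the large index objects
$ radosgw-admin bilog trim --bucket=<bucket-name>
$ # then deep-scrub the PGs that reported the large omap objects, e.g.:
$ ceph pg deep-scrub 13.2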

That leaves the large OMAP objects in the "siteApubsub.rgw.log" pool, which I
am still hoping to find a way to clear. These are the objects of the form
"9:03d18f4d:::data_log.47:head". From [2] I gather that these are used for
multisite syncing, but our pubsub zone does not sync to another site. I wonder
if that makes this simply a misconfiguration, and whether the fix is just a
correction to the config.

I've been doing some digging today and found that our pubsub zone has the 
following config:

        {
            "id": "4f442377-4b71-4c6a-aaa9-ba945d7694f8",
            "name": "siteApubsub",
            "endpoints": [
                "https://10.225.41.200:7481",
                "https://10.225.41.201:7481",
                "https://10.225.41.202:7481"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 11,
            "read_only": "false",
            "tier_type": "pubsub",
            "sync_from_all": "false",
            "sync_from": [
                "siteA"
            ],
            "redirect_zone": ""
        }

And sync status shows...

source: 4f442377-4b71-4c6a-aaa9-ba945d7694f8 (siteApubsub)
              not syncing from zone

If I set the "log_data" field to false, I believe this simply stops writing
these logs, which are not required anyway. Presumably they have been building
up indefinitely, because the normal trimming never occurs when there is no
multisite sync.
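
For what it's worth, my rough plan would look something like the below (a
sketch, untested - I'd want confirmation before running it, and the trim flags
in particular are my assumption; check the radosgw-admin help first):

$ # export the pubsub zone config, set "log_data": "false", re-import, commit
$ radosgw-admin zone get --rgw-zone=siteApubsub > zone.json
$ # ... edit zone.json so that "log_data" is "false" ...
$ radosgw-admin zone set --rgw-zone=siteApubsub --infile=zone.json
$ radosgw-admin period update --commit
$ # then trim or remove the stale data log shards, e.g. one of:
$ radosgw-admin datalog trim --shard-id=47   # flags here are my assumption
$ rados -p siteApubsub.rgw.log rm data_log.47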

So my questions for anyone who may be able to answer:

  *   Is the above analysis sound?
  *   Can I update the zone config and delete these data_log objects manually
to restore my cluster to HEALTH_OK?

Thanks,
Alex

[1] https://access.redhat.com/solutions/6450561
[2] https://www.spinics.net/lists/ceph-users/msg54282.html

From: Yuval Lifshitz <ylifs...@redhat.com>
Sent: Thursday, October 27, 2022 5:35 PM
To: Alex Hussein-Kershaw (HE/HIM) <alex...@microsoft.com>
Subject: Re: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub

Hi Alex,
I checked with the RGW people working on multisite; they say they have observed
this in high-load tests (unrelated to pubsub).
This means that even if this is fixed, the fix is not going to be backported to
Octopus.
If they have some kind of workaround, I will let you know.

Yuval


On Thu, Oct 27, 2022 at 5:50 PM Alex Hussein-Kershaw (HE/HIM) 
<alex...@microsoft.com> wrote:
Hi Yuval,

Thanks for your reply and consideration. It's much appreciated. We don't use
Kafka (nor do I know what it is - I had a quick Google), but I think the
concern is the same: if our client goes down and misses notifications from
Ceph, we need Ceph to resend them until they are acknowledged. Sounds like the
bucket notification mechanism with persistent notifications fits this
requirement perfectly. I'll flag with my team that this is available in
Pacific, and that we should take it when we move.
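
For reference, my reading of the Pacific docs is that a persistent topic is
created through RGW's AWS-compatible SNS endpoint, roughly like this (a sketch;
the endpoint URL, topic name, and push endpoint below are all placeholders):

$ aws --endpoint-url http://rgw.example.com:8000 sns create-topic \
      --name my-topic \
      --attributes '{"push-endpoint":"http://client.example.com:9000","persistent":"true"}'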

That said, we're still on Octopus for our main release, so while that gives us
a direction for the future, I'd still like to find a solution to the initial
problem, as we have slow-moving customers who might stick with Octopus for
several years even after we offer a Pacific (and bucket notification) based
solution.

Interestingly, we've not seen this on any customer systems, only on our heavily
loaded test system. I suspect the high, sustained load this system receives
must be the cause. I've contemplated fully stopping the load for a month or so
and observing the effect; I wonder if we're outpacing some clean-up mechanism
(I think we've seen similar things elsewhere in our Ceph usage).

However, we're fairly limited on virtualisation rig space and don't want to sit 
this system idle if we can avoid it.

Best wishes,
Alex

From: Yuval Lifshitz <ylifs...@redhat.com>
Sent: Thursday, October 27, 2022 10:05 AM
To: Alex Hussein-Kershaw (HE/HIM) <alex...@microsoft.com>
Subject: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub

Hi Alex,
Not sure I can help you here. We recommend using the "bucket notification" [1]
mechanism over "pubsub" [2], since pubsub is not maintained, lacks much
functionality, and will be deprecated.
If you are concerned about Kafka outages, you can use persistent notifications
[3] (they will retry until the broker is up again), which have been available
since Ceph 16 (Pacific).

It looks like an issue with the site syncing process (which drives pubsub), so 
I will try to figure out if there is a simple fix here.

Yuval

[1] https://docs.ceph.com/en/latest/radosgw/notifications/
[2] https://docs.ceph.com/en/latest/radosgw/pubsub-module/
[3] https://docs.ceph.com/en/latest/radosgw/notifications/#notification-reliability

On Wed, Oct 26, 2022 at 11:57 AM Alex Hussein-Kershaw (HE/HIM) 
<alex...@microsoft.com> wrote:
Hi Yuval,

Hope you are well. I think pubsub is your area of expertise (we've briefly 
discussed it in the past).

Would love to get your advice on the below email if possible.

Kindest regards,
Alex
________________________________
From: Alex Hussein-Kershaw (HE/HIM)
Sent: Tuesday, October 25, 2022 2:48 PM
To: Ceph Users <ceph-users@ceph.io>
Subject: Large OMAP Objects & Pubsub

Hi All,

Looking to get some advice on an issue my clusters have been suffering from. I
realize there is a lot of text below - thanks in advance for your
consideration.

The cluster has a health warning of "32 large omap objects". It's persisted for 
several months.

It appears functional and there are no indications of a performance problem at 
the client for now (no slow ops - everything seems to work fine). It is a 
multisite cluster with CephFS & S3 in use, as well as pubsub. It is running 
Ceph version 15.2.13.

We run automated client load tests against this system every day and have been
doing so for a year or longer. The key counts of the large OMAP objects in
question are growing; I've monitored this over a period of several months.
Intuitively, I gather this means that at some point I will hit performance
problems as a result.
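
For what it's worth, the key counts in that spreadsheet were gathered with
something like this (a sketch; pool name taken from our cluster):

$ for obj in $(rados -p siteApubsub.rgw.log ls | grep '^data_log'); do
>     echo -n "$obj: "; rados -p siteApubsub.rgw.log listomapkeys "$obj" | wc -l
> done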

The large OMAP objects are split across two pools: siteApubsub.rgw.log and
siteApubsub.rgw.buckets.index. My client is responsible for processing the
pubsub queue, and it appears to be doing that successfully: there are no
objects in the pubsub data pool, as shown in the details below.
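
(A quick way to confirm the pubsub data pool really is empty - a sketch using
our pool name:)

$ rados -p siteApubsub.rgw.buckets.data ls | wc -l
0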

I've been keeping a spreadsheet to track the growth of these. Since I can't
attach a file to the mailing list, I've uploaded an image of it here:
https://imgur.com/a/gAtAcvp. The data shows constant growth of all of these
objects over the last couple of months. It also includes the names of the
objects, which fall into two categories:

  *   16 instances of objects with names like: 9:03d18f4d:::data_log.47:head
  *   16 instances of objects with names like: 
13:0118e6b8:::.dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15:head
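
As an aside, the second category look like bucket index shard objects; my
understanding is that the marker embedded in those names can be mapped back to
a bucket with something like this (a sketch - I'm assuming the marker appears
in the bucket stats output):

$ radosgw-admin bucket stats | grep -B 5 '4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1'
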
Please find the output of a few Ceph commands below, giving details of the
cluster.

  *   I'm really keen to understand this better and would be more than happy to 
share additional diags.
  *   I'd like to understand what I need to do to remove these large OMAP
objects and prevent future build-up, so I don't need to worry about the
stability of this system.
Thanks,
Alex


$ ceph -s
  cluster:
    id:     0b91b8be-3e01-4240-bea5-df01c7e53b7c
    health: HEALTH_WARN
            32 large omap objects

  services:
    mon: 3 daemons, quorum albans_sc0,albans_sc1,albans_sc2 (age 6w)
    mgr: albans_sc2(active, since 6w), standbys: albans_sc1, albans_sc0
    mds: cephfs:1 {0=albans_sc2=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 6w), 3 in (since 10M)
    rgw: 6 daemons active (albans_sc0.pubsub, albans_sc0.rgw0, 
albans_sc1.pubsub, albans_sc1.rgw0, albans_sc2.pubsub, albans_sc2.rgw0)

  task status:

  data:
    pools:   14 pools, 137 pgs
    objects: 4.52M objects, 160 GiB
    usage:   536 GiB used, 514 GiB / 1.0 TiB avail
    pgs:     137 active+clean

  io:
    client:   28 MiB/s rd, 1.2 MiB/s wr, 673 op/s rd, 189 op/s wr


$ ceph health detail
HEALTH_WARN 32 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 32 large omap objects
    16 large objects found in pool 'siteApubsub.rgw.log'
    16 large objects found in pool 'siteApubsub.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.
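
The specific objects show up in the cluster log, e.g. (a sketch, assuming
default log locations on a mon node):

$ grep 'Large omap object found' /var/log/ceph/ceph.log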

$ ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    1.0 TiB  514 GiB  496 GiB   536 GiB      51.07
TOTAL  1.0 TiB  514 GiB  496 GiB   536 GiB      51.07

--- POOLS ---
POOL                           ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics           1    1      0 B        0      0 B      0    153 GiB
cephfs_data                     2   32  135 GiB    1.99M  415 GiB  47.50    153 GiB
cephfs_metadata                 3   32  3.3 GiB    2.09M  9.8 GiB   2.09    153 GiB
siteA.rgw.buckets.data          4   32   24 GiB  438.62k   80 GiB  14.88    153 GiB
.rgw.root                       5    4   19 KiB       29  1.3 MiB      0    153 GiB
siteA.rgw.log                   6    4   79 MiB      799  247 MiB   0.05    153 GiB
siteA.rgw.control               7    4      0 B        8      0 B      0    153 GiB
siteA.rgw.meta                  8    4   13 KiB       37  1.6 MiB      0    153 GiB
siteApubsub.rgw.log             9    4  1.9 GiB      789  5.7 GiB   1.22    153 GiB
siteA.rgw.buckets.index        10    4  456 MiB       31  1.3 GiB   0.29    153 GiB
siteApubsub.rgw.control        11    4      0 B        8      0 B      0    153 GiB
siteApubsub.rgw.meta           12    4   11 KiB       40  1.7 MiB      0    153 GiB
siteApubsub.rgw.buckets.index  13    4  2.0 GiB       47  6.1 GiB   1.31    153 GiB
siteApubsub.rgw.buckets.data   14    4      0 B        0      0 B      0    153 GiB





_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
