Hi Alex,
How many zones in total do you have configured in the system?
We have a known issue with pubsub-based notifications, where we may get as
many as (number of zones - 1) duplicates per object.
This, however, won't explain 13 events per object.

Did you verify that these are indeed the same events? For the same object,
do you see the same mtime and etag?
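
A minimal sketch of that check, assuming the events pulled from the pubsub
queue have already been flattened into dicts with hypothetical "key",
"mtime" and "etag" fields (the real event layout is nested differently and
would need to be mapped first):

```python
from collections import Counter

def duplicate_report(events):
    """Group events per object and per (key, mtime, etag) identity.

    `events` is an iterable of dicts with the hypothetical fields
    "key", "mtime" and "etag"; these names are assumptions, not the
    actual pubsub event schema.
    """
    events = list(events)
    # How many events refer to each object key.
    per_object = Counter(e["key"] for e in events)
    # How many events share the exact same key, mtime and etag.
    per_identity = Counter((e["key"], e["mtime"], e["etag"]) for e in events)
    # How many distinct (mtime, etag) versions were seen for each key.
    versions = Counter(key for key, _mtime, _etag in per_identity)
    # Keys with several events that all carry one and the same version.
    exact_duplicates = {
        key: count
        for key, count in per_object.items()
        if count > 1 and versions[key] == 1
    }
    return per_object, per_identity, exact_duplicates
```

If exact_duplicates covers most of the objects, the events are repeats of
the same notification. Differing mtime or etag values for one key would
instead suggest that the object really was written more than once.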

RGW restarts may also explain the issue: if an RGW restarts in the middle of
a bucket sync, it starts that sync over, which may result in duplicate
notification events.

As a side note, why are you using pubsub-based notifications and not the
regular "push" bucket notifications [1]?
The application APIs for bucket notifications are more standard (they exist
in most SDKs), and the feature is more robust (especially with the async
"persistent" notifications), easier to configure (no special zone is
needed), and more feature-rich.

Yuval

[1] https://docs.ceph.com/en/octopus/radosgw/notifications/



On Mon, Oct 11, 2021 at 9:54 PM Alex Hussein-Kershaw <alex...@microsoft.com>
wrote:

> Hi Ceph-Users,
>
> I have a multisite Ceph cluster deployed on containers within 3 VMs (6 VMs
> total over 2 sites). Each VM has a mon, osd, mgr, mds, and two rgw
> containers (regular and pubsub).  It was installed with ceph-ansible.
>
> One of the sites has been up for a few years; the other site has been
> recently re-installed and paired with the initial site. The initial site is
> using Nautilus (14.2.9), the new site is on Octopus (15.2.13). (Side point
> - is this valid?)
>
> I've noticed that on the new site, pubsub is building a gigantic queue of
> objects (it's building faster than our product can acknowledge the events).
> I'm having a rough time trying to debug this/understand why the queue is
> building.
>
> I currently have 450k objects stored in an S3 bucket, that is mostly
> inactive (our test system backed by this cluster is off while we attempt to
> resolve this), synced between the two sites. The pubsub queue on the second
> site currently has 1.7M objects, and I've disabled the pubsub containers to
> prevent it building further.  As soon as I enable the pubsub containers
> again this starts building at an alarming rate.
>
> What I've tried:
>
>   *   Interacting with the pubsub REST API. I pulled all the events in the
> pubsub queue and did some analysis on them.
>   *   Of the 1.7M events, there were 106k unique S3 objects referenced.
>   *   The average S3 object had 13 pubsub events referring to it. This
> seems very odd given the inactivity of the data, I was expecting to find no
> duplicate entries here.
>   *   The most mentioned S3 object was referred to 362 times (i.e. a
> single S3 object had 362 pubsub OBJECT_CREATE events).
>   *   All the mTimes are from 2020 (other than 35 in 2021) - the second
> site was only deployed this month.
>
> Does anyone have any suggestions as to why this is occurring?
>
> Thanks,
> Alex
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io