[ceph-users] Slow RGW multisite sync due to "304 Not Modified" responses on primary zone

Alam Mohammad Sun, 11 Feb 2024 22:54:55 -0800

Hi,
We have 2 clusters (v18.2.1), used primarily for RGW, which together hold over 
2 billion RGW objects. They are in a multisite configuration with 2 zones in 
total, and we have around 2 Gbps of dedicated point-to-point (P2P) bandwidth 
for the multisite traffic. 
Running "radosgw-admin sync status" on zone 2 shows all 128 shards recovering, 
yet very little data is being transferred from the primary zone: link 
utilization is barely 100 Mbps out of 2 Gbps. Our objects are also quite 
small, averaging about 1 MB in size. 
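
For scale, the figures above can be turned into a rough object rate. This is 
only a back-of-the-envelope sketch: it assumes decimal units (1 MB = 8 Mbit), 
ignores HTTP/TLS and metadata overhead, and the function name is made up for 
illustration.

```python
def objects_per_second(link_mbps: float, avg_object_mb: float) -> float:
    """Upper bound on objects/s a link can carry, ignoring protocol overhead."""
    # Assumption: decimal units, so 1 MB = 8 Mbit.
    return link_mbps / (avg_object_mb * 8)


observed_mbps = 100     # utilization currently seen on the link
dedicated_mbps = 2000   # 2 Gbps dedicated to multisite traffic
avg_object_mb = 1       # average object size

print(f"utilization: {observed_mbps / dedicated_mbps:.0%}")
print(f"objects/s at observed rate: {objects_per_second(observed_mbps, avg_object_mb):.1f}")
print(f"objects/s at full link:     {objects_per_second(dedicated_mbps, avg_object_mb):.1f}")
```

Under those assumptions the link is about 5% utilized, which corresponds to 
roughly 12.5 objects per second against a ceiling of about 250.
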
On further inspection, we noticed that the RGW access logs at the primary site 
mostly show "304 Not Modified" for requests from the RGWs at site 2. Is this 
expected? Here are some of the logs (information is redacted):


root@host-04:~# tail -f /var/log/haproxy-msync.log
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:33730 [12/Feb/2024:05:06:51.047] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/1/0/0 0/0 "GET /bucket1/object1.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:59730 [12/Feb/2024:05:06:51.048] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/3/1/0 0/0 "GET /bucket1/object91.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"
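
To quantify how many sync requests return 304 versus 200, the status codes in 
a log like this can be tallied. The following is a minimal sketch, not a 
tested tool: it assumes HAProxy's HTTP log format, in which the status code 
directly follows the five slash-separated timers, and the helper name 
count_statuses is hypothetical. The sample lines are the two entries above 
with the query string trimmed.

```python
import re
from collections import Counter

# Assumption: HAProxy HTTP log format, where the status code follows the
# five slash-separated timers (e.g. "0/0/0/2/2 304").
STATUS_RE = re.compile(r"\s-?\d+(?:/-?\d+){4}\s(\d{3})\s")


def count_statuses(lines):
    """Return a Counter of HTTP status codes found in HAProxy log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two log entries shown above, with the query string trimmed.
sample = [
    'Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:33730 '
    '[12/Feb/2024:05:06:51.047] https~ backend/host-04-msync 0/0/0/2/2 304 143 '
    '- - ---- 56/55/1/0/0 0/0 "GET /bucket1/object1.jpg HTTP/1.1"',
    'Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:59730 '
    '[12/Feb/2024:05:06:51.048] https~ backend/host-04-msync 0/0/0/2/2 304 143 '
    '- - ---- 56/55/3/1/0 0/0 "GET /bucket1/object91.jpg HTTP/1.1"',
]

for status, n in count_statuses(sample).most_common():
    print(status, n)
```

Against the real file, the same function can be fed an open file handle, for 
example count_statuses(open("/var/log/haproxy-msync.log")).
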

We also looked at our Grafana instance: out of roughly 1000 requests per 
second, 200 return "200 OK" and 800 return "304 Not Modified". Sync threads 
run on only 2 RGW daemons per zone, which sit behind a load balancer. 
"radosgw-admin sync error list" also shows around 20 errors, most of which are 
automatically recoverable.
Does this mean that the RGW multisite sync logs in the log pool have yet to be 
generated, or something of that sort? Please share any insights and let us 
know how to resolve this.

Thanks,
Saif
