Hi,

We've been trying to set up multi-site sync on two test VMs before rolling things out on actual production hardware. Both are running Ceph 18.2.4 deployed via cephadm. Host OS is Debian 12, container runtime is podman (switched from Debian 11 and docker.io, same error there). There is only one RGW daemon on each site. Ceph config is pretty much defaults. One thing I did change was setting rgw_relaxed_region_enforcement to true because the zonegroup got renamed from "default" during the switch to multi-site using the dashboard's assistant. There's nothing special like server-side encryption either. Our end goal is to replicate all RGW data from our current cluster to a new one.

The Multi-Site configuration itself went pretty smoothly through the dashboard and pre-existing data started syncing right away. Unfortunately, not all objects made it. To be precise, none of the larger objects over the multipart threshold got synced. This is consistent for newly uploaded multipart objects as well. Curiously, it's working fine in the other direction, i.e. multipart uploads from the secondary zone do get synced to the master.

Here are some relevant logs.

From `radosgw-admin sync error list`:

{
    "shard_id": 26,
    "entries": [
        {
            "id": "1_1722598249.479766_23730.1",
            "section": "data",
            "name": "foobar/new:5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3:7/logstash_1%3a8.12.2-1_amd64.deb",
            "timestamp": "2024-08-02T11:30:49.479766Z",
            "info": {
                "source_zone": "5160b406-4428-4fdc-9c5d-5ec9fe9404c0",
                "error_code": 35,
                "message": "failed to sync object(35) Resource deadlock avoided"
            }
        }
    ]
},

From RGW on the receiving end:

Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.474+0000 7f3a6243e640 0 rgw async rados processor: store->fetch_remote_obj() returned r=-35
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.474+0000 7f3a36b7b640 2 req 7168648379339657593 0.000000000s :list_data_changes_log normalizing buckets and tenants
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.474+0000 7f3a36b7b640 2 req 7168648379339657593 0.003999872s :list_data_changes_log init permissions
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a36b7b640 2 req 7168648379339657593 0.003999872s :list_data_changes_log recalculating target
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a36b7b640 2 req 7168648379339657593 0.003999872s :list_data_changes_log reading permissions
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a36b7b640 2 req 7168648379339657593 0.003999872s :list_data_changes_log init op
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a36b7b640 2 req 7168648379339657593 0.003999872s :list_data_changes_log verifying op mask
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a36b7b640 2 req 7168648379339657593 0.003999872s :list_data_changes_log verifying op permissions
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a36b7b640 2 overriding permissions due to system operation
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a36b7b640 2 req 7168648379339657593 0.003999872s :list_data_changes_log verifying op params
Aug 02 13:30:49 dev-ceph-single bash[754387]: debug 2024-08-02T11:30:49.478+0000 7f3a5241e640 0 RGW-SYNC:data:sync:shard[28]:entry[foobar/new:5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3:7[0]]:bucket_sync_sources[source=foobar:new[5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3]):7:source_zone=5160b406-4428-4fdc-9c5d-5ec9fe9404c0]:bucket[foobar/new:5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3<-foobar/new:5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3:7]:inc_sync[foobar/new:5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3:7]:entry[logstash_1%3a8.12.2-1_amd64.deb]: ERROR: failed to sync object: foobar/new:5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3:7/logstash_1%3a8.12.2-1_amd64.deb

And from the sender:

Aug 02 13:30:49 test-ceph-single bash[885118]: debug 2024-08-02T11:30:49.476+0000 7f0acfdb2640 1 ====== req done req=0x7f0ab50e4710 op status=-104 http_status=200 latency=0.419986606s ======
Aug 02 13:30:49 test-ceph-single bash[885118]: debug 2024-08-02T11:30:49.476+0000 7f0ba9f66640 2 req 5943847843579143466 0.000000000s initializing for trans_id = tx00000527cca1f3381a52a-0066acc369-c052e6-eu2
Aug 02 13:30:49 test-ceph-single bash[885118]: debug 2024-08-02T11:30:49.476+0000 7f0acfdb2640 1 beast: 0x7f0ab50e4710: 10.139.0.151 - synchronization-user [02/Aug/2024:11:30:49.056 +0000] "GET /foobar%3Anew/logstash_1%253a8.12.2-1_amd64.deb?rgwx-zonegroup=9c1ee979-4362-45a1-ae70-2a83a30ea9fc&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a0fab4b8-ec26-4a11-85dd-abab2e3205fa%3Afoobar%2Fnew%3A5160b406-4428-4fdc-9c5d-5ec9fe9404c0.12564119.3 HTTP/1.1" 200 138413732 - - - latency=0.419986606s
Aug 02 13:30:49 test-ceph-single bash[885118]: debug 2024-08-02T11:30:49.476+0000 7f0ba9f66640 2 req 5943847843579143466 0.000000000s getting op 0
Aug 02 13:30:49 test-ceph-single bash[885118]: debug 2024-08-02T11:30:49.476+0000 7f0ba9f66640 2 req 5943847843579143466 0.000000000s :list_metadata_log verifying requester

They all keep running into the same error: "failed to sync object(35) Resource deadlock avoided"

Any ideas?

Thanks!

--
Best regards,
Tino Lehnig
Cloud Architect

Contabo GmbH
Aschauer Straße 32a
81549 München
https://contabo.com
E-Mail: tino.leh...@contabo.de

Amtsgericht München (Munich District Court) HRB 180722
Managing directors authorized to represent the company: Dr. Christian Böing & Thomas Schimmel