After trying everything I could find in the forums and docs, I gave up, deleted the problematic zone, re-added it to the zonegroup, and re-established the group sync policy for the bucket in question. The sync status is OK now, though the error list still shows a bunch of errors from yesterday that I cannot figure out how to clear ("sync error trim" doesn't seem to do anything, as far as I can tell).
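To see which errors are still in the error list (for example, before and after running "sync error trim"), it can help to summarize the output of radosgw-admin sync error list instead of reading it shard by shard. Below is a minimal Python sketch. It assumes the output is a JSON array of per-shard objects with "shard_id" and "entries" keys, matching the excerpt quoted later in this thread. The helper names summarize_sync_errors and make_entry are hypothetical, and the sample data is synthetic, built by repeating that single quoted entry across 32 shards.

```python
import json
from collections import Counter, defaultdict


def summarize_sync_errors(raw_json):
    """Group sync error entries by (error_code, message).

    Returns {(code, message): (entry_count, sorted_shard_ids)}.
    Assumption: input is a JSON array of {"shard_id": ..., "entries": [...]}.
    """
    counts = Counter()
    shards = defaultdict(set)
    for shard in json.loads(raw_json):
        for entry in shard.get("entries", []):
            info = entry.get("info", {})
            key = (info.get("error_code"), info.get("message"))
            counts[key] += 1
            shards[key].add(shard["shard_id"])
    return {key: (counts[key], sorted(shards[key])) for key in counts}


def make_entry():
    # One entry copied from the error list excerpt quoted in this thread.
    return {
        "id": "1_1654722349.230688_62850.1",
        "section": "data",
        "name": "zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
        "timestamp": "2022-06-08T21:05:49.230688Z",
        "info": {
            "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
            "error_code": 125,
            "message": "failed to sync bucket instance: (125) Operation canceled",
        },
    }


if __name__ == "__main__":
    # Synthetic sample: 32 shards, each holding the same single error,
    # mirroring what the thread describes.
    sample = json.dumps([{"shard_id": i, "entries": [make_entry()]} for i in range(32)])
    for (code, msg), (n, ids) in summarize_sync_errors(sample).items():
        print(f"error {code}: {msg} -> {n} entries on shards {ids[0]}..{ids[-1]}")
```

Run against the synthetic sample, it prints a single line reporting 32 entries of error 125 on shards 0..31. With a real cluster you would save the output first (radosgw-admin sync error list > errors.json) and pass the file contents to summarize_sync_errors.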
In my opinion, multisite sync policy in the current Pacific release (16.2.9) is still very fragile and poorly documented as far as troubleshooting goes. I'd love to see clear explanations of the various data and metadata operations: metadata, data, bucket, bilog, and datalog. It's hard to know where to start when things get into a bad state, and the online resources are not helpful enough.

Another question: if a sync policy is defined on a bucket that already has some objects in it, what command should be used to force a sync based on the new policy? It seems that only objects added AFTER the policy is applied get replicated; pre-existing ones are not.

________________________________
From: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
Sent: Thursday, June 9, 2022 9:35 AM
To: Amit Ghadge <amitg....@gmail.com>; ceph-users@ceph.io <ceph-users@ceph.io>; d...@ceph.io <d...@ceph.io>
Subject: [ceph-users] Re: radosgw multisite sync - how to fix data behind shards?

I think you mean "radosgw-admin sync error list", in which case there are 32 shards, each with the same error. I don't see errors in the master zone logs, so I'm not sure how to correct the situation.

    "shard_id": 31,
    "entries": [
        {
            "id": "1_1654722349.230688_62850.1",
            "section": "data",
            "name": "zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
            "timestamp": "2022-06-08T21:05:49.230688Z",
            "info": {
                "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
                "error_code": 125,
                "message": "failed to sync bucket instance: (125) Operation canceled"
            }
        }
    ]
}

________________________________
From: Amit Ghadge <amitg....@gmail.com>
Sent: Wednesday, June 8, 2022 9:16 PM
To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
Subject: Re: radosgw multisite sync - how to fix data behind shards?
Check for any errors by running the command: radosgw-admin data sync error list

-AmitG

On Wed, Jun 8, 2022 at 2:44 PM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

Seeking help from a radosgw expert...

I have a 3-zone multisite configuration (all running Pacific 16.2.9) with 1 bucket per zone and a couple of small objects in each bucket for testing purposes. One of the secondary zones cannot seem to get into sync with the master; sync status reports:

    metadata sync syncing
        full sync: 0/64 shards
        incremental sync: 64/64 shards
        metadata is caught up with master
    data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
        syncing
        full sync: 128/128 shards
        full sync: 66 buckets to sync
        incremental sync: 0/128 shards
        data is behind on 128 shards
        behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]

I have tried using "data sync init" and restarting the radosgw multiple times, but that does not seem to help at all. If I manually run "radosgw-admin data sync run --bucket bucket-1", it just hangs forever and doesn't appear to do anything. Checking the sync status never shows any improvement in the shards.

It is very hard to figure out what to do, as there are several sync commands (bucket sync, data sync, metadata sync) and it is not clear what effect they have or how to run them properly when syncing gets confused. Any guidance on how to get out of this situation would be greatly appreciated.
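Because the status output is long (the behind-shards list alone has 128 entries), a small parser makes it easier to tell whether anything changes between runs. This is a minimal Python sketch, assuming the text layout quoted above from Pacific 16.2.9; other releases may word or indent the output differently. parse_data_sync_status is a hypothetical helper, not part of radosgw-admin, and if several data sync sources are listed it only reports the first.

```python
import re

# Sample copied from the sync status output quoted in this thread;
# the indentation is approximate.
BEHIND = ",".join(str(i) for i in range(128))
SAMPLE = f"""metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is caught up with master
data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
    syncing
    full sync: 128/128 shards
    full sync: 66 buckets to sync
    incremental sync: 0/128 shards
    data is behind on 128 shards
    behind shards: [{BEHIND}]
"""


def parse_data_sync_status(text):
    """Pull a few data-sync counters out of `radosgw-admin sync status` text.

    Hypothetical helper: it only matches the phrases quoted in this thread.
    """
    # Parse only the part after the data sync header, so the metadata
    # section's "incremental sync: 64/64 shards" is not picked up.
    _, sep, data = text.partition("data sync source:")
    if not sep:
        raise ValueError("no 'data sync source:' section found")
    status = {"incremental": None, "behind_count": 0,
              "behind_shards": [], "buckets_to_full_sync": 0}
    m = re.search(r"incremental sync: (\d+)/(\d+) shards", data)
    if m:
        status["incremental"] = (int(m.group(1)), int(m.group(2)))
    m = re.search(r"data is behind on (\d+) shards", data)
    if m:
        status["behind_count"] = int(m.group(1))
    m = re.search(r"behind shards: \[([\d,\s]*)\]", data)
    if m:
        status["behind_shards"] = [int(s) for s in m.group(1).split(",") if s.strip()]
    m = re.search(r"(\d+) buckets to sync", data)
    if m:
        status["buckets_to_full_sync"] = int(m.group(1))
    return status


if __name__ == "__main__":
    st = parse_data_sync_status(SAMPLE)
    done, total = st["incremental"]
    print(f"incremental {done}/{total}, behind on {st['behind_count']} shards, "
          f"{st['buckets_to_full_sync']} buckets awaiting full sync")
```

On the quoted sample it prints "incremental 0/128, behind on 128 shards, 66 buckets awaiting full sync". Recording these counters after each "data sync init" or restart gives a concrete way to show that the shards are not progressing.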
I've read lots of threads in various mailing list archives (via Google search), and very few of them have any sort of resolution or recommendation that is confirmed to have fixed these sorts of problems.

_______________________________________________
Dev mailing list -- d...@ceph.io
To unsubscribe send an email to dev-le...@ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io