[ceph-users] Re: radosgw multisite sync - how to fix data behind shards?

Wyll Ingersoll Thu, 09 Jun 2022 11:48:11 -0700


I ended up giving up after trying everything I could find in the forums and 
docs, deleted the problematic zone, and then re-added it to the zonegroup 
and re-established the group sync policy for the bucket in question.  The 
sync-status is OK now, though the error list still shows a bunch of errors from 
yesterday that I cannot figure out how to clear ("sync error trim" doesn't do 
anything that I can tell).


My opinion is that multisite sync policy in the current Pacific release 
(16.2.9) is still very fragile and poorly documented as far as troubleshooting 
goes.  I'd love to see clear explanations of the various data and metadata 
operations - metadata, data, bucket, bilog, datalog.  It's hard to know where 
to start when things get into a bad state and the online resources are not 
helpful enough.

Another question: if a sync policy is defined on a bucket that already has some 
objects in it, what command should be used to force a sync operation based on 
the new policy? It seems that only objects added AFTER the policy is applied 
get replicated; pre-existing ones are not.


________________________________
From: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
Sent: Thursday, June 9, 2022 9:35 AM
To: Amit Ghadge <amitg....@gmail.com>; ceph-users@ceph.io <ceph-users@ceph.io>; 
d...@ceph.io <d...@ceph.io>
Subject: [ceph-users] Re: radosgw multisite sync - how to fix data behind 
shards?

I think you mean "radosgw-admin sync error list", in which case there are 32 
shards, each with the same error.  I don't see errors in the master zone logs so 
I'm not sure how to correct the situation.


        "shard_id": 31,
        "entries": [
            {
                "id": "1_1654722349.230688_62850.1",
                "section": "data",
                "name": "zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
                "timestamp": "2022-06-08T21:05:49.230688Z",
                "info": {
                    "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
                    "error_code": 125,
                    "message": "failed to sync bucket instance: (125) Operation canceled"
                }
            }
        ]
    }
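
For anyone who wants to tally output like the above rather than read 32 shards by eye, here is a minimal Python sketch. It assumes the full output of "radosgw-admin sync error list" is a JSON array of shard objects shaped like the fragment shown; the array wrapper, the SAMPLE data, and the function name summarize_sync_errors are assumptions made for illustration and are not part of radosgw-admin.

```python
import json
from collections import Counter


def summarize_sync_errors(raw_json):
    """Count error entries grouped by (section, error_code, message).

    Assumes raw_json is a JSON array of objects that each carry a
    "shard_id" and an "entries" list, as in the fragment above.
    """
    counts = Counter()
    for shard in json.loads(raw_json):
        for entry in shard.get("entries", []):
            info = entry.get("info", {})
            key = (entry.get("section"),
                   info.get("error_code"),
                   info.get("message"))
            counts[key] += 1
    return counts


# Illustrative sample only: two shards modeled on the fragment above.
SAMPLE = json.dumps([
    {"shard_id": shard_id,
     "entries": [{
         "section": "data",
         "name": "zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
         "info": {
             "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
             "error_code": 125,
             "message": "failed to sync bucket instance: (125) Operation canceled",
         },
     }]}
    for shard_id in (30, 31)
])

if __name__ == "__main__":
    for (section, code, message), n in summarize_sync_errors(SAMPLE).most_common():
        print(f"{n} x [{section}] code={code}: {message}")
```

If every shard reports the same error, as it does here, the summary collapses to a single line, which makes it easier to see whether a later run has introduced anything new.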




________________________________
From: Amit Ghadge <amitg....@gmail.com>
Sent: Wednesday, June 8, 2022 9:16 PM
To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
Subject: Re: radosgw multisite sync - how to fix data behind shards?

Check for any errors by running the command radosgw-admin data sync error list


-AmitG


On Wed, Jun 8, 2022 at 2:44 PM Wyll Ingersoll 
<wyllys.ingers...@keepertech.com> wrote:

Seeking help from a radosgw expert...

I have a 3-zone multisite configuration (all running Pacific 16.2.9) with 1 
bucket per zone and a couple of small objects in each bucket for testing 
purposes.
One of the secondary zones cannot seem to get into sync with the master; 
sync status reports:


  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
                        syncing
                        full sync: 128/128 shards
                        full sync: 66 buckets to sync
                        incremental sync: 0/128 shards
                        data is behind on 128 shards
                        behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
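
To track whether the behind-shard count changes between checks, the status text can be parsed with a few lines of Python. This is only a sketch: it assumes the text format shown above, the function names and the shortened SAMPLE are made up for illustration, and it reads only the first data sync source in the output.

```python
import re


def parse_behind_shards(status_text):
    """Return the shard ids listed after 'behind shards:' (first source only)."""
    # \s* also covers the case where the list wraps onto the next line.
    match = re.search(r"behind shards:\s*\[([\d,\s]*)\]", status_text)
    if not match:
        return []
    return [int(tok) for tok in match.group(1).split(",") if tok.strip()]


def behind_count(status_text):
    """Return N from 'data is behind on N shards', or 0 if the line is absent."""
    match = re.search(r"data is behind on (\d+) shards", status_text)
    return int(match.group(1)) if match else 0


# Shortened, illustrative sample in the same layout as the output above.
SAMPLE = """\
      data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
                        syncing
                        full sync: 4/4 shards
                        incremental sync: 0/4 shards
                        data is behind on 4 shards
                        behind shards:
[0,1,2,3]
"""

if __name__ == "__main__":
    shards = parse_behind_shards(SAMPLE)
    print(f"{behind_count(SAMPLE)} shards behind: {shards}")
```

Saving the parsed list after each "data sync init" or radosgw restart and comparing it with the previous one shows at a glance whether any shard has actually advanced.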


I have tried using "data sync init" and restarting the radosgw multiple times, 
but that does not seem to be helping in any way.

If I manually do "radosgw-admin data sync run --bucket bucket-1" - it just 
hangs forever and doesn't appear to do anything.  Checking the sync status 
never shows any improvement in the shards.

It is very hard to figure out what to do as there are several sync commands - 
bucket sync, data sync, metadata sync - and it is not clear what effect they 
have or how to properly run them when the syncing gets confused.

Any guidance on how to get out of this situation would be greatly appreciated.  
I've read lots of threads on various mailing list archives (via Google search) 
and very few of them have any sort of resolution or recommendation that is 
confirmed to have fixed these sorts of problems.


_______________________________________________
Dev mailing list -- d...@ceph.io
To unsubscribe send an email to dev-le...@ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io