Apparently the data shards that are behind go in both directions, but only
one zone is aware of the problem.  Each cluster has objects in its data
pool that the other doesn't have.  I'm thinking about initiating a `data
sync init` on both sides (one at a time) to get them back on the same
page.  Does anyone know whether running `data sync init` on a zone will
overwrite local data that the zone has but the other doesn't?
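For concreteness, what I have in mind is roughly the following, run on one
cluster at a time, with <other-zone> standing in for the peer zone's name (I
haven't confirmed the exact behaviour of these on 10.2.7, hence the question):

  radosgw-admin data sync init --source-zone=<other-zone>
  # restart the local RGW daemons so the data sync restarts from the beginning
  systemctl restart ceph-radosgw.target   # or however radosgw is managed here
  # then watch progress with
  radosgw-admin data sync status --source-zone=<other-zone>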

On Thu, Aug 24, 2017 at 1:51 PM David Turner <drakonst...@gmail.com> wrote:

> After restarting the 2 RGW daemons on the second site again, everything
> caught up on the metadata sync.  Is there something about having 2 RGW
> daemons on each side of the multisite that might be causing an issue with
> the sync getting stale?  I have another realm set up the same way whose
> data shards keep falling behind.  I haven't told that realm to resync, but
> yesterday I noticed 90 shards were behind.  It has since caught back up to
> only 17 shards behind, but the oldest change not applied is 2 months old,
> and no amount of restarting the RGW daemons resolves it.
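>
> For reference, I've been watching the shard counts in that realm with
> roughly the following, where <peer-zone> stands in for the other zone:
>
>   radosgw-admin sync status
>   radosgw-admin data sync status --source-zone=<peer-zone>
>   radosgw-admin sync error list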
>
> On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com>
> wrote:
>
>> I have an RGW multisite 10.2.7 setup doing bi-directional syncing.  It has
>> been operational for 5 months and working fine.  I recently created a new
>> user on the master zone, used that user to create a bucket, and put an
>> object with a public ACL into it.  The bucket was created on the second
>> site, but the user was not, and the object errors out complaining that the
>> access_key doesn't exist.
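>>
>> For anyone wanting to reproduce it, the steps were roughly along these
>> lines (the user, bucket, and object names here are made up, and s3cmd is
>> just an approximation of the client I used):
>>
>>   # on the master zone
>>   radosgw-admin user create --uid=testuser --display-name="Test User"
>>   s3cmd mb s3://test-bucket
>>   s3cmd put --acl-public hello.txt s3://test-bucket/hello.txt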
>>
>> That led me to think that the metadata isn't syncing, while bucket and
>> data sync both are.  I've also confirmed that data is syncing for other
>> buckets in both directions.  The sync status from the second site was this:
>>
>>      metadata sync syncing
>>                    full sync: 0/64 shards
>>                    incremental sync: 64/64 shards
>>                    metadata is caught up with master
>>          data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>                            syncing
>>                            full sync: 0/128 shards
>>                            incremental sync: 128/128 shards
>>                            data is caught up with source
>>
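>> (For context, that output is from running:
>>
>>   radosgw-admin sync status
>>
>> on one of the second site's gateways.  radosgw-admin metadata sync status
>> should show the same metadata state in JSON if anyone wants to compare the
>> markers.)
>>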
>> Sync status leads me to think that the second site believes it is up to
>> date, even though it is missing a freshly created user.  I restarted all of
>> the RGW daemons for the zonegroup, but that didn't trigger anything to fix
>> the missing user on the second site.  I did some googling, found the sync
>> init commands mentioned in a few ML posts, and ran metadata sync init.  The
>> sync status now looks like this:
>>
>>      metadata sync preparing for full sync
>>                    full sync: 64/64 shards
>>                    full sync: 0 entries to sync
>>                    incremental sync: 0/64 shards
>>                    metadata is behind on 70 shards
>>                    oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
>>          data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>                            syncing
>>                            full sync: 0/128 shards
>>                            incremental sync: 128/128 shards
>>                            data is caught up with source
>>
>> It definitely triggered a fresh sync and told it to forget what it had
>> previously applied, since the date of the oldest change not applied is now
>> the day we initially set up multisite for this zone.  The problem is that
>> was over 12 hours ago, and the sync status hasn't caught up on any shards
>> yet.
>>
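>> For reference, the commands in play on the second site are roughly these
>> (from memory, so treat the exact invocations as approximate):
>>
>>   radosgw-admin metadata sync init
>>   # restarting the local RGW daemons is what actually kicks off the full
>>   # sync, as far as I understand it
>>   systemctl restart ceph-radosgw.target
>>   # progress and errors can then be watched with
>>   radosgw-admin metadata sync status
>>   radosgw-admin sync error list
>>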
>> Does anyone have any suggestions other than blasting the second site and
>> setting it back up from scratch (the only option I can think of at this
>> point)?
>>
>> Thank you,
>> David Turner
>>
>