After restarting the 2 RGW daemons on the second site again, everything
caught up on the metadata sync.  Is there something about having 2 RGW
daemons on each side of the multisite that might be causing the sync to go
stale?  I have another realm set up the same way whose data sync shards
keep falling behind.  I haven't told it to resync, but yesterday I noticed
90 shards were behind.  It has since caught back up to only 17 shards
behind, but the oldest change not applied is 2 months old, and no order of
restarting the RGW daemons is resolving it.
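
In case it's useful, this is roughly how I've been watching the lagging
realm (a minimal sketch; the source zone name is a placeholder for the
real one):

    # overall picture from the affected site
    radosgw-admin sync status

    # per-source detail for data sync, to see which shards are behind
    radosgw-admin data sync status --source-zone=<source-zone>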

On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:

> I have an RGW multisite 10.2.7 setup doing bi-directional syncing.  This
> has been operational for 5 months and working fine.  I recently created a
> new user on the master zone, used that user to create a bucket, and put a
> public-acl object in it.  The bucket was created on the second site, but
> the user was not, and the object errors out complaining that the
> access_key doesn't exist.
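>
> (For reference, I verified the user's metadata never made it over with
> something like the following; the uid is a placeholder:)
>
>     radosgw-admin metadata list user
>     radosgw-admin user info --uid=<new-user>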
>
> That led me to think that the metadata isn't syncing, while bucket and
> data both are.  I've also confirmed that data is syncing for other buckets
> as well in both directions. The sync status from the second site was this.
>
>
>      metadata sync syncing
>                    full sync: 0/64 shards
>                    incremental sync: 64/64 shards
>                    metadata is caught up with master
>          data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>                            syncing
>                            full sync: 0/128 shards
>                            incremental sync: 128/128 shards
>                            data is caught up with source
>
>
>
> Sync status leads me to think that the second site believes it is up to
> date, even though it is missing the freshly created user.  I restarted all
> of the RGW daemons for the zonegroup, but that didn't fix the missing user
> on the second site.  Some googling turned up the sync init commands
> mentioned in a few ML posts, so I ran metadata sync init (sketched after
> the status output below) and now have this as the sync status.
>
>
>      metadata sync preparing for full sync
>                    full sync: 64/64 shards
>                    full sync: 0 entries to sync
>                    incremental sync: 0/64 shards
>                    metadata is behind on 70 shards
>                    oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
>          data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>                            syncing
>                            full sync: 0/128 shards
>                            incremental sync: 128/128 shards
>                            data is caught up with source
>
>
>
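> For completeness, this is essentially what I ran for the init (a rough
> sketch; the systemd unit name is a placeholder for the actual rgw
> instance name):
>
>     radosgw-admin metadata sync init
>     # restart the gateways so they pick up the new sync state
>     systemctl restart ceph-radosgw@rgw.<instance>
>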
> It definitely triggered a fresh sync and discarded what it had previously
> applied, since the date of the oldest change not applied is now the day we
> initially set up multisite for this zone.  The problem is that was over 12
> hours ago and the sync status hasn't caught up on any shards yet.
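>
> (As a possible next step, the only related command I'm aware of is the
> foreground runner, which as I understand it is the long-running
> counterpart to metadata sync init:)
>
>     # run the metadata sync in the foreground and watch for errors
>     radosgw-admin metadata sync run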
>
> Does anyone have any suggestions other than blasting the second site and
> setting it back up from scratch (the only option I can think of at this
> point)?
>
> Thank you,
> David Turner
>
