After restarting the 2 RGW daemons on the second site again, everything caught up on the metadata sync. Is there something about having 2 RGW daemons on each side of the multisite that might be causing the sync to go stale? I have another realm set up the same way that is having a hard time with its data shards being behind. I haven't told them to resync, but yesterday I noticed 90 shards were behind. It has since caught back up to only 17 shards behind, but the oldest change not applied is 2 months old, and no order of restarting the RGW daemons is helping to resolve this.
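For reference, a quick way to watch how far behind a zone is without eyeballing the whole report is to pull the shard count out of the "behind on N shards" line of `radosgw-admin sync status`. This is a minimal sketch; the sample output below is illustrative stand-in text, not real output from either cluster, and on a live system you would pipe the command's output in instead.

```shell
# Sample text standing in for `radosgw-admin sync status` output.
status='        data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        data is behind on 17 shards
                        oldest incremental change not applied: 2017-06-24 10:59:00.0s'

# Pull the shard count out of the "behind on N shards" line.
behind=$(printf '%s\n' "$status" | sed -n 's/.*behind on \([0-9][0-9]*\) shards.*/\1/p')
echo "shards behind: ${behind:-0}"   # -> shards behind: 17
```

Running that periodically makes it easy to tell a sync that is genuinely draining (the number keeps falling) from one that is stuck.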
On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:

> I have an RGW Multisite 10.2.7 setup for bi-directional syncing. This has
> been operational for 5 months and working fine. I recently created a new
> user on the master zone, used that user to create a bucket, and put a
> public-acl object in it. The bucket was created on the second site, but the
> user was not, and the object errors out complaining that the access_key
> doesn't exist.
>
> That led me to think that the metadata isn't syncing, while bucket and
> data both are. I've also confirmed that data is syncing for other buckets
> in both directions. The sync status from the second site was this:
>
>       metadata sync syncing
>                     full sync: 0/64 shards
>                     incremental sync: 64/64 shards
>                     metadata is caught up with master
>       data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>                     syncing
>                     full sync: 0/128 shards
>                     incremental sync: 128/128 shards
>                     data is caught up with source
>
> Sync status leads me to think that the second site believes it is up to
> date, even though it is missing a freshly created user. I restarted all of
> the RGW daemons for the zonegroup, but that didn't trigger anything to fix
> the missing user on the second site. I did some googling, found the sync
> init commands mentioned in a few ML posts, ran metadata sync init, and now
> have this as the sync status:
>
>       metadata sync preparing for full sync
>                     full sync: 64/64 shards
>                     full sync: 0 entries to sync
>                     incremental sync: 0/64 shards
>                     metadata is behind on 70 shards
>                     oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
>       data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>                     syncing
>                     full sync: 0/128 shards
>                     incremental sync: 128/128 shards
>                     data is caught up with source
>
> It definitely triggered a fresh sync and told it to forget what it had
> previously applied, as the date of the oldest change not applied is the day
> we initially set up multisite for this zone. The problem is that that was
> over 12 hours ago and the sync status hasn't caught up on any shards yet.
>
> Does anyone have any suggestions other than blasting the second site and
> setting it back up with a fresh start (the only option I can think of at
> this point)?
>
> Thank you,
> David Turner
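Since the telling symptom in both cases is the age of the "oldest incremental change not applied" line, a small helper can flag a stale sync automatically. A sketch under stated assumptions: the sample line is copied from the status above, the one-day threshold is an arbitrary choice, and GNU date is assumed (BSD date parses dates with different flags).

```shell
# Sample line standing in for real `radosgw-admin sync status` output.
line='oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s'

# Extract the "YYYY-MM-DD HH:MM:SS" portion of the timestamp.
stamp=$(printf '%s\n' "$line" | sed -n 's/.*not applied: \([0-9-]* [0-9:]*\).*/\1/p')

# Age in whole days, using GNU date (an assumption; BSD date differs).
oldest=$(date -d "$stamp" +%s)
now=$(date +%s)
age_days=$(( (now - oldest) / 86400 ))
echo "oldest unapplied change is ${age_days} days old"

# The one-day threshold here is arbitrary; tune it to your change rate.
if [ "$age_days" -gt 1 ]; then
    echo "WARNING: metadata sync looks stale (${age_days} days behind)"
fi
```

Wired into a cron job against the real command output, this would have surfaced the 2-month-old unapplied change long before a manual status check did.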
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com