Hi David,
The 'radosgw-admin sync error list' command may be useful in debugging
sync failures for specific entries. For users, we've seen sync
failures caused by conflicting user metadata that was only present on
the secondary site - for example, a user with the same access key or
email address, which we require to be unique.
Running multiple gateways on the same zone is fully supported, and
unlikely to cause these kinds of issues.
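For reference, the kind of checks described above might look like the
following on the secondary site (the uid 'affected-user' is a
placeholder, not from this thread):

```shell
# List recent sync errors recorded by this zone's gateways:
radosgw-admin sync error list

# Inspect the user's metadata as stored on this site, to compare
# access keys and email against the master zone's copy:
radosgw-admin metadata get user:affected-user

# Or just the user info, to spot a conflicting key or email:
radosgw-admin user info --uid=affected-user
```

Running the same 'metadata get' on both sites and diffing the output is
one way to spot a conflict that only exists on the secondary.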
On 08/24/2017 01:51 PM, David Turner wrote:
After restarting the 2 RGW daemons on the second site again,
everything caught up on the metadata sync. Is there something about
having 2 RGW daemons on each side of the multisite that might be
causing an issue with the sync getting stale? I have another realm
set up the same way that is having a hard time with its data shards
being behind. I haven't told them to resync, but yesterday I noticed
90 shards were behind. It's caught back up to only 17 shards behind,
but the oldest change not applied is 2 months old and no order of
restarting RGW daemons is helping to resolve this.
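To see which shards are behind and how far, the per-zone status
commands can help narrow this down (the source zone name below is the
one from the status output later in this thread):

```shell
# Overall metadata and data sync state for this zone:
radosgw-admin sync status

# Per-source detail for data sync, including which shards are
# behind and the timestamp of the oldest unapplied change:
radosgw-admin data sync status --source-zone=public-atl01
```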
On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:
I have an RGW multisite 10.2.7 setup for bi-directional syncing.
This has been operational for 5 months and working fine. I
recently created a new user on the master zone, used that user to
create a bucket, and put a public-acl object in it. The
bucket was created on the second site, but the user was not, and the
object errors out complaining that the access_key doesn't exist.
That led me to think that the metadata isn't syncing, while bucket
and data both are. I've also confirmed that data is syncing for
other buckets in both directions. The sync status from the
second site was this:
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
The sync status leads me to think that the second site believes it is
up to date, even though it is missing a freshly created user. I
restarted all of the RGW daemons for the zonegroup, but that didn't
trigger anything to fix the missing user on the second site. I
did some googling, found the sync init commands mentioned in a
few ML posts, ran 'metadata sync init', and now have this as the
sync status:
  metadata sync preparing for full sync
                full sync: 64/64 shards
                full sync: 0 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 70 shards
                oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
It definitely triggered a fresh sync and told it to forget what
it had previously applied, as the date of the oldest change not
applied is the day we initially set up multisite for this zone.
The problem is that was over 12 hours ago and the sync status hasn't
caught up on any shards yet.
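For the record, the procedure I followed was roughly this (the systemd
unit name is an assumption about a typical packaged install; adjust
for your deployment):

```shell
# On the secondary zone: reset the metadata sync state so the next
# sync pass starts from a full sync:
radosgw-admin metadata sync init

# The running gateways have to be restarted to pick up the new sync
# state and begin the full sync:
systemctl restart ceph-radosgw.target

# Then watch progress from the secondary:
radosgw-admin sync status
```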
Does anyone have any suggestions other than blasting the second site
and setting it back up fresh (the only option I can think
of at this point)?
Thank you,
David Turner
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com