Hi David,
The 'data sync init' command won't touch any actual object data, no.
Resetting the data sync status will just cause a zone to restart a full
sync of the --source-zone's data changes log. This log only lists which
buckets/shards have changes in them, which causes radosgw to consider
them for bucket sync. So while the command may silence the warnings
about data shards being behind, it's unlikely to resolve the issue with
missing objects in those buckets.
When data sync is behind for an extended period of time, it's usually
because it's stuck retrying previous bucket sync failures. The 'sync
error list' may help narrow down where those failures are.
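One way to tell a slow sync from a stuck one is to sample the 'sync status' output over time and compare the number of shards reported behind. Below is a minimal Python sketch of that idea. It assumes the status text contains lines worded "metadata is behind on N shards" or "data is behind on N shards"; the function name and the sample text are illustrative only, and the wording may differ between releases.

```python
import re

# Assumed wording of the relevant status lines, e.g.
# "data is behind on 17 shards". Adjust if your release prints it differently.
BEHIND_RE = re.compile(
    r"^\s*(metadata|data) is behind on (\d+) shards?\s*$", re.MULTILINE
)

def shards_behind(status_text):
    """Return (kind, count) pairs for every 'is behind on N shards' line."""
    return [(kind, int(count)) for kind, count in BEHIND_RE.findall(status_text)]

# Illustrative sample, not captured from a live cluster.
sample = """\
  metadata sync preparing for full sync
                full sync: 64/64 shards
                metadata is behind on 70 shards
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        data is caught up with source
"""

print(shards_behind(sample))
```

If the count stays flat across many samples, the sync is more likely stuck retrying failures than simply catching up slowly.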
There is also a 'bucket sync init' command to clear the bucket sync
status. Following that with a 'bucket sync run' should restart a full
sync on the bucket, pulling in any new objects that are present on the
source-zone. I'm afraid that those commands haven't seen a lot of polish
or testing, however.
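As a rough illustration of that sequence, here is a small Python sketch that builds the two radosgw-admin invocations and, by default, only prints them. The bucket name "my-bucket" is a placeholder, "public-atl01" is the zone name from the status output quoted in this thread, and the --bucket/--source-zone options should be checked against 'radosgw-admin --help' on your version before running anything for real.

```python
import shlex
import subprocess

def bucket_resync_commands(bucket, source_zone):
    """Build the 'bucket sync init' and 'bucket sync run' invocations."""
    common = ["--bucket=" + bucket, "--source-zone=" + source_zone]
    return [
        # Clear the bucket sync status for this bucket/source pair.
        ["radosgw-admin", "bucket", "sync", "init"] + common,
        # Restart a full sync of the bucket from the source zone.
        ["radosgw-admin", "bucket", "sync", "run"] + common,
    ]

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

# "my-bucket" is hypothetical; substitute a bucket with missing objects.
commands = bucket_resync_commands("my-bucket", "public-atl01")
run_all(commands)  # dry run: prints the commands without executing them
```

The dry-run default is deliberate, given that these commands have seen little testing: review the printed commands first, then run them on the zone that is missing the objects.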
Casey
On 08/24/2017 04:15 PM, David Turner wrote:
Apparently the data shards that are behind go in both directions, but
only one zone is aware of the problem. Each cluster has objects in
their data pool that the other doesn't have. I'm thinking about
initiating a `data sync init` on both sides (one at a time) to get
them back on the same page. Does anyone know if that command will
overwrite any local data that the zone has that the other doesn't if
you run `data sync init` on it?
On Thu, Aug 24, 2017 at 1:51 PM David Turner <drakonst...@gmail.com> wrote:
After restarting the 2 RGW daemons on the second site again,
everything caught up on the metadata sync. Is there something
about having 2 RGW daemons on each side of the multisite that
might be causing an issue with the sync getting stale? I have
another realm set up the same way that is having a hard time with
its data shards being behind. I haven't told them to resync, but
yesterday I noticed 90 shards were behind. It's caught back up to
only 17 shards behind, but the oldest change not applied is 2
months old and no order of restarting RGW daemons is helping to
resolve this.
On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:
I have an RGW multisite setup on 10.2.7 configured for bi-directional
syncing. It has been operational for 5 months and working
fine. I recently created a new user on the master zone, used
that user to create a bucket, and put a public-acl object
in it. The bucket was created on the second site, but the user
was not, and requests for the object error out complaining that the
access_key does not exist.
That led me to think that the metadata isn't syncing, while
bucket and data both are. I've also confirmed that data is
syncing for other buckets as well in both directions. The sync
status from the second site was this.
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
Sync status leads me to think that the second site believes it
is up to date, even though it is missing a freshly created
user. I restarted all of the rgw daemons for the zonegroup,
but it didn't trigger anything to fix the missing user in the
second site. I did some googling and found the sync init
commands mentioned in a few ML posts and used metadata sync
init and now have this as the sync status.
  metadata sync preparing for full sync
                full sync: 64/64 shards
                full sync: 0 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 70 shards
                oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
It definitely triggered a fresh sync and told it to forget
about what it had previously applied, as the date of the oldest
change not applied is the day we initially set up multisite
for this zone. The problem is that this was over 12 hours ago and
the sync status hasn't caught up on any shards yet.
Does anyone have any suggestions other than blasting the second
site and setting it back up with a fresh start (the only option I
can think of at this point)?
Thank you,
David Turner
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com