Hi David,

On Mon, Aug 28, 2017 at 8:33 PM, David Turner <drakonst...@gmail.com> wrote:
> The vast majority of the sync error list is "failed to sync bucket
> instance: (16) Device or resource busy". I can't find anything on
> Google about this error message in relation to Ceph. Does anyone have
> any idea what this means and/or how to fix it?

Those are transient errors that result from several radosgw instances
trying to acquire the same sync log shard lease. They don't affect the
sync progress. Are there any other errors?

Orit

> On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley <cbod...@redhat.com> wrote:
>>
>> Hi David,
>>
>> The 'data sync init' command won't touch any actual object data, no.
>> Resetting the data sync status will just cause a zone to restart a
>> full sync of the --source-zone's data changes log. This log only lists
>> which buckets/shards have changes in them, which causes radosgw to
>> consider them for bucket sync. So while the command may silence the
>> warnings about data shards being behind, it's unlikely to resolve the
>> issue of missing objects in those buckets.
>>
>> When data sync is behind for an extended period of time, it's usually
>> because it's stuck retrying previous bucket sync failures. The 'sync
>> error list' may help narrow down where those failures are.
>>
>> There is also a 'bucket sync init' command to clear the bucket sync
>> status. Following that with a 'bucket sync run' should restart a full
>> sync on the bucket, pulling in any new objects that are present on the
>> source zone. I'm afraid those commands haven't seen a lot of polish or
>> testing, however.
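>>
>> A rough sketch of that sequence, in case it helps (the bucket name
>> below is just a placeholder, and exact flags may vary by version, so
>> treat this as untested):
>>
>>   # clear the sync status for one bucket, then run a full re-sync of
>>   # it, pulling in any objects this zone is missing from the source
>>   radosgw-admin bucket sync init --bucket=my-bucket --source-zone=public-atl01
>>   radosgw-admin bucket sync run --bucket=my-bucket --source-zone=public-atl01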
>>
>> Casey
>>
>> On 08/24/2017 04:15 PM, David Turner wrote:
>>
>> Apparently the data shards that are behind go in both directions, but
>> only one zone is aware of the problem. Each cluster has objects in its
>> data pool that the other doesn't have. I'm thinking about initiating a
>> `data sync init` on both sides (one at a time) to get them back on the
>> same page. Does anyone know whether that command will overwrite any
>> local data that the zone has and the other doesn't if you run `data
>> sync init` on it?
>>
>> On Thu, Aug 24, 2017 at 1:51 PM David Turner <drakonst...@gmail.com> wrote:
>>>
>>> After restarting the 2 RGW daemons on the second site again,
>>> everything caught up on the metadata sync. Is there something about
>>> having 2 RGW daemons on each side of the multisite that might be
>>> causing an issue with the sync getting stale? I have another realm
>>> set up the same way that is having a hard time with its data shards
>>> being behind. I haven't told them to resync, but yesterday I noticed
>>> 90 shards were behind. It's caught back up to only 17 shards behind,
>>> but the oldest change not applied is 2 months old, and no order of
>>> restarting RGW daemons is helping to resolve this.
>>>
>>> On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:
>>>>
>>>> I have an RGW Multisite 10.2.7 setup for bi-directional syncing.
>>>> This has been operational for 5 months and working fine. I recently
>>>> created a new user on the master zone, used that user to create a
>>>> bucket, and put a public-acl object in there. The bucket was created
>>>> on the second site, but the user was not, and requests for the
>>>> object error out complaining that the access_key doesn't exist.
>>>>
>>>> That led me to think that the metadata isn't syncing, while bucket
>>>> and data both are. I've also confirmed that data is syncing for
>>>> other buckets as well in both directions. The sync status from the
>>>> second site was this:
>>>>
>>>>       metadata sync syncing
>>>>                     full sync: 0/64 shards
>>>>                     incremental sync: 64/64 shards
>>>>                     metadata is caught up with master
>>>>       data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>>>                     syncing
>>>>                     full sync: 0/128 shards
>>>>                     incremental sync: 128/128 shards
>>>>                     data is caught up with source
>>>>
>>>> Sync status leads me to think that the second site believes it is up
>>>> to date, even though it is missing a freshly created user. I
>>>> restarted all of the rgw daemons for the zonegroup, but it didn't
>>>> trigger anything to fix the missing user in the second site. I did
>>>> some googling, found the sync init commands mentioned in a few ML
>>>> posts, used metadata sync init, and now have this as the sync
>>>> status:
>>>>
>>>>       metadata sync preparing for full sync
>>>>                     full sync: 64/64 shards
>>>>                     full sync: 0 entries to sync
>>>>                     incremental sync: 0/64 shards
>>>>                     metadata is behind on 70 shards
>>>>                     oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
>>>>       data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>>>                     syncing
>>>>                     full sync: 0/128 shards
>>>>                     incremental sync: 128/128 shards
>>>>                     data is caught up with source
>>>>
>>>> It definitely triggered a fresh sync and told it to forget about
>>>> what it had previously applied, as the date of the oldest change not
>>>> applied is the day we initially set up multisite for this zone. The
>>>> problem is that that was over 12 hours ago, and the sync status
>>>> hasn't caught up on any shards yet.
>>>>
>>>> Does anyone have any suggestions other than blasting the second site
>>>> and setting it back up with a fresh start (the only option I can
>>>> think of at this point)?
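>>>>
>>>> (For reference, the reset I mentioned was just the plain init
>>>> command on the second site, reconstructed here from memory, so treat
>>>> the exact invocation as approximate:
>>>>
>>>>     # restart the metadata full sync from scratch on the second site
>>>>     radosgw-admin metadata sync init
>>>>
>>>> and I've been watching progress with `radosgw-admin sync status`.)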
>>>> Thank you,
>>>> David Turner

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com