All of the messages from the sync error list are summarized below. The number on the left is how many times each error message appears.

  1811  "message": "failed to sync bucket instance: (16) Device or resource busy"
     7  "message": "failed to sync bucket instance: (5) Input/output error"
    65  "message": "failed to sync object"
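For reference, a tally like this can be produced by piping the error log through standard shell tools (a sketch, not necessarily the exact command used here; the JSON layout of the log may differ between versions):

    # Count how many times each distinct error message appears in the sync error log
    radosgw-admin sync error list | grep '"message"' | sort | uniq -c | sort -rn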
1811 "message": "failed to sync bucket instance: (16) Device or resource busy" 7 "message": "failed to sync bucket instance: (5) Input\/output error" 65 "message": "failed to sync object" On Tue, Aug 29, 2017 at 10:00 AM Orit Wasserman <owass...@redhat.com> wrote: > > Hi David, > > On Mon, Aug 28, 2017 at 8:33 PM, David Turner <drakonst...@gmail.com> > wrote: > >> The vast majority of the sync error list is "failed to sync bucket >> instance: (16) Device or resource busy". I can't find anything on Google >> about this error message in relation to Ceph. Does anyone have any idea >> what this means? and/or how to fix it? >> > > Those are intermediate errors resulting from several radosgw trying to > acquire the same sync log shard lease. It doesn't effect the sync progress. > Are there any other errors? > > Orit > >> >> On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley <cbod...@redhat.com> wrote: >> >>> Hi David, >>> >>> The 'data sync init' command won't touch any actual object data, no. >>> Resetting the data sync status will just cause a zone to restart a full >>> sync of the --source-zone's data changes log. This log only lists which >>> buckets/shards have changes in them, which causes radosgw to consider them >>> for bucket sync. So while the command may silence the warnings about data >>> shards being behind, it's unlikely to resolve the issue with missing >>> objects in those buckets. >>> >>> When data sync is behind for an extended period of time, it's usually >>> because it's stuck retrying previous bucket sync failures. The 'sync error >>> list' may help narrow down where those failures are. >>> >>> There is also a 'bucket sync init' command to clear the bucket sync >>> status. Following that with a 'bucket sync run' should restart a full sync >>> on the bucket, pulling in any new objects that are present on the >>> source-zone. I'm afraid that those commands haven't seen a lot of polish or >>> testing, however. >>> >>> Casey >>> >>> On 08/24/2017 04:15 PM, David Turner wrote: >>> >>> Apparently the data shards that are behind go in both directions, but >>> only one zone is aware of the problem. Each cluster has objects in their >>> data pool that the other doesn't have. I'm thinking about initiating a >>> `data sync init` on both sides (one at a time) to get them back on the same >>> page. Does anyone know if that command will overwrite any local data that >>> the zone has that the other doesn't if you run `data sync init` on it? >>> >>> On Thu, Aug 24, 2017 at 1:51 PM David Turner <drakonst...@gmail.com> >>> wrote: >>> >>>> After restarting the 2 RGW daemons on the second site again, everything >>>> caught up on the metadata sync. Is there something about having 2 RGW >>>> daemons on each side of the multisite that might be causing an issue with >>>> the sync getting stale? I have another realm set up the same way that is >>>> having a hard time with its data shards being behind. I haven't told them >>>> to resync, but yesterday I noticed 90 shards were behind. It's caught back >>>> up to only 17 shards behind, but the oldest change not applied is 2 months >>>> old and no order of restarting RGW daemons is helping to resolve this. >>>> >>>> On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> >>>> wrote: >>>> >>>>> I have a RGW Multisite 10.2.7 set up for bi-directional syncing. This >>>>> has been operational for 5 months and working fine. 
>>> On 08/24/2017 04:15 PM, David Turner wrote:
>>>
>>> Apparently the data shards that are behind go in both directions, but only one zone is aware of the problem. Each cluster has objects in its data pool that the other doesn't have. I'm thinking about initiating a `data sync init` on both sides (one at a time) to get them back on the same page. Does anyone know whether that command will overwrite any local data that the zone has and the other doesn't if you run `data sync init` on it?
>>>
>>> On Thu, Aug 24, 2017 at 1:51 PM David Turner <drakonst...@gmail.com> wrote:
>>>
>>>> After restarting the 2 RGW daemons on the second site again, everything caught up on the metadata sync. Is there something about having 2 RGW daemons on each side of the multisite that might be causing an issue with the sync getting stale? I have another realm set up the same way that is having a hard time with its data shards being behind. I haven't told them to resync, but yesterday I noticed 90 shards were behind. It's caught back up to only 17 shards behind, but the oldest change not applied is 2 months old, and no order of restarting RGW daemons is helping to resolve this.
>>>>
>>>> On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:
>>>>
>>>>> I have an RGW multisite 10.2.7 setup for bi-directional syncing. It has been operational for 5 months and working fine. I recently created a new user on the master zone, used that user to create a bucket, and put a public-acl object in there. The bucket was created on the second site, but the user was not, and the object errors out complaining that the access_key doesn't exist.
>>>>>
>>>>> That led me to think that the metadata isn't syncing, while bucket and data both are. I've also confirmed that data is syncing for other buckets as well in both directions. The sync status from the second site was this:
>>>>>
>>>>>   metadata sync syncing
>>>>>                 full sync: 0/64 shards
>>>>>                 incremental sync: 64/64 shards
>>>>>                 metadata is caught up with master
>>>>>       data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>>>>                         syncing
>>>>>                         full sync: 0/128 shards
>>>>>                         incremental sync: 128/128 shards
>>>>>                         data is caught up with source
>>>>>
>>>>> Sync status leads me to think that the second site believes it is up to date, even though it is missing a freshly created user. I restarted all of the rgw daemons for the zonegroup, but it didn't trigger anything to fix the missing user on the second site. I did some googling, found the sync init commands mentioned in a few ML posts, used `metadata sync init`, and now have this as the sync status:
>>>>>
>>>>>   metadata sync preparing for full sync
>>>>>                 full sync: 64/64 shards
>>>>>                 full sync: 0 entries to sync
>>>>>                 incremental sync: 0/64 shards
>>>>>                 metadata is behind on 70 shards
>>>>>                 oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
>>>>>       data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>>>>                         syncing
>>>>>                         full sync: 0/128 shards
>>>>>                         incremental sync: 128/128 shards
>>>>>                         data is caught up with source
>>>>>
>>>>> It definitely triggered a fresh sync and told it to forget about what it had previously applied, as the date of the oldest change not applied is the day we initially set up multisite for this zone. The problem is that that was over 12 hours ago, and the sync status hasn't caught up on any shards yet.
>>>>>
>>>>> Does anyone have any suggestions other than blasting the second site and setting it back up with a fresh start (the only option I can think of at this point)?
>>>>>
>>>>> Thank you,
>>>>> David Turner
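For completeness, the metadata resync discussed above boils down to something like the following, run against the zone that is behind (a sketch; as the thread suggests, the rgw daemons may also need a restart before the new sync state takes effect):

    # Reset the metadata sync status so the zone restarts a full metadata sync
    radosgw-admin metadata sync init

    # Then watch progress until the metadata shards catch up with the master
    radosgw-admin sync status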