I created a test user named 'ice' and then used that user to create a bucket named ice. The bucket ice shows up in the second datacenter, but the user does not. `mdlog list` showed an entry for the bucket ice, but not for the user. I ran the same test in the internal realm and there `mdlog list` showed entries for both the user and the bucket.
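
For reference, this is roughly the test I ran, sketched as shell commands against the public realm. It's a sketch from memory, not a paste: the s3cmd config holding the new user's keys and the grep filter are just illustrative, and the realm/zone names are the ones from my setup below.

    # on the master zone: create the test user and a bucket owned by it
    radosgw-admin user create --uid=ice --display-name="ice" \
        --rgw-realm=public --rgw-zonegroup=public-zg --rgw-zone=public-dc1
    s3cmd mb s3://ice    # s3cmd configured with the new user's keys, pointed at the dc1 endpoint

    # did the master zone log the changes? (mdlog entries carry a "section" such as user, bucket, bucket.instance)
    radosgw-admin mdlog list --rgw-realm=public --rgw-zone=public-dc1 | grep -E '"section"|"name"'

    # does the second datacenter know about them?
    radosgw-admin user info --uid=ice --rgw-realm=public --rgw-zone=public-dc2
    radosgw-admin bucket stats --bucket=ice --rgw-realm=public --rgw-zone=public-dc2
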
On Thu, Sep 7, 2017 at 3:27 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote:
> On Thu, Sep 7, 2017 at 10:04 PM, David Turner <drakonst...@gmail.com> wrote:
> > One realm is called public with a zonegroup called public-zg with a zone for each datacenter. The second realm is called internal with a zonegroup called internal-zg with a zone for each datacenter. They each have their own rgw's and load balancers. The needs of our public-facing rgw's and load balancers vs the internal-use ones were different enough that we split them up completely. We also have a local realm that does not use multisite, and a 4th realm called QA that mimics the public realm as much as possible for staging configuration changes for the rgw daemons. All 4 realms have their own buckets, users, etc and that is all working fine. For all of the radosgw-admin commands I am using the proper identifiers to make sure that each datacenter and realm are running commands on exactly what I expect them to (--rgw-realm=public --rgw-zonegroup=public-zg --rgw-zone=public-dc1 --source-zone=public-dc2).
> >
> > The data sync issue was in the internal realm, but running a data sync init and kickstarting the rgw daemons in each datacenter fixed the data discrepancies (I'm thinking it had something to do with a power failure a few months back that I only noticed recently). The metadata sync issue is in the public realm. I have no idea what is causing it to not sync properly, since running a `metadata sync init` catches it back up to the primary zone, but then it doesn't receive any new users created after that.
>
> Sounds like an issue with the metadata log in the primary master zone. Not sure what could go wrong there, but maybe the master zone doesn't know that it is a master zone, or it's set to not log metadata. Or maybe there's a problem when the secondary is trying to fetch the metadata log. Maybe some kind of # of shards mismatch (though not likely). Try to see if the master logs any changes: you should use the 'radosgw-admin mdlog list' command.
>
> Yehuda
>
> > On Thu, Sep 7, 2017 at 2:52 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote:
> >>
> >> On Thu, Sep 7, 2017 at 7:44 PM, David Turner <drakonst...@gmail.com> wrote:
> >> > Ok, I've been testing, investigating, researching, etc for the last week and I don't have any problems with data syncing. The clients on one side are creating multipart objects while the multisite sync is creating them as whole objects, and one of the datacenters is slower at cleaning up the shadow files. That's the big discrepancy between object counts in the pools between datacenters. I created a tool that goes through each bucket in a realm, does a recursive listing of all objects in it for both datacenters, and compares the 2 lists for any differences. The data is definitely in sync between the 2 datacenters, down to the modified time and byte of each file in s3.
> >> >
> >> > The metadata is still not syncing for the other realm, though. If I run `metadata sync init` then the second datacenter will catch up with all of the new users, but until I do that, newly created users on the primary side don't exist on the secondary side.
> >> > `metadata sync status`, `sync status`, `metadata sync run` (only left running for 30 minutes before I ctrl+c'd it), etc don't show any problems... but the new users just don't exist on the secondary side until I run `metadata sync init`. I created a new bucket with the new user and the bucket shows up in the second datacenter, but no objects, because the objects don't have a valid owner.
> >> >
> >> > Thank you all for the help with the data sync issue. You pushed me in good directions. Does anyone have any insight as to what is preventing the metadata from syncing in the other realm? I have 2 realms being synced using multi-site and it's only 1 of them that isn't getting the metadata across. As far as I can tell it is configured identically.
> >>
> >> What do you mean you have two realms? Zones and zonegroups need to exist in the same realm in order for meta and data sync to happen correctly. Maybe I'm misunderstanding.
> >>
> >> Yehuda
> >>
> >> > On Thu, Aug 31, 2017 at 12:46 PM David Turner <drakonst...@gmail.com> wrote:
> >> >>
> >> >> All of the messages from sync error list are listed below. The number on the left is how many times the error message is found.
> >> >>
> >> >> 1811    "message": "failed to sync bucket instance: (16) Device or resource busy"
> >> >>    7    "message": "failed to sync bucket instance: (5) Input\/output error"
> >> >>   65    "message": "failed to sync object"
> >> >>
> >> >> On Tue, Aug 29, 2017 at 10:00 AM Orit Wasserman <owass...@redhat.com> wrote:
> >> >>>
> >> >>> Hi David,
> >> >>>
> >> >>> On Mon, Aug 28, 2017 at 8:33 PM, David Turner <drakonst...@gmail.com> wrote:
> >> >>>>
> >> >>>> The vast majority of the sync error list is "failed to sync bucket instance: (16) Device or resource busy". I can't find anything on Google about this error message in relation to Ceph. Does anyone have any idea what this means? and/or how to fix it?
> >> >>>
> >> >>> Those are intermediate errors resulting from several radosgw trying to acquire the same sync log shard lease. It doesn't affect the sync progress. Are there any other errors?
> >> >>>
> >> >>> Orit
> >> >>>>
> >> >>>> On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley <cbod...@redhat.com> wrote:
> >> >>>>>
> >> >>>>> Hi David,
> >> >>>>>
> >> >>>>> The 'data sync init' command won't touch any actual object data, no. Resetting the data sync status will just cause a zone to restart a full sync of the --source-zone's data changes log. This log only lists which buckets/shards have changes in them, which causes radosgw to consider them for bucket sync. So while the command may silence the warnings about data shards being behind, it's unlikely to resolve the issue with missing objects in those buckets.
> >> >>>>>
> >> >>>>> When data sync is behind for an extended period of time, it's usually because it's stuck retrying previous bucket sync failures. The 'sync error list' may help narrow down where those failures are.
> >> >>>>>
> >> >>>>> There is also a 'bucket sync init' command to clear the bucket sync status.
> >> >>>>> Following that with a 'bucket sync run' should restart a full sync on the bucket, pulling in any new objects that are present on the source-zone. I'm afraid that those commands haven't seen a lot of polish or testing, however.
> >> >>>>>
> >> >>>>> Casey
> >> >>>>>
> >> >>>>> On 08/24/2017 04:15 PM, David Turner wrote:
> >> >>>>>
> >> >>>>> Apparently the data shards that are behind go in both directions, but only one zone is aware of the problem. Each cluster has objects in their data pool that the other doesn't have. I'm thinking about initiating a `data sync init` on both sides (one at a time) to get them back on the same page. Does anyone know if that command will overwrite any local data that the zone has that the other doesn't if you run `data sync init` on it?
> >> >>>>>
> >> >>>>> On Thu, Aug 24, 2017 at 1:51 PM David Turner <drakonst...@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> After restarting the 2 RGW daemons on the second site again, everything caught up on the metadata sync. Is there something about having 2 RGW daemons on each side of the multisite that might be causing an issue with the sync getting stale? I have another realm set up the same way that is having a hard time with its data shards being behind. I haven't told them to resync, but yesterday I noticed 90 shards were behind. It's caught back up to only 17 shards behind, but the oldest change not applied is 2 months old and no order of restarting RGW daemons is helping to resolve this.
> >> >>>>>>
> >> >>>>>> On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>> I have an RGW Multisite 10.2.7 set up for bi-directional syncing. This has been operational for 5 months and working fine. I recently created a new user on the master zone, used that user to create a bucket, and put a public-acl object in there. The bucket was created on the second site, but the user was not, and the object errors out complaining about the access_key not existing.
> >> >>>>>>>
> >> >>>>>>> That led me to think that the metadata isn't syncing, while bucket and data both are. I've also confirmed that data is syncing for other buckets as well in both directions. The sync status from the second site was this.
> >> >>>>>>>
> >> >>>>>>>   metadata sync syncing
> >> >>>>>>>                 full sync: 0/64 shards
> >> >>>>>>>                 incremental sync: 64/64 shards
> >> >>>>>>>                 metadata is caught up with master
> >> >>>>>>>   data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
> >> >>>>>>>                 syncing
> >> >>>>>>>                 full sync: 0/128 shards
> >> >>>>>>>                 incremental sync: 128/128 shards
> >> >>>>>>>                 data is caught up with source
> >> >>>>>>>
> >> >>>>>>> Sync status leads me to think that the second site believes it is up to date, even though it is missing a freshly created user.
I > >> >>>>>>> restarted all > >> >>>>>>> of the rgw daemons for the zonegroup, but it didn't trigger > >> >>>>>>> anything to fix > >> >>>>>>> the missing user in the second site. I did some googling and > >> >>>>>>> found the sync > >> >>>>>>> init commands mentioned in a few ML posts and used metadata sync > >> >>>>>>> init and > >> >>>>>>> now have this as the sync status. > >> >>>>>>> > >> >>>>>>> metadata sync preparing for full sync > >> >>>>>>> > >> >>>>>>> full sync: 64/64 shards > >> >>>>>>> > >> >>>>>>> full sync: 0 entries to sync > >> >>>>>>> > >> >>>>>>> incremental sync: 0/64 shards > >> >>>>>>> > >> >>>>>>> metadata is behind on 70 shards > >> >>>>>>> > >> >>>>>>> oldest incremental change not applied: > 2017-03-01 > >> >>>>>>> 21:13:43.0.126971s > >> >>>>>>> > >> >>>>>>> data sync source: f4c12327-4721-47c9-a365-86332d84c227 > >> >>>>>>> (public-atl01) > >> >>>>>>> > >> >>>>>>> syncing > >> >>>>>>> > >> >>>>>>> full sync: 0/128 shards > >> >>>>>>> > >> >>>>>>> incremental sync: 128/128 shards > >> >>>>>>> > >> >>>>>>> data is caught up with source > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> It definitely triggered a fresh sync and told it to forget about > >> >>>>>>> what > >> >>>>>>> it's previously applied as the date of the oldest change not > >> >>>>>>> applied is the > >> >>>>>>> day we initially set up multisite for this zone. The problem is > >> >>>>>>> that was > >> >>>>>>> over 12 hours ago and the sync stat hasn't caught up on any > shards > >> >>>>>>> yet. > >> >>>>>>> > >> >>>>>>> Does anyone have any suggestions other than blast the second > site > >> >>>>>>> and > >> >>>>>>> set it back up with a fresh start (the only option I can think > of > >> >>>>>>> at this > >> >>>>>>> point)? > >> >>>>>>> > >> >>>>>>> Thank you, > >> >>>>>>> David Turner > >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> _______________________________________________ > >> >>>>> ceph-users mailing list > >> >>>>> ceph-users@lists.ceph.com > >> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> >>>>> > >> >>>>> > >> >>>>> _______________________________________________ > >> >>>>> ceph-users mailing list > >> >>>>> ceph-users@lists.ceph.com > >> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> >>>> > >> >>>> > >> >>>> _______________________________________________ > >> >>>> ceph-users mailing list > >> >>>> ceph-users@lists.ceph.com > >> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> >>>> > >> > > >> > _______________________________________________ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > >
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com