Andreas, did you find a solution to your multisite sync issues with the
stuck shards?  I'm also on 10.2.7 and having this problem.  One realm has
stuck data sync shards, and another realm says it's up to date but isn't
receiving new users via metadata sync.  I ran metadata sync init on it and
the metadata was all up to date when it finished, but new users still
weren't synced afterwards.  I don't know what to do to get these working
stably.  There are 2 RGWs for each realm in each zone, configured
master/master so data can sync in both directions.
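
In case it helps to compare notes, these are the commands I've been poking
at the sync state with, run from a node in the secondary/non-master zone.
The zone name "us-east-master" and the shard id 17 are just placeholders
from my setup, and I'm going by the radosgw-admin man page for the flags,
so treat this as a sketch rather than gospel:

    # overall metadata/data sync state for the zone this runs in
    radosgw-admin sync status

    # metadata sync: current status, full re-init, or run in the foreground
    radosgw-admin metadata sync status
    radosgw-admin metadata sync init
    radosgw-admin metadata sync run

    # data sync against the peer zone; --shard-id narrows it to one shard
    radosgw-admin data sync status --source-zone=us-east-master
    radosgw-admin data sync status --source-zone=us-east-master --shard-id=17
    radosgw-admin data sync init --source-zone=us-east-master

    # recent sync errors, if your build has this subcommand
    radosgw-admin sync error list

My understanding is that the rgw daemons need a restart (or a metadata sync
run) after a metadata sync init to pick up the reset state, so that might be
worth double-checking on your side too.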

On Mon, Jun 5, 2017 at 3:05 AM Andreas Calminder <
andreas.calmin...@klarna.com> wrote:

> Hello,
> I'm using Ceph jewel (10.2.7) and, as far as I know, the jewel
> multisite setup (multiple zones) as described here
> http://docs.ceph.com/docs/master/radosgw/multisite/, with two Ceph
> clusters, one in each site. Stretching a single cluster over multiple
> sites is seldom, if ever, worth the hassle in my opinion. The reason
> the replication ended up in a bad state seems to be a mix of issues:
> first, if you shove a lot of objects (1M+) into a bucket, the bucket
> index starts to drag the rados gateways down; there's also some kind
> of memory leak in rgw when the sync has failed
> (http://tracker.ceph.com/issues/19446), causing the rgw daemons to die
> left and right from out-of-memory errors, sometimes dragging other
> parts of the system down with them.
>
> On 4 June 2017 at 22:22,  <ceph.nov...@habmalnefrage.de> wrote:
> > Hi Andreas.
> >
> > Well, we do _NOT_ need multisite in our environment, but unfortunately
> > it is the basis for the announced "metasearch" based on ElasticSearch...
> > so we have been trying to implement a "multisite" config on Kraken
> > (v11.2.0) for weeks now, but never succeeded so far. We have purged and
> > started all over with the multisite config about five times by now.
> >
> > We have one CEPH cluster with two RadosGW's on top (so NOT two CEPH
> > clusters!), not sure if this makes a difference!?
> >
> > Can you please share some info about your (formerly working?!?) setup?
> > Like:
> > - which CEPH version you are on
> > - the old deprecated "federated" or the "new from Jewel" multisite setup
> > - one or multiple CEPH clusters
> >
> > Great to see that multisite seems to work somehow, somewhere. We were
> > really in doubt :O
> >
> > Thanks & regards
> >  Anton
> >
> > P.S.: If someone reads this who has a working multisite setup based on
> > a single Kraken CEPH cluster (or, let me dream, even a working
> > ElasticSearch setup :| ), please step out of the dark and enlighten us :O
> >
> > Sent: Tuesday, 30 May 2017 at 11:02
> > From: "Andreas Calminder" <andreas.calmin...@klarna.com>
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] RGW multisite sync data sync shard stuck
> > Hello,
> > I've got a sync issue with my multisite setup. There are 2 zones in 1
> > zone group in 1 realm. Data sync in the non-master zone is stuck on
> > "incremental sync is behind by 1 shard". This wasn't noticed until the
> > radosgw instances in the master zone started dying from out-of-memory
> > issues; all radosgw instances in the non-master zone were then shut
> > down to keep services in the master zone running while troubleshooting
> > the issue.
> >
> > From the rgw logs in the master zone I see entries like:
> >
> > 2017-05-29 16:10:34.717988 7fbbc1ffb700 0 ERROR: failed to sync object:
> > 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_1.ext
> > 2017-05-29 16:10:34.718016 7fbbc1ffb700 0 ERROR: failed to sync object:
> > 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_2.ext
> > 2017-05-29 16:10:34.718504 7fbbc1ffb700 0 ERROR: failed to fetch remote data log info: ret=-5
> > 2017-05-29 16:10:34.719443 7fbbc1ffb700 0 ERROR: a sync operation returned error
> > 2017-05-29 16:10:34.720291 7fbc167f4700 0 store->fetch_remote_obj() returned r=-5
> >
> > sync status in the non-master zone reports that the metadata is in
> > sync, that data sync is behind on 1 shard, and that the oldest
> > incremental change not applied is about 2 weeks old.
> >
> > I'm not quite sure how to proceed. Is there a way to find out the id
> > of the shard and force some kind of re-sync of its data from the
> > master zone? I can't keep the non-master zone rgw's running because
> > it leaves the master zone in a bad state, with rgw dying every now
> > and then.
> >
> > Regards,
> > Andreas
>
>
>
> --
> Andreas Calminder
> System Administrator
> IT Operations Core Services
>
> Klarna AB (publ)
> Sveavägen 46, 111 34 Stockholm
> Tel: +46 8 120 120 00
> Reg no: 556737-0431
> klarna.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
