Hi David,

The 'radosgw-admin sync error list' command may be useful in debugging sync failures for specific entries. For users, we've seen sync failures caused by conflicting user metadata that was only present on the secondary site; for example, a user there that had the same access key or email address as the new user, which we require to be unique.
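
For example, something like this on the secondary zone (--rgw-zone is a standard radosgw-admin option and only needed if the host serves more than one zone):

    # list the entries that failed to sync, along with error messages
    radosgw-admin sync error list --rgw-zone=<secondary-zone>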

Running multiple gateways on the same zone is fully supported, and unlikely to cause these kinds of issues.


On 08/24/2017 01:51 PM, David Turner wrote:
After restarting the 2 RGW daemons on the second site again, everything caught up on the metadata sync. Is there something about having 2 RGW daemons on each side of the multisite that might be causing an issue with the sync getting stale? I have another realm set up the same way that is having a hard time with its data shards being behind. I haven't told them to resync, but yesterday I noticed 90 shards were behind. It's caught back up to only 17 shards behind, but the oldest change not applied is 2 months old and no order of restarting RGW daemons is helping to resolve this.
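
The shard counts above come from 'radosgw-admin sync status'; the raw per-source state can also be dumped with something like the following (the zone name is a placeholder):

    # JSON dump of the data sync markers for one source zone
    radosgw-admin data sync status --source-zone=<other-zone>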

On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:

    I have an RGW multisite 10.2.7 setup configured for bi-directional
    syncing. It has been operational for 5 months and working fine. I
    recently created a new user on the master zone, used that user to
    create a bucket, and put a public-acl object in it. The bucket was
    created on the second site, but the user was not, and the object
    errors out complaining that the access_key doesn't exist.

    That led me to think that the metadata isn't syncing, while bucket
    and data both are. I've also confirmed that data is syncing for
    other buckets in both directions. The sync status from the second
    site was this:

        metadata sync syncing
                      full sync: 0/64 shards
                      incremental sync: 64/64 shards
                      metadata is caught up with master
        data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                          syncing
                          full sync: 0/128 shards
                          incremental sync: 128/128 shards
                          data is caught up with source
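
    The mismatch can also be confirmed directly: something along these
    lines (with a placeholder uid) returns the user's metadata on the
    master but errors on the second site.

        # fetch the user's metadata entry from the local zone
        radosgw-admin metadata get user:<uid>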


    Sync status leads me to think that the second site believes it is
    up to date, even though it is missing a freshly created user. I
    restarted all of the RGW daemons in the zonegroup, but that didn't
    trigger anything to fix the missing user on the second site. I did
    some googling, found the sync init commands mentioned in a few ML
    posts, and ran metadata sync init.
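    The sequence was roughly the following; the restart is needed so
    the sync threads pick up the reset state (restart via your init
    system; the systemd unit name shown is a placeholder):

        # on the secondary zone: reset the metadata sync state so a
        # full sync restarts from the master
        radosgw-admin metadata sync init
        # restart the local gateways to pick up the new sync state
        systemctl restart ceph-radosgw@rgw.<instance>

    Afterward the sync status looked like this: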

        metadata sync preparing for full sync
                      full sync: 64/64 shards
                      full sync: 0 entries to sync
                      incremental sync: 0/64 shards
                      metadata is behind on 70 shards
                      oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
        data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                          syncing
                          full sync: 0/128 shards
                          incremental sync: 128/128 shards
                          data is caught up with source


    It definitely triggered a fresh sync and told it to forget what it
    had previously applied: the date of the oldest change not applied
    is the day we initially set up multisite for this zone. The
    problem is that this was over 12 hours ago and the sync status
    hasn't caught up on any shards yet.
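
    The per-shard metadata sync progress can also be inspected with
    something like:

        # JSON view of the metadata sync markers and per-shard state
        radosgw-admin metadata sync status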

    Does anyone have any suggestions other than blasting the second
    site and setting it back up from scratch (the only option I can
    think of at this point)?

    Thank you,
    David Turner



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

