Hi David,

The 'data sync init' command won't touch any actual object data, no. Resetting the data sync status will just cause a zone to restart a full sync of the --source-zone's data changes log. This log only lists which buckets/shards have changes in them, which causes radosgw to consider them for bucket sync. So while the command may silence the warnings about data shards being behind, it's unlikely to resolve the issue with missing objects in those buckets.
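
For reference, that reset is normally run with radosgw-admin against the zone you want to re-sync, roughly like the sketch below (the zone name is a placeholder, and you'll likely want to restart the gateways afterward so they pick up the reset):

    radosgw-admin data sync init --source-zone=<source-zone>
    # then restart the radosgw daemons on this zone (likely needed for the reset to take effect)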

When data sync is behind for an extended period of time, it's usually because it's stuck retrying previous bucket sync failures. The 'sync error list' command may help narrow down where those failures are.
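
For example, just the standard admin command with no extra flags:

    radosgw-admin sync error list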

There is also a 'bucket sync init' command to clear the bucket sync status. Following that with a 'bucket sync run' should restart a full sync on the bucket, pulling in any new objects that are present on the source-zone. I'm afraid that those commands haven't seen a lot of polish or testing, however.
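
A rough sketch of that sequence, assuming you know which bucket is missing objects (bucket and zone names below are placeholders):

    radosgw-admin bucket sync init --bucket=<bucket> --source-zone=<source-zone>
    radosgw-admin bucket sync run --bucket=<bucket> --source-zone=<source-zone>
    # 'radosgw-admin bucket sync status --bucket=<bucket>' can be used to watch progress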

Casey


On 08/24/2017 04:15 PM, David Turner wrote:
Apparently the data shards that are behind go in both directions, but only one zone is aware of the problem. Each cluster has objects in its data pool that the other doesn't have. I'm thinking about initiating a `data sync init` on both sides (one at a time) to get them back on the same page. Does anyone know whether running `data sync init` on a zone will overwrite any local data that the zone has and the other doesn't?
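
To be concrete, what I have in mind on each cluster (one at a time, using the other cluster's zone as the source; the zone name below is a placeholder) is roughly:

    radosgw-admin data sync init --source-zone=<other-zone>
    # then restart the local RGW daemons and watch `radosgw-admin sync status`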

On Thu, Aug 24, 2017 at 1:51 PM David Turner <drakonst...@gmail.com> wrote:

    After restarting the 2 RGW daemons on the second site again,
    everything caught up on the metadata sync.  Is there something
    about having 2 RGW daemons on each side of the multisite that
    might be causing an issue with the sync getting stale?  I have
    another realm set up the same way that is having a hard time with
    its data shards being behind.  I haven't told them to resync, but
    yesterday I noticed 90 shards were behind.  It's caught back up to
    only 17 shards behind, but the oldest change not applied is 2
    months old and no order of restarting RGW daemons is helping to
    resolve this.

    On Thu, Aug 24, 2017 at 10:59 AM David Turner <drakonst...@gmail.com> wrote:

        I have an RGW multisite 10.2.7 setup doing bi-directional
        syncing.  It has been operational for 5 months and working
        fine.  I recently created a new user on the master zone, used
        that user to create a bucket, and put a public-acl object in
        it.  The bucket was created on the second site, but the user
        was not, and the object errors out complaining that the
        access_key doesn't exist.

        That led me to think that the metadata isn't syncing, while
        bucket and data both are.  I've also confirmed that data is
        syncing for other buckets as well in both directions. The sync
        status from the second site was this:

            metadata sync syncing
              full sync: 0/64 shards
              incremental sync: 64/64 shards
              metadata is caught up with master
            data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
              syncing
              full sync: 0/128 shards
              incremental sync: 128/128 shards
              data is caught up with source

        Sync status leads me to think that the second site believes it
        is up to date, even though it is missing a freshly created
        user.  I restarted all of the rgw daemons for the zonegroup,
        but it didn't trigger anything to fix the missing user in the
        second site.  I did some googling, found the sync init
        commands mentioned in a few ML posts, ran metadata sync
        init, and now have this as the sync status:

            metadata sync preparing for full sync
              full sync: 64/64 shards
              full sync: 0 entries to sync
              incremental sync: 0/64 shards
              metadata is behind on 70 shards
              oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
            data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
              syncing
              full sync: 0/128 shards
              incremental sync: 128/128 shards
              data is caught up with source

        It definitely triggered a fresh sync and told it to forget
        what it had previously applied, since the date of the oldest
        change not applied is the day we initially set up multisite
        for this zone.  The problem is that was over 12 hours ago and
        the sync status hasn't caught up on any shards yet.
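
        For reference, the status output above and the progress I've
        been watching come from the usual admin commands, roughly:

            radosgw-admin sync status
            radosgw-admin metadata sync status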

        Does anyone have any suggestions other than blasting the
        second site and setting it back up from scratch (the only
        option I can think of at this point)?

        Thank you,
        David Turner



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
