So we upgraded everything from 12.2.8 to 12.2.11, and things have gone to hell. Lots of sync errors, like so:

sudo radosgw-admin sync error list
[
    {
        "shard_id": 0,
        "entries": [
            {
                "id": "1_1549348245.870945_5163821.1",
                "section": "data",
                "name": 
"dora/catalogmaker-redis:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.470/56fbc9685d609b4c8cdbd11dd60bf03bedcb613b438c663c9899d930b25f0405",
                "timestamp": "2019-02-05 06:30:45.870945Z",
                "info": {
                    "source_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
                    "error_code": 5,
                    "message": "failed to sync object(5) Input/output error"
                }
            },
…
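When the error list runs to hundreds of entries, it helps to tally them by shard and by error code before digging into individual objects. A minimal sketch, assuming the JSON from `radosgw-admin sync error list` has been captured to a string or file (the sample below reuses the single entry shown above, with the object name abbreviated):

```python
import json
from collections import Counter

def summarize_sync_errors(raw_json):
    """Tally radosgw sync error entries by shard_id and by error_code."""
    shards = json.loads(raw_json)
    by_shard = Counter()
    by_code = Counter()
    for shard in shards:
        for entry in shard.get("entries", []):
            by_shard[shard["shard_id"]] += 1
            by_code[entry["info"]["error_code"]] += 1
    return by_shard, by_code

# Sample cut down from the output above (object name abbreviated):
sample = """[
  {"shard_id": 0,
   "entries": [
     {"id": "1_1549348245.870945_5163821.1",
      "section": "data",
      "name": "dora/catalogmaker-redis:...",
      "timestamp": "2019-02-05 06:30:45.870945Z",
      "info": {"source_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
               "error_code": 5,
               "message": "failed to sync object(5) Input/output error"}}]}
]"""

by_shard, by_code = summarize_sync_errors(sample)
print(by_shard)  # Counter({0: 1})
print(by_code)   # Counter({5: 1})
```

In practice you'd pipe the live command output in, e.g. `sudo radosgw-admin sync error list | python3 summarize.py`; a skew toward one shard or one error code usually narrows the search quickly.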

radosgw logs are full of:
2019-03-04 14:32:58.039467 7f90e81eb700  0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-04 14:32:58.041296 7f90e81eb700  0 data sync: ERROR: init sync on escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146 failed, retcode=-2
2019-03-04 14:32:58.041662 7f90e81eb700  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-03-04 14:32:58.042949 7f90e81eb700  0 data sync: WARNING: skipping data log entry for missing bucket escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146
2019-03-04 14:32:58.823501 7f90e81eb700  0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-04 14:32:58.825243 7f90e81eb700  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
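For what it's worth, the return codes in these logs are negated errnos: ret=-2 is ENOENT (the remote data log or bucket instance wasn't found), and the (5) in the sync errors is EIO. A quick way to decode them (the exact strerror text can vary by platform, so treat the printed messages as illustrative):

```python
import errno
import os

# Decode the negative return codes seen in the radosgw logs above.
for ret in (-2, -5):
    e = -ret  # log values are negated errno constants
    print(f"ret={ret}: {errno.errorcode[e]} ({os.strerror(e)})")
```

On Linux this prints ENOENT for -2 and EIO for -5, which matches the "Input/output error" text in the sync error list.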

dc11-ceph-rgw2:~$ sudo radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
2019-03-04 14:26:21.351372 7ff7ae042e40  0 meta sync: ERROR: failed to fetch mdlog info
  metadata sync syncing
                full sync: 0/64 shards
                failed to fetch local sync status: (5) Input/output error
^C

Any advice?  All three clusters on 12.2.11, Debian stretch.

From: Christian Rice <cr...@pandora.com>
Date: Thursday, February 28, 2019 at 9:06 AM
To: Matthew H <matthew.he...@hotmail.com>, ceph-users 
<ceph-users@lists.ceph.com>
Subject: Re: radosgw sync falling behind regularly

Yeah my bad on the typo, not running 12.8.8 ☺  It’s 12.2.8.  We can upgrade and 
will attempt to do so asap.  Thanks for that, I need to read my release notes 
more carefully, I guess!

From: Matthew H <matthew.he...@hotmail.com>
Date: Wednesday, February 27, 2019 at 8:33 PM
To: Christian Rice <cr...@pandora.com>, ceph-users <ceph-users@lists.ceph.com>
Subject: Re: radosgw sync falling behind regularly

Hey Christian,

I'm making a wild guess, but assuming this is 12.2.8. If so, is it possible for you to upgrade to 12.2.11? There have been rgw multisite bug fixes for metadata syncing and data syncing (both separate issues) that you could be hitting.

Thanks,
________________________________
From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Christian 
Rice <cr...@pandora.com>
Sent: Wednesday, February 27, 2019 7:05 PM
To: ceph-users
Subject: [ceph-users] radosgw sync falling behind regularly


Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters in one zonegroup.

Often we find either metadata or data sync falling behind, and it doesn't ever seem to recover until we restart the endpoint radosgw target service.
eg at 15:45:40:

dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 2 shards
                behind shards: [19,41]
                oldest incremental change not applied: 2019-02-27 14:42:24.0.408263s
      data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

so at 15:46:07:

dc11-ceph-rgw1:/var/log/ceph# sudo systemctl restart ceph-radosgw@rgw.dc11-ceph-rgw1.service

and by the time I checked at 15:48:08:

dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

There’s no way this is “lag.”  It’s stuck, and happens frequently, though 
perhaps not daily.  Any suggestions?  Our cluster isn’t heavily used yet, but 
it’s production.
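Since the restart reliably unsticks it, the workaround could at least be automated while the underlying bug gets chased. A hedged sketch: parse `radosgw-admin sync status` output and flag when any source reports being behind, so a cron job or systemd timer can decide whether to bounce the radosgw service. The detection pattern here is based only on the output shown in this thread, and any threshold or service name you wire in would be your own assumption, not a recommendation:

```python
import re

def sync_is_behind(status_text):
    """Return True if `radosgw-admin sync status` output reports any
    metadata/data shards behind. Matches lines like
    'metadata is behind on 2 shards' from the transcripts above."""
    return bool(re.search(r"is behind on \d+ shards", status_text))

# Snippets taken from the two status transcripts in this thread:
caught_up = ("metadata is caught up with master\n"
             "data is caught up with source\n")
behind = ("metadata is behind on 2 shards\n"
          "behind shards: [19,41]\n")

print(sync_is_behind(caught_up))  # False
print(sync_is_behind(behind))     # True
```

A real job would shell out to `radosgw-admin sync status`, and only restart (e.g. via `systemctl restart ceph-radosgw@...`) after the check fails a few polls in a row, to avoid bouncing the daemon on transient lag.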
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
