Hello,

We have two Ceph clusters running in two separate data centers, each with
3 mons, 3 rgws, and 5 osds. I am attempting to set up bi-directional
multi-site replication as described in the Ceph documentation here:
http://docs.ceph.com/docs/jewel/radosgw/multisite/
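
In case the zone configuration itself is the problem, this is roughly how
the secondary (us-phx) zone was set up, following the steps in that
document; the endpoint URLs and the system user's keys below are
placeholders rather than the real values:

# on the secondary cluster, pull the realm and current period from the primary
radosgw-admin realm pull --url=http://<primary-rgw-endpoint>:80 \
    --access-key=<system-access-key> --secret=<system-secret-key>
radosgw-admin period pull --url=http://<primary-rgw-endpoint>:80 \
    --access-key=<system-access-key> --secret=<system-secret-key>

# create the secondary zone in the existing zonegroup and commit the period
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-phx \
    --endpoints=http://<secondary-rgw-endpoint>:80 \
    --access-key=<system-access-key> --secret=<system-secret-key>
radosgw-admin period update --commit

# restart the gateways so they pick up the new configuration
systemctl restart ceph-radosgw@rgw.<instance-name>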

We are running Jewel v 10.2.2:
rpm -qa | grep ceph
ceph-base-10.2.2-0.el7.x86_64
ceph-10.2.2-0.el7.x86_64
ceph-radosgw-10.2.2-0.el7.x86_64
libcephfs1-10.2.2-0.el7.x86_64
python-cephfs-10.2.2-0.el7.x86_64
ceph-selinux-10.2.2-0.el7.x86_64
ceph-mon-10.2.2-0.el7.x86_64
ceph-osd-10.2.2-0.el7.x86_64
ceph-release-1-1.el7.noarch
ceph-common-10.2.2-0.el7.x86_64
ceph-mds-10.2.2-0.el7.x86_64

It appears data syncing is happening; however, metadata is not syncing,
and therefore no users or buckets from the primary are showing up on the
secondary.
Primary sync status:
radosgw-admin sync status
          realm 3af93a86-916a-490f-b38f-17922b472b19 (my_realm)
      zonegroup 235b010c-22e2-4b43-8fcc-8ae01939273e (us)
           zone 6c830b44-4e39-4e19-9bd8-03c37c2021f2 (us-dfw)
  metadata sync no sync (zone is master)
      data sync source: 58aa3eef-fc1f-492c-a08e-9c6019e7c266 (us-phx)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

radosgw-admin data sync status --source-zone=us-phx
{
    "sync_status": {
        "info": {
            "status": "sync",
            "num_shards": 128
        },
        "markers": [...
}

radosgw-admin metadata sync status
{
    "sync_status": {
        "info": {
            "status": "init",
            "num_shards": 0,
            "period": "",
            "realm_epoch": 0
        },
        "markers": []
    },
    "full_sync": {
        "total": 0,
        "complete": 0
    }
}
Secondary sync status:
radosgw-admin sync status
          realm 3af93a86-916a-490f-b38f-17922b472b19 (pardot)
      zonegroup 235b010c-22e2-4b43-8fcc-8ae01939273e (us)
           zone 58aa3eef-fc1f-492c-a08e-9c6019e7c266 (us-phx)
  metadata sync failed to read sync status: (2) No such file or directory
      data sync source: 6c830b44-4e39-4e19-9bd8-03c37c2021f2 (us-dfw)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 10 shards
                        oldest incremental change not applied: 2016-09-20 15:00:17.0.330225s

radosgw-admin data sync status --source-zone=us-dfw
{
    "sync_status": {
        "info": {
            "status": "building-full-sync-maps",
            "num_shards": 128
        },
....
}



radosgw-admin metadata sync status --source-zone=us-dfw
ERROR: sync.read_sync_status() returned ret=-2
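
If the metadata sync status objects on the secondary are simply missing,
would re-initializing metadata sync on the secondary be the right way to
recover? I was thinking of something along these lines, but have held off
because I am not sure it is safe to run against a live zone:

radosgw-admin metadata sync init
radosgw-admin metadata sync run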

In the logs I am seeing the following (timestamps are out of order because
I picked out the non-duplicate messages):
Primary logs:
2016-09-20 15:02:44.313204 7f2a2dffb700  0 ERROR:
client_io->complete_request() returned -5
2016-09-20 10:31:57.501247 7faf4bfff700  0 ERROR: failed to wait for op,
ret=-11: POST
http://pardot0-cephrgw1-3-phx.ops.sfdc.net:80/admin/realm/period?period=385c44c7-0506-4204-90d7-9d26a6cbaad2&epoch=12&rgwx-zonegroup=f46ce11b-ee5d-489b-aa30-752fc5353931
2016-09-20 10:32:03.391118 7fb12affd700  0 ERROR: failed to fetch datalog
info
2016-09-20 10:32:03.491520 7fb12affd700  0 ERROR: lease cr failed, done
early

Secondary logs:
2016-09-20 10:28:15.290050 7faab2fed700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:15.290108 7faab5ff3700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:15.290571 7faab77f6700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:15.304619 7faaad7e2700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:38.169629 7fa98bfff700  0 ERROR: failed to distribute
cache for .rgw.root:periods.385c44c7-0506-4204-90d7-9d26a6cbaad2.12
2016-09-20 10:28:38.169642 7fa98bfff700 -1 period epoch 12 is not newer
than current epoch 12, discarding update
2016-09-21 03:19:01.550808 7fe10bfff700  0 rgw meta sync: ERROR: failed to
fetch mdlog info
2016-09-21 15:45:09.799195 7fcd677fe700  0 ERROR: failed to fetch remote
data log info: ret=-11

Each of those messages is repeated many times, constantly.
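
Since the primary's POST of the period to the secondary appears to be
failing (the ret=-11 errors above), I also considered re-pulling and
committing the current period on the secondary, roughly as below (again
with placeholder endpoint and keys), but have not tried it yet:

radosgw-admin period pull --url=http://<primary-rgw-endpoint>:80 \
    --access-key=<system-access-key> --secret=<system-secret-key>
radosgw-admin period update --commit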

Any help would be greatly appreciated.  Thanks!