Hi Casey,

I set up a completely fresh cluster on a new VM host; everything is brand new. I believe it installed cleanly, and because the peer VMs have practically zero latency and effectively unlimited bandwidth between them, this is a better place to experiment. The behavior is the same as on the other cluster.
The realm is "example-test", with a single zonegroup named "us" and two zones, "left" and "right". The master zone is "left", and I am trying to replicate unidirectionally to "right". "left" is a two-node cluster and "right" is a single-node cluster. Both report "too few PGs per OSD" but are otherwise 100% active+clean. Both clusters have been completely restarted to rule out latent config issues, although only the RGW nodes should have required that.

The thread at [1] is the most involved engagement with a staff member on this subject that I have found, so I checked it and believe I have gathered all the logs requested there. They all appear to be consistent and are included below.

To start:

> [root@right01 ~]# radosgw-admin sync status
>           realm d5078dd2-6a6e-49f8-941e-55c02ad58af7 (example-test)
>       zonegroup de533461-2593-45d2-8975-99072d860bb2 (us)
>            zone 5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe (right)
>   metadata sync syncing
>                 full sync: 0/64 shards
>                 incremental sync: 64/64 shards
>                 metadata is caught up with master
>       data sync source: 479d3f20-d57d-4b37-995b-510ba10756bf (left)
>                         syncing
>                         full sync: 0/128 shards
>                         incremental sync: 128/128 shards
>                         data is caught up with source

I tried the steps at [2] and do not see any ops in progress, just "linger_ops". I don't know what those are, but I wondered whether they explain the slow stream of requests going back and forth between the two RGW endpoints:

> [root@right01 ~]# ceph daemon client.rgw.right01.54395.94074682941968 objecter_requests
> {
>     "ops": [],
>     "linger_ops": [
>         {
>             "linger_id": 2,
>             "pg": "2.16dafda0",
>             "osd": 0,
>             "object_id": "notify.1",
>             "object_locator": "@2",
>             "target_object_id": "notify.1",
>             "target_object_locator": "@2",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "snapid": "head",
>             "registered": "1"
>         },
>         ...
>     ],
>     "pool_ops": [],
>     "pool_stat_ops": [],
>     "statfs_ops": [],
>     "command_ops": []
> }

The next thing I tried was running `radosgw-admin data sync run --source-zone=left` on the right side. It produces bursts of messages of the following form:

> 2019-04-19 21:46:34.281 7f1c006ad580 0 RGW-SYNC:data:sync:shard[1]: ERROR: failed to read remote data log info: ret=-2
> 2019-04-19 21:46:34.281 7f1c006ad580 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2

When I sorted and filtered the messages, each burst contained one RGW-SYNC message per data log shard, identified by the number in "[]". The sync status above reports 128 data shards, and the numbers in each burst cover 0-127. The bursts happen about once every five seconds.

The packet traces between the nodes during the `data sync run` are mostly requests and responses of the following form:

> HTTP GET:
> http://right01.example.com:7480/admin/log/?type=data&id=7&marker&extra-info=true&rgwx-zonegroup=de533461-2593-45d2-8975-99072d860bb2
>
> HTTP 404 RESPONSE:
> {"Code":"NoSuchKey","RequestId":"tx000000000000000002a01-005cba9593-371d-right","HostId":"371d-right-us"}

When I stop the `data sync run`, these 404s stop, so the `data sync run` apparently isn't changing some persistent state in the RGW; it is issuing these requests itself, synchronously. I have run `data sync init` in the past, but it doesn't seem like repeating it would make a difference, so I haven't run it again.

NEXT STEPS: I am working on getting better logging output from the daemons and hope to find something there that helps. If I am lucky, I will find something and report back so this thread is useful for others. If I have not written back, I probably haven't found anything, so I would be grateful for any leads.

Kind regards and thank you!
Brian

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013188.html
[2] http://docs.ceph.com/docs/master/radosgw/troubleshooting/?highlight=linger_ops#blocked-radosgw-requests

CONFIG DUMPS:

> [root@left01 ~]# radosgw-admin period get-current
> {
>     "current_period": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c"
> }
> [root@left01 ~]# radosgw-admin period get cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c
> {
>     "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
>     "epoch": 6,
>     "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
>     "sync_status": [],
>     "period_map": {
>         "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
>         "zonegroups": [
>             {
>                 "id": "de533461-2593-45d2-8975-99072d860bb2",
>                 "name": "us",
>                 "api_name": "us",
>                 "is_master": "true",
>                 "endpoints": [
>                     "http://left01.example.com:7480"
>                 ],
>                 "hostnames": [],
>                 "hostnames_s3website": [],
>                 "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
>                 "zones": [
>                     {
>                         "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
>                         "name": "left",
>                         "endpoints": [
>                             "http://left01.example.com:7480"
>                         ],
>                         "log_meta": "false",
>                         "log_data": "true",
>                         "bucket_index_max_shards": 0,
>                         "read_only": "false",
>                         "tier_type": "",
>                         "sync_from_all": "true",
>                         "sync_from": [],
>                         "redirect_zone": ""
>                     },
>                     {
>                         "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
>                         "name": "right",
>                         "endpoints": [
>                             "http://right01.example.com:7480"
>                         ],
>                         "log_meta": "false",
>                         "log_data": "true",
>                         "bucket_index_max_shards": 0,
>                         "read_only": "false",
>                         "tier_type": "",
>                         "sync_from_all": "true",
>                         "sync_from": [],
>                         "redirect_zone": ""
>                     }
>                 ],
>                 "placement_targets": [
>                     {
>                         "name": "default-placement",
>                         "tags": [],
>                         "storage_classes": [
>                             "STANDARD"
>                         ]
>                     }
>                 ],
>                 "default_placement": "default-placement",
>                 "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
>             }
>         ],
>         "short_zone_ids": [
>             {
>                 "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
>                 "val": 1817029288
>             },
>             {
>                 "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
>                 "val": 1573215025
>             }
>         ]
>     },
>     "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
>     "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
>     "period_config": {
>         "bucket_quota": {
>             "enabled": false,
>             "check_on_raw": false,
>             "max_size": -1,
>             "max_size_kb": 0,
>             "max_objects": -1
>         },
>         "user_quota": {
>             "enabled": false,
>             "check_on_raw": false,
>             "max_size": -1,
>             "max_size_kb": 0,
>             "max_objects": -1
>         }
>     },
>     "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
>     "realm_name": "example-test",
>     "realm_epoch": 2
> }
> [root@left01 ~]# radosgw-admin zonegroup get
> {
>     "id": "de533461-2593-45d2-8975-99072d860bb2",
>     "name": "us",
>     "api_name": "us",
>     "is_master": "true",
>     "endpoints": [
>         "http://left01.example.com:7480"
>     ],
>     "hostnames": [],
>     "hostnames_s3website": [],
>     "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
>     "zones": [
>         {
>             "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
>             "name": "left",
>             "endpoints": [
>                 "http://left01.example.com:7480"
>             ],
>             "log_meta": "false",
>             "log_data": "true",
>             "bucket_index_max_shards": 0,
>             "read_only": "false",
>             "tier_type": "",
>             "sync_from_all": "true",
>             "sync_from": [],
>             "redirect_zone": ""
>         },
>         {
>             "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
>             "name": "right",
>             "endpoints": [
>                 "http://right01.example.com:7480"
>             ],
>             "log_meta": "false",
>             "log_data": "true",
>             "bucket_index_max_shards": 0,
>             "read_only": "false",
>             "tier_type": "",
>             "sync_from_all": "true",
>             "sync_from": [],
>             "redirect_zone": ""
>         }
>     ],
>     "placement_targets": [
>         {
>             "name": "default-placement",
>             "tags": [],
>             "storage_classes": [
>                 "STANDARD"
>             ]
>         }
>     ],
>     "default_placement": "default-placement",
>     "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
> }
> [root@left01 ~]# radosgw-admin period get
> {
>     "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
>     "epoch": 6,
>     "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
>     "sync_status": [],
>     "period_map": {
>         "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
>         "zonegroups": [
>             {
>                 "id": "de533461-2593-45d2-8975-99072d860bb2",
>                 "name": "us",
>                 "api_name": "us",
>                 "is_master": "true",
>                 "endpoints": [
>                     "http://left01.example.com:7480"
>                 ],
>                 "hostnames": [],
>                 "hostnames_s3website": [],
>                 "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
>                 "zones": [
>                     {
>                         "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
>                         "name": "left",
>                         "endpoints": [
>                             "http://left01.example.com:7480"
>                         ],
>                         "log_meta": "false",
>                         "log_data": "true",
>                         "bucket_index_max_shards": 0,
>                         "read_only": "false",
>                         "tier_type": "",
>                         "sync_from_all": "true",
>                         "sync_from": [],
>                         "redirect_zone": ""
>                     },
>                     {
>                         "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
>                         "name": "right",
>                         "endpoints": [
>                             "http://right01.example.com:7480"
>                         ],
>                         "log_meta": "false",
>                         "log_data": "true",
>                         "bucket_index_max_shards": 0,
>                         "read_only": "false",
>                         "tier_type": "",
>                         "sync_from_all": "true",
>                         "sync_from": [],
>                         "redirect_zone": ""
>                     }
>                 ],
>                 "placement_targets": [
>                     {
>                         "name": "default-placement",
>                         "tags": [],
>                         "storage_classes": [
>                             "STANDARD"
>                         ]
>                     }
>                 ],
>                 "default_placement": "default-placement",
>                 "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
>             }
>         ],
>         "short_zone_ids": [
>             {
>                 "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
>                 "val": 1817029288
>             },
>             {
>                 "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
>                 "val": 1573215025
>             }
>         ]
>     },
>     "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
>     "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
>     "period_config": {
>         "bucket_quota": {
>             "enabled": false,
>             "check_on_raw": false,
>             "max_size": -1,
>             "max_size_kb": 0,
>             "max_objects": -1
>         },
>         "user_quota": {
>             "enabled": false,
>             "check_on_raw": false,
>             "max_size": -1,
>             "max_size_kb": 0,
>             "max_objects": -1
>         }
>     },
>     "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
>     "realm_name": "example-test",
>     "realm_epoch": 2
> }
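The sorting and filtering of the `data sync run` output described earlier can be done mechanically. This is a minimal Python sketch, not a Ceph tool: it assumes the log lines look like the RGW-SYNC sample quoted above, and the helper names `group_bursts`/`summarize` and the 2-second quiet gap used to split bursts are hypothetical choices. It also decodes `ret=-2` through Python's `errno` table (2 is `ENOENT`), which would be consistent with the 404 `NoSuchKey` responses in the packet trace.

```python
import errno
import re
import sys
from datetime import datetime, timedelta

# Assumed format, based on the sample line in this message:
# 2019-04-19 21:46:34.281 7f1c006ad580 0 RGW-SYNC:data:sync:shard[1]: ERROR: failed to read remote data log info: ret=-2
SHARD_ERR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s.*"
    r"RGW-SYNC:data:sync:shard\[(?P<shard>\d+)\]: ERROR: .*ret=(?P<ret>-?\d+)"
)


def group_bursts(lines, gap=timedelta(seconds=2)):
    """Group shard error lines into bursts; a new burst starts after a quiet gap."""
    bursts = []
    last_ts = None
    for line in lines:
        m = SHARD_ERR.match(line)
        if not m:
            continue  # skip other messages, e.g. the "meta sync" lines
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        if last_ts is None or ts - last_ts > gap:
            bursts.append({"start": ts, "shards": set(), "rets": set()})
        bursts[-1]["shards"].add(int(m.group("shard")))
        bursts[-1]["rets"].add(int(m.group("ret")))
        last_ts = ts
    return bursts


def summarize(bursts, num_shards=128):
    """Return (start, shard count, missing shard ids, errno names) per burst."""
    expected = set(range(num_shards))
    rows = []
    for b in bursts:
        # Negative return codes are -errno; fall back to the raw number.
        names = sorted(errno.errorcode.get(-r, str(r)) for r in b["rets"])
        rows.append((b["start"], len(b["shards"]), sorted(expected - b["shards"]), names))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        for start, count, missing, names in summarize(group_bursts(log)):
            print(f"{start}  shards={count}  missing={missing[:8]}  errors={names}")
```

With the `data sync run` output captured to a file, it could be run as `python3 sync_bursts.py data-sync-run.log` (both file names are hypothetical). A burst with `shards=128`, an empty `missing` list, and `errors=['ENOENT']` would match the pattern described above.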
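The config dumps above are meant to be compared by eye. As a minimal sketch, assuming the JSON from `radosgw-admin period get` and `radosgw-admin zonegroup get` has been saved to files, a short Python script can do a few of those consistency checks mechanically. The `check_period` helper and the specific checks are hypothetical sanity checks chosen for illustration, not part of `radosgw-admin`: the period's master zone matches the zonegroup's, every zone has at least one endpoint, no endpoint URL is listed under two zones, `short_zone_ids` covers exactly the configured zones, and the standalone zonegroup matches the copy inside the period. A clean result rules out only these particular mistakes; it says nothing about whether the endpoints are reachable.

```python
import json
import sys
from typing import List, Optional


def check_period(period: dict, zonegroup: Optional[dict] = None) -> List[str]:
    """Return inconsistencies found in `radosgw-admin period get` output."""
    problems = []
    pmap = period["period_map"]
    groups = {zg["id"]: zg for zg in pmap["zonegroups"]}
    master_zg = groups.get(period["master_zonegroup"])
    if master_zg is None:
        return [f"master_zonegroup {period['master_zonegroup']} not in period_map"]

    zones = {z["id"]: z for zg in groups.values() for z in zg["zones"]}
    if period["master_zone"] != master_zg["master_zone"]:
        problems.append("period master_zone differs from zonegroup master_zone")
    if period["master_zone"] not in zones:
        problems.append("master_zone is not one of the configured zones")

    # Each zone is expected to have its own endpoint(s); a URL listed
    # under two zones is flagged as suspicious.
    owner = {}
    for z in zones.values():
        if not z["endpoints"]:
            problems.append(f"zone {z['name']} has no endpoints")
        for url in z["endpoints"]:
            if url in owner:
                problems.append(f"endpoint {url} shared by {owner[url]} and {z['name']}")
            owner[url] = z["name"]

    short_ids = {e["key"] for e in pmap["short_zone_ids"]}
    if short_ids != set(zones):
        problems.append("short_zone_ids do not match the zone list")

    # Compare the standalone `zonegroup get` output with the period's copy.
    if zonegroup is not None and groups.get(zonegroup["id"]) != zonegroup:
        problems.append(f"zonegroup {zonegroup['name']} differs from the period copy")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        period = json.load(f)
    zonegroup = None
    if len(sys.argv) > 2:
        with open(sys.argv[2]) as f:
            zonegroup = json.load(f)
    for problem in check_period(period, zonegroup) or ["no problems found"]:
        print(problem)
```

For example, `radosgw-admin period get > period.json` and `radosgw-admin zonegroup get > zonegroup.json`, followed by `python3 check_period.py period.json zonegroup.json` (the script and file names are hypothetical).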
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com