Re: [ceph-users] radosgw sync falling behind regularly
Hi Casey,

We're still trying to figure this sync problem out; if you could possibly tell us anything further we would be deeply grateful! Our errors are coming from 'data sync'. In `sync status` we pretty constantly show one shard behind, but a different one each time we run it. Here's a paste -- these commands were run in rapid succession.

root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [30]
                        oldest incremental change not applied: 2019-01-19 22:53:23.0.16109s
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
root@sv3-ceph-rgw1:~#

Below I'm pasting a small section of log. Thanks so much for looking!
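Since the behind shard moves around between runs, it can help to extract just the "behind shards" list from `radosgw-admin sync status` and sample it repeatedly; a shard that stays in the list across samples is a real stall, while a changing one is just transient lag. A sketch (the sample below mirrors the paste above; against a live cluster, pipe the real command through the same sed inside a loop):

```shell
# Extract the shard numbers from the "behind shards: [...]" line of
# 'radosgw-admin sync status' output. Sample output embedded for illustration.
status='      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [30]
                        oldest incremental change not applied: 2019-01-19 22:53:23'

printf '%s\n' "$status" | sed -n 's/.*behind shards: \[\([0-9,]*\)\].*/\1/p'
# prints: 30
```

Run the same pipeline in a `while sleep 10` loop over the live command and compare successive outputs.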
Trey Palmer

root@sv3-ceph-rgw1:/var/log/ceph# tail -f ceph-rgw-sv3-ceph-rgw1.log | grep -i error
2019-03-08 11:43:07.208572 7fa080cc7700 0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-08 11:43:07.211348 7fa080cc7700 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.267117 7fa080cc7700 0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-08 11:43:07.269631 7fa080cc7700 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.895192 7fa080cc7700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.046685 7fa080cc7700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.171277 7fa0870eb700 0 ERROR: failed to get bucket instance info for .bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.171748 7fa0850e7700 0 ERROR: failed to get bucket instance info for .bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.175867 7fa08a0f1700 0 meta sync: ERROR: can't remove key: bucket.instance:phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233 ret=-2
2019-03-08 11:43:08.176755 7fa0820e1700 0 data sync: ERROR: init sync on whoiswho/whoiswho:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.293 failed, retcode=-2
2019-03-08 11:43:08.176872 7fa0820e1700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.176885 7fa093103700 0 ERROR: failed to get bucket instance info for .bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.176925 7fa0820e1700 0 data sync: ERROR: failed to retrieve bucket info for bucket=phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.177916 7fa0910ff700 0 meta sync: ERROR: can't remove key: bucket.instance:gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158 ret=-2
2019-03-08 11:43:08.178815 7fa08b0f3700 0 ERROR: failed to get bucket instance info for .bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.178847 7fa0820e1700 0 data sync: ERROR: failed to retrieve bucket info for bucket=gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:
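When the log is full of repeating "init sync on ... failed, retcode=-2" errors like the above, a quick frequency count shows which bucket instances are actually responsible. A sketch with sample lines from the paste embedded; against a live gateway, feed it `grep 'init sync on' /var/log/ceph/ceph-rgw-*.log` instead (the log path is an assumption, adjust for your install):

```shell
# Count "init sync on <bucket instance> failed" errors per bucket instance.
log='2019-03-08 11:43:07.895192 7fa080cc7700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.046685 7fa080cc7700 0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.176755 7fa0820e1700 0 data sync: ERROR: init sync on whoiswho/whoiswho:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.293 failed, retcode=-2'

printf '%s\n' "$log" \
  | sed -n 's/.*init sync on \([^ ]*\) failed.*/\1/p' \
  | sort | uniq -c | sort -rn
```

The most frequent instances at the top are the stale ones to investigate first.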
Re: [ceph-users] radosgw sync falling behind regularly
It appears we eventually got 'data sync init' working. At least, it has worked on 5 of the 6 sync directions in our 3-node cluster. The sixth has not yet run without returning an error, although 'sync status' does say "preparing for full sync".

Thanks,

Trey

On Wed, Mar 6, 2019 at 1:22 PM Trey Palmer wrote:
> Casey,
>
> This was the result of trying 'data sync init':
>
> root@c2-rgw1:~# radosgw-admin data sync init
> ERROR: source zone not specified
> root@c2-rgw1:~# radosgw-admin data sync init --source-zone= uuid>
> WARNING: cannot find source zone id for name=
> ERROR: sync.init_sync_status() returned ret=-2
> root@c2-rgw1:~# radosgw-admin data sync init --source-zone=c1-zone
> ERROR: sync.init() returned ret=-5
> 2019-03-06 10:14:59.815735 7fecb214fe40 0 data sync: ERROR: failed to fetch datalog info
> root@c2-rgw1:~#
>
> Do you have any further advice or info?
>
> Thanks again,
>
> Trey
>
> On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley wrote:
>> Hi Trey,
>>
>> I think it's more likely that these stale metadata entries are from deleted buckets, rather than accidental bucket reshards. When a bucket is deleted in a multisite configuration, we don't delete its bucket instance because other zones may still need to sync the object deletes - and they can't make progress on sync if the bucket metadata disappears. These leftover bucket instances look the same to the 'reshard stale-instances' commands, but I'd be cautious about using that to remove them in multisite, as it may cause more sync errors and potentially leak storage if they still contain objects.
>>
>> Regarding 'datalog trim', that alone isn't safe because it could trim entries that hadn't been applied on other zones yet, causing them to miss some updates. What you can do is run 'data sync init' on each zone, and restart gateways. This will restart with a data full sync (which will scan all buckets for changes), and skip past any datalog entries from before the full sync. I was concerned that the bug in error handling (ie "ERROR: init sync on...") would also affect full sync, but that doesn't appear to be the case - so I do think that's worth trying.
>>
>> On 3/5/19 6:24 PM, Trey Palmer wrote:
>> > Casey,
>> >
>> > Thanks very much for the reply!
>> >
>> > We definitely have lots of errors on sync-disabled buckets and the workaround for that is obvious (most of them are empty anyway).
>> >
>> > Our second form of error is stale buckets. We had dynamic resharding enabled but have now disabled it (having discovered it was on by default, and not supported in multisite).
>> >
>> > We removed several hundred stale buckets via 'radosgw-admin sharding stale-instances rm', but they are still giving us sync errors.
>> >
>> > I have found that these buckets do have entries in 'radosgw-admin datalog list', and my guess is this could be fixed by doing a 'radosgw-admin datalog trim' for each entry on the master zone.
>> >
>> > Does that sound right? :-)
>> >
>> > Thanks again for the detailed explanation,
>> >
>> > Trey Palmer
>> >
>> > On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley <mailto:cbod...@redhat.com> wrote:
>> >
>> > Hi Christian,
>> >
>> > I think you've correctly intuited that the issues are related to the use of 'bucket sync disable'. There was a bug fix for that feature in http://tracker.ceph.com/issues/26895, and I recently found that a block of code was missing from its luminous backport. That missing code is what handled those "ERROR: init sync on failed, retcode=-2" errors.
>> >
>> > I included a fix for that in a later backport (https://github.com/ceph/ceph/pull/26549), which I'm still working to get through qa. I'm afraid I can't really recommend a workaround for the issue in the meantime.
>> >
>> > Looking forward though, we do plan to support something like s3's cross region replication so you can enable replication on a specific bucket without having to enable
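The recovery Casey describes in the quoted reply (one 'data sync init' per source zone, then a gateway restart) can be expressed as a small command generator. This sketch only prints the commands so they can be reviewed before running them; the zone names are the ones from this thread, and the systemd unit name is an assumption that varies by distribution:

```shell
# Print the per-source-zone recovery commands: one 'data sync init' per
# source zone, then a gateway restart so full sync starts.
sync_init_cmds() {
    for src in "$@"; do
        echo "radosgw-admin data sync init --source-zone=$src"
    done
    echo "systemctl restart ceph-radosgw.target"
}

sync_init_cmds dc11-prod sv5-corp
```

Pipe the output to `sh` (or run it line by line) once you are satisfied it targets the right zones.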
Re: [ceph-users] radosgw sync falling behind regularly
Casey,

This was the result of trying 'data sync init':

root@c2-rgw1:~# radosgw-admin data sync init
ERROR: source zone not specified
root@c2-rgw1:~# radosgw-admin data sync init --source-zone=
WARNING: cannot find source zone id for name=
ERROR: sync.init_sync_status() returned ret=-2
root@c2-rgw1:~# radosgw-admin data sync init --source-zone=c1-zone
ERROR: sync.init() returned ret=-5
2019-03-06 10:14:59.815735 7fecb214fe40 0 data sync: ERROR: failed to fetch datalog info
root@c2-rgw1:~#

Do you have any further advice or info?

Thanks again,

Trey

On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley wrote:
> Hi Trey,
>
> I think it's more likely that these stale metadata entries are from deleted buckets, rather than accidental bucket reshards. When a bucket is deleted in a multisite configuration, we don't delete its bucket instance because other zones may still need to sync the object deletes - and they can't make progress on sync if the bucket metadata disappears. These leftover bucket instances look the same to the 'reshard stale-instances' commands, but I'd be cautious about using that to remove them in multisite, as it may cause more sync errors and potentially leak storage if they still contain objects.
>
> Regarding 'datalog trim', that alone isn't safe because it could trim entries that hadn't been applied on other zones yet, causing them to miss some updates. What you can do is run 'data sync init' on each zone, and restart gateways. This will restart with a data full sync (which will scan all buckets for changes), and skip past any datalog entries from before the full sync. I was concerned that the bug in error handling (ie "ERROR: init sync on...") would also affect full sync, but that doesn't appear to be the case - so I do think that's worth trying.
>
> On 3/5/19 6:24 PM, Trey Palmer wrote:
> > Casey,
> >
> > Thanks very much for the reply!
> >
> > We definitely have lots of errors on sync-disabled buckets and the workaround for that is obvious (most of them are empty anyway).
> >
> > Our second form of error is stale buckets. We had dynamic resharding enabled but have now disabled it (having discovered it was on by default, and not supported in multisite).
> >
> > We removed several hundred stale buckets via 'radosgw-admin sharding stale-instances rm', but they are still giving us sync errors.
> >
> > I have found that these buckets do have entries in 'radosgw-admin datalog list', and my guess is this could be fixed by doing a 'radosgw-admin datalog trim' for each entry on the master zone.
> >
> > Does that sound right? :-)
> >
> > Thanks again for the detailed explanation,
> >
> > Trey Palmer
> >
> > On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley <mailto:cbod...@redhat.com> wrote:
> >
> > Hi Christian,
> >
> > I think you've correctly intuited that the issues are related to the use of 'bucket sync disable'. There was a bug fix for that feature in http://tracker.ceph.com/issues/26895, and I recently found that a block of code was missing from its luminous backport. That missing code is what handled those "ERROR: init sync on failed, retcode=-2" errors.
> >
> > I included a fix for that in a later backport (https://github.com/ceph/ceph/pull/26549), which I'm still working to get through qa. I'm afraid I can't really recommend a workaround for the issue in the meantime.
> >
> > Looking forward though, we do plan to support something like s3's cross region replication so you can enable replication on a specific bucket without having to enable it globally.
> >
> > Casey
> >
> > On 3/5/19 2:32 PM, Christian Rice wrote:
> > > Much appreciated. We’ll continue to poke around and certainly will disable the dynamic resharding.
> > >
> > > We started with 12.2.8 in production. We definitely did not have it enabled in ceph.conf
> > >
> > > *From: *Matthew H <mailto:matthew.he...@hotmail.com>
> > > *Date: *Tuesday, March 5, 2019 at 11:22 AM
> > > *To: *Christian Rice <mailto:cr...@pandora.com>, ceph-users mailto:ceph-users@lists.cep
Re: [ceph-users] radosgw sync falling behind regularly
Casey,

You are spot on that almost all of these are deleted buckets. At some point in the last few months we deleted and replaced buckets with underscores in their names, and those are responsible for most of these errors.

Thanks very much for the reply and explanation. We’ll give ‘data sync init’ a try.

— Trey

On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley wrote:
> Hi Trey,
>
> I think it's more likely that these stale metadata entries are from deleted buckets, rather than accidental bucket reshards. When a bucket is deleted in a multisite configuration, we don't delete its bucket instance because other zones may still need to sync the object deletes - and they can't make progress on sync if the bucket metadata disappears. These leftover bucket instances look the same to the 'reshard stale-instances' commands, but I'd be cautious about using that to remove them in multisite, as it may cause more sync errors and potentially leak storage if they still contain objects.
>
> Regarding 'datalog trim', that alone isn't safe because it could trim entries that hadn't been applied on other zones yet, causing them to miss some updates. What you can do is run 'data sync init' on each zone, and restart gateways. This will restart with a data full sync (which will scan all buckets for changes), and skip past any datalog entries from before the full sync. I was concerned that the bug in error handling (ie "ERROR: init sync on...") would also affect full sync, but that doesn't appear to be the case - so I do think that's worth trying.
>
> On 3/5/19 6:24 PM, Trey Palmer wrote:
> > Casey,
> >
> > Thanks very much for the reply!
> >
> > We definitely have lots of errors on sync-disabled buckets and the workaround for that is obvious (most of them are empty anyway).
> >
> > Our second form of error is stale buckets. We had dynamic resharding enabled but have now disabled it (having discovered it was on by default, and not supported in multisite).
> >
> > We removed several hundred stale buckets via 'radosgw-admin sharding stale-instances rm', but they are still giving us sync errors.
> >
> > I have found that these buckets do have entries in 'radosgw-admin datalog list', and my guess is this could be fixed by doing a 'radosgw-admin datalog trim' for each entry on the master zone.
> >
> > Does that sound right? :-)
> >
> > Thanks again for the detailed explanation,
> >
> > Trey Palmer
> >
> > On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley <mailto:cbod...@redhat.com> wrote:
> >
> > Hi Christian,
> >
> > I think you've correctly intuited that the issues are related to the use of 'bucket sync disable'. There was a bug fix for that feature in http://tracker.ceph.com/issues/26895, and I recently found that a block of code was missing from its luminous backport. That missing code is what handled those "ERROR: init sync on failed, retcode=-2" errors.
> >
> > I included a fix for that in a later backport (https://github.com/ceph/ceph/pull/26549), which I'm still working to get through qa. I'm afraid I can't really recommend a workaround for the issue in the meantime.
> >
> > Looking forward though, we do plan to support something like s3's cross region replication so you can enable replication on a specific bucket without having to enable it globally.
> >
> > Casey
> >
> > On 3/5/19 2:32 PM, Christian Rice wrote:
> > > Much appreciated. We’ll continue to poke around and certainly will disable the dynamic resharding.
> > >
> > > We started with 12.2.8 in production. We definitely did not have it enabled in ceph.conf
> > >
> > > *From: *Matthew H <mailto:matthew.he...@hotmail.com>
> > > *Date: *Tuesday, March 5, 2019 at 11:22 AM
> > > *To: *Christian Rice <mailto:cr...@pandora.com>, ceph-users <mailto:ceph-users@lists.ceph.com>
> > > *Cc: *Trey Palmer <mailto:nerdmagic...@gmail.com>
> > > *Subject: *Re: radosgw sync falling behind regularly
> > >
> > > Hi Christian,
Re: [ceph-users] radosgw sync falling behind regularly
Casey,

Thanks very much for the reply!

We definitely have lots of errors on sync-disabled buckets and the workaround for that is obvious (most of them are empty anyway).

Our second form of error is stale buckets. We had dynamic resharding enabled but have now disabled it (having discovered it was on by default, and not supported in multisite).

We removed several hundred stale buckets via 'radosgw-admin sharding stale-instances rm', but they are still giving us sync errors.

I have found that these buckets do have entries in 'radosgw-admin datalog list', and my guess is this could be fixed by doing a 'radosgw-admin datalog trim' for each entry on the master zone.

Does that sound right? :-)

Thanks again for the detailed explanation,

Trey Palmer

On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley wrote:
> Hi Christian,
>
> I think you've correctly intuited that the issues are related to the use of 'bucket sync disable'. There was a bug fix for that feature in http://tracker.ceph.com/issues/26895, and I recently found that a block of code was missing from its luminous backport. That missing code is what handled those "ERROR: init sync on failed, retcode=-2" errors.
>
> I included a fix for that in a later backport (https://github.com/ceph/ceph/pull/26549), which I'm still working to get through qa. I'm afraid I can't really recommend a workaround for the issue in the meantime.
>
> Looking forward though, we do plan to support something like s3's cross region replication so you can enable replication on a specific bucket without having to enable it globally.
>
> Casey
>
> On 3/5/19 2:32 PM, Christian Rice wrote:
> > Much appreciated. We’ll continue to poke around and certainly will disable the dynamic resharding.
> >
> > We started with 12.2.8 in production. We definitely did not have it enabled in ceph.conf
> >
> > *From: *Matthew H
> > *Date: *Tuesday, March 5, 2019 at 11:22 AM
> > *To: *Christian Rice , ceph-users
> > *Cc: *Trey Palmer
> > *Subject: *Re: radosgw sync falling behind regularly
> >
> > Hi Christian,
> >
> > To be on the safe side and future proof yourself will want to go ahead and set the following in your ceph.conf file, and then issue a restart to your RGW instances.
> >
> > rgw_dynamic_resharding = false
> >
> > There are a number of issues with dynamic resharding, multisite rgw problems being just one of them. However I thought it was disabled automatically when multisite rgw is used (but I will have to double check the code on that). What version of Ceph did you initially install the cluster with? Prior to v12.2.2 this feature was enabled by default for all rgw use cases.
> >
> > Thanks,
> >
> > *From:* Christian Rice
> > *Sent:* Tuesday, March 5, 2019 2:07 PM
> > *To:* Matthew H; ceph-users
> > *Subject:* Re: radosgw sync falling behind regularly
> >
> > Matthew, first of all, let me say we very much appreciate your help!
> >
> > So I don’t think we turned dynamic resharding on, nor did we manually reshard buckets. Seems like it defaults to on for luminous but the mimic docs say it’s not supported in multisite. So do we need to disable it manually via tell and ceph.conf?
> >
> > Also, after running the command you suggested, all the stale instances are gone…these from my examples were in output:
> >
> > "bucket_instance": "sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",
> > "bucket_instance": "sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",
> > "bucket_instance": "sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",
> >
> > Though we still get lots of log messages like so in rgw:
> >
> > 2019-03-05 11:01:09.526120 7f64120ae700 0 ERROR: failed to get bucket instance info for .bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
> >
> > 2019-03-05 11:01:09.528664 7f63e5016700 1 civetweb: 0x55976f1c2000: 172.17.136.17 - - [05/Mar/2019:10:54:06 -0800] "GET /admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299&rgwx-zonegroup=d
Re: [ceph-users] radosgw sync falling behind regularly
Hi Matthew,

I work with Christian. Thanks so much for looking at this.

We have a huge stale-instances list from that command. Our periods are all the same; I redirected them to a file on each node and checksummed them. Here's the period:

{
    "id": "3d0d40ef-90de-40ea-8c44-caa20ea8dc53",
    "epoch": 16,
    "predecessor_uuid": "926c74c7-c1a7-46b1-9f25-eb5c392a7fbb",
    "sync_status": [],
    "period_map": {
        "id": "3d0d40ef-90de-40ea-8c44-caa20ea8dc53",
        "zonegroups": [
            {
                "id": "de6af748-1a2f-44a1-9d44-30799cf1313e",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http://sv5-ceph-rgw1.savagebeast.com:8080"
                ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
                "zones": [
                    {
                        "id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
                        "name": "dc11-prod",
                        "endpoints": [
                            "http://dc11-ceph-rgw1:8080"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": []
                    },
                    {
                        "id": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
                        "name": "sv5-corp",
                        "endpoints": [
                            "http://sv5-ceph-rgw1.savagebeast.com:8080"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": []
                    },
                    {
                        "id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
                        "name": "sv3-prod",
                        "endpoints": [
                            "http://sv3-ceph-rgw1:8080"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": []
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": []
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
            }
        ],
        "short_zone_ids": [
            {
                "key": "107d29a0-b732-4bf1-a26e-1f64f820e839",
                "val": 1720993486
            },
            {
                "key": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
                "val": 2301637458
            },
            {
                "key": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
                "val": 1449486239
            }
        ]
    },
    "master_zonegroup": "de6af748-1a2f-44a1-9d44-30799cf1313e",
    "master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
    "period_config": {
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    },
    "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd",
    "realm_name": "savagebucket",
    "realm_epoch": 2
}

On Tue, Mar 5, 2019 at 7:31 AM Matthew H wrote:
> Hi Christian,
>
> You haven't resharded any of your buckets have you? You can run the command below in v12.2.11 to list stale bucket instances.
>
> radosgw-admin reshard stale-instances list
>
> Can you also send the output from the following command on each rgw?
>
> radosgw-admin period get
>
> --
> *From:* Christian Rice
> *Sent:* Tuesday, March 5, 2019 1:46 AM
> *To:* Matthew H; ceph-users
> *Subject:* Re: radosgw sync falling behind regularly
>
> sure thing.
>
> sv5-ceph-rgw1
>
> zonegroup get
>
> {
> "id": "de6af748-1a2f-44a1-9d44-30799cf1313e",
> "name": "us",
> "api_name": "us",
> "is_master": "true",
> "endpoints": [
> "http://sv5-ceph-r
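The "redirect the period to a file on each node and checksum it" check described above can be scripted. In this sketch the per-node outputs are stand-in files; in practice you would collect each one with `ssh <node> 'radosgw-admin period get' > <node>.json` (the hostnames below are the ones from this thread):

```shell
# Compare 'radosgw-admin period get' output across nodes by checksum.
tmp=$(mktemp -d)
printf '{"id":"3d0d40ef-90de-40ea-8c44-caa20ea8dc53","epoch":16}\n' > "$tmp/sv5-ceph-rgw1"
printf '{"id":"3d0d40ef-90de-40ea-8c44-caa20ea8dc53","epoch":16}\n' > "$tmp/sv3-ceph-rgw1"
printf '{"id":"3d0d40ef-90de-40ea-8c44-caa20ea8dc53","epoch":16}\n' > "$tmp/dc11-ceph-rgw1"

# One unique checksum across all files means every node agrees on the period.
if [ "$(md5sum "$tmp"/* | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]; then
    verdict="periods match"
else
    verdict="period mismatch"
fi
echo "$verdict"
rm -r "$tmp"
```

A mismatch usually means a node missed a `period update --commit` and needs a `radosgw-admin period pull`.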
Re: [ceph-users] Question about RadosGW subusers
Anton,

It turns out that Adam Emerson is trying to get bucket policies and roles merged in time for Luminous: https://github.com/ceph/ceph/pull/14307

Given this, I think we will only be using subusers temporarily as a method to track which human or service did what in which bucket. This seems to us much easier than trying to deal with ACL's without any concept of groups, roles, or policies, in buckets that can often have millions of objects.

Here is the general idea:

1. Each bucket has a user ("master user"), but we don't use or issue that set of keys at all.

radosgw-admin user create --uid=mybucket --display-name="My Bucket"

You can of course have multiple buckets per user, but so far for us it has been simple to have one user per bucket, with the username the same as the bucket name. If a human needs access to more than one bucket, we will create multiple subusers for them. That's not convenient, but it's temporary. So what we're doing is effectively making the user into the group, with the subusers being the users, and each user only capable of being in one group. Very suboptimal, but better than the total chaos that would result from giving everyone the same set of keys for a given bucket.

2. For each human user or service/machine user of that bucket, we create subusers. You can do this via:

## full-control ops user
radosgw-admin subuser create --uid=mybucket --subuser=mybucket:alice --access=full --gen-access-key --gen-secret --key-type=s3

## write-only server user
radosgw-admin subuser create --uid=mybucket --subuser=mybucket:daemon --access=write --gen-access-key --gen-secret-key --key-type=s3

If you then do a "radosgw-admin metadata get user:mybucket", the JSON output contains the subusers and their keys.

3. Raise the RGW log level in ceph.conf to make an "access key id" line available for each request, which you can then map to a subuser if/when you need to track who did what after the fact. In ceph.conf:

debug_rgw = 10/10

This will make the logs VERY verbose, an order of magnitude and some change more verbose than default. We plan to discard most of the logs while feeding them into ElasticSearch. We might not need this much log verbosity once we have policies and are using unique users rather than subusers. Nevertheless, I hope we can eventually reduce the log level of the "access key id" line, as we have a pretty mainstream use case and I'm certain that tracking S3 request users will be required by many organizations for accounting and forensic purposes, just as it is for us.

-- Trey

On Thu, Apr 13, 2017 at 1:29 PM, wrote:
> Hey Trey.
>
> Sounds great, we were discussing the same kind of requirements and couldn't agree on/find something "useful"... so THANK YOU for sharing!!!
>
> It would be great if you could provide some more details or an example how you configure the "bucket user" and sub-users and all that stuff. Even more interesting for me, how do the "different ppl or services" access that buckets/objects afterwards?! I mean via which tools (s3cmd, boto, cyberduck, mix of some, ...) and are there any ACLs set/in use as well?!
>
> (sorry if this all sounds somehow dumb but I'm a just a novice ;) )
>
> best
> Anton
>
> Gesendet: Dienstag, 11. April 2017 um 00:17 Uhr
> Von: "Trey Palmer"
> An: ceph-us...@ceph.com
> Betreff: [ceph-users] Question about RadosGW subusers
>
> Probably a question for @yehuda :
>
> We have fairly strict user accountability requirements. The best way we have found to meet them with S3 object storage on Ceph is by using RadosGW subusers.
>
> If we set up one user per bucket, then set up subusers to provide separate individual S3 keys and access rights for different people or services using that bucket, then we can track who did what via access key in the RadosGW logs (at debug_rgw = 10/10).
>
> Of course, this is not a documented use case for subusers. I'm wondering if Yehuda or anyone else could estimate our risk of future incompatibility if we implement user/key management around subusers in this manner?
>
> Thanks,
>
> Trey

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
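The "map an access key id back to a subuser" step described above can be sketched as a small pipeline over the user's metadata. The JSON below is a trimmed, hypothetical stand-in for `radosgw-admin metadata get user:mybucket` output (the real output has more fields, including secret keys); the key values are made up:

```shell
# Given an "access key id" from the debug_rgw=10 log, find the owning
# subuser by searching the bucket user's metadata JSON.
meta='{"keys":[{"user":"mybucket:alice","access_key":"KEYALICE111"},{"user":"mybucket:daemon","access_key":"KEYDAEMON22"}]}'
key="KEYDAEMON22"

printf '%s\n' "$meta" \
  | tr ',' '\n' \
  | grep -B1 "\"$key\"" \
  | sed -n 's/.*"user":"\([^"]*\)".*/\1/p'
# prints: mybucket:daemon
```

With `jq` installed, `jq -r --arg k "$key" '.keys[] | select(.access_key==$k).user'` would be a more robust version of the same lookup.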
[ceph-users] Question about RadosGW subusers
Probably a question for @yehuda :

We have fairly strict user accountability requirements. The best way we have found to meet them with S3 object storage on Ceph is by using RadosGW subusers.

If we set up one user per bucket, then set up subusers to provide separate individual S3 keys and access rights for different people or services using that bucket, then we can track who did what via access key in the RadosGW logs (at debug_rgw = 10/10).

Of course, this is not a documented use case for subusers. I'm wondering if Yehuda or anyone else could estimate our risk of future incompatibility if we implement user/key management around subusers in this manner?

Thanks,

Trey
Re: [ceph-users] how-to undo a "multisite" config
Hi Anton,

I'm not sure exactly what you're trying to do. If you want to delete everything and start over, then just remove the zones, zonegroups and realms on both sides, and remove their pools.

If you have a master zone you want to keep, but you want to remove the non-master zone that is mirroring it multisite, then do this:

1. Stop the offending radosgw service(s) on the non-master cluster.
2. Delete the zone on the non-master cluster: "radosgw-admin zone delete --rgw-zone=myzone-b".
3. "radosgw-admin zonegroup get" > tempfile
4. Edit tempfile and remove the non-master zone from the JSON.
5. "radosgw-admin zonegroup set" < tempfile
6. Update the period: radosgw-admin period update --commit

Then I believe you can just delete the RGW pools on the non-master side and start over if you need to.

I found the multisite documentation accurate in Jewel if you're just doing a single zonegroup/realm. I concede that it's complex and confusing, though, and the man page and other documentation lag far behind. I have found it useful to consult the options parsing at the top of the radosgw-admin source code a few times.

If you are stuck at something specific, shoot me an email.

-- Trey

On Mon, Apr 3, 2017 at 3:44 PM, wrote:
> Hi Cephers.
>
> Quick question, couldn't find a "how-to" or "docu"... not even sure if someone else ever had to do it...
>
> What would be the steps to make a (failed) multisite config change, exactly following
> - http://docs.ceph.com/docs/master/radosgw/multisite/
> undone again?
>
> And as I'm on that topic now, any other WORKING documentation how to set up "multisite"?
>
> Thanks & regards
> Anton
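The numbered steps above can be collected into a checklist that is printed for review before anything is run against the cluster (the zone name "myzone-b" is the example from the steps; substitute your own):

```shell
# Print the zone-removal checklist from the steps above. Nothing here
# touches the cluster; it only emits the commands to run on the
# non-master side.
zone="myzone-b"
steps="systemctl stop ceph-radosgw.target
radosgw-admin zone delete --rgw-zone=$zone
radosgw-admin zonegroup get > /tmp/zonegroup.json
# edit /tmp/zonegroup.json: remove the $zone entry from the zones list
radosgw-admin zonegroup set < /tmp/zonegroup.json
radosgw-admin period update --commit"
printf '%s\n' "$steps"
```

Keeping the edit of the zonegroup JSON as a manual step matches the advice above; the file should be inspected before `zonegroup set` is run.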
Re: [ceph-users] Latency between datacenters
The multisite configuration available starting in Jewel sounds more appropriate for your situation. But then you need two separate clusters, each large enough to contain all of your objects.

-- Trey

On Tue, Feb 7, 2017 at 12:17 PM, Daniel Picolli Biazus wrote:
> Hi Guys,
>
> I have been planning to deploy a Ceph Cluster with the following hardware:
>
> *OSDs:*
> 4 Servers Xeon D 1520 / 32 GB RAM / 5 x 6TB SAS 2 (6 OSD daemons per server)
>
> Monitor/Rados Gateways
> 5 Servers Xeon D 1520 32 GB RAM / 2 x 1TB SAS 2 (5 MON daemons / 4 rados daemons)
>
> Usage: Object Storage only
>
> However I need to deploy 2 OSD and 3 MON Servers in Miami datacenter and another 2 OSD and 2 MON Servers in Montreal Datacenter. The latency between these datacenters is 50 milliseconds. Considering this scenario, should I use Federated Gateways or should I use a single Cluster?
>
> Thanks in advance
> Daniel
Re: [ceph-users] Ceph Monitoring
Just going into production now with a large-ish multisite radosgw setup on 10.2. We are starting off by alerting on anything that isn't HEALTH_OK, just to see how things go. If we get HEALTH_WARN but no mons or OSDs are down, then it will be a low-level alert. We will massage scripts to pick up on different conditions.

We're using graphite via collectd for visualization.

-- Trey

On Fri, Jan 13, 2017 at 3:15 PM, Chris Jones wrote:

> General question/survey:
>
> Those that have larger clusters, how are you doing alerting/monitoring?
> Meaning, do you trigger off of 'HEALTH_WARN', etc.? Not really talking about
> collectd-related metrics, but more about initial alerts of an issue or
> potential issue. What threshold do you use, basically? Just trying to get a
> pulse of what others are doing.
>
> Thanks in advance.
>
> --
> Best Regards,
> Chris Jones
> Bloomberg
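The "HEALTH_WARN with nothing down is only a low-level alert" rule above could be scripted roughly like this. This is a sketch, not our production script, and it assumes the JSON shape that `ceph status --format json` prints on Jewel (`health.overall_status`, the nested `osdmap` counters, `monmap.mons` and the `quorum` list):

```python
import json

def alert_level(status_json: str) -> str:
    """Map `ceph status --format json` output to an alert severity:
    anything not HEALTH_OK alerts, but WARN with all mons/OSDs up is low."""
    s = json.loads(status_json)
    health = s["health"]["overall_status"]
    if health == "HEALTH_OK":
        return "none"
    osd = s["osdmap"]["osdmap"]
    osds_down = osd["num_osds"] - osd["num_up_osds"]
    mons_down = len(s["monmap"]["mons"]) - len(s["quorum"])
    if health == "HEALTH_WARN" and osds_down == 0 and mons_down == 0:
        return "low"
    return "high"

# A WARN state with all daemons up -- e.g. during a deep-scrub backlog:
sample = json.dumps({
    "health": {"overall_status": "HEALTH_WARN"},
    "osdmap": {"osdmap": {"num_osds": 24, "num_up_osds": 24}},
    "monmap": {"mons": [{"name": "a"}, {"name": "b"}, {"name": "c"}]},
    "quorum": [0, 1, 2],
})
print(alert_level(sample))  # → low
```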
Re: [ceph-users] Memory leak in radosgw
Updating to libcurl 7.44 fixed the memory leak issue. Thanks for the tip, Ben.

FWIW this was a massive memory leak; it rendered the system untenable in my testing. RGW multisite will flat-out not work with the current CentOS/RHEL 7 libcurl. Seems like there are a lot of different problems caused by libcurl bugs/incompatibilities.

-- Trey

On Fri, Oct 21, 2016 at 11:04 AM, Trey Palmer wrote:

> Hi Ben,
>
> I previously hit this bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1327142
>
> So I updated from libcurl 7.29.0-25 to the new update package libcurl
> 7.29.0-32 on RHEL 7, which fixed the deadlock problem.
>
> I had not seen the issue you linked. It doesn't seem directly related,
> since my problem is a memory leak and not CPU. Clearly, though, older
> libcurl versions remain problematic for multiple reasons, so I'll give a
> newer one a try.
>
> Thanks for the input!
>
> -- Trey
>
> On Fri, Oct 21, 2016 at 3:21 AM, Ben Morrice wrote:
>
>> What version of libcurl are you using?
>>
>> I was hitting this bug with RHEL7/libcurl 7.29 which could also be your
>> catalyst.
>>
>> http://tracker.ceph.com/issues/15915
>>
>> Kind regards,
>>
>> Ben Morrice
>>
>> ______________________________________
>> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
>> EPFL ENT CBS BBP
>> Biotech Campus
>> Chemin des Mines 9
>> 1202 Geneva
>> Switzerland
>>
>> On 20/10/16 21:41, Trey Palmer wrote:
>>
>> I've been trying to test radosgw multisite and have a pretty bad memory
>> leak. It appears to be associated only with multisite sync.
>>
>> Multisite works well for a small number of objects. However, it all
>> fell over when I wrote 8M 64K objects to two buckets overnight for
>> testing (via cosbench).
>>
>> The leak appears to happen on the multisite transfer source -- that is,
>> the node where the objects were written originally. The radosgw process
>> eventually dies, I'm sure via the OOM killer, and systemd restarts it.
>> Then repeat, though multisite sync pretty much stops at that point.
>>
>> I have tried 10.2.2, 10.2.3 and a combination of the two. I'm running on
>> CentOS 7.2, using civetweb with SSL. I saw that the memory profiler only
>> works on mon, osd and mds processes.
>>
>> Anyone else seen anything like this?
>>
>> -- Trey
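Since the fix hinged on the linked libcurl version, a quick version gate is easy to script. A sketch comparing a version string (as printed by `rpm -q libcurl` or `curl-config --version`, release suffix stripped) against the 7.44 threshold mentioned above:

```python
def libcurl_ok(version: str, minimum: str = "7.44.0") -> bool:
    """True if a libcurl version string (e.g. "7.29.0-32" from an RPM
    release) is at or above the version that avoided the leak."""
    def parse(v: str) -> tuple:
        # Drop any RPM release suffix ("-32") before comparing numerically.
        return tuple(int(x) for x in v.split("-")[0].split("."))
    return parse(version) >= parse(minimum)

print(libcurl_ok("7.29.0-32"))  # → False  (stock RHEL 7 package)
print(libcurl_ok("7.44.0"))     # → True
```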
Re: [ceph-users] Memory leak in radosgw
Hi Ben,

I previously hit this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1327142

So I updated from libcurl 7.29.0-25 to the new update package libcurl 7.29.0-32 on RHEL 7, which fixed the deadlock problem.

I had not seen the issue you linked. It doesn't seem directly related, since my problem is a memory leak and not CPU. Clearly, though, older libcurl versions remain problematic for multiple reasons, so I'll give a newer one a try.

Thanks for the input!

-- Trey

On Fri, Oct 21, 2016 at 3:21 AM, Ben Morrice wrote:

> What version of libcurl are you using?
>
> I was hitting this bug with RHEL7/libcurl 7.29 which could also be your
> catalyst.
>
> http://tracker.ceph.com/issues/15915
>
> Kind regards,
>
> Ben Morrice
>
> ______________________________________
> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
> EPFL ENT CBS BBP
> Biotech Campus
> Chemin des Mines 9
> 1202 Geneva
> Switzerland
>
> On 20/10/16 21:41, Trey Palmer wrote:
>
> I've been trying to test radosgw multisite and have a pretty bad memory
> leak. It appears to be associated only with multisite sync.
>
> Multisite works well for a small number of objects. However, it all
> fell over when I wrote 8M 64K objects to two buckets overnight for
> testing (via cosbench).
>
> The leak appears to happen on the multisite transfer source -- that is,
> the node where the objects were written originally. The radosgw process
> eventually dies, I'm sure via the OOM killer, and systemd restarts it.
> Then repeat, though multisite sync pretty much stops at that point.
>
> I have tried 10.2.2, 10.2.3 and a combination of the two. I'm running on
> CentOS 7.2, using civetweb with SSL. I saw that the memory profiler only
> works on mon, osd and mds processes.
>
> Anyone else seen anything like this?
>
> -- Trey
[ceph-users] Memory leak in radosgw
I've been trying to test radosgw multisite and have a pretty bad memory leak. It appears to be associated only with multisite sync.

Multisite works well for a small number of objects. However, it all fell over when I wrote 8M 64K objects to two buckets overnight for testing (via cosbench).

The leak appears to happen on the multisite transfer source -- that is, the node where the objects were written originally. The radosgw process eventually dies, I'm sure via the OOM killer, and systemd restarts it. Then repeat, though multisite sync pretty much stops at that point.

I have tried 10.2.2, 10.2.3 and a combination of the two. I'm running on CentOS 7.2, using civetweb with SSL. I saw that the memory profiler only works on mon, osd and mds processes.

Anyone else seen anything like this?

-- Trey
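Since the memory profiler doesn't cover radosgw, one workaround for confirming a leak like this is to sample the process's resident set size from /proc over time. A Linux-only sketch (finding the radosgw pid is left out; the growth factor is an arbitrary threshold chosen for illustration):

```python
import os

def rss_kb(pid: int) -> int:
    """Resident set size in kB, read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def looks_leaky(samples: list, factor: float = 2.0) -> bool:
    """Crude heuristic: the last RSS sample in the window grew past
    `factor` times the first sample."""
    return len(samples) >= 2 and samples[-1] > factor * samples[0]

# Sampling our own pid here as a stand-in for the radosgw pid;
# in practice you'd record rss_kb(radosgw_pid) every few minutes.
print(rss_kb(os.getpid()) > 0)          # → True (on Linux)
print(looks_leaky([100_000, 150_000, 260_000]))  # → True
```

Graphing those samples (e.g. via collectd's processes plugin) would show the sawtooth of the OOM-kill/restart cycle described above.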