Re: [ceph-users] Large omap objects - how to fix ?
Thanks for everyone's comments, including the thread hijackers :) I solved this in our infrastructure slightly differently:

1) Find the largest omap object(s):

# for i in `rados -p .bbp-gva-master.rgw.buckets.index ls`; do echo -n "$i:"; rados -p .bbp-gva-master.rgw.buckets.index listomapkeys $i | wc -l; done > omapkeys
# sort -t: -k2 -r -n omapkeys | head -1
.dir.bbp-gva-master.125103342.18:7558822

2) Confirm that the above index is not used by any bucket:

# cat bucketstats
#!/bin/bash
for bucket in $(radosgw-admin bucket list | jq -r .[]); do
    bucket_id=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.bucket_id)
    marker=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.marker)
    echo "$bucket:$bucket_id:$marker"
done
# ./bucketstats > bucketstats.out
# grep 125103342.18 bucketstats.out

3) Delete the rados object:

# rados -p .bbp-gva-master.rgw.buckets.index rm .dir.bbp-gva-master.125103342.18

4) Perform a deep scrub on the PGs that were affected:

# for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail -n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query | grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
137.1b: 1
137.36: 1
# ceph pg deep-scrub 137.1b
# ceph pg deep-scrub 137.36

Kind regards,

Ben Morrice
__________
Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
EPFL / BBP Biotech Campus
Chemin des Mines 9, 1202 Geneva, Switzerland

On 10/31/2018 11:02 AM, Alexandru Cucu wrote:

Hi,

Didn't know that auto resharding does not remove old instances. I wrote my own script for cleanup, as I'd discovered this before reading your message.
Not very well tested, but here it is:

#!/bin/bash
for bucket in $(radosgw-admin bucket list | jq -r .[]); do
    bucket_id=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.bucket_id)
    marker=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.marker)
    for instance in $(radosgw-admin metadata list bucket.instance | jq -r .[] | grep "^${bucket}:" | grep -v ${bucket_id} | grep -v ${marker} | cut -f2 -d':'); do
        radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
        radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
    done
done

On Tue, Oct 30, 2018 at 3:30 PM Tomasz Płaza wrote:

Hi hijackers,

Please read: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030317.html

TL;DR: Ceph should reshard big indexes, but after that it leaves the old ones to be removed manually. Starting from some version, deep-scrub reports indexes above some threshold as HEALTH_WARN; you should find them in the OSD logs. If you do not have the logs, just run listomapkeys on every object in default.rgw.buckets.index and find the biggest ones... it should be safe to remove those (radosgw-admin bi purge), but I cannot guarantee it.

On 26.10.2018 at 17:18, Florian Engelmann wrote:

Hi,

hijacking the hijacker! Sorry!
radosgw-admin bucket reshard --bucket somebucket --num-shards 8
*** NOTICE: operation will not remove old bucket index objects ***
***         these will need to be removed manually             ***
tenant:
bucket name: somebucket
old bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.1923153.1
new bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.3119759.1
total entries: 1000 2000 3000 ... 206000 207000 207660

What to do now? ceph -s is still:

    health: HEALTH_WARN
            1 large omap objects

But I have no idea how to act on:

*** NOTICE: operation will not remove old bucket index objects ***
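Tying the two answers above together: step 1 in Ben's procedure produces a file of "object:keycount" lines, and only the objects above the large-omap threshold matter. A minimal filter, sketched here under the assumptions that the file is named `omapkeys` and that the warning threshold is 2,000,000 keys (the usual default in recent releases — check `osd_deep_scrub_large_omap_object_key_threshold` on your own cluster):

```shell
# Print index objects whose omap key count exceeds a threshold, given a
# file of "<object>:<keycount>" lines as produced by the listomapkeys
# loop above. File name and 2,000,000 default are assumptions.
large_omaps() {
    local file=$1 threshold=${2:-2000000}
    awk -F: -v t="$threshold" '$NF > t' "$file" | sort -t: -k2 -rn
}
```

Run against the `omapkeys` file from step 1, this prints the candidates for the bucketstats cross-check in step 2, largest first.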
[ceph-users] Large omap objects - how to fix ?
Hello all,

After a recent Luminous upgrade (now running 12.2.8 with all OSDs migrated to bluestore, upgraded from 11.2.0 running filestore) I am currently seeing the warning 'large omap objects'. I know this is related to large buckets in radosgw, and that Luminous supports 'dynamic sharding', but I feel that something is missing from our configuration and I'm a bit confused about the right approach to fix it.

First a bit of background info: we previously had a multi-site radosgw installation, but recently we decommissioned the second site. With the radosgw multi-site configuration we had 'bucket_index_max_shards = 0'. Since decommissioning the second site, I have removed the secondary zonegroup and changed 'bucket_index_max_shards' to 16 for the single primary zone.

None of our buckets has a 'num_shards' field when running 'radosgw-admin bucket stats --bucket '. Is this normal?

Also, I'm finding it difficult to find out exactly what to do with the buckets that are affected by 'large omap' (see commands below). My interpretation of 'search the cluster log' is also listed below. What do I need to do with the buckets below to get back to an overall ceph HEALTH_OK state? :)

# ceph health detail
HEALTH_WARN 2 large omap objects
2 large objects found in pool '.bbp-gva-master.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.
# ceph osd pool get .bbp-gva-master.rgw.buckets.index pg_num
pg_num: 64

# for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail -n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query | grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
137.1b: 1
137.36: 1

# cat buckets
#!/bin/bash
buckets=`radosgw-admin metadata list bucket | grep \" | cut -d\" -f2`
for i in $buckets
do
    id=`radosgw-admin bucket stats --bucket $i | grep \"id\" | cut -d\" -f4`
    pg=`ceph osd map .bbp-gva-master.rgw.buckets.index ${id} | awk '{print $11}' | cut -d\( -f2 | cut -d\) -f1`
    echo "$i:$id:$pg"
done

# ./buckets > pglist
# egrep '137.1b|137.36' pglist | wc -l
192

The following doesn't appear to change anything:

# for bucket in `cut -d: -f1 pglist`; do radosgw-admin reshard add --bucket $bucket --num-shards 8; done
# radosgw-admin reshard process

--
Kind regards,
Ben Morrice

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
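A side note on the `--num-shards 8` chosen above: the usual rule of thumb (and the dynamic-resharding target, `rgw_max_objs_per_shard`, which defaults to 100,000) is roughly 100k objects per index shard. A minimal sketch of that arithmetic, assuming you feed it the `num_objects` figure from `radosgw-admin bucket stats`:

```shell
# Suggest a shard count for a bucket: ceil(objects / per_shard), with
# per_shard defaulting to the 100,000-objects-per-shard rule of thumb.
suggest_shards() {
    local objects=$1 per_shard=${2:-100000}
    echo $(( (objects + per_shard - 1) / per_shard ))
}
```

For example, a bucket with 7,558,822 entries (the key count found in this thread) would want on the order of 76 shards, not 8.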
[ceph-users] slow requests and degraded cluster, but not really ?
Hello all,

We have an issue with our ceph cluster where 'ceph -s' shows that several requests are blocked, but querying further with 'ceph health detail' indicates that the affected PGs are either active+clean or do not currently exist. OSD 32 appears to be working fine, and the cluster is performing as expected with no clients seemingly affected.

Note: we had just upgraded to Luminous, and despite having "mon max pg per osd = 400" set in ceph.conf, we still have the message "too many PGs per OSD (278 > max 200)".

In order to improve the situation above, I removed several pools that were not used anymore. I assume the PGs that ceph cannot find now are related to this pool deletion. Does anyone have any ideas on how to get out of this state? Details below, and the full 'ceph health detail' is attached to this email.

Kind regards,
Ben Morrice

[root@ceph03 ~]# ceph -s
  cluster:
    id: 6c21c4ba-9c4d-46ef-93a3-441b8055cdc6
    health: HEALTH_WARN
            Degraded data redundancy: 443765/14311983 objects degraded (3.101%), 162 pgs degraded, 241 pgs undersized
            75 slow requests are blocked > 32 sec.
            Implicated osds 32
            too many PGs per OSD (278 > max 200)

  services:
    mon: 5 daemons, quorum bbpocn01,bbpocn02,bbpocn03,bbpocn04,bbpocn07
    mgr: bbpocn03(active, starting)
    osd: 36 osds: 36 up, 36 in
    rgw: 1 daemon active

  data:
    pools: 24 pools, 3440 pgs
    objects: 4.77M objects, 7.69TiB
    usage: 23.1TiB used, 104TiB / 127TiB avail
    pgs: 443765/14311983 objects degraded (3.101%)
         3107 active+clean
         170  active+undersized
         109  active+undersized+degraded
         43   active+recovery_wait+degraded
         10   active+recovering+degraded
         1    active+recovery_wait

[root@ceph03 ~]# for i in `ceph health detail | grep stuck | awk '{print $2}'`; do echo -n "$i: "; ceph pg $i query -f plain | cut -d: -f2 | cut -d\" -f2; done
150.270: active+clean
150.2a0: active+clean
150.2b6: active+clean
150.2c2: active+clean
150.2cc: active+clean
150.2d5: active+clean
150.2d6: active+clean
150.2e1: active+clean
150.2ef: active+clean
150.2f5: active+clean
150.2f7: active+clean
150.2fc: active+clean
150.315: active+clean
150.318: active+clean
150.31a: active+clean
150.320: active+clean
150.326: active+clean
150.36e: active+clean
150.380: active+clean
150.389: active+clean
150.3a4: active+clean
150.3ad: active+clean
150.3b4: active+clean
150.3bb: active+clean
150.3ce: active+clean
150.3d0: active+clean
150.3d8: active+clean
150.3e0: active+clean
150.3f6: active+clean
165.24c: Error ENOENT: problem getting command descriptions from pg.165.24c
165.28f: Error ENOENT: problem getting command descriptions from pg.165.28f
165.2b3: Error ENOENT: problem getting command descriptions from pg.165.2b3
165.2b4: Error ENOENT: problem getting command descriptions from pg.165.2b4
165.2d6: Error ENOENT: problem getting command descriptions from pg.165.2d6
165.2f4: Error ENOENT: problem getting command descriptions from pg.165.2f4
165.2fd: Error ENOENT: problem getting command descriptions from pg.165.2fd
165.30f: Error ENOENT: problem getting command descriptions from pg.165.30f
165.322: Error ENOENT: problem getting command descriptions from pg.165.322
165.325: Error ENOENT: problem getting command descriptions from pg.165.325
165.334: Error ENOENT: problem getting command descriptions from pg.165.334
165.36e: Error ENOENT: problem getting command descriptions from pg.165.36e
165.37c: Error ENOENT: problem getting command descriptions from pg.165.37c
165.382: Error ENOENT: problem getting command descriptions from pg.165.382
165.387: Error ENOENT: problem getting command descriptions from pg.165.387
165.3af: Error ENOENT: problem getting command descriptions from pg.165.3af
165.3da: Error ENOENT: problem getting command descriptions from pg.165.3da
165.3e0: Error ENOENT: problem getting command descriptions from pg.165.3e0
165.3e2: Error ENOENT: problem getting command descriptions from pg.165.3e2
165.3e9: Error ENOENT: problem getting command descriptions from pg.165.3e9
165.3fb: Error ENOENT: problem getting command descriptions from pg.165.3fb

[root@ceph03 ~]# ceph pg 165.24c query
Error ENOENT: problem getting command descriptions from pg.165.24c
[root@ceph03 ~]# ceph pg 165.24c delete
Error ENOENT: problem getting command descriptions from pg.165.24c

--
Kind regards,
Ben Morrice

HEALTH_WARN Degraded data redundancy: 443765/14311983 objects degraded (3.101%), 162 pgs degraded, 241 pgs undersized; 75 slow requests are blocked > 32 sec. Implicated osds 32; too many PGs per OSD (278 > max 200)
pg 150.270 is stuck undersized for 1871.987162, current state
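Since the PG id prefix before the dot is the pool id, the ENOENT PGs (all in pool 165, the deleted pool) can be separated from the healthy ones by tallying the stuck-PG lines from 'ceph health detail' per pool. A small sketch, assuming the health-detail output has been saved to a file:

```shell
# Count stuck-PG lines from a saved 'ceph health detail' per pool id,
# so PGs of deleted pools (which return ENOENT on 'ceph pg ... query')
# stand out. The input file name is illustrative.
stuck_pgs_by_pool() {
    awk '/is stuck/ {split($2, a, "."); count[a[1]]++}
         END {for (p in count) print p, count[p]}' "$1" | sort -n
}
```

Any pool id in the output that no longer appears in 'ceph osd pool ls detail' points at PGs left over from a pool deletion.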
[ceph-users] Ceph re-ip of OSD node
Hello,

We have a small cluster that we need to move to a different network in the same datacentre. My workflow was the following (for a single OSD host), but I failed (further details below):

1) ceph osd set noout
2) stop ceph-osd processes
3) change IP, gateway, domain (short hostname is the same), VLAN
4) change references of the OLD IP (cluster and public network) in /etc/ceph/ceph.conf to the NEW IP (see [1])
5) start a single OSD process

This seems to work, as the NEW IP can communicate with the mon hosts and osd hosts on the OLD network; the OSD is booted and is visible via 'ceph -w'. However, after a few seconds the OSD drops, with messages such as the below in its log file:

heartbeat_check: no reply from 10.1.1.100:6818 osd.14 ever on either front or back, first ping sent 2017-08-30 16:42:14.692210 (cutoff 2017-08-30 16:42:24.962245)

There are logs like the above for every OSD server/process, and then eventually:

2017-08-30 16:42:14.486275 7f6d2c966700 0 log_channel(cluster) log [WRN] : map e85351 wrongly marked me down

Am I missing something obvious to reconfigure the network on an OSD host?

[1]
OLD
[osd.0]
host = sn01
devs = /dev/sdi
cluster addr = 10.1.1.101
public addr = 10.1.1.101

NEW
[osd.0]
host = sn01
devs = /dev/sdi
cluster addr = 10.1.2.101
public addr = 10.1.2.101

--
Kind regards,
Ben Morrice
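Step 4 of the workflow above (rewriting the addr lines in ceph.conf) can be done mechanically and previewed before touching /etc/ceph/ceph.conf. A minimal sketch, with the subnets taken from the [1] example — adapt them to your own, and note this only edits the conf file, not the monmap or the OSDs' bound addresses:

```shell
# Rewrite addresses from the old subnet to the new one, stdin to
# stdout, so the change can be diffed before overwriting the real conf.
# Subnet prefixes mirror the example above and are assumptions.
reip_conf() {
    sed -e 's/10\.1\.1\./10.1.2./g'
}
```

Typical use (not run here): `reip_conf < /etc/ceph/ceph.conf > /tmp/ceph.conf.new && diff /etc/ceph/ceph.conf /tmp/ceph.conf.new`.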
Re: [ceph-users] RGW: Auth error with hostname instead of IP
Hello Eric,

You are probably hitting the git commits listed in this thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-April/017731.html

If this is the same behaviour, your options are:
a) add all fully-qualified names to the 'hostnames' array of your zonegroup(s), or
b) remove 'rgw dns name' from your ceph.conf

Kind regards,
Ben Morrice

On 09/06/17 23:50, Eric Choi wrote:

When I send an RGW request with a hostname (with a port that is not 80), I am seeing a "SignatureDoesNotMatch" error:

GET / HTTP/1.1
Host: cephrgw0002s2mdw1.sendgrid.net:50680
User-Agent: Minio (linux; amd64) minio-go/2.0.4 mc/2017-04-03T18:35:01Z
Authorization: AWS **REDACTED**:**REDACTED**

<?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><RequestId>tx00093e0c1-00593b145c-996aae1-default</RequestId><HostId>996aae1-default-default</HostId></Error>

However, this works fine when I send it with an IP address instead. Is the hostname part of the signature? If so, how can I make it so that it will work with the hostname as well?

Thank you,
Eric
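Option (a) above can be scripted against the JSON that `radosgw-admin zonegroup get` emits. A sketch of the jq edit — the zonegroup name, hostname, and file name are illustrative, and the final `period update --commit` only applies if you run a realm/period (multisite-style) configuration:

```shell
# Append a fully-qualified name to a zonegroup JSON's "hostnames" array
# (deduplicated), reading the JSON on stdin and writing it to stdout.
add_zonegroup_hostname() {
    local fqdn=$1
    jq --arg h "$fqdn" '.hostnames |= (. + [$h] | unique)'
}
# Typical use (not run here):
#   radosgw-admin zonegroup get --rgw-zonegroup=default \
#     | add_zonegroup_hostname cephrgw0002s2mdw1.sendgrid.net > zg.json
#   radosgw-admin zonegroup set --rgw-zonegroup=default < zg.json
#   radosgw-admin period update --commit
```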
Re: [ceph-users] Prometheus RADOSGW usage exporter
Hello Berant,

This is very nice! I've had a play with this against our installation of Ceph, which is Kraken. We had to change the bucket_owner variable to be inside the for loop [1], and we are currently not getting any bytes sent/received statistics, though this is not an issue with your code, as these values are not updated via radosgw-admin either. I think I'm hitting this bug: http://tracker.ceph.com/issues/19194

[1]
for bucket in entry['buckets']:
    print bucket
    bucket_owner = bucket['owner']

Kind regards,
Ben Morrice

On 25/05/17 16:25, Berant Lemmenes wrote:

Hello all,

I've created a prometheus exporter that scrapes the RADOSGW Admin Ops API and exports the usage information for all users and buckets. This is my first prometheus exporter, so if anyone has feedback I'd greatly appreciate it. I've tested it against Hammer; looking at the docs it should work fine for Jewel as well, and I will shortly test that.
https://github.com/blemmenes/radosgw_usage_exporter

Sample output:

radosgw_usage_successful_ops_total{bucket="shard0",category="create_bucket",owner="testuser"} 1.0
radosgw_usage_successful_ops_total{bucket="shard0",category="delete_obj",owner="testuser"} 1094978.0
radosgw_usage_successful_ops_total{bucket="shard0",category="list_bucket",owner="testuser"} 2276.0
radosgw_usage_successful_ops_total{bucket="shard0",category="put_obj",owner="testuser"} 1094978.0
radosgw_usage_successful_ops_total{bucket="shard0",category="stat_bucket",owner="testuser"} 20.0
radosgw_usage_received_bytes_total{bucket="shard0",category="create_bucket",owner="testuser"} 0.0
radosgw_usage_received_bytes_total{bucket="shard0",category="delete_obj",owner="testuser"} 0.0
radosgw_usage_received_bytes_total{bucket="shard0",category="list_bucket",owner="testuser"} 0.0
radosgw_usage_received_bytes_total{bucket="shard0",category="put_obj",owner="testuser"} 6352678.0
radosgw_usage_received_bytes_total{bucket="shard0",category="stat_bucket",owner="testuser"} 0.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="create_bucket",owner="testuser"} 19.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="delete_obj",owner="testuser"} 0.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="list_bucket",owner="testuser"} 638339458.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="put_obj",owner="testuser"} 79.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="stat_bucket",owner="testuser"} 380.0
radosgw_usage_ops_total{bucket="shard0",category="create_bucket",owner="testuser"} 1.0
radosgw_usage_ops_total{bucket="shard0",category="delete_obj",owner="testuser"} 1094978.0
radosgw_usage_ops_total{bucket="shard0",category="list_bucket",owner="testuser"} 2276.0
radosgw_usage_ops_total{bucket="shard0",category="put_obj",owner="testuser"} 1094979.0
radosgw_usage_ops_total{bucket="shard0",category="stat_bucket",owner="testuser"} 20.0

Thanks,
Berant
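The per-category counters in the sample output above can be collapsed into a single total-ops figure per bucket for a quick sanity check against `radosgw-admin usage show`. A sketch working on Prometheus text-format lines; the metric and label names follow the sample output:

```shell
# Sum radosgw_usage_ops_total across categories, per bucket, from
# Prometheus text-format input (files or stdin).
total_ops_per_bucket() {
    awk 'index($0, "radosgw_usage_ops_total{") == 1 {
             # Pull the bucket="..." label value out of the label set.
             if (match($0, /bucket="[^"]*"/)) {
                 b = substr($0, RSTART + 8, RLENGTH - 9)
                 sum[b] += $NF
             }
         }
         END { for (b in sum) printf "%s %.0f\n", b, sum[b] }' "$@"
}
```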
Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hello again,

I can work around this issue. If the host header is an IP address, the request is not treated as a virtual-hosted one, so if I auth directly to my backends via IP, things work as expected.

Kind regards,
Ben Morrice

On 28/04/17 09:26, Ben Morrice wrote:

Hello Radek,

Thanks again for your analysis. I can confirm that on 10.2.7, if I remove the conf "rgw dns name", I can auth directly to the radosgw host. In our environment we terminate SSL and route connections via haproxy, but it's still sometimes useful to be able to communicate directly to the backend radosgw server.

It seems that it's not possible to set multiple "rgw dns name" entries in ceph.conf. Is the only solution to modify the zonegroup and populate the 'hostnames' array with all backend server hostnames as well as the hostname terminated by haproxy?

Kind regards,
Ben Morrice

On 27/04/17 13:53, Radoslaw Zarzynski wrote:

Bingo! From the 10.2.5-admin:

GET
Thu, 27 Apr 2017 07:49:59 GMT
/

And also:

2017-04-27 09:49:59.117447 7f4a90ff9700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
2017-04-27 09:49:59.117449 7f4a90ff9700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/

The most interesting part is the "final ... in_hosted_domain=0". It looks like we need to dig around RGWREST::preprocess(), rgw_find_host_in_domains() & company. There is a commit introduced in v10.2.6 that touches this area [1]. I'm definitely not saying it's the root cause; it might be that a change in the code just unhid a configuration issue [2]. I will talk about the problem in today's sync-up. Thanks for the logs!
Regards,
Radek

[1] https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
[2] http://tracker.ceph.com/issues/17440

On Thu, Apr 27, 2017 at 10:11 AM, Ben Morrice <ben.morr...@epfl.ch> wrote:

Hello Radek,

Thank you for your analysis so far! Please find attached logs for both the admin user and a keystone-backed user from 10.2.5 (same host as before; I have simply downgraded the packages). Both users can authenticate and list buckets on 10.2.5. Also, I tried version 10.2.6 and see the same behavior as 10.2.7, so the bug I'm hitting looks like it was introduced in 10.2.6.

Kind regards,
Ben Morrice

On 27/04/17 04:45, Radoslaw Zarzynski wrote:

Thanks for the logs, Ben.

It looks like two completely different authenticators have failed: the local, RADOS-backed auth (admin.txt) and the Keystone-based one as well. In the second case I'm pretty sure that Keystone has rejected [1][2] the provided signature/StringToSign. RGW tried to fall back to the local auth, which obviously didn't have any chance, as the credentials were stored remotely. This explains the presence of "error reading user info" in user-keystone.txt.

What is common to both scenarios are the low-level things related to StringToSign crafting/signature generation on RadosGW's side. The following one has been composed for the request from admin.txt:

GET
Wed, 26 Apr 2017 09:18:42 GMT
/bbpsrvc15.cscs.ch/

If you could provide a similar log from v10.2.5, I would be really grateful.

Regards,
Radek

[1] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_rest_s3.cc#L3269-L3272
[2] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_common.h#L170

On Wed, Apr 26, 2017 at 11:29 AM, Morrice Ben <ben.morr...@epfl.ch> wrote:

Hello Radek,

Please find attached the failed request for both the admin user and a standard user (backed by keystone).
Kind regards,
Ben Morrice

From: Radoslaw Zarzynski <rzarzyn...@mirantis.com>
Sent: Tuesday, April 25, 2017 7:38 PM
To: Morrice Ben
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

Hello Ben,

Could you provide a full RadosGW log for the failed request? I mean the lines starting from the header listing, through the start marker ("== starting new request..."), till the end marker. At the moment we can't see any details related to the signature calculation.
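The two StringToSign values quoted in this thread ("/" on 10.2.5 vs "/bbpsrvc15.cscs.ch/" on 10.2.7, where the host was wrongly folded into the canonicalized resource) explain the mismatch directly, since an AWS-v2-style signature is just base64(HMAC-SHA1(secret_key, StringToSign)). A sketch of that computation — the secret key and the exact blank-line layout of StringToSign here are illustrative, not taken from the thread:

```shell
# Compute an AWS v2-style signature from a StringToSign:
# base64(HMAC-SHA1(secret, StringToSign)). Secret is a made-up example.
sign_v2() {
    local secret=$1 string_to_sign=$2
    printf '%s' "$string_to_sign" \
        | openssl dgst -sha1 -hmac "$secret" -binary \
        | openssl base64
}
```

With the same secret and date, the two resource variants necessarily produce different signatures, which is exactly the SignatureDoesNotMatch / "compare=34" behaviour seen in the logs.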
Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hello Radek,

Thanks again for your analysis. I can confirm that on 10.2.7, if I remove the conf "rgw dns name", I can auth directly to the radosgw host. In our environment we terminate SSL and route connections via haproxy, but it's still sometimes useful to be able to communicate directly to the backend radosgw server.

It seems that it's not possible to set multiple "rgw dns name" entries in ceph.conf. Is the only solution to modify the zonegroup and populate the 'hostnames' array with all backend server hostnames as well as the hostname terminated by haproxy?

Kind regards,
Ben Morrice

On 27/04/17 13:53, Radoslaw Zarzynski wrote:

Bingo! From the 10.2.5-admin:

GET
Thu, 27 Apr 2017 07:49:59 GMT
/

And also:

2017-04-27 09:49:59.117447 7f4a90ff9700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
2017-04-27 09:49:59.117449 7f4a90ff9700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/

The most interesting part is the "final ... in_hosted_domain=0". It looks like we need to dig around RGWREST::preprocess(), rgw_find_host_in_domains() & company. There is a commit introduced in v10.2.6 that touches this area [1]. I'm definitely not saying it's the root cause; it might be that a change in the code just unhid a configuration issue [2]. I will talk about the problem in today's sync-up. Thanks for the logs!

Regards,
Radek

[1] https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7f4c6d0
[2] http://tracker.ceph.com/issues/17440

On Thu, Apr 27, 2017 at 10:11 AM, Ben Morrice <ben.morr...@epfl.ch> wrote:

Hello Radek,

Thank you for your analysis so far! Please find attached logs for both the admin user and a keystone-backed user from 10.2.5 (same host as before; I have simply downgraded the packages).
Both users can authenticate and list buckets on 10.2.5. Also, I tried version 10.2.6 and see the same behavior as 10.2.7, so the bug I'm hitting looks like it was introduced in 10.2.6.

Kind regards,
Ben Morrice

On 27/04/17 04:45, Radoslaw Zarzynski wrote:

Thanks for the logs, Ben.

It looks like two completely different authenticators have failed: the local, RADOS-backed auth (admin.txt) and the Keystone-based one as well. In the second case I'm pretty sure that Keystone has rejected [1][2] the provided signature/StringToSign. RGW tried to fall back to the local auth, which obviously didn't have any chance, as the credentials were stored remotely. This explains the presence of "error reading user info" in user-keystone.txt.

What is common to both scenarios are the low-level things related to StringToSign crafting/signature generation on RadosGW's side. The following one has been composed for the request from admin.txt:

GET
Wed, 26 Apr 2017 09:18:42 GMT
/bbpsrvc15.cscs.ch/

If you could provide a similar log from v10.2.5, I would be really grateful.

Regards,
Radek

[1] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_rest_s3.cc#L3269-L3272
[2] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_common.h#L170

On Wed, Apr 26, 2017 at 11:29 AM, Morrice Ben <ben.morr...@epfl.ch> wrote:

Hello Radek,

Please find attached the failed request for both the admin user and a standard user (backed by keystone).

Kind regards,
Ben Morrice

From: Radoslaw Zarzynski <rzarzyn...@mirantis.com>
Sent: Tuesday, April 25, 2017 7:38 PM
To: Morrice Ben
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hello Ben,

Could you provide a full RadosGW log for the failed request? I mean the lines starting from the header listing, through the start marker ("== starting new request..."), till the end marker. At the moment we can't see any details related to the signature calculation.

Regards,
Radek

On Thu, Apr 20, 2017 at 5:08 PM, Ben Morrice <ben.morr...@epfl.ch> wrote:

Hi all,

I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7) and authentication is in a very bad state. This installation is part of a multi-gateway configuration, and I have just updated one host in the secondary zone (all other hosts/zones are running 10.2.5). On the 10.2.7 server I cannot authenticate as a user
Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hello Radek,

Thank you for your analysis so far! Please find attached logs for both the admin user and a keystone-backed user from 10.2.5 (same host as before; I have simply downgraded the packages). Both users can authenticate and list buckets on 10.2.5. Also, I tried version 10.2.6 and see the same behavior as 10.2.7, so the bug I'm hitting looks like it was introduced in 10.2.6.

Kind regards,
Ben Morrice

On 27/04/17 04:45, Radoslaw Zarzynski wrote:

Thanks for the logs, Ben.

It looks like two completely different authenticators have failed: the local, RADOS-backed auth (admin.txt) and the Keystone-based one as well. In the second case I'm pretty sure that Keystone has rejected [1][2] the provided signature/StringToSign. RGW tried to fall back to the local auth, which obviously didn't have any chance, as the credentials were stored remotely. This explains the presence of "error reading user info" in user-keystone.txt.

What is common to both scenarios are the low-level things related to StringToSign crafting/signature generation on RadosGW's side. The following one has been composed for the request from admin.txt:

GET
Wed, 26 Apr 2017 09:18:42 GMT
/bbpsrvc15.cscs.ch/

If you could provide a similar log from v10.2.5, I would be really grateful.

Regards,
Radek

[1] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_rest_s3.cc#L3269-L3272
[2] https://github.com/ceph/ceph/blob/v10.2.7/src/rgw/rgw_common.h#L170

On Wed, Apr 26, 2017 at 11:29 AM, Morrice Ben <ben.morr...@epfl.ch> wrote:

Hello Radek,

Please find attached the failed request for both the admin user and a standard user (backed by keystone).
Kind regards,
Ben Morrice

From: Radoslaw Zarzynski <rzarzyn...@mirantis.com>
Sent: Tuesday, April 25, 2017 7:38 PM
To: Morrice Ben
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

Hello Ben,

Could you provide a full RadosGW log for the failed request? I mean the lines starting from the header listing, through the start marker ("== starting new request..."), till the end marker. At the moment we can't see any details related to the signature calculation.

Regards,
Radek

On Thu, Apr 20, 2017 at 5:08 PM, Ben Morrice <ben.morr...@epfl.ch> wrote:

Hi all,

I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7) and authentication is in a very bad state. This installation is part of a multi-gateway configuration, and I have just updated one host in the secondary zone (all other hosts/zones are running 10.2.5).

On the 10.2.7 server I cannot authenticate as a user (normally backed by OpenStack Keystone), but even worse I can also not authenticate with an admin user. Please see [1] for the results of performing a list-bucket operation with python boto (the script works against rgw 10.2.5).

Also, if I try to authenticate from the 'master' rgw zone with "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get:

"ERROR: failed to fetch datalog info"
"failed to retrieve sync info: (13) Permission denied"

The above errors correlate to the errors in the log on the server running 10.2.7 (debug level 20) at [2].

I'm not sure what I have done wrong or can try next?
By the way, downgrading the packages from 10.2.7 to 10.2.5 returns authentication functionality.

[1]
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><RequestId>tx4-0058f8c86a-3fa2959-bbp-gva-secondary</RequestId><HostId>3fa2959-bbp-gva-secondary-bbp-gva</HostId></Error>

[2]
/bbpsrvc15.cscs.ch/admin/log
2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM=
2017-04-20 16:43:04.916255 7ff87c6c0700 15 auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU=
2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34
2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request
2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER: err_no=-2027 new_err_no=-2027
2017-04-20 16:43:04.916329 7ff87c6c0700 2 req 354:0.052585:s3:GET /admin/log:get_obj:op status=0
2017-04-20 16:43:04.916339 7ff87c6c0700 2 req 354:0.052595:s3:GET /admin/log:get_obj:http status=403
2017-04-20 16:43:04.916343 7ff87c6c0700 1 == req done req=0x7ff87c6ba710 op status=0 http_status=403 ==
2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() returned -2027
2017-04-20 16:43:04.916390 7ff87c6c0700 1 civetweb: 0x7ff990015610: 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1" 403 0 - -
2017-04-20 16:43:
Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hello Orit, Could it be that something has changed in 10.2.5+ which is related to reading the endpoints from the zone/period config? In my master zone I have specified the endpoint with a trailing backslash (which is also escaped); however, I do not define the secondary endpoint this way. Am I hitting a bug here? Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 21/04/17 09:36, Ben Morrice wrote: Hello Orit, Please find attached the output from the radosgw commands and the relevant section from ceph.conf (radosgw) bbp-gva-master is running 10.2.5 bbp-gva-secondary is running 10.2.7 Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 21/04/17 07:55, Orit Wasserman wrote: Hi Ben, On Thu, Apr 20, 2017 at 6:08 PM, Ben Morrice <ben.morr...@epfl.ch> wrote: Hi all, I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7) and authentication is in a very bad state. This installation is part of a multigw configuration, and I have just updated one host in the secondary zone (all other hosts/zones are running 10.2.5). On the 10.2.7 server I cannot authenticate as a user (normally backed by OpenStack Keystone), but even worse I can also not authenticate with an admin user. Please see [1] for the results of performing a list bucket operation with python boto (the script works against rgw 10.2.5). Also, if I try to authenticate from the 'master' rgw zone with a "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get: "ERROR: failed to fetch datalog info" "failed to retrieve sync info: (13) Permission denied" The above errors correlate to the errors in the log on the server running 10.2.7 (debug level 20) at [2]. I'm not sure what I have done wrong or what to try next?
By the way, downgrading the packages from 10.2.7 to 10.2.5 returns authentication functionality. Can you provide the following info: radosgw-admin period get radosgw-admin zonegroup get radosgw-admin zone get Can you provide your ceph.conf? Thanks, Orit [1] boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden SignatureDoesNotMatchtx4-0058f8c86a-3fa2959-bbp-gva-secondary3fa2959-bbp-gva-secondary-bbp-gva [2] /bbpsrvc15.cscs.ch/admin/log 2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34 2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request 2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER: err_no=-2027 new_err_no=-2027 2017-04-20 16:43:04.916329 7ff87c6c0700 2 req 354:0.052585:s3:GET /admin/log:get_obj:op status=0 2017-04-20 16:43:04.916339 7ff87c6c0700 2 req 354:0.052595:s3:GET /admin/log:get_obj:http status=403 2017-04-20 16:43:04.916343 7ff87c6c0700 1 == req done req=0x7ff87c6ba710 op status=0 http_status=403 == 2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() returned -2027 2017-04-20 16:43:04.916390 7ff87c6c0700 1 civetweb: 0x7ff990015610: 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1" 403 0 - - 2017-04-20 16:43:04.917212 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate() 2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync: incremental_sync:1544: shard_id=20 mdlog_marker=1_1492686039.901886_5551978.1 sync_marker.marker=1_1492686039.901886_5551978.1 period_marker= 2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync: incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20 2017-04-20 16:43:04.917236 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917238 7ff9777e6700 20 rgw meta sync: 
operate: shard_id=20: init request 2017-04-20 16:43:04.917240 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917241 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status 2017-04-20 16:43:04.917303 7ff9777e6700 20 run: stack=0x7ff97000d420 is io blocked 2017-04-20 16:43:04.918285 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.918295 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status complete 2017-04-20 16:43:04.918307 7ff9777e6700 20 rgw meta sync: shard_id=20 marker=1_1492686039.901886_5551978.1 last_update=2017-04-20 13:00:39.0.901886s 2017-04-20 16:43:04.918316 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff9700
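For collecting the period/zonegroup/zone outputs requested above in one go, a small wrapper like this can help (a sketch; it simply concatenates the command outputs into one file to attach to the list, and notes a failure instead of aborting if radosgw-admin is unavailable or errors out):

```shell
# Collect the three radosgw-admin config dumps into a single diagnostics file.
out=rgw-diag.txt
: > "$out"
for sub in "period get" "zonegroup get" "zone get"; do
  echo "== radosgw-admin $sub ==" >> "$out"
  radosgw-admin $sub >> "$out" 2>&1 || echo "(command failed)" >> "$out"
done
grep -c '^== radosgw-admin' "$out"
```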
Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hello Orit, Please find attached the output from the radosgw commands and the relevant section from ceph.conf (radosgw). bbp-gva-master is running 10.2.5 bbp-gva-secondary is running 10.2.7 Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 21/04/17 07:55, Orit Wasserman wrote: Hi Ben, On Thu, Apr 20, 2017 at 6:08 PM, Ben Morrice <ben.morr...@epfl.ch> wrote: Hi all, I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7) and authentication is in a very bad state. This installation is part of a multigw configuration, and I have just updated one host in the secondary zone (all other hosts/zones are running 10.2.5). On the 10.2.7 server I cannot authenticate as a user (normally backed by OpenStack Keystone), but even worse I can also not authenticate with an admin user. Please see [1] for the results of performing a list bucket operation with python boto (the script works against rgw 10.2.5). Also, if I try to authenticate from the 'master' rgw zone with a "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get: "ERROR: failed to fetch datalog info" "failed to retrieve sync info: (13) Permission denied" The above errors correlate to the errors in the log on the server running 10.2.7 (debug level 20) at [2]. I'm not sure what I have done wrong or what to try next? By the way, downgrading the packages from 10.2.7 to 10.2.5 returns authentication functionality. Can you provide the following info: radosgw-admin period get radosgw-admin zonegroup get radosgw-admin zone get Can you provide your ceph.conf? 
Thanks, Orit [1] boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden SignatureDoesNotMatchtx4-0058f8c86a-3fa2959-bbp-gva-secondary3fa2959-bbp-gva-secondary-bbp-gva [2] /bbpsrvc15.cscs.ch/admin/log 2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34 2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request 2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER: err_no=-2027 new_err_no=-2027 2017-04-20 16:43:04.916329 7ff87c6c0700 2 req 354:0.052585:s3:GET /admin/log:get_obj:op status=0 2017-04-20 16:43:04.916339 7ff87c6c0700 2 req 354:0.052595:s3:GET /admin/log:get_obj:http status=403 2017-04-20 16:43:04.916343 7ff87c6c0700 1 == req done req=0x7ff87c6ba710 op status=0 http_status=403 == 2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() returned -2027 2017-04-20 16:43:04.916390 7ff87c6c0700 1 civetweb: 0x7ff990015610: 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1" 403 0 - - 2017-04-20 16:43:04.917212 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate() 2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync: incremental_sync:1544: shard_id=20 mdlog_marker=1_1492686039.901886_5551978.1 sync_marker.marker=1_1492686039.901886_5551978.1 period_marker= 2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync: incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20 2017-04-20 16:43:04.917236 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917238 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: init request 2017-04-20 16:43:04.917240 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917241 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status 
2017-04-20 16:43:04.917303 7ff9777e6700 20 run: stack=0x7ff97000d420 is io blocked 2017-04-20 16:43:04.918285 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.918295 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status complete 2017-04-20 16:43:04.918307 7ff9777e6700 20 rgw meta sync: shard_id=20 marker=1_1492686039.901886_5551978.1 last_update=2017-04-20 13:00:39.0.901886s 2017-04-20 16:43:04.918316 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.918317 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: sending rest request 2017-04-20 16:43:04.918381 7ff9777e6700 20 RGWEnv::set(): HTTP_DATE: Thu Apr 20 14:43:04 2017 2017-04-20 16:43:04.918390 7ff9777e6700 20 > HTTP_DATE -> Thu Apr 20 14:43:04 2017 2017-04-20 16:43:04.918404 7ff9777e6700 10 get_canon_resource(): dest=/admin/log 2017-04-20 16:43:04.918406 7ff9777e6700 10 generated canonical header: GET -- Kind regards, Ben Morrice __________ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 E
[ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hi all, I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7) and authentication is in a very bad state. This installation is part of a multigw configuration, and I have just updated one host in the secondary zone (all other hosts/zones are running 10.2.5). On the 10.2.7 server I cannot authenticate as a user (normally backed by OpenStack Keystone), but even worse I can also not authenticate with an admin user. Please see [1] for the results of performing a list bucket operation with python boto (the script works against rgw 10.2.5). Also, if I try to authenticate from the 'master' rgw zone with a "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get: "ERROR: failed to fetch datalog info" "failed to retrieve sync info: (13) Permission denied" The above errors correlate to the errors in the log on the server running 10.2.7 (debug level 20) at [2]. I'm not sure what I have done wrong or what to try next? By the way, downgrading the packages from 10.2.7 to 10.2.5 returns authentication functionality. [1] boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden SignatureDoesNotMatchtx4-0058f8c86a-3fa2959-bbp-gva-secondary3fa2959-bbp-gva-secondary-bbp-gva [2] /bbpsrvc15.cscs.ch/admin/log 2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34 2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request 2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER: err_no=-2027 new_err_no=-2027 2017-04-20 16:43:04.916329 7ff87c6c0700 2 req 354:0.052585:s3:GET /admin/log:get_obj:op status=0 2017-04-20 16:43:04.916339 7ff87c6c0700 2 req 354:0.052595:s3:GET /admin/log:get_obj:http status=403 2017-04-20 16:43:04.916343 7ff87c6c0700 1 == req done req=0x7ff87c6ba710 op status=0 http_status=403 == 2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() 
returned -2027 2017-04-20 16:43:04.916390 7ff87c6c0700 1 civetweb: 0x7ff990015610: 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1" 403 0 - - 2017-04-20 16:43:04.917212 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate() 2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync: incremental_sync:1544: shard_id=20 mdlog_marker=1_1492686039.901886_5551978.1 sync_marker.marker=1_1492686039.901886_5551978.1 period_marker= 2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync: incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20 2017-04-20 16:43:04.917236 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917238 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: init request 2017-04-20 16:43:04.917240 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917241 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status 2017-04-20 16:43:04.917303 7ff9777e6700 20 run: stack=0x7ff97000d420 is io blocked 2017-04-20 16:43:04.918285 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.918295 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status complete 2017-04-20 16:43:04.918307 7ff9777e6700 20 rgw meta sync: shard_id=20 marker=1_1492686039.901886_5551978.1 last_update=2017-04-20 13:00:39.0.901886s 2017-04-20 16:43:04.918316 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.918317 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: sending rest request 2017-04-20 16:43:04.918381 7ff9777e6700 20 RGWEnv::set(): HTTP_DATE: Thu Apr 20 14:43:04 2017 2017-04-20 16:43:04.918390 7ff9777e6700 20 > HTTP_DATE -> Thu Apr 20 14:43:04 2017 2017-04-20 16:43:04.918404 7ff9777e6700 10 get_canon_resource(): dest=/admin/log 2017-04-20 16:43:04.918406 
7ff9777e6700 10 generated canonical header: GET -- Kind regards, Ben Morrice __________ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph pg inconsistencies - omap data lost
Hi all, We have a weird issue with a few inconsistent PGs. We are running ceph 11.2 on RHEL7. As an example inconsistent PG we have: # rados -p volumes list-inconsistent-obj 4.19 {"epoch":83986,"inconsistents":[{"object":{"name":"rbd_header.08f7fa43a49c7f","nspace":"","locator":"","snap":"head","version":28785242},"errors":[],"union_shard_errors":["omap_digest_mismatch_oi"],"selected_object_info":"4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd od alloc_hint [0 0 0])","shards":[{"osd":10,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0x"},{"osd":20,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0x"},{"osd":29,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0x"}]}]} If I try to repair this PG, I get the following in the OSD logs: 2017-04-04 14:31:37.825833 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 10: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0x from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd od alloc_hint [0 0 0]) 2017-04-04 14:31:37.825863 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 20: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0x from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd od alloc_hint [0 0 0]) 2017-04-04 14:31:37.825870 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 29: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0x from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd od alloc_hint [0 0 
0]) 2017-04-04 14:31:37.825877 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head: failed to pick suitable auth object 2017-04-04 14:32:37.926980 7f2d7cffd700 -1 log_channel(cluster) log [ERR] : 4.19 deep-scrub 3 errors If I list the omapvalues, they are null # rados -p volumes listomapvals rbd_header.08f7fa43a49c7f |wc -l 0 If I list the extended attributes on the filesystem of each OSD that hosts this file, they are indeed empty (all 3 OSDs are the same, but just listing one for brevity) getfattr /var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\\uheader.08f7fa43a49c7f__head_6C8FC219__4 getfattr: Removing leading '/' from absolute path names # file: var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\134uheader.08f7fa43a49c7f__head_6C8FC219__4 user.ceph._ user.ceph._@1 user.ceph._lock.rbd_lock user.ceph.snapset user.cephos.spill_out Is there anything I can do to recover from this situation? -- Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
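When reading reports like the one above, it can help to pull the per-shard details out of the list-inconsistent-obj JSON. A sketch with plain grep/cut (the report body below is abbreviated from the output earlier in this message):

```shell
# Extract the OSD ids carrying inconsistent shards from an abbreviated
# list-inconsistent-obj report (the full report also carries digests/versions).
report='{"shards":[{"osd":10,"errors":["omap_digest_mismatch_oi"]},{"osd":20,"errors":["omap_digest_mismatch_oi"]},{"osd":29,"errors":["omap_digest_mismatch_oi"]}]}'
echo "$report" | grep -o '"osd":[0-9]*' | cut -d: -f2
# -> 10, 20 and 29, one per line
```

Here all three replicas agree with each other (same omap_digest) and only disagree with the recorded object info, which is why repair cannot pick a suitable auth object.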
Re: [ceph-users] Memory leak in radosgw
What version of libcurl are you using? I was hitting this bug with RHEL7/libcurl 7.29, which could also be the culprit in your case. http://tracker.ceph.com/issues/15915 Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL ENT CBS BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 20/10/16 21:41, Trey Palmer wrote: > I've been trying to test radosgw multisite and have a pretty bad memory > leak. It appears to be associated only with multisite sync. > > Multisite works well for a small number of objects. However, it all > fell over when I wrote in 8M 64K objects to two buckets overnight for > testing (via cosbench). > > The leak appears to happen on the multisite transfer source -- that is, the > node where the objects were written originally. The radosgw process > eventually dies, I'm sure via the OOM killer, and systemd restarts it. > Then repeat, though multisite sync pretty much stops at that point. > > I have tried 10.2.2, 10.2.3 and a combination of the two. I'm running on > CentOS 7.2, using civetweb with SSL. I saw that the memory profiler only > works on mon, osd and mds processes. > > Anyone else seen anything like this? > > -- Trey > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
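To check quickly whether an installed libcurl predates a given build, a version comparison with `sort -V` works; the 7.50 threshold below is simply the version the thread later reports upgrading to, not the exact release that fixed the tracker issue:

```shell
# ver_lt A B: true when version string A sorts strictly before B.
ver_lt() { [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -1)" = "$1" ] && [ "$1" != "$2" ]; }

installed="7.29.0"   # on RHEL7 e.g.: rpm -q --qf '%{VERSION}\n' libcurl
if ver_lt "$installed" "7.50.0"; then verdict="older, possibly affected"; else verdict="ok"; fi
echo "libcurl $installed: $verdict"
```

`sort -V` handles multi-digit components correctly (7.9 < 7.10), which a plain lexical comparison would get wrong.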
Re: [ceph-users] RGW multisite replication failures
14 7f845b7fe700 5 bucket sync: sync obj: bbp-gva-master/20160928(@{i=.bbp-gva-secondary.rgw.buckets.index,e=.bbp-gva-secondary.rgw.buckets.extra}.bbp-gva-secondary.rgw.buckets[bbp-gva-master.106061599.1])/20160928-1mb-testfile[null][0] 2016-09-28 16:19:01.969017 7f845b7fe700 5 Sync:bbp-gva-:data:Object:20160928:bbp-gva-master.106061599.1/20160928-1mb-testfile[null][0]:fetch 2016-09-28 16:19:01.969363 7f84913f6700 20 get_obj_state: rctx=0x7f84913f46a0 obj=20160928:20160928-1mb-testfile state=0x7f844c17f348 s->prefetch_data=0 2016-09-28 16:19:01.970699 7f84913f6700 10 get_canon_resource(): dest=/20160928/20160928-1mb-testfile?versionId=null /20160928/20160928-1mb-testfile?versionId=null 2016-09-28 16:19:01.970882 7f84913f6700 20 sending request to https://bbpobjectstorage.epfl.ch:443/20160928/20160928-1mb-testfile?rgwx-zonegroup=bbp-gva=bbp-gva=null 2016-09-28 16:19:02.087169 7f84913f6700 10 received header:x-amz-meta-orig-filename: 20160928-1mb-testfile 2016-09-28 16:19:02.156463 7f845b7fe700 5 Sync:bbp-gva-:data:Object:20160928:bbp-gva-master.106061599.1/20160928-1mb-testfile[null][0]:done, retcode=-5 2016-09-28 16:19:02.156467 7f845b7fe700 0 ERROR: failed to sync object: 20160928:bbp-gva-master.106061599.1/20160928-1mb-testfile 2016-09-28 16:19:02.160115 7f845b7fe700 5 Sync:bbp-gva-:data:Object:20160928:bbp-gva-master.106061599.1/20160928-1mb-testfile[null][0]:finish 2016-09-28 16:19:02.163101 7f845b7fe700 5 Sync:bbp-gva-:data:BucketFull:20160928:bbp-gva-master.106061599.1:finish 2016-09-28 16:19:02.163108 7f845b7fe700 5 full sync on 20160928:bbp-gva-master.106061599.1 failed, retcode=-5 2016-09-28 16:19:02.163111 7f845b7fe700 5 Sync:bbp-gva-:data:Bucket:20160928:bbp-gva-master.106061599.1:finish Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL ENT CBS BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 27/09/16 09:36, Ben Morrice wrote: > Hello Orit, > > Yes, this bug looks to correlate. 
Was this included in 10.2.3? > > I guess not, as I have since updated to 10.2.3 but am getting the same errors > > This bug talks about not retrying after a failure; however, do you know > why the sync fails in the first place? It seems that basically any > object over 500k in size fails :( > > Kind regards, > > Ben Morrice > > __ > Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 > EPFL ENT CBS BBP > Biotech Campus > Chemin des Mines 9 > 1202 Geneva > Switzerland > > On 23/09/16 16:52, Orit Wasserman wrote: >> Hi Ben, >> It seems to be http://tracker.ceph.com/issues/16742. >> It is being backported to jewel http://tracker.ceph.com/issues/16794, >> you can try applying it and see if it helps you. >> >> Regards, >> Orit >> >> On Fri, Sep 23, 2016 at 9:21 AM, Ben Morrice <ben.morr...@epfl.ch> wrote: >>> Hello all, >>> >>> I have two separate ceph (10.2.2) clusters and have configured multisite >>> replication between the two. I can see some buckets get synced, however >>> others do not. >>> >>> Both clusters are RHEL7, and I have upgraded libcurl from 7.29 to 7.50 >>> (to avoid http://tracker.ceph.com/issues/15915). >>> >>> Below is some debug output on the 'secondary' zone (bbp-gva-secondary) >>> after uploading a file to the bucket 'bentest1' on the master >>> zone (bbp-gva-master). >>> >>> This appears to be happening very frequently. The size of my bucket >>> pool in the master is ~120GB, however on the secondary site it's only >>> 5GB so things are not very happy at the moment. >>> >>> What steps can I take to work out why RGW cannot create a lock in the >>> log pool? >>> >>> Is there a way to force a full sync, starting fresh (the secondary site >>> is not advertised to users, thus it's okay to even clean pools to start >>> again)? 
>>> >>> >>> 2016-09-23 09:03:28.498292 7f992e664700 20 execute(): read data: >>> [{"key":6,"val":["bentest1:bbp-gva-master.85732351.16:-1"]}] >>> 2016-09-23 09:03:28.498453 7f992e664700 20 execute(): modified >>> key=bentest1:bbp-gva-master.85732351.16:-1 >>> 2016-09-23 09:03:28.498456 7f992e664700 20 wakeup_data_sync_shards: >>> source_zone=bbp-gva-master, >>> shard_ids={6=bentest1:bbp-gva-master.85732351.16:-1} >>> 2016-09-23 09:03:28.498547 7f9a72ffd700 20 incremental_sync(): async >>> update notification: bentest1:bbp-gva-master.85732351.16:-1 >>> 2016-09-23 09:03:28.499137 7f9a7dffb700 20 get_system_obj_
Re: [ceph-users] RGW multisite replication failures
Hello Orit, Yes, this bug looks to correlate. Was this included in 10.2.3? I guess not, as I have since updated to 10.2.3 but am getting the same errors. This bug talks about not retrying after a failure; however, do you know why the sync fails in the first place? It seems that basically any object over 500k in size fails :( Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL ENT CBS BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 23/09/16 16:52, Orit Wasserman wrote: > Hi Ben, > It seems to be http://tracker.ceph.com/issues/16742. > It is being backported to jewel http://tracker.ceph.com/issues/16794, > you can try applying it and see if it helps you. > > Regards, > Orit > > On Fri, Sep 23, 2016 at 9:21 AM, Ben Morrice <ben.morr...@epfl.ch> wrote: >> Hello all, >> >> I have two separate ceph (10.2.2) clusters and have configured multisite >> replication between the two. I can see some buckets get synced, however >> others do not. >> >> Both clusters are RHEL7, and I have upgraded libcurl from 7.29 to 7.50 >> (to avoid http://tracker.ceph.com/issues/15915). >> >> Below is some debug output on the 'secondary' zone (bbp-gva-secondary) >> after uploading a file to the bucket 'bentest1' on the master >> zone (bbp-gva-master). >> >> This appears to be happening very frequently. The size of my bucket >> pool in the master is ~120GB, however on the secondary site it's only >> 5GB so things are not very happy at the moment. >> >> What steps can I take to work out why RGW cannot create a lock in the >> log pool? >> >> Is there a way to force a full sync, starting fresh (the secondary site >> is not advertised to users, thus it's okay to even clean pools to start >> again)? 
>> >> >> 2016-09-23 09:03:28.498292 7f992e664700 20 execute(): read data: >> [{"key":6,"val":["bentest1:bbp-gva-master.85732351.16:-1"]}] >> 2016-09-23 09:03:28.498453 7f992e664700 20 execute(): modified >> key=bentest1:bbp-gva-master.85732351.16:-1 >> 2016-09-23 09:03:28.498456 7f992e664700 20 wakeup_data_sync_shards: >> source_zone=bbp-gva-master, >> shard_ids={6=bentest1:bbp-gva-master.85732351.16:-1} >> 2016-09-23 09:03:28.498547 7f9a72ffd700 20 incremental_sync(): async >> update notification: bentest1:bbp-gva-master.85732351.16:-1 >> 2016-09-23 09:03:28.499137 7f9a7dffb700 20 get_system_obj_state: >> rctx=0x7f9a3c5f8e08 >> obj=.bbp-gva-secondary.log:bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16 >> state=0x7f9a0c069848 s->prefetch_data=0 >> 2016-09-23 09:03:28.501379 7f9a72ffd700 20 operate(): sync status for >> bucket bentest1:bbp-gva-master.85732351.16:-1: 2 >> 2016-09-23 09:03:28.501433 7f9a877fe700 20 reading from >> .bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16 >> 2016-09-23 09:03:28.501447 7f9a877fe700 20 get_system_obj_state: >> rctx=0x7f9a877fc6d0 >> obj=.bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16 >> state=0x7f9a340cfbe8 s->prefetch_data=0 >> 2016-09-23 09:03:28.503269 7f9a877fe700 20 get_system_obj_state: >> rctx=0x7f9a877fc6d0 >> obj=.bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16 >> state=0x7f9a340cfbe8 s->prefetch_data=0 >> 2016-09-23 09:03:28.510428 7f9a72ffd700 20 sending request to >> https://bbpobjectstorage.epfl.ch:443/admin/log?bucket-instance=bentest1%3Abbp-gva-master.85732351.16=json=034.4578.3=bucket-index=bbp-gva >> 2016-09-23 09:03:28.625755 7f9a72ffd700 20 [inc sync] skipping object: >> bentest1:bbp-gva-master.85732351.16:-1/1m: non-complete operation >> 2016-09-23 09:03:28.625759 7f9a72ffd700 20 [inc sync] syncing object: >> bentest1:bbp-gva-master.85732351.16:-1/1m >> 2016-09-23 09:03:28.625831 7f9a72ffd700 20 
bucket sync single entry >> (source_zone=bbp-gva-master) >> b=bentest1(@{i=.bbp-gva-secondary.rgw.buckets.index,e=.bbp-gva-master.rgw.buckets.extra}.bbp-gva-secondary.rgw.buckets[bbp-gva-master.85732351.16]):-1/1m[0] >> log_entry=036.4586.3 op=0 op_state=1 >> 2016-09-23 09:03:28.625857 7f9a72ffd700 5 bucket sync: sync obj: >> bbp-gva-master/bentest1(@{i=.bbp-gva-secondary.rgw.buckets.index,e=.bbp-gva-master.rgw.buckets.extra}.bbp-gva-secondary.rgw.buckets[bbp-gva-master.85732351.16])/1m[0] >> 2016-09-23 09:03:28.626092 7f9a85ffb700 20 get_obj_state: >> rctx=0x7f9a85ff96a0 obj=bentest1:1m state=0x7f9a30051cf8 s->prefetch_data=0 >> 2016-09-23 09:03:28.626119 7f9a72ffd700 20 s
[ceph-users] RGW multisite replication failures
03:28.731703 7f9a72ffd700 20 cr:s=0x7f9a3c5a4f90:op=0x7f9a3ca75ef0:20RGWContinuousLeaseCR: couldn't lock .bbp-gva-secondary.log:bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16:sync_lock: retcode=-16 2016-09-23 09:03:28.731721 7f9a72ffd700 0 ERROR: incremental sync on bentest1 bucket_id=bbp-gva-master.85732351.16 shard_id=-1 failed, retcode=-16 2016-09-23 09:03:28.758421 7f9a72ffd700 20 store_marker(): updating marker marker_oid=bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16 marker=035.4585.2 2016-09-23 09:03:28.829207 7f9a72ffd700 0 ERROR: failed to sync object: bentest1:bbp-gva-master.85732351.16:-1/1m 2016-09-23 09:03:28.834281 7f9a72ffd700 20 store_marker(): updating marker marker_oid=bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16 marker=036.4586.3 -- Kind regards, Ben Morrice __________ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL ENT CBS BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
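The retcode values in these sync logs are negative errno values, which makes the failures easier to read; a small lookup for the two codes seen in this thread (the errno-to-name mapping is the standard Linux one, not something RGW-specific):

```shell
# retcode=-16 on the sync_lock line is -EBUSY (another worker holds the lock);
# retcode=-5 on "failed to sync object" is -EIO (the fetch itself failed).
strerror() {
  case "$1" in
    5)  echo "EIO (Input/output error)" ;;
    16) echo "EBUSY (Device or resource busy)" ;;
    *)  echo "errno $1" ;;
  esac
}
for rc in -5 -16; do echo "retcode=$rc -> $(strerror $(( -rc )))"; done
```

EBUSY on the sync-status lock matches the "couldn't lock ... sync_lock" line just above, while EIO is the generic error surfaced as "failed to sync object".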
Re: [ceph-users] RGW multisite - second cluster woes
Hello, Looks fine on the first cluster: cluster1# radosgw-admin period get { "id": "6ea09956-60a7-48df-980c-2b5bbf71b565", "epoch": 2, "predecessor_uuid": "80026abd-49f4-436e-844f-f8743685dac5", "sync_status": [ "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "" ], "period_map": { "id": "6ea09956-60a7-48df-980c-2b5bbf71b565", "zonegroups": [ { "id": "rgw1-gva", "name": "rgw1-gva", "api_name": "", "is_master": "true", "endpoints": [], "hostnames": [], "hostnames_s3website": [], "master_zone": "rgw1-gva-master", "zones": [ { "id": "rgw1-gva-master", "name": "rgw1-gva-master", "endpoints": [ "http:\/\/rgw1:80\/" ], "log_meta": "true", "log_data": "true", "bucket_index_max_shards": 0, "read_only": "false" } ], "placement_targets": [ { "name": "default-placement", "tags": [] } ], "default_placement": "default-placement", "realm_id": "b23771d0-6005-41da-8ee0-aec03db510d7" } ], "short_zone_ids": [ { "key": "rgw1-gva-master", "val": 1414621010 } ] }, "master_zonegroup": "rgw1-gva", "master_zone": "rgw1-gva-master", "period_config": { "bucket_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "user_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 } }, "realm_id": "b23771d0-6005-41da-8ee0-aec03db510d7", "realm_name": "gold", "realm_epoch": 2 } And, from the second cluster I get this: cluster2 # radosgw-admin realm pull --url=http://rgw1:80 --access-key=access --secret=secret 2016-08-22 08:48:42.682785 7fc5d3fe29c0 0 error read_lastest_epoch .rgw.root:periods.381464e1-4326-4b6b-9191-35940c4f645f.latest_epoch { "id": "98a7b356-83fd-4d42-b895-b58d45fa4233", "name": "", "current_period": "381464e1-4326-4b6b-9191-35940c4f645f", "epoch": 1 } Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL ENT CBS BBP 
Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 19/08/16 08:46, Shilpa Manjarabad Jagannath wrote: > > - Original Message -
[ceph-users] RGW multisite - second cluster woes
Hello, I am trying to configure a second cluster into an existing Jewel RGW installation. I do not get the expected output when I perform a 'radosgw-admin realm pull'. My realm on the first cluster is called 'gold'; however, when doing a realm pull it doesn't reflect the 'gold' name or id and I get an error related to latest_epoch (?). The documentation seems straightforward, so I'm not quite sure what I'm missing here. Please see below for the full output. # radosgw-admin realm pull --url=http://cluster1:80 --access-key=access --secret=secret 2016-08-18 17:20:09.585261 7fb939d879c0 0 error read_lastest_epoch .rgw.root:periods.8c64a4dd-ccd8-4975-b63b-324fbb24aab6.latest_epoch { "id": "98a7b356-83fd-4d42-b895-b58d45fa4233", "name": "", "current_period": "8c64a4dd-ccd8-4975-b63b-324fbb24aab6", "epoch": 1 } # radosgw-admin period pull --url=http://cluster1:80 --access-key=access --secret=secret 2016-08-18 17:21:33.277719 7f5dbc7849c0 0 error read_lastest_epoch .rgw.root:periods..latest_epoch { "id": "", "epoch": 0, "predecessor_uuid": "", "sync_status": [], "period_map": { "id": "", "zonegroups": [], "short_zone_ids": [] }, "master_zonegroup": "", "master_zone": "", "period_config": { "bucket_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "user_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 } }, "realm_id": "", "realm_name": "", "realm_epoch": 0 } # radosgw-admin realm default --rgw-realm=gold failed to init realm: (2) No such file or directory 2016-08-18 17:21:46.220181 7f720defa9c0 0 error in read_id for id : (2) No such file or directory # radosgw-admin zonegroup default --rgw-zonegroup=us failed to init zonegroup: (2) No such file or directory 2016-08-18 17:22:10.348984 7f9b2da699c0 0 error in read_id for id : (2) No such file or directory -- Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL ENT CBS BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland ___ ceph-users mailing list 
ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
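For comparison, the sequence the Jewel multisite documentation expects for joining a secondary cluster to an existing realm is roughly the following sketch. The URL, access key, secret, and the zone name `us-west` are placeholders (only the realm `gold` and zonegroup `us` come from the message above), and the credentials must belong to the system user created in the master zone:

```shell
# On the secondary cluster: pull the realm and its current period
# from an endpoint in the master zone.
radosgw-admin realm pull --url=http://cluster1:80 \
    --access-key=access --secret=secret
radosgw-admin period pull --url=http://cluster1:80 \
    --access-key=access --secret=secret

# Make the pulled realm and zonegroup the defaults, create the
# secondary zone, then commit a new period so the change propagates.
radosgw-admin realm default --rgw-realm=gold
radosgw-admin zonegroup default --rgw-zonegroup=us
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-west \
    --endpoints=http://cluster2:80 \
    --access-key=access --secret=secret
radosgw-admin period update --commit
```

If `realm pull` returns an empty `"name"` as in the output above, the later `realm default --rgw-realm=gold` step will fail with ENOENT, since no realm by that name exists locally yet.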
[ceph-users] RGW Jewel upgrade: realms and default .rgw.root pool?
f49d119c50590d63 state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.924347 7f9371d7da40 20 rados->read ofs=0 len=524288
2016-05-04 14:00:13.924834 7f9371d7da40 20 rados->read r=0 bl.length=118
2016-05-04 14:00:13.924852 7f9371d7da40 20 get_system_obj_state: rctx=0x7ffd86e56150 obj=.rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.latest_epoch state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.925401 7f9371d7da40 20 get_system_obj_state: s->obj_tag was set empty
2016-05-04 14:00:13.925407 7f9371d7da40 20 get_system_obj_state: rctx=0x7ffd86e56150 obj=.rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.latest_epoch state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.925409 7f9371d7da40 20 rados->read ofs=0 len=524288
2016-05-04 14:00:13.925950 7f9371d7da40 20 rados->read r=0 bl.length=10
2016-05-04 14:00:13.925971 7f9371d7da40 20 get_system_obj_state: rctx=0x7ffd86e56170 obj=.rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.1 state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.926584 7f9371d7da40 20 get_system_obj_state: s->obj_tag was set empty
2016-05-04 14:00:13.926590 7f9371d7da40 20 get_system_obj_state: rctx=0x7ffd86e56170 obj=.rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.1 state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.926592 7f9371d7da40 20 rados->read ofs=0 len=524288
2016-05-04 14:00:13.927347 7f9371d7da40 20 rados->read r=0 bl.length=242
2016-05-04 14:00:13.927387 7f9371d7da40 20 get_system_obj_state: rctx=0x7ffd86e561d0 obj=.bbp-dev.rgw.root:region_info.bbp-dev state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.928068 7f9371d7da40 20 get_system_obj_state: s->obj_tag was set empty
2016-05-04 14:00:13.928075 7f9371d7da40 20 get_system_obj_state: rctx=0x7ffd86e561d0 obj=.bbp-dev.rgw.root:region_info.bbp-dev state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.928077 7f9371d7da40 20 rados->read ofs=0 len=524288
2016-05-04 14:00:13.928759 7f9371d7da40 20 rados->read r=0 bl.length=212

--
Kind regards,

Ben Morrice
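The debug output above shows RGW reading period and region_info objects out of the root pools during startup. A small filter like the following (a sketch, not from the thread) pulls the referenced object names out of such a log, so they can be compared against what actually exists in the pool (e.g. via `rados -p .rgw.root ls`):

```shell
# Extract the RADOS object names referenced by get_system_obj_state
# debug lines. $log holds a single sample line copied from the output
# above; in practice you would run the whole radosgw log through the
# same grep|cut pipeline.
log='2016-05-04 14:00:13.924852 7f9371d7da40 20 get_system_obj_state: rctx=0x7ffd86e56150 obj=.rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.latest_epoch state=0x7f93732b0e18 s->prefetch_data=0'
echo "$log" | grep -o 'obj=[^ ]*' | cut -d= -f2
# prints: .rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.latest_epoch
```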