Re: [ceph-users] Federated gateways
Hi,

We did a PoC at Orange and encountered some difficulties configuring federation. Can you check that the placement targets are identical in each zone?

Best regards

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of wd_hw...@wistron.com
Sent: Friday, November 6, 2015 3:01 AM
To: cle...@centraldesktop.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Federated gateways

Hi Craig,

I am testing a federated gateway with 1 region and 2 zones, and I found that only metadata is replicated; the data is NOT. Going through your checklist, I am sure every item checks out. Could you review my configuration scripts? The configuration files are similar to http://docs.ceph.com/docs/master/radosgw/federated-config/.

1. For the master zone with 5 nodes of the region

(1) create the keyring
sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
sudo chmod +r /etc/ceph/ceph.client.radosgw.keyring
sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.us-east-1 --gen-key
sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.us-west-1 --gen-key
sudo ceph-authtool -n client.radosgw.us-east-1 --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
sudo ceph-authtool -n client.radosgw.us-west-1 --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-east-1 -i /etc/ceph/ceph.client.radosgw.keyring
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-west-1 -i /etc/ceph/ceph.client.radosgw.keyring

(2) modify the Ceph cluster configuration and synchronize it
ceph-deploy --overwrite-conf config push node1 node2 node3 node4 node5

(3) configure Apache

(4) configure the region
sudo apt-get install -y radosgw radosgw-agent python-pip
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-east-1
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-west-1
sudo rados -p .us.rgw.root rm region_info.default
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-east-1
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-west-1
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-west-1

(5) create the master zone
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home/ceph/us-east.json --name client.radosgw.us-east-1
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home/ceph/us-east.json --name client.radosgw.us-west-1
sudo radosgw-admin zone set --rgw-zone=us-west --infile /home/ceph/us-west.json --name client.radosgw.us-east-1
sudo radosgw-admin zone set --rgw-zone=us-west --infile /home/ceph/us-west.json --name client.radosgw.us-west-1
sudo rados -p .rgw.root rm zone_info.default
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-west-1

(6) create the master zone's users
sudo radosgw-admin user create --uid="us-east" --display-name="Region-us Zone-East" --name client.radosgw.us-east-1 --system --access_key=us_access_key --secret=us_secret_key
sudo radosgw-admin user create --uid="us-west" --display-name="Region-us Zone-West" --name client.radosgw.us-west-1 --system --access_key=us_access_key --secret=us_secret_key

(7) restart the Ceph, apache2 and radosgw services

2. For the secondary zone with 5 nodes of the region

(1) copy the keyring file 'ceph.client.radosgw.keyring' from the master zone and import it
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-east-1 -i /etc/ceph/ceph.client.radosgw.keyring
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-west-1 -i /etc/ceph/ceph.client.radosgw.keyring

(2) modify the Ceph cluster configuration and synchronize it
ceph-deploy --overwrite-conf config push node1 node2 node3 node4 node5

(3) configure Apache

(4) copy the infile '/home/ceph/us.json' from the master zone and create the 'us' region
sudo apt-get install -y radosgw radosgw-agent python-pip
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-east-1
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-west-1
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-east-1
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-west-1
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-west-1

(5) create the secondary zone
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home/ceph/us-east.json --name client.radosgw.us-east-1
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home
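The region infile referenced in steps (4) and (5) is never shown in the scripts. For reference, a minimal us.json in the shape the federated-config docs of that era used might look like the following. The hostnames and endpoints are taken from the ceph.conf shown later in the thread, the exact field set may differ by Ceph release, and note in particular that data replication requires "log_data": "true" on the source zone in the region map (metadata-only sync, as reported here, is a classic symptom of it being false):

```json
{
  "name": "us",
  "api_name": "us",
  "is_master": "true",
  "endpoints": ["http://node1-east.ceph.com:80/"],
  "master_zone": "us-east",
  "zones": [
    {"name": "us-east", "endpoints": ["http://node1-east.ceph.com:80/"], "log_meta": "true", "log_data": "true"},
    {"name": "us-west", "endpoints": ["http://node1-west.ceph.com:80/"], "log_meta": "true", "log_data": "true"}
  ],
  "placement_targets": [{"name": "default-placement", "tags": []}],
  "default_placement": "default-placement"
}
```

After changing the region infile, rerun `radosgw-admin region set`, `regionmap update`, and restart the gateways, as in the scripts above.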
Re: [ceph-users] Federated gateways
Hi, Craig:

I used 10 VMs for the federated gateway testing: 5 nodes for us-east, and the other 5 for us-west. The two zones are independent clusters. Before configuring the region and zones, both zones had the same 'client.radosgw.[zone]' settings in ceph.conf:

[client.radosgw.us-east-1]
rgw region = us
rgw region root pool = .us.rgw.root
rgw zone = us-east
rgw zone root pool = .us-east.rgw.root
host = node1-east
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /tmp/radosgw.us-east.sock
log file = /var/log/ceph/radosgw.us-east.log
rgw dns name = node1-east.ceph.com

[client.radosgw.us-west-1]
rgw region = us
rgw region root pool = .us.rgw.root
rgw zone = us-west
rgw zone root pool = .us-west.rgw.root
host = node1-west
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /tmp/radosgw.us-west.sock
log file = /var/log/ceph/radosgw.us-west.log
rgw dns name = node1-west.ceph.com

The needed pools were created manually, and I also created a normal user in the two zones with the same access key and secret key. After that, I used 's3cmd' to create a bucket 'BUCKETa' and put 'ceph.conf' into it to test synchronization. There were some error messages in the 'radosgw-agent' log file: I found that the 'ceph.conf' object was sent to the secondary zone and then DELETED for an unknown reason.
application/json; charset=UTF-8
Fri, 06 Nov 2015 09:18:01 GMT
x-amz-copy-source:BUCKETa/ceph.conf
/BUCKETa/ceph.conf
2015-11-06 17:18:01,174 4558 [boto][DEBUG ] Signature: AWS us_access_key:p6+AscqnndOpcWfJMBO7ADDpNek=
2015-11-06 17:18:01,175 4558 [boto][DEBUG ] url = 'http://node1-west.ceph.com/BUCKETa/ceph.conf' params={'rgwx-op-id': 'admin1:4463:2', 'rgwx-source-zone': u'us-east', 'rgwx-client-id': 'radosgw-agent'} headers={'Content-Length': '0', 'User-Agent': 'Boto/2.20.1 Python/2.7.6 Linux/3.13.0-66-generic', 'x-amz-copy-source': 'BUCKETa/ceph.conf', 'Date': 'Fri, 06 Nov 2015 09:18:01 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'AWS us_access_key:p6+AscqnndOpcWfJMBO7ADDpNek='} data=None
2015-11-06 17:18:01,175 4558 [boto][DEBUG ] Method: PUT
2015-11-06 17:18:01,175 4558 [boto][DEBUG ] Path: /BUCKETa/ceph.conf?rgwx-op-id=admin1%3A4463%3A2&rgwx-source-zone=us-east&rgwx-client-id=radosgw-agent
2015-11-06 17:18:01,175 4558 [boto][DEBUG ] Data:
2015-11-06 17:18:01,176 4558 [boto][DEBUG ] Headers: {'Content-Type': 'application/json; charset=UTF-8', 'x-amz-copy-source': 'BUCKETa/ceph.conf'}
2015-11-06 17:18:01,176 4558 [boto][DEBUG ] Host: node1-west.ceph.com
2015-11-06 17:18:01,176 4558 [boto][DEBUG ] Port: 80
2015-11-06 17:18:01,176 4558 [boto][DEBUG ] Params: {'rgwx-op-id': 'admin1%3A4463%3A2', 'rgwx-source-zone': 'us-east', 'rgwx-client-id': 'radosgw-agent'}
2015-11-06 17:18:01,177 4558 [boto][DEBUG ] Token: None
2015-11-06 17:18:01,177 4558 [boto][DEBUG ] StringToSign:
PUT
application/json; charset=UTF-8
Fri, 06 Nov 2015 09:18:01 GMT
x-amz-copy-source:BUCKETa/ceph.conf
/BUCKETa/ceph.conf
2015-11-06 17:18:01,177 4558 [boto][DEBUG ] Signature: AWS us_access_key:p6+AscqnndOpcWfJMBO7ADDpNek=
2015-11-06 17:18:01,203 4558 [radosgw_agent.worker][DEBUG ] object "BUCKETa/ceph.conf" not found on master, deleting from secondary
2015-11-06 17:18:01,203 4558 [boto][DEBUG ] path=/BUCKETa/
2015-11-06 17:18:01,203 4558 [boto][DEBUG ] auth_path=/BUCKETa/
2015-11-06 17:18:01,203 4558 [boto][DEBUG ] path=/BUCKETa/?max-keys=0
2015-11-06 17:18:01,204 4558 [boto][DEBUG ] auth_path=/BUCKETa/?max-keys=0
2015-11-06 17:18:01,204 4558 [boto][DEBUG ] Method: GET
2015-11-06 17:18:01,204 4558 [boto][DEBUG ] Path: /BUCKETa/?max-keys=0
2015-11-06 17:18:01,204 4558 [boto][DEBUG ] Data:
2015-11-06 17:18:01,204 4558 [boto][DEBUG ] Headers: {}
2015-11-06 17:18:01,205 4558 [boto][DEBUG ] Host: node1-west.ceph.com
2015-11-06 17:18:01,205 4558 [boto][DEBUG ] Port: 80
2015-11-06 17:18:01,205 4558 [boto][DEBUG ] Params: {}
2015-11-06 17:18:01,206 4558 [boto][DEBUG ] establishing HTTP connection: kwargs={'port': 80, 'timeout': 70}
2015-11-06 17:18:01,206 4558 [boto][DEBUG ] Token: None
2015-11-06 17:18:01,206 4558 [boto][DEBUG ] StringToSign:

Any help would be much appreciated.

Best Regards,
Wdhwang

-----Original Message-----
From: Craig Lewis [mailto:cle...@centraldesktop.com]
Sent: Saturday, November 07, 2015 3:59 AM
To: WD Hwang/WHQ/Wistron
Cc: Ceph Users
Subject: Re: [ceph-users] Federated gateways

You are updating [radosgw-admin] in ceph.conf, in steps 1.4 and 2.4? I recall restarting things more often. IIRC, I would restart everything after every regionmap update or a ceph.conf update. I manually created
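The boto debug output above prints both the StringToSign and the resulting signature. When chasing mismatched-key problems (like the 403s mentioned elsewhere in this thread), it can help to recompute the AWS v2 signature independently and compare it with the agent's. A minimal sketch; 'us_secret_key' is the placeholder secret from the user-create commands, not a real credential:

```python
import base64
import hmac
from hashlib import sha1

def s3_v2_signature(secret_key, string_to_sign):
    # AWS signature v2 as boto computes it: base64(HMAC-SHA1(secret, StringToSign))
    mac = hmac.new(secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), sha1)
    return base64.b64encode(mac.digest()).decode("ascii")

# StringToSign reassembled from the radosgw-agent debug log above
sts = "\n".join([
    "PUT",
    "application/json; charset=UTF-8",
    "Fri, 06 Nov 2015 09:18:01 GMT",
    "x-amz-copy-source:BUCKETa/ceph.conf",
    "/BUCKETa/ceph.conf",
])
print("Signature:", s3_v2_signature("us_secret_key", sts))
```

If the value you compute from the secret stored on one zone does not match the Authorization header the agent sends to the other, the two zones do not share the same system-user credentials.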
Re: [ceph-users] Federated gateways
Hi Craig,

I am testing a federated gateway with 1 region and 2 zones, and I found that only metadata is replicated; the data is NOT. Going through your checklist, I am sure every item checks out. Could you review my configuration scripts? The configuration files are similar to http://docs.ceph.com/docs/master/radosgw/federated-config/.

1. For the master zone with 5 nodes of the region

(1) create the keyring
sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
sudo chmod +r /etc/ceph/ceph.client.radosgw.keyring
sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.us-east-1 --gen-key
sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.us-west-1 --gen-key
sudo ceph-authtool -n client.radosgw.us-east-1 --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
sudo ceph-authtool -n client.radosgw.us-west-1 --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-east-1 -i /etc/ceph/ceph.client.radosgw.keyring
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-west-1 -i /etc/ceph/ceph.client.radosgw.keyring

(2) modify the Ceph cluster configuration and synchronize it
ceph-deploy --overwrite-conf config push node1 node2 node3 node4 node5

(3) configure Apache

(4) configure the region
sudo apt-get install -y radosgw radosgw-agent python-pip
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-east-1
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-west-1
sudo rados -p .us.rgw.root rm region_info.default
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-east-1
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-west-1
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-west-1

(5) create the master zone
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home/ceph/us-east.json --name client.radosgw.us-east-1
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home/ceph/us-east.json --name client.radosgw.us-west-1
sudo radosgw-admin zone set --rgw-zone=us-west --infile /home/ceph/us-west.json --name client.radosgw.us-east-1
sudo radosgw-admin zone set --rgw-zone=us-west --infile /home/ceph/us-west.json --name client.radosgw.us-west-1
sudo rados -p .rgw.root rm zone_info.default
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-west-1

(6) create the master zone's users
sudo radosgw-admin user create --uid="us-east" --display-name="Region-us Zone-East" --name client.radosgw.us-east-1 --system --access_key=us_access_key --secret=us_secret_key
sudo radosgw-admin user create --uid="us-west" --display-name="Region-us Zone-West" --name client.radosgw.us-west-1 --system --access_key=us_access_key --secret=us_secret_key

(7) restart the Ceph, apache2 and radosgw services

2. For the secondary zone with 5 nodes of the region

(1) copy the keyring file 'ceph.client.radosgw.keyring' from the master zone and import it
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-east-1 -i /etc/ceph/ceph.client.radosgw.keyring
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.us-west-1 -i /etc/ceph/ceph.client.radosgw.keyring

(2) modify the Ceph cluster configuration and synchronize it
ceph-deploy --overwrite-conf config push node1 node2 node3 node4 node5

(3) configure Apache

(4) copy the infile '/home/ceph/us.json' from the master zone and create the 'us' region
sudo apt-get install -y radosgw radosgw-agent python-pip
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-east-1
sudo radosgw-admin region set --infile /home/ceph/us.json --name client.radosgw.us-west-1
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-east-1
sudo radosgw-admin region default --rgw-region=us --name client.radosgw.us-west-1
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name client.radosgw.us-west-1

(5) create the secondary zone
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home/ceph/us-east.json --name client.radosgw.us-east-1
sudo radosgw-admin zone set --rgw-zone=us-east --infile /home/ceph/us-east.json --name client.radosgw.us-west-1
sudo radosgw-admin zone set --rgw-zone=us-west --infile /home/ceph/us-west.json --name client.radosgw.us-east-1
sudo radosgw-admin zone set --rgw-zone=us-west --infile /home/ceph/us-west.json --name client.radosgw.us-west-1
sudo rados -p .rgw.root rm zone_info.default
sudo radosgw-admin regionmap update --name client.radosgw.us-east-1
sudo radosgw-admin regionmap update --name
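The scripts configure both zones but never show the sync agent itself, which is what actually copies metadata and data between zones. For completeness, a data-sync configuration in the shape the firefly-era federated-config docs used might look like this; the endpoints and the placeholder keys are taken from the commands above, and the exact field names should be checked against your radosgw-agent version:

```yaml
# region-data-sync.conf (sketch; verify field names for your radosgw-agent release)
src_zone: us-east
source: http://node1-east.ceph.com:80
src_access_key: us_access_key
src_secret_key: us_secret_key
dest_zone: us-west
destination: http://node1-west.ceph.com:80
dest_access_key: us_access_key
dest_secret_key: us_secret_key
log_file: /var/log/radosgw/radosgw-sync-us-east-west.log
```

The agent is then started with something like `radosgw-agent -c region-data-sync.conf`; its log file is where the "not found on master, deleting from secondary" messages discussed in this thread show up.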
Re: [ceph-users] Federated gateways
Well I upgraded both clusters to giant this morning just to see if that would help, and it didn't. I have a couple questions though. I have the same regionmap on both clusters, with both zones in it, but then I only have the buckets and zone info for one zone in each cluster, is this right? Or do I need all the buckets and zones in both clusters? Reading the docs it doesn't seem like I do, because I'm expecting data to sync from one zone in one cluster to the other zone on the other cluster, but I don't know what to think anymore. Also, do both users need to be system users on both ends?

Aaron

On Nov 12, 2014, at 4:00 PM, Craig Lewis cle...@centraldesktop.com wrote:

http://tracker.ceph.com/issues/9206

My post to the ML: http://www.spinics.net/lists/ceph-users/msg12665.html

IIRC, the system users didn't see the other user's buckets in a bucket listing, but they could read and write the objects fine.

On Wed, Nov 12, 2014 at 11:16 AM, Aaron Bassett aa...@five3genomics.com wrote:

In playing around with this a bit more, I noticed that the two users on the secondary node can't see each other's buckets. Is this a problem?

IIRC, the system users couldn't see each other's buckets, but they could read and write the objects.

On Nov 11, 2014, at 6:56 PM, Craig Lewis cle...@centraldesktop.com wrote:

I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although that causes 40x errors with Apache 2.4, not 500 errors.

It is Apache 2.4, but I'm actually running 0.80.7, so I probably have that bug fix?

No, the unreleased 0.80.8 has the fix.

Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster.)

Yes, I can write everywhere and radosgw-agent isn't getting any 403s like it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is syncing properly, as are the users. It seems like really the only thing that isn't syncing is the .zone.rgw.buckets pool.

That's pretty much the same behavior I was seeing with Apache 2.4. Try downgrading the primary cluster to Apache 2.2. In my testing, the secondary cluster could run 2.2 or 2.4.

Do you have a link to that bug#? I want to see if it gives me any clues.

Aaron
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
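To confirm that it really is only the data pool lagging, one low-level check (a sketch; the pool names are assumptions based on the `.us-nh.*` and `.zone.rgw.buckets` names mentioned in this thread, so substitute your zones' actual pool names) is to diff object listings taken on each cluster:

```
# On the primary cluster:
rados -p .us-nh.rgw.buckets ls | sort > primary-data.txt
rados -p .us-nh.rgw.buckets.index ls | sort > primary-index.txt

# On the secondary cluster, against its own zone's pools:
rados -p .us-nh.rgw.buckets ls | sort > secondary-data.txt
rados -p .us-nh.rgw.buckets.index ls | sort > secondary-index.txt

# With metadata syncing but data not, only the data diff should be non-empty:
diff primary-index.txt secondary-index.txt
diff primary-data.txt secondary-data.txt
```

This separates "the agent never wrote the objects" from "the objects were written and later deleted", which matters given the delete-from-secondary behavior reported elsewhere in this thread.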
Re: [ceph-users] Federated gateways
I have identical regionmaps in both clusters. I only created each zone's pools in that zone's cluster. I didn't delete the default .rgw.* pools, so those exist in both zones.

Both users need to be system users on both ends, and they need identical access and secret keys. If they're not, this is likely your problem.

On Fri, Nov 14, 2014 at 11:38 AM, Aaron Bassett aa...@five3genomics.com wrote:

Well I upgraded both clusters to giant this morning just to see if that would help, and it didn't. I have a couple questions though. I have the same regionmap on both clusters, with both zones in it, but then I only have the buckets and zone info for one zone in each cluster, is this right? Or do I need all the buckets and zones in both clusters? Reading the docs it doesn't seem like I do, because I'm expecting data to sync from one zone in one cluster to the other zone on the other cluster, but I don't know what to think anymore. Also, do both users need to be system users on both ends?

Aaron
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Federated gateways
In playing around with this a bit more, I noticed that the two users on the secondary node can't see each other's buckets. Is this a problem?

On Nov 11, 2014, at 6:56 PM, Craig Lewis cle...@centraldesktop.com wrote:

I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although that causes 40x errors with Apache 2.4, not 500 errors.

It is Apache 2.4, but I'm actually running 0.80.7, so I probably have that bug fix?

No, the unreleased 0.80.8 has the fix.

Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster.)

Yes, I can write everywhere and radosgw-agent isn't getting any 403s like it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is syncing properly, as are the users. It seems like really the only thing that isn't syncing is the .zone.rgw.buckets pool.

That's pretty much the same behavior I was seeing with Apache 2.4. Try downgrading the primary cluster to Apache 2.2. In my testing, the secondary cluster could run 2.2 or 2.4.

Do you have a link to that bug#? I want to see if it gives me any clues.

Aaron
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Federated gateways
http://tracker.ceph.com/issues/9206

My post to the ML: http://www.spinics.net/lists/ceph-users/msg12665.html

IIRC, the system users didn't see the other user's buckets in a bucket listing, but they could read and write the objects fine.

On Wed, Nov 12, 2014 at 11:16 AM, Aaron Bassett aa...@five3genomics.com wrote:

In playing around with this a bit more, I noticed that the two users on the secondary node can't see each other's buckets. Is this a problem?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Federated gateways
Ok I believe I’ve made some progress here. I have everything syncing *except* data. The data is getting 500s when it tries to sync to the backup zone. I have a log from the radosgw with debug cranked up to 20:

2014-11-11 14:37:06.688331 7f54447f0700 1 == starting new request req=0x7f546800f3b0 =
2014-11-11 14:37:06.688978 7f54447f0700 0 WARNING: couldn't find acl header for bucket, generating default
2014-11-11 14:37:06.689358 7f54447f0700 1 -- 172.16.10.103:0/1007381 --> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950 osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6
2014-11-11 14:37:06.695313 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 0x7f51b4001950 prio 127
2014-11-11 14:37:06.695374 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695384 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695426 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61
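For reference, the verbosity in the log above ("debug cranked up to 20") comes from raising the rgw and messenger debug levels for the gateway instance, e.g. in ceph.conf before restarting the radosgw process (the section name here is illustrative; use your own instance's client name):

```
[client.radosgw.us-east-1]
    debug rgw = 20
    debug ms = 1
```

`debug rgw = 20` produces the request-handling lines and `debug ms = 1` (or higher) the messenger/pipe traffic; both settings are standard Ceph debug subsystems.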
Re: [ceph-users] Federated gateways
Is that radosgw log from the primary or the secondary zone? Nothing in that log jumps out at me.

I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although that causes 40x errors with Apache 2.4, not 500 errors.

Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster.)

On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett aa...@five3genomics.com wrote:

Ok I believe I've made some progress here. I have everything syncing *except* data. The data is getting 500s when it tries to sync to the backup zone. I have a log from the radosgw with debug cranked up to 20:

2014-11-11 14:37:06.688331 7f54447f0700 1 == starting new request req=0x7f546800f3b0 =
2014-11-11 14:37:06.688978 7f54447f0700 0 WARNING: couldn't find acl header for bucket, generating default
Re: [ceph-users] Federated gateways
On Nov 11, 2014, at 4:21 PM, Craig Lewis <cle...@centraldesktop.com> wrote:

> Is that radosgw log from the primary or the secondary zone? Nothing in that log jumps out at me.

This is the log from the secondary zone. That HTTP 500 response code coming back is the only problem I can find. There are a bunch of 404s from other requests to logs and stuff, but I assume those are normal because there's no activity going on. I guess it's just that cryptic "WARNING: set_req_state_err err_no=5 resorting to 500" line that's the problem. I think I need to get a stack trace from that somehow.

> I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although, that causes 40x errors with Apache 2.4, not 500 errors.

It is Apache 2.4, but I'm actually running 0.80.7, so I probably have that bug fix?

> Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster).

Yes, I can write everywhere, and radosgw-agent isn't getting any 403s like it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is syncing properly, as are the users. It seems like the only thing that isn't syncing is the .zone.rgw.buckets pool.

Thanks, Aaron

On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett <aa...@five3genomics.com> wrote:
> Ok, I believe I've made some progress here. I have everything syncing *except* data. The data is getting 500s when it tries to sync to the backup zone.
I have a log from the radosgw with debug cranked up to 20:

2014-11-11 14:37:06.688331 7f54447f0700 1 ====== starting new request req=0x7f546800f3b0 ======
2014-11-11 14:37:06.688978 7f54447f0700 0 WARNING: couldn't find acl header for bucket, generating default
2014-11-11 14:37:06.689358 7f54447f0700 1 -- 172.16.10.103:0/1007381 --> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950 osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6
2014-11-11 14:37:06.695313
Re: [ceph-users] Federated gateways
> I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although, that causes 40x errors with Apache 2.4, not 500 errors.
>
> It is Apache 2.4, but I'm actually running 0.80.7, so I probably have that bug fix?

No, the unreleased 0.80.8 has the fix.

> Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster).
>
> Yes, I can write everywhere, and radosgw-agent isn't getting any 403s like it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is syncing properly, as are the users. It seems like the only thing that isn't syncing is the .zone.rgw.buckets pool.

That's pretty much the same behavior I was seeing with Apache 2.4. Try downgrading the primary cluster to Apache 2.2. In my testing, the secondary cluster could run 2.2 or 2.4.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Federated gateways
One region, two zones is the standard setup, so that should be fine. Is metadata (users and buckets) being replicated, but not data (objects)? Let's go through a quick checklist:
- Verify that you enabled log_meta and log_data in the region.json for the master zone.
- Verify that RadosGW is using your region map with: radosgw-admin regionmap get --name client.radosgw.name
- Verify that RadosGW is using your zone map with: radosgw-admin zone get --name client.radosgw.name
- Verify that all the pools in your zone exist (RadosGW only auto-creates the basic ones).
- Verify that your system users exist in both zones with the same access and secret.

Hopefully that gives you an idea what's not working correctly. If it doesn't, crank up the logging on the radosgw daemon on both sides, and check the logs. Add debug rgw = 20 to both ceph.conf files (in the client.radosgw.name section), and restart. Hopefully those logs will tell you what's wrong.

On Wed, Nov 5, 2014 at 11:39 AM, Aaron Bassett <aa...@five3genomics.com> wrote:
> Hello everyone, I am attempting to set up a two-cluster configuration for object storage disaster recovery. I have two physically separate sites, so using one big cluster isn't an option. I'm attempting to follow the guide at http://ceph.com/docs/v0.80.5/radosgw/federated-config/ . After a couple of days of flailing, I've settled on using one region with two zones, where each cluster is a zone. I'm now attempting to set up an agent as per the "Multi-Site Data Replication" section. The agent kicks off OK and starts making all sorts of connections, but no objects were being copied to the non-master zone.
>
> I re-ran the agent with the -v flag and saw a lot of:
>
> DEBUG:urllib3.connectionpool:GET /admin/opstate?client-id=radosgw-agent&object=test%2F_shadow_.JjVixjWmebQTrRed36FL6D0vy2gDVZ__39&op-id=phx-r1-head1%3A2451615%3A1 HTTP/1.1 200 None
> DEBUG:radosgw_agent.worker:op state is []
> DEBUG:radosgw_agent.worker:error geting op state: list index out of range
>
> So it appears something is still wrong with my agent, though I have no idea what. I can't seem to find any errors in any other logs. Does anyone have any insight here? I'm also wondering whether what I'm attempting, two clusters in the same region as separate zones, makes sense?
> Thanks, Aaron
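The first item in Craig's checklist is easy to verify mechanically once the region map is in a file. A minimal, self-contained sketch; the JSON shape follows the firefly-era federated-config docs and the /tmp path is illustrative, so treat both as assumptions and compare against your own `radosgw-admin region get` output:

```shell
# region.json excerpt as fed to `radosgw-admin region set --infile ...`
# (key names per the firefly federated-config docs; an assumption here).
cat > /tmp/us.json <<'EOF'
{
  "name": "us",
  "master_zone": "us-east-1",
  "zones": [
    { "name": "us-east-1", "log_meta": "true", "log_data": "true" },
    { "name": "us-west-1", "log_meta": "true", "log_data": "true" }
  ]
}
EOF
# The agent replays the data log, so "log_data" must be "true" or objects never sync:
enabled=$(grep -c '"log_data": "true"' /tmp/us.json)
echo "$enabled"   # prints 2
```

If the count does not match the number of zones you expect to replicate, metadata can still sync while object data silently does not, which is exactly the symptom described in this thread.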
Re: [ceph-users] Federated gateways (our planning use case)
I've had some luck putting a load balancer in front of multiple zones to get around the multiple-URL issue. You can get the LB to send POST/DELETE et al. to the primary zone, but GET requests can be distributed to multiple zones. The only issue is the replication delay; your data may not be available on the secondary for reading yet. I'm pretty sure you can get most LBs to check multiple zones for the object's existence when doing a GET, and redirect to the primary if replication hasn't caught up; for what I was looking at, I didn't need this!

Dave

On Tue, Oct 7, 2014 at 1:33 AM, Craig Lewis <cle...@centraldesktop.com> wrote:
> This sounds doable, with a few caveats. Currently, replication is only one direction. You can only write to the primary zone, and you can read from the primary or secondary zones. A cluster can have many zones on it. I'm thinking your setup would be a star topology. Each telescope will be a primary zone, and replicate to a secondary zone in the main storage cluster. The main cluster will have one read-only secondary zone for each telescope. If you have other needs to write data to the main cluster, you can create another zone that only exists on the main cluster (possibly replicated to one of the telescopes with a good network connection).
>
> Each zone has its own URL (primary and secondary), so you'd have a bit of a management problem remembering to use the correct URL. The URLs can be whatever; convention follows Amazon's naming scheme, but you'd probably want to create your own scheme, something like http://telescope_name-site.inasan.ru/ and http://telescope_name-campus.inasan.ru/
>
> You might have some problems with the replication if your VPN connections aren't stable. The replication agent isn't very tolerant of cluster problems, so I suspect (but haven't tested) that long VPN outages will need a replication agent restart. For sites that don't have permanent connections, just make the replication agent startup and shutdown part of the connection startup and shutdown process. Replication state is available via a REST API, so it can be monitored.
>
> I have tested large backlogs in replication. When I initially imported my data, I deliberately imported faster than I had bandwidth to replicate. At one point, my secondary cluster was ~10 million objects and ~10 TB behind the primary cluster. It eventually caught up, but the process doesn't handle stops and restarts well. Restarting the replication while it was dealing with the backlog will start from the beginning of the backlog. This can be a problem if your backlog is so large that it won't finish in a day, because log rotation will restart the replication agent. If that's something you think might be a problem, I have some strategies to deal with it, but they're manual and hacky.
>
> Does that sound feasible?
>
> On Mon, Oct 6, 2014 at 5:42 AM, Pavel V. Kaygorodov <pa...@inasan.ru> wrote:
>> Hi! Our institute is now planning to deploy a set of robotic telescopes across the country. Most of the telescopes will have low bandwidth and high latency, or even no permanent internet connectivity. I think we can set up synchronization of observational data with Ceph, using federated gateways:
>> 1. The main big storage Ceph cluster will be set up in our institute's main building.
>> 2. Small Ceph clusters will be set up near each telescope, to store only the data from the local telescope.
>> 3. VPN tunnels will be set up from each telescope site to our institute.
>> 4. The federated gateways mechanism will do all the magic to synchronize data.
>> Is this a realistic plan? What problems might we meet with this setup?
>> Thanks in advance, Pavel.
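Dave's method-based routing above can be sketched as an HAProxy configuration. This is only an illustration of the idea, not a tested setup; the backend names and addresses are made up:

```haproxy
frontend rgw_in
    bind *:80
    # Writes only work against the primary zone; reads can go to any zone.
    acl is_write method PUT POST DELETE
    use_backend rgw_primary if is_write
    default_backend rgw_any

backend rgw_primary
    server east1 10.0.0.11:80 check

backend rgw_any
    balance roundrobin
    server east1 10.0.0.11:80 check
    server west1 10.0.1.11:80 check
```

As Dave notes, a GET answered by the secondary can miss an object that hasn't replicated yet; redirecting to the primary on a 404 needs extra logic on top of this sketch.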
Re: [ceph-users] Federated gateways
Fixed! Thank you for the reply. It was the backslashes in the secret that were the issue. I generated a new gateway user with:

radosgw-admin user create --uid=test2 --display-name=test2 --access-key={key} --secret={secret_without_slashes} --name client.radosgw.gateway

and that worked.

On 04/14/2014 09:57 PM, Craig Lewis wrote:
> 2014-04-14 12:39:20.556085 7f133f7ee700 10 auth_hdr: GET x-amz-date:Mon, 14 Apr 2014 11:39:01 +0000 /
> 2014-04-14 12:39:20.556125 7f133f7ee700 15 calculated digest=TQ5LP8ZeufSqKLumak6Aez4o+Pg=
> 2014-04-14 12:39:20.556127 7f133f7ee700 15 auth_sign=hx94rY3BJn7HQKA6ERaksNMQPRs=
> 2014-04-14 12:39:20.556127 7f133f7ee700 15 compare=20
> 2014-04-14 12:39:20.556130 7f133f7ee700 10 failed to authorize request
> 2014-04-14 12:39:20.556167 7f133f7ee700 2 req 2:0.009095:s3:GET /:list_buckets:http status=403
> 2014-04-14 12:39:20.556396 7f133f7ee700 1 ====== req done req=0x8ca280 http_status=403 ======
>
> Did you create all of the rados pools that are mentioned in ceph.conf and the region and zone maps?
>
> The hash s3cmd computed is different than the one RGW computed. Can you verify that the access and secret keys in .s3cfg match the output of: radosgw-admin user info --uid=test1 --name client.radosgw.gateway
>
> Does the secret have a backslash (\) character in it? The docs warn that not everything handles it well. I regenerated my keys, rather than testing whether s3cmd worked correctly.

--
Systems and Storage Engineer, Digital Repository of Ireland (DRI)
High Performance Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ptier...@tchpc.tcd.ie
Tel: +353-1-896-4466
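Since the fix above was simply regenerating the secret without slashes, one way to mint such a key up front is to generate it yourself and pass it with --secret. The 40-character length and the `tr` alphabet swap are illustrative choices, not anything radosgw requires:

```shell
# 30 random bytes -> exactly 40 base64 characters; swap '+' and '/' for
# '-' and '_' so the secret never needs JSON escaping anywhere.
secret=$(head -c 30 /dev/urandom | base64 | tr '+/' '-_')
echo "${#secret}"   # prints 40

# Then, for example:
#   radosgw-admin user create --uid=test2 --display-name=test2 \
#       --secret="$secret" --name client.radosgw.gateway
```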
Re: [ceph-users] Federated gateways
Also good to know that s3cmd does not handle those escapes correctly. Thanks!

Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
Central Desktop. Work together in ways you never thought possible.
Connect with us: Website http://www.centraldesktop.com/ | Twitter http://www.twitter.com/centraldesktop | Facebook http://www.facebook.com/CentralDesktop | LinkedIn http://www.linkedin.com/groups?gid=147417 | Blog http://cdblog.centraldesktop.com/

On 4/15/14 01:47, Peter wrote:
> Fixed! Thank you for the reply. It was the backslashes in the secret that were the issue. I generated a new gateway user with:
> radosgw-admin user create --uid=test2 --display-name=test2 --access-key={key} --secret={secret_without_slashes} --name client.radosgw.gateway
> and that worked.
Re: [ceph-users] Federated gateways
Those backslashes, as output by radosgw-admin, are escape characters preceding the forward slash. They should be removed when you are connecting with most clients. AFAIK, s3cmd would work fine with your original key, had you stripped out the escape chars. You could also just regenerate or specify a key without them.

Brian Andrus
Storage Consultant, Inktank

On Tue, Apr 15, 2014 at 9:45 AM, Craig Lewis <cle...@centraldesktop.com> wrote:
> Also good to know that s3cmd does not handle those escapes correctly. Thanks!
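Brian's point can be demonstrated directly; the escaped value below is made up for illustration:

```shell
# radosgw-admin prints the secret JSON-escaped, so every '/' shows up as '\/'.
escaped='fooBAR\/baz\/quux'
# Strip the escape character before each slash to recover the real secret:
clean=$(printf '%s' "$escaped" | sed 's|\\/|/|g')
echo "$clean"   # prints fooBAR/baz/quux
```

The `clean` value is what belongs in .s3cfg; pasting the escaped form is what produces the signature mismatch and 403 seen earlier in the thread.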
Re: [ceph-users] Federated gateways
I have the following in ceph.conf:

[client.radosgw.gateway]
host = cephgw
keyring = /etc/ceph/keyring.radosgw.gateway
rgw print continue = false
rgw region = us
rgw region root pool = .us.rgw.root
rgw zone = us-master
rgw zone root pool = .us-master.rgw.root
rgw dns name = cephgw
rgw socket path = /tmp/radosgw-us-master.sock
log file = /var/log/ceph/radosgw-us-master.log
rgw data = /var/lib/ceph/radosgw/ceph-radosgw.us-master-1

I run the following to create a gateway user on the gateway node:

radosgw-admin user create --uid=test1 --display-name=test --name client.radosgw.gateway

Then I set up s3cmd with the access and secret key, and still get the same 'Access Denied'.

On 04/12/2014 09:35 PM, Craig Lewis wrote:
> On 4/11/14 02:36, Peter wrote:
>> Hello, I am testing out federated gateways. I have created one gateway with one region and one zone. The gateway appears to work. I am trying to test it with s3cmd before I continue with more regions and zones. I create a test gateway user:
>> radosgw-admin user create --uid=test --display-name=test
>> but s3cmd keeps getting 'ERROR: S3 error: 403 (AccessDenied):'. I have tried specifying zone and region when creating the test gateway user:
>> radosgw-admin user create --uid=test1 --rgw-zone=us-master --rgw-region=us --display-name=test
>> but I still get the S3 403 error. What am I doing wrong? Thanks
>
> Try your radosgw-admin user create command again with --name=$radosgw_name. $radosgw_name is whatever name you gave the radosgw agent in ceph.conf. Mine is client.radosgw.ceph2, on the host named ceph2.
Re: [ceph-users] Federated gateways
Here is the log output for a request to the gateway:

2014-04-14 12:39:20.547012 7f1377aa97c0 20 enqueued request req=0x8ca280
2014-04-14 12:39:20.547036 7f1377aa97c0 20 RGWWQ:
2014-04-14 12:39:20.547038 7f1377aa97c0 20 req: 0x8ca280
2014-04-14 12:39:20.547044 7f1377aa97c0 10 allocated request req=0x8a6d30
2014-04-14 12:39:20.547062 7f133f7ee700 20 dequeued request req=0x8ca280
2014-04-14 12:39:20.547066 7f133f7ee700 20 RGWWQ: empty
2014-04-14 12:39:20.547072 7f133f7ee700 1 ====== starting new request req=0x8ca280 ======
2014-04-14 12:39:20.547171 7f133f7ee700 2 req 2:0.000099::GET /::initializing
2014-04-14 12:39:20.547178 7f133f7ee700 10 host=cephgw.example.com rgw_dns_name=cephgw
2014-04-14 12:39:20.547190 7f133f7ee700 10 meta>> HTTP_X_AMZ_DATE
2014-04-14 12:39:20.547197 7f133f7ee700 10 x>> x-amz-date:Mon, 14 Apr 2014 11:39:01 +0000
2014-04-14 12:39:20.547227 7f133f7ee700 10 s->object=NULL s->bucket=NULL
2014-04-14 12:39:20.547234 7f133f7ee700 20 FCGI_ROLE=RESPONDER
2014-04-14 12:39:20.547235 7f133f7ee700 20 SCRIPT_URL=/
2014-04-14 12:39:20.547235 7f133f7ee700 20 SCRIPT_URI=http://cephgw.example.com
2014-04-14 12:39:20.547236 7f133f7ee700 20 HTTP_AUTHORIZATION=AWS 718UJ2F9TFIEC5FT8XYU:hx94rY3BJn7HQKA6ERaksNMQPRs=
2014-04-14 12:39:20.547237 7f133f7ee700 20 HTTP_HOST=cephgw.example.com
2014-04-14 12:39:20.547237 7f133f7ee700 20 HTTP_ACCEPT_ENCODING=identity
2014-04-14 12:39:20.547238 7f133f7ee700 20 CONTENT_LENGTH=0
2014-04-14 12:39:20.547239 7f133f7ee700 20 HTTP_X_AMZ_DATE=Mon, 14 Apr 2014 11:39:01 +0000
2014-04-14 12:39:20.547239 7f133f7ee700 20 HTTP_VIA=1.0 proxy.example.com.:3128 (squid/2.6.STABLE21)
2014-04-14 12:39:20.547240 7f133f7ee700 20 HTTP_X_FORWARDED_FOR=10.10.10.10
2014-04-14 12:39:20.547241 7f133f7ee700 20 HTTP_CACHE_CONTROL=max-age=259200
2014-04-14 12:39:20.547241 7f133f7ee700 20 HTTP_CONNECTION=keep-alive
2014-04-14 12:39:20.547242 7f133f7ee700 20 PATH=/usr/local/bin:/usr/bin:/bin
2014-04-14 12:39:20.547243 7f133f7ee700 20 SERVER_SIGNATURE=<address>Apache/2.2.22 (Ubuntu) Server at cephgw.example.com Port 80</address>
2014-04-14 12:39:20.547255 7f133f7ee700 20 SERVER_SOFTWARE=Apache/2.2.22 (Ubuntu)
2014-04-14 12:39:20.547256 7f133f7ee700 20 SERVER_NAME=cephgw.example.com
2014-04-14 12:39:20.547257 7f133f7ee700 20 SERVER_ADDR=10.10.10.10
2014-04-14 12:39:20.547258 7f133f7ee700 20 SERVER_PORT=80
2014-04-14 12:39:20.547258 7f133f7ee700 20 REMOTE_ADDR=10.10.10.11
2014-04-14 12:39:20.547259 7f133f7ee700 20 DOCUMENT_ROOT=/var/www
2014-04-14 12:39:20.547260 7f133f7ee700 20 SERVER_ADMIN=t...@cephgw.example.com
2014-04-14 12:39:20.547260 7f133f7ee700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
2014-04-14 12:39:20.547261 7f133f7ee700 20 REMOTE_PORT=51981
2014-04-14 12:39:20.547262 7f133f7ee700 20 GATEWAY_INTERFACE=CGI/1.1
2014-04-14 12:39:20.547262 7f133f7ee700 20 SERVER_PROTOCOL=HTTP/1.0
2014-04-14 12:39:20.547263 7f133f7ee700 20 REQUEST_METHOD=GET
2014-04-14 12:39:20.547264 7f133f7ee700 20 QUERY_STRING=page=&params=
2014-04-14 12:39:20.547264 7f133f7ee700 20 REQUEST_URI=/
2014-04-14 12:39:20.547265 7f133f7ee700 20 SCRIPT_NAME=/
2014-04-14 12:39:20.547266 7f133f7ee700 2 req 2:0.000195:s3:GET /::getting op
2014-04-14 12:39:20.547287 7f133f7ee700 2 req 2:0.000216:s3:GET /:list_buckets:authorizing
2014-04-14 12:39:20.547326 7f133f7ee700 20 get_obj_state: rctx=0x7f135c01d1a0 obj=.us-master.users:718UJ2F9TFIEC5FT8XYU state=0x7f135c01d268 s->prefetch_data=0
2014-04-14 12:39:20.547336 7f133f7ee700 10 moving .us-master.users+718UJ2F9TFIEC5FT8XYU to cache LRU end
2014-04-14 12:39:20.547339 7f133f7ee700 10 cache get: name=.us-master.users+718UJ2F9TFIEC5FT8XYU : type miss (requested=6, cached=3)
2014-04-14 12:39:20.554368 7f133f7ee700 10 cache put: name=.us-master.users+718UJ2F9TFIEC5FT8XYU
2014-04-14 12:39:20.554372 7f133f7ee700 10 moving .us-master.users+718UJ2F9TFIEC5FT8XYU to cache LRU end
2014-04-14 12:39:20.554381 7f133f7ee700 20 get_obj_state: s->obj_tag was set empty
2014-04-14 12:39:20.554390 7f133f7ee700 10 moving .us-master.users+718UJ2F9TFIEC5FT8XYU to cache LRU end
2014-04-14 12:39:20.554391 7f133f7ee700 10 cache get: name=.us-master.users+718UJ2F9TFIEC5FT8XYU : hit
2014-04-14 12:39:20.554445 7f133f7ee700 20 get_obj_state: rctx=0x7f135c01d580 obj=.us-master.users.uid:test state=0x7f135c01da58 s->prefetch_data=0
2014-04-14 12:39:20.554451 7f133f7ee700 10 moving .us-master.users.uid+test to cache LRU end
2014-04-14 12:39:20.554452 7f133f7ee700 10 cache get: name=.us-master.users.uid+test : type miss (requested=6, cached=3)
2014-04-14 12:39:20.555975 7f133f7ee700 10 cache put: name=.us-master.users.uid+test
2014-04-14 12:39:20.555977 7f133f7ee700 10 moving .us-master.users.uid+test to cache LRU end
2014-04-14 12:39:20.555981 7f133f7ee700 20 get_obj_state: s->obj_tag was set empty
2014-04-14 12:39:20.555984 7f133f7ee700 10 moving .us-master.users.uid+test to cache LRU end
2014-04-14 12:39:20.555986 7f133f7ee700 10 cache get: name=.us-master.users.uid+test : hit
2014-04-14
Re: [ceph-users] Federated gateways
2014-04-14 12:39:20.556085 7f133f7ee700 10 auth_hdr: GET x-amz-date:Mon, 14 Apr 2014 11:39:01 +0000 /
2014-04-14 12:39:20.556125 7f133f7ee700 15 calculated digest=TQ5LP8ZeufSqKLumak6Aez4o+Pg=
2014-04-14 12:39:20.556127 7f133f7ee700 15 auth_sign=hx94rY3BJn7HQKA6ERaksNMQPRs=
2014-04-14 12:39:20.556127 7f133f7ee700 15 compare=20
2014-04-14 12:39:20.556130 7f133f7ee700 10 failed to authorize request
2014-04-14 12:39:20.556167 7f133f7ee700 2 req 2:0.009095:s3:GET /:list_buckets:http status=403
2014-04-14 12:39:20.556396 7f133f7ee700 1 ====== req done req=0x8ca280 http_status=403 ======

Did you create all of the rados pools that are mentioned in ceph.conf and the region and zone maps?

The hash s3cmd computed is different than the one RGW computed. Can you verify that the access and secret keys in .s3cfg match the output of:

radosgw-admin user info --uid=test1 --name client.radosgw.gateway

Does the secret have a backslash (\) character in it? The docs warn that not everything handles it well. I regenerated my keys, rather than testing whether s3cmd worked correctly.
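For reference, the "calculated digest" and "auth_sign" values in that log are the two sides of the S3 signature-v2 check: base64(HMAC-SHA1(secret, string-to-sign)). A minimal sketch with a made-up secret; the exact blank lines in the string-to-sign and the +0000 offset are assumptions reconstructed from the auth_hdr line above:

```shell
secret='notARealSecretKey'   # made-up; substitute the user's actual secret key
# string-to-sign: METHOD \n Content-MD5 \n Content-Type \n Date \n
# canonicalized x-amz headers \n resource. Date is empty here because the
# client sent x-amz-date instead.
sig=$(printf 'GET\n\n\n\nx-amz-date:Mon, 14 Apr 2014 11:39:01 +0000\n/' \
        | openssl dgst -sha1 -hmac "$secret" -binary | base64)
echo "$sig"
```

When "calculated digest" and "auth_sign" differ, the gateway and the client disagree on either the secret bytes or the string-to-sign; an un-stripped `\/` escape in the secret is exactly the kind of thing that causes it.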