Re: [ceph-users] Ceph osd is all up and in, but every pg is incomplete
I think there is no other way. :)
-- Yueliang
Sent with Airmail

On March 30, 2015 at 13:17:55, Kai KH Huang (huangk...@lenovo.com) wrote:

Thanks for the quick response, and it seems to work! But what I expect to have is replica number = 3 on two servers (one host stores 2 copies and the other stores the 3rd one, to deal with disk failure rather than only server failure). Is there a simple way to configure that, rather than building a custom CRUSH map?

From: Yueliang [yueliang9...@gmail.com]
Sent: Monday, March 30, 2015 12:04 PM
To: ceph-users@lists.ceph.com; Kai KH Huang
Subject: Re: [ceph-users] Ceph osd is all up and in, but every pg is incomplete

Hi Kai KH,

"ceph -s" reports "493 pgs undersized". I guess you created the pool with the default parameter size=3, but you only have two hosts, so there are not enough hosts to service the pool. You should add hosts, set size=2 when creating the pool, or modify the CRUSH rule.
-- Yueliang
Sent with Airmail

On March 30, 2015 at 11:16:38, Kai KH Huang (huangk...@lenovo.com) wrote:

Hi, all. I'm a newbie to Ceph, and just set up a whole new Ceph cluster (0.87) with two servers. But its status is always warning:

[root@serverA ~]# ceph osd tree
# id  weight  type name      up/down  reweight
-1    62.04   root default
-2    36.4        host serverA
0     3.64            osd.0   up  1
2     3.64            osd.2   up  1
1     3.64            osd.1   up  1
3     3.64            osd.3   up  1
4     3.64            osd.4   up  1
5     3.64            osd.5   up  1
6     3.64            osd.6   up  1
7     3.64            osd.7   up  1
8     3.64            osd.8   up  1
9     3.64            osd.9   up  1
-3    25.64       host serverB
10    3.64            osd.10  up  1
11    2               osd.11  up  1
12    2               osd.12  up  1
13    2               osd.13  up  1
14    2               osd.14  up  1
15    2               osd.15  up  1
16    2               osd.16  up  1
17    2               osd.17  up  1
18    2               osd.18  up  1
19    2               osd.19  up  1
20    2               osd.20  up  1
21    2               osd.21  up  1

[root@serverA ~]# ceph -s
    cluster ???169715
     health HEALTH_WARN 493 pgs degraded; 19 pgs peering; 19 pgs stuck inactive; 512 pgs stuck unclean; 493 pgs undersized
     monmap e1: 2 mons at {serverB=10.??.78:6789/0,serverA=10.?.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
     osdmap e92634: 22 osds: 22 up, 22 in
      pgmap v189018: 512 pgs, 1 pools, 0 bytes data, 0 objects
            49099 MB used, 63427 GB / 63475 GB avail
                 493 active+undersized+degraded
                  19 creating+peering

[root@serverA ~]# rados -p test31 ls
2015-03-30 09:57:18.607143 7f5251fcf700 0 -- :/1005913 10.??.78:6789/0 pipe(0x140a370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x140a600).fault
2015-03-30 09:57:21.610994 7f52484ad700 0 -- 10..80:0/1005913 10..78:6835/27111 pipe(0x140e010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x140e2a0).fault
2015-03-30 10:02:21.650191 7f52482ab700 0 -- 10..80:0/1005913 10.78:6835/27111 pipe(0x7f5238016c80 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5238016f10).fault

* serverA is 10.???.80, serverB is 10..78
* ntpdate is updated
* I tried to remove the pool and re-create it, and cleaned up all objects inside, but no change at all
* firewalls are both shut off

Any clue is welcomed, thanks.
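The 2+1 layout asked about above does require editing the CRUSH map, but the change is small. A minimal sketch of the workflow, assuming the default CRUSH root from the thread; the rule name, ruleset number, and target pool are illustrative:

# Extract and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Add a rule like the following to crushmap.txt: pick 2 hosts, then up
# to 2 OSDs on each; with size=3 the first three choices give a 2+1
# placement across the two servers.
#   rule replicated_2hosts {
#       ruleset 1
#       type replicated
#       min_size 2
#       max_size 3
#       step take default
#       step choose firstn 2 type host
#       step chooseleaf firstn 2 type osd
#       step emit
#   }

# Recompile, inject, and point the pool at the new rule
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
ceph osd pool set test31 crush_ruleset 1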
Re: [ceph-users] Ceph osd is all up and in, but every pg is incomplete
Another strange thing is that the last few (24) pgs never seem to get ready and are stuck at creating (after 6 hours of waiting):

[root@serverA ~]# ceph -s
2015-03-30 17:14:48.720396 7feb5bd7a700 0 -- :/1000964 10.???.78:6789/0 pipe(0x7feb60026120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7feb600263b0).fault
    cluster c09277a4-0eb9-41b1-b27f-a345c0169715
     health HEALTH_WARN 24 pgs peering; 24 pgs stuck inactive; 24 pgs stuck unclean
     monmap e1: 2 mons at {mac0090fa6aaf7a=10.240.212.78:6789/0,mac0090fa6ab68a=10.???.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
     osdmap e102839: 22 osds: 22 up, 22 in
      pgmap v210270: 512 pgs, 1 pools, 0 bytes data, 0 objects
            51633 MB used, 63424 GB / 63475 GB avail
                  24 creating+peering
                 488 active+clean

And I cannot retrieve from serverA the file that I put into the Ceph cluster at serverB:

[root@serverA ~]# rados -p test32 get test.txt test.txt
2015-03-30 17:15:44.014158 7f06951b6700 0 -- 10.???.80:0/1002224 10.???.78:6867/29047 pipe(0x21e0f90 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x21e1220).fault
2015-03-30 17:16:36.066125 7f0694fb4700 0 -- 10.???.80:0/1002224 10..78:6867/29047 pipe(0x7f068000d880 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f068000db10).fault

It looks like it just hangs there forever. Is it waiting for all pgs to be ready? Or is the ceph cluster in an error state?

From: Yueliang [yueliang9...@gmail.com]
Sent: Monday, March 30, 2015 1:50 PM
To: ceph-users@lists.ceph.com; Kai KH Huang
Subject: RE: [ceph-users] Ceph osd is all up and in, but every pg is incomplete

I think there is no other way. :)
-- Yueliang
Sent with Airmail

On March 30, 2015 at 13:17:55, Kai KH Huang (huangk...@lenovo.com) wrote:

Thanks for the quick response, and it seems to work! But what I expect to have is replica number = 3 on two servers (one host stores 2 copies and the other stores the 3rd one, to deal with disk failure rather than only server failure). Is there a simple way to configure that, rather than building a custom CRUSH map?

From: Yueliang [yueliang9...@gmail.com]
Sent: Monday, March 30, 2015 12:04 PM
To: ceph-users@lists.ceph.com; Kai KH Huang
Subject: Re: [ceph-users] Ceph osd is all up and in, but every pg is incomplete

Hi Kai KH,

"ceph -s" reports "493 pgs undersized". I guess you created the pool with the default parameter size=3, but you only have two hosts, so there are not enough hosts to service the pool. You should add hosts, set size=2 when creating the pool, or modify the CRUSH rule.
-- Yueliang
Sent with Airmail

On March 30, 2015 at 11:16:38, Kai KH Huang (huangk...@lenovo.com) wrote:

Hi, all. I'm a newbie to Ceph, and just set up a whole new Ceph cluster (0.87) with two servers.
But its status is always warning:

[root@serverA ~]# ceph osd tree
# id  weight  type name      up/down  reweight
-1    62.04   root default
-2    36.4        host serverA
0     3.64            osd.0   up  1
2     3.64            osd.2   up  1
1     3.64            osd.1   up  1
3     3.64            osd.3   up  1
4     3.64            osd.4   up  1
5     3.64            osd.5   up  1
6     3.64            osd.6   up  1
7     3.64            osd.7   up  1
8     3.64            osd.8   up  1
9     3.64            osd.9   up  1
-3    25.64       host serverB
10    3.64            osd.10  up  1
11    2               osd.11  up  1
12    2               osd.12  up  1
13    2               osd.13  up  1
14    2               osd.14  up  1
15    2               osd.15  up  1
16    2               osd.16  up  1
17    2               osd.17  up  1
18    2               osd.18  up  1
19    2               osd.19  up  1
20    2               osd.20  up  1
21    2               osd.21  up  1

[root@serverA ~]# ceph -s
    cluster ???169715
     health HEALTH_WARN 493 pgs degraded; 19 pgs peering; 19 pgs stuck inactive; 512 pgs stuck unclean; 493 pgs undersized
     monmap e1: 2 mons at {serverB=10.??.78:6789/0,serverA=10.?.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
     osdmap e92634: 22 osds: 22 up, 22 in
      pgmap v189018: 512 pgs, 1 pools, 0 bytes data, 0 objects
            49099 MB used, 63427 GB / 63475 GB avail
                 493 active+undersized+degraded
                  19 creating+peering

[root@serverA ~]# rados -p test31 ls
2015-03-30 09:57:18.607143 7f5251fcf700 0 -- :/1005913 10.??.78:6789/0 pipe(0x140a370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x140a600).fault
2015-03-30 09:57:21.610994 7f52484ad700 0 --
[ceph-users] How to test rbd's Copy-on-Read Feature
Hello All,

I went through the link below and checked that Copy-on-Read is currently supported only in librbd and not in the rbd kernel module.

https://wiki.ceph.com/Planning/Blueprints/Infernalis/rbd%3A_kernel_rbd_client_supports_copy-on-read

Can someone please let me know how to test Copy-on-Read using librbd? What should the IO pattern be to see the performance changes? Any pointers appreciated.

Thank You,
Tanay
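One way to exercise copy-on-read is to clone a populated image and read randomly from the clone through librbd. A minimal sketch, assuming a librbd build with the rbd_clone_copy_on_read option and fio with its rbd engine; the pool, image, and client names are illustrative:

# Parent image with a protected snapshot, then a clone
rbd create --size 10240 rbd/parent
rbd snap create rbd/parent@base
rbd snap protect rbd/parent@base
rbd clone rbd/parent@base rbd/child

# Enable copy-on-read for librbd clients in ceph.conf:
#   [client]
#   rbd clone copy on read = true

# Random reads against the clone via librbd. The first pass promotes
# whole objects from the parent into the clone, so repeat passes
# should show the improvement.
fio --name=cor-test --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=child --rw=randread --bs=4k --iodepth=16 --size=2G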
Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes
On 03/30/2015 01:29 PM, Mark Nelson wrote: This is definitely something that we've discussed, though I don't think anyone has really planned out what a complete solution would look like, including processor affinity, etc. Before I joined Inktank I worked at a supercomputing institute, and one of the projects we worked on was to develop grid computing tools for bioinformatics research. Moving analytics rather than the data was a big topic for us too, since genomics data at least tends to be pretty big.

Interestingly, I work for a supercomputing/education research company and we are thinking about a similar use case and purpose, so it is interesting to know of other people managing resources this way.

- Gurvinder

Potentially ceph could be a very interesting solution for that kind of thing.

Mark

On 03/30/2015 06:20 AM, Gurvinder Singh wrote:

One interesting use case of combining Ceph with computing is running big data jobs on ceph itself. With CephFS coming along, you can run Hadoop/Spark jobs directly on ceph without needing to move your data to compute resources, with data locality support. I am wondering if anyone in the community is looking at combining storage and compute resources from this point of view.

Regards,
Gurvinder

On 03/29/2015 09:19 PM, Nick Fisk wrote:

There's probably a middle ground where you get the best of both worlds. Maybe 2-4 OSDs per compute node alongside dedicated Ceph nodes. That way you get a bit of extra storage and can still use lower-end CPUs, but don't have to worry so much about resource contention.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Martin Millnert
Sent: 29 March 2015 19:58
To: Mark Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

On Thu, Mar 26, 2015 at 12:36:53PM -0500, Mark Nelson wrote: Having said that, small nodes are absolutely more expensive per OSD as far as raw hardware and power/cooling goes.

The smaller the volume manufacturers have on the units, the worse the margin typically is (from the buyer's side). Also, CPUs typically run up a premium the higher you go. I've found a lot of local maxima, optimization-wise, over the past years, both in 12 OSD/U vs 18 OSD/U dedicated storage node setups, for instance. There may be local maxima along colocated low-scale storage/compute nodes, but the one major problem with colocating storage with compute is that you can't scale compute independently from storage efficiently using that building block alone. There may be temporal optimizations in doing so, however (e.g. before you have reached sufficient scale). There's no single optimal answer when you're dealing with 20+ variables to consider... :)

BR,
Martin
Re: [ceph-users] Radosgw authorization failed
Date: Wed, 25 Mar 2015 11:43:44 -0400
From: yeh...@redhat.com
To: neville.tay...@hotmail.co.uk
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Radosgw authorization failed

- Original Message -
From: Neville neville.tay...@hotmail.co.uk
To: ceph-users@lists.ceph.com
Sent: Wednesday, March 25, 2015 8:16:39 AM
Subject: [ceph-users] Radosgw authorization failed

Hi all,

I'm testing a backup product which supports Amazon S3 as a target for archive storage, and I'm trying to set up a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant, but since I was having issues I thought it might need to be the same.

I've tested the radosgw using boto.s3 and it seems to work OK, i.e. I can create a bucket, create a folder, list buckets, etc. The problem is that when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3, and I'm sure the creds are right as it lets me create the initial connection; it just fails when trying to create an object (backup folder).

Here's the extract from the radosgw log:

2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET /:list_bucket:init op
2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET /:list_bucket:verifying op mask
2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7
2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET /:list_bucket:verifying op permissions
2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test mask=49
2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1 mask=49
2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2 mask=49
2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test owner=test perm=1
2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1, policy perm=1, user_perm_mask=1, acl perm=1
2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET /:list_bucket:verifying op params
2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET /:list_bucket:executing
2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2]) start num 1001
2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET /:list_bucket:http status=200
2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0 http_status=200 ==
2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request req=0x7f107000f0e0
2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request req=0x7f107000f6b0
2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request req=0x7f107000f0e0
2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
2015-03-25 15:07:26.517084 7f1058dd7700 20 CONTENT_TYPE=application/octet-stream
2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015 15:07:26 GMT
2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue
2015-03-25 15:07:26.517093 7f1058dd7700 20 HTTP_HOST=test1.devops-os-cog01.devops.local
2015-03-25 15:07:26.517094 7f1058dd7700 20 HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1 Java_HotSpot(TM)_Client_VM/24.55-b03
2015-03-25 15:07:26.517096 7f1058dd7700 20 HTTP_X_AMZ_META_CREATIONTIME=2015-03-25T15:07:26
2015-03-25 15:07:26.517097 7f1058dd7700 20 HTTP_X_AMZ_META_SIZE=88
2015-03-25 15:07:26.517098 7f1058dd7700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD
2015-03-25 15:07:26.517099 7f1058dd7700 20 HTTPS=on
2015-03-25 15:07:26.517100 7f1058dd7700 20
Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes
We have a related topic in CDS about hadoop+ceph (https://wiki.ceph.com/Planning/Blueprints/Infernalis/rgw%3A_Hadoop_FileSystem_Interface_for_a_RADOS_Gateway_Caching_Tier). It doesn't directly solve the data locality problem, but it tries to avoid data migration between different storage clusters. It would be great if big data frameworks like Hadoop and Spark could expose an interface to make ceph or other storage backends aware of the compute job schedule. A new project, Tachyon (tachyon-project.org), is doing something like this.

On Mon, Mar 30, 2015 at 7:20 PM, Gurvinder Singh gurvindersinghdah...@gmail.com wrote:

One interesting use case of combining Ceph with computing is running big data jobs on ceph itself. With CephFS coming along, you can run Hadoop/Spark jobs directly on ceph without needing to move your data to compute resources, with data locality support. I am wondering if anyone in the community is looking at combining storage and compute resources from this point of view.

Regards,
Gurvinder

On 03/29/2015 09:19 PM, Nick Fisk wrote:

There's probably a middle ground where you get the best of both worlds. Maybe 2-4 OSDs per compute node alongside dedicated Ceph nodes. That way you get a bit of extra storage and can still use lower-end CPUs, but don't have to worry so much about resource contention.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Martin Millnert
Sent: 29 March 2015 19:58
To: Mark Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

On Thu, Mar 26, 2015 at 12:36:53PM -0500, Mark Nelson wrote: Having said that, small nodes are absolutely more expensive per OSD as far as raw hardware and power/cooling goes.

The smaller the volume manufacturers have on the units, the worse the margin typically is (from the buyer's side). Also, CPUs typically run up a premium the higher you go. I've found a lot of local maxima, optimization-wise, over the past years, both in 12 OSD/U vs 18 OSD/U dedicated storage node setups, for instance. There may be local maxima along colocated low-scale storage/compute nodes, but the one major problem with colocating storage with compute is that you can't scale compute independently from storage efficiently using that building block alone. There may be temporal optimizations in doing so, however (e.g. before you have reached sufficient scale). There's no single optimal answer when you're dealing with 20+ variables to consider... :)

BR,
Martin

--
Best Regards,
Wheat
Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes
This is definitely something that we've discussed, though I don't think anyone has really planned out what a complete solution would look like, including processor affinity, etc. Before I joined Inktank I worked at a supercomputing institute, and one of the projects we worked on was to develop grid computing tools for bioinformatics research. Moving analytics rather than the data was a big topic for us too, since genomics data at least tends to be pretty big. Potentially ceph could be a very interesting solution for that kind of thing.

Mark

On 03/30/2015 06:20 AM, Gurvinder Singh wrote:

One interesting use case of combining Ceph with computing is running big data jobs on ceph itself. With CephFS coming along, you can run Hadoop/Spark jobs directly on ceph without needing to move your data to compute resources, with data locality support. I am wondering if anyone in the community is looking at combining storage and compute resources from this point of view.

Regards,
Gurvinder

On 03/29/2015 09:19 PM, Nick Fisk wrote:

There's probably a middle ground where you get the best of both worlds. Maybe 2-4 OSDs per compute node alongside dedicated Ceph nodes. That way you get a bit of extra storage and can still use lower-end CPUs, but don't have to worry so much about resource contention.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Martin Millnert
Sent: 29 March 2015 19:58
To: Mark Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

On Thu, Mar 26, 2015 at 12:36:53PM -0500, Mark Nelson wrote: Having said that, small nodes are absolutely more expensive per OSD as far as raw hardware and power/cooling goes.

The smaller the volume manufacturers have on the units, the worse the margin typically is (from the buyer's side). Also, CPUs typically run up a premium the higher you go. I've found a lot of local maxima, optimization-wise, over the past years, both in 12 OSD/U vs 18 OSD/U dedicated storage node setups, for instance. There may be local maxima along colocated low-scale storage/compute nodes, but the one major problem with colocating storage with compute is that you can't scale compute independently from storage efficiently using that building block alone. There may be temporal optimizations in doing so, however (e.g. before you have reached sufficient scale). There's no single optimal answer when you're dealing with 20+ variables to consider... :)

BR,
Martin
Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes
One interesting use case of combining Ceph with computing is running big data jobs on ceph itself. With CephFS coming along, you can run Hadoop/Spark jobs directly on ceph without needing to move your data to compute resources, with data locality support. I am wondering if anyone in the community is looking at combining storage and compute resources from this point of view.

Regards,
Gurvinder

On 03/29/2015 09:19 PM, Nick Fisk wrote:

There's probably a middle ground where you get the best of both worlds. Maybe 2-4 OSDs per compute node alongside dedicated Ceph nodes. That way you get a bit of extra storage and can still use lower-end CPUs, but don't have to worry so much about resource contention.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Martin Millnert
Sent: 29 March 2015 19:58
To: Mark Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

On Thu, Mar 26, 2015 at 12:36:53PM -0500, Mark Nelson wrote: Having said that, small nodes are absolutely more expensive per OSD as far as raw hardware and power/cooling goes.

The smaller the volume manufacturers have on the units, the worse the margin typically is (from the buyer's side). Also, CPUs typically run up a premium the higher you go. I've found a lot of local maxima, optimization-wise, over the past years, both in 12 OSD/U vs 18 OSD/U dedicated storage node setups, for instance. There may be local maxima along colocated low-scale storage/compute nodes, but the one major problem with colocating storage with compute is that you can't scale compute independently from storage efficiently using that building block alone. There may be temporal optimizations in doing so, however (e.g. before you have reached sufficient scale). There's no single optimal answer when you're dealing with 20+ variables to consider... :)

BR,
Martin
[ceph-users] Creating and deploying OSDs in parallel
Hi,

I am planning to modify our deployment script so that it can create and deploy multiple OSDs in parallel, both to the same host and to different hosts. I just wanted to check whether there is any problem with running, say, 'ceph-deploy osd create' in parallel while deploying the cluster.

Thanks & Regards
Somnath
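From the admin node, parallelizing is mostly a matter of backgrounding the invocations. A minimal sketch with hypothetical host and device names; note that ceph-deploy of this era makes no promise about being safe when several instances target the same host at once, so it is worth testing on a scratch cluster first:

#!/bin/bash
# One ceph-deploy run per host:disk pair, backgrounded so the
# prepare/activate steps proceed in parallel.
hosts=(osdhost1 osdhost2 osdhost3)
disks=(/dev/sdb /dev/sdc /dev/sdd)

for h in "${hosts[@]}"; do
  for d in "${disks[@]}"; do
    ceph-deploy osd create "${h}:${d}" &
  done
done
wait   # block until every background deployment has finished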
Re: [ceph-users] CephFS Slow writes with 1MB files
On Sat, Mar 28, 2015 at 10:12 AM, Barclay Jameson almightybe...@gmail.com wrote:

I redid my entire Ceph build, going back to CentOS 7, hoping to get the same performance I did last time. The rados bench test was the best I have ever had, with 740 MB/s write and 1300 MB/s read. This was even better than the first rados bench test, which had performance equal to PanFS. I find that this does not translate to my CephFS. Even with the following tweaking it is still at least twice as slow as PanFS and my first *magical* build (which had absolutely no tweaking):

OSD:
osd_op_threads 8
/sys/block/sd*/queue/nr_requests 4096
/sys/block/sd*/queue/read_ahead_kb 4096

Client:
rsize=16777216
readdir_max_bytes=16777216
readdir_max_entries=16777216

~160 mins to copy 100,000 (1MB) files for CephFS vs ~50 mins for PanFS. Throughput on CephFS is about 10 MB/s vs PanFS 30 MB/s. The strange thing is none of the resources are taxed: CPU, RAM, network, and disks are not even close to being taxed on the client, mon/mds, or osd nodes. The PanFS client node was on a 10Gb network, the same as the CephFS client, but you can see the huge difference in speed.

As per Greg's questions before: there is only one client reading and writing (time cp Small1/* Small2/.), but three clients have cephfs mounted, although they aren't doing anything on the filesystem.

I have done another test where I stream data into a file as fast as the processor can put it there (for (i=0; i < 11; i++){ fprintf(out_file, "I is : %d\n", i); }) and it is faster than PanFS: CephFS writes 16GB in 105 seconds with the above tuning vs 130 seconds for PanFS. Without the tuning it takes 230 seconds for CephFS, although the first build did it in 130 seconds without any tuning.

This leads me to believe the bottleneck is the mds. Does anybody have any thoughts on this? Are there any tuning parameters that I would need to speed up the mds?

This is pretty likely, but 10 creates/second is just impossibly slow. The only other thing I can think of is that you might have had fragmentation enabled before but not now, which might make an impact on a directory with 100k entries. Or else your hardware is just totally wonky, which we've seen in the past, but your server doesn't look quite large enough to be hitting any of the nasty NUMA stuff... but that's something else to look at, which I can't help you with, although maybe somebody else can.

If you're interested in diving into it, and depending on the Ceph version you're running, you can also examine the mds perfcounters (http://ceph.com/docs/master/dev/perf_counters/) and the op history (dump_ops_in_flight etc.) and look for any operations which are noticeably slow.
-Greg

On Fri, Mar 27, 2015 at 4:50 PM, Gregory Farnum g...@gregs42.com wrote:

On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson almightybe...@gmail.com wrote:

Yes, it's the exact same hardware except for the MDS server (although I tried using the MDS on the old node). I have not tried moving the MON back to the old node. My default cache size is mds cache size = 1000. The OSDs (3 of them) have 16 disks with 4 SSD journal disks. I created 2048 PGs for data and metadata:

ceph osd pool create cephfs_data 2048 2048
ceph osd pool create cephfs_metadata 2048 2048

To your point on clients competing against each other... how would I check that?

Do you have multiple clients mounted? Are they both accessing files in the directory(ies) you're testing? Were they accessing the same pattern of files for the old cluster?
If you happen to be running a Hammer RC or something pretty new, you can use the MDS admin socket to explore a bit what client sessions there are and what they have permissions on, and check; otherwise you'll have to figure it out from the client side.
-Greg

Thanks for the input!

On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum g...@gregs42.com wrote:

So this is exactly the same test you ran previously, but now it's on faster hardware and the test is slower? Do you have more data in the test cluster? One obvious possibility is that previously you were working entirely in the MDS' cache, but now you've got more dentries and so it's kicking data out to RADOS and then reading it back in. If you've got the memory (you appear to) you can pump up the mds cache size config option quite dramatically from its default of 100,000. Other things to check are that you've got an appropriately-sized metadata pool, that you've not got clients competing against each other inappropriately, etc.
-Greg

On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson almightybe...@gmail.com wrote:

Oops, I should have said that I am not just writing the data but copying it: time cp Small1/* Small2/*

Thanks,
BJ

On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson almightybe...@gmail.com wrote:

I did a Ceph cluster install 2 weeks ago where I was getting great performance (~= PanFS) where I could write 100,000 1MB files in
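The admin-socket commands Greg refers to are run on the node hosting the daemon. A minimal sketch; the daemon id (mds.a) is illustrative, and exact command availability varies by release:

ceph daemon mds.a perf dump              # all MDS perf counters
ceph daemon mds.a dump_ops_in_flight     # operations currently executing
ceph daemon mds.a dump_historic_ops      # recent ops with per-step timings
ceph daemon mds.a session ls             # connected client sessions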
Re: [ceph-users] SSD Journaling
On Mon, Mar 30, 2015 at 1:01 PM, Garg, Pankaj pankaj.g...@caviumnetworks.com wrote:

Hi,

I'm benchmarking my small cluster with HDDs vs HDDs with SSD journaling. I am using both RADOS bench and the block device (using fio) for testing. I am seeing significant write performance improvements, as expected. I am, however, seeing the reads come out a bit slower on the SSD journaling side. They are not terribly different, but sometimes 10% slower. Is that something other folks have also seen, or do I need some settings to be tuned properly? I'm wondering if accessing 2 drives for reads adds latency and hence the throughput suffers.

You're not reading off of the journal in any case (it's only read on restart). If I were to guess, the SSD journaling is just building up enough dirty data ahead of the backing filesystem that if you do a read it takes a little longer for the data to be readable through the local filesystem. There have been a number of threads here about configuring the journal which you might want to grab out of an archiving system and look at. :)
-Greg

Thanks
Pankaj
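The journal-tuning threads Greg mentions mostly revolve around a handful of filestore options. A minimal sketch of the relevant ceph.conf knobs; the values shown are purely illustrative, not recommendations:

# ceph.conf, [osd] section
[osd]
journal max write bytes = 10485760    # largest batch written to the journal at once
journal max write entries = 1000      # most entries per journal write
filestore max sync interval = 5       # seconds between forced journal-to-fs flushes
filestore min sync interval = 0.01    # lower bound on flush frequency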
[ceph-users] SSD Journaling
Hi,

I'm benchmarking my small cluster with HDDs vs HDDs with SSD journaling. I am using both RADOS bench and the block device (using fio) for testing. I am seeing significant write performance improvements, as expected. I am, however, seeing the reads come out a bit slower on the SSD journaling side. They are not terribly different, but sometimes 10% slower. Is that something other folks have also seen, or do I need some settings to be tuned properly? I'm wondering if accessing 2 drives for reads adds latency and hence the throughput suffers.

Thanks
Pankaj
Re: [ceph-users] SSD Journaling
On 03/30/2015 03:01 PM, Garg, Pankaj wrote:

Hi,

I'm benchmarking my small cluster with HDDs vs HDDs with SSD journaling. I am using both RADOS bench and the block device (using fio) for testing. I am seeing significant write performance improvements, as expected. I am, however, seeing the reads come out a bit slower on the SSD journaling side. They are not terribly different, but sometimes 10% slower. Is that something other folks have also seen, or do I need some settings to be tuned properly? I'm wondering if accessing 2 drives for reads adds latency and hence the throughput suffers.

Hi,

What kind of reads are you seeing the degradation with? Is it consistent with different sizes and random/seq? Any interesting spikes or valleys during the tests?

Thanks
Pankaj
Re: [ceph-users] Is it possible to change the MDS node after its been created
On Mon, Mar 30, 2015 at 3:15 PM, Francois Lafont flafdiv...@free.fr wrote:

Hi,

Gregory Farnum wrote: The MDS doesn't have any data tied to the machine you're running it on. You can either create an entirely new one on a different machine, or simply copy the config file and cephx keyring to the appropriate directories. :)

Sorry to jump into this thread, but how can we *remove* an mds daemon from a ceph cluster? Are the commands below enough?

stop the daemon
rm -r /var/lib/ceph/mds/ceph-$id/
ceph auth del mds.$id

Should we edit something in the mds map to remove the mds once and for all?

As long as you turn on another MDS which takes over the logical rank of the MDS you remove, you don't need to remove anything from the cluster store. Note that if you just copy the directory and keyring to the new location, you shouldn't do the ceph auth del bit either. ;)
-Greg

--
François Lafont
Re: [ceph-users] Is it possible to change the MDS node after its been created
On Mon, Mar 30, 2015 at 1:51 PM, Steve Hindle mech...@gmail.com wrote:

Hi! I mistakenly created my MDS node on the 'wrong' server a few months back. Now I've realized I placed it on a machine lacking IPMI and would like to move it to another node in my cluster. Is it possible to non-destructively move an MDS?

The MDS doesn't have any data tied to the machine you're running it on. You can either create an entirely new one on a different machine, or simply copy the config file and cephx keyring to the appropriate directories. :)
-Greg
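For the "create an entirely new one" route, the manual steps are small. A minimal sketch for pre-systemd releases; the daemon ids (mds.b new, mds.a old) and paths are illustrative:

# On the new node: data dir plus a keyring for the new daemon.
# Add an [mds.b] section with host = <newnode> to ceph.conf first
# so the init script knows about it.
mkdir -p /var/lib/ceph/mds/ceph-b
ceph auth get-or-create mds.b mon 'allow profile mds' \
    osd 'allow rwx' mds 'allow' -o /var/lib/ceph/mds/ceph-b/keyring
service ceph start mds.b        # or run ceph-mds -i b directly

# Once the new daemon reports active, retire the old one
service ceph stop mds.a
ceph auth del mds.a
rm -rf /var/lib/ceph/mds/ceph-a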
[ceph-users] Is it possible to change the MDS node after its been created
Hi!

I mistakenly created my MDS node on the 'wrong' server a few months back. Now I've realized I placed it on a machine lacking IPMI and would like to move it to another node in my cluster. Is it possible to non-destructively move an MDS?

Thanks!
Re: [ceph-users] Is it possible to change the MDS node after its been created
Gregory Farnum wrote:

Sorry to jump into this thread, but how can we *remove* an mds daemon from a ceph cluster? Are the commands below enough?

stop the daemon
rm -r /var/lib/ceph/mds/ceph-$id/
ceph auth del mds.$id

Should we edit something in the mds map to remove the mds once and for all?

As long as you turn on another MDS which takes over the logical rank of the MDS you remove, you don't need to remove anything from the cluster store.

OK, and to simply remove an mds I guess the commands above are enough. ;)

Note that if you just copy the directory and keyring to the new location, you shouldn't do the ceph auth del bit either. ;)

Yes, that seems logical. Thank you Greg. :)

--
François Lafont
Re: [ceph-users] Is it possible to change the MDS node after its been created
Hi,

Gregory Farnum wrote: The MDS doesn't have any data tied to the machine you're running it on. You can either create an entirely new one on a different machine, or simply copy the config file and cephx keyring to the appropriate directories. :)

Sorry to jump into this thread, but how can we *remove* an mds daemon from a ceph cluster? Are the commands below enough?

stop the daemon
rm -r /var/lib/ceph/mds/ceph-$id/
ceph auth del mds.$id

Should we edit something in the mds map to remove the mds once and for all?

--
François Lafont
[ceph-users] One host failure bring down the whole cluster
Hi, all

I have a two-node Ceph cluster, and both nodes run a monitor and OSDs. When they're both up, the OSDs are all up and in, and everything is fine... almost:

[root~]# ceph -s
     health HEALTH_WARN 25 pgs degraded; 316 pgs incomplete; 85 pgs stale; 24 pgs stuck degraded; 316 pgs stuck inactive; 85 pgs stuck stale; 343 pgs stuck unclean; 24 pgs stuck undersized; 25 pgs undersized; recovery 11/153 objects degraded (7.190%)
     monmap e1: 2 mons at {server_b=10.???.78:6789/0,server_a=10.???.80:6789/0}, election epoch 14, quorum 0,1 server_b,server_a
     osdmap e116375: 22 osds: 22 up, 22 in
      pgmap v238656: 576 pgs, 2 pools, 224 MB data, 59 objects
            56175 MB used, 63420 GB / 63475 GB avail
            11/153 objects degraded (7.190%)
                  15 active+undersized+degraded
                  75 stale+active+clean
                   2 active+remapped
                 158 active+clean
                  10 stale+active+undersized+degraded
                 316 incomplete

But if I bring down one server, the whole cluster seems to stop functioning:

[root~]# ceph -s
2015-03-31 10:32:43.848125 7f57e4105700 0 -- :/1017540 10.???.78:6789/0 pipe(0x7f57e0027120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f57e00273b0).fault

This should not happen... Any thoughts?
Re: [ceph-users] One host failure bring down the whole cluster
On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote: Hi, all. I have a two-node Ceph cluster, and both are monitor and osd. When they're both up, osds are all up and in, everything is fine... almost:

Two things.

1 - You *really* need a minimum of three monitors. Ceph cannot form a quorum with just two monitors and you run a risk of split brain.

2 - You also probably have a min size of two set (the default). This means that you need a minimum of two copies of each data object for writes to work. So with just two nodes, if one goes down you can't write to the other.

So:
- Install an extra monitor node - it doesn't have to be powerful; we just use an Intel Celeron NUC for that.
- Reduce your minimum size to 1 (one).
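In command form, the size changes are per pool. A minimal sketch; the pool name rbd is illustrative, and the same commands apply to each pool in the cluster:

ceph osd pool set rbd size 2        # two replicas, one per host
ceph osd pool set rbd min_size 1    # keep accepting I/O with a single host up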
[ceph-users] Hi everyone: can Calamari manage multiple ceph clusters?
[ceph-users] Cannot add OSD node into crushmap or all writes fail
I have this ceph node that will correctly recover into my ceph pool, and performance looks to be normal for the rbd clients. However, a few minutes after finishing recovery, the rbd clients begin to fall over and cannot write data to the pool. I've been trying to figure this out for weeks! None of the logs contain anything relevant at all. If I disable the node in the crushmap, the rbd clients immediately begin writing to the other nodes. Ideas?
Re: [ceph-users] One host failure bring down the whole cluster
On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson lindsay.mathie...@gmail.com wrote:

On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote: Hi, all. I have a two-node Ceph cluster, and both are monitor and osd. When they're both up, osds are all up and in, everything is fine... almost:

Two things.

1 - You *really* need a min of three monitors. Ceph cannot form a quorum with just two monitors and you run a risk of split brain.

You can form quorums with an even number of monitors, and Ceph does so — there's no risk of split brain. The problem with 2 monitors is that a quorum is always 2 — which is exactly what you're seeing right now. You can't run with only one monitor up (assuming you have a non-zero number of them).

2 - You also probably have a min size of two set (the default). This means that you need a minimum of two copies of each data object for writes to work. So with just two nodes, if one goes down you can't write to the other.

Also this.

So:
- Install an extra monitor node - it doesn't have to be powerful; we just use an Intel Celeron NUC for that.
- Reduce your minimum size to 1 (one).

Yep.
-Greg
Re: [ceph-users] CephFS Slow writes with 1MB files
On Sun, Mar 29, 2015 at 1:12 AM, Barclay Jameson almightybe...@gmail.com wrote:

I redid my entire Ceph build, going back to CentOS 7, hoping to get the same performance I did last time. The rados bench test was the best I have ever had, with 740 MB/s write and 1300 MB/s read. This was even better than the first rados bench test, which had performance equal to PanFS. I find that this does not translate to my CephFS. Even with the following tweaking it is still at least twice as slow as PanFS and my first *magical* build (that had absolutely no tweaking):

OSD:
osd_op_threads 8
/sys/block/sd*/queue/nr_requests 4096
/sys/block/sd*/queue/read_ahead_kb 4096

Client:
rsize=16777216
readdir_max_bytes=16777216
readdir_max_entries=16777216

~160 mins to copy 100,000 (1MB) files for CephFS vs ~50 mins for PanFS. Throughput on CephFS is about 10 MB/s vs PanFS 30 MB/s. The strange thing is none of the resources are taxed: CPU, RAM, network, and disks are not even close to being taxed on the client, mon/mds, or osd nodes. The PanFS client node was on a 10Gb network, the same as the CephFS client, but you can see the huge difference in speed.

As per Greg's questions before: there is only one client reading and writing (time cp Small1/* Small2/.), but three clients have cephfs mounted, although they aren't doing anything on the filesystem.

I have done another test where I stream data into a file as fast as the processor can put it there (for (i=0; i < 11; i++){ fprintf(out_file, "I is : %d\n", i); }) and it is faster than PanFS: CephFS writes 16GB in 105 seconds with the above tuning vs 130 seconds for PanFS. Without the tuning it takes 230 seconds for CephFS, although the first build did it in 130 seconds without any tuning.

This leads me to believe the bottleneck is the mds. Does anybody have any thoughts on this? Are there any tuning parameters that I would need to speed up the mds?

Could you enable mds debugging for a few seconds (ceph daemon mds.x config set debug_mds 10; sleep 10; ceph daemon mds.x config set debug_mds 0) and upload /var/log/ceph/mds.x.log somewhere?

Regards,
Yan, Zheng

On Fri, Mar 27, 2015 at 4:50 PM, Gregory Farnum g...@gregs42.com wrote:

On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson almightybe...@gmail.com wrote:

Yes, it's the exact same hardware except for the MDS server (although I tried using the MDS on the old node). I have not tried moving the MON back to the old node. My default cache size is mds cache size = 1000. The OSDs (3 of them) have 16 disks with 4 SSD journal disks. I created 2048 PGs for data and metadata:

ceph osd pool create cephfs_data 2048 2048
ceph osd pool create cephfs_metadata 2048 2048

To your point on clients competing against each other... how would I check that?

Do you have multiple clients mounted? Are they both accessing files in the directory(ies) you're testing? Were they accessing the same pattern of files for the old cluster?
One obvious possibility is that previously you were working entirely in the MDS' cache, but now you've got more dentries and so it's kicking data out to RADOS and then reading it back in. If you've got the memory (you appear to) you can pump up the mds cache size config option quite dramatically from its default of 100,000. Other things to check are that you've got an appropriately-sized metadata pool, that you've not got clients competing against each other inappropriately, etc.
-Greg

On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson almightybe...@gmail.com wrote:

Oops, I should have said that I am not just writing the data but copying it: time cp Small1/* Small2/*

Thanks,
BJ

On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson almightybe...@gmail.com wrote:

I did a Ceph cluster install 2 weeks ago where I was getting great performance (~= PanFS) where I could write 100,000 1MB files in 61 mins (took PanFS 59 mins). I thought I could increase the performance by adding a better MDS server, so I redid the entire build. Now it takes 4 times as long to write the same data as it did before. The only thing that changed was the MDS server. (I even tried moving the MDS back to the old slower node and the performance was the same.) The first install was on CentOS 7. I tried going down to CentOS 6.6 and it's the same results. I use the same scripts to install the OSDs (which I created because I can never get ceph-deploy to behave correctly.
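Raising the cache, per Greg's suggestion, can be done live through the admin socket and persisted in ceph.conf. A minimal sketch; the daemon id and the value are illustrative, and each cached dentry costs memory, so size it against available RAM:

# Check, then raise at runtime
ceph daemon mds.a config get mds_cache_size
ceph daemon mds.a config set mds_cache_size 4000000

# Persist across restarts:
#   [mds]
#   mds cache size = 4000000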
Re: [ceph-users] Where is the systemd files?
The systemd service unit files were imported into the tree, but they have not been added into any upstream packaging yet. See the discussion at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769593 or git log -- systemd. I don't think there are any upstream tickets in Redmine for this yet.

Since Hammer is very close to being released, the service unit files will not be available in the Hammer packages. The earliest we would ship them would be the Infernalis release series.

I've recently added a _with_systemd conditional to the RPM spec (ceph.spec.in) in master in order to support socket directory creation using tmpfiles.d. That same _with_systemd logic could be extended to ship the service unit files on the relevant RPM-based platforms and ship SysV-init scripts on the older platforms (eg RHEL 6). I'm not quite sure how we ought to handle that on Debian-based packages. Is there a way to conditionalize the Debian packaging to use systemd on some versions of the distro, and use upstart on other versions?

- Ken

On 03/26/2015 11:13 PM, Robert LeBlanc wrote:

I understand that Giant should have systemd service files, but I don't see them in the CentOS 7 packages. https://github.com/ceph/ceph/tree/giant/systemd

[ulhglive-root@mon1 systemd]# rpm -qa | grep --color=always ceph
ceph-common-0.93-0.el7.centos.x86_64
python-cephfs-0.93-0.el7.centos.x86_64
libcephfs1-0.93-0.el7.centos.x86_64
ceph-0.93-0.el7.centos.x86_64
ceph-deploy-1.5.22-0.noarch
[ulhglive-root@mon1 systemd]# for i in $(rpm -qa | grep ceph); do rpm -ql $i | grep -i --color=always systemd; done
[nothing returned]

Thanks,
Robert LeBlanc
Re: [ceph-users] Radosgw authorization failed
- Original Message -
From: Neville neville.tay...@hotmail.co.uk
To: Yehuda Sadeh-Weinraub yeh...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Monday, March 30, 2015 6:49:29 AM
Subject: Re: [ceph-users] Radosgw authorization failed

Date: Wed, 25 Mar 2015 11:43:44 -0400
From: yeh...@redhat.com
To: neville.tay...@hotmail.co.uk
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Radosgw authorization failed

- Original Message -
From: Neville neville.tay...@hotmail.co.uk
To: ceph-users@lists.ceph.com
Sent: Wednesday, March 25, 2015 8:16:39 AM
Subject: [ceph-users] Radosgw authorization failed

Hi all,

I'm testing a backup product which supports Amazon S3 as a target for archive storage, and I'm trying to set up a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant, but since I was having issues I thought it might need to be the same.

I've tested the radosgw using boto.s3 and it seems to work OK, i.e. I can create a bucket, create a folder, list buckets, etc. The problem is that when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3, and I'm sure the creds are right as it lets me create the initial connection; it just fails when trying to create an object (backup folder).

Here's the extract from the radosgw log:

2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET /:list_bucket:init op
2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET /:list_bucket:verifying op mask
2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7
2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET /:list_bucket:verifying op permissions
2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test mask=49
2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1 mask=49
2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2 mask=49
2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test owner=test perm=1
2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1, policy perm=1, user_perm_mask=1, acl perm=1
2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET /:list_bucket:verifying op params
2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET /:list_bucket:executing
2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2]) start num 1001
2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET /:list_bucket:http status=200
2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0 http_status=200 ==
2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request req=0x7f107000f0e0
2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request req=0x7f107000f6b0
2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request req=0x7f107000f0e0
2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
2015-03-25 15:07:26.517084 7f1058dd7700 20 CONTENT_TYPE=application/octet-stream
2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015 15:07:26 GMT
2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue
2015-03-25 15:07:26.517093 7f1058dd7700 20 HTTP_HOST=test1.devops-os-cog01.devops.local
2015-03-25 15:07:26.517094 7f1058dd7700 20 HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1