[ceph-users] radosgw + s3 + keystone + Browser-Based POST problem
Hi guys,

We have integrated our radosgw (v0.80.7) with our OpenStack Keystone server (Icehouse) successfully. The normal S3 operations can be executed with the Keystone user's EC2 credentials (EC2_ACCESS_KEY, EC2_SECRET_KEY). The radosgw correctly handles these user credentials, asks Keystone to validate them, and the resulting objects belong to the Keystone tenant/project or the user (the user is a member of the tenant/project).

But the browser-based upload POST [1] doesn't work! The user is not correctly resolved, and the radosgw returns a 403 code. It looks like the S3/Keystone integration doesn't work correctly when an S3 browser-based upload POST is used. In the attached log file (radosgw.log) you can clearly see the user lookup failing and the status being set to 403:

2015-01-29 15:11:30.151157 7f25616fa700 0 User lookup failed!
2015-01-29 15:11:30.151171 7f25616fa700 15 Read RGWCORSConfiguration: <CORSConfiguration><CORSRule><AllowedMethod>POST</AllowedMethod><AllowedOrigin>https://staging.tube.switch.ch</AllowedOrigin><AllowedHeader>*</AllowedHeader></CORSRule></CORSConfiguration>
2015-01-29 15:11:30.151184 7f25616fa700 10 Method POST is supported
2015-01-29 15:11:30.151195 7f25616fa700 2 req 1123:0.013204:s3:POST /:post_obj:http status=403

Is this a bug? Or did we miss something else?

Cheers,
Valery

[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingHTTPPOST.html

--
SWITCH
Valery Tschopp, Software Engineer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: valery.tsch...@switch.ch phone: +41 44 268 1544

radosgw.log (attached; note REQUEST_METHOD=OPTIONS, i.e. the CORS preflight for the POST):

2015-01-29 15:11:30.130054 7f2634cef700 20 enqueued request req=0x7f26040838d0
2015-01-29 15:11:30.130084 7f2634cef700 20 RGWWQ:
2015-01-29 15:11:30.130086 7f2634cef700 20 req: 0x7f26040838d0
2015-01-29 15:11:30.130108 7f2634cef700 10 allocated request req=0x7f26040c58d0
2015-01-29 15:11:30.130200 7f2454ce1700 20 dequeued request req=0x7f26040838d0
2015-01-29 15:11:30.130208 7f2454ce1700 20 RGWWQ: empty
2015-01-29 15:11:30.130303 7f2454ce1700 20 CONTEXT_DOCUMENT_ROOT=/var/www
2015-01-29 15:11:30.130305 7f2454ce1700 20 CONTEXT_PREFIX=
2015-01-29 15:11:30.130306 7f2454ce1700 20 DOCUMENT_ROOT=/var/www
2015-01-29 15:11:30.130307 7f2454ce1700 20 FCGI_ROLE=RESPONDER
2015-01-29 15:11:30.130308 7f2454ce1700 20 GATEWAY_INTERFACE=CGI/1.1
2015-01-29 15:11:30.130308 7f2454ce1700 20 HTTP_ACCEPT=*/*
2015-01-29 15:11:30.130309 7f2454ce1700 20 HTTP_ACCEPT_ENCODING=gzip, deflate, sdch
2015-01-29 15:11:30.130310 7f2454ce1700 20 HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8,it;q=0.6
2015-01-29 15:11:30.130311 7f2454ce1700 20 HTTP_ACCESS_CONTROL_REQUEST_HEADERS=content-type
2015-01-29 15:11:30.130312 7f2454ce1700 20 HTTP_ACCESS_CONTROL_REQUEST_METHOD=POST
2015-01-29 15:11:30.130312 7f2454ce1700 20 HTTP_AUTHORIZATION=
2015-01-29 15:11:30.130313 7f2454ce1700 20 HTTP_CACHE_CONTROL=no-cache
2015-01-29 15:11:30.130314 7f2454ce1700 20 HTTP_CONNECTION=keep-alive
2015-01-29 15:11:30.130314 7f2454ce1700 20 HTTP_HOST=switch-original-staging.os.zhdk.cloud.switch.ch
2015-01-29 15:11:30.130315 7f2454ce1700 20 HTTP_ORIGIN=https://staging.tube.switch.ch
2015-01-29 15:11:30.130316 7f2454ce1700 20 HTTP_PRAGMA=no-cache
2015-01-29 15:11:30.130317 7f2454ce1700 20 HTTP_REFERER=https://staging.tube.switch.ch/channels/04238519/videos
2015-01-29 15:11:30.130318 7f2454ce1700 20 HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36
2015-01-29 15:11:30.130320 7f2454ce1700 20 HTTPS=on
2015-01-29 15:11:30.130321 7f2454ce1700 20 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2015-01-29 15:11:30.130322 7f2454ce1700 20 QUERY_STRING=
2015-01-29 15:11:30.130322 7f2454ce1700 20 REMOTE_ADDR=130.59.17.201
2015-01-29 15:11:30.130323 7f2454ce1700 20 REMOTE_PORT=53901
2015-01-29 15:11:30.130324 7f2454ce1700 20 REQUEST_METHOD=OPTIONS
2015-01-29 15:11:30.130325 7f2454ce1700 20 REQUEST_SCHEME=https
2015-01-29 15:11:30.130326 7f2454ce1700 20 REQUEST_URI=/
2015-01-29 15:11:30.130327 7f2454ce1700 20 SCRIPT_FILENAME=/var/www/radosgw.fcgi
2015-01-29 15:11:30.130328 7f2454ce1700 20 SCRIPT_NAME=/
2015-01-29 15:11:30.130329 7f2454ce1700 20 SCRIPT_URI=https://switch-original-staging.os.zhdk.cloud.switch.ch/
2015-01-29 15:11:30.130330 7f2454ce1700 20 SCRIPT_URL=/
2015-01-29 15:11:30.130331 7f2454ce1700 20 SERVER_ADDR=86.119.32.13
2015-01-29 15:11:30.130332 7f2454ce1700 20 SERVER_ADMIN=cl...@switch.ch
2015-01-29 15:11:30.130333 7f2454ce1700 20 SERVER_NAME=switch-original-staging.os.zhdk.cloud.switch.ch
2015-01-29 15:11:30.130334 7f2454ce1700 20 SERVER_PORT=443
2015-01-29 15:11:30.130334 7f2454ce1700 20 SERVER_PROTOCOL=HTTP/1.1
2015-01-29 15:11:30.130335 7f2454ce1700 20 SERVER_SIGNATURE=
2015-01-29 15:11:30.130350 7f2454ce1700 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
2015-01-29 15:11:30.130351 7f2454ce1700 20 SSL_TLS_SNI=switch-original-staging.os.zhdk.cloud.switch.ch
2015-01-29 15:11:30.130352
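For context, the policy and signature fields that the browser form submits are computed like this (a minimal sketch assuming AWS signature v2 as described in [1]; the bucket name, expiry, conditions and credential values below are illustrative, not taken from the report above):

# Sketch: build the policy/signature pair for an S3 browser-based POST upload,
# per http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingHTTPPOST.html
import base64
import hashlib
import hmac
import json

EC2_ACCESS_KEY = 'access-key-from-keystone'   # placeholder values
EC2_SECRET_KEY = 'secret-key-from-keystone'

policy_document = {
    "expiration": "2015-01-30T12:00:00.000Z",
    "conditions": [
        {"bucket": "switch-original-staging"},
        ["starts-with", "$key", "uploads/"],
        {"acl": "private"},
    ],
}

# The policy form field is the base64-encoded JSON policy document ...
policy = base64.b64encode(json.dumps(policy_document).encode('utf-8'))
# ... and the signature is the base64-encoded HMAC-SHA1 of that base64
# string, keyed with the secret key.
signature = base64.b64encode(
    hmac.new(EC2_SECRET_KEY.encode('utf-8'), policy, hashlib.sha1).digest())

# These become the AWSAccessKeyId, policy and signature fields of the HTML
# form that the browser POSTs to the bucket endpoint.
print("AWSAccessKeyId: %s" % EC2_ACCESS_KEY)
print("policy: %s" % policy.decode('utf-8'))
print("signature: %s" % signature.decode('utf-8'))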
Re: [ceph-users] Sizing SSD's for ceph
Hi,

Am 29.01.2015 07:53, schrieb Christian Balzer:
> On Thu, 29 Jan 2015 01:30:41 +0000 Ramakrishna Nishtala (rnishtal) wrote:
>> * Per my understanding, once writes are complete to the journal, the data is read back from the journal before being written to the data disk. Does this mean we have to test not just sync/async writes but also reads (random/seq?) in order to correctly size them?
>
> You might want to read this thread:
> https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg12952.html
>
> Assuming this didn't change (and just looking at my journal SSDs and OSD HDDs with atop, I don't think so), your writes go to the HDDs pretty much in parallel. In either case, an SSD that can _write_ fast enough to satisfy your needs will definitely have no problems reading fast enough.

Because the data is still in the cache (RAM), there are only marginal reads from the journal SSD. iostat from a journal SSD:

Device:  tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sdc      304,45  0,16       82750,46   29544    15518960008

I would say: if you see many more reads, you have too little memory.

Udo
Re: [ceph-users] RGW region metadata sync prevents writes to non-master region
On Wed, Jan 28, 2015 at 8:04 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
> On 29/01/15 13:58, Mark Kirkwood wrote:
>> However if I try to write to eu-west I get:
>
> Sorry - that should have said: However if I try to write to eu-*east* I get:
>
> The actual code (see below) is connecting to the endpoint for eu-east (ceph4:80), so seeing it redirected to us-*west* is pretty strange!

The bucket creation is synchronous, and is sent to the master region for completion. Not sure why it actually fails; that's what the master region sends back. What does the corresponding log at the master region show?

Yehuda

--- code ---

import boto
import boto.s3.connection

access_key = 'the key'
secret_key = 'the secret'

conn = boto.connect_s3(
    aws_access_key_id = access_key,
    aws_secret_access_key = secret_key,
    host = 'ceph4',
    is_secure = False,   # uncomment if you are not using ssl
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('bucket1', location='eu')
key = bucket.new_key('hello.txt')
key.set_contents_from_string('Hello World!')
Re: [ceph-users] No auto-mount of OSDs after server reboot
On Thu, 29 Jan 2015 03:05:41 PM Alexis KOALLA wrote:
> Hi, Today we encountered an issue in our Ceph cluster in the lab.
> Issue: The servers that host the OSDs rebooted, and we observed that after the reboot the OSD devices are not auto-mounted, so we need to perform the mount manually and then start the OSD as below:
> 1- [root@osd.0] mount /dev/sdb2 /var/lib/ceph/osd/ceph-0
> 2- [root@osd.0] start ceph-osd id=0

As far as I'm aware, Ceph does not handle mounting of the base filesystem - it's up to you to create an fstab entry for it. The OSD should autostart, but it will of course fail if the filesystem is not mounted.
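For example, a minimal fstab entry for the mount above might look like this (a sketch assuming an XFS filesystem; adjust the device, filesystem type and options to your setup):

/dev/sdb2  /var/lib/ceph/osd/ceph-0  xfs  defaults,noatime,inode64  0 0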
[ceph-users] No auto-mount of OSDs after server reboot
Hi,

Today we encountered an issue in our Ceph cluster in the lab.

Issue: The servers that host the OSDs rebooted, and we observed that after the reboot the OSD devices are not auto-mounted; we need to perform the mount manually and then start the OSD as below:

1- [root@osd.0] mount /dev/sdb2 /var/lib/ceph/osd/ceph-0
2- [root@osd.0] start ceph-osd id=0

After performing the two commands above the OSD is up again.

The question: Is this the normal behaviour of an OSD server, or is something wrong in our configuration? Any help or idea will be appreciated.

Thanks and regards
Alex
Re: [ceph-users] RGW region metadata sync prevents writes to non-master region
On 30/01/15 06:31, Yehuda Sadeh wrote:
> On Wed, Jan 28, 2015 at 8:04 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
>> On 29/01/15 13:58, Mark Kirkwood wrote:
>>> However if I try to write to eu-west I get:
>>
>> Sorry - that should have said: However if I try to write to eu-*east* I get:
>>
>> The actual code (see below) is connecting to the endpoint for eu-east (ceph4:80), so seeing it redirected to us-*west* is pretty strange!
>
> The bucket creation is synchronous, and is sent to the master region for completion. Not sure why it actually fails; that's what the master region sends back. What does the corresponding log at the master region show?

The log from us-west (ceph1) is below. It looks to be failing because the user does not exist. That is reasonable - I've created the user in us-*east* and it has been replicated to eu-east... What is puzzling is why it is going to that zone (instead of us-east). I'll include the region json below too (in case there is something obviously dumb in them)!

$ tail radosgw.log
2015-01-29 21:23:05.260158 7f9f66f7d700 1 == starting new request req=0x7f9fa802b390 =
2015-01-29 21:23:05.260173 7f9f66f7d700 2 req 1:0.15::PUT /bucket1/::initializing
2015-01-29 21:23:05.260178 7f9f66f7d700 10 host=ceph1 rgw_dns_name=ceph1
2015-01-29 21:23:05.260220 7f9f66f7d700 10 s->object=NULL s->bucket=bucket1
2015-01-29 21:23:05.260230 7f9f66f7d700 2 req 1:0.72:s3:PUT /bucket1/::getting op
2015-01-29 21:23:05.260241 7f9f66f7d700 2 req 1:0.83:s3:PUT /bucket1/:create_bucket:authorizing
2015-01-29 21:23:05.260282 7f9f66f7d700 20 get_obj_state: rctx=0x7f9fac0280a0 obj=.us-west.users:eu-east key state=0x7f9fac028380 s->prefetch_data=0
2015-01-29 21:23:05.260291 7f9f66f7d700 10 cache get: name=.us-west.users+eu-east key : miss
2015-01-29 21:23:05.261188 7f9f66f7d700 10 cache put: name=.us-west.users+eu-east key
2015-01-29 21:23:05.261194 7f9f66f7d700 10 adding .us-west.users+eu-east key to cache LRU end
2015-01-29 21:23:05.261207 7f9f66f7d700 5 error reading user info, uid=eu-east key can't authenticate
2015-01-29 21:23:05.261210 7f9f66f7d700 10 failed to authorize request
2015-01-29 21:23:05.261237 7f9f66f7d700 2 req 1:0.001079:s3:PUT /bucket1/:create_bucket:http status=403
2015-01-29 21:23:05.261240 7f9f66f7d700 1 == req done req=0x7f9fa802b390 http_status=403 ==

$ cat us.json
{ "name": "us",
  "api_name": "us",
  "is_master": "true",
  "endpoints": [ "http:\/\/ceph2:80\/", "http:\/\/ceph1:80\/" ],
  "master_zone": "us-east",
  "zones": [
    { "name": "us-east", "endpoints": [ "http:\/\/ceph2:80\/" ], "log_meta": "true", "log_data": "true" },
    { "name": "us-west", "endpoints": [ "http:\/\/ceph1:80\/" ], "log_meta": "true", "log_data": "true" } ],
  "placement_targets": [ { "name": "default-placement", "tags": [] } ],
  "default_placement": "default-placement" }

$ cat eu.json
{ "name": "eu",
  "api_name": "eu",
  "is_master": "false",
  "endpoints": [ "http:\/\/ceph4:80\/", "http:\/\/ceph3:80\/" ],
  "master_zone": "eu-east",
  "zones": [
    { "name": "eu-east", "endpoints": [ "http:\/\/ceph4:80\/" ], "log_meta": "true", "log_data": "true" },
    { "name": "eu-west", "endpoints": [ "http:\/\/ceph3:80\/" ], "log_meta": "true", "log_data": "true" } ],
  "placement_targets": [ { "name": "default-placement", "tags": [] } ],
  "default_placement": "default-placement" }
Re: [ceph-users] RGW region metadata sync prevents writes to non-master region
What does your regionmap look like? Is it updated correctly on all zones?

On Thu, Jan 29, 2015 at 1:42 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
> [previous message, including the radosgw.log excerpt and region json, quoted in full - snipped]
[ceph-users] keyvaluestore backend metadata overhead
Hi, we've been experimenting with the keyvaluestore backend, and have found that on every object write (e.g. with `rados put`), a single transaction is issued containing an additional 9 KeyValueDB writes beyond those which constitute the object data. Given the key names, these are clearly all metadata of some sort, but this poses a problem when the objects themselves are very small. Given the default strip block size of 4 KiB, with objects of 36 KiB or less, half or more of all key-value store writes are metadata writes (a 36 KiB object needs nine 4 KiB data strips, matching the nine metadata writes). With objects of 4 KiB or less, the metadata overhead grows to 90%+; see the rough calculation below. Is there any way to reduce the number of metadata rows which must be written with each object?

(Alternatively, if there is a way to convince the OSD to issue multiple concurrent write transactions, that would also help. But even with keyvaluestore op threads set as high as 64, and `rados bench` issuing 64 concurrent writes, we never see more than a single active write transaction on the (multithread-capable) backend. Is there some other option we're missing?)
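[Editor's note: to make the arithmetic above concrete, a back-of-the-envelope sketch, assuming exactly nine metadata writes per transaction and 4 KiB strips as observed in the message:]

# Rough metadata-overhead estimate for the keyvaluestore backend:
# 9 metadata writes per object write, plus ceil(size / strip) data strips.
import math

STRIP = 4096        # default strip block size, 4 KiB
META_WRITES = 9     # observed per-transaction metadata writes

for size in (4096, 36864, 4 * 1024 * 1024):
    data_writes = max(1, int(math.ceil(size / float(STRIP))))
    overhead = META_WRITES / float(META_WRITES + data_writes)
    print("%8d B object: %4d data writes, metadata overhead %.0f%%"
          % (size, data_writes, overhead * 100))

# Output: 4 KiB -> 90%, 36 KiB -> 50%, 4 MiB -> ~1%, matching the
# figures claimed above.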
[ceph-users] radosgw (0.87) and multipart upload (result object size = 0)
Hi,

We're experiencing some issues with our radosgw setup. Today we tried to copy a bunch of objects between two separate clusters (using our own tool built on top of the Java S3 API). All went smoothly until we started copying large objects (200 GB+). We can see that our code handles this case correctly: it started a multipart upload (s3.initiateMultipartUpload), then uploaded all the parts in serial mode (s3.uploadPart) and finally completed the upload (s3.completeMultipartUpload). When we checked the consistency of the two clusters, we found that we have a lot of zero-sized objects (which turn out to be our large objects).

I've made a more verbose log from radosgw: two requests (put_obj, complete_multipart) - https://gist.github.com/anonymous/840e0aee5a7ce0326368 (all finished with 200)

radosgw-admin object stat output: https://gist.github.com/anonymous/2b6771bbbad3021364e2

We've tried to upload these objects several times without any luck.

# radosgw --version
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Thanks in advance.

--
Best regards,
Gleb M Borisov
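[Editor's note: for comparison, the same initiate/uploadPart/complete flow with a stock Python client (boto, as used elsewhere on this list) looks roughly like this - a sketch with placeholder endpoint, credentials, bucket, file and part size, which may help rule the homebrew client in or out:]

# Sketch: serial multipart upload against radosgw using boto.
# Placeholders throughout; parts (except the last) must be >= 5 MiB.
import boto
import boto.s3.connection
from io import BytesIO

conn = boto.connect_s3(
    aws_access_key_id='the key',
    aws_secret_access_key='the secret',
    host='rgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.get_bucket('mybucket')
mp = bucket.initiate_multipart_upload('bigobject')   # initiateMultipartUpload
part_size = 64 * 1024 * 1024                         # 64 MiB per part

with open('/path/to/bigobject', 'rb') as fp:
    part_num = 0
    while True:
        chunk = fp.read(part_size)
        if not chunk:
            break
        part_num += 1
        mp.upload_part_from_file(BytesIO(chunk), part_num)   # uploadPart
mp.complete_upload()                                 # completeMultipartUpload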
Re: [ceph-users] error in sys.exitfunc
Please advise.

Thanks,
-Karl

From: Blake, Karl D
Sent: Monday, January 19, 2015 7:23 PM
To: 'ceph-us...@ceph.com'
Subject: error in sys.exitfunc

Anytime I run ceph-deploy I get the above error. Can you help resolve it?

Thanks,
-Karl
Re: [ceph-users] error in sys.exitfunc
The error is the same as in this post: http://www.spinics.net/lists/ceph-devel/msg21388.html

From: Blake, Karl D
Sent: Tuesday, January 20, 2015 4:29 AM
To: ceph-us...@ceph.com
Subject: RE: error in sys.exitfunc

Please advise.

Thanks,
-Karl

From: Blake, Karl D
Sent: Monday, January 19, 2015 7:23 PM
To: 'ceph-us...@ceph.com'
Subject: error in sys.exitfunc

Anytime I run ceph-deploy I get the above error. Can you help resolve it?

Thanks,
-Karl
[ceph-users] Deploying ceph using Dell equallogic storage arrays
Dear concerned,

Can I use Dell's EqualLogic storage arrays (model PS-4110) to configure different OSDs on these storage arrays (maybe by creating different volumes)? If this is possible, how should I set about deploying Ceph in my system (some user guide or introductory document would be nice)?

I am deploying my OpenStack cloud, and currently these storage blades can be configured as Cinder volumes; I have iSCSI access to my storage arrays from all my blades. I don't have any real experience with Ceph, but I know that normal blade servers can easily be configured as Ceph storage clusters in HA mode with monitors et al. What I want is to use my storage arrays as Ceph storage clusters.

Warm regards,
Khan, Imran
[ceph-users] Help:mount error
Hi:

I have completed the installation of the ceph cluster, and the ceph health is OK:

    cluster 15ee68b9-eb3c-4a49-8a99-e5de64449910
     health HEALTH_OK
     monmap e1: 1 mons at {ceph01=10.194.203.251:6789/0}, election epoch 1, quorum 0 ceph01
     mdsmap e2: 0/0/1 up
     osdmap e16: 2 osds: 2 up, 2 in
      pgmap v729: 92 pgs, 4 pools, 136 MB data, 46 objects
            23632 MB used, 31172 MB / 54805 MB avail
            92 active+clean

But when I mount from the client, the error is: mount error 5 = Input/output error. I have tried lots of things, e.g. disabling SELinux, updating the kernel... Could anyone help me resolve it?

Thanks!
Jason

--
Re: [ceph-users] RGW region metadata sync prevents writes to non-master region
On 30/01/15 11:08, Yehuda Sadeh wrote:
> What does your regionmap look like? Is it updated correctly on all zones?

Regionmap listed below - checking it on all 4 zones produces exactly the same output (the md5sum is the same):

{ "regions": [
    { "key": "eu",
      "val": { "name": "eu",
        "api_name": "eu",
        "is_master": "false",
        "endpoints": [ "http:\/\/ceph4:80\/", "http:\/\/ceph3:80\/" ],
        "master_zone": "eu-east",
        "zones": [
          { "name": "eu-east", "endpoints": [ "http:\/\/ceph4:80\/" ], "log_meta": "true", "log_data": "true", "bucket_index_max_shards": 0 },
          { "name": "eu-west", "endpoints": [ "http:\/\/ceph3:80\/" ], "log_meta": "true", "log_data": "true", "bucket_index_max_shards": 0 } ],
        "placement_targets": [ { "name": "default-placement", "tags": [] } ],
        "default_placement": "default-placement" } },
    { "key": "us",
      "val": { "name": "us",
        "api_name": "us",
        "is_master": "true",
        "endpoints": [ "http:\/\/ceph2:80\/", "http:\/\/ceph1:80\/" ],
        "master_zone": "us-east",
        "zones": [
          { "name": "us-east", "endpoints": [ "http:\/\/ceph2:80\/" ], "log_meta": "true", "log_data": "true", "bucket_index_max_shards": 0 },
          { "name": "us-west", "endpoints": [ "http:\/\/ceph1:80\/" ], "log_meta": "true", "log_data": "true", "bucket_index_max_shards": 0 } ],
        "placement_targets": [ { "name": "default-placement", "tags": [] } ],
        "default_placement": "default-placement" } } ],
  "master_region": "us",
  "bucket_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 },
  "user_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 } }
Re: [ceph-users] keyvaluestore backend metadata overhead
Hi Chris,

[Moving this thread to ceph-devel, which is probably a bit more appropriate.]

On Thu, 29 Jan 2015, Chris Pacejo wrote:
> Hi, we've been experimenting with the keyvaluestore backend, and have found that on every object write (e.g. with `rados put`), a single transaction is issued containing an additional 9 KeyValueDB writes beyond those which constitute the object data. Given the key names, these are clearly all metadata of some sort, but this poses a problem when the objects themselves are very small. Given the default strip block size of 4 KiB, with objects of 36 KiB or less, half or more of all key-value store writes are metadata writes. With objects of 4 KiB or less, the metadata overhead grows to 90%+. Is there any way to reduce the number of metadata rows which must be written with each object?

There is a level (or two) of indirection in KeyValueStore's GenericObjectMap that is there to allow object cloning. I wonder if we will want to facilitate a backend that doesn't implement clone and can only be used for pools that disallow clone and snap operations. There is also some key consolidation in the OSD layer that we talked about in the Wednesday performance call that will cut this down some!

> (Alternatively, if there is a way to convince the OSD to issue multiple concurrent write transactions, that would also help. But even with keyvaluestore op threads set as high as 64, and `rados bench` issuing 64 concurrent writes, we never see more than a single active write transaction on the (multithread-capable) backend. Is there some other option we're missing?)

sage
[ceph-users] mon leveldb loss
Hi, I'm hoping desperately that someone can help. I have a critical issue with a tiny 'cluster'...

There was a power glitch earlier today (not an outage, might have been a brownout; some things went down, others didn't) and I came home to a CPU machine check exception on the singular host on which I keep a trio of ceph monitors. No option but to hard reset. When the system came back up, the monitors didn't. Each mon is reporting possible corruption of their leveldb stores; files are missing - one might surmise an fsck decided to discard them. See the attached txt files for the ceph-mon output and corresponding store.db directory listings.

Is there any way to recover the leveldb for the monitors? I am more than capable and willing to dig into the structure of these files - or any similar measures - if necessary. Perhaps correlate a complete picture between the data files that are available?

I do have a relevant backup of the monitor data but it is now three months old. I would prefer not to have to resort to this if there is any chance of recovering monitor operability by other means. Also, what would the consequences be of restoring such a backup when the (12TB worth of) OSDs are perfectly fine and contain the latest up-to-date pg associations? Would there be a risk of data loss?

Unfortunately I don't have any backups of the actual user data (being poor, scraping along on a shoestring budget, not exactly conducive to anything approaching an ideal hardware setup), unless one counts a set of old disks from a previously failed cluster from six months ago. My last recourse will likely be to try to scavenge and piece together my most important files from whatever I find on the OSDs. Far from an exciting prospect but I am seriously desperate.

I would be terribly grateful for any input.

Mike

2015-01-29 19:49:30.590913 7fa66458d7c0 0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18788
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
2015-01-29 19:49:37.542790 7fa66458d7c0 -1 failed to create new leveldb store

2015-01-29 19:49:43.279940 7f03e8ec87c0 0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18846
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
2015-01-29 19:49:50.708742 7f03e8ec87c0 -1 failed to create new leveldb store

2015-01-29 19:49:47.866736 7fb6aeebe7c0 0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18869
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
2015-01-29 19:49:54.935436 7fb6aeebe7c0 -1 failed to create new leveldb store

mon/unimatrix-0/store.db/:
total 42160
-rw-r--r-- 1 root root       57 Aug 24 14:59 LOG
-rw-r--r-- 1 root root        0 Aug 24 14:59 LOCK
drwxr-xr-x 3 root root       80 Aug 24 14:59 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051297.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054697.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054744.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054790.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054851.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054858.ldb
-rw-r--r-- 1 root root 42568979 Jan 29 14:23 MANIFEST-399002
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-2/store.db/:
total 42180
-rw-r--r-- 1 root root       57 Aug 24 15:09 LOG
-rw-r--r-- 1 root root        0 Aug 24 15:09 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:09 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051311.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054711.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054758.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054804.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054865.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054872.ldb
-rw-r--r-- 1 root root 42589118 Jan 29 14:23 MANIFEST-399004
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-1/store.db/:
total 42180
-rw-r--r-- 1 root root        0 Aug 24 15:03 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:03 ..
-rw-r--r-- 1 root root       57 Aug 24 15:03 LOG
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051308.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054708.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054755.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054801.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054862.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054869.ldb
-rw-r--r-- 1 root root     4254 Jan 29 14:23
[ceph-users] Question about ceph class usage
Hello,

I found the documentation on ceph class usage very sparse; below is the only piece that comes close to addressing my needs:
http://ceph.com/rados/dynamic-object-interfaces-with-lua/

But some questions are still confusing me:

1. How does the OSD load the class lib? Or, what is the process for an OSD daemon to load a customized class lib? I checked my OSD log file (/var/log/ceph/ceph-osd.2.log) and I can't find a log message about loading cls_hello. Does that mean the hello class lib hasn't been loaded by the OSD daemon yet? But I can see that 'libcls_hello.so' is really under the /usr/lib64/rados-classes folder of that OSD.

2. Suppose I have an object named testobj stored on OSD0 and OSD1. What will happen if I call rados_exec(..., testobj, hello, say_hello, ...) on the client side? Will the say_hello() function be called twice, on OSD0 and OSD1 respectively?

--
Den
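[Editor's note: for reference, a minimal sketch of invoking a class method from a client. This assumes a python-rados build that exposes the Ioctx.execute() wrapper (newer bindings; it sits on top of the same machinery as the C rados_exec() call the question refers to). Pool and object names are examples only:]

# Sketch: calling an OSD class method from a client via python-rados.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')
try:
    # The call is directed at the OSD currently acting as primary for
    # 'testobj'; it is not executed independently on each replica.
    ret, out = ioctx.execute('testobj', 'hello', 'say_hello', b'')
    print(out)
finally:
    ioctx.close()
    cluster.shutdown()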
[ceph-users] error in sys.exitfunc
Anytime I run ceph-deploy I get the above error. Can you help resolve it?

Thanks,
-Karl
Re: [ceph-users] RGW region metadata sync prevents writes to non-master region
On Thu, Jan 29, 2015 at 3:27 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
> On 30/01/15 11:08, Yehuda Sadeh wrote:
>> What does your regionmap look like? Is it updated correctly on all zones?
>
> Regionmap listed below - checking it on all 4 zones produces exactly the same output (the md5sum is the same):
>
> [eu region snipped]
>
>     { "key": "us",
>       "val": { "name": "us",
>         "api_name": "us",
>         "is_master": "true",
>         "endpoints": [ "http:\/\/ceph2:80\/", "http:\/\/ceph1:80\/" ],

Note that you have ceph1:80 specified as an endpoint to the region. This is then used for the bucket creation. This one should only include the master endpoint.

Yehuda

>         "master_zone": "us-east",
>         [remainder of regionmap snipped]
Re: [ceph-users] radosgw (0.87) and multipart upload (result object size = 0)
I am curious whether the object can be uploaded without multipart upload, so we can determine which part is wrong.

On 21 January 2015 at 09:15, Gleb Borisov borisov.g...@gmail.com wrote:
> [original message quoted in full - snipped]

--
Dong Yuan
Email: yuandong1...@gmail.com
Re: [ceph-users] CEPH BackUPs
On Fri, 30 Jan 2015 01:22:53 +0200 Georgios Dimitrakakis wrote:
> Urged by a previous post by Mike Winfield, where he suffered a leveldb loss, I would like to know which files are critical for CEPH operation and must be backed up regularly, and how are you people doing it?

Aside from probably being quite hard/disruptive to back up a monitor leveldb, it would also be quite pointless, as it constantly changes. This is why one has at least 3 monitors on different machines, on different UPS-backed circuits, storing things on SSDs that are also power-failure proof.

And if a monitor gets destroyed like that, the official fix suggested by the Ceph developers is to re-create it from scratch and let it catch up from the good monitors.

That being said, aside from a backup of the actual data on the cluster (which is another challenge), one wonders if in Mike's case an RBD FSCK of sorts could be created that is capable of restoring things based on the actual data still on the OSDs.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
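[Editor's note: that re-create-from-scratch procedure looks roughly like this - a sketch following the manual monitor removal/addition steps in the docs. It presumes at least one surviving monitor with quorum (which is exactly what Mike's single-host setup lacked); the mon id and paths are examples:]

# On a surviving monitor host: drop the dead mon from the map
ceph mon remove unimatrix-0

# Grab the current monmap and mon. keyring
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring

# Rebuild the mon's store and re-add it; it then syncs from the others
ceph-mon -i unimatrix-0 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
ceph mon add unimatrix-0 <ip:port>
start ceph-mon id=unimatrix-0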
Re: [ceph-users] radosgw (0.87) and multipart upload (result object size = 0)
I assume that the problem is not with the object itself, but with one of the upload mechanisms (either the client, or rgw, or both). I would be curious, however, to see if a different S3 client (not the homebrew one) could upload the object correctly using multipart upload.

Yehuda

On Thu, Jan 29, 2015 at 7:54 PM, Dong Yuan yuandong1...@gmail.com wrote:
> I am curious whether the object can be uploaded without multipart upload, so we can determine which part is wrong.
>
> [earlier messages snipped]

--
Dong Yuan
Email: yuandong1...@gmail.com
Re: [ceph-users] radosgw (0.87) and multipart upload (result object size = 0)
On Tue, Jan 20, 2015 at 5:15 PM, Gleb Borisov borisov.g...@gmail.com wrote:
> [original message quoted in full - snipped]

It's hard to say much from these specific logs. Maybe you could provide some extra log that includes the http headers of the requests, and also add 'debug ms = 1'.

Thanks,
Yehuda
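[Editor's note: for reference, bumping the gateway logging would look something like this in ceph.conf - a sketch, since the section name depends on how the radosgw instance is named in your setup; restart the gateway afterwards:]

[client.radosgw.gateway]
    debug rgw = 20
    debug ms = 1
    log file = /var/log/ceph/radosgw.log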
[ceph-users] CEPH BackUPs
Urged by a previous post by Mike Winfield, where he suffered a leveldb loss, I would like to know which files are critical for CEPH operation and must be backed up regularly, and how are you people doing it?

Any pointers much appreciated!

Regards,

G.
Re: [ceph-users] RGW region metadata sync prevents writes to non-master region
On 30/01/15 12:34, Yehuda Sadeh wrote:
> Note that you have ceph1:80 specified as an endpoint to the region. This is then used for the bucket creation. This one should only include the master endpoint.

Cool, thanks - I was unclear about which endpoint(s) should be listed for a region. I'll change 'em and try again.

Cheers

Mark