[ceph-users] ceph df full allocation
Is there a way to see how much data is allocated, as opposed to just what is used? For example, this 20 GB image is only taking up 8 GB. I'd like to see a df with the full allocation of images.

root@ceph1:~# rbd --image vm-101-disk-1 info
rbd image 'vm-101-disk-1':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.105c2ae8944a
        format: 2
        features: layering

root@ceph1:~# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    66850G     66842G     8136M        0.01
POOLS:
    NAME     ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd      2      2563M     0         22280G        671
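I am not aware of a built-in report for provisioned (as opposed to used) space in this release, but as an illustration of what is being asked for, the python-rados/python-rbd bindings can sum the provisioned size of every image in a pool for comparison with what ceph df reports. The pool name 'rbd' and the conffile path below are assumptions; adjust them for your cluster.

# Hedged sketch: total provisioned size of all RBD images in one pool.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # path is an assumption
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')                   # pool name is an assumption
    try:
        total = 0
        for name in rbd.RBD().list(ioctx):
            image = rbd.Image(ioctx, name)
            try:
                total += image.size()                   # provisioned bytes, not bytes written
            finally:
                image.close()
        print("provisioned: %.1f GB" % (total / float(1024 ** 3)))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()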
[ceph-users] old osds take much longer to start than newer osd
Hi guys,

I've been using Ceph for a long time now, since Bobtail, and have always upgraded every few weeks/months to the latest stable release. Of course I also removed some OSDs and added new ones. During the last few upgrades (I just upgraded from 80.6 to 80.8) I noticed that old OSDs take much longer to start up than comparable newer OSDs (same amount of data/disk usage, same kind of storage + journal backing device (SSD), same weight, same number of PGs, ...). I observed the same behavior earlier but just didn't really care about it. Here are the relevant log entries (the host of osd.0 and osd.15 has less CPU power than the others):

old osds (average pgs load time: 1.5 minutes)
2015-02-27 13:44:23.134086 7ffbfdcbe780 0 osd.0 19323 load_pgs
2015-02-27 13:49:21.453186 7ffbfdcbe780 0 osd.0 19323 load_pgs opened 824 pgs
2015-02-27 13:41:32.219503 7f197b0dd780 0 osd.3 19317 load_pgs
2015-02-27 13:42:56.310874 7f197b0dd780 0 osd.3 19317 load_pgs opened 776 pgs
2015-02-27 13:38:43.909464 7f450ac90780 0 osd.6 19309 load_pgs
2015-02-27 13:40:40.080390 7f450ac90780 0 osd.6 19309 load_pgs opened 806 pgs
2015-02-27 13:36:14.451275 7f3c41d33780 0 osd.9 19301 load_pgs
2015-02-27 13:37:22.446285 7f3c41d33780 0 osd.9 19301 load_pgs opened 795 pgs

new osds (average pgs load time: 3 seconds)
2015-02-27 13:44:25.529743 7f2004617780 0 osd.15 19325 load_pgs
2015-02-27 13:44:36.197221 7f2004617780 0 osd.15 19325 load_pgs opened 873 pgs
2015-02-27 13:41:29.176647 7fb147fb3780 0 osd.16 19315 load_pgs
2015-02-27 13:41:31.681722 7fb147fb3780 0 osd.16 19315 load_pgs opened 848 pgs
2015-02-27 13:38:41.470761 7f9c404be780 0 osd.17 19307 load_pgs
2015-02-27 13:38:43.737473 7f9c404be780 0 osd.17 19307 load_pgs opened 821 pgs
2015-02-27 13:36:10.997766 7f7315e99780 0 osd.18 19299 load_pgs
2015-02-27 13:36:13.511898 7f7315e99780 0 osd.18 19299 load_pgs opened 815 pgs

The old OSDs also use more memory; here's an example:

root 15700 22.8 0.7 1423816 485552 ? Ssl 13:36 4:55 /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
root 15270 15.4 0.4 1227140 297032 ? Ssl 13:36 3:20 /usr/bin/ceph-osd -i 18 --pid-file /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph

It seems to me there is still some old data around for the old OSDs which was not properly migrated/cleaned up during the upgrades. The cluster is healthy, no problems at all over the last few weeks. Is there any way to clean this up?

Thanks
Corin
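The load times quoted above are just the gap between each OSD's "load_pgs" line and its matching "load_pgs opened N pgs" line. For anyone comparing their own OSDs the same way, a small sketch that extracts those gaps from a ceph-osd log (log path and line format are assumed to match the excerpt above):

# Compute how long each OSD spent in load_pgs, from ceph-osd log lines.
import re
from datetime import datetime

PATTERN = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .* (osd\.\d+) \d+ load_pgs( opened \d+ pgs)?')

def load_pg_times(lines):
    """Return {osd name: seconds spent in load_pgs}."""
    start, durations = {}, {}
    for line in lines:
        m = PATTERN.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
        osd = m.group(2)
        if m.group(3):                      # "load_pgs opened N pgs" marks the end
            if osd in start:
                durations[osd] = (ts - start[osd]).total_seconds()
        else:                               # bare "load_pgs" marks the start
            start[osd] = ts
    return durations

# Example (default log location assumed):
# print(load_pg_times(open('/var/log/ceph/ceph-osd.0.log')))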
[ceph-users] What does the parameter journal_align_min_size mean?
I am wondering how the value of journal_align_min_size affects journal padding. Is there any document describing the on-disk layout of the journal? Thanks for the help!
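I don't know of a document that spells out the journal's on-disk layout beyond the source (os/FileJournal.{h,cc}), but the general idea, as far as I understand it, is that entries whose payload is at least journal_align_min_size get padding inserted between the entry header and the data so that the data starts on an aligned offset, while smaller entries are written unaligned to avoid wasting space on padding. Below is a toy sketch of that padding arithmetic only; the header size, alignment and threshold are made-up illustrative numbers, not the real on-disk values.

# Toy illustration of the alignment idea, NOT the actual FileJournal code.
ALIGN = 4096                        # assumed block alignment
HEADER = 40                         # assumed entry-header size, illustrative only
JOURNAL_ALIGN_MIN_SIZE = 64 * 1024  # assumed threshold

def entry_layout(offset, payload_len):
    """Return (padding, total entry length) for an entry starting at 'offset'."""
    data_start = offset + HEADER
    if payload_len >= JOURNAL_ALIGN_MIN_SIZE and data_start % ALIGN:
        padding = ALIGN - data_start % ALIGN   # push the payload to the next boundary
    else:
        padding = 0                            # small entries are not padded
    return padding, HEADER + padding + payload_len

print(entry_layout(0, 4096))        # small write: no padding
print(entry_layout(0, 256 * 1024))  # large write: padded so the data is aligned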
[ceph-users] Ceph - networking question
Hi all,

I've only been using Ceph for a few months now and currently have a small cluster (3 nodes, 18 OSDs). I get decent performance for this configuration. My question is: should I have a larger pipe on the client/public network or on the Ceph cluster (private) network? I can only have the larger pipe on one of the two. The most Ceph nodes we'd have in the foreseeable future is 7; the current client VM host count is 3, with a max of 5 in the future.

When benchmarking with rados and the larger pipe connected to the public/client side, I can just about max out its throughput on reads, but come nowhere close on writes. With the smaller pipe on the client/public side I still max out reads, but again not writes ("close" being relative to the replication count: with 2 replicas I get about 60% of the theoretical max, with 3 replicas about 40%).

Basically, I'm not sure how to determine when the back-end cluster (private) network starts to become the bottleneck that needs to be expanded.

-Tony
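As a rough back-of-the-envelope aid (my own sketch, not something from the thread): with replica count R, every byte a client writes arrives over the public network once and is then forwarded by the primary OSD to the other R - 1 replicas over the cluster network, so the cluster network needs roughly (R - 1) times the aggregate client write bandwidth before it becomes the limit, while reads only touch the public network. The link speeds below are examples, and this ignores recovery/backfill traffic, which also rides the cluster network.

# Back-of-the-envelope: which network caps aggregate client write throughput?
def max_client_write(public_gbit, cluster_gbit, replicas):
    """Aggregate client write throughput (Gbit/s) limited by either network."""
    fanout = replicas - 1                        # copies the primary forwards
    if fanout == 0:
        return public_gbit
    return min(public_gbit, cluster_gbit / float(fanout))

# Example: 40 Gbit public ("larger pipe"), 10 Gbit cluster ("smaller pipe")
for r in (2, 3):
    print("%d replicas -> %.1f Gbit/s" % (r, max_client_write(40, 10, r)))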
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: I'd look at two things first. One is the '{fqdn}' string, which I'm not sure whether that's the actual string that you have, or whether you just replaced it for the sake of anonymity. The second is the port number, which should be fine, but maybe the fact that it appears as part of the script uri triggers some issue. When launching radosgw it logs this: ... 2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados-read obj-ofs=0 read_ofs=0 read_len=524288 2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados-read r=0 bl.length=678 2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put: name=.rgw.root+zone_info.default 2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default to cache LRU end 2015-02-27 18:33:58.677415 7f200b67a8a0 2 zone default is master 2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0 obj=.rgw.root:region_map state=0x2a86498 s-prefetch_data=0 2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get: name=.rgw.root+region_map : miss 2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put: name=.rgw.root+region_map 2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to cache LRU end 2015-02-27 18:33:58.957336 7f1ff17f2700 2 garbage collection: start 2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start 2015-02-27 18:33:58.985486 7f200b67a8a0 0 framework: fastcgi 2015-02-27 18:33:58.985778 7f200b67a8a0 0 framework: civetweb 2015-02-27 18:33:58.985879 7f200b67a8a0 0 framework conf key: port, val: 7480 2015-02-27 18:33:58.986462 7f200b67a8a0 0 starting handler: civetweb 2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start 2015-02-27 18:33:59.214739 7f200b67a8a0 0 starting handler: fastcgi 2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20 2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my user name} 2015-02-27 18:34:01.038190 7f1ff17f2700 2 garbage collection: stop 2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my user name} 2015-02-27 18:34:01.687730 7f1fc3fff700 0 ERROR: can't read user header: ret=-2 2015-02-27 18:34:01.689734 7f1fc3fff700 0 ERROR: sync_user() failed, user={my user name} ret=-2 Why does it seem to find my radosgw defined user name as a pool and what might bring it to fail to read user header? /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] old osds take much longer to start than newer osd
Does deleting/reformatting the old osds improve the performance? On Fri, Feb 27, 2015 at 6:02 AM, Corin Langosch corin.lango...@netskin.com wrote: Hi guys, I'm using ceph for a long time now, since bobtail. I always upgraded every few weeks/ months to the latest stable release. Of course I also removed some osds and added new ones. Now during the last few upgrades (I just upgraded from 80.6 to 80.8) I noticed that old osds take much longer to startup than equal newer osds (same amount of data/ disk usage, same kind of storage+journal backing device (ssd), same weight, same number of pgs, ...). I know I observed the same behavior earlier but just didn't really care about it. Here are the relevant log entries (host of osd.0 and osd.15 has less cpu power than the others): old osds (average pgs load time: 1.5 minutes) 2015-02-27 13:44:23.134086 7ffbfdcbe780 0 osd.0 19323 load_pgs 2015-02-27 13:49:21.453186 7ffbfdcbe780 0 osd.0 19323 load_pgs opened 824 pgs 2015-02-27 13:41:32.219503 7f197b0dd780 0 osd.3 19317 load_pgs 2015-02-27 13:42:56.310874 7f197b0dd780 0 osd.3 19317 load_pgs opened 776 pgs 2015-02-27 13:38:43.909464 7f450ac90780 0 osd.6 19309 load_pgs 2015-02-27 13:40:40.080390 7f450ac90780 0 osd.6 19309 load_pgs opened 806 pgs 2015-02-27 13:36:14.451275 7f3c41d33780 0 osd.9 19301 load_pgs 2015-02-27 13:37:22.446285 7f3c41d33780 0 osd.9 19301 load_pgs opened 795 pgs new osds (average pgs load time: 3 seconds) 2015-02-27 13:44:25.529743 7f2004617780 0 osd.15 19325 load_pgs 2015-02-27 13:44:36.197221 7f2004617780 0 osd.15 19325 load_pgs opened 873 pgs 2015-02-27 13:41:29.176647 7fb147fb3780 0 osd.16 19315 load_pgs 2015-02-27 13:41:31.681722 7fb147fb3780 0 osd.16 19315 load_pgs opened 848 pgs 2015-02-27 13:38:41.470761 7f9c404be780 0 osd.17 19307 load_pgs 2015-02-27 13:38:43.737473 7f9c404be780 0 osd.17 19307 load_pgs opened 821 pgs 2015-02-27 13:36:10.997766 7f7315e99780 0 osd.18 19299 load_pgs 2015-02-27 13:36:13.511898 7f7315e99780 0 osd.18 19299 load_pgs opened 815 pgs The old osds also take more memory, here's an example: root 15700 22.8 0.7 1423816 485552 ? Ssl 13:36 4:55 /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph root 15270 15.4 0.4 1227140 297032 ? Ssl 13:36 3:20 /usr/bin/ceph-osd -i 18 --pid-file /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph It seems to me there is still some old data around for the old osds which was not properly migrated/ cleaned up during the upgrades. The cluster is healthy, no problems at all the last few weeks. Is there any way to clean this up? Thanks Corin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
That's the old way of defining pools. The new way involves defining a zone and placement targets for that zone. Then you can have different default placement targets for different users.

Any URL/pointers to better understand such matters?

Do you have any special config in your ceph.conf? E.g., did you modify the rgw_enable_apis configurable by any chance?

# tail -20 /etc/ceph/ceph.conf
[client.radosgw.owmblob]
keyring = /etc/ceph/ceph.client.radosgw.keyring
host = rgw
user = apache
rgw data = /var/lib/ceph/radosgw/ceph-rgw
log file = /var/log/radosgw/client.radosgw.owmblob.log
debug rgw = 20
rgw enable log rados = true
rgw enable ops log = true
rgw enable apis = s3
rgw cache enabled = true
rgw cache lru size = 1
rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock
;#rgw host = localhost
;#rgw port = 8004
rgw dns name = {fqdn}
rgw print continue = true
rgw thread pool size = 20

What is the purpose of the data directory btw?

/Steffen
[ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
Sorry forgot to send to the list... Begin forwarded message: From: Steffen W Sørensen ste...@me.com Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed Date: 27. feb. 2015 18.29.51 CET To: Yehuda Sadeh-Weinraub yeh...@redhat.com It seems that your request did find its way to the gateway, but the question here is why doesn't it match to a known operation. This really looks like a valid list all buckets request, so I'm not sure what's happening. I'd look at two things first. One is the '{fqdn}' string, which I'm not sure whether that's the actual string that you have, or whether you just replaced it for the sake of anonymity. I replaced for anonymity thou I run on private IP but still :) The second is the port number, which should be fine, but maybe the fact that it appears as part of the script uri triggers some issue. Hmm will try with default port 80... though I would assume that anything before the 'slash' gets cut off as part of the hostname[:port] portion. Makes not difference using port 80. ... 2015-02-27 18:15:43.402729 7f37889e0700 20 SERVER_PORT=80 2015-02-27 18:15:43.402747 7f37889e0700 20 SERVER_PROTOCOL=HTTP/1.1 2015-02-27 18:15:43.402765 7f37889e0700 20 SERVER_SIGNATURE= 2015-02-27 18:15:43.402783 7f37889e0700 20 SERVER_SOFTWARE=Apache/2.2.22 (Fedora) 2015-02-27 18:15:43.402814 7f37889e0700 1 == starting new request req=0x7f37b80083d0 = 2015-02-27 18:15:43.403157 7f37889e0700 2 req 1:0.000345::GET /::initializing 2015-02-27 18:15:43.403491 7f37889e0700 10 host={fqdn} rgw_dns_name={fqdn} 2015-02-27 18:15:43.404624 7f37889e0700 2 req 1:0.001816::GET /::http status=405 2015-02-27 18:15:43.404676 7f37889e0700 1 == req done req=0x7f37b80083d0 http_status=405 == 2015-02-27 18:15:43.404901 7f37889e0700 20 process_request() returned -2003 I'm not sure how to define my radosgw user, i made one with full rights key type s3: # radosgw-admin user info --uid='{user name}' { user_id: {user name}, display_name: test user for testlab, email: {email}, suspended: 0, max_buckets: 1000, auid: 0, subusers: [], keys: [ { user: {user name}, access_key: WL4EJJYTLVYXEHNR6QSA, secret_key: {secret}}], swift_keys: [], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} When authenticating to the S3 API should I then use the unencrypted access key string or the encrypted seen above plus my secret? Howto verify if I authenticate successfully through S3 maybe this is my problem? test example: #!/usr/bin/python import boto import boto.s3.connection access_key = 'WL4EJJYTLVYXEHNR6QSA' secret_key = '{secret}' conn = boto.connect_s3( aws_access_key_id = access_key, aws_secret_access_key = secret_key, host = '{fqdn}', port = 8005, debug = 1, is_secure=False, # uncomment if you are not using ssl calling_format = boto.s3.connection.OrdinaryCallingFormat(), ) ## Any access on conn object fails with 405 not allowed for bucket in conn.get_all_buckets(): print {name}\t{created}.format( name = bucket.name, created = bucket.creation_date, ) bucket = conn.create_bucket('my-new-bucket') How does one btw control/map a user to/with a Ceph Pool or will an user with full right be able to create Ceph Pools through the admin API? I've added a pool to radosgw before creating my user with --pool=owmblob option not sure though that this will 'limit' a user to a default pool like that. 
Would have thought that this would set the default_placement attribute on the user then. Any good URLs to doc on the understanding of such matters as ACL, users and pool mapping etc in a gateway are also appreciated. # radosgw-admin pools list [ { name: owmblob}] signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
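On the "how do I verify whether authentication is my problem" question: as far as I know, the access_key and secret_key printed by radosgw-admin user info are used exactly as shown; there is no separate "encrypted" form to supply. One hedged way to separate the two failure modes with boto is to look at the status on the S3ResponseError: a credential/signature problem normally comes back as 403, whereas the 405 seen here means the request never matched a gateway operation at all, so it is unlikely to be an auth issue. A small sketch, reusing the conn object from the script above:

# Distinguish an auth failure (403) from the 405 "Method Not Allowed" above.
# Assumes the 'conn' object created by the boto script earlier in this mail.
from boto.exception import S3ResponseError

try:
    buckets = conn.get_all_buckets()
    print("auth OK, %d bucket(s)" % len(buckets))
except S3ResponseError as e:
    if e.status == 403:
        print("403: credentials/signature rejected - check access key and secret")
    elif e.status == 405:
        print("405: request never matched an rgw operation - probably not auth")
    else:
        raise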
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
- Original Message - From: Steffen W Sørensen ste...@me.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Friday, February 27, 2015 9:39:46 AM Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: I'd look at two things first. One is the '{fqdn}' string, which I'm not sure whether that's the actual string that you have, or whether you just replaced it for the sake of anonymity. The second is the port number, which should be fine, but maybe the fact that it appears as part of the script uri triggers some issue. When launching radosgw it logs this: ... 2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados-read obj-ofs=0 read_ofs=0 read_len=524288 2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados-read r=0 bl.length=678 2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put: name=.rgw.root+zone_info.default 2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default to cache LRU end 2015-02-27 18:33:58.677415 7f200b67a8a0 2 zone default is master 2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0 obj=.rgw.root:region_map state=0x2a86498 s-prefetch_data=0 2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get: name=.rgw.root+region_map : miss 2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put: name=.rgw.root+region_map 2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to cache LRU end 2015-02-27 18:33:58.957336 7f1ff17f2700 2 garbage collection: start 2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start 2015-02-27 18:33:58.985486 7f200b67a8a0 0 framework: fastcgi 2015-02-27 18:33:58.985778 7f200b67a8a0 0 framework: civetweb 2015-02-27 18:33:58.985879 7f200b67a8a0 0 framework conf key: port, val: 7480 2015-02-27 18:33:58.986462 7f200b67a8a0 0 starting handler: civetweb 2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start 2015-02-27 18:33:59.214739 7f200b67a8a0 0 starting handler: fastcgi 2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20 2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my user name} 2015-02-27 18:34:01.038190 7f1ff17f2700 2 garbage collection: stop 2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my user name} 2015-02-27 18:34:01.687730 7f1fc3fff700 0 ERROR: can't read user header: ret=-2 2015-02-27 18:34:01.689734 7f1fc3fff700 0 ERROR: sync_user() failed, user={my user name} ret=-2 Why does it seem to find my radosgw defined user name as a pool and what might bring it to fail to read user header? That's just a red herring. It tries to sync the user stats, but it can't because quota is not enabled (iirc). We should probably get rid of these messages as they're pretty confusing. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
rgw enable apis = s3

Commenting this out makes it work :)

[root@rgw tests3]# ./lsbuckets.py
[root@rgw tests3]# ./lsbuckets.py
my-new-bucket 2015-02-27T17:49:04.000Z
[root@rgw tests3]#

...
2015-02-27 18:49:22.601578 7f48f2bdd700 20 rgw_create_bucket returned ret=-17 bucket=my-new-bucket(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.5234475.2])
2015-02-27 18:49:22.625672 7f48f2bdd700 2 req 4:0.350444:s3:PUT /my-new-bucket/:create_bucket:http status=200
2015-02-27 18:49:22.625758 7f48f2bdd700 1 == req done req=0x7f4938007810 http_status=200 ==
...

Why? I just want the S3 API available, not the admin API.

/Steffen
[ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
Hi, Newbie to RadosGW+Ceph, but learning... Got a running Ceph Cluster working with rbd+CephFS clients. Now I'm trying to verify a RadosGW S3 api, but seems to have an issue with RadosGW access. I get the error (not found anything searching so far...): S3ResponseError: 405 Method Not Allowed when trying to access the rgw. Apache vhost access log file says: 10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] GET / HTTP/1.1 405 27 - Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 and Apache's general error_log file says: [Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/ auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc= RadosGW seems to launch and run fine, though /var/log/messages at launches says: Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip 003fb36cb1dc sp 7fffde221410 error 4 in librados.so.2.0.0[3fb320+6d] # ps -fuapache UIDPID PPID C STIME TTY TIME CMD apache 15113 15111 0 14:07 ?00:00:00 /usr/sbin/fcgi- apache 15114 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15115 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15116 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15117 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15118 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15119 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15120 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15121 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15224 1 1 14:12 ?00:00:25 /usr/bin/radosgw -n client.radosgw.owmblob RadosGW create my FastCGI socket and a default .asok, (not sure why/what default socket are meant for) as well as the configured log file though it never logs anything... # tail -18 /etc/ceph/ceph.conf: [client.radosgw.owmblob] keyring = /etc/ceph/ceph.client.radosgw.keyring host = rgw rgw data = /var/lib/ceph/radosgw/ceph-rgw log file = /var/log/radosgw/client.radosgw.owmblob.log debug rgw = 20 rgw enable log rados = true rgw enable ops log = true rgw enable apis = s3 rgw cache enabled = true rgw cache lru size = 1 rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock ;#rgw host = localhost ;#rgw port = 8004 rgw dns name = {fqdn} rgw print continue = true rgw thread pool size = 20 Turned out /etc/init.d/ceph-radosgw didn't chown $USER even when log_file didn't exist, assuming radosgw creates this log file when opening it, only it creates it as root not $USER, thus not output, manually chowning it and restarting GW gives output ala: 2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40 2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ: 2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40 2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050 2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40 2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty 2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0 2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html 2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER 2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1 2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity 2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg= 2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015 14:25:14 GMT 2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005 2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 2015-02-27 
15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin 2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING= 2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29 2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386 2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET 2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/ 2015-02-27 15:25:14.469677 7fef431e4700 20 SCRIPT_FILENAME=/var/www/html/s3gw.fcgi 2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/ 2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/ 2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/ 2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29 2015-02-27 15:25:14.469765 7fef431e4700 20 SERVER_ADMIN={email} 2015-02-27 15:25:14.469782 7fef431e4700 20 SERVER_NAME={fqdn} 2015-02-27 15:25:14.469801 7fef431e4700 20 SERVER_PORT=8005 2015-02-27 15:25:14.469818 7fef431e4700 20 SERVER_PROTOCOL=HTTP/1.1 2015-02-27 15:25:14.469835 7fef431e4700 20 SERVER_SIGNATURE= 2015-02-27 15:25:14.469852 7fef431e4700 20 SERVER_SOFTWARE=Apache/2.2.22 (Fedora) 2015-02-27
[ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings
Hi everyone,

I always have a bit of trouble wrapping my head around how libvirt seems to ignore ceph.conf options while qemu/kvm does not, so I thought I'd ask. Maybe Josh, Wido or someone else can clarify the following.

http://ceph.com/docs/master/rbd/qemu-rbd/ says:

Important: If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted.

Now this refers to explicitly setting rbd_cache=true on the qemu command line, not having rbd_cache=true in the [client] section in ceph.conf, and I'm not even sure whether qemu supports that anymore. Even if it does, I'm still not sure whether the statement is accurate. qemu has, for some time, had a cache=directsync mode which is intended to be used as follows (from http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html):

This mode is useful when guests may not be sending flushes when appropriate and therefore leave data at risk in case of power failure. When cache=directsync is used, write operations are only completed to the guest when data is safely on disk.

So even if there are no flush requests to librbd, users should still be safe from corruption when using cache=directsync, no?

So in summary, I *think* the following considerations apply, but I'd be grateful if someone could confirm or refute them:

cache = writethrough
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to use whether or not the guest I/O stack sends flushes.

cache = writeback
Maps to rbd_cache=true, rbd_cache_max_dirty > 0. Safe to use only if the guest I/O stack sends flushes. Maps to cache = writethrough until the first flush if rbd_cache_writethrough_until_flush = true (default in master).

cache = none
Maps to rbd_cache=false. No caching, safe to use regardless of guest I/O stack flush support.

cache = unsafe
Maps to rbd_cache=true, rbd_cache_max_dirty > 0, but also *ignores* all flush requests from the guest. Not safe to use (except in the unlikely case that your guest never-ever writes).

cache = directsync
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Bypasses the host page cache altogether, which I think would be meaningless with the rbd storage driver because it doesn't use the host page cache (unlike qcow2). Read cache only, safe to use whether or not the guest I/O stack sends flushes.

Is the above an accurate summary? If so, I'll be happy to send a doc patch.

Cheers,
Florian
Re: [ceph-users] multiple CephFS filesystems on the same pools
On 02/27/2015 11:37 AM, Blair Bethwaite wrote:

Sorry if this is actually documented somewhere,

It is. :)

but is it possible to create and use multiple filesystems on the same data and metadata pools? I'm guessing yes, but requires multiple MDSs?

Nope. Every fs needs one data and one metadata pool, which (as of 0.84) can be arbitrarily named, but as yet there's no support for multiple filesystems on a single cluster.

http://ceph.com/docs/master/cephfs/createfs/

Cheers,
Florian
Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings
2015-02-27 20:56 GMT+08:00 Alexandre DERUMIER aderum...@odiso.com: Hi, from qemu rbd.c if (flags BDRV_O_NOCACHE) { rados_conf_set(s-cluster, rbd_cache, false); } else { rados_conf_set(s-cluster, rbd_cache, true); } and block.c int bdrv_parse_cache_flags(const char *mode, int *flags) { *flags = ~BDRV_O_CACHE_MASK; if (!strcmp(mode, off) || !strcmp(mode, none)) { *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB; } else if (!strcmp(mode, directsync)) { *flags |= BDRV_O_NOCACHE; } else if (!strcmp(mode, writeback)) { *flags |= BDRV_O_CACHE_WB; } else if (!strcmp(mode, unsafe)) { *flags |= BDRV_O_CACHE_WB; *flags |= BDRV_O_NO_FLUSH; } else if (!strcmp(mode, writethrough)) { /* this is the default */ } else { return -1; } return 0; } So rbd_cache is disabled for cache=directsync|none and enabled for writethrough|writeback|unsafe so directsync or none should be safe if guest does not send flush. - Mail original - De: Florian Haas flor...@hastexo.com À: ceph-users ceph-users@lists.ceph.com Envoyé: Vendredi 27 Février 2015 13:38:13 Objet: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings Hi everyone, I always have a bit of trouble wrapping my head around how libvirt seems to ignore ceph.conf option while qemu/kvm does not, so I thought I'd ask. Maybe Josh, Wido or someone else can clarify the following. http://ceph.com/docs/master/rbd/qemu-rbd/ says: Important: If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted. Now this refers to explicitly setting rbd_cache=true on the qemu command line, not having rbd_cache=true in the [client] section in ceph.conf, and I'm not even sure whether qemu supports that anymore. Even if it does, I'm still not sure whether the statement is accurate. qemu has, for some time, had a cache=directsync mode which is intended to be used as follows (from http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html): This mode is useful when guests may not be sending flushes when appropriate and therefore leave data at risk in case of power failure. When cache=directsync is used, write operations are only completed to the guest when data is safely on disk. So even if there are no flush requests to librbd, users should still be safe from corruption when using cache=directsync, no? So in summary, I *think* the following considerations apply, but I'd be grateful if someone could confirm or refute them: cache = writethrough Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to Actually, qemu doesn't care about the setting rbd_cache_max_dirty. In the mode of writethrough, qemu always sends flush following every write request. use whether or not guest I/O stack sends flushes. cache = writeback Maps to rbd_cache=true, rbd_cache_max_dirty 0. Safe to use only if guest I/O stack sends flushes. Maps to cache = writethrough until first Qemu can report to guest if the write cache is enabled and guest kernel can manage the cache as what it does against volatile writeback cache on physical storage controller (Please see https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt) If filesystem barrier is not disabled on guest, it can avoid data corruption. flush if rbd_cache_writethrough_until_flush = true (default in master). cache = none Maps to rbd_cache=false. No caching, safe to use regardless of guest I/O stack flush support. 
cache = unsafe Maps to rbd_cache=true, rbd_cache_max_dirty 0, but also *ignores* all flush requests from the guest. Not safe to use (except in the unlikely case that your guest never-ever writes). cache=directsync Maps to rbd_cache=true, rbd_cache_max_dirty=0. Bypasses the host page cache altogether, which I think would be meaningless with the rbd storage driver because it doesn't use the host page cache (unlike qcow2). Read cache only, safe to use whether or not guest I/O stack sends flushes. Is the above an accurate summary? If so, I'll be happy to send a doc patch. Cheers, Florian ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings
Hi,

from qemu rbd.c:

if (flags & BDRV_O_NOCACHE) {
    rados_conf_set(s->cluster, "rbd_cache", "false");
} else {
    rados_conf_set(s->cluster, "rbd_cache", "true");
}

and block.c:

int bdrv_parse_cache_flags(const char *mode, int *flags)
{
    *flags &= ~BDRV_O_CACHE_MASK;

    if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
        *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "directsync")) {
        *flags |= BDRV_O_NOCACHE;
    } else if (!strcmp(mode, "writeback")) {
        *flags |= BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "unsafe")) {
        *flags |= BDRV_O_CACHE_WB;
        *flags |= BDRV_O_NO_FLUSH;
    } else if (!strcmp(mode, "writethrough")) {
        /* this is the default */
    } else {
        return -1;
    }

    return 0;
}

So rbd_cache is disabled for cache=directsync|none and enabled for writethrough|writeback|unsafe, so directsync or none should be safe if guest does not send flush.

- Mail original -
De: Florian Haas flor...@hastexo.com
À: ceph-users ceph-users@lists.ceph.com
Envoyé: Vendredi 27 Février 2015 13:38:13
Objet: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings

Hi everyone,

I always have a bit of trouble wrapping my head around how libvirt seems to ignore ceph.conf option while qemu/kvm does not, so I thought I'd ask. Maybe Josh, Wido or someone else can clarify the following.

http://ceph.com/docs/master/rbd/qemu-rbd/ says:

Important: If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted.

Now this refers to explicitly setting rbd_cache=true on the qemu command line, not having rbd_cache=true in the [client] section in ceph.conf, and I'm not even sure whether qemu supports that anymore. Even if it does, I'm still not sure whether the statement is accurate. qemu has, for some time, had a cache=directsync mode which is intended to be used as follows (from http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html):

This mode is useful when guests may not be sending flushes when appropriate and therefore leave data at risk in case of power failure. When cache=directsync is used, write operations are only completed to the guest when data is safely on disk.

So even if there are no flush requests to librbd, users should still be safe from corruption when using cache=directsync, no?

So in summary, I *think* the following considerations apply, but I'd be grateful if someone could confirm or refute them:

cache = writethrough
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to use whether or not guest I/O stack sends flushes.

cache = writeback
Maps to rbd_cache=true, rbd_cache_max_dirty > 0. Safe to use only if guest I/O stack sends flushes. Maps to cache = writethrough until first flush if rbd_cache_writethrough_until_flush = true (default in master).

cache = none
Maps to rbd_cache=false. No caching, safe to use regardless of guest I/O stack flush support.

cache = unsafe
Maps to rbd_cache=true, rbd_cache_max_dirty > 0, but also *ignores* all flush requests from the guest. Not safe to use (except in the unlikely case that your guest never-ever writes).

cache=directsync
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Bypasses the host page cache altogether, which I think would be meaningless with the rbd storage driver because it doesn't use the host page cache (unlike qcow2). Read cache only, safe to use whether or not guest I/O stack sends flushes.

Is the above an accurate summary?
If so, I'll be happy to send a doc patch. Cheers, Florian ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
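For readers who don't want to trace the C, here is a hedged Python mirror of the rbd.c and block.c excerpts quoted at the top of this message. It only restates the mode-to-flags mapping and the flags-to-rbd_cache decision so the end result per cache mode is easy to see; the flag bit values are arbitrary placeholders, and the C source remains authoritative.

# Python mirror of the qemu logic quoted above (block.c + rbd.c), for illustration.
BDRV_O_NOCACHE  = 0x1   # placeholder bit values; only their presence matters here
BDRV_O_CACHE_WB = 0x2
BDRV_O_NO_FLUSH = 0x4

def parse_cache_flags(mode):
    """Mirror of bdrv_parse_cache_flags(): cache mode string -> flag bits."""
    flags = 0
    if mode in ('off', 'none'):
        flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB
    elif mode == 'directsync':
        flags |= BDRV_O_NOCACHE
    elif mode == 'writeback':
        flags |= BDRV_O_CACHE_WB
    elif mode == 'unsafe':
        flags |= BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH
    elif mode == 'writethrough':
        pass                                  # the default: no extra flags
    else:
        raise ValueError('unknown cache mode: %s' % mode)
    return flags

def rbd_cache_enabled(mode):
    """Mirror of the rbd.c excerpt: rbd_cache is false only when BDRV_O_NOCACHE is set."""
    return not (parse_cache_flags(mode) & BDRV_O_NOCACHE)

for m in ('writethrough', 'writeback', 'none', 'unsafe', 'directsync'):
    print('%-12s rbd_cache=%s' % (m, rbd_cache_enabled(m)))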
Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings
On 02/27/2015 01:56 PM, Alexandre DERUMIER wrote: Hi, from qemu rbd.c if (flags BDRV_O_NOCACHE) { rados_conf_set(s-cluster, rbd_cache, false); } else { rados_conf_set(s-cluster, rbd_cache, true); } and block.c int bdrv_parse_cache_flags(const char *mode, int *flags) { *flags = ~BDRV_O_CACHE_MASK; if (!strcmp(mode, off) || !strcmp(mode, none)) { *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB; } else if (!strcmp(mode, directsync)) { *flags |= BDRV_O_NOCACHE; } else if (!strcmp(mode, writeback)) { *flags |= BDRV_O_CACHE_WB; } else if (!strcmp(mode, unsafe)) { *flags |= BDRV_O_CACHE_WB; *flags |= BDRV_O_NO_FLUSH; } else if (!strcmp(mode, writethrough)) { /* this is the default */ } else { return -1; } return 0; } So rbd_cache is disabled for cache=directsync|none and enabled for writethrough|writeback|unsafe so directsync or none should be safe if guest does not send flush. That's what I figured too, but then where does the important warning in the documentation come from that implores people to always set writeback? As per git blame it came directly from Josh. If anyone's an authority on RBD, it would be him. :) Cheers, Florian ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
On 27/02/2015, at 18.51, Steffen W Sørensen ste...@me.com wrote: rgw enable apis = s3 Commenting this out makes it work :) Thanks for helping on this initial issue! [root@rgw tests3]# ./lsbuckets.py [root@rgw tests3]# ./lsbuckets.py my-new-bucket 2015-02-27T17:49:04.000Z [root@rgw tests3]# ... 2015-02-27 18:49:22.601578 7f48f2bdd700 20 rgw_create_bucket returned ret=-17 bucket=my-new-bucket(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.5234475.2]) 2015-02-27 18:49:22.625672 7f48f2bdd700 2 req 4:0.350444:s3:PUT /my-new-bucket/:create_bucket:http status=200 2015-02-27 18:49:22.625758 7f48f2bdd700 1 == req done req=0x7f4938007810 http_status=200 == ... Into which pool does such user data (buckets and objects) gets stored and possible howto direct user data into a dedicated pool? [root@rgw ~]# rados df pool name category KB objects clones degraded unfound rdrd KB wrwr KB .intent-log - 000 0 00000 .log- 110 0 00022 .rgw- 140 0 0 17 14 104 .rgw.buckets- 000 0 00000 .rgw.buckets.extra - 000 0 00000 .rgw.buckets.index - 010 0 02030 .rgw.control- 080 0 00000 .rgw.gc - 0 320 0 0 8302 8302 55560 .rgw.root - 130 0 0 929 61833 .usage - 000 0 00000 .users - 110 0 06453 .users.email- 110 0 03253 .users.swift- 000 0 00000 .users.uid - 120 0 0 65 54 164 Assume a bucket is a naming container for objects in a pool maybe similar to a directory with files. /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Lost Object
Anyone help me, please? In the attach, the log of mds with debug = 20. Thanks, Att. --- Daniel Takatori Ohara. System Administrator - Lab. of Bioinformatics Molecular Oncology Center Instituto Sírio-Libanês de Ensino e Pesquisa Hospital Sírio-Libanês Phone: +55 11 3155-0200 (extension 1927) R: Cel. Nicolau dos Santos, 69 São Paulo-SP. 01308-060 http://www.bioinfo.mochsl.org.br On Thu, Feb 26, 2015 at 4:21 PM, Daniel Takatori Ohara dtoh...@mochsl.org.br wrote: Hello, I have an problem. I will make a symbolic link for an file, but return the message : ln: failed to create symbolic link ‘./M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam’: File exists When i do the command ls, the result is l? ? ? ? ?? M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam But, when do the command ls in the second time, the result not show the file. Anyone help me, please? Thank you, Att. --- Daniel Takatori Ohara. System Administrator - Lab. of Bioinformatics Molecular Oncology Center Instituto Sírio-Libanês de Ensino e Pesquisa Hospital Sírio-Libanês Phone: +55 11 3155-0200 (extension 1927) R: Cel. Nicolau dos Santos, 69 São Paulo-SP. 01308-060 http://www.bioinfo.mochsl.org.br log_mds.gz Description: GNU Zip compressed data ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Clarification of SSD journals for BTRFS rotational HDD
Also sending to the devel list to see if they have some insight. On Wed, Feb 25, 2015 at 3:01 PM, Robert LeBlanc rob...@leblancnet.us wrote: I tried finding an answer to this on Google, but couldn't find it. Since BTRFS can parallel the journal with the write, does it make sense to have the journal on the SSD (because then we are forcing two writes instead of one)? Our plan is to have a caching tier of SSDs in front of our rotational HDDs and it sounds like the improvements in Hammer will really help here. If we can take the journals off the SSDs, that just opens up a bit more space for caching (albeit not much). It specifically makes the configuration of the host much simpler and a single SSD doesn't take out 5 HHDs. Thanks, Robert LeBlanc ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] too few pgs in cache tier
Hi all,

we use an EC pool with a small cache tier in front of it for our archive data (4 * 16TB VM disks). The EC pool has k=3, m=2 because we started with 5 nodes, and we want to migrate to a new EC pool with k=5, m=2. Therefore we migrated one VM disk (16TB) from the Ceph cluster to an FC RAID with the Proxmox VE "move disk" function. The move finished, but while the Ceph VM image was being removed the warnings "'ssd-archiv' at/near target max" and "pool ssd-archiv has too few pgs" occurred. Some hours later only the second warning remained:

ceph health detail
HEALTH_WARN pool ssd-archiv has too few pgs
pool ssd-archiv objects per pg (51196) is more than 14.7709 times cluster average (3466)

Info about the image, which was deleted:

rbd image 'vm-409-disk-1':
        size 16384 GB in 4194304 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.2b8fda574b0dc51
        format: 2
        features: layering

I think we hit http://tracker.ceph.com/issues/8103, but normally a single read should not put the data in the cache tier, should it? Is deleting a second read?

Our ceph version: 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Regards
Udo
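The numbers in that health warning are just a ratio check: the pool's objects-per-PG against the cluster-wide average (the threshold is a mon-side skew factor, mon_pg_warn_max_object_skew if I remember correctly). A quick sanity check with the figures quoted above; the cache pool's pg_num itself isn't shown in the mail, so it stays symbolic:

# Sanity-check the warning: pool objects-per-PG vs. cluster average.
pool_objects_per_pg = 51196     # from "ceph health detail" above
cluster_avg_per_pg = 3466       # cluster-wide average, from the same output

print("pool is %.4f x the cluster average"
      % (pool_objects_per_pg / float(cluster_avg_per_pg)))   # ~14.7709, as reported

# The ratio drops either when the pool holds fewer objects or when its pg_num
# is raised, since objects-per-PG is simply total_objects / pg_num:
def objects_per_pg(total_objects, pg_num):
    return total_objects / float(pg_num)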
[ceph-users] Minor flaw in /etc/init.d/ceph-radsgw script
Hi

Seems there's a minor flaw in the CentOS/RHEL init script. Line 91 reads:

daemon --user=$user $RADOSGW -n $name

and should IMHO be the same command with quoting added (the exact quoting appears to have been stripped by the list archive), to avoid the dirname complaint from /etc/rc.d/init.d/functions:__pids_var_run at line 151 :)

/Steffen
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
- Original Message - From: Steffen W Sørensen ste...@me.com To: ceph-users@lists.ceph.com Sent: Friday, February 27, 2015 6:40:01 AM Subject: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed Hi, Newbie to RadosGW+Ceph, but learning... Got a running Ceph Cluster working with rbd+CephFS clients. Now I'm trying to verify a RadosGW S3 api, but seems to have an issue with RadosGW access. I get the error (not found anything searching so far...): S3ResponseError: 405 Method Not Allowed when trying to access the rgw. Apache vhost access log file says: 10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] GET / HTTP/1.1 405 27 - Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 and Apache's general error_log file says: [Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/ auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc= RadosGW seems to launch and run fine, though /var/log/messages at launches says: Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip 003fb36cb1dc sp 7fffde221410 error 4 in librados.so.2.0.0[3fb320+6d] # ps -fuapache UIDPID PPID C STIME TTY TIME CMD apache 15113 15111 0 14:07 ?00:00:00 /usr/sbin/fcgi- apache 15114 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15115 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15116 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15117 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15118 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15119 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15120 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15121 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15224 1 1 14:12 ?00:00:25 /usr/bin/radosgw -n client.radosgw.owmblob RadosGW create my FastCGI socket and a default .asok, (not sure why/what default socket are meant for) as well as the configured log file though it never logs anything... 
# tail -18 /etc/ceph/ceph.conf: [client.radosgw.owmblob] keyring = /etc/ceph/ceph.client.radosgw.keyring host = rgw rgw data = /var/lib/ceph/radosgw/ceph-rgw log file = /var/log/radosgw/client.radosgw.owmblob.log debug rgw = 20 rgw enable log rados = true rgw enable ops log = true rgw enable apis = s3 rgw cache enabled = true rgw cache lru size = 1 rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock ;#rgw host = localhost ;#rgw port = 8004 rgw dns name = {fqdn} rgw print continue = true rgw thread pool size = 20 Turned out /etc/init.d/ceph-radosgw didn't chown $USER even when log_file didn't exist, assuming radosgw creates this log file when opening it, only it creates it as root not $USER, thus not output, manually chowning it and restarting GW gives output ala: 2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40 2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ: 2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40 2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050 2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40 2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty 2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0 2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html 2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER 2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1 2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity 2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg= 2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015 14:25:14 GMT 2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005 2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 2015-02-27 15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin 2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING= 2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29 2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386 2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET 2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/ 2015-02-27 15:25:14.469677 7fef431e4700 20 SCRIPT_FILENAME=/var/www/html/s3gw.fcgi 2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/ 2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/ 2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/ 2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29 2015-02-27 15:25:14.469765 7fef431e4700 20 SERVER_ADMIN={email} 2015-02-27 15:25:14.469782 7fef431e4700
Re: [ceph-users] old osds take much longer to start than newer osd
I'd guess so, but that's not what I want to do ;) Am 27.02.2015 um 18:43 schrieb Robert LeBlanc: Does deleting/reformatting the old osds improve the performance? On Fri, Feb 27, 2015 at 6:02 AM, Corin Langosch corin.lango...@netskin.com wrote: Hi guys, I'm using ceph for a long time now, since bobtail. I always upgraded every few weeks/ months to the latest stable release. Of course I also removed some osds and added new ones. Now during the last few upgrades (I just upgraded from 80.6 to 80.8) I noticed that old osds take much longer to startup than equal newer osds (same amount of data/ disk usage, same kind of storage+journal backing device (ssd), same weight, same number of pgs, ...). I know I observed the same behavior earlier but just didn't really care about it. Here are the relevant log entries (host of osd.0 and osd.15 has less cpu power than the others): old osds (average pgs load time: 1.5 minutes) 2015-02-27 13:44:23.134086 7ffbfdcbe780 0 osd.0 19323 load_pgs 2015-02-27 13:49:21.453186 7ffbfdcbe780 0 osd.0 19323 load_pgs opened 824 pgs 2015-02-27 13:41:32.219503 7f197b0dd780 0 osd.3 19317 load_pgs 2015-02-27 13:42:56.310874 7f197b0dd780 0 osd.3 19317 load_pgs opened 776 pgs 2015-02-27 13:38:43.909464 7f450ac90780 0 osd.6 19309 load_pgs 2015-02-27 13:40:40.080390 7f450ac90780 0 osd.6 19309 load_pgs opened 806 pgs 2015-02-27 13:36:14.451275 7f3c41d33780 0 osd.9 19301 load_pgs 2015-02-27 13:37:22.446285 7f3c41d33780 0 osd.9 19301 load_pgs opened 795 pgs new osds (average pgs load time: 3 seconds) 2015-02-27 13:44:25.529743 7f2004617780 0 osd.15 19325 load_pgs 2015-02-27 13:44:36.197221 7f2004617780 0 osd.15 19325 load_pgs opened 873 pgs 2015-02-27 13:41:29.176647 7fb147fb3780 0 osd.16 19315 load_pgs 2015-02-27 13:41:31.681722 7fb147fb3780 0 osd.16 19315 load_pgs opened 848 pgs 2015-02-27 13:38:41.470761 7f9c404be780 0 osd.17 19307 load_pgs 2015-02-27 13:38:43.737473 7f9c404be780 0 osd.17 19307 load_pgs opened 821 pgs 2015-02-27 13:36:10.997766 7f7315e99780 0 osd.18 19299 load_pgs 2015-02-27 13:36:13.511898 7f7315e99780 0 osd.18 19299 load_pgs opened 815 pgs The old osds also take more memory, here's an example: root 15700 22.8 0.7 1423816 485552 ? Ssl 13:36 4:55 /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph root 15270 15.4 0.4 1227140 297032 ? Ssl 13:36 3:20 /usr/bin/ceph-osd -i 18 --pid-file /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph It seems to me there is still some old data around for the old osds which was not properly migrated/ cleaned up during the upgrades. The cluster is healthy, no problems at all the last few weeks. Is there any way to clean this up? Thanks Corin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] too few pgs in cache tier
On 27/02/2015, at 17.04, Udo Lembke ulem...@polarzone.de wrote:
ceph health detail
HEALTH_WARN pool ssd-archiv has too few pgs

On a slightly different note, I had an issue with my Ceph cluster underneath a PVE cluster yesterday. I had two Ceph pools for RBD virt disks, vm_images (boot hdd images) + rbd_data (extra hdd images). Then, while adding pools for a rados GW (.rgw.*), health status suddenly said that my vm_images pool had too few PGs, so I ran:

ceph osd pool set vm_images pg_num <larger_number>
ceph osd pool set vm_images pgp_num <larger_number>

This kicked off about 20 minutes of rebalancing with a lot of IO in the Ceph cluster. Eventually the cluster was fine again, only almost all my PVE VMs ended up in the stopped state; wondering why, a watchdog thingy maybe...

/Steffen
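For picking that larger number: the rule of thumb in the placement-group docs of this era (not something from this thread) is roughly (number of OSDs * 100) / pool replica count, rounded up to the next power of two, and as far as I know pg_num can only ever be increased, not decreased, so it is worth computing rather than guessing. A hedged helper:

# Rule-of-thumb pg_num suggestion: (OSDs * ~100 PGs per OSD) / replica count,
# rounded up to a power of two. Taken from the docs' general guidance.
def suggested_pg_num(num_osds, pool_size, target_pgs_per_osd=100):
    raw = num_osds * target_pgs_per_osd / float(pool_size)
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num

print(suggested_pg_num(18, 3))   # example: 18 OSDs, 3 replicas -> 1024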
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
On 27/02/2015, at 19.02, Steffen W Sørensen ste...@me.com wrote: Into which pool does such user data (buckets and objects) gets stored and possible howto direct user data into a dedicated pool? [root@rgw ~]# rados df pool name category KB objects clones degraded unfound rdrd KB wrwr KB .intent-log - 000 0 00000 .log- 110 0 00022 .rgw- 140 0 0 17 14 104 .rgw.buckets- 000 0 00000 .rgw.buckets.extra - 000 0 00000 .rgw.buckets.index - 010 0 02030 .rgw.control- 080 0 00000 .rgw.gc - 0 320 0 0 8302 8302 55560 .rgw.root - 130 0 0 929 61833 .usage - 000 0 00000 .users - 110 0 06453 .users.email- 110 0 03253 .users.swift- 000 0 00000 .users.uid - 120 0 0 65 54 164 So it's mapped into a zone (at least on my Giant version 0.87) and in my simple non-federated config it's in the default region+zone: [root@rgw ~]# radosgw-admin region list { default_info: { default_region: default}, regions: [ default]} [root@rgw ~]# radosgw-admin zone list { zones: [ default]} [root@rgw ~]# radosgw-admin region get { name: default, api_name: , is_master: true, endpoints: [], master_zone: , zones: [ { name: default, endpoints: [], log_meta: false, log_data: false}], placement_targets: [ { name: default-placement, tags: []}], default_placement: default-placement} [root@rgw ~]# radosgw-admin zone get { domain_root: .rgw, control_pool: .rgw.control, gc_pool: .rgw.gc, log_pool: .log, intent_log_pool: .intent-log, usage_log_pool: .usage, user_keys_pool: .users, user_email_pool: .users.email, user_swift_pool: .users.swift, user_uid_pool: .users.uid, system_key: { access_key: , secret_key: }, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: .rgw.buckets.extra}}]} and my user if associated with the default region+zone, thus it's data goes into .rgw.buckets + .rgw.buckets.index [+ .rgw.buckets.extra] Buckets seems a naming container at the radosgw level, above the underlying Ceph pool abstraction level, 'just' providing object persistence for radosgw abstraction/object FS on top of Ceph Pools... I think. So more users associated with same region+zone can share buckets+objects? Would be nice with a drawing showing abstractions at the different levels possible woth links to details on administration at different levels :) Lot of stuff to grasp for a newbie just in the need of a S3 service for an App usage :) /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph and docker
The online Ceph Developer Summit is next week, and there is a session proposed for discussing ongoing Ceph and Docker integration efforts: https://wiki.ceph.com/Planning/Blueprints/Infernalis/Continue_Ceph%2F%2FDocker_integration_work Right now there is mostly a catalog of existing efforts. It would be great to come out of this discussion with a more consolidated view of what the requirements are and what direction we should be going in. If anyone is interested, please add your name to the blueprint and/or comment and edit as you see fit. And join the discussion next week.. it's all video chat and irc and etherpad based. Thanks! sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] v0.93 Hammer release candidate released
This is the first release candidate for Hammer, and includes all of the features that will be present in the final release. We welcome and encourage any and all testing in non-production clusters to identify any problems with functionality, stability, or performance before the final Hammer release. We suggest some caution in one area: librbd. There is a lot of new functionality around object maps and locking that is disabled by default but may still affect stability for existing images. We are continuing to shake out those bugs so that the final Hammer release (probably v0.94) will be stable. Major features since Giant include: * cephfs: journal scavenger repair tool (John Spray) * crush: new and improved straw2 bucket type (Sage Weil, Christina Anderson, Xiaoxi Chen) * doc: improved guidance for CephFS early adopters (John Spray) * librbd: add per-image object map for improved performance (Jason Dillaman) * librbd: copy-on-read (Min Chen, Li Wang, Yunchuan Wen, Cheng Cheng) * librados: fadvise-style IO hints (Jianpeng Ma) * mds: many many snapshot-related fixes (Yan, Zheng) * mon: new 'ceph osd df' command (Mykola Golub) * mon: new 'ceph pg ls ...' command (Xinxin Shu) * osd: improved performance for high-performance backends * osd: improved recovery behavior (Samuel Just) * osd: improved cache tier behavior with reads (Zhiqiang Wang) * rgw: S3-compatible bucket versioning support (Yehuda Sadeh) * rgw: large bucket index sharding (Guang Yang, Yehuda Sadeh) * RDMA xio messenger support (Matt Benjamin, Vu Pham) Upgrading - * No special restrictions when upgrading from firefly or giant Notable Changes --- * build: CMake support (Ali Maredia, Casey Bodley, Adam Emerson, Marcus Watts, Matt Benjamin) * ceph-disk: do not re-use partition if encryption is required (Loic Dachary) * ceph-disk: support LUKS for encrypted partitions (Andrew Bartlett, Loic Dachary) * ceph-fuse,libcephfs: add support for O_NOFOLLOW and O_PATH (Greg Farnum) * ceph-fuse,libcephfs: resend requests before completing cap reconnect (#10912 Yan, Zheng) * ceph-fuse: select kernel cache invalidation mechanism based on kernel version (Greg Farnum) * ceph-objectstore-tool: improved import (David Zafman) * ceph-objectstore-tool: misc improvements, fixes (#9870 #9871 David Zafman) * ceph: add 'ceph osd df [tree]' command (#10452 Mykola Golub) * ceph: fix 'ceph tell ...' 
command validation (#10439 Joao Eduardo Luis) * ceph: improve 'ceph osd tree' output (Mykola Golub) * cephfs-journal-tool: add recover_dentries function (#9883 John Spray) * common: add newline to flushed json output (Sage Weil) * common: filtering for 'perf dump' (John Spray) * common: fix Formatter factory breakage (#10547 Loic Dachary) * common: make json-pretty output prettier (Sage Weil) * crush: new and improved straw2 bucket type (Sage Weil, Christina Anderson, Xiaoxi Chen) * crush: update tries stats for indep rules (#10349 Loic Dachary) * crush: use larger choose_tries value for erasure code rulesets (#10353 Loic Dachary) * debian,rpm: move RBD udev rules to ceph-common (#10864 Ken Dreyer) * debian: split python-ceph into python-{rbd,rados,cephfs} (Boris Ranto) * doc: CephFS disaster recovery guidance (John Spray) * doc: CephFS for early adopters (John Spray) * doc: fix OpenStack Glance docs (#10478 Sebastien Han) * doc: misc updates (#9793 #9922 #10204 #10203 Travis Rhoden, Hazem, Ayari, Florian Coste, Andy Allan, Frank Yu, Baptiste Veuillez-Mainard, Yuan Zhou, Armando Segnini, Robert Jansen, Tyler Brekke, Viktor Suprun) * doc: replace cloudfiles with swiftclient Python Swift example (Tim Freund) * erasure-code: add mSHEC erasure code support (Takeshi Miyamae) * erasure-code: improved docs (#10340 Loic Dachary) * erasure-code: set max_size to 20 (#10363 Loic Dachary) * libcephfs,ceph-fuse: fix getting zero-length xattr (#10552 Yan, Zheng) * librados: add blacklist_add convenience method (Jason Dillaman) * librados: expose rados_{read|write}_op_assert_version in C API (Kim Vandry) * librados: fix pool name caching (#10458 Radoslaw Zarzynski) * librados: fix resource leak, misc bugs (#10425 Radoslaw Zarzynski) * librados: fix some watch/notify locking (Jason Dillaman, Josh Durgin) * libradosstriper: fix write_full when ENOENT (#10758 Sebastien Ponce) * librbd: CRC protection for RBD image map (Jason Dillaman) * librbd: add per-image object map for improved performance (Jason Dillaman) * librbd: add support for an object map indicating which objects exist (Jason Dillaman) * librbd: adjust internal locking (Josh Durgin, Jason Dillaman) * librbd: better handling of watch errors (Jason Dillaman) * librbd: coordinate maint operations through lock owner (Jason Dillaman) * librbd: copy-on-read (Min Chen, Li Wang, Yunchuan Wen, Cheng Cheng, Jason Dillaman) * librbd: enforce write ordering with a snapshot (Jason Dillaman) * librbd: fadvise-style hints; add misc hints for certain operations (Jianpeng
Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings
On 02/27/2015 02:46 PM, Mark Wu wrote: 2015-02-27 20:56 GMT+08:00 Alexandre DERUMIER aderum...@odiso.com: Hi, from qemu rbd.c:

if (flags & BDRV_O_NOCACHE) {
    rados_conf_set(s->cluster, "rbd_cache", "false");
} else {
    rados_conf_set(s->cluster, "rbd_cache", "true");
}

and block.c:

int bdrv_parse_cache_flags(const char *mode, int *flags)
{
    *flags &= ~BDRV_O_CACHE_MASK;

    if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
        *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "directsync")) {
        *flags |= BDRV_O_NOCACHE;
    } else if (!strcmp(mode, "writeback")) {
        *flags |= BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "unsafe")) {
        *flags |= BDRV_O_CACHE_WB;
        *flags |= BDRV_O_NO_FLUSH;
    } else if (!strcmp(mode, "writethrough")) {
        /* this is the default */
    } else {
        return -1;
    }
    return 0;
}

So rbd_cache is disabled for cache=directsync|none and enabled for writethrough|writeback|unsafe, so directsync or none should be safe if the guest does not send flushes. - Original Message - From: Florian Haas flor...@hastexo.com To: ceph-users ceph-users@lists.ceph.com Sent: Friday, 27 February 2015 13:38:13 Subject: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings Hi everyone, I always have a bit of trouble wrapping my head around how libvirt seems to ignore ceph.conf options while qemu/kvm does not, so I thought I'd ask. Maybe Josh, Wido or someone else can clarify the following. http://ceph.com/docs/master/rbd/qemu-rbd/ says: Important: If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted. Now this refers to explicitly setting rbd_cache=true on the qemu command line, not having rbd_cache=true in the [client] section in ceph.conf, and I'm not even sure whether qemu supports that anymore. Even if it does, I'm still not sure whether the statement is accurate. qemu has, for some time, had a cache=directsync mode which is intended to be used as follows (from http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html): This mode is useful when guests may not be sending flushes when appropriate and therefore leave data at risk in case of power failure. When cache=directsync is used, write operations are only completed to the guest when data is safely on disk. So even if there are no flush requests to librbd, users should still be safe from corruption when using cache=directsync, no? So in summary, I *think* the following considerations apply, but I'd be grateful if someone could confirm or refute them:

cache = writethrough
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to
Actually, qemu doesn't care about the setting rbd_cache_max_dirty. In writethrough mode, qemu always sends a flush following every write request.
So how exactly is that functionally different from rbd_cache_max_dirty=0?
use whether or not guest I/O stack sends flushes.

cache = writeback
Maps to rbd_cache=true, rbd_cache_max_dirty > 0. Safe to use only if guest I/O stack sends flushes.
Maps to cache = writethrough until first Qemu can report to the guest that the write cache is enabled, and the guest kernel can then manage the cache just as it does for a volatile writeback cache on a physical storage controller (please see https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt). As long as filesystem barriers are not disabled in the guest, this avoids data corruption. You mean block barriers? I thought those were killed upstream like 4 years ago. Cheers, Florian ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
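To make the mapping discussed in this thread concrete, here is a rough sketch of the two places these knobs usually live. The option names are the standard librbd/qemu ones, but the values, pool and image names are only examples, so check them against your own release:

# ceph.conf, [client] section -- librbd cache settings
[client]
rbd cache = true
# 0 gives writethrough-like behaviour; > 0 allows writeback (default is ~24 MB)
rbd cache max dirty = 25165824
# behave as writethrough until the guest sends its first flush
rbd cache writethrough until flush = true

# qemu drive definition selecting the cache mode (pool/image names are placeholders)
qemu-system-x86_64 ... -drive format=rbd,file=rbd:rbd/myimage,cache=writeback

With cache=directsync or cache=none on the qemu side, librbd caching is switched off regardless of the ceph.conf values, per the rbd.c snippet quoted above.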
Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?
That's interesting, it seems to be alternating between two lines, but only one thread this time? I'm guessing the 62738 is the osdmap, which is much behind where it should be? Osd.0 and osd.3 are on 63675, if I'm understanding that correctly. 2015-02-27 08:18:48.724645 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:18:48.724683 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:00.025003 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:00.025040 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:04.125395 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:04.125431 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:26.225763 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:26.225797 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:26.726140 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:26.726177 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) Activity on /dev/sdb looks similar to how it did without debugging: sdb 5.95 0.00 701.20 0 42072 sdb 5.10 0.00 625.60 0 37536 sdb 4.97 0.00 611.33 0 36680 sdb 5.77 0.00 701.20 0 42072 Some Googling reveals references to log files which have very similar entries, but I can't see anything that just repeats like mine does. -Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: 26 February 2015 22:37 To: Chris Murray Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help? If you turn up debug osd = 20 or something it'll apply a good bit more disk load but give you more debugging logs about what's going on. It could be that you're in enough of a mess now that it's stuck trying to calculate past intervals for a bunch of PGs across so many maps that it's swapping things in and out of memory and going slower (if that's the case, then increasing the size of your map cache will help). -Greg On Thu, Feb 26, 2015 at 1:56 PM, Chris Murray chrismurra...@gmail.com wrote: Tackling this on a more piecemeal basis, I've stopped all OSDs, and started just the three which exist on the first host. osd.0 comes up without complaint: osd.0 63675 done with init, starting boot process osd.3 comes up without complaint: osd.3 63675 done with init, starting boot process osd.11 is a problematic one. It does something like this ... 
2015-02-26 10:44:50.260593 7f7e23551780 0 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7), process ceph-osd, pid 305080 2015-02-26 10:44:50.265525 7f7e23551780 0 filestore(/var/lib/ceph/osd/ceph-11) mount detected btrfs 2015-02-26 10:44:51.155501 7f7e23551780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: FIEMAP ioctl is supported and appears to work 2015-02-26 10:44:51.155536 7f7e23551780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2015-02-26 10:44:51.433239 7f7e23551780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: syscall(SYS_syncfs, fd) fully supported 2015-02-26 10:44:51.433467 7f7e23551780 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature: CLONE_RANGE ioctl is supported 2015-02-26 10:44:51.644373 7f7e23551780 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature: SNAP_CREATE is supported 2015-02-26 10:44:51.668424 7f7e23551780 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature: SNAP_DESTROY is supported 2015-02-26 10:44:51.668741 7f7e23551780 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature: START_SYNC is supported (transid 43205) 2015-02-26 10:44:51.766577 7f7e23551780 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature: WAIT_SYNC is supported 2015-02-26 10:44:51.814761 7f7e23551780 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature: SNAP_CREATE_V2 is supported 2015-02-26 10:44:52.181382 7f7e23551780 0 filestore(/var/lib/ceph/osd/ceph-11) mount:
[ceph-users] multiple CephFS filesystems on the same pools
Sorry if this is actually documented somewhere, but is it possible to create and use multiple filesystems on the same data and metadata pools? I'm guessing yes, but does it require multiple MDSs? -- Cheers, ~Blairo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
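For context, a quick sketch of the commands involved (syntax as of the Giant-era ceph CLI; pool names and pg counts are illustrative, and behaviour around multiple filesystems may differ on other releases):

# create pools and a filesystem on them
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph fs new myfs cephfs_metadata cephfs_data

# list existing filesystems
ceph fs ls

As far as I know, releases of that era allow only a single filesystem per cluster, so a second ceph fs new is refused regardless of how many MDS daemons are running; additional MDS daemons simply act as standbys for the one filesystem.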
Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?
A little further logging: 2015-02-27 10:27:15.745585 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:15.745619 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:23.530913 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26380 con 0xe1f0cc60 2015-02-27 10:27:30.645902 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:30.645938 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:33.531142 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26540 con 0xe1f0cc60 2015-02-27 10:27:43.531333 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26700 con 0xe1f0cc60 2015-02-27 10:27:45.546275 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:45.546311 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:53.531564 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f268c0 con 0xe1f0cc60 2015-02-27 10:27:56.846593 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:56.846627 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:57.346965 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:27:57.347001 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:28:03.531785 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26a80 con 0xe1f0cc60 2015-02-27 10:28:13.532027 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26c40 con 0xe1f0cc60 2015-02-27 10:28:23.047382 7fe8e3f2f700 20 osd.11 62839 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:28:23.047419 7fe8e3f2f700 5 osd.11 62839 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 10:28:23.532271 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26e00 con 0xe1f0cc60 2015-02-27 10:28:33.532496 7fe8e8536700 1 -- 192.168.12.25:6800/673078 -- 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0}) v2 -- ?+0 0xe5f26fc0 con 0xe1f0cc60 62839? But it was 62738 earlier, so it is actually advancing toward the 63675? If what I've assumed about the osd map numbers is true. 
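For anyone following along, two quick ways to compare the epoch the cluster is on with what osd.11 itself reports, plus the map cache knob Greg referred to. The admin socket command and the default cache size may vary between releases, so treat this as a sketch:

# osdmap epoch the monitors currently have (prints "epoch NNNNN" on the first line)
ceph osd dump | head -1

# what the running daemon believes (look at oldest_map / newest_map), via its admin socket
ceph daemon osd.11 status

# ceph.conf, [osd] section -- enlarge the in-memory osdmap cache
[osd]
osd map cache size = 1000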
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chris Murray Sent: 27 February 2015 08:33 To: Gregory Farnum Cc: ceph-users Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy;will the cluster recover without help? That's interesting, it seems to be alternating between two lines, but only one thread this time? I'm guessing the 62738 is the osdmap, which is much behind where it should be? Osd.0 and osd.3 are on 63675, if I'm understanding that correctly. 2015-02-27 08:18:48.724645 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:18:48.724683 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:00.025003 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:00.025040 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:04.125395 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:04.125431 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:26.225763 7f2fbd1e8700 20 osd.11 62738 update_osd_stat osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist []) 2015-02-27 08:19:26.225797 7f2fbd1e8700 5 osd.11 62738 heartbeat: osd_stat(1305 GB used,
Re: [ceph-users] Cluster never reaching clean after osd out
Hi Stéphane, I think I got it. I purged my complete cluster, set up the new one like the old, and got exactly the same problem again. Then I ran ceph osd crush tunables optimal, which added the option chooseleaf_vary_r 1 to the crushmap. After that everything works fine. Try it on your cluster. Greetings Yves

Sent: Tuesday, 24 February 2015, 10:49 From: Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr To: Yves Kretzschmar yveskretzsch...@web.de, ceph-users@lists.ceph.com Subject: Re: [ceph-users] Cluster never reaching clean after osd out

I have a cluster of 3 hosts, running Debian wheezy and backports kernel 3.16.0-0.bpo.4-amd64. For testing I did a ~# ceph osd out 20 from a clean state. Ceph starts rebalancing; watching ceph -w, one sees the number of pgs stuck unclean go up and then come down to about 11. Shortly after that the cluster stays stuck forever in this state: health HEALTH_WARN 68 pgs stuck unclean; recovery 450/169647 objects degraded (0.265%); 3691/169647 objects misplaced (2.176%) According to the documentation at http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ the cluster should reach a clean state after an osd out. What am I doing wrong?

Hi Yves and Cephers, I have a cluster with 6 nodes and 36 OSDs. I have the same problem:

cluster 1d0503fb-36d0-4dbc-aabe-a2a0709163cd health HEALTH_WARN 76 pgs stuck unclean; recovery 1/624 objects degraded (0.160%); 7/624 objects misplaced (1.122%) monmap e6: 6 mons osdmap e616: 36 osds: 36 up, 35 in pgmap v16344: 2048 pgs, 1 pools, 689 MB data, 208 objects 178 GB used, 127 TB / 127 TB avail 1/624 objects degraded (0.160%); 7/624 objects misplaced (1.122%) 76 active+remapped 1972 active+clean

After taking osd.15 'out', ceph didn't return to HEALTH_OK and I get misplaced objects ... :-/ I noticed that this happens when I use a replica-3 pool. When the pool uses replica 2, ceph returns to HEALTH_OK... Have you tried with a replica-2 pool? Either way, I wonder why it does not return to the OK status.

CEPH OSD TREE
# id weight type name up/down reweight -1000 144 root default -200 48 datacenter mo -133 48 rack mom02 -4 24 host mom02h01 12 4 osd.12 up 1 13 4 osd.13 up 1 14 4 osd.14 up 1 16 4 osd.16 up 1 17 4 osd.17 up 1 15 4 osd.15 up 0 -5 24 host mom02h02 18 4 osd.18 up 1 19 4 osd.19 up 1 20 4 osd.20 up 1 21 4 osd.21 up 1 22 4 osd.22 up 1 23 4 osd.23 up 1 -202 48 datacenter me -135 48 rack mem04 -6 24 host mem04h01 24 4 osd.24 up 1 25 4 osd.25 up 1 26 4 osd.26 up 1 27 4 osd.27 up 1 28 4 osd.28 up 1 29 4 osd.29 up 1 -7 24 host mem04h02 30 4 osd.30 up 1 31 4 osd.31 up 1 32 4 osd.32 up 1 33 4 osd.33 up 1 34 4 osd.34 up 1 35 4 osd.35 up 1 -201 48 datacenter li -134 48 rack lis04 -2 24 host lis04h01 0 4 osd.0 up 1 2 4 osd.2 up 1 3 4 osd.3 up 1 4 4 osd.4 up 1 5 4 osd.5 up 1 1 4 osd.1 up 1 -3 24 host lis04h02 6 4 osd.6 up 1 7 4 osd.7 up 1 8 4 osd.8 up 1 9 4 osd.9 up 1 10 4 osd.10 up 1 11 4 osd.11 up 1

Crushmap
# begin crush map tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50 tunable chooseleaf_descend_once 1 # devices device 0 osd.0 device 1 osd.1 device 2 osd.2 device 3 osd.3 device 4 osd.4 device 5 osd.5 device 6 osd.6 device 7 osd.7 device 8 osd.8 device 9 osd.9 device 10 osd.10 device 11 osd.11 device 12 osd.12 device 13 osd.13 device 14 osd.14 device 15 osd.15 device 16 osd.16 device 17 osd.17 device 18 osd.18 device 19 osd.19 device 20 osd.20 device 21 osd.21 device 22