Re: [ceph-users] Radosgw Timeout
On 22.05.2014 15:36, Yehuda Sadeh wrote:
> On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl <georg.hoellr...@xidras.com> wrote:
>> Hello List,
>>
>> Using the radosgw works fine, as long as the amount of data doesn't get too big. I have created one bucket that holds many small files, separated into different directories. But whenever I try to access the bucket, I only run into some timeout. The timeout is at around 30 - 100 seconds. This is smaller than the Apache timeout of 300 seconds.
>>
>> I've tried to access the bucket with different clients - one of them is s3cmd, which is still able to upload things, but takes a rather long time when listing the contents. Then I've tried s3fs-fuse, which throws "ls: reading directory .: Input/output error". Cyberduck and S3Browser show similar behavior.
>>
>> Is there an option to only send back maybe 1000 list entries, like Amazon does? So that the client can decide whether it wants to list all the contents?
>
> That's how it works - it doesn't return more than 1000 entries at once.

OK. I found that in the requests. So it's the client that states how many objects should be in the listing, by sending the max-keys=1000 parameter:

- - - [23/May/2014:08:49:33 +] GET /test/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1 200 715 - Cyberduck/4.4.4 (14505) (Windows NT (unknown)/6.2) (x86) xidrasservice.com:443

>> Are there any timeout values in radosgw?
>
> Are you sure the timeout is in the gateway itself? Could be apache that is timing out. Will need to see the apache access logs for these operations, radosgw debug and messenger logs (debug rgw = 20, debug ms = 1), to give a better answer.

No, I'm not sure where the timeout comes from. As far as I can tell, Apache times out after 300 seconds - so that should not be the problem.

I think I found something in the Apache logs:

[Fri May 23 08:59:39.385548 2014] [fastcgi:error] [pid 3035:tid 140723006891776] [client 10.0.1.66:46049] FastCGI: comm with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
[Fri May 23 08:59:39.385604 2014] [fastcgi:error] [pid 3035:tid 140723006891776] [client 10.0.1.66:46049] FastCGI: incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi

I've increased the timeout to 900 in the Apache vhost config:

FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/radosgw.vvx-ceph-m-02 -idle-timeout 900

It's still not working, and now I don't get a log entry any more. Most interesting when watching the debug output: radosgw reports that it successfully finished the request, but at the same time the client tells me it failed. I've shortened the log file; as far as I can see, the info repeats itself...

2014-05-23 09:38:43.051395 7f1b427fc700 1 ====== starting new request req=0x7f1b3400f1c0 ======
2014-05-23 09:38:43.051597 7f1b427fc700 1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6800/14453 -- osd_op(client.72942.0:120 UHXW458EH1RVULE1BCEH [getxattrs,stat] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b4640 con 0x2455930
2014-05-23 09:38:43.053180 7f1b96d80700 1 -- 10.0.1.107:0/1005898 <== osd.0 10.0.1.199:6800/14453 23 ==== osd_op_reply(120 UHXW458EH1RVULE1BCEH [getxattrs,stat] v0'0 uv1 ondisk = 0) v6 ==== 229+0+20 (1060030390 0 1010060712) 0x7f1b58002540 con 0x2455930
2014-05-23 09:38:43.053380 7f1b427fc700 1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6800/14453 -- osd_op(client.72942.0:121 UHXW458EH1RVULE1BCEH [read 0~524288] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b45d0 con 0x2455930
2014-05-23 09:38:43.054359 7f1b96d80700 1 -- 10.0.1.107:0/1005898 <== osd.0 10.0.1.199:6800/14453 24 ==== osd_op_reply(121 UHXW458EH1RVULE1BCEH [read 0~8] v0'0 uv1 ondisk = 0) v6 ==== 187+0+8 (3510944971 0 3829959217) 0x7f1b580057b0 con 0x2455930
2014-05-23 09:38:43.054490 7f1b427fc700 1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6806/15018 -- osd_op(client.72942.0:122 macm [getxattrs,stat] 7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b6010 con 0x2457de0
2014-05-23 09:38:43.055871 7f1b96d80700 1 -- 10.0.1.107:0/1005898 <== osd.2 10.0.1.199:6806/15018 3 ==== osd_op_reply(122 macm [getxattrs,stat] v0'0 uv46 ondisk = 0) v6 ==== 213+0+91 (22324782 0 2022698800) 0x7f1b500025a0 con 0x2457de0
2014-05-23 09:38:43.055963 7f1b427fc700 1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6806/15018 -- osd_op(client.72942.0:123 macm [read 0~524288] 7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b3950 con 0x2457de0
2014-05-23 09:38:43.057087 7f1b96d80700 1 -- 10.0.1.107:0/1005898 <== osd.2 10.0.1.199:6806/15018 4 ==== osd_op_reply(123 macm [read 0~310] v0'0 uv46 ondisk = 0) v6 ==== 171+0+310 (3762965810 0 1648184722) 0x7f1b500026e0 con 0x2457de0
2014-05-23 09:38:43.057364 7f1b427fc700 1 -- 10.0.1.107:0/1005898 --> 10.0.0.26:6809/4834 -- osd_op(client.72942.0:124 store [call version.read,getxattrs,stat] 5.c5755cee ack+read e279) v4 -- ?+0 0x7f1b66b0 con 0x7f1b440022e0
2014-05-23 09:38:43.059223 7f1b96d80700 1 -- 10.0.1.107:0/1005898 <== osd.7 10.0.0.26:6809/4834 37 ==== osd_op_reply
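The 1000-entry cap Yehuda mentions is the standard S3 listing protocol: the client sends max-keys and repeats the request with a marker until the response is no longer truncated. A minimal sketch of that paging loop, using an in-memory key list as a hypothetical stand-in for the bucket index (illustrative only, not radosgw code):

```python
# Sketch of S3-style listing pagination (max-keys / marker / IsTruncated).
# The "server" side here is just a sorted in-memory key list - a stand-in
# for the real bucket index, used only to illustrate the protocol.

def list_objects(keys, marker="", max_keys=1000):
    """Return one page of keys that sort after `marker`, plus a truncation flag."""
    selected = [k for k in sorted(keys) if k > marker]
    page = selected[:max_keys]
    is_truncated = len(selected) > max_keys
    return page, is_truncated

def list_all(keys, max_keys=1000):
    """Client-side loop: keep requesting pages until IsTruncated is false."""
    marker, result = "", []
    while True:
        page, truncated = list_objects(keys, marker, max_keys)
        result.extend(page)
        if not truncated:
            return result
        marker = page[-1]  # next request resumes after the last key seen

keys = ["dir%02d/file%04d" % (i, j) for i in range(5) for j in range(700)]
assert list_all(keys, max_keys=1000) == sorted(keys)
```

Listing a bucket with millions of keys still costs one round-trip per 1000 keys, which is why large listings stay slow even once nothing times out.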
Re: [ceph-users] Radosgw Timeout
Thank you very much - I think I've solved the whole thing. It wasn't in radosgw. The solution was:
- increase the timeout in the Apache conf
- when using haproxy, also increase the timeouts there!

Georg

On 22.05.2014 15:36, Yehuda Sadeh wrote:
> On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl <georg.hoellr...@xidras.com> wrote:
>> Hello List,
>>
>> Using the radosgw works fine, as long as the amount of data doesn't get too big. I have created one bucket that holds many small files, separated into different directories. But whenever I try to access the bucket, I only run into some timeout. The timeout is at around 30 - 100 seconds. This is smaller than the Apache timeout of 300 seconds.
>>
>> I've tried to access the bucket with different clients - one of them is s3cmd, which is still able to upload things, but takes a rather long time when listing the contents. Then I've tried s3fs-fuse, which throws "ls: reading directory .: Input/output error". Cyberduck and S3Browser show similar behavior.
>>
>> Is there an option to only send back maybe 1000 list entries, like Amazon does? So that the client can decide whether it wants to list all the contents?
>
> That's how it works - it doesn't return more than 1000 entries at once.
>
>> Are there any timeout values in radosgw?
>
> Are you sure the timeout is in the gateway itself? Could be apache that is timing out. Will need to see the apache access logs for these operations, radosgw debug and messenger logs (debug rgw = 20, debug ms = 1), to give a better answer.
>
> Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
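For the archives, the two places the timeout had to be raised look roughly like this. The FastCGI socket path and the 900-second value are taken from this thread; the haproxy section is a sketch for a typical setup - section layout and values are assumptions to adapt to your site:

```
# Apache vhost: raise the mod_fastcgi idle timeout (default 30 s)
FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/radosgw.vvx-ceph-m-02 -idle-timeout 900

# haproxy (sketch): raise client/server timeouts to match
defaults
    timeout connect 5s
    timeout client  900s
    timeout server  900s
```

With only one of the two raised, long bucket listings still die at whichever proxy layer keeps the shorter timeout.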
Re: [ceph-users] Radosgw Timeout
On 22.05.2014 17:30, Craig Lewis wrote:
> On 5/22/14 06:16, Georg Höllrigl wrote:
>> I have created one bucket that holds many small files, separated into different directories. But whenever I try to access the bucket, I only run into some timeout. The timeout is at around 30 - 100 seconds. This is smaller than the Apache timeout of 300 seconds.
>
> Just so we're all talking about the same things, what does "many small files" mean to you? Also, how are you separating them into directories? Are you just giving files in the same directory the same leading string, like dir1_subdir1_filename?

I can only estimate how many files. ATM I have 25M files on the origin, but only a tenth of that has been synced to radosgw. They are distributed through 20 folders, each containing about 2k directories with ~100 - 500 files each. Do you think that's too much for this use case?

> I'm putting about 1M objects, random sizes, in each bucket. I'm not having problems getting individual files, or uploading new ones. It does take a long time for s3cmd to list the contents of the bucket. The only time I get timeouts is when my cluster is very unhealthy.
>
> If you're doing a lot more than that, say 10M or 100M objects, then that could cause a hot spot on disk. You might be better off taking your directories and putting them in their own buckets.
>
> --
> Craig Lewis
> Senior Systems Engineer
> Office +1.714.602.1309
> Email cle...@centraldesktop.com
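Craig's suggestion - one bucket per top-level directory - keeps each bucket index near his comfortable ~1M-object range. A hypothetical sketch of such a key-to-bucket mapping (the naming scheme is made up for illustration, not from the thread):

```python
# Hypothetical sharding helper: route each object to a bucket derived from
# the first path component of its key, so ~25M keys spread over ~20 buckets
# instead of one huge bucket index. Illustrative only.

def bucket_for(key, base="archive"):
    """Derive the bucket name from the key's top-level folder."""
    top = key.split("/", 1)[0]
    return "%s-%s" % (base, top)

# Rough numbers from the thread: 20 folders x ~2000 dirs x 100-500 files
low, high = 20 * 2000 * 100, 20 * 2000 * 500
assert (low, high) == (4000000, 20000000)  # total objects, worst case
assert low // 20 == 200000                 # per top-level folder, i.e. per bucket

assert bucket_for("folder07/dir0042/file.dat") == "archive-folder07"
```

Even the worst case leaves each per-folder bucket around 1M objects, an order of magnitude below the single-bucket index size.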
[ceph-users] Radosgw Timeout
Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get too big. I have created one bucket that holds many small files, separated into different directories. But whenever I try to access the bucket, I only run into some timeout. The timeout is at around 30 - 100 seconds. This is smaller than the Apache timeout of 300 seconds.

I've tried to access the bucket with different clients - one of them is s3cmd, which is still able to upload things, but takes a rather long time when listing the contents. Then I've tried s3fs-fuse, which throws "ls: reading directory .: Input/output error". Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like Amazon does? So that the client can decide whether it wants to list all the contents?

Are there any timeout values in radosgw? Any further thoughts on how I could increase the performance of these listings?

Kind Regards,
Georg
Re: [ceph-users] Pool without Name
On 14.05.2014 17:26, Wido den Hollander wrote:
> On 05/14/2014 05:24 PM, Georg Höllrigl wrote:
>> Hello List,
>>
>> I see a pool without a name:
>>
>> # ceph osd lspools
>> 0 data,1 metadata,2 rbd,3 .rgw.root,4 .rgw.control,5 .rgw,6 .rgw.gc,7 .users.uid,8 openstack-images,9 openstack-volumes,10 openstack-backups,11 .users,12 .users.swift,13 .users.email,14 .log,15 .rgw.buckets,16 .rgw.buckets.index,17 .usage,18 .intent-log,20 ,
>>
>> I've already deleted one of those (with ID 19) with "rados rmpool --yes-i-really-really-mean-it". But now it's back with ID 20. Where do they come from? What kind of data is in there?
>
> You are running Dumpling 0.67.X with the RGW? It's something which is caused by the RGW.

No, the cluster is on the latest Firefly release, 0.80.1.

I only found your entries from November 2013 - that's how I know how to delete the entry:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-November/005737.html

I also found the discussion https://www.mail-archive.com/ceph-users@lists.ceph.com/msg08975.html - but the only info I see there is that the empty-named pool comes from rados. So maybe it's a bug somewhere in radosgw? I think pools should have names :)

> There is a thread on this list from two weeks ago about this.

Kind Regards,
Georg
Re: [ceph-users] Rados GW Method not allowed
Hello Everyone,

The important thing here is to include rgw_dns_name in ceph.conf and to restart radosgw. You also need DNS configured to point to your radosgw, plus a wildcard subdomain. s3cmd, for example, handles access this way - and you'll see the "Method Not Allowed" message if you miss anything!

Kind Regards,
Georg

On 13.05.2014 14:30, Georg Höllrigl wrote:
> Hello,
>
> System: Ubuntu 14.04, Ceph 0.80
>
> I'm getting either a "405 Method Not Allowed" or a "403 Permission Denied" from radosgw. Here is what I get from radosgw:
>
> HTTP/1.1 405 Method Not Allowed
> Date: Tue, 13 May 2014 12:21:43 GMT
> Server: Apache
> Accept-Ranges: bytes
> Content-Length: 82
> Content-Type: application/xml
>
> <?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>
>
> I can see that the user exists using:
> radosgw-admin --name client.radosgw.ceph-m-01 metadata list user
>
> I can get the credentials via:
> # radosgw-admin user info --uid=test
> { "user_id": "test",
>   "display_name": "test",
>   "email": "",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [],
>   "keys": [
>         { "user": "test",
>           "access_key": "95L2C7BFQ8492LVZ271N",
>           "secret_key": "f2tqIet+LrD0kAXYAUrZXydL+1nsO6Gs+we+94U5"}],
>   "swift_keys": [],
>   "caps": [],
>   "op_mask": "read, write, delete",
>   "default_placement": "",
>   "placement_tags": [],
>   "bucket_quota": { "enabled": false,
>       "max_size_kb": -1,
>       "max_objects": -1},
>   "user_quota": { "enabled": false,
>       "max_size_kb": -1,
>       "max_objects": -1},
>   "temp_url_keys": []}
>
> I've also found some hints about a broken redirect in Apache - but not really a working version. Any hints? Any thoughts about how to solve this? Where do I get more detailed logs on why it won't create a bucket?
>
> Kind Regards,
> Georg
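As a sketch, the gateway section in ceph.conf would carry the DNS name like this. The section name matches the `--name client.radosgw.ceph-m-01` used above; the domain is an assumption based on the access logs earlier in this archive:

```
[client.radosgw.ceph-m-01]
    host = ceph-m-01
    # Must match the base of your wildcard DNS zone, i.e.
    # *.xidrasservice.com -> the radosgw frontend (assumed domain).
    rgw dns name = xidrasservice.com
```

After editing, restart radosgw so virtual-host-style requests (bucketname.xidrasservice.com) are recognized as bucket operations instead of being rejected.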
[ceph-users] Pool without Name
Hello List,

I see a pool without a name:

# ceph osd lspools
0 data,1 metadata,2 rbd,3 .rgw.root,4 .rgw.control,5 .rgw,6 .rgw.gc,7 .users.uid,8 openstack-images,9 openstack-volumes,10 openstack-backups,11 .users,12 .users.swift,13 .users.email,14 .log,15 .rgw.buckets,16 .rgw.buckets.index,17 .usage,18 .intent-log,20 ,

I've already deleted one of those (with ID 19) with:

rados rmpool --yes-i-really-really-mean-it

But now it's back with ID 20. Where do they come from? What kind of data is in there?

Kind Regards,
Georg
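The stray pool is easy to spot mechanically. A small sketch that parses the `ceph osd lspools` output above and reports any pool whose name is empty (an illustrative helper, not a ceph tool):

```python
# Sketch: parse `ceph osd lspools` output ("<id> <name>,<id> <name>,...")
# and return the IDs of pools whose name is empty, like the stray ID 20.

def unnamed_pools(lspools_output):
    pools = []
    for entry in lspools_output.split(","):
        entry = entry.strip()
        if not entry:
            continue  # trailing comma leaves an empty field
        pool_id, _, name = entry.partition(" ")
        pools.append((int(pool_id), name))
    return [pid for pid, name in pools if name == ""]

out = "0 data,1 metadata,2 rbd,18 .intent-log,20 ,"
assert unnamed_pools(out) == [20]
```

Note the subtlety: `"20 ,"` strips down to `"20"`, so the name after the ID is genuinely empty rather than whitespace.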
[ceph-users] Rados GW Method not allowed
Hello,

System: Ubuntu 14.04, Ceph 0.80

I'm getting either a "405 Method Not Allowed" or a "403 Permission Denied" from radosgw. Here is what I get from radosgw:

HTTP/1.1 405 Method Not Allowed
Date: Tue, 13 May 2014 12:21:43 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 82
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>

I can see that the user exists using:
radosgw-admin --name client.radosgw.ceph-m-01 metadata list user

I can get the credentials via:
# radosgw-admin user info --uid=test
{ "user_id": "test",
  "display_name": "test",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
        { "user": "test",
          "access_key": "95L2C7BFQ8492LVZ271N",
          "secret_key": "f2tqIet+LrD0kAXYAUrZXydL+1nsO6Gs+we+94U5"}],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}

I've also found some hints about a broken redirect in Apache - but not really a working version. Any hints? Any thoughts about how to solve this? Where do I get more detailed logs on why it won't create a bucket?

Kind Regards,
Georg
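One common source of a 405 on bucket operations is virtual-host-style addressing: s3cmd builds bucket.domain hostnames, so the client config must point at the gateway's domain rather than AWS. A sketch of the relevant ~/.s3cfg lines (the domain is a placeholder for your gateway's DNS name, not from this thread):

```
# ~/.s3cfg (excerpt) - point s3cmd at the radosgw instead of AWS
host_base = gateway.example.com
host_bucket = %(bucket)s.gateway.example.com
```

This pairs with the rgw_dns_name / wildcard-DNS fix described in the resolution earlier in this archive: both sides have to agree on the same base domain.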
Re: [ceph-users] Ceph Not getting into a clean state
Thank you so much! That seems to have worked immediately. ATM I still see 3 pgs in active+clean+scrubbing state - but that will hopefully fix itself over time.

So the way to go with Firefly is to either use at least 3 hosts for OSDs - or reduce the number of replicas?

Kind Regards,
Georg

On 09.05.2014 10:59, Martin B Nielsen wrote:
> Hi,
>
> I experienced exactly the same with 14.04 and the 0.79 release. It was a fresh clean install with default crushmap and ceph-deploy install as per the quick-start guide.
>
> Oddly enough, changing replica size (incl. min_size) from 3 -> 2 (and 2 -> 1) and back again made it work. I didn't have time to look into replicating the issue.
>
> Cheers,
> Martin
>
> On Thu, May 8, 2014 at 4:30 PM, Georg Höllrigl <georg.hoellr...@xidras.com> wrote:
>> Hello,
>>
>> We've a fresh cluster setup - with Ubuntu 14.04 and ceph Firefly. By now I've tried this multiple times - but the result stays the same and shows me lots of troubles (the cluster is empty, no client has accessed it):
>>
>> # ceph -s
>>     cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
>>      health HEALTH_WARN 470 pgs stale; 470 pgs stuck stale; 18 pgs stuck unclean; 26 requests are blocked > 32 sec
>>      monmap e2: 3 mons at {ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0}, election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
>>      osdmap e409: 9 osds: 9 up, 9 in
>>       pgmap v1231: 480 pgs, 9 pools, 822 bytes data, 43 objects
>>             9373 MB used, 78317 GB / 78326 GB avail
>>                  451 stale+active+clean
>>                    1 stale+active+clean+scrubbing
>>                   10 active+clean
>>                   18 stale+active+remapped
>>
>> Anyone an idea what happens here? Should an empty cluster not show only active+clean pgs?
>>
>> Regards,
>> Georg
Re: [ceph-users] Ceph Not getting into a clean state
Hello,

I've already thought about that - but even after changing the replication level (size), I'm not getting a clean cluster (there are only the default pools ATM):

root@ceph-m-02:~# ceph -s
    cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
     health HEALTH_WARN 232 pgs stuck unclean; recovery 26/126 objects degraded (20.635%)
     monmap e2: 3 mons at {ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0}, election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
     osdmap e56: 9 osds: 9 up, 9 in
      pgmap v287: 232 pgs, 8 pools, 822 bytes data, 43 objects
            9342 MB used, 78317 GB / 78326 GB avail
            26/126 objects degraded (20.635%)
                 119 active
                 113 active+remapped

root@ceph-m-02:~# ceph osd dump | grep size
pool 0 'data' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 48 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 49 owner 0 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 50 owner 0 flags hashpspool stripe_width 0
pool 3 '.rgw.root' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 52 owner 0 flags hashpspool stripe_width 0
pool 4 '.rgw.control' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 53 owner 0 flags hashpspool stripe_width 0
pool 5 '.rgw' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 54 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 6 '.rgw.gc' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 55 owner 0 flags hashpspool stripe_width 0
pool 7 '.users.uid' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 56 owner 18446744073709551615 flags hashpspool stripe_width 0

Kind Regards,
Georg

On 09.05.2014 08:29, Mark Kirkwood wrote:
> So that's two hosts - if this is a new cluster, chances are the pools have replication size=3 and won't place replica pgs on the same host... 'ceph osd dump' will let you know if this is the case. If it is, either reduce size to 2, add another host, or edit your crush rules to allow replica pgs on the same host.
>
> Cheers,
> Mark
>
> On 09/05/14 18:20, Georg Höllrigl wrote:
>> # ceph osd tree
>> # id    weight  type name       up/down reweight
>> -1      76.47   root default
>> -2      32.72           host ceph-s-01
>> 0       7.27                    osd.0   up      1
>> 1       7.27                    osd.1   up      1
>> 2       9.09                    osd.2   up      1
>> 3       9.09                    osd.3   up      1
>> -3      43.75           host ceph-s-02
>> 4       10.91                   osd.4   up      1
>> 5       0.11                    osd.5   up      1
>> 6       10.91                   osd.6   up      1
>> 7       10.91                   osd.7   up      1
>> 8       10.91                   osd.8   up      1
>>
>> On 08.05.2014 19:11, Craig Lewis wrote:
>>> What does `ceph osd tree` output?
>>>
>>> On 5/8/14 07:30, Georg Höllrigl wrote:
>>>> Hello,
>>>>
>>>> We've a fresh cluster setup - with Ubuntu 14.04 and ceph Firefly. By now I've tried this multiple times - but the result stays the same and shows me lots of troubles (the cluster is empty, no client has accessed it):
>>>>
>>>> # ceph -s
>>>>     cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
>>>>      health HEALTH_WARN 470 pgs stale; 470 pgs stuck stale; 18 pgs stuck unclean; 26 requests are blocked > 32 sec
>>>>      monmap e2: 3 mons at {ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0}, election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
>>>>      osdmap e409: 9 osds: 9 up, 9 in
>>>>       pgmap v1231: 480 pgs, 9 pools, 822 bytes data, 43 objects
>>>>             9373 MB used, 78317 GB / 78326 GB avail
>>>>                  451 stale+active+clean
>>>>                    1 stale+active+clean+scrubbing
>>>>                   10 active+clean
>>>>                   18 stale+active+remapped
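The constraint Mark describes fits in one line: with the default CRUSH rule each replica must land on a distinct host, so a pool of size N needs at least N hosts to go clean. A toy check of that rule (illustrative only - not how CRUSH actually computes placements):

```python
# Toy version of the placement constraint behind the stuck-unclean PGs:
# the default CRUSH rule picks one OSD per *host*, so a pool with `size`
# replicas can only become active+clean with at least `size` hosts.

def fully_placeable(size, num_hosts):
    """True if every replica can sit on a different host."""
    return num_hosts >= size

hosts = 2  # ceph-s-01 and ceph-s-02 from the `ceph osd tree` above
assert not fully_placeable(3, hosts)  # default size=3 -> pgs stuck unclean
assert fully_placeable(2, hosts)      # size=2 fits on two hosts
```

This matches the two fixes suggested in the thread: either add a third host or reduce the pool size to 2 (or relax the CRUSH rule to allow replicas on the same host).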
Re: [ceph-users] Troubles MDS
> Looks like you enabled directory fragments, which is buggy in ceph version 0.72.
>
> Regards,
> Yan, Zheng

If it's enabled, it wasn't intentional. So how would I disable it?

Regards,
Georg
Re: [ceph-users] Troubles MDS
> And that's exactly what it sounds like — the MDS isn't finding objects that are supposed to be in the RADOS cluster.

I'm not sure what I should make of that. The MDS shouldn't access data for RADOS and vice versa?

> Anyway, glad it fixed itself, but it sounds like you've got some infrastructure issues or something you need to sort out first.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

I think we found a reason - somehow all the memory gets used up - maybe some leak? So ATM it's not really fixed. Is there anything I could do so that we can track this down and fix it in future versions?

Regards,
Georg
Re: [ceph-users] Troubles MDS
RESETSESSION but no longer connecting
2014-04-16 12:29:50.769693 7f1ddf5c3700 0 -- 10.0.1.107:6803/21953 >> 10.0.1.107:6789/0 pipe(0x6dbc6280 sd=243 :43169 s=4 pgs=0 cs=0 l=1 c=0x1358ed580).connect got RESETSESSION but no longer connecting
2014-04-16 12:29:51.621605 7f1de05d3700 0 -- 10.0.1.107:6803/21953 >> 10.0.1.107:6789/0 pipe(0x21154780 sd=243 :43929 s=4 pgs=0 cs=0 l=1 c=0xecb1a580).connect got RESETSESSION but no longer connecting
2014-04-16 12:29:51.886405 7f1dea867700 -1 mds.0.13 *** got signal Terminated ***
2014-04-16 12:29:51.886894 7f1dea867700 1 mds.0.13 suicide. wanted down:dne, now up:rejoin

I also see lots of these:

ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.900657 7f937a5e1700 0 log [ERR] : dir 1c4639d.1c4639d object missing on disk; some files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.913245 7f937a5e1700 0 log [ERR] : dir 1c4617e.1c4617e object missing on disk; some files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.925811 7f937a5e1700 0 log [ERR] : dir 1c45d08.1c45d08 object missing on disk; some files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.941476 7f937a5e1700 0 log [ERR] : dir 1c45d9e.1c45d9e object missing on disk; some files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.956158 7f937a5e1700 0 log [ERR] : dir 1c461e5.1c461e5 object missing on disk; some files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.968524 7f937a5e1700 0 log [ERR] : dir 1c46608.1c46608 object missing on disk; some files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.979229 7f937a5e1700 0 log [ERR] : dir 1c468b6.1c468b6 object missing on disk; some files may be lost

At the moment I've only one mds running - but clients (mainly using fuse) can't connect.

Regards,
Georg

On 16.04.2014 16:27, Gregory Farnum wrote:
> What's the backtrace from the MDS crash?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Wed, Apr 16, 2014 at 7:11 AM, Georg Höllrigl <georg.hoellr...@xidras.com> wrote:
>> Hello,
>>
>> Using Ceph MDS with one active and one standby server - a day ago one of the mds crashed and I restarted it. Tonight it crashed again; a few hours later, the second mds crashed as well.
>>
>> # ceph -v
>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>
>> At the moment cephfs is dead, with the following health status:
>>
>> # ceph -s
>>     cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
>>      health HEALTH_WARN mds cluster is degraded; mds c is laggy
>>      monmap e3: 3 mons at {ceph-m-01=10.0.0.176:6789/0,ceph-m-02=10.0.1.107:6789/0,ceph-m-03=10.0.1.108:6789/0}, election epoch 6274, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
>>      mdsmap e2055: 1/1/1 up {0=ceph-m-03=up:rejoin(laggy or crashed)}
>>      osdmap e3752: 39 osds: 39 up, 39 in
>>       pgmap v3277576: 8328 pgs, 17 pools, 6461 GB data, 17066 kobjects
>>             13066 GB used, 78176 GB / 91243 GB avail
>>                 8328 active+clean
>>   client io 1193 B/s rd, 0 op/s
>>
>> I couldn't really find any useful info in the logfiles or the documentation. Any ideas how to get cephfs up and running?
>>
>> Here is part of the mds log:
>>
>> 2014-04-16 14:07:05.603501 7ff184c64700 1 mds.0.server reconnect gave up on client.7846580 10.0.1.152:0/14639
>> 2014-04-16 14:07:05.603525 7ff184c64700 1 mds.0.46 reconnect_done
>> 2014-04-16 14:07:05.674990 7ff186d69700 1 mds.0.46 handle_mds_map i am now mds.0.46
>> 2014-04-16 14:07:05.674996 7ff186d69700 1 mds.0.46 handle_mds_map state change up:reconnect --> up:rejoin
>> 2014-04-16 14:07:05.674998 7ff186d69700 1 mds.0.46 rejoin_start
>> 2014-04-16 14:07:22.347521 7ff17f825700 0 -- 10.0.1.107:6815/17325 >> 10.0.1.68:0/4128280551 pipe(0x5e2ac80 sd=930 :6815 s=2 pgs=153 cs=1 l=0 c=0x5e2e160).fault with nothing to send, going to standby
>>
>> Any ideas how to solve "laggy or crashed"?
>>
>> Georg
Re: [ceph-users] Troubles MDS
Whatever happened - it fixed itself!? When restarting, I got ~165k log messages like:

2014-04-17 07:30:14.856421 7fc50b991700 0 log [WRN] : ino 1f24fe0
2014-04-17 07:30:14.856422 7fc50b991700 0 log [WRN] : ino 1f24fe1
2014-04-17 07:30:14.856423 7fc50b991700 0 log [WRN] : ino 1f24fe2
2014-04-17 07:30:14.856424 7fc50b991700 0 log [WRN] : ino 1f24fe3
2014-04-17 07:30:14.856427 7fc50b991700 0 log [WRN] : ino 1f24fe4
2014-04-17 07:30:14.856428 7fc50b991700 0 log [WRN] : ino 1f24fe5

And the clients recovered!? I would be really interested in what happened!

Georg

On 17.04.2014 09:45, Georg Höllrigl wrote:
> Hello Greg,
>
> I've searched - but I don't see any backtraces... I've tried to get some more info out of the logs; I really hope there is something interesting in them. It all started two days ago with an authentication error:
>
> 2014-04-14 21:08:55.929396 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby)
> 2014-04-14 21:09:07.989547 7fd93b62e700 1 mds.0.0 replay_done (as standby)
> 2014-04-14 21:09:08.989647 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby)
> 2014-04-14 21:09:10.633786 7fd93b62e700 1 mds.0.0 replay_done (as standby)
> 2014-04-14 21:09:11.633886 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby)
> 2014-04-14 21:09:17.995105 7fd93f644700 0 mds.0.0 handle_mds_beacon no longer laggy
> 2014-04-14 21:09:39.798798 7fd93f644700 0 monclient: hunting for new mon
> 2014-04-14 21:09:39.955078 7fd93f644700 1 mds.-1.-1 handle_mds_map i (10.0.1.107:6800/16503) dne in the mdsmap, respawning myself
> 2014-04-14 21:09:39.955094 7fd93f644700 1 mds.-1.-1 respawn
> 2014-04-14 21:09:39.955106 7fd93f644700 1 mds.-1.-1  e: '/usr/bin/ceph-mds'
> 2014-04-14 21:09:39.955109 7fd93f644700 1 mds.-1.-1  0: '/usr/bin/ceph-mds'
> 2014-04-14 21:09:39.955110 7fd93f644700 1 mds.-1.-1  1: '-i'
> 2014-04-14 21:09:39.955112 7fd93f644700 1 mds.-1.-1  2: 'ceph-m-02'
> 2014-04-14 21:09:39.955113 7fd93f644700 1 mds.-1.-1  3: '--pid-file'
> 2014-04-14 21:09:39.955114 7fd93f644700 1 mds.-1.-1  4: '/var/run/ceph/mds.ceph-m-02.pid'
> 2014-04-14 21:09:39.955116 7fd93f644700 1 mds.-1.-1  5: '-c'
> 2014-04-14 21:09:39.955117 7fd93f644700 1 mds.-1.-1  6: '/etc/ceph/ceph.conf'
> 2014-04-14 21:09:39.979138 7fd93f644700 1 mds.-1.-1  cwd /
> 2014-04-14 19:09:40.922683 7f8ba9973780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505
> 2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
> 2014-04-14 19:09:40.975070 7f8ba9973780 1 mds.-1.0 suicide. wanted down:dne, now up:boot
>
> That was fixed by restarting the mds (+ the whole server).
>
> 2014-04-15 07:07:15.948650 7f9fdec0d780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506
> 2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
> 2014-04-15 07:07:15.954422 7f9fdec0d780 1 mds.-1.0 suicide. wanted down:dne, now up:boot
> 2014-04-15 07:15:49.177861 7fe8a1d60780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401
> 2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
> 2014-04-15 07:15:49.184046 7fe8a1d60780 1 mds.-1.0 suicide. wanted down:dne, now up:boot
> 2014-04-15 07:17:32.598031 7fab123e6780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531
> 2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
> 2014-04-15 07:17:32.604592 7fab123e6780 1 mds.-1.0 suicide. wanted down:dne, now up:boot
> 2014-04-15 07:21:56.099203 7fd37b951780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335
> 2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
> 2014-04-15 07:21:56.105254 7fd37b951780 1 mds.-1.0 suicide. wanted down:dne, now up:boot
> 2014-04-15 07:22:09.345800 7f23392ef780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461
> 2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
> 2014-04-15 07:22:09.391087 7f23392ef780 1 mds.-1.0 suicide. wanted down:dne, now up:boot
> 2014-04-15 07:28:01.762191 7fab6d14b780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263
> 2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
> 2014-04-15 07:28:01.779507 7fab6d14b780 1 mds.-1.0 suicide. wanted down:dne, now up:boot
> 2014-04-15 07:35:49.065110 7fe4f6b0d780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233
> 2014-04-15 07:35:52.191856 7fe4f6b05700 0 -- 10.0.1.107:6800/1233 >> 10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1 c=0x2f81580).fault
> 2014-04-15 07:35
[ceph-users] Troubles MDS
Hello,

Using Ceph MDS with one active and one standby server - a day ago one of the mds crashed and I restarted it. Tonight it crashed again; a few hours later, the second mds crashed as well.

# ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

At the moment cephfs is dead, with the following health status:

# ceph -s
    cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
     health HEALTH_WARN mds cluster is degraded; mds c is laggy
     monmap e3: 3 mons at {ceph-m-01=10.0.0.176:6789/0,ceph-m-02=10.0.1.107:6789/0,ceph-m-03=10.0.1.108:6789/0}, election epoch 6274, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
     mdsmap e2055: 1/1/1 up {0=ceph-m-03=up:rejoin(laggy or crashed)}
     osdmap e3752: 39 osds: 39 up, 39 in
      pgmap v3277576: 8328 pgs, 17 pools, 6461 GB data, 17066 kobjects
            13066 GB used, 78176 GB / 91243 GB avail
                8328 active+clean
  client io 1193 B/s rd, 0 op/s

I couldn't really find any useful info in the logfiles or the documentation. Any ideas how to get cephfs up and running?

Here is part of the mds log:

2014-04-16 14:07:05.603501 7ff184c64700 1 mds.0.server reconnect gave up on client.7846580 10.0.1.152:0/14639
2014-04-16 14:07:05.603525 7ff184c64700 1 mds.0.46 reconnect_done
2014-04-16 14:07:05.674990 7ff186d69700 1 mds.0.46 handle_mds_map i am now mds.0.46
2014-04-16 14:07:05.674996 7ff186d69700 1 mds.0.46 handle_mds_map state change up:reconnect --> up:rejoin
2014-04-16 14:07:05.674998 7ff186d69700 1 mds.0.46 rejoin_start
2014-04-16 14:07:22.347521 7ff17f825700 0 -- 10.0.1.107:6815/17325 >> 10.0.1.68:0/4128280551 pipe(0x5e2ac80 sd=930 :6815 s=2 pgs=153 cs=1 l=0 c=0x5e2e160).fault with nothing to send, going to standby

Any ideas how to solve "laggy or crashed"?

Georg
Re: [ceph-users] issues with 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--
Dipl.-Ing. (FH) Georg Höllrigl
Technik
Xidras GmbH
Stockern 47
3744 Stockern
Austria
Tel: +43 (0) 2983 201 - 30505
Fax: +43 (0) 2983 201 - 930505
Email: georg.hoellr...@xidras.com
Web: http://www.xidras.com
FN 317036 f | Landesgericht Krems | ATU64485024

CONFIDENTIAL! This email contains confidential information and is intended for the authorised recipient only. If you are not an authorised recipient, please return the email to us and then delete it from your computer and mail-server. You may neither use nor edit any such emails including attachments, nor make them accessible to third parties in any manner whatsoever. Thank you for your cooperation.
Re: [ceph-users] issues with 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
The whole git server seems unreachable. Does anybody know what's going on?

On 30.09.2013 17:33, Mike O'Toole wrote:
I have had the same issues.

From: qgra...@onq.com.au
To: ceph-users@lists.ceph.com
Date: Mon, 30 Sep 2013 00:01:11 +
Subject: [ceph-users] issues with 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'

Hey Guys,
Looks like 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' is down.
Regards,
Quenten Grasso
Re: [ceph-users] Using radosgw with s3cmd: Bucket failure
On 23.08.2013 16:24, Yehuda Sadeh wrote:
On Fri, Aug 23, 2013 at 1:47 AM, Tobias Brunner tob...@tobru.ch wrote:

Hi,
I'm trying to use radosgw with s3cmd:
# s3cmd ls
# s3cmd mb s3://bucket-1
ERROR: S3 error: 405 (MethodNotAllowed):
So there seems to be something missing regarding buckets. How can I create buckets? What do I have to configure on the radosgw side to get buckets working?

The problem you have here is that s3cmd uses the virtual host bucket name mechanism, i.e. it tries to access http://bucket.host/ instead of the usual http://host/bucket. You can configure the gateway to support that (set 'rgw dns name = host' in your ceph.conf); however, you'll also need to be able to route all these requests to your host, using some catch-all DNS. The easiest way would be to configure your client not to use virtual host bucket names, but I'm not completely sure s3cmd can do that.

Yehuda

I'm stuck at exactly the same problem - but this didn't help. I've set up the DNS, can reach the subdomains and have also set the rgw dns name. But still the same trouble here :(

Georg
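[Editorial note: to illustrate the difference Yehuda describes, here is a small Python sketch. The helper names and hostnames are made up for this example - they are not part of s3cmd or radosgw - but the two URL shapes are exactly the path-style and virtual-host-style addressing discussed above:]

```python
# Sketch of the two S3 addressing styles. The function and host names
# are illustrative only, not s3cmd or radosgw code.

def path_style_url(host: str, bucket: str, key: str = "") -> str:
    # "Ordinary" style: the bucket appears in the URL path.
    return f"http://{host}/{bucket}/{key}".rstrip("/")

def vhost_style_url(host: str, bucket: str, key: str = "") -> str:
    # Virtual-host style: the bucket becomes a subdomain. This is why
    # the gateway needs 'rgw dns name' plus a catch-all DNS record,
    # so that any bucket-1.host, bucket-2.host, ... resolves to it.
    return f"http://{bucket}.{host}/{key}".rstrip("/")

print(path_style_url("rgw.example.com", "bucket-1"))   # http://rgw.example.com/bucket-1
print(vhost_style_url("rgw.example.com", "bucket-1"))  # http://bucket-1.rgw.example.com
```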
Re: [ceph-users] Using radosgw with s3cmd: Bucket failure
Just in case someone stumbles across the same problem: the option name in ceph.conf is rgw_dns_name - not rgw dns name as described at http://ceph.com/docs/next/radosgw/config-ref/ !? And the hostname needs to be set to your DNS name without any wildcard.

Georg

On 06.09.2013 08:51, Georg Höllrigl wrote:
[...]
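[Editorial note: pulling the thread together, a minimal sketch of the two configuration pieces involved. The hostnames and the radosgw section name are placeholders; as Georg notes above, the option spelling that worked for him was rgw_dns_name, while the docs describe rgw dns name. The .s3cfg keys host_base and host_bucket are standard s3cmd settings:]

```ini
; ceph.conf - gateway side (section name and hostname are placeholders)
[client.radosgw.gateway]
rgw_dns_name = s3.example.com

; ~/.s3cfg - s3cmd client side; with a catch-all DNS record
; (*.s3.example.com pointing at the gateway), virtual-host-style
; bucket access can then resolve:
host_base = s3.example.com
host_bucket = %(bucket)s.s3.example.com
```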
Re: [ceph-users] Destroyed Ceph Cluster
Hello List,

The troubles to fix such a cluster continue... I get output like this now:

# ceph health
HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; mds cluster is degraded; mds vvx-ceph-m-03 is laggy

When checking for the ceph-mds processes, there are now none left... no matter which server I check. And they won't start up again!? The log starts with:

2013-08-19 11:23:30.503214 7f7e9dfbd780 0 ceph version 0.67 (e3b7bc5bce8ab330ec1661381072368af3c218a0), process ceph-mds, pid 27636
2013-08-19 11:23:30.523314 7f7e9904b700 1 mds.-1.0 handle_mds_map standby
2013-08-19 11:23:30.529418 7f7e9904b700 1 mds.0.26 handle_mds_map i am now mds.0.26
2013-08-19 11:23:30.529423 7f7e9904b700 1 mds.0.26 handle_mds_map state change up:standby --> up:replay
2013-08-19 11:23:30.529426 7f7e9904b700 1 mds.0.26 replay_start
2013-08-19 11:23:30.529434 7f7e9904b700 1 mds.0.26 recovery set is
2013-08-19 11:23:30.529436 7f7e9904b700 1 mds.0.26 need osdmap epoch 277, have 276
2013-08-19 11:23:30.529438 7f7e9904b700 1 mds.0.26 waiting for osdmap 277 (which blacklists prior instance)
2013-08-19 11:23:30.534090 7f7e9904b700 -1 mds.0.sessionmap _load_finish got (2) No such file or directory
2013-08-19 11:23:30.535483 7f7e9904b700 -1 mds/SessionMap.cc: In function 'void SessionMap::_load_finish(int, ceph::bufferlist&)' thread 7f7e9904b700 time 2013-08-19 11:23:30.534107
mds/SessionMap.cc: 83: FAILED assert(0 == "failed to load sessionmap")

Anyone an idea how to get the cluster back running?

Georg

On 16.08.2013 16:23, Mark Nelson wrote:
Hi Georg,
I'm not an expert on the monitors, but that's probably where I would start. Take a look at your monitor logs and see if you can get a sense for why one of your monitors is down. Some of the other devs will probably be around later who might know if there are any known issues with recreating the OSDs and missing PGs.
Mark

On 08/16/2013 08:21 AM, Georg Höllrigl wrote:
Hello,
I'm still evaluating ceph - now a test cluster with the 0.67 dumpling. I've created the setup with ceph-deploy from GIT. I've recreated a bunch of OSDs to give them another journal. There already was some test data on these OSDs. I've already recreated the missing PGs with ceph pg force_create_pg.

HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 5 requests are blocked > 32 sec; mds cluster is degraded; 1 mons down, quorum 0,1,2 vvx-ceph-m-01,vvx-ceph-m-02,vvx-ceph-m-03

Any idea how to fix the cluster, besides completely rebuilding it from scratch? What if such a thing happens in a production environment...

The pgs from ceph pg dump have all looked like "creating" for some time now:
2.3d 0 0 0 0 0 0 0 creating 2013-08-16 13:43:08.186537 0'0 0:0 [] [] 0'0 0.00 0'0 0.00

Is there a way to just dump the data that was on the discarded OSDs?

Kind Regards,
Georg
[ceph-users] Destroyed Ceph Cluster
Hello,

I'm still evaluating ceph - now a test cluster with the 0.67 dumpling. I've created the setup with ceph-deploy from GIT. I've recreated a bunch of OSDs to give them another journal. There already was some test data on these OSDs. I've already recreated the missing PGs with ceph pg force_create_pg.

HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 5 requests are blocked > 32 sec; mds cluster is degraded; 1 mons down, quorum 0,1,2 vvx-ceph-m-01,vvx-ceph-m-02,vvx-ceph-m-03

Any idea how to fix the cluster, besides completely rebuilding it from scratch? What if such a thing happens in a production environment...

The pgs from ceph pg dump have all looked like "creating" for some time now:
2.3d 0 0 0 0 0 0 0 creating 2013-08-16 13:43:08.186537 0'0 0:0 [] [] 0'0 0.00 0'0 0.00

Is there a way to just dump the data that was on the discarded OSDs?

Kind Regards,
Georg
Re: [ceph-users] mounting a pool via fuse
Thank you for the explanation. By mounting as a filesystem I'm talking about something similar to this:
http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/

Using the kernel module, I can mount a subdirectory into my directory tree - a directory to which I have assigned a pool. Using fuse, I can't mount a subdirectory?

By the way, setting the layout seems to have a bug:
# cephfs /mnt/macm01 set_layout -p 4
Error setting layout: Invalid argument
I have to add the -u option, then it works:
# cephfs /mnt/mailstore set_layout -p 5 -u 4194304

Kind Regards,
Georg

On 13.08.2013 12:09, Dzianis Kahanovich wrote:
Georg Höllrigl wrote:
I'm using ceph 0.61.7. When using ceph-fuse, I couldn't find a way to only mount one pool. Is there a way to mount a pool - or is it simply not supported?

You mean mount as a fs? Fuse cephfs is the same instance as kernel-level cephfs. You cannot mount a pool, but you can mount the filesystem and map a pool to any point of the filesystem (file or directory), including the root.

First, mount ceph via the kernel - mount -t ceph (just for cephfs tool syntax compatibility). For example, to /mnt/ceph. Then run ceph df and look up the pool number (not the name!) - say the pool number is 10. And last:
mkdir -p /mnt/ceph/pools/pool1
cephfs /mnt/ceph/pools/pool1 set_layout -p 10
or just (for ceph's root):
cephfs /mnt/ceph set_layout -p 10
Next you can unmount the kernel-level mount and mount this point via fuse.

PS For the ceph developers: trying this for quota (with ceph osd pool set-quota) is semi-working: on quota overflow nothing is limited, but ceph health shows a warning. If there is no other way to do quotas, this may qualify as a bug - though it is only of limited relevance while the many-pools performance limitation exists. So, FYI.
[ceph-users] mounting a pool via fuse
Hi,

I'm using ceph 0.61.7. When using ceph-fuse, I couldn't find a way to only mount one pool. Is there a way to mount a pool - or is it simply not supported?

Kind Regards,
Georg