Re: [ceph-users] Cache tiering and target_max_bytes
On 08/14/2014 10:30 PM, Sage Weil wrote:
> On Thu, 14 Aug 2014, Paweł Sadowski wrote:
>> On 14.08.2014 17:20, Sage Weil wrote:
>>> On Thu, 14 Aug 2014, Paweł Sadowski wrote:
>>>> Hello, I have a cluster of 35 OSDs (30 HDD, 5 SSD) with cache tiering configured. During tests it looks like Ceph is not respecting the target_max_bytes setting. Steps to reproduce:
>>>> - configure cache tiering
>>>> - set target_max_bytes to 32G (on the hot pool)
>>>> - write more than 32G of data
>>>> - nothing happens
>>>> [snip details]
>>> The reason the agent isn't doing any work is that you don't have hit_set_* configured for the cache pool, which means the cluster isn't tracking what objects get read to inform the flush/evict decisions. Configuring that will fix this. Try:
>>>   ceph osd pool set cache hit_set_type bloom
>>>   ceph osd pool set cache hit_set_count 8
>>>   ceph osd pool set cache hit_set_period 3600
>>> or similar. The agent could still run in a brain-dead mode without it, but it suffers from the bug you found. That was fixed after 0.80.5 and will be in 0.80.6.

Thanks!
PS

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
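Putting Sage's advice together with the original settings: a cache pool needs both the size target and the hit_set parameters before the tiering agent can flush/evict sensibly. A minimal sketch, assuming a cache pool named `cache` as in the thread (the dirty/full ratios at the end are illustrative additions, not from the mail):

```shell
# Size target from the report: 32 GiB on the hot pool
ceph osd pool set cache target_max_bytes 34359738368

# Without hit sets the agent has no record of which objects are accessed,
# so it cannot make flush/evict decisions (Sage's fix from this thread)
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 8
ceph osd pool set cache hit_set_period 3600

# Illustrative extras: start flushing dirty objects at 40% of the target
# and evicting clean ones at 80%, rather than waiting until the pool is full
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8
```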
Re: [ceph-users] pools with latest master
Yes, these are recent changes from John. Because of these changes:

commit 90e6daec9f3fe2a3ba051301ee50940278ade18b
Author: John Spray john.sp...@inktank.com
Date: Tue Apr 29 15:39:45 2014 +0100

    osdmap: Don't create FS pools by default

    Because many Ceph users don't use the filesystem, don't create the
    'data' and 'metadata' pools by default -- they will be created by
    newfs if they are needed.

    Signed-off-by: John Spray john.sp...@inktank.com

commit 7294e8c4df6df9d0898f82bb6e0839ed98149310
Author: John Spray john.sp...@inktank.com
Date: Tue May 27 11:04:43 2014 +0100

    test/qa: update for MDSMonitor changes

    Accommodate changes:
    * data and metadata pools no longer exist by default
    * filesystem-using tests must use `fs new` to create the filesystem first.

    Signed-off-by: John Spray john.sp...@inktank.com

Varada

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Somnath Roy
Sent: Saturday, August 16, 2014 3:19 AM
To: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: [ceph-users] pools with latest master

Hi, I have created a single-node/single-OSD cluster with the latest master for some experiments and saw that it creates only the rbd pool by default, not the data/metadata pools. Is this something that changed recently?

Thanks & Regards
Somnath

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
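For anyone hitting this on master: since these commits, the filesystem pools have to be created by hand before running `fs new`. A minimal sketch under the assumption of a fresh cluster (pool names and the PG count of 128 are illustrative, not mandated):

```shell
# The 'data' and 'metadata' pools no longer exist by default; create them
ceph osd pool create cephfs_metadata 128
ceph osd pool create cephfs_data 128

# Then create the filesystem explicitly, as the commit message says
ceph fs new cephfs cephfs_metadata cephfs_data
```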
Re: [ceph-users] ceph cluster inconsistency?
Hi, I tried this after restarting the osd, but I guess that was not the aim:

# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_ | grep 6adb1100 -A 100
IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource temporarily unavailable
tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const string)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780
tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))

When I run it after bringing the osd down, it takes a while, but it has no output. (When running it without the grep, I'm getting a huge list.) Or should I run this immediately after the osd crashes (because it may have been rebalanced? I already restarted the cluster)?

I don't know if it is related, but before I could do all that I had to fix something else: a monitor had run out of disk space, using 8 GB for its store.db folder (lots of sst files). Other monitors are also near that level. I never had that problem on previous setups. I recreated the monitor and now it uses 3.8 GB.

Thanks!
Kenneth

- Message from Sage Weil sw...@redhat.com -
Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT)
From: Sage Weil sw...@redhat.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Haomai Wang haomaiw...@gmail.com
Cc: Kenneth Waegeman kenneth.waege...@ugent.be, ceph-users@lists.ceph.com

On Fri, 15 Aug 2014, Haomai Wang wrote: Hi Kenneth, I don't find valuable info in your logs; they lack the necessary debug output from where the crash code is reached. But I scanned the encode/decode implementation in GenericObjectMap and found something bad. For example, two oids have the same hash and their names are:

A: rb.data.123
B: rb-123

At the ghobject_t compare level, A > B. But GenericObjectMap encodes "." to "%e", so the keys in the DB are:

A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head
B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head

and there A < B. It seems that the escape function is useless and should be disabled.
I'm not sure whether Kenneth's problem touches this bug, because this scenario only occurs when the object set is very large, making two objects share the same hash value. Kenneth, could you find time to run `ceph-kvstore-tool [path-to-osd] list _GHOBJTOSEQ_ | grep 6adb1100 -A 100`? ceph-kvstore-tool is a debug tool which can be compiled from source: clone the ceph repo and run ./autogen.sh; ./configure; cd src; make ceph-kvstore-tool. path-to-osd should be /var/lib/ceph/osd-[id]/current/. 6adb1100 is from your verbose log, and the next 100 rows should show the necessary info. You can also get ceph-kvstore-tool from the 'ceph-tests' package.

Hi Sage, do you think we need to provide an upgrade function to fix it?

Hmm, we might. This only affects the key/value encoding, right? The FileStore is using its own function to map these to file names? Can you open a ticket in the tracker for this? Thanks! sage

On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:

- Message from Haomai Wang haomaiw...@gmail.com -
Date: Thu, 14 Aug 2014 19:11:55 +0800
From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Kenneth Waegeman kenneth.waege...@ugent.be

Could you add the config debug_keyvaluestore = 20/20 to the crashed osd and replay the command causing the crash? I would like to get more debug info! Thanks.

I included the log as an attachment! Thanks!
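As the thread later works out, the LOCK error comes from leveldb's single-opener lock: the OSD must be stopped before ceph-kvstore-tool can open its store. A hedged sketch of the intended procedure (OSD id 67 and hash 6adb1100 are from Kenneth's log; the stop command depends on the init system):

```shell
# Stop the OSD that owns the store first -- leveldb allows only one opener,
# which is why running the tool against a live OSD fails with
# "lock .../LOCK: Resource temporarily unavailable"
service ceph stop osd.67   # init-system dependent

# Now the dump works; show the 100 rows following the key of interest
ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_ \
    | grep -A 100 6adb1100
```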
On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote: I have:

osd_objectstore = keyvaluestore-dev

in the global section of my ceph.conf

[root@ceph002 ~]# ceph osd erasure-code-profile get profile11
directory=/usr/lib64/ceph/erasure-code
k=8
m=3
plugin=jerasure
ruleset-failure-domain=osd
technique=reed_sol_van

the ecdata pool has this as profile:

pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 161 flags hashpspool stripe_width 4096

EC rule in crushmap:

rule ecdata {
    ruleset 2
    type erasure
    min_size 3
    max_size 20
    step set_chooseleaf_tries 5
    step take default-ec
    step choose indep 0 type osd
    step emit
}
root default-ec {
    id -8   # do not change unnecessarily
    # weight 140.616
    alg straw
    hash 0  # rjenkins1
    item ceph001-ec weight 46.872
    item ceph002-ec weight 46.872
    item ceph003-ec weight 46.872
    ...

Cheers!
Kenneth

- Message from Haomai Wang haomaiw...@gmail.com -
Date: Thu, 14 Aug 2014 10:07:50 +0800
From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Kenneth Waegeman kenneth.waege...@ugent.be
Cc: ceph-users ceph-users@lists.ceph.com

Hi Kenneth, Could you give your configuration related to EC and KeyValueStore? Not sure
Re: [ceph-users] RadosGW problems
Hi there, I have FastCgiWrapper Off in the fastcgi.conf file; I also have SELinux in permissive mode; 'ps aux | grep rados' shows me radosgw is running. The problem stays the same... I can log in with S3 credentials and create buckets, but uploads write this to the logs:

[Mon Aug 18 12:00:28.636378 2014] [:error] [pid 11251] [client 10.5.1.1:49680] FastCGI: comm with server /var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)
[Mon Aug 18 12:00:28.676825 2014] [:error] [pid 11251] [client 10.5.1.1:49680] FastCGI: incomplete headers (0 bytes) received from server /var/www/cgi-bin/s3gw.fcgi

When I try Swift credentials, I cannot log in at all. I have tested both Cyberduck and the Swift client on the command line, and I always get this in the logs:

GET /v1.0 HTTP/1.1 404 78 - Cyberduck/4.5 (Mac OS X/10.9.3) (x86_64)
GET /v1.0 HTTP/1.1 404 78 - python-swiftclient-2.2.0

With the S3 login, when I upload a file I can see it reach almost 100% complete, but then it fails with the above errors. A strange thing: /var/log/ceph/client.radosgw.gateway.log is not getting updated; I don't see any new entries there.

Thank you once again for your help,
Marco Garcês

*Marco Garcês*
*#sysadmin*
Maputo - Mozambique
*[Phone]* +258 84 4105579
*[Skype]* marcogarces

On Mon, Aug 18, 2014 at 12:08 AM, Linux Chips linux.ch...@gmail.com wrote: On Mon 18 Aug 2014 12:45:33 AM AST, Bachelder, Kurt wrote: Hi Marco – In CentOS 6, you also had to edit /etc/httpd/conf.d/fastcgi.conf to turn OFF the fastcgi wrapper. I haven't tested in v7 yet, but I'd guess it's required there too:

# wrap all fastcgi script calls in suexec
FastCgiWrapper Off

Give that a try, if you haven't already – restart httpd and ceph-radosgw afterward.
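A short checklist, consistent with the advice above, for narrowing down where the FastCGI chain breaks (the paths are the ones quoted in this thread; the service names assume CentOS 7 with systemd):

```shell
# 1. The wrapper must be off when using an external FastCGI server
grep -i FastCgiWrapper /etc/httpd/conf.d/fastcgi.conf

# 2. The socket Apache points at must exist and be readable by httpd
ls -l /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

# 3. If client.radosgw.gateway.log never updates, requests are dying in
#    Apache before reaching radosgw -- check the Apache error log instead
tail -n 20 /var/log/httpd/error_rgw_ssl.log

# 4. Restart both sides after any change (service names may differ)
systemctl restart httpd
systemctl restart ceph-radosgw
```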
Kurt

*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of* Marco Garcês
*Sent:* Friday, August 15, 2014 12:46 PM
*To:* ceph-users@lists.ceph.com
*Subject:* [ceph-users] RadosGW problems

Hi there, I am using CentOS 7 with Ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), 3 OSDs, 3 MONs, 1 RadosGW (which also serves as the ceph-deploy node). I followed all the instructions in the docs on setting up a basic Ceph cluster, and then the ones for setting up RadosGW. I can't seem to use the Swift interface, and the S3 interface times out after 30 seconds:

[Fri Aug 15 18:25:33.290877 2014] [:error] [pid 6197] [client 10.5.5.222:58051] FastCGI: comm with server /var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)
[Fri Aug 15 18:25:33.291781 2014] [:error] [pid 6197] [client 10.5.5.222:58051] FastCGI: incomplete headers (0 bytes) received from server /var/www/cgi-bin/s3gw.fcgi

*My ceph.conf:*

[global]
fsid = 581bcd61-8760-4756-a7c8-e8275c0957ad
mon_initial_members = CEPH01, CEPH02, CEPH03
mon_host = 10.2.27.81,10.2.27.82,10.2.27.83
public network = 10.2.27.0/25
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 2
osd pool default pg num = 333
osd pool default pgp num = 333
osd journal size = 1024

[client.radosgw.gateway]
host = GATEWAY
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw print continue = false
rgw enable ops log = true

*My apache rgw.conf:*

FastCgiExternalServer /var/www/cgi-bin/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile /etc/pki/tls/certs/ca_rgw.crt
    SSLCertificateKeyFile /etc/pki/tls/private/ca_rgw.key
    SetEnv SERVER_PORT_SECURE 443
    ServerName gateway.testes.local
    ServerAlias *.gateway.testes.local
    ServerAdmin marco.gar...@testes.co.mz
    DocumentRoot /var/www/cgi-bin
    RewriteEngine On
    #RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    <IfModule mod_fastcgi.c>
        <Directory /var/www>
            Options +ExecCGI
            AllowOverride All
            SetHandler fastcgi-script
            Order allow,deny
            Allow from all
            AuthBasicAuthoritative Off
        </Directory>
    </IfModule>
    AllowEncodedSlashes On
    ErrorLog /var/log/httpd/error_rgw_ssl.log
    CustomLog /var/log/httpd/access_rgw_ssl.log combined
    ServerSignature Off
</VirtualHost>

*My /var/www/cgi-bin/s3gw.fcgi*

#!/bin/sh
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n
Re: [ceph-users] ceph cluster inconsistency?
On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote: Hi, I tried this after restarting the osd, but I guess that was not the aim:

# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_ | grep 6adb1100 -A 100
IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource temporarily unavailable
tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const string)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780
tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))

When I run it after bringing the osd down, it takes a while, but it has no output. (When running it without the grep, I'm getting a huge list.)

Oh, sorry for that! I made a mistake: the hash value (6adb1100) is reversed in leveldb, so grepping for benchmark_data_ceph001.cubone.os_5560_object789734 should help.

Or should I run this immediately after the osd crashes (because it may have been rebalanced? I already restarted the cluster)? I don't know if it is related, but before I could do all that I had to fix something else: a monitor had run out of disk space, using 8 GB for its store.db folder (lots of sst files). Other monitors are also near that level. I never had that problem on previous setups. I recreated the monitor and now it uses 3.8 GB.

There is some duplicate data that needs to be compacted. Another idea: maybe you can make KeyValueStore's stripe size align with the EC stripe size. I haven't thought it through deeply; maybe I will try it later.

Thanks!
Kenneth

- Message from Sage Weil sw...@redhat.com -
Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT)
From: Sage Weil sw...@redhat.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Haomai Wang haomaiw...@gmail.com
Cc: Kenneth Waegeman kenneth.waege...@ugent.be, ceph-users@lists.ceph.com

On Fri, 15 Aug 2014, Haomai Wang wrote: Hi Kenneth, I don't find valuable info in your logs; they lack the necessary debug output from where the crash code is reached.
But I scanned the encode/decode implementation in GenericObjectMap and found something bad. For example, two oids have the same hash and their names are:

A: rb.data.123
B: rb-123

At the ghobject_t compare level, A > B. But GenericObjectMap encodes "." to "%e", so the keys in the DB are:

A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head
B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head

and there A < B. It seems that the escape function is useless and should be disabled.

I'm not sure whether Kenneth's problem touches this bug, because this scenario only occurs when the object set is very large, making two objects share the same hash value. Kenneth, could you find time to run `ceph-kvstore-tool [path-to-osd] list _GHOBJTOSEQ_ | grep 6adb1100 -A 100`? ceph-kvstore-tool is a debug tool which can be compiled from source: clone the ceph repo and run ./autogen.sh; ./configure; cd src; make ceph-kvstore-tool. path-to-osd should be /var/lib/ceph/osd-[id]/current/. 6adb1100 is from your verbose log, and the next 100 rows should show the necessary info. You can also get ceph-kvstore-tool from the 'ceph-tests' package.

Hi Sage, do you think we need to provide an upgrade function to fix it?

Hmm, we might. This only affects the key/value encoding, right? The FileStore is using its own function to map these to file names? Can you open a ticket in the tracker for this? Thanks! sage

On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:

- Message from Haomai Wang haomaiw...@gmail.com -
Date: Thu, 14 Aug 2014 19:11:55 +0800
From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Kenneth Waegeman kenneth.waege...@ugent.be

Could you add the config debug_keyvaluestore = 20/20 to the crashed osd and replay the command causing the crash? I would like to get more debug info! Thanks.

I included the log as an attachment! Thanks!
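The inversion Haomai describes can be reproduced with plain byte-wise sorting: in the raw object names '.' (0x2e) sorts after '-' (0x2d), while in the escaped keys the '%' of '%e' (0x25) sorts before '-'. A quick illustration using the two names from the mail:

```shell
# ghobject-level order on the raw names: rb-123 comes first
printf 'rb.data.123\nrb-123\n' | LC_ALL=C sort

# leveldb order on the escaped keys: rb%edata%e123 now comes first,
# so the two orderings disagree -- the bug described above
printf 'rb%%edata%%e123\nrb-123\n' | LC_ALL=C sort
```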
On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote: I have:

osd_objectstore = keyvaluestore-dev

in the global section of my ceph.conf

[root@ceph002 ~]# ceph osd erasure-code-profile get profile11
directory=/usr/lib64/ceph/erasure-code
k=8
m=3
plugin=jerasure
ruleset-failure-domain=osd
technique=reed_sol_van

the ecdata pool has this as profile:

pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 161 flags hashpspool stripe_width 4096

EC rule in crushmap:

rule ecdata {
    ruleset 2
    type erasure
    min_size 3
    max_size 20
    step set_chooseleaf_tries 5
    step take default-ec
    step choose indep 0 type osd
    step emit
}
root default-ec {
    id -8   # do not change unnecessarily
    # weight 140.616
    alg straw
    hash 0
Re: [ceph-users] ceph cluster inconsistency?
- Message from Haomai Wang haomaiw...@gmail.com -
Date: Mon, 18 Aug 2014 18:34:11 +0800
From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Kenneth Waegeman kenneth.waege...@ugent.be
Cc: Sage Weil sw...@redhat.com, ceph-users@lists.ceph.com

On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote: Hi, I tried this after restarting the osd, but I guess that was not the aim:

# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_ | grep 6adb1100 -A 100
IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource temporarily unavailable
tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const string)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780
tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))

When I run it after bringing the osd down, it takes a while, but it has no output. (When running it without the grep, I'm getting a huge list.)

Oh, sorry for that! I made a mistake: the hash value (6adb1100) is reversed in leveldb, so grepping for benchmark_data_ceph001.cubone.os_5560_object789734 should help.
this gives:

[root@ceph003 ~]# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_ | grep 5560_object789734 -A 100
_GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object789734!head
_GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1330170!head
_GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ceph001%ecubone%eos_4919_object227366!head
_GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1363631!head
_GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1573957!head
_GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1019282!head
_GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1283563!head
_GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ceph001%ecubone%eos_4919_object273736!head
_GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1170628!head
_GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ceph001%ecubone%eos_4919_object256335!head
_GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1484196!head
_GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ceph001%ecubone%eos_5560_object884178!head
_GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object853746!head
_GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object36633!head
_GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1235337!head
_GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1661351!head
_GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object238126!head
_GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object339943!head
_GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1047094!head
_GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ceph001%ecubone%eos_31461_object520642!head
_GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object639565!head
_GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ceph001%ecubone%eos_4919_object231080!head
_GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object858050!head
_GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object241796!head
_GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ceph001%ecubone%eos_4919_object7462!head
_GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ceph001%ecubone%eos_5560_object243798!head
_GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ceph001%ecubone%eos_8961_object109512!head
_GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ceph001%ecubone%eos_31461_object653973!head
_GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1378169!head
_GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ceph001%ecubone%eos_5560_object512925!head
_GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ceph001%ecubone%eos_4919_object23289!head
_GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1108852!head
_GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ceph001%ecubone%eos_5560_object704026!head
_GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ceph001%ecubone%eos_8961_object250441!head
_GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ceph001%ecubone%eos_31461_object706178!head
_GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ceph001%ecubone%eos_5560_object316952!head
_GHOBJTOSEQ_:3%e0s0_head!0012447D!!3!!benchmark_data_ceph001%ecubone%eos_5560_object538734!head
_GHOBJTOSEQ_:3%e0s0_head!001244D9!!3!!benchmark_data_ceph001%ecubone%eos_31461_object789215!head
Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
Hi Craig,

I brought the cluster into a stable condition. All slow OSDs are no longer in the cluster. All remaining 36 OSDs write at more than 100 MB/sec (dd if=/dev/zero of=testfile-2.txt bs=1024 count=4096000). No Ceph client is connected to the cluster and the Ceph nodes are idle. The state now looks as follows:

root@ceph-admin-storage:~# ceph -s
    cluster 6b481875-8be5-4508-b075-e1f660fd7b33
     health HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean
     monmap e2: 3 mons at {ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0}, election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
     osdmap e36830: 36 osds: 36 up, 36 in
      pgmap v10907190: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
            22051 GB used, 68206 GB / 90258 GB avail
                6140 active+clean
                   3 down+incomplete
                   1 active+clean+replay

root@ceph-admin-storage:~# ceph health detail
HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean
pg 2.c1 is stuck inactive since forever, current state down+incomplete, last acting [13,8]
pg 2.e3 is stuck inactive since forever, current state down+incomplete, last acting [20,8]
pg 2.587 is stuck inactive since forever, current state down+incomplete, last acting [13,8]
pg 2.c1 is stuck unclean since forever, current state down+incomplete, last acting [13,8]
pg 2.e3 is stuck unclean since forever, current state down+incomplete, last acting [20,8]
pg 2.587 is stuck unclean since forever, current state down+incomplete, last acting [13,8]
pg 2.587 is down+incomplete, acting [13,8]
pg 2.e3 is down+incomplete, acting [20,8]
pg 2.c1 is down+incomplete, acting [13,8]

I have tried the following:

root@ceph-admin-storage:~# ceph pg scrub 2.587
instructing pg 2.587 on osd.13 to scrub
root@ceph-admin-storage:~# ceph pg scrub 2.e3
instructing pg 2.e3 on osd.20 to scrub
root@ceph-admin-storage:~# ceph pg scrub 2.c1
instructing pg 2.c1 on osd.13 to scrub
root@ceph-admin-storage:~# ceph pg deep-scrub 2.587
instructing pg 2.587 on osd.13 to deep-scrub
root@ceph-admin-storage:~# ceph pg deep-scrub 2.e3
instructing pg 2.e3 on osd.20 to deep-scrub
root@ceph-admin-storage:~# ceph pg deep-scrub 2.c1
instructing pg 2.c1 on osd.13 to deep-scrub
root@ceph-admin-storage:~# ceph pg repair 2.587
instructing pg 2.587 on osd.13 to repair
root@ceph-admin-storage:~# ceph pg repair 2.e3
instructing pg 2.e3 on osd.20 to repair
root@ceph-admin-storage:~# ceph pg repair 2.c1
instructing pg 2.c1 on osd.13 to repair

In the monitor logfiles (ceph-mon.ceph-1/2/3-storage.log) I see the pg scrub, pg deep-scrub and pg repair commands, but I do not see anything in ceph.log and nothing in ceph-osd.13/20/8.log:

2014-08-18 13:24:49.337954 7f24ac111700 0 mon.ceph-1-storage@0(leader) e2 handle_command mon_command({prefix: pg repair, pgid: 2.587} v 0) v1

Is it possible to repair the ceph cluster?

root@ceph-admin-storage:~# ceph pg force_create_pg 2.587
pg 2.587 now creating, ok

But nothing happens; the pg will not be created.
root@ceph-admin-storage:~# ceph -s
    cluster 6b481875-8be5-4508-b075-e1f660fd7b33
     health HEALTH_WARN 2 pgs down; 2 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean
     monmap e2: 3 mons at {ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0}, election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
     osdmap e36830: 36 osds: 36 up, 36 in
      pgmap v10907191: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
            22051 GB used, 68206 GB / 90258 GB avail
                   1 creating
                6140 active+clean
                   2 down+incomplete
                   1 active+clean+replay

root@ceph-admin-storage:~# ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean
pg 2.c1 is stuck inactive since forever, current state down+incomplete, last acting [13,8]
pg 2.e3 is stuck inactive since forever, current state down+incomplete, last acting [20,8]
pg 2.587 is stuck inactive since forever, current state creating, last acting []
pg 2.c1 is stuck unclean since forever, current state down+incomplete, last acting [13,8]
pg 2.e3 is stuck unclean since forever, current state down+incomplete, last acting [20,8]
pg 2.587 is stuck unclean since forever, current state creating, last acting []
pg 2.e3 is down+incomplete, acting [20,8]
pg 2.c1 is down+incomplete, acting [13,8]

What can I do to get rid of the incomplete or creating pgs?

Regards,
Mike

From: Craig Lewis [cle...@centraldesktop.com]
Sent: Thursday, 14 August 2014 19:56
To: Riederer, Michael
Cc: Karan Singh; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

It sound
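Scrub and repair are silently ignored for a PG that is down+incomplete because the PG never goes active, which matches the empty OSD logs above. A hedged sketch of the usual next diagnostic steps (pg 2.587 is from the thread; the OSD id in the last command is purely hypothetical):

```shell
# Ask the acting primary why peering is stuck; in the output, look at
# "recovery_state" and "down_osds_we_would_probe"
ceph pg 2.587 query

# If the PG is waiting on an OSD that has been removed for good, marking
# that OSD lost lets peering proceed -- data on it is given up, so use
# with care (osd id 99 here is a placeholder, not from the thread)
ceph osd lost 99 --yes-i-really-mean-it
```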
Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
Something has changed in the cluster compared to my first mail: the cluster was able to repair one pg, but now a different pg is in status active+clean+replay.

root@ceph-admin-storage:~# ceph pg dump | grep ^2.92
dumped all in format plain
2.92   0 0 0 0 0 0 0  active+clean  2014-08-18 10:37:20.962858  0'0  36830:577  [8,13]  8  [8,13]  8  0'0  2014-08-18 10:37:20.962728  13503'1390419  2014-08-14 10:37:12.497492

root@ceph-admin-storage:~# ceph pg dump | grep replay
dumped all in format plain
0.49a  0 0 0 0 0 0 0  active+clean+replay  2014-08-18 13:09:15.317221  0'0  36830:1704  [12,10]  12  [12,10]  12  0'0  2014-08-18 13:09:15.317131  0'0  2014-08-18 13:09:15.317131

Mike

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Riederer, Michael [michael.riede...@br.de]
Sent: Monday, 18 August 2014 13:40
To: Craig Lewis
Cc: ceph-users@lists.ceph.com; Karan Singh
Subject: Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

Hi Craig,

I brought the cluster into a stable condition. All slow OSDs are no longer in the cluster. All remaining 36 OSDs write at more than 100 MB/sec (dd if=/dev/zero of=testfile-2.txt bs=1024 count=4096000). No Ceph client is connected to the cluster and the Ceph nodes are idle.
Now sees the state as follows: root@ceph-admin-storage:~# ceph -s cluster 6b481875-8be5-4508-b075-e1f660fd7b33 health HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean monmap e2: 3 mons at {ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0}, election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage osdmap e36830: 36 osds: 36 up, 36 in pgmap v10907190: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects 22051 GB used, 68206 GB / 90258 GB avail 6140 active+clean 3 down+incomplete 1 active+clean+replay root@ceph-admin-storage:~# ceph health detail HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean pg 2.c1 is stuck inactive since forever, current state down+incomplete, last acting [13,8] pg 2.e3 is stuck inactive since forever, current state down+incomplete, last acting [20,8] pg 2.587 is stuck inactive since forever, current state down+incomplete, last acting [13,8] pg 2.c1 is stuck unclean since forever, current state down+incomplete, last acting [13,8] pg 2.e3 is stuck unclean since forever, current state down+incomplete, last acting [20,8] pg 2.587 is stuck unclean since forever, current state down+incomplete, last acting [13,8] pg 2.587 is down+incomplete, acting [13,8] pg 2.e3 is down+incomplete, acting [20,8] pg 2.c1 is down+incomplete, acting [13,8] I have tried the following: root@ceph-admin-storage:~# ceph pg scrub 2.587 instructing pg 2.587 on osd.13 to scrub root@ceph-admin-storage:~# ceph pg scrub 2.e3 ^[[Ainstructing pg 2.e3 on osd.20 to scrub root@ceph-admin-storage:~# ceph pg scrub 2.c1 instructing pg 2.c1 on osd.13 to scrub root@ceph-admin-storage:~# ceph pg deep-scrub 2.587 instructing pg 2.587 on osd.13 to deep-scrub root@ceph-admin-storage:~# ceph pg deep-scrub 2.e3 instructing pg 2.e3 on osd.20 to deep-scrub root@ceph-admin-storage:~# ceph pg deep-scrub 2.c1 instructing pg 2.c1 on osd.13 to deep-scrub 
root@ceph-admin-storage:~# ceph pg repair 2.587
instructing pg 2.587 on osd.13 to repair
root@ceph-admin-storage:~# ceph pg repair 2.e3
instructing pg 2.e3 on osd.20 to repair
root@ceph-admin-storage:~# ceph pg repair 2.c1
instructing pg 2.c1 on osd.13 to repair

In the monitor logfiles (ceph-mon.ceph-1/2/3-storage.log) I see the pg scrub, pg deep-scrub and pg repair commands, but I do not see anything in ceph.log and nothing in ceph-osd.13/20/8.log:

2014-08-18 13:24:49.337954 7f24ac111700 0 mon.ceph-1-storage@0(leader) e2 handle_command mon_command({prefix: pg repair, pgid: 2.587} v 0) v1

Is it possible to repair the ceph cluster?

root@ceph-admin-storage:~# ceph pg force_create_pg 2.587
pg 2.587 now creating, ok

But nothing happens; the pg will not be created.

root@ceph-admin-storage:~# ceph -s
    cluster 6b481875-8be5-4508-b075-e1f660fd7b33
     health HEALTH_WARN 2 pgs down; 2 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean
     monmap e2: 3 mons at {ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0}, election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
     osdmap e36830: 36 osds: 36 up, 36 in
      pgmap v10907191: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
            22051 GB used, 68206 GB / 90258 GB avail
                   1 creating
                6140 active+clean
                   2 down+incomplete
                   1 active+clean+replay
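For pgs stuck in down+incomplete, the usual next diagnostic step is pg query; a hedged sketch using the pg and osd ids from this thread (note the second command discards whatever data is on that OSD and is only appropriate when the OSD is permanently gone):

```
# Inspect why the pg is incomplete; look at recovery_state -> probing_osds
# to see which OSDs peering is still waiting for:
ceph pg 2.587 query

# If a probed OSD is unrecoverable, telling the cluster it is lost lets
# peering (and force_create_pg) proceed. THIS DISCARDS DATA on that OSD:
ceph osd lost 13 --yes-i-really-mean-it
```

force_create_pg typically stays in "creating" until every OSD the pg is probing is either back online or declared lost.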
[ceph-users] [radosgw-admin] bilog list confusion
Hi,

Is there any configuration option in ceph.conf for enabling/disabling the bilog list? I mean the result of this command: radosgw-admin bilog list

One ceph cluster gives me results - a list of operations which were made to the bucket - and the other one gives me just an empty list. I can't see the reason, and I can't find it anywhere in the ceph.conf reference: http://ceph.com/docs/master/rados/configuration/ceph-conf/

My guess is it's in the region info, but when I changed these values to false for the cluster with the working bilog, the bilog still showed entries.

1. cluster with empty bilog list:
zones: [ { name: default, endpoints: [], log_meta: false, log_data: false}],

2. cluster with *proper* bilog list:
zones: [ { name: master-1, endpoints: [ http:\/\/[...]], log_meta: true, log_data: true}],

Here are the pools on both of the clusters:

1. cluster with *proper* bilog list:
rbd .rgw.root .rgw.control .rgw .rgw.gc .users.uid .users.email .users .rgw.buckets .rgw.buckets.index .log ''

2. cluster with empty bilog list:
data metadata rbd .rgw.root .rgw.control .rgw .rgw.gc .users.uid .users.email .users '' .rgw.buckets.index .rgw.buckets .log

And here is the zone info (just the placement_pools, the rest of the config is the same):

1. cluster with *proper* bilog list:
placement_pools: []

2. cluster with *empty* bilog list:
placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: }}]}

Any thoughts? I've tried to figure it out by myself, but no luck.

Thanks,
Patrycja Szabłowska

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
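For what it's worth, a hedged sketch of inspecting and changing the logging flags on a firefly-era gateway (the file name region.json is a placeholder):

```
# Show the region/zone configuration the gateway actually uses;
# bilog entries are only written when log_data is true for the zone
# entry in the region map:
radosgw-admin region get > region.json
radosgw-admin zone get

# After editing log_meta/log_data in region.json, push it back and
# rebuild the region map:
radosgw-admin region set < region.json
radosgw-admin regionmap update
```

The gateway may need a restart to pick up the changed region map.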
Re: [ceph-users] RadosGW problems
Hi Marco,

Is your DNS set up to use the wildcard (*.gateway.testes.local)? I noticed that you're using it in the server alias, but that you don't have an rgw_dns_name configured in your ceph.conf. The rgw_dns_name should be set to gateway.testes.local if your DNS is configured to use wildcard naming with that subdomain. I also see that you're using SSL... which domain have you signed? *.gateway.testes.local? Since you can create a bucket, but not write to it, I'm wondering if there's an issue with the way your client is attempting to access the bucket... can you resolve bucket.gateway.testes.local from your client?

Kurt

Original message From: Marco Garcês Date: 08/18/2014 6:33 AM (GMT-05:00) To: Linux Chips Cc: Bachelder, Kurt, ceph-users@lists.ceph.com Subject: Re: [ceph-users] RadosGW problems

Hi there,

I have FastCgiWrapper Off in the fastcgi.conf file; I also have SELinux in permissive state; 'ps aux | grep rados' shows me radosgw is running. The problem stays the same... I can log in with S3 credentials and create buckets, but uploads write this in the logs:

[Mon Aug 18 12:00:28.636378 2014] [:error] [pid 11251] [client 10.5.1.1:49680] FastCGI: comm with server /var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)
[Mon Aug 18 12:00:28.676825 2014] [:error] [pid 11251] [client 10.5.1.1:49680] FastCGI: incomplete headers (0 bytes) received from server /var/www/cgi-bin/s3gw.fcgi

When I try Swift credentials, I cannot log in at all. I have tested both Cyberduck and the Swift client on the command line, and I always get this in the logs:

GET /v1.0 HTTP/1.1 404 78 - Cyberduck/4.5 (Mac OS X/10.9.3) (x86_64)
GET /v1.0 HTTP/1.1 404 78 - python-swiftclient-2.2.0

With the S3 login, when I upload a file, I can see it reach almost 100% complete, but then it fails with the above errors. A strange thing is... /var/log/ceph/client.radosgw.gateway.log is not getting updated, I don't see any new logs in there.
Thank you once again for your help,
Marco Garcês

Marco Garcês
#sysadmin
Maputo - Mozambique
[Phone] +258 84 4105579
[Skype] marcogarces

On Mon, Aug 18, 2014 at 12:08 AM, Linux Chips linux.ch...@gmail.com wrote:
On Mon 18 Aug 2014 12:45:33 AM AST, Bachelder, Kurt wrote:
Hi Marco – In CentOS 6, you also had to edit /etc/httpd/conf.d/fastcgi.conf to turn OFF the fastcgi wrapper. I haven’t tested in v7 yet, but I’d guess it’s required there too:

# wrap all fastcgi script calls in suexec
FastCgiWrapper Off

Give that a try, if you haven’t already – restart httpd and ceph-radosgw afterward.

Kurt

*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of* Marco Garcês
*Sent:* Friday, August 15, 2014 12:46 PM
*To:* ceph-users@lists.ceph.com
*Subject:* [ceph-users] RadosGW problems

Hi there,

I am using CentOS 7 with Ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), 3 OSD, 3 MON, 1 RadosGW (which also serves as the ceph-deploy node). I followed all the instructions in the docs regarding setting up a basic Ceph cluster, and then followed the one to set up RadosGW. I can't seem to use the Swift interface, and the S3 interface times out after 30 seconds.
[Fri Aug 15 18:25:33.290877 2014] [:error] [pid 6197] [client 10.5.5.222:58051] FastCGI: comm with server /var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)
[Fri Aug 15 18:25:33.291781 2014] [:error] [pid 6197] [client 10.5.5.222:58051] FastCGI: incomplete headers (0 bytes) received from server /var/www/cgi-bin/s3gw.fcgi

*My ceph.conf:*

[global]
fsid = 581bcd61-8760-4756-a7c8-e8275c0957ad
mon_initial_members = CEPH01, CEPH02, CEPH03
mon_host = 10.2.27.81,10.2.27.82,10.2.27.83
public network = 10.2.27.0/25
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 2
osd pool default pg num = 333
osd pool default pgp num = 333
osd journal size = 1024

[client.radosgw.gateway]
host = GATEWAY
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw print continue = false
rgw enable ops log = true

*My apache rgw.conf:*

FastCgiExternalServer /var/www/cgi-bin/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
VirtualHost *:443
    SSLEngine on
    SSLCertificateFile /etc/pki/tls/certs/ca_rgw.crt
    SSLCertificateKeyFile /etc/pki/tls/private/ca_rgw.key
    SetEnv SERVER_PORT_SECURE 443
    ServerName gateway.testes.local
    ServerAlias *.gateway.testes.local
    ServerAdmin marco.gar...@testes.co.mz
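A minimal sketch of the rgw_dns_name change suggested above, assuming DNS serves the wildcard *.gateway.testes.local and resolves it to the gateway host:

```
[client.radosgw.gateway]
    ...
    rgw dns name = gateway.testes.local
```

A quick way to confirm the wildcard works from the client is to resolve an arbitrary bucket subdomain, e.g. `host bucket.gateway.testes.local` -- any bucket name should come back with the gateway's IP.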
Re: [ceph-users] RadosGW problems
Hi Kurt,

I have pointed my DNS '*.gateway.testes.local' and 'gateway.testes.local' to the same IP (the radosgw server). I have added rgw_dns_name as you suggested to the config (it was commented out). I will try everything and give feedback. By the way, when I restart the ceph-radosgw service, I get this in the logs (whereas previously I did not see anything):

2014-08-18 15:19:44.812039 7fbf417fa700 1 handle_sigterm
2014-08-18 15:19:44.812104 7fbf417fa700 1 handle_sigterm set alarm for 120
2014-08-18 15:19:44.812235 7fbf5c495880 -1 shutting down
2014-08-18 15:19:44.812305 7fbf40ff9700 0 ERROR: FCGX_Accept_r returned -4
2014-08-18 15:19:44.812432 7fbf417fa700 1 handle_sigterm
2014-08-18 15:19:44.857506 7fbf5c495880 1 final shutdown
2014-08-18 15:19:45.010597 7fb318b96880 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 3242
2014-08-18 15:19:45.219582 7fb318b96880 0 framework: fastcgi
2014-08-18 15:19:45.219599 7fb318b96880 0 starting handler: fastcgi
2014-08-18 15:19:45.692248 7fb2fe6fb700 0 ERROR: can't read user header: ret=-2
2014-08-18 15:19:45.692273 7fb2fe6fb700 0 ERROR: sync_user() failed, user=teste ret=-2

The last 2 lines look suspicious...

*Marco Garcês*
*#sysadmin*
Maputo - Mozambique
*[Phone]* +258 84 4105579
*[Skype]* marcogarces

On Mon, Aug 18, 2014 at 2:58 PM, Bachelder, Kurt kurt.bachel...@sierra-cedar.com wrote:
Hi Marco, Is your DNS set up to use the wildcard (*.gateway.testes.local)? I noticed that you're using it in the server alias, but that you don't have an rgw_dns_name configured in your ceph.conf. The rgw_dns_name should be set to gateway.testes.local if your DNS is configured to use wildcard naming with that subdomain. Also, I see that you're using SSL... which domain have you signed? *.gateway.testes.local? Since you can create a bucket, but not write to it, I'm wondering if there's an issue with the way your client is attempting to access the bucket...
can you resolve bucket.gateway.testes.local from your client?

Kurt

[snip quoted message]
[ceph-users] mds isn't working anymore after osd's running full
Hi all,

We have a small ceph cluster running version 0.80.1 with cephfs on five nodes. Last week some osd's were full and shut themselves down. To help the osd's start again I added some extra osd's and moved some placement group directories on the full osd's (which have a copy on another osd) to another place on the node (as mentioned in http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/).

After clearing some space on the full osd's I started them again. After a lot of deep scrubbing and two pg inconsistencies which needed to be repaired, everything looked fine except the mds, which is still in the replay state and stays that way. The log below says that the mds needs osdmap epoch 1833 but has 1832.

2014-08-18 12:29:22.268248 7fa786182700 1 mds.-1.0 handle_mds_map standby
2014-08-18 12:29:22.273995 7fa786182700 1 mds.0.25 handle_mds_map i am now mds.0.25
2014-08-18 12:29:22.273998 7fa786182700 1 mds.0.25 handle_mds_map state change up:standby -- up:replay
2014-08-18 12:29:22.274000 7fa786182700 1 mds.0.25 replay_start
2014-08-18 12:29:22.274014 7fa786182700 1 mds.0.25 recovery set is
2014-08-18 12:29:22.274016 7fa786182700 1 mds.0.25 need osdmap epoch 1833, have 1832
2014-08-18 12:29:22.274017 7fa786182700 1 mds.0.25 waiting for osdmap 1833 (which blacklists prior instance)

# ceph status
    cluster c78209f5-55ea-4c70-8968-2231d2b05560
     health HEALTH_WARN mds cluster is degraded
     monmap e3: 3 mons at {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0}, election epoch 362, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
     mdsmap e154: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
     osdmap e1951: 12 osds: 12 up, 12 in
      pgmap v193685: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
            124 GB used, 175 GB / 299 GB avail
                 492 active+clean

# ceph osd tree
# id    weight  type name       up/down reweight
-1      0.2399  root default
-2      0.05997         host th1-osd001
0       0.01999                 osd.0   up      1
1       0.01999                 osd.1   up      1
2       0.01999                 osd.2   up      1
-3      0.05997         host th1-osd002
3       0.01999                 osd.3   up      1
4       0.01999                 osd.4   up      1
5       0.01999                 osd.5   up      1
-4      0.05997         host th1-mon003
6       0.01999                 osd.6   up      1
7       0.01999                 osd.7   up      1
8       0.01999                 osd.8   up      1
-5      0.05997         host th1-mon002
9       0.01999                 osd.9   up      1
10      0.01999                 osd.10  up      1
11      0.01999                 osd.11  up      1

What is the way to get the mds up and running again? I still have all the placement group directories which I moved from the full osds (which were down) to create disk space.

Kind regards,

Jasper Siero
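The mds is blocked waiting for an osdmap epoch that blacklists its prior instance. A hedged sketch of things to try (the address below is a documentation placeholder; any harmless change that cuts a new osdmap epoch should do):

```
# Check how far the mds actually is:
ceph mds dump

# Restarting/failing the stuck mds lets the standby attempt replay:
ceph mds fail 0

# Forcing a new osdmap epoch, e.g. with a short-lived blacklist entry
# for an unused address, hands the mds the newer map it is waiting for:
ceph osd blacklist add 192.0.2.1:0/1 60
```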
[ceph-users] Cache tiering and CRUSH map
Hi,

I am trying to use cache tiering and read the topic about mapping OSDs to pools (http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds). I can't understand why the OSDs were split into spinner and SSD types at the root level of the CRUSH map. Is it possible to use some location type under the host level to group OSDs by type, and then use it in mapping rules?

--
Michael Kolomiets
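For context: the docs split at the root because a CRUSH rule starts with `step take <bucket>` and can only draw OSDs from that subtree; the taken bucket does not have to be a root, it just has to contain only the device type you want. A sketch under assumed names (node1-ssd, ssd, and ruleset 4 are illustrative, not from the post):

```
# One way to group SSDs without a separate top-level device split:
# give each host an ssd child bucket and aggregate those under a
# bucket that the SSD rule takes.
host node1-ssd {
        id -21
        alg straw
        hash 0
        item osd.30 weight 1.000
}
root ssd {
        id -20
        alg straw
        hash 0
        item node1-ssd weight 1.000
}
rule ssd-pool {
        ruleset 4
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}
```

A cache pool would then be pointed at this rule with `ceph osd pool set <pool> crush_ruleset 4`.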
[ceph-users] Ceph Days are back with a vengeance!
Greetings cephalofolk,

Now that the Ceph Day events are becoming much more of a community undertaking (as opposed to an Inktank-hosted event), we are really ramping things up. There are currently four events planned in the near future, and we need speakers for all of them!

http://ceph.com/cephdays/

If you are interested in speaking at any of these events just send me the following:

1) Title
2) Abstract (brief outline of your ceph-related talk)
3) Speaker Name/title
4) Organization/Affiliation (or just Ceph Community if you are speaking on your own)
5) Event at which you wish to speak

Currently we have openings at the following events:

* 18 SEP 2014 -- Paris, France :: Le Comptoir General Ghetto Museum
* 24 SEP 2014 -- San Jose, CA USA :: Brocade Communication Systems HQ
* 08 OCT 2014 -- New York, NY USA :: Humphrey at the Eventi Hotel
* 22 OCT 2014 -- London, UK :: etc. Venues St Paul's

We obviously love any talks that are Ceph-related, but we're especially interested in some of the following topics:

* CephFS
* Performance Tuning
* Integrations work
* Crazy experiments
* Large scale deployment/management use cases
* Using embedded object classes

Please let me know if you have questions or concerns before submitting, but hurry as spots will fill up quickly! Thanks.

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph
Re: [ceph-users] Ceph-Deploy Install Error
Do you have the full paste of the ceph-deploy output? Tracing the URL, we definitely do not have google-perftools packages for Wheezy; the full output might help in understanding what is going on.

On Mon, Aug 11, 2014 at 8:01 PM, joshua Kay scjo...@gmail.com wrote:
Hi,

When I attempt to use the ceph-deploy install command on one of my nodes I get this error:

[ceph1][WARNIN] W: Failed to fetch http://ceph.com/packages/google-perftools/debian/dists/wheezy/main/binary-armhf/Packages 404 Not Found [IP: 208.113.241.137 80]
[ceph1][WARNIN]
[ceph1][WARNIN] E: Some index files failed to download. They have been ignored, or old ones used instead.
[ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 100
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: apt-get -q update

Does anyone know the cause of this problem and the solution?

Thanks,
Josh
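If only the extra google-perftools repository that ceph-deploy adds is at fault, one workaround is to manage the apt source yourself and tell ceph-deploy not to touch the repos. This is a sketch under two assumptions: that your ceph-deploy version supports --no-adjust-repos, and that the ceph release name below (firefly) matches what you want:

```
# Add the ceph repo manually on the target node:
echo deb http://ceph.com/debian-firefly/ wheezy main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update

# Then install without letting ceph-deploy rewrite the sources:
ceph-deploy install --no-adjust-repos ceph1
```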
Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)
On 08/14/2014 02:35 AM, Christian Balzer wrote:

The default (firefly, but previous ones are functionally identical) crush map has:
---
# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
---
The type host states that there will be no more than one replica per host (node), so with size=3 you will need at least 3 hosts to choose from. If you were to change this to type osd, all 3 replicas could wind up on the same host, not really a good idea.

Ah, this is a great clue. (On my cluster, the default rule contains 'step choose firstn 0 type osd', and thus has the problem you hint at here.)

So I played with a new rule set with the buckets 'root', 'rack', 'host', 'bank' and 'osd', of which 'rack' and 'host' are unused. The 'bank' bucket: the OSD nodes each contain two 'banks' of disks with a separate disk controller channel, a separate power supply cable, and a separate SSD. Thus, 'bank' actually does represent a real failure domain. More importantly, this provides a bucket level above 'osd' that is big enough for 3-4 replicas.
Here's the rule:

rule by_bank {
        ruleset 3
        type replicated
        min_size 3
        max_size 4
        step take default
        step choose firstn 0 type bank
        step choose firstn 0 type osd
        step emit
}

If the OP (sorry, Craig, you do have a name ;) wants to play with CRUSH map rules, here's the quick and dirty of what I did:

# get the current 'orig' CRUSH map, decompile and edit; see:
# http://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
ceph osd getcrushmap -o /tmp/crush-orig.bin
crushtool -d /tmp/crush-orig.bin -o /tmp/crush.txt
$EDITOR /tmp/crush.txt

# edit the crush map with your fave editor; see:
# http://ceph.com/docs/master/rados/operations/crush-map
#
# in my case, I added the bank type:
type 0 osd
type 1 bank
type 2 host
type 3 rack
type 4 root

# the banks (repeat as applicable):
bank bank0 {
        id -6
        alg straw
        hash 0
        item osd.0 weight 1.000
        item osd.1 weight 1.000
}
bank bank1 {
        id -7
        alg straw
        hash 0
        item osd.2 weight 1.000
        item osd.3 weight 1.000
}

# updated the hosts (repeat as applicable):
host host0 {
        id -4           # do not change unnecessarily
        # weight 3.000
        alg straw
        hash 0          # rjenkins1
        item bank0 weight 2.000
        item bank1 weight 2.000
}

# and added the rule:
rule by_bank {
        ruleset 3
        type replicated
        min_size 3
        max_size 4
        step take default
        step choose firstn 0 type bank
        step choose firstn 0 type osd
        step emit
}

# compile the crush map:
crushtool -c /tmp/crush.txt -o /tmp/crush-new.bin

# and run some tests; the replica sizes tested come from
# 'min_size' and 'max_size' in the above rule; see:
# http://ceph.com/docs/master/man/8/crushtool/#running-tests-with-test
#
# show sample PG-OSD maps:
crushtool -i /tmp/crush-new.bin --test --show-statistics

# show bad mappings; if the CRUSH map is correct,
# this should be empty:
crushtool -i /tmp/crush-new.bin --test --show-bad-mappings

# show per-OSD pg utilization:
crushtool -i /tmp/crush-new.bin --test --show-utilization

You might finackle something like that (again the rule splits on hosts) by having multiple hosts on one
physical machine, but therein lies madness.

Well, the bucket names can be changed, as above, and Sage hints at doing something like this here: http://wiki.ceph.com/Planning/Blueprints/Dumpling/extend_crush_rule_language

(And IIUC he also proposes something to implement my original intentions: distribute four replicas, two on each of two racks, and don't put two replicas on the same host within a rack; this is more easily generalized than the above funky configuration.)

John
Re: [ceph-users] ceph-deploy error
Oh yes, we don't have ARM packages for wheezy.

On Mon, Aug 11, 2014 at 7:12 PM, joshua Kay scjo...@gmail.com wrote:
Hi,

I am running into an error when I am attempting to use ceph-deploy install when creating my cluster. I am attempting to run ceph on Debian 7.0 wheezy with an ARM processor. When I attempt to run ceph-deploy install I get the following errors:

[ceph1][WARNIN] E: Unable to locate package ceph
[ceph1][WARNIN] E: Unable to locate package ceph-mds
[ceph1][WARNIN] E: Unable to locate package ceph-common
[ceph1][WARNIN] E: Unable to locate package ceph-fs-common
[ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 100
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get -q -o Dpkg::Options::=--force-confnew --no-install-recommends --assume-yes install -- ceph ceph-mds ceph-common ceph-fs-common gdisk

I am assuming I do not have all the packages required for Debian wheezy, but I have tried to set up a repository and manually insert the packages from this documentation: http://ceph.com/docs/master/install/get-packages/

Does anyone know the issue or the proper way to set up a repository for a Debian wheezy system?

Thanks
Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
I take it that OSD 8, 13, and 20 are some of the stopped OSDs. I wasn't able to get ceph to execute ceph pg force_create until the OSDs in [recovery_state][probing_osds] from ceph pg query were online. I ended up reformatting most of them and re-adding them to the cluster.

What's wrong with those OSDs? How slow are they? If the problem is just that they're really slow, try starting them up, and manually marking them UP and OUT. That way Ceph will read from them, but not write to them. If they won't stay up, I'd replace them, and get the replacements back in the cluster. I'd leave the replacements UP and OUT. You can rebalance later, after the cluster is healthy again.

I've never seen the replay state, so I'm not sure what to do with that.

On Mon, Aug 18, 2014 at 5:05 AM, Riederer, Michael michael.riede...@br.de wrote:
[snip quoted message]
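The "UP and OUT" suggestion above maps onto the ceph CLI like this; a sketch using an osd id from the thread:

```
# Mark a slow-but-running OSD out without stopping the daemon: it keeps
# serving reads while its data migrates to the remaining OSDs.
ceph osd out 13

# Once the cluster is healthy again, let it take data back:
ceph osd in 13
```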
Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)
On 08/18/2014 12:13 PM, John Morris wrote:
[snip]
Here's the rule:

rule by_bank {
        ruleset 3
        type replicated
        min_size 3
        max_size 4
        step take default
        step choose firstn 0 type bank
        step choose firstn 0 type osd
        step emit
}

Ah, with the 'legacy' tunables, the 'chooseleaf' step in the above rule generates bad mappings.
But by injecting tunables into the map (recommended in the link below), the rule can be shortened to the following:

rule by_bank {
        ruleset 3
        type replicated
        min_size 3
        max_size 4
        step take default
        step chooseleaf firstn 0 type bank
        step emit
}

See this link: http://ceph.com/docs/master/rados/operations/crush-map/#tuning-crush-the-hard-way

Below, after re-compiling the new CRUSH map, but before running tests, inject the tunables into the binary map, and then run the tests on /tmp/crush-new-tuned.bin instead:

crushtool --enable-unsafe-tunables \
    --set-choose-local-tries 0 \
    --set-choose-local-fallback-tries 0 \
    --set-choose-total-tries 50 \
    -i /tmp/crush-new.bin -o /tmp/crush-new-tuned.bin

[snip quoted message]

John
Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)
On Mon, 18 Aug 2014, John Morris wrote:

rule by_bank {
	ruleset 3
	type replicated
	min_size 3
	max_size 4
	step take default
	step choose firstn 0 type bank
	step choose firstn 0 type osd
	step emit
}

You probably want:

	step choose firstn 0 type bank
	step choose firstn 1 type osd

I.e., 3 (or 4) banks, and 1 osd in each.. not 3 banks with 3 osds in each or 4 banks with 4 osds in each (for a total of 9 or 16 OSDs).

sage
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
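Sage's point can be sanity-checked with back-of-envelope arithmetic: `firstn 0` expands to the requested replica count at each step, so nesting two `firstn 0` steps multiplies the selections. A tiny sketch (the counts below are illustrative assumptions, not crushtool output):

```shell
numrep=4                          # replicas requested (pool size)

# 'choose firstn 0 type bank' then 'choose firstn 0 type osd':
# numrep banks are chosen, then numrep OSDs inside each of them.
both_zero=$((numrep * numrep))    # 16 candidate OSDs for 4 wanted slots

# 'choose firstn 0 type bank' then 'choose firstn 1 type osd':
# numrep banks, exactly one OSD per bank.
bank_then_one=$((numrep * 1))     # 4 OSDs, one per bank

echo "$both_zero $bank_then_one"
```

The same multiplication explains Sage's "9 or 16 OSDs" figure for num-rep 3 and 4.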
[ceph-users] Managing OSDs on twin machines
Hello guys,

I just acquired some brand new machines I would like to rely upon for a storage cluster (and some virtualization). These machines are, however, « twin servers », i.e. each blade (1U) comes with two different machines but a single PSU.

I think two replicas would be enough for the intended purpose, yet I cannot guarantee that all replicas of a given object are stored on two different blades. I basically have N blades; each blade has 2 distinct machines but a single PSU, and each machine has 2 hard drives.

Is it possible to configure mutual exclusion between the OSDs on which replicas of a single object are stored?

Regards
-- Pierre Jaury @ kaiyou http://kaiyou.fr/contact.html
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Managing OSDs on twin machines
Hi Pierre — You can manipulate your CRUSH map to make use of ‘chassis’ in addition to the default ‘host’ type. I’ve done this with FatTwin and FatTwin^2 boxes with great success. For more reading take a look at: http://ceph.com/docs/master/rados/operations/crush-map/ In particular the ‘Move a Bucket’ section: http://ceph.com/docs/master/rados/operations/crush-map/#move-a-bucket ./JRH On Aug 18, 2014, at 2:57 PM, Pierre Jaury pie...@jaury.eu wrote: Hello guys, I just acquired some brand new machines I would like to rely upon for a storage cluster (and some virtualization). These machines are, however, « twin servers », ie. each blade (1U) comes with two different machines but a single psu. I think two replicas would be enough for the intended purpose. Yet I cannot guarantee that all replicas of a given object are stored on two different blades. I basically have N blades, each blade has 2 distinct machines but a single psu, each machine has 2 hard drives. Is it possible to configure mutual exclusion between OSDs where replicas of a single object are stored? Regards -- Pierre Jaury @ kaiyou http://kaiyou.fr/contact.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
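To make the suggestion concrete, one possible shape for the decompiled CRUSH map is sketched below. The bucket names, ids, and weights are invented for illustration; the same layout can also be built on a live cluster with the 'ceph osd crush' commands described in the linked docs.

```
# Hypothetical: one 'chassis' bucket per twin blade, grouping the two
# machines that share a PSU (repeat for each blade).
chassis blade0 {
	id -10
	alg straw
	hash 0
	item host0a weight 2.000	# first machine in the blade
	item host0b weight 2.000	# second machine in the blade
}

# Replicate across chassis so the two copies never share a PSU.
rule by_chassis {
	ruleset 4
	type replicated
	min_size 2
	max_size 2
	step take default
	step chooseleaf firstn 0 type chassis
	step emit
}
```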
Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)
On 08/18/2014 01:49 PM, Sage Weil wrote: On Mon, 18 Aug 2014, John Morris wrote:

rule by_bank {
	ruleset 3
	type replicated
	min_size 3
	max_size 4
	step take default
	step choose firstn 0 type bank
	step choose firstn 0 type osd
	step emit
}

You probably want:

	step choose firstn 0 type bank
	step choose firstn 1 type osd

I.e., 3 (or 4) banks, and 1 osd in each.. not 3 banks with 3 osds in each or 4 banks with 4 osds in each (for a total of 9 or 16 OSDs).

Yes, thanks. Funny, testing still works with the incorrect version, and the --show-utilization test results look similar.

In re. to my last email about tunables, those can also be expressed in the human-readable map as such:

tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50

John

sage
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] v0.84 released
The next Ceph development release is here! This release contains several meaty items, including some MDS improvements for journaling, the ability to remove the CephFS file system (and name it), several mon cleanups with tiered pools, several OSD performance branches, a new read forward RADOS caching mode, a prototype Kinetic OSD backend, and various radosgw improvements (especially with the new standalone civetweb frontend). And there are a zillion OSD bug fixes. Things are looking pretty good for the Giant release that is coming up in the next month.

Upgrading
---------

* The *_kb perf counters on the monitor have been removed. These are replaced with a new set of *_bytes counters (e.g., cluster_osd_kb is replaced by cluster_osd_bytes).
* The rd_kb and wr_kb fields in the JSON dumps for pool stats (accessed via the 'ceph df detail -f json-pretty' and related commands) have been replaced with corresponding *_bytes fields. Similarly, the 'total_space', 'total_used', and 'total_avail' fields are replaced with 'total_bytes', 'total_used_bytes', and 'total_avail_bytes' fields.
* The 'rados df --format=json' output 'read_bytes' and 'write_bytes' fields were incorrectly reporting ops; this is now fixed.
* The 'rados df --format=json' output previously included 'read_kb' and 'write_kb' fields; these have been removed. Please use 'read_bytes' and 'write_bytes' instead (and divide by 1024 if appropriate).
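Scripts that scraped the old *_kb counters only need a multiply by 1024 when switching to the new *_bytes fields; a trivial sketch (the sample counter value is made up):

```shell
cluster_osd_kb=1048576                        # hypothetical old-style counter
cluster_osd_bytes=$((cluster_osd_kb * 1024))  # new counters are plain bytes
echo "$cluster_osd_bytes"
```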
Notable Changes
---------------

* ceph-conf: flush log on exit (Sage Weil)
* ceph-dencoder: refactor build a bit to limit dependencies (Sage Weil, Dan Mick)
* ceph.spec: split out ceph-common package, other fixes (Sandon Van Ness)
* ceph_test_librbd_fsx: fix RNG, make deterministic (Ilya Dryomov)
* cephtool: refactor and improve CLI tests (Joao Eduardo Luis)
* client: improved MDS session dumps (John Spray)
* common: fix dup log messages (#9080, Sage Weil)
* crush: include new tunables in dump (Sage Weil)
* crush: only require rule features if the rule is used (#8963, Sage Weil)
* crushtool: send output to stdout, not stderr (Wido den Hollander)
* fix i386 builds (Sage Weil)
* fix struct vs class inconsistencies (Thorsten Behrens)
* hadoop: update hadoop tests for Hadoop 2.0 (Haumin Chen)
* librbd, ceph-fuse: reduce cache flush overhead (Haomai Wang)
* librbd: fix error path when opening image (#8912, Josh Durgin)
* mds: add file system name, enabled flag (John Spray)
* mds: boot refactor, cleanup (John Spray)
* mds: fix journal conversion with standby-replay (John Spray)
* mds: separate inode recovery queue (John Spray)
* mds: session ls, evict commands (John Spray)
* mds: submit log events in async thread (Yan, Zheng)
* mds: use client-provided timestamp for user-visible file metadata (Yan, Zheng)
* mds: validate journal header on load and save (John Spray)
* misc build fixes for OS X (John Spray)
* misc integer size cleanups (Kevin Cox)
* mon: add get-quota commands (Joao Eduardo Luis)
* mon: do not create file system by default (John Spray)
* mon: fix 'ceph df' output for available space (Xiaoxi Chen)
* mon: fix bug when no auth keys are present (#8851, Joao Eduardo Luis)
* mon: fix compat version for MForward (Joao Eduardo Luis)
* mon: restrict some pool properties to tiered pools (Joao Eduardo Luis)
* msgr: misc locking fixes for fast dispatch (#8891, Sage Weil)
* osd: add 'dump_reservations' admin socket command (Sage Weil)
* osd: add READFORWARD caching mode (Luis Pabon)
* osd: add header cache for KeyValueStore (Haomai Wang)
* osd: add prototype KineticStore based on Seagate Kinetic (Josh Durgin)
* osd: allow map cache size to be adjusted at runtime (Sage Weil)
* osd: avoid refcounting overhead by passing a few things by ref (Somnath Roy)
* osd: avoid sharing PG info that is not durable (Samuel Just)
* osd: clear slow request latency info on osd up/down (Sage Weil)
* osd: fix PG object listing/ordering bug (Guang Yang)
* osd: fix PG stat errors with tiering (#9082, Sage Weil)
* osd: fix bug with long object names and rename (#8701, Sage Weil)
* osd: fix cache full -> not full requeueing (#8931, Sage Weil)
* osd: fix gating of messages from old OSD instances (Greg Farnum)
* osd: fix memstore bugs with collection_move_rename, lock ordering (Sage Weil)
* osd: improve locking for KeyValueStore (Haomai Wang)
* osd: make tiering behave if hit_sets aren't enabled (Sage Weil)
* osd: mark pools with incomplete clones (Sage Weil)
* osd: misc locking fixes for fast dispatch (Samuel Just, Ma Jianpeng)
* osd: prevent old rados clients from using tiered pools (#8714, Sage Weil)
* osd: reduce OpTracker overhead (Somnath Roy)
* osd: set configurable hard limits on object and xattr names (Sage Weil, Haomai Wang)
* osd: trim old EC objects quickly; verify on scrub (Samuel Just)
* osd: work around GCC 4.8 bug in journal code (Matt Benjamin)
* rados bench: fix arg order (Kevin Dalley)
* rados: fix {read,write}_ops values for df output (Sage Weil)
* rbd: add rbdmap pre-
[ceph-users] cephfs set_layout / setfattr ... does not work anymore for pools
Hi Sage,

a couple of months ago (maybe last year) I was able to change the assignment of directories and files of CephFS to different pools back and forth (with cephfs set_layout as well as with setfattr). Now (with ceph v0.81 and kernel 3.10 on the client side) neither 'cephfs set_layout' nor 'setfattr' works anymore:

# mount | grep ceph
ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions)
192.168.113.52:6789:/ on /mnt/cephfs type ceph (name=admin,key=client.admin)

# ls -l /mnt/cephfs
total 0
-rw-r--r-- 1 root root 0 Aug 18 21:06 file
-rw-r--r-- 1 root root 0 Aug 18 21:10 file2
-rw-r--r-- 1 root root 0 Aug 18 21:11 file3
drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2
drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2

# getfattr -d -m - /mnt/cephfs
getfattr: Removing leading '/' from absolute path names
# file: mnt/cephfs
ceph.dir.entries=5
ceph.dir.files=3
ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=SAS-r2
ceph.dir.rbytes=0
ceph.dir.rctime=0.090
ceph.dir.rentries=1
ceph.dir.rfiles=0
ceph.dir.rsubdirs=1
ceph.dir.subdirs=2

# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs
setfattr: /mnt/cephfs: Invalid argument

# ceph osd dump | grep pool
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0
pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool stripe_width 0
pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash rjenkins pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool stripe_width 0
pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool stripe_width 0
pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool stripe_width 0

# getfattr -d -m - /mnt/cephfs/ssd-r2
getfattr: Removing leading '/' from absolute path names
# file: mnt/cephfs/ssd-r2
ceph.dir.entries=0
ceph.dir.files=0
ceph.dir.rbytes=0
ceph.dir.rctime=0.090
ceph.dir.rentries=1
ceph.dir.rfiles=0
ceph.dir.rsubdirs=1

# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument

# cephfs /mnt/cephfs/ssd-r2 set_layout -p 3 -s 4194304 -u 4194304 -c 1
Error setting layout: (22) Invalid argument

Any recommendations? Is this a bug, or a new feature? Do I have to use a newer kernel?

Kind Regards,
-Dieter

On Sat, Aug 31, 2013 at 02:26:48AM +0200, Sage Weil wrote: On Fri, 30 Aug 2013, Joao Pedras wrote: Greetings all! I am bumping into a small issue and I am wondering if someone has any insight on it. I am trying to use a pool other than 'data' for cephfs. Said pool has id #3 and I have run 'ceph mds add_data_pool 3'. After mounting, cephfs seg faults when trying to set the layout: $ cephfs /path set_layout -p 3 Segmentation fault Actually plainly running 'cephfs /path set_layout' without more options will seg fault as well. Version is 0.61.8 on ubuntu 12.04. A question that comes to mind here is if there is a way of accomplishing this when using ceph-fuse (3.x kernels).
You can adjust this more easily using the xattr interface:

getfattr -n ceph.dir.layout dir
setfattr -n ceph.dir.layout.pool -v mypool dir
getfattr -n ceph.dir.layout dir

The interface tests are probably a decent reference given this isn't explicitly documented anywhere:

https://github.com/ceph/ceph/blob/master/qa/workunits/misc/layout_vxattrs.sh

sage
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
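Since the layout now surfaces as an ordinary xattr, scripting against it is plain text processing. For example, pulling the pool name out of a ceph.dir.layout value (the sample string below mirrors the format shown in this thread, but is hard-coded here instead of being read with getfattr):

```shell
# Value as getfattr would print it for ceph.dir.layout (hard-coded sample).
layout='stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=SAS-r2'

# Split on spaces, then on '=' to pick out the pool field.
pool=$(printf '%s\n' "$layout" | tr ' ' '\n' | awk -F= '$1 == "pool" {print $2}')
echo "$pool"
```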
Re: [ceph-users] cephfs set_layout / setfattr ... does not work anymore for pools
Hi Dieter, There is a new xattr based interface. See https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh The nice part about this interface is no new tools are necessary (just standard 'attr' or 'setfattr' commands) and it is the same with both ceph-fuse and the kernel client. sage On Mon, 18 Aug 2014, Kasper Dieter wrote: Hi Sage, a couple of months ago (maybe last year) I was able to change the assignment of Directorlies and Files of CephFS to different pools back and forth (with cephfs set_layout as well as with setfattr). Now (with ceph v0.81 and Kernel 3.10 an the client side) neither 'cephfs set_layout' nor 'setfattr' works anymore: # mount | grep ceph ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions) 192.168.113.52:6789:/ on /mnt/cephfs type ceph (name=admin,key=client.admin) # ls -l /mnt/cephfs total 0 -rw-r--r-- 1 root root 0 Aug 18 21:06 file -rw-r--r-- 1 root root 0 Aug 18 21:10 file2 -rw-r--r-- 1 root root 0 Aug 18 21:11 file3 drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2 drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2 # getfattr -d -m - /mnt/cephfs getfattr: Removing leading '/' from absolute path names # file: mnt/cephfs ceph.dir.entries=5 ceph.dir.files=3 ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=SAS-r2 ceph.dir.rbytes=0 ceph.dir.rctime=0.090 ceph.dir.rentries=1 ceph.dir.rfiles=0 ceph.dir.rsubdirs=1 ceph.dir.subdirs=2 # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs setfattr: /mnt/cephfs: Invalid argument # ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0 pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0 pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool stripe_width 0 pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash rjenkins pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool stripe_width 0 pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool stripe_width 0 pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool stripe_width 0 # getfattr -d -m - /mnt/cephfs/ssd-r2 getfattr: Removing leading '/' from absolute path names # file: mnt/cephfs/ssd-r2 ceph.dir.entries=0 ceph.dir.files=0 ceph.dir.rbytes=0 ceph.dir.rctime=0.090 ceph.dir.rentries=1 ceph.dir.rfiles=0 ceph.dir.rsubdirs=1 # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2 setfattr: /mnt/cephfs/ssd-r2: Invalid argument # cephfs /mnt/cephfs/ssd-r2 set_layout -p 3 -s 4194304 -u 4194304 -c 1 Error setting layout: (22) Invalid argument Any recommendations ? Is this a bug, or a new feature ? Do I have to use a newer Kernel ? Kind Regards, -Dieter On Sat, Aug 31, 2013 at 02:26:48AM +0200, Sage Weil wrote: On Fri, 30 Aug 2013, Joao Pedras wrote: Greetings all! I am bumping into a small issue and I am wondering if someone has any insight on it. I am trying to use a pool other than 'data' for cephfs. Said pool has id #3 and I have run 'ceph mds add_data_pool 3'. After mounting cephfs seg faults when trying to set the layout: $ cephfs /path set_layout -p 3 Segmentation fault Actually plainly running 'cephfs /path set_layout' without more options will seg fault as well. Version is 0.61.8 on ubuntu 12.04. A question that comes to mind here is if there is a way of accomplishing this when using ceph-fuse (3.x kernels). 
You can adjust this more easily using the xattr interface: getfattr -n ceph.dir.layout dir setfattr -n ceph.dir.layout.pool -v mypool getfattr -n ceph.dir.layout dir The interface tests are probably a decent reference given this isn't explicitly documented anywhere: https://github.com/ceph/ceph/blob/master/qa/workunits/misc/layout_vxattrs.sh sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] setfattr ... does not work anymore for pools
Hi Sage,

I know about the setfattr syntax from
https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh
=>
setfattr -n ceph.dir.layout.pool -v data dir
setfattr -n ceph.dir.layout.pool -v 2 dir

But, in my case it is not working:

[root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v 3 /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument
[root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument
[root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
(...)
setxattr("/mnt/cephfs/ssd-r2", "ceph.dir.layout.pool", "SSD-r2", 6, 0) = -1 EINVAL (Invalid argument)

Same with ceph-fuse:

[root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/ceph-fuse/ssd-r2
(...)
setxattr("/mnt/ceph-fuse/ssd-r2", "ceph.dir.layout.pool", "SSD-r2", 6, 0) = -1 EINVAL (Invalid argument)

Setting all layout attributes at once does not work either:

[root@rx37-1 cephfs]# setfattr -n ceph.dir.layout -v "stripe_unit=2097152 stripe_count=1 object_size=4194304 pool=SSD-r2" /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument

How can I debug this further? It seems the directory has no layout at all:

# getfattr -d -m - /mnt/cephfs/ssd-r2
# file: ssd-r2
ceph.dir.entries=0
ceph.dir.files=0
ceph.dir.rbytes=0
ceph.dir.rctime=0.090
ceph.dir.rentries=1
ceph.dir.rfiles=0
ceph.dir.rsubdirs=1
ceph.dir.subdirs=0

Kind Regards,
-Dieter

On Mon, Aug 18, 2014 at 09:37:39PM +0200, Sage Weil wrote: Hi Dieter, There is a new xattr based interface. See https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh The nice part about this interface is no new tools are necessary (just standard 'attr' or 'setfattr' commands) and it is the same with both ceph-fuse and the kernel client.
sage On Mon, 18 Aug 2014, Kasper Dieter wrote: Hi Sage, a couple of months ago (maybe last year) I was able to change the assignment of Directorlies and Files of CephFS to different pools back and forth (with cephfs set_layout as well as with setfattr). Now (with ceph v0.81 and Kernel 3.10 an the client side) neither 'cephfs set_layout' nor 'setfattr' works anymore: # mount | grep ceph ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions) 192.168.113.52:6789:/ on /mnt/cephfs type ceph (name=admin,key=client.admin) # ls -l /mnt/cephfs total 0 -rw-r--r-- 1 root root 0 Aug 18 21:06 file -rw-r--r-- 1 root root 0 Aug 18 21:10 file2 -rw-r--r-- 1 root root 0 Aug 18 21:11 file3 drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2 drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2 # getfattr -d -m - /mnt/cephfs getfattr: Removing leading '/' from absolute path names # file: mnt/cephfs ceph.dir.entries=5 ceph.dir.files=3 ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=SAS-r2 ceph.dir.rbytes=0 ceph.dir.rctime=0.090 ceph.dir.rentries=1 ceph.dir.rfiles=0 ceph.dir.rsubdirs=1 ceph.dir.subdirs=2 # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs setfattr: /mnt/cephfs: Invalid argument # ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0 pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0 pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool stripe_width 0 pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash rjenkins 
pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool stripe_width 0 pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool stripe_width 0 pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool stripe_width 0 # getfattr -d -m - /mnt/cephfs/ssd-r2 getfattr: Removing leading '/' from absolute path names # file: mnt/cephfs/ssd-r2 ceph.dir.entries=0 ceph.dir.files=0 ceph.dir.rbytes=0 ceph.dir.rctime=0.090 ceph.dir.rentries=1 ceph.dir.rfiles=0 ceph.dir.rsubdirs=1 # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2 setfattr: /mnt/cephfs/ssd-r2: Invalid argument # cephfs /mnt/cephfs/ssd-r2 set_layout -p 3 -s 4194304 -u 4194304 -c 1 Error setting layout: (22) Invalid argument Any recommendations ? Is this a bug, or a new feature ? Do
Re: [ceph-users] setfattr ... works after 'ceph mds add_data_pool'
Hi Sage,

it seems the pools must be added to the MDS first:

ceph mds add_data_pool 3	# = SSD-r2
ceph mds add_data_pool 4	# = SAS-r2

After these commands the 'setfattr -n ceph.dir.layout.pool' worked.

Thanks,
-Dieter

On Mon, Aug 18, 2014 at 10:19:08PM +0200, Kasper Dieter wrote: Hi Sage, I know about the setattr syntax from https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh = setfattr -n ceph.dir.layout.pool -v data dir setfattr -n ceph.dir.layout.pool -v 2 dir But, in my case it is not working: [root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v 3 /mnt/cephfs/ssd-r2 setfattr: /mnt/cephfs/ssd-r2: Invalid argument [root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2 setfattr: /mnt/cephfs/ssd-r2: Invalid argument [root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2 (...) setxattr(/mnt/cephfs/ssd-r2, ceph.dir.layout.pool, SSD-r2, 6, 0) = -1 EINVAL (Invalid argument) Same with ceph-fuse: [root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/ceph-fuse/ssd-r2 (...) setxattr(/mnt/ceph-fuse/ssd-r2, ceph.dir.layout.pool, SSD-r2, 6, 0) = -1 EINVAL (Invalid argument) Setting all layout attribute at once does not work either: [root@rx37-1 cephfs]# setfattr -n ceph.dir.layout -v stripe_unit=2097152 stripe_count=1 object_size=4194304 pool=SSD-r2 /mnt/cephfs/ssd-r2 setfattr: /mnt/cephfs/ssd-r2: Invalid argument How can I debug this further ? It seems the Directory has no layout at all: # getfattr -d -m - /mnt/cephfs/ssd-r2 # file: ssd-r2 ceph.dir.entries=0 ceph.dir.files=0 ceph.dir.rbytes=0 ceph.dir.rctime=0.090 ceph.dir.rentries=1 ceph.dir.rfiles=0 ceph.dir.rsubdirs=1 ceph.dir.subdirs=0 Kind Regards, -Dieter On Mon, Aug 18, 2014 at 09:37:39PM +0200, Sage Weil wrote: Hi Dieter, There is a new xattr based interface.
See https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh The nice part about this interface is no new tools are necessary (just standard 'attr' or 'setfattr' commands) and it is the same with both ceph-fuse and the kernel client. sage On Mon, 18 Aug 2014, Kasper Dieter wrote: Hi Sage, a couple of months ago (maybe last year) I was able to change the assignment of Directorlies and Files of CephFS to different pools back and forth (with cephfs set_layout as well as with setfattr). Now (with ceph v0.81 and Kernel 3.10 an the client side) neither 'cephfs set_layout' nor 'setfattr' works anymore: # mount | grep ceph ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions) 192.168.113.52:6789:/ on /mnt/cephfs type ceph (name=admin,key=client.admin) # ls -l /mnt/cephfs total 0 -rw-r--r-- 1 root root 0 Aug 18 21:06 file -rw-r--r-- 1 root root 0 Aug 18 21:10 file2 -rw-r--r-- 1 root root 0 Aug 18 21:11 file3 drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2 drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2 # getfattr -d -m - /mnt/cephfs getfattr: Removing leading '/' from absolute path names # file: mnt/cephfs ceph.dir.entries=5 ceph.dir.files=3 ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=SAS-r2 ceph.dir.rbytes=0 ceph.dir.rctime=0.090 ceph.dir.rentries=1 ceph.dir.rfiles=0 ceph.dir.rsubdirs=1 ceph.dir.subdirs=2 # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs setfattr: /mnt/cephfs: Invalid argument # ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0 pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags 
hashpspool stripe_width 0 pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool stripe_width 0 pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash rjenkins pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool stripe_width 0 pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool stripe_width 0 pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool stripe_width 0 # getfattr -d -m - /mnt/cephfs/ssd-r2 getfattr: Removing leading '/' from absolute path names # file: mnt/cephfs/ssd-r2 ceph.dir.entries=0
Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)
On 08/18/2014 02:20 PM, John Morris wrote: On 08/18/2014 01:49 PM, Sage Weil wrote: On Mon, 18 Aug 2014, John Morris wrote: rule by_bank { ruleset 3 type replicated min_size 3 max_size 4 step take default step choose firstn 0 type bank step choose firstn 0 type osd step emit } You probably want: step choose firstn 0 type bank step choose firstn 1 type osd I.e., 3 (or 4) banks, and 1 osd in each.. not 3 banks with 3 osds in each or 4 banks with 4 osds in each (for a total of 9 or 16 OSDs). Yes, thanks. Funny, testing still works with the incorrect version, and the --show-utilization test results look similar. In re. to my last email about tunables, those can also be expressed in the human-readable map as such: tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50

Wrapping up this exercise:

This little script helps to see exactly where things go, and shows what goes wrong with my original, incorrect map.

#!/bin/bash
echo "compiling crush map"
crushtool -c /tmp/crush.txt -o /tmp/crush-new.bin \
    --enable-unsafe-tunables

bad=$(crushtool -i /tmp/crush-new.bin --test \
          --show-bad-mappings 2>&1 | wc -l)
echo "number of bad mappings: $bad"

distribution() {
    crushtool -i /tmp/crush-new.bin --test --show-statistics \
        --num-rep $1 2>&1 | \
    awk '/\[.*\]/ {
        gsub(/[][]/, "", $6);
        split($6, a, ",");
        asort(a, d);
        print d[1], d[2], d[3], d[4];
    }' | \
    sort | uniq -c
}

for i in 3 4; do
    echo "distribution of size=${i} replicas:"
    distribution $i
done

For --num-rep=4, the result looks like the following; it's easily seen that two sets of OSDs in the same bank are always picked, exactly what we do NOT want (note OSDs 0+1 in bank0, 2+3 in bank1, etc.):

    173 0 1 2 3
    176 0 1 4 5
    184 0 1 6 7
    171 2 3 4 5
    156 2 3 6 7
    164 4 5 6 7

After Sage's correction, the result looks like the following, with one OSD from each bank:

     70 0 2 4 6
     74 0 2 4 7
     65 0 2 5 6
     58 0 2 5 7
     60 0 3 4 6
     72 0 3 4 7
     80 0 3 5 6
     64 0 3 5 7
     48 1 2 4 6
     66 1 2 4 7
     72 1 2 5 6
     46 1 2 5 7
     73 1 3 4 6
     70 1 3 4 7
     51 1 3 5 6
     55 1 3 5 7

When replicas=3, the result is also correct. So this is a bit of a hack, but it does seem to work to evenly distribute 3-4 replicas across a bucket level with only two nodes.

Late into this exploration, it appears that if the 'bank' layer is undesirable, this also works to distribute evenly across hosts:

	step choose firstn 0 type host
	step choose firstn 2 type osd

In conclusion, this example doesn't seem so far-fetched, since it's easy to imagine wanting to distribute OSDs across two racks, or PDUs, or data centers, where it's not so unreasonable to say a third is out of the budget.

John
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
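For anyone wanting to try the counting idea without a cluster, here is a self-contained sketch of the same sort-and-count pipeline run over canned mapping lines. The sample lines and their format ("CRUSH rule 3 x N [a,b,c,d]", bracketed list in field 6) are assumptions standing in for real crushtool --show-statistics output, and a portable insertion sort replaces gawk's asort:

```shell
# Count how often each (sorted) OSD set appears in mapping lines.
result=$(
awk '/\[.*\]/ {
    gsub(/[][]/, "", $6)          # strip brackets from the OSD list
    n = split($6, a, ",")         # a[1..n] = OSD ids
    for (i = 2; i <= n; i++)      # numeric insertion sort, so that
        for (j = i; j > 1 && a[j-1]+0 > a[j]+0; j--) {
            t = a[j]; a[j] = a[j-1]; a[j-1] = t   # [2,3,0,1] == [0,1,2,3]
        }
    line = a[1]
    for (i = 2; i <= n; i++)
        line = line " " a[i]
    print line
}' <<'EOF' | sort | uniq -c
CRUSH rule 3 x 0 [0,1,2,3]
CRUSH rule 3 x 1 [2,3,0,1]
CRUSH rule 3 x 2 [0,2,4,6]
EOF
)
echo "$result"
```

The first two sample lines collapse into one bucket because the OSD sets are equal after sorting, which is exactly why the sort step is needed before `uniq -c`.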
Re: [ceph-users] active+remapped after remove osd via ceph osd out
After replacing the broken disk and running 'ceph osd in' on it, the cluster shows:

ceph health detail
HEALTH_WARN 2 pgs stuck unclean; recovery 60/346857819 degraded (0.000%)
pg 3.884 is stuck unclean for 570722.873270, current state active+remapped, last acting [143,261,314]
pg 3.154a is stuck unclean for 577659.917066, current state active+remapped, last acting [85,224,64]
recovery 60/346857819 degraded (0.000%)

What can be wrong? Is it possible this is caused by 'ceph osd reweight-by-utilization'?

More info:

ceph -v
ceph version 0.67.9 (ba340a97c3dafc9155023da8d515eecc675c619a)

Enabled tunables:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

df per osd:
143 - 78%
261 - 78%
314 - 80%
85 - 76%
224 - 76%
64 - 75%

ceph osd dump | grep -i pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner 0
pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 90517 owner 0
pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28467 owner 18446744073709551615
pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28468 owner 18446744073709551615
pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner 18446744073709551615
pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33487 owner 18446744073709551615
pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 46912 owner 0

ceph pg 3.884 query
{ state: active+remapped,
  epoch: 160655,
  up: [143],
  acting: [143, 261, 314],
  info: { pgid: 3.884,
    last_update: 160655'111533,
    last_complete: 160655'111533,
    log_tail: 159997'108532,
    last_backfill: MAX,
    purged_snaps: [],
    history: { epoch_created: 4,
      last_epoch_started: 160261,
      last_epoch_clean: 160261,
      last_epoch_split: 11488,
      same_up_since: 160252,
      same_interval_since: 160260,
      same_primary_since: 160252,
      last_scrub: 155516'107396,
      last_scrub_stamp: 2014-08-06 03:15:18.193611,
      last_deep_scrub: 155516'107293,
      last_deep_scrub_stamp: 2014-08-03 06:45:59.215397,
      last_clean_scrub_stamp: 2014-08-06 03:15:18.193611},
    stats: { version: 160655'111533,
      reported_seq: 856860,
      reported_epoch: 160655,
      state: active+remapped,
      last_fresh: 2014-08-18 23:06:47.068588,
      last_change: 2014-08-17 21:12:29.452628,
      last_active: 2014-08-18 23:06:47.068588,
      last_clean: 2014-08-12 08:44:00.293916,
      last_became_active: 2013-10-25 14:54:55.902442,
      last_unstale: 2014-08-18 23:06:47.068588,
      mapping_epoch: 160258,
      log_start: 159997'108532,
      ondisk_log_start: 159997'108532,
      created: 4,
      last_epoch_clean: 160261,
      parent: 0.0,
      parent_split_bits: 0,
      last_scrub: 155516'107396,
      last_scrub_stamp: 2014-08-06 03:15:18.193611,
      last_deep_scrub: 155516'107293,
      last_deep_scrub_stamp: 2014-08-03 06:45:59.215397,
      last_clean_scrub_stamp: 2014-08-06 03:15:18.193611,
      log_size: 3001,
      ondisk_log_size: 3001,
      stats_invalid: 0,
      stat_sum: { num_bytes: 2750235192,
        num_objects: 12015,
        num_object_clones: 0,
        num_object_copies: 0,
        num_objects_missing_on_primary: 0,
        num_objects_degraded: 0,
        num_objects_unfound: 0,
        num_read: 708045,
        num_read_kb: 39418032,
        num_write: 120983,
        num_write_kb: 2383937,
        num_scrub_errors: 0,
        num_shallow_scrub_errors: 0,
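As an aside, the "(0.000%)" in the health line is just rounding: 60 degraded out of roughly 347 million object copies is a tiny fraction, so the stuck-unclean state, not the degraded count, is the real issue. A quick check (plain Python, values taken from the health output above):

```python
# Values from the 'ceph health detail' output above.
degraded_objects = 60
total_object_copies = 346857819

pct = 100.0 * degraded_objects / total_object_copies
print(f"{pct:.3f}%")  # prints 0.000%, matching the health output
```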
Re: [ceph-users] [radosgw-admin] bilog list confusion
I have the same results: the primary zone (with log_meta and log_data true) has bilog data, while the secondary zone (with log_meta and log_data false) does not. I'm just guessing here (I can't test it right now), but I would think that disabling log_meta and log_data stops new information from being added to the bilog while keeping the existing bilogs. If that's true, bilog trim should clean up the old logs (along with mdlog trim and datalog trim).

On Mon, Aug 18, 2014 at 5:43 AM, Patrycja Szabłowska szablowska.patry...@gmail.com wrote:

Hi,

Is there any configuration option in ceph.conf for enabling/disabling the bilog list? I mean the result of this command:

radosgw-admin bilog list

One ceph cluster gives me results - a list of operations which were made to the bucket - and the other one gives me just an empty list. I can't see the reason, and I can't find it anywhere in the ceph.conf documentation: http://ceph.com/docs/master/rados/configuration/ceph-conf/

My guess is it's in the region info, but when I changed these values to false for the cluster with the working bilog, the bilog would still show.

1. cluster with empty bilog list:
zones: [ { name: default, endpoints: [], log_meta: false, log_data: false}],

2. cluster with *proper* bilog list:
zones: [ { name: master-1, endpoints: [ http:\/\/[...]], log_meta: true, log_data: true}],

Here are the pools on both clusters:

1. cluster with *proper* bilog list:
rbd .rgw.root .rgw.control .rgw .rgw.gc .users.uid .users.email .users .rgw.buckets .rgw.buckets.index .log ''

2. cluster with empty bilog list:
data metadata rbd .rgw.root .rgw.control .rgw .rgw.gc .users.uid .users.email .users '' .rgw.buckets.index .rgw.buckets .log

And here is the zone info (just the placement_pools; the rest of the config is the same):

1. cluster with *proper* bilog list:
placement_pools: []

2. cluster with *empty* bilog list:
placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: }}]}

Any thoughts? I've tried to figure it out by myself, but no luck.

Thanks,
Patrycja Szabłowska

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
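The pattern in the thread suggests the bilog tracks the zone's log_data flag from the region map rather than anything in ceph.conf. A small sketch of that comparison (the zone JSON is restated with quoting added, since the archive stripped it; treating the flags as the strings "true"/"false" is an assumption, purely illustrative):

```python
import json

# The two zone entries quoted in the thread, re-quoted as valid JSON
# (flag values as strings is an assumption for this sketch).
empty_bilog_zone = json.loads(
    '{"name": "default", "log_meta": "false", "log_data": "false"}')
proper_bilog_zone = json.loads(
    '{"name": "master-1", "log_meta": "true", "log_data": "true"}')

def expects_bilog_entries(zone: dict) -> bool:
    # Guess from the thread: bucket index logging follows log_data.
    return zone["log_data"] == "true"

print(expects_bilog_entries(empty_bilog_zone),
      expects_bilog_entries(proper_bilog_zone))  # prints: False True
```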
Re: [ceph-users] v0.84 released
This may be a better question for Federico. I've pulled the systemd stuff from git and I have it working, but only if I have the volumes listed in fstab. Is this the intended way that systemd will function for now, or am I missing a step? I'm pretty new to systemd.

Thanks,
Robert LeBlanc

On Mon, Aug 18, 2014 at 1:14 PM, Sage Weil s...@inktank.com wrote:

The next Ceph development release is here! This release contains several meaty items, including some MDS improvements for journaling, the ability to remove the CephFS file system (and name it), several mon cleanups with tiered pools, several OSD performance branches, a new read forward RADOS caching mode, a prototype Kinetic OSD backend, and various radosgw improvements (especially with the new standalone civetweb frontend). And there are a zillion OSD bug fixes. Things are looking pretty good for the Giant release that is coming up in the next month.

Upgrading
---------

* The *_kb perf counters on the monitor have been removed. These are replaced with a new set of *_bytes counters (e.g., cluster_osd_kb is replaced by cluster_osd_bytes).
* The rd_kb and wr_kb fields in the JSON dumps for pool stats (accessed via the 'ceph df detail -f json-pretty' and related commands) have been replaced with corresponding *_bytes fields. Similarly, the 'total_space', 'total_used', and 'total_avail' fields are replaced with 'total_bytes', 'total_used_bytes', and 'total_avail_bytes' fields.
* The 'rados df --format=json' output 'read_bytes' and 'write_bytes' fields were incorrectly reporting ops; this is now fixed.
* The 'rados df --format=json' output previously included 'read_kb' and 'write_kb' fields; these have been removed. Please use 'read_bytes' and 'write_bytes' instead (and divide by 1024 if appropriate).
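For anyone whose monitoring scripts consumed the old kb fields, the change above just means reading the new byte counters and dividing by 1024 where the old units are still wanted. A minimal sketch (field names as given in the release notes; the sample values are invented):

```python
import json

# Invented sample of the new-style 'rados df --format=json' fields.
sample = '{"read_bytes": 1048576, "write_bytes": 524288}'
stats = json.loads(sample)

# Recover the old kb-style values from the new byte counters.
read_kb = stats["read_bytes"] // 1024
write_kb = stats["write_bytes"] // 1024
print(read_kb, write_kb)  # prints: 1024 512
```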
Notable Changes
---------------

* ceph-conf: flush log on exit (Sage Weil)
* ceph-dencoder: refactor build a bit to limit dependencies (Sage Weil, Dan Mick)
* ceph.spec: split out ceph-common package, other fixes (Sandon Van Ness)
* ceph_test_librbd_fsx: fix RNG, make deterministic (Ilya Dryomov)
* cephtool: refactor and improve CLI tests (Joao Eduardo Luis)
* client: improved MDS session dumps (John Spray)
* common: fix dup log messages (#9080, Sage Weil)
* crush: include new tunables in dump (Sage Weil)
* crush: only require rule features if the rule is used (#8963, Sage Weil)
* crushtool: send output to stdout, not stderr (Wido den Hollander)
* fix i386 builds (Sage Weil)
* fix struct vs class inconsistencies (Thorsten Behrens)
* hadoop: update hadoop tests for Hadoop 2.0 (Haumin Chen)
* librbd, ceph-fuse: reduce cache flush overhead (Haomai Wang)
* librbd: fix error path when opening image (#8912, Josh Durgin)
* mds: add file system name, enabled flag (John Spray)
* mds: boot refactor, cleanup (John Spray)
* mds: fix journal conversion with standby-replay (John Spray)
* mds: separate inode recovery queue (John Spray)
* mds: session ls, evict commands (John Spray)
* mds: submit log events in async thread (Yan, Zheng)
* mds: use client-provided timestamp for user-visible file metadata (Yan, Zheng)
* mds: validate journal header on load and save (John Spray)
* misc build fixes for OS X (John Spray)
* misc integer size cleanups (Kevin Cox)
* mon: add get-quota commands (Joao Eduardo Luis)
* mon: do not create file system by default (John Spray)
* mon: fix 'ceph df' output for available space (Xiaoxi Chen)
* mon: fix bug when no auth keys are present (#8851, Joao Eduardo Luis)
* mon: fix compat version for MForward (Joao Eduardo Luis)
* mon: restrict some pool properties to tiered pools (Joao Eduardo Luis)
* msgr: misc locking fixes for fast dispatch (#8891, Sage Weil)
* osd: add 'dump_reservations' admin socket command (Sage Weil)
* osd: add READFORWARD caching mode (Luis Pabon)
* osd: add header cache for KeyValueStore (Haomai Wang)
* osd: add prototype KineticStore based on Seagate Kinetic (Josh Durgin)
* osd: allow map cache size to be adjusted at runtime (Sage Weil)
* osd: avoid refcounting overhead by passing a few things by ref (Somnath Roy)
* osd: avoid sharing PG info that is not durable (Samuel Just)
* osd: clear slow request latency info on osd up/down (Sage Weil)
* osd: fix PG object listing/ordering bug (Guang Yang)
* osd: fix PG stat errors with tiering (#9082, Sage Weil)
* osd: fix bug with long object names and rename (#8701, Sage Weil)
* osd: fix cache full -> not full requeueing (#8931, Sage Weil)
* osd: fix gating of messages from old OSD instances (Greg Farnum)
* osd: fix memstore bugs with collection_move_rename, lock ordering (Sage Weil)
* osd: improve locking for KeyValueStore (Haomai Wang)
* osd: make tiering behave if hit_sets aren't enabled (Sage Weil)
* osd: mark pools with incomplete clones (Sage Weil)
* osd: misc locking fixes for fast dispatch (Samuel Just, Ma Jianpeng)
* osd: prevent old rados clients from using
Re: [ceph-users] v0.84 released
On Mon, 18 Aug 2014, Robert LeBlanc wrote:

This may be a better question for Federico. I've pulled the systemd stuff from git and I have it working, but only if I have the volumes listed in fstab. Is this the intended way that systemd will function for now, or am I missing a step? I'm pretty new to systemd.

The OSDs are normally mounted and started via udev, which will call 'ceph-disk activate device'. The missing piece is teaching ceph-disk how to start up the systemd service for the OSD. I suspect that this can be completely dynamic, based on udev events, and not using the 'enable' mechanism where systemd persistently registers that a service is to be started...?

sage

[snip: quoted v0.84 release announcement]
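To make the udev-driven flow Sage describes concrete, here is a hypothetical sketch of that wiring: a udev rule that activates a hot-plugged OSD partition, which in turn could start a per-OSD systemd template instance on demand (file names and rule details here are illustrative, not the units Ceph ships; the GPT type GUID is the Ceph OSD partition type):

```
# /etc/udev/rules.d/95-ceph-osd.rules (illustrative)
# On hotplug of a partition carrying the Ceph OSD GPT type GUID,
# let ceph-disk mount and activate it.
ACTION=="add", SUBSYSTEM=="block", \
  ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
  RUN+="/usr/sbin/ceph-disk activate /dev/$name"

# /etc/systemd/system/ceph-osd@.service (illustrative template unit)
# No [Install] section: the instance is started on demand by the
# activation path rather than persistently enabled, matching the
# "completely dynamic" behavior suggested above.
[Unit]
Description=Ceph object storage daemon osd.%i

[Service]
ExecStart=/usr/bin/ceph-osd -f --id %i
Restart=on-failure
```

With something like this in place, an fstab entry would no longer be needed: plugging in (or rescanning) the disk triggers udev, which activates the OSD and starts ceph-osd@N for it.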
Re: [ceph-users] ceph cluster inconsistency?
On Mon, Aug 18, 2014 at 7:32 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:

- Message from Haomai Wang haomaiw...@gmail.com -
Date: Mon, 18 Aug 2014 18:34:11 +0800
From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Kenneth Waegeman kenneth.waege...@ugent.be
Cc: Sage Weil sw...@redhat.com, ceph-users@lists.ceph.com

On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:

Hi,

I tried this after restarting the osd, but I guess that was not the aim (

# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_| grep 6adb1100 -A 100
IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource temporarily unavailable
tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const string)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780
tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))
.. )

When I run it after bringing the osd down, it takes a while, but it has no output. (When running it without the grep, I get a huge list.)

Oh, sorry about that! I made a mistake: the hash value (6adb1100) is stored reversed in leveldb, so grepping for it directly finds nothing. Grepping for benchmark_data_ceph001.cubone.os_5560_object789734 should help.
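A quick illustration of the key reversal Haomai mentions (a sketch: the _GHOBJTOSEQ_ keys carry the hex digits of the object hash in reverse order, which the grep results in the follow-up bear out):

```python
# Sketch: the object hash 6adb1100 appears in the leveldb _GHOBJTOSEQ_
# keys with its hex digits reversed, which is why grepping for the raw
# hash value returns nothing.
def reversed_hash(hex_hash: str) -> str:
    """Reverse the hex digits, as in the _GHOBJTOSEQ_ key encoding."""
    return hex_hash[::-1]

print(reversed_hash("6adb1100"))  # prints: 0011bda6
```

Compare with the first matching key in the listing, `_GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!...`, for the object the grep was aimed at.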
This gives:

[root@ceph003 ~]# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_ | grep 5560_object789734 -A 100
_GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object789734!head
_GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1330170!head
_GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ceph001%ecubone%eos_4919_object227366!head
_GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1363631!head
_GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1573957!head
_GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1019282!head
_GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1283563!head
_GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ceph001%ecubone%eos_4919_object273736!head
_GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1170628!head
_GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ceph001%ecubone%eos_4919_object256335!head
_GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1484196!head
_GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ceph001%ecubone%eos_5560_object884178!head
_GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object853746!head
_GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object36633!head
_GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1235337!head
_GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1661351!head
_GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object238126!head
_GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object339943!head
_GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1047094!head
_GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ceph001%ecubone%eos_31461_object520642!head
_GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object639565!head
_GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ceph001%ecubone%eos_4919_object231080!head
_GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object858050!head
_GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object241796!head
_GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ceph001%ecubone%eos_4919_object7462!head
_GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ceph001%ecubone%eos_5560_object243798!head
_GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ceph001%ecubone%eos_8961_object109512!head
_GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ceph001%ecubone%eos_31461_object653973!head
_GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1378169!head
_GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ceph001%ecubone%eos_5560_object512925!head
_GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ceph001%ecubone%eos_4919_object23289!head
_GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1108852!head
_GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ceph001%ecubone%eos_5560_object704026!head
_GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ceph001%ecubone%eos_8961_object250441!head
_GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ceph001%ecubone%eos_31461_object706178!head
_GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ceph001%ecubone%eos_5560_object316952!head