[ceph-users] OSD not marked as down or out
Hello everyone,

I have a cluster running with OpenStack. It has 6 OSDs (3 at each of 2 different locations). Each pool has a replication size of 3, with 2 copies at the primary location and 1 copy at the secondary location. Everything runs as expected, but the OSDs are not marked down when I power off an OSD server. It has been around an hour now. I tried changing the heartbeat settings too. Can someone point me in the right direction?

OSD 0 log:
2015-02-20 16:20:14.009723 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:54.009720)
2015-02-20 16:20:15.009908 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:55.009907)
2015-02-20 16:20:16.010123 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:56.010119)
2015-02-20 16:20:16.648167 7f3fc9a76700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:56.648165)

Ceph monitor log:
2015-02-20 16:49:16.831548 7f416e4aa700 1 mon.storage1@1(leader).osd e455 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.4 192.168.100.35:6800/1305 is reporting failure:1
2015-02-20 16:49:16.831593 7f416e4aa700 0 log_channel(cluster) log [DBG] : osd.2 192.168.100.33:6800/24431 reported failed by osd.4 192.168.100.35:6800/1305
2015-02-20 16:49:17.080314 7f416e4aa700 1 mon.storage1@1(leader).osd e455 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.3 192.168.100.34:6800/1358 is reporting failure:1
2015-02-20 16:49:17.080527 7f416e4aa700 0 log_channel(cluster) log [DBG] : osd.2 192.168.100.33:6800/24431 reported failed by osd.3 192.168.100.34:6800/1358
2015-02-20 16:49:17.420859 7f416e4aa700 1 mon.storage1@1(leader).osd e455 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.5 192.168.100.36:6800/1359 is reporting failure:1

# ceph osd stat
osdmap e455: 6 osds: 6 up, 6 in

# ceph -s
cluster c8a5975f-4c86-4cfe-a91b-fac9f3126afc
 health HEALTH_WARN 528 pgs peering; 528 pgs stuck inactive; 528 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 mons down, quorum 1,2,3,4 storage1,storage2,compute3,compute4
 monmap e1: 5 mons at {admin=192.168.100.39:6789/0,compute3=192.168.100.133:6789/0,compute4=192.168.100.134:6789/0,storage1=192.168.100.120:6789/0,storage2=192.168.100.121:6789/0}, election epoch 132, quorum 1,2,3,4 storage1,storage2,compute3,compute4
 osdmap e455: 6 osds: 6 up, 6 in
 pgmap v48474: 3650 pgs, 19 pools, 27324 MB data, 4420 objects
       82443 MB used, 2682 GB / 2763 GB avail
           3122 active+clean
            528 remapped+peering

Ceph.conf file:
[global]
fsid = c8a5975f-4c86-4cfe-a91b-fac9f3126afc
mon_initial_members = admin, storage1, storage2, compute3, compute4
mon_host = 192.168.100.39,192.168.100.120,192.168.100.121,192.168.100.133,192.168.100.134
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 3
osd pool default min size = 3
osd pool default pg num = 300
osd pool default pgp num = 300
public network = 192.168.100.0/24
rgw print continue = false
rgw enable ops log = false
mon osd report timeout = 60
mon osd down out interval = 30
mon osd min down reports = 2
osd heartbeat grace = 10
osd mon heartbeat interval = 20
osd mon report interval max = 60
osd mon ack timeout = 15
mon osd min down reports = 2

Regards,
Sudarshan Pathak
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
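A quick way to rule out a config-distribution problem in a situation like this is to ask the running daemons what they actually loaded, rather than trusting the ceph.conf on the admin node; a rough sketch (daemon names are taken from the output above, and each command has to run on the host that owns that admin socket):

ceph daemon mon.storage1 config show | grep -E 'mon_osd_min_down|heartbeat_grace'
ceph daemon osd.0 config show | grep osd_heartbeat_grace

Values can also be changed at runtime for testing, e.g.:

ceph tell mon.* injectargs '--mon-osd-min-down-reporters 1'

injectargs changes do not survive a daemon restart, so anything that turns out to help still needs to be written back into ceph.conf.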
[ceph-users] running giant/hammer mds with firefly osds
Hi all, Back in the dumpling days, we were able to run the emperor MDS with dumpling OSDs -- this was an improvement over the dumpling MDS. Now we have stable firefly OSDs, but I was wondering if we can reap some of the recent CephFS developments by running a giant or ~hammer MDS with our firefly OSDs. Did anyone try that yet? Best Regards, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] running giant/hammer mds with firefly osds
Hi Dan, I remember http://tracker.ceph.com/issues/9945 introducing some issues with running cephfs between different versions of giant/firefly. https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14257.html So if you upgrade please be aware that you'll also have to update the clients. On Fri, Feb 20, 2015 at 10:33 AM, Dan van der Ster d...@vanderster.com wrote: Hi all, Back in the dumpling days, we were able to run the emperor MDS with dumpling OSDs -- this was an improvement over the dumpling MDS. Now we have stable firefly OSDs, but I was wondering if we can reap some of the recent CephFS developments by running a giant or ~hammer MDS with our firefly OSDs. Did anyone try that yet? Best Regards, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] new ssd intel s3610, has somebody tested them ?
Interesting, thanks for the link. I hope the quality on the 3610/3710 is as good as the 3700... we haven't yet seen a single failure in production. Cheers, Dan On Fri, Feb 20, 2015 at 8:06 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, Intel has just released new ssd s3610: http://www.anandtech.com/show/8954/intel-launches-ssd-dc-s3610-s3710-enterprise-ssds endurance is 10x bigger than 3500, for 10% cost addition. Has somebody already tested them ? Regards, Alexandre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] new ssd intel s3610, has somebody tested them ?
Hello, On Fri, 20 Feb 2015 09:30:56 +0100 Dan van der Ster wrote: Interesting, thanks for the link. Interesting indeed, more for a non-Ceph project of mine, but still. ^o^ I hope the quality on the 3610/3710 is as good as the 3700... we haven't yet seen a single failure in production. Same here, same goes for the consumer models (5xx). Those will naturally wear out faster, but none has failed so far and the most I got to wear out some was down to 83% after 2 hours of uptime. ^^ Christian Cheers, Dan On Fri, Feb 20, 2015 at 8:06 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, Intel has just released new ssd s3610: http://www.anandtech.com/show/8954/intel-launches-ssd-dc-s3610-s3710-enterprise-ssds endurance is 10x bigger than 3500, for 10% cost addition. Has somebody already tested them ? Regards, Alexandre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Christian Balzer    Network/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
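For anyone tracking wear on these drives, Intel SSDs expose a Media_Wearout_Indicator SMART attribute that starts at 100 and counts down, which is presumably what the percentages above refer to; a hedged example (the device name is only a placeholder):

smartctl -A /dev/sdb | egrep 'Media_Wearout_Indicator|Power_On_Hours'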
[ceph-users] unsubscribe
unsubscribe -- Best regards, Konstantin Khatskevich ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] erasure coded pool
Is it possible to run an erasure coded pool using default k=2, m=2 profile on a single node? (this is just for functionality testing). The single node has 3 OSDs. Replicated pools run fine. ceph.conf does contain: osd crush chooseleaf type = 0 -- Tom Deneau ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS and data locality?
Okay thanks for pointing me in the right direction. From a quick read I think this will work but will take a look in detail. Thanks! Jake On Tue, Feb 17, 2015 at 3:16 PM, Gregory Farnum wrote: On Tue, Feb 17, 2015 at 10:36 AM, Jake Kugel jkugel@... wrote: Hi, I'm just starting to look at Ceph and CephFS. I see that Ceph supports dynamic object interfaces to allow some processing of object data on the same node where the data is stored [1]. This might be a naive question, but is there any way to get data locality when using CephFS? For example, somehow arrange for parts of the filesystem to reside on OSDs on same system using CephFS client? It's unrelated to the in-place RADOS class computation, but you can do some intelligent placement by having specialized CRUSH rules and making use of the CephFS' data layouts. Check the docs! :) -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
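For the archives: the data layouts Greg refers to are exposed as virtual extended attributes, so a directory can be pointed at a pool whose CRUSH rule targets specific OSDs. A hedged sketch (pool name and mount point are made up, and only files created after the change pick up the new layout):

ceph osd pool create local-pool 128
ceph mds add_data_pool local-pool
setfattr -n ceph.dir.layout.pool -v local-pool /mnt/cephfs/local-data
getfattr -n ceph.dir.layout /mnt/cephfs/local-data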
Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read requestbecome too slow
Any update on this matter? I've been thinking of upgrading from 0.80.7 to 0.80.8 - lucky that I saw this thread first...

On Thu, Feb 12, 2015 at 10:39 PM, 杨万元 yangwanyuan8...@gmail.com wrote:

Thanks very much for your advice. Yes, as you said, disabling rbd_cache improves the read requests, but with rbd_cache disabled the randwrite requests get worse, so this method may not solve my problem. In addition, I also tested the 0.80.6 and 0.80.7 librbd; they perform as well as 0.80.5, so we can be fairly sure this problem was introduced in 0.80.8.

2015-02-12 19:33 GMT+08:00 Alexandre DERUMIER aderum...@odiso.com:

Hi, can you test with rbd_cache disabled? I remember a bug detected in giant, not sure it's also the case for firefly. This was the tracker: http://tracker.ceph.com/issues/9513 But it has been solved and backported to firefly. Also, can you test 0.80.6 and 0.80.7?

----- Original Message -----
From: killingwolf killingw...@qq.com
To: ceph-users ceph-users@lists.ceph.com
Sent: Thursday, 12 February 2015 12:16:32
Subject: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 -- the VM's read request become too slow

I have this problem too, help!

-- Original message --
From: 杨万元 yangwanyuan8...@gmail.com
Sent: Thursday, 12 February 2015, 11:14 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Upgrade 0.80.5 to 0.80.8 -- the VM's read request become too slow

Hello!
We use Ceph + OpenStack in our private cloud. Recently we upgraded our CentOS 6.5 based cluster from Ceph Emperor to Ceph Firefly. At first we used the Red Hat EPEL yum repo to upgrade; that Ceph version is 0.80.5. We upgraded the monitors first, then the OSDs, and the clients last. When we completed this upgrade, we booted a VM on the cluster and used fio to test the I/O performance. The I/O performance was as good as before. Everything was OK!
Then we upgraded the cluster from 0.80.5 to 0.80.8. When that was complete, we rebooted the VM to load the newest librbd. After that we again used fio to test the I/O performance. We found that randwrite and write were as good as before, but randread and read got worse: randread IOPS dropped from 4000-5000 to 300-400 and the latency got worse, and the read bandwidth dropped from about 430 MB/s to 115 MB/s. When I downgraded the ceph client version from 0.80.8 to 0.80.5, the results became normal again. So I think it is probably something in librbd. I compared the 0.80.8 release notes with 0.80.5 (http://ceph.com/docs/master/release-notes/#v0-80-8-firefly), and the only read-related change I can find in 0.80.8 is: "librbd: cap memory utilization for read requests (Jason Dillaman)". Who can explain this?

My ceph cluster is 400 OSDs, 5 mons:

ceph -s
 health HEALTH_OK
 monmap e11: 5 mons at {BJ-M1-Cloud71=172.28.2.71:6789/0,BJ-M1-Cloud73=172.28.2.73:6789/0,BJ-M2-Cloud80=172.28.2.80:6789/0,BJ-M2-Cloud81=172.28.2.81:6789/0,BJ-M3-Cloud85=172.28.2.85:6789/0}, election epoch 198, quorum 0,1,2,3,4 BJ-M1-Cloud71,BJ-M1-Cloud73,BJ-M2-Cloud80,BJ-M2-Cloud81,BJ-M3-Cloud85
 osdmap e120157: 400 osds: 400 up, 400 in
 pgmap v26161895: 29288 pgs, 6 pools, 20862 GB data, 3014 kobjects
       41084 GB used, 323 TB / 363 TB avail
       29288 active+clean
 client io 52640 kB/s rd, 32419 kB/s wr, 5193 op/s

The following is my ceph client conf:

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.29.204.24,172.29.204.48,172.29.204.55,172.29.204.58,172.29.204.73
mon_initial_members = ZR-F5-Cloud24, ZR-F6-Cloud48, ZR-F7-Cloud55, ZR-F8-Cloud58, ZR-F9-Cloud73
fsid = c01c8e28-304e-47a4-b876-cb93acc2e980
mon osd full ratio = .85
mon osd nearfull ratio = .75
public network = 172.29.204.0/24
mon warn on legacy crush tunables = false

[osd]
osd op threads = 12
filestore journal writeahead = true
filestore merge threshold = 40
filestore split multiple = 8

[client]
rbd cache = true
rbd cache writethrough until flush = false
rbd cache size = 67108864
rbd cache max dirty = 50331648
rbd cache target dirty = 33554432

[client.cinder]
admin socket = /var/run/ceph/rbd-$pid.asok

My VM has 8 cores and 16 GB RAM. The fio scripts we use are:

fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randread -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=read -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=write -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200

The following are the I/O test results:

ceph client version 0.80.5:
 read:      bw = 430 MB/s
 write:     bw = 420 MB/s
 randread:  iops = 4875, latency = 65 ms
 randwrite: iops = 6844, latency = 46 ms

ceph client version 0.80.8:
 read:      bw = 115 MB/s
 write:     bw = 480 MB/s
 randread:  iops = 381, latency = 83 ms
 randwrite:
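For what it's worth, the rbd_cache test Alexandre suggests only needs a client-side change; a hedged sketch of the ceph.conf fragment on the compute node (the guest has to be stopped and started again so librbd is reloaded with the new setting):

[client]
rbd cache = false

and then rerun the same fio lines above against the attached volume to compare.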
Re: [ceph-users] initially conf calamari to know about my Ceph cluster(s)
By the way, you may want to put these sorts of questions on ceph-calam...@lists.ceph.com, which is specific to Calamari.

On 02/16/2015 01:08 PM, Steffen Winther wrote:
Steffen Winther ceph.user@... writes:
Trying to figure out how to initially configure calamari clients to know about my Ceph cluster(s) when they aren't installed through ceph-deploy but through Proxmox pveceph. I assume I probably need to copy some client admin keys and configure my MON hosts somehow; any pointers to docs on this? :)
Stupid me, I must have been too tired after struggling with the build... It was just a question of finishing Karan's guide from step 5 and making my salt master and minions work, plus diamond. Now everything seems to be working; nice dashboard/workbench etc.
Step 5 from http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html:
5. Calamari would not be able to find the Ceph cluster and will ask to add a cluster; for this we need to add Ceph clients to the dashboard by installing the salt-minion and diamond packages on them.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
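For anyone following the same guide, step 5 boils down to roughly the following on every Ceph node (the master hostname is an assumption and must point at the Calamari server; the diamond package typically comes from the Calamari repositories rather than the distribution):

apt-get install salt-minion diamond
echo 'master: calamari.example.com' > /etc/salt/minion.d/calamari.conf
service salt-minion restart

followed by accepting the new minion keys on the Calamari host (salt-key -L to list, salt-key -A to accept) before the cluster shows up in the dashboard.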
Re: [ceph-users] Calamari build in vagrants
On 02/16/2015 12:57 PM, Steffen Winther wrote: Dan Mick dmick@... writes: 0cbcfbaa791baa3ee25c4f1a135f005c1d568512 on the 1.2.3 branch has the change to yo 1.1.0. I've just cherry-picked that to v1.3 and master. Do you mean that you merged 1.2.3 into master and branch 1.3? I put just that specific commit onto v1.3 and master. The branches may need a little preening to be completely synced, but that commit should be in all of them now. BTW I managed to clone and build branch 1.2.3 in my vagrant env. \o/ -- Dan Mick Red Hat, Inc. Ceph docs: http://ceph.com/docs ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Cluster never reaching clean after osd out
I have a cluster of 3 hosts, running giant on Debian wheezy and the backports kernel 3.16.0-0.bpo.4-amd64. For testing I did a

~# ceph osd out 20

from a clean state. Ceph starts rebalancing; watching ceph -w one sees the number of pgs stuck unclean rise and then drop to about 11. Shortly after that the cluster stays stuck forever in this state:

health HEALTH_WARN 68 pgs stuck unclean; recovery 450/169647 objects degraded (0.265%); 3691/169647 objects misplaced (2.176%)

According to the documentation at http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ the cluster should reach a clean state after an osd out. What am I doing wrong? Below are some config and command outputs:

~# ceph osd tree
# id    weight  type name       up/down reweight
-1      76.02   root default
-2      25.34           host ve51
0       3.62                    osd.0   up      1
3       3.62                    osd.3   up      1
6       3.62                    osd.6   up      1
9       3.62                    osd.9   up      1
12      3.62                    osd.12  up      1
15      3.62                    osd.15  up      1
18      3.62                    osd.18  up      1
-3      25.34           host ve52
1       3.62                    osd.1   up      1
4       3.62                    osd.4   up      1
7       3.62                    osd.7   up      1
10      3.62                    osd.10  up      1
13      3.62                    osd.13  up      1
16      3.62                    osd.16  up      1
19      3.62                    osd.19  up      1
-4      25.34           host ve53
2       3.62                    osd.2   up      1
5       3.62                    osd.5   up      1
8       3.62                    osd.8   up      1
11      3.62                    osd.11  up      1
14      3.62                    osd.14  up      1
17      3.62                    osd.17  up      1
20      3.62                    osd.20  up      1

==

~# cat ceph.conf
[global]
fsid = 80ebba06-34f5-49fc-8178-d6cc1d1c1196
public_network = 192.168.10.0/24
cluster_network = 192.168.10.0/24
mon_initial_members = ve51, ve52, ve53
mon_host = 192.168.10.51,192.168.10.52,192.168.10.53
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
mon_osd_down_out_subtree_limit = host
osd_pool_default_size=3
osd_pool_default_min_size=2
[osd]
osd_journal_size = 2
osd_mount_options_xfs = noatime,nodiratime,logbsize=256k,logbufs=8,inode64

==

~# ceph -s
cluster 80ebba06-34f5-49fc-8178-d6cc1d1c1196
 health HEALTH_OK
 monmap e1: 3 mons at {ve51=192.168.10.51:6789/0,ve52=192.168.10.52:6789/0,ve53=192.168.10.53:6789/0}, election epoch 28, quorum 0,1,2 ve51,ve52,ve53
 osdmap e1353: 21 osds: 21 up, 21 in
 pgmap v16484: 2048 pgs, 2 pools, 219 GB data, 56549 objects
       658 GB used, 77139 GB / 77797 GB avail
       2048 active+clean

==

~# cat crushmap
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ve51 {
        id -2           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0          # rjenkins1
        item osd.0 weight 3.620
        item osd.3 weight 3.620
        item osd.6 weight 3.620
        item osd.9 weight 3.620
        item osd.12 weight 3.620
        item osd.15 weight 3.620
        item osd.18 weight 3.620
}
host ve52 {
        id -3           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0          # rjenkins1
        item osd.1 weight 3.620
        item osd.4 weight 3.620
        item osd.7 weight 3.620
        item osd.10 weight 3.620
        item osd.13 weight 3.620
        item osd.16 weight 3.620
        item osd.19 weight 3.620
}
host ve53 {
        id -4           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0          # rjenkins1
        item osd.2 weight 3.620
        item osd.5 weight 3.620
        item osd.8 weight 3.620
        item osd.11 weight 3.620
        item osd.14 weight 3.620
        item osd.17 weight 3.620
        item osd.20 weight 3.620
}
root default {
        id -1           # do not change unnecessarily
        # weight 76.020
        alg straw
        hash 0          # rjenkins1
        item ve51 weight 25.340
        item ve52 weight 25.340
        item ve53 weight 25.340
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
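When a cluster sits in a state like this, the stuck PGs themselves usually say why they cannot be placed; a hedged example of pulling that information (the PG id is only a placeholder):

ceph pg dump_stuck unclean
ceph pg 2.3f query

The recovery_state section of the query output shows which OSDs CRUSH wants and why the acting set differs; with only three hosts and a pool size of 3, CRUSH has little room to find an alternative placement once an OSD is out, and the query output will show whether it is failing to map a full set of replicas.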
Re: [ceph-users] Fixing a crushmap
Here was the process I went through. 1) I created an EC pool which created ruleset 1 2) I edited the crushmap to approximately its current form 3) I discovered my previous EC pool wasn't doing what I meant for it to do, so I deleted it. 4) I created a new EC pool with the parameters I wanted and told it to use ruleset 3 On Fri, Feb 20, 2015 at 10:55 AM, Luis Periquito periqu...@gmail.com wrote: The process of creating an erasure coded pool and a replicated one is slightly different. You can use Sebastian's guide to create/manage the osd tree, but you should follow this guide http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool. I'm not sure (i.e. I never tried) to create a EC pool the way you did. The normal replicated ones do work like this. On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote: I manually edited my crushmap, basing my changes on http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I have SSDs and HDDs in the same box and was wanting to separate them by ruleset. My current crushmap can be seen at http://pastie.org/9966238 I had it installed and everything looked gooduntil I created a new pool. All of the new pgs are stuck in creating. I first tried creating an erasure-coded pool using ruleset 3, then created another pool using ruleset 0. Same result. I'm not opposed to an 'RTFM' answer, so long as you can point me to the right one. I've seen very little documentation on crushmap rules, in particular. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Fixing a crushmap
Oh, and I don't yet have any important data here, so I'm not worried about losing anything at this point. I just need to get my cluster happy again so I can play with it some more. On Fri, Feb 20, 2015 at 11:00 AM, Kyle Hutson kylehut...@ksu.edu wrote: Here was the process I went through. 1) I created an EC pool which created ruleset 1 2) I edited the crushmap to approximately its current form 3) I discovered my previous EC pool wasn't doing what I meant for it to do, so I deleted it. 4) I created a new EC pool with the parameters I wanted and told it to use ruleset 3 On Fri, Feb 20, 2015 at 10:55 AM, Luis Periquito periqu...@gmail.com wrote: The process of creating an erasure coded pool and a replicated one is slightly different. You can use Sebastian's guide to create/manage the osd tree, but you should follow this guide http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool. I'm not sure (i.e. I never tried) to create a EC pool the way you did. The normal replicated ones do work like this. On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote: I manually edited my crushmap, basing my changes on http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I have SSDs and HDDs in the same box and was wanting to separate them by ruleset. My current crushmap can be seen at http://pastie.org/9966238 I had it installed and everything looked gooduntil I created a new pool. All of the new pgs are stuck in creating. I first tried creating an erasure-coded pool using ruleset 3, then created another pool using ruleset 0. Same result. I'm not opposed to an 'RTFM' answer, so long as you can point me to the right one. I've seen very little documentation on crushmap rules, in particular. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-osd pegging CPU on giant, no snapshots involved this time
On 02/19/2015 10:56 AM, Florian Haas wrote:
On Wed, Feb 18, 2015 at 10:27 PM, Florian Haas flor...@hastexo.com wrote:
On Wed, Feb 18, 2015 at 9:32 PM, Mark Nelson mnel...@redhat.com wrote:
On 02/18/2015 02:19 PM, Florian Haas wrote:

Hey everyone,

I must confess I'm still not fully understanding this problem and don't exactly know where to start digging deeper, but perhaps other users have seen this and/or it rings a bell.

System info: Ceph giant on CentOS 7; approx. 240 OSDs, 6 pools using 2 different rulesets where the problem applies to hosts and PGs using a bog-standard default crushmap.

Symptom: out of the blue, ceph-osd processes on a single OSD node start going to 100% CPU utilization. The problem turns so bad that the machine is effectively becoming CPU bound and can't cope with any client requests anymore. Stopping and restarting all OSDs brings the problem right back, as does rebooting the machine — right after ceph-osd processes start, CPU utilization shoots up again. Stopping and marking out several OSDs on the machine makes the problem go away but obviously causes massive backfilling.

All the logs show while CPU utilization is implausibly high are slow requests (which would be expected in a system that can barely do anything).

Now I've seen issues like this before on dumpling and firefly, but besides the fact that they have all been addressed and should now be fixed, they always involved the prior mass removal of RBD snapshots. This system only used a handful of snapshots in testing, and is presently not using any snapshots at all.

I'll be spending some time looking for clues in the log files of the OSDs that were shut down which caused the problem to go away, but if this sounds familiar to anyone willing to offer clues, I'd be more than interested. :)

Thanks!

Hi Florian,
Does a quick perf top tell you anything useful?

Hi Mark,

Unfortunately, quite the contrary -- but this might actually provide a clue to the underlying issue. The CPU pegging issue isn't currently present, so the perf top data wouldn't be conclusive until the issue is reproduced. But: merely running perf top on this host, which currently only has 2 active OSDs, renders the host unresponsive. Corresponding dmesg snippet:

[Wed Feb 18 20:53:42 2015] hrtimer: interrupt took 2243820 ns
[Wed Feb 18 20:53:49 2015] [ cut here ]
[Wed Feb 18 20:53:49 2015] WARNING: at arch/x86/kernel/cpu/perf_event.c:1074 x86_pmu_start+0xc6/0x100()
[Wed Feb 18 20:53:49 2015] Modules linked in: ipmi_si binfmt_misc mpt3sas mptctl mptbase dell_rbu 8021q garp stp mrp llc sg ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables xfs vfat fat iTCO_wdt iTCO_vendor_support dcdbas coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core lpc_ich mfd_core mei_me mei ipmi_devintf shpchp wmi ipmi_msghandler acpi_power_meter acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm bnx2x drm mpt2sas i2c_core raid_class mdio scsi_transport_sas libcrc32c dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_si]
[Wed Feb 18 20:53:49 2015] CPU: 0 PID: 12381 Comm: dsm_sa_datamgrd Not tainted 3.10.0-123.20.1.el7.x86_64 #1
[Wed Feb 18 20:53:49 2015] Hardware name: Dell Inc. PowerEdge R720xd/0020HJ, BIOS 2.2.2 01/16/2014
[Wed Feb 18 20:53:49 2015] 50de8931 880fef003d40 815e2b0c
[Wed Feb 18 20:53:49 2015] 880fef003d78 8105dee1 880c316a7400 880fef00b9e0
[Wed Feb 18 20:53:49 2015] 880fef016db0 880dbaa896c0 880fef003d88
[Wed Feb 18 20:53:49 2015] Call Trace:
[Wed Feb 18 20:53:49 2015] IRQ [815e2b0c] dump_stack+0x19/0x1b
[Wed Feb 18 20:53:49 2015] [8105dee1] warn_slowpath_common+0x61/0x80
[Wed Feb 18 20:53:49 2015] [8105e00a] warn_slowpath_null+0x1a/0x20
[Wed Feb 18 20:53:49 2015] [81023706] x86_pmu_start+0xc6/0x100
[Wed Feb 18 20:53:49 2015] [81136128] perf_adjust_freq_unthr_context.part.79+0x198/0x1b0
[Wed Feb 18 20:53:49 2015] [811363d6] perf_event_task_tick+0xb6/0xf0
[Wed Feb 18 20:53:49 2015] [810967e5] scheduler_tick+0xd5/0x150
[Wed Feb 18 20:53:49 2015] [8106fe86] update_process_times+0x66/0x80
[Wed Feb 18 20:53:49 2015] [810be055] tick_sched_handle.isra.16+0x25/0x60
[Wed Feb 18 20:53:49 2015] [810be0d1] tick_sched_timer+0x41/0x60
[Wed Feb 18 20:53:49 2015] [81089a57] __run_hrtimer+0x77/0x1d0
[Wed Feb 18 20:53:49 2015] [810be090] ? tick_sched_handle.isra.16+0x60/0x60
[Wed Feb 18 20:53:49 2015] [8108a297] hrtimer_interrupt+0xf7/0x240
[Wed Feb 18 20:53:49 2015] [81039717] local_apic_timer_interrupt+0x37/0x60
[Wed Feb 18 20:53:49 2015]
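If perf top itself destabilises the box, a non-interactive capture may still work; a hedged sketch (the PID lookup is simplistic and assumes a single ceph-osd of interest):

perf record -g -p $(pidof -s ceph-osd) -- sleep 30
perf report --stdio | head -50

That at least produces an offline profile that can be copied off the machine, although given the PMU warning above the safer route is probably a kernel update before profiling again.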
[ceph-users] Fixing a crushmap
I manually edited my crushmap, basing my changes on http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

I have SSDs and HDDs in the same box and wanted to separate them by ruleset. My current crushmap can be seen at http://pastie.org/9966238

I had it installed and everything looked good until I created a new pool. All of the new pgs are stuck in creating. I first tried creating an erasure-coded pool using ruleset 3, then created another pool using ruleset 0. Same result.

I'm not opposed to an 'RTFM' answer, so long as you can point me to the right one. I've seen very little documentation on crushmap rules, in particular.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
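One way to check a hand-edited map before (or after) injecting it is to run it through crushtool and see whether each rule can actually produce the requested number of OSDs; a hedged sketch against the edited text map (rule numbers and replica counts are only examples):

crushtool -c crushmap.txt -o crushmap.bin
crushtool -i crushmap.bin --test --rule 3 --num-rep 4 --show-utilization
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings

Rules that leave PGs stuck in creating typically show up here as mappings with fewer devices than requested.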
Re: [ceph-users] Fixing a crushmap
The process of creating an erasure coded pool and a replicated one is slightly different. You can use Sebastian's guide to create/manage the osd tree, but you should follow this guide http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool. I'm not sure (i.e. I never tried) to create a EC pool the way you did. The normal replicated ones do work like this. On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote: I manually edited my crushmap, basing my changes on http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I have SSDs and HDDs in the same box and was wanting to separate them by ruleset. My current crushmap can be seen at http://pastie.org/9966238 I had it installed and everything looked gooduntil I created a new pool. All of the new pgs are stuck in creating. I first tried creating an erasure-coded pool using ruleset 3, then created another pool using ruleset 0. Same result. I'm not opposed to an 'RTFM' answer, so long as you can point me to the right one. I've seen very little documentation on crushmap rules, in particular. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] erasure coded pool
Hi Tom, On 20/02/2015 22:59, Deneau, Tom wrote: Is it possible to run an erasure coded pool using default k=2, m=2 profile on a single node? (this is just for functionality testing). The single node has 3 OSDs. Replicated pools run fine. For k=2 m=2 to work you need four (k+m) OSDs. As long as the crush rule allows it, you can have them on the same host. Cheers ceph.conf does contain: osd crush chooseleaf type = 0 -- Tom Deneau ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
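As a concrete illustration of the above, a profile that spreads the k+m = 4 chunks across OSDs rather than hosts can be created before the pool; a hedged sketch (profile name, pool name and PG count are only examples):

ceph osd erasure-code-profile set testprofile k=2 m=2 ruleset-failure-domain=osd
ceph osd pool create ecpool 64 64 erasure testprofile

With the default ruleset-failure-domain=host, a single node can never satisfy k+m = 4, which is the usual reason such a pool stays incomplete on one machine.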
Re: [ceph-users] Power failure recovery woes (fwd)
Should I infer from the silence that there is no way to recover from the FAILED assert(last_e.version.version < e.version.version) errors?

Thanks, Jeff

----- Forwarded message from Jeff j...@usedmoviefinder.com -----

Date: Tue, 17 Feb 2015 09:16:33 -0500
From: Jeff j...@usedmoviefinder.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Power failure recovery woes

Some additional information/questions:

Here is the output of ceph osd tree. Some of the down OSDs are actually running, but are marked down. For example osd.1:

root 30158 8.6 12.7 1542860 781288 ? Ssl 07:47 4:40 /usr/bin/ceph-osd --cluster=ceph -i 0 -f

Is there any way to get the cluster to recognize them as being up? osd-1 has the FAILED assert(last_e.version.version < e.version.version) errors.

Thanks, Jeff

# id    weight  type name       up/down reweight
-1      10.22   root default
-2      2.72            host ceph1
0       0.91                    osd.0   up      1
1       0.91                    osd.1   down    0
2       0.9                     osd.2   down    0
-3      1.82            host ceph2
3       0.91                    osd.3   down    0
4       0.91                    osd.4   down    0
-4      2.04            host ceph3
5       0.68                    osd.5   up      1
6       0.68                    osd.6   up      1
7       0.68                    osd.7   up      1
8       0.68                    osd.8   down    0
-5      1.82            host ceph4
9       0.91                    osd.9   up      1
10      0.91                    osd.10  down    0
-6      1.82            host ceph5
11      0.91                    osd.11  up      1
12      0.91                    osd.12  up      1

On 2/17/2015 8:28 AM, Jeff wrote:

Original Message
Subject: Re: [ceph-users] Power failure recovery woes
Date: 2015-02-17 04:23
From: Udo Lembke ulem...@polarzone.de
To: Jeff j...@usedmoviefinder.com, ceph-users@lists.ceph.com

Hi Jeff,
is the osd /var/lib/ceph/osd/ceph-2 mounted? If not, does it help if you mount the osd and start it with "service ceph start osd.2"?
Udo

On 17.02.2015 09:54, Jeff wrote:

Hi,
We had a nasty power failure yesterday and even with UPS's our small (5 node, 12 OSD) cluster is having problems recovering. We are running ceph 0.87.
3 of our OSD's are down consistently (others stop and are restartable, but our cluster is so slow that almost everything we do times out).
We are seeing errors like this on the OSD's that never run:
ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1) Operation not permitted
We are seeing errors like these on the OSD's that run some of the time:
osd/PGLog.cc: 844: FAILED assert(last_e.version.version < e.version.version)
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
Does anyone have any suggestions on how to recover our cluster?
Thanks! Jeff

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

----- End forwarded message -----

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Minor version difference between monitors and OSDs
On Thu, Feb 19, 2015 at 8:30 PM, Christian Balzer ch...@gol.com wrote: Hello, I have a cluster currently at 0.80.1 and would like to upgrade it to 0.80.7 (Debian as you can guess), but for a number of reasons I can't really do it all at the same time. In particular I would like to upgrade the primary monitor node first and the secondary ones as well as the OSDs later. Now my understanding and hope is that unless I change the config to add features that aren't present in 0.80.1, things should work just fine, especially given the main release note blurb about 0.80.7: I don't think we test upgrades between that particular combination of versions, but as a matter of policy there shouldn't be any issues between point releases. The release note is referring to the issue described at http://tracker.ceph.com/issues/9419, which is indeed for pre-Firefly to Firefly upgrades. :) -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
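A small sanity check that helps around partial point-release upgrades like this is to confirm which version each daemon is actually running, since installed packages and running binaries can diverge until a restart; a hedged example (the monitor id is assumed to match the short hostname, adjust to your mon names):

ceph tell osd.* version
ceph daemon mon.$(hostname -s) version    # on each monitor host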
Re: [ceph-users] OSD not marked as down or out
That's pretty strange, especially since the monitor is getting the failure reports. What version are you running? Can you bump up the monitor debugging and provide its output from around that time? -Greg On Fri, Feb 20, 2015 at 3:26 AM, Sudarshan Pathak sushan@gmail.com wrote: Hello everyone, I have a cluster running with OpenStack. It has 6 OSD (3 in each 2 different locations). Each pool has 3 replication size with 2 copy in primary location and 1 copy at secondary location. Everything is running as expected but the osd are not marked as down when I poweroff a OSD server. It has been around an hour. I tried changing the heartbeat settings too. Can someone point me in right direction. OSD 0 log = 2015-02-20 16:20:14.009723 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:54.009720) 2015-02-20 16:20:15.009908 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:55.009907) 2015-02-20 16:20:16.010123 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:56.010119) 2015-02-20 16:20:16.648167 7f3fc9a76700 -1 osd.0 451 heartbeat_check: no reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20 16:15:54.607854 (cutoff 2015-02-20 16:19:56.648165) Ceph monitor log 2015-02-20 16:49:16.831548 7f416e4aa700 1 mon.storage1@1(leader).osd e455 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.4 192.168.100.35:6800/1305 is reporting failure:1 2015-02-20 16:49:16.831593 7f416e4aa700 0 log_channel(cluster) log [DBG] : osd.2 192.168.100.33:6800/24431 reported failed by osd.4 192.168.100.35:6800/1305 2015-02-20 16:49:17.080314 7f416e4aa700 1 mon.storage1@1(leader).osd e455 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.3 192.168.100.34:6800/1358 is reporting failure:1 2015-02-20 16:49:17.080527 7f416e4aa700 0 log_channel(cluster) log [DBG] : osd.2 192.168.100.33:6800/24431 reported failed by osd.3 192.168.100.34:6800/1358 2015-02-20 16:49:17.420859 7f416e4aa700 1 mon.storage1@1(leader).osd e455 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.5 192.168.100.36:6800/1359 is reporting failure:1 #ceph osd stat osdmap e455: 6 osds: 6 up, 6 in #ceph -s cluster c8a5975f-4c86-4cfe-a91b-fac9f3126afc health HEALTH_WARN 528 pgs peering; 528 pgs stuck inactive; 528 pgs stuck unclean; 1 requests are blocked 32 sec; 1 mons down, quorum 1,2,3,4 storage1,storage2,compute3,compute4 monmap e1: 5 mons at {admin=192.168.100.39:6789/0,compute3=192.168.100.133:6789/0,compute4=192.168.100.134:6789/0,storage1=192.168.100.120:6789/0,storage2=192.168.100.121:6789/0}, election epoch 132, quorum 1,2,3,4 storage1,storage2,compute3,compute4 osdmap e455: 6 osds: 6 up, 6 in pgmap v48474: 3650 pgs, 19 pools, 27324 MB data, 4420 objects 82443 MB used, 2682 GB / 2763 GB avail 3122 active+clean 528 remapped+peering Ceph.conf file [global] fsid = c8a5975f-4c86-4cfe-a91b-fac9f3126afc mon_initial_members = admin, storage1, storage2, compute3, compute4 mon_host = 192.168.100.39,192.168.100.120,192.168.100.121,192.168.100.133,192.168.100.134 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd pool default size = 3 osd pool default min size = 3 osd pool default pg num = 300 osd pool default pgp num = 300 public network = 
192.168.100.0/24 rgw print continue = false rgw enable ops log = false mon osd report timeout = 60 mon osd down out interval = 30 mon osd min down reports = 2 osd heartbeat grace = 10 osd mon heartbeat interval = 20 osd mon report interval max = 60 osd mon ack timeout = 15 mon osd min down reports = 2 Regards, Sudarshan Pathak ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
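For reference, the extra debugging Greg asks for can be raised at runtime without restarting the monitor; a hedged example (the monitor name is taken from the log excerpt above):

ceph tell mon.storage1 injectargs '--debug-mon 10 --debug-ms 1'

and reverted the same way once the window around a failure report has been captured from /var/log/ceph/ceph-mon.storage1.log (the defaults are debug mon = 1 and debug ms = 0).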
Re: [ceph-users] running giant/hammer mds with firefly osds
On Fri, Feb 20, 2015 at 7:56 PM, Gregory Farnum g...@gregs42.com wrote: On Fri, Feb 20, 2015 at 3:50 AM, Luis Periquito periqu...@gmail.com wrote: Hi Dan, I remember http://tracker.ceph.com/issues/9945 introducing some issues with running cephfs between different versions of giant/firefly. https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14257.html Hmm, yeah, that's been fixed for a while but is still waiting to go out in the next point release. :( Beyond this bug, although the MDS doesn't have any new OSD dependencies that could break things, we don't test cross-version stuff like that at all except during upgrades. Some minimal testing on your side should be enough to make sure it works, but if I were you I'd try it on a test cluster first — the MDS is reporting a lot more to the monitors in Giant and Hammer than it did in Firefly, and everything should be good but there might be issues lurking in the compatibility checks there. -Greg Thanks Greg, I'll definitely keep this on a test instance. I'll report back if I find anything interesting... Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Power failure recovery woes (fwd)
You can try searching the archives and tracker.ceph.com for hints about repairing these issues, but your disk stores have definitely been corrupted and it's likely to be an adventure. I'd recommend examining your local storage stack underneath Ceph and figuring out which part was ignoring barriers. -Greg On Fri, Feb 20, 2015 at 10:39 AM, Jeff j...@usedmoviefinder.com wrote: Should I infer from the silence that there is no way to recover from the FAILED assert(last_e.version.version e.version.version) errors? Thanks, Jeff - Forwarded message from Jeff j...@usedmoviefinder.com - Date: Tue, 17 Feb 2015 09:16:33 -0500 From: Jeff j...@usedmoviefinder.com To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Power failure recovery woes Some additional information/questions: Here is the output of ceph osd tree Some of the down OSD's are actually running, but are down. For example osd.1: root 30158 8.6 12.7 1542860 781288 ? Ssl 07:47 4:40 /usr/bin/ceph-osd --cluster=ceph -i 0 -f Is there any way to get the cluster to recognize them as being up? osd-1 has the FAILED assert(last_e.version.version e.version.version) errors. Thanks, Jeff # idweight type name up/down reweight -1 10.22 root default -2 2.72host ceph1 0 0.91osd.0 up 1 1 0.91osd.1 down0 2 0.9 osd.2 down0 -3 1.82host ceph2 3 0.91osd.3 down0 4 0.91osd.4 down0 -4 2.04host ceph3 5 0.68osd.5 up 1 6 0.68osd.6 up 1 7 0.68osd.7 up 1 8 0.68osd.8 down0 -5 1.82host ceph4 9 0.91osd.9 up 1 10 0.91osd.10 down0 -6 1.82host ceph5 11 0.91osd.11 up 1 12 0.91osd.12 up 1 On 2/17/2015 8:28 AM, Jeff wrote: Original Message Subject: Re: [ceph-users] Power failure recovery woes Date: 2015-02-17 04:23 From: Udo Lembke ulem...@polarzone.de To: Jeff j...@usedmoviefinder.com, ceph-users@lists.ceph.com Hi Jeff, is the osd /var/lib/ceph/osd/ceph-2 mounted? If not, does it helps, if you mounted the osd and start with service ceph start osd.2 ?? Udo Am 17.02.2015 09:54, schrieb Jeff: Hi, We had a nasty power failure yesterday and even with UPS's our small (5 node, 12 OSD) cluster is having problems recovering. We are running ceph 0.87 3 of our OSD's are down consistently (others stop and are restartable, but our cluster is so slow that almost everything we do times out). We are seeing errors like this on the OSD's that never run: ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1) Operation not permitted We are seeing errors like these of the OSD's that run some of the time: osd/PGLog.cc: 844: FAILED assert(last_e.version.version e.version.version) common/HeartbeatMap.cc: 79: FAILED assert(0 == hit suicide timeout) Does anyone have any suggestions on how to recover our cluster? Thanks! Jeff ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com - End forwarded message - ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
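As a starting point for the barrier hunt Greg suggests, it is worth checking both the mount options and the on-disk write caches under the OSDs; a hedged sketch (the device name is a placeholder):

mount | grep /var/lib/ceph/osd     # look for nobarrier on XFS/ext4
hdparm -W /dev/sdb                 # 1 means the drive write cache is enabled

A RAID controller running in write-back mode without a working battery/flash backup is another common cause of this kind of corruption after a power loss.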
Re: [ceph-users] running giant/hammer mds with firefly osds
On Fri, Feb 20, 2015 at 3:50 AM, Luis Periquito periqu...@gmail.com wrote: Hi Dan, I remember http://tracker.ceph.com/issues/9945 introducing some issues with running cephfs between different versions of giant/firefly. https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14257.html Hmm, yeah, that's been fixed for a while but is still waiting to go out in the next point release. :( Beyond this bug, although the MDS doesn't have any new OSD dependencies that could break things, we don't test cross-version stuff like that at all except during upgrades. Some minimal testing on your side should be enough to make sure it works, but if I were you I'd try it on a test cluster first — the MDS is reporting a lot more to the monitors in Giant and Hammer than it did in Firefly, and everything should be good but there might be issues lurking in the compatibility checks there. -Greg So if you upgrade please be aware that you'll also have to update the clients. On Fri, Feb 20, 2015 at 10:33 AM, Dan van der Ster d...@vanderster.com wrote: Hi all, Back in the dumpling days, we were able to run the emperor MDS with dumpling OSDs -- this was an improvement over the dumpling MDS. Now we have stable firefly OSDs, but I was wondering if we can reap some of the recent CephFS developments by running a giant or ~hammer MDS with our firefly OSDs. Did anyone try that yet? Best Regards, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com