[ceph-users] OSD not marked as down or out

2015-02-20 Thread Sudarshan Pathak
Hello everyone,

I have a cluster running with OpenStack. It has 6 OSDs (3 in each of 2
different locations). Each pool has a replication size of 3, with 2 copies in
the primary location and 1 copy at the secondary location.

Everything is running as expected, but the OSDs are not marked as down when I
power off an OSD server. It has been around an hour.
I tried changing the heartbeat settings too.

Can someone point me in the right direction?

OSD 0 log
=
2015-02-20 16:20:14.009723 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:54.009720)
2015-02-20 16:20:15.009908 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:55.009907)
2015-02-20 16:20:16.010123 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:56.010119)
2015-02-20 16:20:16.648167 7f3fc9a76700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:56.648165)


Ceph monitor log

2015-02-20 16:49:16.831548 7f416e4aa700  1 mon.storage1@1(leader).osd e455
prepare_failure osd.2 192.168.100.33:6800/24431 from osd.4
192.168.100.35:6800/1305 is reporting failure:1
2015-02-20 16:49:16.831593 7f416e4aa700  0 log_channel(cluster) log [DBG] :
osd.2 192.168.100.33:6800/24431 reported failed by osd.4
192.168.100.35:6800/1305
2015-02-20 16:49:17.080314 7f416e4aa700  1 mon.storage1@1(leader).osd e455
prepare_failure osd.2 192.168.100.33:6800/24431 from osd.3
192.168.100.34:6800/1358 is reporting failure:1
2015-02-20 16:49:17.080527 7f416e4aa700  0 log_channel(cluster) log [DBG] :
osd.2 192.168.100.33:6800/24431 reported failed by osd.3
192.168.100.34:6800/1358
2015-02-20 16:49:17.420859 7f416e4aa700  1 mon.storage1@1(leader).osd e455
prepare_failure osd.2 192.168.100.33:6800/24431 from osd.5
192.168.100.36:6800/1359 is reporting failure:1


#ceph osd stat
 osdmap e455: 6 osds: 6 up, 6 in


#ceph -s
cluster c8a5975f-4c86-4cfe-a91b-fac9f3126afc
 health HEALTH_WARN 528 pgs peering; 528 pgs stuck inactive; 528 pgs
stuck unclean; 1 requests are blocked > 32 sec; 1 mons down, quorum 1,2,3,4
storage1,storage2,compute3,compute4
 monmap e1: 5 mons at {admin=
192.168.100.39:6789/0,compute3=192.168.100.133:6789/0,compute4=192.168.100.134:6789/0,storage1=192.168.100.120:6789/0,storage2=192.168.100.121:6789/0},
election epoch 132, quorum 1,2,3,4 storage1,storage2,compute3,compute4
 osdmap e455: 6 osds: 6 up, 6 in
  pgmap v48474: 3650 pgs, 19 pools, 27324 MB data, 4420 objects
82443 MB used, 2682 GB / 2763 GB avail
3122 active+clean
 528 remapped+peering



Ceph.conf file

[global]
fsid = c8a5975f-4c86-4cfe-a91b-fac9f3126afc
mon_initial_members = admin, storage1, storage2, compute3, compute4
mon_host =
192.168.100.39,192.168.100.120,192.168.100.121,192.168.100.133,192.168.100.134
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

osd pool default size = 3
osd pool default min size = 3

osd pool default pg num = 300
osd pool default pgp num = 300

public network = 192.168.100.0/24

rgw print continue = false
rgw enable ops log = false

mon osd report timeout = 60
mon osd down out interval = 30
mon osd min down reports = 2

osd heartbeat grace = 10
osd mon heartbeat interval = 20
osd mon report interval max = 60
osd mon ack timeout = 15

mon osd min down reports = 2
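
One way to confirm which of these values the running daemons actually picked up
is the admin socket (a sketch; run on the respective hosts, assuming the default
socket paths and the daemon names shown above):

# effective heartbeat settings of a running OSD
ceph daemon osd.0 config show | grep heartbeat
# effective down/out handling settings of a running monitor
ceph daemon mon.storage1 config show | grep mon_osd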


Regards,
Sudarshan Pathak
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] running giant/hammer mds with firefly osds

2015-02-20 Thread Dan van der Ster
Hi all,

Back in the dumpling days, we were able to run the emperor MDS with
dumpling OSDs -- this was an improvement over the dumpling MDS.

Now we have stable firefly OSDs, but I was wondering if we can reap
some of the recent CephFS developments by running a giant or ~hammer
MDS with our firefly OSDs. Did anyone try that yet?

Best Regards, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] running giant/hammer mds with firefly osds

2015-02-20 Thread Luis Periquito
Hi Dan,

I remember http://tracker.ceph.com/issues/9945 introducing some issues with
running cephfs between different versions of giant/firefly.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14257.html

So if you upgrade please be aware that you'll also have to update the
clients.

On Fri, Feb 20, 2015 at 10:33 AM, Dan van der Ster d...@vanderster.com
wrote:

 Hi all,

 Back in the dumpling days, we were able to run the emperor MDS with
 dumpling OSDs -- this was an improvement over the dumpling MDS.

 Now we have stable firefly OSDs, but I was wondering if we can reap
 some of the recent CephFS developments by running a giant or ~hammer
 MDS with our firefly OSDs. Did anyone try that yet?

 Best Regards, Dan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new ssd intel s3610, has somebody tested them ?

2015-02-20 Thread Dan van der Ster
Interesting, thanks for the link.
I hope the quality on the 3610/3710 is as good as the 3700... we
haven't yet seen a single failure in production.

Cheers, Dan



On Fri, Feb 20, 2015 at 8:06 AM, Alexandre DERUMIER aderum...@odiso.com wrote:
 Hi,

 Intel has just released new ssd s3610:

 http://www.anandtech.com/show/8954/intel-launches-ssd-dc-s3610-s3710-enterprise-ssds

 endurance is 10x higher than the S3500, for about 10% additional cost.

 Has somebody already tested them ?

 Regards,

 Alexandre
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new ssd intel s3610, has somebody tested them ?

2015-02-20 Thread Christian Balzer

Hello,

On Fri, 20 Feb 2015 09:30:56 +0100 Dan van der Ster wrote:

 Interesting, thanks for the link.

Interesting indeed, more for a non-Ceph project of mine, but still. ^o^

 I hope the quality on the 3610/3710 is as good as the 3700... we
 haven't yet seen a single failure in production.
 

Same here, same goes for the consumer models (5xx).
Those will naturally wear out faster, but none has failed so far and the
most I got to wear out some was down to 83% after 2 hours of uptime. ^^

Christian

 Cheers, Dan
 
 
 
 On Fri, Feb 20, 2015 at 8:06 AM, Alexandre DERUMIER
 aderum...@odiso.com wrote:
  Hi,
 
  Intel has just released new ssd s3610:
 
  http://www.anandtech.com/show/8954/intel-launches-ssd-dc-s3610-s3710-enterprise-ssds
 
  endurance is 10x higher than the S3500, for about 10% additional cost.
 
  Has somebody already tested them ?
 
  Regards,
 
  Alexandre
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unsubscribe

2015-02-20 Thread Konstantin Khatskevich

unsubscribe

--
Best regards,
Konstantin Khatskevich

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] erasure coded pool

2015-02-20 Thread Deneau, Tom
Is it possible to run an erasure coded pool using the default k=2, m=2 profile on a
single node?
(This is just for functionality testing.) The single node has 3 OSDs.
Replicated pools run fine.

ceph.conf does contain:
   osd crush chooseleaf type = 0


-- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and data locality?

2015-02-20 Thread Jake Kugel
Okay thanks for pointing me in the right direction.  From a quick read I 
think this will work but will take a look in detail.  Thanks!

Jake

On Tue, Feb 17, 2015 at 3:16 PM, Gregory Farnum wrote:

 On Tue, Feb 17, 2015 at 10:36 AM, Jake Kugel jkugel@... wrote:
  Hi,
 
  I'm just starting to look at Ceph and CephFS.  I see that Ceph supports
  dynamic object interfaces to allow some processing of object data on the
  same node where the data is stored [1].  This might be a naive question,
  but is there any way to get data locality when using CephFS? For example,
  somehow arrange for parts of the filesystem to reside on OSDs on same
  system using CephFS client?
 
 It's unrelated to the in-place RADOS class computation, but you can do
 some intelligent placement by having specialized CRUSH rules and
 making use of CephFS's data layouts. Check the docs! :)
 -Greg
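
 For example, file layouts can point a directory at a pool whose CRUSH rule
 targets the OSDs you want (a sketch; pool, directory, and mount point names are
 only examples, and the pool has to be registered as a CephFS data pool first):

 # create a pool governed by the CRUSH rule you prepared, add it to CephFS,
 # then set the layout of a directory so new files in it land in that pool
 ceph osd pool create localpool 128 128
 ceph mds add_data_pool localpool
 setfattr -n ceph.dir.layout.pool -v localpool /mnt/cephfs/localdir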

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow

2015-02-20 Thread Xu (Simon) Chen
Any update on this matter? I've been thinking of upgrading from 0.80.7 to
0.80.8 - lucky that I see this thread first...

On Thu, Feb 12, 2015 at 10:39 PM, 杨万元 yangwanyuan8...@gmail.com wrote:

 Thanks very much for your advice.
 Yes, as you said, disabling rbd_cache improves the read requests, but
 if I disable rbd_cache the randwrite requests get worse. So this
 method may not solve my problem, right?

 In addition, I also tested the 0.80.6 and 0.80.7 librbd; they perform as well
 as 0.80.5, so this problem most likely comes from 0.80.8.


 2015-02-12 19:33 GMT+08:00 Alexandre DERUMIER aderum...@odiso.com:

 Hi,
 Can you test with disabling rbd_cache ?

 I remember a bug detected in giant; not sure if it's also the case for
 firefly.

 This was the tracker:

 http://tracker.ceph.com/issues/9513

 But it has been solved and backported to firefly.

 Also, can you test 0.80.6 and 0.80.7 ?
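
 For the cache test, a minimal sketch of what to change on the client side
 (assuming the VM's librbd reads the standard ceph.conf, and the VM is restarted
 so the new setting takes effect):

 # in the client's ceph.conf -- turns the librbd cache off for this client
 [client]
 rbd cache = false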







 - Original Message -
 From: killingwolf killingw...@qq.com
 To: ceph-users ceph-users@lists.ceph.com
 Sent: Thursday, 12 February 2015 12:16:32
 Subject: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read
 request become too slow

 I have this problem too, help!

 -- Original Message --
 From: 杨万元 yangwanyuan8...@gmail.com
 Sent: Thursday, 12 February 2015, 11:14
 To: ceph-users ceph-users@lists.ceph.com
 Subject: [ceph-users] Upgrade 0.80.5 to 0.80.8 --the VM's read request become
 too slow

 Hello!
 We use Ceph + OpenStack in our private cloud. Recently we upgraded our
 CentOS 6.5 based cluster from Ceph Emperor to Ceph Firefly.
 At first we used the Red Hat EPEL yum repo to upgrade; that Ceph version is
 0.80.5. We first upgraded the monitors, then the OSDs, and last the clients.
 When we completed this upgrade, we booted a VM on the cluster, then used fio
 to test the io performance. The io performance was as good as before.
 Everything was ok!
 Then we upgraded the cluster from 0.80.5 to 0.80.8. When we completed that, we
 rebooted the VM to load the newest librbd. After that we also used fio to test
 the io performance, and we found that randwrite and write are as good as
 before, but randread and read have become worse: randread's iops dropped from
 4000-5000 to 300-400, and the latency is worse. The read bw dropped from
 400MB/s to 115MB/s. I then downgraded the ceph client version from 0.80.8 to
 0.80.5, and the result became normal again.
 So I think it may be something in librbd. I compared the 0.80.8
 release notes with 0.80.5 (
 http://ceph.com/docs/master/release-notes/#v0-80-8-firefly ), and the only
 change I found in 0.80.8 about read requests is: librbd: cap
 memory utilization for read requests (Jason Dillaman). Who can explain this?


 My ceph cluster is 400 OSDs, 5 mons:
 ceph -s
 health HEALTH_OK
 monmap e11: 5 mons at {BJ-M1-Cloud71=
 172.28.2.71:6789/0,BJ-M1-Cloud73=172.28.2.73:6789/0,BJ-M2-Cloud80=172.28.2.80:6789/0,BJ-M2-Cloud81=172.28.2.81:6789/0,BJ-M3-Cloud85=172.28.2.85:6789/0
 }, election epoch 198, quorum 0,1,2,3,4
 BJ-M1-Cloud71,BJ-M1-Cloud73,BJ-M2-Cloud80,BJ-M2-Cloud81,BJ-M3-Cloud85
 osdmap e120157: 400 osds: 400 up, 400 in
 pgmap v26161895: 29288 pgs, 6 pools, 20862 GB data, 3014 kobjects
 41084 GB used, 323 TB / 363 TB avail
 29288 active+clean
 client io 52640 kB/s rd, 32419 kB/s wr, 5193 op/s


 The following is my ceph client conf:
 [global]
 auth_service_required = cephx
 filestore_xattr_use_omap = true
 auth_client_required = cephx
 auth_cluster_required = cephx
 mon_host =
 172.29.204.24,172.29.204.48,172.29.204.55,172.29.204.58,172.29.204.73
 mon_initial_members = ZR-F5-Cloud24, ZR-F6-Cloud48, ZR-F7-Cloud55,
 ZR-F8-Cloud58, ZR-F9-Cloud73
 fsid = c01c8e28-304e-47a4-b876-cb93acc2e980
 mon osd full ratio = .85
 mon osd nearfull ratio = .75
 public network = 172.29.204.0/24
 mon warn on legacy crush tunables = false

 [osd]
 osd op threads = 12
 filestore journal writeahead = true
 filestore merge threshold = 40
 filestore split multiple = 8

 [client]
 rbd cache = true
 rbd cache writethrough until flush = false
 rbd cache size = 67108864
 rbd cache max dirty = 50331648
 rbd cache target dirty = 33554432

 [client.cinder]
 admin socket = /var/run/ceph/rbd-$pid.asok



 My VM has 8 cores and 16GB RAM; the fio scripts we use are:
 fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randread -size=60G
 -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
 fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=60G
 -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
 fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=read -size=60G
 -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
 fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=write -size=60G
 -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200

 The following are the io test results
 ceph client verison :0.80.5
 read: bw= 430MB
 write: bw=420MB
 randread: iops= 4875 latency=65ms
 randwrite: iops=6844 latency=46ms

 ceph client verison :0.80.8
 read: bw= 115MB
 write: bw=480MB
 randread: iops= 381 latency=83ms
 randwrite: 

Re: [ceph-users] initially conf calamari to know about my Ceph cluster(s)

2015-02-20 Thread Dan Mick
By the way, you may want to put these sorts of questions on
ceph-calam...@lists.ceph.com, which is specific to calamari.

On 02/16/2015 01:08 PM, Steffen Winther wrote:
 Steffen Winther ceph.user@... writes:
 
  Trying to figure out how to initially configure
  calamari clients to know about my
  Ceph Cluster(s) when they aren't installed through ceph-deploy
  but through Proxmox pveceph.

  I assume I possibly need to copy some client admin keys and
  configure my MON hosts somehow; any pointers to docs on this?
  :) Stupid me, must have been too tired after struggling with the build...

  It was just a question of finishing Karan's guide from step 5 and
  making my salt master and minions work, plus diamond.
  Now everything seems to be working;
  nice dashboard/workbench etc.
 
 
 Step 5 from
 http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html:
 
 5 Calamari would not be able to find the Ceph cluster
 and will ask to add a cluster, for this we need to add Ceph clients
 to dashboard by installing salt-minion and
 diamond packages on them.
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Calamari build in vagrants

2015-02-20 Thread Dan Mick
On 02/16/2015 12:57 PM, Steffen Winther wrote:
 Dan Mick dmick@... writes:
 

 0cbcfbaa791baa3ee25c4f1a135f005c1d568512 on the 1.2.3 branch has the
 change to yo 1.1.0.  I've just cherry-picked that to v1.3 and master.
 Do you mean that you merged 1.2.3 into master and branch 1.3?

I put just that specific commit onto v1.3 and master.  The branches may
need a little preening to be completely synced, but that commit should
be in all of them now.


 BTW I managed to clone and build branch 1.2.3 in my vagrant env.

\o/


-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cluster never reaching clean after osd out

2015-02-20 Thread Yves
I have a Cluster of 3 hosts, running giant on Debian wheezy and 
Backports Kernel 3.16.0-0.bpo.4-amd64.
For testing I did a 
~# ceph osd out 20
from a clean state.
Ceph starts rebalancing; watching ceph -w one sees the number of pgs stuck
unclean go up and then come down to about 11.
 
Shortly after that, the cluster stays stuck forever in this state:
health HEALTH_WARN 68 pgs stuck unclean; recovery 450/169647 objects 
degraded (0.265%); 3691/169647 objects misplaced (2.176%)
 
According to the documentation at 
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ the Cluster 
should reach a clean state after an osd out.
 
What am I doing wrong?
 
 
Below some config and command outputs:
 

~# ceph osd tree
# id    weight  type name       up/down reweight
-1      76.02   root default
-2      25.34           host ve51
0       3.62                    osd.0   up      1
3       3.62                    osd.3   up      1
6       3.62                    osd.6   up      1
9       3.62                    osd.9   up      1
12      3.62                    osd.12  up      1
15      3.62                    osd.15  up      1
18      3.62                    osd.18  up      1
-3      25.34           host ve52
1       3.62                    osd.1   up      1
4       3.62                    osd.4   up      1
7       3.62                    osd.7   up      1
10      3.62                    osd.10  up      1
13      3.62                    osd.13  up      1
16      3.62                    osd.16  up      1
19      3.62                    osd.19  up      1
-4      25.34           host ve53
2       3.62                    osd.2   up      1
5       3.62                    osd.5   up      1
8       3.62                    osd.8   up      1
11      3.62                    osd.11  up      1
14      3.62                    osd.14  up      1
17      3.62                    osd.17  up      1
20      3.62                    osd.20  up      1
==
~# cat ceph.conf
[global]
fsid = 80ebba06-34f5-49fc-8178-d6cc1d1c1196
public_network = 192.168.10.0/24
cluster_network = 192.168.10.0/24
mon_initial_members = ve51, ve52, ve53
mon_host = 192.168.10.51,192.168.10.52,192.168.10.53
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
mon_osd_down_out_subtree_limit = host
osd_pool_default_size=3
osd_pool_default_min_size=2
[osd]
osd_journal_size = 2
osd_mount_options_xfs = 
noatime,nodiratime,logbsize=256k,logbufs=8,inode64
==
~# ceph -s
cluster 80ebba06-34f5-49fc-8178-d6cc1d1c1196
 health HEALTH_OK
 monmap e1: 3 mons at 
{ve51=192.168.10.51:6789/0,ve52=192.168.10.52:6789/0,ve53=192.168.10.53:
6789/0}, election epoch 28, quorum 0,1,2 ve51,ve52,ve53
 osdmap e1353: 21 osds: 21 up, 21 in
  pgmap v16484: 2048 pgs, 2 pools, 219 GB data, 56549 objects
658 GB used, 77139 GB / 77797 GB avail
2048 active+clean
==
~# cat crushmap
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host ve51 {
id -2   # do not change unnecessarily
# weight 25.340
alg straw
hash 0  # rjenkins1
item osd.0 weight 3.620
item osd.3 weight 3.620
item osd.6 weight 3.620
item osd.9 weight 3.620
item osd.12 weight 3.620
item osd.15 weight 3.620
item osd.18 weight 3.620
}
host ve52 {
id -3   # do not change unnecessarily
# weight 25.340
alg straw
hash 0  # rjenkins1
item osd.1 weight 3.620
item osd.4 weight 3.620
item osd.7 weight 3.620
item osd.10 weight 3.620
item osd.13 weight 3.620
item osd.16 weight 3.620
item osd.19 weight 3.620
}
host ve53 {
id -4   # do not change unnecessarily
# weight 25.340
alg straw
hash 0  # rjenkins1
item osd.2 weight 3.620
item osd.5 weight 3.620
item osd.8 weight 3.620
item osd.11 weight 3.620
item osd.14 weight 3.620
item osd.17 weight 3.620
item osd.20 weight 3.620
}
root default {
id -1   # do not change unnecessarily
# weight 76.020
alg straw
hash 0  # rjenkins1
item ve51 weight 25.340
item ve52 weight 25.340

Re: [ceph-users] Fixing a crushmap

2015-02-20 Thread Kyle Hutson
Here was the process I went through.
1) I created an EC pool which created ruleset 1
2) I edited the crushmap to approximately its current form
3) I discovered my previous EC pool wasn't doing what I meant for it to do,
so I deleted it.
4) I created a new EC pool with the parameters I wanted and told it to use
ruleset 3
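
For what it's worth, one way to double-check what the new pool actually got (a
sketch; the pool name is a placeholder):

# list the compiled CRUSH rules and their ruleset ids
ceph osd crush rule dump
# confirm which ruleset the new pool is using
ceph osd pool get <poolname> crush_ruleset
# list the PGs that are stuck (creating PGs show up as inactive)
ceph pg dump_stuck inactive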

On Fri, Feb 20, 2015 at 10:55 AM, Luis Periquito periqu...@gmail.com
wrote:

 The process of creating an erasure coded pool and a replicated one is
 slightly different. You can use Sebastian's guide to create/manage the osd
 tree, but you should follow this guide
 http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool.

 I'm not sure (i.e. I never tried) to create a EC pool the way you did. The
 normal replicated ones do work like this.

 On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote:

 I manually edited my crushmap, basing my changes on
 http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
 I have SSDs and HDDs in the same box and was wanting to separate them by
 ruleset. My current crushmap can be seen at http://pastie.org/9966238

 I had it installed and everything looked good... until I created a new
 pool. All of the new pgs are stuck in creating. I first tried creating an
 erasure-coded pool using ruleset 3, then created another pool using ruleset
 0. Same result.

 I'm not opposed to an 'RTFM' answer, so long as you can point me to the
 right one. I've seen very little documentation on crushmap rules, in
 particular.

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fixing a crushmap

2015-02-20 Thread Kyle Hutson
Oh, and I don't yet have any important data here, so I'm not worried about
losing anything at this point. I just need to get my cluster happy again so
I can play with it some more.

On Fri, Feb 20, 2015 at 11:00 AM, Kyle Hutson kylehut...@ksu.edu wrote:

 Here was the process I went through.
 1) I created an EC pool which created ruleset 1
 2) I edited the crushmap to approximately its current form
 3) I discovered my previous EC pool wasn't doing what I meant for it to
 do, so I deleted it.
 4) I created a new EC pool with the parameters I wanted and told it to use
 ruleset 3

 On Fri, Feb 20, 2015 at 10:55 AM, Luis Periquito periqu...@gmail.com
 wrote:

 The process of creating an erasure coded pool and a replicated one is
 slightly different. You can use Sebastian's guide to create/manage the osd
 tree, but you should follow this guide
 http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool.

 I'm not sure (i.e. I never tried) to create a EC pool the way you did.
 The normal replicated ones do work like this.

 On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote:

 I manually edited my crushmap, basing my changes on
 http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
 I have SSDs and HDDs in the same box and was wanting to separate them by
 ruleset. My current crushmap can be seen at http://pastie.org/9966238

  I had it installed and everything looked good... until I created a new
 pool. All of the new pgs are stuck in creating. I first tried creating an
 erasure-coded pool using ruleset 3, then created another pool using ruleset
 0. Same result.

 I'm not opposed to an 'RTFM' answer, so long as you can point me to the
 right one. I've seen very little documentation on crushmap rules, in
 particular.

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd pegging CPU on giant, no snapshots involved this time

2015-02-20 Thread Mark Nelson

On 02/19/2015 10:56 AM, Florian Haas wrote:

On Wed, Feb 18, 2015 at 10:27 PM, Florian Haas flor...@hastexo.com wrote:

On Wed, Feb 18, 2015 at 9:32 PM, Mark Nelson mnel...@redhat.com wrote:

On 02/18/2015 02:19 PM, Florian Haas wrote:


Hey everyone,

I must confess I'm still not fully understanding this problem and
don't exactly know where to start digging deeper, but perhaps other
users have seen this and/or it rings a bell.

System info: Ceph giant on CentOS 7; approx. 240 OSDs, 6 pools using 2
different rulesets where the problem applies to hosts and PGs using a
bog-standard default crushmap.

Symptom: out of the blue, ceph-osd processes on a single OSD node
start going to 100% CPU utilization. The problems turns so bad that
the machine is effectively becoming CPU bound and can't cope with any
client requests anymore. Stopping and restarting all OSDs brings the
problem right back, as does rebooting the machine — right after
ceph-osd processes start, CPU utilization shoots up again. Stopping
and marking out several OSDs on the machine makes the problem go away
but obviously causes massive backfilling. All the logs show while CPU
utilization is implausibly high are slow requests (which would be
expected in a system that can barely do anything).

Now I've seen issues like this before on dumpling and firefly, but
besides the fact that they have all been addressed and should now be
fixed, they always involved the prior mass removal of RBD snapshots.
This system only used a handful of snapshots in testing, and is
presently not using any snapshots at all.

I'll be spending some time looking for clues in the log files of the
OSDs that were shut down which caused the problem to go away, but if
this sounds familiar to anyone willing to offer clues, I'd be more
than interested. :) Thanks!



Hi Florian,

Does a quick perf top tell you anything useful?


Hi Mark,

Unfortunately, quite the contrary -- but this might actually provide a
clue to the underlying issue.

So the CPU pegging issue isn't currently present, so the perf top data
wouldn't be conclusive until the issue is reproduced. But: merely
running perf top on this host, which currently only has 2 active OSDs,
renders the host unresponsive.

Corresponding dmesg snippet:

[Wed Feb 18 20:53:42 2015] hrtimer: interrupt took 2243820 ns
[Wed Feb 18 20:53:49 2015] [ cut here ]
[Wed Feb 18 20:53:49 2015] WARNING: at
arch/x86/kernel/cpu/perf_event.c:1074 x86_pmu_start+0xc6/0x100()
[Wed Feb 18 20:53:49 2015] Modules linked in: ipmi_si binfmt_misc
mpt3sas mptctl mptbase dell_rbu 8021q garp stp mrp llc sg ipt_REJECT
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
iptable_filter ip_tables xfs vfat fat iTCO_w
dt iTCO_vendor_support dcdbas coretemp kvm_intel kvm crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul
glue_helper ablk_helper cryptd pcspkr sb_edac edac_core lpc_ich
mfd_core mei_me mei ipmi_devintf
shpchp wmi ipmi_msghandler acpi_power_meter acpi_cpufreq mperf nfsd
auth_rpcgss nfs_acl lockd sunrpc ext4 mbcache jbd2 raid1 sd_mod
crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ttm bnx2
x drm mpt2sas i2c_core raid_class mdio scsi_transport_sas libcrc32c
dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_si]

[Wed Feb 18 20:53:49 2015] CPU: 0 PID: 12381 Comm: dsm_sa_datamgrd Not
tainted 3.10.0-123.20.1.el7.x86_64 #1
[Wed Feb 18 20:53:49 2015] Hardware name: Dell Inc. PowerEdge
R720xd/0020HJ, BIOS 2.2.2 01/16/2014
[Wed Feb 18 20:53:49 2015]  50de8931
880fef003d40 815e2b0c
[Wed Feb 18 20:53:49 2015] 880fef003d78 8105dee1
880c316a7400 880fef00b9e0
[Wed Feb 18 20:53:49 2015]  880fef016db0
880dbaa896c0 880fef003d88
[Wed Feb 18 20:53:49 2015] Call Trace:
[Wed Feb 18 20:53:49 2015]  IRQ  [815e2b0c] dump_stack+0x19/0x1b
[Wed Feb 18 20:53:49 2015] [8105dee1] warn_slowpath_common+0x61/0x80
[Wed Feb 18 20:53:49 2015] [8105e00a] warn_slowpath_null+0x1a/0x20
[Wed Feb 18 20:53:49 2015] [81023706] x86_pmu_start+0xc6/0x100
[Wed Feb 18 20:53:49 2015] [81136128]
perf_adjust_freq_unthr_context.part.79+0x198/0x1b0
[Wed Feb 18 20:53:49 2015] [811363d6] perf_event_task_tick+0xb6/0xf0
[Wed Feb 18 20:53:49 2015] [810967e5] scheduler_tick+0xd5/0x150
[Wed Feb 18 20:53:49 2015] [8106fe86] update_process_times+0x66/0x80
[Wed Feb 18 20:53:49 2015] [810be055]
tick_sched_handle.isra.16+0x25/0x60
[Wed Feb 18 20:53:49 2015] [810be0d1] tick_sched_timer+0x41/0x60
[Wed Feb 18 20:53:49 2015] [81089a57] __run_hrtimer+0x77/0x1d0
[Wed Feb 18 20:53:49 2015] [810be090] ?
tick_sched_handle.isra.16+0x60/0x60
[Wed Feb 18 20:53:49 2015] [8108a297] hrtimer_interrupt+0xf7/0x240
[Wed Feb 18 20:53:49 2015] [81039717]
local_apic_timer_interrupt+0x37/0x60
[Wed Feb 18 20:53:49 2015] 

[ceph-users] Fixing a crushmap

2015-02-20 Thread Kyle Hutson
I manually edited my crushmap, basing my changes on
http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
I have SSDs and HDDs in the same box and was wanting to separate them by
ruleset. My current crushmap can be seen at http://pastie.org/9966238

I had it installed and everything looked good... until I created a new
pool. All of the new pgs are stuck in creating. I first tried creating an
erasure-coded pool using ruleset 3, then created another pool using ruleset
0. Same result.

I'm not opposed to an 'RTFM' answer, so long as you can point me to the
right one. I've seen very little documentation on crushmap rules, in
particular.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fixing a crushmap

2015-02-20 Thread Luis Periquito
The process of creating an erasure coded pool and a replicated one is
slightly different. You can use Sebastian's guide to create/manage the osd
tree, but you should follow this guide
http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool.

I'm not sure about (i.e. I never tried) creating an EC pool the way you did. The
normal replicated ones do work like this.
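
Roughly, the EC workflow from that guide looks like this (a sketch; the profile
and pool names are only examples):

# define an erasure-code profile, then create the pool from it; the EC pool
# gets a CRUSH ruleset generated from the profile rather than a hand-edited one
ceph osd erasure-code-profile set myprofile k=2 m=2 ruleset-failure-domain=host
ceph osd pool create ecpool 128 128 erasure myprofile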

On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote:

 I manually edited my crushmap, basing my changes on
 http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
 I have SSDs and HDDs in the same box and was wanting to separate them by
 ruleset. My current crushmap can be seen at http://pastie.org/9966238

 I had it installed and everything looked good... until I created a new
 pool. All of the new pgs are stuck in creating. I first tried creating an
 erasure-coded pool using ruleset 3, then created another pool using ruleset
 0. Same result.

 I'm not opposed to an 'RTFM' answer, so long as you can point me to the
 right one. I've seen very little documentation on crushmap rules, in
 particular.

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] erasure coded pool

2015-02-20 Thread Loic Dachary
Hi Tom,

On 20/02/2015 22:59, Deneau, Tom wrote:
 Is it possible to run an erasure coded pool using default k=2, m=2 profile on 
 a single node?
 (this is just for functionality testing). The single node has 3 OSDs. 
 Replicated pools run fine.

For k=2 m=2 to work you need four (k+m) OSDs. As long as the crush rule allows
it, you can have them on the same host.
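
A minimal sketch of that (names are only examples; this assumes at least four
OSDs exist on the node):

# profile that distributes the k+m=4 chunks across OSDs instead of hosts,
# so a single node with 4 or more OSDs can satisfy the rule
ceph osd erasure-code-profile set testprofile k=2 m=2 ruleset-failure-domain=osd
ceph osd pool create ecpool 12 12 erasure testprofile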

Cheers

 
 ceph.conf does contain:
osd crush chooseleaf type = 0
 
 
 -- Tom Deneau
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power failure recovery woes (fwd)

2015-02-20 Thread Jeff
Should I infer from the silence that there is no way to recover from the

FAILED assert(last_e.version.version < e.version.version) errors?

Thanks,
Jeff

- Forwarded message from Jeff j...@usedmoviefinder.com -

Date: Tue, 17 Feb 2015 09:16:33 -0500
From: Jeff j...@usedmoviefinder.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Power failure recovery woes

Some additional information/questions:

Here is the output of ceph osd tree

Some of the down OSDs actually have a running process, but are marked down. For
example osd.1:

root 30158  8.6 12.7 1542860 781288 ?  Ssl 07:47   4:40
/usr/bin/ceph-osd --cluster=ceph -i 0 -f

 Is there any way to get the cluster to recognize them as being up?  osd-1 has
the FAILED assert(last_e.version.version < e.version.version) errors.

Thanks,
 Jeff


# id    weight  type name       up/down reweight
-1      10.22   root default
-2      2.72            host ceph1
0       0.91                    osd.0   up      1
1       0.91                    osd.1   down    0
2       0.9                     osd.2   down    0
-3      1.82            host ceph2
3       0.91                    osd.3   down    0
4       0.91                    osd.4   down    0
-4      2.04            host ceph3
5       0.68                    osd.5   up      1
6       0.68                    osd.6   up      1
7       0.68                    osd.7   up      1
8       0.68                    osd.8   down    0
-5      1.82            host ceph4
9       0.91                    osd.9   up      1
10      0.91                    osd.10  down    0
-6      1.82            host ceph5
11      0.91                    osd.11  up      1
12      0.91                    osd.12  up      1

On 2/17/2015 8:28 AM, Jeff wrote:
 
 
  Original Message 
 Subject: Re: [ceph-users] Power failure recovery woes
 Date: 2015-02-17 04:23
 From: Udo Lembke ulem...@polarzone.de
 To: Jeff j...@usedmoviefinder.com, ceph-users@lists.ceph.com
 
 Hi Jeff,
 is the osd /var/lib/ceph/osd/ceph-2 mounted?
 
 If not, does it helps, if you mounted the osd and start with
 service ceph start osd.2
 ??
 
 Udo
 
 Am 17.02.2015 09:54, schrieb Jeff:
 Hi,
 
 We had a nasty power failure yesterday and even with UPS's our small (5
 node, 12 OSD) cluster is having problems recovering.
 
 We are running ceph 0.87
 
 3 of our OSD's are down consistently (others stop and are restartable,
 but our cluster is so slow that almost everything we do times out).
 
 We are seeing errors like this on the OSD's that never run:
 
 ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
 Operation not permitted
 
 We are seeing errors like these of the OSD's that run some of the time:
 
  osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
  e.version.version)
  common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
  timeout")
 
 Does anyone have any suggestions on how to recover our cluster?
 
 Thanks!
   Jeff
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

- End forwarded message -

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minor version difference between monitors and OSDs

2015-02-20 Thread Gregory Farnum
On Thu, Feb 19, 2015 at 8:30 PM, Christian Balzer ch...@gol.com wrote:

 Hello,

 I have a cluster currently at 0.80.1 and would like to upgrade it to
 0.80.7 (Debian as you can guess), but for a number of reasons I can't
 really do it all at the same time.

 In particular I would like to upgrade the primary monitor node first and
 the secondary ones as well as the OSDs later.

 Now my understanding and hope is that unless I change the config to add
 features that aren't present in 0.80.1, things should work just fine,
 especially given the main release note blurb about 0.80.7:

I don't think we test upgrades between that particular combination of
versions, but as a matter of policy there shouldn't be any issues
between point releases.

The release note is referring to the issue described at
http://tracker.ceph.com/issues/9419, which is indeed for pre-Firefly
to Firefly upgrades. :)
-Greg
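
For reference, a quick way to confirm what each daemon is actually running
during a partial upgrade like this (a sketch; the daemon ids are examples, and
the mon id is assumed to be the short hostname):

# version of the locally installed packages
ceph --version
# version reported by a running OSD / monitor
ceph tell osd.0 version
ceph daemon mon.$(hostname -s) version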
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD not marked as down or out

2015-02-20 Thread Gregory Farnum
That's pretty strange, especially since the monitor is getting the
failure reports. What version are you running? Can you bump up the
monitor debugging and provide its output from around that time?
-Greg
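
A sketch of how the monitor debugging could be raised temporarily (using the
leader from the log above; revert the values afterwards):

# raise debug levels on the leader without restarting it
ceph tell mon.storage1 injectargs '--debug-mon 10 --debug-ms 1'
# the extra output lands in the monitor log, by default
# /var/log/ceph/ceph-mon.storage1.log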

On Fri, Feb 20, 2015 at 3:26 AM, Sudarshan Pathak sushan@gmail.com wrote:
 Hello everyone,

 I have a cluster running with OpenStack. It has 6 OSD (3 in each 2 different
 locations). Each pool has 3 replication size with 2 copy in primary location
 and 1 copy at secondary location.

 Everything is running as expected but the osd are not marked as down when I
 poweroff a OSD server. It has been around an hour.
 I tried changing the heartbeat settings too.

 Can someone point me in right direction.

 OSD 0 log
 =
 2015-02-20 16:20:14.009723 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
 reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
 16:15:54.607854 (cutoff 2015-02-20 16:19:54.009720)
 2015-02-20 16:20:15.009908 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
 reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
 16:15:54.607854 (cutoff 2015-02-20 16:19:55.009907)
 2015-02-20 16:20:16.010123 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
 reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
 16:15:54.607854 (cutoff 2015-02-20 16:19:56.010119)
 2015-02-20 16:20:16.648167 7f3fc9a76700 -1 osd.0 451 heartbeat_check: no
 reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
 16:15:54.607854 (cutoff 2015-02-20 16:19:56.648165)


 Ceph monitor log
 
 2015-02-20 16:49:16.831548 7f416e4aa700  1 mon.storage1@1(leader).osd e455
 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.4
 192.168.100.35:6800/1305 is reporting failure:1
 2015-02-20 16:49:16.831593 7f416e4aa700  0 log_channel(cluster) log [DBG] :
 osd.2 192.168.100.33:6800/24431 reported failed by osd.4
 192.168.100.35:6800/1305
 2015-02-20 16:49:17.080314 7f416e4aa700  1 mon.storage1@1(leader).osd e455
 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.3
 192.168.100.34:6800/1358 is reporting failure:1
 2015-02-20 16:49:17.080527 7f416e4aa700  0 log_channel(cluster) log [DBG] :
 osd.2 192.168.100.33:6800/24431 reported failed by osd.3
 192.168.100.34:6800/1358
 2015-02-20 16:49:17.420859 7f416e4aa700  1 mon.storage1@1(leader).osd e455
 prepare_failure osd.2 192.168.100.33:6800/24431 from osd.5
 192.168.100.36:6800/1359 is reporting failure:1


 #ceph osd stat
  osdmap e455: 6 osds: 6 up, 6 in


 #ceph -s
 cluster c8a5975f-4c86-4cfe-a91b-fac9f3126afc
  health HEALTH_WARN 528 pgs peering; 528 pgs stuck inactive; 528 pgs
 stuck unclean; 1 requests are blocked > 32 sec; 1 mons down, quorum 1,2,3,4
 storage1,storage2,compute3,compute4
  monmap e1: 5 mons at
 {admin=192.168.100.39:6789/0,compute3=192.168.100.133:6789/0,compute4=192.168.100.134:6789/0,storage1=192.168.100.120:6789/0,storage2=192.168.100.121:6789/0},
 election epoch 132, quorum 1,2,3,4 storage1,storage2,compute3,compute4
  osdmap e455: 6 osds: 6 up, 6 in
   pgmap v48474: 3650 pgs, 19 pools, 27324 MB data, 4420 objects
 82443 MB used, 2682 GB / 2763 GB avail
 3122 active+clean
  528 remapped+peering



 Ceph.conf file

 [global]
 fsid = c8a5975f-4c86-4cfe-a91b-fac9f3126afc
 mon_initial_members = admin, storage1, storage2, compute3, compute4
 mon_host =
 192.168.100.39,192.168.100.120,192.168.100.121,192.168.100.133,192.168.100.134
 auth_cluster_required = cephx
 auth_service_required = cephx
 auth_client_required = cephx
 filestore_xattr_use_omap = true

 osd pool default size = 3
 osd pool default min size = 3

 osd pool default pg num = 300
 osd pool default pgp num = 300

 public network = 192.168.100.0/24

 rgw print continue = false
 rgw enable ops log = false

 mon osd report timeout = 60
 mon osd down out interval = 30
 mon osd min down reports = 2

 osd heartbeat grace = 10
 osd mon heartbeat interval = 20
 osd mon report interval max = 60
 osd mon ack timeout = 15

 mon osd min down reports = 2


 Regards,
 Sudarshan Pathak

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] running giant/hammer mds with firefly osds

2015-02-20 Thread Dan van der Ster
On Fri, Feb 20, 2015 at 7:56 PM, Gregory Farnum g...@gregs42.com wrote:
 On Fri, Feb 20, 2015 at 3:50 AM, Luis Periquito periqu...@gmail.com wrote:
 Hi Dan,

 I remember http://tracker.ceph.com/issues/9945 introducing some issues with
 running cephfs between different versions of giant/firefly.

 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14257.html

 Hmm, yeah, that's been fixed for a while but is still waiting to go
 out in the next point release. :(

 Beyond this bug, although the MDS doesn't have any new OSD
 dependencies that could break things, we don't test cross-version
 stuff like that at all except during upgrades. Some minimal testing on
 your side should be enough to make sure it works, but if I were you
 I'd try it on a test cluster first — the MDS is reporting a lot more
 to the monitors in Giant and Hammer than it did in Firefly, and
 everything should be good but there might be issues lurking in the
 compatibility checks there.
 -Greg

Thanks Greg, I'll definitely keep this on a test instance. I'll report
back if I find anything interesting...
Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power failure recovery woes (fwd)

2015-02-20 Thread Gregory Farnum
You can try searching the archives and tracker.ceph.com for hints
about repairing these issues, but your disk stores have definitely
been corrupted and it's likely to be an adventure. I'd recommend
examining your local storage stack underneath Ceph and figuring out
which part was ignoring barriers.
-Greg
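
A couple of quick checks along those lines (a sketch; device names are
placeholders, and RAID controller cache settings have to be checked with the
vendor's tool):

# is the volatile write cache enabled on the data disks?
hdparm -W /dev/sdX
# were any of the OSD filesystems mounted with barriers disabled?
mount | grep nobarrier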

On Fri, Feb 20, 2015 at 10:39 AM, Jeff j...@usedmoviefinder.com wrote:
 Should I infer from the silence that there is no way to recover from the

 FAILED assert(last_e.version.version < e.version.version) errors?

 Thanks,
 Jeff

 - Forwarded message from Jeff j...@usedmoviefinder.com -

 Date: Tue, 17 Feb 2015 09:16:33 -0500
 From: Jeff j...@usedmoviefinder.com
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Power failure recovery woes

 Some additional information/questions:

 Here is the output of ceph osd tree

 Some of the down OSD's are actually running, but are down. For example
 osd.1:

 root 30158  8.6 12.7 1542860 781288 ?  Ssl 07:47   4:40
 /usr/bin/ceph-osd --cluster=ceph -i 0 -f

  Is there any way to get the cluster to recognize them as being up?  osd-1 has
 the FAILED assert(last_e.version.version < e.version.version) errors.

 Thanks,
  Jeff


 # id    weight  type name       up/down reweight
 -1      10.22   root default
 -2      2.72            host ceph1
 0       0.91                    osd.0   up      1
 1       0.91                    osd.1   down    0
 2       0.9                     osd.2   down    0
 -3      1.82            host ceph2
 3       0.91                    osd.3   down    0
 4       0.91                    osd.4   down    0
 -4      2.04            host ceph3
 5       0.68                    osd.5   up      1
 6       0.68                    osd.6   up      1
 7       0.68                    osd.7   up      1
 8       0.68                    osd.8   down    0
 -5      1.82            host ceph4
 9       0.91                    osd.9   up      1
 10      0.91                    osd.10  down    0
 -6      1.82            host ceph5
 11      0.91                    osd.11  up      1
 12      0.91                    osd.12  up      1

 On 2/17/2015 8:28 AM, Jeff wrote:


  Original Message 
 Subject: Re: [ceph-users] Power failure recovery woes
 Date: 2015-02-17 04:23
 From: Udo Lembke ulem...@polarzone.de
 To: Jeff j...@usedmoviefinder.com, ceph-users@lists.ceph.com

 Hi Jeff,
 is the osd /var/lib/ceph/osd/ceph-2 mounted?

 If not, does it helps, if you mounted the osd and start with
 service ceph start osd.2
 ??

 Udo

 Am 17.02.2015 09:54, schrieb Jeff:
 Hi,

 We had a nasty power failure yesterday and even with UPS's our small (5
 node, 12 OSD) cluster is having problems recovering.

 We are running ceph 0.87

 3 of our OSD's are down consistently (others stop and are restartable,
 but our cluster is so slow that almost everything we do times out).

 We are seeing errors like this on the OSD's that never run:

 ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
 Operation not permitted

 We are seeing errors like these of the OSD's that run some of the time:

  osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
  e.version.version)
  common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
  timeout")

 Does anyone have any suggestions on how to recover our cluster?

 Thanks!
   Jeff


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 - End forwarded message -

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cluster never reaching clean after osd out

2015-02-20 Thread Yves Kretzschmar
I have a Cluster of 3 hosts, running Debian wheezy and Backports Kernel 
3.16.0-0.bpo.4-amd64.
For testing I did a 
~# ceph osd out 20
from a clean state.
Ceph starts rebalancing; watching ceph -w one sees the number of pgs stuck unclean
go up and then come down to about 11.
 
Shortly after that, the cluster stays stuck forever in this state:
health HEALTH_WARN 68 pgs stuck unclean; recovery 450/169647 objects degraded 
(0.265%); 3691/169647 objects misplaced (2.176%)
 
According to the documentation at 
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ the Cluster should 
reach a clean state after an osd out.
 
What am I doing wrong?
 
 
Below some config and command outputs:
 

~# ceph osd tree
# id    weight  type name       up/down reweight
-1      76.02   root default
-2      25.34           host ve51
0       3.62                    osd.0   up      1
3       3.62                    osd.3   up      1
6       3.62                    osd.6   up      1
9       3.62                    osd.9   up      1
12      3.62                    osd.12  up      1
15      3.62                    osd.15  up      1
18      3.62                    osd.18  up      1
-3      25.34           host ve52
1       3.62                    osd.1   up      1
4       3.62                    osd.4   up      1
7       3.62                    osd.7   up      1
10      3.62                    osd.10  up      1
13      3.62                    osd.13  up      1
16      3.62                    osd.16  up      1
19      3.62                    osd.19  up      1
-4      25.34           host ve53
2       3.62                    osd.2   up      1
5       3.62                    osd.5   up      1
8       3.62                    osd.8   up      1
11      3.62                    osd.11  up      1
14      3.62                    osd.14  up      1
17      3.62                    osd.17  up      1
20      3.62                    osd.20  up      1
==
~# cat ceph.conf
[global]
fsid = 80ebba06-34f5-49fc-8178-d6cc1d1c1196
public_network = 192.168.10.0/24
cluster_network = 192.168.10.0/24
mon_initial_members = ve51, ve52, ve53
mon_host = 192.168.10.51,192.168.10.52,192.168.10.53
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
mon_osd_down_out_subtree_limit = host
osd_pool_default_size=3
osd_pool_default_min_size=2
[osd]
osd_journal_size = 2
osd_mount_options_xfs = noatime,nodiratime,logbsize=256k,logbufs=8,inode64
==
~# ceph -s
    cluster 80ebba06-34f5-49fc-8178-d6cc1d1c1196
     health HEALTH_OK
     monmap e1: 3 mons at 
{ve51=192.168.10.51:6789/0,ve52=192.168.10.52:6789/0,ve53=192.168.10.53:6789/0},
 election epoch 28, quorum 0,1,2 ve51,ve52,ve53
     osdmap e1353: 21 osds: 21 up, 21 in
      pgmap v16484: 2048 pgs, 2 pools, 219 GB data, 56549 objects
            658 GB used, 77139 GB / 77797 GB avail
                2048 active+clean
==                
~# cat crushmap
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host ve51 {
        id -2           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.620
        item osd.3 weight 3.620
        item osd.6 weight 3.620
        item osd.9 weight 3.620
        item osd.12 weight 3.620
        item osd.15 weight 3.620
        item osd.18 weight 3.620
}
host ve52 {
        id -3           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 3.620
        item osd.4 weight 3.620
        item osd.7 weight 3.620
        item osd.10 weight 3.620
        item osd.13 weight 3.620
        item osd.16 weight 3.620
        item osd.19 weight 3.620
}
host ve53 {
        id -4           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 3.620
        item osd.5 weight 3.620
        item osd.8 weight 3.620
        item osd.11 weight 3.620
        item osd.14 weight 3.620
        item osd.17 weight 3.620
        item osd.20 weight 3.620
}
root default {
        id -1           # do not change unnecessarily
        # weight 76.020
        alg straw
        hash 0  # rjenkins1
        item ve51 weight 25.340
        item ve52 weight 25.340
        item 

Re: [ceph-users] running giant/hammer mds with firefly osds

2015-02-20 Thread Gregory Farnum
On Fri, Feb 20, 2015 at 3:50 AM, Luis Periquito periqu...@gmail.com wrote:
 Hi Dan,

 I remember http://tracker.ceph.com/issues/9945 introducing some issues with
 running cephfs between different versions of giant/firefly.

 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14257.html

Hmm, yeah, that's been fixed for a while but is still waiting to go
out in the next point release. :(

Beyond this bug, although the MDS doesn't have any new OSD
dependencies that could break things, we don't test cross-version
stuff like that at all except during upgrades. Some minimal testing on
your side should be enough to make sure it works, but if I were you
I'd try it on a test cluster first — the MDS is reporting a lot more
to the monitors in Giant and Hammer than it did in Firefly, and
everything should be good but there might be issues lurking in the
compatibility checks there.
-Greg


 So if you upgrade please be aware that you'll also have to update the
 clients.

 On Fri, Feb 20, 2015 at 10:33 AM, Dan van der Ster d...@vanderster.com
 wrote:

 Hi all,

 Back in the dumpling days, we were able to run the emperor MDS with
 dumpling OSDs -- this was an improvement over the dumpling MDS.

 Now we have stable firefly OSDs, but I was wondering if we can reap
 some of the recent CephFS developments by running a giant or ~hammer
 MDS with our firefly OSDs. Did anyone try that yet?

 Best Regards, Dan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cluster never reaching clean after osd out

2015-02-20 Thread Yves Kretzschmar
I have a Cluster of 3 hosts, running Debian wheezy and Backports Kernel 3.16.0-0.bpo.4-amd64.

For testing I did a

~# ceph osd out 20

from a clean state.

Ceph starts rebalancing; watching ceph -w one sees the number of pgs stuck unclean go up and then come down to about 11.



Shortly after that, the cluster stays stuck forever in this state:

health HEALTH_WARN 68 pgs stuck unclean; recovery 450/169647 objects degraded (0.265%); 3691/169647 objects misplaced (2.176%)



According to the documentation at http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ the Cluster should reach a clean state after an osd out.



What am I doing wrong?





Below some config and command outputs:





~# ceph osd tree
# id    weight  type name       up/down reweight
-1      76.02   root default
-2      25.34           host ve51
0       3.62                    osd.0   up      1
3       3.62                    osd.3   up      1
6       3.62                    osd.6   up      1
9       3.62                    osd.9   up      1
12      3.62                    osd.12  up      1
15      3.62                    osd.15  up      1
18      3.62                    osd.18  up      1
-3      25.34           host ve52
1       3.62                    osd.1   up      1
4       3.62                    osd.4   up      1
7       3.62                    osd.7   up      1
10      3.62                    osd.10  up      1
13      3.62                    osd.13  up      1
16      3.62                    osd.16  up      1
19      3.62                    osd.19  up      1
-4      25.34           host ve53
2       3.62                    osd.2   up      1
5       3.62                    osd.5   up      1
8       3.62                    osd.8   up      1
11      3.62                    osd.11  up      1
14      3.62                    osd.14  up      1
17      3.62                    osd.17  up      1
20      3.62                    osd.20  up      1
==
~# cat ceph.conf
[global]
fsid = 80ebba06-34f5-49fc-8178-d6cc1d1c1196
public_network = 192.168.10.0/24
cluster_network = 192.168.10.0/24
mon_initial_members = ve51, ve52, ve53
mon_host = 192.168.10.51,192.168.10.52,192.168.10.53
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
mon_osd_down_out_subtree_limit = host
osd_pool_default_size=3
osd_pool_default_min_size=2

[osd]
osd_journal_size = 2
osd_mount_options_xfs = noatime,nodiratime,logbsize=256k,logbufs=8,inode64
==
~# ceph -s
  cluster 80ebba06-34f5-49fc-8178-d6cc1d1c1196
  health HEALTH_OK
  monmap e1: 3 mons at {ve51=192.168.10.51:6789/0,ve52=192.168.10.52:6789/0,ve53=192.168.10.53:6789/0}, election epoch 28, quorum 0,1,2 ve51,ve52,ve53
  osdmap e1353: 21 osds: 21 up, 21 in
   pgmap v16484: 2048 pgs, 2 pools, 219 GB data, 56549 objects
  658 GB used, 77139 GB / 77797 GB avail
2048 active+clean
==
~# cat crushmap
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ve51 {
id -2  # do not change unnecessarily
# weight 25.340
alg straw
hash 0 # rjenkins1
item osd.0 weight 3.620
item osd.3 weight 3.620
item osd.6 weight 3.620
item osd.9 weight 3.620
item osd.12 weight 3.620
item osd.15 weight 3.620
item osd.18 weight 3.620
}
host ve52 {
id -3  # do not change unnecessarily
# weight 25.340
alg straw
hash 0 # rjenkins1
item osd.1 weight 3.620
item osd.4 weight 3.620
item osd.7 weight 3.620
item osd.10 weight 3.620
item osd.13 weight 3.620
item osd.16 weight 3.620
item osd.19 weight 3.620
}
host ve53 {
id -4  # do not change unnecessarily
# weight 25.340
alg straw
hash 0 # rjenkins1
item osd.2 weight 3.620
item osd.5 weight 3.620
item osd.8 weight 3.620
item osd.11 weight 3.620
item osd.14 weight 3.620
item osd.17 weight 3.620
item osd.20 weight 3.620
}
root default {
id -1  # do not change unnecessarily
# weight 76.020
alg straw
hash 0 # rjenkins1
item ve51 weight 25.340
item ve52 weight 25.340
item ve53 weight 25.340
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
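
To narrow down which PGs never settle and why, something like this should help
(a sketch; the pgid is a placeholder taken from the dump_stuck output):

# list the PGs that are stuck unclean together with their up/acting OSD sets
ceph pg dump_stuck unclean
# query one of the listed PGs for details on why it is not going clean
ceph pg <pgid> query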


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com