Hi,

my Ceph version is 0.72.2, running on Scientific Linux with kernel
2.6.32-431.1.2.el6.x86_64.

After network trouble on all my nodes, the OSDs flap between up and down
periodically. I had to set the nodown flag to stabilize the cluster. I have
a public_network and a cluster_network.
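For reference, this is how I set the flag, and how it can be cleared again
later (standard ceph CLI; these commands act on the live cluster):

```shell
# Prevent the monitors from marking flapping OSDs down while debugging
ceph osd set nodown
# Once the network is stable again, clear the flag
ceph osd unset nodown
```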


I see this message in most of the OSD logs:

2014-06-23 08:08:59.750879 7f6bd3661700 -1 osd.y 53377 heartbeat_check: no
reply from osd.xxx ever on either front or back, first ping sent
2014-06-22 20:06:10.055264 (cutoff 2014-06-23 08:08:24.750744)
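Since the message mentions both the front (public) and back (cluster)
interface, one quick check is to ping every peer on both networks. The
addresses below are placeholders, not my real ones:

```shell
# Placeholder addresses -- substitute the real public_network and
# cluster_network IPs of the OSD hosts.
front_hosts="10.0.0.1 10.0.0.2"
back_hosts="192.168.0.1 192.168.0.2"
for h in $front_hosts $back_hosts; do
    if ping -c 1 -W 2 "$h" >/dev/null 2>&1; then
        echo "$h reachable"
    else
        echo "$h UNREACHABLE"
    fi
done
```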


  cluster b71fecc6-0323-4f08-8b49-e8ed1ff2d4ce
   health HEALTH_WARN 1 pgs backfill; 73 pgs down; 196 pgs peering; 196 pgs
stuck inactive; 197 pgs stuck unclean; recovery 592/2459924 objects
degraded (0.024%); nodown flag(s) set
   monmap e5: 3 mons at
{bb-e19-x4=10.257.53.236:6789/0,cephfrontux1-r=10.257.53.241:6789/0,cephfrontux2-r=10.257.53.242:6789/0},
election epoch 202, quorum 0,1,2 bb-e19-x4,cephtux1-r,cephtux2-r
   osdmap e53377: 34 osds: 33 up, 33 in
          flags nodown
   pgmap v5928500: 5596 pgs, 5 pools, 4755 GB data, 1212 kobjects
          9466 GB used, 17248 GB / 26715 GB avail
          592/2459924 objects degraded (0.024%)
              5398 active+clean
                 1 active+remapped+wait_backfill
               123 peering
                73 down+peering
                 1 active+clean+scrubbing


 grep check ceph-osd.*.log | awk '{print $5, $7, "problem", $11}' | sort -u


 osd.10 heartbeat_check: problem osd.0

osd.10 heartbeat_check: problem osd.11

osd.10 heartbeat_check: problem osd.19

.....

 It is the same in most OSD logs.
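To make the pipeline reproducible, here it is run against a sample line
copied from the heartbeat_check message above (the osd numbers are from my
logs):

```shell
# Reduce heartbeat_check log lines to "reporter heartbeat_check: problem peer"
printf '%s\n' \
  '2014-06-23 08:08:59.750879 7f6bd3661700 -1 osd.10 53377 heartbeat_check: no reply from osd.0 ever on either front or back, first ping sent 2014-06-22 20:06:10.055264 (cutoff 2014-06-23 08:08:24.750744)' \
  | awk '{print $5, $7, "problem", $11}' \
  | sort -u
# prints: osd.10 heartbeat_check: problem osd.0
```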

I set some options in ceph.conf, but nothing changed:

[osd]
osd_heartbeat_grace = 35
osd_min_down_reports = 4
osd_heartbeat_addr = 10.157.53.224
mon_osd_down_out_interval = 3000
osd_heartbeat_interval = 12
osd_mkfs_options_xfs = "-f"
mon_osd_min_down_reporters = 3
osd_mkfs_type = xfs
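I restarted the OSDs to apply them; alternatively, the osd_* values can be
changed at runtime with injectargs (sketch; note also that the mon_* options
are read by the monitors, so setting them under [osd] may have no effect):

```shell
# Change heartbeat settings on all running OSDs without a restart
ceph tell osd.* injectargs '--osd_heartbeat_grace 35 --osd_heartbeat_interval 12'
# mon_osd_min_down_reporters and mon_osd_down_out_interval are monitor
# options and belong in the monitors' configuration, not [osd]
```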

  Do you have any idea how to fix it?



-- 
Eric Mourgaya,


Respectons la planete!
Luttons contre la mediocrite!
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
