That's pretty strange, especially since the monitor is getting the
failure reports. What version are you running? Can you bump up the
monitor debugging and provide its output from around that time?
-Greg
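For reference, one common way to raise monitor debugging is injectargs at runtime, or a persistent setting in ceph.conf (values below are illustrative, not a recommendation from the thread; this requires admin access to the cluster):

```
# At runtime, from a node with the admin keyring:
#   ceph tell mon.storage1 injectargs '--debug-mon 10 --debug-ms 1'
# Or persistently, in ceph.conf (restart the monitor afterwards):
[mon]
debug mon = 10
debug ms = 1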
On Fri, Feb 20, 2015 at 3:26 AM, Sudarshan Pathak <sushan@gmail.com> wrote:
Hello everyone,
I have a cluster running with OpenStack. It has 6 OSDs (3 in each of 2
different locations). Each pool has a replication size of 3, with 2 copies in
the primary location and 1 copy at the secondary location.
Everything is running as expected, but the OSDs are not marked down when I
power off an OSD server. It has been around an hour.
I tried changing the heartbeat settings too.
Can someone point me in the right direction?
OSD 0 log
================
2015-02-20 16:20:14.009723 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:54.009720)
2015-02-20 16:20:15.009908 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:55.009907)
2015-02-20 16:20:16.010123 7f3fe37d7700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:56.010119)
2015-02-20 16:20:16.648167 7f3fc9a76700 -1 osd.0 451 heartbeat_check: no
reply from osd.2 since back 2015-02-20 16:15:54.607854 front 2015-02-20
16:15:54.607854 (cutoff 2015-02-20 16:19:56.648165)
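Reading these lines: osd.0 last heard from osd.2 (on both the back and front heartbeat channels) at 16:15:54, and "now" minus the heartbeat grace gives the cutoff; any peer whose last reply predates the cutoff is reported failed. A minimal sketch of that check, with the log's timestamps reduced to seconds since midnight (note the cutoffs in the log sit 20 s behind "now", so the effective grace was 20 s, not the 10 s set in ceph.conf below):

```shell
# Sketch of the heartbeat_check logic seen in the OSD log above.
now=58814         # 16:20:14, time of the heartbeat_check
grace=20          # effective grace implied by the log's cutoff (16:19:54)
last_reply=58554  # 16:15:54, last reply from osd.2 (back and front)
cutoff=$((now - grace))
# A peer whose last reply is older than the cutoff gets reported failed:
if [ "$last_reply" -lt "$cutoff" ]; then
  echo "heartbeat_check: no reply from osd.2 since $last_reply (cutoff $cutoff)"
fi
```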
Ceph monitor log
2015-02-20 16:49:16.831548 7f416e4aa700 1 mon.storage1@1(leader).osd e455
prepare_failure osd.2 192.168.100.33:6800/24431 from osd.4
192.168.100.35:6800/1305 is reporting failure:1
2015-02-20 16:49:16.831593 7f416e4aa700 0 log_channel(cluster) log [DBG] :
osd.2 192.168.100.33:6800/24431 reported failed by osd.4
192.168.100.35:6800/1305
2015-02-20 16:49:17.080314 7f416e4aa700 1 mon.storage1@1(leader).osd e455
prepare_failure osd.2 192.168.100.33:6800/24431 from osd.3
192.168.100.34:6800/1358 is reporting failure:1
2015-02-20 16:49:17.080527 7f416e4aa700 0 log_channel(cluster) log [DBG] :
osd.2 192.168.100.33:6800/24431 reported failed by osd.3
192.168.100.34:6800/1358
2015-02-20 16:49:17.420859 7f416e4aa700 1 mon.storage1@1(leader).osd e455
prepare_failure osd.2 192.168.100.33:6800/24431 from osd.5
192.168.100.36:6800/1359 is reporting failure:1
# ceph osd stat
osdmap e455: 6 osds: 6 up, 6 in
# ceph -s
cluster c8a5975f-4c86-4cfe-a91b-fac9f3126afc
health HEALTH_WARN 528 pgs peering; 528 pgs stuck inactive; 528 pgs
stuck unclean; 1 requests are blocked 32 sec; 1 mons down, quorum 1,2,3,4
storage1,storage2,compute3,compute4
monmap e1: 5 mons at
{admin=192.168.100.39:6789/0,compute3=192.168.100.133:6789/0,compute4=192.168.100.134:6789/0,storage1=192.168.100.120:6789/0,storage2=192.168.100.121:6789/0},
election epoch 132, quorum 1,2,3,4 storage1,storage2,compute3,compute4
osdmap e455: 6 osds: 6 up, 6 in
pgmap v48474: 3650 pgs, 19 pools, 27324 MB data, 4420 objects
82443 MB used, 2682 GB / 2763 GB avail
3122 active+clean
528 remapped+peering
ceph.conf file
[global]
fsid = c8a5975f-4c86-4cfe-a91b-fac9f3126afc
mon_initial_members = admin, storage1, storage2, compute3, compute4
mon_host =
192.168.100.39,192.168.100.120,192.168.100.121,192.168.100.133,192.168.100.134
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 3
osd pool default min size = 3
osd pool default pg num = 300
osd pool default pgp num = 300
public network = 192.168.100.0/24
rgw print continue = false
rgw enable ops log = false
mon osd report timeout = 60
mon osd down out interval = 30
mon osd min down reports = 2
osd heartbeat grace = 10
osd mon heartbeat interval = 20
osd mon report interval max = 60
osd mon ack timeout = 15
mon osd min down reports = 2
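Since the cutoffs in the OSD log imply a 20 s grace rather than the 10 s configured above, it may be worth confirming which values the daemons actually loaded. One way is the admin socket (run on the OSD's host; assumes the default socket path):

```
# Show the running (not on-disk) config values for osd.0:
ceph daemon osd.0 config show | grep -E 'heartbeat|min_down'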
Regards,
Sudarshan Pathak
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com