Hello: According to my understanding, an OSD's heartbeat partners only come from OSDs that serve the same PG. As shown below (# ceph osd tree), osd.10 and osd.0-6 cannot serve the same PG, because osd.10 and osd.0-6 belong to different CRUSH root trees, and no PG in my cluster maps across root trees (# ceph osd crush rule dump). So osd.0-6 should never become heartbeat partners of osd.10.
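As a sanity check (a sketch on my side, not output pasted from the cluster), the PGs mapped to osd.10 can be listed to confirm that none of their UP/ACTING sets contain any of osd.0-6:

# list every PG that currently maps to osd.10; given the rules below,
# the UP/ACTING sets should only ever contain osd.10/11/12
ceph pg ls-by-osd 10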
But below is the log from osd.10, and it can be seen that osd.10's heartbeat partners include osd.0/1/2/5. Why? Thanks for any help.

# osd.10 log
2019-11-20 09:21:50.431799 7fbb369fb700 -1 osd.10 7344 heartbeat_check: no reply from 10.13.6.162:6806 osd.2 since back 2019-11-20 09:21:19.979712 front 2019-11-20 09:21:19.979712 (cutoff 2019-11-20 09:21:30.431768)
2019-11-20 13:15:59.175060 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no reply from 10.13.6.162:6806 osd.2 since back 2019-11-20 13:15:38.710424 front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
2019-11-20 13:15:59.175110 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no reply from 10.13.6.160:6803 osd.0 since back 2019-11-20 13:15:38.710424 front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
2019-11-20 13:15:59.175118 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no reply from 10.13.6.161:6803 osd.1 since back 2019-11-20 13:15:38.710424 front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
2019-11-21 02:52:24.656783 7fbb369fb700 -1 osd.10 7374 heartbeat_check: no reply from 10.13.6.158:6810 osd.5 since back 2019-11-21 02:52:04.557548 front 2019-11-21 02:52:04.557548 (cutoff 2019-11-21 02:52:04.656781)

# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                  STATUS REWEIGHT PRI-AFF
-17       3.29095 root ssd-storage
-25       1.09698     rack rack-ssd-A
-18       1.09698         host ssd-osd01
 10   hdd 1.09698             osd.10             up  1.00000 1.00000
-26       1.09698     rack rack-ssd-B
-19       1.09698         host ssd-osd02
 11   hdd 1.09698             osd.11             up  1.00000 1.00000
-27       1.09698     rack rack-ssd-C
-20       1.09698         host ssd-osd03
 12   hdd 1.09698             osd.12             up  1.00000 1.00000
 -1       3.22256 root default
 -3       0.29300     host test-osd01
  0   hdd 0.29300         osd.0                  up  1.00000 1.00000
 -5       0.29300     host test-osd02
  1   hdd 0.29300         osd.1                  up  0.89999 1.00000
 -7       0.29300     host test-osd03
  2   hdd 0.29300         osd.2                  up  0.79999 1.00000
 -9       0.29300     host test-osd04
  3   hdd 0.29300         osd.3                  up  1.00000 1.00000
-11       0.29300     host test-osd05
  4   hdd 0.29300         osd.4                  up  1.00000 1.00000
-13       0.29300     host test-osd06
  5   hdd 0.29300         osd.5                  up  1.00000 1.00000
-15       0.29300     host test-osd07
  6   hdd 0.29300         osd.6                  up  1.00000 1.00000

# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_rule_ssd",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -17,
                "item_name": "ssd-storage"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "rack"
            },
            {
                "op": "emit"
            }
        ]
    }
]

# some parameters
"mon_osd_down_out_interval": "600",
"mon_osd_down_out_subtree_limit": "rack",
"mds_debug_subtrees": "false",
"mon_osd_reporter_subtree_level": "host",
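For completeness, the heartbeat-related settings can also be inspected through the admin socket (a sketch, assuming the admin socket for osd.10 is reachable on its host; osd_heartbeat_min_peers is a stock option, nothing I have tuned):

# on the host running osd.10
ceph daemon osd.10 config show | grep -E 'osd_heartbeat|mon_osd_reporter'
# in particular, the minimum number of heartbeat peers an OSD keeps:
ceph daemon osd.10 config get osd_heartbeat_min_peers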
_______________________________________________ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io