Hello,

As I understand it, an OSD's heartbeat partners should only come from
OSDs that share a PG with it.
As shown below (# ceph osd tree), osd.10 and osd.0-6 cannot share any PG:
they sit under different CRUSH roots, and no PG in my cluster maps across
root trees (# ceph osd crush rule dump). So osd.0-6 should not be able to
become heartbeat partners of osd.10; a quick way to double-check this is
sketched right after this paragraph.
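
For reference, this is only a sketch of how the "no shared PG" assumption
can be verified, using standard ceph CLI commands (nothing cluster-specific
assumed beyond the osd ids above):

  # list every PG mapped to osd.10 and inspect its UP/ACTING sets
  ceph pg ls-by-osd 10

  # or dump all PG mappings, then check whether any ACTING set mixes
  # osd.10 with any of osd.0-6
  ceph pg dump pgs_brief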

However, the osd.10 log below shows heartbeat_check complaints about
osd.0/1/2/5, which means they are among osd.10's heartbeat partners. Why
is that? (A small debugging sketch follows this paragraph.)
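
If it helps, here is a rough way to watch how osd.10 picks its heartbeat
peers (just a sketch, assuming the default log path on osd.10's host;
remember to lower the debug level again afterwards):

  # temporarily raise osd debug logging on osd.10
  ceph tell osd.10 injectargs '--debug_osd 10'

  # on osd.10's host: look for heartbeat peer updates in the log
  grep -i heartbeat /var/log/ceph/ceph-osd.10.log

  # revert to the default debug level afterwards
  ceph tell osd.10 injectargs '--debug_osd 1/5'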

Thanks for any help.


> # osd.10 log
> 2019-11-20 09:21:50.431799 7fbb369fb700 -1 osd.10 7344 heartbeat_check: no
> reply from 10.13.6.162:6806 osd.2 since back 2019-11-20 09:21:19.979712
> front 2019-11-20 09:21:19.979712 (cutoff 2019-11-20 09:21:30.431768)
> 2019-11-20 13:15:59.175060 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no
> reply from 10.13.6.162:6806 osd.2 since back
> 2019-11-20 13:15:38.710424 front 2019-11-20 13:15:38.710424 (cutoff
> 2019-11-20 13:15:39.175058)
> 2019-11-20 13:15:59.175110 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no
> reply from 10.13.6.160:6803 osd.0 since back 2019-11-20 13:15:38.710424
> front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
> 2019-11-20 13:15:59.175118 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no
> reply from 10.13.6.161:6803 osd.1 since back 2019-11-20 13:15:38.710424
> front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
> 2019-11-21 02:52:24.656783 7fbb369fb700 -1 osd.10 7374 heartbeat_check: no
> reply from 10.13.6.158:6810 osd.5 since back 2019-11-21 02:52:04.557548
> front 2019-11-21 02:52:04.557548 (cutoff 2019-11-21 02:52:04.656781)
>

# ceph osd tree

> -17       3.29095 root ssd-storage
> -25       1.09698     rack rack-ssd-A
> -18       1.09698         host ssd-osd01
>  10   hdd 1.09698             osd.10           up  1.00000 1.00000
> -26       1.09698     rack rack-ssd-B
> -19       1.09698         host ssd-osd02
>  11   hdd 1.09698             osd.11           up  1.00000 1.00000
> -27       1.09698     rack rack-ssd-C
> -20       1.09698         host ssd-osd03
>  12   hdd 1.09698             osd.12           up  1.00000 1.00000
>  -1       3.22256 root default
>  -3       0.29300     host test-osd01
>   0   hdd 0.29300         osd.0              up  1.00000 1.00000
>  -5       0.29300     host test-osd02
>   1   hdd 0.29300         osd.1              up  0.89999 1.00000
>  -7       0.29300     host test-osd03
>   2   hdd 0.29300         osd.2              up  0.79999 1.00000
>  -9       0.29300     host test-osd04
>   3   hdd 0.29300         osd.3              up  1.00000 1.00000
> -11       0.29300     host test-osd05
>   4   hdd 0.29300         osd.4              up  1.00000 1.00000
> -13       0.29300     host test-osd06
>   5   hdd 0.29300         osd.5              up  1.00000 1.00000
> -15       0.29300     host test-osd07
>   6   hdd 0.29300         osd.6              up  1.00000 1.00000


# ceph osd crush rule dump

>
> [
>     {
>         "rule_id": 0,
>         "rule_name": "replicated_rule",
>         "ruleset": 0,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "default"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     },
>     {
>         "rule_id": 1,
>         "rule_name": "replicated_rule_ssd",
>         "ruleset": 1,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -17,
>                 "item_name": "ssd-storage"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "rack"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     }
> ]
>
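
For completeness, one way to confirm that every pool really uses one of
the two rules above (standard CLI only, nothing else assumed):

  # show which crush_rule each pool uses
  ceph osd pool ls detail | grep crush_rule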

# some parameters

> "mon_osd_down_out_interval": "600",
> "mon_osd_down_out_subtree_limit": "rack",
> "mds_debug_subtrees": "false",
> "mon_osd_down_out_subtree_limit": "rack",
> "mon_osd_reporter_subtree_level": "host",