Hi all,
We sometimes observe that the acting set of a PG seems to violate the CRUSH rule. For
example, we had the following environment:
[root@Ann-per-R7-3 /]# ceph -s
  cluster:
    id:     248ce880-f57b-4a4c-a53a-3fc2b3eb142a
    health: HEALTH_WARN
            34/8019 objects misplaced (0.424%)

  services:
    mon: 3 daemons, quorum Ann-per-R7-3,Ann-per-R7-7,Ann-per-R7-1
    mgr: Ann-per-R7-3(active), standbys: Ann-per-R7-7, Ann-per-R7-1
    mds: cephfs-1/1/1 up {0=qceph-mds-Ann-per-R7-1=up:active}, 2 up:standby
    osd: 7 osds: 7 up, 7 in; 1 remapped pgs

  data:
    pools:   7 pools, 128 pgs
    objects: 2.67 k objects, 10 GiB
    usage:   107 GiB used, 3.1 TiB / 3.2 TiB avail
    pgs:     34/8019 objects misplaced (0.424%)
             127 active+clean
             1   active+clean+remapped
[root@Ann-per-R7-3 /]# ceph pg ls remapped
PG  OBJECTS DEGRADED MISPLACED UNFOUND BYTES     LOG STATE                 STATE_STAMP                VERSION REPORTED UP      ACTING    SCRUB_STAMP                DEEP_SCRUB_STAMP
1.7 34      0        34        0       134217728 42  active+clean+remapped 2019-11-05 10:39:58.639533 144'42  229:407  [6,1]p6 [6,1,2]p6 2019-11-04 10:36:19.519820 2019-11-04 10:36:19.519820
[root@Ann-per-R7-3 /]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME              STATUS REWEIGHT PRI-AFF
-2       0       root perf_osd
-1       3.10864 root default
-7       0.44409     host Ann-per-R7-1
 5   hdd 0.44409         osd.5              up  1.00000 1.00000
-3       1.33228     host Ann-per-R7-3
 0   hdd 0.44409         osd.0              up  1.00000 1.00000
 1   hdd 0.44409         osd.1              up  1.00000 1.00000
 2   hdd 0.44409         osd.2              up  1.00000 1.00000
-9       1.33228     host Ann-per-R7-7
 6   hdd 0.44409         osd.6              up  1.00000 1.00000
 7   hdd 0.44409         osd.7              up  1.00000 1.00000
 8   hdd 0.44409         osd.8              up  1.00000 1.00000
[root@Ann-per-R7-3 /]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
5 hdd 0.44409 1.00000 465 GiB 21 GiB 444 GiB 4.49 1.36 127
0 hdd 0.44409 1.00000 465 GiB 15 GiB 450 GiB 3.16 0.96 44
1 hdd 0.44409 1.00000 465 GiB 15 GiB 450 GiB 3.14 0.95 52
2 hdd 0.44409 1.00000 465 GiB 14 GiB 451 GiB 2.98 0.91 33
6 hdd 0.44409 1.00000 465 GiB 14 GiB 451 GiB 2.97 0.90 43
7 hdd 0.44409 1.00000 465 GiB 15 GiB 450 GiB 3.19 0.97 41
8 hdd 0.44409 1.00000 465 GiB 14 GiB 450 GiB 3.09 0.94 44
TOTAL 3.2 TiB 107 GiB 3.1 TiB 3.29
MIN/MAX VAR: 0.90/1.36 STDDEV: 0.49
Based on our CRUSH map, the rule should select one OSD from each host. However,
in the output above the acting set of PG 1.7 is [6,1,2], while the up set is [6,1], and osd.1
and osd.2 are on the same host (Ann-per-R7-3), which seems to violate the CRUSH rule. So my
question is: how does this happen? Any enlightenment is much appreciated.
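For completeness, the rule and the mapping can be confirmed with something like the following (the file names are just placeholders; the relevant rule is the one whose chooseleaf step uses "type host"):

ceph osd crush rule dump                   # check the failure domain (type host) of the pool's rule
ceph pg map 1.7                            # prints the up set vs. the acting set for the PG
ceph pg 1.7 query                          # full PG detail, including past intervals
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt  # decompiled CRUSH map, rules included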
Best
Cian