Hi,

One way this can happen is if you change the crush rule of a pool after the balancer has been running for a while. This is because the balancer's upmap entries are only validated against the crush rule when they are initially created.

You can check with:
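If a stale upmap entry turns out to be the cause, it can be removed so that CRUSH recomputes the mapping under the current rule. A sketch (the pg id 1.7 is taken from your `ceph pg ls remapped` output; substitute the pg ids that show up in your own osdmap):

```shell
# List all pg_upmap_items entries currently stored in the osdmap
ceph osd dump | grep upmap

# Remove the upmap entry for a specific PG; the PG will then be
# remapped according to the pool's current crush rule
ceph osd rm-pg-upmap-items 1.7
```

Removing the entry will trigger backfill for that PG, so you may want to do this one PG at a time.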
ceph osd dump | grep upmap

Does that explain your issue?

.. Dan

On Tue, 14 Jan 2020, 04:17 Yi-Cian Pu, <yician1000c...@gmail.com> wrote:
> Hi all,
>
> We sometimes observe that the acting set seems to violate the crush rule. For
> example, we had an environment before:
>
> [root@Ann-per-R7-3 /]# ceph -s
>   cluster:
>     id:     248ce880-f57b-4a4c-a53a-3fc2b3eb142a
>     health: HEALTH_WARN
>             34/8019 objects misplaced (0.424%)
>
>   services:
>     mon: 3 daemons, quorum Ann-per-R7-3,Ann-per-R7-7,Ann-per-R7-1
>     mgr: Ann-per-R7-3(active), standbys: Ann-per-R7-7, Ann-per-R7-1
>     mds: cephfs-1/1/1 up {0=qceph-mds-Ann-per-R7-1=up:active}, 2 up:standby
>     osd: 7 osds: 7 up, 7 in; 1 remapped pgs
>
>   data:
>     pools:   7 pools, 128 pgs
>     objects: 2.67 k objects, 10 GiB
>     usage:   107 GiB used, 3.1 TiB / 3.2 TiB avail
>     pgs:     34/8019 objects misplaced (0.424%)
>              127 active+clean
>              1   active+clean+remapped
>
> [root@Ann-per-R7-3 /]# ceph pg ls remapped
> PG  OBJECTS DEGRADED MISPLACED UNFOUND BYTES     LOG STATE                 STATE_STAMP                VERSION REPORTED UP      ACTING    SCRUB_STAMP                DEEP_SCRUB_STAMP
> 1.7 34      0        34        0       134217728 42  active+clean+remapped 2019-11-05 10:39:58.639533 144'42  229:407  [6,1]p6 [6,1,2]p6 2019-11-04 10:36:19.519820 2019-11-04 10:36:19.519820
>
> [root@Ann-per-R7-3 /]# ceph osd tree
> ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
> -2       0       root perf_osd
> -1       3.10864 root default
> -7       0.44409     host Ann-per-R7-1
>  5   hdd 0.44409         osd.5             up  1.00000 1.00000
> -3       1.33228     host Ann-per-R7-3
>  0   hdd 0.44409         osd.0             up  1.00000 1.00000
>  1   hdd 0.44409         osd.1             up  1.00000 1.00000
>  2   hdd 0.44409         osd.2             up  1.00000 1.00000
> -9       1.33228     host Ann-per-R7-7
>  6   hdd 0.44409         osd.6             up  1.00000 1.00000
>  7   hdd 0.44409         osd.7             up  1.00000 1.00000
>  8   hdd 0.44409         osd.8             up  1.00000 1.00000
>
> [root@Ann-per-R7-3 /]# ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
>  5   hdd 0.44409  1.00000 465 GiB  21 GiB 444 GiB 4.49 1.36 127
>  0   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.16 0.96  44
>  1   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.14 0.95  52
>  2   hdd 0.44409  1.00000 465 GiB  14 GiB 451 GiB 2.98 0.91  33
>  6   hdd 0.44409  1.00000 465 GiB  14 GiB 451 GiB 2.97 0.90  43
>  7   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.19 0.97  41
>  8   hdd 0.44409  1.00000 465 GiB  14 GiB 450 GiB 3.09 0.94  44
>               TOTAL      3.2 TiB 107 GiB 3.1 TiB 3.29
> MIN/MAX VAR: 0.90/1.36 STDDEV: 0.49
>
> Based on our crush map, the crush rule should select 1 OSD from each host.
> However, in the output above the acting set of PG 1.7 is [6,1,2], and osd.1
> and osd.2 are in the same host, which seems to violate the crush rule. So my
> question is: how does this happen? Any enlightenment is much appreciated.
>
> Best
> Cian
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com