Ah, that was the problem! So I edited the crushmap (http://docs.ceph.com/docs/master/rados/operations/crush-map/) with a weight of 10.000 for each of the three 10TB OSD hosts. The instant result was that all the pgs that had only 2 OSDs were remapped onto 3 OSDs, and the cluster started rebalancing the data. I trust it will complete in time and I'll be good to go!
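For anyone who finds this thread later: the edit follows the decompile/edit/recompile cycle from the crush-map docs linked above. A rough sketch (file names here are just placeholders, not necessarily the exact ones used):

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: e.g. change "item osd.3 weight 1.000" to "item osd.3 weight 10.000"
#   in host osd1's bucket, and likewise for the other two 10TB hosts
$ crushtool -c crushmap.txt -o crushmap.new
$ ceph osd setcrushmap -i crushmap.new

The same change can be made without hand-editing via ceph osd crush reweight osd.3 10.0 (and likewise for osd.4 and osd.0); the absolute number matters less than all three hosts ending up with equal weight.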
New OSD tree:

$ ceph osd tree
ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 30.00000 root default
-5 10.00000     host osd1
 3 10.00000         osd.3       up  1.00000          1.00000
-6 10.00000     host osd2
 4 10.00000         osd.4       up  1.00000          1.00000
-2 10.00000     host osd3
 0 10.00000         osd.0       up  1.00000          1.00000

Kudos to Brad Hubbard for steering me in the right direction!

On Tue, Jul 18, 2017 at 8:27 PM Brad Hubbard <bhubb...@redhat.com> wrote:
> ID WEIGHT  TYPE NAME
> -5 1.00000     host osd1
> -6 9.09560     host osd2
> -2 9.09560     host osd3
>
> The weight allocated to host "osd1" should presumably be the same as
> the other two hosts?
>
> Dump your crushmap and take a good look at it, specifically the
> weighting of "osd1".
>
> On Wed, Jul 19, 2017 at 11:48 AM, Roger Brown <rogerpbr...@gmail.com> wrote:
> > I also tried ceph pg query, but it gave no helpful recommendations for
> > any of the stuck pgs.
> >
> > On Tue, Jul 18, 2017 at 7:45 PM Roger Brown <rogerpbr...@gmail.com> wrote:
> >> Problem:
> >> I have some pgs with only two OSDs instead of 3 like all the other pgs
> >> have. This is causing active+undersized+degraded status.
> >>
> >> History:
> >> 1. I started with 3 hosts, each with 1 OSD process (min_size 2) for a
> >>    1TB drive.
> >> 2. Added 3 more hosts, each with 1 OSD process for a 10TB drive.
> >> 3. Removed the original 3 1TB OSD hosts from the osd tree (reweight 0,
> >>    wait, stop, remove, del osd & host, rm; see the sketch below).
> >> 4. The last OSD to be removed never returned to active+clean after
> >>    reweight 0. Its pgs went undersized instead, but I went ahead with the
> >>    removal anyway, leaving me stuck with 5 undersized pgs.
> >>
> >> Things tried that didn't help:
> >> * Give it time to go away on its own.
> >> * Replace the replicated default.rgw.buckets.data pool with an
> >>   erasure-coded 2+1 version.
> >> * ceph osd lost 1 (and 2)
> >> * ceph pg repair (pgs from dump_stuck)
> >> * Googled 'ceph pg undersized' and similar searches for help.
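Inline note for anyone hitting this later: step 3 in the history quoted above is the usual OSD removal sequence. Roughly, for one OSD (the ID and the old host-bucket name here are placeholders, not necessarily what was actually run):

$ ceph osd crush reweight osd.1 0   # drain it; wait for recovery to finish (watch ceph -s)
$ ceph osd out osd.1
$ systemctl stop ceph-osd@1         # run on the host that carries osd.1
$ ceph osd crush remove osd.1
$ ceph auth del osd.1
$ ceph osd rm 1
$ ceph osd crush remove old-host1   # remove the now-empty host bucket; "old-host1" is a placeholder name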
> >> Current status:
> >> $ ceph osd tree
> >> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >> -1 19.19119 root default
> >> -5  1.00000     host osd1
> >>  3  1.00000         osd.3       up  1.00000          1.00000
> >> -6  9.09560     host osd2
> >>  4  9.09560         osd.4       up  1.00000          1.00000
> >> -2  9.09560     host osd3
> >>  0  9.09560         osd.0       up  1.00000          1.00000
> >>
> >> $ ceph pg dump_stuck
> >> ok
> >> PG_STAT STATE                      UP    UP_PRIMARY ACTING ACTING_PRIMARY
> >> 88.3    active+undersized+degraded [4,0] 4          [4,0]  4
> >> 97.3    active+undersized+degraded [4,0] 4          [4,0]  4
> >> 85.6    active+undersized+degraded [4,0] 4          [4,0]  4
> >> 87.5    active+undersized+degraded [0,4] 0          [0,4]  0
> >> 70.0    active+undersized+degraded [0,4] 0          [0,4]  0
> >>
> >> $ ceph osd pool ls detail
> >> pool 70 'default.rgw.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 548 flags hashpspool stripe_width 0
> >> pool 83 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 576 owner 18446744073709551615 flags hashpspool stripe_width 0
> >> pool 85 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 652 flags hashpspool stripe_width 0
> >> pool 86 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
> >> pool 87 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 654 flags hashpspool stripe_width 0
> >> pool 88 'default.rgw.lc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 600 flags hashpspool stripe_width 0
> >> pool 89 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 flags hashpspool stripe_width 0
> >> pool 90 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool stripe_width 0
> >> pool 91 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool stripe_width 0
> >> pool 92 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 659 flags hashpspool stripe_width 0
> >> pool 93 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags hashpspool stripe_width 0
> >> pool 95 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 656 flags hashpspool stripe_width 0
> >> pool 96 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 657 flags hashpspool stripe_width 0
> >> pool 97 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 658 flags hashpspool stripe_width 0
> >> pool 98 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 661 flags hashpspool stripe_width 0
> >> pool 99 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 663 flags hashpspool stripe_width 0
> >> pool 100 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 651 flags hashpspool stripe_width 0
> >> pool 101 'default.rgw.reshard' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1529 owner 18446744073709551615 flags hashpspool stripe_width 0
> >> pool 103 'default.rgw.buckets.data' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2106 flags hashpspool stripe_width 8192
> >>
> >> I'll keep on googling, but I'm open to advice!
> >>
> >> Thank you,
> >>
> >> Roger
>
> --
> Cheers,
> Brad
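One more aside, on the "replace default.rgw.buckets.data with an erasure-coded 2+1 version" item in the quoted thread: pool 103 in the listing above is that shape (erasure, size 3 = k+m with k=2, m=1). Creating such a pool generally looks like the sketch below; the profile name ec21 and the pg counts are illustrative, and it assumes the old replicated pool has already been renamed or removed (migrating existing data and repointing RGW is a separate exercise):

$ ceph osd erasure-code-profile set ec21 k=2 m=1
$ ceph osd pool create default.rgw.buckets.data 256 256 erasure ec21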