Problem:
I have five PGs mapped to only two OSDs instead of three like all the
other PGs. They are stuck in active+undersized+degraded.

History:
1. I started with 3 hosts, each running 1 OSD process on a 1TB drive
(pools at min_size 2).
2. Added 3 more hosts, each running 1 OSD process on a 10TB drive.
3. Removed the original three 1TB OSD hosts from the osd tree (reweight 0,
wait, stop, remove, del osd & host, rm).
4. The last OSD to be removed never got back to active+clean after
reweight 0. The cluster reported undersized PGs instead, but I went on with
the removal anyway, which left me stuck with 5 undersized PGs.
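
A sketch of how the shorthand in step 3 might map onto actual commands, for anyone trying to follow the sequence. The mapping is an assumption, not a transcript of what was run: in particular, "reweight 0" is assumed here to mean `ceph osd crush reweight` rather than `ceph osd reweight`, the stop step assumes a systemd deployment, and the helper name `removal_commands` and the host name `old-host-1` are hypothetical. Nothing is executed; the commands are only printed.

```python
from typing import List


def removal_commands(osd_id: int, host: str) -> List[str]:
    """Return the assumed command sequence for draining and removing one OSD.

    The strings are only built, never run, so they can be reviewed first.
    """
    name = f"osd.{osd_id}"
    return [
        f"ceph osd crush reweight {name} 0",  # "reweight 0" (assumed: CRUSH weight)
        "ceph -s",                            # "wait": repeat until PGs are active+clean
        f"systemctl stop ceph-osd@{osd_id}",  # "stop" (assumes systemd units)
        f"ceph osd crush remove {name}",      # "remove"
        f"ceph auth del {name}",              # "del osd"
        f"ceph osd crush remove {host}",      # "del ... host" (the emptied host bucket)
        f"ceph osd rm {osd_id}",              # "rm"
    ]


if __name__ == "__main__":
    # "old-host-1" is a placeholder host name, not one from this cluster.
    for cmd in removal_commands(1, "old-host-1"):
        print(cmd)
```

In terms of this sketch, step 4 is the case where the "wait" step never reached active+clean and the remaining steps were run regardless.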

Things tried that didn't help:
* Gave it time to clear on its own.
* Replaced the replicated default.rgw.buckets.data pool with an
erasure-coded 2+1 version.
* ceph osd lost 1 (and 2)
* ceph pg repair (on the PGs from dump_stuck)
* Googled 'ceph pg undersized' and similar searches for help.

Current status:
$ ceph osd tree
ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 19.19119 root default
-5  1.00000     host osd1
 3  1.00000         osd.3      up  1.00000          1.00000
-6  9.09560     host osd2
 4  9.09560         osd.4      up  1.00000          1.00000
-2  9.09560     host osd3
 0  9.09560         osd.0      up  1.00000          1.00000
$ ceph pg dump_stuck
ok
PG_STAT STATE                      UP    UP_PRIMARY ACTING ACTING_PRIMARY
88.3    active+undersized+degraded [4,0]          4  [4,0]              4
97.3    active+undersized+degraded [4,0]          4  [4,0]              4
85.6    active+undersized+degraded [4,0]          4  [4,0]              4
87.5    active+undersized+degraded [0,4]          0  [0,4]              0
70.0    active+undersized+degraded [0,4]          0  [0,4]              0
$ ceph osd pool ls detail
pool 70 'default.rgw.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 548 flags hashpspool stripe_width 0
pool 83 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 576 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 85 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 652 flags hashpspool stripe_width 0
pool 86 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
pool 87 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 654 flags hashpspool stripe_width 0
pool 88 'default.rgw.lc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 600 flags hashpspool stripe_width 0
pool 89 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 flags hashpspool stripe_width 0
pool 90 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool stripe_width 0
pool 91 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool stripe_width 0
pool 92 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 659 flags hashpspool stripe_width 0
pool 93 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags hashpspool stripe_width 0
pool 95 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 656 flags hashpspool stripe_width 0
pool 96 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 657 flags hashpspool stripe_width 0
pool 97 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 658 flags hashpspool stripe_width 0
pool 98 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 661 flags hashpspool stripe_width 0
pool 99 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 663 flags hashpspool stripe_width 0
pool 100 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 651 flags hashpspool stripe_width 0
pool 101 'default.rgw.reshard' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1529 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 103 'default.rgw.buckets.data' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2106 flags hashpspool stripe_width 8192
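
To make the three outputs above easier to read together, here is a small sketch that cross-references them: the pool id is the part of the PG id before the dot, and the acting set is compared against the OSDs that `ceph osd tree` shows as up. The rows and pool entries are copied by hand from the output above (only the pools that appear in dump_stuck, plus the erasure-coded data pool); the names `summarize`, `POOLS` and `UP_OSDS` are illustrative only and not part of any Ceph tooling.

```python
# Stuck PG rows copied from the `ceph pg dump_stuck` output above.
DUMP_STUCK = """\
88.3    active+undersized+degraded [4,0]          4  [4,0]              4
97.3    active+undersized+degraded [4,0]          4  [4,0]              4
85.6    active+undersized+degraded [4,0]          4  [4,0]              4
87.5    active+undersized+degraded [0,4]          0  [0,4]              0
70.0    active+undersized+degraded [0,4]          0  [0,4]              0
"""

# Pool id -> (name, type, size), copied from `ceph osd pool ls detail`.
POOLS = {
    70: ("default.rgw.rgw.gc", "replicated", 3),
    85: ("default.rgw.control", "replicated", 3),
    87: ("default.rgw.gc", "replicated", 3),
    88: ("default.rgw.lc", "replicated", 3),
    97: ("default.rgw.usage", "replicated", 3),
    103: ("default.rgw.buckets.data", "erasure", 3),
}

# OSD ids that `ceph osd tree` reports as up.
UP_OSDS = {0, 3, 4}


def summarize(dump):
    """Return (pgid, pool name, pool type, replicas short, unused up OSDs) per PG."""
    rows = []
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        pgid, _state, _up, _up_primary, acting, _acting_primary = fields
        pool_id = int(pgid.split(".")[0])
        acting_set = {int(osd) for osd in acting.strip("[]").split(",")}
        name, kind, size = POOLS[pool_id]
        short = size - len(acting_set)
        unused = sorted(UP_OSDS - acting_set)
        rows.append((pgid, name, kind, short, unused))
    return rows


if __name__ == "__main__":
    for pgid, name, kind, short, unused in summarize(DUMP_STUCK):
        print(f"{pgid:6} {name:20} {kind:10} short by {short}, unused up OSDs: {unused}")
```

For the data above, every stuck PG belongs to a replicated size 3 pool on crush_rule 0 (none are in the erasure-coded pool), each is one replica short, and the one up OSD missing from every acting set is osd.3.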

I'll keep on googling, but I'm open to advice!

Thank you,

Roger
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
