I would recommend sticking with the weight of 9.09560 for the osds, as that
is the TiB size of the osds that ceph defaults to, as opposed to the TB size
of the osds. New osds will have their weights based on the TiB value. Could
you share your `ceph osd df` output, just to see what things look like?
Hopefully very healthy.
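
If you do want to set them back to the TiB-based values, something along
these lines should do it (untested here, osd IDs taken from your tree below;
adjust as needed):

$ ceph osd crush reweight osd.3 9.09560
$ ceph osd crush reweight osd.4 9.09560
$ ceph osd crush reweight osd.0 9.09560
$ ceph osd df    # confirm the new weights and data distribution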

On Tue, Jul 18, 2017, 11:16 PM Roger Brown <rogerpbr...@gmail.com> wrote:

> Resolution confirmed!
>
> $ ceph -s
>   cluster:
>     id:     eea7b78c-b138-40fc-9f3e-3d77afb770f0
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum desktop,mon1,nuc2
>     mgr: desktop(active), standbys: mon1
>     osd: 3 osds: 3 up, 3 in
>
>   data:
>     pools:   19 pools, 372 pgs
>     objects: 54243 objects, 71722 MB
>     usage:   129 GB used, 27812 GB / 27941 GB avail
>     pgs:     372 active+clean
>
>
> On Tue, Jul 18, 2017 at 8:47 PM Roger Brown <rogerpbr...@gmail.com> wrote:
>
>> Ah, that was the problem!
>>
>> So I edited the crushmap (
>> http://docs.ceph.com/docs/master/rados/operations/crush-map/) with a
>> weight of 10.000 for all three 10TB OSD hosts. The instant result was that
>> all those pgs with only 2 OSDs were replaced with 3 OSDs while the cluster
>> started rebalancing the data. I trust it will complete with time and I'll
>> be good to go!
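>>
>> For reference, the edit cycle from that doc is roughly the following
>> (filenames here are just placeholders):
>>
>> $ ceph osd getcrushmap -o crushmap.bin
>> $ crushtool -d crushmap.bin -o crushmap.txt
>> $ vi crushmap.txt                        # adjust the host weights
>> $ crushtool -c crushmap.txt -o crushmap.new
>> $ ceph osd setcrushmap -i crushmap.new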
>>
>> New OSD tree:
>> $ ceph osd tree
>> ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 30.00000 root default
>> -5 10.00000     host osd1
>>  3 10.00000         osd.3      up  1.00000          1.00000
>> -6 10.00000     host osd2
>>  4 10.00000         osd.4      up  1.00000          1.00000
>> -2 10.00000     host osd3
>>  0 10.00000         osd.0      up  1.00000          1.00000
>>
>> Kudos to Brad Hubbard for steering me in the right direction!
>>
>>
>> On Tue, Jul 18, 2017 at 8:27 PM Brad Hubbard <bhubb...@redhat.com> wrote:
>>
>>> ID WEIGHT   TYPE NAME
>>> -5  1.00000     host osd1
>>> -6  9.09560     host osd2
>>> -2  9.09560     host osd3
>>>
>>> The weight allocated to host "osd1" should presumably be the same as
>>> the other two hosts?
>>>
>>> Dump your crushmap and take a good look at it, specifically the
>>> weighting of "osd1".
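>>>
>>> e.g. something along these lines (paths are just examples):
>>>
>>> $ ceph osd getcrushmap -o /tmp/crushmap.bin
>>> $ crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
>>> $ grep -A 4 'host osd1' /tmp/crushmap.txt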
>>>
>>>
>>> On Wed, Jul 19, 2017 at 11:48 AM, Roger Brown <rogerpbr...@gmail.com>
>>> wrote:
>>> > I also tried ceph pg query, but it gave no helpful recommendations for
>>> > any of the stuck pgs.
>>> >
>>> >
>>> > On Tue, Jul 18, 2017 at 7:45 PM Roger Brown <rogerpbr...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Problem:
>>> >> I have some pgs with only two OSDs instead of 3 like all the other pgs
>>> >> have. This is causing active+undersized+degraded status.
>>> >>
>>> >> History:
>>> >> 1. I started with 3 hosts, each with 1 OSD process (min_size 2) for a
>>> >> 1TB drive.
>>> >> 2. Added 3 more hosts, each with 1 OSD process for a 10TB drive.
>>> >> 3. Removed the original 3 1TB OSD hosts from the osd tree (reweight 0,
>>> >> wait, stop, remove, del osd&host, rm; rough command sketch below).
>>> >> 4. The last OSD to be removed would never return to active+clean after
>>> >> reweight 0. It returned undersized instead, but I went on with removal
>>> >> anyway, leaving me stuck with 5 undersized pgs.
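>>> >>
>>> >> Per OSD, that removal was roughly along these lines (osd.1 and the
>>> >> host name here are placeholders):
>>> >>
>>> >> $ ceph osd crush reweight osd.1 0     # drain the OSD
>>> >> $ ceph -s                             # wait for rebalancing to finish
>>> >> $ systemctl stop ceph-osd@1           # on the OSD host
>>> >> $ ceph osd crush remove osd.1
>>> >> $ ceph auth del osd.1
>>> >> $ ceph osd crush remove osd-host1     # once the host bucket is empty
>>> >> $ ceph osd rm 1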
>>> >>
>>> >> Things tried that didn't help:
>>> >> * give it time to go away on its own
>>> >> * Replace replicated default.rgw.buckets.data pool with erasure-code
>>> >> 2+1 version.
>>> >> * ceph osd lost 1 (and 2)
>>> >> * ceph pg repair (pgs from dump_stuck)
>>> >> * googled 'ceph pg undersized' and similar searches for help.
>>> >>
>>> >> Current status:
>>> >> $ ceph osd tree
>>> >> ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> >> -1 19.19119 root default
>>> >> -5  1.00000     host osd1
>>> >>  3  1.00000         osd.3      up  1.00000          1.00000
>>> >> -6  9.09560     host osd2
>>> >>  4  9.09560         osd.4      up  1.00000          1.00000
>>> >> -2  9.09560     host osd3
>>> >>  0  9.09560         osd.0      up  1.00000          1.00000
>>> >> $ ceph pg dump_stuck
>>> >> ok
>>> >> PG_STAT STATE                      UP    UP_PRIMARY ACTING ACTING_PRIMARY
>>> >> 88.3    active+undersized+degraded [4,0]          4  [4,0]              4
>>> >> 97.3    active+undersized+degraded [4,0]          4  [4,0]              4
>>> >> 85.6    active+undersized+degraded [4,0]          4  [4,0]              4
>>> >> 87.5    active+undersized+degraded [0,4]          0  [0,4]              0
>>> >> 70.0    active+undersized+degraded [0,4]          0  [0,4]              0
>>> >> $ ceph osd pool ls detail
>>> >> pool 70 'default.rgw.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 548 flags hashpspool stripe_width 0
>>> >> pool 83 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 576 owner 18446744073709551615 flags hashpspool stripe_width 0
>>> >> pool 85 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 652 flags hashpspool stripe_width 0
>>> >> pool 86 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
>>> >> pool 87 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 654 flags hashpspool stripe_width 0
>>> >> pool 88 'default.rgw.lc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 600 flags hashpspool stripe_width 0
>>> >> pool 89 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 flags hashpspool stripe_width 0
>>> >> pool 90 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool stripe_width 0
>>> >> pool 91 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool stripe_width 0
>>> >> pool 92 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 659 flags hashpspool stripe_width 0
>>> >> pool 93 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags hashpspool stripe_width 0
>>> >> pool 95 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 656 flags hashpspool stripe_width 0
>>> >> pool 96 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 657 flags hashpspool stripe_width 0
>>> >> pool 97 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 658 flags hashpspool stripe_width 0
>>> >> pool 98 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 661 flags hashpspool stripe_width 0
>>> >> pool 99 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 663 flags hashpspool stripe_width 0
>>> >> pool 100 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 651 flags hashpspool stripe_width 0
>>> >> pool 101 'default.rgw.reshard' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1529 owner 18446744073709551615 flags hashpspool stripe_width 0
>>> >> pool 103 'default.rgw.buckets.data' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2106 flags hashpspool stripe_width 8192
>>> >>
>>> >> I'll keep on googling, but I'm open to advice!
>>> >>
>>> >> Thank you,
>>> >>
>>> >> Roger
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Brad
>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
