Wild guess: you hit the PG-per-OSD hard limit. How many PGs per OSD do you have?
If that is the case, increase "osd_max_pg_per_osd_hard_ratio".
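
The hard limit is mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio
(IIRC 250 * 2 = 500 PGs per OSD by default on Nautilus). A quick way to
check and, if needed, loosen it (the ratio of 3 below is just an example):

  ceph osd df tree   # the PGS column shows the PG count per OSD
  ceph config set osd osd_max_pg_per_osd_hard_ratio 3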

Check "ceph pg <pgid> query" to see why it isn't activating.

Can you share the output of "ceph osd df tree" and of "ceph pg <pgid> query"
for the affected PGs?


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 19, 2019 at 8:52 AM Lars Täuber <taeu...@bbaw.de> wrote:

> Hi there!
>
> Recently I made our cluster rack-aware by adding racks to the crush map.
> The failure domain was and still is "host".
>
> rule cephfs2_data {
>         id 7
>         type erasure
>         min_size 3
>         max_size 6
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take PRZ
>         step chooseleaf indep 0 type host
>         step emit
> }
>
>
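> For comparison, if we ever switch the failure domain to rack, I assume the
> rule would look roughly like this (untested sketch; the rule id and the
> 3-racks-with-2-shards-each layout are made up):
>
> rule cephfs2_data_rack {
>         id 8
>         type erasure
>         min_size 3
>         max_size 6
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take PRZ
>         step choose indep 3 type rack
>         step chooseleaf indep 2 type host
>         step emit
> }
>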
> Then I sorted the hosts into the new rack buckets of the crush map, as they
> are in reality, by running
>   # ceph osd crush move onodeX rack=XYZ
> for all hosts.
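>
> The result can be verified with e.g.
>   # ceph osd tree
> which should now show the hosts under their rack buckets.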
>
> The cluster then started to rebalance the data.
>
> In the end the cluster now reports:
> HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2 pgs degraded, 2 pgs undersized
> FS_DEGRADED 1 filesystem is degraded
>     fs cephfs_1 is degraded
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive
>     pg 21.2e4 is stuck inactive for 142792.952697, current state activating+undersized+degraded+remapped+forced_backfill, last acting [5,2147483647,25,28,11,2]
>     pg 23.5 is stuck inactive for 142791.437243, current state activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
> PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2 pgs degraded, 2 pgs undersized
>     pg 21.2e4 is stuck undersized for 142779.321192, current state activating+undersized+degraded+remapped+forced_backfill, last acting [5,2147483647,25,28,11,2]
>     pg 23.5 is stuck undersized for 142789.747915, current state activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
>
> The cluster hosts a CephFS which is not mountable anymore.
>
> I tried a few things (as you can see: forced_backfill), but failed.
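>
> (The forced_backfill flag comes from something like
>   # ceph pg force-backfill 21.2e4 23.5
> which did not help here.)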
>
> The cephfs_data pool is EC 4+2. Both inactive PGs seem to have enough
> shards left to reconstruct the contents for all OSDs.
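>
> (With EC 4+2, any 4 of the 6 shards suffice to rebuild the rest; the
> 2147483647 in the acting sets is just CRUSH's "no OSD found" marker.
> The pool's activation threshold can be checked with
>   # ceph osd pool get cephfs_data min_size
> where I assume the default for 4+2 is k+1 = 5.)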
>
> Is there a chance to get both PGs clean again?
>
> How can I force the PGs to reconstruct all necessary shards?
>
>
> Thanks
> Lars
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
