Wild guess: you hit the per-OSD PG hard limit. How many PGs per OSD do you have? If that's the case, increase "osd max pg per osd hard ratio".
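A rough back-of-the-envelope check (the pool sizes, PG counts, OSD count, and limit values below are made up for illustration; use your cluster's actual numbers and configured limits):

```python
# Rough check of whether a cluster can trip the per-OSD PG hard limit.
# OSDs refuse to activate PGs beyond roughly
# mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio shards per OSD.

def pg_shards_per_osd(pg_counts_by_pool, size_by_pool, num_osds):
    """Average number of PG shards (replicas / EC chunks) each OSD carries."""
    total_shards = sum(pg_counts_by_pool[p] * size_by_pool[p]
                       for p in pg_counts_by_pool)
    return total_shards / num_osds

def exceeds_hard_limit(per_osd, mon_max_pg_per_osd, hard_ratio):
    """True if the average per-OSD shard count is over the hard limit."""
    return per_osd > mon_max_pg_per_osd * hard_ratio

# Hypothetical example: 2048 PGs in one EC 4+2 pool (size 6) on 30 OSDs,
# with a limit of 200 PGs/OSD and a hard ratio of 2.0.
per_osd = pg_shards_per_osd({"cephfs2_data": 2048}, {"cephfs2_data": 6}, 30)
print(round(per_osd, 1))                       # 409.6 shards per OSD
print(exceeds_hard_limit(per_osd, 200, 2.0))   # True -> PGs stuck activating
```

"ceph osd df" shows the real PGS column per OSD, which is the number to compare against your configured limits.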
Check "ceph pg <pgid> query" to see why it isn't activating. Can you share the output of "ceph osd df tree" and "ceph pg <pgid> query" for the affected PGs?

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 19, 2019 at 8:52 AM Lars Täuber <taeu...@bbaw.de> wrote:
> Hi there!
>
> Recently I made our cluster rack-aware by adding racks to the crush map.
> The failure domain was and still is "host".
>
> rule cephfs2_data {
>         id 7
>         type erasure
>         min_size 3
>         max_size 6
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take PRZ
>         step chooseleaf indep 0 type host
>         step emit
> }
>
> Then I sorted the hosts into the new rack buckets of the crush map as
> they are in reality, by:
> # osd crush move onodeX rack=XYZ
> for all hosts.
>
> The cluster started to reorder the data.
>
> In the end the cluster now reports:
> HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs
> inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%),
> 2 pgs degraded, 2 pgs undersized
> FS_DEGRADED 1 filesystem is degraded
>     fs cephfs_1 is degraded
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive
>     pg 21.2e4 is stuck inactive for 142792.952697, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting
> [5,2147483647,25,28,11,2]
>     pg 23.5 is stuck inactive for 142791.437243, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
> PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded
> (0.029%), 2 pgs degraded, 2 pgs undersized
>     pg 21.2e4 is stuck undersized for 142779.321192, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting
> [5,2147483647,25,28,11,2]
>     pg 23.5 is stuck undersized for 142789.747915, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
>
> The cluster hosts a cephfs which is not mountable anymore.
>
> I tried a few things (as you can see: forced_backfill), but failed.
>
> The cephfs_data pool is EC 4+2.
> Both inactive PGs seem to have enough copies to recalculate the
> contents for all OSDs.
>
> Is there a chance to get both PGs clean again?
>
> How can I force the PGs to recalculate all necessary copies?
>
>
> Thanks
> Lars
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
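One note on the acting sets quoted above: 2147483647 is 0x7fffffff, CRUSH's placeholder for a shard it could not map to any OSD. A quick sketch of the shard math (assuming EC k=4, m=2 as in the cephfs_data pool; the function names are illustrative, not Ceph API):

```python
# 2147483647 (0x7fffffff) in an acting set is CRUSH's "none" placeholder:
# CRUSH could not map that EC shard to any OSD. A k+m EC PG can still
# reconstruct all its data as long as at least k shards sit on real OSDs.
CRUSH_ITEM_NONE = 0x7FFFFFFF  # 2147483647

def mapped_shards(acting):
    """OSD ids actually assigned to shards, skipping unmapped placeholders."""
    return [osd for osd in acting if osd != CRUSH_ITEM_NONE]

def has_enough_shards(acting, k):
    """True if at least k shards are mapped, i.e. the data is reconstructible."""
    return len(mapped_shards(acting)) >= k

pg_21_2e4 = [5, 2147483647, 25, 28, 11, 2]   # acting set from the report
print(mapped_shards(pg_21_2e4))               # [5, 25, 28, 11, 2]
print(has_enough_shards(pg_21_2e4, 4))        # True: 5 of 6 shards mapped
```

So pg 21.2e4 does have enough shards to rebuild; the question is why it is stuck in activating rather than whether the data is recoverable, which is what "ceph pg <pgid> query" should reveal.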