The PGs will stay active+recovery_wait+degraded until you resolve the unfound objects. You can follow this doc to see which objects are unfound [1] and, if there is no other recourse, mark them lost.
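For your two PGs (1.24c and 1.779 from the health detail quoted below) that would look roughly like this. Treat it as a sketch rather than a recipe: the exact subcommand name differs a bit between releases, and marking objects lost permanently gives up on that data, so check the query output first.

    # list the unfound objects and see which OSDs might still hold a copy
    ceph pg 1.24c list_missing        # called list_unfound on newer releases
    ceph pg 1.779 list_missing
    ceph pg 1.24c query               # check the might_have_unfound section

    # only once you are sure no copy can be brought back online:
    ceph pg 1.24c mark_unfound_lost revert
    # or, if revert is refused (EC pools only allow delete):
    ceph pg 1.24c mark_unfound_lost delete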
[1] http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unfound-objects

On Thu, May 23, 2019 at 5:47 AM Kevin Flöh <kevin.fl...@kit.edu> wrote:
> thank you for this idea, it has improved the situation. Nevertheless,
> there are still 2 PGs in recovery_wait. ceph -s gives me:
>
>   cluster:
>     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>     health: HEALTH_WARN
>             3/125481112 objects unfound (0.000%)
>             Degraded data redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
>
>   services:
>     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>     mds: cephfs-1/1/1 up {0=ceph-node03.etp.kit.edu=up:active}, 3 up:standby
>     osd: 96 osds: 96 up, 96 in
>
>   data:
>     pools:   2 pools, 4096 pgs
>     objects: 125.48M objects, 259TiB
>     usage:   370TiB used, 154TiB / 524TiB avail
>     pgs:     3/497011315 objects degraded (0.000%)
>              3/125481112 objects unfound (0.000%)
>              4083 active+clean
>              10   active+clean+scrubbing+deep
>              2    active+recovery_wait+degraded
>              1    active+clean+scrubbing
>
>   io:
>     client: 318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr
>
> and ceph health detail:
>
>   HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
>   OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
>       pg 1.24c has 1 unfound objects
>       pg 1.779 has 2 unfound objects
>   PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
>       pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 unfound
>       pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2 unfound
>
> also the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph
> osd down for all OSDs of the degraded PGs. Do you have any further
> suggestions on how to proceed?
>
> On 23.05.19 11:08 vorm., Dan van der Ster wrote:
> > I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
> > their degraded PGs.
> >
> > Open a window with `watch ceph -s`, then in another window slowly do
> >
> >     ceph osd down 1
> >     # then wait a minute or so for that osd.1 to re-peer fully.
> >     ceph osd down 11
> >     ...
> >
> > Continue that for each of the osds with stuck requests, or until there
> > are no more recovery_wait/degraded PGs.
> >
> > After each `ceph osd down ...`, you should expect to see several PGs
> > re-peer, and then ideally the slow requests will disappear and the
> > degraded PGs will become active+clean.
> > If anything else happens, you should stop and let us know.
> >
> > -- dan
> >
> > On Thu, May 23, 2019 at 10:59 AM Kevin Flöh <kevin.fl...@kit.edu> wrote:
> >> This is the current status of ceph:
> >>
> >>   cluster:
> >>     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
> >>     health: HEALTH_ERR
> >>             9/125481144 objects unfound (0.000%)
> >>             Degraded data redundancy: 9/497011417 objects degraded (0.000%), 7 pgs degraded
> >>             9 stuck requests are blocked > 4096 sec. Implicated osds 1,11,21,32,43,50,65
> >>
> >>   services:
> >>     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
> >>     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
> >>     mds: cephfs-1/1/1 up {0=ceph-node03.etp.kit.edu=up:active}, 3 up:standby
> >>     osd: 96 osds: 96 up, 96 in
> >>
> >>   data:
> >>     pools:   2 pools, 4096 pgs
> >>     objects: 125.48M objects, 259TiB
> >>     usage:   370TiB used, 154TiB / 524TiB avail
> >>     pgs:     9/497011417 objects degraded (0.000%)
> >>              9/125481144 objects unfound (0.000%)
> >>              4078 active+clean
> >>              11   active+clean+scrubbing+deep
> >>              7    active+recovery_wait+degraded
> >>
> >>   io:
> >>     client: 211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
> >>
> >> On 23.05.19 10:54 vorm., Dan van der Ster wrote:
> >>> What's the full ceph status?
> >>> Normally recovery_wait just means that the relevant osd's are busy
> >>> recovering/backfilling another PG.
> >>>
> >>> On Thu, May 23, 2019 at 10:53 AM Kevin Flöh <kevin.fl...@kit.edu> wrote:
> >>>> Hi,
> >>>>
> >>>> we have set the PGs to recover and now they are stuck in
> >>>> active+recovery_wait+degraded, and instructing them to deep-scrub does
> >>>> not change anything. Hence, the rados report is empty. Is there a way
> >>>> to stop the recovery wait to start the deep-scrub and get the output?
> >>>> I guess the recovery_wait might be caused by missing objects. Do we
> >>>> need to delete them first to get the recovery going?
> >>>>
> >>>> Kevin
> >>>>
> >>>> On 22.05.19 6:03 nachm., Robert LeBlanc wrote:
> >>>>
> >>>> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.fl...@kit.edu> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> thank you, it worked. The PGs are not incomplete anymore. Still we have
> >>>>> another problem: there are 7 PGs inconsistent, and a ceph pg repair is
> >>>>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
> >>>>> repair" and nothing happens. Does somebody know how we can get the PGs
> >>>>> to repair?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Kevin
> >>>>
> >>>> Kevin,
> >>>>
> >>>> I just fixed an inconsistent PG yesterday. You will need to figure out
> >>>> why they are inconsistent. Do these steps and then we can figure out
> >>>> how to proceed.
> >>>> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some
> >>>>    of them.)
> >>>> 2. Print out the inconsistent report for each inconsistent PG.
> >>>>    `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
> >>>> 3. You will want to look at the error messages and see if all the
> >>>>    shards have the same data.
> >>>>
> >>>> Robert LeBlanc
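And in case it is ever needed again: the per-OSD kick Dan describes in the quoted thread can be scripted along these lines. Just a sketch; the OSD ids are the ones from the "Implicated osds" line above, and the sleep is only a rough stand-in for watching `ceph -s` until each OSD has fully re-peered before moving on.

    for osd in 1 11 21 32 43 50 65; do
        ceph osd down $osd
        sleep 60    # better: watch `ceph -s` and wait for peering to settle
    done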