Ah, I see, I should have looked at the “raw” data instead ;-) Then I agree, this is very weird!
Best,
Jesper

--------------------------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf: +45 50906203

> On 28 Jul 2022, at 12.45, Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Jesper,
>
> thanks for looking at this. The failure domain is OSD, not host. I typed it
> wrong in the text; the copy of the crush rule shows it correctly: step choose
> indep 0 type osd.
>
> I'm trying to reproduce the observation to file a tracker item, but it is
> more difficult than expected. It might be a race condition; so far I haven't
> seen it again. I hope I can figure out when and why this is happening.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Jesper Lykkegaard Karlsen <je...@mbg.au.dk>
> Sent: 28 July 2022 12:02:51
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] PG does not become active
>
> Hi Frank,
>
> I think you need at least 6 OSD hosts to make EC 4+2 with failure domain
> host work.
>
> I do not know how it was possible for you to create that configuration in
> the first place. Could it be that you have multiple names for the OSD hosts?
> That would at least explain one OSD down being shown as two OSDs down.
>
> Also, I believe that min_size should never be smaller than the number of
> data shards, which is 4 in this case.
>
> You can either make a new test setup with your three test OSD hosts using EC
> 2+1, or make e.g. 4+2 but with failure domain set to OSD.
>
> Best,
> Jesper
>
>> On 27 Jul 2022, at 17.32, Frank Schilder <fr...@dtu.dk> wrote:
>>
>> Update: the inactive PG got recovered and active after a loooonngg wait. The
>> middle question is now answered. However, these two questions are still of
>> great worry:
>>
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - If the PG should recover, why is it not prioritised considering its severe
>>   degradation compared with all other PGs?
>>
>> I don't understand how a PG can lose 2 shards if 1 OSD goes down. That
>> looks really, really bad to me (did ceph lose track of data??).
>>
>> The second question is of no less importance. The inactive PG was holding
>> back client IO, leading to further warnings about slow OPS/requests/...
>> Why are such critically degraded PGs not scheduled for recovery first?
>> There is a service outage, but only a health warning?
>>
>> Thanks and best regards.
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Frank Schilder <fr...@dtu.dk>
>> Sent: 27 July 2022 17:19:05
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] PG does not become active
>>
>> I'm testing Octopus 15.2.16 and ran into a problem right away. I'm filling
>> up a small test cluster with 3 hosts (3x3 OSDs) and killed one OSD to see
>> how recovery works. I have one 4+2 EC pool with failure domain host, and on
>> 1 PG of this pool 2 (!!!) shards are missing. This most degraded PG is not
>> becoming active; it is stuck inactive but peered.
>>
>> Questions:
>>
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - Wasn't there an important code change to allow recovery for an EC PG with
>>   at least k shards present, even if min_size > k? Do I have to set
>>   something?
>> - If the PG should recover, why is it not prioritised considering its severe
>>   degradation compared with all other PGs?
>>
>> I have already increased these crush tunables and executed a pg repeer, to
>> no avail:
>>
>> tunable choose_total_tries 250    <-- default 100
>> rule fs-data {
>>         id 1
>>         type erasure
>>         min_size 3
>>         max_size 6
>>         step set_chooseleaf_tries 50    <-- default 5
>>         step set_choose_tries 200       <-- default 100
>>         step take default
>>         step choose indep 0 type osd
>>         step emit
>> }
>>
>> Ceph health detail says to that:
>>
>> [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
>>     pg 4.32 is stuck inactive for 37m, current state
>>     recovery_wait+undersized+degraded+remapped+peered, last acting
>>     [1,2147483647,2147483647,4,5,2]
>>
>> I don't want to cheat and set min_size=k on this pool. It should work by
>> itself.
>>
>> Thanks for any pointers!
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
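The `2147483647` entries in the acting set quoted above can be decoded directly: this value is 2**31 - 1, CRUSH's "none" marker, meaning CRUSH could not map any OSD to that shard position. A short sketch (the constant's meaning is standard CRUSH behaviour; the parsing itself is only illustrative) showing why the health output implies two missing shards even though only one OSD was killed:

```python
# 2147483647 == 2**31 - 1 is CRUSH's "none" marker: no OSD could be
# mapped for that shard position of the erasure-coded PG.
CRUSH_ITEM_NONE = 2**31 - 1

# Acting set reported by `ceph health detail` for pg 4.32 in the thread:
acting = [1, 2147483647, 2147483647, 4, 5, 2]

# Each occurrence of the marker is one unmapped shard.
missing = [pos for pos, osd in enumerate(acting) if osd == CRUSH_ITEM_NONE]
print(f"unmapped shard positions: {missing} ({len(missing)} of {len(acting)})")
# → unmapped shard positions: [1, 2] (2 of 6)
```

With 4 of 6 shards present, the data is still reconstructible (k=4), which is consistent with the PG eventually recovering after the long wait described above.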
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
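The interplay of k, m, and min_size debated in the thread can be summarised as a toy availability check. The `pg_state` helper below is illustrative, not a Ceph API, but its thresholds match documented EC pool behaviour: a PG needs min_size shards up to serve client I/O (default k+1 for EC pools), while k shards suffice to reconstruct the data, which is exactly the gap that left pg 4.32 peered but inactive:

```python
def pg_state(k, m, min_size, shards_up):
    """Toy classification of an EC PG's availability given surviving shards."""
    assert shards_up <= k + m, "cannot have more shards up than k+m"
    if shards_up >= min_size:
        return "active"             # enough shards to serve client I/O
    if shards_up >= k:
        return "inactive (peered)"  # data recoverable, but I/O is blocked
    return "lost"                   # fewer than k shards: data unreadable

# EC 4+2 with the default min_size = k + 1 = 5, and two shards missing,
# as in the thread:
print(pg_state(k=4, m=2, min_size=5, shards_up=4))  # → inactive (peered)
```

This also shows why setting min_size=k "un-sticks" such a PG at the cost of running with no redundancy margin, which is the cheat the original post wanted to avoid.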