On Mon, Sep 17, 2018 at 8:21 AM Graham Allan <g...@umn.edu> wrote:

>
>
> On 09/14/2018 02:38 PM, Gregory Farnum wrote:
> > On Thu, Sep 13, 2018 at 3:05 PM, Graham Allan <g...@umn.edu> wrote:
> >>
> >> However I do see transfer errors fetching some files out of radosgw - the
> >> transfer just hangs then aborts. I'd guess this is probably due to one pg
> >> stuck down, due to a lost (failed HDD) osd. I think there is no alternative
> >> but to declare the osd lost, though I wish I understood better the
> >> implications of the "recovery_state" and "past_intervals" output by ceph pg
> >> query: https://pastebin.com/8WrYLwVt
> >
> > What are you curious about here? The past intervals is listing the
> > OSDs which were involved in the PG since it was last clean, then each
> > acting set and the intervals it was active for.
>
> That's pretty much what I'm looking for - confirmation that the pg can
> roll back to an earlier interval if there were no writes, once the
> current osd has been declared lost.
>
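
For reference, the relevant bits can be pulled straight out of the query
output. These are the standard commands, but the exact JSON layout varies a
bit between releases, so treat this as a rough sketch ("<pgid>" is a
placeholder):

    # dump the full peering/recovery state for the stuck pg
    ceph pg <pgid> query > pg-query.json

    # current up and acting sets
    jq '.up, .acting' pg-query.json

    # peering history; past_intervals (which OSD sets have served the pg
    # since it was last clean) shows up under the peering entries here
    jq '.recovery_state' pg-query.json
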
> >> I find it disturbing/odd that the acting set of osds lists only 3/6
> >> available; that implies that without getting one of these back it would
> >> be impossible to recover the data (from 4+2 EC). However, the dead osd 98
> >> only appears in the most recent (?) interval - presumably during the
> >> flapping period, during which time client writes were unlikely (radosgw
> >> disabled).
> >>
> >> So if 98 were marked lost, would it roll back to the prior interval? I am
> >> not certain how to interpret this information!
> >
> > Yes, that’s what should happen if it’s all as you outline here.
> >
> > It *is* quite curious that the PG apparently went active with only 4
> > members in a 4+2 system — it's supposed to require at least k+1 (here,
> > 5) by default. Did you override the min_size or something?
> > -Greg
>
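
For what it's worth, marking the dead osd lost is done with something like
the following (it's irreversible, so only once you're sure the disk is
really gone):

    # tell the cluster osd.98 will never come back, so peering can
    # proceed without it
    ceph osd lost 98 --yes-i-really-mean-it

After that the pg should re-peer and, if there really were no writes in
that last interval, roll back to the earlier one.
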
> Looking back through history, it seems that I *did* override the min_size
> for this pool; however, I didn't reduce it - it used to have min_size 2!
> That made no sense to me - I think it must be an artifact of a very early
> (hammer?) ec pool creation, but it pre-dates me.
>
> I found the documentation on what min_size should be a bit confusing,
> which is how I arrived at 4. Fully agree that k+1=5 makes way more sense.
>
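
If it's useful, the current settings are easy to double-check (pool and
profile names below are placeholders):

    # which erasure-code profile the pool uses, and its k/m values
    ceph osd pool get <poolname> erasure_code_profile
    ceph osd erasure-code-profile get <profilename>

    # the pool's current min_size
    ceph osd pool get <poolname> min_size
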
> I don't think I was the only one confused by this though, eg
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026445.html
>
> I suppose the safest thing to do is update min_size->5 right away to
> force any size-4 pgs down until they can perform recovery. I can set
> force-recovery on these as well...
>

Mmm, this is embarrassing but that actually doesn't quite work due to
https://github.com/ceph/ceph/pull/24095, which has been on my task list but
at the bottom for a while. :( So if your cluster is stable now I'd let it
clean up and then change the min_size once everything is repaired.
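
Once everything is active+clean again, the change itself is just the usual
pool setting, and in the meantime force-recovery (Luminous and later) will
bump the priority of the degraded pgs - roughly:

    # prioritize recovery of the still-degraded pgs ("<pgid>" placeholders)
    ceph pg force-recovery <pgid> [<pgid> ...]

    # after recovery completes, raise min_size to k+1
    ceph osd pool set <poolname> min_size 5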


>
> Is there any setting which can permit these pgs to fulfil reads while
> refusing writes when active size=k?
>

No, that's unfortunately infeasible.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
