On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy <zol...@linux.vnet.ibm.com> wrote:
> The 2^31-1 in there seems to indicate an overflow somewhere - the way we
> were able to figure out where exactly is to query the PG and compare the
> "up" and "acting" sets - only _one_ of them had the 2^31-1 number in place
> of the correct OSD number. We restarted that and the PG started doing
> its job and recovered.

No, this value is intentional (it shows up as 'None' in higher-level
tools): it means no mapping could be found. Check your crush map and
crush rule (a quick sketch for spotting affected PGs is at the bottom
of this mail).

Paul

> The issue seems to go back to 2015:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001661.html
> however, no solution was posted.
>
> I'm more concerned about the cluster not being able to recover (it's a
> 4+2 EC pool across 12 hosts - plenty of room to heal) than about the
> weird print-out.
>
> The VMs that wanted to access data in any of the affected PGs of course
> died.
>
> Are we missing some settings to let the cluster self-heal even for EC
> pools? This is our first EC pool in production :)
>
> Cheers,
> Zoltan
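As a quick way to list which PGs carry that sentinel, here is a minimal
Python sketch that scans `ceph pg dump --format json` for 2^31-1
(2147483647) in the "up" or "acting" sets. The field names ("pg_map",
"pg_stats", "pgid", "up", "acting") are assumptions based on
Nautilus-era JSON output, so adjust for your release:

#!/usr/bin/env python3
# Sketch: flag PGs whose 'up' or 'acting' set contains the 2^31-1
# sentinel, i.e. slots for which CRUSH could not find an OSD.
# Assumes the JSON layout of `ceph pg dump --format json`.
import json
import subprocess

CRUSH_ITEM_NONE = 2**31 - 1  # rendered as 'None' by higher-level tools

def main():
    raw = subprocess.check_output(
        ["ceph", "pg", "dump", "--format", "json"])
    data = json.loads(raw)
    # Newer releases nest the stats under "pg_map"; older ones do not.
    pg_stats = data.get("pg_map", data).get("pg_stats", [])
    for pg in pg_stats:
        up, acting = pg.get("up", []), pg.get("acting", [])
        if CRUSH_ITEM_NONE in up or CRUSH_ITEM_NONE in acting:
            print(f"{pg['pgid']}: up={up} acting={acting}")

if __name__ == "__main__":
    main()

Once you know which PGs are affected, `ceph pg <pgid> query` and
`ceph osd crush rule dump` can help you compare the rule's failure
domain against what the pool actually needs on your topology.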