On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy <zol...@linux.vnet.ibm.com> wrote:
> The 2^31-1 in there seems to indicate an overflow somewhere - the way we
> were able to figure out where exactly is to query the PG and compare the
> "up" and "acting" sets - only _one_ of them had the 2^31-1 number in place
> of the correct OSD number. We restarted that and the PG started doing
> its job and recovered.

No, this value is intentional (it shows up as 'None' in higher-level
tools): it means no mapping could be found. Check your crush map and
crush rule (a quick sketch for spotting affected PGs is at the bottom
of this mail).

Paul

> The issue seems to go back to 2015:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001661.html
> however, no solution was posted.
>
> I'm more concerned about the cluster not being able to recover (it's a
> 4+2 EC pool across 12 hosts - plenty of room to heal) than about the
> weird print-out.
>
> The VMs that wanted to access data in any of the affected PGs of course
> died.
>
> Are we missing some settings to let the cluster self-heal even for EC
> pools? This is our first EC pool in production :)
>
> Cheers,
> Zoltan
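As a quick way to list which PGs carry that sentinel, here is a minimal
Python sketch that scans `ceph pg dump --format json` for 2^31-1
(2147483647) in the "up" or "acting" sets. The field names ("pg_map",
"pg_stats", "pgid", "up", "acting") are assumptions based on
Nautilus-era JSON output, so adjust for your release:

#!/usr/bin/env python3
# Sketch: flag PGs whose 'up' or 'acting' set contains the 2^31-1
# sentinel, i.e. slots for which CRUSH could not find an OSD.
# Assumes the JSON layout of `ceph pg dump --format json`.
import json
import subprocess

CRUSH_ITEM_NONE = 2**31 - 1  # rendered as 'None' by higher-level tools

def main():
    raw = subprocess.check_output(
        ["ceph", "pg", "dump", "--format", "json"])
    data = json.loads(raw)
    # Newer releases nest the stats under "pg_map"; older ones do not.
    pg_stats = data.get("pg_map", data).get("pg_stats", [])
    for pg in pg_stats:
        up, acting = pg.get("up", []), pg.get("acting", [])
        if CRUSH_ITEM_NONE in up or CRUSH_ITEM_NONE in acting:
            print(f"{pg['pgid']}: up={up} acting={acting}")

if __name__ == "__main__":
    main()

Once you know which PGs are affected, `ceph pg <pgid> query` and
`ceph osd crush rule dump` can help you compare the rule's failure
domain against what the pool actually needs on your topology.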