Re: [ceph-users] Understanding/correcting sudden onslaught of unfound objects

Graham Allan Mon, 19 Feb 2018 13:38:59 -0800


On 02/17/2018 12:48 PM, David Zafman wrote:

The commits below came after v12.2.2 and may impact this issue. When a pg is active+clean+inconsistent, it means that scrub has detected issues with 1 or more replicas of 1 or more objects. An unfound object is a potentially temporary state in which the current set of available OSDs doesn't allow an object to be recovered/backfilled/repaired. When the primary OSD restarts, any unfound objects (an in-memory structure) are reset so that the new set of peered OSDs can determine again which objects are unfound.
I'm not clear in this scenario whether recovery failed to start, recovery hung due to a bug, or recovery stopped (as designed) because of the unfound object. The new recovery_unfound and backfill_unfound states indicate that recovery has stopped due to unfound objects.
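PG states are reported as "+"-joined flag strings (e.g. active+clean+inconsistent), so telling "scrub found inconsistencies" apart from "recovery stopped on unfound objects" is a matter of checking which flags are present. The sketch below is a hypothetical helper, not part of any Ceph tool; the function names and the descriptions in STATE_NOTES are paraphrases of the explanation above, and the second example state string is only illustrative.

```python
from typing import Dict, List, Set

# Hypothetical notes paraphrasing the flags discussed in this thread.
STATE_NOTES: Dict[str, str] = {
    "inconsistent": "scrub detected issues with one or more replicas of one or more objects",
    "recovery_unfound": "recovery stopped because of unfound objects",
    "backfill_unfound": "backfill stopped because of unfound objects",
}


def pg_state_flags(state: str) -> Set[str]:
    """Split a '+'-joined PG state such as 'active+clean+inconsistent' into flags."""
    return {part for part in state.strip().split("+") if part}


def stopped_on_unfound(state: str) -> bool:
    """True if the state says recovery/backfill halted (by design) on unfound objects."""
    return bool(pg_state_flags(state) & {"recovery_unfound", "backfill_unfound"})


def explain(state: str) -> List[str]:
    """Return the notes for every flag in the state that we have a note for."""
    return [STATE_NOTES[f] for f in sorted(pg_state_flags(state)) if f in STATE_NOTES]


if __name__ == "__main__":
    for s in ("active+clean+inconsistent", "active+recovery_unfound+degraded"):
        print(s, stopped_on_unfound(s), explain(s))
```

Note that an inconsistent pg can still be active+clean; only the *_unfound flags say that recovery itself has been halted.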

Thanks for your comments, David. I could certainly enable any additional logging that might help clarify what's going on here, perhaps on the primary OSD for a given pg?

I am still having a hard time understanding why these objects repeatedly get flagged as unfound, when they are downloadable and contain correct data whenever they are not in this state. It is a 4+2 EC pool, so I would think it possible to reconstruct any missing EC chunks.
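The intuition behind a 4+2 pool is that any 4 of the 6 chunks are enough to rebuild an object. A toy model of that rule is sketched below. It assumes, as a simplification, that a chunk only counts if it holds the object version the peered OSDs need, so chunks that exist but are at a stale version do not help. This is not Ceph's actual peering or recovery code; Shard, recoverable, and the version numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

K, M = 4, 2  # the 4+2 profile from the thread: 4 data chunks + 2 coding chunks


@dataclass(frozen=True)
class Shard:
    index: int    # hypothetical: chunk position, 0 .. K+M-1
    version: int  # hypothetical: object version held by this chunk (simplified)


def recoverable(shards: Iterable[Shard], needed_version: int, k: int = K) -> bool:
    """Toy rule: any k distinct chunk positions at the needed version suffice."""
    usable = {s.index for s in shards if s.version == needed_version}
    return len(usable) >= k


if __name__ == "__main__":
    all_six = [Shard(i, 7) for i in range(K + M)]
    print(recoverable(all_six, 7))       # all 6 chunks present
    print(recoverable(all_six[:4], 7))   # 2 chunks lost, still K left
    print(recoverable(all_six[:3], 7))   # only 3 chunks: below K
    stale = all_six[:3] + [Shard(3, 6)]  # 4 chunks, but one at an older version
    print(recoverable(stale, 7))
```

Under this model, an object whose chunks are all physically present can still be unrecoverable at a given moment if too few of the currently peered chunks agree on the required version, which is one possible reading of why the set of unfound objects can change after peering is redone.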

It's an extensive problem; while I have been focusing on examining a couple of specific pgs, the pool in general is showing 2410 pgs inconsistent (out of 4096).


Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
