Check the ceph.log on your mons for "stat mismatch" and grep for the PG in
question; that may turn up more information.

Additionally, "rados list-inconsistent-obj {pgid}" will often show which OSDs
and objects are implicated in the inconsistency. If the acting set has
changed since the scrub in which the inconsistency was found (for example,
an OSD was removed or failed), this data won't be there any longer and you
would need to deep-scrub the PG again to get that information.
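
As a rough sketch of the steps above, something like the following could be
used. The function names, the default log path and the example PG id are
assumptions for illustration only; on cephadm deployments the cluster log may
live under /var/log/ceph/<fsid>/ instead, and only if logging to file is
enabled.

```shell
#!/bin/sh
# Hypothetical helpers; names and default path are illustrative assumptions.

# Print cluster-log lines that mention both "stat mismatch" and the given PG.
# Note: plain substring matching, so "2.1a" would also match "12.1a".
stat_mismatch_lines() {
    pgid="$1"
    log="${2:-/var/log/ceph/ceph.log}"   # assumed default location
    grep -F "stat mismatch" "$log" | grep -F -- "$pgid"
}

# Show which objects and OSD shards the last scrub flagged for this PG.
inconsistent_objects() {
    rados list-inconsistent-obj "$1" --format=json-pretty
}

# Ask for a fresh deep-scrub if the scrub data is gone
# (for example after the acting set changed).
rescrub_pg() {
    ceph pg deep-scrub "$1"
}

# Example usage on a mon / admin node (2.1a is a made-up PG id):
#   stat_mismatch_lines 2.1a
#   inconsistent_objects 2.1a
#   rescrub_pg 2.1a
```

The deep-scrub is only scheduled by that last command, so the
list-inconsistent-obj output will be repopulated once it has actually run.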

Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
w...@wesdillingham.com




On Fri, Apr 12, 2024 at 6:56 AM Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:

>
> Hello Albert,
>
> Have you checked the hardware status of the involved drives other than with
> smartctl? For example, with the manufacturer's tools / WebUI (iDRAC / perccli
> for Dell hardware).
> If these tools don't report any media errors (that is, bad blocks on the
> disks), then you might just be facing the bit rot phenomenon. But this is
> very rare and should happen in a sysadmin's lifetime about as often as a
> Royal Flush hand in a professional poker player's lifetime. ;-)
>
> If no media error is reported, then you might want to check and update the
> firmware of all drives.
>
> Once you've figured it out, you may enable osd_scrub_auto_repair=true to have
> these inconsistencies repaired automatically on deep-scrubbing, but make
> sure you're using the alerts module [1] so that you at least get informed
> about the scrub errors.
>
> Regards,
> Frédéric.
>
> [1] https://docs.ceph.com/en/latest/mgr/alerts/
>
> ----- On 12 Apr 24, at 11:59, Albert Shih albert.s...@obspm.fr wrote:
>
> > Hi everyone.
> >
> > I got a warning with
> >
> > root@cthulhu1:/etc/ceph# ceph -s
> >  cluster:
> >    id:     9c5bb196-c212-11ee-84f3-c3f2beae892d
> >    health: HEALTH_ERR
> >            1 scrub errors
> >            Possible data damage: 1 pg inconsistent
> >
> > So I found the PG with the issue and launched a pg repair (still waiting).
> >
> > But I am trying to find out «why», so I checked all the OSDs related to
> > this PG and didn't find anything: no errors from the OSD daemon, no errors
> > from smartctl, no errors in the kernel messages.
> >
> > So I'd just like to know if that's «normal» or if I should dig deeper.
> >
> > JAS
> > --
> > Albert SHIH 🦫 🐸
> > France
> > Local time:
> > Fri. 12 April 2024 11:51:37 CEST
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
