Is this the only PG that is undersized + degraded, or could this be the tail end of a recovery event, e.g. a small percentage of degraded objects?

ceph status output would help as well. There is also a quick loop at the bottom of this mail, below the quoted thread, for comparing the OSDs in that PG's up/acting sets.
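A minimal first-pass sketch (untested; the PG id 18.773 is taken from the dump_stuck output quoted below). To see whether 18.773 is the only degraded/undersized PG or just the tail end of a wider recovery:

    ceph status
    ceph health detail
    ceph pg ls degraded

and for per-PG detail on the slow one (this can itself hang if the primary is struggling):

    ceph pg 18.773 query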
On Thu, 2 Oct 2025 at 17:53, Sa Pham <[email protected]> wrote:
> Hi Anthony,
>
> I don't see any errors in dmesg or in the SMART data for those OSDs.
>
> Regards
>
> On Fri, 3 Oct 2025 at 07:08 Anthony D'Atri <[email protected]> wrote:
>
> > Check dmesg / /var/log/messages on the hosts with that PG’s OSDs for
> > errors, and run a `smartctl -a` on each of those OSDs’ drives.
> >
> > > On Oct 2, 2025, at 7:55 PM, Sa Pham <[email protected]> wrote:
> > >
> > > Hi Joshua,
> > >
> > > No, the OSD is still responding:
> > >
> > > # ceph tell osd.130 version
> > > {
> > >     "version": "18.2.7-0-g6b0e988052e",
> > >     "release": "reef",
> > >     "release_type": "stable"
> > > }
> > >
> > > But the primary OSD that holds the slow PG (18.773) responds more slowly.
> > >
> > > Details below:
> > >
> > > # ceph pg dump_stuck degraded
> > > PG_STAT  STATE                                             UP             UP_PRIMARY  ACTING     ACTING_PRIMARY
> > > 18.773   active+undersized+degraded+remapped+backfilling  [302,150,138]  302         [130,101]  130
> > >
> > > # time ceph tell osd.130 version
> > > {
> > >     "version": "18.2.7-0-g6b0e988052e",
> > >     "release": "reef",
> > >     "release_type": "stable"
> > > }
> > >
> > > real  0m2.113s
> > > user  0m0.148s
> > > sys   0m0.036s
> > >
> > > # time ceph tell osd.101 version
> > > {
> > >     "version": "18.2.7-0-g6b0e988052e",
> > >     "release": "reef",
> > >     "release_type": "stable"
> > > }
> > >
> > > real  0m0.192s
> > > user  0m0.152s
> > > sys   0m0.037s
> > >
> > > I don't know why.
> > >
> > > Regards,
> > >
> > > On Fri, Oct 3, 2025 at 4:21 AM Joshua Blanch <[email protected]> wrote:
> > >
> > >> Could it be an OSD not responding?
> > >>
> > >> I would usually run
> > >>
> > >> ceph tell osd.* version
> > >>
> > >> to test whether the OSDs can be reached.
> > >>
> > >> On Thu, Oct 2, 2025 at 12:09 PM Sa Pham <[email protected]> wrote:
> > >>
> > >>> Hello everyone,
> > >>>
> > >>> I'm running a Ceph cluster used as an RGW backend, and I'm facing an issue
> > >>> with one particular placement group (PG):
> > >>>
> > >>> - Accessing objects from this PG is extremely slow.
> > >>> - Even running ceph pg <pg_id> takes a very long time.
> > >>> - The PG is currently stuck in a degraded state, so I'm unable to move
> > >>>   it to other OSDs.
> > >>>
> > >>> The current Ceph version is reef 18.2.7.
> > >>>
> > >>> Has anyone encountered a similar issue before, or have any suggestions on
> > >>> how to troubleshoot and resolve it?
> > >>>
> > >>> Thanks in advance!
> > >
> > > --
> > > Sa Pham Dang
> > > Skype: great_bn
> > > Phone/Telegram: 0986.849.582
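Here is the loop mentioned above (a rough, untested sketch). The OSD ids are taken from the dump_stuck output (up set 302,150,138; acting set 130,101), and I believe dump_ops_in_flight works through `ceph tell` on reef, but adjust if your build disagrees:

    for osd in 130 101 302 150 138; do
        echo "== osd.$osd =="
        ceph osd find $osd | grep -m1 '"host"'
        # compare response latency across the set (osd.130 was the slow one)
        time ceph tell osd.$osd version > /dev/null
        # any ops currently stuck on this OSD?
        ceph tell osd.$osd dump_ops_in_flight | grep -c '"description"'
    done

    # map the acting primary back to its host and devices, then check
    # dmesg / smartctl there as Anthony suggested
    ceph osd metadata 130 | grep -E '"hostname"|"devices"'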
