On Mon, Oct 6, 2025, 5:34 PM Sa Pham <[email protected]> wrote:

> Hi Frédéric,
>
> I tried restarting every OSD related to that PG many times, but it didn't
> work.
>
> When the primary switched to OSD 101, it was still slow on OSD 101, so I
> don't think it's a hardware issue.
>
> Regards,
>
> On Mon, 6 Oct 2025 at 20:46 Frédéric Nass <[email protected]> wrote:
>
> > Could be an issue with the primary OSD, which is now osd.130. Have you
> > checked osd.130 for any errors?
> > Maybe try restarting osd.130 and osd.302 one at a time, and maybe 101 as
> > well, waiting for all PGs to become active+clean between restarts.
> >
> > Could you please share a ceph status, so we get a better view of the
> > situation?
> >
> > Regards,
> > Frédéric.
> >
> > --
> > Frédéric Nass
> > Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
> > Try our Ceph Analyzer -- https://analyzer.clyso.com/
> > https://clyso.com | [email protected]
> >
> > On Mon, Oct 6, 2025 at 14:19, Sa Pham <[email protected]> wrote:
> >
> >> Hi Frédéric,
> >>
> >> I tried to repeer and deep-scrub, but it's not working.
> >>
> >> Have you already checked the logs for osd.302 and /var/log/messages for
> >> any I/O-related issues?
> >>
> >> => I checked; there are no I/O errors or issues.
> >>
> >> Regards,
> >>
> >> On Mon, Oct 6, 2025 at 3:15 PM Frédéric Nass <[email protected]>
> >> wrote:
> >>
> >>> Hi Sa,
> >>>
> >>> Regarding the output you provided, it appears that osd.302 is listed as
> >>> UP but not ACTING for PG 18.773:
> >>>
> >>> PG_STAT  STATE                                             UP             UP_PRIMARY  ACTING     ACTING_PRIMARY
> >>> 18.773   active+undersized+degraded+remapped+backfilling   [302,150,138]  302         [130,101]  130
> >>>
> >>> Have you already checked the logs for osd.302 and /var/log/messages for
> >>> any I/O-related issues? Could you also try running 'ceph pg repeer 18.773'?
> >>>
> >>> If this is the only PG for which osd.302 is not acting and the 'repeer'
> >>> command does not resolve the issue, I would suggest attempting a
> >>> deep-scrub on this PG.
> >>> This might uncover errors that could potentially be fixed, either online
> >>> or offline.
> >>>
> >>> Regards,
> >>> Frédéric
> >>>
> >>> --
> >>> Frédéric Nass
> >>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
> >>> Try our Ceph Analyzer -- https://analyzer.clyso.com/
> >>> https://clyso.com | [email protected]
> >>>
> >>> On Mon, Oct 6, 2025 at 06:31, Sa Pham <[email protected]> wrote:
> >>>
> >>>> Hello Eugen,
> >>>>
> >>>> This PG contains 254490 objects, size: 68095493667 bytes (~68 GB).
> >>>>
> >>>> Regards,
> >>>>
> >>>> On Fri, Oct 3, 2025 at 9:10 PM Eugen Block <[email protected]> wrote:
> >>>>
> >>>> > Is it possible that this is a huge PG? What size does it have? But it
> >>>> > could also be a faulty disk.
> >>>> >
> >>>> > Quoting Sa Pham <[email protected]>:
> >>>> >
> >>>> > > Hello everyone,
> >>>> > >
> >>>> > > I'm running a Ceph cluster used as an RGW backend, and I'm facing an
> >>>> > > issue with one particular placement group (PG).
> >>>> > >
> >>>> > > - Accessing objects from this PG is *extremely slow*.
> >>>> > > - Even running 'ceph pg <pg_id>' takes a very long time.
> >>>> > > - The PG is currently *stuck in a degraded state*, so I'm unable to
> >>>> > >   move it to other OSDs.
> >>>> > >
> >>>> > > The current Ceph version is Reef 18.2.7.
> >>>> > >
> >>>> > > Has anyone encountered a similar issue before, or have any
> >>>> > > suggestions on how to troubleshoot and resolve it?
> >>>> > >
> >>>> > > Thanks in advance!
> >>>>
> >>>> --
> >>>> Sa Pham Dang
> >>>> Skype: great_bn
> >>>> Phone/Telegram: 0986.849.582
> >>
> >> --
> >> Sa Pham Dang
> >> Skype: great_bn
> >> Phone/Telegram: 0986.849.582
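(For reference, the sequence suggested in the quoted thread boils down to
something like the commands below. The PG and OSD ids are the ones from the
thread; 'ceph orch daemon restart' assumes a cephadm-managed cluster -- on a
non-cephadm deployment, 'systemctl restart ceph-osd@<id>' on the OSD's host
does the same thing.)

    # repeer the stuck PG first, then deep-scrub it if repeering does not help
    ceph pg repeer 18.773
    ceph pg deep-scrub 18.773
    ceph pg 18.773 query      # confirm up/acting sets, object count, recovery state

    # restart the involved OSDs one at a time, waiting for all PGs to return
    # to active+clean in 'ceph -s' between restarts
    ceph orch daemon restart osd.130
    ceph -s
    ceph orch daemon restart osd.302
    ceph -s
    ceph orch daemon restart osd.101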
Hi,

If it's a performance issue, particularly if it's manifesting as high CPU
load, you can usually pinpoint what's going on based on which symbol(s) are
hot according to `perf top -p <pid>`.

If it's not CPU-hot, `iostat` is worth a look to see if the kernel thinks
the block device is busy.

Barring either of those, it gets a bit trickier to tease out, but my advice
would be to first discern whether or not it's a resource issue and work
backwards from there.

Cheers,

Tyler
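(A minimal sketch of the checks described above, run on the host that
currently holds the acting primary -- osd.130 in the pg output quoted
earlier. Getting the PID this way assumes a non-containerized deployment
where the OSD runs as a ceph-osd@<id> systemd unit.)

    # PID of the suspect OSD daemon
    OSD_PID=$(systemctl show -p MainPID --value ceph-osd@130.service)

    # if the daemon is CPU-bound, the hot symbols usually point at the culprit
    perf top -p "$OSD_PID"

    # if it is not CPU-hot, check whether the kernel sees the block device as busy
    iostat -x 1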
