Wait, first just try restarting the leader mon. See https://tracker.ceph.com/issues/47380 for a related issue.
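If you are not sure which mon is the leader, a minimal sketch (assuming a package-based install where the mons run as ceph-mon@<hostname> systemd units; adjust the unit name for containerized deployments):

    # find the current leader
    ceph quorum_status -f json-pretty | grep quorum_leader_name

    # on that host, restart its mon daemon, e.g. if ceph-01 is the leader:
    systemctl restart ceph-mon@ceph-01

The idea is that the stuck health counter lives with the mons, not with the dead OSD, so forcing a leader re-election is usually enough to make the warning go away without touching any OSDs.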
--
dan

On Mon, May 3, 2021 at 2:55 PM Vladimir Sigunov <vladimir.sigu...@gmail.com> wrote:
>
> Hi Frank,
> Yes, I would purge the osd. The cluster looks absolutely healthy except for
> this osd.580. Probably, the purge will help the cluster to forget this faulty
> one. I would also restart the monitors. With the amount of data you maintain
> in your cluster, I don't think your ceph.conf contains any information about
> particular osds, but if it does, don't forget to remove the osd.580
> configuration from ceph.conf.
>
> Get Outlook for Android<https://aka.ms/ghei36>
>
> ________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: Monday, May 3, 2021 8:37:09 AM
> To: Vladimir Sigunov <vladimir.sigu...@gmail.com>; ceph-users@ceph.io <ceph-users@ceph.io>
> Subject: Re: OSD slow ops warning not clearing after OSD down
>
> Hi Vladimir,
>
> thanks for your reply. I did, the cluster is healthy:
>
> [root@gnosis ~]# ceph status
>   cluster:
>     id:     ---
>     health: HEALTH_WARN
>             430 slow ops, oldest one blocked for 36 sec, osd.580 has slow ops
>
>   services:
>     mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
>     mgr: ceph-01(active), standbys: ceph-02, ceph-03
>     mds: con-fs2-2/2/2 up {0=ceph-08=up:active,1=ceph-12=up:active}, 2 up:standby
>     osd: 584 osds: 578 up, 578 in
>
>   data:
>     pools:   11 pools, 3215 pgs
>     objects: 610.3 M objects, 1.2 PiB
>     usage:   1.5 PiB used, 4.6 PiB / 6.0 PiB avail
>     pgs:     3191 active+clean
>              13   active+clean+scrubbing+deep
>              9    active+clean+snaptrim_wait
>              2    active+clean+snaptrim
>
>   io:
>     client: 358 MiB/s rd, 56 MiB/s wr, 2.35 kop/s rd, 1.32 kop/s wr
>
> [root@gnosis ~]# ceph health detail
> HEALTH_WARN 430 slow ops, oldest one blocked for 36 sec, osd.580 has slow ops
> SLOW_OPS 430 slow ops, oldest one blocked for 36 sec, osd.580 has slow ops
>
> OSD 580 is down+out and the message does not even increment the seconds. It's
> probably stuck in some part of the health checking that tries to query 580
> and doesn't understand that the OSD being down means there are no ops.
>
> I tried to restart the OSD on this disk, but the disk seems completely dead.
> The iDRAC log on the server says that the disk was removed during operation,
> possibly due to a physical connection failure on the SAS lanes. I somehow need
> to get rid of this message and am wondering if purging the OSD would help.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Vladimir Sigunov <vladimir.sigu...@gmail.com>
> Sent: 03 May 2021 13:45:19
> To: ceph-users@ceph.io; Frank Schilder
> Subject: Re: OSD slow ops warning not clearing after OSD down
>
> Hi Frank.
> Check your cluster for inactive/incomplete placement groups. I saw similar
> behavior on Octopus when some pgs were stuck in an incomplete/inactive or
> peering state.
>
> ________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: Monday, May 3, 2021 3:42:48 AM
> To: ceph-users@ceph.io <ceph-users@ceph.io>
> Subject: [ceph-users] OSD slow ops warning not clearing after OSD down
>
> Dear cephers,
>
> I have a strange problem. An OSD went down and recovery finished. For some
> reason, I have a slow ops warning for the failed OSD stuck in the system:
>
>     health: HEALTH_WARN
>             430 slow ops, oldest one blocked for 36 sec, osd.580 has slow ops
>
> The OSD is auto-out:
>
> | 580 | ceph-22 | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
>
> It is probably a warning dating back to just before the failure. How can I
> clear it?
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
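For reference, a rough sketch of the purge Vladimir suggests above, in case the mon restart is not enough and the disk really is unrecoverable (replace 580 with the actual OSD id; the purge command and its safety flag are as in current releases):

    # make sure the osd is marked out and the cluster has finished recovering
    ceph osd out 580
    ceph -s

    # remove it from the crush map, auth database and osd map in one step
    ceph osd purge 580 --yes-i-really-mean-it

If any host still carries an [osd.580] section in its ceph.conf, delete that as well before restarting the mons.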
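And a quick way to run the check from Vladimir's first reply, i.e. look for pgs that are not active+clean (plain ceph CLI, nothing cluster-specific assumed):

    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean
    ceph pg ls incomplete
    ceph pg ls peering

With the status output above showing every pg active+clean (or scrubbing/snaptrimming), these should come back empty, which points back at the mons rather than at peering.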