Personally, before taking extreme measures like marking it lost, I would try bringing the OSD back up, so that it is up and out. I believe Ceph will still find the data and rebalance it away from that OSD.
-Ben

On Thu, Apr 6, 2017 at 11:20 AM, David Welch <dwe...@thinkars.com> wrote:
> Hi,
> We had a disk on the cluster that was not responding properly and causing
> 'slow requests'. The osd on the disk was stopped and the osd was marked
> down and then out. Rebalancing succeeded but (some?) pgs from that osd are
> now stuck in stale+active+clean state, which is not being resolved (see
> below for query results).
>
> My question: is it better to mark this osd as "lost" (i.e. 'ceph osd lost
> 14') or to remove the osd as detailed here:
> https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
>
> Thanks,
> David
>
> $ ceph health detail
> HEALTH_ERR 17 pgs are stuck inactive for more than 300 seconds; 17 pgs stale; 17 pgs stuck stale
> pg 7.f3 is stuck stale for 6138.330316, current state stale+active+clean, last acting [14]
> pg 7.bd is stuck stale for 6138.330365, current state stale+active+clean, last acting [14]
> pg 7.b6 is stuck stale for 6138.330374, current state stale+active+clean, last acting [14]
> pg 7.c5 is stuck stale for 6138.330363, current state stale+active+clean, last acting [14]
> pg 7.ac is stuck stale for 6138.330385, current state stale+active+clean, last acting [14]
> pg 7.5b is stuck stale for 6138.330678, current state stale+active+clean, last acting [14]
> pg 7.1b4 is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
> pg 7.182 is stuck stale for 6138.330445, current state stale+active+clean, last acting [14]
> pg 7.1f8 is stuck stale for 6138.330720, current state stale+active+clean, last acting [14]
> pg 7.53 is stuck stale for 6138.330697, current state stale+active+clean, last acting [14]
> pg 7.1d2 is stuck stale for 6138.330663, current state stale+active+clean, last acting [14]
> pg 7.70 is stuck stale for 6138.330742, current state stale+active+clean, last acting [14]
> pg 7.14f is stuck stale for 6138.330585, current state stale+active+clean, last acting [14]
> pg 7.23 is stuck stale for 6138.330610, current state stale+active+clean, last acting [14]
> pg 7.153 is stuck stale for 6138.330600, current state stale+active+clean, last acting [14]
> pg 7.cc is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
> pg 7.16b is stuck stale for 6138.330509, current state stale+active+clean, last acting [14]
>
> $ ceph pg dump_stuck stale
> ok
> pg_stat  state               up    up_primary  acting  acting_primary
> 7.f3     stale+active+clean  [14]  14          [14]    14
> 7.bd     stale+active+clean  [14]  14          [14]    14
> 7.b6     stale+active+clean  [14]  14          [14]    14
> 7.c5     stale+active+clean  [14]  14          [14]    14
> 7.ac     stale+active+clean  [14]  14          [14]    14
> 7.5b     stale+active+clean  [14]  14          [14]    14
> 7.1b4    stale+active+clean  [14]  14          [14]    14
> 7.182    stale+active+clean  [14]  14          [14]    14
> 7.1f8    stale+active+clean  [14]  14          [14]    14
> 7.53     stale+active+clean  [14]  14          [14]    14
> 7.1d2    stale+active+clean  [14]  14          [14]    14
> 7.70     stale+active+clean  [14]  14          [14]    14
> 7.14f    stale+active+clean  [14]  14          [14]    14
> 7.23     stale+active+clean  [14]  14          [14]    14
> 7.153    stale+active+clean  [14]  14          [14]    14
> 7.cc     stale+active+clean  [14]  14          [14]    14
> 7.16b    stale+active+clean  [14]  14          [14]    14
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
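
For what it's worth, a quick way to confirm that every stuck PG really points at the same OSD is to group the "is stuck stale" lines from the health output by their last acting set. The following is only a rough sketch: the helper name stale_pgs_by_acting and the regular expression are my own assumptions, not part of Ceph, and the sample text is just the first few lines of the output quoted above.

```python
import re
from collections import defaultdict

# Matches lines of the form seen in the quoted output, e.g.:
#   pg 7.f3 is stuck stale for 6138.330316, current state stale+active+clean, last acting [14]
STUCK_RE = re.compile(
    r"pg (?P<pgid>\S+) is stuck stale for (?P<secs>[\d.]+), "
    r"current state (?P<state>[^,\s]+), last acting \[(?P<acting>[\d,\s]*)\]"
)


def stale_pgs_by_acting(health_detail):
    """Group stuck-stale PG ids by last acting OSD set (hypothetical helper)."""
    groups = defaultdict(list)
    for match in STUCK_RE.finditer(health_detail):
        # The acting set is printed as "[14]" or "[3,14]"; turn it into a tuple of ints.
        osds = match.group("acting").replace(",", " ").split()
        groups[tuple(int(osd) for osd in osds)].append(match.group("pgid"))
    return dict(groups)


# First three PG lines from the quoted 'ceph health detail' output.
SAMPLE = """\
HEALTH_ERR 17 pgs are stuck inactive for more than 300 seconds; 17 pgs stale; 17 pgs stuck stale
pg 7.f3 is stuck stale for 6138.330316, current state stale+active+clean, last acting [14]
pg 7.bd is stuck stale for 6138.330365, current state stale+active+clean, last acting [14]
pg 7.b6 is stuck stale for 6138.330374, current state stale+active+clean, last acting [14]
"""

if __name__ == "__main__":
    for acting, pgs in stale_pgs_by_acting(SAMPLE).items():
        print("last acting", list(acting), "->", len(pgs), "pgs:", " ".join(pgs))
```

If the only key that comes back is (14,), as it appears to be here, then osd.14 was the sole OSD last reported for those PGs, which is probably why it is worth trying to bring it back up before declaring it lost.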