Dear all,

We're running Ceph Luminous and we've recently hit an issue with some OSDs 
(OSDs being automatically marked out, IO/CPU overload), which unfortunately 
left one placement group in the state "stale+active+clean". It's a placement 
group from the .rgw.root pool:

PG_STAT             1.15
OBJECTS             0
MISSING_ON_PRIMARY  0
DEGRADED            0
MISPLACED           0
UNFOUND             0
BYTES               0
LOG                 1
DISK_LOG            1
STATE               stale+active+clean
STATE_STAMP         2020-05-11 23:22:51.396288
VERSION             40'1
REPORTED            2142:152
UP                  [3,2,6]
UP_PRIMARY          3
ACTING              [3,2,6]
ACTING_PRIMARY      3
LAST_SCRUB          40'1
SCRUB_STAMP         2020-04-22 00:46:05.904418
LAST_DEEP_SCRUB     40'1
DEEP_SCRUB_STAMP    2020-04-20 20:18:13.371396
SNAPTRIMQ_LEN       0

I guess there is no active replica of that placement group anywhere in the 
cluster. Restarting the osd.3, osd.2 or osd.6 daemons does not help.
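
For completeness, this is roughly how I checked the mapping and restarted the 
daemons (a minimal sketch; the systemd unit names assume the standard 
ceph-osd@N units on our hosts):

ceph pg map 1.15                # confirms up/acting [3,2,6] with primary 3
systemctl restart ceph-osd@3    # likewise ceph-osd@2 and ceph-osd@6 on their hosts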

I've used ceph-objectstore-tool and successfully exported the placement group 
from osd.3, osd.2 and osd.6. The exports differ slightly in file size; the one 
from osd.3, which was the latest primary, is the biggest, so I tried to import 
that one on a completely different OSD (a rough sketch of the commands follows 
after the log snippet below). When that OSD starts up, I see the following in 
its log (this is from osd.1):
2020-05-14 21:43:19.779740 7f7880ac3700  1 osd.1 pg_epoch: 2459 pg[1.15( v 40'1 
(0'0,40'1] local-lis/les=2073/2074 n=0 ec=73/39 lis/c 2073/2073 les/c/f 
2074/2074/633 2145/39/2145) [] r=-1 lpr=2455 crt=40'1 lcod 0'0 unknown NOTIFY] 
state<Start>: transitioning to Stray
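
The export/import was done roughly as follows (a minimal sketch, not the exact 
transcript; the data paths and file names below are placeholders for our actual 
layout, and noout was set while the daemons were down):

# on the source host, with the OSD stopped (repeated for osd.3, osd.2, osd.6)
ceph osd set noout
systemctl stop ceph-osd@3
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
    --pgid 1.15 --op export --file /tmp/pg1.15.osd3.export
systemctl start ceph-osd@3

# on the target host (osd.1 in this case), also with the OSD stopped
# (on filestore OSDs --journal-path would be needed as well)
systemctl stop ceph-osd@1
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
    --pgid 1.15 --op import --file /tmp/pg1.15.osd3.export
systemctl start ceph-osd@1
ceph osd unset noout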

I see from previous pg dumps (from several weeks earlier, while the pg was 
still active+clean) that it held 115 bytes and zero objects, but I am not sure 
how to interpret that.

As this is a pg from the .rgw.root pool, I cannot get any response from the 
cluster when accessing it (everything times out).
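
Concretely, even simple commands against that pool hang, for example:

rados -p .rgw.root ls        # times out / never returns
radosgw-admin realm list     # same, since it reads its metadata from .rgw.root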

What is the correct course of action with this pg?

Any help would be greatly appreciated.

Thanks,
Tomislav