On 10/23/20 3:07 AM, Frank Schilder wrote:
Hi Michael.

I still don't see any traffic to the pool, though I'm also unsure how much 
traffic is to be expected.

Probably not much. If ceph df shows that the pool contains some objects, I 
guess that's sorted.
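
For what it's worth, something like the following should confirm that (the pool name is a placeholder):

    ceph df detail                     # OBJECTS column per pool
    rados -p <data-pool> ls | head     # lists a few object names from the fs data pool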

That osdmaptool crashes indicates that your cluster is running with corrupted 
internal data. I tested your crush map and you should get complete PGs for the 
fs data pool. That you don't, and that osdmaptool crashes, both point to corruption 
of internal data. I'm afraid this is the point where you need support from the ceph 
developers and should file a tracker report 
(https://tracker.ceph.com/projects/ceph/issues). A short description of how the 
situation arose, together with the osdmaptool output and a link to this 
thread, should be sufficient. Please post a link to the ticket here.

https://tracker.ceph.com/issues/48059
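
For reference, the mapping check in question is normally run against an exported 
osdmap, roughly like this (the pool id is a placeholder):

    ceph osd getmap -o /tmp/osdmap
    osdmaptool /tmp/osdmap --test-map-pgs --pool <pool-id>   # how PGs of that pool map to OSDs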

In parallel, you should probably open a new thread focussed on the osd map 
corruption. Maybe there are low-level commands to repair it.

Will do.

You should hold off on cleaning up the unfound objects until this is 
resolved. I'm not sure about adding further storage either. To me, this sounds 
quite serious.
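
(For reference, the cleanup being deferred would be the usual per-PG command, e.g.

    ceph pg <pgid> mark_unfound_lost revert   # or 'delete'; <pgid> is a placeholder

where both 'revert' and 'delete' can lose data, so it's best left until the 
corruption question is settled.)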

Another approach I'm considering is to create a new pool on the same set of OSDs, add it to the set of cephfs data pools, and migrate the data from the "broken" pool to the new pool.
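
Roughly the sketch I have in mind (pool name, fs name, and PG count are placeholders):

    ceph osd pool create cephfs_data_new 128
    ceph fs add_data_pool <fsname> cephfs_data_new
    # point a directory at the new pool; only newly created files use the new layout
    setfattr -n ceph.dir.layout.pool -v cephfs_data_new /mnt/cephfs/<dir>

Existing files keep their old layout, so copying/rewriting them is what actually 
moves the data off the "broken" pool.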

I have some additional unused storage that I could add to this new pool, if I can figure out the right crush rules to make sure the new OSDs don't get used for the "broken" pool as well.
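
One way that might work is to give the new OSDs their own device class and restrict 
the new pool's rule to it (class, rule, pool names and osd ids below are made up):

    ceph osd crush set-device-class newdisks osd.20 osd.21   # rm-device-class first if a class is already set
    ceph osd crush rule create-replicated fs-new-data default host newdisks
    ceph osd pool set cephfs_data_new crush_rule fs-new-data

The catch is that the existing pool's rule would also need to be restricted to the 
old device class (or the new OSDs placed under a separate CRUSH root), otherwise 
they would still be candidates for the "broken" pool.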

--Mike
