Hi Ben,
We are not using an EC pool on that cluster.
The OSDs being marked out almost stopped once we solved our memory issues (we
reduced the memory allocated to the OSDs).
We are no longer working on that cluster, so we have no further information
about the problem.
Jan
On 20/07/2020 07.59, Benoît Knecht wrote:
Hi Jan,
Jan Pekař wrote:
I'm also concerned that this OSD restart caused data degradation and recovery;
the cluster should be clean immediately after the OSD comes back up, since no
client was uploading or modifying data during my tests.
We're experiencing the same thing on our 14.2.10 cluster. After marking an OSD
out, if it's briefly marked down (due to missed heartbeats, or because the
daemon was manually restarted), the PGs that were still mapped on it disappear
all at once, and we get degraded objects as a result.
In our case, those PGs belong to an EC pool, and we use the PG balancer in
upmap mode, so we have a few upmapped PGs on that OSD. Is that the case for you
too?
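In case it helps anyone check: here's a small sketch of how we look for
upmapped PGs on a given OSD, by parsing the output of
`ceph osd dump --format json`. The `pg_upmap_items` key layout below is an
assumption based on what our Nautilus cluster emits, so adjust to taste:

```python
import json
import subprocess

def upmapped_pgs_for_osd(osd_dump, osd_id):
    """Return PG IDs with a pg-upmap-items entry touching osd_id.

    osd_dump is the parsed JSON from `ceph osd dump --format json`.
    Assumed layout (check against your own cluster's output):
      {"pg_upmap_items": [{"pgid": "2.1a",
                           "mappings": [{"from": 12, "to": 7}]}, ...]}
    """
    hits = []
    for item in osd_dump.get("pg_upmap_items", []):
        for mapping in item.get("mappings", []):
            # An OSD can appear as either the source or the target
            # of an upmap exception.
            if osd_id in (mapping.get("from"), mapping.get("to")):
                hits.append(item["pgid"])
                break
    return hits

# Usage on a live cluster (uncomment to run against ceph):
# dump = json.loads(subprocess.check_output(
#     ["ceph", "osd", "dump", "--format", "json"]))
# print(upmapped_pgs_for_osd(dump, 12))
```

If the degraded objects really do correlate with upmapped PGs, comparing this
list before and after the OSD bounce might narrow things down.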
We're going to run some tests to try and better understand what's going on
there, but we welcome any feedback in the meantime.
Cheers,
--
Ben
--
============
Ing. Jan Pekař
jan.pe...@imatic.cz
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
============
--
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io