Hi,

> On 7 Sep 2023, at 10:05, J-P Methot <[email protected]> wrote:
> 
> We're running latest Pacific on our production cluster and we've been seeing 
> the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700' had timed out after 
> 15.000000954s' error. We have reasons to believe this happens each time the 
> RocksDB compaction process is launched on an OSD. My question is, does the 
> cluster detecting that an OSD has timed out interrupt the compaction process? 
> This seems to be what's happening, but it's not immediately obvious. We are 
> currently facing an infinite loop of random OSDs timing out and if the 
> compaction process is interrupted without finishing, it may explain that.

Are you running online compaction on these OSDs (the `ceph osd compact ${osd_id}` 
command)?



k