ISTR that the Ceph slow op threshold defaults to 30 or 32 seconds. Naturally,
an op over the threshold often means there are more ops below the reporting
threshold.
I think 120 s is the default Linux op timeout.
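
A minimal sketch (Python, not a Ceph tool) of the "same OSD keeps showing up"
check David describes below. It counts which daemons are named in SLOW_OPS
health updates, for example lines copied from the cluster log, and lists the
ones named in every update. It assumes each update is on one line (or wrapped
only between "daemons" and the bracket) and uses the
`daemons [...] have slow ops` wording seen in this thread. The function names
are made up for illustration.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches the daemon list of a SLOW_OPS health update, e.g.
# "... daemons [osd.0,osd.13,osd.14] have slow ops. (SLOW_OPS)".
# \s* also tolerates a line break between "daemons" and "[".
DAEMONS_RE = re.compile(r"daemons\s*\[([^\]]*)\]\s*have slow ops")


def tally_slow_op_daemons(log_text: str) -> tuple[int, Counter]:
    """Return (number of SLOW_OPS updates, per-daemon mention counts)."""
    counts: Counter = Counter()
    updates = 0
    for match in DAEMONS_RE.finditer(log_text):
        updates += 1
        for name in match.group(1).split(","):
            name = name.strip()
            if name:
                counts[name] += 1
    return updates, counts


def persistent_daemons(log_text: str) -> list[str]:
    """Daemons named in every SLOW_OPS update: candidates for dmesg/SMART checks."""
    updates, counts = tally_slow_op_daemons(log_text)
    return sorted(d for d, n in counts.items() if updates and n == updates)


if __name__ == "__main__":
    # The two health updates quoted in this thread, timestamps trimmed.
    sample = (
        "[WRN] Health check update: 3 slow ops, oldest one blocked for 117 sec, "
        "daemons [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)\n"
        "[WRN] Health check update: 2 slow ops, oldest one blocked for 123 sec, "
        "daemons [osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)\n"
    )
    print(persistent_daemons(sample))
```

On the two updates quoted below it prints
['osd.0', 'osd.14', 'osd.17', 'osd.3', 'osd.4', 'osd.9'] (sorted as strings,
not numerically).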
> On Apr 6, 2024, at 10:53 AM, David C. wrote:
>
> Hi,
>
> Do slow ops impact data integrity => No
> Can I generally ignore it => No :)
>
> This means that some client transactions are blocked for 120 sec (that's a
> lot).
> This could be a lock on the client side (CephFS, essentially), an incident
> on the infrastructure side (a disk about to fail, network instability,
> etc.), ...
>
> When this happens, you need to look at the blocked requests.
> If the same OSD ID keeps showing up, look at dmesg and the SMART data for
> that disk.
>
> This can also be an architectural problem (for example, high IOPS load with
> osdmap on HDD, all multiplied by erasure coding).
>
> *David*
>
>
>> On Fri, Apr 5, 2024 at 19:42, adam.ther wrote:
>>
>> Hello,
>>
>> Do slow ops impact data integrity, or can I generally ignore them? I'm
>> loading 3 hosts over a 10 Gb/s link and it's saturating the disks or the OSDs.
>>
>> 2024-04-05T15:33:10.625922+ mon.CEPHADM-1 [WRN] Health check
>> update: 3 slow ops, oldest one blocked for 117 sec, daemons
>> [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops.
>> (SLOW_OPS)
>>
>> 2024-04-05T15:33:15.628271+ mon.CEPHADM-1 [WRN] Health check
>> update: 2 slow ops, oldest one blocked for 123 sec, daemons
>> [osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)
>>
>> I guess, more to the point: what's the impact here?
>>
>> Thanks,
>>
>> Adam
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>