[ceph-users] Re: Impact of Slow OPS?

2024-04-06 Thread Anthony D'Atri
IIRC the Ceph slow op threshold (osd_op_complaint_time) defaults to 30 seconds.
Naturally, an op over the threshold often means there are more ops sitting
just below the reporting threshold.

120s is, I think, the default Linux hung-task timeout (kernel.hung_task_timeout_secs).
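
To put the "blocked for N sec" figures from the health messages in this thread next to those two numbers, here is a minimal Python sketch. It assumes a 30-second slow-op threshold and a 120-second Linux-side timeout (check what your own cluster and kernel use); the constants, the regular expression, and the summarize() helper are illustrative names, not part of Ceph.

```python
import re

# Assumed values, not read from a live cluster: adjust to your own settings.
SLOW_OP_THRESHOLD_S = 30   # assumed Ceph slow-op reporting threshold
LINUX_TIMEOUT_S = 120      # assumed Linux-side timeout mentioned above

# Hypothetical pattern matching the SLOW_OPS wording seen in this thread.
BLOCKED_RE = re.compile(r"(\d+) slow ops, oldest one blocked for (\d+) sec")


def summarize(line):
    """Extract the slow-op count and oldest age from one health message."""
    m = BLOCKED_RE.search(line)
    if m is None:
        return None  # not a SLOW_OPS health message
    count, age = int(m.group(1)), int(m.group(2))
    return {
        "slow_ops": count,
        "oldest_age_s": age,
        "over_threshold_by_s": age - SLOW_OP_THRESHOLD_S,
        "past_linux_timeout": age >= LINUX_TIMEOUT_S,
    }


# Message text from this thread, joined onto one line.
sample = (
    "Health check update: 3 slow ops, oldest one blocked for 117 sec, "
    "daemons [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. "
    "(SLOW_OPS)"
)
print(summarize(sample))
```

An op that is already 117 seconds old is far past the reporting threshold, which is why it is worth looking for the many ops that never got old enough to be reported.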

> On Apr 6, 2024, at 10:53 AM, David C.  wrote:
> 
> Hi,
> 
> Do slow ops impact data integrity? => No
> Can I generally ignore them? => No :)
>
> This means that some client transactions are blocked for 120 sec (that's a
> lot).
> This could be a lock on the client side (CephFS, essentially), an incident
> on the infrastructure side (a disk about to fail, network instability,
> etc.), ...
>
> When this happens, you need to look at the blocked requests.
> If the same OSD ID shows up consistently, check dmesg and the SMART data of
> the disk.
>
> This can also be an architectural problem (for example, a high IOPS load with
> the osdmap on HDD, all multiplied by erasure coding).
> 
> *David*
> 
> 
>> On Fri, Apr 5, 2024 at 19:42, adam.ther wrote:
>> 
>> Hello,
>> 
>> Do slow ops impact data integrity, or can I generally ignore them? I'm
>> loading 3 hosts over a 10 Gb link and it is saturating the disks or the OSDs.
>>
>> 2024-04-05T15:33:10.625922+ mon.CEPHADM-1 [WRN] Health check
>> update: 3 slow ops, oldest one blocked for 117 sec, daemons
>> [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops.
>> (SLOW_OPS)
>>
>> 2024-04-05T15:33:15.628271+ mon.CEPHADM-1 [WRN] Health check
>> update: 2 slow ops, oldest one blocked for 123 sec, daemons
>> [osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)
>>
>> I guess more to the point, what is the impact here?
>> 
>> Thanks,
>> 
>> Adam
>> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Impact of Slow OPS?

2024-04-06 Thread David C.
Hi,

Do slow ops impact data integrity? => No
Can I generally ignore them? => No :)

This means that some client transactions are blocked for 120 sec (that's a
lot).
This could be a lock on the client side (CephFS, essentially), an incident
on the infrastructure side (a disk about to fail, network instability,
etc.), ...

When this happens, you need to look at the blocked requests.
If the same OSD ID shows up consistently, check dmesg and the SMART data of
the disk.
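
One rough way to check whether a particular OSD ID keeps coming back is to tally the daemons named in successive SLOW_OPS health messages. The Python sketch below does that for the two messages quoted in this thread; the regular expression and the helper names (count_slow_daemons, recurring) are hypothetical, and it only reads the health text, not the blocked requests themselves.

```python
import re
from collections import Counter

# The two health messages from this thread, each joined onto one line.
HEALTH_LINES = [
    "Health check update: 3 slow ops, oldest one blocked for 117 sec, "
    "daemons [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. "
    "(SLOW_OPS)",
    "Health check update: 2 slow ops, oldest one blocked for 123 sec, "
    "daemons [osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. "
    "(SLOW_OPS)",
]

# Hypothetical pattern for the "daemons [...] have slow ops" part.
DAEMONS_RE = re.compile(r"daemons \[([^\]]*)\] have slow ops")


def count_slow_daemons(lines):
    """Count how many health messages mention each daemon."""
    counts = Counter()
    for line in lines:
        m = DAEMONS_RE.search(line)
        if m:
            names = (d.strip() for d in m.group(1).split(","))
            counts.update(d for d in names if d)
    return counts


def recurring(counts, total):
    """Daemons that appear in every one of the `total` messages."""
    return sorted(d for d, n in counts.items() if n == total)


counts = count_slow_daemons(HEALTH_LINES)
print(recurring(counts, len(HEALTH_LINES)))
```

With only two samples this is weak evidence, but a single OSD that stands out over many messages would be the first candidate for the dmesg and SMART checks, while a long list of recurring OSDs points more toward general load.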

This can also be an architectural problem (for example, a high IOPS load with
the osdmap on HDD, all multiplied by erasure coding).

*David*


On Fri, Apr 5, 2024 at 19:42, adam.ther wrote:

> Hello,
>
> Do slow ops impact data integrity, or can I generally ignore them? I'm
> loading 3 hosts over a 10 Gb link and it is saturating the disks or the OSDs.
>
> 2024-04-05T15:33:10.625922+ mon.CEPHADM-1 [WRN] Health check
> update: 3 slow ops, oldest one blocked for 117 sec, daemons
> [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops.
> (SLOW_OPS)
>
> 2024-04-05T15:33:15.628271+ mon.CEPHADM-1 [WRN] Health check
> update: 2 slow ops, oldest one blocked for 123 sec, daemons
> [osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)
>
> I guess more to the point, what is the impact here?
>
> Thanks,
>
> Adam
>