Hello,

We've got some Intel DC S3610 800GB SSDs in operation on cache tiers.
On the ones with G2010150 firmware we've seen _very_ infrequent SATA bus
resets [1], on the order of once per year, and these are fairly busy
critters with an average of 400 IOPS and peaks much higher than that.

Funnily enough, the 8 older SSDs with the G2010140 firmware have never
shown this, and given that all the newer ones have at least once, that's
somewhat conclusive.

What I'm wondering is if there's a knob that allows an OSD to declare
itself down (not out) when any (and all) I/O takes more than x amount of
time.

On the affected OSD we see this, but from the perspective of the other
OSDs and MONs the health of this OSD was never in question of course:
---
2019-03-26 15:33:03.392644 7f09b0500700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f099747a700' had timed out after 15
---
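
To make that symptom easier to spot, here is a minimal sketch (not a Ceph
feature) that pulls these heartbeat_map timeouts out of OSD log text. The
function name find_op_timeouts and the regular expression are illustrative
assumptions based only on the single log line quoted above, so other Ceph
versions may format the message differently.

```python
import re

# Pattern derived from the one log line quoted above (an assumption, not a
# documented Ceph log format):
#   ... heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f099747a700' had timed out after 15
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy\s+'(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)'"
    r"\s+had timed out after (?P<secs>\d+)"
)


def find_op_timeouts(log_text):
    """Return a (pool, thread, seconds) tuple for each timeout message found."""
    # Mail clients and pagers may wrap long log lines, so collapse all
    # whitespace (including newlines) before matching.
    flat = " ".join(log_text.split())
    return [
        (m.group("pool"), m.group("thread"), int(m.group("secs")))
        for m in TIMEOUT_RE.finditer(flat)
    ]
```

An external watchdog could call this on freshly appended log text and alert
when it returns a non-empty list. What to do with the OSD at that point is
exactly the open question above.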

Regards,

Christian


[1]
With these kinds of resets the logging happens after the fact; the whole
event actually takes about 40 seconds:
---
[54954736.886707] ata5.00: exception Emask 0x0 SAct 0xc0 SErr 0x0 action 0x6 frozen
[54954736.887424] ata5.00: failed command: WRITE FPDMA QUEUED
[54954736.887856] ata5.00: cmd 61/20:30:70:a2:da/00:00:25:00:00/40 tag 6 ncq dma 16384 out
                           res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[54954736.888695] ata5.00: status: { DRDY }
[54954736.889112] ata5.00: failed command: WRITE FPDMA QUEUED
[54954736.889527] ata5.00: cmd 61/08:38:b0:8b:71/00:00:26:00:00/40 tag 7 ncq dma 4096 out
                           res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[54954736.890358] ata5.00: status: { DRDY }
[54954736.890781] ata5: hard resetting link
[54954737.205313] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[54954737.206133] ata5.00: configured for UDMA/133
[54954737.206140] ata5: EH complete
---
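
For tracking how often this happens per port, a minimal sketch that
summarizes such excerpts follows. The function name summarize_resets and
the three patterns are assumptions taken from the lines quoted above, not a
general libata log parser. Note that the timestamps only cover the logged
error handling (link reset to "EH complete"), not the roughly 40 seconds of
stalled I/O that precede it.

```python
import re

# Patterns taken from the kernel messages quoted above (illustrative only).
FAILED_RE = re.compile(r"\[(\d+\.\d+)\] (ata\d+)\.\d+: failed command:")
RESET_RE = re.compile(r"\[(\d+\.\d+)\] (ata\d+): hard resetting link")
DONE_RE = re.compile(r"\[(\d+\.\d+)\] (ata\d+): EH complete")


def summarize_resets(dmesg_text):
    """Return one dict per completed hard reset found in dmesg_text."""
    events = []    # finished resets, in log order
    failed = {}    # port -> failed commands seen since the last reset
    pending = {}   # port -> reset that has not reached "EH complete" yet
    for line in dmesg_text.splitlines():
        m = FAILED_RE.search(line)
        if m:
            port = m.group(2)
            failed[port] = failed.get(port, 0) + 1
            continue
        m = RESET_RE.search(line)
        if m:
            port = m.group(2)
            pending[port] = {
                "port": port,
                "reset_at": float(m.group(1)),
                "failed_commands": failed.pop(port, 0),
            }
            continue
        m = DONE_RE.search(line)
        if m and m.group(2) in pending:
            event = pending.pop(m.group(2))
            # Time from link reset to the end of error handling, in seconds.
            event["recovery_seconds"] = round(float(m.group(1)) - event["reset_at"], 6)
            events.append(event)
    return events
```

Fed the output of dmesg (or the kernel log file), this gives one record per
reset, which makes it easy to compare the two firmware versions over time.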

-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
