We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5-port backplane 
port multipliers (the "Backblaze storage pod").  Under intense IO (currently a 
ZFS rebuild) the system locks up all IO for 3-4 minutes and the following 
entries appear in dmesg:

siisch11: Timeout on slot 30
siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
siisch11:  ... waiting for slots 25000000
siisch11: Timeout on slot 26
siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
siisch11:  ... waiting for slots 21000000
siisch11: Timeout on slot 29
siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
siisch11:  ... waiting for slots 01000000
siisch11: Timeout on slot 24
siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
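
For anyone reading the log: the "ss"/"rs" values and the "waiting for slots" 
values look like per-slot bitmasks, since the set bits line up with the slot 
numbers that time out. That is my reading of the output, not something 
confirmed from the driver source. A small Python sketch to decode them (the 
function name is just an illustration):

```python
def slots_from_mask(mask_hex):
    """Return the slot numbers whose bits are set in a 32-bit hex mask."""
    mask = int(mask_hex, 16)
    # Walk from bit 31 down to bit 0 and keep the positions that are set.
    return [bit for bit in range(31, -1, -1) if mask & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the siisch11 log lines above.
    for label, value in [
        ("ss", "65000000"),
        ("waiting", "25000000"),
        ("waiting", "21000000"),
        ("waiting", "01000000"),
    ]:
        print(label, value, slots_from_mask(value))
```

Read this way, 65000000 covers slots 30, 29, 26 and 24, which are exactly the 
four slots reported as timing out, and each "waiting for slots" value drops 
the slot that just timed out.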

The errors are on different siisch devices, so it's not likely to be a SATA 
cable issue unless multiple cables all went bad at the same time.  On the 
advice of some other posts to the mailing list I've already tried locking the 
SATA revision to 1 with the following in /boot/loader.conf, which didn't help:

hint.siisch.0.sata_rev=1
hint.siisch.1.sata_rev=1
hint.siisch.2.sata_rev=1
hint.siisch.3.sata_rev=1
hint.siisch.4.sata_rev=1
hint.siisch.5.sata_rev=1
hint.siisch.6.sata_rev=1
hint.siisch.7.sata_rev=1
hint.siisch.8.sata_rev=1
hint.siisch.9.sata_rev=1
hint.siisch.10.sata_rev=1
hint.siisch.11.sata_rev=1
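
In case it helps anyone reproduce this, the twelve hint lines can be 
generated rather than typed by hand. This is only a sketch: it assumes the 
channels are numbered 0 through 11 as above (3 controllers with 4 ports 
each), and the function name is made up for illustration.

```python
def sata_rev_hints(channels=12, rev=1):
    """Build one loader.conf hint line per siisch channel."""
    return [f"hint.siisch.{unit}.sata_rev={rev}" for unit in range(channels)]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before going into /boot/loader.conf.
    print("\n".join(sata_rev_hints()))
```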

From time to time this is also causing one of the attached drives to go offline:

siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
(ada0:siisch0:0:0:0): lost device
(ada0:siisch0:0:0:0): removing device entry
ada0 at siisch0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
siisch11: Timeout on slot 30
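
To see how the timeouts and drive losses are spread across channels, 
something like the following could tally them from saved dmesg output. It is 
a rough sketch that only matches the two message formats shown in this post; 
the function name and the regular expressions are my own.

```python
import re
from collections import Counter

# Matches e.g. "siisch11: Timeout on slot 30"
TIMEOUT_RE = re.compile(r"^(siisch\d+): Timeout on slot (\d+)")
# Matches e.g. "(ada0:siisch0:0:0:0): lost device"
LOST_RE = re.compile(r"^\((\w+):(siisch\d+):[\d:]+\): lost device")


def tally(dmesg_text):
    """Count timeouts per channel and list (disk, channel) pairs that were lost."""
    timeouts = Counter()
    lost = []
    for line in dmesg_text.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = LOST_RE.match(line)
        if m:
            lost.append((m.group(1), m.group(2)))
    return timeouts, lost


if __name__ == "__main__":
    sample = (
        "(ada0:siisch0:0:0:0): lost device\n"
        "(ada0:siisch0:0:0:0): removing device entry\n"
        "siisch11: Timeout on slot 30\n"
    )
    counts, lost_devices = tally(sample)
    print(dict(counts), lost_devices)
```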

When a drive goes offline, the ZFS rebuild restarts, so the rebuild of the 
array never finishes.  Does anyone have any insight into what could be causing 
the timeouts and what we can do to resolve them?  My priority right now is to 
get the system stable enough for the current ZFS rebuild to complete.  It has 
been running the same rebuild for just over 6 days, and the timeouts and drive 
drop-offs keep causing it to restart.




