Re: 4.6.4, 3w_sas: timeout too small?
Hi Adam, thanx very much for your detailed analysis. Regards Harri
Re: 4.6.4, 3w_sas: timeout too small?
Hi Adam, thanx very much for your detailed analysis. Regards Harri
Re: 4.6.4, 3w_sas: timeout too small?
>> Oct 16 06:41:33 nasl003b kernel: [4903207.577484] 3w-sas: scsi0: ERROR: >> (0x06:0x000D): Microcontroller Error: clearing. Microcontroller Error = FW crashed and will be reset on the next I/O timeout or ioctl() timeout, whichever happens first. It could be the iSCSI target code or initiator sent a SCSI CDB not supported by 3ware 9750 firmware (or malformed CDB), or your 3ware card is going bad, most likely the later. >> Oct 16, 2016 06:45:03AM (0x04:0x005E): Cache synchronization completed: >> unit=0 Cache sync happens after an online controller reset. Previously acknowledged write back cache entries are flushed to disk at this time. >> The 3ware web interface says the controller was busy to write the cache to >> disk It is not the case that the controller was performing some long cache flush interval, > 60 seconds, such that you would need to adjust any timeout values in the controller reset sequence code, i.e. twl_reset_sequence(). -Adam On Mon, Oct 17, 2016 at 6:09 AM, Harald Dunkelwrote: > Hi folks, > > Question about the reset timeout of the 3w_sas: > > For an unknown reason my 3ware 9750 RAID controller became > unresponsive. kernel.log: > > Oct 16 06:30:42 nasl003b kernel: [4902556.775272] 3w-sas: scsi0: AEN: INFO > (0x04:0x002B): Verify completed:unit=1. > Oct 16 06:38:29 nasl003b kernel: [4903023.388357] 3w-sas: scsi0: AEN: INFO > (0x04:0x002B): Verify completed:unit=2. > Oct 16 06:41:33 nasl003b kernel: [4903207.577484] 3w-sas: scsi0: ERROR: > (0x06:0x000D): Microcontroller Error: clearing. > Oct 16 06:42:11 nasl003b kernel: [4903245.701070] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva001 > Oct 16 06:42:11 nasl003b kernel: [4903245.706778] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva001 > Oct 16 06:42:15 nasl003b kernel: [4903249.298778] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva002 > Oct 16 06:42:15 nasl003b kernel: [4903249.304369] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva002 > Oct 16 06:42:15 nasl003b kernel: [4903249.318734] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva003 > Oct 16 06:42:15 nasl003b kernel: [4903249.324122] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva003 > Oct 16 06:42:41 nasl003b kernel: [4903275.739327] 3w-sas: scsi0: WARNING: > (0x06:0x0006): Character ioctl (0x108) timed out, resetting card. > Oct 16 06:42:41 nasl003b kernel: [4903275.739402] sd 0:0:0:0: WARNING: > (0x06:0x002C): Command (0x2a) timed out, resetting card. > Oct 16 06:43:36 nasl003b kernel: [4903330.757153] iSCSI Login timeout on > Network Portal 172.19.96.217:3260 > Oct 16 06:43:36 nasl003b kernel: [4903330.762124] iSCSI Login negotiation > failed. > Oct 16 06:43:42 nasl003b kernel: [4903336.720892] 3w-sas: scsi0: ERROR: > (0x06:0x0011): Controller not ready during reset sequence. > Oct 16 06:43:51 nasl003b kernel: [4903345.796529] iSCSI Login timeout on > Network Portal 172.19.96.217:3260 > Oct 16 06:43:51 nasl003b kernel: [4903345.801462] iSCSI Login negotiation > failed. > Oct 16 06:44:06 nasl003b kernel: [4903360.835920] iSCSI Login timeout on > Network Portal 172.19.96.217:3260 > Oct 16 06:44:06 nasl003b kernel: [4903360.840843] iSCSI Login negotiation > failed. > Oct 16 06:44:13 nasl003b kernel: [4903367.919662] INFO: task jbd2/sda1-8:235 > blocked for more than 120 seconds. > Oct 16 06:44:13 nasl003b kernel: [4903367.924633] Tainted: G > E 4.6.0-0.bpo.1-amd64 #1 > Oct 16 06:44:13 nasl003b kernel: [4903367.929624] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Oct 16 06:44:13 nasl003b kernel: [4903367.934751] jbd2/sda1-8 D > 88042d896000 0 235 2 0x > Oct 16 06:44:13 nasl003b kernel: [4903367.934756] 880424c2c440 > 88042b43a140 8804249e 8804249dfbc0 > Oct 16 06:44:13 nasl003b kernel: [4903367.934760] 7fff > 815c50a0 8804249dfc40 8804000daf00 > Oct 16 06:44:13 nasl003b kernel: [4903367.934763] 815c4821 > 815c785f 7fff > Oct 16 06:44:13 nasl003b kernel: [4903367.934766] Call Trace: > Oct 16 06:44:13 nasl003b kernel: [4903367.934783] [] ? > bit_wait_timeout+0xa0/0xa0 > Oct 16 06:44:13 nasl003b kernel: [4903367.934786] [] ? > schedule+0x31/0x80 > Oct 16 06:44:13 nasl003b kernel: [4903367.934789] [] ? > schedule_timeout+0x22f/0x2c0 > Oct 16 06:44:13 nasl003b kernel: [4903367.934794] [] ? > blk_peek_request+0x4d/0x280 > Oct 16 06:44:13 nasl003b kernel: [4903367.934797] [] ? > bit_wait_timeout+0xa0/0xa0 > : > : > > The interesting point is that apparently the reset didn't succeed. >
Re: 4.6.4, 3w_sas: timeout too small?
>> Oct 16 06:41:33 nasl003b kernel: [4903207.577484] 3w-sas: scsi0: ERROR: >> (0x06:0x000D): Microcontroller Error: clearing. Microcontroller Error = FW crashed and will be reset on the next I/O timeout or ioctl() timeout, whichever happens first. It could be the iSCSI target code or initiator sent a SCSI CDB not supported by 3ware 9750 firmware (or malformed CDB), or your 3ware card is going bad, most likely the later. >> Oct 16, 2016 06:45:03AM (0x04:0x005E): Cache synchronization completed: >> unit=0 Cache sync happens after an online controller reset. Previously acknowledged write back cache entries are flushed to disk at this time. >> The 3ware web interface says the controller was busy to write the cache to >> disk It is not the case that the controller was performing some long cache flush interval, > 60 seconds, such that you would need to adjust any timeout values in the controller reset sequence code, i.e. twl_reset_sequence(). -Adam On Mon, Oct 17, 2016 at 6:09 AM, Harald Dunkel wrote: > Hi folks, > > Question about the reset timeout of the 3w_sas: > > For an unknown reason my 3ware 9750 RAID controller became > unresponsive. kernel.log: > > Oct 16 06:30:42 nasl003b kernel: [4902556.775272] 3w-sas: scsi0: AEN: INFO > (0x04:0x002B): Verify completed:unit=1. > Oct 16 06:38:29 nasl003b kernel: [4903023.388357] 3w-sas: scsi0: AEN: INFO > (0x04:0x002B): Verify completed:unit=2. > Oct 16 06:41:33 nasl003b kernel: [4903207.577484] 3w-sas: scsi0: ERROR: > (0x06:0x000D): Microcontroller Error: clearing. > Oct 16 06:42:11 nasl003b kernel: [4903245.701070] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva001 > Oct 16 06:42:11 nasl003b kernel: [4903245.706778] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva001 > Oct 16 06:42:15 nasl003b kernel: [4903249.298778] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva002 > Oct 16 06:42:15 nasl003b kernel: [4903249.304369] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva002 > Oct 16 06:42:15 nasl003b kernel: [4903249.318734] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva003 > Oct 16 06:42:15 nasl003b kernel: [4903249.324122] TMR Opcode > TARGET_WARM_RESET authorization failed for Initiator Node: > iqn.2014-01.com.example.ac.srva003 > Oct 16 06:42:41 nasl003b kernel: [4903275.739327] 3w-sas: scsi0: WARNING: > (0x06:0x0006): Character ioctl (0x108) timed out, resetting card. > Oct 16 06:42:41 nasl003b kernel: [4903275.739402] sd 0:0:0:0: WARNING: > (0x06:0x002C): Command (0x2a) timed out, resetting card. > Oct 16 06:43:36 nasl003b kernel: [4903330.757153] iSCSI Login timeout on > Network Portal 172.19.96.217:3260 > Oct 16 06:43:36 nasl003b kernel: [4903330.762124] iSCSI Login negotiation > failed. > Oct 16 06:43:42 nasl003b kernel: [4903336.720892] 3w-sas: scsi0: ERROR: > (0x06:0x0011): Controller not ready during reset sequence. > Oct 16 06:43:51 nasl003b kernel: [4903345.796529] iSCSI Login timeout on > Network Portal 172.19.96.217:3260 > Oct 16 06:43:51 nasl003b kernel: [4903345.801462] iSCSI Login negotiation > failed. > Oct 16 06:44:06 nasl003b kernel: [4903360.835920] iSCSI Login timeout on > Network Portal 172.19.96.217:3260 > Oct 16 06:44:06 nasl003b kernel: [4903360.840843] iSCSI Login negotiation > failed. > Oct 16 06:44:13 nasl003b kernel: [4903367.919662] INFO: task jbd2/sda1-8:235 > blocked for more than 120 seconds. > Oct 16 06:44:13 nasl003b kernel: [4903367.924633] Tainted: G > E 4.6.0-0.bpo.1-amd64 #1 > Oct 16 06:44:13 nasl003b kernel: [4903367.929624] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Oct 16 06:44:13 nasl003b kernel: [4903367.934751] jbd2/sda1-8 D > 88042d896000 0 235 2 0x > Oct 16 06:44:13 nasl003b kernel: [4903367.934756] 880424c2c440 > 88042b43a140 8804249e 8804249dfbc0 > Oct 16 06:44:13 nasl003b kernel: [4903367.934760] 7fff > 815c50a0 8804249dfc40 8804000daf00 > Oct 16 06:44:13 nasl003b kernel: [4903367.934763] 815c4821 > 815c785f 7fff > Oct 16 06:44:13 nasl003b kernel: [4903367.934766] Call Trace: > Oct 16 06:44:13 nasl003b kernel: [4903367.934783] [] ? > bit_wait_timeout+0xa0/0xa0 > Oct 16 06:44:13 nasl003b kernel: [4903367.934786] [] ? > schedule+0x31/0x80 > Oct 16 06:44:13 nasl003b kernel: [4903367.934789] [] ? > schedule_timeout+0x22f/0x2c0 > Oct 16 06:44:13 nasl003b kernel: [4903367.934794] [] ? > blk_peek_request+0x4d/0x280 > Oct 16 06:44:13 nasl003b kernel: [4903367.934797] [] ? > bit_wait_timeout+0xa0/0xa0 > : > : > > The interesting point is that apparently the reset didn't succeed. > "Controller not ready during reset
4.6.4, 3w_sas: timeout too small?
Hi folks, Question about the reset timeout of the 3w_sas: For an unknown reason my 3ware 9750 RAID controller became unresponsive. kernel.log: Oct 16 06:30:42 nasl003b kernel: [4902556.775272] 3w-sas: scsi0: AEN: INFO (0x04:0x002B): Verify completed:unit=1. Oct 16 06:38:29 nasl003b kernel: [4903023.388357] 3w-sas: scsi0: AEN: INFO (0x04:0x002B): Verify completed:unit=2. Oct 16 06:41:33 nasl003b kernel: [4903207.577484] 3w-sas: scsi0: ERROR: (0x06:0x000D): Microcontroller Error: clearing. Oct 16 06:42:11 nasl003b kernel: [4903245.701070] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva001 Oct 16 06:42:11 nasl003b kernel: [4903245.706778] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva001 Oct 16 06:42:15 nasl003b kernel: [4903249.298778] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva002 Oct 16 06:42:15 nasl003b kernel: [4903249.304369] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva002 Oct 16 06:42:15 nasl003b kernel: [4903249.318734] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva003 Oct 16 06:42:15 nasl003b kernel: [4903249.324122] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva003 Oct 16 06:42:41 nasl003b kernel: [4903275.739327] 3w-sas: scsi0: WARNING: (0x06:0x0006): Character ioctl (0x108) timed out, resetting card. Oct 16 06:42:41 nasl003b kernel: [4903275.739402] sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x2a) timed out, resetting card. Oct 16 06:43:36 nasl003b kernel: [4903330.757153] iSCSI Login timeout on Network Portal 172.19.96.217:3260 Oct 16 06:43:36 nasl003b kernel: [4903330.762124] iSCSI Login negotiation failed. Oct 16 06:43:42 nasl003b kernel: [4903336.720892] 3w-sas: scsi0: ERROR: (0x06:0x0011): Controller not ready during reset sequence. Oct 16 06:43:51 nasl003b kernel: [4903345.796529] iSCSI Login timeout on Network Portal 172.19.96.217:3260 Oct 16 06:43:51 nasl003b kernel: [4903345.801462] iSCSI Login negotiation failed. Oct 16 06:44:06 nasl003b kernel: [4903360.835920] iSCSI Login timeout on Network Portal 172.19.96.217:3260 Oct 16 06:44:06 nasl003b kernel: [4903360.840843] iSCSI Login negotiation failed. Oct 16 06:44:13 nasl003b kernel: [4903367.919662] INFO: task jbd2/sda1-8:235 blocked for more than 120 seconds. Oct 16 06:44:13 nasl003b kernel: [4903367.924633] Tainted: GE 4.6.0-0.bpo.1-amd64 #1 Oct 16 06:44:13 nasl003b kernel: [4903367.929624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 16 06:44:13 nasl003b kernel: [4903367.934751] jbd2/sda1-8 D 88042d896000 0 235 2 0x Oct 16 06:44:13 nasl003b kernel: [4903367.934756] 880424c2c440 88042b43a140 8804249e 8804249dfbc0 Oct 16 06:44:13 nasl003b kernel: [4903367.934760] 7fff 815c50a0 8804249dfc40 8804000daf00 Oct 16 06:44:13 nasl003b kernel: [4903367.934763] 815c4821 815c785f 7fff Oct 16 06:44:13 nasl003b kernel: [4903367.934766] Call Trace: Oct 16 06:44:13 nasl003b kernel: [4903367.934783] [] ? bit_wait_timeout+0xa0/0xa0 Oct 16 06:44:13 nasl003b kernel: [4903367.934786] [] ? schedule+0x31/0x80 Oct 16 06:44:13 nasl003b kernel: [4903367.934789] [] ? schedule_timeout+0x22f/0x2c0 Oct 16 06:44:13 nasl003b kernel: [4903367.934794] [] ? blk_peek_request+0x4d/0x280 Oct 16 06:44:13 nasl003b kernel: [4903367.934797] [] ? bit_wait_timeout+0xa0/0xa0 : : The interesting point is that apparently the reset didn't succeed. "Controller not ready during reset sequence" at 06:43:42. The 3ware web interface says the controller was busy to write the cache to disk: Oct 16, 2016 06:45:03AM (0x04:0x005E): Cache synchronization completed: unit=0 I have the impression that the timeout (60 secs?) in the driver is insufficient. Do you think the status check in twl_reset_sequence() could be improved? There were no disk failures, of course. Every helpful comment is highly appreciated. Harri
4.6.4, 3w_sas: timeout too small?
Hi folks, Question about the reset timeout of the 3w_sas: For an unknown reason my 3ware 9750 RAID controller became unresponsive. kernel.log: Oct 16 06:30:42 nasl003b kernel: [4902556.775272] 3w-sas: scsi0: AEN: INFO (0x04:0x002B): Verify completed:unit=1. Oct 16 06:38:29 nasl003b kernel: [4903023.388357] 3w-sas: scsi0: AEN: INFO (0x04:0x002B): Verify completed:unit=2. Oct 16 06:41:33 nasl003b kernel: [4903207.577484] 3w-sas: scsi0: ERROR: (0x06:0x000D): Microcontroller Error: clearing. Oct 16 06:42:11 nasl003b kernel: [4903245.701070] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva001 Oct 16 06:42:11 nasl003b kernel: [4903245.706778] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva001 Oct 16 06:42:15 nasl003b kernel: [4903249.298778] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva002 Oct 16 06:42:15 nasl003b kernel: [4903249.304369] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva002 Oct 16 06:42:15 nasl003b kernel: [4903249.318734] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva003 Oct 16 06:42:15 nasl003b kernel: [4903249.324122] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.2014-01.com.example.ac.srva003 Oct 16 06:42:41 nasl003b kernel: [4903275.739327] 3w-sas: scsi0: WARNING: (0x06:0x0006): Character ioctl (0x108) timed out, resetting card. Oct 16 06:42:41 nasl003b kernel: [4903275.739402] sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x2a) timed out, resetting card. Oct 16 06:43:36 nasl003b kernel: [4903330.757153] iSCSI Login timeout on Network Portal 172.19.96.217:3260 Oct 16 06:43:36 nasl003b kernel: [4903330.762124] iSCSI Login negotiation failed. Oct 16 06:43:42 nasl003b kernel: [4903336.720892] 3w-sas: scsi0: ERROR: (0x06:0x0011): Controller not ready during reset sequence. Oct 16 06:43:51 nasl003b kernel: [4903345.796529] iSCSI Login timeout on Network Portal 172.19.96.217:3260 Oct 16 06:43:51 nasl003b kernel: [4903345.801462] iSCSI Login negotiation failed. Oct 16 06:44:06 nasl003b kernel: [4903360.835920] iSCSI Login timeout on Network Portal 172.19.96.217:3260 Oct 16 06:44:06 nasl003b kernel: [4903360.840843] iSCSI Login negotiation failed. Oct 16 06:44:13 nasl003b kernel: [4903367.919662] INFO: task jbd2/sda1-8:235 blocked for more than 120 seconds. Oct 16 06:44:13 nasl003b kernel: [4903367.924633] Tainted: GE 4.6.0-0.bpo.1-amd64 #1 Oct 16 06:44:13 nasl003b kernel: [4903367.929624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 16 06:44:13 nasl003b kernel: [4903367.934751] jbd2/sda1-8 D 88042d896000 0 235 2 0x Oct 16 06:44:13 nasl003b kernel: [4903367.934756] 880424c2c440 88042b43a140 8804249e 8804249dfbc0 Oct 16 06:44:13 nasl003b kernel: [4903367.934760] 7fff 815c50a0 8804249dfc40 8804000daf00 Oct 16 06:44:13 nasl003b kernel: [4903367.934763] 815c4821 815c785f 7fff Oct 16 06:44:13 nasl003b kernel: [4903367.934766] Call Trace: Oct 16 06:44:13 nasl003b kernel: [4903367.934783] [] ? bit_wait_timeout+0xa0/0xa0 Oct 16 06:44:13 nasl003b kernel: [4903367.934786] [] ? schedule+0x31/0x80 Oct 16 06:44:13 nasl003b kernel: [4903367.934789] [] ? schedule_timeout+0x22f/0x2c0 Oct 16 06:44:13 nasl003b kernel: [4903367.934794] [] ? blk_peek_request+0x4d/0x280 Oct 16 06:44:13 nasl003b kernel: [4903367.934797] [] ? bit_wait_timeout+0xa0/0xa0 : : The interesting point is that apparently the reset didn't succeed. "Controller not ready during reset sequence" at 06:43:42. The 3ware web interface says the controller was busy to write the cache to disk: Oct 16, 2016 06:45:03AM (0x04:0x005E): Cache synchronization completed: unit=0 I have the impression that the timeout (60 secs?) in the driver is insufficient. Do you think the status check in twl_reset_sequence() could be improved? There were no disk failures, of course. Every helpful comment is highly appreciated. Harri