On Fri, Jun 17, 2016 at 2:21 PM, <[email protected]> wrote:
> I successfully exported the problematic zfs and installed it into
> another JBOD chassis and imported. scrub and zfs send now run fine.
> So it isn't the disks, must be cable or chassis or HBA or ....
FWIW, there is a handy tool [1] that decodes LSI log info codes. Looking at
your logs, there are two unique IOCLogInfo codes:

  IOCLogInfo=0x3112010c
  IOCLogInfo=0x31120302

$ ./lsi_decode_loginfo.py 0x3112010c
Value        3112010Ch
Type:        30000000h SAS
Origin:      01000000h PL
Code:        00120000h PL_LOGINFO_CODE_ABORT  See Sub-Codes below (PL_LOGINFO_SUB_CODE)
Sub Code:    00000100h PL_LOGINFO_SUB_CODE_OPEN_FAILURE
SubSub Code: 0000000Ch PL_LOGINFO_SUB_CODE_OPEN_FAIL_OPEN_TIMEOUT_EXP

$ ./lsi_decode_loginfo.py 0x31120302
Value        31120302h
Type:        30000000h SAS
Origin:      01000000h PL
Code:        00120000h PL_LOGINFO_CODE_ABORT  See Sub-Codes below (PL_LOGINFO_SUB_CODE)
Sub Code:    00000300h PL_LOGINFO_SUB_CODE_WRONG_REL_OFF_OR_FRAME_LENGTH
Unparsed     00000002h

If I had to hazard a guess, I'd say there is a low-level issue in the SAS
fabric, maybe a bad expander or cable, that is disrupting traffic. The HBA
is aborting commands it is waiting on, either because the target never
responds (the open-timeout case) or, in the second case, possibly because
the protocol traffic itself is being corrupted. That would align with your
finding that moving the disks to a new chassis made the issue go away.

Eric

[1] https://github.com/baruch/lsi_decode_loginfo

_______________________________________________
OmniOS-discuss mailing list
[email protected]
http://lists.omniti.com/mailman/listinfo/omnios-discuss
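For reference, the decoder output above is essentially a bit-field split of the 32-bit IOCLogInfo value. Here is a minimal sketch in Python of that split; the masks and field names below are inferred from the decoded output shown above, not taken from the tool's actual source:

```python
# Split an LSI IOCLogInfo value into its fields.
# Masks inferred from the lsi_decode_loginfo.py output above (assumption,
# not the tool's actual code): high nibble = type, next nibble = origin,
# next byte = code, then sub code and sub-sub code bytes.
def split_loginfo(value):
    return {
        "type":   value & 0xF0000000,  # 0x3xxxxxxx -> SAS
        "origin": value & 0x0F000000,  # 0x01 -> PL (protocol layer)
        "code":   value & 0x00FF0000,  # 0x12 -> PL_LOGINFO_CODE_ABORT
        "sub":    value & 0x0000FF00,  # e.g. 0x01 -> OPEN_FAILURE
        "subsub": value & 0x000000FF,  # e.g. 0x0C -> OPEN_TIMEOUT_EXP
    }

for v in (0x3112010C, 0x31120302):
    fields = split_loginfo(v)
    print(f"{v:08X}: " + " ".join(f"{k}={x:08X}" for k, x in fields.items()))
```

Running this on the two codes from the logs reproduces the field values the decoder printed (type 30000000h, origin 01000000h, code 00120000h, and the differing sub codes 00000100h vs 00000300h).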
