On Wed, Sep 16, 2009 at 9:41 PM, Chris Worley <worl...@gmail.com> wrote:
>
> On Wed, Sep 16, 2009 at 12:15 PM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
> > Chris Worley, on 09/16/2009 12:51 AM wrote:
> > >
> > > On Tue, Sep 15, 2009 at 11:10 AM, Vladislav Bolkhovitin <v...@vlnb.net>
> > > wrote:
> > > [ ... ]
> > > [ 357.250550] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
> > > [ 357.250553] scst: ***ERROR***: Target driver ib_srpt
> > > xmit_response() returned fatal error
> >
> > It's because srpt called scst_tgt_cmd_done() when the corresponding command
> > hasn't yet been sent to xmit_response() callback, so srpt should use another
> > function to abort commands in this state.
>
> Could this be related to the hang (i.e. the command has been aborted
> before xmit_response has been called... but w/o causing a panic)?

When analyzing such logs it's important to distinguish between cause and consequence. What happened first is that the OFED SRP initiator noticed that something went wrong with the IB communication, as indicated by the log message "srp_qp_in_err_timer called". This means that an error occurred in the IB network or in one of the two IB stacks. As a result, the SRP initiator tried to log in again without an intervening logout. The error messages logged by SRPT are a consequence of that initiator relogin. While the SRPT issue will be fixed, such a fix won't solve the slow reads and the hang you observed.

Regarding the SRP communication problems you observed: since my attempts to reproduce this issue have been unsuccessful so far, I suspect these communication problems are caused by some component in your IB network that is not working as reliably as it should.

By the way, the description of the patch that generated the message "srp_qp_in_err_timer called" is interesting. The patch description indicates that the condition "srp_qp_in_err_timer called" should only happen during multipath failover. See also http://www.mail-archive.com/e...@lists.openfabrics.org/msg01959.html (which is not the latest version of this patch).

Bart.