Roland Dreier wrote:
 > > And what if you comment out the line
 > >      .eh_device_reset_handler        = srp_reset_device,
 > > does that fix it?

 > No

Now I'm really confused.


Me too.

It seems we lose the connection to the target (BTW, do you know why
the connection is getting killed?).

I reported the error in my original email responding to your FMR patch. On the ia64 system with a PCI-X HCA, I got the async event IB_EVENT_QP_ACCESS_ERR at the initiator (and a CQE with IB_COMPLETION_STATUS_REMOTE_ACCESS_ERROR status at my target).
I still do not have an IB analyzer trace (as you suggested).


So the SCSI midlayer times out commands and tries to abort them.  But
we have no connection so the abort fails.  The SCSI command shouldn't
get freed now (at least if I'm understanding scsi_error.c correctly).

Then we have no .eh_device_reset_handler so everything should fall
through to calling our .eh_host_reset_handler without freeing any SCSI
commands.  And then we crash on a use-after-free of a SCSI command.

So where is that command getting freed on us??
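
For readers following along, the escalation described above can be modelled roughly as below. This is only a simplified sketch under assumptions, not the code in scsi_error.c: `struct eh_ops`, `escalate()`, the result codes, and the two stub handlers are all hypothetical names made up for illustration. It shows just the one point being argued: when the device reset slot is left empty, a failed abort falls straight through to the host reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not the kernel's values. */
enum eh_result { EH_FAILED = 0, EH_SUCCESS = 1 };

/* Hypothetical stand-in for the error-handler slots of a host template. */
struct eh_ops {
    enum eh_result (*abort)(void *cmd);
    enum eh_result (*device_reset)(void *cmd);  /* may be NULL */
    enum eh_result (*host_reset)(void *cmd);
};

/* Which step recovered the command, or EH_STEP_NONE if all failed. */
enum eh_step {
    EH_STEP_NONE,
    EH_STEP_ABORT,
    EH_STEP_DEVICE_RESET,
    EH_STEP_HOST_RESET
};

/* Try each handler in order, skipping any slot that is not set. */
static enum eh_step escalate(const struct eh_ops *ops, void *cmd)
{
    assert(ops != NULL);

    if (ops->abort && ops->abort(cmd) == EH_SUCCESS)
        return EH_STEP_ABORT;
    if (ops->device_reset && ops->device_reset(cmd) == EH_SUCCESS)
        return EH_STEP_DEVICE_RESET;
    if (ops->host_reset && ops->host_reset(cmd) == EH_SUCCESS)
        return EH_STEP_HOST_RESET;
    return EH_STEP_NONE;
}

/* Stub handlers: an abort that fails (no connection) and a reset that works. */
static enum eh_result always_fail(void *cmd) { (void)cmd; return EH_FAILED; }
static enum eh_result always_ok(void *cmd)   { (void)cmd; return EH_SUCCESS; }
```

With `abort` set to `always_fail`, `device_reset` left NULL, and `host_reset` set to `always_ok`, `escalate()` reports that the host reset step handled the command. Nothing in this model frees the command along the way.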


The SCSI command passed to the error handlers (.eh_abort_handler, .eh_host_reset_handler, ...) is not the same as the SCSI command from req->scmnd that triggers the use-after-free.

There is some glitch where the SCSI command in req->scmnd has already been freed by the SCSI midlayer, but the request is still in our pending request queue.
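
To make the suspected state concrete, here is a minimal user-space sketch of a pending queue whose entries point at commands owned by someone else. It is an illustration under assumptions, not the ib_srp code: `struct cmd`, `struct request`, `struct pending_queue`, and the `queue_*` helpers are hypothetical names. The invariant it tries to show is that a request has to be unlinked (and its command pointer cleared) before the command goes back to its owner. Otherwise a later walk of the queue, for example from a host reset, would touch a freed command.

```c
#include <assert.h>
#include <stddef.h>

struct cmd { int tag; };        /* stand-in for a SCSI command */

struct request {
    struct cmd *scmnd;          /* NULL once the command is finished */
    struct request *next;
};

struct pending_queue { struct request *head; };

/* Link req at the head of the queue and attach the command to it. */
static void queue_add(struct pending_queue *q, struct request *req,
                      struct cmd *c)
{
    assert(q != NULL && req != NULL && c != NULL);
    req->scmnd = c;
    req->next = q->head;
    q->head = req;
}

/*
 * Unlink the request that owns c and clear its pointer, so nothing in
 * the queue refers to c after the caller hands it back.
 * Returns 1 if a request was found, 0 otherwise.
 */
static int queue_complete(struct pending_queue *q, struct cmd *c)
{
    struct request **pp;

    for (pp = &q->head; *pp; pp = &(*pp)->next) {
        if ((*pp)->scmnd == c) {
            struct request *req = *pp;
            *pp = req->next;
            req->scmnd = NULL;
            req->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Count the requests still pending. */
static int queue_len(const struct pending_queue *q)
{
    int n = 0;
    const struct request *r;

    for (r = q->head; r; r = r->next)
        n++;
    return n;
}
```

The glitch described above corresponds to the command being handed back without the `queue_complete()` step, which leaves a request in the queue with a dangling `scmnd`.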

Vu


_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
