Bart Van Assche wrote:
On 02/06/13 22:42, Vu Pham wrote:
Conclusion:
1. If the port/path is disabled long enough (>35 minutes), we are left
with a dangling scsi host.
2. If the port is enabled within 30 minutes, the scsi host re-establishes
the connection, the path is reinstated, and then the scsi_host is
removed (no entry in sysfs).
I attached a log here to show what happened above.
Hello Vu,
I found the following in the attached logs:
[ ... ]
Feb 6 19:24:25 vsa30 kernel: scsi host10: ib_srp: SRP reset_host called
[ ... ]
Feb 6 19:25:28 vsa30 kernel: scsi host10: SRP abort called
[ ... ]
It is easy to see in patch 3/3 that srp_reset_host() invokes
srp_reconnect_target() unconditionally and that the latter function
kills all outstanding requests via srp_reset_req(). So to me the above
output means that the attached logs were generated by a kernel missing
at least patch 3/3. This means that the above conclusions are invalid.
I think it is also worth mentioning here that I asked Mellanox two
months ago via private e-mail to provide me access to a setup on which
this issue can be reproduced and on which I can recompile the kernel
myself. However, such access was never provided.
Bart.
Hello Bart,
srp_reconnect_target() kills all outstanding requests, fails to
reconnect (port offline), and queues the scsi_host for removal -->
srp_reset_host() returns FAILED.
While the scsi host has not yet been removed, multipath still
periodically sends TUR commands to check the liveness of this path.
The current srp_queuecommand() still processes these TUR commands;
therefore, the subsequent SRP aborts are for those tur checker commands.
You can see these tur checker errors from multipath in the log.
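The sequence described above can be sketched as a small user-space toy
model in C. This is only an illustration under stated assumptions: the
toy_* names, the struct fields and the MAX_REQ limit are hypothetical and
are not taken from the ib_srp driver; the model only mirrors the ordering
of events (reset kills outstanding requests, reconnect fails, removal is
queued, new commands are still accepted until the host is really gone).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the real ib_srp structures. */
enum { MAX_REQ = 8 };

enum toy_result { TOY_SUCCESS, TOY_FAILED };

struct toy_target {
    bool port_online;     /* can a reconnect attempt succeed? */
    bool removal_queued;  /* host removal has been scheduled */
    bool host_removed;    /* the queued removal has actually run */
    int  outstanding;     /* commands accepted but not yet completed */
};

/* Stands in for srp_reset_req() being applied to every request. */
void toy_kill_outstanding(struct toy_target *t)
{
    t->outstanding = 0;
}

/* Stands in for srp_reconnect_target(): requests are killed first,
 * then the reconnect is attempted. On failure, removal is queued. */
enum toy_result toy_reconnect_target(struct toy_target *t)
{
    toy_kill_outstanding(t);
    if (!t->port_online) {
        t->removal_queued = true;
        return TOY_FAILED;
    }
    return TOY_SUCCESS;
}

/* Stands in for srp_reset_host(): reconnects unconditionally. */
enum toy_result toy_reset_host(struct toy_target *t)
{
    return toy_reconnect_target(t);
}

/* Stands in for the queuecommand behaviour described in this mail:
 * commands are accepted as long as the host has not been removed,
 * even if its removal is already queued. */
bool toy_queuecommand(struct toy_target *t)
{
    if (t->host_removed)
        return false;
    assert(t->outstanding < MAX_REQ);
    t->outstanding++;
    return true;
}
```

In this model, a failed toy_reset_host() leaves outstanding at 0, and a
later toy_queuecommand() (the TUR from the path checker) raises it to 1
again, so a subsequent abort would refer to that new command and not to a
request that survived the reset.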
Sagi from Mellanox will work with you on webex.
thanks,
-vu