Bart Van Assche wrote:
On 02/06/13 22:42, Vu Pham wrote:
Conclusion:
1. If the port/path is disabled long enough (>35 minutes), we have a
dangling scsi host.
2. If the port is enabled within 30 minutes, the scsi host re-establishes
the connection, the path is reinstated, and then the scsi_host is removed
(no entry in sysfs).

I attached a log here to show what happened above.

Hello Vu,

I found the following in the attached logs:

[ ... ]
Feb  6 19:24:25 vsa30 kernel: scsi host10: ib_srp: SRP reset_host called
[ ... ]
Feb  6 19:25:28 vsa30 kernel: scsi host10: SRP abort called
[ ... ]

It is easy to see in patch 3/3 that srp_reset_host() invokes srp_reconnect_target() unconditionally and that that last function kills all outstanding requests via srp_reset_req(). So to me the above output means that the attached logs were generated by a kernel missing at least patch 3/3. This means that the above conclusions are invalid.

I think it is also worth mentioning here that I asked Mellanox two months ago via private e-mail to provide me access to a setup on which this issue can be reproduced and on which I can recompile the kernel myself. However, such access was never provided.

Bart.

Hello Bart,

srp_reconnect_target() kills all outstanding requests, fails to reconnect (port offline), and queues removal of the scsi_host --> srp_reset_host() returns FAILED.

While the scsi host has not yet been removed, multipath still periodically sends TUR commands to check the liveness of this path. The current srp_queuecommand() still processes these TUR commands; therefore, the subsequent SRP aborts are for those TUR checker commands.

You can see these tur checker errors from multipath in the log.
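
To make the sequence above concrete, here is a rough user-space sketch of it. This is a toy model, not ib_srp driver code: every name in it (toy_target, toy_reset_host() and so on) is hypothetical, and it only mirrors the ordering described above, namely that the reset kills the outstanding requests, the reconnect fails while the port is offline, host removal is queued, and commands submitted before the host is actually removed can still be queued and later aborted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model of the sequence described above.
 * NOT kernel code; all names are hypothetical. */
enum toy_result { TOY_SUCCESS, TOY_FAILED };

struct toy_target {
    bool port_online;     /* is the port up? */
    bool removal_queued;  /* host removal has been scheduled */
    bool host_removed;    /* host is gone (no sysfs entry) */
    int  outstanding;     /* commands currently in flight */
    int  aborts_seen;     /* how many aborts were issued */
};

/* Models the reconnect step: kill all outstanding requests,
 * then fail and queue host removal if the port is offline. */
static enum toy_result toy_reconnect_target(struct toy_target *t)
{
    t->outstanding = 0;
    if (!t->port_online) {
        t->removal_queued = true;
        return TOY_FAILED;
    }
    return TOY_SUCCESS;
}

/* Models the reset handler: reconnects unconditionally. */
static enum toy_result toy_reset_host(struct toy_target *t)
{
    return toy_reconnect_target(t);
}

/* Models command submission: commands are still accepted
 * until the host has actually been removed. */
static bool toy_queuecommand(struct toy_target *t)
{
    if (t->host_removed)
        return false;
    t->outstanding++;
    return true;
}

/* Models an abort of one in-flight command (e.g. a timed-out TUR). */
static void toy_abort(struct toy_target *t)
{
    assert(t->outstanding > 0);
    t->outstanding--;
    t->aborts_seen++;
}
```

In this model an abort that shows up after a failed reset is for a command queued after the reset, not for one that survived it.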

Sagi from Mellanox will work with you on webex.

thanks,
-vu

