Bart Van Assche <bvanass...@acm.org> wrote:
> On 11/12/12 23:40, Or Gerlitz wrote:
> This patch is not an essential part of this patch series. All it does
> is to trigger failover more quickly if a port down event has been
> received. Without this patch, if an IB cable has been disconnected long
> enough, a QP error will be generated anyway and that event will trigger
> the path failure logic introduced in the earlier patches of this series.

But if an IB link goes down for only a few milliseconds, or even a few
hundred msecs, why disconnect the IB RC connection? IB is layered and
the RC transport is layer four, so why would we want to break it
manually when all we have is an L2 port down event?

Also, for the multipath use case, an essential part of the mpath
driver is to deal with (say) two devices when at some point at least
one of them becomes "failed" from the mpath point of view. Now this
patch comes along and deletes failed devices, but we put mpath there
precisely so that we can deal with failed devices! It is also very
confusing for mpath users, who would expect to be able to observe all
the devices this mpath is set up on and their state, agree?

> Regarding file system behavior: if a file system should be shielded
> from path failures in a multipath setup then it should be mounted on
> top of a multipath device instead of using the SCSI host directly
> created by ib_srp. In the file system tests I ran I have been using
> the following multipathd options:
>
> defaults {
>     queue_without_daemon no
> }
> devices {
>     device {
>         ...
>         features          "3 queue_if_no_path pg_init_retries 50"
>         fast_io_fail_tmo  15
>         dev_loss_tmo      60
>     }
> }
>
> Are you perhaps worrying about what will happen in a setup with a single
> path between initiator and target and where the IB connection disappears
> and reappears quickly ? Shouldn't multipath be used even in such a setup
> to avoid that the filesystem encounters an I/O error if the path disappears
> for a longer time than what is tolerated by the SCSI error handler in order
> to recover gracefully ?

This is getting far too complicated, and all for a patch which you said
"is not an essential part of this patch series" ... can we just drop
it from the series altogether?

Or.