Hmm, I don't see how REQ gives you data on an existing connection. Further,
wouldn't this need a spec extension to define the private data format?
The LAP trick works out of the box ...
LAP keep-alives require the apps to implement the keep-alive timers and
detection, but send the messages out-of-band. Why not send the
messages in-band? Would it make more sense to implement the entire
keep-alive solution in the CM?
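
To make the timer-and-detection part concrete, here is a minimal sketch of the bookkeeping that either the app or the CM would have to carry. It assumes a probe is sent every `interval` seconds and that a connection is declared dead once `max_missed` intervals pass with no reply. The class name, the field names, and the idea of passing the current time in explicitly are all hypothetical, chosen only for illustration; nothing here is an existing CM interface.

```python
from dataclasses import dataclass, field


@dataclass
class KeepAliveMonitor:
    """Hypothetical per-connection keep-alive bookkeeping (illustration only)."""
    interval: float    # assumed seconds between keep-alive probes
    max_missed: int    # assumed number of silent intervals tolerated
    last_reply: dict = field(default_factory=dict)  # conn id -> time of last reply

    def add(self, conn_id, now):
        # Start tracking a connection; treat setup time as the first reply.
        self.last_reply[conn_id] = now

    def on_reply(self, conn_id, now):
        # Record a keep-alive reply (e.g. the response to a LAP probe).
        if conn_id in self.last_reply:
            self.last_reply[conn_id] = now

    def dead_connections(self, now):
        # A connection is considered dead once it has been silent for
        # longer than max_missed probe intervals.
        limit = self.interval * self.max_missed
        return sorted(c for c, t in self.last_reply.items() if now - t > limit)
```

Whoever owns this state also owns the timer that calls `dead_connections`, which is the real question above: whether every app carries it or the CM does it once.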
I actually think a single working solution is enough.
No need to explore all of them :).
I'm not saying implement all of them, just make sure that we have the
best solution. I can't think of one that I like better than using LAP,
but it feels like the CM protocol / MADs are being hijacked. For
example, if there's only one path between two nodes, LAP doesn't really
make any sense, but it ends up being used. Should we instead look at
adding new CM messages for just this purpose?
For example, event registration could be used to detect that a remote
node has gone down.
We could use per-node keep-alive messages, rather than
per-connection messages.
No, these won't address cases such as a DREQ timeout after the remote
side decides to close the connection, without a reboot.
Per-node keep-alive messages could. It depends on what data is carried
in the message (e.g. all QPs currently connected to the node in
question). I mentioned this because it may be more efficient under some
circumstances.
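
As a rough sketch of what I mean, assume the per-node message carries the set of (sender QPN, receiver QPN) pairs that the sender believes are connected. The receiver can then compare that against its own connection table and flag anything the peer no longer knows about, which covers the lost-DREQ case. The function name and the message layout are assumptions made up for this example, not anything defined by the spec.

```python
def stale_connections(local_conns, peer_report):
    """Return local connections that the peer node no longer reports.

    local_conns: set of (local_qpn, remote_qpn) pairs we believe are
                 connected to the peer node.
    peer_report: set of (peer_qpn, our_qpn) pairs taken from the peer's
                 keep-alive message (hypothetical payload layout).
    """
    # A local connection is stale when the mirrored pair is missing
    # from what the peer says it still has open.
    return sorted(
        (local, remote)
        for (local, remote) in local_conns
        if (remote, local) not in peer_report
    )
```

A connection that is still being established when the message is built would look stale here, so a real version would need to skip connections younger than one keep-alive period or require two consecutive misses.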
- Sean