Hmm, I don't see how REQ gives you data on an existing connection. Further,
wouldn't this need a spec extension to define the private data format?
The LAP trick works out of the box ...

LAP keep-alives require the apps to implement the keep-alive timers and detection, yet the messages are sent out-of-band. Why not send the messages in-band? Or would it make more sense to implement the entire keep-alive solution in the CM?
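
To make the "timers and detection" part concrete, here is a minimal sketch of the bookkeeping that either an app or the CM would have to carry. It is plain Python, not kernel code, and everything in it (the KeepAliveMonitor name, the interval and max_missed parameters, the connection ids) is hypothetical and not part of any CM API. The caller passes in timestamps, so the sketch makes no assumption about how the keep-alive messages themselves are sent.

```python
class KeepAliveMonitor:
    """Tracks when each connection was last heard from (hypothetical helper)."""

    def __init__(self, interval, max_missed=3):
        self.interval = interval        # seconds between expected keep-alives
        self.max_missed = max_missed    # missed intervals tolerated before failure
        self.last_seen = {}             # connection id -> time of last message

    def add(self, conn_id, now):
        # Start tracking a connection; 'now' is supplied by the caller.
        self.last_seen[conn_id] = now

    def heard_from(self, conn_id, now):
        # Any message from the peer counts as proof of life.
        if conn_id in self.last_seen:
            self.last_seen[conn_id] = now

    def expired(self, now):
        # Connections silent for longer than interval * max_missed.
        limit = self.interval * self.max_missed
        return sorted(c for c, t in self.last_seen.items() if now - t > limit)
```

The same logic applies whether the proof of life arrives in-band or out-of-band; only the caller of heard_from() changes.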

I actually think a single working solution is enough.
No need to explore all of them :).

I'm not saying we should implement all of them, just that we should make sure we have the best solution. I can't think of one that I like better than using LAP, but it feels like the CM protocol / MADs are being hijacked. For example, if there's only one path between two nodes, LAP doesn't really make any sense, but it ends up being used anyway. Should we instead look at adding new CM messages for just this purpose?

For example, event registration could be used to detect that a remote node has gone down. We could use per-node keep-alive messages, rather than per-connection messages.

No, these won't address cases such as a DREQ timeout after the remote
side decides to close the connection, without a reboot.

Per-node keep-alive messages could. It depends on what data is carried in the message (e.g. all QPs currently connected to the node in question). I mentioned this because it may be more efficient under some circumstances.
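
As a rough illustration of how a per-node message could cover the DREQ timeout case, assume the message simply lists the QPNs that the sender still considers connected to the receiving node. The receiver can then flag any local connection whose remote QPN is missing from that list. This is a Python sketch under that assumption; the function name and the message layout are hypothetical, and no such CM message exists today.

```python
def stale_local_qps(local_connected, remote_reported):
    """Return local QPNs whose remote peer is absent from a keep-alive.

    local_connected: dict mapping local QPN -> remote QPN, for one remote node.
    remote_reported: iterable of QPNs the remote node lists as still connected.
    """
    alive = set(remote_reported)
    # A remote QPN missing from the list means the peer has already closed
    # that connection, even if its DREQ never reached us.
    return sorted(lqpn for lqpn, rqpn in local_connected.items()
                  if rqpn not in alive)
```

For instance, with local connections {0x10: 0x80, 0x11: 0x81} and a keep-alive that reports only 0x80, the function returns [0x11], so one message per node is enough to clean up every stale connection to that node.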

- Sean
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
