On Apr 16, 2009, at 9:12 AM, Ralph Castain wrote:
Sounds fine, though note that we don't want ob1 itself to do this as it inevitably adds overhead that translates into latency. Instead, we want that functionality to be in a separate component for those people who want to use it.
To drive this point home: in an MPI implementation, latency and bandwidth performance benchmarks are [unfortunately] king. There should be zero (not "close to zero") performance impact from such changes for those who do not want to use them. That's why all such work to date, including failover and retransmission, has been done in "cloned" ob1 components (note that retransmission implies a lot of tracking of pending requests that ob1 does not currently do -- the overhead for that is definitely going to be non-zero).
We did talk on a telecon earlier this week about the need to refactor the PML so that all these various PML components don't have to keep tracking what is done in ob1 - bit of a pain. Nothing has been done yet, but hopefully at some point we'll address this issue.
Yes; talking to Sun is probably the next logical step to see a) the details of what Rolf has been doing, and b) if we can make a more general framework for these kinds of things without having to clone ob1 every time (this was the death of dr, for example -- dr hasn't been updated with all the new changes to ob1 over the past year or two, and I already see Nysal making heroic efforts to keep csum up to date with ob1). It just seems like there should be a better way... although I don't know offhand what that is, because all the options we've talked about so far have added overhead :-\
--
Jeff Squyres
Cisco Systems