On Fri, Feb 09, 2007 at 06:08:34PM -0800, Sean Hefty wrote:
> >So basically what you are saying is that the TClass and FlowLabel act
> >as some kind of global dis-ambiguation that lets all SAs know that the
> >tuple <SGID,DGID,TClass,FlowLabel> MUST be matched with <LRH_A,LRH_B>
> >on each side.
>
> Sort of... My reasoning is that if you look at a packet traveling
> from the source QP to the destination QP, and examine the packet in
> some intermediate subnet (say between two routers), then the only
> information that it carries is the <SGID, DGID, TClass, FlowLabel>
> tuple. This information must be sufficient to direct the routing at
> the endpoints.

Ah, I think I missed the key step in your scheme. You plan to query
the local SM for SGID=remote DGID=local? (I.e. reversed from 'normal';
I was thinking only about the SGID=local DGID=remote query direction.)

Yes, I agree this works in the simple cases. Quite well, in fact. The
reversed direction of the PR query is very much aligned with the idea
that the GRH is only a destination-affecting thing.

Let me try to outline what I think you are proposing. This is the
diagram I am thinking of:

                       SA               SA'
  Node1 --> (LID 1) Router A ------- Router A' (LID A) ---> Node2
        |-> (LID 2) Router A                             |
        |-> (LID 3) Router B ------- Router B' (LID B) --|

Router A and Router B are independent redundant devices, not a route
cloud of some sort. B -> A' is not a possible path.

So your idea is to do:
 PR0: Node 1 asks SA for a Node1 -> Node2 reversible path.
      SA returns SLID=Node1 DLID=1, FlowLabel=Magic Reversible
      indicator.
      This path is used for CM GMPs, or for the normal non-routed CM.
 PR1: Detecting a routed situation from PR0, Node 1 asks SA for
      Node2 -> Node1.
      SA returns SLID=1 DLID=Node1 and a GRH that configures Router A
      to use SLID=1.
      You reverse the local LIDs from that path to get the QP
      configuration.
 PR2: Node 1 asks SA' for Node1 -> Node2.
      SA' returns SLID=A DLID=Node2.

OK. But what if:
 PR1: Node 1 asks SA for Node2 -> Node1.
      SA returns SLID=3 DLID=Node1.
 PR2: Node 1 asks SA' for Node1 -> Node2.
      SA' returns SLID=A DLID=Node2.

Now the LIDs don't match and the QP won't work. SA' has no idea that
SA picked Router B.

> It shouldn't need information about the paths used by packets on the
> remote subnet. If a subnet has multiple routers into it, they can
> forward packets to the correct router if needed. (Could the routers
> just forward to the end node and insert the expected SLID?)

Right, this is a good way to solve the problem. Going with the example
above, SA' returns a GRH that configures Router B' to use SLID=A and
the GRH SA returned configures Router A to use SLID=3.
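
To make the mismatch and the SLID-faking fix concrete, here is a toy
model of the diagram above. It is only a sketch: the names (PEER,
ROUTER_LID, arriving_slid, connection_works) are made up for
illustration, the SLID rewrite is keyed per router rather than per
<SGID, DGID, TClass, FlowLabel> tuple as a real router would need, and
it assumes each QP expects the incoming SLID to equal the router LID
its PR named.

```python
# Toy model of the two-subnet example. All names are illustrative only.
# Inter-subnet links: Router A <-> Router A', Router B <-> Router B'.
PEER = {"A": "A'", "B": "B'", "A'": "A", "B'": "B"}

# LID each router presents on its own subnet. Router A also owns LID 2
# in the diagram; that second LID is not modelled here.
ROUTER_LID = {"A": "1", "B": "3", "A'": "A", "B'": "B"}


def arriving_slid(ingress_router, slid_override=None):
    """SLID the destination node sees for a packet sent via ingress_router.

    The packet leaves the far subnet's peer router, which stamps its own
    LID unless slid_override names a faked SLID for that router.
    """
    egress_router = PEER[ingress_router]
    table = slid_override or {}
    return table.get(egress_router, ROUTER_LID[egress_router])


def connection_works(node1_router, node2_router, slid_override=None):
    """True if both QP SLID checks pass.

    node1_router is the router Node1's PR named, node2_router the one
    Node2's PR named. Each node sends via its router and expects the
    incoming SLID to be that same router's LID.
    """
    fwd = arriving_slid(node1_router, slid_override) == ROUTER_LID[node2_router]
    rev = arriving_slid(node2_router, slid_override) == ROUTER_LID[node1_router]
    return fwd and rev


# SA picked Router A, SA' picked Router A': the LIDs line up.
print(connection_works("A", "A'"))                          # True
# SA picked Router B (LID 3), SA' picked Router A' (LID A): mismatch.
print(connection_works("B", "A'"))                          # False
# Router B' fakes SLID=A and Router A fakes SLID=3: the checks pass.
print(connection_works("B", "A'", {"B'": "A", "A": "3"}))   # True
```

The third call is the case described above: neither SA needs to know
which router the other one picked, because the routers overwrite the
SLID with whatever the local QP was told to expect.
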
Router B' and A both are faking the SLID in the LRH. This effectively
defeats the QP SLID check and everything works :>
[Like I said before, this check seems to be a misfeature]

I can think of the following downsides:
 1) Re-reading Michael Krause's email makes me think that defeating
    the QP SLID check is contrary to the spirit of IBA.
 2) Routers now require a GRH->LRH translation table whose size is
    proportional to all the router LIDs in the subnet, not just their
    own LIDs. [Smart selection of the Flow Label could mitigate this
    growth though]
 3) The reverse PR query method requires 3 PR queries for the simple
    case and as many as 5 if you want non-reversible paths.
 4) Some means of remote SA communication needs to be decided
    pre-standardization :< (I agree that a magic GID seems best)

But... It is the SLID faking that solves the multiple-router-path
problem, not the reverse PR. Do you think something like that could be
standardized?

I guess the big question I have is: if IBA chooses to standardize some
other method, how much chance is there that it would also make this
unsupportable? I.e. by preventing the remote SA communication
mechanism or by defining a reverse PR to mean something else?

I could easily imagine the reverse PR being defined as a way to ask
the local SA about the *remote* LIDs. [Actually, if you define it that
way and use a MultiPathRecord query then there is enough information
to return working LIDs for both subnets. The SAs would have to
communicate between themselves and the routers using a new protocol,
but that is doable. This does require that a PR be defined so that the
LIDs are relative to the subnet of the SGID - not to the local subnet!]

> I'm still trying to find a solution that doesn't violate the
> architecture as defined. I don't see why my idea wouldn't work yet.
> It just requires some unspecified coordination between the local SA
> and local routers.

I'd also very much like to not have to change the passive side to make
this work.
But this has turned into such a complex problem that it seems really
hard to predict what will pass through to standardization. That is the
main benefit I see of the small change to the passive side. No matter
what is standardized it can be accommodated in the resulting standard,
whereas defining a PR with SGID==offsubnet to mean one thing or another
seems much more risky.

Jason