[GROW] RtgDir review: draft-ietf-grow-ix-bgp-route-server-operations-03.txt

John G. Scudder Thu, 18 Sep 2014 13:50:52 -0700

Hello,

I have been selected as the Routing Directorate reviewer for this draft. The 
Routing Directorate seeks to review all routing or routing-related drafts as 
they pass through IETF last call and IESG review, and sometimes on special 
request. The purpose of the review is to provide assistance to the Routing ADs. 
For more information about the Routing Directorate, please see 
http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir


Although these comments are primarily for the use of the Routing ADs, it would 
be helpful if you could consider them along with any other IETF Last Call 
comments that you receive, and strive to resolve them through discussion or by 
updating the draft.

Thanks,

--John



Document: draft-ietf-grow-ix-bgp-route-server-operations-03.txt
Reviewer: John Scudder
Review Date: 2014-09-18
IETF LC End Date: 2014-09-22 
Intended Status: Informational




Summary: 

        • I have some minor concerns about this document that I think should be
          resolved before publication.




Comments:

This is overall a good document and worth publishing, although I have found a 
number of minor issues I would like the authors to address before the document 
progresses. I initially flagged the first two issues as "major" but on 
consideration I've moved them to the "minor" list. With the noted exceptions, I 
think the document is very good in terms of its readability and fitness for 
publication without major editing.




Major Issues:

- None identified.




Minor Issues:

- Throughout the document, various terms are used to describe what RFC 4271 
calls a "route". The definition given in RFC 4271 is:

   Route
      A unit of information that pairs a set of destinations with the
      attributes of a path to those destinations.  The set of
      destinations are systems whose IP addresses are contained in one
      IP address prefix carried in the Network Layer Reachability
      Information (NLRI) field of an UPDATE message.  The path is the
      information reported in the path attributes field of the same
      UPDATE message.

That is, one NLRI plus its path attributes, as carried in an UPDATE, is a 
"route". I would suggest adopting this term, or "BGP route" if you prefer, 
instead of terms such as "NLRI UPDATE message", "NLRI message", "prefix UPDATE 
message", and even just plain "NLRI" and "message". Also some, but not all, of 
the uses of "prefix". I think doing so will make the document clearer, more 
readable, and more technically accurate. A simple search for the terms I've 
called out should show most of them so I won't enumerate them here unless you 
ask me to (feel free, if you want). 

- Reference [RS-ARCH] is a dead link. I found a live copy at 
http://www.cs.usc.edu/assets/003/83191.pdf. It might be worth checking with the 
authors of RS-ARCH to ask what a good archival reference is.

- S. 4.2 talks about scaling. I'm trying to make sense of the analysis:

   Regardless of any Loc-RIB optimization technique is implemented, the
   route server's control plane bandwidth requirements will scale
   according to O(P * N), where P is the total number of unique paths
   received by the route server and N is the total number of route
   server clients.  

So far so good. (Except nit: there seems to be a word missing, such as 
"whether" as in "Regardless of whether any Loc-RIB...")

   In the case where P_avg (the arithmetic mean number
   of unique paths received per route server client) remains roughly
   constant even as the number of connected clients increases, this
   relationship can be rewritten as O((P_avg * N) * N) or O(N^2).  

I don't see where the second factor of N comes from. You're basically expanding
the P in the first expression as P_avg * N -- but why? I think this would only
apply if add-path all-paths was chosen as the path hiding mitigation strategy
-- but this is not touched on in route-server-operations, only in
ix-bgp-route-server, and besides that the beginning of the paragraph implies
you're analyzing the multiple Loc-RIB strategy, so I don't think all-paths is
what you had in mind. If you're not doing all-paths, the O(N^2) analysis is
wrong AFAICT. To see this, consider that the inbound routes require O(P_avg *
N), which is just O(N) when P_avg is constant. The number of routes you're
going to advertise to each client is bounded by the size of the Internet
routing table, which is a constant for purposes of this analysis, so the
outbound total across N clients is also O(N). In and out are summed, not
multiplied, so the whole thing works out to be O(N), not O(N^2).
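
As a rough sketch of that counting argument (the function name and the sample
numbers are purely illustrative and not taken from the draft, and it assumes
all-paths is not in use):

```python
def control_plane_routes(n_clients, p_avg, table_size):
    """Rough count of routes crossing the RS control plane.

    Illustrative assumptions: each client sends p_avg routes to the RS,
    and the RS advertises at most table_size routes (one best path per
    prefix) to each client.
    """
    inbound = p_avg * n_clients        # O(N) when p_avg is constant
    outbound = table_size * n_clients  # O(N) when table_size is constant
    return inbound + outbound          # summed, not multiplied


small = control_plane_routes(100, p_avg=1_000, table_size=500_000)
large = control_plane_routes(200, p_avg=1_000, table_size=500_000)
print(large / small)  # ratio of totals after doubling the client count
```

Under these assumptions, doubling the number of clients doubles the total
rather than quadrupling it.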

So I think this needs to either be corrected, or the assumptions need to be 
better explained. Moving on:

   This
   quadratic upper bound on the network traffic requirements indicates
   that the route server model will not scale to arbitrarily large
   sizes.

If you continue to think this sentence is warranted, I think it should be 
better quantified. Of course nothing can scale to *arbitrarily* large sizes, 
but that still leaves a lot to the imagination. I would think it would be 
beneficial for an IX operator reading this document to be able to have some 
idea of how practical the limitation is. Since the analysis in question is 
looking at control traffic bandwidth consumption, it wouldn't be too onerous to 
throw some simple assumptions up against it -- for example, "if we suppose an RS
receives on average 100,000 routes from each client with a rate of change of 10
routes/second, sends on average 1,000,000 routes to each client with a rate of
change of 100 routes/second, and that each route consumes on average 50 bytes
in a BGP UPDATE message, simple arithmetic shows that a GigE connection to that
RS will be fully saturated by the time the number of clients reaches 25,000."
(That does not seem like a very practical limitation; the RS will hit a CPU or
memory bottleneck first.)
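
For what it's worth, the arithmetic behind that example can be sketched as
follows. This is a back-of-the-envelope calculation only: it looks at
steady-state churn, treats each direction of a full-duplex link separately,
and ignores the initial table transfer and TCP/IP overhead. The names are
made up for illustration.

```python
GIGE_BITS_PER_SEC = 1_000_000_000  # one direction of a full-duplex GigE link
BYTES_PER_ROUTE = 50               # average size assumed in the example


def clients_at_saturation(routes_per_sec_per_client):
    """Clients needed for steady-state churn to fill one link direction."""
    bits_per_client = routes_per_sec_per_client * BYTES_PER_ROUTE * 8
    return GIGE_BITS_PER_SEC // bits_per_client


outbound_limit = clients_at_saturation(100)  # RS -> clients, 100 routes/s each
inbound_limit = clients_at_saturation(10)    # clients -> RS, 10 routes/s each
print(min(outbound_limit, inbound_limit))    # the direction that fills first
```

With these inputs the outbound direction is the one that fills first, at
25,000 clients.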

Anyway, maybe you will decide on reconsideration of the big-O analysis that 
this bit is not needed at all, which would be OK with me.

- S. 4.2.2.1,

   If the route server
   operator has prior knowledge of interconnection relationships between
   route server clients, then the operator may configure separate Loc-
   RIBs only for route server clients with unique outbound routing
   policies.

It wasn't obvious to me what "outbound" applies to -- the client? The RS? -- 
and for that matter why an inbound policy (on the RS) might not apply. Possibly 
this could be remedied by simply dropping the adjective "outbound".

- S. 4.2.1.2,

   destination splitting would require significant co-ordination
   between the route server operator and each route server client

It's not clear to me why it would "require significant co-ordination", 
depending on what resource you're trying to conserve. Two examples of how you 
could avoid coordination while still getting benefit: You could have clients 
send all their routes to all the RSes, but have RSes filter out the prefixes 
they don't care about. This gives the RS most of the CPU benefit it would have 
gotten had the client done the filtering (prefix filtering is cheap), almost 
all the memory benefit (the filtered routes need not be retained in the 
Adj-RIB-In), and around half the control traffic bandwidth benefit. The client 
incurs cost to send duplicate routes that are going to be discarded by the RS, 
but the client is presumably not the bottleneck resource. Or better still, the 
RS could use ORF towards the clients to control what routes the clients will 
send.
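
To make the first option concrete, here is a minimal sketch of RS-side
filtering. The two-way split of the IPv4 space and the names rs1/rs2 are
assumptions of mine for illustration; the draft does not prescribe any
particular partition.

```python
import ipaddress

# Hypothetical split: each route server is responsible for one half of
# the IPv4 address space. Any other partition would work the same way.
RS_SHARES = {
    "rs1": ipaddress.ip_network("0.0.0.0/1"),
    "rs2": ipaddress.ip_network("128.0.0.0/1"),
}


def accept_on(rs_name, prefix):
    """Return True if this route server should keep the prefix."""
    return ipaddress.ip_network(prefix).subnet_of(RS_SHARES[rs_name])


# Every client sends everything to both route servers; each RS keeps
# only its own share and discards the rest before storing it.
received = ["10.0.0.0/8", "192.0.2.0/24", "198.51.100.0/24"]
kept_by_rs1 = [p for p in received if accept_on("rs1", p)]
kept_by_rs2 = [p for p in received if accept_on("rs2", p)]
print(kept_by_rs1, kept_by_rs2)
```

Note that in this toy partition a prefix shorter than /1 (such as a default
route) matches neither share, so a real scheme would need a rule for it.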

- S. 4.6.1,

OLD:
   Prefixes sent to the route server are tagged with specific [RFC1997]
   or [RFC4360] BGP community attributes

I don't think the naked references scan well as adjectives in this context. I 
suggest

NEW:
   Prefixes sent to the route server are tagged with specific standard [RFC1997]
   or extended [RFC4360] BGP community attributes

- Also in S. 4.6.1,

OLD:
   As both standard and extended BGP communities values are restricted
   to 6 octets

Actually standard communities are restricted to less than that. Perhaps reword 
as

NEW:
   As both standard and extended BGP communities values are restricted
   to 6 octets or fewer

- Also in S. 4.6.1,

   route server operator should take care to ensure
   that the predefined BGP community values mechanism used on their
   route server is compatible with [RFC4893] 4-octet autonomous system
   numbers.

I suspect an RS operator reading this might be left scratching his or her head 
and asking "what does it mean for me to be compatible with RFC4893 in this 
context"? It would be kind to offer them some guidance, since after all this is 
a guidance document.
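
One reading of the concern, and this is my guess rather than anything the
draft states: if the operator's scheme puts a peer AS number into one
two-octet half of a standard community, a 4-octet AS number simply does not
fit. A minimal sketch, with a hypothetical <action>:<peer-as> scheme:

```python
import struct


def standard_community(action, peer_as):
    """Pack <action>:<peer_as> into a 4-octet RFC 1997 community.

    Returns None when peer_as needs more than two octets, i.e. when
    this (hypothetical) scheme cannot express the peer at all.
    """
    if not 0 <= peer_as <= 0xFFFF:
        return None
    return struct.pack("!HH", action, peer_as)


two_octet_peer = standard_community(0, 64512)    # fits in 2 octets
four_octet_peer = standard_community(0, 196608)  # needs 4 octets
print(two_octet_peer, four_octet_peer)
```

Guidance along these lines (which encodings can carry a 4-octet peer AS and
which cannot) is the sort of thing I have in mind.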

- S. 4.7: Where you say "non-commutative" I think you mean "non-transitive".

- S. 4.7:

   Problems of this form can be dealt with using [RFC5881] bidirectional
   forwarding detection.

It's not clear to me how certain non-transitive forwarding failures can be 
dealt with using BFD. To take an example, suppose clients A, B and C peer with 
RS. The IX fabric has a failure such that A and B can both reach RS, but not 
each other. C has connectivity to everyone. Prefix X is advertised to RS by 
both B and C. For whatever reason, RS selects X via B to advertise to A. Even 
if A runs BFD towards B, at best A can determine that the route from RS can't 
be used. A isn't able to fail over to C's route as it would in the full-mesh 
case, since it's not aware of it. Depending on A's other connectivity, this may 
result in sub-optimal routing towards X, or complete loss of connectivity to X.
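
The scenario can be modelled in a few lines. This is a toy model only: the
reachability set and the `usable` helper are stand-ins I made up, with
`usable` playing the role of a BFD session between a router and a next hop.

```python
# Pairs that can still exchange packets across the broken IX fabric.
reachable = {frozenset(p) for p in [("A", "RS"), ("B", "RS"), ("C", "RS"),
                                    ("A", "C"), ("B", "C")]}  # A-B is down


def usable(router, next_hop):
    """Stand-in for a BFD check from router to next_hop."""
    return frozenset((router, next_hop)) in reachable


announcers_of_x = ["B", "C"]

# Route server case: A only learns the single path the RS selected.
rs_choice = "B"
via_rs = [nh for nh in [rs_choice] if usable("A", nh)]

# Full-mesh case: A hears X from every announcer directly.
via_mesh = [nh for nh in announcers_of_x if usable("A", nh)]

print(via_rs)    # paths A can use when relying on the RS
print(via_mesh)  # paths A can use with bilateral sessions
```

In the RS case A is left with no usable path to X, whereas in the full mesh
it still has the path via C.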

It's beyond the scope of the draft to solve this problem, but the text could be 
made more accurate. A minimal fix would be

   Problems of this form can be partially mitigated using [RFC5881]
   bidirectional forwarding detection.

although you might want to go on a bit longer to explain what problems can't be 
mitigated.

- S. 4.8:

   This problem is not specific to route servers and it can also be
   implemented using bilateral peering sessions.  However, the potential
   damage is amplified by route servers because a single BGP session can
   be used to affect many networks simultaneously.

This is true, but there is a more severe way RSes aggravate the problem: In a 
full mesh, a router can (and usually does) directly enforce a "no third-party 
next hops" policy against its peers. An RS peer by definition cannot enforce 
this policy against the RS, so the RS is the only place it can be enforced.

- S. 4.8:

   Route server operators SHOULD check that the BGP NEXT_HOP attribute
   for NLRIs received from a route server client matches the interface
   address of the client.  If the route server receives an NLRI where
   these addresses are different

so far so good (modulo my first comment about the use of "NLRI", of course), 
but:

   and where the announcing route server
   client is in a different autonomous system to the route server client
   which uses the next hop address, 

Is the RS sincerely expected to enforce the above? I suppose it could be 
implemented automatically although imperfectly, by noticing that multiple 
clients are in the same neighbor AS and noticing when they use each other as 
third-party next hops, but AFAIK people generally don't try to figure this out, 
they just do what you've said in the preceding sentence -- make sure the NH 
matches the interface address. If you really do propose that the RS should 
allow third-party next hops but only from clients in a common AS, I think you 
should talk about it specifically and in more detail. If you didn't really mean 
that, then I suggest you drop the clause. 
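
The simple check from the preceding sentence, with no same-AS exception,
amounts to something like the sketch below. The function name and the
addresses (taken from the documentation ranges) are illustrative only.

```python
import ipaddress


def next_hop_ok(client_interface_addr, next_hop):
    """Accept a route only if NEXT_HOP is the announcing client's own address."""
    return (ipaddress.ip_address(next_hop)
            == ipaddress.ip_address(client_interface_addr))


own_next_hop = next_hop_ok("192.0.2.10", "192.0.2.10")    # client's own address
third_party = next_hop_ok("192.0.2.10", "192.0.2.20")     # someone else's address
same_v6 = next_hop_ok("2001:db8::1", "2001:DB8:0::1")     # same address, different spelling
print(own_next_hop, third_party, same_v6)
```

Comparing parsed addresses rather than strings avoids false mismatches from
differently written IPv6 addresses.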

- S. 5:

   On route server installations which do not employ path hiding
   mitigation techniques, the path hiding problem outlined in section
   Section 4.1 can be used in certain circumstances to proactively block
   third party prefix announcements from other route server clients.

I don't understand what this means. Specifically, I don't know what it means to 
"proactively block third party prefix announcements" or for that matter, even 
what you mean by "third party prefix announcements" in this context. (As a term 
of art, I normally understand "third party announcement" in a BGP context to 
mean announcing a third-party next hop as you discuss in S. 4.8). I also don't
know what the "certain circumstances" are; quite likely these should be given
at least a little color if not entirely spelled out.

Also, a nit -- the xref expansion has put "section section" into your text.

- S. 7:

   BIRD, OpenBGPD and Quagga, whose open source BGP implementations
   include route server capabilities 

Great, cool, but:

   which are compliant with this
   document.

I'm not sure what it actually means to be "compliant" with a document that 
"describes operational considerations". Perhaps just drop the phrase?




Nits:

- In S. 2, 
OLD:
        BGP sessions between each participant router
NEW:
        BGP sessions between each pair of participant routers

- In S. 4.2.1.1, 

OLD:
   In
   this situation, the multiple Loc-RIB views required by each client
   are merged into a single view.

As written, this implies that each client requires multiple Loc-RIB views, 
which I don't think is what was intended. I suggest:

NEW:
   In
   this situation, multiple Loc-RIB views
   are merged into a single view.

- I personally am strongly put off by the neologism "granular" to mean 
"fine-grained" and suggest the latter instead. I realize it's not an unusual 
usage so by all means disregard if you feel strongly about it.

- S. 4.6.2:

OLD:
   server operators to implement construct per-client routing policies.
NEW:
   server operators to construct per-client routing policies.
_______________________________________________
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow
