Short version: Exploring the scalability of IRON-RANGER's "bubble"-based registration system - every 10 minutes the two IRON routers of the two ISPs send a registration packet to however many VP (Virtual Prefix) IRON routers there are for the VP which covers the I-R PI prefix in question.
I think the scaling properties of this system look bad - and I can't yet see how the IRON routers can discover the IP addresses of all the VP routers.

Hi Fred,

You wrote:

>>> IRON-RANGER used to speak of using IPv6 neighbour discovery
>>> as the means for locator liveness testing, dissemination
>>> of routing information, secure redirection, etc. However,
>>> the VET and SEAL mechanisms are being revised to instead
>>> use a different mechanism called the SEAL Control Message
>>> Protocol (SCMP) for tunnel endpoint negotiations that occur
>>> *within* the tunnel sublayer and are therefore not visible
>>> to either the outer IP protocol nor the inner network layer
>>> protocol. Hence, the inner network layer protocol could be
>>> anything, including IPv4, IPv6, OSI CLNP, or any other network
>>> layer protocol that is eligible for encapsulation in IP.
>>
>> OK. I hope you will be able to explain these things not just in
>> terms of high-level concepts, but to give examples of how the whole
>> thing would actually work on a large scale.
>
> OK if you are talking about an architectural description,
> but please note that both VET and SEAL are already full
> functional specifications that can be used by software
> developers to produce real code.

I think I-R needs to be described in a way that someone who is up to speed on scalable routing in general can read one or perhaps two I-R documents and have a good idea of how the whole thing is going to work - including with respect to scaling and security. This doesn't require exact bits in headers, but that could be part of it. I think it needs to be pretty much self-contained rather than requiring people to read other documents which are not part of I-R.

>> For instance, how many IRON routers are there in an IPv4 I-R system,
>> and how many individual EID prefixes?
>
> Let's suppose that each VP is an IPv6 ::/32, and that
> the smallest unit of PI prefix delegation from a VP is
> an IPv6 ::/56.
> In that case, there can theoretically be
> up to 4B VPs in the IRON RIB and 16M PI prefixes per VP.
> In practice, however, we can expect to see far fewer than
> that until the IPv6 address space reaches exhaustion
> which many believe will be well beyond our lifetimes.

OK. Still, depending on how the address space was allocated - or at least that subset of the address space covered by I-R's VPs - there could be high numbers, approaching 16M perhaps, of I-R PI prefixes per VP.

> Still thinking (very) big, let's try sizing the system
> for 100K VPs; each with 100K ::/56 delegated PI prefixes.
> That would give 10B ::/56 PI prefixes, or 1 PI prefix
> for every person on earth (depending on when you sample
> the earth's population). Let's look at the scaling
> considerations under these parameters:

OK, I think this is a good scenario to discuss. I assume that the VPs can be of various sizes, so some VPs could be a longer prefix, covering less space, if there are a larger number of I-R PI prefixes within that part of the address space. As far as I know, you don't need VPs covering the entire advertised subset of global unicast address space. However, for worst-case scaling discussions I think it is good to assume this.

>> Then, how do these IRON
>> routers, for each of these EID prefixes continually and repeatedly (I
>> guess every 10 minutes or less) securely inform a given number of VP
>> routers they are the router, or one of the routers, to which packets
>> matching a given EID prefix should be tunneled. Since there could be
>> multiple VP routers for a given VP, and the IRON routers don't and (I
>> think) can't know where they are, how does this process work securely
>> and scalably?
>
> Each IRON router R(i) discovers the full map of VPs in
> the IRON through participation in the IRON BGP.

I recall that some IRON routers handle VPs and others don't.
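The theoretical and scenario counts quoted above follow directly from the prefix lengths. This is a minimal sketch of that arithmetic; the example VP 2001:db8::/32 and PI prefix 2001:db8:0:100::/56 are hypothetical, taken from the IPv6 documentation range purely for illustration.

```python
import ipaddress

VP_PREFIX_LEN = 32   # each VP is an IPv6 ::/32
PI_PREFIX_LEN = 56   # smallest PI prefix delegated from a VP is a ::/56

max_vps = 2 ** VP_PREFIX_LEN                          # ::/32s in the whole IPv6 space
max_pi_per_vp = 2 ** (PI_PREFIX_LEN - VP_PREFIX_LEN)  # ::/56s inside one ::/32

# Hypothetical example VP and one PI prefix delegated from it.
vp = ipaddress.ip_network("2001:db8::/32")
pi = ipaddress.ip_network("2001:db8:0:100::/56")
assert pi.subnet_of(vp)
assert vp.num_addresses // pi.num_addresses == max_pi_per_vp

# The "very big" sizing scenario: 100K VPs, each with 100K delegated ::/56s.
scenario_vps = 100_000
scenario_pi_per_vp = 100_000
scenario_total_pi = scenario_vps * scenario_pi_per_vp

print(f"theoretical VPs:        {max_vps:,}")
print(f"theoretical PI per VP:  {max_pi_per_vp:,}")
print(f"scenario PI prefixes:   {scenario_total_pi:,}")
print(f"scenario fill of a VP:  {scenario_pi_per_vp / max_pi_per_vp:.2%}")
```

So the 100K-per-VP scenario uses well under 1% of a ::/32's ::/56 capacity, while the 16M ceiling is what a densely allocated VP could approach.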
As I wrote earlier, assuming VP routers advertise the VP in the DFZ, not just in the I-R overlay network, then they are acting like LISP PTRs or Ivip DITRs. In order for them to do this in a manner which generally reduces the path length from the sending host, via the VP router, to the IRON router which delivers the packet to the destination, I think that for each VP something like 20 or more IRON routers need to be advertising the same VP.

I interpret your previous sentence to mean that all the IRON routers are part of the IRON BGP overlay network, and that each one will therefore get a single best path for each VP. That will give it the IP address of one IRON router which handles this VP. It won't give it any information on the full set of IRON routers which handle this VP.

> That
> means that each R(i) would need to perform full database
> synchronization for 100K stable IRON RIB entries that rarely
> if ever change.

I am not sure what you mean by "full database synchronization". Only a subset of IRON routers advertise a VP, and each IRON router would get a best path to a single IRON router out of potentially numerous IRON routers which were advertising a given VP. So any one IRON router would not be able to use the IRON BGP overlay system to discover the IP addresses of (or best paths to) either all IRON routers or all the IRON routers which advertise VPs, assuming that some VPs were advertised by more than one IRON router.

> This doesn't sound terrible even for existing
> core router equipment. As you noted, it is also possible that
> a given VP(j) would be advertised by multiple R(i)s - let's
> say each VP(j) is advertised by 2 R(i)s (call them R(x) and
> R(y)). But, since the IRON RIB is fully populated to all
> R(i)s, each R(i) would discover both R(x) and R(y) that
> advertise VP(j).

I don't see how this would occur. A given IRON router receives a single best path for each VP, so for VP(j) it will get a best path to (and the IP address of) either R(x) or R(y), not both.
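The best-path point can be illustrated with a toy model. A BGP speaker by default keeps one best route per prefix, however many peers advertise it. This sketch uses a deliberately simplified decision process (shortest AS path, then lowest originator name as a stand-in for lowest router ID; real BGP has more steps such as LOCAL_PREF and MED), and the 20 originators RVP(j,0) to RVP(j,19) and their path lengths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    vp: str            # the Virtual Prefix being advertised
    origin: str        # IRON router originating the VP (hypothetical name)
    as_path_len: int   # AS path length as seen by the receiving router


def best_paths(adverts):
    """Keep a single best advertisement per VP, as a default BGP Loc-RIB does.

    Simplified decision process (assumption): shortest AS path wins,
    ties broken by the lowest origin name.
    """
    rib = {}
    for adv in adverts:
        current = rib.get(adv.vp)
        if current is None or (adv.as_path_len, adv.origin) < (
            current.as_path_len, current.origin
        ):
            rib[adv.vp] = adv
    return rib


# 20 hypothetical routers all advertise VP(j) with varying path lengths.
adverts = [
    Advertisement("VP(j)", f"RVP(j,{k})", 2 + (k * 7) % 5) for k in range(20)
]
rib = best_paths(adverts)
print(len(adverts), "advertisers ->", len(rib), "RIB entry via", rib["VP(j)"].origin)
```

Whatever tie-break is used, the receiving router's table holds one next hop for VP(j), so the other 19 advertisers stay invisible to it through this mechanism alone.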
> Now, for IRON router R(i) that is the provider for 100K PI
> prefixes delegated from VP(j), R(i) needs to send a "bubble"
> to both R(x) and R(y) for each PI prefix.

It's no doubt a relief to less muscle-bound scalable routing architectures that the routers of IRON-RANGER are hurling about merely "bubbles" rather than something with greater impact!

> That would amount to 200K bubbles every 600 sec, or 333
> bubbles/sec. If each bubble is 100bytes, the total bandwidth
> required for updating all of the 100K PI prefixes is 260Kbps.

I am not sure each registration "bubble" would only be 100 bytes of protocol-level data. You need to specify, for IPv6:

1 - The IP address of the IRON router sending the registration (16 bytes).
2 - The prefix the IRON router is registering (18 bytes).
3 - Nonces and other stuff which invariably accompany messages such as this (10 to 20 bytes?).
4 - Authentication material, such as a digital signature for the above, including the public key of the signer (the IRON router itself?) and a pointer to one or more PKI CAs or whatever so the VP router can ascertain that this really is the public key of the signer. These will be FQDNs - let's say 50 bytes or so.

Maybe you could get the whole thing into 100 bytes. Then add the IPv6 header - 40 bytes - and a UDP header - 8 bytes - and we are up to about 150 bytes already. Add in the Ethernet framing overhead - 18 octets of header and FCS, or 38 octets counting the preamble and inter-frame gap - and we are approaching 200 bytes. Multiply by 8 and this is 1600 bits.

  1600 x 333 = 532,800 bits/sec ~= 0.5Mbps

This is the bandwidth of incoming packets to R(x), and likewise for R(y), in your description. This assumes two IRON routers ("200k bubbles every 600 sec") per I-R PI prefix.

But your description differs from mine in two other important respects.

Firstly, if these VP-advertising routers are to operate properly like DITRs or PTRs, there need to be a lot more than 2 of them per VP. Let's say 20. Maybe 10 would be acceptable, maybe more - but 20 will do.
Let's call them RVP(j, 0) to RVP(j, 19) where, in your example:

  R(x) == RVP(j, 0)
  R(y) == RVP(j, 1)

Secondly, I don't see how R(i) could discover the IP addresses of more than one of this set of 20 routers.

In my model, if it could be shown how routers such as R(i), which handle the 100k I-R PI prefixes in VP(j), could discover all 20 routers RVP(j, 0) to RVP(j, 19), then each of these 20 routers would have this incoming bandwidth.

> Now, let's say that each PI prefix is multihomed to 2 providers,
> then we get 2x the message traffic for 520Kbps total for the
> bubbles needed to keep the 100K PI prefixes refreshed.

You already assumed two IRON routers per I-R PI prefix in your 260kbps figure above, so there's no need to double it again to 520kbps. 2 ISPs seems a reasonable figure, which was already part of my calculations. Each provider has an IRON router which handles a given I-R PI prefix, and each such IRON router is sending bubbles to all the VP routers (though I don't yet understand how these VP routers would be discovered - and I am assuming there are 20 of them while you are assuming there will be 2 of them). My figure is 532kbps ~= 0.5Mbps incoming bandwidth per VP router.

>> If the VP routers act like DITRs or PTRs by advertising their VP in
>> the DFZ, then in order to make them work well in this respect - to
>> generally minimise the extra path length taken to and from them
>> compared to the path from the sending host to the proper IRON router
>> - I think you need at least a dozen of them. This directly drives
>> the scaling problems in the process just mentioned where the IRON
>> routers continually register each of their EID prefixes with the
>> dozen or so VP routers which cover that EID prefix.
>
> I don't understand why the dozen - I think with IRON VP
> routers, the only reason for multiples is for fault tolerance
> and not for optimal path routing, since path optimization will
> be coordinated by secure redirection.
> So, just a couple (or a
> few) IRON routers per VP should be enough I think?

Secure redirection works when an IRON router sends the initial packet to a VP router, but it doesn't apply when the sending router is that of a non-upgraded network. To support generally low-stretch paths from those sending networks to the IRON router which is currently the desired one for forwarding packets to the destination network, I think you need a larger number. 20 is a rough figure, assuming a global distribution of sending hosts and of the IRON routers which handle the I-R PI prefixes - as is required for real portability.

If all the IRON routers for the I-R PI prefixes of a given VP were in Europe, then it would suffice to have all the VP routers also in Europe - so depending on the need for robustness and load sharing, perhaps you wouldn't need 20 of them. Maybe 5 would do. But generally, for this kind of scaling discussion, I think we need to assume the goal of global portability of the new kind of address space, with sending hosts likewise distributed globally.

So I think that for a VP containing 100k I-R PI prefixes, there are going to be 20 such VP routers, and each is going to get a continual 0.5Mbps stream of registration packets. This is not counting the work that each VP router needs to do in order to establish the authenticity of those registrations. As far as I know, it could only do this by looking up PKI CAs (Certification Authorities) on a regular basis to ensure the signed registrations were valid.

There are serious scaling problems per VP router in handling 333 signed registrations per second. That's a lot of crypto work just to check the signatures - and a lot more work, and packets going back and forth, for regularly checking that the public keys provided are still valid.
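The registration-load arithmetic in this exchange can be reproduced with a short sketch. It assumes, as above, roughly 100 bytes of bubble payload, IPv6 and UDP headers, about 38 octets of Ethernet framing overhead (rounded up to a 200-byte, 1600-bit bubble), a 600-second refresh period, two provider IRON routers per PI prefix and 100k PI prefixes per VP. The field sizes are rough guesses and `registration_load` is a hypothetical helper; nothing here comes from the IRON-RANGER, VET or SEAL specifications.

```python
# Rough payload of one IPv6 registration "bubble", in bytes (guesses, not a spec).
PAYLOAD_FIELDS = {
    "sender IRON router address": 16,
    "registered PI prefix": 18,
    "nonces and other fields": 16,    # somewhere in the 10-20 byte range
    "authentication material": 50,    # optimistic; a real signature and key may need more
}
IPV6_HEADER = 40
UDP_HEADER = 8
ETHERNET_OVERHEAD = 38  # 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap

payload_bytes = sum(PAYLOAD_FIELDS.values())
frame_bytes = payload_bytes + IPV6_HEADER + UDP_HEADER + ETHERNET_OVERHEAD
BUBBLE_BITS = 200 * 8  # round the frame up to 200 bytes, as in the text


def registration_load(prefixes, providers, vp_routers, period_s=600,
                      bubble_bits=BUBBLE_BITS):
    """Return (bubbles/s per VP router, bps per VP router, bps in total).

    Assumes every provider IRON router serving a PI prefix sends one bubble
    per refresh period to every VP router for the covering VP.
    """
    bubbles_per_vp_router = prefixes * providers / period_s
    bps_per_vp_router = bubbles_per_vp_router * bubble_bits
    return bubbles_per_vp_router, bps_per_vp_router, bps_per_vp_router * vp_routers


print(f"payload {payload_bytes} bytes, frame {frame_bytes} bytes")
for vp_routers in (2, 20):
    rate, per_router_bps, total_bps = registration_load(100_000, 2, vp_routers)
    print(f"{vp_routers:2d} VP routers: {rate:.0f} signed bubbles/s and "
          f"{per_router_bps / 1e6:.2f} Mbps into each, "
          f"{total_bps / 1e6:.2f} Mbps in total")

# Per-prefix view: 1600 bits every 600 s for each (IRON router, VP router)
# pair, with 20 VP routers and 2 IRON routers per PI prefix.
per_pair_bps = BUBBLE_BITS / 600
per_prefix_bps = per_pair_bps * 20 * 2
print(f"{per_pair_bps:.2f} bps per pair, {per_prefix_bps:.1f} bps per PI prefix")
```

In this model the rate arriving at each VP router depends on the number of providers per PI prefix but not on how many VP routers there are, while the aggregate traffic and the total number of signature checks grow linearly with the number of VP routers per VP.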
There is also the scaling problem of there being 20 or so of these VP routers, so the entire Internet needs to handle 20 x 0.5Mbps = 10Mbps continually just to handle the registration of these 100k I-R PI prefixes. Each such prefix requires about 100 bits per second in continual registration activity - about 5 bits per second per VP router per I-R PI prefix. For each VP router, this 5 bits per second comes, on average, from the typically two IRON routers which are registering a given I-R PI prefix - about 2.5 bits per second from each.

Checking this: if there were a single VP router and a single IRON router registering an I-R PI prefix, the IRON router would send 1600 bits every 600 seconds. This is about 2.67 bits a second. Since there are 20 VP routers, the figure per IRON router per I-R PI prefix is about 53bps. Since there are two such IRON routers per I-R PI prefix, together they send about 106bps per I-R PI prefix. With 100k of these I-R PI prefixes per VP, this is about 10Mbps. This checks out OK.

I think this is an unacceptable continual burden of registration traffic.

Also, this is just for 10-minute registrations. I recall that the 10-minute period is directly related to the worst-case (10 minute) and average (5 minute) multihoming service restoration times, as per our previous discussions. I think that these are rather long times.

>> Your IDs tend to be very high level and tend to specify external RFCs
>> for how you do important functions in I-R.
>
> You may be speaking of IRON/RANGER, but the same is not
> true of VET/SEAL. VET and SEAL are fully functional
> specifications from which real code can be and has been
> derived.

Yes - SEAL is a self-contained protocol, but I still found it hard to navigate my way within the one document.

>> Yet those RFCs say
>> nothing about I-R itself. I think your I-Ds generally need more
>> material telling the reader specifically how you use these processes
>> in I-R.
>> Then, for each such process, have a detailed discussion
>> with real worst-case numbers to show that it is scalable at every
>> level for some worst-case numbers of EID prefixes, IRON routers etc.
>> - as well as secure against various kinds of attack.
>
> Does the analysis I gave above help? If so, I can put
> it in the next version of IRON.

This is the sort of example I am hoping you will add. But first I think there are two questions I raised which would need to be resolved before your example would be realistic according to my understanding of I-R:

1 - How does an IRON router discover all the IRON routers advertising a VP? The I-R BGP overlay network does not provide this, as far as I know.

2 - Allow for 20 or so routers each advertising the one VP, for the purposes of supporting packets from non-upgraded networks.

Assuming 2 is accepted, and 1 is somehow achieved, we now have, for each of the 20 VP routers, 0.5Mbps of registration traffic. That's a lot of traffic and a lot of crypto processing to do.

It is no doubt more efficient than the ~100k or so extremely expensive BGP routers of today's DFZ fussing around comparing notes about 300k prefixes. However, I don't think it scales as well as an alternative:

  http://tools.ietf.org/html/draft-whittle-ivip-arch
  http://tools.ietf.org/html/draft-whittle-ivip-drtm

which doesn't have such continual flows of registration, mapping etc. data, unrelated to the traffic flowing to a given micronet, or to changes in the ETR to which the micronet is mapped.

>>>> 8 - Apart from Ivip's Modified Header Forwarding arrangements,
>>>> CES architectures involve encapsulation for tunneling
>>>> packets from ITRs to ETRs (IRON-RANGER doesn't have ITRs and
>>>> ETRs, but it still requires encapsulated tunneling). There
>>>> are some problems with this - but they do not appear to be
>>>> prohibitive.
>>> IRON-RANGER calls them as ITEs/ETEs because it is possible
>>> to also configure a tunnel endpoint on a host and not just
>>> on routers. In terms of routers, the IRON-RANGER ITE/ETE
>>> are exactly equivalent to what the other proposals are
>>> calling as ITR/ETR.
>> OK. In Ivip the sending host can have in "ITR" function - though it
>> is not a router and this "ITR" function doesn't advertise routes to
>> the MABs (Mapped Address Blocks) inside the host. It does however
>> only handle packets sent by the host's stack which have destination
>> addresses matching any of the MABs. I am sticking with "ITR" and
>> "ETR" in Ivip, to remain compatible with LISP - and because I think
>> they are easier to pronounce than "ITE" and "ETE".
>
> I'm not sure about this - an {Ingress/Egress} Tunnel
> *Router* is a router that happens to terminate tunnel
> endpoints. On the other hand, an {Ingress/Egress}
> Tunnel *Endpoint* is a tautologically a tunnel
> *endpoint* - so, why not call it as such?

I am not suggesting you adopt "ITR" and "ETR" instead of "ITE" and "ETE" - which I agree are more apt terms. I was just explaining why, for now, I will stick with "ITR" and "ETR" for Ivip.

 - Robin

_______________________________________________
rrg mailing list
rrg@irtf.org
http://www.irtf.org/mailman/listinfo/rrg