RE: BGP and aggregation
Actually, GRE fragmentation itself has nothing to do with the DF bit. You either leave the tunnel at its default MTU (and rely on normal IP fragmentation, which of course depends on DF), or you make the tunnel fragment packets itself and reassemble them at the far tunnel end. On Cisco boxes this is triggered by using a larger 'ip mtu' value (not the interface 'mtu'). There are some memory and CPU drawbacks due to reassembly (a hold queue for fragments until they all arrive, etc.).

--
Tomas Daniska
systems engineer
Tronet Computer Networks

-----Original Message-----
From: Forrest W. Christian [mailto:[EMAIL PROTECTED]]
Sent: 14 May 2002 0:02
To: Roger Marquis
Cc: [EMAIL PROTECTED]
Subject: Re: BGP and aggregation

On Mon, 13 May 2002, Roger Marquis wrote:
> Last time I tried this (IOS 11.x to IOS 11.x GRE) it was unreliable due to MTU limits. Certain websites (mainly financial) send large packets and set DF. This probably works around some security issue, but the result was that these SSL servers couldn't reach clients over the GRE.

We have seen the same issue in recent history. Generally, we try to have most of the traffic not pass through a GRE tunnel. With some creative routing, we can pass the data back out to our upstream, which knows the more specific for that route. That said, we do support /32 static dialups across our net, i.e. if you have a /32 static on your dialup, you get the same /32 no matter where you dial up. These generally pass through the GRE tunnel, as we only know of them through OSPF through the GRE tunnel.

We have found that setting an MTU of roughly 1514 on the tunnel fixes this. I think this forces the GRE encapsulation to fragment the packets regardless of the setting of the DF bit. Whether the far-end router reassembles them or not I'm not sure, and I haven't had the opportunity to stick a packet sniffer on the far end to tell. Regardless, it seems to fix the broken sites. YMMV.

- Forrest W. Christian
The Innovation Machine Ltd.
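The MTU arithmetic behind this exchange can be sketched in a few lines of Python. This is a simplified model, not Cisco's forwarding code: it assumes IPv4 transport over a 1500-byte path MTU, a 20-byte outer IP header plus a basic 4-byte GRE header (no key, checksum, or sequence options), and, as the Cisco note quoted later in this thread says, that the inner DF bit is not copied to the outer header.

```python
# Simplified model of GRE encapsulation decisions (assumptions: IPv4 transport,
# 1500-byte path MTU, basic 4-byte GRE header, inner DF not copied to outer header).
OUTER_IP_HEADER = 20
GRE_HEADER = 4
GRE_OVERHEAD = OUTER_IP_HEADER + GRE_HEADER  # 24 bytes


def default_tunnel_ip_mtu(path_mtu: int = 1500) -> int:
    """Largest inner packet that fits in one outer packet without fragmentation."""
    return path_mtu - GRE_OVERHEAD


def tunnel_action(inner_len: int, inner_df: bool, tunnel_ip_mtu: int,
                  path_mtu: int = 1500) -> str:
    """Describe what the tunnel head end does with one inner IPv4 packet."""
    if inner_len > tunnel_ip_mtu:
        if inner_df:
            # The sender must react to ICMP "fragmentation needed"; if that
            # ICMP is filtered, large DF packets are silently lost.
            return "drop, send ICMP fragmentation-needed"
        return "fragment inner packet, then encapsulate"
    outer_len = inner_len + GRE_OVERHEAD
    if outer_len > path_mtu:
        # Outer DF is clear, so the encapsulated packet is fragmented and the
        # far tunnel end must reassemble it before decapsulating.
        return "encapsulate, then fragment outer packet"
    return "encapsulate, no fragmentation"


if __name__ == "__main__":
    print(default_tunnel_ip_mtu())          # 1476
    print(tunnel_action(1500, True, 1476))  # default tunnel MTU: DF packet dropped
    print(tunnel_action(1500, True, 1514))  # tunnel MTU raised to 1514: outer packet fragmented
```

Under these assumptions, raising the tunnel MTU to 1514 (as Forrest describes) moves a 1500-byte DF packet from the "drop" branch to the "fragment outer packet" branch, which is consistent with both his observation and Tomas's point about reassembly cost.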
Re: BGP and aggregation
On Tue, May 14, 2002 at 08:19:22AM +0200, Daniska Tomas wrote:
> Actually, GRE fragmentation itself has nothing to do with the DF bit. You either leave the tunnel at its default MTU (and rely on normal IP fragmentation, which of course depends on DF), or you make the tunnel fragment packets itself and reassemble them at the far tunnel end. On Cisco boxes this is triggered by using a larger 'ip mtu' value (not the interface 'mtu'). There are some memory and CPU drawbacks due to reassembly (a hold queue for fragments until they all arrive, etc.).

http://www.cisco.com/warp/public/105/56.html#subsecondone

  A final option is to increase the IP MTU on the tunnel interface to 1500 (available in IOS 12.0 and higher). However, increasing the tunnel IP MTU causes the tunnel packets to be fragmented because the DF bit of the original packet is not copied to the tunnel packet header. In this scenario, the router on the other end of the GRE tunnel must reassemble the GRE tunnel packet before it can remove the GRE header and forward the inner packet. IP packet reassembly is done in process-switch mode and uses memory. Therefore, this option can significantly reduce the packet throughput through the GRE tunnel.

Handy for getting around MTUs you can't increase. Unfortunately, I do not believe Juniper has any such functionality (even when GRE is done by the RE).

--
Richard A Steenbergen
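To see why the far end has to hold fragments (the memory cost both Tomas and the Cisco note mention), here is a minimal sketch of IPv4 fragmentation and reassembly of one outer tunnel packet. It assumes a 1500-byte inner packet grown to 1524 bytes by GRE, a 1500-byte link, and 20-byte IP headers with no options; it illustrates the standard IPv4 fragmentation rules and is not router code.

```python
# Sketch of IPv4 fragmentation of one outer (GRE) packet and reassembly at the
# far tunnel end. Assumes a 20-byte IPv4 header with no options on every fragment.
from typing import List, Tuple

IP_HEADER = 20

# (offset in 8-byte units, payload bytes, more-fragments flag)
Fragment = Tuple[int, int, bool]


def fragment(total_len: int, mtu: int) -> List[Fragment]:
    payload = total_len - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_frag = (mtu - IP_HEADER) // 8 * 8
    frags: List[Fragment] = []
    offset = 0
    while offset < payload:
        size = min(per_frag, payload - offset)
        more = offset + size < payload
        frags.append((offset // 8, size, more))
        offset += size
    return frags


def reassembled_length(frags: List[Fragment]) -> int:
    """Return the original packet length once all fragments are held.

    Raises ValueError while pieces are still missing; a real receiver keeps
    the fragments it has in a reassembly queue until a timer expires.
    """
    if not frags:
        raise ValueError("no fragments received")
    frags = sorted(frags)
    expected = 0
    for off_units, size, _more in frags:
        if off_units * 8 != expected:
            raise ValueError("gap: a fragment is still missing")
        expected += size
    if frags[-1][2]:
        raise ValueError("last fragment not yet received")
    return expected + IP_HEADER


print(fragment(1524, 1500))                      # [(0, 1480, True), (185, 24, False)]
print(reassembled_length(fragment(1524, 1500)))  # 1524
```

With these assumptions every full-size packet crossing the tunnel becomes a 1480-byte fragment plus a 24-byte fragment, and the receiver cannot strip the GRE header until both are present.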
Re: BGP and aggregation
On Mon, 13 May 2002, E.B. Dreger wrote:
> As long as this is getting messy... I'm tempted to suggest confederations. Or spending a few extra bucks on a second ASN, although that doesn't scale.

Multiple ASNs wouldn't solve anything in this case. What was wanted was: under normal circumstances both A and B announce only a /20, and when the link between A and B breaks, they announce more specifics. Multiple ASNs = inconsistent AS... no, no.

- Paul
Re: BGP and aggregation
On Mon, May 13, 2002 at 06:57:19AM -0400, PS wrote:
> On Mon, 13 May 2002, E.B. Dreger wrote:
>> As long as this is getting messy... I'm tempted to suggest confederations. Or spending a few extra bucks on a second ASN, although that doesn't scale.
>
> Multiple ASNs wouldn't solve anything in this case. What was wanted was: under normal circumstances both A and B announce only a /20, and when the link between A and B breaks, they announce more specifics. Multiple ASNs = inconsistent AS... no, no.

Not necessarily. If 'A' originates the aggregate route, it can still be transited via 'B', though with an additional AS hop. Not a perfect solution, but then neither is running a GRE tunnel.

Austin
Re: BGP and aggregation
In the referenced message, Ralph Doncaster said:
>> BGP will discard any prefix with its own AS in the path, for loop prevention. Hence, one half of the AS would still be unable to reach the other half. This is why a partitioned AS is a failure condition. A tunnel is a means to keep the AS nonpartitioned.
>
> I was thinking of doing iBGP over my transit connections (with a couple of static routes so the iBGP works) AND over my inter-city circuit. Any reason why this won't work?
> -Ralph

The loss of IGP metric will make it untenable at best. Do it over a GRE tunnel, with your regular IGP (IS-IS, OSPF, EIGRP, or (shudder) RIP). Default routes have their own problems: they only treat the symptoms of a partitioned AS, rather than the problem.
Re: BGP and aggregation
In the referenced message, Austin Schutz said:
> On Mon, May 13, 2002 at 06:57:19AM -0400, PS wrote:
>> Multiple ASNs wouldn't solve anything in this case. What was wanted was: under normal circumstances both A and B announce only a /20, and when the link between A and B breaks, they announce more specifics. Multiple ASNs = inconsistent AS... no, no.
>
> Not necessarily. If 'A' originates the aggregate route, it can still be transited via 'B', though with an additional AS hop. Not a perfect solution, but then neither is running a GRE tunnel.
> Austin

The only perfect solution is having multiple internal paths which are resilient to simultaneous outage. Failing that, I've never had a problem with GRE. Back in 1994-1997 or so, I used them a lot for disconnected sites, much as someone else mentioned, across Sprint. They worked great and were certainly cheaper than interLATA circuits.
Re: BGP and aggregation
Scott Granados wrote:
> We set up OSPF internally, set up BGP for the announcements at each site, and used the no-export tag for the more specifics. Then GRE tunnels :) for the internal links. It worked, and I pushed probably 45 to 50 Mb/s over the internal loops or GRE tunnels. Not ideal, but it worked.

Last time I tried this (IOS 11.x to IOS 11.x GRE) it was unreliable due to MTU limits. Certain websites (mainly financial) send large packets and set DF. This probably works around some security issue, but the result was that these SSL servers couldn't reach clients over the GRE.

--
Roger Marquis
Roble Systems Consulting
Re: BGP and aggregation
On Mon, 13 May 2002, Roger Marquis wrote:
> Last time I tried this (IOS 11.x to IOS 11.x GRE) it was unreliable due to MTU limits. Certain websites (mainly financial) send large packets and set DF. This probably works around some security issue, but the result was that these SSL servers couldn't reach clients over the GRE.

We have seen the same issue in recent history. Generally, we try to have most of the traffic not pass through a GRE tunnel. With some creative routing, we can pass the data back out to our upstream, which knows the more specific for that route. That said, we do support /32 static dialups across our net, i.e. if you have a /32 static on your dialup, you get the same /32 no matter where you dial up. These generally pass through the GRE tunnel, as we only know of them through OSPF through the GRE tunnel.

We have found that setting an MTU of roughly 1514 on the tunnel fixes this. I think this forces the GRE encapsulation to fragment the packets regardless of the setting of the DF bit. Whether the far-end router reassembles them or not I'm not sure, and I haven't had the opportunity to stick a packet sniffer on the far end to tell. Regardless, it seems to fix the broken sites. YMMV.

- Forrest W. Christian
The Innovation Machine Ltd.
Re: BGP and aggregation
>> I was thinking of doing iBGP over my transit connections (with a couple of static routes so the iBGP works) AND over my inter-city circuit. Any reason why this won't work?
>> -Ralph
>
> The loss of IGP metric will make it untenable at best. Do it over a GRE tunnel, with your regular IGP (IS-IS, OSPF, EIGRP, or (shudder) RIP).

As far as I can tell, GRE doesn't support fragmentation, i.e. encapsulation of a 1500-byte IP packet that results in a GRE packet larger than the interface MTU.

-Ralph
Re: BGP and aggregation
Set the MTU on your GREs to 1514.

On Mon, 13 May 2002, Ralph Doncaster wrote:
>>> I was thinking of doing iBGP over my transit connections (with a couple of static routes so the iBGP works) AND over my inter-city circuit. Any reason why this won't work?
>>> -Ralph
>>
>> The loss of IGP metric will make it untenable at best. Do it over a GRE tunnel, with your regular IGP (IS-IS, OSPF, EIGRP, or (shudder) RIP).
>
> As far as I can tell, GRE doesn't support fragmentation, i.e. encapsulation of a 1500-byte IP packet that results in a GRE packet larger than the interface MTU.
> -Ralph
Re: BGP and aggregation
Don't forget that if both sites use the same AS, then if the connecting link drops they will not be able to see each other over the upstream provider, as routers won't accept routes that already carry their own AS. If this isn't a problem, don't worry about it. If you wish to preserve connectivity between cities, you should have a backup link or use different ASes or GRE tunnels :).

On Sat, 11 May 2002, Ralph Doncaster wrote:
> I have transit in 2 cities. I have a circuit connecting the 2 cities as well. So far I've been using non-contiguous IPs, so there's been no opportunity for aggregation. Having just received my /20 from ARIN, I'm trying to plan my network.
>
> Let's say I split the /20 into 2 /21s, one for each city. I'd like to announce the aggregate /20 instead of 2 /21s, as long as the circuit connecting the 2 cities is working. If the circuit goes down I want each city to announce the local /21. Is this possible? (using either a Cisco router or Zebra)
>
> Ralph Doncaster
> principal, IStop.com
> div. of Doncaster Consulting Inc.
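Ralph's addressing plan is plain prefix arithmetic. A minimal sketch with Python's standard `ipaddress` module, assuming a hypothetical 172.16.0.0/20 in place of the real ARIN allocation (the thread does not give it):

```python
import ipaddress

# Hypothetical /20 standing in for the ARIN allocation.
allocation = ipaddress.ip_network("172.16.0.0/20")

# Split it into two /21s, one per city.
city_a, city_b = allocation.subnets(prefixlen_diff=1)
print(city_a, city_b)  # 172.16.0.0/21 172.16.8.0/21

# Both halves are covered by the aggregate, so the /20 can stand in for them.
assert city_a.subnet_of(allocation) and city_b.subnet_of(allocation)

# Collapsing the two halves gives back exactly the aggregate.
print(list(ipaddress.collapse_addresses([city_a, city_b])))  # [IPv4Network('172.16.0.0/20')]
```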
Re: BGP and aggregation
This is a great solution, to a point. I did this, with the help of someone who reads this list frequently :), but you have to jump through some hoops should you wish both cities to reach each other. Assuming, for example, that all your DNS and mail servers are in one city, you'd have to jump through this hoop.

On Sat, 11 May 2002, Richard A Steenbergen wrote:
> On Sat, May 11, 2002 at 05:34:39PM -0400, Ralph Doncaster wrote:
>> [...] If the circuit goes down I want each city to announce the local /21. Is this possible? (using either a Cisco router or Zebra)
>
> If I was paying for transit, I would want THEM to do the work of delivering it to the right city, without wasting the bandwidth of my circuit (unless they're really close and that circuit is really cheap).
>
> If you're using the same transit provider in both cities, how about announcing the /20, plus the 2 /21s tagged with no-export? The /20 would be heard by the world and get the traffic to your transit provider, then the /21s would route it to the right exit point.
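Richard's no-export suggestion can be modeled as a filter on communities. This is a sketch under stated assumptions: the prefix is hypothetical, `Announcement` and `exported_by_upstream` are illustrative names rather than any router API, and it only applies when both cities buy transit from the same provider. The community value itself is the well-known NO_EXPORT from RFC 1997.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, List

# Well-known NO_EXPORT community from RFC 1997 (65535:65281): routes carrying it
# must not be advertised outside the receiving AS (or confederation).
NO_EXPORT = 0xFFFFFF01


@dataclass(frozen=True)
class Announcement:
    # Hypothetical, simplified view of a BGP route: prefix plus communities.
    prefix: ipaddress.IPv4Network
    communities: FrozenSet[int] = frozenset()


def exported_by_upstream(received: List[Announcement]) -> List[Announcement]:
    """Routes the transit provider may pass on to its own eBGP neighbors."""
    return [a for a in received if NO_EXPORT not in a.communities]


aggregate = ipaddress.ip_network("172.16.0.0/20")  # hypothetical allocation
more_specifics = list(aggregate.subnets(prefixlen_diff=1))

# Everything the shared provider hears from the two cities combined.
to_upstream = [Announcement(aggregate)] + [
    Announcement(p, frozenset({NO_EXPORT})) for p in more_specifics
]

# Inside the provider all three routes exist, so the /21s pick the right city...
print([str(a.prefix) for a in to_upstream])
# ...but only the /20 is visible to the rest of the Internet.
print([str(a.prefix) for a in exported_by_upstream(to_upstream)])
```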
Re: BGP and aggregation
Interesting point there, Scott. We were discussing just that at a recent IXP meeting I was at. There are a number of different ways (well, hacks) in which you can keep connectivity between two halves of an AS network in the event of a split.

Is anyone out there actually doing either this or something similar to keep two halves connected in the event of a split? And have you actually run successfully on your backup and maintained a reasonable throughput (say 30 or 40 Mb/s)?

I'd be interested if anyone has a proven technique, as I want to implement something myself and don't really want to test it by pulling the plug on some backbone links and waiting to see what happens!

Steve

On Sun, 12 May 2002, Scott Granados wrote:
> Don't forget that if both sites use the same AS, then if the connecting link drops they will not be able to see each other over the upstream provider, as routers won't accept routes that already carry their own AS. If this isn't a problem, don't worry about it. If you wish to preserve connectivity between cities, you should have a backup link or use different ASes or GRE tunnels :).
>
> On Sat, 11 May 2002, Ralph Doncaster wrote:
>> [...] If the circuit goes down I want each city to announce the local /21. Is this possible? (using either a Cisco router or Zebra)
Re: BGP and aggregation
In the referenced message, E.B. Dreger said:
> * BGP is an EGP, not an IGP

BGP is one half of an IGP; it is the "where to go" half. You generally run another IGP alongside it to provide the "how to get there" half. Most folks run IS-IS or OSPF to transport router loopbacks and other next-hop information, but still transport the majority of routes via BGP.
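The "where to go" / "how to get there" split is recursive next-hop resolution: BGP maps a prefix to a next hop (often a remote loopback), and the IGP maps that next hop to an interface. A toy sketch, assuming hypothetical tables, addresses, and an interface named `Tunnel0`:

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical tables. BGP: prefix -> next hop (the other city's loopback).
bgp_routes: Dict[str, str] = {
    "203.0.113.0/24": "10.255.0.2",  # learned via iBGP from the router in the other city
}
# IGP: how to reach that loopback (carried by OSPF or IS-IS).
igp_routes: Dict[str, str] = {
    "10.255.0.2/32": "Tunnel0",
}


def longest_match(table: Dict[str, str], addr: str) -> Optional[str]:
    ip = ipaddress.ip_address(addr)
    best = None  # (network, value)
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return best[1] if best else None


def resolve(dest: str) -> Optional[str]:
    """Outgoing interface for dest, or None if the BGP next hop is unreachable."""
    next_hop = longest_match(bgp_routes, dest)
    if next_hop is None:
        return None
    return longest_match(igp_routes, next_hop)


print(resolve("203.0.113.7"))  # Tunnel0
igp_routes.clear()             # the IGP loses the loopback (e.g. inter-city path down)
print(resolve("203.0.113.7"))  # None: the BGP route exists but cannot be used
```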
Re: BGP and aggregation
Actually, I ran this way for a while as a primary. I had three sites, all attached via Cogent, all announcing a /19, and internally a /21 each, plus a couple of /21s out of the primary location. In the main location was a 7507, and in the two other POPs 6509s. We set up OSPF internally, set up BGP for the announcements at each site, and used the no-export tag for the more specifics. Then GRE tunnels :) for the internal links. It worked, and I pushed probably 45 to 50 Mb/s over the internal loops or GRE tunnels. Not ideal, but it worked.

On Sun, 12 May 2002, Stephen J. Wilcox wrote:
> Interesting point there, Scott. We were discussing just that at a recent IXP meeting I was at. There are a number of different ways (well, hacks) in which you can keep connectivity between two halves of an AS network in the event of a split.
>
> Is anyone out there actually doing either this or something similar to keep two halves connected in the event of a split? And have you actually run successfully on your backup and maintained a reasonable throughput (say 30 or 40 Mb/s)?
>
> I'd be interested if anyone has a proven technique, as I want to implement something myself and don't really want to test it by pulling the plug on some backbone links and waiting to see what happens!
>
> Steve
>
> On Sun, 12 May 2002, Scott Granados wrote:
>> Don't forget that if both sites use the same AS, then if the connecting link drops they will not be able to see each other over the upstream provider, as routers won't accept routes that already carry their own AS. If this isn't a problem, don't worry about it. If you wish to preserve connectivity between cities, you should have a backup link or use different ASes or GRE tunnels :).
>>
>> On Sat, 11 May 2002, Ralph Doncaster wrote:
>>> [...] If the circuit goes down I want each city to announce the local /21. Is this possible? (using either a Cisco router or Zebra)
Re: BGP and aggregation
SJW Date: Sun, 12 May 2002 21:07:50 +0100 (BST)
SJW From: Stephen J. Wilcox

SJW Is anyone out there actually doing either this or something
SJW similar to keep two halves connected in the event of a split?
SJW And have you actually run successfully on your backup and
SJW maintained a reasonable throughput (say 30 or 40 Mb/s)? I'd be
SJW interested if anyone has a proven technique

Anyone know more than I do about InterNAP who can disclose details?

--
Eddy
Brotsman Dreger, Inc. - EverQuick Internet Division
Re: BGP and aggregation
On Sun, 12 May 2002, Stephen Griffin wrote:
> In the referenced message, Andy Walden said:
>> Conditional Router Advertisement:
>> http://www.american.com/warp/public/459/cond_adv.pdf
>
> As it sounds like he's using a single AS, the above may not be a fix, since a partitioned AS is still a failure condition.

Why? If you announce one prefix via one circuit and a different prefix via a different circuit, with the same source AS, I don't see a problem, since traffic will continue to reach its intended destination.

andy
Re: BGP and aggregation
In the referenced message, Andy Walden said:
> On Sun, 12 May 2002, Stephen Griffin wrote:
>> In the referenced message, Andy Walden said:
>>> Conditional Router Advertisement:
>>> http://www.american.com/warp/public/459/cond_adv.pdf
>>
>> As it sounds like he's using a single AS, the above may not be a fix, since a partitioned AS is still a failure condition.
>
> Why? If you announce one prefix via one circuit and a different prefix via a different circuit, with the same source AS, I don't see a problem, since traffic will continue to reach its intended destination.
>
> andy

BGP will discard any prefix with its own AS in the path, for loop prevention. Hence, one half of the AS would still be unable to reach the other half. This is why a partitioned AS is a failure condition. A tunnel is a means to keep the AS nonpartitioned. There are other ways to treat the symptoms, but they aren't particularly good, IMHO.
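The loop-prevention rule Stephen describes reduces to a membership check on the AS_PATH. A minimal sketch, assuming hypothetical private and documentation ASNs; it models only the standard eBGP rule and ignores any vendor options that relax it.

```python
from typing import List


def accept_ebgp_route(local_as: int, as_path: List[int]) -> bool:
    """eBGP loop prevention: reject a route whose AS_PATH already contains our AS."""
    return local_as not in as_path


MY_AS = 64512        # hypothetical ASN shared by both cities
UPSTREAM_A = 64500   # hypothetical transit ASNs
UPSTREAM_B = 64501

# After the inter-city circuit fails, city B hears city A's /21 via the Internet.
path_seen_in_b = [UPSTREAM_B, UPSTREAM_A, MY_AS]
print(accept_ebgp_route(MY_AS, path_seen_in_b))  # False: the AS is partitioned

# If city A originated with a second (hypothetical) ASN, B would accept the route.
SECOND_AS = 64513
print(accept_ebgp_route(MY_AS, [UPSTREAM_B, UPSTREAM_A, SECOND_AS]))  # True
```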
Re: BGP and aggregation
On Sun, 12 May 2002, Stephen Griffin wrote:
> BGP will discard any prefix with its own AS in the path, for loop prevention. Hence, one half of the AS would still be unable to reach the other half. This is why a partitioned AS is a failure condition. A tunnel is a means to keep the AS nonpartitioned. There are other ways to treat the symptoms, but they aren't particularly good, IMHO.

True. This also assumes that we aren't talking about vanilla access here, or perhaps that you don't have local servers. This could also be fixed with a floating static, I suppose. At any rate, it depends on your setup. Connecting remote offices != Bad, vanilla access = probably tolerable.

andy
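A floating static works because routes to the same prefix from different sources are ranked by administrative distance. A simplified sketch, assuming Cisco's default distances for the listed sources and an arbitrary illustrative distance of 250 for the backup static; the route tuples and next-hop labels are hypothetical.

```python
from typing import List, Optional, Tuple

# Cisco's default administrative distances for a few route sources (lower wins).
ADMIN_DISTANCE = {"static": 1, "ebgp": 20, "ospf": 110, "ibgp": 200}

Route = Tuple[str, int, str]  # (source, administrative distance, next hop)


def best_route(candidates: List[Route]) -> Optional[Route]:
    """Pick among routes to the same prefix from different sources by lowest distance."""
    return min(candidates, key=lambda r: r[1]) if candidates else None


# Hypothetical routes to the other city's /21.
ibgp_route: Route = ("ibgp", ADMIN_DISTANCE["ibgp"], "inter-city circuit")
# A floating static is an ordinary static route given a distance worse than
# the dynamic route it backs up (250 is an arbitrary illustrative value).
floating_static: Route = ("static", 250, "upstream")

print(best_route([ibgp_route, floating_static]))  # ('ibgp', 200, 'inter-city circuit')
print(best_route([floating_static]))              # ('static', 250, 'upstream')
```

While the circuit is up the iBGP route wins; once it is withdrawn, the static "floats" into the table and sends the traffic to the upstream instead.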
Re: BGP and aggregation
>> [...] isn't a problem, don't worry about it. If you wish to preserve connectivity between cities, you should have a backup link or use different ASes or GRE tunnels :).
>
> Floating statics would be a less-hassle means to continue connectivity (with only 2 locations, not much of a scaling issue). Or, if you want, a default route (learned via BGP if possible) going to your upstream(s).
>
> An iBGP session sharing full routing information might not be something you want to keep established over a GRE tunnel.

Hmm... the default route idea sounds even easier than my iBGP over a transit link. I think I'll try your idea first.

-Ralph
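The default-route approach relies on longest-prefix match: while the other city's /21 is in the table it wins, and when it disappears the lookup falls through to the default toward the upstream, which can still reach the other city via that city's own announcement. A sketch with hypothetical prefixes and next-hop labels:

```python
import ipaddress
from typing import Dict, Optional


def lookup(fib: Dict[str, str], addr: str) -> Optional[str]:
    """Longest-prefix match over a hypothetical {prefix: next hop} forwarding table."""
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, next hop)
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return best[1] if best else None


# City A's table. The default is assumed to be learned from the upstream via BGP.
fib_a = {
    "0.0.0.0/0": "upstream A",
    "172.16.0.0/21": "local LAN",
    "172.16.8.0/21": "inter-city circuit",  # city B's /21, learned over iBGP
}

print(lookup(fib_a, "172.16.9.10"))  # inter-city circuit
del fib_a["172.16.8.0/21"]           # circuit down: the iBGP route is withdrawn
print(lookup(fib_a, "172.16.9.10"))  # upstream A
```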
Re: BGP and aggregation
On Sun, 12 May 2002, Stephen J. Wilcox wrote:
> Interesting point there, Scott. We were discussing just that at a recent IXP meeting I was at. There are a number of different ways (well, hacks) in which you can keep connectivity between two halves of an AS network in the event of a split.
>
> Is anyone out there actually doing either this or something similar to keep two halves connected in the event of a split? And have you actually run successfully on your backup and maintained a reasonable throughput (say 30 or 40 Mb/s)?
>
> I'd be interested if anyone has a proven technique, as I want to implement something myself and don't really want to test it by pulling the plug on some backbone links and waiting to see what happens!

My answer isn't even close to your reasonable throughput, as the example is only T1-connected, but I have a site which we are only connected to via a non-IGP path. Everything is via the Internet (well, sprint.net usually). We're announcing a /18 to Sprint at our main site, and a /23 at the disconnected site. The disconnected site points default at Sprint and doesn't take a full routing table. Basically we have BGP up at the disconnected site just to announce the /23 with our AS.

With some creative use of Cisco routing tools, including OSPF, GRE tunnels, and some creative static routing, we maintain decent connectivity between the two sites. It works quite well. In fact, it works well enough that we're starting to buy circuits at each of our POPs, as it is cheaper to buy circuits from Sprint or similar to their Internet POPs than it is to buy circuits around the state. In most cases we will still be maintaining internal connectivity for backup and latency reasons.

- Forrest W. Christian
The Innovation Machine Ltd.
Re: BGP and aggregation
RD Date: Sat, 11 May 2002 17:34:39 -0400 (EDT)
RD From: Ralph Doncaster

RD I have transit in 2 cities. I have a circuit connecting the
RD 2 cities as well. So far I've been using non-contiguous IPs,
RD so there's been no opportunity for aggregation. Having just
RD received my /20 from ARIN, I'm trying to plan my network.
RD Let's say I split the /20 into 2 /21s, one for each city.
RD I'd like to announce the aggregate /20 instead of 2 /21s, as
RD long as the circuit connecting the 2 cities is working. If
RD the circuit goes down I want each city to announce the local
RD /21. Is this possible? (using either a Cisco router or
RD Zebra)

* BGP is an EGP, not an IGP
* You might want to check out OSPF if you think your net will grow
* You don't want your IGP influencing your EGP. Flap, flap.
* Redistributing EGP into IGP isn't exactly good, either.

Are the upstreams the same in each city? Why not announce the aggregate /20 normally, and set NO_REDISTRIBUTE and use MEDs on the /21s? You're paying for transit, so MEDs are fair game.

--
Eddy
Brotsman Dreger, Inc. - EverQuick Internet Division
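The MED idea only helps when both cities connect to the same upstream, which is why Eddy asks about that first: by default a router compares MEDs only between paths learned from the same neighboring AS. A sketch under stated assumptions: the ASN, prefix, and MED values are hypothetical, and only the MED step of BGP best-path selection is modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    prefix: str
    neighbor_as: int  # AS the upstream learned this path from
    med: int          # MULTI_EXIT_DISC set by the customer
    via: str          # which city's session carried it


def prefer_by_med(paths: List[Path]) -> Path:
    """Lowest MED wins among paths from the same neighbor AS.

    Only the MED step is modeled; real BGP best-path selection compares
    local preference, AS-path length, origin, etc. before MED.
    """
    if len({p.neighbor_as for p in paths}) != 1:
        raise ValueError("by default MED is only compared between paths from the same neighbor AS")
    return min(paths, key=lambda p: p.med)


# Hypothetical: the customer (AS 64512) sends city B's /21 on both sessions,
# with a low MED where it is local and a high MED where it is not.
paths = [
    Path("172.16.8.0/21", 64512, 100, "city A"),
    Path("172.16.8.0/21", 64512, 10, "city B"),
]
print(prefer_by_med(paths).via)  # city B
```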
Re: BGP and aggregation
Conditional Router Advertisement:
http://www.american.com/warp/public/459/cond_adv.pdf

andy

On Sat, 11 May 2002, Ralph Doncaster wrote:
> I have transit in 2 cities. I have a circuit connecting the 2 cities as well. So far I've been using non-contiguous IPs, so there's been no opportunity for aggregation. Having just received my /20 from ARIN, I'm trying to plan my network.
>
> Let's say I split the /20 into 2 /21s, one for each city. I'd like to announce the aggregate /20 instead of 2 /21s, as long as the circuit connecting the 2 cities is working. If the circuit goes down I want each city to announce the local /21. Is this possible? (using either a Cisco router or Zebra)
>
> Ralph Doncaster
> principal, IStop.com
> div. of Doncaster Consulting Inc.
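The policy Ralph asks for can be expressed as a function of whether the other city's /21 is still reachable over the inter-city circuit, similar in spirit to conditional advertisement tracking a prefix. This sketch assumes hypothetical addressing and folds in Ralph's later clarification in this thread that inbound traffic should normally arrive via city A; `city_announcement` is an illustrative name, not router configuration.

```python
import ipaddress
from typing import List, Set

# Hypothetical addressing; the actual /20 from ARIN is not given in the thread.
AGGREGATE = ipaddress.ip_network("172.16.0.0/20")
LOCAL = {
    "A": ipaddress.ip_network("172.16.0.0/21"),
    "B": ipaddress.ip_network("172.16.8.0/21"),
}


def city_announcement(city: str, learned_over_circuit: Set[str]) -> List[str]:
    """Prefixes one city's border router sends to its transit provider.

    The condition is whether the other city's /21 is present among the
    routes learned over the inter-city circuit.
    """
    other = "B" if city == "A" else "A"
    if str(LOCAL[other]) in learned_over_circuit:
        # Circuit healthy: only city A (the cheaper provider) announces the /20.
        return [str(AGGREGATE)] if city == "A" else []
    # Circuit down: each city falls back to its own /21.
    return [str(LOCAL[city])]


print(city_announcement("A", {"172.16.8.0/21"}))  # ['172.16.0.0/20']
print(city_announcement("B", {"172.16.0.0/21"}))  # []
print(city_announcement("A", set()))              # ['172.16.0.0/21']
print(city_announcement("B", set()))              # ['172.16.8.0/21']
```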
Re: BGP and aggregation
> * BGP is an EGP, not an IGP
> * You might want to check out OSPF if you think your net will grow

Using iBGP between the 2 cities right now. May try OSPF later.

> * You don't want your IGP influencing your EGP. Flap, flap.
> * Redistributing EGP into IGP isn't exactly good, either.
>
> Are the upstreams the same in each city? Why not announce the aggregate /20 normally, and set NO_REDISTRIBUTE and use MEDs on the /21s? You're paying for transit, so MEDs are fair game.

Well, the assumption is that most of the time the circuit between the 2 cities will be up, so flapping should be rare. The transit is from different providers, so only announcing the /20 won't do the trick.

-Ralph
Re: BGP and aggregation
> On Sat, May 11, 2002 at 05:34:39PM -0400, Ralph Doncaster wrote:
>> [...] goes down I want each city to announce the local /21. Is this possible? (using either a Cisco router or Zebra)
>
> If I was paying for transit, I would want THEM to do the work of delivering it to the right city, without wasting the bandwidth of my circuit (unless they're really close and that circuit is really cheap).

It's 2 different providers, and one is much cheaper than the other. Therefore I want all traffic to come in through city A, unless my circuit to city B is down.

-Ralph
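Pulling all inbound traffic through city A only works if city B stays silent while the circuit is up, because remote networks forward on the most specific prefix they see. A sketch with hypothetical prefixes; `entry_city` is an illustrative name, and it raises `ValueError` if no announced prefix covers the destination.

```python
import ipaddress
from typing import Dict


def entry_city(announced: Dict[str, str], dest: str) -> str:
    """City whose provider a remote network hands the traffic to.

    `announced` maps each prefix visible in the global table to the city
    announcing it; the most specific covering prefix wins.
    """
    ip = ipaddress.ip_address(dest)
    covering = [
        (ipaddress.ip_network(prefix), city)
        for prefix, city in announced.items()
        if ip in ipaddress.ip_network(prefix)
    ]
    return max(covering, key=lambda pc: pc[0].prefixlen)[1]


# Only the aggregate, announced in city A: everything enters at A.
print(entry_city({"172.16.0.0/20": "A"}, "172.16.9.1"))  # A
# If city B also announced its /21, B's addresses would enter at B regardless.
print(entry_city({"172.16.0.0/20": "A", "172.16.8.0/21": "B"}, "172.16.9.1"))  # B
```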
Re: BGP and aggregation
On Sat, 11 May 2002, Andy Walden wrote:
> Conditional Router Advertisement:
> http://www.american.com/warp/public/459/cond_adv.pdf

Cool. This looks like what I want. For those who don't like PDF, here it is in HTML from Cisco:
http://www.cisco.com/warp/public/459/cond_adv.html

-Ralph