RE: Resilience: faults, causes, statistics, open issues
Hi David,

this is going to be very useful; I really appreciate it, thank you very much.

Just some comments about the root causes of BGP-related problems. Maybe you will find something useful from the research perspective, although this is probably not new to you.

I found a few author groups with closely related and useful papers:
- Tim Griffin and co.
- Nick Feamster and co.
- Jennifer Rexford and co.
- Lixin Gao and co.

These people often have joint publications, but sometimes separate ones as well. Also, Craig Labovitz and co. have some very useful papers in the area of routing convergence time. The IRTF also has some interesting, futuristic and somewhat visionary drafts about Future Domain Routing.

As I see things now, in the case of BGP, routing divergence, configuration and policies are very strongly correlated. A high-level conclusion (what you can probably expect from half a year of paper- and presentation-reading research) is that the first root cause of BGP problems is the absence of a widely deployed and practical formal language for policies. Since there is no formal language, there is no compiler, and so you get unwanted anomalies resulting from your config.

My conclusion was that BGP has an analogy to software development:

SW: Specification -> High-level formal language (e.g. C++) -> Low-level formal language (assembly, binary, etc.)

Both steps can be called implementation or compilation. The good thing here is that you have automated compilers for the second step, which is the harder one.

BGP: Business relations -> Policies -> Router configuration

First you implement your business relations when you think out policies, but in the end you have to implement/compile your policies as router configuration. The problem is that there is no automated compiler for the second step, since there is no formal policy language, and so verification is also very hard.
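To make the "compiler" analogy concrete, here is a minimal sketch of the second step (policies to router configuration). Everything in it is an assumption for illustration: the `Neighbor` type, the `LOCAL_PREF` values, the export rule (customers receive all routes, peers and providers receive only own and customer routes) and the output syntax are hypothetical and do not correspond to any vendor's configuration language or to RPSL.

```python
from dataclasses import dataclass

# Hypothetical preference values; a real network would choose its own.
LOCAL_PREF = {"customer": 200, "peer": 100, "provider": 50}


@dataclass(frozen=True)
class Neighbor:
    asn: int
    relation: str  # "customer", "peer" or "provider"


def compile_policy(neighbors):
    """Translate business relations into vendor-neutral pseudo-config lines."""
    lines = []
    for n in sorted(neighbors, key=lambda n: n.asn):
        if n.relation not in LOCAL_PREF:
            raise ValueError(f"unknown relation for AS{n.asn}: {n.relation}")
        # Import side: prefer routes learned from customers over peers over providers.
        lines.append(
            f"neighbor AS{n.asn} import set local-pref {LOCAL_PREF[n.relation]}"
        )
        # Export side: customers get everything, everyone else gets
        # only our own routes and our customers' routes.
        exported = "all" if n.relation == "customer" else "own+customer"
        lines.append(f"neighbor AS{n.asn} export {exported}")
    return lines


if __name__ == "__main__":
    for line in compile_policy([Neighbor(65002, "peer"), Neighbor(65001, "customer")]):
        print(line)
```

Because the input is a small, formal description, the same data could also feed a consistency check across routers, which is the verification step that is hard to do against hand-written configs.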
As a result you may have configuration bugs, or your config does not do what you originally wanted, or your routers are inconsistent with each other, etc. Of course, it is clear why such a formal language and compiler are not used in practice (different router vendors, different features, different capabilities, no standard interface, etc.), although there is, e.g., RPSL and the tools built upon RPSL. Lately, Griffin and co. have begun thinking about a completely new policy language.

The second root cause, which I think can be somewhat separated, is that there is no central database of policies in practical use. You do not necessarily know what your neighbour operators are doing (their configs and policies). As a result you may have external inconsistency (which may lead to divergence, wedgies, etc.). Of course, here it is also clear why, e.g., IRRs are not used or not updated frequently (the information hiding principle, which is actually the basis of the hierarchical domain structure of the Internet).

So, in the end, although we can possibly identify the root causes behind BGP problems, I'm not sure they can ever be fully eliminated. OK, I can imagine a formal language and config compiler, and one can find verification tools as well, but I can hardly imagine e.g. the sharing of policies (although some papers describe methods for inferring the necessary knowledge from measurements).

Thanks again for your help,
András

p.s. Sorry for the long mail :) :)

Original Message
From: David Andersen [mailto:[EMAIL PROTECTED]
Sent: January 27, 2005, 17:38
To: András Császár (IJ/ETH)
Cc: nanog@merit.edu
Subject: Re: Resilience: faults, causes, statistics, open issues

On Jan 27, 2005, at 6:39 AM, András Császár (IJ/ETH) wrote:

> Hi people!
>
> I've begun research on (carrier-grade, aka telecom-grade) resiliency in IP transport networks. The first step would be to collect possible failure events, their causes and consequences, statistics about downtimes (mean time to repair) and mean times between failures, and I would like to identify which of the problems are most typical (HW bug, SW bug, cable cut, cable unplugged (link going down), severe misconfiguration).
>
> I think this is the perfect forum to get some feedback from real network-operational experience. Is there anyone out there who has some statistics/documents that would help me in any way?

This is self-serving, but see the intro and related work sections of my thesis (we'll have a conference paper version of it done soon for NSDI, but we're still revising it. Apologies for not having a shorter reference to give you):

http://nms.lcs.mit.edu/papers/index.php?detail=113

It doesn't focus specifically on carrier failures, but it has a batch of references that might get you started on what the academic side knows. I've also got some refs in there to some of the earlier telco studies, which I recommend taking a peek at. Again, the relation to year 2005 ISP failures isn't totally clear, but it's a starting point.

Unfortunately, the reality is
The Cidr Report
This report has been generated at Fri Jan 28 21:49:25 2005 AEST.
The report analyses the BGP Routing Table of an AS4637 (Reach) router
and generates a report on aggregation potential within the table.

Check http://www.cidr-report.org/as4637 for a current version of this report.

Recent Table History
        Date      Prefixes    CIDR Agg
        21-01-05    150315      103500
        22-01-05    151119      103502
        23-01-05    150437      103475
        24-01-05    150434      103516
        25-01-05    150622      103724
        26-01-05    150792      103854
        27-01-05    150792      103116
        28-01-05    150969      103172

AS Summary
         18774  Number of ASes in routing system
          7698  Number of ASes announcing only one prefix
          1435  Largest number of prefixes announced by an AS
                AS7018 : ATTW ATT WorldNet Services
      90486528  Largest address span announced by an AS (/32s)
                AS721 : DNIC DoD Network Information Center

Aggregation Summary
The algorithm used in this report proposes aggregation only
when there is a precise match using the AS path, so as
to preserve traffic transit policies. Aggregation is also
proposed across non-advertised address space ('holes').

 --- 28Jan05 ---
ASnum    NetsNow NetsAggr  NetGain   % Gain   Description

Table     151027   103215    47812    31.7%   All ASes

AS18566      766        7      759    99.1%   CVAD Covad Communications
AS4134       839      206      633    75.4%   CHINANET-BACKBONE No.31,Jin-rong Street
AS4323       833      224      609    73.1%   TWTC Time Warner Telecom
AS721       1118      573      545    48.7%   DNIC DoD Network Information Center
AS7015       596       77      519    87.1%   CCCH-3 Comcast Cable Communications Holdings, Inc
AS7018      1435      973      462    32.2%   ATTW ATT WorldNet Services
AS22773      436       19      417    95.6%   CXA Cox Communications Inc.
AS27364      460       46      414    90.0%   ARMC Armstrong Cable Services
AS6197       834      448      386    46.3%   BNS-14 BellSouth Network Solutions, Inc
AS6478       501      118      383    76.4%   ATTW ATT WorldNet Services
AS3602       513      148      365    71.2%   SPCA Sprint Canada Inc.
AS22909      423       74      349    82.5%   CMCS Comcast Cable Communications, Inc.
AS1239       923      615      308    33.4%   SPRN Sprint
AS9929       340       35      305    89.7%   CNCNET-CN China Netcom Corp.
AS4766       568      279      289    50.9%   KIXS-AS-KR Korea Telecom
AS17676      391      103      288    73.7%   JPNIC-JP-ASN-BLOCK Japan Network Information Center
AS14654      263        7      256    97.3%   WAYPOR-3 Wayport
AS9443       366      121      245    66.9%   INTERNETPRIMUS-AS-AP Primus Telecommunications
AS6140       378      141      237    62.7%   IMPSA ImpSat
AS4355       300       64      236    78.7%   ERSD EARTHLINK, INC
AS9583       576      343      233    40.5%   SIFY-AS-IN Sify Limited
AS25844      244       16      228    93.4%   SASMFL-2 Skadden, Arps, Slate, Meagher Flom LLP
AS6198       447      225      222    49.7%   BNS-14 BellSouth Network Solutions, Inc
AS15270      245       32      213    86.9%   PDP-14 PaeTec.net -a division of PaeTecCommunications, Inc.
AS2386       812      609      203    25.0%   ADCS-1 ATT Data Communications Services
AS23126      210       18      192    91.4%   KTEL KMC Telecom, Inc.
AS5668       426      241      185    43.4%   CIH-12 CenturyTel Internet Holdings, Inc.
AS19632      191        8      183    95.8%   Metropolis Intercom
AS6517       301      119      182    60.5%   YIPS Yipes Communications, Inc.
AS9498       235       54      181    77.0%   BBIL-AP BHARTI BT INTERNET LTD.

Total      15970     5943    10027    62.8%   Top 30 total

Possible Bogus Routes

        24.138.80.0/20   AS11260 AHSICHCL Andara High Speed Internet c/o Halifax Cable Ltd.
        24.246.0.0/17    AS7018  ATTW ATT WorldNet Services
        24.246.38.0/24   AS25994 NPGCAB NPG Cable, INC
        24.246.128.0/18  AS7018  ATTW ATT WorldNet Services
        64.17.32.0/24    AS5024  BRIDGE-75 BridgeNet, LC
        64.17.33.0/24    AS5024  BRIDGE-75 BridgeNet, LC
        64.17.37.0/24    AS5024  BRIDGE-75 BridgeNet, LC
        64.46.27.0/24    AS8674
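
The aggregation rule stated in the report (aggregate only when the AS path matches exactly) and the NetGain / % Gain columns can be sketched as follows. This is only an illustration under assumptions, not the CIDR Report's actual code: the function names are hypothetical, the input is assumed to be (prefix, AS path) pairs of a single IP version, and the sketch does not aggregate across unadvertised 'holes' as the real report does.

```python
import ipaddress
from collections import defaultdict


def aggregate_by_as_path(routes):
    """Collapse prefixes that share an identical AS path.

    routes: iterable of (prefix_string, as_path) pairs, all IPv4 or all IPv6.
    Returns a dict mapping each AS path (as a tuple) to its collapsed prefixes.
    """
    groups = defaultdict(list)
    for prefix, as_path in routes:
        groups[tuple(as_path)].append(ipaddress.ip_network(prefix))
    # collapse_addresses merges adjacent and overlapping networks only;
    # it never bridges address space that is missing from the input.
    return {
        path: [str(net) for net in ipaddress.collapse_addresses(nets)]
        for path, nets in groups.items()
    }


def net_gain(nets_now, nets_aggr):
    """Return (NetGain, % Gain) as used in the report's table columns."""
    gain = nets_now - nets_aggr
    return gain, round(100.0 * gain / nets_now, 1)


if __name__ == "__main__":
    routes = [
        ("192.0.2.0/25", (64500, 64501)),
        ("192.0.2.128/25", (64500, 64501)),
        ("198.51.100.0/24", (64500, 64502)),
    ]
    print(aggregate_by_as_path(routes))
    print(net_gain(151027, 103215))
```

The two /25s with the same path merge into one /24, while the prefix with a different path is left alone, which is the policy-preserving behaviour the report describes.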
Re: beware of the unknown packets
On Wed, Jan 26, 2005 at 11:12:19PM +0200, Petri Helenius wrote:
> Hi,
>
> http://www.kb.cert.org/vuls/id/409555

Did anyone hear of any exploits being in the wild?

--
Sabri Berisha, SAB666-RIPE - I route, therefore you are
http://www.cluecentral.net - http://www.virt-ix.net
Re: beware of the unknown packets
Sabri Berisha wrote:
> On Wed, Jan 26, 2005 at 11:12:19PM +0200, Petri Helenius wrote:
>> Hi,
>>
>> http://www.kb.cert.org/vuls/id/409555
>
> Did anyone hear of any exploits being in the wild?

How would one tell if the actual issue is not published? (without violating possible NDAs)

Pete
Announce: BGP::Inspect
All,

Merit Network and the University of Maryland would like to announce the beta release of a research tool for BGP update messages that might be of use to the NANOG community. The tool is called BGP::Inspect. The goal is to make the vast quantities of Routeviews data easily accessible to the network operator and research community. This involves not just allowing people to query and obtain the update messages, but also providing some simple analysis and statistics on the data which can help in locating anomalies and problems.

At this point we feel that we could really benefit from some feedback from the community. A beta release of our prototype is available at:

http://weasel.merit.edu:9191/

This version has been initialized with a limited amount of data. It currently provides information regarding 5 of the 40 Routeviews peers, and only contains data for the time period from Dec 20 - Jan 6.

The basic interface has been kept simple. There are 2 types of queries that can be run: Summary Queries and Raw Data Analysis.

The summary queries allow users to quickly focus on potential trouble spots (as observed at the Routeviews peers). Basic queries include things like most active ASes, most active prefixes, as well as prefixes that exhibited the largest number of changes in their origin AS.

The second type of query, Raw Data Analysis, can be used to obtain information regarding specific ASes or prefixes for a given time range. A query for a specific AS will return not only the various prefixes announced by that AS, the times, paths, and communities, but also summary stats including the total number of announcements in that time period and the number of unique prefixes announced in that time period. A 7-day summary graph is also returned, which summarizes the most recent activity seen originating from that AS.
A similar query for a specific prefix will return times, types (announce/withdraw), AS paths and communities from update messages, as well as summary statistics that indicate the min/max/avg AS path length as seen over the query time interval, the number of origin AS changes, and the number of unique ASes that announced that prefix. A summary graph indicating the activity of that prefix over the last 7 days is also displayed.

In a lot of ways this tool complements the Search by AS/Prefix tools from RIPE, BGP Monitor from MIT, and LinkRank from UCLA. The more views from different vantage points the better. In addition, there is a real effort with BGP::Inspect to provide not simply access to the raw data, but some simple analysis and summary statistics as well. The hope is that people no longer need to write custom parsers to extract the information they need.

We would appreciate any and all feedback from the NANOG community. In particular, it would be instructive for us to learn what other typical queries we could add, in addition to the Top 20 most active ASes/prefixes and the Top 20 prefixes with the largest number of origin AS changes. What are some other basic questions that researchers and network operators ask when attempting to analyze problems?

Please send feedback offlist to: [EMAIL PROTECTED]

thanks
manish karir
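
For readers wondering what the per-prefix summary statistics mentioned in the announcement amount to, the sketch below computes them from a list of update records. It is not BGP::Inspect code and does not use its data format: the record layout (dicts with `type` and `as_path` keys), the function name, and the choice to treat the last AS in the path as the origin and to count changes only between consecutive announcements are all assumptions made for illustration.

```python
def prefix_summary(updates):
    """Summarize time-ordered BGP updates for one prefix.

    updates: list of dicts with key 'type' ('announce' or 'withdraw') and,
    for announcements, 'as_path' (a non-empty list of AS numbers).
    """
    announces = [u for u in updates if u["type"] == "announce"]
    lengths = [len(u["as_path"]) for u in announces]
    # Assumption: the origin AS is the last AS in the path.
    origins = [u["as_path"][-1] for u in announces]
    # Count how often the origin differs from the previous announcement's origin.
    origin_changes = sum(1 for a, b in zip(origins, origins[1:]) if a != b)
    return {
        "announcements": len(announces),
        "withdrawals": len(updates) - len(announces),
        "min_path_len": min(lengths) if lengths else None,
        "max_path_len": max(lengths) if lengths else None,
        "avg_path_len": sum(lengths) / len(lengths) if lengths else None,
        "origin_changes": origin_changes,
        "unique_origins": len(set(origins)),
    }


if __name__ == "__main__":
    sample = [
        {"type": "announce", "as_path": [64500, 64501]},
        {"type": "announce", "as_path": [64500, 64502, 64501]},
        {"type": "withdraw"},
        {"type": "announce", "as_path": [64500, 64503]},
    ]
    print(prefix_summary(sample))
```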
Weekly Routing Table Report
This is an automated weekly mailing describing the state of the Internet Routing Table as seen from APNIC's router in Japan. Daily listings are sent to [EMAIL PROTECTED]

If you have any comments please contact Philip Smith [EMAIL PROTECTED].

Routing Table Report   04:00 +10GMT Sat 29 Jan, 2005

Analysis Summary

BGP routing table entries examined:                             154984
Prefixes after maximum aggregation:                              90381
Unique aggregates announced to Internet:                         74096
Total ASes present in the Internet Routing Table:                18880
Origin-only ASes present in the Internet Routing Table:          16393
Origin ASes announcing only one prefix:                           7690
Transit ASes present in the Internet Routing Table:               2487
Transit-only ASes present in the Internet Routing Table:            80
Average AS path length visible in the Internet Routing Table:      4.5
Max AS path length visible:                                         22
Prefixes from unregistered ASNs in the Routing Table:                6
Special use prefixes present in the Routing Table:                   0
Prefixes being announced from unallocated address space:            19
Number of addresses announced to Internet:                  1370739756
    Equivalent to 81 /8s, 179 /16s and 212 /24s
Percentage of available address space announced:                  37.0
Percentage of allocated address space announced:                  58.5
Percentage of available address space allocated:                  63.2
Total number of prefixes smaller than registry allocations:      72278

APNIC Region Analysis Summary

Prefixes being announced by APNIC Region ASes:                   30715
Total APNIC prefixes after maximum aggregation:                  14875
Prefixes being announced from the APNIC address blocks:          28751
Unique aggregates announced from the APNIC address blocks:       14553
APNIC Region origin ASes present in the Internet Routing Table:   2194
APNIC Region origin ASes announcing only one prefix:               649
APNIC Region transit ASes present in the Internet Routing Table:   326
Average APNIC Region AS path length visible:                       4.4
Max APNIC Region AS path length visible:                            15
Number of APNIC addresses announced to Internet:             172218240
    Equivalent to 10 /8s, 67 /16s and 215 /24s
Percentage of available APNIC address space announced:            63.9

APNIC AS Blocks        4608-4864, 7467-7722, 9216-10239, 17408-18431,
                       23552-24575
APNIC Address Blocks   58/7, 60/7, 124/7, 126/8, 202/7, 210/7, 218/7,
                       220/7 and 222/8

ARIN Region Analysis Summary

Prefixes being announced by ARIN Region ASes:                    86924
Total ARIN prefixes after maximum aggregation:                   52224
Prefixes being announced from the ARIN address blocks:           66269
Unique aggregates announced from the ARIN address blocks:        23997
ARIN Region origin ASes present in the Internet Routing Table:    9828
ARIN Region origin ASes announcing only one prefix:               3564
ARIN Region transit ASes present in the Internet Routing Table:    964
Average ARIN Region AS path length visible:                        4.3
Max ARIN Region AS path length visible:                             16
Number of ARIN addresses announced to Internet:              240024576
    Equivalent to 14 /8s, 78 /16s and 124 /24s
Percentage of available ARIN address space announced:             71.5

ARIN AS Blocks         1-1876, 1902-2042, 2044-2046, 2048-2106,
                       2138-2584, 2615-2772, 2823-2829, 2880-3153,
                       3354-4607, 4865-5119, 5632-6655, 6912-7466,
                       7723-8191, 10240-12287, 13312-15359, 16384-17407,
                       18432-20479, 21504-23551, 25600-26591, 26624-27647,
                       29695-30719, 31744-33791
ARIN Address Blocks    24/8, 63/8, 64/6, 68/7, 70/7, 72/8, 198/7, 204/6,
                       208/7 and 216/8

RIPE Region Analysis Summary

Prefixes being announced by RIPE Region ASes:                    29197
Total RIPE prefixes after maximum aggregation:                   20200
Prefixes being announced from the RIPE address blocks:           26172
Unique aggregates announced from the RIPE address blocks:        17218
RIPE Region origin ASes present in the Internet Routing Table:    6269
RIPE Region origin ASes announcing only one prefix:               3332
RIPE Region transit ASes present in the Internet Routing Table:   1067
Average RIPE Region AS path length visible:                        5.1
Max RIPE Region AS path length visible:                             22
Number of RIPE addresses announced to Internet:              187084864
    Equivalent to 11
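
The "Equivalent to N /8s, M /16s and K /24s" lines in the report above are a decomposition of the announced address count into whole /8, /16 and /24 blocks. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it assumes any remainder smaller than a /24 is simply dropped, which is consistent with the figures quoted in the report.

```python
def address_span(count):
    """Split an IPv4 address count into whole /8s, /16s and /24s."""
    eights, rest = divmod(count, 2 ** 24)   # a /8 holds 2^24 addresses
    sixteens, rest = divmod(rest, 2 ** 16)  # a /16 holds 2^16 addresses
    twenty_fours = rest // 2 ** 8           # a /24 holds 2^8; remainder dropped
    return eights, sixteens, twenty_fours


if __name__ == "__main__":
    for label, count in [
        ("Internet", 1370739756),
        ("APNIC", 172218240),
        ("ARIN", 240024576),
    ]:
        e, s, t = address_span(count)
        print(f"{label}: {e} /8s, {s} /16s and {t} /24s")
```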
Re: Resilience: faults, causes, statistics, open issues
On Jan 28, 2005, at 5:30 AM, András Császár (IJ/ETH) wrote:

> Just some comments about the root causes of BGP related problems, maybe you find something useful from the research perspective, although probably this is not going to be new for you. I found a few author groups with very related and useful papers:
> - Tim Griffin and co.
> - Nick Feamster and co.
> - Jennifer Rexford and co.
> - Lixin Gao and co.

Yup. That particular group you mentioned has a lot of interplay.

> These people often have joint publications but sometimes separate as well. Also, Craig Labovitz and co have some very useful papers in the area of routing convergence time.

Yes. There's also Morley Mao's convergence work.

> As I see things now, in case of BGP, routing divergence, configuration and policies have a very strong correlation. A high level conclusion (what you probably can expect from half year paper- and presentation-reading research) is that the first root cause of BGP problems is the absence of a widely deployed and practical formal language for policies. Since there is no formal language, there is no compiler, and so you have unwanted anomalies resulting from your config.

In a sense. I think that this is one of the root causes, but it's perhaps not the only one. I think we can group it into two areas:

a) Fundamental BGP problems (e.g., the convergence/flap damping issues, etc.). By fundamental I don't mean uncorrectable - I simply mean that they're features of the protocol as it exists today. Some may be fundamental trade-offs in global routing; I don't know.

b) The abovementioned policy issue.

Some of the issues in (a) can be corrected through (b) - for example, the Gao/Rexford examination of what policies can be permitted if you want to ensure stable routing. Given that BGP is a strongly policy-driven beast, many, many of its problems do arise from this.

> So, in the end, although we can possibly identify the root causes behind BGP problems, I'm not sure they can ever be fully eliminated. OK, I can imagine a formal language and config compiler, and one can find verification tools as well, but I can hardly imagine e.g. the sharing of policies (although some papers write about methods how to infer the necessary knowledge from measurements).

Agreed. I think we'll make steps, though, and I think that groups of collaborating providers can probably implement some of the solutions between themselves in ways that make sense.

> p.s. Sorry for the long mail :) :)

No worries - quite interesting. (to me, at least!)

-Dave
Multicast and unicast delivery at NANOG 33
Just an update.

Coverage is planned for Sunday Jan 30 19:30 - 21:30 (PST), Coordinating NANOG: Input From the Community.

Merit will provide Real Networks streaming media for the NANOG meeting proper, see:

http://www.nanog.org/mtg-0501/network.html#real

From the UO, NANOG will be multicast live using an ISMA MPEG-4 standard multicast stream using the mp3 audio codec at 250Kb/s. The stream should be of similar visual quality to the 1Mb/s MPEG-1 stream! Recommended free clients are QuickTime for Windows or Macintosh and MPlayer on Linux or *BSD. The VideoLAN client is able to play back the video and audio on all platforms. We will also be providing a unicast HTTP-based mp3 audio stream, playable in iTunes and any http/icecast/shoutcast capable client.

The multicast sources are:

MPEG-4 (ISMA MPEG-4 250kb/s) (SDP FILE)
http://videolab.uoregon.edu/events/nanog/nanog33-mpg4.sdp

* Video IP Address: 224.2.174.81
* Video UDP Port: 51314
* Audio IP Address: 224.2.174.82
* Audio UDP Port: 28256

The unicast mp3 audio source is:

http://tinder.uoregon.edu:8000/nanog33.mp3

For updated information see:

http://videolab.uoregon.edu/events/nanog/nanog_33.html

--
Joel Jaeggli Unix Consulting [EMAIL PROTECTED]
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2