> Depending on what failure cases you actually see from your peers in the
> wild, I can see (at least as a thought experiment), a two-bucket solution -
> "transit" and "everyone else". (Excluding downstream customers, who you
> obviously hold some responsibility for the hygiene of.)

Although I didn't say it clearly, that's exactly what we do. The described 'bucket' logic is only applied to the 'everyone else' pile; our transit stuff gets its own special care and feeding.

> How often do folks see a failure case that's "deaggregated something and
> announced you 1000 /24s, rather than the expected/configured 100 max", vs
> "fat-fingered being a transit provider, and announced you the global table"?

I can count on one hand the number of times I can remember that a peer has gone on a deagg party and run over limits. Maybe twice in the last 8 years? It's possible it's happened more times that I'm not aware of.

We have additional protections in place for that second scenario. If a generic peer tries to send us a route with a transit provider in the as-path, we just toss the route on the floor. That protection has been much more useful than prefix limits IMO.

On Wed, Aug 18, 2021 at 11:37 AM t...@pelican.org <t...@pelican.org> wrote:

> On Wednesday, 18 August, 2021 14:21, "Tom Beecher" <beec...@beecher.cc> said:
>
> > We created 5 or 6 different buckets of limit values (for v4 and v6 of
> > course.) Depending on what you have published in PeeringDB (or told us
> > directly what to expect), you're placed in a bucket that gives you a decent
> > amount of headroom to that bucket's max. If your ASN reaches 90% of your
> > limit, our ops folks just move you up to the next bucket. If you start to
> > get up there in the last bucket, then we'll take a manual look and decide
> > what is appropriate. This covers well over 95% of our non-transit sessions,
> > and has dramatically reduced the volume of tickets and changes our ops team
> > has had to sort through.
>
> Depending on what failure cases you actually see from your peers in the
> wild, I can see (at least as a thought experiment), a two-bucket solution -
> "transit" and "everyone else". (Excluding downstream customers, who you
> obviously hold some responsibility for the hygiene of.)
>
> How often do folks see a failure case that's "deaggregated something and
> announced you 1000 /24s, rather than the expected/configured 100 max", vs
> "fat-fingered being a transit provider, and announced you the global table"?
>
> My gut says it's the latter case that breaks things and you need to make
> damn sure doesn't happen. Curious to hear others' experience.
>
> Thanks,
> Tim.
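
For anyone who wants to see the two mechanisms from this thread side by side, here is a rough sketch in Python. It is not our actual tooling. The bucket ceilings, the transit ASN list (documentation-range ASNs), and the function names are all made up for illustration; only the 90% threshold, the "last bucket means a manual look" rule, and the "drop any peer route with a transit ASN in the as-path" rule come from the messages above.

```python
from typing import Optional, Sequence

# Hypothetical bucket ceilings (max-prefix values). The thread only says
# "5 or 6 different buckets"; these numbers are invented.
BUCKETS = [100, 500, 2000, 10000, 50000, 200000]

# From the thread: at 90% of the limit, a peer is moved up a bucket.
THRESHOLD_PCT = 90

# Hypothetical transit provider ASNs (reserved documentation ASNs).
TRANSIT_ASNS = frozenset({64496, 64497})


def pick_bucket(expected_prefixes: int) -> Optional[int]:
    """Smallest bucket whose 90% mark is above the expected prefix count.

    Returns None when even the largest bucket lacks headroom, which in the
    thread's terms means a human takes a manual look.
    """
    for limit in BUCKETS:
        if expected_prefixes * 100 < limit * THRESHOLD_PCT:
            return limit
    return None


def next_limit(current_limit: int, received: int) -> Optional[int]:
    """Keep the current limit, move up one bucket, or return None for review."""
    if received * 100 < current_limit * THRESHOLD_PCT:
        return current_limit
    larger = [b for b in BUCKETS if b > current_limit]
    # No larger bucket left: hand off for manual review.
    return larger[0] if larger else None


def accept_from_peer(as_path: Sequence[int]) -> bool:
    """Reject a route from a generic (non-transit) peer if any transit
    provider ASN appears anywhere in its AS path."""
    return TRANSIT_ASNS.isdisjoint(as_path)
```

In this sketch a peer expecting 80 prefixes lands in the 100 bucket, and one already at 90 of 100 gets bumped to 500. The as-path check is independent of the prefix count, which is why it catches the "announced you the global table" case without waiting for a limit to trip.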