Re: Cogent RPKI invalid filtering

2021-04-26 Thread Job Snijders via NANOG
Hi Robert, NANOG,

On Mon, Apr 26, 2021 at 09:29:27AM -0400, Robert Blayzor via NANOG wrote:
> According to Cloudflare's isbgpsafeyet.com, Cogent has been considered "safe"
> and is filtering invalids.
> 
> But I have found that to be untrue (mostly). It appears that some days they
> filter IPv4, sometimes not, and IPv6 invalids are always coming through. I
> know it's Cogent, but curious as to what others are seeing.

   [ Disclaimer: I'm not affiliated with the companies referenced in the
     above message. But as I love talking about RPKI, I'd like to share
     some perspective based on my own experience with both small and
     large scale RPKI deployments. ]

TL;DR - RPKI Route Origin Validation (ROV) is incrementally deployed
inside networks, and incrementally across the Default-Free Zone. This
means right now (and for years to come), operators will see RPKI invalid
routes spill through the cracks of the global routing system.
This is expected and unavoidable.

Details ---

There are a few caveats to consider when using the isbgpsafeyet.com
testing utility to determine whether a network is doing RPKI ROV with
'invalid == reject' EBGP policies. The isbgpsafeyet.com beacon prefixes
are anycasted from many vantage points, which 'skews' the test results
in some ways. Imagine the prefixes being anycasted from a (hypothetical)
100 POPs: this is essentially 100 attempts to propagate RPKI invalid
routes into the default-free zone. Only a single route (out of the 100)
needs to slip past any potential 'invalid == reject' barriers between
the test site and the visitor. The Cloudflare test essentially goes out
of its way to circumvent RPKI filters, but at the same time is easily
fooled in the presence of default routes (0.0.0.0/0 + ::/0).
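
To put a rough number on the "100 attempts" intuition, here is a
back-of-the-envelope sketch. It assumes each anycast announcement slips
past the filters independently with the same probability, which real
topologies certainly don't guarantee; the function name and the
probabilities are made up for illustration, not measured.

```python
def p_any_route_leaks(n_origins: int, p_single: float) -> float:
    """Chance that at least one of n_origins invalid announcements
    reaches the visitor, ASSUMING each one independently gets past
    the 'invalid == reject' barriers with probability p_single."""
    if n_origins < 0 or not 0.0 <= p_single <= 1.0:
        raise ValueError("need n_origins >= 0 and 0 <= p_single <= 1")
    # Complement of "every single announcement was filtered".
    return 1.0 - (1.0 - p_single) ** n_origins


if __name__ == "__main__":
    # Hypothetical per-route leak probabilities, 100 hypothetical POPs.
    for p in (0.01, 0.05):
        print(f"p_single={p:.2f} -> p_any={p_any_route_leaks(100, p):.3f}")
```

Even a small per-route leak chance adds up quickly across many origins,
which is why a "not safe" verdict from an anycasted test doesn't prove
ROV is absent.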

To get a broader sense of how one's local internet connection is impacted
by RPKI, compare traceroutes to 103.21.244.15 with traceroutes
to 1.1.1.1. If the first trace takes a bit of a detour compared to the
second, it might indicate that only one (or a few) routers in a
global IP backbone are not RPKI-capable.

In addition to the CF test, I recommend also trying similar but
alternative tools, such as https://sg-pub.ripe.net/jasper/rpki-web-test/
The ripe.net test is *not* anycasted and is single-homed behind a
transit-free carrier; this too skews the results in some way.

Another test can be done by pinging the RIPE RIS "Resource Certification
(RPKI) Routing Beacons" at the bottom of this page:
https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/current-ris-routing-beacons

And yet another way of measuring to what degree RPKI ROV has been
deployed in an individual AS or the DFZ as a whole, is by looking at BGP
data. The NLNOG RING LG (AS 199036, http://lg.ring.nlnog.net/summary/lg01/ipv4)
receives tens of full table feeds from various BGP speakers around the
planet. Every few hours a script takes a snapshot of the LG's Local RIB
and applies the RFC 6811 Origin Validation procedure to all paths, and
for a select few ASNs stores the list of prefixes.
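
For readers who haven't looked at RFC 6811 itself, the per-route
classification can be sketched roughly as below. This is a minimal
illustration, not the script that runs against the LG: the VRP tuple
layout and function name are my own, and the prefixes and ASNs used
with it are documentation ranges (RFC 5737 / RFC 5398), not real ROAs.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional


class VRP(NamedTuple):
    """Validated ROA Payload: (prefix, max length, origin ASN)."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: Optional[int],
                    vrps: Iterable[VRP]) -> str:
    """Return 'valid', 'invalid' or 'not-found' for one route.

    origin_asn is None when the origin can't be determined
    (e.g. the path ends in an AS_SET); such a route never matches.
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # subnet_of() raises TypeError on mixed v4/v6, so check first.
        if vrp_net.version != route.version:
            continue
        # "Covered": the VRP prefix contains the route prefix.
        if not route.subnet_of(vrp_net):
            continue
        covered = True
        # "Matched": covered, not too specific, and same origin AS.
        if (origin_asn is not None
                and route.prefixlen <= vrp.max_length
                and vrp.asn == origin_asn):
            return "valid"
    # Covered by at least one VRP but matched by none -> invalid.
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    vrps = [VRP("192.0.2.0/24", 24, 64496)]          # made-up VRP
    print(validate_origin("192.0.2.0/24", 64496, vrps))
    print(validate_origin("192.0.2.0/25", 64496, vrps))
    print(validate_origin("198.51.100.0/24", 64496, vrps))
```

Note that a more-specific announcement from the *right* origin still
comes out invalid when it exceeds the max length, which accounts for a
good share of the invalids people see in the wild.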

Cecilia Testart et al. did a thorough study using similar methodology:
https://www.caida.org/publications/papers/2020/filter_not_filter/filter_not_filter.pdf
This paper is a fun Friday afternoon read!

Below is the current top ten "RPKI invalid distributor" ASNs as seen
from AS 199036:

   RPKI invalid routes | Transiting Autonomous System
   --------------------+-----------------------------
                 2,224 | AS6461 - Zayo
                 2,094 | AS3320 - Deutsche Telekom
                 1,989 | AS8220 - Colt
                 1,976 | AS5511 - Orange
                 1,924 | AS6762 - Telecom Italia
                 1,613 | AS1273 - Vodafone
                   573 | AS6453 - Tata
                   436 | AS6939 - Hurricane Electric
                   425 | AS6830 - Liberty Global
                   355 | AS3491 - PCCW
   (rough estimates as of April 26th, 2021)

Cogent (AS 174) isn't even in the global top ten RPKI Invalids
distributors! :-) Banana for scale: in 2018-2019 the top ten was
distributing between 5,000 and 6,000 unique RPKI invalid routes.
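
A ranking like the one above can be derived from a list of invalid
paths along these lines. This is only a sketch of one plausible
counting rule (every AS in the path except the origin gets credited
once per unique prefix); the function name and input shape are
assumptions, and the sample data uses documentation ASNs and prefixes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def invalids_per_transit(
    invalid_paths: Iterable[Tuple[str, Sequence[int]]],
) -> List[Tuple[int, int]]:
    """Count unique RPKI-invalid prefixes seen behind each transit AS.

    invalid_paths holds (prefix, as_path) pairs that were already
    classified as invalid; as_path is ordered neighbor first, origin
    last. Returns (asn, count) sorted by count, highest first.
    """
    prefixes_by_asn: Dict[int, Set[str]] = defaultdict(set)
    for prefix, as_path in invalid_paths:
        if not as_path:
            continue
        origin = as_path[-1]
        # set() collapses prepends; the origin itself is not "transit".
        for asn in set(as_path) - {origin}:
            prefixes_by_asn[asn].add(prefix)
    return sorted(
        ((asn, len(p)) for asn, p in prefixes_by_asn.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    sample = [  # made-up paths, documentation ASNs
        ("192.0.2.0/24", [64500, 64501, 64496]),
        ("192.0.2.0/24", [64500, 64496]),
        ("198.51.100.0/24", [64500, 64502, 64497, 64497]),
    ]
    for asn, count in invalids_per_transit(sample):
        print(f"{count:>6,} | AS{asn}")
```

Counting unique prefixes rather than paths keeps a well-connected
transit from being penalised for merely having many sessions with the
looking glass.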

Many in the community deploying RPKI consider an RPKI deployment
'functionally complete' when a transit network dives below propagating
~ 30% of the total of DFZ invalids (and manages to stay there).

The gap of ~ 1,600 prefixes between Zayo/Deutsche Telekom - and the group
of ASNs propagating less than 600 - is the difference between not
rejecting invalids on any EBGP session, and rejecting invalids on most
EBGP sessions.

How does one end up deploying RPKI ROV on most, but not all EBGP sessions?

In the last few years HUNDREDS of RPKI-related software defects have
been uncovered in BGP implementations. Some bugs are cosmetic in nature,
other bugs are of the "if you enable RPKI, the entire router crashes"
severity level. When bugs are identified and fixed, it'll take
additional time for the QA process to complete and 

Cogent RPKI invalid filtering

2021-04-26 Thread Robert Blayzor via NANOG
According to Cloudflare's isbgpsafeyet.com, Cogent has been considered 
"safe" and is filtering invalids.


But I have found that to be untrue (mostly). It appears that some days 
they filter IPv4, sometimes not, and IPv6 invalids are always coming 
through. I know it's Cogent, but curious as to what others are seeing.




invalid.rpki.cloudflare.com has address 103.21.244.15
invalid.rpki.cloudflare.com has address 103.21.244.14
invalid.rpki.cloudflare.com has IPv6 address 2606:4700:7000::6715:f40e
invalid.rpki.cloudflare.com has IPv6 address 2606:4700:7000::6715:f40f



BGP routing table entry for 103.21.244.0/24
  174 13335, (aggregated by 13335 172.69.172.1)
  Origin IGP, metric 83040, localpref 100, valid, external, best, group-best, import-candidate
  Community: 174:21101 174:22012


BGP routing table entry for 2606:4700:7000::/48
  174 13335, (aggregated by 13335 172.69.172.1)
    2001:550:2f01:: from 2001:550:2f01:: (66.28.1.115)
  Origin IGP, metric 83040, localpref 100, valid, external, best, group-best, import-candidate
  Received Path ID 0, Local Path ID 1, version 1272502628
  Community: 174:21101 174:22012


--
inoc.net!rblayzor
XMPP: rblayzor.AT.inoc.net
PGP:  https://pgp.inoc.net/rblayzor/