Summary: 
Yesterday, on 7 January 2021, an issue with our RPKI software caused an
inconsistent certificate to be published from 15:29 to 16:20 (UTC+1). This may
have resulted in outages. We strongly recommend that network operators update
their Relying Party software to the latest version.

Details: 
At 15:06 (UTC+1) yesterday, we processed an outgoing transfer of IP resources 
to another RIR service region. This caused our system to update the 
corresponding RPKI certificates in our Certificate Authority (CA). 

Unfortunately, our RPKI software published the updated parent certificate
(production CA) before its child certificate (member CA). As a result, in the
period between the two publications, the child certificate still contained
resources that were no longer on the updated parent certificate, so the child
certificate over-claimed. This was resolved once the updated child certificate
was published.
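
To make "over-claiming" concrete: a child certificate is only valid if every resource it lists is also held by its parent certificate. The Python sketch below checks that condition for IP prefixes only. It is a simplification under stated assumptions: real resource certificates also carry AS numbers and address ranges, a child prefix here counts as covered only if it fits inside a single parent prefix, and the helper name `overclaimed` and the documentation prefixes are hypothetical, not taken from our software.

```python
import ipaddress
from typing import List

# Hypothetical helper for illustration; not part of the RIPE NCC RPKI software.
# Simplification: only IP prefixes are checked (no AS numbers or ranges), and a
# child prefix counts as covered only if it fits inside a single parent prefix.


def overclaimed(parent: List[str], child: List[str]) -> List[str]:
    """Return the child prefixes that are not covered by any parent prefix.

    A non-empty result means the child certificate over-claims.
    """
    parent_nets = [ipaddress.ip_network(p) for p in parent]
    result = []
    for prefix in child:
        net = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed,
        # so compare address families first.
        if not any(
            net.version == p.version and net.subnet_of(p) for p in parent_nets
        ):
            result.append(str(net))
    return result


# Parent after an outgoing transfer of 203.0.113.0/24 (documentation prefixes).
parent_after = ["198.51.100.0/24"]
# Child certificate that has not yet been updated.
child_stale = ["198.51.100.0/25", "203.0.113.0/25"]
print(overclaimed(parent_after, child_stale))  # ['203.0.113.0/25']
```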

Currently we have three separate processes:
* One that updates the registry resources in RPKI (every 15 minutes)
* One that updates the resources of the RIPE NCC production CA (the parent of
all member CAs) from the registry (runs hourly, takes about 5 minutes)
* One that updates the resources of the member CAs from the registry (runs
hourly, takes about 40 minutes)

If there is an outgoing transfer and the production CA update runs before the
member CA update, over-claiming occurs. The member CA must be updated at the
same time as the production CA (i.e. in the same RRDP delta), or before the
production CA's resources are reduced. The problem does not occur in the other
direction, so incoming transfers are not affected.
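
The ordering problem for an outgoing transfer can be illustrated with a small simulation. In this sketch, resources are hypothetical labels ("A", "B") rather than real prefixes, containment is reduced to set membership, and `publish_sequence` is an illustrative helper, not part of our RPKI software. It publishes the two updates one at a time and records, after each step, which member CA resources are missing from the production CA.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: each CA holds a set of resource labels. Containment is
# simplified to set membership; real certificates carry prefixes and AS numbers.
State = Dict[str, Set[str]]


def publish_sequence(
    order: List[str], target: State, published: State
) -> List[Tuple[str, Set[str]]]:
    """Publish the updated CAs one at a time, in the given order.

    After each step, record which resources the member CA claims that the
    production CA (its parent) no longer holds.
    """
    current = {name: set(res) for name, res in published.items()}
    history = []
    for name in order:
        current[name] = set(target[name])
        overclaim = current["member"] - current["production"]
        history.append((name, overclaim))
    return history


# Before the outgoing transfer, both CAs hold A and B; afterwards only A.
before = {"production": {"A", "B"}, "member": {"A", "B"}}
after = {"production": {"A"}, "member": {"A"}}

parent_first = publish_sequence(["production", "member"], after, before)
child_first = publish_sequence(["member", "production"], after, before)
print(parent_first)  # [('production', {'B'}), ('member', set())]
print(child_first)   # [('member', set()), ('production', set())]
```

Only the parent-first order produces an intermediate state in which the member CA claims "B" while the production CA no longer holds it. Publishing the member CA first, or both changes in one RRDP delta, avoids that state.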

Some older versions of Relying Party software applied a strict interpretation
of manifest handling: if a single entry on a manifest was invalid, they rejected
every certificate listed on that manifest. As a consequence, these validators
rejected all RPKI certificates covering RIPE NCC resources during this period.
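
The difference between the strict behaviour described above and per-object handling can be sketched as follows. The manifest is modelled as a mapping from hypothetical file names to a flag saying whether each object validated; this is a simplified model, not how any particular validator is implemented.

```python
from typing import Dict, List

# Simplified model: the manifest maps a (hypothetical) file name to whether
# that object passed validation. Not taken from any particular validator.


def accepted_objects(manifest: Dict[str, bool], strict: bool) -> List[str]:
    """Return the manifest entries a validator would accept.

    strict=True models the behaviour described above: a single invalid
    entry causes every entry on the manifest to be rejected.
    strict=False rejects only the invalid entries.
    """
    if strict and not all(manifest.values()):
        return []
    return sorted(name for name, valid in manifest.items() if valid)


manifest = {"member-1.cer": True, "member-2.cer": False, "member-3.cer": True}
print(accepted_objects(manifest, strict=True))   # []
print(accepted_objects(manifest, strict=False))  # ['member-1.cer', 'member-3.cer']
```

In this model, one over-claiming entry is enough for the strict mode to drop every member certificate on the manifest.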

Based on our access logs, we estimate that 327 instances of Relying Party 
software were impacted.

On Monday 11 January, we will implement a fix: every time a RIPE NCC
certificate changes, we will check all member certificates for over-claiming
and immediately re-issue any that over-claim. This approach does not fully
eliminate the problem, but it reduces the over-claiming window from an hour to
a couple of minutes. We will work on reducing this to less than a minute, to
further limit the potential for inconsistency. In the longer term, we will work
on implementing atomic publishing of data for this type of situation.
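
The planned check can be sketched as below. This is a minimal illustration under assumptions: the function names, member IDs and the `reissue` callback are hypothetical, only IP prefixes are considered, and a prefix counts as covered only if it fits inside a single parent prefix. In practice, re-issuing would regenerate the member certificate from current registry data and publish it.

```python
import ipaddress
from typing import Callable, Dict, List

# Hypothetical sketch of the planned check; names are illustrative only.
# Simplification: only IP prefixes are considered, and a prefix is covered
# only if it fits inside a single parent prefix.


def covered(prefix: str, parent: List[str]) -> bool:
    """Return True if prefix lies entirely within one of the parent prefixes."""
    net = ipaddress.ip_network(prefix)
    for p in parent:
        pnet = ipaddress.ip_network(p)
        if net.version == pnet.version and net.subnet_of(pnet):
            return True
    return False


def on_parent_certificate_change(
    parent: List[str],
    members: Dict[str, List[str]],
    reissue: Callable[[str], None],
) -> List[str]:
    """Check every member CA against the new parent resources and trigger an
    immediate re-issue for each one that over-claims.

    Returns the IDs of the members that were re-issued.
    """
    reissued = []
    for member_id, resources in sorted(members.items()):
        if not all(covered(p, parent) for p in resources):
            reissue(member_id)
            reissued.append(member_id)
    return reissued


# Published member certificates; member-b's prefix has just been transferred out.
published = {
    "member-a": ["198.51.100.0/25"],
    "member-b": ["203.0.113.0/24"],
}
queue: List[str] = []  # stands in for a real re-issue job queue
done = on_parent_certificate_change(["198.51.100.0/24"], published, queue.append)
print(done)  # ['member-b']
```

Because the re-issue happens after the parent has already been published, a short over-claiming window remains, which is why atomic publishing is the longer-term goal.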

In the meantime, we strongly recommend that network operators update their RPKI 
Relying Party software to the latest version: 
* Routinator 0.8.2 
* rpki-client 6.8p1 
* FORT 1.4.2 
* OctoRPKI 1.2.2
* RIPE NCC RPKI Validator 3.2-2020.12.10.13.57

Best regards,

Nathalie Trenaman
Routing Security Programme Manager
RIPE NCC
