Re: [db-wg] NRTM replication inefficiencies

Tim Bruijnzeels via db-wg Thu, 09 Nov 2017 07:28:30 -0800

Hi Job, WG,

> On 7 Nov 2017, at 23:11, Job Snijders via db-wg <db-wg@ripe.net> wrote:
> 
> I would also welcome an investigation into alternative approaches, (some
> not-via-WHOIS replication mechanisms), perhaps something over HTTPS can
> be done? Either way, something more robust would be useful.


We recently developed and implemented a standard for something similar for RPKI:
https://datatracker.ietf.org/doc/rfc8182/

I believe this approach can be useful here as well. Without going into all the 
RPKI specifics, it works a little something like this:

Starting points:
= The state of the RPKI repository (or whois) at a given point in time can be 
represented by a ‘snapshot’
    - This snapshot is “immutable”, so it may be cached indefinitely, and we 
can give it a unique URL and deliver it through a distributed CDN
= The delta between two consecutive snapshots is also “immutable” data, so 
again we can cache it, give it a unique URL and distribute it
= We can publish a notification file (which should NOT be cached) that points 
to:
   - the CURRENT snapshot
   - a list of deltas (each covering one increment) - the total size of the 
deltas MUST NOT exceed the size of the snapshot
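
As a rough illustration of the size rule above, here is a minimal server-side sketch of choosing which deltas to list in the notification file. The Delta type, its fields and the function name are hypothetical, made up for this example; they are not taken from the RFC or from any existing whois code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    serial: int  # serial of the state this delta leads to (assumed field)
    size: int    # size of the delta file in bytes (assumed field)


def deltas_to_publish(deltas, snapshot_size):
    """Keep the newest consecutive deltas whose combined size does not
    exceed the snapshot size. Returns them newest first."""
    kept = []
    total = 0
    for delta in sorted(deltas, key=lambda d: d.serial, reverse=True):
        if total + delta.size > snapshot_size:
            break  # older deltas are dropped; such clients fetch the snapshot
        kept.append(delta)
        total += delta.size
    return kept
```

Stopping at the first delta that does not fit keeps the published list contiguous, so a client never finds a gap in the middle of the chain.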

Clients can then just poll the notification file and work out for themselves 
whether a usable list of deltas is available to them, or whether they need to 
get the latest snapshot instead.

We also use a session_id and hashes of the referenced files for additional 
checks (details in the RFC).
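
To make the client side concrete, here is a minimal sketch of the decision a client could make after polling the notification file. It assumes the notification has already been parsed into a dict; the key names (session_id, serial, snapshot, deltas) and both function names are assumptions for illustration, and a whois variant might look quite different. RFC 8182 uses SHA-256 for the file hashes, which is what the check below assumes.

```python
import hashlib


def plan_update(notification, local_session_id, local_serial):
    """Decide what to fetch: ("none", []), ("deltas", [...]) or
    ("snapshot", [snapshot_entry])."""
    # A different session means our local state cannot be patched.
    if notification["session_id"] != local_session_id:
        return "snapshot", [notification["snapshot"]]
    if notification["serial"] == local_serial:
        return "none", []
    by_serial = {d["serial"]: d for d in notification["deltas"]}
    needed = range(local_serial + 1, notification["serial"] + 1)
    # Deltas are usable only if every step from our serial to the
    # current one is listed.
    if notification["serial"] > local_serial and all(s in by_serial for s in needed):
        return "deltas", [by_serial[s] for s in needed]
    return "snapshot", [notification["snapshot"]]


def hash_matches(body: bytes, expected_hex: str) -> bool:
    """Compare a downloaded file against the hash from the notification."""
    return hashlib.sha256(body).hexdigest() == expected_hex.lower()
```

The server does no per-client work here: every client computes its own plan from the same small, uncached file and then fetches cacheable objects.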

The idea behind this design was that we wanted to minimise the impact on the 
server. In a chatty protocol (like rsync, which is still used in RPKI) the 
server and client need to work out their differences together to determine what 
needs to be transferred. This is fine in one-to-one relations, but when a 
server needs to serve a multitude of clients this doesn’t scale. We want to be 
able to parallelise as much as we can (Amdahl’s law), so we push the 
computational burden to the clients. The server just needs a one-off investment 
to create the snapshot, delta and latest notification, which it can then 
offload. Using HTTPS allows us to leverage one of the many, many CDNs out 
there. This problem has been solved in the industry, so we do not need to 
invent our own infrastructure for this.

Note that in the case of RPKI the protocol is XML based. This made sense 
because it leveraged existing definitions in the RPKI space that were also XML 
based. For whois it may make more sense to look at JSON and/or RDAP.

Please let me know if you see merit in this kind of ‘delta’ protocol in the 
whois space.


Kind regards


Tim Bruijnzeels
Assistant Manager Software Engineering and Senior Technology Officer
RIPE NCC
