Emile Aben, I would be happy to see code for detecting overly similar
probes. It would save a lot of time spent on data filtering.
Here are snapshots of the probe similarity data [1] for IPv4
and IPv6:
https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv4-2022-06-29.csv
https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv6-2022-06-29.csv
For each probe pair, this calculates three similarity values between 0
and 1 (the last three columns of the CSV). If you don't care about the
details, pick the middle one.
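A minimal sketch of reading one of these files in Python, picking the middle similarity value as suggested. It assumes, without confirmation from the files themselves, that the first two columns hold the probe IDs; rows whose middle value is not a number (such as a header row) are skipped. `read_pairs` is a hypothetical helper name.

```python
import csv
from typing import Iterator, TextIO, Tuple


def read_pairs(f: TextIO) -> Iterator[Tuple[str, str, float]]:
    """Yield (probe_a, probe_b, similarity) tuples from a probe-similarity CSV.

    Assumption: columns 0 and 1 are probe IDs; the last three columns are
    the similarity values, and the middle one of those three is used.
    """
    for row in csv.reader(f):
        if len(row) < 5:
            continue  # too short to hold two IDs plus three values
        try:
            sim = float(row[-2])  # middle of the last three columns
        except ValueError:
            continue  # not numeric, e.g. a header row
        yield row[0], row[1], sim
```

Usage, for a downloaded copy of the IPv4 file:

    with open("probe_similarity_ipv4-2022-06-29.csv", newline="") as f:
        pairs = list(read_pairs(f))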
There are 54k probe pairs with a similarity value over 0.5, 7.4k with
a value over 0.95, and 6.7k with a value over 0.99.
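These counts are nested: a pair over 0.99 is also counted over 0.95 and over 0.5. A small sketch of how such per-threshold counts could be reproduced from a list of similarity values (one per probe pair); `count_over` is a hypothetical name, and "over" is taken to mean strictly greater than.

```python
from typing import Dict, Iterable, Sequence


def count_over(sims: Iterable[float],
               thresholds: Sequence[float] = (0.5, 0.95, 0.99)) -> Dict[float, int]:
    """Count how many probe pairs have a similarity strictly above each threshold."""
    counts = {t: 0 for t in thresholds}
    for s in sims:
        for t in thresholds:
            if s > t:
                counts[t] += 1  # a pair can count toward several thresholds
    return counts
```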
My gut feeling is that anything over 0.95 is likely very redundant for
many types of measurements.
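One simple way to act on such a threshold, sketched under assumptions: treat every pair above 0.95 as "redundant", then greedily keep probes, visiting those with the most near-duplicates first, and drop any probe that is too similar to one already kept. This is a plain greedy heuristic, not the selection method from [1], and it does not guarantee the smallest possible set. Input is (probe_a, probe_b, similarity) tuples; probes that never appear in a pair are not considered. `prune_redundant` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def prune_redundant(pairs: Iterable[Tuple[str, str, float]],
                    threshold: float = 0.95) -> Tuple[List[str], List[str]]:
    """Greedily split probes into (kept, dropped) so that no two kept
    probes have a similarity above `threshold`."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    probes: Set[str] = set()
    for a, b, sim in pairs:
        probes.update((a, b))
        if sim > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    kept: List[str] = []
    dropped: Set[str] = set()
    # Probes with the most near-duplicates first, ties broken by ID,
    # so one kept probe can stand in for a whole group.
    for p in sorted(probes, key=lambda p: (-len(neighbours[p]), p)):
        if p in dropped:
            continue
        kept.append(p)
        # Neighbours of a kept probe cannot already be kept (the relation
        # is symmetric), so they can all be marked as dropped.
        dropped.update(neighbours[p])
    return kept, sorted(dropped)
```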
There seems to be a cluster of about 400 probes that are very similar to
each other, and a couple of smaller clusters too.
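Clusters like these could be found, for example, as connected components of the graph whose edges are the probe pairs above a threshold. The sketch below uses union-find; the 0.95 default and the name `similarity_clusters` are assumptions for illustration. Note that connected components chain: two probes in the same component are not necessarily similar to each other directly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def similarity_clusters(pairs: Iterable[Tuple[str, str, float]],
                        threshold: float = 0.95) -> List[List[str]]:
    """Group probes into connected components of the 'similarity > threshold'
    graph, largest cluster first. Probes with no such pair are omitted."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, sim in pairs:
        if sim > threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two components

    groups: Dict[str, List[str]] = defaultdict(list)
    for p in parent:
        groups[find(p)].append(p)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)
```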
Happy to work with you and others to see if we can make this into
something that is operationally valuable.
kind regards,
Emile Aben
RIPE NCC
[1] Holterbach, Thomas, et al. "Measurement vantage point selection
using a similarity metric." Proceedings of the Applied Networking
Research Workshop. 2017.
https://trac.ietf.org/trac/irtf/export/478/www/content/anrw/2017/anrw17-final9.pdf
--
ripe-atlas mailing list
ripe-atlas@ripe.net
https://lists.ripe.net/mailman/listinfo/ripe-atlas