Emile Aben, I would be happy to see code for detecting overly similar probes. It would save a lot of time spent on data filtering.


These are snapshots of data for probe similarity detection [1] in IPv4 and IPv6:

https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv4-2022-06-29.csv
https://sg-pub.ripe.net/emile/probe-similarity/probe_similarity_ipv6-2022-06-29.csv

The method calculates 3 similarity values between 0 and 1 (the last 3 values in the CSV). Pick the middle one if you don't care about the details.

There are 54k probe pairs with similarity values over 0.5,
7.4k with values over 0.95, and
6.7k with values over 0.99.

My gut feeling is that anything over 0.95 is likely very redundant for many types of measurements.
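
A minimal sketch of how such a threshold could be applied to one of the CSV snapshots. The column layout is an assumption: the message above only says that the last 3 values are the similarity values, so the sketch guesses that the first two columns identify the two probes of a pair. The function name and the header handling are hypothetical.

```python
import csv
from typing import Iterable, Iterator, Tuple


def similar_pairs(
    lines: Iterable[str], threshold: float = 0.95
) -> Iterator[Tuple[str, str, float]]:
    """Yield (probe_a, probe_b, similarity) for pairs above the threshold.

    ASSUMPTION: columns 0 and 1 hold the probe IDs of the pair, and the
    last three columns are the similarity values. The middle one of the
    three (row[-2]) is used, as suggested in the text.
    """
    for row in csv.reader(lines):
        if len(row) < 5:
            continue  # too short to hold two IDs plus three values
        try:
            middle = float(row[-2])
        except ValueError:
            continue  # non-numeric field, e.g. a header row if one exists
        if middle > threshold:
            yield row[0], row[1], middle
```

With a downloaded snapshot this could be used as `with open("probe_similarity_ipv4-2022-06-29.csv", newline="") as f: pairs = list(similar_pairs(f))`. Check the actual column order in the file before relying on the result.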

There seems to be a cluster of about 400 probes that are very similar to each other, and a couple of smaller clusters too.
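
One possible way to look for such clusters, sketched here as an illustration and not necessarily how the clusters mentioned above were found: treat every probe pair above a chosen similarity threshold as an edge and group probes into connected components with a union-find structure. The function name and the input format (an iterable of probe ID pairs) are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cluster_probes(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group probes into connected components, largest cluster first.

    `pairs` is assumed to contain only the probe pairs that already
    passed a similarity threshold (hypothetical input format).
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Locate the representative of x's group, halving the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups: Dict[str, Set[str]] = defaultdict(set)
    for probe in list(parent):
        groups[find(probe)].add(probe)
    return sorted(groups.values(), key=len, reverse=True)
```

Note that connected components are transitive: two probes can land in the same cluster through a chain of similar pairs without being very similar to each other directly, so a stricter grouping may suit some measurements better.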

Happy to work with you and others to see if we can make this into something that is operationally valuable.

kind regards,
Emile Aben
RIPE NCC

[1] Holterbach, Thomas, et al. "Measurement vantage point selection using a similarity metric." Proceedings of the Applied Networking Research Workshop. 2017.
https://trac.ietf.org/trac/irtf/export/478/www/content/anrw/2017/anrw17-final9.pdf


--
ripe-atlas mailing list
ripe-atlas@ripe.net
https://lists.ripe.net/mailman/listinfo/ripe-atlas
