Tor at 1AEO:
Appreciate the support and quick response!
You are welcome!
"You mean true DNS resolution failures per scan or general ones (including DNS)?"

Both. Over the last few days, per scan (~3,000 exit relays in the consensus), we’ve observed:
- ~55 exit relays with true DNS resolution failures, where circuits were established successfully but DNS resolution failed at the exit.
- ~415 exit relays with circuit or infrastructure failures, where we were unable to reach the relay.

Breaking that down further:

DNS failures (~55 total):
- ~33 NXDOMAIN responses (SOCKS4: domain not found)
- ~21 DNS query timeouts (45 seconds, 3 attempts)
- ~1 case where a wrong IP was returned

Circuit / infrastructure failures (~415 total):
- ~328 cases where the relay channel closed unexpectedly
- ~59 circuit construction timeouts
- ~29 circuits closed or destroyed

The remaining ~2,570 exit relays (~98%) successfully resolved a unique wildcard DNS query per timestamp and relay fingerprint.
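The query scheme and failure buckets described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dnshealth module's actual code: the wildcard zone dnscheck.example.org, the expected address 192.0.2.1, the <timestamp>-<fingerprint> label layout, and every failure label and JSON field name are hypothetical. It shows that a 40-character relay fingerprint plus a Unix timestamp still fits in one 63-octet DNS label, and how each outcome maps onto the DNS versus circuit/infrastructure split used in the breakdown.

```python
import json
import time

# Hypothetical wildcard zone: any <label>.WILDCARD_ZONE is assumed to resolve
# to EXPECTED_IP, so every relay gets a fresh name that no resolver has cached.
WILDCARD_ZONE = "dnscheck.example.org"
EXPECTED_IP = "192.0.2.1"  # RFC 5737 documentation address, placeholder only

# Illustrative failure labels, grouped the way the breakdown above groups them.
DNS_FAILURES = {"nxdomain", "dns_timeout", "wrong_ip"}
CIRCUIT_FAILURES = {"channel_closed", "circuit_timeout", "circuit_destroyed"}


def query_name(fingerprint, ts=None):
    """Build a per-relay, per-scan name: '<unix-ts>-<fingerprint>.<zone>'."""
    ts = int(time.time()) if ts is None else int(ts)
    label = f"{ts}-{fingerprint.lower()}"
    if len(label) > 63:  # a single DNS label may not exceed 63 octets
        raise ValueError("label too long for DNS")
    return f"{label}.{WILDCARD_ZONE}"


def classify(error=None, answer_ip=None):
    """Return (group, label) for one lookup; error is a failure label or None.

    No error but an unexpected (or missing) answer counts as wrong_ip.
    """
    if error is None:
        label = "success" if answer_ip == EXPECTED_IP else "wrong_ip"
    else:
        label = error
    if label == "success":
        return "success", label
    if label in DNS_FAILURES:
        return "dns", label
    if label in CIRCUIT_FAILURES:
        return "circuit", label
    raise ValueError(f"unknown failure label: {label}")


def record(fingerprint, qname, error=None, answer_ip=None, latency_s=None):
    """Build a JSON-serializable result for one relay."""
    group, label = classify(error, answer_ip)
    return {
        "fingerprint": fingerprint,
        "query": qname,
        "group": group,
        "status": label,
        "answer": answer_ip,
        "latency_ms": None if latency_s is None else round(latency_s * 1000),
    }


fp = "0123456789ABCDEF0123456789ABCDEF01234567"  # made-up 40-hex fingerprint
name = query_name(fp, ts=1768790000)
print(json.dumps(record(fp, name, answer_ip=EXPECTED_IP, latency_s=0.25)))
```

Keeping circuit and infrastructure failures in their own bucket means only the "dns" group speaks to an exit's resolver; the other group reflects whether the relay could be reached at all.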
Those results are interesting, thanks. The true DNS failures are pretty high compared to what we have been getting over the years when testing whether example.com and torproject.org are resolvable. Anything between 5 and 10 issues per week seems not unreasonable according to the data we have, but 55 is clearly an outlier worthy of some explanation, in particular as we got just 5 relays with issues last week during our weekly scan.
Can you share the fingerprints of the relays you found so I can have a closer look and check whether you might have a bunch of false positives in your results? Are the results for those relays stable if you scan over the course of a couple of days or are they fluctuating?
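The stability question above can be checked mechanically by comparing the failing sets from several scans. The sketch below is a hypothetical helper, not part of exitmap: it assumes each scan has already been reduced to a set of failing relay fingerprints (shortened placeholders here) and splits relays into those that failed in every scan and those that failed only some of the time.

```python
from collections import Counter


def stability(scans):
    """Split relays that failed in any scan into persistent and intermittent.

    scans: one set of failing fingerprints per scan, oldest first. Because
    each scan is a set, a relay is counted at most once per scan.
    """
    seen = Counter(fp for failed in scans for fp in failed)
    persistent = sorted(fp for fp, n in seen.items() if n == len(scans))
    intermittent = sorted(fp for fp, n in seen.items() if n < len(scans))
    return persistent, intermittent


# Hypothetical, shortened fingerprints from three scans over a few days.
scans = [{"AAAA", "BBBB"}, {"AAAA"}, {"AAAA", "CCCC"}]
persistent, intermittent = stability(scans)
print("persistent:", persistent)
print("intermittent:", intermittent)
```

Persistent failures are the stronger candidates for contacting operators; intermittent ones may point at transient conditions or at the scanning setup itself, which is where false positives would tend to show up.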
"It depends on the regularity. What did you have in mind in that regard?"

Our current plan is to run scans every few hours to provide faster feedback when an operator is actively troubleshooting an issue, given that this is a single DNS query per relay. We’ll refine cadence as we get feedback and learn more.
Hrm. exitmap can run modules for particular relays (provided on the command line either by fingerprint or via a file). So, when an operator is troubleshooting an issue, it makes more sense to me to run something with just their fingerprints than to scan the whole network over and over again just so the operator can see whether something is working again for them. Or maybe I missed the point as to why you need to scan the whole network every couple of hours for that use case?
I think running the scan for the whole network at most once a day and then zooming closer into relay groups with issues is a strictly better deployment plan.
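The suggested deployment (one full-network scan per day, then zooming in on relay groups with issues) could feed the targeted runs roughly like this. The JSON field names, the family grouping key, and the one-fingerprint-per-line file format are all assumptions for illustration; the thread only says that exitmap accepts relays per fingerprint or via a file, and the exact command-line option is not shown here.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical excerpt of a full-network scan's JSON output; the field
# names ("fingerprint", "status", "family") are assumptions.
SCAN_JSON = """[
  {"fingerprint": "AAAA", "status": "success", "family": "opA"},
  {"fingerprint": "BBBB", "status": "nxdomain", "family": "opA"},
  {"fingerprint": "CCCC", "status": "dns_timeout", "family": "opB"},
  {"fingerprint": "DDDD", "status": "dns_timeout"}
]"""


def failing_by_family(results):
    """Group fingerprints of non-successful relays by family."""
    groups = defaultdict(list)
    for r in results:
        if r["status"] != "success":
            groups[r.get("family", "unknown")].append(r["fingerprint"])
    return {family: sorted(fps) for family, fps in groups.items()}


def write_fingerprint_file(fingerprints, path):
    """Write one fingerprint per line (assumed format) for a targeted re-run."""
    Path(path).write_text("".join(f"{fp}\n" for fp in fingerprints))


groups = failing_by_family(json.loads(SCAN_JSON))
with tempfile.TemporaryDirectory() as tmp:
    for family, fps in groups.items():
        path = Path(tmp) / f"rescan-{family}.txt"
        write_fingerprint_file(fps, path)
        print(family, path.read_text().split())
```

Grouping by family keeps each follow-up run small and maps directly onto the per-family operator views, so only the relays with issues get rescanned between daily full scans.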
"I don't think there are any publishing concerns, no. Where are the results supposed to show up?"

Results will be published on a small public site with a simple JSON API for programmatic access (tentatively https://exitdnshealth.1aeo.com/), and integrated into operator-facing views (per family / AROI and per relay) on https://metrics.1aeo.com/. Happy to adjust presentation if there are preferences.
Sounds good to me at least.
"As for the methodology it would be very much appreciated if you could upstream your changes to exitmap itself…"

Absolutely. We strongly prefer not to maintain a long-term fork and are very open to submitting pull requests. What’s the preferred workflow from your perspective: start with a PR directly, or discuss scope and structure first? For reference, the current code lives at:
- https://github.com/1aeo/exitmap (working on a branch to make the commits easier to follow for sending upstream)
- https://github.com/1aeo/exitmap-dns-health-deploy
I think filing a ticket at https://gitlab.torproject.org/tpo/network-health/exitmap is a good start. I am happy to file child tickets for different parts of the work if needed (e.g. the structured output of the results), so no worries about that. If it's easiest for you to have one big MR referencing the parent ticket then just raise that one and we can have all the technical discussion there. If you want to scope the potential MR in a ticket discussion first then I am happy to do that as well. Or if there is yet another plan even more appealing to you, go for it. Up to you. :)
New question: having seen DNSSEC support in exitmap, is DNSSEC expected to be enabled for exit relays? We’re hearing mixed viewpoints from operators. We’re debating whether it should be added to this exit relay DNS health effort and leaning towards yes.
No, it's not expected at this point. Sometimes we have modules that only gather information about what the network looks like at a particular point in time, without making a healthy/unhealthy distinction. So, at least for now, I'd say don't worry about DNSSEC status, either in exitmap scanning efforts or in actual exit relay support.
Thanks, Georg
On Monday, January 19th, 2026 at 3:14 AM, Georg Koppen via network-health <[email protected]> wrote:

Hi!

Tor at 1AEO via network-health:

Hi Network Health Team,

A few large-scale exit relay operators have asked for better visibility into DNS health across their relays. We've built an exitmap module, dnshealth, to address this and want your input before we start running scans and publishing results.

Very nice work, thanks!

What It Does
- Generates unique DNS queries per relay (wildcard subdomain → expected IP) to avoid caches
- Classifies failures: timeout, NXDOMAIN, wrong IP, SOCKS errors
- Outputs structured JSON with latency and error details

All code is open source: https://github.com/1aeo/exitmap

Initial testing: ~98% success rate across ~3k exits, 50-90 true failures per scan, 4-8 min runtime.

You mean true DNS resolution failures per scan or general ones (including DNS)?

Before We Proceed
1. Any concerns with us running regular scans and publishing results?

It depends on the regularity. What did you have in mind in that regard? We are currently running the dnsresolution module on a weekly basis and informing relay operators in case of trouble, and that has been sufficient frequency-wise, I think.

I don't think there are any publishing concerns, no. Where are the results supposed to show up?

2. Recommendations on scan frequency or methodology?

Yes. I think weekly should be fine, at least for a start. As for the methodology, it would be very much appreciated if you could upstream your changes to exitmap itself where it makes sense, so we don't start creating duplicated infrastructure. Ideally, there would be only one dnsresolution module and not a myriad of different ones.

Happy to adjust our approach based on your guidance.

I tried to provide some; let me know if you had something else in mind or I forgot to address anything.

Thanks,
Georg

_______________________________________________
network-health mailing list -- [email protected]
To unsubscribe send an email to [email protected]
