Tor at 1AEO:
Appreciate the support and quick response!

You are welcome!

"You mean true DNS resolution failures per scan or general ones (including 
DNS)?"
Both. Over the last few days, per scan (~3,000 exit relays in the consensus), 
we’ve observed:

~55 exit relays with true DNS resolution failures, where circuits established 
successfully but DNS resolution failed with the exit.

~415 exit relays with circuit or infrastructure failures, where we were unable 
to reach the relay.

Breaking that down further:

DNS failures (~55 total):
~33 NXDOMAIN responses (SOCKS4: domain not found)
~21 DNS query timeouts (45 seconds, 3 attempts)
~1 case where a wrong IP was returned

Circuit / infrastructure failures (~415 total):
~328 cases where the relay channel closed unexpectedly
~59 circuit construction timeouts
~29 circuits closed or destroyed

The remaining ~2,570 exit relays (~98%) successfully resolved a unique wildcard 
DNS query per timestamp and relay fingerprint.
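
As a quick cross-check of the figures above, here is a minimal arithmetic sketch. It uses only the rounded per-scan counts quoted in this message; the variable names are illustrative. Under these numbers, the ~98% figure appears to correspond to the success rate among relays whose circuits were established, not to the share of all scanned relays.

```python
# Approximate per-scan counts quoted above (all are rounded "~" figures).
dns_failures = {"nxdomain": 33, "timeout": 21, "wrong_ip": 1}
circuit_failures = {
    "channel_closed": 328,
    "circuit_timeout": 59,
    "circuit_closed_or_destroyed": 29,
}
successes = 2570

dns_total = sum(dns_failures.values())          # true DNS failures
circuit_total = sum(circuit_failures.values())  # relay could not be reached
scanned = successes + dns_total + circuit_total

# Relays where a circuit was established, so DNS was actually exercised.
reached = successes + dns_total

rate_among_reached = successes / reached
rate_among_all = successes / scanned

print(f"DNS failures:      {dns_total}")
print(f"Circuit failures:  {circuit_total}")
print(f"Success (reached): {rate_among_reached:.1%}")
print(f"Success (all):     {rate_among_all:.1%}")
```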

Those results are interesting, thanks. The true DNS failures are pretty high compared to what we have been getting over the years when testing whether example.com and torproject.org are resolvable. Anything between 5 and 10 issues per week seems not unreasonable according to the data we have, but 55 is clearly an outlier worthy of some explanation, in particular as we got just 5 relays with issues last week during our weekly scan.

Can you share the fingerprints of the relays you found so I can have a closer look and check whether you might have a bunch of false positives in your results? Are the results for those relays stable if you scan over the course of a couple of days or are they fluctuating?
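
One way to answer the stability question is to compare each relay's outcome across several scans. The sketch below is only an illustration under assumptions: the input format (one dict per scan, mapping fingerprint to a boolean) and the three category names are hypothetical and are not taken from exitmap or the dnshealth module.

```python
from collections import defaultdict


def classify_stability(scans):
    """Group relays by how consistent their DNS results are across scans.

    scans: list of dicts mapping relay fingerprint -> True if DNS
           resolution succeeded in that scan, False if it failed.
           (Hypothetical format, chosen for this sketch.)
    """
    history = defaultdict(list)
    for scan in scans:
        for fingerprint, resolved in scan.items():
            history[fingerprint].append(resolved)

    report = {}
    for fingerprint, outcomes in history.items():
        if all(outcomes):
            report[fingerprint] = "healthy"
        elif not any(outcomes):
            # Failed in every scan it appeared in: likely a real problem.
            report[fingerprint] = "stable_failure"
        else:
            # Mixed results: possible false positive or transient issue.
            report[fingerprint] = "fluctuating"
    return report
```

Relays in the "fluctuating" group would be the first candidates to check for false positives; relays in "stable_failure" are the ones most worth reporting to operators.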


"It depends on the regularity. What did you have in mind in that regard?"

Our current plan is to run scans every few hours to provide faster feedback 
when an operator is actively troubleshooting an issue, given that this is a 
single DNS query per relay. We’ll refine cadence as we get feedback and learn 
more.

Hrm. exitmap can run modules for particular relays (provided on the command line either per fingerprint or file). So, when an operator is troubleshooting an issue it makes more sense to me to run something just with their fingerprints than to scan the network over and over again just so the operator can see whether something is working again for them. Or maybe I missed the point as to why you need to scan the whole network every couple of hours for that use case?

I think running the scan for the whole network at most once a day and then zooming closer into relay groups with issues is a strictly better deployment plan.
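
The "scan once, then zoom in" plan could be sketched roughly as follows: take the results of the daily full scan, pick out the relays that did not succeed, and write their fingerprints one per line so that a follow-up run can be limited to them. The result format and the status strings are assumptions made for this sketch; how the resulting file is passed to exitmap is left to its command-line help.

```python
def failing_fingerprints(results):
    """Return the sorted fingerprints of relays that did not succeed.

    results: dict mapping relay fingerprint -> status string, where
             "success" marks a good result (hypothetical format).
    """
    return sorted(
        fingerprint
        for fingerprint, status in results.items()
        if status != "success"
    )


def render_fingerprint_file(fingerprints):
    """Render fingerprints one per line, ready to be written to a file."""
    return "".join(f"{fingerprint}\n" for fingerprint in fingerprints)


if __name__ == "__main__":
    # Placeholder fingerprints, not real relays.
    daily_scan = {
        "A" * 40: "success",
        "B" * 40: "timeout",
        "C" * 40: "nxdomain",
    }
    print(render_fingerprint_file(failing_fingerprints(daily_scan)), end="")
```

With this shape, the full-network scan stays at once a day, and only the short list of problem relays is rescanned more often.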


"I don't think there are any publishing concerns, no. Where are the results supposed 
to show up?"

Results will be published on a small public site with a simple JSON API for 
programmatic access (tentatively https://exitdnshealth.1aeo.com/), and 
integrated into operator-facing views (per family / AROI and per relay) on 
https://metrics.1aeo.com/. Happy to adjust presentation if there are 
preferences.

Sounds good to me at least.


"As for the methodology it would be very much appreciated if you could upstream your 
changes to exitmap itself…"

Absolutely. We strongly prefer not to maintain a long-term fork and are very 
open to submitting pull requests.

What’s the preferred workflow from your perspective — start with a PR directly, 
or discuss scope and structure first? For reference, current code lives at:
https://github.com/1aeo/exitmap (working on a branch to make the commits easier 
to follow for sending upstream)
https://github.com/1aeo/exitmap-dns-health-deploy

I think filing a ticket at https://gitlab.torproject.org/tpo/network-health/exitmap is a good start. I am happy to file child tickets for different parts of the work if needed (e.g. the structured output of the results), so no worries about that. If it's easiest for you to have one big MR referencing the parent ticket then just raise that one and we can have all the technical discussion there. If you want to scope the potential MR in a ticket discussion first then I am happy to do that as well. Or if there is yet another plan even more appealing to you, go for it. Up to you. :)


New question, prompted by seeing DNSSEC support in exitmap: is DNSSEC expected
to be enabled on exit relays? We're hearing mixed viewpoints from operators and
are debating whether it should be added to this exit relay DNS health effort;
we're leaning towards yes.

No, it's not expected at this point. Sometimes we have modules to only gather information about what the network looks like at a particular point in time, without getting to an unhealthy/healthy distinction. Thus, I'd say at least at this point, don't worry about the DNSSEC status either in exitmap scanning efforts or actual exit relay support.

Thanks,
Georg


On Monday, January 19th, 2026 at 3:14 AM, Georg Koppen via network-health 
<[email protected]> wrote:





Hi!


Tor at 1AEO via network-health:


Hi Network Health Team,


A few large-scale exit relay operators have asked for better visibility into
DNS health across their relays. We've built an exitmap module, dnshealth, to
address this and want your input before we start running scans and publishing
results.




Very nice work, thanks!


What It Does


- Generates unique DNS queries per relay (wildcard subdomain → expected IP) to 
avoid caches
- Classifies failures: timeout, NXDOMAIN, wrong IP, SOCKS errors
- Outputs structured JSON with latency and error details
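
The three steps in the list above could look roughly like the sketch below. It is not the dnshealth module's code: the zone name, the expected IP (a documentation address), the label layout, the status strings, and the JSON field names are all hypothetical, chosen only to illustrate the idea of a cache-busting query per relay plus failure classification.

```python
import json
import time

# Hypothetical values: a wildcard zone where every name resolves to EXPECTED_IP.
BASE_DOMAIN = "dnshealth.example.net"
EXPECTED_IP = "192.0.2.1"  # TEST-NET-1 documentation address


def unique_hostname(fingerprint, timestamp=None):
    """Build a name no resolver can have cached: timestamp + relay fingerprint."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{ts}.{fingerprint.lower()}.{BASE_DOMAIN}"


def classify(resolved_ip=None, error=None):
    """Map a lookup outcome to one of the failure classes in the list above."""
    if error == "timeout":
        return "timeout"
    if error == "nxdomain":
        return "nxdomain"
    if error is not None:
        return "socks_error"  # any other error reported over SOCKS
    if resolved_ip != EXPECTED_IP:
        return "wrong_ip"
    return "success"


def to_record(fingerprint, hostname, status, latency_ms):
    """Serialize one relay's result as a JSON line."""
    return json.dumps(
        {
            "fingerprint": fingerprint,
            "query": hostname,
            "status": status,
            "latency_ms": latency_ms,
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    fpr = "ABCD" * 10  # placeholder fingerprint, not a real relay
    name = unique_hostname(fpr)
    print(to_record(fpr, name, classify(resolved_ip=EXPECTED_IP), 120))
```

Putting the timestamp and fingerprint in separate labels keeps each label well under the 63-byte DNS label limit while still making every query unique per relay and per scan.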


All code is open source: https://github.com/1aeo/exitmap


Initial testing: ~98% success rate across ~3k exits, 50-90 true failures per
scan, 4-8 min runtime.




You mean true DNS resolution failures per scan or general ones
(including DNS)?


Before We Proceed


1. Any concerns with us running regular scans and publishing results?




It depends on the regularity. What did you have in mind in that regard?
We are currently running the dnsresolution module on a weekly basis and
informing relay operators in case of trouble and that has been
sufficient frequency-wise I think.


I don't think there are any publishing concerns, no. Where are the
results supposed to show up?


2. Recommendations on scan frequency or methodology?




Yes. I think weekly should be fine at least for a start. As for the
methodology it would be very much appreciated if you could upstream your
changes to exitmap itself where it makes sense so we don't start
creating duplicated infrastructure. Ideally, there would be only one
dnsresolution module and not a myriad of different ones.


Happy to adjust our approach based on your guidance.




I tried to provide some, let me know if you had something else in mind
or I forgot to address anything.


Thanks,
Georg


_______________________________________________
network-health mailing list -- [email protected]
To unsubscribe send an email to [email protected]





Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature
