> There sure are some bizarre things in CNs... is there any chance you could
> implement what Marcus Ranum calls "artificial stupidity" in your anomaly-
> detection, create a filter that accepts all standard DN component styles and
> then kick out certs that don't pass the filter?
that's essentially what I did yesterday tho I refined the regex a bit this
morning. Here's the (perl) regex..
/[Cc][Nn]=(?!([\-\w\*]+\.)+[\-\w]+)/
..which appears to also work fine in NEdit. What it does (my intent anyway, I
am not an awesome regex master) is recognize any CN-ID pattern that is /not/
syntactically a dot-separated LDH (letter digit hyphen) DNS domain name.
Employed thus..
while (<>) {
    if ( /[Cc][Nn]=(?!([\-\w\*]+\.)+[\-\w]+)/ ) {
        print "\n"; print;
    };
};
I think it does what you suggest below. What I fed into the above loop is a txt
file output by querying the raw database table for all contacted domains and
returned cert subject values (one pair per line, would of course work on a file
with just subjects in it).
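
To make the behavior of the pattern concrete, here is a minimal sketch that runs the same regex over a handful of subjects. The sample subjects and the helper name has_odd_cn are made up for illustration; they are not from the survey data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as above: a "CN=" that is NOT followed by dot-separated
# labels built from word characters, hyphens, or '*'.
my $not_dns_cn = qr/[Cc][Nn]=(?!([\-\w\*]+\.)+[\-\w]+)/;

# Hypothetical helper: returns 1 when the subject holds at least one CN
# value that does not start with something shaped like a DNS name.
sub has_odd_cn {
    my ($subject) = @_;
    return $subject =~ $not_dns_cn ? 1 : 0;
}

# Invented sample subjects, for illustration only.
my @samples = (
    'C=US, O=Example Inc, CN=www.example.com',
    'C=US, O=Example Inc, CN=*.example.com',
    'C=US, O=Example Inc, CN=localhost',
    'C=US, O=Example Inc, CN=Example Web Server',
    'CN=www.example.com, CN=Example Web Server',
);

for my $s (@samples) {
    printf "%-5s %s\n", (has_odd_cn($s) ? 'FLAG' : 'ok'), $s;
}
```

The first two samples pass, and the last three are flagged. The last one is flagged even though it also carries a well-formed CN, because a single odd CN is enough to match. Two caveats follow from how the pattern is written: \w also matches underscore, so the accepted set is a bit wider than strict LDH, and the lookahead is not anchored at the end of the value, so something like "CN=foo.bar Web Server" is not flagged.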
> In other words instead of
> trying to create a regex to detect all the bizarre things that turn up in
> there, create one to pass normal DNs and treat everything that doesn't pass as
> an anomaly?
yes, done.
So there appear to be (actually) 433 such subject DNs. I'd mis-counted
yesterday (haste makes waste); 622 is an incorrect count AFAICT today.
Note that these subject DNs may /also/ contain syntactically correct CN-ID
values (many do, some do not).
> It'd be interesting to see what sort of stuff is floating around
> out there...
of course. Note that the data is nominally available upon request as Ivan has
blogged, which I posted about earlier here..
[certid] fyi: Ivan Ristic / Qualsys SSL Labs release raw data from the Internet
SSL survey
http://www.ietf.org/mail-archive/web/certid/current/msg00484.html
There's a shrink-wrap EULA that Ivan wants folks to agree to before obtaining
the data and I haven't groveled thru it enough to tell whether it's legit to
just post all 433 results (or any other full result set) to a public list &
archive such as this.
Also note that he/Qualys may re-run the survey periodically. This is also the
intention of the EFF folks and their "TLS/SSL observatory"
<https://www.eff.org/observatory> (tho they have yet to publicly release their
data). (Note also that their survey data collection methodologies differ: EFF
folks say they "NMAPed Internet for hosts listening on tcp 443", whereas Ivan
queried publicly registered domain names in a (large) subset of all TLDs.)
HTH,
=JeffH