>>  I'm a little nervous about encouraging wide use of OCR. You may recall at 
>> least one CA was bitten by an issue in which their OCR system misidentified 
>> letters - https://bugzilla.mozilla.org/show_bug.cgi?id=1311713

>> That's why I was keen to suggest technical solutions which would verify and 
>> cross-check. My main concern here would be, admittedly, to ensure the 
>> serialNumber itself is reliably entered and detected. Extracting that from a 
>> system, such as you could do via an Extension when looking at, say, the 
>> Handelsregister, is a possible path to reduce both human transcription and 
>> machine-aided transcription issues.

Right – and the OCR there is just to make the initial assessment. The idea is 
to still require validation staff to select the appropriate fields. I like the 
idea of cross-checking. Maybe what we can also do is tie into a non-primary 
source (like D&B or something) to confirm the jurisdiction information. We'll 
have to evaluate it, but I like the idea of cross-checking against a reliable 
source that has an API, even if we can't use that source as our primary source 
for that information. I'll need to investigate, but it should be possible for 
most of the EU and the US. Less so for the Middle East and Asia.
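
Something like this is what I have in mind for the cross-check – a rough 
Python sketch where secondary_lookup stands in for whatever API-backed source 
(D&B or similar) we end up using. The function and field names are 
illustrative, not anything we've built:

    # Hypothetical sketch: cross-check jurisdiction data from the primary
    # registry against an independent secondary source before accepting it.
    # secondary_lookup stands in for any API-backed source; field names
    # are illustrative.
    def cross_check_jurisdiction(primary: dict, secondary_lookup) -> list:
        """Return the fields where the two sources disagree."""
        secondary = secondary_lookup(primary["registration_number"])
        mismatches = []
        for field in ("jurisdiction_country", "jurisdiction_state",
                      "entity_name"):
            if primary.get(field) != secondary.get(field):
                mismatches.append(field)
        return mismatches  # empty means the sources agree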

>> Of course, alternative ways of cross-checking and vetting that data may 
>> exist. Alternatively, it may be that the solution would be to only allowlist 
>> the use of validation sources that made their datasets machine readable - 
>> this would/could address a host of issues in terms of quality. I'm 
>> admittedly not sure the extent to which organizations still rely on legacy 
>> paper trails, and I understand they're still unfortunately common in some 
>> jurisdictions, particularly in the Asia/Pacific region, so it may not be as 
>> viable.

Yeah – that means you basically can't issue in the Middle East and most of 
Asia. Japan would still work. China I'd have to look at. Like I said, there 
could be non-primary sources that could correlate. We'll spec that out as we 
get closer and see what we can do for cross-correlation. It may be that there 
are enough sources worldwide that you can always confirm registration with a 
secondary source.

The process right now is we write a script based on things we can think of that 
might be wrong (abbreviated states, the word "some" in the state field, etc.). 
We usually pull a sampling of a couple thousand certs and review those to see 
if we can find anything wrong that can help identify other patterns. We're in 
the middle of doing that for the JOI issues. What would be WAY better is if we 
had rule sets for validation information (similar to cablint) that checked 
that information and how it is stored in our system, and ran against the 
complete data every time we change something in validation. Right now, we 
build quick and dirty checks that run one time when we have an incident. 
That's not great, as it's a lot of stuff we can't reuse. What we should do is 
build something (that, fingers crossed, we can open source and share) that 
will be a library of checks on validation information. Sure, it'll take a lot 
of configuration to work with how other CAs store data, but one thing we've 
seen problems with is that changes in one system lead to unexpected potential 
non-compliances in others. Having something that works cross-functionally 
throughout the system helps.
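
To make that concrete, here's a minimal sketch of what such a rule library 
could look like – the field names and the individual checks are illustrative, 
not how our system actually stores data:

    # Sketch of a cablint-style rule library for validation data. Rules
    # are small predicate functions registered centrally, so the whole
    # set can be rerun over every record whenever anything changes.
    RULES = []

    def rule(func):
        """Register a check; a check returns an error string or None."""
        RULES.append(func)
        return func

    @rule
    def no_placeholder_state(record):
        if record.get("state", "").strip().lower() in {
                "some", "some-state", "any", "none"}:
            return "placeholder value in state field"

    @rule
    def state_not_abbreviated(record):
        state = record.get("state", "")
        if state.isupper() and len(state) == 2:
            return "state appears to be abbreviated"

    def run_all(records):
        """Run every registered rule over every record; yield failures."""
        for record in records:
            for check in RULES:
                error = check(record)
                if error:
                    yield record.get("id"), check.__name__, error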


  *   Hugely, and this is exactly the kind of stuff I'm excited to see CAs 
discussing and potentially sharing. I think there are some opportunities for 
incremental improvements here that may be worth looking at, even before that 
final stage.


  *   I would argue a source of (some of) these problems is ambiguity that is 
left to the CA's discretion. For example, is the state abbreviated or not? Is 
the jurisdictional information clear?  Who are the authorized registries for a 
jurisdiction that a CA can use?

I think that's definitely true. There are lots of ambiguities in the EV 
guidelines. You and I were talking about Incorporating Agencies, a term the 
guidelines never really define. Note that CAs can use Incorporating Agencies 
or Registration Agencies to confirm identity, which is very broad, but there 
is no indication in the certificate what that means.

> I can think of some incremental steps here:
> - Disclosing exact detailed procedures via CP/CPS

Maybe an addendum to the CPS. Or an RPS. I'll experiment and post something to 
see what the community thinks.

>  - An emphasis should be on allowlisting. Anything not on the allowlist 
> *should* be an exceptional thing.

This we actually have internally. Or are you saying across the industry? The 
internal allowlist is something pre-vetted by compliance and legal. We're 
currently (prompted by a certificate problem report) reviewing the entire 
allowlist to see what's there and taking off anything I don't like. Basically, 
we're using your suggestion of 
https://www.gleif.org/en/about-lei/code-lists/gleif-registration-authorities-list 
plus a couple of lists for banking (like the FDIC).
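
For illustration, checking a source against that list could be as simple as 
the sketch below, assuming the GLEIF list has been downloaded as CSV – the 
column name is my best reading of the published file and worth double-checking:

    import csv

    # Load the GLEIF Registration Authorities List into an allowlist,
    # then refuse any registry that isn't on it. extra_codes covers the
    # additional banking lists (FDIC, etc.).
    def load_allowlist(path, extra_codes=()):
        allowed = set(extra_codes)
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                allowed.add(row["Registration Authority Code"])  # e.g. "RA000602"
        return allowed

    def is_permitted_registry(ra_code, allowlist):
        return ra_code in allowlist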

 > - For example, stating DigiCert will always use a State from ISO 3166-2 
 > makes it clear, and also makes it something verifiable (i.e. someone can 
 > implement an automated check)

Maybe what we'll do is keep a running list of the checks. We're finalizing on 
spelling out all states – no abbreviations. This is something we can specify 
in our RPS: how it looks for each field.
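
For example, a spelled-out-state check can lean on the ISO 3166-2 data 
directly. A rough sketch using the third-party pycountry package:

    import pycountry  # third-party wrapper around the ISO 3166 databases

    # Verify a state value is a fully spelled-out ISO 3166-2 subdivision
    # name for the given country (so abbreviations fail).
    def is_spelled_out_state(state_value, country_alpha2):
        names = {s.name.casefold() for s in pycountry.subdivisions
                 if s.country_code == country_alpha2}
        return state_value.strip().casefold() in names

    # is_spelled_out_state("California", "US")  -> True
    # is_spelled_out_state("CA", "US")          -> False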

 > - Similarly, enumerating the registries used makes it possible, in many 
 > cases, to automatically check the serialNumber for both format and accuracy

Checking the registration number for format and accuracy is something I 
proposed for the new project, but I wasn't sure how feasible it was 
considering the wide variation. You end up with a lot of different numbers. I 
wonder if you could narrow each registry down to a range of formats? That 
would certainly be doable while adding some layers of protection.
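
As a sketch, the format side could be a per-registry pattern table like the 
one below – the registry IDs and patterns here are illustrative examples, not 
a vetted list:

    import re

    # Per-registry format patterns for registration numbers. Unknown
    # registries fail closed.
    SERIAL_FORMATS = {
        "DE-HANDELSREGISTER-B": re.compile(r"HRB ?\d{1,6}"),
        "UK-COMPANIES-HOUSE": re.compile(r"\d{8}|[A-Z]{2}\d{6}"),
        "US-DE-SOS": re.compile(r"\d{7}"),
    }

    def serial_matches_registry(registry_id, serial):
        pattern = SERIAL_FORMATS.get(registry_id)
        if pattern is None:
            return False  # registry not in the table: reject
        return bool(pattern.fullmatch(serial.strip()))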

>- Modifying the CA/B Forum documents to formalize those processes, by 
>explicitly removing the ambiguity or CA discretion. DigiCert's done well here 
>in the past, removing validation methods like 3.2.2.4.1 / 3.2.2.4.5 due to 
>their misuse and danger

One ballot I do want to pass is adding a field for the JOI entity information. 
That way everyone can see where the registration number originated. Short of a 
formalized CA/B Forum list of permitted entities (which is also on the table), 
this would make it very easy to have a conversation about what the 
registration number means. There are probably others, but that's a request 
that's surfaced a few times.

> - Writing automated tooling to vet/validate

This is where we are going for sure.



  *   The nice part is that by formalizing the rules, you can benefit a lot 
from improved checking that the community may develop, and if it doesn't 
materialize, contribute your own to the benefit of the community.




A better example is Some-State. We scanned for values not listed as states and 
for cities containing "some", "any", "none", etc. That only finds a limited 
set of the problem, and obviously missed the JOI information (not part of the 
same data set). Going forward, I want a rule set that asks: is this a state? 
If so, check this source to see if it's a real state. Then check to see if it 
also exists in the country specified. Then check whether the locality 
specified exists in the state. Then see if there is a red flag from a map that 
says the org doesn't exist. (The map check is coming – not there yet….) 
Instead of finding small one-off problems people report, find them on a global 
scale with a rule we run every time something in the CA/B Forum requirements, 
Mozilla policy, or our own system changes.
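
In rough Python, that chain would look something like this – 
subdivision_names and localities_in stand in for whatever geographic reference 
data (ISO 3166-2, GeoNames, etc.) we end up wiring in:

    # Chained geographic checks, in order: real state -> state belongs
    # to the specified country -> locality exists in that state. The
    # map-based "does the org exist here" check would slot in after.
    def check_geography(record, subdivision_names, localities_in):
        state = record.get("state")
        country = record.get("country")
        errors = []
        if state:
            if state not in subdivision_names(country):
                errors.append("state is not a real subdivision of country")
            elif record.get("locality") and \
                    record["locality"] not in localities_in(country, state):
                errors.append("locality not found in the specified state")
        return errors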

>> Yes, this is the expectation of all CAs.

>> As I understand it, following CAs' remediation of Some-State, etc, this is 
>> exactly what members of the community went and did. This is not surprising, 
>> since one of the clearly identified best practices from that discussion was 
>> to look at ISO 3166-1/ISO 3166-2 for such information inconsistency. 
>> SecureTrust, one of the illustrative good reports, did exactly that, and 
>> that's why it's such a perfect example. It's unfortunate that a number of 
>> other CAs didn't, which is why on the incident reports, I've continued to 
>> push them in terms of their evaluation and disclosure.

>> This is the exact goal of Incident Reports: identifying not just the 
>> incidents, but the systemic issues, devising solutions that can work, and 
>> making sure to holistically remediate the problem.

Right, and we did this for the location data on our Some-State issues (on all 
the data). But that was a one-time scan, and we reported the results to 
compliance for review. It was a little script we wrote. What I want the system 
to do is scan for this particular issue every time the validation system 
changes, make sure nothing contradicts the rule, and invalidate all 
validations that break it.
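
Something along these lines, where load_all_validations and mark_invalid are 
placeholders for our own data layer, and the whole thing hooks into every 
deploy or policy change:

    # Rerun the full rule set whenever the validation system (or a
    # policy input) changes, and invalidate any validation that fails.
    def revalidate_on_change(rules, load_all_validations, mark_invalid):
        failures = 0
        for record in load_all_validations():
            broken = [r.__name__ for r in rules if r(record)]
            if broken:
                mark_invalid(record["id"], reasons=broken)
                failures += 1
        return failures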
