*   Could you highlight a bit more your proposal here? My understanding is 
that, despite the Handelsregister ("Commercial Register") being available at a 
country level, it's further subdivided by county or region - e.g. the 
Amtsgericht Herne ("Local Court Herne").


  *   It sounds like you're still preparing to allow for manual/human input, 
and simply consistency-checking it. Is there a reason not to use an 
allowlist-based approach, in which your Registration Agents may only select 
from an approved list of County/Region/Locality managed by your Compliance Team?


  *   That, of course, still allows for human error. Using the excellent 
example of the Handelsregister, perhaps you could describe a bit more the flow 
a Validation Specialist would go through. Are they expected to examine a faxed 
hardcopy? Or do they go to handelsregister.de and 
look up via the registration code?



  *   I ask, because it strikes me that this could be an example where a CA 
could further improve automation. For example, it's not difficult to imagine 
that a locally-developed extension could know the webpages used for validation 
of the information, and extract the salient info, when that information is not 
easily encoded in a URL. For those not familiar, Handelsregister encodes the 
parameters via form POST, a fairly common approach for these company registers, 
and thus makes it difficult to store a canonical resource URL for, say, a 
server-to-server retrieval. This would help you quickly and systematically 
identify the relevant jurisdiction and court, and in a way that doesn't involve 
human error.

I did not know that about Handelsregister. So that’s good info.  Right now, the 
validation staff selects Handelsregister as the source, the system retrieves 
the information, the staff then selects the jurisdiction information and enters 
the registration information. Germany is locked in as the country of 
verification (because Handelsregister is the source), but the staff enters the 
locality/state type information as the system doesn’t know which region is 
correct.

The idea is that everywhere we can, the process should automatically fill in 
jurisdiction information for the validation staff so no typing is required. 
This is being done in three parts:

  1.  Immediate (aka Stop the Hurt): The first step is to put the GeoCode check 
in place to ensure that, no matter what, there will be valid, non-misspelled 
information in the certificate. There will still be user-typed information 
during this phase, since this phase lands Aug 18, 2019. The system will work 
exactly as it does now, except that the JOI information will run through the 
GeoCode system to verify that yes, this information isn't wrong. If wrong, the 
system won't allow the cert to be approved. (A rough sketch of the idea follows 
this list.) At this point, no new issues should occur, but I won't be satisfied 
as it's way too manual - and the registration number is still a manual entry. 
That needs to change.
  2.  Intermediate (aka Neuter the Staff): During this phase we plan to 
eliminate typing of sources. Instead, the sources will be picklists based on 
jurisdiction. This means that if you select Germany and the company type is an 
LLC, you get a list of available sources. Foolproof-ish. There's still a 
copy/paste or manual entry of the registration number. For those sources that 
do provide an API, we can tie into the API, retrieve the documentation, and 
populate the information. We want to do that as well, provided it doesn't 
throw off phase 3. Since the intermediate solution is also a stop-gap to the 
final solution, we want it to be a substantial improvement but one that doesn't 
impede our final destination.
  3.  The refactor (aka Documents r Us): This is still very much being specc'ed, 
but we're currently thinking we want to evolve the system to a document system. 
Right now the system works on checklists. For JOI, you enter the JOI part, 
select a document (or two) that you'll use to verify JOI, and then transfer 
information to the system from the document. The revamp moves it to where you 
have the document and specify on the document which parts of the document apply 
to the organization. For example, you specify on the document that a number is 
a registration number or that a name is an org name, highlighting the info. 
With auto-detection of the fields (just based on key words), you end up with a 
pretty dang automated system. The validation staff is there to review for 
accuracy and highlight things that might be missed. Hence, no typing or 
specifying any information. It's all directly from the source.
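
To make the phase-1 GeoCode idea concrete, here's a rough sketch (Python, 
purely illustrative) of the kind of consistency check described above. The 
GEO_DATA table, field names, and error type are invented for the example and 
are not our actual schema:

from typing import Dict, Optional, Set

# Hypothetical canonical data: country -> state/region -> set of localities.
GEO_DATA: Dict[str, Dict[str, Set[str]]] = {
    "DE": {
        "Nordrhein-Westfalen": {"Herne", "Bochum", "Dortmund"},
        "Bayern": {"München", "Nürnberg"},
    },
}

class JoiValidationError(Exception):
    """Raised when the typed JOI fields don't match the geo data set."""

def check_joi(country: str, state: str, locality: Optional[str] = None) -> None:
    """Block approval if any JOI component is misspelled or inconsistent."""
    states = GEO_DATA.get(country)
    if states is None:
        raise JoiValidationError(f"Unknown country code: {country!r}")
    localities = states.get(state)
    if localities is None:
        raise JoiValidationError(f"{state!r} is not a state/region of {country}")
    if locality is not None and locality not in localities:
        raise JoiValidationError(f"{locality!r} not found in {state}, {country}")

# e.g. check_joi("DE", "Nordrhein-Westfalen", "Herne") passes, while
# check_joi("DE", "Nordrhein-Westphalia", "Herne") would block approval.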

The naming conventions also aren't approved yet. Since the engineers watch this 
forum, they'll probably throw things at me when they see the code names.


  *   I'm curious how well that approach generalizes, and/or what challenges 
may exist. I totally understand that for registries which solely use hard 
copies, this is a far more difficult task than it needs to be, and thus an 
element of human review. However, depending on how prevalent the hardcopy vs 
online copy is, we might be able to pursue automation for more, and thus 
increase the stringency for the exceptions that do involve physical copies.

Right now we get the hard copies and turn them into a PDF to store in the audit 
system for review during internal and external audits. During validation, all 
documentation must be present and reviewed. By using OCR better, we can at 
least always copy and paste information instead of typing it.

The more interesting part (in my opinion) is how to find and address these 
certs. Right now, every time we have an issue or whenever a guideline changes, 
we write a lot of code, pull a lot of certs, and spend a lot of time reviewing. 
Instead of doing this every time, we're going to develop a tool that will run 
automatically every time we change a validation rule to find everything else 
that will fail under the updated rules. In essence, we're building unit tests 
on the data. What I like about this approach is that it ends up building a 
system that lets us see how all the rule changes interplay, since sometimes 
they may interact in weird ways. It'll also make it easier to measure the 
impact of changes on the system. Anyway, I like the idea. Thought I'd share it 
here to get feedback and suggestions for improvement. Still in spec phase, but 
I can share more info as it gets developed.
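
As a sketch of what "unit tests on the data" could look like: each incident or 
guideline change becomes a registered check, and the whole registry is re-run 
over the complete validation data set whenever anything changes. The Rule 
shape, the registry, and the record format below are assumptions made for 
illustration, not our implementation:

from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Rule:
    name: str                      # e.g. "joi-state-not-placeholder"
    applies_to: str                # which record type the check inspects
    check: Callable[[dict], bool]  # returns True if the record passes

RULES: List[Rule] = []

def register(rule: Rule) -> None:
    """Every incident or guideline change adds a rule here, permanently."""
    RULES.append(rule)

def run_all(records: Iterable[dict]) -> List[Tuple[str, object]]:
    """Re-run every registered rule over every record; collect all failures."""
    failures = []
    for record in records:
        for rule in RULES:
            if record.get("type") == rule.applies_to and not rule.check(record):
                failures.append((rule.name, record.get("id")))
    return failures

# Example: the some-state problem, captured once and kept forever.
register(Rule(
    name="joi-state-not-placeholder",
    applies_to="joi",
    check=lambda r: r.get("state", "").lower() not in {"some", "any", "none"},
))

print(run_all([{"type": "joi", "id": 42, "state": "Some"}]))
# -> [('joi-state-not-placeholder', 42)]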


  *   This sounds like a great idea, and would love to know more details here.  
For example, what's the process now for identifying these 
jurisdictionOfIncorporation issues? How would it improve or change with this 
system?


The process right now is that we write a script based on things we can think of 
that might be wrong (abbreviated states, the word "some" in the state field, 
etc.). We usually pull a sampling of a couple thousand certs and review those 
to see if we can find anything wrong that can help identify other patterns. 
We're in the middle of doing that for the JOI issues. What would be WAY better 
is if we had rule sets for validation information (similar to cablint) that 
checked validation information and how it is stored in our system, and ran 
those rule sets on the complete data set every time we change something in 
validation. Right now, we build quick and dirty checks that run one time when 
we have an incident. That's not great, as it's a lot of stuff we can't reuse. 
What we should do is build something (that, fingers crossed, we can open source 
and share) that will be a library of checks on validation information. Sure, 
it'll take a lot of configuration to work with how other CAs store data, but 
one thing we've seen problems with is that changes in one system lead to 
unexpected potential non-compliances in others. Having something that works 
cross-functionally throughout the system helps.

A better example is the some-state issue. We scanned for values not listed as 
states and cities that have "some", "any", "none", etc. That only finds a 
limited set of the problem, and obviously missed the JOI information (not part 
of the same data set). Going forward, I want a rule set that says: is this a 
state? If so, then check this source to see if it's a real state. Then check to 
see if it also exists in the country specified. Then check to see if the 
locality specified exists in the state. Then see if there is a red flag from a 
map that says the org doesn't exist. (The map check is coming - not there 
yet....) Instead of finding small one-off problems people report, find them on 
a global scale with a rule we run every time something in the CAB Forum, 
Mozilla policy, or our own system changes.
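
To illustrate the cascade, here's a rough sketch of what one global pass over 
existing records might look like. The record field names, the tiny 
state/locality table, and map_red_flag are all placeholders; real sources (ISO 
3166-2 data, the GeoCode data set, a mapping service) would back them:

from typing import Dict, Iterable, List, Set, Tuple

# Tiny stand-in for real state/locality sources.
STATES_BY_COUNTRY: Dict[str, Dict[str, Set[str]]] = {
    "US": {"Utah": {"Lehi", "Provo"}, "New York": {"New York", "Albany"}},
}

def map_red_flag(org_name: str, locality: str) -> bool:
    # Placeholder for the planned map lookup ("not there yet").
    return False

def scan(records: Iterable[dict]) -> List[Tuple[object, str]]:
    """Run the cascade over every record and collect problems globally."""
    problems = []
    for rec in records:
        states = STATES_BY_COUNTRY.get(rec["joi_country"], {})
        if rec["joi_state"] not in states:
            problems.append((rec["id"], "state not valid for the country"))
        elif rec["joi_locality"] not in states[rec["joi_state"]]:
            problems.append((rec["id"], "locality not in the state"))
        elif map_red_flag(rec["org_name"], rec["joi_locality"]):
            problems.append((rec["id"], "map lookup red-flags the org"))
    return problems

print(scan([{"id": 1, "joi_country": "US", "joi_state": "Some-State",
             "joi_locality": "Lehi", "org_name": "Example Org"}]))
# -> [(1, 'state not valid for the country')]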



  *   You describe it as "validation rule" changes - and I'm not sure if you're 
talking about the BRs (i.e. "we validated this org at time X") or something 
else. I'm not sure whether you're adding additional data, or formalizing checks 
on existing data. More details here could definitely help try and generalize 
it, and it might be possible to formalize it as a best practice. Alternatively, 
even if we can't formalize it as a requirement, it may be usable as the basis 
when evaluating potential impact or cost of changes (to policy or the BRs) in 
the future. That is, "any CA that has implemented (system you describe) should 
be able to provide quantifiable data about the impact of (proposed change X). 
If CAs cannot do so (because they did not implement the change), their feedback 
and concerns will not be considered."

Validation rule meaning our own system, the CAB Forum, Mozilla policy - 
basically, anything that could call into question the integrity of some data 
piece within our system. The point is to catch all changes that may happen 
proactively, not just when someone pings me with a problem. The requirement I 
think we're trying to meet is "never have the same problem again, even if a 
rule changes", because the system will take that one problem, log it as a unit 
test, and run that unit test every time we change the internal rule set to 
detect all data that violates the rule as modified. Illustrative example: 
assume we decide we want all states abbreviated. Note this would contradict the 
rule in the EV Guidelines that requires JOI states to be written out. Right 
now, this contradiction could pass undetected by a lot of CA systems, I think. 
However, if you have a rule set that can be enforced globally across the entire 
data set, you end up instantly detecting that no valid EV cert could ever 
issue. Danger! Anyway, the value of this is pretty huge internally, IMO. And 
for compliance, it'll make our job easier. No more 3% audits trying to catch 
mistakes.
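
A sketch of how that contradiction would surface under this approach, with both 
rules registered and run over the data set. The rule names and sample records 
are invented, and ev_written_out_rule is a deliberate simplification of the 
actual EV requirement:

US_STATE_ABBREVS = {"UT", "NY", "CA"}

def internal_abbrev_rule(record: dict) -> bool:
    # Hypothetical new internal rule: state values must be abbreviated.
    return record["joi_state"] in US_STATE_ABBREVS

def ev_written_out_rule(record: dict) -> bool:
    # The existing EV expectation described above: JOI state written out.
    return record["joi_state"] not in US_STATE_ABBREVS

sample = [{"id": 1, "joi_state": "Utah"}, {"id": 2, "joi_state": "UT"}]
for rec in sample:
    print(rec["id"], {
        "internal-abbrev": internal_abbrev_rule(rec),
        "ev-written-out": ev_written_out_rule(rec),
    })
# Every record fails exactly one of the two rules, so a full-data run flags
# 100% of EV records: the contradiction is visible immediately, before
# anything mis-issues.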
