Do you mean this post?

https://sunlightfoundation.com/blog/2014/03/20/a-little-math-could-make-identifiers-a-whole-lot-better/


On Mar 20, 2014, at 3:44 PM, Tom Lee <t...@sunlightfoundation.com> wrote:

> Thanks again to everyone who helped me think through how government's 
> approach to disclosing identifiers could be improved through checksums, 
> tokenization and related techniques -- it was extremely helpful.  The 
> resulting post is here:
> 
> https://sunlightfoundation.com/blog/2013/07/25/the-sunlight-foundations-comments-on-the-faas-proposed-open-data-policy/
> 
> I'd be grateful for any feedback -- or, especially, corrections -- that might 
> occur to you.
> 
> 
> On Thu, Feb 6, 2014 at 3:49 PM, Tom Lee <t...@sunlightfoundation.com> wrote:
> We've been kicking around an idea at Sunlight that aims to use cryptographic 
> ideas to resolve some of the concerns around the publication of personally 
> identifiable information in government disclosures. I could use some smart 
> people to tell me what's dumb about it.
> 
> We often face challenges related to disambiguating entities: is the John 
> Smith who gave political donation A the same John Smith that gave political 
> donation B? One obvious solution to this problem is to push to expand the 
> information that's collected and disclosed -- if we had John's driver's 
> license number (DLN), for instance, it'd be easy to disambiguate these 
> records. But that could introduce privacy concerns for John. One approach to 
> this problem (which I don't think government has tried) is employing a 
> one-way hash. 
> 
> Obviously the input key space for DLNs and most other personal ID numbers is 
> so small that reversing this with a dictionary attack would be trivial. You 
> can add a salt, but only on a per-entity basis (not a per-record basis) if 
> you want to preserve the capacity to disambiguate. That in turn calls for a 
> lookup table in which the input keys are stored, which kind of defeats the 
> point of using a hash (you might as well just assign random output IDs for 
> each input ID). I would worry about government's ability to keep this lookup 
> table secure, and I worry about the brittleness of such a system.
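
To make the two points in the paragraph above concrete, here is a minimal
sketch in Python. It assumes a hypothetical 5-digit ID format (real DLN
formats vary and are longer, but still enumerable), and the names
unsalted_hash, dictionary_attack and random_pseudonym are illustrative only,
not anything from the post. The first half shows a plain hash being reversed
by enumeration; the second shows that a per-entity scheme amounts to a secret
lookup table of random IDs.

```python
import hashlib
import secrets
from typing import Dict, Optional

# Hypothetical ID format: 5 decimal digits (an assumption for illustration).
ID_DIGITS = 5


def unsalted_hash(id_number: str) -> str:
    """Digest of an ID with no salt or secret mixed in."""
    return hashlib.sha256(id_number.encode("ascii")).hexdigest()


def dictionary_attack(digest: str, digits: int = ID_DIGITS) -> Optional[str]:
    """Hash every possible ID until one matches the published digest."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if unsalted_hash(candidate) == digest:
            return candidate
    return None


# Per-entity random pseudonyms. The outputs are safe to publish, but this
# table has to be kept secret and intact for as long as records are matched.
_lookup: Dict[str, str] = {}


def random_pseudonym(id_number: str) -> str:
    """Return the same random 64-bit token every time for a given ID."""
    if id_number not in _lookup:
        _lookup[id_number] = secrets.token_hex(8)
    return _lookup[id_number]


published = unsalted_hash("04231")
recovered = dictionary_attack(published)  # enumeration finds the input ID
```

The attack loop needs no cleverness, only the knowledge of what a valid ID
looks like, which is why the size of the input space matters more than the
choice of hash function.
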
> 
> Alternately, you can use a single system-wide secret (or set of secrets) to 
> transform inputs into reliable outputs. I think this is less brittle and 
> maybe easier to preserve as a secret, but this system might be too easily 
> reversible given the ability to observe its outputs and know the universe of 
> possible inputs. I'm unsure of the cryptographic options that might be 
> appropriate here.
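
One candidate for the "single system-wide secret" approach is a keyed hash
such as HMAC. The sketch below is an assumption on my part, not something the
post proposes: SYSTEM_KEY and keyed_pseudonym are hypothetical names, and the
key handling is reduced to a single in-memory value. HMAC is designed so that
observing outputs, even for guessable inputs, should not let an attacker
compute new outputs without the key, so the dictionary attack would require
stealing the key first.

```python
import hashlib
import hmac
import secrets

# Assumption: one system-wide 256-bit key, generated once and kept secret.
SYSTEM_KEY = secrets.token_bytes(32)


def keyed_pseudonym(id_number: str, key: bytes = SYSTEM_KEY) -> str:
    """Map an ID to a stable pseudonym using HMAC-SHA256 under a secret key."""
    return hmac.new(key, id_number.encode("ascii"), hashlib.sha256).hexdigest()


# The same ID always yields the same pseudonym, so records can be linked.
donation_a = keyed_pseudonym("04231")
donation_b = keyed_pseudonym("04231")
other_donor = keyed_pseudonym("04232")
```

No lookup table is needed, but everything rests on the key: if it leaks, every
published pseudonym can be reversed by enumeration, and if it is lost or
rotated, new records can no longer be linked to old ones.
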
> 
> For all I know, the lack of implementations using this kind of one-way 
> transformation isn't about government sluggishness but rather about its 
> feasibility. I'd be very curious to hear folks' ideas on this score, though.  
> My general hunch is that something must be possible -- even a few bits' worth 
> of disambiguating information would be hugely useful to us, and presumably 
> you're not leaking important amounts of information by, say, sharing the last 
> digit of a DLN. So there must be a spectrum of options. But as is probably 
> apparent, I don't think I've got a handle on how to think about this problem 
> rigorously.
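
The "few bits" end of the spectrum can be quantified roughly. The sketch
below assumes the disclosed digit is uniformly distributed across people,
which may not hold for real DLNs; bits_revealed, false_match_probability and
last_digit_tag are hypothetical helper names.

```python
import math


def bits_revealed(num_values: int) -> float:
    """Information disclosed by a tag with num_values equally likely values."""
    return math.log2(num_values)


def false_match_probability(num_values: int) -> float:
    """Chance that two different people happen to share the same tag."""
    return 1 / num_values


def last_digit_tag(id_number: str) -> str:
    """Disclose only the final character of the ID."""
    return id_number[-1]


# Two records for "John Smith": different tags prove different people,
# while equal tags are only weak evidence that they are the same person.
same_tag = last_digit_tag("0423917") == last_digit_tag("8817307")
different_tag = last_digit_tag("0423917") != last_digit_tag("8817302")
```

Under that assumption one decimal digit discloses about 3.3 bits and wrongly
matches two distinct people one time in ten. Disclosing more digits lowers the
false-match rate by a factor of ten per digit while shrinking the set of IDs an
attacker must consider by the same factor.
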
> 
> Tom
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sunlightlabs" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sunlightlabs+unsubscr...@googlegroups.com.
> To post to this group, send email to sunlightl...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sunlightlabs.
> For more options, visit https://groups.google.com/d/optout.

-- 
Liberationtech is public & archives are searchable on Google. Violations of 
list guidelines will get you moderated: 
https://mailman.stanford.edu/mailman/listinfo/liberationtech. Unsubscribe, 
change to digest, or change password by emailing moderator at 
compa...@stanford.edu.
