You can also check out the Data Portability project. Using it in this way sort of turns things on their head, but the issues remain.
On Jan 16, 2011, at 7:00 AM, [email protected] wrote:

> Send developers-public mailing list submissions to
>     [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
> or, via email, send a message with subject or body 'help' to
>     [email protected]
>
> You can reach the person managing the list at
>     [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of developers-public digest..."
>
> Today's Topics:
>
>    1. Re: Public Data Corporation (Francis Irving)
>    2. Re: Public Data Corporation ('Dragon' Dave McKee)
>    3. Re: Public Data Corporation ('Dragon' Dave McKee)
>
> From: Francis Irving <[email protected]>
> Date: January 15, 2011 8:31:39 PM EST
> To: "mySociety public, general purpose discussion list" <[email protected]>
> Subject: Re: [mySociety:public] Public Data Corporation
> Reply-To: "mySociety public, general purpose discussion list" <[email protected]>
>
> Yes - to me the most shocking thing about MPs expenses was this.
>
> That is - before hand I had reckoned it was fair enough to redact MPs
> addresses for data protection reasons (I remember saying this to
> Heather Brook).
>
> But it turned out that she was right - the addresses were vital. So
> the only way of balancing the public and private nature of the data
> was for a third party (the Telegraph) to examine it, and play fair for
> both DPA of innocent MPs, and the public interest of detecting home
> flipping.
>
> Now, Tom (if he is still reading this), or anyone who can remember...
>
> What is the name of the query language for quizzing databases with a
> certain level of privacy as a parameter of the query? It was a very
> clever theoretical thing, I think from Microsoft research, and gets to
> the core of this debate.
>
> Francis
>
> On Sat, Jan 15, 2011 at 08:40:14AM +0000, Tim Green wrote:
>> I remember wondering this for the MPs expenses stuff - them
>> objecting to the publication of addresses meaning that you wouldn't
>> have been able to spot flipping.
>>
>> Thoughts:
>> a) Not sure how you'd explain hashing and salting to someone.
>> b) With only a few tens of millions of addresses, even with a salt,
>> it could be trivial to brute-force someone's address hash. You'd
>> have to estimate the current and future cost of the resources
>> involved.
>>
>> Tim
>>
>> On 15/01/2011 01:29, 'Dragon' Dave McKee wrote:
>>> (I know this is pie-in-the-sky thinking but...)
>>>
>>> The issue with the personally identifying information is that... well,
>>> it identifies a person.
>>>
>>> However, we don't necessarily want to identify that person, just
>>> confirm that record A and record B refer to the same person.
>>>
>>> Couldn't we take a hash (with appropriate salt etc) of the personally
>>> identifying information to permit these comparisons, without providing
>>> actual identifying information?
>>>
>>> Addresses can be normalised to whatever Royal Mail believes it should be.
>>> Names are harder, and more mutable - surname changes mess up most
>>> systems - but could potentially have different hashes (surname /
>>> surname + forename / surname + all names) to allow for partial
>>> matches. (We could salt it with further information - perhaps address?
>>> - to avoid 'SMITH' being the encoding for the most common surname
>>> hash.)
>>>
>>> There could even be a system to convert hashes from one system to
>>> hashes in another system, but not necessarily vice-versa.
>>>
>>> This doesn't necessarily solve the underlying problem, but might go
>>> some way to finding middle ground.
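
For what it's worth, here is a rough sketch of how the hashing idea quoted above might look. Everything in it is my own assumption rather than anything proposed in the thread: I read "hash with a salt" as a keyed hash (HMAC-SHA256) under a secret key, and the `normalise`, `pseudonym`, `unkeyed` and `brute_force` names are hypothetical, with `normalise` only a crude stand-in for proper normalisation against Royal Mail data. It also illustrates Tim's point (b): with no secret involved, anyone holding a list of addresses can hash them all and compare.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Sequence

# Hypothetical secret key (an assumption for this sketch). It would be
# generated once and kept by whoever publishes the pseudonymised data.
SECRET_KEY = secrets.token_bytes(32)


def normalise(address: str) -> str:
    """Crude stand-in for real address normalisation."""
    return " ".join(address.upper().replace(",", " ").split())


def pseudonym(address: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash (HMAC-SHA256) of the normalised address."""
    message = normalise(address).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def unkeyed(address: str) -> str:
    """Plain SHA-256 with no secret, for comparison only."""
    return hashlib.sha256(normalise(address).encode("utf-8")).hexdigest()


def brute_force(target_hash: str, candidates: Sequence[str]) -> Optional[str]:
    """Hash every known address without a key and look for a match."""
    for candidate in candidates:
        if unkeyed(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    known = ["1 Example Street, Anytown", "2 Example Street, Anytown"]
    # Two spellings of the same address get the same pseudonym.
    print(pseudonym(known[0]) == pseudonym("1  example street anytown"))
    # An unkeyed hash is recovered by simple enumeration...
    print(brute_force(unkeyed(known[1]), known))
    # ...but enumeration without the key does not match the keyed hash.
    print(brute_force(pseudonym(known[1]), known))
```

The design choice here is that all the protection rests on the key staying secret: anyone who obtains it can run the same enumeration against the keyed values.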
>
> From: "'Dragon' Dave McKee" <[email protected]>
> Date: January 15, 2011 8:36:38 PM EST
> To: "mySociety public, general purpose discussion list" <[email protected]>
> Subject: Re: [mySociety:public] Public Data Corporation
> Reply-To: "mySociety public, general purpose discussion list" <[email protected]>
>
>> a) Not sure how you'd explain hashing and salting to someone.
>
> I'm not 100% I've used the right terms, but a hash would be 'a unique
> identifier for each address which doesn't reveal the address but shows
> whether two addresses are the same' - avoiding discussion of what
> actually happens behind the scenes and neatly sidestepping the
> question of salting. :)
>
>> b) With only a few tens of millions of addresses, even with a salt, it could
>> be trivial to brute-force someone's address hash. You'd have to estimate the
>> current and future cost of the resources involved.
>
> With a sufficiently large secret salt (hundreds of bits, I should
> think), it should be infeasible. Or we could potentially encrypt the
> normalised address (which I think is broadly equivalent and should
> neatly avoid any issue of collisions!).
> Granted, if the secret got out
> then there would be major issues (think of any of the many 'X
> department lost Y data on Z mode of transport' stories), but this
> secret isn't something that should ever need to leave a datacentre.
>
> These are ways of doing it, but I'm fairly sure they're not the ways
> of doing it *right*.
>
> Dave.
>
> From: "'Dragon' Dave McKee" <[email protected]>
> Date: January 15, 2011 8:39:33 PM EST
> To: "mySociety public, general purpose discussion list" <[email protected]>
> Subject: Re: [mySociety:public] Public Data Corporation
> Reply-To: "mySociety public, general purpose discussion list" <[email protected]>
>
>> What is the name of the query language for quizzing databases with a
>> certain level of privacy as a parameter of the query? It was a very
>> clever theoretical thing, I think from Microsoft research, and gets to
>> the core of this debate.
>
> This looks about right?
> http://research.microsoft.com/en-us/projects/pinq/tutorial.aspx
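
To make "privacy as a parameter of the query" a bit more concrete, here is a toy sketch of the general idea behind differentially private counting: the analyst picks a privacy parameter (conventionally called epsilon) and gets back the true count plus Laplace noise of scale 1/epsilon. This is not PINQ's API (PINQ is a C#/LINQ library); the `laplace_noise` and `noisy_count` names are hypothetical, and a naive floating-point implementation like this is for illustration only, not for protecting real data.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Count matching records, then add Laplace noise of scale 1/epsilon.

    Adding or removing one record changes a count by at most 1, so the
    noise scale is 1/epsilon. Smaller epsilon means more noise and
    stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up records: how many claims mention a second home?
    claims = [{"second_home": i % 4 == 0} for i in range(200)]
    for eps in (0.1, 1.0, 10.0):
        answer = noisy_count(claims, lambda c: c["second_home"], eps)
        print(eps, round(answer, 2))
```

Each run gives a slightly different answer; the true count here is 50, and the answers cluster more tightly around it as epsilon grows.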
