On Fri, 2008-05-16 at 14:48 -0400, Peter Beckman wrote:
> On Fri, 16 May 2008, Dean Collins wrote:
>
> > It would be voluntary to download the module in the first place so not
> > necessary to be able to 'turn off'.
>
> Even "scrubbed" anonymous data has been able to be used in ways that
> really don't make it anonymous. When Netflix offered $1m to anyone who
> could improve their movie recommendations, they released a large amount of
> what they believed was scrubbed, anonymous data. Turns out, it wasn't so
> anonymous.

They didn't scrub it properly. What they effectively did was replace the name with a token that represents that person. What researchers then did was filter the data, removing (in the case of Netflix) the most commonly viewed films and looking at the less common ones, then match the token to someone they had targeted. IIRC they couldn't just guess which token went to which person; they had to target a person first and then find their token.

That still presents some issues, though: the more semi-scrubbed aggregate data that is published, the easier it becomes to cross-reference it and get even more information. Another paper discussed this possibility, which can even lead to discovery of identities where no other method was previously available.

So, to correct you on what you said: "Turns out, it wasn't so SCRUBBED" :)

How one would properly scrub depends largely on the data in question, who it's released to, etc. The more it's scrubbed, though, the less valuable it becomes to many. For example, if you were an ITSP publishing call figures, you could do like Netflix and just replace the customer account number with some token, but that lets anyone see that token X called all these numbers, and lo and behold, if a call you received from X is listed there, you now know who X is. But if you released figures that broke the data down into simple stats (customer X did Y minutes to country Z, A% of traffic was during these hours, etc.), it isn't as useful to many, which makes doing it less rewarding. You would of course do well to mask all the numbers in this particular example, maybe just listing the region each one is in (a US state, for example) and not even the city. In that way you could reduce the information more and more but still keep some value.

There will of course be those who don't want to participate, and I generally think that tying aggregate data collection to the use of a product is a bad idea and that it should be optional.
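
To make the ITSP example a bit more concrete, here is a rough Python sketch of the two release styles. Everything in it is made up for illustration: the record layout, the account and phone numbers, the salt, and the function names are all assumptions, and the salted hash is just one stand-in for "some token" (I don't know how Netflix actually generated theirs). It shows that a stable per-customer token still lets a recipient who knows about one call pull up the caller's whole history, while a per-region total doesn't expose individual callers or dialed numbers in the same way.

```python
import hashlib
from collections import defaultdict

# Hypothetical call records: (account number, dialed number, minutes).
CALLS = [
    ("ACCT-1001", "+12125550100", 12),
    ("ACCT-1001", "+442079460000", 3),
    ("ACCT-1001", "+13105550123", 7),
    ("ACCT-1002", "+13105550123", 20),
]


def pseudonymize(calls, salt="example-salt"):
    """Token-style 'scrubbing': swap each account number for a stable token."""
    released = []
    for acct, dialed, minutes in calls:
        token = hashlib.sha256((salt + acct).encode()).hexdigest()[:8]
        released.append((token, dialed, minutes))
    return released


def tokens_that_called(released, my_number):
    """Every token that dialed a number the recipient already knows about."""
    return {token for token, dialed, _ in released if dialed == my_number}


def history_for(released, token):
    """All calls in the release that carry the given token."""
    return [(dialed, minutes) for t, dialed, minutes in released if t == token]


def region_of(number):
    """Crude stand-in for number-to-region mapping (country code only)."""
    for prefix, region in (("+44", "UK"), ("+1", "US/Canada")):
        if number.startswith(prefix):
            return region
    return "other"


def aggregate_by_region(calls):
    """Coarser release: total minutes per region, no tokens, no numbers."""
    totals = defaultdict(int)
    for _, dialed, minutes in calls:
        totals[region_of(dialed)] += minutes
    return dict(totals)


released = pseudonymize(CALLS)
# Someone who owns +12125550100 and knows who called them can find that
# caller's token, and with it every other call the caller made.
for token in tokens_that_called(released, "+12125550100"):
    print(token, history_for(released, token))
print(aggregate_by_region(CALLS))
```

With real data more than one token may have called a given number, so the match would usually need a second known call (or a call time) to narrow it down, but the idea is the same.
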
At the very least it should be clearly disclosed that this is going on, especially since some places don't allow this without explicit, not tacit, agreement. There is the potential in some places (and it would take only the customer being there, not necessarily the business) for a civil or criminal charge to occur, and if it's criminal it can get ugly with extradition and all that. Data privacy is a touchy thing in some parts of the world.

-- 
Trixter http://www.0xdecafbad.com
Bret McDanel
Belfast +44 28 9099 6461
US +1 516 687 5200
http://www.trxtel.com the phone company that pays you!
