BTW, here is the transcript: https://www.grc.com/sn/sn-940-notes.pdf
Page 12 is where Ray's project discussion starts.

On Friday, September 22, 2023 at 09:43, Ray Voelker eloquently inscribed:

> Hi code4lib folks .. and again ... happy Friday!!
>
> I just wanted to post an update to this. I wrote in to the Security Now!
> podcast (fantastic show by the way and fully worth listening to on a
> regular basis) about this notion, and it was made the main topic of show
> number 940!
>
> https://twit.tv/shows/security-now/episodes/940?autostart=false
>
> The discussion starts around the 1:36 mark.
>
> Here's what I wrote to Steve Gibson:
>
>> In addition to being an avid listener to Security Now, I'm also a System
>> Administrator for a large public library system in Ohio. Libraries
>> often struggle with data—being especially sensitive around data related
>> to patrons and patron behavior in terms of borrowing, library program
>> attendance, reference questions, etc. The common practice is for
>> libraries to aggregate and then promptly destroy this data within a
>> short time frame—which is typically one month. However, administrators
>> and local government officials, who are often instrumental in
>> allocating library funding and guiding operational strategies,
>> frequently ask questions on a larger time scale than one month to
>> validate the library's significance and its operational strategies.
>> Disaggregation of this data to answer these types of questions is very
>> difficult and arguably impossible. This puts people like me, and many
>> others like me, in a tough spot in terms of storing and later using
>> sensitive data to provide the answers to these questions of pretty
>> serious consequence—like, what should we spend money on, or why we
>> should continue to exist.
>>
>> I’m sure you’re aware, but there are many interesting historical reasons
>> for this sensitivity, and organizations like the American Library
>> Association (ALA) and other international library associations have
>> even codified the protection of patron privacy into their codes of
>> ethics. For example, the ALA's Code of Ethics states: "We protect each
>> library user's right to privacy and confidentiality with respect to
>> information sought or received and resources consulted, borrowed,
>> acquired or transmitted." While I deeply respect and admire this
>> stance, it doesn't provide a solution for those of us wrestling with
>> the aforementioned existential questions.
>>
>> In this context, I'd be immensely grateful if you could share your insights
>> on the technique of "Pseudonymization"
>> (https://en.wikipedia.org/wiki/Pseudonymization <https://t.co/gVKvpmzoxp>) for
>> PII data. Additionally, I'd appreciate a brief review of a Python
>> module I'm developing, which aims to assist me (and potentially other
>> library professionals) in retaining crucial data for subsequent
>> analysis while ensuring data subject privacy.
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>> <https://t.co/aAapRKgElr> Thank you once again, Steve, for your
>> invaluable contributions to the security community. I eagerly await
>> your feedback!
>
> I think the even better solution compared to Pseudonymization involves the
> Birthday Paradox. It's a direction I hadn't even thought of for this!
>
> --Ray
>
> On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker <ray.voel...@gmail.com> wrote:
>
>> Hi code4lib folks .. happy Friday!
>>
>> I started putting together a little Python utility for doing
>> Pseudonymization tasks
>> (https://en.wikipedia.org/wiki/Pseudonymization). The goal is to be
>> able to do more analysis on data related to circulation while securely
>> maintaining patron privacy.
>>
>> For a little bit of background I wanted something *like a hash* (but
>> more secure than a hash), for replacing select fields related to patron
>> records. I also wanted something that could possibly be reversed given
>> an encrypted private key that would be stored well outside of the scope
>> of the project. I'm thinking that if you wanted to geocode addresses
>> for example, you could temporarily decrypt each field needed for the
>> task, use the *pseudonymized* patron id as the identifier, and then
>> send your data off to the geocoder of your choice. Another example
>> would be to store a pseudonymized patron id as the identifier in things
>> like circulation data used for later analysis, or for transmitting to
>> trusted 3rd parties who may do analysis for you.
>>
>> I'm humbly asking for anyone with some background in using encryption to
>> review the code I have and maybe offer some comments / concerns /
>> suggestions / jokes about this.
>>
>> Thanks in advance!
>>
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>>
>> --
>> Ray Voelker
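
For anyone skimming the thread, the "like a hash, but more secure than a hash" idea quoted above can be illustrated with a keyed hash (HMAC). To be clear, this is not the code in Ray's gist and not the approach discussed on the podcast. It is only a minimal sketch, and it is one-way, so it does not cover the reversible case Ray describes. The names pseudonymize, key, and circulation, and the sample patron ids, are all made up for illustration.

```python
import hashlib
import hmac
import secrets


def pseudonymize(patron_id: str, key: bytes) -> str:
    """Return a stable pseudonym for patron_id (hypothetical helper).

    HMAC-SHA256 mixes a secret key into the hash, so someone without
    the key cannot rebuild the mapping just by hashing every possible
    patron id, which is the weakness of a plain unkeyed hash.
    """
    return hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: in practice the key would be generated once and kept
# well outside the analysis environment. Here it is created fresh
# purely for the demonstration.
key = secrets.token_bytes(32)

# Made-up circulation rows: (patron id, item borrowed).
circulation = [("p1001", "item-A"), ("p1002", "item-B"), ("p1001", "item-C")]

# Replace each patron id with its pseudonym before storing or sharing.
pseudonymized = [(pseudonymize(pid, key), item) for pid, item in circulation]

for pseudonym, item in pseudonymized:
    print(pseudonym[:12], item)
```

The same patron always maps to the same pseudonym under a given key, so checkouts can still be grouped per patron for later analysis, while a different key produces an unrelated set of pseudonyms.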