It's Friday again, so another update on this project! lol I put together what I think is a pretty good explanation and framing of the problem and a possible solution!
https://chimpy.me/blog/

--Ray

On Fri, Sep 22, 2023 at 9:43 AM Ray Voelker <ray.voel...@gmail.com> wrote:

> Hi code4lib folks .. and again ... happy Friday!!
>
> I just wanted to post an update to this. I wrote in to the Security Now!
> podcast (fantastic show by the way and fully worth listening to on a
> regular basis) about this notion, and it was made the main topic of show
> number 940!
>
> https://twit.tv/shows/security-now/episodes/940?autostart=false
>
> The discussion starts around the 1:36 mark.
>
> Here's what I wrote to Steve Gibson:
>
>> In addition to being an avid listener to Security Now, I'm also a System
>> Administrator for a large public library system in Ohio. Libraries often
>> struggle with data, being especially sensitive around data related to
>> patrons and patron behavior in terms of borrowing, library program
>> attendance, reference questions, etc. The common practice is for libraries
>> to aggregate and then promptly destroy this data within a short time
>> frame, which is typically one month. However, administrators and local
>> government officials, who are often instrumental in allocating library
>> funding and guiding operational strategies, frequently ask questions on a
>> larger time scale than one month to validate the library's significance and
>> its operational strategies. Disaggregation of this data to answer these
>> types of questions is very difficult and arguably impossible. This puts
>> people like me, and many others like me, in a tough spot in terms of
>> storing and later using sensitive data to provide the answers to these
>> questions of pretty serious consequence, like what should we spend money
>> on, or why we should continue to exist.
>>
>> I’m sure you’re aware, but there are many interesting historical reasons
>> for this sensitivity, and organizations like the American Library
>> Association (ALA) and other international library associations have even
>> codified the protection of patron privacy into their codes of ethics. For
>> example, the ALA's Code of Ethics states: "We protect each library user's
>> right to privacy and confidentiality with respect to information sought or
>> received and resources consulted, borrowed, acquired or transmitted." While
>> I deeply respect and admire this stance, it doesn't provide a solution for
>> those of us wrestling with the aforementioned existential questions.
>>
>> In this context, I'd be immensely grateful if you could share your
>> insights on the technique of "Pseudonymization"
>> (https://en.wikipedia.org/wiki/Pseudonymization <https://t.co/gVKvpmzoxp>)
>> for PII data. Additionally, I'd appreciate a brief review of a Python module
>> I'm developing, which aims to assist me (and potentially other library
>> professionals) in retaining crucial data for subsequent analysis while
>> ensuring data subject privacy.
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>> <https://t.co/aAapRKgElr> Thank you once again, Steve, for your invaluable
>> contributions to the security community. I eagerly await your feedback!
>
> I think the even better solution compared to Pseudonymization involves
> the Birthday Paradox. It's a direction I hadn't even thought of for this!
>
> --Ray
>
> On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker <ray.voel...@gmail.com> wrote:
>
>> Hi code4lib folks .. happy Friday!
>>
>> I started putting together a little Python utility for doing
>> Pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization).
>> The goal is to be able to do more analysis on data related to circulation
>> while securely maintaining patron privacy.
>>
>> For a little bit of background I wanted something *like a hash* (but
>> more secure than a hash), for replacing select fields related to patron
>> records. I also wanted something that could possibly be reversed given an
>> encrypted private key that would be stored well outside of the scope of the
>> project. I'm thinking that if you wanted to geocode addresses for example,
>> you could temporarily decrypt each field needed for the task, use the
>> *pseudonymized* patron id as the identifier, and then send your data off
>> to the geocoder of your choice. Another example would be to store a
>> pseudonymized patron id as the identifier in things like circulation data
>> used for later analysis, or for transmitting to trusted 3rd parties who may
>> do analysis for you.
>>
>> I'm humbly asking for anyone with some background in using encryption to
>> review the code I have and maybe offer some comments / concerns /
>> suggestions / jokes about this.
>>
>> Thanks in advance!
>>
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>>
>> --
>> Ray Voelker
>
> --
> Ray Voelker
> (937) 620-1830

--
Ray Voelker
(937) 620-1830
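
For anyone who wants to experiment with the "like a hash, but more secure than a hash" idea from the Sep 15 message, here is a minimal sketch. It is not the code in the gist. It assumes a keyed hash (HMAC-SHA-256 from Python's standard library) and covers only the one-way part; the "reversible given a private key" part would need real encryption and is not shown. The function names and the sample rows are made up for illustration.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a random 32-byte secret key (hypothetical helper).

    The key should be stored well outside the dataset it protects.
    """
    return secrets.token_bytes(32)


def pseudonymize(patron_id: str, key: bytes) -> str:
    """Return a stable keyed pseudonym for patron_id as 64 hex characters.

    Assumption: HMAC-SHA-256 stands in for "something like a hash".
    The same id and key always give the same pseudonym.
    """
    return hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = make_key()
    # Made-up circulation rows: (patron id, item barcode)
    rows = [
        ("p1001", "31234000000017"),
        ("p1002", "31234000000025"),
        ("p1001", "31234000000033"),
    ]
    for patron_id, barcode in rows:
        # Store the pseudonym in place of the real patron id
        print(pseudonymize(patron_id, key), barcode)
```

Because the same patron id maps to the same pseudonym under one key, rows can still be grouped per patron for later analysis. Unlike a plain hash, someone holding a list of known patron ids cannot recompute the pseudonyms without the key.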