Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...

Ray Voelker Fri, 29 Sep 2023 06:39:56 -0700

It's Friday again, so another update on this project! lol

I put together what I think is a pretty good explanation and framing of the
problem and a possible solution!


https://chimpy.me/blog/

--Ray
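
The Sep 15 message quoted below asks for something "like a hash (but more secure than a hash)" for replacing patron fields. One common way to get that is a keyed hash (HMAC-SHA256 from Python's standard library). The following is a minimal sketch of that general technique, not the code in the linked gist. It assumes patron IDs are short numeric strings, and the names `plain_hash`, `pseudonymize`, and `brute_force` are hypothetical. It shows why a plain hash of a small ID space can be reversed just by enumerating IDs, while an HMAC output cannot be matched without the secret key.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Iterable, Optional


def plain_hash(patron_id: str) -> str:
    """Unkeyed SHA-256: anyone can recompute this for any candidate ID."""
    return hashlib.sha256(patron_id.encode("utf-8")).hexdigest()


def pseudonymize(patron_id: str, key: bytes) -> str:
    """Keyed HMAC-SHA256: stable for a given key, not recomputable without it."""
    return hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def brute_force(target: str, candidates: Iterable[str],
                hash_fn: Callable[[str], str]) -> Optional[str]:
    """Hash every candidate ID and return the one that matches target, if any."""
    for candidate in candidates:
        if hmac.compare_digest(hash_fn(candidate), target):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical 7-digit patron IDs; a 10,000-ID slice keeps the demo quick.
    candidate_ids = [str(n) for n in range(1_000_000, 1_010_000)]
    patron_id = "1004321"

    # A plain hash falls to simple enumeration of the ID space.
    print(brute_force(plain_hash(patron_id), candidate_ids, plain_hash))  # 1004321

    # Assumption: the real key is generated once and stored outside the analysis
    # environment. An attacker trying any other key finds no match.
    real_key = secrets.token_bytes(32)
    guessed_key = secrets.token_bytes(32)
    token = pseudonymize(patron_id, real_key)
    print(brute_force(token, candidate_ids,
                      lambda c: pseudonymize(c, guessed_key)))  # None
```

An HMAC is one-way, so on its own it does not give the reversibility described in that message; that would need encryption under a separately stored key (or a key-protected lookup table) in addition, which this sketch does not attempt.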

On Fri, Sep 22, 2023 at 9:43 AM Ray Voelker wrote:

> Hi code4lib folks .. and again ... happy Friday!!
>
> I just wanted to post an update to this. I wrote in to the Security Now!
> podcast (fantastic show by the way and fully worth listening to on a
> regular basis) about this idea, and it became the main topic of episode
> 940!
>
> https://twit.tv/shows/security-now/episodes/940?autostart=false
>
> The discussion starts around the 1:36 mark.
>
> Here's what I wrote to Steve Gibson:
>
> In addition to being an avid listener to Security Now, I'm also a System
>> Administrator for a large public library system in Ohio. Libraries often
>> struggle with data—being especially sensitive around data related to
>> patrons and patron behavior in terms of borrowing, library program
>> attendance, reference questions, etc. The common practice is for libraries
>> to aggregate and then promptly destroy this data within a short time
>> frame—which is typically one month. However, administrators and local
>> government officials, who are often instrumental in allocating library
>> funding and guiding operational strategies, frequently ask questions on a
>> larger time scale than one month to validate the library's significance and
>> its operational strategies. Disaggregation of this data to answer these
>> types of questions is very difficult and arguably impossible. This puts
>> people like me, and many others like me, in a tough spot in terms of
>> storing and later using sensitive data to provide the answers to these
>> questions of pretty serious consequence, like what we should spend money
>> on, or why we should continue to exist.
>
>
> I’m sure you’re aware, but there are many interesting historical reasons
>> for this sensitivity, and organizations like the American Library
>> Association (ALA) and other international library associations have even
>> codified the protection of patron privacy into their codes of ethics. For
>> example, the ALA's Code of Ethics states: "We protect each library user's
>> right to privacy and confidentiality with respect to information sought or
>> received and resources consulted, borrowed, acquired or transmitted." While
>> I deeply respect and admire this stance, it doesn't provide a solution for
>> those of us wrestling with the aforementioned existential questions.
>>
>
> In this context, I'd be immensely grateful if you could share your
>> insights on the technique of "Pseudonymization"
>> (https://en.wikipedia.org/wiki/Pseudonymization) for
>> PII data. Additionally, I'd appreciate a brief review of a Python module
>> I'm developing, which aims to assist me (and potentially other library
>> professionals) in retaining crucial data for subsequent analysis while
>> ensuring data subject privacy.
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>> Thank you once again, Steve, for your invaluable contributions to the
>> security community.
>> I eagerly await your feedback!
>>
>
> I think an even better solution than pseudonymization involves the
> Birthday Paradox. It's a direction I hadn't even thought of for this!
>
> --Ray
>
> On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker wrote:
>
>> Hi code4lib folks .. happy Friday!
>>
>> I started putting together a little Python utility for doing
>> Pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization).
>> The goal is to be able to do more analysis on data related to circulation
>> while securely maintaining patron privacy.
>>
>> For a little bit of background, I wanted something *like a hash* (but
>> more secure than a hash) for replacing selected fields in patron
>> records. I also wanted something that could possibly be reversed given an
>> encrypted private key that would be stored well outside of the scope of the
>> project. I'm thinking that if you wanted to geocode addresses, for example,
>> you could temporarily decrypt each field needed for the task, use the
>> *pseudonymized* patron id as the identifier, and then send your data off
>> to the geocoder of your choice. Another example would be to store a
>> pseudonymized patron id as the identifier in things like circulation data
>> used for later analysis, or for transmitting to trusted 3rd parties who may
>> do analysis for you.
>>
>> I'm humbly asking anyone with some background in encryption to
>> review the code I have and maybe offer some comments / concerns /
>> suggestions / jokes about this.
>>
>> Thanks in advance!
>>
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>>
>> --
>> Ray Voelker
>>
>
>
> --
> Ray Voelker
> (937) 620-1830
>
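
The letter to Steve Gibson quoted above notes that monthly aggregates cannot be disaggregated to answer longer-range questions, and the Sep 15 message suggests keeping a pseudonymized patron ID in circulation data instead. A small illustration with made-up records (the pseudonym strings are placeholders standing in for keyed-hash output, and the function names are hypothetical): summing monthly unique-borrower counts overcounts anyone active in more than one month, while retained pseudonymized IDs give the true count for the whole period without storing real patron IDs.

```python
from collections import defaultdict

# Made-up circulation records: (month, pseudonymized patron id).
records = [
    ("2023-01", "a91f"), ("2023-01", "7c2e"), ("2023-01", "a91f"),
    ("2023-02", "a91f"), ("2023-02", "03bd"),
    ("2023-03", "7c2e"), ("2023-03", "a91f"), ("2023-03", "e5d0"),
]


def monthly_unique_counts(rows):
    """What a month-by-month aggregate keeps: only a count of distinct borrowers."""
    per_month = defaultdict(set)
    for month, pid in rows:
        per_month[month].add(pid)
    return {month: len(ids) for month, ids in sorted(per_month.items())}


def unique_borrowers(rows):
    """Distinct borrowers over the whole period; only possible if IDs are retained."""
    return len({pid for _, pid in rows})


monthly = monthly_unique_counts(records)
print(monthly)                    # {'2023-01': 2, '2023-02': 2, '2023-03': 3}
print(sum(monthly.values()))      # 7 (repeat borrowers counted once per month)
print(unique_borrowers(records))  # 4
```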


-- 
Ray Voelker
(937) 620-1830
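
The Sep 22 message says the Birthday Paradox came up as a possibly better direction, but does not spell out the method. One reading, offered here only as an assumption and not as what was proposed on the show, is to truncate a keyed hash to a deliberately small number of bits so that many patrons share each pseudonym, which gives deniability about any single patron. The sketch below (helper names hypothetical) uses the standard birthday approximation, P ≈ 1 − exp(−n(n−1)/(2N)) with N = 2^bits, to estimate how likely collisions are, and simulates the average group size per truncated value.

```python
import hashlib
import hmac
import math
from collections import Counter


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation of P(at least two of n IDs share a bits-wide value)."""
    buckets = 2 ** bits
    # 1 - exp(-x), computed with expm1 so tiny probabilities are not rounded to 0.
    return -math.expm1(-n * (n - 1) / (2 * buckets))


def truncated_pseudonym(patron_id: str, key: bytes, bits: int) -> int:
    """Hypothetical helper: keep only the top `bits` bits of an HMAC-SHA256 digest."""
    digest = hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


if __name__ == "__main__":
    n = 100_000  # made-up number of patrons
    for bits in (12, 16, 32, 64):
        p = collision_probability(n, bits)
        print(f"{bits:>2} bits: P(collision) ~ {p:.4g}, "
              f"avg patrons per value {n / 2 ** bits:.3g}")

    # Simulate with a throwaway key: count how many patrons land on each 12-bit value.
    key = b"demo-key-not-for-production"
    shared = Counter(truncated_pseudonym(str(i), key, 12) for i in range(n))
    print(len(shared), "distinct pseudonyms; largest group:", max(shared.values()))
```

The trade-off is that collisions also merge different patrons' histories, so any per-patron or unique-patron figure computed from truncated pseudonyms becomes an estimate rather than an exact count.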
