Re: [CODE4LIB] Securing shared workstations

2023-12-14 Thread Ray Voelker
This doesn't really solve your "shared login" problem, but I was always a
big fan of using the DeepFreeze software on shared computers. It does a
fantastic job of preventing those changes you were talking about from
"sticking" -- especially if you force a reboot after logout; a logout
script to do that isn't too hard to create.

https://www.faronics.com/deep-freeze-on-cloud

--Ray


On Thu, Dec 14, 2023 at 9:36 AM Hammer, Erich F  wrote:

> All,
>
> First, I apologize because this is much more of an IT question than a
> coding question, but I come from an IT/desktop support background with a
> particular interest in security.
>
> How are larger, academic libraries securing your employee-used, shared
> workstations -- specifically, the circulation desk machines and the
> back-end, ILL scanning stations?  I have been trying mightily for a few
> years to eliminate the shared-password generic accounts because they
> present a real security/privacy concern.  I am running into some real
> road-blocks though, and I'm wondering if anyone here has found solutions
> that work.
>
> Having viewed the chaotic state of the circulation desk with the constant
> churn of employees using the stations, I have conceded that it is better to
> use a generic login than to have folks log in/out constantly.
>
> The ILL employees who do a lot of scanning don't have the rapid-fire
> turnover at their workstations, but they (or their manager) are insisting on
> a generic login because the scans need to be saved in a specific, network
> location and Acrobat has no mechanism to set the default save location for
> all users.  (I hate Adobe!)  When we have tried using personal logins,
> folks forget, don't notice, or don't know to check that the PDFs are
> saved in the proper location, and those scans have to be redone by someone
> else or are inaccessible within the particular employee's private user
> profile until they return to work (which could be days-weeks with student
> employees).
>
> In both cases, users still need to sign into services as themselves (the
> LSP -- Alma --, scheduling, wiki documentation, ILLiad, etc.), so I'm not
> really sure what the security advantages are with the generic account
> (especially for ILL scanning).  I've had to push settings to prevent the
> browsers (Edge, Chrome and Firefox) from saving passwords.  I also have
> automated scripts running to regularly blow away the MS Teams configuration
> to prevent users from using it as someone else.  (Teams "helpfully"
> remembers credentials for one-click login even after logging out of it and
> rebooting.)  I have not been able to find a way to do the same with MS
> Office, so I have been forced to uninstall it completely.  Otherwise,
> everyone who uses it while logged onto the computer with the generic
> account is signed into/owns all the M365 documents as the user who first
> used it (and had to sign into M365).
>
> The lack of Microsoft Office is the particular issue I'm being pressed on,
> and it prompted me to post this.  I should add that I can't use device
> licenses for M365 (where login/registration isn't required) because they
> only work with Azure Active Directory which we do not have.  What are you
> all doing?  I've been considering trying to set circ desk systems up as
> multi-app, auto-login kiosks so at least we don't need to share the generic
> password, but the other problems still remain.
>
> Any feedback is appreciated.
>
> Thanks,
> Erich
>
>
>
> --
> Erich Hammer      Head of Library Systems
> er...@albany.edu University Libraries
> 518-442-3891      University @ Albany
>
> "Faith is the unflagging determination to remain ignorant
> in the face of any and all evidence that you're ignorant."
> -- Shaun Mason
>


-- 
*Ray Voelker*
Integrated Library Systems Administrator

Mobile phone:  (937) 620-1830
Office phone:  (513)369-4583
E-mail: ray.voel...@chpl.org

Cincinnati & Hamilton County Public Library
800 Vine Street Cincinnati, Ohio 45202

*For Minds of All Kinds*
CHPL.org <https://chpl.org/>


[CODE4LIB] AI Governance / Ethical and Responsible AI Use

2023-11-20 Thread Ray Voelker
Hi Code4Lib,

Does anyone belong to an org -- or know of one -- that has written /
established any guidelines around the use of AI in the context of library
data and staff use?

I had seen that the Urban Libraries Council released a brief on the subject
(https://www.urbanlibraries.org/files/AI_Leadership-Brief_October2023.pdf)
but I was wondering if anyone has "formalized" some of these concerns for
"responsible AI use."

I had started a draft, but I was curious what others were thinking /
planning around this subject.
https://gist.github.com/rayvoelker/5730766001b3306fdb6955451ba26115

-- Ray


[CODE4LIB] IT Ticketing System Recommendations...

2023-10-26 Thread Ray Voelker
Greetings Code4Lib folks,

I'm reaching out to inquire if anyone has recommendations or opinions
(positive or negative) regarding IT Ticketing solutions suitable for a
Public Library system. We're exploring options and are particularly
interested in systems that:

   - Are easy to maintain (with a preference for FOSS, but doesn't have to
   be).
   - Can support various submission forms tailored to different departments
   (e.g., Marketing Dept, IT Dept, ILS Dept, etc.).
   - If not FOSS, then they should accept POs and accommodate non-profit or
   tax-exempt customers, like public library systems.

We are currently using osTicket. However, we've encountered challenges as
it has been heavily customized by a team who no longer works at the
library. Although we're considering migrating to another solution, I'm
curious to know if anyone has experience with https://supportsystem.com/,
which seems to be a cloud-based version of osTicket.

Any insights, recommendations, or opinions you can share would be greatly
appreciated.

Thank you in advance!
-- 
Ray Voelker


Re: [CODE4LIB] VPNs - free to low cost

2023-10-06 Thread Ray Voelker
Maybe a "better" / cheaper solution could be to set up an account with
https://privacy.com/ where you can create virtual credit cards that you can
further lock down to one vendor / payee, a total monthly limit, etc. It seems
almost silly to spend $5 / month on a VPN provider just to pay one or two
bills over what is likely an already secure TLS tunnel provided by HTTPS.
Again, just my $0.02 as well :-)

--Ray

On Thu, Oct 5, 2023 at 9:19 PM charles meyer  wrote:

> My esteemed listmates,
>
> A patron living on modest Social Security alone is exploring whether there’s
> any free to low cost ($5-10 a month) VPN for her once a month electronic
> payment of her bank credit card from her checking account using a free
> library hotspot.
>
> Her bank for her credit card has no brick and mortar locations.
>
> I wasn’t sure what others used as reliable tech sites for comparing and
> contrasting VPNs?
>
> Just Googling this, I found these sites but I’m not sure how accurate any
> of them are?
>
> https://www.vpnmentor.com/blog/best-free-vpn-wifi-hotspots/
>
> https://www.pcmag.com/reviews/hotspot-shield-vpn
>
> https://www.cnet.com/tech/services-and-software/surfshark-vpn-review/
>
> https://www.usnews.com/360-reviews/privacy/vpn/hotspot-shield
>
> Thanks so much,
>
> Charles.
>
> Charlotte County Public Library
>


-- 
Ray Voelker
(937) 620-1830


Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...

2023-09-29 Thread Ray Voelker
It's Friday again, so another update on this project! lol

I put together what I think is a pretty good explanation and framing of the
problem and a possible solution!

https://chimpy.me/blog/

--Ray

On Fri, Sep 22, 2023 at 9:43 AM Ray Voelker  wrote:

> Hi code4lib folks .. and again ... happy Friday!!
>
> I just wanted to post an update to this. I wrote in to the Security Now!
> podcast (fantastic show by the way and fully worth listening to on a
> regular basis) about this notion, and it was made the main topic of show
> number 940!
>
> https://twit.tv/shows/security-now/episodes/940?autostart=false
>
> The discussion starts around the 1:36 mark.
>
> Here's what I wrote to Steve Gibson:
>
> In addition to being an avid listener to Security Now, I'm also a System
>> Administrator for a large public library system in Ohio. Libraries often
>> struggle with data—being especially sensitive around data related to
>> patrons and patron behavior in terms of borrowing, library program
>> attendance, reference questions, etc. The common practice is for libraries
>> to aggregate and then promptly destroy this data within a short time
>> frame—which is typically one month. However, administrators and local
>> government officials, who are often instrumental in allocating library
>> funding and guiding operational strategies, frequently ask questions on a
>> larger time scale than one month to validate the library's significance and
>> its operational strategies. Disaggregation of this data to answer these
>> types of questions is very difficult and arguably impossible. This puts
>> people like me, and many others like me, in a tough spot in terms of
>> storing and later using sensitive data to provide the answers to these
>> questions of pretty serious consequence—like, what should we spend money
>> on, or why we should continue to exist.
>
>
> I’m sure you’re aware, but there are many interesting historical reasons
>> for this sensitivity, and organizations like the American Library
>> Association (ALA) and other international library associations have even
>> codified the protection of patron privacy into their codes of ethics. For
>> example, the ALA's Code of Ethics states: "We protect each library user's
>> right to privacy and confidentiality with respect to information sought or
>> received and resources consulted, borrowed, acquired or transmitted." While
>> I deeply respect and admire this stance, it doesn't provide a solution for
>> those of us wrestling with the aforementioned existential questions.
>>
>
> In this context, I'd be immensely grateful if you could share your
>> insights on the technique of "Pseudonymization"
>> (https://en.wikipedia.org/wiki/Pseudonymization) for
>> PII data. Additionally, I'd appreciate a brief review of a Python module
>> I'm developing, which aims to assist me (and potentially other library
>> professionals) in retaining crucial data for subsequent analysis while
>> ensuring data subject privacy.
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6 Thank you once
>> again, Steve, for your invaluable contributions to the security community.
>> I eagerly await your feedback!
>>
>
> I think an even better solution than Pseudonymization involves
> the Birthday Paradox. It's a direction I hadn't even thought of for this!
>
> --Ray
>
> On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker  wrote:
>
>> Hi code4lib folks .. happy Friday!
>>
>> I started putting together a little Python utility for doing
>> Pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization).
>> The goal is to be able to do more analysis on data related to circulation
>> while securely maintaining patron privacy.
>>
>> For a little bit of background I wanted something *like a hash* (but
>> more secure than a hash), for replacing select fields related to patron
>> records. I also wanted something that could possibly be reversed given an
>> encrypted private key that would be stored well outside of the scope of the
>> project. I'm thinking that if you wanted to geocode addresses for example,
>> you could temporarily decrypt each field needed for the task, use the
>> *pseudonymized* patron id as the identifier, and then send your data off
>> to the geocoder of your choice. Another example would be to store a
>> pseudonymized patron id as the identifier in things like circulation data
>> used for later analysis, or for transmitting to trusted 3rd parties who may
>> do analysis for you.
>>
>> I'm humbly asking for anyone with some background in using encryption to
>> review the code I have and maybe offer some comments / concerns /
>> suggestions / jokes about this.
>>
>> Thanks in advance!
>>
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>>
>> --
>> Ray Voelker
>>
>
>
> --
> Ray Voelker
> (937) 620-1830
>


-- 
Ray Voelker
(937) 620-1830
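For readers who want something concrete, here is a minimal sketch of the "*like a hash* (but more secure than a hash)" idea from the Sept 15 message: a keyed HMAC-SHA256 pseudonym that replaces the patron id in circulation rows. This is not the code from the gist. The secret, field names, and sample rows are hypothetical, and the sketch is one-way only; the reversible, encrypted-field part of the idea (e.g. for geocoding) would need an encryption library and is not shown.

```python
import hashlib
import hmac

# Hypothetical app secret. In practice, load it from a protected store and
# never keep it alongside the processed data.
APP_SECRET = b"replace-with-a-long-random-secret"


def pseudonymize_id(patron_id: str, secret: bytes = APP_SECRET) -> str:
    """Return a stable keyed pseudonym (HMAC-SHA256 hex digest) for a patron id."""
    return hmac.new(secret, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_rows(rows, field="patron_id", secret: bytes = APP_SECRET):
    """Yield copies of the rows with `field` replaced by its pseudonym."""
    for row in rows:
        out = dict(row)  # copy so the source rows are left untouched
        out[field] = pseudonymize_id(str(row[field]), secret)
        yield out


# Hypothetical circulation rows.
circ = [
    {"patron_id": "1234567", "item": "b1000001", "date": "2023-09-01"},
    {"patron_id": "1234567", "item": "b1000002", "date": "2023-09-03"},
    {"patron_id": "7654321", "item": "b1000001", "date": "2023-09-04"},
]
safe = list(pseudonymize_rows(circ))

# The same patron always gets the same pseudonym, so per-patron counts survive.
print(safe[0]["patron_id"] == safe[1]["patron_id"])  # True
print(safe[0]["patron_id"] == safe[2]["patron_id"])  # False
```

Because the pseudonym is stable, circulation data can be grouped and counted per patron without storing the real id; the protection rests entirely on keeping the secret out of reach.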


Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...

2023-09-22 Thread Ray Voelker
Ben

Those are all super critical things to be aware of with data of this
nature. I think that's where having a protected app secret and mixing
other static record data (like a patron record num / id, creation date,
etc.) into a salt would be VERY important so that data is more shielded
from these types of issues and attacks.

But again, this data wouldn't (and shouldn't) be considered fully
anonymized, because given the database of known data subjects and the
application secrets, you could possibly build a lookup table and
re-attribute patrons to their past activity. Data (even encrypted data) is
of course only private so long as you can properly protect secrets--which
should NEVER be stored along with the processed data. With this in mind,
this type of technique is still very useful for being able to perform
statistical analysis on your library data, while still maintaining and
respecting patron privacy.

https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6#limitations
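A minimal sketch of the point above, under stated assumptions: as a stand-in for "mixing an app secret into a salt", HMAC-SHA256 is keyed with a hypothetical app secret and the message is built from static record fields (record number and creation date). The names, sample records, and secret are all hypothetical. The last lines show the stated limitation: whoever holds both the patron table and the secret can rebuild a lookup table and re-attribute activity.

```python
import hashlib
import hmac

# Hypothetical app secret, stored well away from the processed data.
APP_SECRET = b"replace-with-a-long-random-secret"


def pseudonym(record_num: int, created: str, secret: bytes = APP_SECRET) -> str:
    """Keyed pseudonym: the app secret is the HMAC key, and static record
    fields (record number, creation date) form the message being hashed."""
    message = f"{record_num}|{created}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


# Hypothetical table of known data subjects: (record number, creation date).
patrons = [(1001, "2019-04-02"), (1002, "2020-11-17"), (1003, "2021-01-05")]

# A processed dataset that carries only pseudonyms.
activity = [{"who": pseudonym(1002, "2020-11-17"), "checkouts": 14}]

# The limitation: anyone holding BOTH the patron table and the secret can
# rebuild a lookup table and re-attribute the activity.
lookup = {pseudonym(num, created): num for num, created in patrons}
print(lookup[activity[0]["who"]])  # 1002

# Without the real secret, the recomputed values do not match.
wrong = {pseudonym(num, created, b"guessed-secret"): num for num, created in patrons}
print(activity[0]["who"] in wrong)  # False
```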

--Ray

On Fri, Sep 22, 2023 at 4:11 PM Steinberg, Benjamin <
bsteinb...@law.harvard.edu> wrote:

> Sorry if this has already been mentioned in this thread or related links
> and I’ve missed it – I believe that anonymization of this sort can be
> broken when other data sources contain data relating to the anonymized
> subjects. The idea is called trail re-identification; see work by Latanya
> Sweeney, Bradley Malin, and Elaine Newton at
>
> https://dataprivacylab.org/people/sweeney/trails1.html
>
> and
>
> https://pubmed.ncbi.nlm.nih.gov/15196482/
>
> - Ben
>
> From: Code for Libraries  on behalf of Ray
> Voelker 
> Date: Friday, September 22, 2023 at 3:31 PM
> To: CODE4LIB@LISTS.CLIR.ORG 
> Subject: Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...
> > This could prevent the use of lists of known values (e.g. user email
> > addresses or IDs that have been harvested from public directories) to
> > calculate related hashes for comparison with those in a dataset -
> > enabling de-anonymization.
> >
>
> Basically the exact problem of "Known Plaintext" (
> https://en.wikipedia.org/wiki/Known-plaintext_attack)
>
> I wonder if doing something like basing the PBKDF2HMAC iteration count on
> some static integer related to the patron record (like the patron number or
> id in the local ILS) would be helpful. Really, though, I suspect that mixing
> an "app secret" into a salt built from other static parts of the patron
> record would be sufficient.
>
> --Ray
>
>
> On Fri, Sep 22, 2023 at 3:00 PM Karl Benedict  wrote:
>
> > Thanks for sharing this. I’m also a long time listener to Security Now
> > (along with a bunch of other TWiT network podcasts) and heard the
> > response to your question yesterday. It was great to hear Steve's deep
> > dive on a topic that I've done a little work on - fortunately confirming
> > the approach that I previously used in the analysis of our proxy logs to
> > troubleshoot an issue with Google Scholar blocking our proxy IP address.
> >
> > Listening yesterday made me think that a needed additional step in
> > generating the hashes from identifiable elements in our data is salting
> > the hashes (adding an additional constant but random value to the values
> > being hashed). This could prevent the use of lists of known values (e.g.
> > user email addresses or IDs that have been harvested from public
> > directories) to calculate related hashes for comparison with those in a
> > dataset - enabling de-anonymization.
> >
> >
> > Thanks, Karl
> >
> > 
> > From: Code for Libraries  on behalf of Ray
> > Voelker 
> > Sent: Friday, September 22, 2023 7:43:26 AM
> > To: CODE4LIB@LISTS.CLIR.ORG 
> > Subject: Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...
> >
> >
> > Hi code4lib folks .. and again ... happy Friday!!
> >
> > I just wanted to post an update to this. I wrote in to the Security Now!
> > podcast (fantastic show by the way and fully worth listening to on a
> > regular basis) about this notion, and it was made the main topic of show
> > number 940!
> >
> > https://twit.tv/shows/security-now/episodes/940?autostart=false
> >
> > The discussion starts around the 1

Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...

2023-09-22 Thread Ray Voelker
> This could prevent the use of lists of known values (e.g.  user email
> addresses or IDs that have been harvested from public directories) to
> calculate related hashes for comparison with those in a dataset - enabling
> de-anonymization.
>

Basically the exact problem of "Known Plaintext" (
https://en.wikipedia.org/wiki/Known-plaintext_attack)

I wonder if doing something like basing the PBKDF2HMAC iteration count on
some static integer related to the patron record (like the patron number or
id in the local ILS) would be helpful. Really, though, I suspect that mixing
an "app secret" into a salt built from other static parts of the patron
record would be sufficient.
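To illustrate the known-values problem Karl raises and the app-secret idea above, here is a hedged sketch using Python's standard-library `hashlib.pbkdf2_hmac`. It compares an unsalted SHA-256 of an email (recoverable by hashing a harvested list and comparing) with a PBKDF2 pseudonym whose salt mixes a hypothetical app secret with static record fields. The secret, iteration count, record fields, and sample emails are placeholders, not recommendations, and this is not the gist's code.

```python
import hashlib

# Hypothetical values; the secret must live outside the dataset, and the
# iteration count is illustrative rather than a recommendation.
APP_SECRET = b"replace-with-a-long-random-secret"
ITERATIONS = 100_000


def naive_hash(email: str) -> str:
    """Unsalted SHA-256: anyone can recompute this from a list of emails."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def salted_pseudonym(email: str, record_num: int, created: str) -> str:
    """PBKDF2-HMAC-SHA256 with a salt built from the app secret plus static record fields."""
    salt = APP_SECRET + f"|{record_num}|{created}".encode("utf-8")
    key = hashlib.pbkdf2_hmac("sha256", email.encode("utf-8"), salt, ITERATIONS)
    return key.hex()


# Hypothetical list of known values harvested from a public directory.
harvested = ["alice@example.org", "bob@example.org"]

stored_naive = naive_hash("bob@example.org")
stored_salted = salted_pseudonym("bob@example.org", 1002, "2020-11-17")

# Dictionary attack: hash every harvested value and look for matches.
guesses = {naive_hash(e): e for e in harvested}
print(guesses.get(stored_naive))   # bob@example.org  (re-identified)
print(guesses.get(stored_salted))  # None  (no match without the secret)
```

A per-record iteration count (e.g. derived from the patron number) could be swapped in, but because that number is known to anyone holding the record, this sketch relies on the secret in the salt instead.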

--Ray


On Fri, Sep 22, 2023 at 3:00 PM Karl Benedict  wrote:

> Thanks for sharing this. I’m also a long time listener to Security Now
> (along with a bunch of other TWiT network podcasts) and heard the response
> to your question yesterday. It was great to hear Steve's deep dive on a
> topic that I've done a little work on - fortunately confirming the approach
> that I previously used in the analysis of our proxy logs to troubleshoot an
> issue with Google Scholar blocking our proxy IP address.
>
> Listening yesterday made me think that a needed additional step in
> generating the hashes from identifiable elements in our data is salting the
> hashes (adding an additional constant but random value to the values being
> hashed). This could prevent the use of lists of known values (e.g. user
> email addresses or IDs that have been harvested from public directories) to
> calculate related hashes for comparison with those in a dataset - enabling
> de-anonymization.
>
>
> Thanks, Karl
>
> 
> From: Code for Libraries  on behalf of Ray
> Voelker 
> Sent: Friday, September 22, 2023 7:43:26 AM
> To: CODE4LIB@LISTS.CLIR.ORG 
> Subject: Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...
>
>
> Hi code4lib folks .. and again ... happy Friday!!
>
> I just wanted to post an update to this. I wrote in to the Security Now!
> podcast (fantastic show by the way and fully worth listening to on a
> regular basis) about this notion, and it was made the main topic of show
> number 940!
>
> https://twit.tv/shows/security-now/episodes/940?autostart=false
>
> The discussion starts around the 1:36 mark.
>
> Here's what I wrote to Steve Gibson:
>
> In addition to being an avid listener to Security Now, I'm also a System
> > Administrator for a large public library system in Ohio. Libraries often
> > struggle with data—being especially sensitive around data related to
> > patrons and patron behavior in terms of borrowing, library program
> > attendance, reference questions, etc. The common practice is for
> > libraries to aggregate and then promptly destroy this data within a short time
> > frame—which is typically one month. However, administrators and local
> > government officials, who are often instrumental in allocating library
> > funding and guiding operational strategies, frequently ask questions on a
> > larger time scale than one month to validate the library's significance
> > and its operational strategies. Disaggregation of this data to answer these
> > types of questions is very difficult and arguably impossible. This puts
> > people like me, and many others like me, in a tough spot in terms of
> > storing and later using sensitive data to provide the answers to these
> > questions of pretty serious consequence—like, what should we spend money
> > on, or why we should continue to exist.
>
>
> I’m sure you’re aware, but there are many interesting historical reasons
> > for this sensitivity, and organizations like the American Library
> > Association (ALA) and other international library associations have even
> > codified the protection of patron privacy into their codes of ethics. For
> > example, the ALA's Code of Ethics states: "We protect each library user's
> > right to privacy and confidentiality with respect to information sought
> > or received and resources consulted, borrowed, acquired or transmitted."
> > While I deeply respect and admire this stance, it doesn't provide a
> > solution for those of us wrestling with the aforementioned existential
> > questions.
> >
>
> In this context, I'd be immensely grateful if you could share your

Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...

2023-09-22 Thread Ray Voelker
It should be noted that data produced with this "Stochastic Pseudonymization"
technique likely still falls under the category of "PII".

I put in the other README that "It's essential to understand that even
pseudonymized data remains within the realm of personal data as per the
GDPR and many other regulations and laws. This categorization is because
such data can be linked back to an individual when complemented with
supplementary details".

Because you can take a known value, like an email address, run it through
the Stochastic Pseudonymization process, and get back a deterministic
result, you could *still* link the data back to its original identifier. I
think that's why it's important to have an "app secret" that you keep as
protected as possible.

I'm not sure if there's anything you could do to solve that type of
problem, other than protecting your "app secret" that should go into the
salt for the hash.
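The determinism described above can be seen in a toy sketch: a known email always maps to the same truncated, keyed pseudonym, so anyone holding the secret can still link rows back to it, yet with few enough bits that value may be shared by other patrons. The secret, the 8-bit width, and the synthetic population are hypothetical and chosen only so collisions are easy to see; this is not the Colab implementation.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; 8 bits is deliberately tiny so collisions show up
# in a toy population.
APP_SECRET = b"replace-with-a-long-random-secret"
BITS = 8


def truncated_pseudonym(email: str, bits: int = BITS, secret: bytes = APP_SECRET) -> int:
    """Keep only the top `bits` bits of a keyed HMAC-SHA256 digest."""
    digest = hmac.new(secret, email.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


# Synthetic population of 1,000 patrons sharing 2**8 = 256 possible values.
population = [f"patron{i}@example.org" for i in range(1000)]
buckets = defaultdict(list)
for email in population:
    buckets[truncated_pseudonym(email)].append(email)

target = "patron42@example.org"
value = truncated_pseudonym(target)

# Deterministic: recomputing from the known email gives the same value,
# so rows can still be linked to it (hence still personal data) ...
print(value == truncated_pseudonym(target))  # True
# ... but other patrons may share that value, making the link ambiguous.
print(len(buckets[value]))
```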

--Ray



On Fri, Sep 22, 2023 at 1:56 PM Ray Voelker  wrote:

> I hereby dub this technique "Stochastic Pseudonymization" :-)
>
> Here's a quick implementation.
>
> https://colab.research.google.com/gist/rayvoelker/74278aa82ee95e3c6dbf0caa993f1ebe/stochastic_pseudonymization.ipynb
>
> I think the trick, of course, is picking a number of bits that
> sufficiently covers the size of your patron population while still
> introducing a chance of collision into your data--in order, as Steve put
> it, "... to leave a *modicum of deliberate uncertainty*, introduced by
> hashing collision, to keep it from being possible to prove that any one
> patron had some specific behavior in the past".
>
> --Ray
>
> On Fri, Sep 22, 2023 at 11:44 AM Hammer, Erich F  wrote:
>
>> Ray,
>>
>> I liked your original pseudonymous idea and was thinking about trying
>> to re-write it in PowerShell, but looking into the "birthday paradox"
>> comment opens some very interesting possibilities.
>>
>> As I understand it, you can uniquely identify someone using just a
>> portion of a hash of their unique ID (e.g. email address).  Even better for
>> us in library land is that if you use just the right number of bits from
>> the hash, you can create a legal-level of plausible deniability while
>> maintaining statistically valid data.  For example, with the right
>> calculations, you can introduce enough of a chance of two patrons having
>> the same "anonymized" identifiers (calculated from a hash of their unique
>> ID) that a patron can't be definitively identified, but at the same time,
>> the vast majority of the identifiers will be unique and thus any statistics
>> will still be highly accurate.
>>
>> Good stuff!
>>
>> Thanks,
>>
>> Erich
>>
>>
>>
>> On Friday, September 22, 2023 at 09:43, Ray Voelker eloquently inscribed:
>>
>> > Hi code4lib folks .. and again ... happy Friday!!
>> >
>> > I just wanted to post an update to this. I wrote in to the Security Now!
>> > podcast (fantastic show by the way and fully worth listening to on a
>> > regular basis) about this notion, and it was made the main topic of show
>> > number 940!
>> >
>> > https://twit.tv/shows/security-now/episodes/940?autostart=false
>> >
>> > The discussion starts around the 1:36 mark.
>> >
>> > Here's what I wrote to Steve Gibson:
>> >
>> > In addition to being an avid listener to Security Now, I'm also a System
>> >> Administrator for a large public library system in Ohio. Libraries
>> >> often struggle with data—being especially sensitive around data related
>> >> to patrons and patron behavior in terms of borrowing, library program
>> >> attendance, reference questions, etc. The common practice is for
>> >> libraries to aggregate and then promptly destroy this data within a
>> >> short time frame—which is typically one month. However, administrators
>> >> and local government officials, who are often instrumental in
>> >> allocating library funding and guiding operational strategies,
>> >> frequently ask questions on a larger time scale than one month to
>> >> validate the library's significance and its operational strategies.
>> >> Disaggregation of this data to answer these types of questions is very
>> >> difficult and arguably impossible. This puts people like me, and many
>> >> others like me, in a tough spot in terms of storing and later using
>> >> sensitive data to provide the answers to these questions of pretty
>> >> ser

Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...

2023-09-22 Thread Ray Voelker
I hereby dub this technique "Stochastic Pseudonymization" :-)

Here's a quick implementation.
https://colab.research.google.com/gist/rayvoelker/74278aa82ee95e3c6dbf0caa993f1ebe/stochastic_pseudonymization.ipynb

I think the trick, of course, is picking a number of bits that
sufficiently covers the size of your patron population while still
introducing a chance of collision into your data--in order, as Steve put
it, "... to leave a *modicum of deliberate uncertainty*, introduced by
hashing collision, to keep it from being possible to prove that any one
patron had some specific behavior in the past".
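One way to pick the number of bits is standard birthday-problem arithmetic, assuming the truncated hash behaves like a uniform random function. For n patrons and m = 2^bits possible values, the chance that a given patron shares its value with at least one other is 1 - (1 - 1/m)^(n-1), and the expected number of distinct values is m * (1 - (1 - 1/m)^n). The sketch below computes both and returns the smallest width that meets a hypothetical accuracy target; the patron count and threshold are illustrative, not recommendations.

```python
import math


def collision_stats(n: int, bits: int) -> dict:
    """Birthday-problem estimates for n patrons mapped uniformly into 2**bits values."""
    m = 2 ** bits
    log_keep = math.log1p(-1 / m)  # log(1 - 1/m), computed stably for large m
    # Chance that a given patron's value is shared by at least one other patron.
    p_shared = -math.expm1((n - 1) * log_keep)
    # Expected number of distinct values actually used by the n patrons.
    expected_distinct = -m * math.expm1(n * log_keep)
    return {
        "bits": bits,
        "p_shared": p_shared,
        "distinct_ratio": expected_distinct / n,  # closeness of distinct counts to n
    }


def smallest_bits(n: int, min_distinct_ratio: float) -> dict:
    """Smallest width (most deliberate uncertainty) whose expected distinct
    count stays within the accuracy target."""
    for bits in range(1, 65):
        stats = collision_stats(n, bits)
        if stats["distinct_ratio"] >= min_distinct_ratio:
            return stats
    raise ValueError("no width up to 64 bits meets the accuracy target")


# Hypothetical patron count and accuracy target.
choice = smallest_bits(500_000, 0.95)
print(choice["bits"], round(choice["p_shared"], 3))
```

A smaller width means more deliberate uncertainty per patron but less accurate counts, so the accuracy target effectively sets how much plausible deniability you can afford.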

--Ray

On Fri, Sep 22, 2023 at 11:44 AM Hammer, Erich F  wrote:

> Ray,
>
> I liked your original pseudonymous idea and was thinking about trying to
> re-write it in PowerShell, but looking into the "birthday paradox" comment
> opens some very interesting possibilities.
>
> As I understand it, you can uniquely identify someone using just a portion
> of a hash of their unique ID (e.g. email address).  Even better for us in
> library land is that if you use just the right number of bits from the
> hash, you can create a legal-level of plausible deniability while
> maintaining statistically valid data.  For example, with the right
> calculations, you can introduce enough of a chance of two patrons having
> the same "anonymized" identifiers (calculated from a hash of their unique
> ID) that a patron can't be definitively identified, but at the same time,
> the vast majority of the identifiers will be unique and thus any statistics
> will still be highly accurate.
>
> Good stuff!
>
> Thanks,
>
> Erich
>
>
>
> On Friday, September 22, 2023 at 09:43, Ray Voelker eloquently inscribed:
>
> > Hi code4lib folks .. and again ... happy Friday!!
> >
> > I just wanted to post an update to this. I wrote in to the Security Now!
> > podcast (fantastic show by the way and fully worth listening to on a
> > regular basis) about this notion, and it was made the main topic of show
> > number 940!
> >
> > https://twit.tv/shows/security-now/episodes/940?autostart=false
> >
> > The discussion starts around the 1:36 mark.
> >
> > Here's what I wrote to Steve Gibson:
> >
> > In addition to being an avid listener to Security Now, I'm also a System
> >> Administrator for a large public library system in Ohio. Libraries
> >> often struggle with data—being especially sensitive around data related
> >> to patrons and patron behavior in terms of borrowing, library program
> >> attendance, reference questions, etc. The common practice is for
> >> libraries to aggregate and then promptly destroy this data within a
> >> short time frame—which is typically one month. However, administrators
> >> and local government officials, who are often instrumental in
> >> allocating library funding and guiding operational strategies,
> >> frequently ask questions on a larger time scale than one month to
> >> validate the library's significance and its operational strategies.
> >> Disaggregation of this data to answer these types of questions is very
> >> difficult and arguably impossible. This puts people like me, and many
> >> others like me, in a tough spot in terms of storing and later using
> >> sensitive data to provide the answers to these questions of pretty
> >> serious consequence—like, what should we spend money on, or why we
> >> should continue to exist.
> >
> > I’m sure you’re aware, but there are many interesting historical reasons
> >> for this sensitivity, and organizations like the American Library
> >> Association (ALA) and other international library associations have
> >> even codified the protection of patron privacy into their codes of
> >> ethics. For example, the ALA's Code of Ethics states: "We protect each
> >> library user's right to privacy and confidentiality with respect to
> >> information sought or received and resources consulted, borrowed,
> >> acquired or transmitted." While I deeply respect and admire this
> >> stance, it doesn't provide a solution for those of us wrestling with
> >> the aforementioned existential questions.
> >>
> >
> > In this context, I'd be immensely grateful if you could share your
> >> insights on the technique of "Pseudonymization"
> >> (https://en.wikipedia.org/wiki/Pseudonymization) for
> >> PII data. Additionally, I'd appreciate a brief review of a Python
> >> module I'm developing, which aims to assist me (and potentially other

Re: [CODE4LIB] Patron Data Pseudonymization Review Request ...

2023-09-22 Thread Ray Voelker
Hi code4lib folks .. and again ... happy Friday!!

I just wanted to post an update to this. I wrote in to the Security Now!
podcast (fantastic show by the way and fully worth listening to on a
regular basis) about this notion, and it was made the main topic of show
number 940!

https://twit.tv/shows/security-now/episodes/940?autostart=false

The discussion starts around the 1:36 mark.

Here's what I wrote to Steve Gibson:

In addition to being an avid listener to Security Now, I'm also a System
> Administrator for a large public library system in Ohio. Libraries often
> struggle with data—being especially sensitive around data related to
> patrons and patron behavior in terms of borrowing, library program
> attendance, reference questions, etc. The common practice is for libraries
> to aggregate and then promptly destroy this data within a short time
> frame—which is typically one month. However, administrators and local
> government officials, who are often instrumental in allocating library
> funding and guiding operational strategies, frequently ask questions on a
> larger time scale than one month to validate the library's significance and
> its operational strategies. Disaggregation of this data to answer these
> types of questions is very difficult and arguably impossible. This puts
> people like me, and many others like me, in a tough spot in terms of
> storing and later using sensitive data to provide the answers to these
> questions of pretty serious consequence—like, what should we spend money
> on, or why we should continue to exist.


> I’m sure you’re aware, but there are many interesting historical reasons
> for this sensitivity, and organizations like the American Library
> Association (ALA) and other international library associations have even
> codified the protection of patron privacy into their codes of ethics. For
> example, the ALA's Code of Ethics states: "We protect each library user's
> right to privacy and confidentiality with respect to information sought or
> received and resources consulted, borrowed, acquired or transmitted." While
> I deeply respect and admire this stance, it doesn't provide a solution for
> those of us wrestling with the aforementioned existential questions.
>

> In this context, I'd be immensely grateful if you could share your insights
> on the technique of "Pseudonymization" (https://en.wikipedia.org/wiki/Pseudonymization <https://t.co/gVKvpmzoxp>) for PII
> data. Additionally, I'd appreciate a brief review of a Python module I'm
> developing, which aims to assist me (and potentially other library
> professionals) in retaining crucial data for subsequent analysis while
> ensuring data subject privacy. https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6 <https://t.co/aAapRKgElr> Thank you once
> again, Steve, for your invaluable contributions to the security community.
> I eagerly await your feedback!
>

I think an even better solution than pseudonymization involves the
Birthday Paradox. It's a direction I hadn't even thought of for this!
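The Birthday Paradox approach isn't described in detail here, so the sketch below is only one possible reading (an assumption, not necessarily what was discussed on the show): map each patron ID into a deliberately small number of buckets with a keyed hash, so collisions are guaranteed and no bucket points to a single patron. The standard birthday approximation then estimates how likely IDs are to collide. The function names, key, and bucket counts are hypothetical.

```python
import hashlib
import hmac
import math


def bucket_id(patron_id: str, key: bytes, num_buckets: int) -> int:
    """Map a patron ID to one of `num_buckets` buckets with a keyed hash.

    A small `num_buckets` is deliberate: many patrons land in each bucket,
    so a bucket number alone does not single out one person. (The modulo
    adds a negligible bias when num_buckets is tiny compared with 2**64.)
    """
    digest = hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def collision_probability(n: int, num_buckets: int) -> float:
    """Birthday approximation: P(at least two of n IDs share a bucket).

    P ~= 1 - exp(-n * (n - 1) / (2 * num_buckets))
    """
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * num_buckets))


if __name__ == "__main__":
    # Classic check: 23 people, 365 "buckets" gives roughly a 50% chance of a shared birthday.
    print(round(collision_probability(23, 365), 3))
    # Hypothetical sizing: 100,000 patrons spread over 4,096 buckets.
    print(100_000 / 4_096, "patrons per bucket on average")
```

With numbers like these, each bucket holds about two dozen patrons on average, which is the kind of built-in ambiguity this reading relies on.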

--Ray

On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker  wrote:

> Hi code4lib folks .. happy Friday!
>
> I started putting together a little Python utility for doing
> Pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization).
> The goal is to be able to do more analysis on data related to circulation
> while securely maintaining patron privacy.
>
> For a little bit of background I wanted something *like a hash* (but more
> secure than a hash), for replacing select fields related to patron records.
> I also wanted something that could possibly be reversed given an encrypted
> private key that would be stored well outside of the scope of the project.
> I'm thinking that if you wanted to geocode addresses for example, you could
> temporarily decrypt each field needed for the task, use the
> *pseudonymized* patron id as the identifier, and then send your data off
> to the geocoder of your choice. Another example would be to store a
> pseudonymized patron id as the identifier in things like circulation data
> used for later analysis, or for transmitting to trusted 3rd parties who may
> do analysis for you.
>
> I'm humbly asking for anyone with some background in using encryption to
> review the code I have and maybe offer some comments / concerns /
> suggestions / jokes about this.
>
> Thanks in advance!
>
> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>
> --
> Ray Voelker
>


-- 
Ray Voelker
(937) 620-1830


[CODE4LIB] Patron Data Pseudonymization Review Request ...

2023-09-15 Thread Ray Voelker
Hi code4lib folks .. happy Friday!

I started putting together a little Python utility for doing
Pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization).
The goal is to be able to do more analysis on data related to circulation
while securely maintaining patron privacy.

For a little bit of background I wanted something *like a hash* (but more
secure than a hash), for replacing select fields related to patron records.
I also wanted something that could possibly be reversed given an encrypted
private key that would be stored well outside of the scope of the project.
I'm thinking that if you wanted to geocode addresses for example, you could
temporarily decrypt each field needed for the task, use the *pseudonymized*
patron id as the identifier, and then send your data off to the geocoder of
your choice. Another example would be to store a pseudonymized patron id as
the identifier in things like circulation data used for later analysis, or
for transmitting to trusted 3rd parties who may do analysis for you.
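A minimal sketch of the "like a hash, but more secure" idea, assuming a keyed hash (HMAC-SHA256 from Python's standard library) is acceptable. This is not the code in the gist linked in this message: an HMAC pseudonym stays stable for joins and can't be recomputed from a guessed patron ID without the key, but it is one-way, so the reversible part would need something else, such as deterministic encryption or a key-protected lookup table (not shown). `pseudonymize` and the sample rows are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable pseudonym for `value` using HMAC-SHA256.

    Same value + same key gives the same pseudonym, so circulation rows can
    still be grouped by patron. Without the key, nobody can confirm a guessed
    patron ID by hashing it, which is the weakness of a plain unkeyed hash.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # In practice the key would be generated once and stored well outside the dataset.
    key = secrets.token_bytes(32)
    # Hypothetical circulation rows: (patron id, checkout date).
    circ = [("1234567", "2023-09-01"), ("7654321", "2023-09-02"), ("1234567", "2023-09-03")]
    for patron_id, checkout_date in circ:
        print(pseudonymize(patron_id, key), checkout_date)
```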

I'm humbly asking for anyone with some background in using encryption to
review the code I have and maybe offer some comments / concerns /
suggestions / jokes about this.

Thanks in advance!

https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6

-- 
Ray Voelker


Re: [CODE4LIB] ezmesure + ezpaarse folks?

2023-02-27 Thread Ray Voelker
Hey Anson,
I just set up ezPAARSE from their Docker image and ran a bunch of our
EZproxy logs through it. It seems to work great! It'll even accept logs
that have been gzip-compressed!

I'm not sure if I'll make the effort to automate it (I'm at a public
library, and the need to analyze the logs is probably less than that of an
academic library).

I'm not yet at the stage of setting up ezMESURE, but it's been really
interesting and helpful to load the ezPAARSE output into SQLite and run
some basic queries on it!
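A rough sketch of the SQLite step using only Python's standard library. It assumes the ezPAARSE output is a semicolon-delimited CSV with a header row and a `platform` column; those column names are assumptions, so check the header of your own output first. The function names and sample rows are hypothetical.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_ezpaarse_csv(conn: sqlite3.Connection, fh: TextIO,
                      table: str = "events", delimiter: str = ";") -> int:
    """Load a delimited file into an SQLite table (all columns as TEXT).

    The header row supplies the column names. Returns the number of rows inserted.
    """
    reader = csv.reader(fh, delimiter=delimiter)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    cur = conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return cur.rowcount


def top_platforms(conn: sqlite3.Connection, table: str = "events", limit: int = 10):
    """Count events per platform, most-used first."""
    return conn.execute(
        f'SELECT platform, COUNT(*) AS n FROM "{table}" '
        "GROUP BY platform ORDER BY n DESC, platform LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    sample = (
        "datetime;platform;rtype\n"
        "2023-02-01T10:00:00;jstor;ARTICLE\n"
        "2023-02-01T10:05:00;jstor;TOC\n"
        "2023-02-02T09:00:00;ebsco;ARTICLE\n"
    )
    conn = sqlite3.connect(":memory:")
    load_ezpaarse_csv(conn, io.StringIO(sample))
    print(top_platforms(conn))  # [('jstor', 2), ('ebsco', 1)]
```

Storing everything as TEXT keeps the loader generic; you can cast dates or counts in the queries themselves.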

--Ray

On Mon, Feb 27, 2023 at 2:54 PM parker, anson D (adp6j) <
00d5611add13-dmarc-requ...@lists.clir.org> wrote:

> is anyone here working with ezmesure and ezpaarse for their ezproxy logs?
> i'm getting it set up and would appreciate getting some feedback from any
> devs who have rolled this out recently
>
> seems like a great set of tools, but i'm having a few problems getting the
> whole thing wired up and it'd be great to talk to someone else running it
>
> thanks
> anson



-- 
Ray Voelker
(937) 620-1830


Re: [CODE4LIB] Creating Custom Maps

2022-03-17 Thread Ray Voelker
en
> > > you
> > > > create a custom area map. But, when I remove destination points within
> > > > that, it also removes the library name.
> > > >
> > > > I’ve toggled through Options, Car, bike, walk and transit icons but none
> > > > produce the custom map I’d like without leaving the word Restaurant or Sign
> > > > In prominently on the map.
> > > >
> > > > MapQuest seems even more bizarre.
> > > >
> > > > Has anyone found an online application or free downloadable program which
> > > > will create custom maps for basically a square area where you can include
> > > > just one destination?
> > > >
> > > > Thank you!
> > > >
> > > > Charles.
> > > >
> > > > Charles Meyer
> > > > Charlotte County Public Library
> > > > Port Charlotte, FL 33952
> > > >
> > >
> >
>
>
> --
> Christine Murray
> Social Science Librarian
> Bates College Library
> 48 Campus Ave.
> Lewiston, ME  04240
>
> (207)786-6268
>


-- 
Ray Voelker
(937) 620-1830


Re: [CODE4LIB] MS word doc table to Excel

2020-09-16 Thread Ray Voelker
If you're a little handy with Python (or have someone who is), you could read
the contents of a table in the Word doc, convert it to a pandas DataFrame,
and then it's pretty simple to export the DataFrame to an Excel spreadsheet
with its to_excel() method.

Take a look at this:
https://stackoverflow.com/questions/47977367/how-to-create-a-dataframe-from-a-table-in-a-word-document-docx-file-using-pan?noredirect=1=1

It could be handy to do this in a little script if you're going to be doing it
more than once.
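A minimal sketch of that approach, assuming the python-docx package (imported as `docx`) and an Excel writer engine such as openpyxl are installed. It treats each table's first row as the header and writes each table to its own sheet. The function names and file names are hypothetical, and merged cells may show up as repeated text.

```python
import pandas as pd


def table_rows_to_dataframe(rows: list[list[str]]) -> pd.DataFrame:
    """Use the first row as column headers and the remaining rows as data."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)


def docx_tables_to_excel(docx_path: str, xlsx_path: str) -> int:
    """Write every table in a .docx file to its own sheet of an .xlsx file.

    Returns the number of tables written.
    """
    # Imported here so table_rows_to_dataframe works even without python-docx.
    from docx import Document

    document = Document(docx_path)
    if not document.tables:
        return 0  # nothing to write; avoid creating an empty workbook
    with pd.ExcelWriter(xlsx_path) as writer:
        for i, table in enumerate(document.tables, start=1):
            rows = [[cell.text.strip() for cell in row.cells] for row in table.rows]
            table_rows_to_dataframe(rows).to_excel(writer, sheet_name=f"Table{i}", index=False)
    return len(document.tables)
```

Usage would be something like `docx_tables_to_excel("tables.docx", "tables.xlsx")`.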

Good luck!
--Ray

On Wed, Sep 16, 2020 at 11:16 AM Tim McMahon  wrote:

> You should be able to highlight and copy from the Word document and
> paste into the Excel document.  Or did you want it automated somehow?
>
> On 9/16/20 10:11 AM, Amy Schuler wrote:
> > Hi all,
> > does anyone know of a way to export (or scrape) the contents of a Word
> doc
> > text table into Excel or some other spreadsheet application like Google
> > sheets?  I know this is a long shot.
> > thanks!
> >
> --
> Tim McMahon
> West Liberty Public Library
> 319-627-2084
>


-- 
Ray Voelker
(937) 620-1830


Re: [CODE4LIB] file sharing/transfer

2020-01-15 Thread Ray Voelker
One approach I really wish would catch on more is public-key
encryption using GPG (GnuPG, a free implementation of the OpenPGP standard,
often just called PGP). You can combine it with an email
client like Thunderbird, which has some good tools that use PGP keys
(the Enigmail plugin) to securely send attachments. If you encrypt and sign
something like a zip file using GPG, you can also share that file
via services like Dropbox or Google Drive and still be assured that only
the recipient will be able to decrypt and see the contents. Here's the
link to the Mozilla support page for Digitally Signing and Encrypting
Messages:
https://support.mozilla.org/en-US/kb/digitally-signing-and-encrypting-messages#w_installing-gpg-and-enigmail

The only disadvantage is that you must exchange GPG public keys
with your recipient (or use a key server) and have a little bit of
understanding about how the tool works. But it's all open source, free,
time-tested, and has some decent tools for encrypting and decrypting.

I believe there are plans this year to have GPG support built directly
into the Thunderbird client, so it should get even easier!
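If you want to script the encrypt-and-sign step, here's a minimal sketch that shells out to the gpg command line from Python. It assumes gpg is installed and the recipient's public key is already in your keyring; the function names, file name, and address are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_gpg_command(path: str, recipient: str, armor: bool = False) -> list[str]:
    """Build a gpg command that signs and encrypts `path` for `recipient`.

    Output goes next to the input with a .gpg suffix (.asc when ASCII-armored).
    """
    output = str(Path(path)) + (".asc" if armor else ".gpg")
    cmd = ["gpg", "--sign", "--encrypt", "--recipient", recipient, "--output", output]
    if armor:
        cmd.insert(1, "--armor")
    cmd.append(path)  # the input file goes last
    return cmd


def sign_and_encrypt(path: str, recipient: str, armor: bool = False) -> str:
    """Run gpg and return the path of the encrypted file."""
    if shutil.which("gpg") is None:
        raise FileNotFoundError("gpg is not installed or not on PATH")
    cmd = build_gpg_command(path, recipient, armor)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if gpg fails
    return cmd[cmd.index("--output") + 1]
```

For example, `sign_and_encrypt("research.zip", "colleague@example.org")` would produce `research.zip.gpg`, which can then go on Dropbox or Google Drive; the recipient decrypts it with `gpg --output research.zip --decrypt research.zip.gpg`.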

Good luck!

--Ray

On Wed, Jan 15, 2020 at 12:46 PM Mike Kastellec  wrote:

> One answer: https://www.globus.org/data-transfer
> - - - - - - - -
> Mike Kastellec + makas...@ncsu.edu + 919-513-2176 + My Calendar
> <
> https://calendar.google.com/calendar/embed?src=makastel%40ncsu.edu=America/New_York
> >
> Associate Head of Information Technology, NC State University Libraries
> <http://lib.ncsu.edu/>
>
>
> On Wed, Jan 15, 2020 at 12:14 PM Elizabeth Leonard <
> elizabeth.leon...@shu.edu> wrote:
>
> > Possibly both- this is a bit of a thought experiment. I'd like to know
> > what's out there to be able to help learn enough to advise and support
> our
> > faculty.
> >
> > Elizabeth Leonard
> > 973-761-9445
> >
> > -Original Message-
> > From: Code for Libraries  On Behalf Of Goben,
> > Abigail H
> > Sent: Wednesday, January 15, 2020 12:06 PM
> > To: CODE4LIB@LISTS.CLIR.ORG
> > Subject: Re: [CODE4LIB] file sharing/transfer
> >
> > Could you clarify the level of security you're dealing with?  Is this
> > where you need HIPAA compliance? PHI? Sensitive personal information?
> >
> > --
> > Abigail H. Goben, MLS
> > ago...@uic.edu
> >
> > -Original Message-
> > From: Code for Libraries  On Behalf Of
> Elizabeth
> > Leonard
> > Sent: Wednesday, January 15, 2020 11:03 AM
> > To: CODE4LIB@LISTS.CLIR.ORG
> > Subject: [CODE4LIB] file sharing/transfer
> >
> > Hi all:
> >
> > Let's say your faculty have research files that they want to securely share
> > with researchers at another academic institution (say, on another
> > continent).
> >
> > What are secure ways that they can do this? An example I've heard of is
> > Cyberduck- anything else?
> >
> > We are hoping for reasonably priced solutions (I know... secure,
> > reasonably priced, and effective... can't have all of them- but hoping
> > anyway).
> >
> > Thanks!
> >
> > Elizabeth Leonard
> > Assistant Dean, Information Technologies and Collections Services Seton
> > Hall University
> > 400 South Orange Avenue
> > South Orange, NJ 07079
> > 973-761-9445
> > Preferred pronouns: She, her, hers
> >
>


-- 
Ray Voelker
(937) 620-1830


Re: [CODE4LIB] compare LC call number

2019-12-07 Thread Ray Voelker
This might work for you... it has a normalization feature, and lets you
then compare the normalized LC call numbers.

https://github.com/rayvoelker/js-loc-callnumbers
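For anyone who'd rather do this in Python, here's a simplified sketch of the normalize-then-compare idea. It is not a port of js-loc-callnumbers: it only splits out the class letters, the class number, and Cutter-style letter-plus-digit pairs, and it compares Cutter digits as decimal fractions (so .P9 sorts before .P98). Dates, volume numbers, and other suffixes are only used as a final plain-text tie-breaker. The names are hypothetical.

```python
import re
from typing import Tuple

# Class letters, class number (optional decimal part), then everything else.
_LC_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)\s*(.*)$")
# A Cutter is a letter followed by digits, e.g. ".P98" or "L8".
_CUTTER_PATTERN = re.compile(r"([A-Z])(\d+)")


def lc_sort_key(call_number: str) -> Tuple[str, float, Tuple[Tuple[str, str], ...], str]:
    """Return a tuple that sorts LC call numbers in shelf order (simplified)."""
    match = _LC_PATTERN.match(call_number.upper())
    if match is None:
        raise ValueError(f"Not an LC call number: {call_number!r}")
    letters, class_number, rest = match.groups()
    # Cutter digits stay strings: lexicographic order on digit strings
    # matches decimal-fraction order (.45 < .5 and "45" < "5").
    cutters = tuple(_CUTTER_PATTERN.findall(rest))
    return (letters, float(class_number), cutters, rest)


if __name__ == "__main__":
    calls = ["QA76.9 .D3 C67", "QA76.73 .P98 L8", "PN1993.5 .U6 A85", "QA76.73 .J38 S3", "QA9 .B3"]
    for call in sorted(calls, key=lc_sort_key):
        print(call)
```

Comparing two call numbers is then just `lc_sort_key(a) < lc_sort_key(b)`.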

--Ray

On Fri, Dec 6, 2019 at 7:46 PM Dhanushka Samarakoon 
wrote:

> Hi Kun,
>
> This is a JS that I wrote sometime back to normalize LC call numbers.
>
> https://gist.github.com/dhanushka-samarakoon/1f0e080cbca48b4a53a7b988a9b020d2
> Even though it does not directly address comparison, you can normalize and
> then compare.
>
> -Dhanushka.
>
> On Thu, Dec 5, 2019 at 5:13 PM Kun Lin  wrote:
>
> > Are there any good existing scripts for comparing LC call numbers to
> > determine which one should come first? JavaScript or PHP.
> >
> > Thanks a lot
> >
> > Kun Lin
> > Systems and Application Librarian
> > Whitman College
> >
> > PGP Public Key https://kj7ieg.com/dnwklin_public.txt
> >
>


-- 
Ray Voelker
(937) 620-1830


Re: [CODE4LIB] python 2 versus python 3

2018-03-07 Thread Ray Voelker
I was just talking to a friend of mine about Python stuff, as we've both
just started using it for some library-related projects.

We were looking into the virtual environment options for Python, and I
didn't realize that there were so many ways to go!

https://stackoverflow.com/questions/41573587/what-is-the-difference-between-venv-pyvenv-pyenv-virtualenv-virtualenvwrappe

My two cents is that virtualenv seems fairly easy to use, is fairly
popular, and gives you a sandboxed Python environment.
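virtualenv is a third-party package; since Python 3.3 the standard library has also shipped a similar `venv` module. Here's a minimal sketch using `venv` (swapped in for virtualenv) to create a sandboxed environment and locate its interpreter. The `make_env` name is hypothetical.

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment in `env_dir` and return the path to its Python."""
    venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


if __name__ == "__main__":
    # Throwaway demo environment; skip pip to keep it fast and offline.
    with tempfile.TemporaryDirectory() as tmp:
        python = make_env(str(Path(tmp) / ".venv"), with_pip=False)
        print("environment interpreter:", python)
```

From a shell, the everyday equivalent is `python3 -m venv .venv` followed by `source .venv/bin/activate` on Linux or macOS.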

--Ray

On Mar 7, 2018 8:57 PM, "Peter Murray"  wrote:

> For what it's worth, I recently ran across this article about setting up
> sane Python development environments:
>
>   https://jacobian.org/writing/python-environment-2018/
>
>
> Peter
>
> On Mar 7, 2018, 4:54 PM -0500, Jay Luker , wrote:
> > I would add a recommendation for pyenv as a way to manage multiple
> versions
> > of python on a machine. Very helpful, particularly if you need to run
> > tests under multiple versions with something like tox.
> >
> > https://github.com/pyenv/pyenv
> >
> > —jay
> >
> > On Wed, Mar 7, 2018 at 3:35 PM Ed Summers  wrote:
> >
> > > I agree. Third party support for Python3 is pretty good now. But if you
> > > have any dependencies you know you're going to need it's a good idea to
> > > check beforehand.
> > >
> > > There's also the six module if you want to be able to say you support 2
> > > and 3, and want a nice way of papering over the differences.
> > >
> > > http://six.readthedocs.io/
> > >
> > > //Ed
> > >
> > > > On Mar 7, 2018, at 3:31 PM, Tod Olson  wrote:
> > > >
> > > > I'd suggest Python 3.
> > > >
> > > > There are mechanisms for managing virtual environments for Python,
> like
> > > penv, which make it easy to install and switch between versions without
> > > confusing the system.
> > > >
> > > > -Tod
> > >
>


Re: [CODE4LIB] Sorting LCC numbers in R

2017-07-21 Thread Ray Voelker
I replied to your Stack Overflow question, but here's my solution in
JavaScript. Maybe it could be useful to you and others if you port it over to R.

https://github.com/rayvoelker/js-loc-callnumbers/blob/master/locCallClass.js

I'm happy to answer any questions about it, but I found that the method I
used was pretty effective at sorting all the call numbers I could throw at it!

--Ray

On Fri, Jul 21, 2017 at 11:56 AM, Cooper, Krystal <kcoop...@illinois.edu>
wrote:

> Hi Bill,
>
> Elizabeth Wickes from the uiuc iSchool suggested this.
>
> https://github.com/libraryhackers/library-callnumber-lc/tree/master/python
>
> Sent from my iPhone
>
> On Jul 21, 2017, at 9:43 AM, William Denton <w...@pobox.com> wrote:
>
> Sorting by LCC numbers has been solved in most popular languages, but I
> couldn't find any example of how to do it in R, nor could I figure it out,
> so in case anyone is interested or can help, here's where I asked about it
> on Stack Overflow:
>
> How to sort by Library of Congress Classification (LCC) number in R
> https://stackoverflow.com/q/45240337/854346
>
> Bill
> --
> William Denton :: Toronto, Canada   ---   Listening to Art:
> https://listeningtoart.org/
> https://www.miskatonic.org/ ---   GHG.EARTH: http://ghg.earth/
> Caveat lector.  ---   STAPLR: http://staplr.org/
>



-- 
Ray Voelker
(937) 620-1830


Re: [CODE4LIB] 46 gigabytes

2016-12-13 Thread Ray Voelker
I would really stay away from SSDs for anything other than desktop use,
where you're backing up somewhere else on a regular basis. SSDs tend to
fail completely and suddenly, whereas a spinning disk is often recoverable.

I would say if you're looking for a good solution, it's probably not a
single "thing", but rather a collection of devices / services.

My suggestion would be to get an inexpensive external drive formatted for
your main, everyday OS (NTFS if you're using Windows or ext4 if you're using
Linux, for example) and then look into using something like Dropbox, or my
current favorite, Amazon Cloud Drive, to sync that drive every so often. For
bonus points and extra peace of mind, back up your external drive to another
external drive and take it off-site every few weeks.

I like to use the 3-2-1 rule for data backups: have 3 copies of your
data, on 2 different media, with 1 stored offsite. Storage is so cheap these
days that it's pretty easy to do this on a budget.
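The 3-2-1 rule is simple enough to state as a check. This sketch is only an illustration: the `Copy` record, its fields, and the labels in the example plan are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Copy:
    location: str  # free-form label, e.g. "desktop" or "external drive A"
    media: str     # kind of storage, e.g. "internal disk", "external HDD", "cloud"
    offsite: bool  # stored away from the primary site?


def meets_3_2_1(copies: Iterable[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 offsite."""
    copies = list(copies)
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    plan = [
        Copy("desktop", "internal disk", offsite=False),
        Copy("external drive A", "external HDD", offsite=False),
        Copy("external drive B", "external HDD", offsite=True),  # rotated off-site
    ]
    print(meets_3_2_1(plan))
```

A cloud sync target would count as another medium and, usually, as the offsite copy.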

Good luck!

--Ray

On Tue, Dec 13, 2016 at 2:03 PM, Craig Dietrich <craigdietr...@gmail.com>
wrote:

> > Interesting. I never intended to use things like CD’s nor DVD’s as real
> long-term storage mediums, since I always planned on migrating forward, but
> Craig, please elaborate. Please tell us more about the lifespan of bits on
> an SSD card. I don’t know about such things. —Eric M.
>
> Here is a report by CLIR:
>
> <https://www.clir.org/pubs/reports/pub121/sec4.html>
>
> They list 20 years as the low estimate for writable disks, but I've
> encountered studies that put the number lower, around 7-10 years.
> Regardless, the estimate is in years rather than decades.
>
> SSD bits, however, are typically measured in hours (the Mean Time Between
> Failure (MTBF)).   So, for example, a bit can be written over-and-over for
> 2 million hours.  This assumes you're using the SSD regularly; my network
> hard drive is always on, so power is cycling through the drives, which I
> assume keeps the drive fresh. ( I'd recommend a SSD, that has its own power
> source, rather than SD cards, for this reason.)   2 million hours is about
> 200 years, so an order of magnitude greater than a writable disk.
>
> Cheers,
> Craig
>
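For reference, the hours-to-years conversion in the quoted message works out as follows (assuming 365.25-day years), so "about 200 years" slightly understates it:

```latex
\frac{2 \times 10^{6}\ \text{h}}{24\ \tfrac{\text{h}}{\text{day}} \times 365.25\ \tfrac{\text{day}}{\text{yr}}}
  = \frac{2 \times 10^{6}}{8766}\ \text{yr} \approx 228\ \text{yr}
```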



-- 
Ray Voelker
(937) 229-1407
Roesch Library
University of Dayton
300 College Park
Dayton OH, 45469-1360