Thank you for sharing the link to the Dark Visitors website - it looks very useful, indeed!

Linda

On 4/19/24 20:21, Lolis, John via Evergreen-general wrote:
There's been quite a conversation on the CODE4LIB listserv about this lately...

On Thu, 11 Apr 2024 at 10:43, Scott Prater <0000007dd2c67ad2-dmarc-requ...@lists.clir.org> wrote to CODE4LIB:

We've also been seeing some traffic from inconsiderate AI bots.

One of my colleagues came across this site, which tracks and documents AI bots:

https://darkvisitors.com/

-- Scott

--
Scott Prater
Digital Library Architect
UW Digital Collections Center
University of Wisconsin - Madison



________________________________________
From: Code for Libraries <code4...@lists.clir.org> on behalf of Lolis, John <jlo...@whiteplainsny.gov>
Sent: Wednesday, April 10, 2024 12:15 PM
To: code4...@lists.clir.org
Subject: Re: [CODE4LIB] blocking GPTBot?

This *sounds* as if it should help:
https://searchengineland.com/google-extended-crawler-432636

John Lolis
Coordinator of Computer Systems

100 Martine Avenue
White Plains, NY  10601
tel: 1.914.422.1497
fax: 1.914.422.1452

https://whiteplainslibrary.org/

*“I would rather have questions that can’t be answered than answers that
can’t be questioned.”*
— Richard Feynman
<https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu>,
theoretical physicist and recipient of the Nobel Prize in Physics in 1965


On Mon, 8 Apr 2024 at 16:31, Jason Casden <cas...@gmail.com> wrote:

> Thanks for bringing this up, Eben. We've been having a horrible time with
> these bots, including those from previously fairly well-behaved sources
> like Google. They've caused issues ranging from slow response times and
> high system load all the way up to outages for some older systems. So far,
> our systems folks have been playing whack-a-mole with a combination of IP
> range blocks and increasingly detailed robots.txt statements. A group is
> being convened to investigate more comprehensive options so I will be
> watching this thread closely.
>
> Jason
>
> On Mon, Apr 8, 2024 at 4:18 PM Eben English <eben.engl...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I'm wondering if other folks are seeing AI and/or ML-related crawlers
> > like GPTBot accessing your library's website, catalog, digital
> > collections, or other sites.
> >
> > If so, are you blocking or disallowing these crawlers? Has anyone come up
> > with any policies around this?
> >
> > We're debating whether to allow these types of bots to crawl our digital
> > collections, many of which contain large amounts of copyrighted or "no
> > derivatives"-licensed materials. On one hand, these materials are
> > available for public view, but on the other hand the type of use that
> > GPTBot and the like are after (integrating the content into their
> > models) could be characterized as creating a derivative work, which is
> > expressly discouraged.
> >
> > Thanks,
> >
> > Eben English (he/him/his)
> > Digital Repository Services Manager
> > Boston Public Library
> >
>
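
For the robots.txt side of the whack-a-mole described in the quoted thread, a small check can confirm that a rule set really does disallow a given crawler before it is deployed. The sketch below uses Python's standard urllib.robotparser module. The rule text, the helper name is_allowed, and the example.org URL are illustrative assumptions, not any site's actual configuration. Note that robots.txt only helps against crawlers that choose to honor it, which is presumably why IP range blocks were needed as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: disallow GPTBot everywhere,
# leave every other user agent unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    sample_url = "https://example.org/collections/item/1"
    for agent in ("GPTBot", "SomeOtherCrawler"):
        verdict = "allowed" if is_allowed(agent, sample_url) else "blocked"
        print(agent, verdict)
```

Running the same check against the live file after each change is one way to catch a typo in an increasingly detailed rule set.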

John Lolis
Coordinator of Computer Systems

100 Martine Avenue
White Plains, NY  10601
tel: 1.914.422.1497
fax: 1.914.422.1452

https://whiteplainslibrary.org/

/“I would rather have questions that can’t be answered than answers that can’t be questioned.”/ — Richard Feynman <https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu>, theoretical physicist and recipient of the Nobel Prize in Physics in 1965


On Fri, 19 Apr 2024 at 07:05, Jane Sandberg via Evergreen-general <evergreen-general@list.evergreen-ils.org> wrote:

    Hi Linda,

    It's not for Evergreen, but my colleague recently blocked
    claudebot using fail2ban on our load balancer
    <https://github.com/pulibrary/princeton_ansible/commit/6f9009249a168442391d90e2b75028d40a8a9e91>.
    Essentially, fail2ban is configured to watch Nginx's access log,
    and if more than 10 claudebot requests appear within the past
    minute from a particular IP, it automatically blocks all requests
    from that IP for the next 24 hours.  I would think that something
    similar could work for Apache's access log.
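
The rule described above (more than 10 claudebot requests from one IP within a minute leads to a 24-hour block) can be sketched in Python to show the logic. This is not the fail2ban configuration from the linked commit, and it does not block anything itself: it only reports which IPs would be banned. It assumes Nginx's default "combined" log format, and the names LINE_RE and find_bans are hypothetical.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumption: Nginx "combined" log format, e.g.
# 203.0.113.5 - - [19/Apr/2024:07:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "UA"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

MAX_HITS = 10                  # more than this many hits triggers a ban
WINDOW = timedelta(minutes=1)  # look-back window for counting hits
BAN_TIME = timedelta(hours=24) # how long a ban lasts


def find_bans(lines, bot="claudebot"):
    """Return {ip: ban_expiry} for IPs exceeding MAX_HITS within WINDOW."""
    recent = defaultdict(deque)  # ip -> timestamps of recent bot requests
    banned = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match or bot not in match["ua"].lower():
            continue  # unparseable line or a different user agent
        ip = match["ip"]
        ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if ip in banned and ts < banned[ip]:
            continue  # still inside an active ban
        hits = recent[ip]
        hits.append(ts)
        # Drop timestamps that have fallen out of the one-minute window.
        while ts - hits[0] > WINDOW:
            hits.popleft()
        if len(hits) > MAX_HITS:
            banned[ip] = ts + BAN_TIME
            hits.clear()
    return banned
```

Because the function takes any iterable of log lines, it could be pointed at an Apache access log as well, provided that log uses the same combined format.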

    Good luck with the bots!

      -Jane

    On Fri, 19 Apr 2024 at 3:42 a.m., Linda Jansová via
    Evergreen-general (evergreen-general@list.evergreen-ils.org) wrote:

        Dear all,

        Have any of you encountered extensive crawling by Bytespider and
        Bytedance (see e.g.
        https://wordpress.org/support/topic/psa-bytedance-and-bytespider-bots-recommend-blocking/),
        Claudebot, or other AI bots?

        If so, do you have a secret recipe for blocking the crawler from
        accessing the site?

        Thank you very much for sharing your experience!

        Linda

        _______________________________________________
        Evergreen-general mailing list
        Evergreen-general@list.evergreen-ils.org
        http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
