While they do ignore robots.txt, they at least supply a recognizable
user agent that you can block:
RewriteEngine on
# Match the crawler user agents you want to block
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
# Exclude the custom 403 error document so it can still be served
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
# Return 403 Forbidden for everything that matched
RewriteRule "^.*" "-" [F]
Note that the second RewriteCond is required, or you'll end up with a
redirect loop when Apache tries to serve the error document. They will
still be sending you requests, but at least they won't tie up a Plack
backend doing useless work. I haven't tried returning 5xx errors to see
if that causes them to back off, but I doubt they would take much
notice.
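For what it's worth, if anyone wants to run that experiment, mod_rewrite
in Apache 2.4 can return an arbitrary status code via the R flag instead
of [F]; an untested sketch, using the same placeholder user-agent list
as above:

```apache
RewriteEngine on
# Same placeholder crawler list as in the rule above
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
# Still skip any custom error document to avoid a loop
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
# For status codes outside 300-399 the substitution string is discarded
# and processing stops, so this simply answers 429 Too Many Requests
RewriteRule "^.*" "-" [R=429,L]
```

Whether 429 (or any 5xx) actually slows the crawler down is exactly the
open question above.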
Jason
--
Jason Boyer
Senior System Administrator
Equinox Open Library Initiative
jbo...@equinoxoli.org
+1 (877) Open-ILS (673-6457)
https://equinoxOLI.org/
On Thu, Jul 25 2024 at 01:45:56 PM +0100, Nigel Titley
<ni...@titley.com> wrote:
Dear Michael
On 25/07/2024 13:28, Michael Kuhn wrote:
Hi Nigel
In such a case I would advise creating a sitemap. Unfortunately this
Koha feature does not seem to be well documented, but the following
may give you a start:
* <https://lists.katipo.co.nz/public/koha/2020-November/055401.html>
* <https://wiki.koha-community.org/wiki/Commands_provided_by_the_Debian_packages#koha-sitemap>
* <https://koha-community.org/manual/24.05/en/html/cron_jobs.html#sitemap>
Thanks for this. I'll give it a go and see what happens, although if
Facebook is ignoring the robots.txt file, I suspect it will ignore the
sitemap too.
There's been a great deal of annoyance about this on the Facebook
developer forums.
I'll let you know how it goes.
Nigel
_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha