Hi Sonia,

 

I’m excited to hear that KohaLA would like to finance an OAI-PMH client in 
Koha! This functionality has been brewing in the back of my mind ever since I 
first raised bug 10662 back in 2013.

 

I agree with Tomas that background jobs are a key component for processing 
incoming OAI-PMH records. 

 

However, the ***missing component right now is the scheduling of the OAI-PMH 
harvesting tasks***, and I think this is where opinions get divided. Below, 
I’ll provide some history and opinions on Koha OAI-PMH.

 

--

 

With 10662, the sponsored goal was for Koha library staff to schedule OAI-PMH 
harvests through the Web UI. However, Fridolin from BibLibre raised a point 
with me at Kohacon18 about how letting library staff control the timing of 
harvesting tasks could be a problem for support vendors. If too many libraries 
using the same public IP address tried to harvest from the same OAI-PMH 
repository, they could be rate limited or blocked. There could also be server 
load concerns. So there probably needs to be a balance between user 
configuration and system configuration. If I recall correctly, this is how 
DSpace’s OAI-PMH harvester works. Users set up targets and can start/stop 
harvests, but things like frequency and concurrency are handled by the system 
configuration.

 

Based on my experience working on OAI-PMH on and off for nearly 10 years and as 
a Koha support vendor, I think my preference would be for sysadmins to handle 
most of the OAI-PMH harvesting details. 

 

The sponsorship for 10662 had certain requirements that many other libraries 
might not have, which is what made me think that it might be better to have an 
external client that connects to Koha. I thought maybe I could get the ordinary 
requirements pushed into Koha, and then handle extraordinary requirements 
externally. However, an external harvester won’t perform as fast as an internal 
harvester. (The compromise would be to write the harvester in such a way that 
people could provide different OAI-PMH harvester Perl modules that all stage 
records using the same core Koha modules.)
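
A rough sketch of that compromise, in Python for brevity (the real thing would 
be Perl modules, and every name here is hypothetical): each harvester only has 
to implement a fetch step, and one shared function does the staging.

```python
from abc import ABC, abstractmethod


class Harvester(ABC):
    """Interface each pluggable harvester would implement (hypothetical)."""

    @abstractmethod
    def fetch(self):
        """Return an iterable of MARCXML strings from the remote repository."""


class StaticHarvester(Harvester):
    """Stand-in harvester that returns canned records instead of doing HTTP."""

    def __init__(self, records):
        self.records = records

    def fetch(self):
        return list(self.records)


def stage_records(harvester, staging_area):
    """Shared core: stage whatever any harvester fetched; return the count."""
    count = 0
    for marcxml in harvester.fetch():
        staging_area.append(marcxml)  # placeholder for Koha's real staging code
        count += 1
    return count


staged = []
count = stage_records(StaticHarvester(["<record/>", "<record/>"]), staged)
```

A library with unusual requirements would swap in its own harvester class, 
while the staging path stays the same for everyone.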

 

Even then… the scheduling would depend on a library’s needs. Back in 2013, I 
had a Koha OAI-PMH harvester which worked as a cronjob. It would run each 
night. However, some libraries want to run OAI-PMH harvests as frequently as 
every 3 seconds. Cron can’t run a job more often than once a minute, so a 
cronjob wouldn’t work for that requirement. 

 

If a cronjob isn’t suitable, then I think you’d need a daemon created by a new 
command like “koha-oai --start <instance_name>”. It could read a configuration 
file and handle scheduling accordingly. With 10662, I used the POE module, 
because I knew it well and it has some timer tools for scheduling tasks. If I 
were to work on it again, I’d probably use Mojo::IOLoop instead these days, 
since Mojolicious is already part of Koha while POE is not. (That said, modules 
like Mojo and POE are hard to work with, because event-driven code is difficult 
to test automatically. That was one of the stumbling blocks with 10662: the 
harvester worked very well, but it was difficult to unit test. In hindsight, I 
should’ve written it in a way that was easier to unit test.)
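
To illustrate the testability point, here is a minimal sketch in Python (for 
brevity; it is not Koha or 10662 code, and a Koha version would be Perl). The 
idea is to keep the "which targets are due?" decision in a plain class that 
takes the current time as an argument, so the event loop (Mojo::IOLoop, POE, or 
anything else) only has to call it on a timer. The names Target, Scheduler, and 
due are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """One OAI-PMH repository to harvest (hypothetical structure)."""
    name: str
    interval: float          # seconds between harvests
    next_run: float = 0.0    # timestamp at which the next harvest is due


@dataclass
class Scheduler:
    """Decides which targets are due. Contains no timers or I/O."""
    targets: list = field(default_factory=list)

    def due(self, now: float) -> list:
        """Return the names of targets due at `now` and reschedule them."""
        ready = []
        for target in self.targets:
            if now >= target.next_run:
                ready.append(target.name)
                target.next_run = now + target.interval
        return ready


# The daemon's event loop would call scheduler.due(time.time()) on a short
# timer and hand each returned name to a harvest job. A unit test can simply
# pass fake times instead:
scheduler = Scheduler([Target("fast", interval=3), Target("nightly", interval=86400)])
first = scheduler.due(now=0)    # both targets are due at start-up
second = scheduler.due(now=3)   # only the 3-second target is due again
```

Because the class never touches a real clock or event loop, the scheduling 
rules can be unit tested without starting a daemon.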

 

Another option would be to create a generic daemon for task scheduling in 
general (e.g. “koha-schedule”). Koha could use this for many things, but it’s a 
project in itself. 

 

--

 

Downloading OAI-PMH records and importing MARCXML into Koha is actually fairly 
straightforward. The difficulty is scheduling and managing the tasks (and unit 
testing). 
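
For what it’s worth, the download side can be sketched in a few lines. The 
following is a minimal Python illustration, not Koha code: it parses one 
OAI-PMH ListRecords response and pulls out the MARCXML records plus the 
resumptionToken used to request the next page. The function name and the sample 
XML are made up for the example; the namespaces are the standard OAI-PMH 2.0 
and MARC21 slim ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "marc": "http://www.loc.gov/MARC21/slim",
}


def parse_list_records(xml_text):
    """Return (records, token): records is a list of (identifier, MARCXML)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("./oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("./oai:header/oai:identifier", namespaces=NS)
        marc = rec.find("./oai:metadata/marc:record", NS)
        if marc is not None:  # deleted records have a header but no metadata
            records.append((identifier, ET.tostring(marc, encoding="unicode")))
    token = root.findtext("./oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token means the list is complete


# Made-up sample response with one record and a token for the next page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <record xmlns="http://www.loc.gov/MARC21/slim">
          <leader>00000nam a2200000 a 4500</leader>
        </record>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A real harvester would repeat the ListRecords request with the returned 
resumptionToken until no token comes back, and stage each record as it goes.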

 

I don’t know the answer that will make everyone happy. There are lots of 
different ways of managing and scheduling the tasks. Based on my experience, 
I’d suggest targeting the simplest approach first, because complexity will make 
the project less likely to succeed. 

 

On that note, I’d be happy to test/QA any OAI-PMH harvester put forward. When I 
was writing OAI-PMH harvester patches, I found it really hard to get QA, so I’m 
happy to be that resource for someone else. I’ve spent a lot of time thinking 
about this topic, so happy to provide advice, warnings, emotional support 😉. 

 

David Cook

Senior Software Engineer

Prosentient Systems

Suite 7.03

6a Glen St

Milsons Point NSW 2061

Australia

 

Office: 02 9212 0899

Online: 02 8005 0595

 

From: Koha-devel <koha-devel-boun...@lists.koha-community.org> On Behalf Of 
Tomas Cohen Arazi
Sent: Wednesday, 26 October 2022 3:46 AM
To: BOUIS Sonia <sonia.bo...@univ-lyon3.fr>
Cc: koha <k...@lists.katipo.co.nz>; koha-devel 
<koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester

 

I think with background jobs we have most of the framework that is needed to 
deal with this within Koha.

 

Best regards

 

On Tue, 25 Oct 2022 at 7:08, BOUIS Sonia <sonia.bo...@univ-lyon3.fr> wrote:

Hi,
KohaLA would like to finance an OAI-PMH client in Koha, but we have questions 
that we want to raise with the community.
There have already been attempts to propose an OAI-PMH client:
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662 : it's an old 
project that doesn't seem compatible with the current version of Koha
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25905 : the scope 
is more to use an external OAI-PMH client and to connect it to Koha

Our main question is about the way to handle this. Do you think it's a better 
idea to use external software or a Perl routine and find a way to connect it to 
Koha? Or would it be better to write a new module in Koha from scratch, so that 
Koha has its own OAI-PMH client?

Please let us hear your thoughts about this project.

Kind regards

Sonia

Sonia BOUIS
------------------------------------------------------
Head of the Library IT Service (Service informatique documentaire)
Département d'Appui à la Recherche et aux Projets (DARP)
University Libraries
Université Jean Moulin Lyon 3
STREET ADDRESS > Manufacture des Tabacs | 6 cours Albert Thomas | LYON 8e
POSTAL ADDRESS > Bibliothèque de la Manufacture | 1C avenue des Frères Lumière 
| CS 78242 - 69372 LYON CEDEX 08

Direct line: 33 (0)4 78 78 79 03

http://bu.univ-lyon3.fr/ | Follow us > 
Facebook <https://www.facebook.com/bulyon3/> | 
Twitter <https://twitter.com/bulyon3> | 
Instagram <https://www.instagram.com/bu.lyon3/?hl=fr>

