Hi,

Just to let you know that during the KohaLa hackathon (which runs until Wednesday), we are thinking about the OAI-PMH harvester. I have added our first thoughts to the Bugzilla ticket: https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662#c318
Kind regards,
Sonia

-----Original Message-----
From: Arthur Suzuki [mailto:arthur.suz...@biblibre.com]
Sent: Wednesday, 23 November 2022 14:44
To: Mike D. <blac...@gmail.com>; BOUIS Sonia <sonia.bo...@univ-lyon3.fr>
Cc: koha@lists.katipo.co.nz; koha-de...@lists.koha-community.org
Subject: Re: [Koha] OAI-PMH harvester

Hello there,

If I may suggest a good harvester library, Catmandu may do the job pretty well. I have not used its OAI module, but I have used Catmandu to harvest from a JSON source and transform the result into a UNIMARC file, with pretty good success so far. It can export seamlessly to ISO 2709 or MARCXML.

https://metacpan.org/dist/Catmandu-OAI

Best,
Arthur

On 2022-11-22 15:57, Mike D. wrote:
> Hey,
>
> I'm really glad to see the OAI-PMH harvester debate going on for Koha. I think that if we choose a good external harvester with support, we can save a lot of energy and resources for implementing the related activities in the system.
>
> Harvesting the records is only part of the story, and the easy part. Since the result of harvesting is a lot of records, most of the time we can't avoid post-processing and merging with the records in the local database. For example, you may need to update records from a source that holds millions of records while the local database holds hundreds of thousands; only a slice of that huge amount is relevant. If we design the processing workflow wrong, it will take unnecessarily long and burn valuable resources.
>
> I would like to invite us all to stay in touch, to debate and share our experiences. Let's get this area moving towards a successful finish.
>
> Take care.
>
> Michal
>
> On Tue, 22 Nov 2022 at 15:13, BOUIS Sonia <sonia.bo...@univ-lyon3.fr> wrote:
>
>> Hi,
>> Thanks to David, Tomas, Michal and Michael for your replies.
>>
>> So we have decided to evaluate several external OAI-PMH clients that could be used by Koha, and to choose one at the end of January. There is a lot to do after that. We discussed the background jobs, and cronjobs seem to be appropriate. We think that the settings in the Koha intranet should only define URLs, sets, and XSLT stylesheets (for example, to transform Dublin Core XML into MARCXML).
>>
>> We are only at the beginning of the process 😊
>>
>> Kind regards,
>> Sonia
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 26 Oct 2022 10:37:49 +1100
>> From: "David Cook" <dc...@prosentient.com.au>
>> To: "'Tomas Cohen Arazi'" <tomasco...@gmail.com>, "'BOUIS Sonia'" <sonia.bo...@univ-lyon3.fr>
>> Cc: "'koha'" <koha@lists.katipo.co.nz>, "'koha-devel'" <koha-de...@lists.koha-community.org>
>> Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
>> Message-ID: <07af01d8e8ca$dfbddef0$9f399cd0$@prosentient.com.au>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Sonia,
>>
>> I’m excited to hear that KohaLA would like to finance an OAI-PMH client in Koha! This functionality has been brewing in the back of my mind ever since I first raised 10662 back in 2013.
>>
>> As Tomas says, I think that the background jobs are a key component for processing incoming OAI-PMH records.
>>
>> However, the ***missing component right now is the scheduling of the OAI-PMH harvesting tasks***, and I think this is where opinions get divided. Below, I’ll provide some history and opinions on Koha OAI-PMH.
>>
>> --
>>
>> With 10662, the sponsored goal was for Koha library staff to schedule OAI-PMH harvests through the Web UI. However, Fridolin from BibLibre raised a point with me at Kohacon18 about how letting library staff control the timing of harvesting tasks could be a problem for support vendors.
>> If too many libraries using the same public IP address tried to harvest from the same OAI-PMH repository, they could be rate limited or blocked. There could also be server load concerns. So there probably needs to be a balance between user configuration and system configuration. If I recall correctly, this is how DSpace’s OAI-PMH harvester works. Users set up targets and can start/stop harvests, but things like frequency and concurrency are handled by the system configuration.
>>
>> Based on my experience working on OAI-PMH on and off for nearly 10 years and as a Koha support vendor, I think my preference would be for sysadmins to handle most of the OAI-PMH harvesting details.
>>
>> The sponsorship for 10662 had certain requirements that many other libraries might not have, which is what made me think that it might be better to have an external client that connects to Koha. I thought maybe I could get the ordinary requirements pushed into Koha, and then handle extraordinary requirements externally. However, an external harvester won’t perform as fast as an internal harvester. (The compromise would be to write the harvester in such a way that people could provide different OAI-PMH harvester Perl modules that all stage records using the same core Koha modules.)
>>
>> Even then… the scheduling would depend on a library’s needs. Back in 2013, I had a Koha OAI-PMH harvester which worked as a cronjob. It would run each night. However, some libraries want to run OAI-PMH harvests as frequently as every 3 seconds. A cronjob’s smallest frequency is 60 seconds, so that wouldn’t work for that requirement.
>>
>> If a cronjob isn’t suitable, then I think you’d need a daemon created by a new command like “koha-oai --start <instance_name>”. It could read a configuration file and handle scheduling accordingly.
>> With 10662, I used the POE module, because I knew it well and it has some timer tools for scheduling tasks. If I were to work on it again, I’d probably use Mojo::IOLoop instead these days, since Mojolicious is already part of Koha while POE is not. (That said, using modules like Mojo and POE is tricky, because they’re difficult to test using automation. That was one of the stumbling blocks with 10662. While the 10662 harvester worked very well, it was difficult to unit test. In hindsight, I should’ve written it in a way that was easier to unit test, but it had a lot of event-driven code which made things more difficult.)
>>
>> Another option would be to create a generic daemon for task scheduling in general (e.g. “koha-schedule”). Koha could use this for many things, but it’s a project in itself.
>>
>> --
>>
>> The process of downloading OAI-PMH records and importing MARCXML into Koha is actually fairly straightforward. The difficulty is the scheduling and management of tasks (and unit testing).
>>
>> I don’t know the answer that will make everyone happy. There are lots of different ways of managing and scheduling the tasks. Based on my experience, I’d suggest targeting the simplest approach first, because complexity will make it less likely for the project to succeed.
>>
>> On that note, I’d be happy to test/QA any OAI-PMH harvester put forward. When I was writing OAI-PMH harvester patches, I found it really hard to get QA, so I’m happy to be that resource for someone else. I’ve spent a lot of time thinking about this topic, so I'm happy to provide advice, warnings, emotional support 😉.
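
Two points in David's message pull against each other: some libraries want harvest intervals shorter than the 60 seconds cron can express, and timer-driven code proved hard to unit test. One possible way to reconcile them is to keep the scheduling loop free of real timers. What follows is only a minimal sketch, written in Python rather than Perl, and it is not the 10662 code. `HarvestTarget`, `run_scheduler`, and all parameter names are hypothetical, and a real daemon would more likely sit on Mojo::IOLoop than on a blocking loop.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HarvestTarget:
    """One configured repository (all fields are illustrative)."""
    name: str
    base_url: str
    interval_seconds: float  # may be shorter than cron's 60-second floor


def run_scheduler(
    targets: List[HarvestTarget],
    harvest: Callable[[HarvestTarget], None],
    max_ticks: int,
    tick_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call harvest(target) each time a target's interval has elapsed.

    clock and sleep are injected so that a test can pass fakes and
    run without real timers or an event loop.
    """
    # Every target is due immediately on start-up.
    next_due: Dict[str, float] = {t.name: clock() for t in targets}
    for _ in range(max_ticks):
        now = clock()
        for target in targets:
            if now >= next_due[target.name]:
                harvest(target)
                next_due[target.name] = now + target.interval_seconds
        sleep(tick_seconds)
```

In a unit test, a fake clock whose `sleep` simply advances a counter lets ten simulated seconds pass instantly, so a 3-second target can be checked to fire at 0, 3, 6 and 9 seconds without waiting.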
>>
>> David Cook
>> Senior Software Engineer
>> Prosentient Systems
>> Suite 7.03
>> 6a Glen St
>> Milsons Point NSW 2061
>> Australia
>>
>> Office: 02 9212 0899
>> Online: 02 8005 0595
>>
>> From: Koha-devel <koha-devel-boun...@lists.koha-community.org> On Behalf Of Tomas Cohen Arazi
>> Sent: Wednesday, 26 October 2022 3:46 AM
>> To: BOUIS Sonia <sonia.bo...@univ-lyon3.fr>
>> Cc: koha <koha@lists.katipo.co.nz>; koha-devel <koha-de...@lists.koha-community.org>
>> Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
>>
>> I think with background jobs we have most of the framework that is needed to deal with this within Koha.
>>
>> Best regards
>>
>> On Tue, 25 Oct 2022 at 7:08, BOUIS Sonia <sonia.bo...@univ-lyon3.fr> wrote:
>>
>> Hi,
>> KohaLA would like to finance an OAI-PMH client in Koha, but we have questions that we want to raise with the community.
>> There have already been attempts to propose an OAI-PMH client:
>> - https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662 : it's an old project that doesn't seem compatible with the current version of Koha
>> - https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25905 : the scope is more to use an external OAI-PMH client and to connect it to Koha
>>
>> Our main question is about how to handle this. Do you think it's a better idea to use external software or a Perl routine and find a way to connect it to Koha? Or would it be better to write a new module in Koha from scratch, so that Koha has its own OAI-PMH client?
>>
>> Please let us hear your thoughts about this project.
>>
>> Kind regards
>>
>> Sonia
>>
>> Sonia BOUIS
>> ------------------------------------------------------
>> Head of the Library IT Service (Service informatique documentaire)
>> Département d'Appui à la Recherche et aux Projets (DARP)
>> Bibliothèques universitaires
>> Université Jean Moulin Lyon 3
>> Street address: Manufacture des Tabacs | 6 cours Albert Thomas | LYON 8e
>> Postal address: Bibliothèque de la Manufacture | 1C avenue des Frères Lumière | CS 78242 - 69372 LYON CEDEX 08
>>
>> Direct line: 33 (0)4 78 78 79 03
>>
>> http://bu.univ-lyon3.fr | Follow us: Facebook <https://www.facebook.com/bulyon3/> | Twitter <https://twitter.com/bulyon3> | Instagram <https://www.instagram.com/bu.lyon3/?hl=fr>
>>
>> _______________________________________________
>>
>> Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz
>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
>>
>> ------------------------------
>>
>> Subject: Digest Footer
>>
>> _______________________________________________
>> Koha-devel mailing list
>> koha-de...@lists.koha-community.org
>> https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>> website : https://www.koha-community.org/
>> git : https://git.koha-community.org/
>> bugs : https://bugs.koha-community.org/
>>
>> ------------------------------
>>
>> End of Koha-devel Digest, Vol 203, Issue 15
>> *******************************************

--
Arthur Suzuki, 🌈🏔️
Developer @BibLibre
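
As a footnote to the thread: David describes downloading OAI-PMH records as the straightforward part, and the core of it is following resumption tokens until the repository stops returning one. The sketch below illustrates that loop in Python, under stated assumptions, and is not a proposal for the Koha implementation. The function names `build_url` and `harvest_headers`, the injected `fetch` callable, and the example.org URL are all hypothetical. Only the `ListRecords` verb, the `metadataPrefix`, `set` and `resumptionToken` arguments, and the `oai_dc` prefix come from the OAI-PMH 2.0 specification. Deleted-record handling and error responses are left out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_url(base_url: str, metadata_prefix: str,
              set_spec: Optional[str] = None,
              token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: when it is
    present, only verb and resumptionToken are sent.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def harvest_headers(base_url: str,
                    fetch: Callable[[str], str],
                    metadata_prefix: str = "oai_dc",
                    set_spec: Optional[str] = None
                    ) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, datestamp) for each record, page by page.

    fetch(url) must return the response body as XML text; it is
    injected so the paging logic can be tested without a network.
    """
    token: Optional[str] = None
    while True:
        url = build_url(base_url, metadata_prefix, set_spec, token)
        root = ET.fromstring(fetch(url))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            identifier = header.findtext("oai:identifier", "", OAI_NS)
            datestamp = header.findtext("oai:datestamp", "", OAI_NS)
            yield identifier, datestamp
        # An absent or empty resumptionToken marks the last page.
        token = root.findtext(".//oai:resumptionToken", "", OAI_NS).strip()
        if not token:
            return
```

Because the function is a generator, Michal's concern about only a slice being relevant can be handled while streaming. Assuming local records keep the source identifier, something like `[h for h in harvest_headers(url, fetch) if h[0] in local_ids]` keeps only the matching headers without holding the whole source in memory.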