http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662
David Cook <dc...@prosentient.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #20757|0                           |1
        is obsolete|                            |

--- Comment #4 from David Cook <dc...@prosentient.com.au> ---
Created attachment 20758
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=20758&action=edit
Bug 10662 - Build OAI-PMH Harvesting Client

N.B. This feature is still a work in progress.

This commit represents my work to date on the OAI-PMH harvester, but it's
not complete. While the core classes are operational, I still need to
improve the cronjob, the DC to MARC21 XSLT, and the internal workings of
the classes (mostly in terms of error handling, edge cases, and reporting).

This patch adds several new files to Koha:

Overview:

1) C4/OAI/Harvester.pm:
This contains 2 classes. The Harvester class sets up the 6 OAI-PMH verbs
(2 of which harvest records) and imports records into Koha. The
Harvester::Record class is a helper class for processing records,
transforming metadata, etc. At the moment, this is hardwired for MARC21,
but it's easy enough to extend it to other flavours (provided there are
XSLTs for the metadata transforms).

2) koha-tmpl/intranet-tmpl/prog/en/xslt/DC2MARC21slim.xsl:
This is an XSLT which transforms oai_dc into MARC21. It is a lightweight
XSLT based on one I found from the Library of Congress. I improved the
Leader, and I will endeavour to improve the 008 and map more fields in a
more orderly way than this. However, it's a good start.

3) koha-tmpl/intranet-tmpl/prog/en/xslt/MARC21slim2MARC21enhanced.xsl:
This simply strips 999 fields from incoming records, adds the OAI-PMH
unique identifier, and adds a 999 field if that unique identifier has
already been imported in the past (the 999 is added so that automatic
matching and replacement can take place on import).
4) misc/cronjobs/oai_harvester.pl:
At the moment, this script takes the database config for an OAI-PMH
repository to create a Harvester object, then tries each OAI-PMH verb and
imports the resulting records into Koha.

5) t/db_dependent/OAI_harvester.t:
This is a unit test which uses Koha as both the OAI-PMH repository and the
client in a circular loop. It should test the high-level methods and go
from retrieving the database config to importing records. It uses the
"rollback" method, so you won't have a bunch of records imported into your
database (although your autoincrement will probably go up anyway).

Test Plan (For MARC21 users):

0) Apply the patch and run updatedatabase.pl (it will add two new tables
to your database: oai_harvest and oai_harvest_repositories). I've
documented them in kohastructure.sql.

1) For starters, run the OAI_harvester.t test. It should cycle through all
the tests without any problems. If there are problems, let me know on
Bugzilla, in IRC, or via email.

2a) Next, if you feel safe importing records into your database, use the
config from the unit test as an example and create an entry in your
oai_harvest_repositories table. Be careful with the "import_mode".
"Automatic" will automatically stage and import any records harvested via
OAI-PMH. "Manual" will stage them but not import them.

2b) Once you're somewhat sure of your config, run oai_harvester.pl. The
default is not to use automatic token resumption (so you will most likely
have only 50 records in your import batch). You can change this in the
cronjob itself.

If you fully import this batch, try running "oai_harvester.pl" again. You
shouldn't get any results (as you've already imported that batch of
records).

To try out the "replace/update" feature: update one of the original
records you imported (out of that first 50), re-index, and re-run
"oai_harvester.pl". You should now get 1 record in your batch, which
should automatically match the original.
In "automatic" mode, it will automatically replace the record (although
this can still be reverted from Manage Staged MARC Records). In "manual"
mode, you'll notice in the management interface that the record has been
matched with the original.

3) To explore more fully:
Examine C4/OAI/Harvester.pm. I wrote the POD at the end of August, so it's
a bit out of date, but it should be mostly accurate. I've tried to include
as many comments as possible along the way as well.

It's not a very sophisticated module. However, there are a lot of
different scenarios that I'm trying to account for. I might have missed
some use or edge cases, written bad code, or handled errors poorly. A good
place to start might be any FIXME or TODO messages.

--

As I said at the top, this is still a work in progress, but I'd be happy
to get any feedback or advice on how to proceed with this feature. I'll
continue working on it and post updates.
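
For testers less familiar with OAI-PMH, here is a minimal sketch of the
request and resumption-token flow that the test plan exercises. It is
written in Python for brevity and is not taken from the patch or from
C4/OAI/Harvester.pm. The function names, the example.org base URL, and the
sample response are all hypothetical; only the six verb names, the
resumptionToken argument, and the XML namespace come from the OAI-PMH 2.0
protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by every OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# The six verbs defined by OAI-PMH 2.0; the last two return full records.
VERBS = ("Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord")


def build_request(base_url, verb, **args):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # resumptionToken is an exclusive argument: send it with the verb only.
    if "resumptionToken" in args:
        args = {"resumptionToken": args["resumptionToken"]}
    return base_url + "?" + urlencode({"verb": verb, **args})


def parse_list_records(xml_text):
    """Return (header identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    # Only header identifiers live in the OAI namespace; dc:identifier
    # elements inside the metadata use a different namespace.
    ids = [e.text for e in root.iter(OAI_NS + "identifier")]
    token = root.find(f".//{OAI_NS}resumptionToken")
    has_token = token is not None and token.text and token.text.strip()
    return ids, (token.text.strip() if has_token else None)


# Made-up ListRecords response, trimmed to the parts used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = build_request("http://example.org/oai", "ListRecords",
                          metadataPrefix="oai_dc")
ids, token = parse_list_records(SAMPLE)
next_url = build_request("http://example.org/oai", "ListRecords",
                         metadataPrefix="oai_dc", resumptionToken=token)
```

A harvester that does not follow resumption tokens would stop after the
first response, which is consistent with seeing only one page of records
per run. A harvester that does follow them would keep requesting next_url
style URLs until a response arrives with no token or an empty one.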