I'm working on a Perl-based OAI harvester and have run into a problem.  The module 
that I'm using - Net::OAI::Harvester - does a great job of parsing out the 
different OAI tagged "fields" so that they can be put into a MySQL table of 
retrieved OAI records for searching.

Unfortunately, in using the University of Michigan OAI Toolkit, I have found 
that at least one repository has repeated tags, in particular multiple 
identifier tags.  This presents a problem because Net::OAI::Harvester seems to 
return the first (and, as far as I know how to use it, only the first) 
instance of a tag.  Losing data is always bad, but it is worse here because 
the repository that I'm trying to harvest usually places the URL for the 
repository item in the second identifier tag.  As a result, the URL does not 
get saved to the database and the harvest is less than useful to our users.

Does anyone know a way to use Net::OAI::Harvester with oai_dc records so that 
multiple instances of a tag can be captured and then concatenated with the 
first one?
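
To make the goal concrete, here is a minimal Perl sketch of the behaviour I am 
after.  The My::MockRecord package and its sample data are made up for 
illustration and are not part of Net::OAI::Harvester.  I have not confirmed 
that the real identifier accessor behaves this way, though it may be worth 
checking whether it returns every value when called in list context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a harvested oai_dc record; this is NOT the
# Net::OAI::Harvester API, just a package with a context-aware accessor.
package My::MockRecord;

sub new {
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

# Returns every identifier in list context, only the first in scalar context.
sub identifier {
    my ($self) = @_;
    my @values = @{ $self->{identifier} || [] };
    return wantarray ? @values : $values[0];
}

package main;

# Made-up sample values: an OAI identifier followed by the item URL.
my $record = My::MockRecord->new(
    identifier => [ 'oai:example.org:123', 'http://example.org/item/123' ],
);

my $first_only = $record->identifier;    # scalar context: first value only
my @all        = $record->identifier;    # list context: every value
my $joined     = join ' | ', @all;       # one string for a single DB column

print "$first_only\n$joined\n";
```

The separator is arbitrary; anything that cannot occur inside an identifier 
would do for a single MySQL column.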

I have spent some time trying a number of different approaches, including 
different libraries (such as XML::LibXML and XML::SAX::Parser), but I can't 
seem to get any of them to work with the input I get inside the 
Net::OAI::Harvester module, which has been run through the Storable module.

Unfortunately, the documentation that I have been able to find on the Web does 
not provide information on any methods that I could use.

Would it make more sense just to move to the University of Michigan Toolkit to 
harvest the XML records?  I would prefer to continue with the 
Net::OAI::Harvester module if I can in that it allows me to be flexible in what 
sorts of schemas I'm able to harvest, not just unqualified Dublin Core.  

That being said, I do have one other question: Is there a way within the 
Net::OAI::Harvester to output the actual metadata structure that's being 
harvested?
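
By "output the actual metadata structure" I mean something along the lines of 
the following sketch, which uses the core Data::Dumper module.  The hash here 
is hand-built sample data, not real harvester output, and I am only assuming 
that the same approach could be applied to whatever object the module returns.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical structure standing in for one harvested oai_dc record.
my $record = {
    title      => [ 'Sample title' ],
    identifier => [ 'oai:example.org:123', 'http://example.org/item/123' ],
};

# Sort hash keys and use compact indentation so dumps are easy to compare.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;

my $dump = Dumper($record);
print $dump;
```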

Thanks in advance for any assistance that you can provide!

Stephen Westman
