Hello Alexander,

>> when importing from external OAI sources, I'm finding records empty
>> fields, for example:
>>
>> <datafield tag="500" ind1="" ind2="">
>> <subfield code="a" />
>> </datafield>
>>
> [...]
>> I'm seeking some advice about how, where or when to deal with them.
>> Should it be done just during the Dublin Core to Marcxml conversion
>> (say, etc/bibconvert/config/ojs2marcxml.xsl) or in the MarcXML parser
>> (lib/python/invenio/bibrecord.py), in the general function
>> (create_record) or for each of the low lever parsers
>> (create_record_RXP, create_record_minidom create_record_4suite).
>
> After some thinking about the issue, I think it would be nice to have
> some "Marc Sanitiser" that takes care about the above issues before
> ingest. One could actually imagine quite some reasons why such records
> exist. At least as far as I can see they are valid XML, just not valid
> Marc.

I would rather do it at the MarcXML parser level, to comply with the
robustness principle, also known as Postel's Law: "Be conservative in
what you do, be liberal in what you accept from others."
(http://en.wikipedia.org/wiki/Robustness_principle)

I've done a simple implementation in XSL for one specific conversion,
but it doesn't scale: every other conversion stylesheet would need the
same fix. If it were done in bibrecord.py, everybody, everywhere, would
benefit from it.

Ferran
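
P.S. To make the idea concrete, here is a minimal sketch of the kind of
clean-up I have in mind. It is not the bibrecord.py code: it uses only
the standard library's xml.etree.ElementTree, the function name
strip_empty_fields is hypothetical, and I am assuming that "empty" means
a subfield with no text or only whitespace. A datafield left with no
subfields is dropped as well.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_empty_fields(marcxml):
    """Remove empty subfields, then datafields left with no subfields.

    Hypothetical helper for illustration; not part of bibrecord.py.
    """
    root = ET.fromstring(marcxml)
    # Snapshot the tree first, because elements are removed while walking.
    for parent in list(root.iter()):
        for field in list(parent):
            if _local(field.tag) != "datafield":
                continue
            for sub in list(field):
                # Assumption: whitespace-only text counts as empty.
                if _local(sub.tag) == "subfield" and not (sub.text or "").strip():
                    field.remove(sub)
            if len(field) == 0:
                parent.remove(field)
    return ET.tostring(root, encoding="unicode")
```

Called on a record containing the empty 500 field quoted above, this
returns the record without that datafield, while controlfields and
non-empty datafields are left alone. Hooking something like this in
before the low-level parsers run would be one way to cover all of them
at once, but where exactly it belongs is the open question.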
