Hello Alexander,

>> when importing from external OAI sources, I'm finding records with empty
>> fields, for example:
>>
>>      <datafield tag="500" ind1="" ind2="">
>>        <subfield code="a" />
>>      </datafield>
>>
> [...]
>> I'm seeking some advice about how, where or when to deal with them.
>> Should it be done just during the Dublin Core to Marcxml conversion
>> (say, etc/bibconvert/config/ojs2marcxml.xsl) or in the MarcXML parser
>> (lib/python/invenio/bibrecord.py), in the general function
>> (create_record) or for each of the low-level parsers
>> (create_record_RXP, create_record_minidom, create_record_4suite)?
>
> After some thinking about the issue, I think it would be nice to have
> some "Marc Sanitiser" that takes care of the above issues before
> ingest. One could actually imagine quite some reasons why such records
> exist. At least as far as I can see they are valid XML, just not valid
> Marc.

I rather think that it should be done at the MarcXML parser level, to
comply with the robustness principle, also known as Postel's Law:

 Be conservative in what you do, be liberal in what you accept from
 others. (http://en.wikipedia.org/wiki/Robustness_principle)

I've done a simple implementation in XSL for one specific conversion,
but it doesn't scale: every other conversion stylesheet would need the
same fix.  If it were done in bibrecord.py, everybody, everywhere,
would benefit from it.
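
To make the idea concrete, here is a rough sketch of the kind of cleanup
I have in mind. It is not existing bibrecord.py code: strip_empty_fields
and local_name are hypothetical names, and I use the standard library's
ElementTree here rather than any of the current parsers, just to show the
logic (drop subfields with no content, then drop datafields that are left
without subfields).

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Return the tag name without any '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def strip_empty_fields(marcxml):
    """Hypothetical helper: remove empty subfields and emptied datafields.

    Takes a MARCXML string and returns the cleaned XML as a string.
    """
    root = ET.fromstring(marcxml)

    # Pass 1: drop subfields whose text is missing or only whitespace.
    datafields = [e for e in root.iter() if local_name(e) == "datafield"]
    for datafield in datafields:
        for subfield in list(datafield):
            is_subfield = local_name(subfield) == "subfield"
            if is_subfield and not (subfield.text or "").strip():
                datafield.remove(subfield)

    # Pass 2: drop datafields that no longer contain any child element.
    # The element list is copied first because the tree is being mutated.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child) == "datafield" and len(child) == 0:
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<record>'
        '<datafield tag="500" ind1="" ind2=""><subfield code="a" /></datafield>'
        '<datafield tag="245" ind1="" ind2="">'
        '<subfield code="a">Title</subfield>'
        '</datafield>'
        '</record>'
    )
    print(strip_empty_fields(sample))
```

Run on the 500 field quoted above, this should leave only the 245 field
in the record. Doing the same thing once inside create_record would cover
all input sources, whichever low-level parser is in use.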

Ferran
