Hello Alexander,

[...]
>> I rather think that it should be done at the MarcXML parser level, to be
>> compliant with the robustness principle, or Postel's Law:
>
> I think you slightly misunderstood me. My idea would have been to see
> this sanitiser as an additional step in the ingester. So to say a
> separate module that is called here. It complies almost entirely with
> your idea of integration. Except that it might be handy to have it
> callable from the shell as well. (ipy or whatever.)

I see. As a matter of fact, I tried to find such a utility, starting with
tidy (http://packages.debian.org/tidy) and xmllint
(http://packages.debian.org/libxml2-utils), but I didn't find an obvious
way of doing it, nor a clean way to integrate it into Invenio's BibHarvest
utility, in the BibFilter program box, until I asked for advice.

But yes, you are right. It would be nice to have a remove-empty-fields
(or singletons, as I've learned from Tibor's message) utility around.

Thanks for your comment,
Ferran
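
For illustration, a remove-empty-fields step along the lines discussed above might look like the following minimal sketch. It is not Invenio's or BibFilter's actual code: the function name `remove_empty_fields` and the helpers are hypothetical, and it assumes that "empty" means a controlfield or subfield with no non-whitespace text, or a datafield left with no subfields. It uses only Python's standard library and accepts records with or without the MARC21 slim namespace.

```python
import sys
import xml.etree.ElementTree as ET

# MARC21 slim namespace; registered so namespaced output keeps a default xmlns.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)


def _local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def _is_blank(elem):
    """True if the element has no text or only whitespace."""
    return not (elem.text or "").strip()


def remove_empty_fields(xml_text):
    """Hypothetical sanitiser: drop empty fields from MARCXML text.

    Removes blank controlfields, blank subfields, and any datafield
    that ends up with no subfields. Returns the cleaned XML as a string.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the tree first, since we remove children while walking it.
    for record in list(root.iter()):
        if _local(record.tag) != "record":
            continue
        for field in list(record):
            name = _local(field.tag)
            if name == "controlfield" and _is_blank(field):
                record.remove(field)
            elif name == "datafield":
                for sub in list(field):
                    if _local(sub.tag) == "subfield" and _is_blank(sub):
                        field.remove(sub)
                if len(field) == 0:
                    record.remove(field)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Shell usage (assumed file names): python sanitise.py < in.xml > out.xml
    sys.stdout.write(remove_empty_fields(sys.stdin.read()))
```

Keeping the logic in one importable function means the same code could be called as an extra step in the ingester and also run from the shell or an interactive session, which is the arrangement suggested in the quoted message.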
