Hi folks,
some of the metadata here at ADS contain CDATA elements. This causes
errors in Invenio because invenio.textutils.encode_for_xml(), which is
used in bibformat_utils.record_get_xml(), does not properly encode these
elements. The result is that the MARCXML provided by Invenio is not
valid for these records. (And invenio.search_engine.get_record() fails
without saying anything, but that's another story...)
'<![CDATA[ lalala ]]>' is converted to '&lt;![CDATA[ lalala ]]>' and
inserted in the MARCXML representation of the record. The XML parsers
(all 3 supported by Invenio) then refuse to parse it because ']]>' is
not allowed in character content.
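
To make the failure concrete, here is a minimal sketch that reproduces it with the Python standard library only. The function naive_encode() is a hypothetical stand-in, not Invenio's actual encode_for_xml(); it assumes only that '&' and '<' are escaped while '>' is left untouched, which matches the behaviour described above. The subfield markup is likewise just illustrative.

```python
import xml.etree.ElementTree as ET


def naive_encode(text):
    # Hypothetical stand-in for the encoding step (NOT Invenio's code):
    # escapes '&' and '<' but leaves '>' as it is.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def is_well_formed(xml_string):
    # True if the standard-library parser accepts the string.
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


value = "<![CDATA[ lalala ]]>"
subfield = '<subfield code="a">%s</subfield>' % naive_encode(value)

print(subfield)
print(is_well_formed(subfield))
```

The opening '<' is escaped, so the parser no longer sees a CDATA section, and the leftover ']]>' then appears in plain character data, where XML forbids it.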
I come here (in peace) because I am unsure about what encode_for_xml()
should do: escape the CDATA element completely (by transforming ']]>' to
']]&gt;') or leave the CDATA element alone?
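
For discussion, the two options could be sketched roughly as follows. Both functions are hypothetical illustrations, not proposals for the actual encode_for_xml() signature; the first relies on xml.sax.saxutils.escape(), which escapes '&', '<' and '>', and the second assumes CDATA sections can be found with a simple non-greedy regular expression.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Matches one complete CDATA section (assumption: sections are not nested,
# which XML does not allow anyway).
CDATA_RE = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)


def escape_everything(text):
    # Option 1: treat the CDATA markup as ordinary text and escape it fully,
    # including '>' so that ']]>' becomes ']]&gt;'.
    return escape(text)


def keep_cdata(text):
    # Option 2: pass CDATA sections through verbatim, escape everything else.
    # With a capturing group, re.split() puts the matches at odd indices.
    parts = CDATA_RE.split(text)
    return "".join(p if i % 2 else escape(p) for i, p in enumerate(parts))


value = "a < b <![CDATA[ lalala ]]>"
for encode in (escape_everything, keep_cdata):
    xml = "<subfield>%s</subfield>" % encode(value)
    # Show the encoded XML and the text a parser reads back from it.
    print(encode.__name__, xml, repr(ET.fromstring(xml).text))
```

Both variants parse. The difference is what a consumer reads back: with the first, the literal string including the '<![CDATA[' and ']]>' markers; with the second, only the content of the CDATA section.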
Benoit.
- encode_for_xml and CDATA Benoit Thiell