Hi folks,

Some of the metadata here at ADS contains CDATA sections. This causes errors in Invenio because invenio.textutils.encode_for_xml(), which is used in bibformat_utils.record_get_xml(), does not encode them properly. As a result, the MARCXML that Invenio produces for these records is not well-formed. (And invenio.search_engine.get_record() fails silently, but that's another story...)

'<![CDATA[ lalala ]]>' is converted to '&lt;![CDATA[ lalala ]]>' and inserted into the MARCXML representation of the record. The XML parsers (all 3 supported by Invenio) then refuse to parse it, because the literal string ']]>' is not allowed in character data.
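
For anyone who wants to reproduce this outside Invenio, here is a minimal sketch using only Python's standard library. The <subfield> wrapper and the parses() helper are made up for illustration, and xml.etree.ElementTree is not necessarily one of the parsers Invenio uses, so treat it as a demonstration of the XML rule rather than of Invenio's behaviour.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Only '<' was escaped, so a bare ']]>' is left in the character data.
broken = '<subfield>&lt;![CDATA[ lalala ]]></subfield>'

# Same content with the closing '>' escaped as well.
escaped = '<subfield>&lt;![CDATA[ lalala ]]&gt;</subfield>'

print(parses(broken))
print(parses(escaped))
```

The first string should be rejected and the second accepted; in the second case the element text comes back as the literal string '<![CDATA[ lalala ]]>'.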

I come here (in peace) because I am unsure what encode_for_xml() should do: escape the CDATA section completely (by also transforming ']]>' to ']]&gt;'), or leave the CDATA section alone?
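
To make the two options concrete, here is a rough sketch of each. This is not Invenio's code: encode_escape_all() and encode_keep_cdata() are hypothetical names, the regular expression is a simplification, and xml.sax.saxutils.escape() merely stands in for whatever encode_for_xml() does internally.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Simplified pattern for a CDATA section (assumption: no malformed input).
CDATA_RE = re.compile(r'(<!\[CDATA\[.*?\]\]>)', re.DOTALL)


def encode_escape_all(text):
    """Option 1: escape '&', '<' and '>' everywhere, so ']]>' becomes ']]&gt;'."""
    return escape(text)


def encode_keep_cdata(text):
    """Option 2: leave CDATA sections untouched and escape only the rest."""
    parts = CDATA_RE.split(text)
    # With a single capturing group, odd indices hold the CDATA sections.
    return ''.join(p if i % 2 else escape(p) for i, p in enumerate(parts))


raw = '<![CDATA[ lalala ]]>'
as_text = ET.fromstring('<subfield>%s</subfield>' % encode_escape_all(raw)).text
as_cdata = ET.fromstring('<subfield>%s</subfield>' % encode_keep_cdata(raw)).text
```

Both outputs parse, but they mean different things: with option 1 the reader gets the markup back as literal text ('<![CDATA[ lalala ]]>'), while with option 2 the parser interprets the section and returns only its content (' lalala ').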

Benoit.
