On Wed, Oct 29, 2008 at 5:01 PM, Dan Scott <[EMAIL PROTECTED]> wrote:
> 2008/10/29 Bill Erickson <[EMAIL PROTECTED]>:
> > Hi all,
> >
> > I ran across some gnarly MARC data today, which contained, among other
> > things, MARC codes of "<". I realized that MARC::File::XML outputs the MARC
> > tags, codes, and indicators without escaping them. This results, in my
> > case, in invalid XML like:
> >
> > <subfield code="<">France</subfield>
> >
> > It seems reasonable that, regardless of the (horrible) content of the MARC,
> > MARC::File::XML should produce valid XML.
> >
> > Attached is a patch to explicitly escape the values before inserting them
> > into the XML document under construction. I'm not sure if it's the best
> > approach, but it got me up and running again.
>
> Any chance of including a sample (horrible) MARC record to include in
> a testcase?
>
> I'm not saying I would build a testcase for MARC::File::XML, but I
> might build one for File_MARC (PHP)... and a nice horrible MARC record
> from the wild would help.

Attached, including the post-escape XML version.

-b
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
        xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00727nam 2200205 a 4500</leader>
  <controlfield tag="001">03-0016458</controlfield>
  <controlfield tag="005">19971103184734.0</controlfield>
  <controlfield tag="008">970701s1997 oru u000 0 eng u</controlfield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(Sirsi) a351664</subfield>
  </datafield>
  <datafield tag="050" ind1="0" ind2="0">
    <subfield code="a">ML270.2</subfield>
    <subfield code="b">.A6 1997</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Anthony, James R.</subfield>
  </datafield>
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">French baroque music from Beaujoyeulx to Rameau</subfield>
  </datafield>
  <datafield tag="250" ind1=" " ind2=" ">
    <subfield code="a">Rev. and expanded ed.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Portland, OR :</subfield>
    <subfield code="b">Amadeus Press,</subfield>
    <subfield code="c">1997.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">586 p. :</subfield>
    <subfield code="b">music</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Music</subfield>
    <subfield code="&lt;">France</subfield>
    <subfield code="y">16th century</subfield>
    <subfield code="x">History and criticism.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Music</subfield>
    <subfield code="z">France</subfield>
    <subfield code="y">17th century</subfield>
    <subfield code="x">History and criticism.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Music</subfield>
    <subfield code="z">France</subfield>
    <subfield code="y">18th century</subfield>
    <subfield code="x">History and criticism.</subfield>
  </datafield>
  <datafield tag="949" ind1=" " ind2=" ">
    <subfield code="a">ML 270.2 A6 1997</subfield>
    <subfield code="w">LC</subfield>
    <subfield code="i">30007006841505</subfield>
    <subfield code="r">Y</subfield>
    <subfield code="t">BOOKS</subfield>
    <subfield code="l">HUNT-CIRC</subfield>
    <subfield code="m">HUNTINGTON</subfield>
  </datafield>
  <datafield tag="596" ind1=" " ind2=" ">
    <subfield code="a">1</subfield>
  </datafield>
</record>
test.mrc
Description: Binary data
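
For anyone wanting to reproduce the idea outside Perl, here is a minimal sketch of the escaping step in Python. It is not the attached patch and does not use MARC::File::XML; the helper names escape_attr and subfield_xml are hypothetical, and it assumes attribute values are always written inside double quotes.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def escape_attr(value: str) -> str:
    """Escape a value for use inside a double-quoted XML attribute."""
    # escape() handles &, < and >; the extra mapping covers the quote
    # character that would otherwise terminate the attribute early.
    return escape(value, {'"': "&quot;"})


def subfield_xml(code: str, data: str) -> str:
    """Build one <subfield> element (hypothetical helper for illustration)."""
    return f'<subfield code="{escape_attr(code)}">{escape(data)}</subfield>'


if __name__ == "__main__":
    xml = subfield_xml("<", "France")
    print(xml)
    # Round-trip through a parser to confirm the output is well-formed
    # and that the original subfield code is recovered unchanged.
    element = ET.fromstring(xml)
    assert element.get("code") == "<"
```

The round trip at the end is the part that would matter in a testcase: the escaped output should parse cleanly, and the parser should hand back the original "<" code.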