A recent MARC Proposal recommends the use of Numeric Character
References as an alternative for unmappable characters when converting
from Unicode to MARC-8 in MARC21 records [1].  The "&apos;" entity is
not a *numeric* character reference, but I'm just mentioning this as an
FYI in case you start seeing the numeric character entities in MARC
communications format files.

-- Michael

[1] MARC PROPOSAL NO. 2006-09
    http://www.loc.gov/marc/marbi/2006/2006-09.html

    "SUMMARY: This paper specifies a lossless technique utilizing
    Numeric Character References for converting unmappable characters
    when going from Unicode to MARC-8 for systems that cannot handle
    Unicode encoding. It is intended to be an alternative to the lossy
    technique approved in 2006-04. The MARC advisory committee
    recommended that both a lossy and a lossless technique be
    officially adopted."

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Houghton,Andrew [mailto:[EMAIL PROTECTED]
> Sent: Monday, May 14, 2007 9:56 AM
> To: perl4lib@perl.org
> Subject: RE: Working around a UTF8/Unicode encoding problem
>
> > From: Jason Ronallo [mailto:[EMAIL PROTECTED]
> > Sent: 12 May, 2007 16:52
> > To: William Denton
> > Cc: perl4lib@perl.org
> > Subject: Re: Working around a UTF8/Unicode encoding problem
> >
> > I can also see that this record is broken because the XML entity
> > &apos; is in a MARC communications format file.
>
> The character entity &apos; *is valid* in a MARC-XML file.
> It is one of the few standard character entities allowed in
> an XML file, e.g., &amp;, &lt;, &gt;, and &apos;.
>
>
> Andy.
>
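
For anyone who wants to see what the numeric character reference idea from the proposal looks like in practice, here is a rough Python sketch. It is only an illustration under stated assumptions: the function names to_ncr and from_ncr are made up for this example, the hexadecimal form &#xXXXX; is my reading of the proposal and should be checked against the proposal text, and plain ASCII stands in for the MARC-8 repertoire because Python's standard library has no MARC-8 codec.

```python
import re

# Hypothetical helper names; ASCII is only a stand-in for MARC-8 here.
def to_ncr(text, encoding="ascii"):
    """Replace characters that `encoding` cannot represent with
    hexadecimal numeric character references such as &#x0159;."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)   # representable: keep the character
            out.append(ch)
        except UnicodeEncodeError:
            # unmappable: emit a reference to the code point instead
            out.append("&#x{:04X};".format(ord(ch)))
    return "".join(out)

# Matches references with 4 to 6 hex digits (an assumption of this sketch).
NCR = re.compile(r"&#x([0-9A-Fa-f]{4,6});")

def from_ncr(text):
    """Turn hexadecimal numeric character references back into characters."""
    return NCR.sub(lambda m: chr(int(m.group(1), 16)), text)

encoded = to_ncr("Dvořák")
print(encoded)
print(from_ncr(encoded))
```

Note that the round trip is only unambiguous if the original text did not already contain a literal string that looks like a reference. Python's built-in "xmlcharrefreplace" error handler does something similar, but it emits decimal references rather than hexadecimal ones.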