Hi Jackie,

I'm working on a very similar problem: converting thesis/dissertation records 
(in XML) to MARC records.  I'm still in the testing stage, but have had 
similar problems with records that have diacritics in the 100 or 245 fields 
(diacritics in a 520a field, however, don't seem to cause any problems).  
Since our records are not "diacritic rich", it's hard to determine the exact 
extent of the problem.

I am using these versions:
  Perl v5.8.8
  MARC::Charset 0.98
  MARC::Lint 1.43
  MARC::Record 2.0
  XML::LibXML 1.66

Here's an example "bad" record (which I have minimized to just the 245 field):

marcdump test.mrc
test.mrc
LDR 00127cam a2200037   4500
245 13 _aAn Empirical Test Of The Situational Leadership® Model In Japan /
       _cRiho Yoshioka.

 Recs  Errs Filename
----- ----- --------
    1     1 test.mrc

When I run test.mrc through MARC::Lint, I get these messages:

 Invalid record length in record 1: Leader says 00127 bytes but it's actually 125
 Invalid length in directory for tag 245 in record 1
 field does not end in end of field character in tag 245 in record 1

When examined in vi, the character in question, a Registered Sign (U+00AE), 
appears to be correctly UTF-8 encoded as C2 AE, and the bib Leader (position 
09 = a) indicates that the record is Unicode encoded.  I've attached the MARC 
record.
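
As a sanity check on the 127 vs. 125 figures, here is a minimal sketch (in 
Python rather than Perl, purely for illustration) that rebuilds the byte count 
of the one-field record shown above.  The helper name record_length is 
hypothetical.  It assumes the record holds only the Leader, one directory 
entry, and the 245 field exactly as marcdump printed it.  With the Registered 
Sign as two UTF-8 bytes the total comes to 125.  If those two bytes were 
mistaken for Latin-1 characters and encoded to UTF-8 a second time, the total 
would be 127.  That matches the Leader value, but it is only one possible 
explanation, not a confirmed diagnosis of what MARC::Record is doing.

```python
# MARC 21 / ISO 2709 structural bytes.
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def record_length(title: bytes, resp: bytes) -> int:
    """Byte length of a record holding a single 245 field (hypothetical helper)."""
    field = b"13" + SF + b"a" + title + SF + b"c" + resp + FT
    leader = 24          # the Leader is always 24 bytes
    directory = 12 + 1   # one 12-byte entry plus the directory's terminator
    return leader + directory + len(field) + len(RT)


title = "An Empirical Test Of The Situational Leadership\u00ae Model In Japan /"
resp = b"Riho Yoshioka."

utf8_title = title.encode("utf-8")  # Registered Sign -> C2 AE
# Assumption for illustration: the UTF-8 bytes are misread as Latin-1
# characters and encoded again, turning C2 AE into C3 82 C2 AE.
double_title = utf8_title.decode("latin-1").encode("utf-8")

print(record_length(utf8_title, resp))    # 125
print(record_length(double_title, resp))  # 127
```

The base address in the Leader (00037) agrees with the 24 + 13 bytes assumed 
for the Leader and directory.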

I noticed that when I run your record (ck245.dat) through MARC::Lint, I get the 
same invalid record length message:

 Invalid record length in record 3: Leader says 00567 bytes but it's actually 569
 field does not end in end of field character in tag 100 in record 3
 field does not end in end of field character in tag 245 in record 3
 Invalid indicators ".10" forced to blanks in record 3 for tag 245

 field does not end in end of field character in tag 260 in record 3
 Invalid indicators ".  " forced to blanks in record 3 for tag 260

 field does not end in end of field character in tag 300 in record 3
 Invalid indicators ".  " forced to blanks in record 3 for tag 300

 field does not end in end of field character in tag 502 in record 3
 Invalid indicators ".  " forced to blanks in record 3 for tag 502

 field does not end in end of field character in tag 504 in record 3
 Invalid indicators ".  " forced to blanks in record 3 for tag 504

 field does not end in end of field character in tag 690 in record 3
 Invalid indicators ". 4" forced to blanks in record 3 for tag 690
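
Here the discrepancy runs the other way: the Leader is two short of the actual 
byte count.  One pattern that gives that result is measuring lengths in 
characters while the data is written as UTF-8 bytes.  The sketch below (Python 
again, purely for illustration) uses one of the author names from the quoted 
message.  Whether that name is the one in record 3 of ck245.dat is an 
assumption, so treat the matching difference of 2 as suggestive only.

```python
# Author name from the quoted examples, with i-acute and a-acute written as
# escapes so the code points are unambiguous.
name = "Valent\u00edn-M\u00e1rquez, Wilfredo"

latin1 = name.encode("latin-1")  # one byte per character (ED and E1 for the accents)
utf8 = name.encode("utf-8")      # two bytes for each accented character

chars = len(name)
utf8_bytes = len(utf8)
shortfall = utf8_bytes - chars   # how far a character-based length falls short

print(chars, len(latin1), utf8_bytes, shortfall)  # 26 26 28 2
```

The <ED> and <E1> shown in the quoted examples correspond to the single-byte 
Latin-1 values for those two characters.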

Anybody have any ideas?

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/
 

> -----Original Message-----
> From: Shieh, Jackie [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, February 19, 2008 10:50 AM
> To: perl4lib@perl.org
> Subject: Help for utf-8 output
> 
> I was wondering if anyone has had a similar experience and has come 
> up with a good solution to the challenge below?
> 
> What I have is an Excel spreadsheet of dissertations, which I 
> have saved as a tab-delimited file (examining the file in 
> TextPad, the diacritics appear to be fine), then read in and 
> output as a UTF-8 MARC file. When I <print> the title field 
> for records whose author field contains diacritics, the 
> title shows the proper indicator values. 
> 
> But when I looked at the MARC file itself, the fields that follow 
> the field containing diacritics are all shifted from their original 
> positions. See attached zip file.  Examples below (the first two 
> have diacritics in a 100 field; in the last one the diacritic is 
> in 245 subfield b):
> 
> 001     diss 34001
> 100 1  _aP<E9>rez, Nancy L.
> 245     _aSynchronic and Diachronic Matlatzinkan Phonology.
> 
> 001     diss 34042
> 100 1  _aValent<ED>n-M<E1>rquez, Wilfredo
> 245     _aDoing being boricua :
> 
> 001     diss 33892
> 100 1   _aDavis, Jennifer M.
> 245 14 _aThe Functional Complexities of Inherited Cardiac 
> Troponin I Mutations :
>             _bIdentification of Ca<B2>+ Independent 
> Contractile Dysfunction.
> 
> I would greatly appreciate any suggestions for solving this. 
> Thank you most kindly. 
> 
> Regards, 
>  
> --Jackie 
>  
> |Jackie Shieh
> |Data Loads & Development
> |Harlan Hatcher Graduate Library
> |University of Michigan
> |920 North University
> |Ann Arbor, MI 48109-1205
> |Phone: 734.763.6070 FAX: 734.615.9788
> |E-mail: JShieh [AT] umich [DOT] edu
> 

Attachment: test.mrc
Description: test.mrc
