Thanks so much for all the great tips!! Have a great week.

Best,
--Jackie

-----Original Message-----
From: Galen Charlton [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 21, 2008 2:33 PM
To: Shieh, Jackie
Cc: perl4lib@perl.org

Hi Jackie,

On Tue, Feb 19, 2008 at 10:49 AM, Shieh, Jackie <[EMAIL PROTECTED]> wrote:
> What I have is an Excel spreadsheet for dissertations which I have
> saved as a tab delimited file (examining the file in TextPad, the
> diacritics appears to be fine), then read in and output the file as a
> utf-8 MARC file. I <print> title field confirming author field that
> contains diacritics with the title showing proper indicator values.

It looks like your input file is in ISO-8859-1. To have Perl do the
character conversion to UTF-8, you can either assert that the input
filehandle is in the ISO-8859-1 character set by doing this:

    use Encode;
    binmode IN, ":encoding(iso-8859-1)";

or do the conversion explicitly like this:

    use Encode;
    my $converted_line = decode("iso-8859-1", $line);

Because the conversion is not done before the data from the string is
added to the MARC::Record object, MARC::Record->as_usmarc() builds the
MARC blob from ISO-8859-1 strings and calculates the record length for
the leader by counting the resulting number of bytes. The resulting MARC
blob is an ISO-8859-1 string, with Perl's internal utf8 flag turned off.
However, during the print to your output filehandle, Perl automatically
converts the MARC blob string from ISO-8859-1 to UTF-8 because the
output filehandle is binmode :utf8. Each accented character then takes
two bytes instead of one, which results in the discrepancy between
Leader/00-04 and the length of the output MARC blob.

Regards,

Galen
-- 
Galen Charlton
Koha Application Developer
LibLime
[EMAIL PROTECTED]
p: 1-888-564-2457 x709
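
The length mismatch described above can be reproduced without MARC::Record at all. The following is only a minimal sketch: the helper name byte_lengths and the sample string "Caf\xE9" are made up for illustration, and encode("UTF-8", ...) stands in for what a :utf8 output layer does when the string is printed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of MARC::Record): given a string of raw
# ISO-8859-1 bytes, return the length Perl counts before output and the
# number of bytes that end up written once the string is upgraded to UTF-8.
sub byte_lengths {
    my ($latin1_bytes) = @_;

    # No utf8 flag on the string, so length() counts one per byte.
    my $counted = length($latin1_bytes);

    # Stand-in for printing through a :utf8 layer: each byte is treated as
    # a Latin-1 character and re-encoded, so accented letters become 2 bytes.
    my $written = length(encode("UTF-8", $latin1_bytes));

    return ($counted, $written);
}

# Illustrative sample: "Café" with the e-acute as the single byte 0xE9.
my ($counted, $written) = byte_lengths("Caf\xE9");
printf "counted before output: %d, bytes written: %d\n", $counted, $written;
# prints "counted before output: 4, bytes written: 5"
```

A record length computed from the first number will be short by one byte for every accented character in the record, which matches the kind of Leader/00-04 discrepancy discussed in the message.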