Thanks so much for all the great tips!!  Have a great week.
 
Best, 
 
--Jackie


-----Original Message-----
From: Galen Charlton [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 21, 2008 2:33 PM
To: Shieh, Jackie
Cc: perl4lib@perl.org

Hi Jackie,

On Tue, Feb 19, 2008 at 10:49 AM, Shieh, Jackie <[EMAIL PROTECTED]>
wrote:
> What I have is an Excel spreadsheet for dissertations which I have
> saved as a tab-delimited file (examining the file in TextPad, the
> diacritics appear to be fine), then read in and output the file as a
> utf-8 MARC file. I <print> title field confirming author field that
> contains diacritics with the title showing proper indicator values.

It looks like your input file is in ISO-8859-1.  In order to have Perl
do the character conversion to UTF-8, you can either assert that the
input filehandle is in the ISO-8859-1 character set by doing this:

use Encode;
# Declare that bytes read from IN are ISO-8859-1; Perl decodes them on read.
binmode IN, ":encoding(iso-8859-1)";

or do the conversion explicitly like this:

use Encode;
# Decode the raw ISO-8859-1 bytes in $line into a Perl character string.
my $converted_line = decode("iso-8859-1", $line);
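
As a rough check that the two approaches are equivalent, here is a minimal sketch using only core modules. The sample string and the in-memory filehandle are assumptions standing in for your real tab-delimited file:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample line: "Müller" as raw ISO-8859-1 bytes
# (0xFC is u-umlaut in that character set).
my $raw = "M\xFCller\n";

# Approach 1: declare the encoding on the input filehandle.
# An in-memory filehandle stands in for the real input file.
open(my $in, '<', \$raw) or die "cannot open in-memory file: $!";
binmode $in, ':encoding(iso-8859-1)';
my $from_layer = <$in>;
close $in;

# Approach 2: take the raw bytes and decode them explicitly.
my $from_decode = decode('iso-8859-1', $raw);

print "both approaches give the same string\n"
    if $from_layer eq $from_decode;
printf "length in characters: %d\n", length $from_decode;
```

Either way you end up with a Perl character string, so pick whichever fits your script better; the filehandle layer is usually less work if every line of the file needs converting.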

Because the conversion is not done before the data from the string is
added to the MARC::Record object, MARC::Record->as_usmarc() builds the
MARC blob from ISO-8859-1 strings and calculates the record length for
the leader by counting the resulting number of bytes.  The resulting MARC
blob is an ISO-8859-1 string, with Perl's internal utf8 flag turned off.

However, during the print to your output filehandle, Perl automatically
converts the MARC blob string from ISO-8859-1 to UTF-8 because the
output filehandle has the :utf8 layer set via binmode.  Each character
with a diacritic grows from one byte to two in that conversion, resulting
in the discrepancy between Leader/00-04 and the length of the output MARC
blob.
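
To make the length mismatch concrete, here is a small sketch that does not use MARC::Record at all. The $blob string is a hypothetical stand-in for the MARC blob, and the encode() call is only an approximation of what the :utf8 output layer does on print:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for a MARC blob built from undecoded ISO-8859-1
# data: a byte string with one non-ASCII byte (0xE9, e-acute).
my $blob = "Caf\xE9";

# The length that would be recorded in the leader: one per byte.
my $leader_length = length $blob;

# Roughly what a :utf8 output layer writes: every character above 0x7F
# is expanded into a multi-byte UTF-8 sequence.
my $written = encode('UTF-8', $blob);
my $written_length = length $written;

print "leader says $leader_length bytes, file holds $written_length bytes\n";
```

With one accented character the output is one byte longer than the leader claims; a real record with several diacritics would be off by one byte per diacritic.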

Regards,

Galen
--
Galen Charlton
Koha Application Developer
LibLime
[EMAIL PROTECTED]
p: 1-888-564-2457 x709

