> -----Original Message-----
> From: Shelley Doljack [mailto:sdolj...@stanford.edu]
> Sent: 31 July 2012 20:18
>
> The problem was I wasn't telling perl to output UTF-8. Now that I added
> binmode(FILE, ':utf8') to my script, the problem is fixed. However, it sounds
> like once I set binmode to UTF-8 everything will be interpreted as such, even
> when the record is in MARC-8. Is that right? So this means that I can only use
> my script with a file of records where all of them are encoded in UTF-8. If I
> want to run the script against a file with all MARC-8 encoding, then I'd need
> to remove the binmode line.

It depends on how much manipulation of the records you are doing in the script. One approach is to use binmode(FILE, ':raw'); for both input and output. Perl will then keep the bytes of the records exactly as they are. You won't be able to test for exotic characters so easily, and amending field content would be inadvisable, but if all you are doing is something like reading in the records and filtering out any that have no 245 field, or something fairly basic like that, this could be the best approach.

The MARC::Record module does not seem to care how the records are encoded. The character set in use only becomes an issue once you start altering field content, testing field content, or adding fields. Removing fields would be fine too.

MARC-8 can be very complex, particularly if other code tables like CJK are invoked, or even just Greek or Cyrillic. If you were manipulating field content in that kind of way, then converting everything to UTF-8 would make things very much easier.

Matthew

--
Matthew Phillips
Electronic Systems Librarian, Durham University
Durham University Library, Stockton Road, Durham, DH1 3LY
+44 (0)191 334 2941
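
To illustrate the point about ':raw' versus ':utf8' on output, here is a minimal sketch using only core Perl. It is not taken from the script under discussion: the helper names write_with_layer and hex_dump are hypothetical, the sample bytes are made up, and an in-memory filehandle stands in for FILE. It writes the same byte string through each layer and shows what ends up in the output.

```perl
use strict;
use warnings;

# Hypothetical helper: write $data to an in-memory filehandle through the
# given PerlIO layer and return the bytes that were actually written.
sub write_with_layer {
    my ($layer, $data) = @_;
    my $buffer = '';
    open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";
    binmode($out, $layer);
    print {$out} $data;
    close($out) or die "Cannot close in-memory handle: $!";
    return $buffer;
}

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

# Made-up sample: a byte string containing one high byte (0xE2), standing in
# for record data that was read in as raw bytes and is not UTF-8.
my $marc8_bytes = "caf\xE2e";

# ':raw' passes the five bytes through untouched.
print ':raw  -> ', hex_dump(write_with_layer(':raw',  $marc8_bytes)), "\n";
# prints: :raw  -> 63 61 66 e2 65

# ':utf8' treats 0xE2 as a character and re-encodes it as two bytes.
print ':utf8 -> ', hex_dump(write_with_layer(':utf8', $marc8_bytes)), "\n";
# prints: :utf8 -> 63 61 66 c3 a2 65
```

With ':utf8' the output grows by a byte, so for data that is really MARC-8 the content is no longer what was read in, and the byte lengths recorded in the record would presumably no longer match. With ':raw' on both input and output, nothing changes.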