MARC::Record and UTF-8 Perl version
Hi Ed,

> How would people feel about the next version of MARC-Record (perhaps a
> v2.0) which handled utf8 properly and required a modern perl?

Entirely agree with Michael Doran: Definitely a *good* thing.

> Perhaps if people could respond to the list (or me if you prefer) with the
> version of Perl that you use MARC::Record with I could keep tallies and
> report back to the list.

- I am currently using MARC::Record 1.34 with Perl 5.6.0.
- I'll soon be migrating to another machine running Aleph with utf8 data with Perl 5.8.2. I will install the latest stable version of MARC::Record on this machine.

Regards,
Ian

Ian Hamilton
Library Systems Administrator
European Commission - Directorate General for Education and Culture
EAC C4 - Central Library Unit
+32-2-295.24.60 (direct phone)
+32-2-299.91.89 (fax)
http://europa.eu.int/comm/dgs/education_culture/index_en.htm
http://europa.eu.int/comm/libraries/index.htm
http://europa.eu.int/eclas/
RE: MARC::Record and UTF-8
From: Ron Davies [mailto:[EMAIL PROTECTED]
Sent: Friday, January 07, 2005 2:54 AM
Subject: Re: MARC::Record and UTF-8

> At 07:50 7/01/2005, [EMAIL PROTECTED] wrote:
>> Does anyone know of any work underway to adapt MARC::Record for utf-8
>> encoding?
>
> I will have a similar project in a few months' time, converting a whole
> bunch of processing from MARC-8 to UTF-8. I would be very happy to assist
> in testing or development of a UTF-8 capability for MARC::Record.
> Is the problem listed in

This is not a Perl solution, but if you are just looking to convert MARC-8 records to UTF-8 records you can use Terry Reese's MarcEdit program. Under its MARC Tools section it allows you to do batch conversions. You can download it from:

http://oregonstate.edu/~reeset/marcedit/html/downloads.html

Andy.
Re: MARC::Record and UTF-8
On Fri, Jan 07, 2005 at 08:53:40AM +0100, Ron Davies wrote:
> I will have a similar project in a few months' time, converting a whole
> bunch of processing from MARC-8 to UTF-8. I would be very happy to assist
> in testing or development of a UTF-8 capability for MARC::Record. Is the
> problem listed in rt.cpan.org (http://rt.cpan.org/NoAuth/Bug.html?id=3707)
> the only known issue?

Correct. A few months ago I hacked at MARC::Record to try to get it to use utf8 on platforms that support perl >= 5.8. I backed out these changes because my initial implementation proved to be faulty. Essentially I treated all data as utf8 if perl was >= 5.8 ... but this didn't work out, since some valid MARC-8 data is invalid UTF-8. I was bummed.

The problem (as Ron correctly points out) is that the Perl function length() is being used to construct the byte offsets in the record directory. This works fine when a character is a byte, but breaks badly on utf8 data, since a character can be more than one byte. Fortunately there is the bytes pragma, introduced in 5.6, which has a bytes::length() function that computes the correct length. I believe that bytes::length() was introduced somewhere in 5.8; it was added later.

I wanted MARC::Record to do the right thing based on position 9 in the leader. But I don't know if this is feasible. Perhaps simply having a flag when you create the MARC::Record, MARC::Batch or MARC::File::USMARC objects will be enough.

    my $batch = MARC::Batch->new( 'USMARC', 'file.dat', utf8 => 1 );

or

    my $record = MARC::Record->new( utf8 => 1 );

Comments, thoughts, hacks welcome :-) This shouldn't be too tough, it just needs some concentrated attention.

//Ed
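To make the length() problem above concrete, here is a minimal sketch in Perl. It is not code from MARC::Record: the sample string and the helper leader_says_unicode() are made up for illustration. The sketch assumes only the core bytes pragma and the MARC 21 convention that leader position 09 holds 'a' for UCS/Unicode records and a blank for MARC-8.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length() without switching on byte semantics

# "Dvorak" with its diacritics, written with escapes. U+0159 is above 0xFF,
# so Perl stores this string internally as UTF-8.
my $name = "Dvo\x{159}\x{e1}k";

my $chars  = length($name);           # counts characters
my $octets = bytes::length($name);    # counts bytes of the internal UTF-8 form

# Hypothetical helper (not part of MARC::Record): true when leader
# position 09 is 'a', the MARC 21 code for UCS/Unicode.
sub leader_says_unicode {
    my ($leader) = @_;
    return substr( $leader, 9, 1 ) eq 'a';
}

printf "%d characters, %d bytes\n", $chars, $octets;
```

A directory built from the character count would be two bytes short for this one field, and every later offset in the record would be shifted by the same amount. Note that bytes::length() reports Perl's internal storage, so it only matches the on-disk size when the string is actually held as UTF-8; encoding explicitly with Encode::encode('UTF-8', ...) and taking length() of the result is another way to get the byte count.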
RE: MARC::Record and UTF-8
From: Ed Summers [mailto:[EMAIL PROTECTED]
Sent: 07 January, 2005 09:56
To: perl4lib@perl.org
Subject: Re: MARC::Record and UTF-8

> On Fri, Jan 07, 2005 at 08:13:08AM -0500, Houghton,Andrew wrote:
>> This is not a Perl solution, but if you are just looking to convert
>> MARC-8 records to UTF-8 records you can use Terry Reese's MarcEdit program.
>
> Does MarcEdit completely map MARC-8 to UTF-8?

Yes it does. I think he uses the LC code table XML document for his conversions. The URL is:

http://www.loc.gov/marc/specifications/codetables.xml

which can be found off the Character Sets: Code Tables page at:

http://www.loc.gov/marc/specifications/specchartables.html

Andy.
Re: MARC::Record and UTF-8
At 07:50 7/01/2005, [EMAIL PROTECTED] wrote:
> Does anyone know of any work underway to adapt MARC::Record for utf-8
> encoding?

I will have a similar project in a few months' time, converting a whole bunch of processing from MARC-8 to UTF-8. I would be very happy to assist in testing or development of a UTF-8 capability for MARC::Record. Is the problem listed in rt.cpan.org (http://rt.cpan.org/NoAuth/Bug.html?id=3707) the only known issue?

Ron

Ron Davies
Information and documentation systems consultant
Av. Baden-Powell 1 Bte 2, 1200 Brussels, Belgium
Email: ron(at)rondavies.be
Tel: +32 (0)2 770 33 51
GSM: +32 (0)484 502 393