On Fri, 07 Jan 2005 08:53:40 +0100, Ron Davies <[EMAIL PROTECTED]> wrote:
> At 07:50 7/01/2005, [EMAIL PROTECTED] wrote:
> >Does anyone know of any work underway to adapt MARC::Record for utf-8
> >encoding ?
I'm in the process of updating MARC::File::XML to support Unicode. I was hoping to have the changes in CVS about a month ago, but I've had no time until now. Once that is done I'll look into what it will take to do the same for MARC::File::USMARC. If you'd like to look into it, you'll be able to grab an updated MARC::File::XML from SourceForge's CVS some time this afternoon. I'll announce it here when I get CVS updated, and post a link to the anonymous CVS instructions from the project page.

> I will have a similar project in a few months' time, converting a whole
> bunch of processing from MARC-8 to UTF-8. I would be very happy to assist
> in testing or development of a UTF-8 capability for MARC::Record. Is the
> problem listed in rt.cpan.org (http://rt.cpan.org/NoAuth/Bug.html?id=3707)
> the only known issue?

The way I'm getting around issues like this in MARC::File::XML is to strip the utf8 flag off the data using Encode::encode(), which gives me the raw bytes of the string. With the flag off, length() works correctly, writing to a filehandle doesn't complain about wide characters, and C-based XML libraries (libxml2 in my case) see the correct data. The only drawback is that you can't use any Unicode-aware Perl functions on the strings; everything is treated as 8-bit extended ASCII (or Latin-1, or whatever non-Unicode codepage your locale is set up for). I can't find a case where that's actually a problem other than locale-specific sorting, which is not an issue for XML, since XML is only used as an input/output format; other software, usually written in C, handles the actual manipulation of the data.

... Not that that applies *directly* to your question ... :)

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org
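P.S. For anyone who wants to see the encode() trick in isolation, here's a minimal sketch (nothing MARC-specific, just core Encode behavior):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# decode() produces a string with the internal utf8 flag set,
# so length() counts characters.
my $chars = decode('UTF-8', "\xC3\xA9");   # e-acute, one character
print length($chars), "\n";                # 1

# encode() strips the utf8 flag and hands back the raw UTF-8 octets,
# so length() counts bytes -- which is what record length fields and
# C libraries like libxml2 expect.
my $bytes = encode('UTF-8', $chars);
print length($bytes), "\n";                # 2
print Encode::is_utf8($bytes) ? "flagged\n" : "raw bytes\n";   # raw bytes
```

Printing $bytes to a filehandle produces no "wide character" warning, because there are no wide characters left, only octets.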