I suggest breaking this into two problems.
1) Check the Unicode/UTF-8 FAQ for Perl 5.8.8, or whichever version is
appropriate (perldoc.perl.org).
It sounds like, in your case, multibyte characters are being handled as
1-byte characters because the data was read, or forced, as raw bytes at some
point.
2) If reading the data differently doesn't fix it, you need to fix the strings
containing these characters yourself. Either (2a) apply the fix from (1)
before Twig parses the data, or (2b) have Twig apply it in place to each
element/text() you're extracting, and also to any attributes you're keeping.
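A minimal sketch of option (2b), assuming Bill's diagnosis is right and each byte of a UTF-8 sequence has ended up as its own 1-byte (Latin-1) character. The helper name fix_bytes, the sample text, and the %atts hash are hypothetical stand-ins for data pulled out of a twig; only encode/decode from the core Encode module are real APIs here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of XML::Twig): repairs a string whose
# UTF-8 bytes were each treated as a separate 1-byte (Latin-1) character.
sub fix_bytes {
    my ($text) = @_;
    # Recover the original bytes; every character here is below 256.
    my $bytes = encode('iso-8859-1', $text);
    # Decode those bytes as UTF-8, so "\xE2\x80\x9C" becomes U+201C.
    return decode('UTF-8', $bytes);
}

# Hypothetical element text and attributes as they might look after being
# read raw: the byte sequences for curly double quotes and an apostrophe.
my $text = "\xE2\x80\x9CHi\xE2\x80\x9D";
my %atts = (title => "Alex\xE2\x80\x99s");

# (2b): apply the fix in place to the text and to every kept attribute.
# values() returns aliases, so assigning to $_ updates the hash itself.
$text = fix_bytes($text);
$_ = fix_bytes($_) for values %atts;

# Print characters (not bytes) as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text $atts{title}\n";
```

Only apply this to strings you know are mis-decoded: run it on text that already holds real characters (such as U+201C) and encode would turn them into question marks. Inside XML::Twig this would presumably be called from a handler on each element you keep (reading with text and writing back with set_text, and likewise att/set_att for attributes); that wiring is not shown or tested here.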
Bill @ XML2007 /
Bill, typing with thumbs
- Original Message -
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
To: boston-pm@mail.pm.org boston-pm@mail.pm.org
Sent: Wed Dec 05 14:21:55 2007
Subject: [Boston.pm] converting utf-8 to unicode from XML text gathered by XML::Twig
Hi All,
I am currently using XML::Twig to read in some XML.
This XML's text is in utf-8.
So there are smart-quotes and such in there.
I need to unicode-ify the text.
I tried using most of the methods that are part of XML::Twig, but came up
dry.
The best I could do is convert all unsupported chars to question marks.
Without any XML::Twig conversion the smart quotes come out looking
like: “ ” or ’
I tried doing a simple $val =~ s/’/'/gs;
But that didn't work either.
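One plausible reason the substitution fails, offered as an assumption rather than a confirmed diagnosis: if $val already holds decoded characters (one character per smart quote), a literal ’ typed into a script without "use utf8" is three bytes, so it can never match. The sketch below uses \x{...} escapes instead; the sample value is hypothetical.

```perl
use strict;
use warnings;

# Assumption: $val already holds decoded characters (one character per
# smart quote), not raw UTF-8 bytes.
my $val = "\x{201C}It\x{2019}s here\x{201D}";

# Without "use utf8", a literal ’ in the source is the three bytes
# \xE2\x80\x99, which cannot match the single character U+2019.
# Escapes name the characters explicitly instead.
(my $ascii = $val) =~ s/[\x{2018}\x{2019}]/'/g;    # single smart quotes
$ascii =~ s/[\x{201C}\x{201D}]/"/g;                  # double smart quotes

print "$ascii\n";
```

If the goal is to keep the smart quotes rather than flatten them to ASCII, setting binmode(STDOUT, ':encoding(UTF-8)') before printing $val is the other route, provided whatever displays the output also expects UTF-8.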
Does anyone have any suggestions on how I can do this conversion either
manually OR with XML::Twig methods?
Thanks.
--Alex
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm