I suggest breaking this into two problems.

1) Check the Unicode/UTF-8 FAQ for perl 5.8.8, or whichever version applies
(perldoc.perl.org).
It sounds like you have multibyte characters being handled as 1-byte
characters because the data was read, or forced, raw at some point.

2) If reading the input differently does not fix it, you will need to repair
the strings that contain these characters. Either (2a) apply the fix from (1)
before Twig parses, or (2b) have Twig apply it in place to each
element/text() you are extracting, and also to any attributes you are
keeping.
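
A rough sketch of both steps, using only the core Encode module. This is an
illustration under assumptions, not a tested fix for your data: fix_text is a
made-up helper name (it is not part of XML::Twig), the in-memory filehandle
stands in for your real input file, and the repair only covers UTF-8 bytes
that were treated as 1-byte characters.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# UTF-8 bytes for U+2019 (right single quotation mark), standing in for
# text that reached the program without being decoded.
my $bytes = "\xE2\x80\x99";

# (1) Read through a decoding layer. An in-memory filehandle stands in
# for the real input file here.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
my $from_layer = <$fh>;
close $fh;

# (2) Repair a string that was already read raw. fix_text is a
# hypothetical helper name, not part of XML::Twig.
sub fix_text {
    my ($str) = @_;
    return $str unless defined $str;

    # Characters above 0xFF mean the string is already decoded text.
    return $str if $str =~ /[^\x00-\xFF]/;

    # Work on a byte copy: decode() with FB_CROAK may modify its argument.
    my $copy = $str;
    utf8::downgrade($copy);

    # Strict decode; invalid UTF-8 (e.g. plain Latin-1) throws, and the
    # original string is returned unchanged.
    my $decoded = eval { decode('UTF-8', $copy, FB_CROAK) };
    return defined $decoded ? $decoded : $str;
}

my $fixed = fix_text($bytes);

# Once the quote is a single character, a substitution can match it.
(my $ascii = $fixed) =~ s/\x{2019}/'/g;
```

For (2b), the same helper could be called from a twig handler on $elt->text
and on each value in the hash that $elt->atts returns.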

Bill @ <XML2007 />

Bill, typing with thumbs

----- Original Message -----
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: boston-pm@mail.pm.org <boston-pm@mail.pm.org>
Sent: Wed Dec 05 14:21:55 2007
Subject: [Boston.pm] converting utf-8 to unicode from XML text gathered by XML::Twig

Hi All,
I am currently using XML::Twig to read in some XML.
This XML's text is in utf-8.
So there are smart-quotes and such in there.
I need to unicode-ify the text.
I tried using most of the methods that are part of XML::Twig, but came up
dry.
The best I could do is convert all unsupported chars to question marks.
Without any XML::Twig conversion the smart quotes come out looking
like: “ ” or ’
I tried doing a simple $val =~ s/’/'/gs;
But that didn't work either.

Does anyone have any suggestions on how I can do this conversion either
manually OR with XML::Twig methods?

Thanks.
--Alex
 
_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm

 