Thomas,

I've had a similar experience and will provide my solution below. I'm not sure it's optimal, but it works for me. I'm working with a file, not an HTTP POST. (In addition to what is below, I would suggest looking at how you specify the charset encoding of your POST to be sure it is what you think it is. That part is beyond me.)

As far as I can tell, Perl works with text internally as UTF-8 and can mangle diacritics handed to it as raw bytes in another character set. The key is that you convert TWICE: first decode the data on its way into Perl, then encode it once more right before you put it in the database. As soon as Perl does any transformations on text, it seems to go back to UTF-8. When I leave off either the first or the second conversion, I get mangled diacritics.
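To make the two conversions concrete, here is a minimal sketch of what I believe is going on with the stray "Â". The section sign and the helper name hex_bytes are only my illustration, not anything from the original POST; any character between U+00A0 and U+00BF should behave the same way.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as space-separated hex, e.g. "C2 A7".
sub hex_bytes { join " ", map { sprintf("%02X", ord($_)) } split(//, $_[0]) }

# The section sign U+00A7 as it would arrive in a UTF-8 body: two bytes.
my $utf8_bytes = "\xC2\xA7";

# First conversion: decode on the way in, giving one character.
my $text = decode("UTF-8", $utf8_bytes);

# Second conversion: encode on the way out, giving one Latin-1 byte.
my $latin1_bytes = encode("iso-8859-1", $text);

# Skipping the UTF-8 decode: the same two bytes read as Latin-1 are two
# characters, U+00C2 (the stray "Â") followed by U+00A7.
my $misread = decode("iso-8859-1", $utf8_bytes);

print "characters after decode: ", length($text), "\n";
print "latin1 bytes to store:   ", hex_bytes($latin1_bytes), "\n";
print "characters when misread: ", length($misread), "\n";
```

If the bytes are stored without the first step, the database sees two Latin-1 characters where there should be one, which would match the symptom described in the original message.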

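use Encode;
my $file = "file data in iso-8859-1 or LATIN1";   # path of the file to read

# The data could be a string too, i.e., what you receive from your POST, but then I think you would call decode() on the string (for example decode("UTF-8", $string) for UTF-8 input) instead of opening a file with an encoding layer.

open(my $fh, "<:encoding(iso-8859-1)", $file) or die "Cannot open $file: $!";
my $string = do { local $/; <$fh> };   # read the whole file as decoded text
close $fh;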

# This gets the data in cleanly. You can transform the text as you please, but Perl is then holding it in its internal (UTF-8) form. So *right before* you put it in your SQL query, encode your $string into the proper encoding for your database.

$string = encode("iso-8859-1", $string);

# It is probably a good idea to use a bound parameter, i.e., a ? in the query, and pass $string as a parameter to execute().

At least in my case, this solves the problem.
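Put together, the whole round trip might look like the sketch below. The temp file, its contents, and the uc() step are stand-ins I made up so the example runs by itself; the database call is left as a comment because it depends on your own DBI handle and table.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # uc() follows Unicode rules for every string
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical input: a temp file holding "café" as Latin-1 bytes.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "caf\xE9\n";
close $tmp_fh or die "Cannot close $path: $!";

# First conversion: decode from Latin-1 while reading.
open(my $in, "<:encoding(iso-8859-1)", $path) or die "Cannot open $path: $!";
my $string = <$in>;
close $in;
chomp $string;

# Any transformation now works on characters, not bytes.
$string = uc $string;

# Second conversion: back to Latin-1 bytes right before the query.
my $for_db = encode("iso-8859-1", $string);

# With DBI this would go in as a bound parameter, e.g.
#   $sth->execute($for_db);
print length($for_db), " bytes ready for the latin1 column\n";
```

Encoding as the very last step means everything in between can treat the text as plain characters, which is the reason I do it right before the query and not earlier.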

-Chris Cosner

[EMAIL PROTECTED] wrote:
Hi List,

I have been trying really hard for the last 2 days to get around the problem
of converting UTF-8 to ISO-8859-1.

I receive a POST of a UTF-8 XML document; the declaration is okay, and the
document is sent by a Windows server.

I have tried to convert the document to Latin1 (ISO-8859-1) in every way I
can imagine, but nothing really modifies the utf flag.

When I change the text to iso-8859-1 and put it into my database (utf8 as
well as latin1), I get this sign " Â " before the sign I want to save in my
database!

When I print the string on the screen of the server (logfile), it shows me
that the data comes in with the utf-8 flag set (the Â sign, I guess). After
transforming it I print it with Data::Dump, and the signs become something
like \x{c2}\x.... I guess the \x{c2} is the special character set by utf.
Then I transform the string using Unicode::String, and the string becomes
Latin1 in the logfile, but not in my database: in the UTF-8 table the signs
are good, but in the latin1 table the signs become weird.

Maybe someone has a hint on how to convert an XML::Simple document (received
by POST) in UTF-8 with the flag set to a simple Latin1 document, so that I
can save it into my latin1 table!

Thanks for any help


Ciao Thomas





--
Chris Cosner

Systems Administrator
Stanford University Press
1450 Page Mill Road
Palo Alto, CA 94304
(650) 724-7276
[EMAIL PROTECTED]
http://www.sup.org
