Hello Jean-Christophe,
On 12.06.2013 at 16:44, Jean-Christophe Boggio wrote:
> Hello,
>
> Can someone help me understand what could cause this :
>
> warn "\$content : ".(utf8::is_utf8($content) ? "utf8" : "not utf8");
> warn "\$ticketdata[0]->[0] : ".(utf8::is_utf8($ticketdata[0]->[0]) ? "utf8" :
> "not utf8");
> warn "content4=$content";
> if ($ticketdata[0]->[0] ne $content) {
> warn "content5=$content";
> #
> warn "content6=$content stored=".$ticketdata[0]->[0];
> warn "content7=$content";
> }
>
[...]
> I guess the problem comes from the fact that on the same line I have one
> utf-8 variable and one non-utf8 one.
>
> $content comes from $fdat{content} (not marked as utf8 while the page
> encoding is declared and recognized as utf-8).
>
> What can I do to force embperl to always set the utf-8 flag on $fdat{...} ?
>
> If you know a way of telling Apache/EmbPerl that no encoding other than UTF-8
> exist in the world, I'll take it. And it's not a problem if I'm incompatible
> with anything.
I think your guess is right: when a utf8-flagged string is combined with an
unflagged one (for example by concatenation), perl upgrades the unflagged one
to utf8 as well, and it treats its bytes as ISO-8859-1 for that conversion!
If the unflagged string actually holds UTF-8 bytes, it gets encoded a second
time and is destroyed. The same thing happens when you use FreezeThaw or
Data::Dumper, which is bad for serializing and storing something in a
database :-(
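To illustrate the effect, here is a minimal sketch. The sample strings are my
own assumption and not taken from your code; it only shows what perl does when
a flagged and an unflagged string meet in one expression:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed sample data: "é" as raw UTF-8 bytes (flag off), the way form data
# might arrive, and the same text as a decoded character string (flag on).
my $bytes = "\xC3\xA9";
my $chars = decode('UTF-8', $bytes);

# Concatenation upgrades $bytes byte by byte, as if it were ISO-8859-1.
my $mixed = $chars . $bytes;

# $mixed now holds three characters: U+00E9, U+00C3, U+00A9.
printf "length(mixed) = %d\n", length($mixed);

# Encoding for output shows the double-encoded second half.
my $out = encode('UTF-8', $mixed);
printf "bytes = %s\n", join ' ', map { sprintf '%02X', ord($_) } split //, $out;
```

The last line should print C3 A9 followed by C3 83 C2 A9: the first "é" is
fine, the second one has been encoded twice.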
Embperl decides for itself whether the %fdat parameters are utf8 or not. I
don't know how it does so (maybe Gerald could say something about that), but
we have had a lot of "funny" things in the past regarding this problem. Our
website uses several different encodings (neither UTF-8 nor ISO-8859-1), so we
ran into trouble. We implemented our own "thaw" method, which tries to thaw
the data and, if that fails, converts the data to utf8 and thaws it again...
A solution for you could be: use "$content=decode('UTF-8',$content)" (decode
comes from the Encode module) to flag your variable, or walk over %fdat and do
it for all values which are not already utf8-flagged. After that, you should
have UTF8-only variables and everything works as expected.
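The walk over %fdat could look roughly like this. It is only a sketch: the
hash below is a stand-in so that the snippet runs outside Embperl, and I
assume that every value is a plain scalar holding UTF-8 bytes:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for Embperl's %fdat (assumption: plain scalar values only).
my %fdat = (
    content => "caf\xC3\xA9",   # raw UTF-8 bytes, flag off
    title   => "plain ascii",
);

for my $key (keys %fdat) {
    my $value = $fdat{$key};             # work on a copy
    next unless defined $value;
    next if utf8::is_utf8($value);       # already a character string
    # FB_CROAK makes decode die on malformed UTF-8 instead of silently
    # substituting characters; it may modify its argument, hence the copy.
    $fdat{$key} = decode('UTF-8', $value, Encode::FB_CROAK);
}

printf "content has %d characters\n", length($fdat{content});
```

If some fields can legitimately contain other encodings, wrapping the decode
call in an eval and keeping the original value on failure might be the safer
choice.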
One little additional comment: using non-utf8-flagged variables with utf8
content (like your $content variable) breaks a lot of perl stuff: lc, uc,
cmp, le, gt, length, sort, ....
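A small sketch of what "breaks" means for length and uc. The sample word is
an assumption of mine, and the snippet assumes that neither "use locale" nor
the unicode_strings feature is active:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA9t\xC3\xA9";        # "été" as UTF-8 bytes, flag off
my $chars = decode('UTF-8', $bytes);    # the same text as characters

# length counts bytes for the unflagged string, characters for the other.
printf "length: bytes=%d chars=%d\n", length($bytes), length($chars);

# uc only touches the ASCII "t" in the byte string; the accented letters
# are upper-cased only in the character string.
my $uc_bytes = uc($bytes);
my $uc_chars = uc($chars);

printf "uc changed %d of 5 bytes\n",
    scalar grep { substr($bytes, $_, 1) ne substr($uc_bytes, $_, 1) } 0 .. 4;
```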
With best regards,
Dirk Melchers
/// IT/Software-Development ///
NUREG GmbH ///
Dorfäckerstraße 31 | 90427 Nürnberg | Germany
Tel. +49-911-32002-256 | Fax +49-911-32002-299
Mobil +49-172-9354670 | www.nureg.de
Nürnberg HRB 22653 | USt.ID DE 814 685 653
Geschäftsführer: Michael Schmidt, Stefan Boas