Re: CGI and character encoding

Cees Hek Thu, 24 Feb 2011 14:34:26 -0800

Hi André,

There is a perlmonks post from a few years ago that explains one way
of automating this with CGI.pm.  I've used this for several years now
without problems.


http://www.perlmonks.org/?node_id=651574
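
To illustrate the general idea only (this is not the code from the perlmonks post), here is a minimal sketch that wraps parameter access so every value is decoded once, at the boundary. The %raw_params hash and the decoded_param helper are hypothetical stand-ins for the data behind $cgi->param, so the example runs with nothing but the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the raw form data: the UTF-8 encoded *bytes*
# for "ÄäÖöÜü", as they arrive from the browser (no utf8 flag set).
my %raw_params = (
    'de-utf8' => "\xC3\x84\xC3\xA4\xC3\x96\xC3\xB6\xC3\x9C\xC3\xBC",
);

# Hypothetical wrapper: decode each value once, so the rest of the
# application only ever sees Perl character strings.
sub decoded_param {
    my ($name) = @_;
    my $bytes = $raw_params{$name};
    return unless defined $bytes;
    return decode('UTF-8', $bytes);
}

my $chars = decoded_param('de-utf8');
printf "%d characters, utf8 flag %s\n",
    length($chars), (utf8::is_utf8($chars) ? 'on' : 'off');
```

In a real script the wrapper would call $cgi->param instead of reading a hash, and it would need to leave binary upload data alone.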

Just remember that decoding params is only one part of dealing with
UTF-8.  You need to worry about any data coming into or going out of
your app (reading files, retrieving from the DB, sending HTML out to the
browser, etc.).  The following wikibook has some great information
on how to deal with UTF-8 in your Perl applications (and it also
includes the CGI.pm hack from Rhesa described in the perlmonks post
linked above).

http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8
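
As a rough sketch of why every boundary matters, the following shows the same bytes written through an encoding layer twice: once undecoded (which reproduces the double-encoding you describe) and once decoded first. The written_bytes helper is hypothetical, and an in-memory filehandle stands in for "myfile" so the example is self-contained.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded bytes for "Ää", as they might arrive from a form post.
my $bytes = "\xC3\x84\xC3\xA4";

# Hypothetical helper: write a string through a UTF-8 output layer into
# an in-memory buffer and return the bytes that would land on disk.
sub written_bytes {
    my ($string) = @_;
    my $buffer = '';
    open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "open: $!";
    print {$fh} $string;
    close($fh) or die "close: $!";
    return $buffer;
}

# Undecoded input: each byte is treated as a Latin-1 character and
# encoded again, which doubles the byte count.
my $double = written_bytes($bytes);

# Decoded input: the layer encodes each character exactly once.
my $single = written_bytes(decode('UTF-8', $bytes));

printf "without decode: %d bytes, with decode: %d bytes\n",
    length($double), length($single);
```

The same rule applies in the other direction: bytes read from files or a database need decoding (or an input layer) before they are treated as text.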

Cheers,

Cees Hek


On Fri, Feb 25, 2011 at 8:31 AM, André Warnier <a...@ice-sa.com> wrote:
> Hi.
>
> I wonder if someone here can give me a clue as to where to look...
>
> I am using
> Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 mod_jk/1.2.26 PHP/5.2.6-1+lenny9 with
> Suhosin-Patch mod_ssl/2.2.9 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0
> mod_perl/2.0.4 Perl/v5.10.0
>
> perl -MCGI -e 'print $CGI::VERSION'
> 3.52
>
> A perl cgi-bin script running under mod_perl, receives posted form
> parameters from a form defined as such :
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
>       "http://www.w3.org/TR/html4/loose.dtd">
> <html>
>        <head>
>        <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
> ....
>  <body>
>        <form action="/litfdm/litfdm.pl" name="form"
>                enctype="multipart/form-data" charset="UTF-8" method="POST">
> ...
> <input name="de-utf8" type="hidden" value="ÄäÖöÜü">
> ...
>
> (Note: the HTML page itself has been saved as UTF-8 by a UTF-8 aware
> editor)
>
>
> When I retrieve the above hidden field using
>
> my $chars = $cgi->param('de-utf8');
>
> the variable $chars does contain the proper UTF-8 encoded *bytes* for the
> above string (in other words, 2 bytes per character e.g.), but it arrives
> into the script /without/ the perl "utf8" flag set.
>
> If I then use this value to print to a filehandle opened as such :
>
> open(FH,'>:utf8',"myfile");
> print FH $chars,"\n";
>
> It comes out of course as .. well, I cannot type this on my keyboard, but
> anyone aware of double-encoding issues can imagine the "A-tilde Copyright
> A-tilde squiggle.. " result.
>
> I can of course convert it, by using
>
> $chars = Encode::decode('utf8',$cgi->param('de-utf8'));
>
> but it is a p.i.t.a. and I would like to know if there is a way to retrieve
> the posted value directly as UTF-8, and if yes what this depends on.
> (I cannot find a setting for instance in the CGI.pm module documentation.)
>
>
> Thanks.
> André
>
> P.S.
> Unfortunately, when the browser (Firefox 3.5.3) is posting this data to the
> server, it is posting it as something like
>
> ...
> Content-Type    multipart/form-data;
> boundary=---------------------------326972172326727
> ...
>
> -----------------------------326972172326727
> Content-Disposition: form-data; name="de-utf8"
>
> Ã„Ã¤Ã–Ã¶ÃœÃ¼
> -----------------------------326972172326727
>
> which means that there is no charset header to the parts either.
>
