FWIW, with CGI.pm I always iterate through the params and run Encode::decode
on each one with the appropriate encoding, making an exception for anything
binary (file uploads etc.).
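
A minimal sketch of that loop, in case it helps. The helper name decode_params
and its hash-based interface are made up for illustration (they are not part of
CGI.pm), and UTF-8 is only an assumed default encoding:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of CGI.pm.
#   $raw      : hashref of field name => arrayref of raw byte strings
#   $binary   : hashref whose keys are fields to leave untouched (uploads etc.)
#   $encoding : any name Encode understands; 'UTF-8' is an assumption here
sub decode_params {
    my ($raw, $binary, $encoding) = @_;
    $encoding ||= 'UTF-8';

    my %decoded;
    for my $name (keys %$raw) {
        if ($binary->{$name}) {
            # Binary data stays as bytes.
            $decoded{$name} = [ @{ $raw->{$name} } ];
            next;
        }
        $decoded{$name} = [
            map { my $bytes = $_; Encode::decode($encoding, $bytes) }
                @{ $raw->{$name} }
        ];
    }
    return \%decoded;
}
```

With a CGI.pm object, $raw could be built with something like
`{ map { $_ => [ $cgi->param($_) ] } $cgi->param }`. CGI.pm also documents a
`-utf8` import pragma that decodes every parameter, but its own documentation
warns that it interferes with binary uploads, which is why I prefer selecting
the fields by hand.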


-----Original Message-----
From: André Warnier [mailto:a...@ice-sa.com] 
Sent: Thursday, February 24, 2011 3:31 PM
To: mod_perl list
Subject: CGI and character encoding

Hi.

I wonder if someone here can give me a clue as to where to look...

I am using
Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 mod_jk/1.2.26 PHP/5.2.6-1+lenny9 with 
Suhosin-Patch 
mod_ssl/2.2.9 OpenSSL/0.9.8g mod_apreq2-20051231/2.6.0 mod_perl/2.0.4 
Perl/v5.10.0

perl -MCGI -e 'print $CGI::VERSION'
3.52

A Perl cgi-bin script running under mod_perl receives posted form parameters
from a form defined as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>
        <head>
         <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
....
  <body>
        <form action="/litfdm/litfdm.pl" name="form"
                enctype="multipart/form-data" charset="UTF-8" method="POST">
...
<input name="de-utf8" type="hidden" value="ÄäÖöÜü">
...

(Note: the HTML page itself has been saved as UTF-8 by a UTF-8-aware editor)


When I retrieve the above hidden field using

my $chars = $cgi->param('de-utf8');

the variable $chars does contain the proper UTF-8 encoded *bytes* for the above
string (in other words, 2 bytes per character here), but it arrives in the
script /without/ the Perl "utf8" flag set.

If I then use this value to print to a filehandle opened as such :

open(FH,'>:utf8',"myfile");
print FH $chars,"\n";

It comes out of course as .. well, I cannot type this on my keyboard, but
anyone aware of double-encoding issues can imagine the "A-tilde Copyright
A-tilde squiggle.. " result.
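
For anyone who has not run into it, the effect can be reproduced without a
file. This is only a sketch: Encode::encode stands in here for what the :utf8
output layer effectively does to an unflagged byte string, and the variable
names are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes for "ä" as they arrive from the form (no utf8 flag).
my $bytes = "\xC3\xA4";

# Encoding the raw bytes again treats each byte as a Latin-1 character,
# so every byte >= 0x80 becomes two bytes: this is the double encoding.
my $doubled = Encode::encode('UTF-8', $bytes);

# Decoding first yields one character (U+00E4), which then encodes
# back to the original two bytes.
my $chars   = Encode::decode('UTF-8', $bytes);
my $correct = Encode::encode('UTF-8', $chars);

printf "raw: %d bytes, doubled: %d bytes, decoded: %d char, re-encoded: %d bytes\n",
    length($bytes), length($doubled), length($chars), length($correct);
```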

I can of course convert it, by using

$chars = Encode::decode('utf8',$cgi->param('de-utf8'));

but it is a p.i.t.a. and I would like to know if there is a way to retrieve the
posted value directly as a decoded (utf8-flagged) string, and if yes what this
depends on.
(I cannot find a setting for this, for instance, in the CGI.pm module
documentation.)


Thanks.
André

P.S.
Unfortunately, when the browser (Firefox 3.5.3) is posting this data to the
server, it is posting it as something like

...
Content-Type    multipart/form-data; boundary=---------------------------326972172326727
...

-----------------------------326972172326727
Content-Disposition: form-data; name="de-utf8"

ÄäÖöÜü
-----------------------------326972172326727

which means that there is no charset header to the parts either.
