Hi folks,
I've been dragged kicking and screaming into the 21st century and am
making my mod_perl application fully utf-8 aware and transparent. It's
all going OK but I want to know if anyone has a better solution to
receiving form data containing non-ASCII chars.
Output is fine - I can override any Apache settings with
$r->content_type('text/html; charset=utf-8');
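The charset in that header only labels the response; the body still has to go out as UTF-8 bytes. A minimal sketch, assuming the page is built as a Perl character string (body_bytes is a hypothetical helper name, not something from mod_perl):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a character string into the UTF-8 bytes
# that match a "charset=utf-8" content type.
sub body_bytes {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

my $bytes = body_bytes("caf\x{e9}");    # 4 characters, e-acute is U+00E9
print length($bytes), "\n";             # prints 5 (e-acute becomes 2 bytes)
```

In a handler the resulting bytes would then be handed to $r->print.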
The puzzling bit was getting UTF-8 in and out of forms without mangling
it. I have it working, and here's what I do:
HTML: (the page is specifically set to be utf-8)
<form> <input name="test" value="some utf-8 text with optional HTML entities" /> </form>
Perl:
use Encode;
use Apache2::Request;
sub handler {
    my $r = shift;
    my $q = Apache2::Request->new($r);
    # the form post doesn't give a charset, so none is assumed
    my $known_to_be_utf8 = $q->param('test');
    # turn the raw bytes into a Perl character string
    my $utf8_aware_string = decode_utf8( $known_to_be_utf8 );
    # ......
    # the above works (we get our data back in one piece)
    # and of course the HTML entities have been turned into UTF-8 chars
}
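One caveat with the above: decode_utf8 without a CHECK argument does not croak on malformed input, so bad bytes can slip through quietly. A minimal sketch of a stricter variant, assuming you would rather get undef back than a silently repaired string (decode_param_strict is a hypothetical helper, not part of Apache2::Request or Encode):

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: strictly decode raw bytes as UTF-8.
# Returns the character string, or undef if the bytes are not valid UTF-8.
sub decode_param_strict {
    my ($bytes) = @_;                 # work on a copy of the caller's value
    return undef unless defined $bytes;
    # FB_CROAK makes decode() die on malformed input; eval catches that.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK) };
    return $chars;
}

my $raw  = "caf\xC3\xA9";             # UTF-8 bytes as they arrive from the form
my $text = decode_param_strict($raw);
print length($raw), " bytes, ", length($text), " characters\n";
# prints "5 bytes, 4 characters"
```

In the handler this would take the place of the decode_utf8 call, with an undef result treated as a bad request.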
I tried some form attributes:
  enctype="multipart/form-data" - this doesn't specify a charset in
    the content-type headers (tried IE6 and FF)
  accept-charset="utf-8" - no change for me (as no charset
    transformation required)
So there's no way for the server to know what charset the parameters are
in; the application has to know what to expect.
Any thoughts?
cheers
John