Hi folks,

I've been dragged kicking and screaming into the 21st century and am making my mod_perl application fully UTF-8 aware and transparent. It's all going OK, but I'd like to know if anyone has a better way of receiving form data containing non-ASCII chars.

Output is fine - I can override any Apache settings with $r->content_type('text/html; charset=utf-8');

The puzzling bit was getting UTF-8 in and out of forms without mangling it. I have it working; here's what I do:

HTML:  (the page is specifically set to be utf-8)
<form> <input name="test" value="some utf-8 text with optional HTML entities" /> </form>

Perl:
   use Encode;
   use Apache2::Request ();

   sub handler {
      my $r = shift;
      my $q = Apache2::Request->new($r);
      # the form post doesn't give a charset, so none is assumed: these are raw bytes
      my $known_to_be_utf8 = $q->param('test');
      # turn the UTF-8 bytes into a Perl character string
      my $utf8_aware_string = decode_utf8( $known_to_be_utf8 );
      ......
      # the above works (we get our data back in one piece)
      # and of course the HTML entities have been turned into UTF-8 chars
   }
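
One thing I'm unsure about is what happens when a client sends bytes that aren't valid UTF-8. Here is a minimal sketch of a stricter decode, using only the core Encode module. The helper name decode_param_strict is made up for illustration, and it assumes the raw value has already been fetched from the request:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode or mod_perl): strictly decode a
# raw form value, returning undef when the bytes are not valid UTF-8.
sub decode_param_strict {
    my ($bytes) = @_;
    return undef unless defined $bytes;

    # Work on a copy: decode() with a CHECK argument may modify its input.
    my $copy = $bytes;

    # 'UTF-8' (with the hyphen) is the strict encoding; FB_CROAK makes
    # decode() die on malformed input, which eval turns into undef here.
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $chars;
}

# Usage: valid UTF-8 bytes decode to characters, invalid bytes give undef.
my $ok  = decode_param_strict("caf\xc3\xa9");
my $bad = decode_param_strict("caf\xff");
print defined $ok  ? "valid\n"   : "rejected\n";
print defined $bad ? "valid\n"   : "rejected\n";
```

Returning undef rather than dying is just one choice; the handler could equally send back a 400 when a value fails to decode.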

I tried some form attributes:
enctype="multipart/form-data" - this doesn't specify a charset in the Content-Type headers (tried IE6 and FF)
accept-charset="utf-8" - no change for me (as no charset transformation is required)

So there's no way for the server to know what charset the parameters are in; the application has to know what to expect.
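
If the application has to carry that knowledge itself, one option is to keep the expected charset in a single place and decode every parameter through it. A rough sketch, again using only core Encode; $FORM_CHARSET and decode_params are names invented for this example, and the plain hash stands in for whatever the request object returns:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Assumption: the application decides the form charset, since the POST
# itself doesn't carry one.
my $FORM_CHARSET = 'UTF-8';

# Hypothetical helper: decode every value in a hash of name => raw bytes
# and return a new hash of name => character strings.
sub decode_params {
    my ($raw, $charset) = @_;
    $charset = $FORM_CHARSET unless defined $charset;

    my $enc = find_encoding($charset)
        or die "Unknown charset: $charset";

    my %decoded;
    for my $name (keys %$raw) {
        my $bytes = $raw->{$name};          # copy, leave the caller's data alone
        $decoded{$name} = $enc->decode($bytes);
    }
    return \%decoded;
}

# Usage with a stand-in for the submitted form data.
my $params = decode_params({ test => "na\xc3\xafve" });
print length($params->{test}), " characters\n";
```

With no CHECK argument, malformed bytes are replaced with U+FFFD rather than raising an error, so this is the lenient counterpart to a strict decode.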

Any thoughts?

cheers
John
