Hi John

I've been using libapreq, which has a charset method:
http://search.cpan.org/~joesuf/libapreq2-2.08/glue/perl/xsbuilder/APR/Request/Param/Param.pod#charset

It is fairly limited; it only recognises:

0 APREQ_CHARSET_ASCII (7-bit us-ascii)
1 APREQ_CHARSET_LATIN1 (8-bit iso-8859-1)
2 APREQ_CHARSET_CP1252 (8-bit Windows-1252)
8 APREQ_CHARSET_UTF8 (utf8 encoded Unicode)

but this has been working fine for me on IE 6 and 7, Firefox and Opera. I
think (not sure) that these more modern browsers do try to submit form
data in the character set of the web page.

It hasn't been tested to the point that I am certain that it works every
time, but I've had no problems with it over the last year of use.
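
As a rough sketch of how the charset code could be put to use: the
helper below maps the four codes listed above to Encode names and
decodes the raw parameter bytes. The helper name (decode_param) and the
mapping table are my own assumptions, not part of libapreq.

```perl
use strict;
use warnings;
use Encode qw(decode);

# libapreq charset codes (from the list above) mapped to Encode names.
# The mapping itself is an assumption made for this sketch.
my %ENCODING_FOR = (
    0 => 'ascii',
    1 => 'iso-8859-1',
    2 => 'cp1252',
    8 => 'UTF-8',
);

# Hypothetical helper: turn raw parameter bytes into a Perl character
# string, given the numeric charset code reported for that parameter.
sub decode_param {
    my ($bytes, $charset_code) = @_;
    my $encoding = $ENCODING_FOR{$charset_code}
        or die "Unrecognised charset code: $charset_code\n";
    return decode($encoding, $bytes);
}

# In a handler this would be called with the value of a parameter and
# the number returned by that parameter's charset method.
my $text = decode_param("caf\xC3\xA9", 8);
```

Dying on an unknown code is deliberate here: guessing an encoding
silently tends to produce mangled data that is hard to trace later.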


Don't forget the other part, which is that, if you put UTF8 into the
database, you may need to set the UTF8 flag again when you get the data
back, otherwise Perl treats it as a plain string of bytes.

The new DBD::mysql driver has added support for doing this
automatically, but I haven't tried it. I've been using my own wrapper on
an older driver, which I know works. I'm not sure about other drivers,
but (again) I think there is reasonable support for UTF8 in the more
popular ones.
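
To illustrate what such a wrapper has to do, here is a minimal sketch
that needs no database. The variable $from_db stands in for a fetched
column value, and restore_utf8 is a hypothetical name, not my actual
wrapper or anything from DBI. (DBD::mysql documents a mysql_enable_utf8
connection attribute for the automatic route.)

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for a column value read back from the database:
# UTF-8 encoded bytes, with no UTF8 flag set on the scalar.
my $from_db = encode('UTF-8', "na\x{ef}ve caf\x{e9}");

# Hypothetical wrapper step: turn the bytes back into a character string.
sub restore_utf8 {
    my ($bytes) = @_;
    return $bytes if utf8::is_utf8($bytes);   # already a character string
    return decode('UTF-8', $bytes);
}

my $text = restore_utf8($from_db);

# The byte string is longer than the character string because each
# accented letter takes two bytes in UTF-8.
printf "bytes: %d, characters: %d\n", length($from_db), length($text);
```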

Once you're happy with the fact that the data coming in and out of your
system is UTF8, it makes life a lot easier.  Things like filtering input
data with \w just work.
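
A small sketch of the \w point, assuming the input arrives as UTF-8
bytes (is_word is a hypothetical helper for the example):

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw     = "caf\xC3\xA9";           # UTF-8 bytes as they arrive
my $decoded = decode('UTF-8', $raw);   # the same text as characters

# Hypothetical helper: true if the string consists only of word characters.
sub is_word {
    my ($string) = @_;
    return $string =~ /\A\w+\z/ ? 1 : 0;
}

# On the raw bytes, the two bytes of the accented letter are not word
# characters, so the check fails; on the decoded string it passes.
printf "raw: %d, decoded: %d\n", is_word($raw), is_word($decoded);
```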

Good luck

Clint

> Perl:
>     use Encode;
>     sub handler {
>        my $r=shift;
>        my $q=Apache2::Request->new($r);
>        # form post doesn't give charset, none assumed
>        my $known_to_be_utf8 = $q->param('test');
>        my $utf8_aware_string = decode_utf8( $known_to_be_utf8 );
>        ......
>        # the above works (we get our data back in one piece)
>        # and of course the HTML entities have been turned into UTF-8 chars
>     }
> 
> I tried some form attributes:
>     enctype="multipart/form-data" - this doesn't specify a charset in 
> the content-type headers (tried IE6 and FF)
>     accept-charset="utf-8" - no change for me (as no charset 
> transformation required)
> 
> So there's no way for the server to know what charset the parameters are 
> in; the application has to know what to expect.
> 
> Any thoughts?
> 
> cheers
> John
> 
