Thanks Gents,

I've got a certain level of abstraction, as per Jonathan's approach, to which I can just add the libapreq method.

The note about DBD::mysql is interesting; I was wondering about that!

cheers
John


Clinton Gormley wrote:
Hi John

I've been using libapreq, which has a charset method:
http://search.cpan.org/~joesuf/libapreq2-2.08/glue/perl/xsbuilder/APR/Request/Param/Param.pod#charset

It is fairly limited; it recognises:

0 APREQ_CHARSET_ASCII (7-bit us-ascii)
1 APREQ_CHARSET_LATIN1 (8-bit iso-8859-1)
2 APREQ_CHARSET_CP1252 (8-bit Windows-1252)
8 APREQ_CHARSET_UTF8 (utf8 encoded Unicode)
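A minimal sketch of acting on those codes, assuming you already have the integer returned by the Param object's charset method. The helper decode_by_charset and the hash %ENCODING_FOR are hypothetical, not part of libapreq; the values are the standard Encode names for the same four charsets.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical mapping from the libapreq charset codes listed above
# to the names Encode understands.
my %ENCODING_FOR = (
    0 => 'ascii',        # APREQ_CHARSET_ASCII
    1 => 'iso-8859-1',   # APREQ_CHARSET_LATIN1
    2 => 'cp1252',       # APREQ_CHARSET_CP1252
    8 => 'UTF-8',        # APREQ_CHARSET_UTF8
);

# Hypothetical helper: turn raw parameter bytes into a Perl character
# string, using the charset code reported for that parameter.
sub decode_by_charset {
    my ($code, $octets) = @_;
    my $encoding = $ENCODING_FOR{$code}
        or die "unsupported charset code: $code\n";
    return decode($encoding, $octets);
}
```

For example, decode_by_charset(2, "\x80") yields the euro sign, because 0x80 is the euro in Windows-1252 but a control character in Latin-1.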

but this has been working fine for me on IE 6 and 7, Firefox and Opera. I
think (not sure) that these more modern browsers do try to respect the
character set of the web page.

It hasn't been tested to the point that I am certain that it works every
time, but I've had no problems with it over the last year of use.


Don't forget the other part: if you put UTF8 into the database, you may
need to set the UTF8 flag again when you read the data back.

The new DBD::mysql driver has added support for this, but I haven't
tried it; I've been using my own wrapper around an older driver, which I
know works. I'm not sure about other drivers, but (again) I "think" there
is reasonable support for UTF8 in the more popular ones.
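For drivers that hand back raw bytes, a wrapper along these lines can decode each fetched column. This is only a sketch, not Clint's actual wrapper: decode_row is a hypothetical helper, and it assumes the stored data really is UTF-8. (Newer DBD::mysql releases offer the same behaviour through the mysql_enable_utf8 connection attribute.)

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical wrapper: decode every defined column of a fetched row
# from UTF-8 bytes into Perl character strings (setting the UTF8 flag).
# NULL columns (undef) are passed through unchanged.
sub decode_row {
    my ($row) = @_;    # array ref, e.g. from $sth->fetchrow_arrayref
    return [ map { defined $_ ? decode_utf8($_) : undef } @$row ];
}
```

Returning a new array ref matters with DBI, because fetchrow_arrayref reuses the same array for every row.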

Once you're confident that the data coming into and out of your system
is UTF8, life gets a lot easier: things like filtering input data with
\w just work.
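To illustrate why \w "just works" once input has been decoded, this sketch compares a raw UTF-8 byte string with its decoded form. The helper is_word is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# "café" as it arrives from a UTF-8 form: five raw bytes.
my $bytes = "caf\xc3\xa9";
# After decoding: four characters, with é as a single letter.
my $chars = decode_utf8($bytes);

# Hypothetical check: is the whole value made of word characters?
sub is_word { return $_[0] =~ /\A\w+\z/ ? 1 : 0 }

printf "raw: %d, decoded: %d\n", is_word($bytes), is_word($chars);
# raw: 0, decoded: 1
```

On the raw bytes, \w cannot match the two bytes that encode é, so the check fails; on the decoded string é is one Unicode letter and the check passes.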

good luck

Clint

Perl:
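    package My::Handler;    # hypothetical package name
    use strict;
    use warnings;
    use Encode qw(decode_utf8);
    use Apache2::RequestRec ();
    use Apache2::Request ();
    use Apache2::Const -compile => qw(OK);

    sub handler {
        my $r = shift;
        my $q = Apache2::Request->new($r);
        # The form post doesn't give a charset, so none is assumed;
        # the page was served as UTF-8, so we know these bytes are UTF-8.
        my $known_to_be_utf8  = $q->param('test');
        my $utf8_aware_string = decode_utf8($known_to_be_utf8);
        # ......
        # The above works (we get our data back in one piece), and of
        # course the HTML entities have been turned into UTF-8 chars.
        return Apache2::Const::OK;
    }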

I tried some form attributes:
- enctype="multipart/form-data": this doesn't add a charset to the Content-Type headers (tried IE6 and FF).
- accept-charset="utf-8": no change for me (as no charset conversion was required).

So there's no way for the server to know what charset the parameters are in; the application has to know what to expect.
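Since the request carries no charset, one common approach (not something libapreq does for you) is to try strict UTF-8 decoding first and fall back to a legacy charset if the bytes are not valid UTF-8. The sketch below assumes Windows-1252 as the fallback; decode_form_value is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode form bytes as UTF-8 if they are valid,
# otherwise fall back to Windows-1252 (an assumption about what a
# non-UTF-8 browser is likely to send).
sub decode_form_value {
    my ($octets) = @_;
    # FB_CROAK makes invalid UTF-8 die; LEAVE_SRC keeps $octets intact
    # so the fallback below still sees the original bytes.
    my $chars = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $chars ? $chars : decode('cp1252', $octets);
}
```

A lone \xe9 byte is not valid UTF-8, so decode_form_value("\xe9") falls back to Windows-1252 and still yields é, while "\xc3\xa9" decodes as UTF-8 to the same character.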

Any thoughts?

cheers
John



