Geoffrey Young wrote:
John ORourke wrote:
Eli Shemer wrote:

For some reason the following test doesn’t print anything out to the screen

I'm not sure why you get nothing, but I can tell you strings read from Apache objects come through as octets and need to be decoded before use. We're using UTF-8 chars in URLs but I've never used one in a GET request parameter.

I can't say why it doesn't work, but I'm surprised it would in either case - the only characters explicitly allowed in a uri are us-ascii. from rfc2396:


My bad memory there - you are quite correct. The way we do it is the accepted way - to URL-encode the UTF-8 encoded text, and that will work with URLs and parameters.

eg:

http://www....../categories/name/ty%C3%B6kalut-lamput

is the correct form of:

http://www....../categories/name/työkalut-lamput


encode before printing:

use Encode qw(encode);
$octets = encode('UTF-8', $my_utf8_string); # characters -> UTF-8 octets
$octets =~ s/([^\041-\176])/sprintf("%%%02X",ord($1))/ge; # URL-encode space, control and non-ASCII octets
$r->print($octets);
(the above is simplified - you'll also need to encode question marks etc)
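
A fuller version of the encoding step might look like the sketch below. It is only an illustration: url_encode_utf8 is a made-up helper name, and the choice to escape everything except letters, digits and - . _ ~ (the "unreserved" set of RFC 3986) is an assumption, not something taken from the code above. It is meant for a single path segment or parameter value, not a whole URL.

```perl
use strict;
use warnings;
use utf8;                       # this source file contains a literal "ö"
use Encode qw(encode);

# Hypothetical helper: percent-encode one path segment or parameter value.
sub url_encode_utf8 {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);    # characters -> UTF-8 octets
    # Escape every octet that is not a letter, digit, or one of - . _ ~
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

print url_encode_utf8('työkalut-lamput'), "\n";   # ty%C3%B6kalut-lamput
print url_encode_utf8('a?b&c'), "\n";             # a%3Fb%26c
```

Because the question mark and ampersand are escaped too, the result can be dropped into a path or query value without changing how the URL is parsed.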

decode after reading:

use Encode qw(decode);
$url = decode('UTF-8', $r->uri()); # octets -> characters
or
$param = decode('UTF-8', $r->param('info'));
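
To try the decoding side outside Apache, here is a minimal standalone sketch. url_decode_utf8 is a hypothetical helper, not part of mod_perl. It undoes the percent-encoding itself, on the assumption that the input is still escaped; inside a handler the values have usually been unescaped already, so only the UTF-8 decode is needed there.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: undo percent-encoding, then turn UTF-8 octets into characters.
sub url_decode_utf8 {
    my ($escaped) = @_;
    (my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %C3 -> octet 0xC3
    return decode('UTF-8', $octets);                                 # octets -> characters
}

my $name = url_decode_utf8('ty%C3%B6kalut-lamput');
printf "%d characters\n", length $name;   # 15 characters
```

Without the decode step the same string would be 16 octets long, which is the kind of mismatch that shows up later as garbled output or wrong string lengths.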

cheers
John

