John ORourke wrote:
Eli Shemer wrote:

For some reason the following test doesn’t print anything out to the screen.

Do I need to change something in the Apache configuration, or in mod_perl’s?

/articles_read.pl?id=חוזרת

## get http parameters
$r = shift;
$apr = Apache2::Request->new($r);
print $apr->param('id');


I'm not sure why you get nothing, but I can tell you strings read from Apache objects come through as octets and need to be decoded before use. We're using UTF-8 chars in URLs but I've never used one in a GET request parameter.
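As a rough illustration of the decoding step, here is a minimal sketch using the core Encode module. The helper name decode_param is made up for this example, and it assumes the client sent the parameter as UTF-8; it is not a diagnosis of why the original test prints nothing.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of mod_perl or libapreq2): turn the raw
# octets of a request parameter into a Perl character string.
sub decode_param {
    my ($octets) = @_;
    return unless defined $octets;
    # FB_CROAK dies on malformed UTF-8 instead of silently substituting;
    # LEAVE_SRC keeps decode() from modifying the input string.
    return decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# In a handler this would be roughly:
#   my $id = decode_param($apr->param('id'));
# Here the octets are spelled out so the sketch runs on its own.
my $octets = "\xD7\x97\xD7\x95\xD7\x96\xD7\xA8\xD7\xAA";   # UTF-8 bytes of the id value
my $id     = decode_param($octets);

printf "%d octets -> %d characters\n", length($octets), length($id);
# prints "10 octets -> 5 characters"
```

Keep in mind that a decoded string has to be encoded again (or the output handle given a UTF-8 layer) before it is printed back to the client.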

I can't say why it doesn't work, but I'm surprised it would in either case - the only characters explicitly allowed in a uri are us-ascii. from rfc2396:

  2.4. Escape Sequences

   Data must be escaped if it does not have a representation using an
   unreserved character; this includes data that does not correspond to
   a printable character of the US-ASCII coded character set, or that
   corresponds to any US-ASCII character that is disallowed, as
   explained below.

A bit of googling turned up this cpan module:

  http://search.cpan.org/dist/URI-Find-UTF8/lib/URI/Find/UTF8.pm

where the docs point to a ja.wikipedia.org page. for me (firefox 2.0) clicking on the "original" uri (the one with the japanese characters) opens up a uri with the uri-escaped character sequence. it's like magic ;)

anyway, my point wasn't to get into some huge debate on whether people are (successfully) using utf-8 characters in uris, etc. rather, it is that mod_perl is (mostly) merely a wrapper around apache, and if something is improper wrt an official rfc apache generally dismisses it rather than bending to a behavior which people may be using anyway.

so, if it works, great. if not, try making your urls conform to 2396 and see if you have better results.
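To make "conform to 2396" concrete, here is a small sketch of percent-escaping the parameter before it goes into the link. The helpers escape_param and unescape_param are hypothetical names for this example, built only on the core Encode module; the escaping is deliberately conservative and leaves just ASCII letters, digits and - _ . ~ untouched. It assumes UTF-8 is the encoding you want on the wire.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode characters as UTF-8 octets, then
# percent-escape every octet outside a conservative safe set.
sub escape_param {
    my ($chars) = @_;
    my $octets = encode('UTF-8', $chars);
    $octets =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

# Hypothetical helper for the reverse trip: undo the %XX escapes and
# decode the resulting octets back into characters.
sub unescape_param {
    my ($escaped) = @_;
    (my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode('UTF-8', $octets);
}

# The id from the original question, written as code points so the
# source file itself stays plain ASCII.
my $id  = "\x{05D7}\x{05D5}\x{05D6}\x{05E8}\x{05EA}";
my $url = '/articles_read.pl?id=' . escape_param($id);

print "$url\n";
# prints "/articles_read.pl?id=%D7%97%D7%95%D7%96%D7%A8%D7%AA"
```

On the receiving side the request parser should normally undo the percent-escapes itself, so unescape_param is here only to show the round trip; the octets still need decoding before use.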

--Geoff
