John ORourke wrote:
Eli Shemer wrote:
For some reason the following test doesn't print anything to the
screen.
Do I need to change something in the Apache configuration, or in
mod_perl's?
/articles_read.pl?id=חוזרת
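## get http parameters
use Apache2::Request ();
my $r   = shift;                        # request object handed in by mod_perl
my $apr = Apache2::Request->new($r);    # libapreq2 wrapper around the request
print $apr->param('id');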
I'm not sure why you get nothing, but I can tell you strings read from
Apache objects come through as octets and need to be decoded before
use. We're using UTF-8 chars in URLs but I've never used one in a GET
request parameter.
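
As a rough illustration of the decoding step mentioned above, here is a minimal sketch using the core Encode module. The octet string is a hypothetical stand-in for what $apr->param('id') might hand back for the Hebrew id in the question, and decode_param is a made-up helper name, not part of Apache2::Request.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw UTF-8 octets into a Perl character string.
sub decode_param {
    my ($octets) = @_;
    # FB_CROAK dies on malformed UTF-8; LEAVE_SRC keeps $octets unmodified.
    return decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Assumed stand-in for the value returned by $apr->param('id'):
# the UTF-8 octets of the five Hebrew letters from the question.
my $octets = "\xD7\x97\xD7\x95\xD7\x96\xD7\xA8\xD7\xAA";
my $id     = decode_param($octets);

# Encode characters back to UTF-8 when writing to STDOUT.
binmode(STDOUT, ':encoding(UTF-8)');
print length($octets), " octets, ", length($id), " characters: $id\n";
```

Whether the parameter reaches the script at all is a separate question; this only covers what to do with the value once it arrives.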
I can't say why it doesn't work, but I'd be surprised if it worked in
either case - the only characters explicitly allowed in a URI are US-ASCII.
From RFC 2396:
2.4. Escape Sequences
Data must be escaped if it does not have a representation using an
unreserved character; this includes data that does not correspond to
a printable character of the US-ASCII coded character set, or that
corresponds to any US-ASCII character that is disallowed, as
explained below.
A bit of googling turned up this CPAN module:
http://search.cpan.org/dist/URI-Find-UTF8/lib/URI/Find/UTF8.pm
where the docs point to a ja.wikipedia.org page. For me (Firefox 2.0),
clicking on the "original" URI (the one with the Japanese characters)
opens up a URI with the URI-escaped character sequence. It's like magic ;)
Anyway, my point wasn't to get into some huge debate on whether people
are (successfully) using UTF-8 characters in URIs, etc. Rather, it is
that mod_perl is (mostly) merely a wrapper around Apache, and if
something is improper with respect to an official RFC, Apache generally
rejects it rather than bending to accommodate behavior people may be
relying on anyway.
So, if it works, great. If not, try making your URLs conform to RFC 2396
and see if you have better results.
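
For what it's worth, making the URL conform could look something like the sketch below. It encodes the characters to UTF-8 octets and then percent-escapes every octet that is not in the RFC 2396 "unreserved" set. The helper name escape_param is made up for illustration, and choosing UTF-8 as the encoding is an assumption; the RFC itself does not say which character encoding to use.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-escape a character string for use as a
# query parameter value, leaving only RFC 2396 unreserved characters
# (alphanumerics and - _ . ! ~ * ' ( ) ) untouched.
sub escape_param {
    my ($chars) = @_;
    my $octets = encode('UTF-8', $chars);   # characters -> UTF-8 octets
    $octets =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

# The five Hebrew letters from the original request, written as code points
# so this file itself stays pure ASCII.
my $id  = "\x{5D7}\x{5D5}\x{5D6}\x{5E8}\x{5EA}";
my $url = '/articles_read.pl?id=' . escape_param($id);
print "$url\n";
```

The CPAN module URI::Escape provides uri_escape_utf8 for the same job; the hand-rolled version here just avoids a non-core dependency.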
--Geoff