Re: utf8 urls

Geoffrey Young Wed, 19 Mar 2008 05:54:49 -0700


John ORourke wrote:

Eli Shemer wrote:
For some reason the following test doesn’t print anything out to the screen.
Do I need to change something in the Apache configuration, or in mod_perl’s?
/articles_read.pl?id=חוזרת
## get http parameters
use Apache2::Request ();

my $r   = shift;
my $apr = Apache2::Request->new($r);

print $apr->param('id');
I'm not sure why you get nothing, but I can tell you that strings read from Apache objects come through as octets and need to be decoded before use. We're using UTF-8 characters in URLs, but I've never used one in a GET request parameter.
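To illustrate the decoding step John describes, here is a minimal sketch using Perl's core Encode module. It does not touch Apache at all: the hypothetical `$octets` value stands in for whatever `$apr->param('id')` returns, assumed here to be the raw UTF-8 bytes of the Hebrew example value.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for a value read from an Apache object, e.g.
# $apr->param('id'): the five Hebrew letters of the example URL as raw
# UTF-8 octets (two bytes per letter).
my $octets = "\xD7\x97\xD7\x95\xD7\x96\xD7\xA8\xD7\xAA";

# Decode octets into Perl characters. FB_CROAK dies on malformed UTF-8;
# LEAVE_SRC keeps $octets intact (with a CHECK value, decode would
# otherwise consume its source string).
my $id = Encode::decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# prints: 10 octets, 5 characters
print length($octets), " octets, ", length($id), " characters\n";

# Encode again on output so the client receives UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$id\n";
```

Until the value is decoded, Perl treats it as ten unrelated bytes, so length checks, regexes, and case-insensitive matching all operate on bytes rather than letters. The `:encoding(UTF-8)` output layer is one choice; calling `Encode::encode` explicitly before printing works too.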

I can't say why it doesn't work, but I'm surprised it would work in either case: the only characters explicitly allowed in a URI are US-ASCII. From RFC 2396:


  2.4. Escape Sequences

   Data must be escaped if it does not have a representation using an
   unreserved character; this includes data that does not correspond to
   a printable character of the US-ASCII coded character set, or that
   corresponds to any US-ASCII character that is disallowed, as
   explained below.

A bit of googling turned up this CPAN module:

  http://search.cpan.org/dist/URI-Find-UTF8/lib/URI/Find/UTF8.pm

where the docs point to a ja.wikipedia.org page. For me (Firefox 2.0), clicking on the "original" URI (the one with the Japanese characters) opens a URI with the URI-escaped character sequence. It's like magic ;)

Anyway, my point wasn't to get into some huge debate on whether people are (successfully) using UTF-8 characters in URIs. Rather, it is that mod_perl is (mostly) merely a wrapper around Apache, and if something is improper with respect to an official RFC, Apache generally dismisses it rather than bending to a behavior people may be using anyway.

So, if it works, great. If not, try making your URLs conform to RFC 2396 and see if you have better results.
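One way to make the example URL conform to RFC 2396 is to encode the value as UTF-8 and percent-escape every octet outside the RFC's "unreserved" set, which is what the browser does behind the scenes in the Wikipedia example. The sketch below uses only core Perl; `escape_utf8` and `unescape_utf8` are hypothetical helper names, and in practice the CPAN module URI::Escape (its `uri_escape_utf8` function) covers the escaping half.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode a character string to UTF-8 octets, then
# escape every octet outside RFC 2396's "unreserved" set
# (alphanumerics plus - _ . ! ~ * ' ( )) as %XX.
sub escape_utf8 {
    my ($str) = @_;
    my $octets = Encode::encode('UTF-8', $str);
    $octets =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord $1)/ge;
    return $octets;
}

# Hypothetical helper for the reverse direction: turn %XX back into
# octets, then decode those octets as UTF-8 characters.
sub unescape_utf8 {
    my ($escaped) = @_;
    (my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return Encode::decode('UTF-8', $octets);
}

# The Hebrew value from the question, written as code points.
my $id  = "\x{05D7}\x{05D5}\x{05D6}\x{05E8}\x{05EA}";
my $url = '/articles_read.pl?id=' . escape_utf8($id);

print "$url\n";   # /articles_read.pl?id=%D7%97%D7%95%D7%96%D7%A8%D7%AA
```

The resulting URL is pure US-ASCII, so nothing between the browser and mod_perl has a reason to reject or mangle it. On the server side the request-parameter parser normally performs the `%XX` unescaping, but the value it returns is still UTF-8 octets that need decoding before use.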


--Geoff
