On 6/30/06, mamcxyz <[EMAIL PROTECTED]> wrote:
>
> I'm building a site for restaurants, like a yellow pages.
>
> I wanna provide listing based in state / city /zones. Some citys in
> Colombia are "Medellín", "Santa Marta" and so on...
>
> So, the url are transformed to Medell%C3%ADn and Santa%20Marta.
>
> Easy, I think... in the urls:
>
> (r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$',
> 'restaurant.views.byCity' ),
>
> to match depto and city.
>
> I test this in the interactive mode:
>
> re.match(r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$',r'a/Bogot%C3%A1/').groups()
> >>('a', 'Bogot%C3%A1')
>
> However, the django site not can found this...
>
> Page not found (404)
>
> I don't find another regular expression that work fine....
The %C3%AD is just percent-escaping for the URL; it isn't what you'll actually get from the web server, which hands you the unescaped path.
The Unicode character í is U+00ED, so a matching regex would be:
Medell\xEDn
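
To make that concrete, here is a minimal sketch in Python 3 using only the standard library (not Django itself). It assumes the path is percent-decoded as UTF-8 before the pattern is tried, as described above. The first pattern is the one from the question; the second, with \w and a literal space, is only my suggestion for a character class that accepts the decoded names.

```python
import re
from urllib.parse import unquote

# Pattern from the question: ASCII letters, digits, '%' and '-' only.
ascii_only = re.compile(
    r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$')

# Assumed alternative: in Python 3, \w matches Unicode letters in str
# patterns; the space lets "Santa Marta" through as well.
unicode_aware = re.compile(
    r'^(?P<depto>[\w\- ]+)/(?P<city>[\w\- ]+)/$')

raw_path = 'a/Medell%C3%ADn/'
decoded_path = unquote(raw_path)      # percent-decoding, UTF-8 by default

# The escaped form matches the original pattern (as in the interactive test)...
escaped_match = ascii_only.match(raw_path)
# ...but the decoded form does not, because 'í' is outside [a-zA-Z0-9%\-].
decoded_ascii_match = ascii_only.match(decoded_path)
# The Unicode-aware class accepts the decoded path.
decoded_unicode_match = unicode_aware.match(decoded_path)

# The literal regex from the reply also matches the decoded city name.
literal_match = re.match(r'Medell\xEDn', 'Medell\u00edn')

print(decoded_path)
print(escaped_match is not None, decoded_ascii_match is None)
print(decoded_unicode_match.group('city'))
```

If the decoded path is what the URL resolver sees, this would explain the 404: the interactive test was run against the escaped string, not the decoded one.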
How did I get from C3AD to 00ED?
I guessed that the URL is escaped UTF-8.

Medellín
Medell\xEDn

unicode:
1110 1101
   E    D

url (escaped utf-8)
range pattern:
110x xxxx 10xx xxxx
The unicode bits:
       11   10 1101
left zero-pad the bits:
1100 0011 1010 1101
   C    3    A    D

Yep, it's escaped UTF-8 since the escaped chars match the bit pattern for UTF-8-encoded U+00ED.
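
The same check can be done in a few lines of Python 3, as a rough sketch. The variable names are just for illustration. It encodes U+00ED with the built-in codec and also rebuilds the two bytes by hand from the 110x xxxx 10xx xxxx pattern.

```python
from urllib.parse import quote

ch = '\u00ed'                          # í
code_point = ord(ch)                   # 0xED
utf8 = ch.encode('utf-8')              # what the built-in codec produces

code_point_bits = format(code_point, '08b')
utf8_bits = ' '.join(format(b, '08b') for b in utf8)

# Rebuild the two-byte form by hand: left zero-pad the code point to the
# 11 payload bits, then split them 5 + 6 behind the 110 and 10 prefixes.
payload = format(code_point, '011b')
byte1 = int('110' + payload[:5], 2)
byte2 = int('10' + payload[5:], 2)
manual = bytes([byte1, byte2])

escaped = quote(ch)                    # percent-escaped UTF-8

print(code_point_bits)
print(utf8_bits)
print(manual == utf8, escaped)
```

Both routes give the bytes C3 AD, which is what shows up percent-escaped in the URL.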
So, unless I'm a moron (in the Pilgrim sense[1]), in general, you can expect to get UTF-8 URLs.
See UTF-8 on Wikipedia if you're totally confused. :)

[1] http://diveintomark.org/archives/2004/08/16/specs
- Help with regex for URL with spanish characters mamcxyz
- Re: Help with regex for URL with spanish characters Jeremy Dunck