Re: Help with regex for URL with spanish characters

Jeremy Dunck Fri, 30 Jun 2006 15:25:37 -0700

On 6/30/06, mamcxyz <[EMAIL PROTECTED]> wrote:
>
> I'm building a site for restaurants, like a yellow pages.
>
> I want to provide listings based on state / city / zone. Some cities in
> Colombia are "Medellín", "Santa Marta" and so on...
>
> So, the URLs are transformed to Medell%C3%ADn and Santa%20Marta.
>
> Easy, I thought... in the URLconf:
>
> (r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$',
> 'restaurant.views.byCity' ),
>
> to match depto and city.
>
> I tested this in interactive mode:
>
> >>> re.match(r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$', r'a/Bogot%C3%A1/').groups()
> ('a', 'Bogot%C3%A1')
>
> However, the Django site can't find this...
>
> Page not found (404)
>
> I can't find another regular expression that works....

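The %C3%AD is just percent-escaping for the URL; it isn't what you'll actually see from the web server. The path is unescaped before your patterns are matched against it, so the regex sees the decoded í, not the %XX escapes, and a class like [a-zA-Z0-9%\-] has no way to match it.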
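A quick way to see the mismatch, sketched with Python 3's standard urllib.parse.unquote. I'm assuming the server/framework decodes the path the same way before URL matching; the department names used as test data are just examples:

```python
import re
from urllib.parse import unquote

# The pattern from the original question: ASCII letters, digits, '%' and '-'.
pattern = re.compile(r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$')

raw_path = 'a/Bogot%C3%A1/'       # what the browser sends on the wire
decoded_path = unquote(raw_path)  # roughly what the URL resolver sees

print(decoded_path)                             # a/Bogotá/
print(pattern.match(raw_path) is not None)      # True: the escaped form matches
print(pattern.match(decoded_path) is not None)  # False: 'á' is not in the class

# %20 decodes to a plain space, which the class doesn't allow either.
print(unquote('Magdalena/Santa%20Marta/'))      # Magdalena/Santa Marta/
```

So the interactive test passed only because it was run against the still-escaped string.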

The Unicode character í is U+00ED, so a matching regex would be:
Medell\xEDn
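Here's a small Python 3 sketch checking that pattern. The broader city_pattern below is a hypothetical generalization, not something from this thread: it relies on \w matching Unicode letters for str patterns in Python 3, and adds an explicit space and hyphen for names like "Santa Marta":

```python
import re

# U+00ED LATIN SMALL LETTER I WITH ACUTE
assert ord('\u00ed') == 0xED

# In a regex, \xED means the code point U+00ED.
print(re.fullmatch(r'Medell\xEDn', 'Medell\u00edn') is not None)  # True

# Hypothetical, more general pattern: Unicode word chars, spaces and hyphens.
city_pattern = re.compile(r'^(?P<depto>[\w -]+)/(?P<city>[\w -]+)/$')

for path in ['Antioquia/Medell\u00edn/', 'Magdalena/Santa Marta/', 'a/Bogot\u00e1/']:
    m = city_pattern.match(path)
    print(m.groupdict() if m else None)
```

Whether \w is Unicode-aware depends on the Python version and flags (on Python 2 you'd need re.UNICODE), so check that against whatever you're running.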

How did I get from C3 AD to 00ED?
I guessed that the URL is escaped UTF-8, and checked:

Medellín

Medell\xEDn

U+00ED in binary:
1110 1101
E    D

UTF-8 two-byte pattern:
110x xxxx  10xx xxxx

drop the U+00ED bits into the x slots:
       11    10 1101

left zero-pad the remaining x slots:
1100 0011  1010 1101
C    3     A    D

Yep, it's escaped UTF-8: the escaped bytes match the bit pattern for UTF-8-encoded U+00ED.
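The same bit-twiddling, as a Python 3 sketch. utf8_two_byte is a made-up helper that only handles the two-byte range (U+0080..U+07FF); it's compared against Python's own UTF-8 encoder:

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes by hand."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    # 110xxxxx: the top 5 of the 11 payload bits (zero-padded on the left)
    first = 0b1100_0000 | (code_point >> 6)
    # 10xxxxxx: the low 6 payload bits
    second = 0b1000_0000 | (code_point & 0b11_1111)
    return bytes([first, second])

encoded = utf8_two_byte(0xED)
print(encoded.hex().upper())                # C3AD
print(encoded == '\u00ed'.encode('utf-8'))  # True
```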

So, unless I'm a moron (in the Pilgrim sense[1]), you can generally expect URLs to arrive as escaped UTF-8.
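Why the encoding assumption matters, sketched with Python 3's urllib.parse.unquote and its encoding parameter. Latin-1 here is only an example of guessing the wrong charset:

```python
from urllib.parse import unquote

raw = 'Medell%C3%ADn'

# Treating the escaped bytes as UTF-8 gives the intended name.
as_utf8 = unquote(raw, encoding='utf-8')
# Treating them as Latin-1 gives mojibake: 'Ã' followed by a soft hyphen.
as_latin1 = unquote(raw, encoding='latin-1')

print(as_utf8 == 'Medell\u00edn')    # True
print(len(as_utf8), len(as_latin1))  # 8 9
```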

See UTF-8 on Wikipedia if you're totally confused. :)

[1]
http://diveintomark.org/archives/2004/08/16/specs

You received this message because you are subscribed to the Google Groups "Django users" group.
