#4594: django.core.urlresolvers.reverse_helper doesn't unescape characters that
are escaped in URL regexes
----------------------------------------------+-----------------------------
Reporter:  Todd O'Bryan <[EMAIL PROTECTED]>  |       Owner:  jacob             
  Status:  new                                |   Component:  Uncategorized     
 Version:  SVN                                |    Keywords:  url reverse escape
   Stage:  Unreviewed                         |   Has_patch:  1                 
----------------------------------------------+-----------------------------
 Backslash-escaped characters in a URL pattern are not unescaped by the
 `{% url ... %}` tag (and presumably by other methods of view reversing).
 For example,
 {{{
 urlpatterns = patterns('',
     (r'^prices/less_than_\$(?P<price>\d+)/$', 'cost_less'),
     (r'^headlines/(?P<year>\d+)\.(?P<month>\d+)\.(?P<day>\d+)/$',
 'daily_headlines'),
     (r'^priests/(?P<name>\w+)\+/$', 'priest_homepage'),
     (r'^windows_path/(?P<drive_name>[A-Z]):\\(?P<path>.+)',
 'windows_path'),
 )
 }}}
 The dollar sign, dot, plus, and backslash in each of the URL patterns
 match a single character, but don't get converted back to that character
 by the reverse function.
 
 It seems that there aren't that many of these. Any escape sequence that
 doesn't match a constant string (i.e. something like `\s` or `\d` or `\w`)
 had better be part of a pattern so that it can be replaced with the right
 string to get the URL you're expecting. That leaves the following, I
 think.
 || Pattern || Replacement ||
 || `\A` || `''` (equivalent to `^`) ||
 || `\Z` || `''` (equivalent to `$`) ||
 || `\b` and `\B` || `''` (these ''shouldn't'' appear in URLs, but can only match the empty string) ||
 || `\.`, `\^`, `\$`, `\*`, `\+`, `\?`, `\(`, `\)`, `\{`, `\}`, `\[`, `\]`, and `\\` || the same character, without a backslash ||
 As a first stab, I'd just get rid of `\A`, `\Z`, `\b`, and `\B`, just as
 the current code does for `^` and `$`. This is actually kind of
 complicated, because you have to make sure that the `\` in front isn't
 part of a pair of backslashes. In other words, `\\b` should become `\b`,
 but `\\\b` should just become `\`. Also, the current code removes all `^`
 and `$`. That's wrong if they're preceded by a backslash and meant to be
 the actual character.
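
 One way to sidestep the backslash-pair problem is to consume the pattern
 left to right in a single pass, so that `\\` is always taken as one token
 before whatever follows it is examined. The following is only a rough
 sketch of that idea, not the attached patch: `unescape_pattern`,
 `ZERO_WIDTH` and `LITERALS` are names made up for illustration, and it
 assumes named groups have already been substituted.

```python
import re

# Hypothetical helper names; this is a sketch, not Django's code.
# Escapes that can only match the empty string.
ZERO_WIDTH = set("AZbB")
# Escaped characters that stand for themselves.
LITERALS = set(".^$*+?(){}[]\\")

# One token is either a backslash plus the next character, or a bare anchor.
# Scanning left to right means "\\" is consumed as a pair, so the character
# after it is never mistaken for the tail of an escape.
_TOKEN = re.compile(r"\\(.)|[\^$]", re.DOTALL)


def unescape_pattern(pattern):
    """Turn the literal parts of a URL regex back into plain text."""

    def replace(match):
        char = match.group(1)
        if char is None:
            # Unescaped ^ or $: an anchor, contributes nothing to the URL.
            return ""
        if char in ZERO_WIDTH:
            return ""
        if char in LITERALS:
            return char
        # Anything else (e.g. \d) is left alone for a later check.
        return match.group(0)

    return _TOKEN.sub(replace, pattern)


print(unescape_pattern(r"^prices/less_than_\$/$"))  # prices/less_than_$/
print(unescape_pattern(r"\\b"))                     # \b
print(unescape_pattern(r"\\\b"))                    # \
```

 Because anchors and escapes go through the same pass, an escaped `\^` or
 `\$` is kept as a literal character while a bare `^` or `$` is dropped.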
 
 There are some gotchas--when you insert values, you have to escape
 characters that you'll be unescaping later. I do check for character
 classes that don't map to a single definite character (e.g., `\d` and
 `\w`) and raise an exception if they're still there when we finish (since
 the reverse lookup can't work). I don't check for things like `[a-z]` or
 `a{2,3}`, but that will almost guarantee the reversing fails, too.
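
 The two gotchas above might look roughly like this. Again this is only a
 sketch under assumptions: `escape_value` and `check_no_classes` are
 hypothetical names, not functions from Django or from the patch, and the
 set of class escapes checked (`\d`, `\D`, `\s`, `\S`, `\w`, `\W`) is my
 guess at a reasonable list.

```python
import re

# Hypothetical helpers for illustration only.
_SPECIAL = re.compile(r"[.^$*+?(){}\[\]\\]")
_ESCAPE = re.compile(r"\\(.)", re.DOTALL)
_CLASS_LETTERS = set("dDsSwW")


def escape_value(value):
    """Escape a value before inserting it into the pattern, so that the
    later unescaping pass restores it unchanged."""
    return _SPECIAL.sub(lambda m: "\\" + m.group(0), str(value))


def check_no_classes(pattern):
    """Raise if a character-class escape such as \\d is still present.

    Escapes are read left to right as backslash-plus-character tokens, so
    an escaped backslash followed by "d" is not reported.
    """
    for match in _ESCAPE.finditer(pattern):
        if match.group(1) in _CLASS_LETTERS:
            raise ValueError(
                "cannot reverse %r: %s does not match a single fixed "
                "character" % (pattern, match.group(0))
            )


print(escape_value("9.99"))        # 9\.99
check_no_classes(r"^docs/\\d/$")   # fine: a literal backslash, then "d"
try:
    check_no_classes(r"^docs/\d+/$")
except ValueError as exc:
    print("rejected:", exc)
```

 Like the patch, this does not try to detect `[a-z]` or `a{2,3}`.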
 
 Note that #2977 also addresses this problem, but it does other things,
 too. Also I think that code may not handle some corner cases correctly.
 Meanwhile, my patch may be overly aggressive and might include handling for
 characters that will never appear in a URL.
 
 Give SmileyChris and me some time to work this out.

-- 
Ticket URL: <http://code.djangoproject.com/ticket/4594>
Django Code <http://code.djangoproject.com/>
The web framework for perfectionists with deadlines