#22223: reverse() escapes unreserved characters
----------------------------------+--------------------------------------
Reporter: erik.van.zijst@…        | Owner: nobody
Type: Bug                         | Status: new
Component: Core (URLs)            | Version: 1.6
Severity: Normal                  | Resolution:
Keywords:                         | Triage Stage: Unreviewed
Has patch: 0                      | Needs documentation: 0
Needs tests: 0                    | Patch needs improvement: 0
Easy pickings: 0                  | UI/UX: 0
----------------------------------+--------------------------------------
Comment (by aaugustin):

 This change is indeed documented in the release notes. It's a consequence of #13260. See [https://code.djangoproject.com/ticket/13260#comment:17 my analysis] for details.

 The change suggested here would still preserve the requirements of #13260, which was primarily concerned with `%` characters in variable parts of URLs.

 Now the question is: which characters do we consider safe?

 By default, [http://docs.python.org/2/library/urllib.html#urllib.quote urllib.quote] preserves `A-Za-z0-9_.-` and the characters passed as `safe`, which defaults to `/`.

 Based on RFC 1738:

 1. {{{ <>"#%{}|\^~[]`}}} are unsafe and must be encoded (that list includes the SPACE character).
 2. `;/?:@=&` are reserved and must be encoded unless they are used for their special meaning.
 3. `$-_.+!*'(),` are safe and need not be encoded.

 We can certainly put the third set of characters in the safe list.

 If characters from the second set end up unencoded in URLs generated by Django, we start relying on user-agent quirks to re-encode them properly in HTTP request lines. However, `/` is part of this list and is considered safe by the stdlib by default (which may not mean much; the stdlib contains many unfortunate API choices).

 Since the path segment always starts with a slash following the host and ends at the end of the URL or with one of `?`, `;` or `#` (which is always unsafe), we may choose to preserve `/:@=&`, that is, all of the second set except for `?` and `;`.

 If we want to be more careful, we may choose to preserve only `/` and `:`, because `/` is safe by default and `:` is only used to separate the protocol from the remainder of the URL. That would resolve your problem.

 Can you clarify how you came up with `/:@&=+$,`? If you're including some characters from the third set above, you should probably include all of them.

 The fix should be backported to 1.6.x since it's a regression.
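
 To make the options above concrete, here is a minimal sketch of how the candidate safe lists behave when passed to the stdlib quoting function. It is only an illustration, not the patch: it assumes Python 3, where `urllib.quote` lives at `urllib.parse.quote`, and the names `RFC1738_SAFE`, `RESERVED_KEPT`, `CAREFUL_KEPT` and `quote_path` are hypothetical, not Django API.

```python
from urllib.parse import quote

# Third set from RFC 1738: specials that need not be encoded.
RFC1738_SAFE = "$-_.+!*'(),"
# Second set (reserved) minus "?" and ";", which terminate the path.
RESERVED_KEPT = "/:@=&"
# The more careful variant: keep only "/" and ":".
CAREFUL_KEPT = "/:"


def quote_path(value, reserved=RESERVED_KEPT):
    """Hypothetical helper: quote a path, preserving the chosen
    reserved characters plus the RFC 1738 safe specials."""
    return quote(value, safe=reserved + RFC1738_SAFE)


if __name__ == "__main__":
    sample = "a:b@c/d e;f?g=1&h=+50%"
    print(quote(sample))                     # stdlib default, safe="/"
    print(quote_path(sample))                # keeps /:@=& and the third set
    print(quote_path(sample, CAREFUL_KEPT))  # keeps only / and : of the reserved set
```

 In all three cases `%`, the space, `?` and `;` are still percent-encoded, so the `%` handling that #13260 was concerned with is unaffected by widening the safe list.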
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:2>