[issue3300] urllib.quote and unquote - Unicode issues

Bill Janssen Thu, 07 Aug 2008 14:17:14 -0700

Bill Janssen <[EMAIL PROTECTED]> added the comment:

My main fear with this patch is that "unquote" will become seen as
unreliable, because naive software trying to parse URLs will encounter
uses of percent-encoding where the encoded octets are not in fact UTF-8
bytes.  They're just some set of bytes.  A secondary concern is that it
will invisibly produce invalid data: it will decode a non-UTF-8-encoded
string whose bytes happen to form valid UTF-8 sequences into the wrong
string value.
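
To make the two failure modes concrete, here is a minimal sketch against current Python 3's urllib.parse, where unquote() decodes as UTF-8 with errors="replace" by default and unquote_to_bytes() returns the raw octets.  It assumes a Python 3 interpreter rather than the patch under discussion, and Latin-1 is used only as an example of a non-UTF-8 encoding.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Concern 1: percent-encoded octets that are not UTF-8 at all.
# Percent-encoding "café" from its Latin-1 bytes yields a lone %E9.
latin1_part = quote("café", encoding="latin-1")
print(latin1_part)                   # caf%E9

# unquote() decodes as UTF-8 with errors="replace" by default, so the
# stray 0xE9 octet silently becomes U+FFFD instead of raising.
decoded = unquote(latin1_part)
print(ascii(decoded))                # 'caf\ufffd'

# The original octets are still recoverable if you ask for bytes.
raw = unquote_to_bytes(latin1_part)
print(raw)                           # b'caf\xe9'
print(ascii(raw.decode("latin-1")))  # 'caf\xe9'  (i.e. "café")

# Concern 2: non-UTF-8 octets that happen to form valid UTF-8.
# The Latin-1 text "Ã©" is the bytes C3 A9, which is also UTF-8 for "é".
ambiguous = quote("Ã©", encoding="latin-1")
print(ambiguous)                     # %C3%A9
print(ascii(unquote(ambiguous)))     # '\xe9'  -> "é", not the intended "Ã©"
```

In the second case nothing in the URL itself says which reading is right, so no decoder can detect the error.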

Now, I have to confess that I don't know how common these use cases are
in actual URL usage.  It would be nice if there were some organization
that had a large collection of URLs and could provide a test set we
could run a scanner over :-).
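
Given such a collection, the scan itself would be simple.  The sketch below is hypothetical: classify() and scan() are illustrative names, not part of any library, and the sample URLs are made up.  It buckets each URL by what its percent-escapes decode to.  Note that a "valid-utf8" result only shows the octets could be UTF-8, which is exactly the ambiguity described above.

```python
from collections import Counter
from urllib.parse import unquote_to_bytes

def classify(url: str) -> str:
    """Hypothetical helper: bucket a URL by what its %-escapes decode to."""
    if "%" not in url:
        return "no-escapes"
    raw = unquote_to_bytes(url)
    if raw.isascii():
        # Every escaped octet is 7-bit, so the encoding question never arises.
        return "ascii-only"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    # Decodes cleanly, but that only shows the octets *could* be UTF-8.
    return "valid-utf8"

def scan(urls) -> Counter:
    """Hypothetical driver: tally the buckets over an iterable of URLs."""
    return Counter(classify(u) for u in urls)

sample = [
    "http://example.com/plain",
    "http://example.com/a%20b",
    "http://example.com/caf%E9",
    "http://example.com/caf%C3%A9",
]
print(scan(sample))
```

A high "not-utf8" count would support the fear that a UTF-8-only unquote is unreliable; a high "valid-utf8" count cannot rule out the second concern.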

In the absence of such data, though, I've sent a message off to Larry
Masinter to ask about this case.  He's one of the authors of the URI spec.

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
_______________________________________