Re: urllib.unquote and unicode

Martin v. Löwis Thu, 21 Dec 2006 12:30:58 -0800

>>> The way that uri encoding is supposed to work is that first the input
>>> string in unicode is encoded to UTF-8 and then each byte which is not in
>>> the permitted range for characters is encoded as % followed by two hex
>>> characters. 
>> Can you back up this claim ("is supposed to work") by reference to
>> a specification (ideally, chapter and verse)?
> http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1


Thanks. Unfortunately, this isn't normative; it only says "we recommend".
In addition, it talks only about URIs found in HTML. If somebody writes
a user agent in Python, they are certainly free to follow
this recommendation - but I think this is a case where Python should
refuse the temptation to guess.
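
To make the point concrete, here is a minimal sketch of the B.2.1
recommendation quoted above. It assumes present-day Python 3 and its
urllib.parse module rather than the Python 2 urllib.unquote this thread
is about, and the helper name quote_b21 is hypothetical. It is an
illustration of the recommendation, not a statement of what the library
should do.

```python
from urllib.parse import quote, unquote_to_bytes


def quote_b21(text):
    """Hypothetical helper: encode to UTF-8 first, then percent-escape
    every byte outside the permitted range (HTML 4, appendix B.2.1)."""
    return quote(text.encode("utf-8"), safe="/")


escaped = quote_b21("L\u00f6wis")   # "Löwis" as a unicode string

# Unquoting only recovers bytes; the escapes do not say which charset
# those bytes were in.
raw = unquote_to_bytes(escaped)

# Turning the bytes back into text requires picking an encoding.
as_utf8 = raw.decode("utf-8")       # what B.2.1 recommends
as_latin1 = raw.decode("latin-1")   # an equally "valid" decoding

print(escaped)
print(repr(raw))
print(as_utf8, as_latin1)
```

The two decodings give different strings from the same escaped input,
so any unicode result from unquoting has to rest on an assumed encoding.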

If somebody implemented IRIs, that would be an entirely different
matter.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list