>>> The way that uri encoding is supposed to work is that first the input >>> string in unicode is encoded to UTF-8 and then each byte which is not in >>> the permitted range for characters is encoded as % followed by two hex >>> characters. >> Can you back up this claim ("is supposed to work") by reference to >> a specification (ideally, chapter and verse)? > http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
Thanks. Unfortunately, this isn't normative, but "we recommend". In addition, it talks about URIs found HTML only. If somebody writes a user agent written in Python, they are certainly free to follow this recommendation - but I think this is a case where Python should refuse the temptation to guess. If somebody implemented IRIs, that would be an entirely different matter. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list