On 2008-08-06 18:55, Antoine Pitrou wrote:
> Martin v. Löwis <martin <at> v.loewis.de> writes:
>> URLs are just not made for non-ASCII characters.
>
> Perhaps they are not, but every non-English wiki (just to take a simple, generic
> example) potentially contains non-ASCII URLs.
> e.g. http://fr.wikipedia.org/wiki/%C3%89l%C3%A9phant
> http://wiki.python.org/moin/J%C3%BCrgenHermann
> (notice the utf-8 encoding in both)
>
>> Implement IRIs if you want non-ASCII characters; the rules are much clearer
>> for these.
>
> I think most people would expect something which works with the current World
> Wide Web rather than a rigorous implementation of a specific RFC. Implementing
> RFCs is fine but it does not magically eliminate all problems, especially when
> the RFCs themselves are not in sync with real-world usage.

+1. Practicality beats purity...

The web is moving towards UTF-8 as the standard Unicode encoding, so
it's probably wise to follow that approach for quote().

http://en.wikipedia.org/wiki/Percent-encoding
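
For illustration, here is a minimal sketch of what UTF-8 based quoting looks like. It assumes the urllib.parse module as it exists in present-day Python 3 (not necessarily the API under discussion at the time of this thread), where quote() encodes str input as UTF-8 unless told otherwise. The variable names are made up for the example.

```python
from urllib.parse import quote

# Assumption: present-day Python 3, where quote() defaults to UTF-8 for str input.
page_title = "Éléphant"

# Each non-ASCII character becomes the percent-encoded bytes of its UTF-8 form.
utf8_quoted = quote(page_title)

# The same title quoted with Latin-1 yields a different, shorter escape sequence.
latin1_quoted = quote(page_title, encoding="latin-1")

print(utf8_quoted)    # %C3%89l%C3%A9phant
print(latin1_quoted)  # %C9l%E9phant
```

The UTF-8 result matches the path component of the French Wikipedia URL quoted above.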

The other direction will also have to deal with old-style URLs,
which typically still use the Latin-1 encoding that was the
basis for HTML:

http://www.w3schools.com/TAGS/ref_urlencode.asp

So unquote() should probably try to decode using UTF-8 first
and then fall back to Latin-1 if that doesn't work.
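
A rough sketch of that fallback, assuming present-day Python 3 and its urllib.parse.unquote_to_bytes(). The helper name unquote_with_fallback is hypothetical and only serves the example; it is not an existing or proposed library function.

```python
from urllib.parse import unquote_to_bytes


def unquote_with_fallback(quoted: str) -> str:
    """Hypothetical helper: percent-decode, try UTF-8, fall back to Latin-1."""
    # First undo the %XX escapes to get the raw bytes.
    raw = unquote_to_bytes(quoted)
    try:
        # Strict UTF-8 decoding rejects byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this step cannot fail.
        return raw.decode("latin-1")


print(unquote_with_fallback("%C3%89l%C3%A9phant"))  # UTF-8 style URL
print(unquote_with_fallback("%C9l%E9phant"))        # old Latin-1 style URL
```

This is only a heuristic: a Latin-1 byte sequence that happens to be valid UTF-8 will be decoded as UTF-8, so the guess can be wrong in rare cases.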

Whether the result of quote()/unquote() should be bytes or
Unicode is a different story and probably also depends on
what the application does with the result. I don't think there's
a good general answer for that one, except maybe just picking
one possible combination and documenting that.
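
To make the bytes-versus-Unicode choice concrete, here is a small sketch of the combinations. It assumes the function set found in present-day Python 3's urllib.parse (quote, quote_from_bytes, unquote, unquote_to_bytes), which is one possible answer and not necessarily what was being proposed in this thread. The variable names are illustrative only.

```python
from urllib.parse import quote, quote_from_bytes, unquote, unquote_to_bytes

name = "Jürgen"

# Unicode in, ASCII-only str out (UTF-8 is applied internally).
quoted_from_str = quote(name)

# Bytes in, str out: the caller picks the encoding explicitly.
quoted_from_bytes = quote_from_bytes(name.encode("utf-8"))

# Quoted str in, raw bytes out: no encoding is assumed on the way back.
raw_bytes = unquote_to_bytes(quoted_from_str)

# Quoted str in, Unicode out: decoded as UTF-8 by default.
decoded_text = unquote(quoted_from_str)

print(quoted_from_str, quoted_from_bytes, raw_bytes, decoded_text)
```

An application that cannot know the encoding in advance would take the bytes-returning variant and decode later; one that controls both ends can stay with str throughout.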

--
Marc-Andre Lemburg
eGenix.com