On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
>
> > unicode handling redesign. I'm stating my reading of the RFC not to defend
> > the use case Philip has, but because I think that the outlook that non-text
> > uris (before being percentencoded) are violations of the RFC
>
> That's not what I'm saying. What I'm trying to point out is that
> manipulating a bytes object as an URI sort of presumes a lot about its
> encoding as text.

I think we're more or less in agreement now, but here I'm not sure. What
manipulations are you thinking about? Which stage of URI construction are
you considering?

I've just taken a quick look at Python 3.1's urllib module and I see that
there is a bit of confusion there. But it's not about unicode vs bytes; it's
about whether a URI should be operated on at the real URI level or at the
data-that-makes-a-URI level.

* All the functions I looked at take a Python 3 str rather than bytes, so
  there's nothing confusing on that front.
* urllib.request.urlopen takes a strict URI. That means you must already
  have a percent-encoded URI at this point.
* urllib.parse.urljoin takes regular string values.
* urllib.parse.urlparse and urllib.parse.urlunparse take regular string
  values.

> Since many of the URIs we deal with are more or
> less textual, why not take advantage of that?
>

Cool, so to summarize what I think we agree on:

* Percent-encoded URIs are text according to the RFC.
* The data that is used to construct the URI is not defined as text by the
  RFC.
* However, it is very often text in an unspecified encoding.
* It is extremely convenient for programmers to be able to treat the data
  that is used to form a URI as text in nearly all common cases.

-Toshio
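
A minimal sketch of the distinction summarized above, assuming Python 3's urllib.parse. The sample bytes and the example.com base URL are made up for illustration; they do not come from the thread. It shows that the data going into a URI can be bytes or text in some encoding, while the percent-encoded result is plain ASCII text either way.

```python
from urllib.parse import quote, unquote_to_bytes, urljoin, urlparse, urlunparse

# Data-that-makes-a-URI level: the same word as raw Latin-1 bytes and as str.
# Both values are hypothetical examples.
raw_segment = b"caf\xe9"      # not valid UTF-8, so not "text" without more context
text_segment = "caf\u00e9"    # a Python 3 str

# Percent-encoding turns either one into an ASCII-only str.
encoded_from_bytes = quote(raw_segment)   # 'caf%E9'
encoded_from_text = quote(text_segment)   # 'caf%C3%A9' (quote defaults to UTF-8)

# Real-URI level: from here on everything is ordinary text.
uri = urljoin("http://example.com/menu/", encoded_from_bytes)
parts = urlparse(uri)
round_tripped = urlunparse(parts)

# The original bytes can be recovered without ever guessing an encoding.
recovered = unquote_to_bytes(parts.path.rsplit("/", 1)[-1])

print(uri)            # http://example.com/menu/caf%E9
print(recovered)      # b'caf\xe9'
```

The two encoded forms differ because the encoding of the underlying data is unspecified; only the percent-encoded form is guaranteed to be text.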
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com