Phillip J. Eby wrote:
>>>> Why not just have the constructor be:
>>>>
>>>>     bytes(initializer [,encoding])
>>>>
>>>> Where initializer must be either an iterable of suitable integers, or a
>>>> unicode/string object.  If the latter (i.e., it's a basestring), the
>>>> encoding argument would then be required.  Then, there's no need for
>>>> special codec support for the bytes type, since you call bytes on the
>>>> thing to be encoded.  And of course, no need for a 'b' literal.
>>>
>>> It'd be cruel and unusual punishment though to have to write
>>>
>>>     bytes("abc", "Latin-1")
>>>
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>>
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
>
> Actually, I thought we were talking about adding bytes() in 2.5.
Then we'd need to make the "ascii" encoding assumption again,
just like Guido proposed.

> However, now that you've brought this up, it actually makes perfect sense
> to just use latin-1 as the effective encoding for both strings and
> unicode.  In Python 2.x, strings are byte strings by definition, so it's
> only in 3.0 that an encoding would be required.  And again, latin1 is a
> reasonable, roundtrippable default encoding.

It is. However, it's not a reasonable assumption of the
default encoding since there are many encodings out there
that special case the characters 0x80-0xFF, hence the choice
of using ASCII as default encoding in Python.

The conversion from Unicode to bytes is different in this
respect, since you are converting from a "bigger" type to
a "smaller" one. Choosing latin-1 as default for this
conversion would give you all 8 bits, instead of just 7
bits that ASCII provides.

> So, it sounds like making the encoding default to latin-1 would be a
> reasonably safe approach in both 2.x and 3.x.

Reasonable for bytes(): yes. In general: no.

>> While we're at it: I'd suggest that we remove the auto-conversion
>> from bytes to Unicode in Py3k and the default encoding along with
>> it. In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
>
> I thought all this was already in the plan for 3.0, but maybe I assume too
> much.  :)

Wouldn't want to wait for Py4D :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 13 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
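
For illustration, the 7-bit versus 8-bit distinction discussed in this message can be sketched with the bytes() constructor as it exists in Python 3 today. That constructor postdates this thread, so treating it as the one under discussion here is an assumption, and try_encode is a hypothetical helper introduced only for this sketch.

```python
# Sketch only: relies on the Python 3 bytes() constructor, which was
# finalized after this thread. try_encode is a hypothetical helper.

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return bytes(text, encoding)
    except UnicodeEncodeError:
        return None

# Pure ASCII text encodes identically under both candidate defaults.
ascii_result = try_encode("abc", "ascii")
latin1_result = try_encode("abc", "latin-1")

# U+00E9 lies outside the 7-bit range: ASCII rejects it,
# while Latin-1 maps it to the single byte 0xE9.
ascii_high = try_encode("caf\u00e9", "ascii")
latin1_high = try_encode("caf\u00e9", "latin-1")

# Latin-1 maps code points 0-255 one-to-one onto byte values 0-255,
# so every possible byte value survives a decode/encode round trip.
all_bytes = bytes(range(256))
roundtrip = all_bytes.decode("latin-1").encode("latin-1")

# Code points above U+00FF do not fit in 8 bits, so Latin-1 fails there too.
beyond_latin1 = try_encode("\u20ac", "latin-1")

# The other initializer form from the proposal: an iterable of integers.
from_ints = bytes([97, 98, 99])
```

Under these assumptions, Latin-1 accepts every string whose code points fit in 8 bits, whereas ASCII stops at 7 bits. Neither choice helps with characters beyond U+00FF, which still need an explicitly chosen encoding.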