On Feb 14, 2006, at 11:25 AM, Phillip J. Eby wrote:

> At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
>> I like it, it makes sense. Unicode strings are simply not allowed as
>> arguments to the byte constructor. Thinking about it, why would it be
>> otherwise? And if you're mixing str-strings and unicode-strings, that
>> means the str-strings you're sometimes giving are actually not byte
>> strings, but character strings anyhow, so you should be encoding
>> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good
>> spelling.
>
> Actually, I think you mean:
>
>     if isinstance(s_or_U, str):
>         s_or_U = s_or_U.decode('utf-8')
>
>     b = bytes(s_or_U.encode('utf-8'))
>
> Or maybe:
>
>     if isinstance(s_or_U, unicode):
>         s_or_U = s_or_U.encode('utf-8')
>
>     b = bytes(s_or_U)
>
> Which is why I proposed that the boilerplate logic get moved *into*
> the bytes constructor. I think this use case is going to be common
> in today's Python, but in truth I'm not as sure what bytes() will
> get used *for* in today's Python. I'm probably overprojecting
> based on the need to use str objects now, but bytes aren't going to
> be a replacement for str for a good while anyway.

I most certainly *did not* mean that. If you are mixing together str
and unicode instances, the str instances _must be_ in the default
encoding (ascii). Otherwise, you are bound for failure anyhow, e.g.
''.join(['\x95', u'1']).

Str is used for two things right now:
1) a byte string.
2) a unicode string restricted to 7-bit ASCII.

These two uses are separate and you cannot mix them without causing
disaster.

You've created an interface which can take either a utf8 byte-string,
or a unicode character string. But that's wrong and can only cause
problems. It should take either an encoded bytestring, or a unicode
character string. Not both. If it takes a unicode character string,
there are two ways of spelling that in current Python: a "str" object
with only ASCII in it, or a "unicode" object with arbitrary characters
in it. bytes(s_or_U.encode('utf-8')) works correctly with both.

James
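
To make the point above concrete, here is a rough sketch of the implicit coercion involved. It is written for Python 3, where the old str/unicode pair no longer exists, so it only simulates the behaviour under discussion: Python 3 `bytes` stands in for the old "str", Python 3 `str` stands in for "unicode", and the helper names (`py2_coerce_to_unicode`, `py2_encode_utf8`, `py2_join`) are hypothetical, made up for this illustration.

```python
# Simulation only: Python 3 `bytes` plays the role of the old "str",
# and Python 3 `str` plays the role of the old "unicode".
DEFAULT_ENCODING = "ascii"  # the default encoding referred to in the message


def py2_coerce_to_unicode(value):
    """Hypothetical helper: mimic the implicit str -> unicode coercion."""
    if isinstance(value, bytes):
        # Non-ASCII bytes raise UnicodeDecodeError here, which is the
        # "bound for failure" case.
        return value.decode(DEFAULT_ENCODING)
    return value


def py2_encode_utf8(s_or_u):
    """Hypothetical helper: mimic s_or_U.encode('utf-8') for either type."""
    return py2_coerce_to_unicode(s_or_u).encode("utf-8")


def py2_join(parts):
    """Hypothetical helper: mimic ''.join([...]) on a mixed list."""
    return "".join(py2_coerce_to_unicode(p) for p in parts)


if __name__ == "__main__":
    # ASCII-only "str" and arbitrary "unicode" both encode cleanly.
    print(py2_encode_utf8(b"abc"))
    print(py2_encode_utf8("caf\u00e9"))

    # A real byte string mixed with unicode fails on the implicit decode.
    try:
        py2_join([b"\x95", "1"])
    except UnicodeDecodeError as exc:
        print("mixing failed:", exc)
```

Under these assumptions, the single spelling handles both an ASCII-only "str" and a "unicode" object, while a genuine byte string (one holding non-ASCII bytes) fails loudly instead of being silently reinterpreted as UTF-8.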