Just van Rossum <[EMAIL PROTECTED]> wrote:
>
> Ron Adam wrote:
>
> > Josiah Carlson wrote:
> > > Greg Ewing <[EMAIL PROTECTED]> wrote:
> > >>    u = unicode(b)
> > >>    u = unicode(b, 'utf8')
> > >>    b = bytes['utf8'](u)
> > >>    u = unicode['base64'](b)       # encoding
> > >>    b = bytes(u, 'base64')         # decoding
> > >>    u2 = unicode['piglatin'](u1)   # encoding
> > >>    u1 = unicode(u2, 'piglatin')   # decoding
> > >
> > > Your provided semantics feel cumbersome and confusing to me, as
> > > compared with str/unicode.encode/decode().
> > >
> > > - Josiah
> >
> > This uses syntax to determine the direction of encoding.  It would be
> > easier and clearer to just require two arguments or a tuple.
> >
> >       u = unicode(b, 'encode', 'base64')
> >       b = bytes(u, 'decode', 'base64')
> >
> >       b = bytes(u, 'encode', 'utf-8')
> >       u = unicode(b, 'decode', 'utf-8')
> >
> >       u2 = unicode(u1, 'encode', 'piglatin')
> >       u1 = unicode(u2, 'decode', 'piglatin')
> >
> > It looks somewhat cleaner if you combine them in a path style string.
> >
> >       b = bytes(u, 'encode/utf-8')
> >       u = unicode(b, 'decode/utf-8')
>
> It gets from bad to worse :(
>
> I always liked the asymmetry between
>
>     u = unicode(s, "utf8")
>
> and
>
>     s = u.encode("utf8")
>
> which I think was the original design of the unicode API. Kudos to
> whoever came up with that.

I personally have never used that mechanism.  I always used
s.decode('utf8') and u.encode('utf8').  I prefer the symmetry that
.encode() and .decode() offer.

> When I saw
>
>     b = bytes(u, "utf8")
>
> mentioned for the first time, I thought: why on earth must the bytes
> constructor be coupled to the unicode API?!?! It makes no sense to me
> whatsoever.

It's not a 'unicode API'.  See integers for another example where a
second argument to a type object defines how to interpret the other
argument, or even arrays/structs where the first argument defines the
interpretation.

> Bytes have so much more use besides encoded text.

Agreed.

> I believe (please correct me if I'm wrong) that the encoding argument of
> bytes() was invented to make it easier to write byte literals. Perhaps a
> true bytes literal notation is in order after all?

Maybe, but I think the other earlier use-case was for using:

    s2 = bytes(s1, 'base64')

If bytes objects received an .encode() method, or even a .tobytes()
method.  I could be misremembering.

> My preference for bytes -> unicode -> bytes API would be this:
>
>     u = unicode(b, "utf8")  # just like we have now
>     b = u.tobytes("utf8")   # like u.encode(), but being explicit
>                             # about the resulting type
>
> As to base64, while it works as a codec ("Why a base64 codec? Because we
> can!"), I don't find it a natural API at all, for such conversions.

Depending on whose definition of codec you listen to (is it a
compressor/decompressor, or a coder/decoder?), either very few of
what we have as 'codecs' are actual codecs (only zlib, etc.), or all of
them are.  I would imagine that base64, etc., were made into codecs, or
really encodings, because base64 is an 'encoding' of binary data in
base64 format, similar to the way you can think of utf8 as an
'encoding' of textual data in utf8 format.
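
To make the two comparisons above concrete, here is a minimal sketch.
It is written against present-day Python 3, which is an assumption on
top of this thread: the names differ from the Python 2 API being
discussed (str plays the role of unicode), and the sample values are
made up for illustration.

```python
import base64
import codecs
import struct

# A second argument tells the type how to interpret the first one.
n = int("ff", 16)                      # the base says how to read the digits
text = str(b"caf\xc3\xa9", "utf-8")    # the encoding says how to read the bytes

# With struct, the first argument (the format) defines the interpretation.
packed = struct.pack("<H", 258)        # little-endian unsigned 16-bit
(unpacked,) = struct.unpack("<H", packed)

# base64 via the codec machinery versus the one-shot module function.
raw = b"binary \x00 data"              # illustrative payload
via_codec = codecs.encode(raw, "base64")
via_module = base64.encodebytes(raw)
round_trip = codecs.decode(via_codec, "base64")
```

In this sketch the codec route and the module function give the same
bytes, so the choice between them is about which spelling reads as the
obvious one, not about the result.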

I would argue, due to the "one obvious way to do it", that using
encodings/codecs should be preferred to one-shot encoding/decoding
functions in various modules (with some exceptions).  These exceptions
are things like pickle, marshal, struct, etc., which may take a
non-basestring object and convert it into a byte string, which is
arguably an encoding of the object in a particular format.

 - Josiah

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com