2010/12/14 Jaime Fernández <jjja...@gmail.com>:
> Hi
>
> To build a binary packet (for SMPP protocol), we have to concatenate
> different types of data: integers, floats, strings.
> We are using struct.pack to generate the binary representation of each
> integer and float of the packet, and then they are concatenated with the +
> operand.
> However, for strings we directly concatenate the string with +, without
> using struct.
> Everything works with python 2 except when string encoding is introduced.
> Whenever, a non ASCII char appears in the string, an exception is launched.
> In python 3, it's not possible to do this trick because all the strings are
> unicode.
>
> What would be the best approach to:
> - Support non-ascii chars (we just want to concatenate the binary
>   representation of the string without any modification)
> - Compatibility between python 2 and python 3.
>
> Thanks,
> Jaime

I don't think you quite understand how encodings and Unicode work. You have two similar but distinct data types involved: a byte string ("" in Python 2.x, b"" in Python 3.x), which is a sequence of bytes, and a Unicode string (u"" in Python 2.x, "" in Python 3.x), which is a sequence of characters. Neither type of string has an encoding associated with it; an encoding is just a function for converting between the two data types.

You only get those non-ASCII character problems when you try to concatenate Unicode strings with byte strings. Python 2.x converts the byte string implicitly, defaulting to ASCII when you don't specify an encoding yourself, and that fails on any byte outside the ASCII range. Python 3.x never converts implicitly and raises a TypeError whenever the two types are mixed. If you want to avoid those errors (in both Python 2.x and Python 3.x), use the Unicode string's encode method to turn the characters into a sequence of bytes before you concatenate them.
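
To make that concrete, here is a minimal sketch of encoding before concatenating. The names build_packet and sequence_number are made up for illustration, the single 4-byte header is not the real SMPP layout, and UTF-8 is only an assumption: use whatever encoding the receiving side expects. It should run unchanged on Python 2.6+ and Python 3.3+.

```python
# -*- coding: utf-8 -*-
import struct


def build_packet(sequence_number, text, encoding="utf-8"):
    """Hypothetical helper: pack an integer, then append the encoded text.

    `text` must be a Unicode string (u"" in Python 2.x, "" in Python 3.x).
    """
    # struct.pack always returns a byte string.
    header = struct.pack(">I", sequence_number)  # 4-byte big-endian unsigned int
    # Turn the characters into bytes explicitly, so nothing falls back to ASCII.
    payload = text.encode(encoding)
    # Both operands are byte strings now, so + is safe on 2.x and 3.x.
    return header + payload


packet = build_packet(1, u"se\u00f1or")
print(repr(packet))
```

The point of the design is that every piece is already a byte string by the time + is applied, so Python never has to guess an encoding. On Python 2.x, passing a byte string that already contains non-ASCII bytes as text would still fail, because encode would first try to decode it as ASCII.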