Concatenate a string as binary bytes

2010-12-14 Thread Jaime Fernández
Hi

To build a binary packet (for the SMPP protocol), we have to concatenate
different types of data: integers, floats, strings.

We are using struct.pack to generate the binary representation of each
integer and float of the packet, and then they are concatenated with the +
operator.
However, for strings we directly concatenate the string with +, without
using struct.
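
A minimal sketch of the approach described above, written in Python 3 syntax. The field layout (a 4-byte big-endian integer, a 4-byte float, then a string) is hypothetical and not taken from the SMPP specification. It works only because every piece is already a byte string:

```python
import struct

def build_packet(seq, ratio, text_bytes):
    # Hypothetical layout: ">" = big-endian, "I" = unsigned 32-bit int,
    # "f" = 32-bit float.
    header = struct.pack(">If", seq, ratio)
    # + is safe here only because text_bytes is already bytes,
    # just like the packed header.
    return header + text_bytes

packet = build_packet(1, 0.5, b"hello")
```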

Everything works with Python 2 except when string encodings come into play:
whenever a non-ASCII char appears in the string, an exception is raised.
In Python 3 this trick is not possible at all, because all strings are
Unicode.
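
For illustration, this is roughly what happens in Python 3 when packed bytes and a text string are joined directly (the values are made up). In Python 2 the same + falls back to an implicit ASCII conversion instead, so the failure only shows up once non-ASCII data is involved:

```python
import struct

header = struct.pack(">I", 42)   # a bytes object
text = "caf\u00e9"               # a str (Unicode text) in Python 3

def try_concat(a, b):
    """Return a + b, or the TypeError raised by the attempt."""
    try:
        return a + b
    except TypeError as exc:
        # Python 3 never converts implicitly between bytes and str,
        # so mixing them fails even for pure-ASCII text.
        return exc

result = try_concat(header, text)
print(type(result).__name__)  # TypeError
```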

What would be the best approach to:
 - Support non-ASCII chars (we just want to concatenate the binary
representation of the string without any modification)
 - Stay compatible with both Python 2 and Python 3?

Thanks,
Jaime
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Concatenate a string as binary bytes

2010-12-14 Thread MRAB

On 14/12/2010 19:50, Jaime Fernández wrote:
> [...]
> What would be the best approach to:
>   - Support non-ascii chars (we just want to concatenate the binary
> representation of the string without any modification)
>   - Compatibility between python 2 and python 3.

I'd say encode to UTF-8.
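
A minimal sketch of that suggestion in Python 3. The 1-byte length prefix is a hypothetical field layout, and whether the SMPP peer actually expects UTF-8 (rather than another encoding selected by the PDU's data_coding field) is an assumption here. Note that the length has to be taken from the encoded bytes, not from the text:

```python
import struct

def pack_text(text, encoding="utf-8"):
    # Encode first: the wire carries bytes, not characters.
    data = text.encode(encoding)
    # Hypothetical layout: a 1-byte length prefix followed by the bytes.
    # Use len(data), not len(text): "\u00e9" is 1 character but 2 bytes
    # in UTF-8. (">B" limits the payload to 255 bytes.)
    return struct.pack(">B", len(data)) + data

packet = struct.pack(">I", 42) + pack_text("caf\u00e9")
```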


Re: Concatenate a string as binary bytes

2010-12-14 Thread Benjamin Kaplan
2010/12/14 Jaime Fernández <jjja...@gmail.com>:
> [...]
> What would be the best approach to:
>  - Support non-ascii chars (we just want to concatenate the binary
> representation of the string without any modification)
>  - Compatibility between python 2 and python 3.

I don't think you quite understand how encodings and Unicode work. You
have two similar but distinct data types involved: a byte string (''
in Python 2.x, b'' in Python 3.x), which is a sequence of bytes, and a
Unicode string (u'' in Python 2.x and '' in Python 3.x), which is a
sequence of characters. Neither type of string has an encoding
associated with it; an encoding is just a function for converting
between these two data types.
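
A short Python 3 illustration of that point: the text itself carries no encoding, encode() and decode() convert between the two types, and the chosen encoding determines the resulting bytes. The sample text and the two encodings are arbitrary choices:

```python
text = "caf\u00e9"               # Unicode string: a sequence of characters
data = text.encode("utf-8")      # byte string: b'caf\xc3\xa9'

# The same characters give different bytes under a different encoding.
latin1 = text.encode("latin-1")  # b'caf\xe9'

# Decoding with the matching encoding recovers the original text.
assert data.decode("utf-8") == text
assert latin1.decode("latin-1") == text

print(len(text), len(data), len(latin1))  # 4 5 4
```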

You only get those non-ASCII character problems when you try
concatenating Unicode strings with byte strings: Python 2 implicitly
converts between the two using ASCII when you don't specify an encoding
yourself, and Python 3 refuses to mix them at all. To avoid those errors
in both Python 2.x and Python 3.x, use the Unicode string's encode method
to turn the characters into a sequence of bytes before you concatenate
them.
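
A minimal sketch of that advice that should run unchanged on Python 2.6+ and Python 3. The helper name to_bytes, the UTF-8 default, and the trailing NUL byte are illustrative assumptions, not part of the original message:

```python
import struct

def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return value as bytes, encoding it if it is text.

    On Python 2.6+, bytes is an alias of str, so byte strings pass
    through and unicode objects get encoded; on Python 3, bytes pass
    through and str gets encoded.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)

def build_packet(seq, text):
    # Hypothetical layout: 4-byte big-endian int, text, NUL terminator.
    # Every piece is bytes before it is joined, so + never has to
    # guess an encoding on either Python version.
    return struct.pack(">I", seq) + to_bytes(text) + b"\x00"

packet = build_packet(7, u"caf\u00e9")
```

Encoding once, at the point where text enters the packet, keeps every concatenation bytes-with-bytes.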