Re: [newbie] String to binary conversion
On Tuesday, August 7, 2012 at 10:01:05 AM UTC+8, Steven D'Aprano wrote:
> On Mon, 06 Aug 2012 22:46:38 +0200, Mok-Kong Shen wrote:
>
> > If I have a string "abcd" then, with 8-bit encoding of each character, there is a corresponding 32-bit binary integer. How could I best obtain that integer and from that integer backwards again obtain the original string? Thanks in advance.
>
> First you have to know the encoding, as that will define the integers you get. There are many 8-bit encodings, but of course they can't all encode arbitrary 4-character strings. Since there are tens of thousands of different characters, and an 8-bit encoding can only code for 256 of them, there are many strings that an encoding cannot handle. For those, you need multi-byte encodings like UTF-8, UTF-16, etc.
>
> Sticking to one-byte encodings: since most of them are compatible with ASCII, examples with "abcd" aren't very interesting:
>
>     py> 'abcd'.encode('latin1')
>     b'abcd'
>
> Even though the bytes object b'abcd' is printed as if it were a string, it is actually treated as an array of one-byte ints:
>
>     py> b'abcd'[0]
>     97
>
> Here's a more interesting example, using Python 3: it uses at least one character (the Greek letter π) which cannot be encoded in Latin1, and two which cannot be encoded in ASCII:
>
>     py> 'aπ©d'.encode('iso-8859-7')
>     b'a\xf0\xa9d'
>
> Most encodings will round-trip successfully:
>
>     py> text = 'aπ©Z!'
>     py> data = text.encode('iso-8859-7')
>     py> data.decode('iso-8859-7') == text
>     True
>
> (although the ability to round-trip is a property of the encoding itself, not of the encoding system).
>
> Naturally if you encode with one encoding, and then decode with another, you are likely to get different strings:
>
>     py> text = 'aπ©Z!'
>     py> data = text.encode('iso-8859-7')
>     py> data.decode('latin1')
>     'að©Z!'
>     py> data.decode('iso-8859-14')
>     'aŵ©Z!'
>
> Both the encode and decode methods take an optional argument, errors, which specifies the error handling scheme. The default is errors='strict', which raises an exception when a character cannot be converted. Others include 'ignore' and 'replace'.
>
>     py> 'aŵðπ©Z!'.encode('ascii', 'ignore')
>     b'aZ!'
>     py> 'aŵðπ©Z!'.encode('ascii', 'replace')
>     b'a????Z!'
>
> --
> Steven

I think a UTF-8 or UTF-16 codec is necessary; just recall the MS encoding codecs of Win98 and NT, which collected taxes all over the world. For each character encoding, someone should develop a codec to and from UTF-8 or UTF-16. That way one can convert between any two of the supported character sets.
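The point about converting between any two character sets can be sketched in Python 3, where Unicode text already serves as the common pivot. This is only an illustration: the helper name transcode is made up for this example, and it assumes both encodings can represent every character involved.

```python
def transcode(data, source, target):
    """Re-encode bytes from one character set to another via Unicode text.

    `transcode` is a hypothetical helper for illustration, not a
    standard-library function.
    """
    return data.decode(source).encode(target)

greek = "aπ©Z!".encode("iso-8859-7")          # one byte per character
utf8 = transcode(greek, "iso-8859-7", "utf-8")
utf16 = transcode(utf8, "utf-8", "utf-16-le")

print(utf8)                                    # b'a\xcf\x80\xc2\xa9Z!'
print(transcode(utf16, "utf-16-le", "iso-8859-7") == greek)   # True
```

With the default errors='strict', the conversion raises UnicodeEncodeError if the target character set lacks one of the characters.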
[newbie] String to binary conversion
If I have a string "abcd" then, with 8-bit encoding of each character, there is a corresponding 32-bit binary integer. How could I best obtain that integer and from that integer backwards again obtain the original string? Thanks in advance.

M. K. Shen
--
http://mail.python.org/mailman/listinfo/python-list
Re: [newbie] String to binary conversion
The binascii module looks like it might have something for you. I've never used it.

Tobiah

http://docs.python.org/library/binascii.html

On 08/06/2012 01:46 PM, Mok-Kong Shen wrote:
> If I have a string "abcd" then, with 8-bit encoding of each character, there is a corresponding 32-bit binary integer. How could I best obtain that integer and from that integer backwards again obtain the original string? Thanks in advance.
>
> M. K. Shen
Re: [newbie] String to binary conversion
On 08/06/2012 01:59 PM, Tobiah wrote:
> The binascii module looks like it might have something for you. I've never used it.

Having actually read some of that doc, I see it's not what you want at all. Sorry.
Re: [newbie] String to binary conversion
On 06.08.2012 22:59, Tobiah wrote:
> The binascii module looks like it might have something for you. I've never used it.

Thanks for the hint, but if I'm not mistaken, the binascii module doesn't seem to work. I typed:

    import binascii

and a line that's given as an example in the documentation:

    crc = binascii.crc32("hello")

but got the following error message:

    TypeError: 'str' does not support the buffer interface

The same error message appeared when I tried the other functions.

M. K. Shen
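That TypeError is what Python 3 reports when a binascii function is handed a str: these functions operate on bytes, so the text has to be encoded first. A minimal sketch follows, assuming ASCII input. Note that crc32 is a checksum rather than a reversible conversion, while hexlify/unhexlify happen to give one possible route to the integer.

```python
import binascii

data = "hello".encode("ascii")        # str -> bytes before calling binascii
crc = binascii.crc32(data)            # works on bytes; this is only a checksum

hexed = binascii.hexlify(b"abcd")     # b'61626364'
value = int(hexed.decode("ascii"), 16)
restored = binascii.unhexlify(hexed)  # back to b'abcd'

print(hexed, value, restored)
```

Here value is the big-endian reading of the four bytes (0x61626364).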
Re: [newbie] String to binary conversion
On 06/08/2012 21:46, Mok-Kong Shen wrote:
> If I have a string "abcd" then, with 8-bit encoding of each character, there is a corresponding 32-bit binary integer. How could I best obtain that integer and from that integer backwards again obtain the original string? Thanks in advance.

Try this (Python 3, in which strings are Unicode):

    >>> import struct
    >>> # For a little-endian integer
    >>> struct.unpack("<I", "abcd".encode("latin-1"))[0]
    1684234849
    >>> hex(_)
    '0x64636261'

or this (Python 2, in which strings are bytestrings):

    >>> import struct
    >>> # For a little-endian integer
    >>> struct.unpack("<I", "abcd")[0]
    1684234849
    >>> hex(_)
    '0x64636261'
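The question also asked for the way back from the integer to the string. A sketch of the full round trip with struct under Python 3 might look like this; the function names str_to_u32 and u32_to_str are invented for the example, and Latin-1 is assumed so that every character maps to exactly one byte.

```python
import struct

def str_to_u32(text, encoding="latin-1", byteorder="<"):
    """Pack a 4-character string into an unsigned 32-bit integer."""
    data = text.encode(encoding)
    if len(data) != 4:
        raise ValueError("need exactly 4 encoded bytes, got %d" % len(data))
    return struct.unpack(byteorder + "I", data)[0]

def u32_to_str(value, encoding="latin-1", byteorder="<"):
    """Inverse of str_to_u32: unsigned 32-bit integer back to 4 characters."""
    return struct.pack(byteorder + "I", value).decode(encoding)

n = str_to_u32("abcd")
print(n, hex(n))                                # 1684234849 0x64636261
print(u32_to_str(n))                            # abcd
print(hex(str_to_u32("abcd", byteorder=">")))   # 0x61626364 (big-endian)
```

The byte order only has to match between the two directions; "<" puts the first character in the low byte, ">" puts it in the high byte.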
Re: [newbie] String to binary conversion
On 8/6/2012 1:46 PM Mok-Kong Shen said...
> If I have a string "abcd" then, with 8-bit encoding of each character, there is a corresponding 32-bit binary integer. How could I best obtain that integer and from that integer backwards again obtain the original string? Thanks in advance.

It's easy to write one:

    def str2val(s, _val=0):
        # Fold the characters into a big-endian integer, 8 bits each.
        if len(s) > 1:
            return str2val(s[1:], 256*_val + ord(s[0]))
        return 256*_val + ord(s[0])

    def val2str(val, _str=""):
        # Peel off the low byte until the rest fits in one character.
        if val >= 256:
            return val2str(val // 256, _str) + chr(val % 256)
        return _str + chr(val)

    print(str2val("abcd"))
    print(val2str(str2val("abcd")))
    print(val2str(str2val("good")))
    print(val2str(str2val("longer")))
    print(val2str(str2val("verymuchlonger")))

Flavor to taste.

Emile
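For comparison, the same big-endian conversion can be sketched without recursion using int.from_bytes and int.to_bytes, which are available from Python 3.2 on. The names str_to_int and int_to_str are made up for this example, and Latin-1 is again assumed as the one-byte encoding.

```python
def str_to_int(text, encoding="latin-1"):
    """Interpret the encoded bytes of `text` as one big-endian integer."""
    return int.from_bytes(text.encode(encoding), "big")

def int_to_str(value, encoding="latin-1"):
    """Inverse of str_to_int for strings that do not start with NUL."""
    length = (value.bit_length() + 7) // 8 or 1   # at least one byte
    return value.to_bytes(length, "big").decode(encoding)

print(str_to_int("abcd"))                          # 1633837924
print(int_to_str(str_to_int("verymuchlonger")))    # verymuchlonger
```

As with the recursive version, leading '\x00' characters (and the empty string) do not survive the round trip, because leading zero bytes vanish once the bytes are read as a number.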
Re: [newbie] String to binary conversion
On Mon, 06 Aug 2012 22:46:38 +0200, Mok-Kong Shen wrote:

> If I have a string "abcd" then, with 8-bit encoding of each character, there is a corresponding 32-bit binary integer. How could I best obtain that integer and from that integer backwards again obtain the original string? Thanks in advance.

First you have to know the encoding, as that will define the integers you get. There are many 8-bit encodings, but of course they can't all encode arbitrary 4-character strings. Since there are tens of thousands of different characters, and an 8-bit encoding can only code for 256 of them, there are many strings that an encoding cannot handle. For those, you need multi-byte encodings like UTF-8, UTF-16, etc.

Sticking to one-byte encodings: since most of them are compatible with ASCII, examples with "abcd" aren't very interesting:

    py> 'abcd'.encode('latin1')
    b'abcd'

Even though the bytes object b'abcd' is printed as if it were a string, it is actually treated as an array of one-byte ints:

    py> b'abcd'[0]
    97

Here's a more interesting example, using Python 3: it uses at least one character (the Greek letter π) which cannot be encoded in Latin1, and two which cannot be encoded in ASCII:

    py> 'aπ©d'.encode('iso-8859-7')
    b'a\xf0\xa9d'

Most encodings will round-trip successfully:

    py> text = 'aπ©Z!'
    py> data = text.encode('iso-8859-7')
    py> data.decode('iso-8859-7') == text
    True

(although the ability to round-trip is a property of the encoding itself, not of the encoding system).

Naturally if you encode with one encoding, and then decode with another, you are likely to get different strings:

    py> text = 'aπ©Z!'
    py> data = text.encode('iso-8859-7')
    py> data.decode('latin1')
    'að©Z!'
    py> data.decode('iso-8859-14')
    'aŵ©Z!'

Both the encode and decode methods take an optional argument, errors, which specifies the error handling scheme. The default is errors='strict', which raises an exception when a character cannot be converted. Others include 'ignore' and 'replace'.

    py> 'aŵðπ©Z!'.encode('ascii', 'ignore')
    b'aZ!'
    py> 'aŵðπ©Z!'.encode('ascii', 'replace')
    b'a????Z!'

--
Steven
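To round out the error handlers mentioned above, here is a small Python 3 sketch of what the default 'strict' handler does, plus two other standard handlers, 'backslashreplace' and 'xmlcharrefreplace', that keep the information 'ignore' and 'replace' throw away. The variable names are only illustrative.

```python
text = "aŵðπ©Z!"

# 'strict' (the default) raises on the first character that cannot be encoded.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failed_at = exc.start
    print("cannot encode character at index", failed_at)   # index 1

escaped = text.encode("ascii", "backslashreplace")
print(escaped)        # b'a\\u0175\\xf0\\u03c0\\xa9Z!'

xml_refs = text.encode("ascii", "xmlcharrefreplace")
print(xml_refs)       # b'a&#373;&#240;&#960;&#169;Z!'

# When decoding, 'replace' substitutes U+FFFD for each undecodable byte.
decoded = b"a\xf0\xa9d".decode("ascii", "replace")
print(decoded == "a\ufffd\ufffdd")                          # True
```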