On 5 Aug 2009, at 16:46, Martin P. Hellwig wrote:
Hi List,
On several occasions I have needed (and built) a parser that reads a
binary piece of data with custom structure. For example (bogus one):
BE
+---------+---------+-------------+-------------+------+--------+
| Version | Command | Instruction | Data Length | Data | Filler |
+---------+---------+-------------+-------------+------+--------+
Version: 6 bits
Command: 4 bits
Instruction: 5 bits
Data Length: 5 bits
Data: 0-31 bits
Filler: 0 bits added to make the packet length divisible by 8
What I usually do is read the packet in binary mode, convert the
output to a concatenated 'binary string' (i.e. '0101011000110') and
then use slice indices to get the right data portions.
Depending on what I need to do with these portions I convert them to
whatever is handy (usually an integer).
This works out fine for me. Most of the time I also put the ASCII
art diagram of this 'protocol' as a comment in the code, making it
more readable/understandable.
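A minimal sketch of the approach described above, assuming Python 3
(where iterating over bytes yields integers). The names HEADER_FIELDS
and parse_with_bit_string are hypothetical, chosen only for this
illustration, and the sample packet is made up to match the bogus
layout in the diagram:

```python
# Header field widths in bits, in wire order, taken from the diagram.
HEADER_FIELDS = (
    ("version", 6),
    ("command", 4),
    ("instruction", 5),
    ("data_length", 5),
)


def parse_with_bit_string(packet):
    """Parse one packet by slicing a '0'/'1' string (hypothetical helper)."""
    # One character per bit, most significant bit of each byte first.
    bits = "".join(format(byte, "08b") for byte in packet)
    fields = {}
    pos = 0
    for name, width in HEADER_FIELDS:
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width
    # The data field is left as a bit string because its width varies.
    length = fields["data_length"]
    fields["data"] = bits[pos:pos + length]
    # Whatever follows is filler and is ignored.
    return fields


# Made-up packet: version 3, command 10, instruction 17, 5 data bits.
sample = bytes([0x0E, 0xA2, 0x5B, 0x00])
print(parse_with_bit_string(sample))
```

The sketch does no validation, so a packet shorter than its declared
data length would simply yield a truncated data string.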
Though there are a couple of things that bother me about my approach:
- This seems such a general problem that I think there must
already be a general pythonic solution.
- Using a string for binary representation takes at least 8 times
more memory for the packet than strictly necessary.
- Seems to need a lot of prep work before doing the actual parsing.
Any suggestion is greatly appreciated.
The gold standard for binary parsing (and serialization) is probably
Erlang's bit syntax, but as far as Python goes you might be interested
in Hachoir (http://hachoir.org/, though it seems to be down right now).
It's not going to match your second point, but it can probably help
with the rest (caveat: I haven't used hachoir personally).
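On the second point specifically, one library-free option is to keep
the packet as a single integer and pull fields out with shifts and
masks, so no character-per-bit string is ever built. The following is
only a rough sketch under assumptions: it needs Python 3.2 or later for
int.from_bytes, and the names HEADER_FIELDS and parse_with_shifts are
hypothetical, as is the sample packet:

```python
# Header field widths in bits, in wire order, taken from the diagram.
HEADER_FIELDS = (
    ("version", 6),
    ("command", 4),
    ("instruction", 5),
    ("data_length", 5),
)


def parse_with_shifts(packet):
    """Parse one packet using integer shifts and masks (hypothetical helper)."""
    # Treat the whole packet as one big-endian unsigned integer.
    value = int.from_bytes(packet, "big")
    remaining = len(packet) * 8  # bits not yet consumed, counted from the left
    fields = {}
    for name, width in HEADER_FIELDS:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    # The data field is 0-31 bits wide; the rest of the packet is filler.
    length = fields["data_length"]
    remaining -= length
    fields["data"] = (value >> remaining) & ((1 << length) - 1)
    return fields


# Made-up packet: version 3, command 10, instruction 17, 5 data bits.
sample = bytes([0x0E, 0xA2, 0x5B, 0x00])
print(parse_with_shifts(sample))
```

There is no error handling here: a packet shorter than its declared
data length makes the shift count negative, which raises ValueError.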