Re: Parsing Binary Structures; Is there a better way / What is your way?
On Aug 5, 10:46 am, "Martin P. Hellwig" wrote: > Hi List, > > On several occasions I have needed (and build) a parser that reads a > binary piece of data with custom structure. For example (bogus one): > > BE > +-+-+-+-+--++ > | Version | Command | Instruction | Data Length | Data | Filler | > +-+-+-+-+--++ > Version: 6 bits > Command: 4 bits > Instruction: 5 bits > Data Length: 5 bits > Data: 0-31 bits > Filler: filling 0 bits to make the packet dividable by 8 hi, sorry i'm a bit late here, but lepl does exactly this. also, as you asked for in another post, the BitString class allows arbitrary indexing into a sequence of bits. and because it's part of a recursive descent parser you can match "anything" (lepl will handle - although less efficiently - left recursive and ambiguous grammars). see the example at http://www.acooke.org/lepl/binary.html#matching which constructs an ethernet frame and then parses it, extracting the source and destination addresses. feel free to email me with more questions. disclaimer: this is quite new and i don't know of anyone that actually uses it; it is also Python3 only (because it uses bytes()). andrew -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On Thursday 06 August 2009 20:50:30 Martin P. Hellwig wrote: > Thanks all for your insights and suggestions. > It seems to me that there are a couple of ways to this bit manipulation > and a couple of foreign modules to assist you with that. > > Would it be worth the while to do a PEP on this? > Personally I think that it would be nice to have a standard module in > Python for this but perhaps there is a good reason that there isn't > already one. I agree that it would be nice - I think it would be nice if something like this is added to the struct module to give a "this is the obvious way that python handles foreign data" module - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
Thanks all for your insights and suggestions. It seems to me that there are a couple of ways to this bit manipulation and a couple of foreign modules to assist you with that. Would it be worth the while to do a PEP on this? Personally I think that it would be nice to have a standard module in Python for this but perhaps there is a good reason that there isn't already one. -- MPH http://blog.dcuktec.com 'If consumed, best digested with added seasoning to own preference.' -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On Aug 5, 3:46 pm, "Martin P. Hellwig" wrote: > Hi List, > > On several occasions I have needed (and build) a parser that reads a > binary piece of data with custom structure. For example (bogus one): > > BE > +-+-+-+-+--++ > | Version | Command | Instruction | Data Length | Data | Filler | > +-+-+-+-+--++ > Version: 6 bits > Command: 4 bits > Instruction: 5 bits > Data Length: 5 bits > Data: 0-31 bits > Filler: filling 0 bits to make the packet dividable by 8 > > what I usually do is read the packet in binary mode, convert the output > to a concatenated 'binary string'(i.e. '0101011000110') and then use > slice indeces to get the right data portions. > Depending on what I need to do with these portions I convert them to > whatever is handy (usually an integer). > > This works out fine for me. Most of the time I also put the ASCII art > diagram of this 'protocol' as a comment in the code, making it more > readable/understandable. > > Though there are a couple of things that bothers me with my approach: > - This seems such a general problem that I think that there must be > already a general pythonic solution. > - Using a string for binary representation takes at least 8 times more > memory for the packet than strictly necessary. > - Seems to need a lot of prep work before doing the actual parsing. > > Any suggestion is greatly appreciated. > > -- > MPHhttp://blog.dcuktec.com > 'If consumed, best digested with added seasoning to own preference.' Take a look at the bitstring module (in pypi or google code). It's designed to help make this sort of thing easy and it's more fully featured than BitVector or BitSet. Internally the data is stored as a byte array, so memory isn't wasted. It will also do all the dirty work of bit masking and shifting so that you can concentrate on the real problems. For example: >>> s = BitString('0x1232432312') # just to give us some data to play with >>> ver, comm, instr, bitlen = s.read('uint6, bin4, bin5, uint5') >>> data = s.readbits(bitlen) Different interpretations of the binary data are given using Python properties (e.g. s.hex, s.oct, s.uint, etc.) and it supports bit-wise slicing, modification, finding, replacing and more. It is also still in active development (full disclosure: I'm the author :-)). -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On Wednesday 05 August 2009 21:41:26 Martin P. Hellwig wrote: > Yes you are (of course) right, my 'dream' solution would be something > that accepts slice indeces on bit level. Your reasoning did reveal some > flaws in my approach though ;-) This is the first time I have been compared to the sandman... :-) - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On Wednesday 05 August 2009 20:12:05 Paul Rubin wrote: > "Martin P. Hellwig" writes: > > what I usually do is read the packet in binary mode, convert the > > output to a concatenated 'binary string'(i.e. '0101011000110') and > > Something wrong with reading the data words as an integer and using > old fashioned shifts and masks to get at the bit fields? There is nothing wrong with that, and in the final analysis, it is what is used - however, I can sympathize with the OP, as it is much nicer to just call a named thing than to juggle a bunch of hard coded shifts and masks. I think the real point here is sort of meta programming - what is needed is a way to describe and name the elements of the structure so that useful python objects are created. The code I posted for bits does that, and I use it all the time - it is almost as good as being back in 8031 assembler with direct named bit access. - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On Wednesday 05 August 2009 16:46:13 Martin P. Hellwig wrote: > Hi List, > > On several occasions I have needed (and build) a parser that reads a > binary piece of data with custom structure. For example (bogus one): > > BE > +-+-+-+-+--++ > > | Version | Command | Instruction | Data Length | Data | Filler | > > +-+-+-+-+--++ > Version: 6 bits > Command: 4 bits > Instruction: 5 bits > Data Length: 5 bits > Data: 0-31 bits > Filler: filling 0 bits to make the packet dividable by 8 > > what I usually do is read the packet in binary mode, convert the output > to a concatenated 'binary string'(i.e. '0101011000110') and then use > slice indeces to get the right data portions. > Depending on what I need to do with these portions I convert them to > whatever is handy (usually an integer). This bit banging stuff is a PITA, no matter what you do. Python does not have bit fields like C. And C bit fields are implementation dependent. Write an extension? Some time ago I asked a similar question, and Castironpi came up with what was essentially an indexed integer, with named bits. It stores the bits natively, but I suspect that the price paid is access time. I enclose a module that you can adapt. It talks about bytes but they are integers really. It is different from what you are doing, as it was aimed at reading and writing bits in a hardware context. If you get your head around the concept, then it may give you some ideas. It should be possible to extend the concept to pass name,length tuples at construction time instead of just a name with an implied length of one bit, and it may make sense to change the underlying type from an integer to an array of one byte integers. It is nice to be able to say: val = bitname() to read, and bitname(1) or bitname(0) to write. I can also write: outputbits[3] = 1 or val = inputbits[5] If you can successfully generalise it to field names It should be very useful. I cannot think of a way though, to not have the "in" and "out" split, but you can program your way around that - you do not have to update "in place". - Hendrik bits.py Description: application/python -- http://mail.python.org/mailman/listinfo/python-list
Re: Re: Parsing Binary Structures; Is there a better way / What is your way?
Paul Rubin wrote: "Martin P. Hellwig" writes: Is there an advantage using shifts and masks over my kitchen type solution? Weren't you complaining about the 8-to-1 expansion from turning each bit to an ascii char? One warning to Martin: If you want your code portable across systems, watch out for big-endian/little-endian issues, as well as alignment ones. Shift & mask code tends to be highly specific to a particular endian-ness, especially if trying to get multiple bits that cross a byte or word boundary. Over the years, I know I've seen at least three endian versions for the same 32bit word. Something like abcd, dcba, and cdab. One advantage of converting first to bitstrings, is that there's just the two places to fix, for portability, the conversion from byte array to bitstring, and the conversion back. DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
En Wed, 05 Aug 2009 11:46:13 -0300, Martin P. Hellwig escribió: On several occasions I have needed (and build) a parser that reads a binary piece of data with custom structure. For example (bogus one): BE +-+-+-+-+--++ | Version | Command | Instruction | Data Length | Data | Filler | +-+-+-+-+--++ Version: 6 bits Command: 4 bits Instruction: 5 bits Data Length: 5 bits Data: 0-31 bits Filler: filling 0 bits to make the packet dividable by 8 - Using a string for binary representation takes at least 8 times more memory for the packet than strictly necessary. The size difference isn't so big; an integer takes 12 bytes and a string takes 24+len bytes. "Data" above would take 56 bytes max when stored as a string '0110001...' vs. 12 bytes when using a plain integer (sizes computed on Windows 32bits). Plus, doing bitwise operations in Python isn't a fast operation as it is in C, by example -- so your current implementation might be a quite good one (in pure Python, I mean). -- Gabriel Genellina -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
"Martin P. Hellwig" writes: > Is there an advantage using shifts and masks over my kitchen type solution? Weren't you complaining about the 8-to-1 expansion from turning each bit to an ascii char? -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On 5 Aug, 20:41, "Martin P. Hellwig" wrote: > Paul Rubin wrote: > > "Martin P. Hellwig" writes: > >> Is there an advantage using shifts and masks over my kitchen type solution? > > > Weren't you complaining about the 8-to-1 expansion from turning each bit > > to an ascii char? > > Yes you are (of course) right, my 'dream' solution would be something > that accepts slice indeces on bit level. Your reasoning did reveal some > flaws in my approach though ;-) > > -- > MPHhttp://blog.dcuktec.com > 'If consumed, best digested with added seasoning to own preference.' Ahh huh -- found it... ID: DLT119 Desc: datasyzygy misc. development, production and accounts. Date: 2008-12-09 File: pybits.py """ Implements a python level interface to reading bit fields with an associated type. Fields are accessed via __getitem__ and must be a valid python name. eg: class MyStructure: a = BitField(4, int) b = BitField(6, int) c = BitField(2, int) d = BitField(36, str) """ Ummm, seems I left the rest pretty vague, but I'm guessing that's what you're after, and I'm fairly curious to see what I wrote... mind you, not too hard to write again now that's jogged my memory! (although after a couple at the pub, not going to do it tonight!) Cheers, Jon. -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
Paul Rubin wrote: "Martin P. Hellwig" writes: Is there an advantage using shifts and masks over my kitchen type solution? Weren't you complaining about the 8-to-1 expansion from turning each bit to an ascii char? Yes you are (of course) right, my 'dream' solution would be something that accepts slice indeces on bit level. Your reasoning did reveal some flaws in my approach though ;-) -- MPH http://blog.dcuktec.com 'If consumed, best digested with added seasoning to own preference.' -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
Paul Rubin wrote: "Martin P. Hellwig" writes: what I usually do is read the packet in binary mode, convert the output to a concatenated 'binary string'(i.e. '0101011000110') and Something wrong with reading the data words as an integer and using old fashioned shifts and masks to get at the bit fields? No not at all of course, I just found it more readable to slice it instead of shifting and masking; i.e. if the ASCII art schematics of the protocol are included as a comment it is almost an 1 to 1 implementation. Is there an advantage using shifts and masks over my kitchen type solution? -- MPH http://blog.dcuktec.com 'If consumed, best digested with added seasoning to own preference.' -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
"Martin P. Hellwig" writes: > what I usually do is read the packet in binary mode, convert the > output to a concatenated 'binary string'(i.e. '0101011000110') and Something wrong with reading the data words as an integer and using old fashioned shifts and masks to get at the bit fields? -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On 5 Aug 2009, at 19:17 , Bearophile wrote: Have you tried Hachoir? (I think its name may be changed to Fusil, I don't know). Name hasn't been changed (I think fusil is a subproject, something like that) on the other hand the hachoir.org site is dead. But apparently Hachoir was moved to bitbucket (http://bitbucket.org/haypo/hachoir/wiki/Home ) just in time so… -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On 5 Aug 2009, at 16:46 , Martin P. Hellwig wrote: Hi List, On several occasions I have needed (and build) a parser that reads a binary piece of data with custom structure. For example (bogus one): BE +-+-+-+-+--++ | Version | Command | Instruction | Data Length | Data | Filler | +-+-+-+-+--++ Version: 6 bits Command: 4 bits Instruction: 5 bits Data Length: 5 bits Data: 0-31 bits Filler: filling 0 bits to make the packet dividable by 8 what I usually do is read the packet in binary mode, convert the output to a concatenated 'binary string'(i.e. '0101011000110') and then use slice indeces to get the right data portions. Depending on what I need to do with these portions I convert them to whatever is handy (usually an integer). This works out fine for me. Most of the time I also put the ASCII art diagram of this 'protocol' as a comment in the code, making it more readable/understandable. Though there are a couple of things that bothers me with my approach: - This seems such a general problem that I think that there must be already a general pythonic solution. - Using a string for binary representation takes at least 8 times more memory for the packet than strictly necessary. - Seems to need a lot of prep work before doing the actual parsing. Any suggestion is greatly appreciated. The gold standard for binary parsing (and serialization) is probably Erlang's bit syntax, but as far as Python goes you might be interested by Hachoir (http://hachoir.org/ but it seems down right now). It's not going to match your second point, but it can probably help with the rest (caveat: I haven't used hachoir personally). -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
Martin P. Hellwig: > On several occasions I have needed (and build) a parser that reads a > binary piece of data with custom structure. For example (bogus one): > > BE > +-+-+-+-+--++ > | Version | Command | Instruction | Data Length | Data | Filler | > +-+-+-+-+--++ > Version: 6 bits > Command: 4 bits > Instruction: 5 bits > Data Length: 5 bits > Data: 0-31 bits > Filler: filling 0 bits to make the packet dividable by 8 Have you tried Hachoir? (I think its name may be changed to Fusil, I don't know). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
Jon Clements wrote: IIRC (and I have my doubts) the BitVector module may be of use, but it's been about 3 years since I had to look at it. I think it used the C equiv. of short ints to do its work. Otherwise, maybe the array module, the struct module or even possibly ctypes. Not much use, but might give a few pointers. Jon. I tried struct before (it seemed logical) but it didn't really do what I expected: >>> import struct >>> struct.unpack('?','x') (True,) I expected a list like (False, True, True, True, True, False, False, False). But it is (actually I hope) just an error of my side, so if somebody knows how to achieve what I want, with this module, please enlighten me :-) The array module doesn't seem to have an option for boolean values. BitVector does look very promising, but after peeking in the source it does more or less the same as what I do with converting back and forth of 'bitstrings'. I haven't thought about using ctypes for this, excellent suggestion! Thank you for your feedback. -- MPH http://blog.dcuktec.com 'If consumed, best digested with added seasoning to own preference.' -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing Binary Structures; Is there a better way / What is your way?
On 5 Aug, 15:46, "Martin P. Hellwig" wrote: > Hi List, > > On several occasions I have needed (and build) a parser that reads a > binary piece of data with custom structure. For example (bogus one): > > BE > +-+-+-+-+--++ > | Version | Command | Instruction | Data Length | Data | Filler | > +-+-+-+-+--++ > Version: 6 bits > Command: 4 bits > Instruction: 5 bits > Data Length: 5 bits > Data: 0-31 bits > Filler: filling 0 bits to make the packet dividable by 8 > > what I usually do is read the packet in binary mode, convert the output > to a concatenated 'binary string'(i.e. '0101011000110') and then use > slice indeces to get the right data portions. > Depending on what I need to do with these portions I convert them to > whatever is handy (usually an integer). > > This works out fine for me. Most of the time I also put the ASCII art > diagram of this 'protocol' as a comment in the code, making it more > readable/understandable. > > Though there are a couple of things that bothers me with my approach: > - This seems such a general problem that I think that there must be > already a general pythonic solution. > - Using a string for binary representation takes at least 8 times more > memory for the packet than strictly necessary. > - Seems to need a lot of prep work before doing the actual parsing. > > Any suggestion is greatly appreciated. > > -- > MPHhttp://blog.dcuktec.com > 'If consumed, best digested with added seasoning to own preference.' IIRC (and I have my doubts) the BitVector module may be of use, but it's been about 3 years since I had to look at it. I think it used the C equiv. of short ints to do its work. Otherwise, maybe the array module, the struct module or even possibly ctypes. Not much use, but might give a few pointers. Jon. -- http://mail.python.org/mailman/listinfo/python-list
Parsing Binary Structures; Is there a better way / What is your way?
Hi List, On several occasions I have needed (and build) a parser that reads a binary piece of data with custom structure. For example (bogus one): BE +-+-+-+-+--++ | Version | Command | Instruction | Data Length | Data | Filler | +-+-+-+-+--++ Version: 6 bits Command: 4 bits Instruction: 5 bits Data Length: 5 bits Data: 0-31 bits Filler: filling 0 bits to make the packet dividable by 8 what I usually do is read the packet in binary mode, convert the output to a concatenated 'binary string'(i.e. '0101011000110') and then use slice indeces to get the right data portions. Depending on what I need to do with these portions I convert them to whatever is handy (usually an integer). This works out fine for me. Most of the time I also put the ASCII art diagram of this 'protocol' as a comment in the code, making it more readable/understandable. Though there are a couple of things that bothers me with my approach: - This seems such a general problem that I think that there must be already a general pythonic solution. - Using a string for binary representation takes at least 8 times more memory for the packet than strictly necessary. - Seems to need a lot of prep work before doing the actual parsing. Any suggestion is greatly appreciated. -- MPH http://blog.dcuktec.com 'If consumed, best digested with added seasoning to own preference.' -- http://mail.python.org/mailman/listinfo/python-list