Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-07 Thread Hendrik van Rooyen
On Thursday 06 August 2009 20:50:30 Martin P. Hellwig wrote:
 Thanks all for your insights and suggestions.
 It seems to me that there are a couple of ways to this bit manipulation
 and a couple of foreign modules to assist you with that.

 Would it be worth the while to do a PEP on this?
 Personally I think that it would be nice to have a standard module in
 Python for this but perhaps there is a good reason that there isn't
 already one.

I agree that it would be nice - I think it would be nice if something
like this is added to the struct module to give a this is the obvious
way that python handles foreign data module

- Hendrik


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-07 Thread andrew cooke
On Aug 5, 10:46 am, Martin P. Hellwig martin.hell...@dcuktec.org
wrote:
 Hi List,

 On several occasions I have needed (and build) a parser that reads a
 binary piece of data with custom structure. For example (bogus one):

 BE
 +-+-+-+-+--++
 | Version | Command | Instruction | Data Length | Data | Filler |
 +-+-+-+-+--++
 Version: 6 bits
 Command: 4 bits
 Instruction: 5 bits
 Data Length: 5 bits
 Data: 0-31 bits
 Filler: filling 0 bits to make the packet dividable by 8

hi,

sorry i'm a bit late here, but lepl does exactly this.  also, as you
asked for in another post, the BitString class allows arbitrary
indexing into a sequence of bits.  and because it's part of a
recursive descent parser you can match anything (lepl will handle -
although less efficiently - left recursive and ambiguous grammars).

see the example at http://www.acooke.org/lepl/binary.html#matching
which constructs an ethernet frame and then parses it, extracting the
source and destination addresses.

feel free to email me with more questions.

disclaimer: this is quite new and i don't know of anyone that actually
uses it; it is also Python3 only (because it uses bytes()).

andrew


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-06 Thread Dave Angel

Paul Rubin wrote:

Martin P. Hellwig martin.hell...@dcuktec.org writes:
  

Is there an advantage using shifts and masks over my kitchen type solution?



Weren't you complaining about the 8-to-1 expansion from turning each bit
to an ascii char?

  

One warning to Martin:

If you want your code portable across systems, watch out for 
big-endian/little-endian  issues, as well as alignment ones.  Shift  
mask code tends to be highly specific to a particular endian-ness, 
especially if trying to get multiple bits that cross a byte or word 
boundary.


Over the years, I know I've seen at least three endian versions for the 
same 32bit word.  Something like abcd, dcba, and cdab.


One advantage of converting first to bitstrings, is that there's just 
the two places to fix, for portability, the conversion from byte array 
to bitstring, and the conversion back.


DaveA

--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-06 Thread Hendrik van Rooyen
On Wednesday 05 August 2009 16:46:13 Martin P. Hellwig wrote:
 Hi List,

 On several occasions I have needed (and build) a parser that reads a
 binary piece of data with custom structure. For example (bogus one):

 BE
 +-+-+-+-+--++

 | Version | Command | Instruction | Data Length | Data | Filler |

 +-+-+-+-+--++
 Version: 6 bits
 Command: 4 bits
 Instruction: 5 bits
 Data Length: 5 bits
 Data: 0-31 bits
 Filler: filling 0 bits to make the packet dividable by 8

 what I usually do is read the packet in binary mode, convert the output
 to a concatenated 'binary string'(i.e. '0101011000110') and then use
 slice indeces to get the right data portions.
 Depending on what I need to do with these portions I convert them to
 whatever is handy (usually an integer).

This bit banging stuff is a PITA, no matter what you do.
Python does not have bit fields like C.
And C bit fields are implementation dependent.
Write an extension?

Some time ago I asked a similar question, and
Castironpi came up with what was essentially an
indexed integer, with named bits.

It stores the bits natively, but I suspect that the
price paid is access time.

I enclose a module that you can adapt.
It talks about bytes but they are integers really.
It is different from what you are doing, as it
was aimed at reading and writing bits in 
a hardware context.

If you get your head around the concept,
then it may give you some ideas.  It should
be possible to extend the concept to
pass name,length tuples at construction time
instead of just a name with an implied length 
of one bit, and it may make sense to change
the underlying type from an integer to an 
array of one byte integers.

It is nice to be able to say:
val = bitname() 
to read, and 
bitname(1)
or
bitname(0)
to write.

I can also write:
outputbits[3] = 1
or
val = inputbits[5]

If you can successfully generalise it
to field names It should be very useful.

I cannot think of a way though, to not
have the in and out split, but you can
program your way around that - you do not
have to update in place.

- Hendrik



bits.py
Description: application/python
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-06 Thread Hendrik van Rooyen
On Wednesday 05 August 2009 20:12:05 Paul Rubin wrote:
 Martin P. Hellwig martin.hell...@dcuktec.org writes:
  what I usually do is read the packet in binary mode, convert the
  output to a concatenated 'binary string'(i.e. '0101011000110') and

 Something wrong with reading the data words as an integer and using
 old fashioned shifts and masks to get at the bit fields?

There is nothing wrong with that, and in the final analysis, it is what
is used - however, I can sympathize with the OP, as it is much nicer
to just call a named thing than to juggle a bunch of hard coded
shifts and masks.

I think the real point here is sort of meta programming - what
is needed is a way to describe and name the elements of the
structure so that useful python objects are created.

The code I posted for bits does that, and I use it all the
time - it is almost as good as being back in 8031 assembler
with direct named bit access.

- Hendrik
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-06 Thread Hendrik van Rooyen
On Wednesday 05 August 2009 21:41:26 Martin P. Hellwig wrote:

 Yes you are (of course) right, my 'dream' solution would be something
 that accepts slice indeces on bit level. Your reasoning did reveal some
 flaws in my approach though ;-)

This is the first time I have been compared to the sandman...

:-)

- Hendrik

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-06 Thread sgriffiths
On Aug 5, 3:46 pm, Martin P. Hellwig martin.hell...@dcuktec.org
wrote:
 Hi List,

 On several occasions I have needed (and build) a parser that reads a
 binary piece of data with custom structure. For example (bogus one):

 BE
 +-+-+-+-+--++
 | Version | Command | Instruction | Data Length | Data | Filler |
 +-+-+-+-+--++
 Version: 6 bits
 Command: 4 bits
 Instruction: 5 bits
 Data Length: 5 bits
 Data: 0-31 bits
 Filler: filling 0 bits to make the packet dividable by 8

 what I usually do is read the packet in binary mode, convert the output
 to a concatenated 'binary string'(i.e. '0101011000110') and then use
 slice indeces to get the right data portions.
 Depending on what I need to do with these portions I convert them to
 whatever is handy (usually an integer).

 This works out fine for me. Most of the time I also put the ASCII art
 diagram of this 'protocol' as a comment in the code, making it more
 readable/understandable.

 Though there are a couple of things that bothers me with my approach:
 - This seems such a general problem that I think that there must be
 already a general pythonic solution.
 - Using a string for binary representation takes at least 8 times more
 memory for the packet than strictly necessary.
 - Seems to need a lot of prep work before doing the actual parsing.

 Any suggestion is greatly appreciated.

 --
 MPHhttp://blog.dcuktec.com
 'If consumed, best digested with added seasoning to own preference.'

Take a look at the bitstring module (in pypi or google code). It's
designed to help make this sort of thing easy and it's more fully
featured than BitVector or BitSet. Internally the data is stored as a
byte array, so memory isn't wasted. It will also do all the dirty work
of bit masking and shifting so that you can concentrate on the real
problems. For example:

 s = BitString('0x1232432312')  # just to give us some data to play with
 ver, comm, instr, bitlen = s.read('uint6, bin4, bin5, uint5')
 data = s.readbits(bitlen)

Different interpretations of the binary data are given using Python
properties (e.g. s.hex, s.oct, s.uint, etc.) and it supports bit-wise
slicing, modification, finding, replacing and more. It is also still
in active development (full disclosure: I'm the author :-)).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Jon Clements
On 5 Aug, 15:46, Martin P. Hellwig martin.hell...@dcuktec.org
wrote:
 Hi List,

 On several occasions I have needed (and build) a parser that reads a
 binary piece of data with custom structure. For example (bogus one):

 BE
 +-+-+-+-+--++
 | Version | Command | Instruction | Data Length | Data | Filler |
 +-+-+-+-+--++
 Version: 6 bits
 Command: 4 bits
 Instruction: 5 bits
 Data Length: 5 bits
 Data: 0-31 bits
 Filler: filling 0 bits to make the packet dividable by 8

 what I usually do is read the packet in binary mode, convert the output
 to a concatenated 'binary string'(i.e. '0101011000110') and then use
 slice indeces to get the right data portions.
 Depending on what I need to do with these portions I convert them to
 whatever is handy (usually an integer).

 This works out fine for me. Most of the time I also put the ASCII art
 diagram of this 'protocol' as a comment in the code, making it more
 readable/understandable.

 Though there are a couple of things that bothers me with my approach:
 - This seems such a general problem that I think that there must be
 already a general pythonic solution.
 - Using a string for binary representation takes at least 8 times more
 memory for the packet than strictly necessary.
 - Seems to need a lot of prep work before doing the actual parsing.

 Any suggestion is greatly appreciated.

 --
 MPHhttp://blog.dcuktec.com
 'If consumed, best digested with added seasoning to own preference.'

IIRC (and I have my doubts) the BitVector module may be of use, but
it's been about 3 years since I had to look at it. I think it used the
C equiv. of short ints to do its work. Otherwise, maybe the array
module, the struct module or even possibly ctypes.

Not much use, but might give a few pointers.

Jon.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Martin P. Hellwig

Jon Clements wrote:
cut

IIRC (and I have my doubts) the BitVector module may be of use, but
it's been about 3 years since I had to look at it. I think it used the
C equiv. of short ints to do its work. Otherwise, maybe the array
module, the struct module or even possibly ctypes.

Not much use, but might give a few pointers.

Jon.


I tried struct before (it seemed logical) but it didn't really do what I 
expected:

 import struct
 struct.unpack('?','x')
(True,)

I expected a list like  (False, True, True, True, True, False, False, 
False). But it is (actually I hope) just an error of my side, so if 
somebody knows how to achieve what I want, with this module, please 
enlighten me :-)


The array module doesn't seem to have an option for boolean values.

BitVector does look very promising, but after peeking in the source it 
does more or less the same as what I do with converting back and forth 
of 'bitstrings'.


I haven't thought about using ctypes for this, excellent suggestion!
Thank you for your feedback.

--
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'
--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Bearophile
Martin P. Hellwig:
 On several occasions I have needed (and build) a parser that reads a
 binary piece of data with custom structure. For example (bogus one):

 BE
 +-+-+-+-+--++
 | Version | Command | Instruction | Data Length | Data | Filler |
 +-+-+-+-+--++
 Version: 6 bits
 Command: 4 bits
 Instruction: 5 bits
 Data Length: 5 bits
 Data: 0-31 bits
 Filler: filling 0 bits to make the packet dividable by 8

Have you tried Hachoir? (I think its name may be changed to Fusil, I
don't know).

Bye,
bearophile
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Masklinn

On 5 Aug 2009, at 16:46 , Martin P. Hellwig wrote:

Hi List,

On several occasions I have needed (and build) a parser that reads a  
binary piece of data with custom structure. For example (bogus one):


BE
+-+-+-+-+--++
| Version | Command | Instruction | Data Length | Data | Filler |
+-+-+-+-+--++
Version: 6 bits
Command: 4 bits
Instruction: 5 bits
Data Length: 5 bits
Data: 0-31 bits
Filler: filling 0 bits to make the packet dividable by 8

what I usually do is read the packet in binary mode, convert the  
output to a concatenated 'binary string'(i.e. '0101011000110') and  
then use slice indeces to get the right data portions.
Depending on what I need to do with these portions I convert them to  
whatever is handy (usually an integer).


This works out fine for me. Most of the time I also put the ASCII  
art diagram of this 'protocol' as a comment in the code, making it  
more readable/understandable.


Though there are a couple of things that bothers me with my approach:
- This seems such a general problem that I think that there must be  
already a general pythonic solution.
- Using a string for binary representation takes at least 8 times  
more memory for the packet than strictly necessary.

- Seems to need a lot of prep work before doing the actual parsing.

Any suggestion is greatly appreciated.
The gold standard for binary parsing (and serialization) is probably  
Erlang's bit syntax, but as far as Python goes you might be interested  
by Hachoir (http://hachoir.org/ but it seems down right now).


It's not going to match your second point, but it can probably help  
with the rest (caveat: I haven't used hachoir personally).

--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Masklinn

On 5 Aug 2009, at 19:17 , Bearophile wrote:

Have you tried Hachoir? (I think its name may be changed to Fusil, I
don't know).
Name hasn't been changed (I think fusil is a subproject, something  
like that) on the other hand the hachoir.org site is dead.


But apparently Hachoir was moved to bitbucket (http://bitbucket.org/haypo/hachoir/wiki/Home 
) just in time so…

--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Paul Rubin
Martin P. Hellwig martin.hell...@dcuktec.org writes:
 what I usually do is read the packet in binary mode, convert the
 output to a concatenated 'binary string'(i.e. '0101011000110') and

Something wrong with reading the data words as an integer and using
old fashioned shifts and masks to get at the bit fields?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Martin P. Hellwig

Paul Rubin wrote:

Martin P. Hellwig martin.hell...@dcuktec.org writes:

what I usually do is read the packet in binary mode, convert the
output to a concatenated 'binary string'(i.e. '0101011000110') and


Something wrong with reading the data words as an integer and using
old fashioned shifts and masks to get at the bit fields?


No not at all of course, I just found it more readable to slice it 
instead of shifting and masking; i.e. if the ASCII art schematics of the 
protocol are included as a comment it is almost an 1 to 1  implementation.


Is there an advantage using shifts and masks over my kitchen type solution?

--
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'
--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Martin P. Hellwig

Paul Rubin wrote:

Martin P. Hellwig martin.hell...@dcuktec.org writes:

Is there an advantage using shifts and masks over my kitchen type solution?


Weren't you complaining about the 8-to-1 expansion from turning each bit
to an ascii char?


Yes you are (of course) right, my 'dream' solution would be something 
that accepts slice indeces on bit level. Your reasoning did reveal some 
flaws in my approach though ;-)


--
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'
--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Jon Clements
On 5 Aug, 20:41, Martin P. Hellwig martin.hell...@dcuktec.org
wrote:
 Paul Rubin wrote:
  Martin P. Hellwig martin.hell...@dcuktec.org writes:
  Is there an advantage using shifts and masks over my kitchen type solution?

  Weren't you complaining about the 8-to-1 expansion from turning each bit
  to an ascii char?

 Yes you are (of course) right, my 'dream' solution would be something
 that accepts slice indeces on bit level. Your reasoning did reveal some
 flaws in my approach though ;-)

 --
 MPHhttp://blog.dcuktec.com
 'If consumed, best digested with added seasoning to own preference.'

Ahh huh -- found it...

ID: DLT119
Desc: datasyzygy misc. development, production and accounts.
Date: 2008-12-09
File: pybits.py

Implements a python level interface to reading bit fields with an
associated type. Fields are accessed via __getitem__ and must be a
valid python name. eg:

class MyStructure:
a = BitField(4, int)
b = BitField(6, int)
c = BitField(2, int)
d = BitField(36, str)


Ummm, seems I left the rest pretty vague, but I'm guessing that's what
you're after, and I'm fairly curious to see what I wrote... mind you,
not too hard to write again now that's jogged my memory! (although
after a couple at the pub, not going to do it tonight!)

Cheers,

Jon.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing Binary Structures; Is there a better way / What is your way?

2009-08-05 Thread Gabriel Genellina
En Wed, 05 Aug 2009 11:46:13 -0300, Martin P. Hellwig  
martin.hell...@dcuktec.org escribió:


On several occasions I have needed (and build) a parser that reads a  
binary piece of data with custom structure. For example (bogus one):


BE
+-+-+-+-+--++
| Version | Command | Instruction | Data Length | Data | Filler |
+-+-+-+-+--++
Version: 6 bits
Command: 4 bits
Instruction: 5 bits
Data Length: 5 bits
Data: 0-31 bits
Filler: filling 0 bits to make the packet dividable by 8


- Using a string for binary representation takes at least 8 times more  
memory for the packet than strictly necessary.


The size difference isn't so big; an integer takes 12 bytes and a string  
takes 24+len bytes. Data above would take 56 bytes max when stored as a  
string '0110001...' vs. 12 bytes when using a plain integer (sizes  
computed on Windows 32bits). Plus, doing bitwise operations in Python  
isn't a fast operation as it is in C, by example -- so your current  
implementation might be a quite good one (in pure Python, I mean).


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list