Re: [Tutor] Reading binary files #2

2009-02-09 Thread eShopping

Hi Bob

some replies below.  One thing I noticed with the "full" file was 
that I ran into problems when the number of records was 10500, and 
the file read got misaligned.  Presumably 10500 is still within the 
range of int?
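
On the range question, a quick check may help. This is a sketch in Python 3 (the code in this thread is Python 2), and it only shows that the value fits in the 4-byte signed integer that struct's '>i' code describes; it says nothing about why the read became misaligned.

```python
import struct

# '>i' is a big-endian signed integer, 4 bytes in standard size.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

packed = struct.pack('>i', 10500)
(roundtrip,) = struct.unpack('>i', packed)

# 10500 survives the round trip and sits far inside the 32-bit range.
print(len(packed), roundtrip, INT_MIN <= 10500 <= INT_MAX)
```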


Best regards

Alun


At 17:49 09/02/2009, bob gailer wrote:

etrade.griffi...@dsl.pipex.com wrote:

Hi

following last week's discussion with Bob Gailer about reading 
unformatted FORTRAN files, I have attached an example of the file 
in ASCII format and the equivalent unformatted version.


Thank you. It is good to have real data to work with.

Below is some code that works OK until it gets to a data item that 
has no additional associated data, then seems to have got 4 bytes 
ahead of itself.


Thank you. It is good to have real code to work with.

I thought I had trapped this but it appears not.  I think the issue 
is associated with "newline" characters or the unformatted equivalent.




I think not, but we will see.

I fail to see where the problem is. The data printed below seems to 
agree with the files you sent. What am I missing?


When I run the program it exits in the middle but should run through 
to the end.  The output to the console was


236 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS', 
'\x00\x00\x00\x10\x00\x00\x00\x10')
264 ('TIME', '\x00\x00\x00\x01', 1380270412, '\x00\x00\x00\x10', 
'\x00\x00\x00\x04\x00\x00\x00\x00')


Here "TIME" is in vals[0] when it should be in vals[1] and so on.  I 
found the problem earlier today and I re-wrote the main loop as 
follows (before I saw your helpful coding style comments):


while stop < nrec:

    # extract data structure

    start, stop = stop, stop + struct.calcsize('4s8si4s4s')
    vals = struct.unpack('>4s8si4s4s', data[start:stop])
    items.extend(vals[1:4])
    print stop, vals

    # define format of subsequent data

    nval = int(vals[2])

    if vals[3] == 'INTE':
        fmt_string = '>i'
    elif vals[3] == 'CHAR':
        fmt_string = '>8s'
    elif vals[3] == 'LOGI':
        fmt_string = '>i'
    elif vals[3] == 'REAL':
        fmt_string = '>f'
    elif vals[3] == 'DOUB':
        fmt_string = '>d'
    elif vals[3] == 'MESS':
        fmt_string = '>%ds' % nval
    else:
        print "Unknown data type ... exiting"
        print items[-40:]
        sys.exit(0)

    # leading spaces

    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])

    # extract data

    for i in range(0,nval):
        start, stop = stop, stop + struct.calcsize(fmt_string)
        vals = struct.unpack(fmt_string, data[start:stop])
        items.extend(vals)

    # trailing spaces

    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])

Now I get this output

232 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS', '\x00\x00\x00\x10')
256 ('\x00\x00\x00\x10', 'TIME', 1, 'REAL', '\x00\x00\x00\x10')

and the script runs to the end
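
The '\x00\x00\x00\x10' values in the output are consistent with the common layout of Fortran sequential unformatted files, in which each record is wrapped in a leading and a trailing 4-byte length marker (here 16, the size of the 8s + i + 4s header). That layout is compiler-dependent, so treat it as an assumption about this file. A minimal sketch in Python 3, using hypothetical helpers read_record and frame and synthetic bytes rather than the real test.bin:

```python
import struct

def read_record(data, offset):
    """Return (payload, next_offset) for one length-framed record.

    Assumes a 4-byte big-endian length before and after the payload.
    """
    (length,) = struct.unpack_from('>i', data, offset)
    start = offset + 4
    payload = data[start:start + length]
    (trailer,) = struct.unpack_from('>i', data, start + length)
    if trailer != length:
        raise ValueError('record markers disagree at offset %d' % offset)
    return payload, start + length + 4

def frame(payload):
    """Wrap a payload in leading and trailing length markers (test data only)."""
    marker = struct.pack('>i', len(payload))
    return marker + payload + marker

# Synthetic stand-in for the file: a header with no data record, then a
# header followed by a one-value REAL data record.
data = (frame(struct.pack('>8si4s', b'DATABEGI', 0, b'MESS'))
        + frame(struct.pack('>8si4s', b'TIME    ', 1, b'REAL'))
        + frame(struct.pack('>f', 1.5)))

offset = 0
records = []
while offset < len(data):
    payload, offset = read_record(data, offset)
    records.append(payload)

name, nval, typ = struct.unpack('>8si4s', records[1])
print(name, nval, typ)
```

Reading whole records this way means a header with nval equal to 0 needs no special case, because the markers alone decide where the next record starts.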


FWIW a few observations re coding style and techniques.

1) put the formats in a dictionary before the while loop:
formats = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f', 
'DOUB': '>d', 'MESS': '>d'}


2) retrieve the format in the while loop from the dictionary:
format = formats[vals[3]]


Neat!!



3) condense the 3 infile lines:
data = open("test.bin","rb").read()


I still don't quite trust myself to "chain" functions together, but I 
guess that's lack of practice



4) nrec is a misleading name (to me it means # of records), nbytes 
would be better.


Agreed



5) Be consistent with the format between calcsize and unpack:
struct.calcsize('>4s8si4s8s')

6) Use meaningful variable names instead of val for the unpacked data:
blank, name, length, typ = struct.unpack ... etc


Will do

7) The format for MESS should be '>d' rather than '>%dd' % nval. 
When nval is 0 the for loop will make 0 cycles.


Wasn't sure about that one.  "MESS" implies string but I wasn't sure 
what to do about a zero-length string



8) You don't have a format for DATA (BEGI); therefore the prior 
format (for CHAR) is being applied. The formats are the same so it 
does not matter but could be confusing later.


DATABEGI should be a keyword to indicate the start of the "proper" 
data which has format MESS (ie string).  You did make me look again 
at the MESS format and it should be '>%ds' % nval and not '>%dd' % nval
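
Points 1, 2 and 6, together with the corrected MESS format, could be combined roughly as follows. This is a sketch in Python 3, not the code from the thread: ELEMENT_CODES and data_format are hypothetical names, and building one format string for all nval values (instead of looping) is an assumption about how the data records are laid out.

```python
import struct

# Per-element struct codes for each type keyword (hypothetical table name).
ELEMENT_CODES = {
    b'INTE': 'i',
    b'LOGI': 'i',
    b'REAL': 'f',
    b'DOUB': 'd',
    b'CHAR': '8s',
}

def data_format(typ, nval):
    """Build one big-endian format string covering all nval values."""
    if typ == b'MESS':
        # A message is treated as a single string of nval bytes; nval may be 0.
        return '>%ds' % nval
    try:
        code = ELEMENT_CODES[typ]
    except KeyError:
        raise ValueError('unknown data type %r' % typ)
    # Repeat the code rather than prefixing a count, so '8s' stays 8 bytes.
    return '>' + code * nval

# Header fields get meaningful names instead of vals[...].
header = struct.pack('>8si4s', b'TIME    ', 1, b'REAL')
name, nval, typ = struct.unpack('>8si4s', header)

fmt = data_format(typ, nval)
values = struct.unpack(fmt, struct.pack('>f', 1.5))
print(name, fmt, values)
```

Keeping MESS outside the dictionary makes it explicit that its format depends on nval, which is the point raised under item 8.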



___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Reading binary files #2

2009-02-09 Thread bob gailer

etrade.griffi...@dsl.pipex.com wrote:

Hi

following last week's discussion with Bob Gailer about reading unformatted FORTRAN files, I have attached an example of the file in ASCII format and the equivalent unformatted version.  


Thank you. It is good to have real data to work with.

Below is some code that works OK until it gets to a data item that has no additional associated data, then seems to have got 4 bytes ahead of itself.  


Thank you. It is good to have real code to work with.


I thought I had trapped this but it appears not.  I think the issue is associated with 
"newline" characters or the unformatted equivalent.
  


I think not, but we will see.

I fail to see where the problem is. The data printed below seems to 
agree with the files you sent. What am I missing?


FWIW a few observations re coding style and techniques.

1) put the formats in a dictionary before the while loop:
formats = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f', 
'DOUB': '>d', 'MESS': '>d'}


2) retrieve the format in the while loop from the dictionary:
format = formats[vals[3]]

3) condense the 3 infile lines:
data = open("test.bin","rb").read()

4) nrec is a misleading name (to me it means # of records), nbytes would 
be better.


5) Be consistent with the format between calcsize and unpack:
struct.calcsize('>4s8si4s8s')

6) Use meaningful variable names instead of val for the unpacked data:
blank, name, length, typ = struct.unpack ... etc

7) The format for MESS should be '>d' rather than '>%dd' % nval. When 
nval is 0 the for loop will make 0 cycles.


8) You don't have a format for DATA (BEGI); therefore the prior format 
(for CHAR) is being applied. The formats are the same so it does not 
matter but could be confusing later.
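
One way to follow point 5 is to write the format once and let a precompiled struct.Struct supply both the size and the unpacking. This is a sketch in Python 3 with hypothetical names (HEADER_28, HEADER_24) and synthetic bytes; the two layouts compared are the 8s-terminated one from the attached script and a 4s-terminated variant.

```python
import struct

# Layout from the attached script: 4 + 8 + 4 + 4 + 8 bytes.
HEADER_28 = struct.Struct('>4s8si4s8s')
# Variant ending in a single 4-byte field: 4 + 8 + 4 + 4 + 4 bytes.
HEADER_24 = struct.Struct('>4s8si4s4s')

print(HEADER_28.size, HEADER_24.size)

# size and unpack cannot disagree because they come from the same object.
marker = struct.pack('>i', 16)
raw = marker + struct.pack('>8si4s', b'DATABEGI', 0, b'MESS') + marker
blank, name, length, typ, trailer = HEADER_24.unpack(raw[:HEADER_24.size])
print(name, length, typ)
```

The 4-byte difference between the two sizes matches the 4-byte slip described at the start of the thread, which suggests the header format is worth checking first.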




# Test function to write/read from unformatted files

import sys
import struct

# Read file in one go

in_file = open("test.bin","rb")
data = in_file.read()
in_file.close()

# Initialise

nrec = len(data)
stop = 0
items = []

# Read data until EOF encountered

while stop < nrec:

    # extract data structure

    start, stop = stop, stop + struct.calcsize('4s8si4s8s')
    vals = struct.unpack('>4s8si4s8s', data[start:stop])
    items.extend(vals)
    print stop, vals

    # define format of subsequent data

    nval = int(vals[2])

    if vals[3] == 'INTE':
        fmt_string = '>i'
    elif vals[3] == 'CHAR':
        fmt_string = '>8s'
    elif vals[3] == 'LOGI':
        fmt_string = '>i'
    elif vals[3] == 'REAL':
        fmt_string = '>f'
    elif vals[3] == 'DOUB':
        fmt_string = '>d'
    elif vals[3] == 'MESS':
        fmt_string = '>%dd' % nval
    else:
        print "Unknown data type ... exiting"
        print items
        sys.exit(0)

    # extract data

    for i in range(0,nval):
        start, stop = stop, stop + struct.calcsize(fmt_string)
        vals = struct.unpack(fmt_string, data[start:stop])
        items.extend(vals)

    # trailing spaces

    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])

# All data read so print items

print items





--
Bob Gailer
Chapel Hill NC
919-636-4239


Re: [Tutor] Reading binary files #2

2009-02-09 Thread Alan Gauld

 wrote


I have attached an example of the file in ASCII format and the
equivalent unformatted version.


Comparing them in vim...
It doesn't look too bad except for the DATABEGI / DATAEND message 
format.

That could be tricky to unravel but we have no clear format for MESS.
But I assume that all the stuff between BEG and END is supposed to
be effectively nested?


it gets to a data item that has no additional associated data,
then seems to have got 4 bytes ahead of itself.


You are creating a format string of >0d but I'm not sure how struct
behaves with zero lengths...
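
For what it is worth, struct does accept a zero repeat count. A quick check in Python 3 (a sketch, not part of the attached script):

```python
import struct

# A zero count on a numeric code describes no values and no bytes.
print(struct.calcsize('>0d'), struct.unpack('>0d', b''))

# For 's' the count is a length, so '0s' is one empty string.
print(struct.calcsize('>0s'), struct.unpack('>0s', b''))
```

So a '>0d' format consumes no bytes by itself.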

HTH,

Alan G.

==
# Test function to write/read from unformatted files

import sys
import struct

# Read file in one go

in_file = open("test.bin","rb")
data = in_file.read()
in_file.close()

# Initialise

nrec = len(data)
stop = 0
items = []

# Read data until EOF encountered

while stop < nrec:

   # extract data structure

   start, stop = stop, stop + struct.calcsize('4s8si4s8s')
   vals = struct.unpack('>4s8si4s8s', data[start:stop])
   items.extend(vals)
   print stop, vals

   # define format of subsequent data

   nval = int(vals[2])

   if vals[3] == 'INTE':
      fmt_string = '>i'
   elif vals[3] == 'CHAR':
      fmt_string = '>8s'
   elif vals[3] == 'LOGI':
      fmt_string = '>i'
   elif vals[3] == 'REAL':
      fmt_string = '>f'
   elif vals[3] == 'DOUB':
      fmt_string = '>d'
   elif vals[3] == 'MESS':
      fmt_string = '>%dd' % nval
   else:
      print "Unknown data type ... exiting"
      print items
      sys.exit(0)

   # extract data

   for i in range(0,nval):
      start, stop = stop, stop + struct.calcsize(fmt_string)
      vals = struct.unpack(fmt_string, data[start:stop])
      items.extend(vals)

   # trailing spaces

   if nval > 0:
      start, stop = stop, stop + struct.calcsize('4s')
      vals = struct.unpack('4s', data[start:stop])

# All data read so print items

print items

