Re: [Tutor] Reading binary files #2

2009-02-09 Thread Alan Gauld

etrade.griffi...@dsl.pipex.com wrote


I have attached an example of the file in ASCII format and the
equivalent unformatted version.


Comparing them in vim...
It doesn't look too bad except for the DATABEGI / DATAEND message 
format.

That could be tricky to unravel, since we have no clear format for MESS.
I assume that all the stuff between BEGI and END is supposed to
be effectively nested?


it gets to a data item that has no additional associated data,
then seems to have got 4 bytes ahead of itself.


You are creating a format string of 0d but I'm not sure how struct
behaves with zero lengths...
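
A minimal sketch of how struct treats a zero repeat count, written for Python 3 (where file data is bytes) rather than the Python 2 used in this thread. The variable names are only for illustration.

```python
import struct

# A zero repeat count is legal: it consumes no bytes and yields no values.
zero_size = struct.calcsize('0d')
zero_vals = struct.unpack('0d', b'')

# A count of one, for comparison, needs eight bytes and yields one float.
one_size = struct.calcsize('1d')
one_vals = struct.unpack('1d', struct.pack('1d', 2.5))

print(zero_size, zero_vals)
print(one_size, one_vals)
```

In the script below the format is only used inside `for i in range(0,nval)`, so with nval equal to 0 the '0d' format is never actually passed to struct.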

HTH,

Alan G.

==
# Test function to write/read from unformatted files

import sys
import struct

# Read file in one go

in_file = open('test.bin', 'rb')
data = in_file.read()
in_file.close()

# Initialise

nrec = len(data)
stop = 0
items = []

# Read data until EOF encountered

while stop < nrec:

   # extract data structure

   start, stop = stop, stop + struct.calcsize('4s8si4s8s')
   vals = struct.unpack('4s8si4s8s', data[start:stop])
   items.extend(vals)
   print stop, vals

   # define format of subsequent data

   nval = int(vals[2])

   if vals[3] == 'INTE':
      fmt_string = 'i'
   elif vals[3] == 'CHAR':
      fmt_string = '8s'
   elif vals[3] == 'LOGI':
      fmt_string = 'i'
   elif vals[3] == 'REAL':
      fmt_string = 'f'
   elif vals[3] == 'DOUB':
      fmt_string = 'd'
   elif vals[3] == 'MESS':
      fmt_string = '%dd' % nval
   else:
      print "Unknown data type ... exiting"
      print items
      sys.exit(0)

   # extract data

   for i in range(0,nval):
      start, stop = stop, stop + struct.calcsize(fmt_string)
      vals = struct.unpack(fmt_string, data[start:stop])
      items.extend(vals)

   # trailing spaces

   if nval > 0:
      start, stop = stop, stop + struct.calcsize('4s')
      vals = struct.unpack('4s', data[start:stop])

# All data read so print items

print items


-
Visit Pipex Business: The homepage for UK Small Businesses

Go to http://www.pipex.co.uk/business-services








___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor






Re: [Tutor] Reading binary files #2

2009-02-09 Thread bob gailer

etrade.griffi...@dsl.pipex.com wrote:

Hi

following last week's discussion with Bob Gailer about reading unformatted FORTRAN files, I have attached an example of the file in ASCII format and the equivalent unformatted version.  


Thank you. It is good to have real data to work with.

Below is some code that works OK until it gets to a data item that has no additional associated data, then seems to have got 4 bytes ahead of itself.  


Thank you. It is good to have real code to work with.


I thought I had trapped this but it appears not.  I think the issue is associated with 
newline characters or the unformatted equivalent.
  


I think not, but we will see.

I fail to see where the problem is. The data printed below seems to 
agree with the files you sent. What am I missing?


FWIW a few observations re coding style and techniques.

1) put the formats in a dictionary before the while loop:
formats = {'INTE': 'i', 'CHAR': '8s', 'LOGI': 'i', 'REAL': 'f', 
'DOUB': 'd', 'MESS': 'd'}


2) retrieve the format in the while loop from the dictionary:
format = formats[vals[3]]

3) condense the 3 infile lines:
data = open('test.bin','rb').read()

4) nrec is a misleading name (to me it means # of records), nbytes would 
be better.


5) Be consistent with the format between calcsize and unpack:
struct.calcsize('4s8si4s8s')

6) Use meaningful variable names instead of val for the unpacked data:
blank, name, length, typ = struct.unpack ... etc

7) The format for MESS should be 'd' rather than '%dd' % nval. When 
nval is 0 the for loop will make 0 cycles.


8) You don't have a format for DATA (BEGI); therefore the prior format 
(for CHAR) is being applied. The formats are the same so it does not 
matter but could be confusing later.
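
Points 1 and 2 above can be sketched as follows, written for Python 3. The helper name `item_format` is hypothetical, and the 'MESS' entry simply follows point 7; whether MESS data is really a double is an open question in this thread.

```python
import struct

# Type tag -> struct code for ONE item, as listed in the observations above.
# 'MESS' follows point 7; its true type is not settled here.
FORMATS = {'INTE': 'i', 'CHAR': '8s', 'LOGI': 'i',
           'REAL': 'f', 'DOUB': 'd', 'MESS': 'd'}

def item_format(type_tag):
    """Hypothetical helper: return the struct code for a tag, or raise ValueError."""
    try:
        return FORMATS[type_tag]
    except KeyError:
        raise ValueError('Unknown data type: %r' % (type_tag,))

real_fmt = item_format('REAL')
real_size = struct.calcsize(real_fmt)
print(real_fmt, real_size)
```

The dictionary replaces the if/elif chain, and an unknown tag becomes an exception the caller can handle instead of a sys.exit in the middle of the loop.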






--
Bob Gailer
Chapel Hill NC
919-636-4239


Re: [Tutor] Reading binary files #2

2009-02-09 Thread eShopping

Hi Bob

some replies below.  One thing I noticed with the full file was 
that I ran into problems when the number of records was 10500, and 
the file read got misaligned.  Presumably 10500 is still within the 
range of int?
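
A quick check, sketched for Python 3: a 4-byte signed integer holds values far larger than 10500, so the range of int by itself is unlikely to explain the misalignment. What the check cannot show is how the FORTRAN program lays out such a long item on disk; that is not established in this thread.

```python
import struct

INT_SIZE = struct.calcsize('=i')          # standard size of 'i': 4 bytes
INT_MAX = 2 ** (8 * INT_SIZE - 1) - 1     # largest signed value that fits

# Round-trip 10500 through a big-endian 4-byte int.
packed = struct.pack('>i', 10500)
round_trip = struct.unpack('>i', packed)[0]

# Reading the same bytes with the wrong byte order gives a different number.
wrong_order = struct.unpack('<i', packed)[0]

print(INT_SIZE, INT_MAX, round_trip, wrong_order)
```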


Best regards

Alun


At 17:49 09/02/2009, bob gailer wrote:

etrade.griffi...@dsl.pipex.com wrote:

Hi

following last week's discussion with Bob Gailer about reading 
unformatted FORTRAN files, I have attached an example of the file 
in ASCII format and the equivalent unformatted version.


Thank you. It is good to have real data to work with.

Below is some code that works OK until it gets to a data item that 
has no additional associated data, then seems to have got 4 bytes 
ahead of itself.


Thank you. It is good to have real code to work with.

I thought I had trapped this but it appears not.  I think the issue 
is associated with newline characters or the unformatted equivalent.




I think not, but we will see.

I fail to see where the problem is. The data printed below seems to 
agree with the files you sent. What am I missing?


When I run the program it exits in the middle but should run through 
to the end.  The output to the console was


236 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS', 
'\x00\x00\x00\x10\x00\x00\x00\x10')
264 ('TIME', '\x00\x00\x00\x01', 1380270412, '\x00\x00\x00\x10', 
'\x00\x00\x00\x04\x00\x00\x00\x00')


Here TIME is in vals[0] when it should be in vals[1] and so on.  I 
found the problem earlier today and I re-wrote the main loop as 
follows (before I saw your helpful coding style comments):


while stop < nrec:

    # extract data structure

    start, stop = stop, stop + struct.calcsize('4s8si4s4s')
    vals = struct.unpack('4s8si4s4s', data[start:stop])
    items.extend(vals[1:4])
    print stop, vals

    # define format of subsequent data

    nval = int(vals[2])

    if vals[3] == 'INTE':
        fmt_string = 'i'
    elif vals[3] == 'CHAR':
        fmt_string = '8s'
    elif vals[3] == 'LOGI':
        fmt_string = 'i'
    elif vals[3] == 'REAL':
        fmt_string = 'f'
    elif vals[3] == 'DOUB':
        fmt_string = 'd'
    elif vals[3] == 'MESS':
        fmt_string = '%ds' % nval
    else:
        print "Unknown data type ... exiting"
        print items[-40:]
        sys.exit(0)

    # leading spaces

    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])

    # extract data

    for i in range(0,nval):
        start, stop = stop, stop + struct.calcsize(fmt_string)
        vals = struct.unpack(fmt_string, data[start:stop])
        items.extend(vals)

    # trailing spaces

    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])

Now I get this output

232 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS', '\x00\x00\x00\x10')
256 ('\x00\x00\x00\x10', 'TIME', 1, 'REAL', '\x00\x00\x00\x10')

and the script runs to the end
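
For comparison, here is a rough Python 3 sketch of the same loop. It rests on assumptions read off the console output above, not on any specification: each header is a 4-byte marker, an 8-character name, a 4-byte count, a 4-character type and another 4-byte marker; numbers are big-endian (the markers print as \x00\x00\x00\x10); and the data block is wrapped in one leading and one trailing marker. The names `read_items` and `record` are hypothetical, and the sample bytes are built in the sketch itself, not taken from the real file.

```python
import struct

# Struct code for one item of each type (MESS is left out: it only
# appears with a count of 0 in the output shown above).
FORMATS = {'INTE': 'i', 'CHAR': '8s', 'LOGI': 'i', 'REAL': 'f', 'DOUB': 'd'}
HEADER = struct.Struct('>i8si4si')   # marker, name, count, type, marker

def read_items(data):
    """Return a list of (name, type, values) tuples from a bytes object."""
    items, pos = [], 0
    while pos < len(data):
        _, name, count, typ, _ = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        name, typ = name.decode('ascii').strip(), typ.decode('ascii')
        values = ()
        if count > 0:
            # Repeat the code rather than prefix a count, so 'CHAR' with
            # count 3 becomes '8s8s8s' and not '38s'.
            body = struct.Struct('>' + FORMATS[typ] * count)
            pos += 4                          # leading marker
            values = body.unpack_from(data, pos)
            pos += body.size + 4              # payload plus trailing marker
        items.append((name, typ, values))
    return items

def record(payload):
    """Wrap a payload in leading and trailing length markers (assumed layout)."""
    marker = struct.pack('>i', len(payload))
    return marker + payload + marker

sample = (record(struct.pack('>8si4s', b'DATABEGI', 0, b'MESS'))
          + record(struct.pack('>8si4s', b'TIME    ', 1, b'REAL'))
          + record(struct.pack('>f', 0.0)))
parsed = read_items(sample)
print(parsed)
```

Writing the byte order explicitly ('>') also turns off struct's native alignment padding, so the sizes do not depend on the machine the script runs on.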


FWIW a few observations re coding style and techniques.

1) put the formats in a dictionary before the while loop:
formats = {'INTE': 'i', 'CHAR': '8s', 'LOGI': 'i', 'REAL': 'f', 
'DOUB': 'd', 'MESS': 'd'}


2) retrieve the format in the while loop from the dictionary:
format = formats[vals[3]]


Neat!!



3) condense the 3 infile lines:
data = open('test.bin','rb').read()


I still don't quite trust myself to chain functions together, but I 
guess that's lack of practice



4) nrec is a misleading name (to me it means # of records), nbytes 
would be better.


Agreed



5) Be consistent with the format between calcsize and unpack:
struct.calcsize('4s8si4s8s')

6) Use meaningful variable names instead of val for the unpacked data:
blank, name, length, typ = struct.unpack ... etc


Will do

7) The format for MESS should be 'd' rather than '%dd' % nval. 
When nval is 0 the for loop will make 0 cycles.


Wasn't sure about that one.  MESS implies string but I wasn't sure 
what to do about a zero-length string



8) You don't have a format for DATA (BEGI); therefore the prior 
format (for CHAR) is being applied. The formats are the same so it 
does not matter but could be confusing later.


DATABEGI should be a keyword to indicate the start of the proper 
data which has format MESS (ie string).  You did make me look again 
at the MESS format and it should be '%ds' % nval and not '%dd' % nval





Re: [Tutor] reading binary files

2009-02-04 Thread bob gailer

eShopping wrote:

Bob

I am trying to read UNFORMATTED files.  The files also occur as 
formatted files and the format string I provided is the string used to 
write the formatted version.  I can read the formatted version OK.  I 
(naively) assumed that the same format string was used for both files, 
the only differences being whether the FORTRAN WRITE statement 
indicated unformatted or formatted.


WRITE UNFORMATTED dumps memory to disk with no formatting. That is why we 
must do some analysis of the file to see where the data has been placed, 
how long the floats are, and what endian is being used.
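
To illustrate why float length and byte order matter, here is a small Python 3 sketch that packs the same value three ways. The value 1.0 is arbitrary and not taken from the file.

```python
import struct

value = 1.0

as_big_float = struct.pack('>f', value)      # 4 bytes, big-endian
as_little_float = struct.pack('<f', value)   # same 4 bytes, reversed
as_big_double = struct.pack('>d', value)     # 8 bytes, big-endian

# Reading big-endian bytes as little-endian does not fail; it silently
# yields a different number.
misread = struct.unpack('<f', as_big_float)[0]

print(repr(as_big_float), repr(as_little_float), len(as_big_double), misread)
```

Because a wrong guess raises no error, the guesses have to be checked against values that are already known, such as the ones in the formatted version of the file.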


I'd like to examine the file myself. We might save a lot of time and 
energy that way. If it is not very large would you attach it to your 
reply. If it is very large you could either copy just the first 1000 or 
so bytes, or send the whole thing thru www.yousendit.com.


At 21:41 03/02/2009, bob gailer wrote:
First question: are you trying to work with the file written 
UNFORMATTED? If so read on.


Well, did you read on? What reactions do you have?


eShopping wrote:


Data format:

TIME  1  F  0.0
DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

F=float, D=double, L=logical, S=string etc


The first part of the file should contain a string (eg TIME),
an integer (1) and another string (eg F) so I tried using

import struct
in_file = open(file_name+".dat","rb")
data = in_file.read()
items = struct.unpack('sds', data)

Now I get the error

error: unpack requires a string argument of length 17

which has left me completely baffled!


Did you open the file with mode 'b'? If not change that.

You are passing the entire file to unpack when you should be giving 
it only the first line. That's why it is complaining about the 
length. We need to figure out the lengths of the lines.
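
The length in the error message comes from the format string, not from the file. A Python 3 sketch, with made-up data standing in for the file contents:

```python
import struct

# 's' is ONE byte and 'd' is an 8-byte double. In native mode struct also
# inserts padding so the double is aligned, which on many platforms gives
# 1 + 7 + 8 + 1 = 17 bytes.
native_size = struct.calcsize('sds')
packed_size = struct.calcsize('=sds')    # standard sizes, no padding: 10

data = b'x' * 40                          # stand-in for the file contents

# Passing more bytes than the format describes raises struct.error.
try:
    struct.unpack('=sds', data)
    whole_file_ok = True
except struct.error:
    whole_file_ok = False

# Passing exactly the right slice works.
first = struct.unpack('=sds', data[:packed_size])
print(native_size, packed_size, whole_file_ok, len(first))
```

Note that 's' without a count means a one-character string, so 'sds' would not match a name like TIME even if the lengths agreed.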


Consider the first line

TIME  1  F  0.0

There were (I assume)  4 FORTRAN variables written here: character 
integer character float. Without knowing the lengths of the 
character variables we are at a loss as to what the struct format 
should be. Do you know their lengths? Is the last float or double?


Try this: print data[:40] You should see something like:

TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00 



where ... means 0 or more intervening stuff. It might be that the 
\x01 and the \n are in other places, as we also have to deal with 
byte order issues.


Please do this and report back your results. And also the FORTRAN 
variable types if you have access to them.


Apologies if this is getting a bit messy but the files are at a 
remote location and I forgot to bring copies home.  I don't have 
access to the original FORTRAN program so I tried to emulate 
reading the data using the Python script below.  AFAIK the FORTRAN 
format line for the header is  (1X, 1X, A8, 1X, 1X, I6, 1X, 1X, 
A1).  If the data following is a float it is written using n(1X, 
F6.2) where n is the number of records picked up from the preceding 
header.


# test program to read binary data

import struct

# create dummy data

data = []
for i in range(0,10):
    data.append(float(i))

# write data to binary file

b_file = open("test.bin","wb")

b_file.write("  %8s  %6d  %1s\n" % ("DISTANCE", len(data), "F"))
for x in data:
    b_file.write(" %6.2f" % x)


You are still confusing text vs binary. The above writes text 
regardless of the file mode. If the FORTRAN file was written 
UNFORMATTED then you are NOT emulating that with the above program. 
The character data is read back in just fine, since there is no 
translation involved in the writing nor in the reading. The integer 
len(data) is being written as its text (character) representation 
(translating binary to text) but being read back in without 
translation. Also all the floating point data is going out as text.


The file looks like (where b = blank) (how it would look in notepad):

bbDISTANCEbbbbbb10bbFbbb0.00bbb1.00bbb2.00 If you analyze this with 
2s8s2si2s1s
you will see 2s matches bb, 8s matches DISTANCE, 2s matches bb, i 
matches bbbb (\x20\x20\x20\x20). The i tells unpack to shove those 4 
bytes unaltered into a Python integer, resulting in 538976288. You 
can verify that:


>>> struct.unpack('i', '    ')
(538976288,)

Please either assure me you understand or are prepared for a more in 
depth tutorial.
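
The text-versus-binary difference can be sketched in Python 3 as follows. The big-endian '>i' is an arbitrary choice for the illustration; four identical bytes give the same integer in either byte order.

```python
import struct

n = 10

# Text representation: the characters ' ', ' ', ' ', ' ', '1', '0'.
as_text = ('%6d' % n).encode('ascii')

# Binary representation: four bytes holding the value itself.
as_binary = struct.pack('>i', n)

# Treating the first four text bytes (all blanks, 0x20) as a binary
# integer gives 0x20202020.
misread = struct.unpack('>i', as_text[:4])[0]

print(repr(as_text), repr(as_binary), misread, hex(misread))
```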

b_file.close()

# read back data from file

c_file = open("test.bin","rb")

data = c_file.read()
start, stop = 0, struct.calcsize("2s8s2si2s1s")

items = struct.unpack("2s8s2si2s1s",data[start:stop])
print items
print data[:40]

I'm pretty sure that when I tried this at the other PC there were a 
bunch of \x00\x00 characters in the file but they don't appear in 
NotePad  ... anyway, I thought the Python above would unpack the 
data but items appears as


('  ', 'DISTANCE', '  ', 538976288, '10', ' ')

which seems to contain an extra item (538976288)

Alun Griffiths



--
Bob Gailer
Chapel Hill NC
919-636-4239



Re: [Tutor] reading binary files

2009-02-04 Thread eShopping

Bob

sorry, I misread your email and thought it said "read on" if the 
file was FORMATTED.  It wasn't so I didn't (but should have).  I read 
the complete thread and it is getting a little messy so I have 
extracted your questions and added some answers.


I'd like to examine the file myself. We might save a lot of time and 
energy that way. If it is not very large would you attach it to your 
reply. If it is very large you could either copy just the first 1000 
or so bytes, or send the whole thing thru www.yousendit.com.


The file is around 800 Mb but I can't get hold of it until next week 
so suggest starting a new topic once I have a cut-down copy.



Well, did you read on? What reactions do you have?


I did (finally) read on and I am still a little confused, though less 
than before.  I guess the word UNFORMATTED means that the file has no 
format  though it presumably has some structure? One 
major  hurdle is that I am not really sure about the difference 
between a Python binary file and a FORTRAN UNFORMATTED file so any 
pointers would be gratefully received



The file looks like (where b = blank) (how it would look in notepad):
bbDISTANCEbbbbbb10bbFbbb0.00bbb1.00bbb2.00 If you analyze this with 2s8s2si2s1s
you will see 2s matches bb, 8s matches DISTANCE, 2s matches bb, i 
matches bbbb (\x20\x20\x20\x20). The i tells unpack to shove those 
4 bytes unaltered into a Python integer, resulting in 538976288. You 
can verify that:


>>> struct.unpack('i', '    ')
(538976288,)

Please either assure me you understand or are prepared for a more in 
depth tutorial.


I now understand why Python gave me the results it did ... it looks 
like reading the FORTRAN file will be a non-trivial task so probably 
best to wait until I can post a copy of it.


Thanks for your help

Alun Griffiths






Re: [Tutor] reading binary files

2009-02-04 Thread Alan Gauld


eShopping etrade.griffi...@dsl.pipex.com wrote

I now understand why Python gave me the results it did ... it looks 
like reading the FORTRAN file will be a non-trivial task so probably 
best to wait until I can post a copy of it.


You don't say which OS you are on but you can read the
binary file into a hex editor and see the structure. If you are on
*nix you can use od -x and if on Windows run debug and use the d
command to display the file as hex.

Using that you should be able to determine whether fields are
fixed length or delimited by a particular character or tagged
with a length prefix etc.
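
If no hex editor is to hand, a small dump routine does the same job. This is a minimal Python 3 sketch; the function name `hexdump` is hypothetical and the sample bytes are made up to resemble a header record, not read from the real file.

```python
def hexdump(data, width=16):
    """Return a hex + ASCII dump of a bytes object, one line per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hexpart = ' '.join('%02x' % b for b in chunk)
        # Show printable ASCII as itself and everything else as '.'
        text = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        lines.append('%08x  %-*s  %s' % (offset, width * 3 - 1, hexpart, text))
    return '\n'.join(lines)

sample = b'\x00\x00\x00\x10DATABEGI\x00\x00\x00\x00MESS\x00\x00\x00\x10'
dump = hexdump(sample)
print(dump)
```

To inspect a real file, read the first few hundred bytes in mode 'rb' and pass them to the function.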

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 





Re: [Tutor] reading binary files

2009-02-04 Thread bob gailer

eShopping wrote:


The file is around 800 Mb but I can't get hold of it until next week 
so suggest starting a new topic once I have a cut-down copy.

OK will wait with bated breath.



Well, did you read on? What reactions do you have?


I did (finally) read on and I am still a little confused, though less 
than before.  I guess the word UNFORMATTED means that the file has no 
format

Depends on what you mean by format. When you use % formatting in Python 
it is the same thing as a FORMATTED WRITE in FORTRAN - a set of 
directives that direct the translation of data to human readable text.


Files per se are a sequence of bytes. As such they have no format. 
When we examine a file we attempt to make sense of the bytes.


Some of the bytes may represent ASCII printable characters, others 
not. The body of this email is a sequence of ASCII printable characters 
that make sense to you when you read them.


The file written UNFORMATTED has some ASCII printable characters that 
you can read (e.g. DISTANCE), some that you can recognize as letters, 
numbers, etc but are not English words, and non-printable characters 
that show up as garbage symbols or not at all. Those that are not 
readable are the internal representation of numbers.


 though it presumably has some structure? One major  hurdle is 
that I am not really sure about the difference between a Python binary 
file and a FORTRAN UNFORMATTED file so any pointers would be 
gratefully received


There is no such thing as a Python binary file. When you open a file 
with mode 'b' you are asking the file system to ignore line-ends. If you 
do not specify 'b' then the file system translates line-ends into \n 
when reading and translates \n back to line-ends. The reason for this is 
that different OS file systems use different codes for line-ends. By 
translating them to and from \n the Python program becomes OS independent.


Windows uses ctrl-M ctrl-J (carriage return - line feed; \x0d\x0a).
Linux/Unix uses ctrl-J (line feed; \x0a).
Mac uses ctrl-M (carriage return; \x0d).
Python uniformly translates these to \n (\x0a)

When processing files written without line-ends (e.g. UNFORMATTED) there 
may be line-end characters or sequences that must NOT be treated as 
line-ends. Hence mode 'b'


Example:

>>> x=open('x','w') # write normal, allowing \n to be translated to the OS line end.
>>> x.write("Hello\n")
>>> x=open('x','rb') # read binary, avoiding translation.
>>> x.read()
'Hello\r\n'

where \r = \x0d
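
The same experiment can be sketched in Python 3 so that it behaves identically on any OS. The `newline='\r\n'` argument to open() forces the Windows-style translation on writing; the temporary path is only for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'x')

# Text mode: each '\n' written is translated to '\r\n' on the way out.
with open(path, 'w', newline='\r\n') as f:
    f.write('Hello\n')

# Binary mode: the bytes come back exactly as stored on disk.
with open(path, 'rb') as f:
    raw = f.read()

# Text mode with default settings: line ends are translated back to '\n'.
with open(path, 'r') as f:
    text = f.read()

print(repr(raw), repr(text))
```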

--
Bob Gailer
Chapel Hill NC
919-636-4239


Re: [Tutor] reading binary files

2009-02-03 Thread etrade . griffiths
Sorry, still having problems 


   I am trying to read data from a file that has format
  item_name  num_items  item_type  items 
  
  eg
  
  TIME  1  0.0
  DISTANCE 10  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
 
 Where is the item_type?
Ooops, the data format should look like this:

TIME  1  F  0.0
DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

F=float, D=double, L=logical, S=string etc

 
  I can read this if the data are in ASCII format using
  
 in_file = open("my_file.dat","r")
 data1 = in_file.read()
 tokens = data1.split()
 
 It might be easier to process line by line using readline 
 or readlines rather than read but otherwise, ok so far...
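
A line-by-line reader for the ASCII version might look like the Python 3 sketch below. It assumes the corrected layout given above (name, count, type, then the values, all on one line). The helper `parse_line` and the `CONVERTERS` table are hypothetical, and the L (logical) type is left out because its text representation is not shown in this thread.

```python
# Type letter -> conversion function; only the types whose text form is
# obvious are listed here.
CONVERTERS = {'F': float, 'D': float, 'S': str}

def parse_line(line):
    """Split one 'name count type values...' line into (name, type, values)."""
    tokens = line.split()
    name, count, kind = tokens[0], int(tokens[1]), tokens[2]
    values = [CONVERTERS[kind](tok) for tok in tokens[3:3 + count]]
    return name, kind, values

text = ("TIME  1  F  0.0\n"
        "DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n")

records = [parse_line(line) for line in text.splitlines() if line.strip()]
print(records)
```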
 
  and then stepping through the resulting list but the data 
  also appear in the same format in a binary file.  
 
 When you say a binary file do you mean an ASCII file 
 encoded into binary using some standard algorithm?
 Or do you mean the data is binary so that, for example, 
 the number 1 would appear as 4 bytes? If so do you 
 know how strings (the name) are delimited? Also 
 how many could be present - is length a single or 
 multiple bytes? and are the records fixed length or 
 variable? If variable what is the field/record separator?

Sorry, no idea what the difference is.  All I know is that the data
were written by a FORTRAN program using the UNFORMATTED argument
in the WRITE statement and that if they had been written FORMATTED
then we would get a file that looks something like the example above

 
 You may need to load the file into a hex editor or debugger 
 to determine the answers...
 
 Having done that the struct module will allow you to read 
 the data.
 
 You can see a basic example of using struct in my 
 tutorial topic about handling files.

The first part of the file should contain a string (eg TIME),
an integer (1) and another string (eg F) so I tried using

import struct
in_file = open(file_name+".dat","rb")
data = in_file.read()
items = struct.unpack('sds', data)

Now I get the error

error: unpack requires a string argument of length 17

which has left me completely baffled! 

 
 

Re: [Tutor] reading binary files

2009-02-03 Thread eShopping

Bob
At 19:52 03/02/2009, you wrote:

etrade.griffi...@dsl.pipex.com wrote:

Data format:

TIME  1  F  0.0
DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

F=float, D=double, L=logical, S=string etc


The first part of the file should contain a string (eg TIME),
an integer (1) and another string (eg F) so I tried using

import struct
in_file = open(file_name+".dat","rb")
data = in_file.read()
items = struct.unpack('sds', data)

Now I get the error

error: unpack requires a string argument of length 17

which has left me completely baffled!



Did you open the file with mode 'b'? If not change that.

You are passing the entire file to unpack when you should be giving 
it only the first line. That's why it is complaining about the 
length. We need to figure out the lengths of the lines.


Consider the first line

TIME  1  F  0.0

There were (I assume)  4 FORTRAN variables written here: character 
integer character float. Without knowing the lengths of the 
character variables we are at a loss as to what the struct format 
should be. Do you know their lengths? Is the last float or double?


Try this: print data[:40] You should see something like:

TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00

where ... means 0 or more intervening stuff. It might be that the 
\x01 and the \n are in other places, as we also have to deal with 
byte order issues.


Please do this and report back your results. And also the FORTRAN 
variable types if you have access to them.


Apologies if this is getting a bit messy but the files are at a 
remote location and I forgot to bring copies home.  I don't have 
access to the original FORTRAN program so I tried to emulate 
reading the data using the Python script below.  AFAIK the FORTRAN 
format line for the header is  (1X, 1X, A8, 1X, 1X, I6, 1X, 1X, 
A1).  If the data following is a float it is written using n(1X, 
F6.2) where n is the number of records picked up from the preceding header.


# test program to read binary data

import struct

# create dummy data

data = []
for i in range(0,10):
    data.append(float(i))

# write data to binary file

b_file = open("test.bin","wb")

b_file.write("  %8s  %6d  %1s\n" % ("DISTANCE", len(data), "F"))
for x in data:
    b_file.write(" %6.2f" % x)

b_file.close()

# read back data from file

c_file = open("test.bin","rb")

data = c_file.read()
start, stop = 0, struct.calcsize("2s8s2si2s1s")

items = struct.unpack("2s8s2si2s1s",data[start:stop])
print items
print data[:40]

I'm pretty sure that when I tried this at the other PC there were a 
bunch of \x00\x00 characters in the file but they don't appear in 
NotePad  ... anyway, I thought the Python above would unpack the data 
but items appears as


('  ', 'DISTANCE', '  ', 538976288, '10', ' ')

which seems to contain an extra item (538976288)

Alun Griffiths







Re: [Tutor] reading binary files

2009-02-03 Thread bob gailer

etrade.griffi...@dsl.pipex.com wrote:

Data format:

TIME  1  F  0.0
DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

F=float, D=double, L=logical, S=string etc

  
The first part of the file should contain a string (eg TIME),

an integer (1) and another string (eg F) so I tried using

import struct
in_file = open(file_name+".dat","rb")
data = in_file.read()
items = struct.unpack('sds', data)

Now I get the error

error: unpack requires a string argument of length 17

which has left me completely baffled! 

  


Did you open the file with mode 'b'? If not change that.

You are passing the entire file to unpack when you should be giving it 
only the first line. That's why it is complaining about the length. We 
need to figure out the lengths of the lines.


Consider the first line

TIME  1  F  0.0

There were (I assume)  4 FORTRAN variables written here: character 
integer character float. Without knowing the lengths of the character 
variables we are at a loss as to what the struct format should be. Do 
you know their lengths? Is the last float or double?


Try this: print data[:40] You should see something like:

TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00

where ... means 0 or more intervening stuff. It might be that the \x01 
and the \n are in other places, as we also have to deal with byte 
order issues.


Please do this and report back your results. And also the FORTRAN 
variable types if you have access to them.



--
Bob Gailer
Chapel Hill NC
919-636-4239


Re: [Tutor] reading binary files

2009-02-03 Thread bob gailer
First question: are you trying to work with the file written 
UNFORMATTED? If so read on.


If you are working with a file formatted (1X, 1X, A8, 1X, 1X, I6, 1X, 
1X, A1) then we have a completely different issue to deal with. Do not 
read on, instead let us know.


eShopping wrote:


Data format:

TIME  1  F  0.0
DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

F=float, D=double, L=logical, S=string etc


The first part of the file should contain a string (eg TIME),
an integer (1) and another string (eg F) so I tried using

import struct
in_file = open(file_name+".dat","rb")
data = in_file.read()
items = struct.unpack('sds', data)

Now I get the error

error: unpack requires a string argument of length 17

which has left me completely baffled!



Did you open the file with mode 'b'? If not change that.

You are passing the entire file to unpack when you should be giving 
it only the first line. That's why it is complaining about the 
length. We need to figure out the lengths of the lines.


Consider the first line

TIME  1  F  0.0

There were (I assume)  4 FORTRAN variables written here: character 
integer character float. Without knowing the lengths of the character 
variables we are at a loss as to what the struct format should be. Do 
you know their lengths? Is the last float or double?


Try this: print data[:40] You should see something like:

TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00 



where ... means 0 or more intervening stuff. It might be that the 
\x01 and the \n are in other places, as we also have to deal with 
byte order issues.


Please do this and report back your results. And also the FORTRAN 
variable types if you have access to them.


Apologies if this is getting a bit messy but the files are at a remote 
location and I forgot to bring copies home.  I don't have access to 
the original FORTRAN program so I tried to emulate reading the 
data using the Python script below.  AFAIK the FORTRAN format line for 
the header is  (1X, 1X, A8, 1X, 1X, I6, 1X, 1X, A1).  If the data 
following is a float it is written using n(1X, F6.2) where n is the 
number of records picked up from the preceding header.


# test program to read binary data

import struct

# create dummy data

data = []
for i in range(0,10):
    data.append(float(i))

# write data to binary file

b_file = open("test.bin", "wb")

b_file.write("  %8s  %6d  %1s\n" % ("DISTANCE", len(data), "F"))
for x in data:
    b_file.write(" %6.2f" % x)


You are still confusing text vs binary. The above writes text regardless 
of the file mode. If the FORTRAN file was written UNFORMATTED then you 
are NOT emulating that with the above program. The character data is 
read back in just fine, since there is no translation involved in the 
writing nor in the reading. The integer len(data) is being written as 
its text (character) representation (translating binary to text) but 
being read back in without translation. Also all the floating point data 
is going out as text.
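
A minimal sketch of the difference, assuming (and it is only an assumption) that the binary file stores the count as a 4-byte little-endian integer:

```python
import struct

count = 10

# What the test script wrote: the characters ' ', ' ', ' ', ' ', '1', '0'.
as_text = ("%6d" % count).encode("ascii")

# What a binary writer would produce under the assumption above.
as_binary = struct.pack("<i", count)

print(repr(as_text), repr(as_binary))

# Each form has to be read back the same way it was written.
from_text = int(as_text.decode("ascii"))
from_binary = struct.unpack("<i", as_binary)[0]
```

Opening the file with "wb" does not change which of these two byte sequences ends up on disk; only the code that builds the bytes does.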


The file looks like this (where b = blank; this is how it would look 
in Notepad):

bbDISTANCEbbbbbb10bbF
bbb0.00bbb1.00bbb2.00

If you analyze this with 2s8s2si2s1s you will see that 2s matches bb, 
8s matches DISTANCE, 2s matches bb, and i matches bbbb 
(\x20\x20\x20\x20). The i tells unpack to shove those 4 bytes 
unaltered into a Python integer, resulting in 538976288. You can 
verify that:


>>> struct.unpack('i', '    ')
(538976288,)

Please either assure me you understand or are prepared for a more in 
depth tutorial.
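
For anyone following along, the analysis above can be reproduced without touching the disk. This is only a sketch: it rebuilds the header line the test script writes, then reads it both ways.

```python
import struct

# The header exactly as the test script formats it (text, not binary).
header = ("  %8s  %6d  %1s\n" % ("DISTANCE", 10, "F")).encode("ascii")

fmt = "2s8s2si2s1s"
size = struct.calcsize(fmt)

# Treating the text as binary: the 'i' swallows four blanks (0x20 each).
items = struct.unpack(fmt, header[:size])
print(items)

# Treating the text as text: split on whitespace and convert the count.
name, count, kind = header.decode("ascii").split()
count = int(count)
print(name, count, kind)
```

The "extra" item is the four blanks in front of the 10, reinterpreted as one integer.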

b_file.close()

# read back data from file

c_file = open("test.bin", "rb")

data = c_file.read()
start, stop = 0, struct.calcsize("2s8s2si2s1s")

items = struct.unpack("2s8s2si2s1s", data[start:stop])
print items
print data[:40]

I'm pretty sure that when I tried this at the other PC there were a 
bunch of \x00\x00 characters in the file but they don't appear in 
NotePad  ... anyway, I thought the Python above would unpack the data 
but items appears as


('  ', 'DISTANCE', '  ', 538976288, '10', ' ')

which seems to contain an extra item (538976288)

Alun Griffiths




--
Bob Gailer
Chapel Hill NC
919-636-4239
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] reading binary files

2009-02-03 Thread eShopping

Bob

I am trying to read UNFORMATTED files.  The files also occur as 
formatted files and the format string I provided is the string used 
to write the formatted version.  I can read the formatted version 
OK.  I (naively) assumed that the same format string was used for 
both files, the only differences being whether the FORTRAN WRITE 
statement indicated unformatted or formatted.
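
One thing that may be worth checking, offered as a sketch rather than a description of these particular files: an unformatted WRITE ignores any format string and dumps the raw bytes of each variable, and many FORTRAN compilers wrap every unformatted sequential record in a length marker before and after the data. The marker size and byte order ("<i" below), the helper names and the sample record layout are all assumptions made for illustration.

```python
import struct

def read_records(data, marker="<i"):
    """Split a byte string into record payloads.

    Assumes each record is stored as: length marker, payload,
    and then the same length marker again.
    """
    size = struct.calcsize(marker)
    records = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack(marker, data[pos:pos + size])
        start = pos + size
        payload = data[start:start + length]
        end = start + length
        (trailer,) = struct.unpack(marker, data[end:end + size])
        if trailer != length:
            raise ValueError("record markers disagree at offset %d" % pos)
        records.append(payload)
        pos = end + size
    return records

def make_record(payload):
    """Build a hypothetical record so the reader can be tried out."""
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker

# Made-up sample: a header record followed by a record of three floats.
sample = (make_record(struct.pack("<8si4s", b"DISTANCE", 3, b"REAL"))
          + make_record(struct.pack("<3f", 0.0, 1.0, 2.0)))

records = read_records(sample)
name, count, kind = struct.unpack("<8si4s", records[0])
values = struct.unpack("<%df" % count, records[1])
print(name, count, kind, values)
```

If the leading and trailing markers never agree on the real file, the assumption about marker size or byte order is probably wrong for that compiler.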


Best regards

Alun Griffiths

At 21:41 03/02/2009, bob gailer wrote:
First question: are you trying to work with the file written 
UNFORMATTED? If so read on.


If you are working with a file formatted (1X, 1X, A8, 1X, 1X, I6, 
1X, 1X, A1) then we have a completely different issue to deal with. 
Do not read on, instead let us know.


eShopping wrote:


Data format:

TIME  1  F  0.0
DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

F=float, D=double, L=logical, S=string etc


The first part of the file should contain a string (eg TIME),
an integer (1) and another string (eg F) so I tried using

import struct
in_file = open(file_name + ".dat", "rb")
data = in_file.read()
items = struct.unpack('sds', data)

Now I get the error

error: unpack requires a string argument of length 17

which has left me completely baffled!


Did you open the file with mode 'b'? If not change that.

You are passing the entire file to unpack when you should be 
giving it only the first line. That's why it is complaining 
about the length. We need to figure out the lengths of the lines.


Consider the first line

TIME  1  F  0.0

There were (I assume)  4 FORTRAN variables written here: character 
integer character float. Without knowing the lengths of the 
character variables we are at a loss as to what the struct format 
should be. Do you know their lengths? Is the last float or double?


Try this: print data[:40]. You should see something like:

TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00

where ... means 0 or more intervening stuff. It might be that the 
\x01 and the \n are in other places, as we also have to deal with 
byte order issues.


Please do this and report back your results. And also the FORTRAN 
variable types if you have access to them.


Apologies if this is getting a bit messy but the files are at a 
remote location and I forgot to bring copies home.  I don't have 
access to the original FORTRAN program so I tried to emulate 
reading the data using the Python script below.  AFAIK the FORTRAN 
format line for the header is  (1X, 1X, A8, 1X, 1X, I6, 1X, 1X, 
A1).  If the data following is a float it is written using n(1X, 
F6.2) where n is the number of records picked up from the preceding header.


# test program to read binary data

import struct

# create dummy data

data = []
for i in range(0,10):
    data.append(float(i))

# write data to binary file

b_file = open("test.bin", "wb")

b_file.write("  %8s  %6d  %1s\n" % ("DISTANCE", len(data), "F"))
for x in data:
    b_file.write(" %6.2f" % x)


You are still confusing text vs binary. The above writes text 
regardless of the file mode. If the FORTRAN file was written 
UNFORMATTED then you are NOT emulating that with the above program. 
The character data is read back in just fine, since there is no 
translation involved in the writing nor in the reading. The integer 
len(data) is being written as its text (character) representation 
(translating binary to text) but being read back in without 
translation. Also all the floating point data is going out as text.


The file looks like this (where b = blank; this is how it would look 
in Notepad):

bbDISTANCEbbbbbb10bbF
bbb0.00bbb1.00bbb2.00

If you analyze this with 2s8s2si2s1s you will see that 2s matches bb, 
8s matches DISTANCE, 2s matches bb, and i matches bbbb 
(\x20\x20\x20\x20). The i tells unpack to shove those 4 bytes 
unaltered into a Python integer, resulting in 538976288. You can 
verify that:


>>> struct.unpack('i', '    ')
(538976288,)

Please either assure me you understand or are prepared for a more in 
depth tutorial.

b_file.close()

# read back data from file

c_file = open("test.bin", "rb")

data = c_file.read()
start, stop = 0, struct.calcsize("2s8s2si2s1s")

items = struct.unpack("2s8s2si2s1s", data[start:stop])
print items
print data[:40]

I'm pretty sure that when I tried this at the other PC there were a 
bunch of \x00\x00 characters in the file but they don't appear in 
NotePad  ... anyway, I thought the Python above would unpack the 
data but items appears as


('  ', 'DISTANCE', '  ', 538976288, '10', ' ')

which seems to contain an extra item (538976288)

Alun Griffiths



--
Bob Gailer
Chapel Hill NC
919-636-4239




Re: [Tutor] reading binary files

2009-02-02 Thread Alan Gauld


etrade.griffi...@dsl.pipex.com wrote

 I am trying to read data from a file that has format
item_name  num_items  item_type  items 

eg

TIME  1  0.0
DISTANCE 10  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0


Where is the item_type?


I can read this if the data are in ASCII format using

   in_file = open("my_file.dat", "r")
   data1 = in_file.read()
   tokens = data1.split()


It might be easier to process line by line using readline 
or readlines rather than read but otherwise, ok so far...


and then stepping through the resulting list but the data 
also appear in the same format in a binary file.  


When you say a binary file do you mean an ASCII file 
encoded into binary using some standard algorithm?
Or do you mean the data is binary so that, for example, 
the number 1 would appear as 4 bytes? If so do you 
know how strings (the name) are delimited? Also 
how many could be present - is length a single or 
multiple bytes? And are the records fixed length or 
variable? If variable what is the field/record separator?


You may need to load the file into a hex editor or debugger 
to determine the answers...
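
If no hex editor is to hand, a few lines of Python give a similar view. This is a rough sketch; `hexdump` is a made-up helper name and the sample bytes are invented rather than taken from the real file.

```python
import binascii

def hexdump(data, width=16):
    """Return lines showing offset, hex bytes and printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = binascii.hexlify(chunk).decode("ascii")
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "."
                            for b in bytearray(chunk))
        lines.append("%08x  %-*s  %s" % (offset, width * 2,
                                         hex_part, text_part))
    return lines

# Invented sample: the name TIME followed by a 4-byte integer 1.
sample = b"TIME\x01\x00\x00\x00"
lines = hexdump(sample)
for line in lines:
    print(line)
```

On a real file, pass in something like the first 64 bytes read with mode "rb" and look for where the names end and the numbers begin.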


Having done that the struct module will allow you to read 
the data.


You can see a basic example of using struct in my 
tutorial topic about handling files.


HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld



Re: [Tutor] reading binary files

2009-02-02 Thread jadrifter
On Mon, 2009-02-02 at 11:31 +, etrade.griffi...@dsl.pipex.com wrote:
 Hi
 
 I am trying to read data from a file that has format
 
 item_name  num_items  item_type  items 
 
 eg
 
 TIME  1  0.0
 DISTANCE 10  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
 TIME  1  1.0
 DISTANCE 10  1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
 
 I can read this if the data are in ASCII format using
 
 in_file = open("my_file.dat", "r")
 data1 = in_file.read()
 tokens = data1.split()
 
 and then stepping through the resulting list but the data 
 also appear in the same format in a binary file.  I tried 
 converting the binary file to an ASCII file using
 
 ifile = open("my_file.dat", "rb")
 ofile = open("new_file.dat", "w")
 base64.decode(ifile, ofile)
 
 but that gave the error "Error: Incorrect padding".  I imagine
 that there is a straightforward way of doing this but haven't
 found it so far.  Would be grateful for any suggestions!
 
 Thanks
 
 Alun Griffiths
 
Honestly I'm not sure what you're asking for, but in general for reading
binary data I use the struct module.  Check it out in the
documentation.
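
As a rough idea of what that looks like, here is a sketch that walks through a byte string of items laid out as name, count, values. The layout (8-character name, 4-byte little-endian count, then that many 8-byte doubles) is invented for the example and is not known to match the poster's file; `parse_items` is a hypothetical helper.

```python
import struct

def parse_items(data):
    """Parse repeated (name, count, values...) items from bytes."""
    header_fmt = "<8si"                  # assumed header layout
    header_size = struct.calcsize(header_fmt)
    pos = 0
    result = []
    while pos < len(data):
        name, count = struct.unpack(header_fmt,
                                    data[pos:pos + header_size])
        pos += header_size
        values_fmt = "<%dd" % count      # count doubles follow the header
        values_size = struct.calcsize(values_fmt)
        values = struct.unpack(values_fmt, data[pos:pos + values_size])
        pos += values_size
        result.append((name.rstrip(), count, values))
    return result

# Made-up binary data in the assumed layout.
sample = (struct.pack("<8si1d", b"TIME    ", 1, 0.0)
          + struct.pack("<8si3d", b"DISTANCE", 3, 0.0, 1.0, 2.0))

parsed = parse_items(sample)
print(parsed)
```

The key step is building the second format string from the count read in the header, so each item can have a different number of values.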

John Purser
