Terry Reedy wrote:
Dale Amon wrote:
Now I can move on to parsing those pesky Fortran card
images... There wouldn't happen to be a way to take n
contiguous slices from a string (card image) where each slice may be
a different length would there? Fortran you know. No spaces between
input fields. :-)
I know a way to do it, iterating over a list of slice sizes,
Yes.
perhaps in a list comprehension, but some of the august python
personages here no doubt know better ways.
No. Off the top of my head, here is something like what I would do
(untested):
def card_slice(card, sizes):
    """Card is the data input string. Sizes is an iterable of int field
    sizes, where negative sizes are skipped fields. Return a list of
    strings."""
    pos, ret = 0, []
    for i in sizes:
        if i > 0:
            ret.append(card[pos:pos+i])
        else:
            i = -i
        pos += i
    return ret
To elaborate this, make sizes an iterable of (size, class) pairs, where
class is str, int, or float (for Fortran) or another callable for more
general use. Then:
    ...
    for i, c in sizes:
        if i > 0:
            ret.append(c(card[pos:pos+i]))
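Filled out into a self-contained sketch (the sample card layout below is
invented for illustration; Terry's fragment leaves the rest of the
function as in the untyped version):

```python
def card_slice(card, sizes):
    """sizes is an iterable of (width, converter) pairs; negative
    widths are skipped filler columns (the converter is ignored there)."""
    pos, ret = 0, []
    for i, c in sizes:
        if i > 0:
            ret.append(c(card[pos:pos+i]))
        else:
            i = -i
        pos += i
    return ret

# Hypothetical card: 5-column int, 2 filler columns, 8-column float,
# 4-column string -- no spaces between fields, Fortran style.
fields = card_slice("00042  3.14159 ABCD",
                    [(5, int), (-2, None), (8, float), (4, str)])
# fields == [42, 3.14159, 'ABCD']
```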
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list
------------------------------------------------
Terry is on right track.
I use this:
comment="""
Structure for TYPE III file named <tmp_att.dbf>
Number of bytes per record : 129
Number of fields in record : 25
Number of records in file : 0
Date file was last updated : 11/20/ 8
Field  Label      Type  Size/Dec.  Offset
    1  AREA        N     14  3         1
    2  PERIMITER   N     14  3        15
    3  DUMMY1      N     11  0        29
    4  DUMMY2      N     11  0        40
    5  BL_X        N     12  0        51
    6  BL_Y        N     12  0        63
    7  ACRES       C     13  0        75
.
.
"""
Then:
    open file
    skip any header bytes
    while not eof:
        read full record (129 bytes in this case) into buffer (tstln)
        # parse:
        AREA= tstln[1:15]
        PERIMITER= tstln[15:29]
        DUMMY1= tstln[29:40]
        DUMMY2= tstln[40:51]
        BL_X= tstln[51:63]
        BL_Y= tstln[63:75]
        ACRES= tstln[75:88]
        .
        .
        do stuff
    # loop
The other method he mentions works too. I create an Ashton-Tate dBASE
III+ file from a template and populate it from, well... a wide variety
of sources. The 'header' has all the field names, sizes and such. I can
then use the manual method above or let the program do the parsing. The
second method is much more flexible.
import StringIO
from binhex import hexbin  # decodes the BinHex blob below
source1= StringIO.StringIO("""(This file must be converted with
BinHex 4.0)
:#h0[GA*MC6%ZC'*Q!$q3#!#3"!4L!*!%'H!$#!B$!*!%B36M!*!98e4"9%P26Pp
133"$#3$'44i!!!!"!*!,48a&63#3"d-R!-C&"!!!!!%!N!YC48&568m!N!9$+`$
'43B!!!!"!*!,4%&C-$%!-3#3"%-a!-C&"J!!!!%!N!Y%39N`-J!b!*!%3cF!aN8
'!!!!!3#3#d4"@6!c!$-!N!4$23$'43B!!!!"!*!,4%&C-$3!0!#3"%0$!-C&"J!
!!!%!N!Y%39N`03!e!*!%3dN!aN8'!!!!!3#3#d4"@6!f!$B!N!4$6`$'43B!!!!
"!*!,4%&C-$F!0`#3"%09!-C&"J!!!!%!N!Y%39N`1!!i!*!%3eX!aN8'!!!!!3#
3#d4"@6!j!$N!N!4$B3$'43B!!!!"!*!,4%&C-6!!-!#3"%0R!-C&"J!!!!%!N!Y
%39Na-3#3"N0Y!-C&"J!!!!%!N!Y%39Na-J!b!*!%3h-!aN8'!!!!!3#3#d4"@6%
c!$-!N!4$H3$'43B!!!!"!*!,4%&C-63!0!#3"%0r!-C&"J!!!!%!N!Y%39Na03!
e!*!%3i8!aN8'!!!!!3#3#d4"@6%f!$B!N!4$L`$'43B!!!!"!*!,4%&C-6F!0`#
3"%14!-C&"J!!!!%!N!Y%39Na1!!i!*!%3jF!aN8'!!!!!3#3#d4"@6%j!$N!N!4
$R3$'43B!!!!"!*!,4%&C-M!!-!#3"%1M!-C&"J!!!!%!N!Y%39Nb-3!a!*!%3kN
!aN8'!!!!!3#3#d4"@6)b!$)!N!4$V`$'43B!!!!"!*!,4%&C-M-!-`#3"%1e!-C
&"J!!!!%!N!Y%39Nb0!!d!*!%3lX!aN8'!!!!!3#3#d4"@6)e!$8!N!4$`3$'43B
!!!!"!*!,4%&C-MB!0J#3"%2(!-C&"J!!!!%!N!Y%39Nb0`!h!*!%3md!aN8'!!!
!!3#3#d4"@6)i!$J!N!4$d`$'43B!!!!"!*!,4%&C-MN!13#3"%2C!-C&"J!!!!%
!N!Y%39Nc-!!`!*!%3pm!aN8'!!!!!3#3#d4"@6-a!$%!N!4$j3$'43B!!!!"!*!
,$4TU9`!!:
""")
hexbin(source1, 'source1.dbf')
source1.close()
del source1
######### above makes a structure
import struct

def rdhdr(adbf):
    adbf.seek(4)                                    # skip ver, yr, mo, day
    hdr= struct.unpack('<L', adbf.read(4))          # number of records
    hdr= hdr + struct.unpack('<H', adbf.read(2))    # length of header
    hdr= hdr + struct.unpack('<H', adbf.read(2))    # length of records
    adbf.seek(32)
    fld= 1
    while adbf.tell() < (hdr[1] - 32):              # each field def is 32 bytes
        adbf.seek(fld*32)
        hdrn= struct.unpack('11s', adbf.read(11))[0].strip('\x00')
        adbf.seek(5, 1)                             # skip type byte + 4 reserved
        hdrs= struct.unpack('B', adbf.read(1))[0]
        hdr= hdr + ((hdrn, hdrs, fld),)             # (name, size, seq. number)
        fld= fld + 1
    return hdr
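To see the same header layout without a .dbf on disk, here is a sketch
that builds a minimal fake dBASE III header in memory (the field name,
size and counts are made up) and pulls the same values back out with
struct, little-endian throughout:

```python
import struct
from io import BytesIO

buf = BytesIO()
# main header: version, date (y, m, d), record count, header length,
# record length, 20 reserved bytes -- 32 bytes total
buf.write(struct.pack('<B3BLHH20x', 3, 108, 11, 20, 0, 32 + 32 + 1, 15))
# one field descriptor: 11-byte name, type, 4 reserved, size, decimals,
# 14 reserved -- 32 bytes total
buf.write(struct.pack('<11sc4xBB14x', b'AREA', b'N', 14, 3))
buf.write(b'\x0d')  # header terminator

buf.seek(4)
nrec, hdrlen, reclen = struct.unpack('<LHH', buf.read(8))
buf.seek(32)
name = struct.unpack('11s', buf.read(11))[0].rstrip(b'\x00')
buf.seek(5, 1)  # skip the type byte and 4 reserved bytes
size = struct.unpack('B', buf.read(1))[0]
# name == b'AREA', size == 14, hdrlen == 65, reclen == 15
```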
################ above sets up for parsing (reading or writing or both)
If the Fortran is in fact from a punch card, then your record will be 80
columns (IBM) or 90 (UNIVAC). Green-bar paper is 132. In each case,
offset zero is page control, and the last 4 columns of a card are for
sequencing the cards in case you (or someone) dropped the deck. The
last 6 columns carry the sequence on green bar, as I recall (deck
numbers are additive).
The advantage of using the .dbf is that it creates a user-friendly
file: Excel, well - almost any spreadsheet or database program can open
it. Plus it is a dbf, so database operations are 'right there'.
Microsoft Office, OpenOffice and the list goes on - all read/write
.dbf.
By the way - CSV (comma separated values) was in use in the past, but
due to memory (or the lack of it) was only for sequential use: you have
to count the commas and compare against the number expected at EOL.
SDF (standard data format), like the card, was for random access:
good old lseek(record_number * record_size, basepoint)
The snippets are from actual code dated Feb 2008,
using Python 2.5.2 on Linux Slackware 10.2
today: 20090430
Steve