Re: [Tutor] parse text file

2010-06-03 Thread Colin Talbert
Dave,
I think you are probably right about using decompressor.  I 
couldn't find any example of it in use and wasn't having any luck getting 
it to work based on the documentation.  Maybe I should try harder on this 
front.

Colin Talbert
GIS Specialist
US Geological Survey - Fort Collins Science Center
2150 Centre Ave. Bldg. C
Fort Collins, CO 80526

(970) 226-9425
talbe...@usgs.gov




From: Dave Angel
To: Colin Talbert
Cc: Steven D'Aprano, tutor@python.org
Date: 06/03/2010 12:36 PM
Subject: Re: [Tutor] parse text file



Colin Talbert wrote:
> 
> You are so correct.  I'd been trying numerous things to read in this file
> and had deleted the code that I meant to put here and so wrote this from
> memory incorrectly.  The code that I wrote should have been:
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> str=input_file.read()
> len(str)
>
> Which indeed does return only 90.
>
> Which is also the number returned when you sum the length of all the lines
> returned in a for line in file with:
>
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> lengthz = 0
> for uline in input_file:
>     lengthz = lengthz + len(uline)
>
> print lengthz
>
> 
> 
>
Seems to me for such a large file you'd have to use 
bz2.BZ2Decompressor.  I have no experience with it, but its purpose is 
for sequential decompression -- decompression where not all the data is 
simultaneously available in memory.
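
A minimal sketch of that sequential approach, assuming Python 3.3 or later (where BZ2Decompressor exposes eof and unused_data) rather than the Python 2 used in this thread. The helper name iter_decompressed and the chunk size are illustrative, not anything from the original code. It also starts a fresh decompressor whenever one compressed stream ends and more bytes follow, on the assumption (not confirmed anywhere in this thread) that the file might hold several bz2 streams back to back.

```python
import bz2
import io


def iter_decompressed(fileobj, chunk_size=64 * 1024):
    """Yield decompressed byte chunks from a binary file object holding bz2 data.

    Hypothetical helper: only chunk_size compressed bytes are read at a time,
    and a new decompressor is created each time a stream ends.
    """
    decompressor = bz2.BZ2Decompressor()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        while chunk:
            data = decompressor.decompress(chunk)
            if data:
                yield data
            if decompressor.eof:
                # Bytes after the end of this stream belong to the next one.
                chunk = decompressor.unused_data
                decompressor = bz2.BZ2Decompressor()
            else:
                chunk = b""


# Two small streams concatenated, standing in for a large multi-stream file.
payload = bz2.compress(b"line one\n") + bz2.compress(b"line two\n")
total = sum(len(part) for part in iter_decompressed(io.BytesIO(payload), chunk_size=7))
print(total)
```

For a real file, the file object would come from open(path, 'rb') on the compressed file, so that the raw compressed bytes are what gets fed to the decompressor.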

DaveA



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] parse text file

2010-06-03 Thread Colin Talbert
Hello Steven,
Thanks for the reply.  Also this is my first post to tu...@python 
so I'll reply all in the future.


> However, a file of that size changes things drastically. You can't
> expect to necessarily be able to read the entire 9.2 gigabyte BZ2 file
> into memory at once, let alone the unpacked 131 GB text file, EVEN if
> your computer has more than 9.2 GB of memory. So your tests need to
> take this into account.

I thought when you did a for uline in input_file each single line would go 
into memory independently, not the entire file.
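
As a rough illustration of that line-at-a-time expectation, here is a sketch assuming Python 3.3 or later (bz2.open does not exist in the Python 2 used elsewhere in this thread). The function name count_lines_and_bytes and the small demo file are made up for the example; the real planet file is not needed to run it.

```python
import bz2
import os
import tempfile


def count_lines_and_bytes(path):
    """Stream a bz2 file line by line and return (line_count, byte_count)."""
    line_count = 0
    byte_count = 0
    with bz2.open(path, "rb") as input_file:
        # Lines are handed back one at a time; the whole file is never
        # collected into a single string.
        for line in input_file:
            line_count += 1
            byte_count += len(line)
    return line_count, byte_count


# Small stand-in file so the sketch can be run anywhere.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.osm.bz2")
with bz2.open(demo_path, "wb") as out:
    out.write(b"<osm>\n<node id='1'/>\n</osm>\n")

print(count_lines_and_bytes(demo_path))
```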



> I'm pretty sure that this is not your code, because you can't call len()
> on a bz2 file. If you try, you get an error:

You are so correct.  I'd been trying numerous things to read in this file 
and had deleted the code that I meant to put here and so wrote this from 
memory incorrectly.  The code that I wrote should have been:

import bz2
input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
str=input_file.read()
len(str)

Which indeed does return only 90.

Which is also the number returned when you sum the length of all the lines 
returned in a for line in file with:


import bz2
input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
lengthz = 0
for uline in input_file:
    lengthz = lengthz + len(uline)

print lengthz


Thanks again for your help and sorry for the bad code in the previous 
submittal.


Colin Talbert
GIS Specialist
US Geological Survey - Fort Collins Science Center
2150 Centre Ave. Bldg. C
Fort Collins, CO 80526

(970) 226-9425
talbe...@usgs.gov




From: Steven D'Aprano
To: tutor@python.org
Date: 06/02/2010 03:42 PM
Subject: Re: [Tutor] parse text file
Sent by: tutor-bounces+talbertc=usgs@python.org



Hi Colin,

I'm taking the liberty of replying to your message back to the list, as 
others hopefully may be able to make constructive comments. When 
replying, please ensure that you reply to the tutor mailing list rather 
than the individual.


On Thu, 3 Jun 2010 12:20:10 am Colin Talbert wrote:

> > Without seeing your text file, and the code you use to read the text
> > file, there's no way of telling what is going on, but I can guess
> > the most likely causes:
>
> Since the file is 9.2 gig it wouldn't make sense to send it to you. 

And I am very glad you didn't try *smiles*

However, a file of that size changes things drastically. You can't 
expect to necessarily be able to read the entire 9.2 gigabyte BZ2 file 
into memory at once, let alone the unpacked 131 GB text file, EVEN if 
your computer has more than 9.2 GB of memory. So your tests need to 
take this into account.
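
One way a test could take this into account is to read the decompressed data in fixed-size pieces and only keep a running total. This is a sketch under assumptions: Python 3 syntax (the thread itself uses Python 2), and a hypothetical helper name decompressed_size with an arbitrary 1 MiB chunk size.

```python
import bz2
import os
import tempfile


def decompressed_size(path, chunk_size=1024 * 1024):
    """Return the decompressed length of a bz2 file without holding it all in memory."""
    total = 0
    with bz2.BZ2File(path, "rb") as input_file:
        while True:
            chunk = input_file.read(chunk_size)  # at most chunk_size bytes per call
            if not chunk:
                break
            total += len(chunk)
    return total


# Stand-in data: 3,000,000 bytes that compress to a very small file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.bz2")
with bz2.BZ2File(sample_path, "wb") as out:
    out.write(b"x" * 3000000)

print(os.path.getsize(sample_path), decompressed_size(sample_path))
```

Memory use here is bounded by chunk_size rather than by the length of any line in the file.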

> > (2) There's a bug in your code so that you stop reading after
> > 900,000 bytes.
> The code is simple enough that I'm pretty sure there is not a
> bug in it.
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> print len(input_file)
>
> returns 90

I'm pretty sure that this is not your code, because you can't call len() 
on a bz2 file. If you try, you get an error:


>>> x = bz2.BZ2File('test.bz2', 'w')  # create a temporary file
>>> x.write("some data")
>>> x.close()
>>> input_file = bz2.BZ2File('test.bz2', 'r')  # open it
>>> print len(input_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'bz2.BZ2File' has no len()


So whatever your code actually is, I'm fairly sure it isn't what you say 
here.



-- 
Steven D'Aprano


Re: [Tutor] parse text file

2010-06-01 Thread Colin Talbert
I am also experiencing this same problem (also on an OSM bz2 
file).  It appears to be working, but then partway through reading a file 
it simply ends.  I did track down that file length is always 90 so it 
appears to be related to some sort of buffer constraint.


Any other ideas?

import bz2

input_file = bz2.BZ2File(r"C:\temp\planet-latest.osm.bz2","r")
try:
    all_data = input_file.read()
    print str(len(all_data))
finally:
    input_file.close()






Colin Talbert
GIS Specialist
US Geological Survey - Fort Collins Science Center
2150 Centre Ave. Bldg. C
Fort Collins, CO 80526

(970) 226-9425
talbe...@usgs.gov