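This is a basic Python question and has nothing to do with GIS, but the
following (untested) should work. It assumes a whitespace field separator.
A string and a float are not the same thing: the values you slice or split
out of a line are strings and must be converted before you compare them
with numbers. See the float() built-in and the string methods in the
Python docs.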
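[code]
import sys

readfile = 'bethlehem.xyz'

def inside(x, y):
    # True if (x, y) lies within the study-area extents
    if x >= -66483.300 and x <= -33474.900:
        if y >= -3155672.310 and y <= -3122229.700:
            return True
    return False

infile = open(readfile)
while 1:
    # read a chunk of the file (roughly 100000 bytes of whole lines)
    lines = infile.readlines(100000)
    if not lines:
        break
    for line in lines:
        # extract x, y and z; or: X, Y, Z = line.split()
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or short lines
        # convert the strings to floats before comparing
        if inside(float(fields[0]), float(fields[1])):
            # the line already ends with a newline, so write it as-is
            sys.stdout.write(line)
infile.close()
[/code]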
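If you would rather not manage the chunks yourself, here is a minimal sketch of the same filter written as a generator. It relies on the fact that looping over an open file object reads it one line at a time, so the whole file is processed without ever being held in memory. The names clip_xyz, XMIN, XMAX, YMIN and YMAX are my own, and the second and third sample rows are made-up values for the demonstration only.

```python
import io

# Study-area extents as quoted from QGIS in the original question.
XMIN, XMAX = -66483.300, -33474.900
YMIN, YMAX = -3155672.310, -3122229.700


def inside(x, y):
    """Return True if the point (x, y) lies within the study area."""
    return XMIN <= x <= XMAX and YMIN <= y <= YMAX


def clip_xyz(lines):
    """Yield only the lines whose x and y fall inside the study area.

    `lines` can be any iterable of text lines, for example an open file
    object, which Python reads lazily one line at a time.
    """
    for line in lines:
        fields = line.split()  # handles inconsistent whitespace
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # skip header or non-numeric lines
        if inside(x, y):
            yield line


# Small in-memory demonstration. Only the first row comes from the real
# file; the other two are invented sample values.
sample = io.StringIO(
    u"-74289.694   -3182439.485  2092.029\n"
    u"-50000.000 -3140000.000 1700.000\n"
    u"  -40000.5    -3125000.25   1650.5\n"
)
kept = list(clip_xyz(sample))
```

With the real data you would pass the open file instead of the sample, for example dst.writelines(clip_xyz(src)) with src opened on 'bethlehem.xyz' and dst on an output file of your choosing.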
On Mar 29, 2011, at 4:56 AM, Hanlie Pretorius wrote:
> Thanks for the suggestion. I can't use the CSV driver to read the
> file, though, because it doesn't support a space as a field separator,
> only COMMA/SEMICOLON/TAB.
>
> 2011/3/29, Paolo Corti:
>>> I've received a 10m resolution DEM in xyz text format. The file is
>>> about 1 GB in size. The file is too big to open in a text editor, such
>>> as Notepad and I don't have Office 2007, so Excel cuts off the file
>>> after 67 000 lines.
>>>
>>> So, I need to write a Python script to read this file and extract
>>> only the data that falls within my study area. According to QGIS, the
>>> extents of my area is:
>>> xMin,yMin -66483.3,-3155672.31 : xMax,yMax -33474.9,-3122229.70
>>>
>>> This is the first unprocessed line in the file, which I extracted using
>>> Python:
>>> -74289.694 -3182439.485 2092.029
>>>
>>> The spacing between the lines are not consistent, which is another
>>> reason why I need to manipulate the data so that GRASS can import it.
>>>
>>> Reading the whole file at once causes a MemoryError in Python, so I've
>>> written the following code to read it in chunks, with some help from
>>> the web - <http://effbot.org/zone/readline-performance.htm>:
>>>
>>> [code]
>>> readfile='bethlehem.xyz'
>>>
>>> file = open(readfile)
>>>
>>> while 1:
>>>     # read a chunck of the file
>>>     lines = file.readlines(100000)
>>>     if not lines:
>>>         break
>>>     for line in lines:
>>>         # extract x, y and z
>>>         x = line[2:12]
>>>         y = line[13:25]
>>>         z = line[27:35]
>>>         if x >= -66483.300 and x <= -33474.900:
>>>             if y >= -3155672.310 and y <= -3122229.700:
>>>                 print line
>>> [/code]
>>>
>>> This code runs for a (relatively) short while and exits having printed no
>>> lines.
>>>
>>> My questions are thus:
>>> 1. Will this code iterate through the whole file, or does it read only
>>> the first 100 000 bytes of text? If it reads only the first 100 000
>>> bytes, how can I change it to read the whole file in chunks?
>>>
>>> 2. Is the logic in my if statements correct to extract the values for
>>> my study area? If not, how should I change it?
>>
>> Hi Hanlie
>>
>> don't reinvent the wheel: use GDAL ogr2ogr utility [0] with the -clipsrc
>> option.
>> Read the file by using the csv driver [1]
>>
>> best regards
>> P
>>
>> [0] http://www.gdal.org/ogr2ogr.html
>> [1] http://www.gdal.org/ogr/drv_csv.html
>>
>> --
>> Paolo Corti
>> Geospatial software developer
>> web: http://www.paolocorti.net
>> twitter: @paolo_corti
>>