Re: Reading a very large text file of coordinates and heights

Norman Vine Tue, 29 Mar 2011 02:29:27 -0700

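This is a basic Python question and has nothing to do with GIS, but
the script below should work. It is untested and assumes a whitespace
field separator.

A string and a float are not the same thing. Slicing a line gives you
strings, and comparing a string with a number does not compare the
values, which is why your script prints nothing.
See the float() built-in and the string methods in the Python docs.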
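
To illustrate the difference, here is a small sketch that uses the sample line from your mail. The names sample and x_text are only for illustration; the slice is the one from your script.

```python
# sample line quoted from the original mail
sample = ' -74289.694 -3182439.485  2092.029\n'

# slicing a line gives a string, not a number
x_text = sample[2:12]
print(type(x_text).__name__)   # str

# split() breaks on any run of whitespace; float() converts each field
x, y, z = [float(field) for field in sample.split()]

# now the comparison is numeric; this point has x smaller than xMin
print(x >= -66483.300 and x <= -33474.900)   # False
```

Once the fields are floats, the range tests behave as you expect.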

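[code]
import sys

readfile = 'bethlehem.xyz'

def inside(x, y):
    # True if the point lies within the study-area extent
    return (-66483.300 <= x <= -33474.900 and
            -3155672.310 <= y <= -3122229.700)

infile = open(readfile)

while True:
    # read a chunk of the file: roughly 100000 bytes of complete lines
    lines = infile.readlines(100000)
    if not lines:
        break
    for line in lines:
        # extract x, y and z; split() handles any run of whitespace
        fields = line.split()    # or X, Y, Z = line.split()
        if len(fields) < 2:
            continue             # skip blank or incomplete lines
        # or inside(float(X), float(Y))
        if inside(float(fields[0]), float(fields[1])):
            sys.stdout.write(line)    # line already ends with a newline

infile.close()
[/code]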
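
On your first question: each call to readlines(100000) continues where the previous one stopped, so the loop above does walk the whole file. A variation, as a sketch only: iterate over the file object directly, which reads one line at a time, and write the matches to a new file with single-space separators so the spacing is consistent for GRASS. The names clip_xyz and clip_file and the output file name are my own assumptions, not anything from your setup.

```python
# study-area extent as reported by QGIS in the original mail
XMIN, XMAX = -66483.300, -33474.900
YMIN, YMAX = -3155672.310, -3122229.700

def clip_xyz(lines):
    """Yield only the lines whose x, y fall inside the study-area extent."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue          # skip blank or incomplete lines
        x, y = float(fields[0]), float(fields[1])
        if XMIN <= x <= XMAX and YMIN <= y <= YMAX:
            # re-join with single spaces so the spacing is consistent
            yield ' '.join(fields) + '\n'

def clip_file(in_path, out_path):
    """Stream in_path to out_path, keeping only points inside the extent."""
    with open(in_path) as src:
        with open(out_path, 'w') as dst:
            # iterating over a file object reads one line at a time
            for line in clip_xyz(src):
                dst.write(line)
```

You would call it as clip_file('bethlehem.xyz', 'bethlehem_clip.xyz'), where the second name is just a placeholder. Only one line is held in memory at a time, so the 1 GB size should not matter.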

On Mar 29, 2011, at 4:56 AM, Hanlie Pretorius wrote:

> Thanks for the suggestion. I can't use the CSV driver to read the
> file, though, because it doesn't support a space as a field separator,
> only COMMA/SEMICOLON/TAB.
> 
> 2011/3/29, Paolo Corti:
>>> I've received a 10m resolution DEM in xyz text format. The file is
>>> about 1 GB in size. The file is too big to open in a text editor, such
>>> as Notepad and I don't have Office 2007, so Excel cuts off the file
>>> after 67 000 lines.
>>> 
>>> So, I need to write a Python script to read this file and extract
>>> only the data that falls within my study area. According to QGIS, the
>>> extent of my area is:
>>> xMin,yMin -66483.3,-3155672.31 : xMax,yMax -33474.9,-3122229.70
>>> 
>>> This is the first unprocessed line in the file, which I extracted using
>>> Python:
>>>  -74289.694 -3182439.485  2092.029
>>> 
>>> The spacing between the lines is not consistent, which is another
>>> reason why I need to manipulate the data so that GRASS can import it.
>>> 
>>> Reading the whole file at once causes a MemoryError in Python, so I've
>>> written the following code to read it in chunks, with some help from
>>> the web - <http://effbot.org/zone/readline-performance.htm>:
>>> 
>>> [code]
>>> readfile='bethlehem.xyz'
>>> 
>>> file = open(readfile)
>>> 
>>> while 1:
>>>    # read a chunk of the file
>>>    lines = file.readlines(100000)
>>>    if not lines:
>>>        break
>>>    for line in lines:
>>>    # extract x, y and z
>>>        x = line[2:12]
>>>        y = line[13:25]
>>>        z = line[27:35]
>>>        if x >= -66483.300 and x <= -33474.900:
>>>           if y >= -3155672.310 and y <= -3122229.700:
>>>               print line
>>> [/code]
>>> 
>>> This code runs for a (relatively) short while and exits having printed no
>>> lines.
>>> 
>>> My questions are thus:
>>> 1. Will this code iterate through the whole file, or does it read only
>>> the first 100 000 bytes of text? If it reads only the first 100 000
>>> bytes, how can I change it to read the whole file in chunks?
>>> 
>>> 2. Is the logic in my if statements correct to extract the values for
>>> my study area? If not, how should I change it?
>> 
>> Hi Hanlie
>> 
>> don't reinvent the wheel: use GDAL ogr2ogr utility [0] with the -clipsrc
>> option.
>> Read the file by using the csv driver [1]
>> 
>> best regards
>> P
>> 
>> [0] http://www.gdal.org/ogr2ogr.html
>> [1] http://www.gdal.org/ogr/drv_csv.html
>> 
>> --
>> Paolo Corti
>> Geospatial software developer
>> web: http://www.paolocorti.net
>> twitter: @paolo_corti
>>
