Hi,

I am struggling with the performance of SQLite (I think), especially when I use 
it in a Python loop executed in parallel processes (using xargs).

I am analyzing the characteristics of a relatively large number (270k) of 
overlapping lake catchments which were generated in GRASS and are now stored in 
a PostGIS DB. I split the data into 10 chunks and analyse each chunk in its 
own GRASS 7.0 mapset (with SQLite backend), where I loop over the 
catchments one by one (in Python).

At first I tried to import the individual catchments using v.in.ogr, but 
v.in.ogr slowed the process down significantly: it took 11 seconds to 
import a single, not very complex polygon (which is probably related to 
https://trac.osgeo.org/grass/ticket/2185 ?; btw. my GRASSDB is not on NFS). 
Whether or not I imported the attribute table made no difference to the speed 
of v.in.ogr. So I switched to using gdal_rasterize and linked the resulting 
raster into GRASS with r.external (as I am planning to work with rasters later 
anyway). Rasterization and import together take less than a second. Moreover, 
converting the raster back to vector in GRASS takes less than a second, so 
topology creation does not seem to be the issue, and an attribute table is 
created as well...
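For reference, the rasterize-and-link step boils down to something like the 
sketch below (dataset path, layer name, resolution and output file are all 
placeholders, and the actual calls are commented out so nothing here depends 
on my setup):

```python
import subprocess  # used by the commented-out call below

# Rasterize one catchment polygon from PostGIS into a GeoTIFF
# (all names/paths below are placeholders):
cmd = [
    "gdal_rasterize",
    "-burn", "1",             # burn a constant value
    "-tr", "10", "10",        # target resolution (placeholder)
    "-ot", "Byte",
    "-l", "catchment",        # source layer (placeholder)
    "PG:dbname=mydb",         # PostGIS source (placeholder)
    "/tmp/catchment.tif",
]
# subprocess.check_call(cmd)

# Then link (rather than import) the raster into the current mapset:
# grass.script.run_command("r.external", input="/tmp/catchment.tif",
#                          output="catchment")
print(cmd[0])
```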

Then I add a relatively large number of columns (400 or so) using 
v.db.addcolumn. That takes 19 seconds in my single test process. If I 
run all 10 chunks in parallel (I have 24 cores and lots of memory available), 
adding the columns takes 30 seconds per catchment, almost twice as long. 
During the loop, the time spent on adding the columns keeps increasing, up to 
almost 30 minutes, at which point I cancelled the process... Something is 
obviously not working as it should...
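For reference, the column-adding step looks roughly like the sketch below. 
Since v.db.addcolumn accepts a comma-separated list in its columns= parameter, 
all 400 definitions could in principle go into a single module call instead of 
400 separate ones (the column names are placeholders, and the GRASS call 
itself is commented out):

```python
# Placeholder column names; in the real script they are statistic names.
column_names = ["stat_%03d" % i for i in range(400)]

# v.db.addcolumn accepts a comma-separated list of definitions, so all
# columns can be added with a single module call:
columns_arg = ",".join("%s double precision" % n for n in column_names)
# grass.script.run_command("v.db.addcolumn", map="catchment",
#                          columns=columns_arg)
print(columns_arg.count(",") + 1)  # 400 definitions in one call
```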

Analysing ca. 40 raster maps (with r.univar) takes altogether less than 5 
seconds for the smaller catchments.
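For reference, r.univar with the -g flag prints shell-style key=value pairs, 
which my script parses roughly as below (the sample output here is made up):

```python
# Made-up sample of what `r.univar -g map=...` prints:
sample = "n=1234\nmin=0.5\nmax=9.7\nmean=4.2"

# Parse the key=value lines into a dict (grass.script.parse_command
# does essentially the same for -g style output):
stats = dict(line.split("=", 1) for line in sample.splitlines())
print(stats["mean"])  # 4.2
```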

After that I removed all SQLite-related code from my script and loaded the 
results directly back into PostgreSQL/PostGIS. The smaller catchments are then 
done in less than 5 seconds...
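For reference, the direct PostgreSQL path is just parameterised inserts, 
roughly as sketched below (table and column names are placeholders; the 
connection itself is commented out since it needs a running DB):

```python
# Placeholder table/columns; psycopg2 would substitute the %s markers safely.
sql = "INSERT INTO results (cat_id, mean) VALUES (%s, %s)"
row = (42, 4.2)
# import psycopg2
# conn = psycopg2.connect("dbname=mydb")
# with conn, conn.cursor() as cur:
#     cur.execute(sql, row)
print(sql % row)  # what the statement looks like with values filled in
```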

Does anyone have an idea what causes this performance loss? Is it bad 
practice to call a Python script (v.db.addcolumn) in a Python loop, or is 
this related to SQLite journal files, or ...?
I am grateful for any hints!

Cheers
Stefan

P.S.: I can share the script if that helps identify the issue...
_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
