So I've was running this query for 866000 s (10 days) before I decided to kill 
it:

create table taz_and_lancover_10_fast_2 as
SELECT p.lc_class, n.taz
 , CASE 
   WHEN ST_CoveredBy(p.geom, n.the_geom) 
   THEN p.geom 
   ELSE 
    ST_Multi(
      ST_Intersection(p.geom,n.geom)
      ) END AS geom 
 FROM lancover_polygons_snap AS p 
   INNER JOIN tazjan2 AS n 
    ON (ST_Intersects(p.geom, n.the_geom) 
      AND NOT ST_Touches(p.geom, n.the_geom) );

Explain shows this, it's using the spatial index:

"Nested Loop  (cost=0.00..310492.35 rows=483105 width=9973)"
"  Join Filter: (_st_intersects(p.geom, n.the_geom) AND ((NOT (p.geom && 
n.the_geom)) OR (NOT _st_touches(p.geom, n.the_geom))))"
"  ->  Seq Scan on tazjan2 n  (cost=0.00..843.09 rows=2709 width=9493)"
"  ->  Index Scan using lancover_polygons_snap_geom_idx on 
lancover_polygons_snap p  (cost=0.00..21.67 rows=5 width=480)"
"        Index Cond: (geom && n.the_geom)"

There are 2709 rows in tazjan2 and 998031 rows in lancover_polygons_snap, so I 
appreciate that it's a bit of a large problem.  But MapInfo was able to do it 
interactively in a few days and Geomedia was also able to do it in about a day. 
 

Both MapInfo and Geomedia ran out of memory (8GB machines) until the problem 
was broken into two regions (North and South), but postgresql seems to be 
chewing on the problem using only 400MB.  The interactive approach in MapInfo 
was to divide lancover_polygons_snap by lc_class, to further divide each region 
into about 10 subproblems.  Perhaps subprobleming this is the way to go?  Can't 
the query subproblem it based in the indices or would I have to do that 
manually?

One potential thing I've realized is that a few of the geometries in tazjan2 
are multipolygons, not single polygons.  But it's only a few.  There are a few 
very large and complex polygons in lancover_polygons_snap, but again, most of 
the 998031 are simple small polygons, about half would be ST_CoveredBy the 
polygons in tazjan2 and most of the rest would only overlap two or three of the 
polygons in tazjan2.

I must be doing something wrong.  Any hints?

I have
 max_connections set to 100 (currently only 11 connections active)
 work_mem was defaulting to 1MB, I just bumped it to 256MB (machine has 32GB 
but it's Postgresql9.1 (x86), i.e. 32bit version
 shared_buffers was 32MB, I am trying 512MB, could go higher
 effective_cache_size was defaulting to 128MB, I am trying 20GB.
 random_page_cost was 4.0, this is VMWare virtual server.  I am trying 2.0 

I'm trying an "explain analyze" with just 5 rows of tazjan2, a 5 x 998031 
intersection problem with indices shouldn't take that long, should it?  It's 
been running for a little over an hour.  I'm wondering if the few complex 
polygons in lancover_polygons_snap are causing the slowness?  Is there some 
fast way to divide complex polygons, e.g. apply a 1km grid over them?  

PostGISfullversion(): "POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" 
PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" 
LIBXML="2.7.8" LIBJSON="UNKNOWN" (core procs from "2.0.3 r11132" need upgrade) 
TOPOLOGY (topology procs from "2.0.3 r11132" need upgrade) RASTER (raster procs 
from "2.0.3 r11132" need upgrade)"

--
John Abraham


_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Reply via email to