Thanks for the hint, Rémi.  I'll give it a try, to see if the st_intersection 
is slow (as compared to just st_intersects).

Here's my EXPLAIN ANALYZE for my 5 zone test:

"Nested Loop  (cost=1.56..80.91 rows=8 width=560) (actual 
time=3929258.738..19756123.479 rows=29 loops=1)"
"  Join Filter: (_st_intersects(p.geom, n.the_geom) AND ((NOT (p.geom && 
n.the_geom)) OR (NOT _st_touches(p.geom, n.the_geom))))"
"  CTE sometaz"
"    ->  Limit  (cost=0.00..1.56 rows=5 width=9537) (actual time=0.433..0.576 
rows=5 loops=1)"
"          ->  Seq Scan on tazjan2  (cost=0.00..843.09 rows=2709 width=9537) 
(actual time=0.415..0.511 rows=5 loops=1)"
"  ->  CTE Scan on sometaz n  (cost=0.00..0.10 rows=5 width=80) (actual 
time=0.460..0.724 rows=5 loops=1)"
"  ->  Index Scan using lancover_polygons_snap_geom_idx on 
lancover_polygons_snap p  (cost=0.00..12.47 rows=5 width=480) (actual 
time=0.242..65.640 rows=16 loops=5)"
"        Index Cond: (geom && n.the_geom)"
"Total runtime: 19756143.935 ms"

Here it is explained in layspeak: http://explain.depesz.com/s/dbVi

So, am I reading this right, that it took 5.5 hours to st_intersects 5 
(multi)polygons with 29 (multi)polygons, and find the 8 that intersected?  That 
the index scan was fast (328ms), and returned only a few (16, or maybe 29) of 
the 998031 polygons, but that the actual intersection afterwards, between just 
a few polygons, was so very slow?

(The query was this:

with sometaz as (select * from tazjan2 limit 5)
SELECT p.lc_class, n.taz
 , CASE 
   WHEN ST_CoveredBy(p.geom, n.the_geom) 
   THEN p.geom 
   ELSE 
    ST_Multi(
      ST_Intersection(p.geom,n.geom)
      ) END AS geom 
 FROM lancover_polygons_snap AS p 
   INNER JOIN sometaz AS n 
    ON (ST_Intersects(p.geom, n.the_geom) 
      AND NOT ST_Touches(p.geom, n.the_geom) );

)

--
John Abraham
j...@hbaspecto.com
403-232-1060

On Feb 19, 2015, at 11:56 AM, Rémi Cura <remi.c...@gmail.com> wrote:

> Hey,
> you could try to not use CASE (so separate the spatial join from the 
> processing, which is easy to parallelize (assuming you have more than one 
> core )).
> 
> First generate the table with  
> ----
> CREATE TABLE psatial_mapping_between_lancover_and_taz AS 
> SELECT row_number() over() as row_id, p.your_primary_key, n.your_primary_key
>  FROM lancover_polygons_snap AS p 
>    INNER JOIN tazjan2 AS n 
>     ON (ST_Intersects(p.geom, n.the_geom) 
>       AND NOT ST_Touches(p.geom, n.the_geom) );
> ----
> 
> Then when you have your table, you can do your processing (put into a SQL 
> function , it is cleaner),
> and you parallelize on the "row_id".
> 
> Basically you get min(row_id), ax(row_id), you spearate it into N parts, then 
> execute the parts with your K processes.
> (both bash and python have utilities for this).
> 
> Of course if it is actually the spatial join that is slow, you can also 
> parallelise this.
> (by cutting lancoverinto N pieces for instance )
> Cheers,
> Rémi-C
> 
> 2015-02-19 19:45 GMT+01:00 John Abraham <j...@hbaspecto.com>:
> So I've was running this query for 866000 s (10 days) before I decided to 
> kill it:
> 
> create table taz_and_lancover_10_fast_2 as
> SELECT p.lc_class, n.taz
>  , CASE 
>    WHEN ST_CoveredBy(p.geom, n.the_geom) 
>    THEN p.geom 
>    ELSE 
>     ST_Multi(
>       ST_Intersection(p.geom,n.geom)
>       ) END AS geom 
>  FROM lancover_polygons_snap AS p 
>    INNER JOIN tazjan2 AS n 
>     ON (ST_Intersects(p.geom, n.the_geom) 
>       AND NOT ST_Touches(p.geom, n.the_geom) );
> 
> Explain shows this, it's using the spatial index:
> 
> "Nested Loop  (cost=0.00..310492.35 rows=483105 width=9973)"
> "  Join Filter: (_st_intersects(p.geom, n.the_geom) AND ((NOT (p.geom && 
> n.the_geom)) OR (NOT _st_touches(p.geom, n.the_geom))))"
> "  ->  Seq Scan on tazjan2 n  (cost=0.00..843.09 rows=2709 width=9493)"
> "  ->  Index Scan using lancover_polygons_snap_geom_idx on 
> lancover_polygons_snap p  (cost=0.00..21.67 rows=5 width=480)"
> "        Index Cond: (geom && n.the_geom)"
> 
> There are 2709 rows in tazjan2 and 998031 rows in lancover_polygons_snap, so 
> I appreciate that it's a bit of a large problem.  But MapInfo was able to do 
> it interactively in a few days and Geomedia was also able to do it in about a 
> day.  
> 
> Both MapInfo and Geomedia ran out of memory (8GB machines) until the problem 
> was broken into two regions (North and South), but postgresql seems to be 
> chewing on the problem using only 400MB.  The interactive approach in MapInfo 
> was to divide lancover_polygons_snap by lc_class, to further divide each 
> region into about 10 subproblems.  Perhaps subprobleming this is the way to 
> go?  Can't the query subproblem it based in the indices or would I have to do 
> that manually?
> 
> One potential thing I've realized is that a few of the geometries in tazjan2 
> are multipolygons, not single polygons.  But it's only a few.  There are a 
> few very large and complex polygons in lancover_polygons_snap, but again, 
> most of the 998031 are simple small polygons, about half would be 
> ST_CoveredBy the polygons in tazjan2 and most of the rest would only overlap 
> two or three of the polygons in tazjan2.
> 
> I must be doing something wrong.  Any hints?
> 
> I have
>  max_connections set to 100 (currently only 11 connections active)
>  work_mem was defaulting to 1MB, I just bumped it to 256MB (machine has 32GB 
> but it's Postgresql9.1 (x86), i.e. 32bit version
>  shared_buffers was 32MB, I am trying 512MB, could go higher
>  effective_cache_size was defaulting to 128MB, I am trying 20GB.
>  random_page_cost was 4.0, this is VMWare virtual server.  I am trying 2.0 
> 
> I'm trying an "explain analyze" with just 5 rows of tazjan2, a 5 x 998031 
> intersection problem with indices shouldn't take that long, should it?  It's 
> been running for a little over an hour.  I'm wondering if the few complex 
> polygons in lancover_polygons_snap are causing the slowness?  Is there some 
> fast way to divide complex polygons, e.g. apply a 1km grid over them?  
> 
> PostGISfullversion(): "POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" 
> PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" 
> LIBXML="2.7.8" LIBJSON="UNKNOWN" (core procs from "2.0.3 r11132" need 
> upgrade) TOPOLOGY (topology procs from "2.0.3 r11132" need upgrade) RASTER 
> (raster procs from "2.0.3 r11132" need upgrade)"
> 
> --
> John Abraham
> 
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users@lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users@lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Reply via email to