Thanks for the hint, Rémi. I'll give it a try, to see if the st_intersection is slow (as compared to just st_intersects).
Here's my EXPLAIN ANALYZE for my 5 zone test: "Nested Loop (cost=1.56..80.91 rows=8 width=560) (actual time=3929258.738..19756123.479 rows=29 loops=1)" " Join Filter: (_st_intersects(p.geom, n.the_geom) AND ((NOT (p.geom && n.the_geom)) OR (NOT _st_touches(p.geom, n.the_geom))))" " CTE sometaz" " -> Limit (cost=0.00..1.56 rows=5 width=9537) (actual time=0.433..0.576 rows=5 loops=1)" " -> Seq Scan on tazjan2 (cost=0.00..843.09 rows=2709 width=9537) (actual time=0.415..0.511 rows=5 loops=1)" " -> CTE Scan on sometaz n (cost=0.00..0.10 rows=5 width=80) (actual time=0.460..0.724 rows=5 loops=1)" " -> Index Scan using lancover_polygons_snap_geom_idx on lancover_polygons_snap p (cost=0.00..12.47 rows=5 width=480) (actual time=0.242..65.640 rows=16 loops=5)" " Index Cond: (geom && n.the_geom)" "Total runtime: 19756143.935 ms" Here it is explained in layspeak: http://explain.depesz.com/s/dbVi So, am I reading this right, that it took 5.5 hours to st_intersects 5 (multi)polygons with 29 (multi)polygons, and find the 8 that intersected? That the index scan was fast (328ms), and returned only a few (16, or maybe 29) of the 998031 polygons, but that the actual intersection afterwards, between just a few polygons, was so very slow? (The query was this: with sometaz as (select * from tazjan2 limit 5) SELECT p.lc_class, n.taz , CASE WHEN ST_CoveredBy(p.geom, n.the_geom) THEN p.geom ELSE ST_Multi( ST_Intersection(p.geom,n.geom) ) END AS geom FROM lancover_polygons_snap AS p INNER JOIN sometaz AS n ON (ST_Intersects(p.geom, n.the_geom) AND NOT ST_Touches(p.geom, n.the_geom) ); ) -- John Abraham j...@hbaspecto.com 403-232-1060 On Feb 19, 2015, at 11:56 AM, Rémi Cura <remi.c...@gmail.com> wrote: > Hey, > you could try to not use CASE (so separate the spatial join from the > processing, which is easy to parallelize (assuming you have more than one > core )). > > First generate the table with > ---- > CREATE TABLE psatial_mapping_between_lancover_and_taz AS > SELECT row_number() over() as row_id, p.your_primary_key, n.your_primary_key > FROM lancover_polygons_snap AS p > INNER JOIN tazjan2 AS n > ON (ST_Intersects(p.geom, n.the_geom) > AND NOT ST_Touches(p.geom, n.the_geom) ); > ---- > > Then when you have your table, you can do your processing (put into a SQL > function , it is cleaner), > and you parallelize on the "row_id". > > Basically you get min(row_id), ax(row_id), you spearate it into N parts, then > execute the parts with your K processes. > (both bash and python have utilities for this). > > Of course if it is actually the spatial join that is slow, you can also > parallelise this. > (by cutting lancoverinto N pieces for instance ) > Cheers, > Rémi-C > > 2015-02-19 19:45 GMT+01:00 John Abraham <j...@hbaspecto.com>: > So I've was running this query for 866000 s (10 days) before I decided to > kill it: > > create table taz_and_lancover_10_fast_2 as > SELECT p.lc_class, n.taz > , CASE > WHEN ST_CoveredBy(p.geom, n.the_geom) > THEN p.geom > ELSE > ST_Multi( > ST_Intersection(p.geom,n.geom) > ) END AS geom > FROM lancover_polygons_snap AS p > INNER JOIN tazjan2 AS n > ON (ST_Intersects(p.geom, n.the_geom) > AND NOT ST_Touches(p.geom, n.the_geom) ); > > Explain shows this, it's using the spatial index: > > "Nested Loop (cost=0.00..310492.35 rows=483105 width=9973)" > " Join Filter: (_st_intersects(p.geom, n.the_geom) AND ((NOT (p.geom && > n.the_geom)) OR (NOT _st_touches(p.geom, n.the_geom))))" > " -> Seq Scan on tazjan2 n (cost=0.00..843.09 rows=2709 width=9493)" > " -> Index Scan using lancover_polygons_snap_geom_idx on > lancover_polygons_snap p (cost=0.00..21.67 rows=5 width=480)" > " Index Cond: (geom && n.the_geom)" > > There are 2709 rows in tazjan2 and 998031 rows in lancover_polygons_snap, so > I appreciate that it's a bit of a large problem. But MapInfo was able to do > it interactively in a few days and Geomedia was also able to do it in about a > day. > > Both MapInfo and Geomedia ran out of memory (8GB machines) until the problem > was broken into two regions (North and South), but postgresql seems to be > chewing on the problem using only 400MB. The interactive approach in MapInfo > was to divide lancover_polygons_snap by lc_class, to further divide each > region into about 10 subproblems. Perhaps subprobleming this is the way to > go? Can't the query subproblem it based in the indices or would I have to do > that manually? > > One potential thing I've realized is that a few of the geometries in tazjan2 > are multipolygons, not single polygons. But it's only a few. There are a > few very large and complex polygons in lancover_polygons_snap, but again, > most of the 998031 are simple small polygons, about half would be > ST_CoveredBy the polygons in tazjan2 and most of the rest would only overlap > two or three of the polygons in tazjan2. > > I must be doing something wrong. Any hints? > > I have > max_connections set to 100 (currently only 11 connections active) > work_mem was defaulting to 1MB, I just bumped it to 256MB (machine has 32GB > but it's Postgresql9.1 (x86), i.e. 32bit version > shared_buffers was 32MB, I am trying 512MB, could go higher > effective_cache_size was defaulting to 128MB, I am trying 20GB. > random_page_cost was 4.0, this is VMWare virtual server. I am trying 2.0 > > I'm trying an "explain analyze" with just 5 rows of tazjan2, a 5 x 998031 > intersection problem with indices shouldn't take that long, should it? It's > been running for a little over an hour. I'm wondering if the few complex > polygons in lancover_polygons_snap are causing the slowness? Is there some > fast way to divide complex polygons, e.g. apply a 1km grid over them? > > PostGISfullversion(): "POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" > PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" > LIBXML="2.7.8" LIBJSON="UNKNOWN" (core procs from "2.0.3 r11132" need > upgrade) TOPOLOGY (topology procs from "2.0.3 r11132" need upgrade) RASTER > (raster procs from "2.0.3 r11132" need upgrade)" > > -- > John Abraham > > > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list postgis-users@lists.osgeo.org http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users