Re: [postgis-users] Increase query performance
On 08/03/11 17:01, Andreas Forø Tollefsen wrote: After a suggestion from pgsql_performance i tried with ST_Simplify to speed things up. However this gives me a: NOTICE: ptarray_simplify returned a 2 pts array Then server connection terminates. Like this: SELECT ST_Intersection(priogrid_land.cell, ST_Simplify(cshapeswdate.geom,0.1)) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, ST_Simplify(cshapeswdate.geom,0.1)) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; Yup that's a bug related to the new code in trunk - I think we've already got something similar filed in the bug tracker. This is probably going to require a Paul to take a look at it. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Increase query performance
Thanks for all the suggestions. I first thought about doing the simplify, but after some reconsideration I realized that this would change the geometry of the country borders, and Area measurements might be wrong. Then i tried the query without the ST_Intersection and only the conditional ST_Intersects. This completed in 87 seconds, so this was very fast compared to 40-50 minutes per year. I also tried the UNION ALL suggestion by Nicklas which had almost similar performance as the original query. I will test this some more to see whether i can save some time. My guess is that the calculation time required to create the new intersected geometry is what takes so long, and this is probably something i need to accept (or get more cpu power :) Andreas 2011/3/9 Mark Cave-Ayland mark.cave-ayl...@siriusit.co.uk On 08/03/11 17:01, Andreas Forø Tollefsen wrote: After a suggestion from pgsql_performance i tried with ST_Simplify to speed things up. However this gives me a: NOTICE: ptarray_simplify returned a 2 pts array Then server connection terminates. Like this: SELECT ST_Intersection(priogrid_land.cell, ST_Simplify(cshapeswdate.geom,0.1)) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, ST_Simplify(cshapeswdate.geom,0.1)) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; Yup that's a bug related to the new code in trunk - I think we've already got something similar filed in the bug tracker. This is probably going to require a Paul to take a look at it. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
[postgis-users] Increase query performance
Hi all, Another question on postgis query performance. I did some discussion on the pgsql performance list on how i could increase the performance on my server. I did manage to increase the transactions per second, but came to the conclusion that this did not help the ST_Intersection query which i was trying to speed up. Any suggestions on how to speed up this query? Basically i want to create an intersection between a vector grid and country shapefiles. Then calculate the area of all the polygons in the intersected table, and finally selecting the country code for each cell which represents the largest area within each cell. I does work as it is, but i would like to increase the speed. The query which takes a lot of time is the ST_Intersection. PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit POSTGIS=2.0.0SVN GEOS=3.2.2-CAPI-1.6.2 PROJ=Rel. 4.7.1, 23 September 2009 LIBXML=2.7.7 USE_STATS DROP TABLE IF EXISTS cshapesgrid1946; SELECT ST_Intersection(priogrid_land.cell, cshapeswdate.geom) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord INTO cshapesgrid1946 FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, cshapeswdate.geom) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; ALTER TABLE cshapesgrid1946 ADD COLUMN area float; UPDATE cshapesgrid1946 SET area = ST_Area(cshapesgrid1946.geom); DROP TABLE IF EXISTS pg1946; SELECT * INTO pg1946 FROM cshapesgrid1946 a WHERE a.area IN (SELECT MAX(area) FROM cshapesgrid1946 b GROUP BY divider); CREATE INDEX idx_pg1946 ON pg1946 USING gist(geom); Best, Andreas ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Increase query performance
Andreas, I don't see anything glaringly wrong with your query, but hard to tell without seeing an explain plan or what you mean by takes a lot of time. Is a lot of time minutes, hours, days? How many records are we talking about here? What's the max number of points you have in any geometry. Often times its just one huge mega geometry with a 100,000 points or more slowing everything down. Do you have indexes on your year fields and date columns? Hope that helps, Regina http://www.postgis.us _ From: postgis-users-boun...@postgis.refractions.net [mailto:postgis-users-boun...@postgis.refractions.net] On Behalf Of Andreas Forø Tollefsen Sent: Tuesday, March 08, 2011 4:15 AM To: PostGIS Users Discussion Subject: [postgis-users] Increase query performance Hi all, Another question on postgis query performance. I did some discussion on the pgsql performance list on how i could increase the performance on my server. I did manage to increase the transactions per second, but came to the conclusion that this did not help the ST_Intersection query which i was trying to speed up. Any suggestions on how to speed up this query? Basically i want to create an intersection between a vector grid and country shapefiles. Then calculate the area of all the polygons in the intersected table, and finally selecting the country code for each cell which represents the largest area within each cell. I does work as it is, but i would like to increase the speed. The query which takes a lot of time is the ST_Intersection. PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit POSTGIS=2.0.0SVN GEOS=3.2.2-CAPI-1.6.2 PROJ=Rel. 4.7.1, 23 September 2009 LIBXML=2.7.7 USE_STATS DROP TABLE IF EXISTS cshapesgrid1946; SELECT ST_Intersection(priogrid_land.cell, cshapeswdate.geom) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord INTO cshapesgrid1946 FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, cshapeswdate.geom) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; ALTER TABLE cshapesgrid1946 ADD COLUMN area float; UPDATE cshapesgrid1946 SET area = ST_Area(cshapesgrid1946.geom); DROP TABLE IF EXISTS pg1946; SELECT * INTO pg1946 FROM cshapesgrid1946 a WHERE a.area IN (SELECT MAX(area) FROM cshapesgrid1946 b GROUP BY divider); CREATE INDEX idx_pg1946 ON pg1946 USING gist(geom); Best, Andreas ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Increase query performance
Hi, This query takes about 41 minutes per year. Doing this for every year from 1946 to 2008 takes a lot of time. The grid consists of 64818 cell polygons, while the country table has about 210 polygons. I will add indexes on year and date and try again. An do an explain analyze. Cheers, Andreas 2011/3/8 Paragon Corporation l...@pcorp.us Andreas, I don't see anything glaringly wrong with your query, but hard to tell without seeing an explain plan or what you mean by takes a lot of time. Is a lot of time minutes, hours, days? How many records are we talking about here? What's the max number of points you have in any geometry. Often times its just one huge mega geometry with a 100,000 points or more slowing everything down. Do you have indexes on your year fields and date columns? Hope that helps, Regina http://www.postgis.us -- *From:* postgis-users-boun...@postgis.refractions.net [mailto: postgis-users-boun...@postgis.refractions.net] *On Behalf Of *Andreas Forø Tollefsen *Sent:* Tuesday, March 08, 2011 4:15 AM *To:* PostGIS Users Discussion *Subject:* [postgis-users] Increase query performance Hi all, Another question on postgis query performance. I did some discussion on the pgsql performance list on how i could increase the performance on my server. I did manage to increase the transactions per second, but came to the conclusion that this did not help the ST_Intersection query which i was trying to speed up. Any suggestions on how to speed up this query? Basically i want to create an intersection between a vector grid and country shapefiles. Then calculate the area of all the polygons in the intersected table, and finally selecting the country code for each cell which represents the largest area within each cell. I does work as it is, but i would like to increase the speed. The query which takes a lot of time is the ST_Intersection. PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit POSTGIS=2.0.0SVN GEOS=3.2.2-CAPI-1.6.2 PROJ=Rel. 4.7.1, 23 September 2009 LIBXML=2.7.7 USE_STATS DROP TABLE IF EXISTS cshapesgrid1946; SELECT ST_Intersection(priogrid_land.cell, cshapeswdate.geom) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord INTO cshapesgrid1946 FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, cshapeswdate.geom) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; ALTER TABLE cshapesgrid1946 ADD COLUMN area float; UPDATE cshapesgrid1946 SET area = ST_Area(cshapesgrid1946.geom); DROP TABLE IF EXISTS pg1946; SELECT * INTO pg1946 FROM cshapesgrid1946 a WHERE a.area IN (SELECT MAX(area) FROM cshapesgrid1946 b GROUP BY divider); CREATE INDEX idx_pg1946 ON pg1946 USING gist(geom); Best, Andreas ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Increase query performance
Results from the EXPLAIN ANALYZE: Nested Loop (cost=0.00..1189.72 rows=3941 width=87790) (actual time=7.091..2524830.264 rows=54145 loops=1) Join Filter: _st_intersects(priogrid_land.cell, cshapeswdate.geom) - Seq Scan on cshapeswdate (cost=0.00..16.23 rows=22 width=87304) (actual time=0.011..0.542 rows=72 loops=1) Filter: ((gwsyear = 1946::numeric) AND (gweyear = 1946::numeric) AND (startdate = '1946-01-01'::date)) - Index Scan using idx_priogrid_land_cell on priogrid_land (cost=0.00..8.29 rows=1 width=486) (actual time=3.026..30.152 rows=1338 loops=72) Index Cond: (priogrid_land.cell cshapeswdate.geom) Total runtime: 2524889.630 ms 2011/3/8 Andreas Forø Tollefsen andrea...@gmail.com Hi, This query takes about 41 minutes per year. Doing this for every year from 1946 to 2008 takes a lot of time. The grid consists of 64818 cell polygons, while the country table has about 210 polygons. I will add indexes on year and date and try again. An do an explain analyze. Cheers, Andreas 2011/3/8 Paragon Corporation l...@pcorp.us Andreas, I don't see anything glaringly wrong with your query, but hard to tell without seeing an explain plan or what you mean by takes a lot of time. Is a lot of time minutes, hours, days? How many records are we talking about here? What's the max number of points you have in any geometry. Often times its just one huge mega geometry with a 100,000 points or more slowing everything down. Do you have indexes on your year fields and date columns? Hope that helps, Regina http://www.postgis.us -- *From:* postgis-users-boun...@postgis.refractions.net [mailto: postgis-users-boun...@postgis.refractions.net] *On Behalf Of *Andreas Forø Tollefsen *Sent:* Tuesday, March 08, 2011 4:15 AM *To:* PostGIS Users Discussion *Subject:* [postgis-users] Increase query performance Hi all, Another question on postgis query performance. I did some discussion on the pgsql performance list on how i could increase the performance on my server. I did manage to increase the transactions per second, but came to the conclusion that this did not help the ST_Intersection query which i was trying to speed up. Any suggestions on how to speed up this query? Basically i want to create an intersection between a vector grid and country shapefiles. Then calculate the area of all the polygons in the intersected table, and finally selecting the country code for each cell which represents the largest area within each cell. I does work as it is, but i would like to increase the speed. The query which takes a lot of time is the ST_Intersection. PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit POSTGIS=2.0.0SVN GEOS=3.2.2-CAPI-1.6.2 PROJ=Rel. 4.7.1, 23 September 2009 LIBXML=2.7.7 USE_STATS DROP TABLE IF EXISTS cshapesgrid1946; SELECT ST_Intersection(priogrid_land.cell, cshapeswdate.geom) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord INTO cshapesgrid1946 FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, cshapeswdate.geom) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; ALTER TABLE cshapesgrid1946 ADD COLUMN area float; UPDATE cshapesgrid1946 SET area = ST_Area(cshapesgrid1946.geom); DROP TABLE IF EXISTS pg1946; SELECT * INTO pg1946 FROM cshapesgrid1946 a WHERE a.area IN (SELECT MAX(area) FROM cshapesgrid1946 b GROUP BY divider); CREATE INDEX idx_pg1946 ON pg1946 USING gist(geom); Best, Andreas ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Increase query performance
Andreas, Try ST_SimplifyPreserveTolerance. The ST_Simplify often simplifies to nothing or close. Like I mentioned in the other post, it could be some huge geometries causing your problems. You don't necessarily want to simplify everything. We usually do a conditional simplify like CASE WHEN ST_NPoints(geom) 3000 THEN ST_SimplyfyPreserveTopology(geom,0.1) ELSE geom END It probably wouldn't hurt to do a max check on your tables or a count to see how many have more than n number of points. SELECT MAX(ST_NPoints(geom)) As biggest, COUNT(CASE WHEN ST_NPoints(geom) 3000 THEN 1 ELSE NULL END) as cnt_big_geoms FROM yourtable To get a sense of the largest geometry you are dealing with. Hope that helps, Regina http://www.postgis.us _ From: Andreas Forø Tollefsen [mailto:andrea...@gmail.com] Sent: Tuesday, March 08, 2011 12:02 PM To: PostGIS Users Discussion Cc: Paragon Corporation Subject: Re: [postgis-users] Increase query performance After a suggestion from pgsql_performance i tried with ST_Simplify to speed things up. However this gives me a: NOTICE: ptarray_simplify returned a 2 pts array Then server connection terminates. Like this: SELECT ST_Intersection(priogrid_land.cell, ST_Simplify(cshapeswdate.geom,0.1)) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, ST_Simplify(cshapeswdate.geom,0.1)) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; 2011/3/8 Andreas Forø Tollefsen andrea...@gmail.com Results from the EXPLAIN ANALYZE: Nested Loop (cost=0.00..1189.72 rows=3941 width=87790) (actual time=7.091..2524830.264 rows=54145 loops=1) Join Filter: _st_intersects(priogrid_land.cell, cshapeswdate.geom) - Seq Scan on cshapeswdate (cost=0.00..16.23 rows=22 width=87304) (actual time=0.011..0.542 rows=72 loops=1) Filter: ((gwsyear = 1946::numeric) AND (gweyear = 1946::numeric) AND (startdate = '1946-01-01'::date)) - Index Scan using idx_priogrid_land_cell on priogrid_land (cost=0.00..8.29 rows=1 width=486) (actual time=3.026..30.152 rows=1338 loops=72) Index Cond: (priogrid_land.cell cshapeswdate.geom) Total runtime: 2524889.630 ms 2011/3/8 Andreas Forø Tollefsen andrea...@gmail.com Hi, This query takes about 41 minutes per year. Doing this for every year from 1946 to 2008 takes a lot of time. The grid consists of 64818 cell polygons, while the country table has about 210 polygons. I will add indexes on year and date and try again. An do an explain analyze. Cheers, Andreas 2011/3/8 Paragon Corporation l...@pcorp.us Andreas, I don't see anything glaringly wrong with your query, but hard to tell without seeing an explain plan or what you mean by takes a lot of time. Is a lot of time minutes, hours, days? How many records are we talking about here? What's the max number of points you have in any geometry. Often times its just one huge mega geometry with a 100,000 points or more slowing everything down. Do you have indexes on your year fields and date columns? Hope that helps, Regina http://www.postgis.us _ From: postgis-users-boun...@postgis.refractions.net [mailto:postgis-users-boun...@postgis.refractions.net] On Behalf Of Andreas Forø Tollefsen Sent: Tuesday, March 08, 2011 4:15 AM To: PostGIS Users Discussion Subject: [postgis-users] Increase query performance Hi all, Another question on postgis query performance. I did some discussion on the pgsql performance list on how i could increase the performance on my server. I did manage to increase the transactions per second, but came to the conclusion that this did not help the ST_Intersection query which i was trying to speed up. Any suggestions on how to speed up this query? Basically i want to create an intersection between a vector grid and country shapefiles. Then calculate the area of all the polygons in the intersected table, and finally selecting the country code for each cell which represents the largest area within each cell. I does work as it is, but i would like to increase the speed. The query which takes a lot of time is the ST_Intersection. PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit POSTGIS=2.0.0SVN GEOS=3.2.2-CAPI-1.6.2 PROJ=Rel. 4.7.1, 23 September 2009 LIBXML=2.7.7 USE_STATS DROP TABLE IF EXISTS cshapesgrid1946; SELECT ST_Intersection(priogrid_land.cell, cshapeswdate.geom) AS geom, priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col, row, xcoord, ycoord INTO cshapesgrid1946 FROM priogrid_land, cshapeswdate WHERE ST_Intersects(priogrid_land.cell, cshapeswdate.geom) AND cshapeswdate.gwsyear =1946 AND cshapeswdate.gweyear =1946 AND cshapeswdate.startdate = '1946/1/1'; ALTER TABLE cshapesgrid1946 ADD COLUMN