Re: [postgis-users] Speeding up point-in-polygon search using ST_SimplifyPreserveTopology?
Evan, Beyond the cutting-up-of-large-things as Stephen suggests, there is the extra development work to optimize the edge searching routines with an internal index on the objects (and caching that index once built). Something for 2.1 :) P. On Mon, May 28, 2012 at 10:08 AM, Evan Martin postgre...@realityexists.net wrote: Hi all, I have a table of ~350 multi-polygons and ~185,000 points and I need to find which points are inside which multi-polygons. Some polygons are quite large and span the dateline, so I'm using geography ST_DWithin for this (with a tolerance of 100m). My initial query looks like this (simplified): SELECT ... FROM points, polygons WHERE ST_DWithin(point, real_area, 100) This works, but it takes about 90 minutes. I'm trying to make it faster by using ST_SimplifyPreserveTopology. That worked very nicely for my adjacent polygons problem [http://postgis.refractions.net/pipermail/postgis-users/2012-January/031992.html], because all polygons were modified in the same way, but this is trickier. Since I'm modifying the polygon, but not the point, the results are different. So I thought maybe I could do this in two phases: if the point is well within or well outside the polygon then take the result of the simplified check as correct, but if it's close to the border then check it properly, ie. SELECT ... FROM points, polygons WHERE ST_DWithin(point, simple_area, 2) AND (ST_Distance(point, simple_border) 2 OR ST_DWithin(point, real_area, 100)) simple_area = ST_SimplifyPreserveTopology(real_area::geometry, 0.01)::geography and simple_border = the exterior ring of simple_area. This takes about 18 minutes (a 5x improvement) and gives very similar results, but not quite the same. It falls down on polygons that have rhumblines along parallels, because they get turned into great circle lines. Eg. the original polyon may have a rhumbline approximated as (24 10,25 10,26 10,27 10), ST_SimplifyPreserveTopology does its job and simplifies it to (24 10,27 10) and then ST_DWithin on geography treats it as a great circle line, giving an incorrect result. I tried inserting extra points to unsimplify the rhumblines, but that itself is very slow and also quite a hack, because I can't really be sure which lines were supposed be rhumblines when looking at the simplified polygon. I feel like I'm so close and this is a very silly little problem - but it is a problem. Could anyone suggest a way to work around this? Or a different approach to the whole thing? Thanks, Evan ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Speeding up point-in-polygon search using ST_SimplifyPreserveTopology?
Thanks, Paul, but do you have any suggestions on how I can cut up the polygons properly? As I posted, ST_Intersection doesn't work properly on the ones spanning the dateline and I don't know how else to do it. Maybe there's some obvious trick I'm missing. (I haven't looked at the code, but it seems a bit strange to me that ST_Intersects works, but not ST_Intersection - so PostGIS can somehow figure out that they intersect, but can't figure out where?) By the development work do you mean ticket 1796 or something else? Either way, any estimate on when it might get done? :) Regards, Evan On 30/05/2012 11:35 PM, Paul Ramsey wrote: Evan, Beyond the cutting-up-of-large-things as Stephen suggests, there is the extra development work to optimize the edge searching routines with an internal index on the objects (and caching that index once built). Something for 2.1 :) P. On Mon, May 28, 2012 at 10:08 AM, Evan Martin postgre...@realityexists.net wrote: Hi all, I have a table of ~350 multi-polygons and ~185,000 points and I need to find which points are inside which multi-polygons. Some polygons are quite large and span the dateline, so I'm using geography ST_DWithin for this (with a tolerance of 100m). My initial query looks like this (simplified): SELECT ... FROM points, polygons WHERE ST_DWithin(point, real_area, 100) This works, but it takes about 90 minutes. I'm trying to make it faster by using ST_SimplifyPreserveTopology. That worked very nicely for my adjacent polygons problem [http://postgis.refractions.net/pipermail/postgis-users/2012-January/031992.html], because all polygons were modified in the same way, but this is trickier. Since I'm modifying the polygon, but not the point, the results are different. So I thought maybe I could do this in two phases: if the point is well within or well outside the polygon then take the result of the simplified check as correct, but if it's close to the border then check it properly, ie. SELECT ... FROM points, polygons WHERE ST_DWithin(point, simple_area, 2) AND (ST_Distance(point, simple_border) 2 OR ST_DWithin(point, real_area, 100)) simple_area = ST_SimplifyPreserveTopology(real_area::geometry, 0.01)::geography and simple_border = the exterior ring of simple_area. This takes about 18 minutes (a 5x improvement) and gives very similar results, but not quite the same. It falls down on polygons that have rhumblines along parallels, because they get turned into great circle lines. Eg. the original polyon may have a rhumbline approximated as (24 10,25 10,26 10,27 10), ST_SimplifyPreserveTopology does its job and simplifies it to (24 10,27 10) and then ST_DWithin on geography treats it as a great circle line, giving an incorrect result. I tried inserting extra points to unsimplify the rhumblines, but that itself is very slow and also quite a hack, because I can't really be sure which lines were supposed be rhumblines when looking at the simplified polygon. I feel like I'm so close and this is a very silly little problem - but it is a problem. Could anyone suggest a way to work around this? Or a different approach to the whole thing? Thanks, Evan ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Speeding up point-in-polygon search using ST_SimplifyPreserveTopology?
On Wed, May 30, 2012 at 9:47 AM, Evan Martin postgre...@realityexists.net wrote: Thanks, Paul, but do you have any suggestions on how I can cut up the polygons properly? As I posted, ST_Intersection doesn't work properly on the ones spanning the dateline and I don't know how else to do it. Maybe there's some obvious trick I'm missing. (I haven't looked at the code, but it seems a bit strange to me that ST_Intersects works, but not ST_Intersection - so PostGIS can somehow figure out that they intersect, but can't figure out where?) Intersects() is run on the spheroid, Intersection is a hack wrapper on a reprojection into planar and then back again. Did we talk about this already? Changing the hack to use gnomic instead of UTM might improve things. By the development work do you mean ticket 1796 or something else? Either way, any estimate on when it might get done? :) Yes, that's the ticket. It might get done for 2.1, or it might not. Money makes the world go 'round. All things being equal it will get done, since it's something that personally interests me, but things might not be equal if I'm paid for some other work instead. P. Regards, Evan On 30/05/2012 11:35 PM, Paul Ramsey wrote: Evan, Beyond the cutting-up-of-large-things as Stephen suggests, there is the extra development work to optimize the edge searching routines with an internal index on the objects (and caching that index once built). Something for 2.1 :) P. On Mon, May 28, 2012 at 10:08 AM, Evan Martin postgre...@realityexists.net wrote: Hi all, I have a table of ~350 multi-polygons and ~185,000 points and I need to find which points are inside which multi-polygons. Some polygons are quite large and span the dateline, so I'm using geography ST_DWithin for this (with a tolerance of 100m). My initial query looks like this (simplified): SELECT ... FROM points, polygons WHERE ST_DWithin(point, real_area, 100) This works, but it takes about 90 minutes. I'm trying to make it faster by using ST_SimplifyPreserveTopology. That worked very nicely for my adjacent polygons problem [http://postgis.refractions.net/pipermail/postgis-users/2012-January/031992.html], because all polygons were modified in the same way, but this is trickier. Since I'm modifying the polygon, but not the point, the results are different. So I thought maybe I could do this in two phases: if the point is well within or well outside the polygon then take the result of the simplified check as correct, but if it's close to the border then check it properly, ie. SELECT ... FROM points, polygons WHERE ST_DWithin(point, simple_area, 2) AND (ST_Distance(point, simple_border) 2 OR ST_DWithin(point, real_area, 100)) simple_area = ST_SimplifyPreserveTopology(real_area::geometry, 0.01)::geography and simple_border = the exterior ring of simple_area. This takes about 18 minutes (a 5x improvement) and gives very similar results, but not quite the same. It falls down on polygons that have rhumblines along parallels, because they get turned into great circle lines. Eg. the original polyon may have a rhumbline approximated as (24 10,25 10,26 10,27 10), ST_SimplifyPreserveTopology does its job and simplifies it to (24 10,27 10) and then ST_DWithin on geography treats it as a great circle line, giving an incorrect result. I tried inserting extra points to unsimplify the rhumblines, but that itself is very slow and also quite a hack, because I can't really be sure which lines were supposed be rhumblines when looking at the simplified polygon. I feel like I'm so close and this is a very silly little problem - but it is a problem. Could anyone suggest a way to work around this? Or a different approach to the whole thing? Thanks, Evan ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Speeding up point-in-polygon search using ST_SimplifyPreserveTopology?
Thanks, Stephen, but I haven't been able to figure out how to get this to work on geography. ST_Intersection on geography doesn't work properly across the dateline. For example, I tried manually tiling a simple polygon that goes across the dateline, like this: WITH data AS ( SELECT ST_GeomFromEWKT('SRID=4326;POLYGON((163 50,-176 55,-176 60,163 60,163 50))')::geography AS simple ), tiles AS ( SELECT ST_SetSRID('BOX(145 50,150 60)'::box2d, 4326) AS tile UNION ALL SELECT ST_SetSRID('BOX(150 50,155 60)'::box2d, 4326) UNION ALL SELECT ST_SetSRID('BOX(155 50,160 60)'::box2d, 4326) UNION ALL SELECT ST_SetSRID('BOX(160 50,165 60)'::box2d, 4326) UNION ALL SELECT ST_SetSRID('BOX(165 50,170 60)'::box2d, 4326) UNION ALL SELECT ST_SetSRID('BOX(170 50,175 60)'::box2d, 4326) UNION ALL SELECT ST_SetSRID('BOX(175 50,180 60)'::box2d, 4326) UNION ALL SELECT ST_SetSRID('BOX(-180 50,-175 60)'::box2d, 4326) UNION ALL SELECT ST_SetSRID('BOX(-175 50,-170 60)'::box2d, 4326) ) SELECT ST_AsText(ST_Intersection(simple, tile::geography)) AS tiled FROM data JOIN tiles ON ST_Intersects(simple, tile::geography) This returns: POLYGON((163 49.99887,160 50.0467770263591,160 59.99987,163 59.99987,163 49.99887)) GEOMETRYCOLLECTION EMPTY GEOMETRYCOLLECTION EMPTY GEOMETRYCOLLECTION EMPTY POLYGON((-175 54.9860851802267,-176 54.99957,-176 59.99987,-175 59.99987,-175 54.9860851802267)) I think ST_Intersects works OK on geography, but not ST_Intersection. Regards, Evan On 29/05/2012 2:21 AM, Stephen Woodbridge wrote: On 5/28/2012 10:08 AM, Evan Martin wrote: Hi all, I have a table of ~350 multi-polygons and ~185,000 points and I need to find which points are inside which multi-polygons. Some polygons are quite large and span the dateline, so I'm using geography ST_DWithin for this (with a tolerance of 100m). My initial query looks like this (simplified): SELECT ... FROM points, polygons WHERE ST_DWithin(point, real_area, 100) This works, but it takes about 90 minutes. I'm trying to make it faster by using ST_SimplifyPreserveTopology. That worked very nicely for my adjacent polygons problem [http://postgis.refractions.net/pipermail/postgis-users/2012-January/031992.html], because all polygons were modified in the same way, but this is trickier. Since I'm modifying the polygon, but not the point, the results are different. So I thought maybe I could do this in two phases: if the point is well within or well outside the polygon then take the result of the simplified check as correct, but if it's close to the border then check it properly, ie. SELECT ... FROM points, polygons WHERE ST_DWithin(point, simple_area, 2) AND (ST_Distance(point, simple_border) 2 OR ST_DWithin(point, real_area, 100)) simple_area = ST_SimplifyPreserveTopology(real_area::geometry, 0.01)::geography and simple_border = the exterior ring of simple_area. This takes about 18 minutes (a 5x improvement) and gives very similar results, but not quite the same. It falls down on polygons that have rhumblines along parallels, because they get turned into great circle lines. Eg. the original polyon may have a rhumbline approximated as (24 10,25 10,26 10,27 10), ST_SimplifyPreserveTopology does its job and simplifies it to (24 10,27 10) and then ST_DWithin on geography treats it as a great circle line, giving an incorrect result. I tried inserting extra points to unsimplify the rhumblines, but that itself is very slow and also quite a hack, because I can't really be sure which lines were supposed be rhumblines when looking at the simplified polygon. I feel like I'm so close and this is a very silly little problem - but it is a problem. Could anyone suggest a way to work around this? Or a different approach to the whole thing? I think the alternative approach that has been discussed on the list in the past, and should be in the archives, is to cut the multiple polygons into tiles with all the attributes of the original multipolygon and then to match the points to the tiles.. This works much faster for two basic reasons: 1. the number of points in the each tile is much less than the original because it only contains a fraction of the originals complex boundary but maintains all the detail. 2. since each tile is spatially smaller, you get better (faster) interactions between the tile index and the points. -Steve ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
[postgis-users] Speeding up point-in-polygon search using ST_SimplifyPreserveTopology?
Hi all, I have a table of ~350 multi-polygons and ~185,000 points and I need to find which points are inside which multi-polygons. Some polygons are quite large and span the dateline, so I'm using geography ST_DWithin for this (with a tolerance of 100m). My initial query looks like this (simplified): SELECT ... FROM points, polygons WHERE ST_DWithin(point, real_area, 100) This works, but it takes about 90 minutes. I'm trying to make it faster by using ST_SimplifyPreserveTopology. That worked very nicely for my adjacent polygons problem [http://postgis.refractions.net/pipermail/postgis-users/2012-January/031992.html], because all polygons were modified in the same way, but this is trickier. Since I'm modifying the polygon, but not the point, the results are different. So I thought maybe I could do this in two phases: if the point is well within or well outside the polygon then take the result of the simplified check as correct, but if it's close to the border then check it properly, ie. SELECT ... FROM points, polygons WHERE ST_DWithin(point, simple_area, 2) AND (ST_Distance(point, simple_border) 2 OR ST_DWithin(point, real_area, 100)) simple_area = ST_SimplifyPreserveTopology(real_area::geometry, 0.01)::geography and simple_border = the exterior ring of simple_area. This takes about 18 minutes (a 5x improvement) and gives very similar results, but not quite the same. It falls down on polygons that have rhumblines along parallels, because they get turned into great circle lines. Eg. the original polyon may have a rhumbline approximated as (24 10,25 10,26 10,27 10), ST_SimplifyPreserveTopology does its job and simplifies it to (24 10,27 10) and then ST_DWithin on geography treats it as a great circle line, giving an incorrect result. I tried inserting extra points to unsimplify the rhumblines, but that itself is very slow and also quite a hack, because I can't really be sure which lines were supposed be rhumblines when looking at the simplified polygon. I feel like I'm so close and this is a very silly little problem - but it is a problem. Could anyone suggest a way to work around this? Or a different approach to the whole thing? Thanks, Evan ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] Speeding up point-in-polygon search using ST_SimplifyPreserveTopology?
On 5/28/2012 10:08 AM, Evan Martin wrote: Hi all, I have a table of ~350 multi-polygons and ~185,000 points and I need to find which points are inside which multi-polygons. Some polygons are quite large and span the dateline, so I'm using geography ST_DWithin for this (with a tolerance of 100m). My initial query looks like this (simplified): SELECT ... FROM points, polygons WHERE ST_DWithin(point, real_area, 100) This works, but it takes about 90 minutes. I'm trying to make it faster by using ST_SimplifyPreserveTopology. That worked very nicely for my adjacent polygons problem [http://postgis.refractions.net/pipermail/postgis-users/2012-January/031992.html], because all polygons were modified in the same way, but this is trickier. Since I'm modifying the polygon, but not the point, the results are different. So I thought maybe I could do this in two phases: if the point is well within or well outside the polygon then take the result of the simplified check as correct, but if it's close to the border then check it properly, ie. SELECT ... FROM points, polygons WHERE ST_DWithin(point, simple_area, 2) AND (ST_Distance(point, simple_border) 2 OR ST_DWithin(point, real_area, 100)) simple_area = ST_SimplifyPreserveTopology(real_area::geometry, 0.01)::geography and simple_border = the exterior ring of simple_area. This takes about 18 minutes (a 5x improvement) and gives very similar results, but not quite the same. It falls down on polygons that have rhumblines along parallels, because they get turned into great circle lines. Eg. the original polyon may have a rhumbline approximated as (24 10,25 10,26 10,27 10), ST_SimplifyPreserveTopology does its job and simplifies it to (24 10,27 10) and then ST_DWithin on geography treats it as a great circle line, giving an incorrect result. I tried inserting extra points to unsimplify the rhumblines, but that itself is very slow and also quite a hack, because I can't really be sure which lines were supposed be rhumblines when looking at the simplified polygon. I feel like I'm so close and this is a very silly little problem - but it is a problem. Could anyone suggest a way to work around this? Or a different approach to the whole thing? I think the alternative approach that has been discussed on the list in the past, and should be in the archives, is to cut the multiple polygons into tiles with all the attributes of the original multipolygon and then to match the points to the tiles.. This works much faster for two basic reasons: 1. the number of points in the each tile is much less than the original because it only contains a fraction of the originals complex boundary but maintains all the detail. 2. since each tile is spatially smaller, you get better (faster) interactions between the tile index and the points. -Steve ___ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users