Your welcom, I hope it suits you ! (It still could be improved )
Cheers, Rémi-C 2013/11/21 Lee Hachadoorian <lee.hachadooria...@gmail.com> > And Brent & Rémi, thanks for your suggestions. --Lee > > > On Thu, Nov 21, 2013 at 4:40 PM, Lee Hachadoorian < > lee.hachadooria...@gmail.com> wrote: > >> This is what I came up with: >> >> SELECT * FROM table WHERE gid IN ( >> SELECT DISTINCT unnest(array[a.gid, b.gid]) AS gid >> FROM table a JOIN table b >> ON (ST_Intersects(a.geom, b.geom) AND a.gid < b.gid) >> ) >> >> I've posted my original attempts, your suggestions, and the final version >> to my blog (with copious commentary) at >> http://geospatial.commons.gc.cuny.edu/2013/11/21/finding-islands-or-the-converse/ >> >> Best, >> --Lee >> >> >> >> On Thu, Nov 21, 2013 at 7:28 AM, Rémi Cura <remi.c...@gmail.com> wrote: >> >>> Hey , >>> Oops, >>> vocaulary error >>> "because it's making (2 million)^2 comparisons." -> it's cartesian >>> product. >>> >>> basically you want to compare every polygon to every other, this is a >>> cartesian product, and this is what takes time. >>> Choosing wel the function used (touches or intersects or ...) won't >>> change much computing time as long as it uses index. >>> In the same way, the logic you do after is not what takes time. >>> >>> Maybe you should not use the NOT IN syntaxe, but instead a where, or >>> better, an "except", this should work much better >>> http://www.postgresql.org/docs/9.3/static/queries-union.html >>> >>> >>> Cheers, >>> >>> Rémi-C >>> >>> >>> >>> 2013/11/20 Lee Hachadoorian <lee.hachadooria...@gmail.com> >>> >>>> Thank you for these suggestions. I haven't replied yet because I am >>>> testing them, and the queries take an hour or more to run on the 2 million >>>> plus table. I will report back with some results. >>>> >>>> >>>> On Wed, Nov 20, 2013 at 4:46 AM, Rémi Cura <remi.c...@gmail.com> wrote: >>>> >>>>> >>>>> Maybe you can try the stupidest way to go, >>>>> anyway you have to do a inner product because you have to consider >>>>> each pair of polygons. >>>>> >>>> >>>> You don't have to do inner product. `a LEFT JOIN b ... WHERE b.gid IS >>>> NULL` is a typical unmatched query. By doing self join ON ST_Touches(), the >>>> polygons with no neighbors are returned because the LEFT JOIN returns all >>>> records from a, while ST_Touches() produces no match (because a polygon >>>> doesn't touch itself) so b.gid IS NULL. >>>> >>>> >>>>> CREATE TABLE result AS >>>>> SELECT DISTINCT ON (a.gid) a.* >>>>> FROM table AS a , table AS b >>>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE >>>>> AND a.gid != b.gid >>>>> >>>> >>>> Just to be clear, this produces the polygons *with* neighbors, which is >>>> why I use this as a subselect with gid NOT IN (...) >>>> >>>> Still testing, >>>> --Lee >>>> >>>> >>>> >>>> >>>>> Now you can improve, because in the previous query you will compute >>>>> (geom1,geom2) and (geom2,geom1), which is duplicate >>>>> so you can try creating an index on gid , and >>>>> CREATE TABLE result2 AS >>>>> WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS >>>>> bgid, b.geom AS bgeom >>>>> FROM table AS a , table AS b >>>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE >>>>> AND a.gid < b.gid) >>>>> SELECT agid AS gid, ageom AS geom >>>>> FROM direct_comp >>>>> UNION >>>>> SELECT bgid AS gid, bgeom AS geom >>>>> FROM direct_comp >>>>> >>>>> >>>>> >>>>> >>>>> Now if you want to go really big, this will be difficult. >>>>> Assuming your polygons bboxes have a homogenous size and/or are not >>>>> too big reguarding the total area . >>>>> The idea is to group polygons into same cells, so when you try to find >>>>> which polygons intersects, instead of comparing a polygon to each other >>>>> polygon, you compare a polygon to each other polygon in the same cell. >>>>> >>>>> >>>>> create a grid of cells (big enough or you will have lot's of cells) >>>>> index the table >>>>> >>>>> for each cell, get polygons intersecting it. This defines groups of >>>>> polygons in the same cell, defined by a new id "group_id" >>>>> index "group_id" >>>>> >>>>> Now you will have several options depending on your hardware/data >>>>> 1) process a cell at a time if limited hardware, adding "WHERE >>>>> group_id=XX" >>>>> >>>>> 2) do a massiv inner join on the group_id and compute all >>>>> >>>>> In fact you could avoid to create explicitly the cells, but it will be >>>>> more complicated. >>>>> >>>>> Cheers, >>>>> Rémi-C >>>>> >>>>> >>>>> 2013/11/20 Brent Wood <pcr...@pcreso.com> >>>>> >>>>>> I figure you have spatially indexed the polygons already? >>>>>> >>>>>> Any way of pre-categorising your polygons - binning them in some way >>>>>> that allows a non spatial test in the where clause to replace the spatial >>>>>> test... >>>>>> >>>>>> eg: a boolean attribute set to indicate if a feature has any part <= >>>>>> a particular x value or not. >>>>>> >>>>>> run this against the 2m features once to populate it, and assuming >>>>>> you get a 50/50 split of T/F features, your 2m^2 query can instead >>>>>> include >>>>>> a "where a.bool = b.bool", as we already know that if the boolean flag is >>>>>> different, they cannot touch, so your query should involve a spatial test >>>>>> only over 1m^2 instead... if you do this on both x & y, the boolean >>>>>> filter >>>>>> will replace even more spatial calculations & drop them to 500,000^2 >>>>>> tests... >>>>>> >>>>>> If you can pre-classify features so that non-spatial tests can reduce >>>>>> the spatial ones (esp for features with lots of vertices) in a where >>>>>> clause, such queries do run much faster, but you have the trade-off of >>>>>> the >>>>>> time taken to carry out the classification... which is only >>>>>> n*no_classes/2, >>>>>> generally much faster than n^n >>>>>> >>>>>> Hope this makes sense... >>>>>> >>>>>> Brent Wood >>>>>> ------------------------------ >>>>>> *From:* Lee Hachadoorian <lee.hachadooria...@gmail.com> >>>>>> *To:* PostGIS Users <postgis-us...@postgis.refractions.net> >>>>>> *Sent:* Wednesday, November 20, 2013 8:52 PM >>>>>> *Subject:* [postgis-users] Finding Islands >>>>>> >>>>>> I am trying to find "islands", polygons in a (multi)polygon layer >>>>>> which >>>>>> are not connected to any other polygons in the same layer. What I >>>>>> came >>>>>> up with runs in a couple of seconds on a layer with ~1000 geometries, >>>>>> and a couple of minutes on a layer with ~23,000 geometries, but I >>>>>> want >>>>>> to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time, >>>>>> presumably because it's making (2 million)^2 comparisons. >>>>>> >>>>>> What I came up with is: >>>>>> >>>>>> SELECT a.* >>>>>> FROM table a LEFT JOIN table b >>>>>> ON (ST_Touches(a.geom, b.geom)) >>>>>> WHERE b.gid is null >>>>>> >>>>>> or >>>>>> >>>>>> SELECT * >>>>>> FROM table >>>>>> WHERE gid NOT IN ( >>>>>> SELECT DISTINCT a.gid >>>>>> FROM table a JOIN table b >>>>>> ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid) >>>>>> ) >>>>>> >>>>>> The first variant raises "NOTICE: geometry_gist_joinsel called with >>>>>> incorrect join type". So I thought I could improve performance with >>>>>> an >>>>>> INNER JOIN instead of an OUTER JOIN, and came up with the second >>>>>> variant, and it does seem to perform somewhat better. Any suggestions >>>>>> for how to speed up either approach? >>>>>> >>>>>> Best, >>>>>> --Lee >>>>>> >>>>>> -- >>>>>> Lee Hachadoorian >>>>>> Assistant Professor in Geography, Dartmouth College >>>>>> http://freecity.commons.gc.cuny.edu >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> postgis-users mailing list >>>>>> postgis-users@lists.osgeo.org >>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> postgis-users mailing list >>>>>> postgis-users@lists.osgeo.org >>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> postgis-users mailing list >>>>> postgis-users@lists.osgeo.org >>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>>>> >>>> >>>> >>>> >>>> -- >>>> Lee Hachadoorian >>>> Asst Professor of Geography, Dartmouth College >>>> http://freecity.commons.gc.cuny.edu/ >>>> >>>> _______________________________________________ >>>> postgis-users mailing list >>>> postgis-users@lists.osgeo.org >>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>>> >>> >>> >>> _______________________________________________ >>> postgis-users mailing list >>> postgis-users@lists.osgeo.org >>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>> >> >> >> >> -- >> Lee Hachadoorian >> Asst Professor of Geography, Dartmouth College >> http://freecity.commons.gc.cuny.edu/ >> > > > > -- > Lee Hachadoorian > Asst Professor of Geography, Dartmouth College > http://freecity.commons.gc.cuny.edu/ > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >
_______________________________________________ postgis-users mailing list postgis-users@lists.osgeo.org http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users