Glad you got it working.

Just a little note of warning ... I noticed you are using "UNION", not "UNION ALL". The difference is that the later will perform a simple concatenation of the two results sets. The former is kind of like a set union, returning results that are in the first or second query, removing duplicate rows. In your 8 second query, not a big deal, but in your 3 second query, you've include a geometry object. In determining record equivalence to filter duplicate rows, it is the = operator that is invoked against two geometry objects, which goes back to comparing bounding box equivalence, not geometry equivalence.

i.e.
SELECT 'POINT(0 0)'::geometry
UNION
SELECT 'POINT(0 0.0000001)'::geometry;
                  geometry
--------------------------------------------
 010100000000000000000000000000000000000000
(1 row)


SELECT 'POINT(0 0)'::geometry
UNION ALL
SELECT 'POINT(0 0.0000001)'::geometry;
                  geometry
--------------------------------------------
 010100000000000000000000000000000000000000
 0101000000000000000000000048AFBC9AF2D77A3E
(2 rows)

Not a big deal, but I just wanted to make sure you knew what was happening with your query. UNION ALL should also be faster as it's not trying to merge two results sets together.

Cheers,
Kevin


On 11/30/2010 2:34 PM, Brian Stempin wrote:

So, I decided to run with the following query:

SELECT osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND ST_Intersects(ST_StartPoint(t1.way), t2.way)

)
UNION
SELECT osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND ST_Intersects(ST_EndPoint(t1.way), t2.way)

)

In my dataset, this takes ~ 8 seconds to run. Being a bit of a performance junkee, I modified it to the following:

SELECT osm_id, ST_StartPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND t2.way ~ t1.way
   AND ST_Intersects(ST_StartPoint(t1.way), t2.way)

)
UNION
SELECT osm_id, ST_EndPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND t2.way ~ t1.way
   AND ST_Intersects(ST_EndPoint(t1.way), t2.way)

)

Note the addition of the "t2.way ~ t1.way" bit. "A ~ B" means "A's bounding box contains B's bounding box." Since in my case B is a point, I eliminate a lot of comparisons by only looking at shapes who's bounding box B is contained in. This reduced my query time from 8.x seconds to 3.4x seconds.

Thanks a ton, Kevin!

Brian
PS -- anyone else spot any other improvements that I can make? I plan on writing about this in a blog article as part of a larger piece later tonight/tomorrow.


_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

Reply via email to