Glad you got it working.
Just a little note of warning ... I noticed you are using "UNION", not
"UNION ALL". The difference is that the later will perform a simple
concatenation of the two results sets. The former is kind of like a set
union, returning results that are in the first or second query, removing
duplicate rows. In your 8 second query, not a big deal, but in your 3
second query, you've include a geometry object. In determining record
equivalence to filter duplicate rows, it is the = operator that is
invoked against two geometry objects, which goes back to comparing
bounding box equivalence, not geometry equivalence.
i.e.
SELECT 'POINT(0 0)'::geometry
UNION
SELECT 'POINT(0 0.0000001)'::geometry;
geometry
--------------------------------------------
010100000000000000000000000000000000000000
(1 row)
SELECT 'POINT(0 0)'::geometry
UNION ALL
SELECT 'POINT(0 0.0000001)'::geometry;
geometry
--------------------------------------------
010100000000000000000000000000000000000000
0101000000000000000000000048AFBC9AF2D77A3E
(2 rows)
Not a big deal, but I just wanted to make sure you knew what was
happening with your query.
UNION ALL should also be faster as it's not trying to merge two results
sets together.
Cheers,
Kevin
On 11/30/2010 2:34 PM, Brian Stempin wrote:
So, I decided to run with the following query:
SELECT osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (
-- list all ids where the startpoint intersects something.
SELECT t1.osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
"OSMData".osm_mn_data_highway_20101129_101234 t2
WHERE t1.osm_id <> t2.osm_id
AND ST_Intersects(ST_StartPoint(t1.way), t2.way)
)
UNION
SELECT osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (
-- list all ids where the startpoint intersects something.
SELECT t1.osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
"OSMData".osm_mn_data_highway_20101129_101234 t2
WHERE t1.osm_id <> t2.osm_id
AND ST_Intersects(ST_EndPoint(t1.way), t2.way)
)
In my dataset, this takes ~ 8 seconds to run. Being a bit of a
performance junkee, I modified it to the following:
SELECT osm_id, ST_StartPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (
-- list all ids where the startpoint intersects something.
SELECT t1.osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
"OSMData".osm_mn_data_highway_20101129_101234 t2
WHERE t1.osm_id <> t2.osm_id
AND t2.way ~ t1.way
AND ST_Intersects(ST_StartPoint(t1.way), t2.way)
)
UNION
SELECT osm_id, ST_EndPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (
-- list all ids where the startpoint intersects something.
SELECT t1.osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
"OSMData".osm_mn_data_highway_20101129_101234 t2
WHERE t1.osm_id <> t2.osm_id
AND t2.way ~ t1.way
AND ST_Intersects(ST_EndPoint(t1.way), t2.way)
)
Note the addition of the "t2.way ~ t1.way" bit. "A ~ B" means "A's
bounding box contains B's bounding box." Since in my case B is a
point, I eliminate a lot of comparisons by only looking at shapes
who's bounding box B is contained in. This reduced my query time from
8.x seconds to 3.4x seconds.
Thanks a ton, Kevin!
Brian
PS -- anyone else spot any other improvements that I can make? I plan
on writing about this in a blog article as part of a larger piece
later tonight/tomorrow.
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users