Re: [postgis-users] ST_DWithin on 2 columns which are the same

2011-03-01 Thread Ben Madin
Robert,

On 01/03/2011, at 4:47 PM, robertvc wrote:

> I have been trying, with no luck, to implement a query to return me all the
> pairs of rows that are within a certain range of each other. I searched the
> forum before trying to find a solution but I haven't been able to. Sorry if
> there is a solution to this problem already that I didn't see. 

I think you need to search on nearest neighbour.

> Suppose you have a 2 column table cars with an integer ID, and a geography
> Location representing the position of the car globally. I want to write a
> query that will return me pairs of IDs of cars that are within say 1km of
> each other. Having looked at the functions available ST_DWithin seems the
> obvious choice but I haven't been able to actually use it for what I want. 
> 
> I've started out by simply testing if it matches each car as being in range
> with itself : 
> 
> SELECT * FROM cars WHERE ST_DWithin(location, location, 1);
> 
> This returned all the entries from the cars table as expected (given that no
> cars where actually within a meter of each other).

This doesn't sound like my interpretation - it returned all the entries because 
each car's location was within 1 metre of itself (unsurprisingly). 

> I then tried to find all
> the cars that are within a km of each other but don't have the same ID (to
> avoid matching a car with itself). Because of this extra constraint I need
> to somehow treat the location columns individually so I've tried the
> following: 
> 
> SELECT c1.id, c2.id FROM cars AS c1, cars AS c2 WHERE
> ST_DWithin(c1.location, c2.location, 1000) AND c1.id != c2.id; 
> 
> But this query never actually finishes computing (the number of entries in
> my cars table is around 30k and after 2 hours of the query being executed I
> still didn't get back a result). I would greatly appreciate any help in
> computing this query as well as any tips on performance. I should also
> probably mention that I did an indexing on the location column as suggested
> in the documentation: 
> 
> CREATE INDEX cars_gix ON cars USING GIST (location);

Did you also VACUUM ANALYZE after creating the index?

Performance-wise, using geometry instead of geography might help. You could try 
EXPLAIN to see where the slow point in the query is.

How widespread are the cars - are you talking about 30 000 cars that might all 
be within 1 km of each other?

Given that the direction of the relation doesn't matter maybe use an outer join 
something like 

SELECT c1.id, c2.id 
FROM cars c1
LEFT OUTER JOIN cars c2 ON ST_DWithin(c1.location, c2.location, 1000)
AND c2 IS NOT NULL AND c1.id <> c2.id;

(I have no idea if that would work!)

cheers

Ben


___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


[postgis-users] ST_DWithin on 2 columns which are the same

2011-03-01 Thread robertvc

Hi, 

I have been trying, with no luck, to implement a query to return me all the
pairs of rows that are within a certain range of each other. I searched the
forum before trying to find a solution but I haven't been able to. Sorry if
there is a solution to this problem already that I didn't see. 

Suppose you have a 2 column table cars with an integer ID, and a geography
Location representing the position of the car globally. I want to write a
query that will return me pairs of IDs of cars that are within say 1km of
each other. Having looked at the functions available ST_DWithin seems the
obvious choice but I haven't been able to actually use it for what I want. 

I've started out by simply testing if it matches each car as being in range
with itself : 

SELECT * FROM cars WHERE ST_DWithin(location, location, 1);

This returned all the entries from the cars table as expected (given that no
cars where actually within a meter of each other). I then tried to find all
the cars that are within a km of each other but don't have the same ID (to
avoid matching a car with itself). Because of this extra constraint I need
to somehow treat the location columns individually so I've tried the
following: 

 SELECT c1.id, c2.id FROM cars AS c1, cars AS c2 WHERE
ST_DWithin(c1.location, c2.location, 1000) AND c1.id != c2.id; 

But this query never actually finishes computing (the number of entries in
my cars table is around 30k and after 2 hours of the query being executed I
still didn't get back a result). I would greatly appreciate any help in
computing this query as well as any tips on performance. I should also
probably mention that I did an indexing on the location column as suggested
in the documentation: 

CREATE INDEX cars_gix ON cars USING GIST (location);

Thank you in advance, 

Robert
-- 
View this message in context: 
http://old.nabble.com/ST_DWithin-on-2-columns-which-are-the-same-tp31038958p31038958.html
Sent from the PostGIS - User mailing list archive at Nabble.com.

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users