Re: [postgis-users] Determining clusters of points

pcreso Tue, 07 Dec 2010 12:07:00 -0800

Hi Sébastien ,

One way I "clustered" points in the (distant) past for a zoom-layer mapping 
solution was to create multiple geometries, by reducing the number of decimal 
points in the X & Y coords and grouping by these locations. Simplistic & not 
especially statistical, but given a random point distribution you get an order 
of magnitude reduction in the number of positions of points at each step. With 
a non-random distribution, the reduction in number of these clusters is often 
greater.


I have also done a similar thing by creating grids of square cells of 
appropriate sizes, then assigning points to the cells using Postgis queries,  
which allows a regular clustered sampling of the points by grid cell.

If you are after a more statistical approach, I'd look at using R, and once you 
have your R clustering capability coded, perhaps look at PL/R to embed the 
clustering capability within your database.

What isn't clear from your question is whether the clustering is something done 
often, perhaps at various resolutions, or is done once, to assign points to 
clusters, then used as these groups from then on. This would have an impact on 
performance & the approach you might take.

Also note that with both Postgis & R, any use of multiple cores or hardware 
clusters will require some work, as these are not applications that parallel 
process particularly well, but you could perhaps develop code that assigns 
individual analyses of data clusters to a load balanced hardware cluster. 

If you have a clusterin algorithm, it may also be feaqsibl to implement it as a 
native user defined Postgis/Postgres function, but I think PL/R is an easier 
approach for embedded SQL clustering capability.


HTH,

  Brent Wood

--- On Wed, 12/8/10, Sébastien Lorion <s...@thestrangefactory.com> wrote:

From: Sébastien Lorion <s...@thestrangefactory.com>
Subject: Re: [postgis-users] Determining clusters of points
To: "PostGIS Users Discussion" <postgis-users@postgis.refractions.net>
Date: Wednesday, December 8, 2010, 8:03 AM

I like your idea very much, especially since something I did not say is that 
the points have a range attribute which determines how far they can interact 
with other points. So the circle buffer you talk about would have a diameter 
equals to the range of each point.


What would be fastest as the last step : bounding box, minimum bounding circle 
or minimum convex hull ? I am guessing the BB, but is the difference 
significant enough that one should be chosen over another ?


Sébastien

On Tue, Dec 7, 2010 at 13:42, Emilie Laffray <emilie.laff...@gmail.com> wrote:




On 7 December 2010 17:01, Sébastien Lorion <s...@thestrangefactory.com> wrote:



Hello,
I am trying to find an efficient way to find clusters of points as shown in the 
attached image. The only clustering criteria is the distance between the 
points. The dataset can be very large (millions of points) and point 
distribution is mostly clustered with some sparse points in the gaps.





I searched the net and this mailing list and found two promising solution 
paths: 
- use a statistical tools such as R with a density function 
(http://www.r-project.org)




- use a clustering algorithm like those explained here 
http://www.med.nyu.edu/biostatistics/people/Ilana%20Belitskaya-Levy/Courses/MAS/Handouts/clustering.pdf
 (agnes seems the most promising for my purposes)





I would like your advice to help me find which approach would be best suited 
with PostGIS (maybe there is even something already made that I can use?). 
Whatever solution I pick, it must be efficient and the workload must be able to 
be distributed on a cluster of commodity hardware.





I am new to GIS and this mailing list, so please excuse me if I am not using 
the right vocabulary.
Thank you very much!

Hello,



Some time ago I have worked on something similar, except that I was using 
circles instead of boxes which should not be a problem. I am just giving the 
logic as I don't have access to the code right now.

You can start by creating a buffer around each of your points of the distance 
you want. 
The next step is to create an UNION of all the buffers that intersect.
You get the list of points included in each of the resulting polygons and then 
you create either a bounding box around them or use a minimum bounding circle 
(Postgis 1.5 and above).




Emily Laffray


_______________________________________________

postgis-users mailing list

postgis-users@postgis.refractions.net

http://postgis.refractions.net/mailman/listinfo/postgis-users





-----Inline Attachment Follows-----

_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: [postgis-users] Determining clusters of points

Reply via email to