Thanks Pierre and Dylan for your helpful replies. FYI my dataset is 90K records describing events that occurred over 14 years over an area of 50^2m.
The suggestion of using R's PAM and CLARA functions for clustering lead me to the 'dbscan' algorithm which may well be a better choice for my needs as one doesn't need to know in advance how many clusters require identification. "Clusters require a minimum no of points (MinPts) within a maximum distance (eps) around one of its members (the seed). Any point within eps around any point which satisfies the seed condition is a cluster member (recursively). Some points may not belong to any clusters." (http://bm2.genes.nig.ac.jp/RGM2/R_current/library/fpc/man/dbscan.html). Another approach I'm considering is to discretize 2D space and time (three dimensions) into a cellular matrix, associate each event with a cell and amalgamate all records that have the same cell reference. This would of course fail to cluster 'close' events that happen to fall either side of a cell divide but _might_ be easy to implement using say PL/SQL. For reference it appears that a clustering function for PostGIS has already been proposed: http://opengeo.org/products/coredevelopment/postgis/bi-utilities/ Thanks again for pointing me towards PAM/CLARA. Cheers, Will On 15 July 2010 21:33, Pierre Racine <pierre.rac...@sbf.ulaval.ca> wrote: > I would suggest you ask your question to the r-sig-geo mailing list. You will > get a R solution. You can then get your PostGIS table from R using the > gdal/ogr package or use PL/R in PostgreSQL. > > Pierre > >>-----Original Message----- >>From: Dylan Beaudette [mailto:debeaude...@ucdavis.edu] >>Sent: 15 juillet 2010 16:02 >>To: Pierre Racine >>Cc: PostGIS Users Discussion; w...@thearete.co.uk >>Subject: Re: [postgis-users] 'Clustering' records in space and time >> >>On Thursday 15 July 2010, Pierre Racine wrote: >>> What should happen when event A is at a distance n minus epsilon from B, B >>> is at a distance n-epsilon from C but A is at a distance 2*n-epsilon from >>> C? Should A and C be in the same cluster with B? >>> >>> Pierre >> >>Interesting. The choice of clustering algorithm would need to be based on the >>questions the OP was trying to answer. Without much thought (warning!) I >>pictured a 3D space (x, y, time) partitioned around medoids (PAM algorithm) >>of data. >> >>In this very simple case chunks of data in (x, y, time) space would be >>collected based on their proximity. For this to work, space and time >>coordinates would need to be standardized accordingly... For x and y, I think >>that subtracting the mean and dividing by the standard deviation should do. I >>am not sure about the standardization of time... maybe the same thing, but >>applied to the number of seconds | minutes | hours | days elapsed since the >>start of the experiment? >> >>Dylan >> >> >>> >-----Original Message----- >>> >From: postgis-users-boun...@postgis.refractions.net [mailto:postgis-users- >>> >boun...@postgis.refractions.net] On Behalf Of Dylan Beaudette >>> >Sent: 15 juillet 2010 15:10 >>> >To: w...@thearete.co.uk; PostGIS Users Discussion >>> >Subject: Re: [postgis-users] 'Clustering' records in space and time >>> > >>> >Hi, >>> > >>> >Can you give us some hints about your data? >>> > >>> >1. how many records >>> >2. temporal domain (i.e. 1 year?) >>> >3. spatial domain (local, regional, continental?) >>> > >>> >If you don't have too much data, you may be able to standardize them, and >>> >apply an algorithm like PAM, or CLARA (see cluster package in R). >>> > >>> >Cheers, >>> >Dylan >>> > >>> >On Thursday 15 July 2010, William Furnass wrote: >>> >> I have a PostGIS table of records describing events therefore the >>> >> table has a timestamp attribute. I wish to replace 'clusters' of >>> >> events that occur within a m-hour window and a spatial radius of n >>> >> with single events which have the mean timestamp and central position >>> >> of the cluster. I understand that I can quantize my data spatially >>> >> using the St_SnapToGrid function but using this function alone I lose >>> >> some of the distinct events that occurred at the same point in space >>> >> but at very different times (it's my understanding that St_SnapToGrid >>> >> only allows one point to be stored at each node in the grid). Also, I >>> >> am unsure as to how I could use St_SnapToGrid in such a way so as not >>> >> to relocate points that are unique within the aforementioned spatial >>> >> and temporal window boundaries. >>> >> >>> >> Has anyone any suggestions as to how this can be achieved >>> >> programmatically using SQL (rather than a graphical tool)? Should I >>> >> perhaps be looking to use R to spatially and temporally cluster my >>> >> data? Apologies if the description of my problem isn't particularly >>> >> clear; it's been a long day:) >>> >> _______________________________________________ >>> >> postgis-users mailing list >>> >> postgis-users@postgis.refractions.net >>> >> http://postgis.refractions.net/mailman/listinfo/postgis-users >>> > >>> >-- >>> >Dylan Beaudette >>> >Soil Resource Laboratory >>> >http://casoilresource.lawr.ucdavis.edu/ >>> >University of California at Davis >>> >530.754.7341 >>> >_______________________________________________ >>> >postgis-users mailing list >>> >postgis-users@postgis.refractions.net >>> >http://postgis.refractions.net/mailman/listinfo/postgis-users >> >> >> >>-- >>Dylan Beaudette >>Soil Resource Laboratory >>http://casoilresource.lawr.ucdavis.edu/ >>University of California at Davis >>530.754.7341 > _______________________________________________ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users