Hi Will,
Yes, DBSCAN is a much better choice for what you want to do. However,
how to include the temporal element becomes something of an issue with
DBSCAN or PAM/CLARA. Then there is the issue of selecting an epsilon
radius and MinPts for the DBSCAN algorithm. At this point, given the
nature of your questions, you are likely to find the R-sig-Geo mailing
list a better choice than the PostGIS-User list. Here is the link to
subscribe to that list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Dan
On 07/16/2010 09:59 AM, William Furnass wrote:
Thanks Pierre and Dylan for your helpful replies. FYI my dataset is
90K records describing events that occurred over 14 years over an area
of 50^2m.
The suggestion of using R's PAM and CLARA functions for clustering
led me to the 'dbscan' algorithm, which may well be a better choice
for my needs as one doesn't need to know in advance how many clusters
require identification. "Clusters require a minimum no of points
(MinPts) within a maximum distance (eps) around one of its members
(the seed). Any point within eps around any point which satisfies the
seed condition is a cluster member (recursively). Some points may not
belong to any clusters."
(http://bm2.genes.nig.ac.jp/RGM2/R_current/library/fpc/man/dbscan.html).
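A minimal sketch of the rule quoted above, in plain Python rather than R.
It is not the fpc implementation. The names dbscan, region_query,
eps_space and eps_time are hypothetical, and splitting eps into a separate
spatial radius and time window is an assumption made here to bring in the
temporal element. MinPts is taken to count the seed point itself. The
neighbour search is brute force, so it is likely too slow for 90K records
without a spatial index.

```python
from math import hypot


def region_query(points, i, eps_space, eps_time):
    """Indices of all points within eps_space and eps_time of point i (i included)."""
    x, y, t = points[i]
    return [
        j
        for j, (px, py, pt) in enumerate(points)
        if hypot(px - x, py - y) <= eps_space and abs(pt - t) <= eps_time
    ]


def dbscan(points, eps_space, eps_time, min_pts):
    """Return one label per (x, y, t) point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps_space, eps_time)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i satisfies the seed condition
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps_space, eps_time)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a seed, so keep growing
    return labels


# Made-up events: (x in metres, y in metres, hours since the first record).
events = [(0, 0, 0.0), (1, 0, 0.5), (0, 1, 1.0), (0, 0, 100.0), (50, 50, 0.2)]
print(dbscan(events, eps_space=2, eps_time=2, min_pts=3))  # [0, 0, 0, -1, -1]
```

The fourth event shares a location with the first but happened 100 hours
later, so it is left unclustered, as is the distant fifth event.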
Another approach I'm considering is to discretize 2D space and time
(three dimensions) into a cellular matrix, associate each event with a
cell and amalgamate all records that have the same cell reference.
This would of course fail to cluster 'close' events that happen to
fall either side of a cell divide but _might_ be easy to implement
using say PL/SQL.
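The cell approach above can be sketched as follows, in Python rather than
PL/SQL. The function name bin_events and the parameters cell_size and
window are hypothetical, and each cell is amalgamated into the mean of its
members, which is one possible choice among several.

```python
from collections import defaultdict
from math import floor


def bin_events(events, cell_size, window):
    """Group (x, y, t) events by space-time cell; return one mean event per cell."""
    cells = defaultdict(list)
    for x, y, t in events:
        # Cell reference: integer indices along x, y and time.
        key = (floor(x / cell_size), floor(y / cell_size), floor(t / window))
        cells[key].append((x, y, t))
    merged = {}
    for key, members in cells.items():
        n = len(members)
        merged[key] = tuple(sum(m[k] for m in members) / n for k in range(3))
    return merged


# Made-up events: (x, y, hours). Cells are 10 units wide and 2 hours long.
events = [(1.0, 1.0, 0.5), (3.0, 1.0, 1.5), (9.0, 1.0, 1.0), (11.0, 1.0, 1.0)]
merged = bin_events(events, cell_size=10, window=2)
for key, mean_event in sorted(merged.items()):
    print(key, mean_event)
```

The last two events are only 2 units apart but fall either side of the
cell divide at x = 10, so they end up in different cells. This is the
failure mode described above.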
For reference it appears that a clustering function for PostGIS has
already been proposed:
http://opengeo.org/products/coredevelopment/postgis/bi-utilities/
Thanks again for pointing me towards PAM/CLARA.
Cheers,
Will
On 15 July 2010 21:33, Pierre Racine <pierre.rac...@sbf.ulaval.ca> wrote:
I would suggest you ask your question on the r-sig-geo mailing list. You will
get an R solution. You can then get your PostGIS table from R using the gdal/ogr
package or use PL/R in PostgreSQL.
Pierre
-----Original Message-----
From: Dylan Beaudette [mailto:debeaude...@ucdavis.edu]
Sent: 15 July 2010 16:02
To: Pierre Racine
Cc: PostGIS Users Discussion; w...@thearete.co.uk
Subject: Re: [postgis-users] 'Clustering' records in space and time
On Thursday 15 July 2010, Pierre Racine wrote:
What should happen when event A is at a distance n minus epsilon from B, B
is at a distance n-epsilon from C but A is at a distance 2*n-epsilon from
C? Should A and C be in the same cluster with B?
Pierre
Interesting. The choice of clustering algorithm would need to be based on the
questions the OP was trying to answer. Without much thought (warning!) I
pictured the data in a 3D space (x, y, time), partitioned around medoids
(the PAM algorithm).
In this very simple case chunks of data in (x, y, time) space would be
collected based on their proximity. For this to work, space and time
coordinates would need to be standardized accordingly... For x and y, I think
that subtracting the mean and dividing by the standard deviation should do. I
am not sure about the standardization of time... maybe the same thing, but
applied to the number of seconds | minutes | hours | days elapsed since the
start of the experiment?
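The standardization described above can be sketched as below, in Python
rather than R. The function names are hypothetical, time is assumed to be
already expressed as hours elapsed since the first record, and the
population standard deviation is used (R's scale() uses the sample
standard deviation, which gives slightly different values).

```python
from statistics import mean, pstdev


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - m) / s for v in values]


def standardize_events(events):
    """Standardize x, y and elapsed time independently for (x, y, t) events."""
    xs, ys, ts = zip(*events)
    return list(zip(standardize(xs), standardize(ys), standardize(ts)))


# Made-up events: (x in metres, y in metres, hours elapsed).
events = [(0, 0, 0), (10, 20, 4)]
print(standardize_events(events))  # [(-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)]
```

After this step a unit of distance and a unit of time carry equal weight,
which may or may not suit the question being asked. Scaling time by a
chosen factor instead would be another option.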
Dylan
-----Original Message-----
From: postgis-users-boun...@postgis.refractions.net [mailto:postgis-users-
boun...@postgis.refractions.net] On Behalf Of Dylan Beaudette
Sent: 15 July 2010 15:10
To: w...@thearete.co.uk; PostGIS Users Discussion
Subject: Re: [postgis-users] 'Clustering' records in space and time
Hi,
Can you give us some hints about your data?
1. how many records
2. temporal domain (i.e. 1 year?)
3. spatial domain (local, regional, continental?)
If you don't have too much data, you may be able to standardize them, and
apply an algorithm like PAM, or CLARA (see cluster package in R).
Cheers,
Dylan
On Thursday 15 July 2010, William Furnass wrote:
I have a PostGIS table of records describing events; the table
therefore has a timestamp attribute. I wish to replace 'clusters' of
events that occur within an m-hour window and a spatial radius of n
with single events which have the mean timestamp and central position
of the cluster. I understand that I can quantize my data spatially
using the ST_SnapToGrid function, but using this function alone I lose
some of the distinct events that occurred at the same point in space
but at very different times (it's my understanding that ST_SnapToGrid
only allows one point to be stored at each node in the grid). Also, I
am unsure as to how I could use ST_SnapToGrid in such a way as not
to relocate points that are unique within the aforementioned spatial
and temporal window boundaries.
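A minimal sketch of the replacement step only, in Python rather than SQL.
It assumes some earlier clustering step has already given each event an
integer label, with -1 meaning "not in any cluster". The function name
collapse_clusters and that label convention are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def collapse_clusters(events, labels):
    """Replace each labelled cluster by one (x, y, timestamp) mean event.

    Events labelled -1 are kept unchanged, so unique points are not relocated.
    """
    groups = defaultdict(list)
    result = []
    for event, label in zip(events, labels):
        if label == -1:
            result.append(event)
        else:
            groups[label].append(event)
    for members in groups.values():
        n = len(members)
        # Average timestamps as offsets from the earliest member.
        t0 = min(t for _, _, t in members)
        mean_offset = sum((t - t0 for _, _, t in members), timedelta()) / n
        result.append((
            sum(x for x, _, _ in members) / n,
            sum(y for _, y, _ in members) / n,
            t0 + mean_offset,
        ))
    return result


# Made-up events and labels: the first two form one cluster, the third is alone.
events = [
    (0.0, 0.0, datetime(2010, 7, 15, 10, 0)),
    (2.0, 0.0, datetime(2010, 7, 15, 12, 0)),
    (100.0, 100.0, datetime(2010, 7, 16, 9, 0)),
]
collapsed = collapse_clusters(events, labels=[0, 0, -1])
for event in collapsed:
    print(event)
```

The two clustered events become a single event at (1.0, 0.0) with a
timestamp of 11:00 on 15 July, and the third event is passed through as is.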
Has anyone any suggestions as to how this can be achieved
programmatically using SQL (rather than a graphical tool)? Should I
perhaps be looking to use R to spatially and temporally cluster my
data? Apologies if the description of my problem isn't particularly
clear; it's been a long day:)
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341