Re: Canopy Clustering not scaling

Ted Dunning Sun, 02 May 2010 14:07:05 -0700

How about making the threshold adapt over time?

Another option is to keep track of all of the canopies so far and evict
any that have too few points and too large an average distance.  The points
emitted so far would still reference these canopies, but no new points
could be added to them.
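
A minimal sketch of the eviction idea, assuming each canopy keeps a running
point count and a running sum of point-to-center distances. The names
`Canopy`, `evict`, `min_points` and `max_avg_dist` are hypothetical and do
not come from the thread or from Mahout.

```python
from dataclasses import dataclass


@dataclass
class Canopy:
    center: tuple
    count: int = 0         # points assigned so far
    dist_sum: float = 0.0  # running sum of point-to-center distances

    def avg_dist(self) -> float:
        # Average distance of assigned points; 0.0 for an empty canopy.
        return self.dist_sum / self.count if self.count else 0.0


def evict(canopies, min_points, max_avg_dist):
    """Split canopies into (kept, evicted).

    A canopy is evicted only when it has both too few points and too
    large an average distance. Both thresholds are assumptions.
    """
    kept, evicted = [], []
    for c in canopies:
        if c.count < min_points and c.avg_dist() > max_avg_dist:
            evicted.append(c)
        else:
            kept.append(c)
    return kept, evicted
```

Evicted canopies are returned rather than discarded, so points already
emitted can still refer to them; only the `kept` list would be consulted
when assigning new points.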

The number of canopies should grow with the amount of data, but slowly.  Log
N or slower is probably about right.  Clever adjustment of t1 could enforce
this, and evicting early canopies that were accepted with a small
threshold could avoid problems with the transients of the adaptation.
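
One way the t1 adjustment might look, as a rough sketch only: whenever the
canopy count exceeds a budget of `scale * log(1 + n)` after n points, t1 is
multiplied by `growth`. The multiplicative rule and the parameters `scale`
and `growth` are assumptions made for illustration, not something proposed
in the thread.

```python
import math


def adaptive_canopies(points, t1, growth=1.5, scale=4.0):
    """Single pass over points; returns (centers, final_t1).

    A point farther than t1 from every existing center starts a new
    canopy. If that pushes the canopy count above scale * log(1 + n),
    t1 is raised so that later canopies are harder to create.
    """
    centers = []
    for n, p in enumerate(points, start=1):
        # Distance to the nearest center; infinite if there are none yet.
        d = min((math.dist(p, c) for c in centers), default=math.inf)
        if d > t1:
            centers.append(p)
            if len(centers) > scale * math.log(1 + n):
                t1 *= growth
    return centers, t1
```

This only slows the growth; it does not remove canopies that were created
while t1 was still small, which is where the eviction step would come in.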

On Sun, May 2, 2010 at 5:19 AM, Robin Anil <robin.a...@gmail.com> wrote:

> The algorithm is simple:
> For each point read into the mapper:
>           Find the canopy it is closest to (from the in-memory List<>) and
>           add it to that canopy.
>           Else, if the distance is greater than a threshold t1, create a
>           new canopy (in the in-memory List<>).
>
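
For reference, the quoted steps could be sketched roughly as follows. This
is an illustrative reading of the description, not the Mahout mapper; the
function name `assign_to_canopies` and the Euclidean distance are
assumptions.

```python
import math


def assign_to_canopies(points, t1):
    """Assign each point to its nearest canopy, or start a new one.

    canopies is an in-memory list of (center, members) pairs.
    """
    canopies = []
    for p in points:
        # Linear scan over every canopy held in memory.
        best = min(canopies, key=lambda c: math.dist(p, c[0]), default=None)
        if best is None or math.dist(p, best[0]) > t1:
            canopies.append((p, [p]))  # too far from all canopies
        else:
            best[1].append(p)
    return canopies
```

Each point is compared against every canopy in the list, so the per-point
cost in this sketch grows with the number of canopies.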
