How about making the threshold adapt over time? Another option is to keep a count of all of the canopies so far and evict any which have too few points with too large average distance. The points emitted so far would still reference these canopies, but we wouldn't be able to add new points to these canopies.
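
To make the eviction idea concrete, here is a minimal sketch. It assumes each canopy keeps a running point count and a running sum of point-to-center distances. The names `Canopy`, `accepting`, `evict_weak`, `min_points` and `max_mean_distance` are made up for illustration and are not taken from any existing code. An evicted canopy is only marked as closed, so points already emitted can still reference it by index.

```python
from dataclasses import dataclass


@dataclass
class Canopy:
    center: tuple
    count: int = 0                # points assigned so far
    total_distance: float = 0.0   # sum of point-to-center distances
    accepting: bool = True        # False once evicted; the index stays valid

    def add(self, distance):
        """Record one more point at the given distance from the center."""
        self.count += 1
        self.total_distance += distance

    @property
    def mean_distance(self):
        return self.total_distance / self.count if self.count else 0.0


def evict_weak(canopies, min_points, max_mean_distance):
    """Close canopies with too few points and too large an average distance.

    Returns the indices of the canopies that were closed by this call.
    """
    evicted = []
    for index, canopy in enumerate(canopies):
        too_small = canopy.count < min_points
        too_loose = canopy.mean_distance > max_mean_distance
        if canopy.accepting and too_small and too_loose:
            canopy.accepting = False
            evicted.append(index)
    return evicted
```

Both cutoffs are placeholders here; how to choose them, and how often to run the eviction pass, is left open.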
The number of canopies should grow with the amount of data, but slowly. Log N or slower is probably about right. Clever adjustment of t1 could enforce this and eviction of early canopies that were accepted with a small threshold could avoid problems with the transients of the adaptation.

On Sun, May 2, 2010 at 5:19 AM, Robin Anil <robin.a...@gmail.com> wrote:

> Algorithm is simple
> For each point read into the mapper.
>     Find the canopy it is closest to(from memory List<>) and add it
> to the canopy.
>     Else if the distance is greater than a threshold t1 then create a
> new canopy(into memory List<>)
>
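
A rough sketch of the quoted mapper loop with an adaptive t1 might look like the following. The adaptation rule is an assumption made for illustration: whenever the canopy count exceeds `budget_scale * log(N + 1)`, t1 is multiplied by `growth`. The names `assign_points`, `growth` and `budget_scale`, and the choice of Euclidean distance, are hypothetical; nothing in this thread fixes them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_points(points, t1, growth=1.5, budget_scale=4.0):
    """Stream points into canopies, raising t1 when the canopy count
    outgrows roughly budget_scale * log(N) for N points seen so far."""
    centers = []      # canopy centers held in memory, as in the quoted mapper
    assignments = []  # canopy index emitted for each point
    for n, point in enumerate(points, start=1):
        nearest, best = None, math.inf
        for index, center in enumerate(centers):
            d = euclidean(point, center)
            if d < best:
                nearest, best = index, d
        if nearest is None or best > t1:
            # Too far from every existing canopy: start a new one here.
            centers.append(point)
            nearest = len(centers) - 1
            # Assumed policy: widen t1 once the canopy budget is exceeded.
            if len(centers) > budget_scale * math.log(n + 1):
                t1 *= growth
        assignments.append(nearest)
    return centers, assignments, t1
```

Canopies created while t1 was still small are the ones the eviction step would target, since they were accepted under a tighter threshold than later ones.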