2008/10/2 David Bolme <[EMAIL PROTECTED]>:
> It may be useful to have an interface that handles both cases:
> similarity and dissimilarity.  Often I have seen "Nearest Neighbor"
> algorithms that look for maximum similarity instead of minimum
> distance.  In my field (biometrics) we often deal with very
> specialized distance or similarity measures.  I would like to see
> support for user defined distance and similarity functions.  It should
> be easy to implement by passing a function object to the KNN class.  I
> am not sure if kd-trees or other fast algorithms are compatible with
> similarities or non-Euclidean norms; however, I would be willing to
> implement an exhaustive search KNN that would support user defined
> functions.
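
Passing a function object in is certainly easy for an exhaustive search.
A minimal sketch of the sort of thing you describe (the class name and
interface here are purely illustrative, not an existing scipy API) might
look like:

    import numpy as np

    class BruteForceKNN(object):
        """Exhaustive k-nearest-neighbour search with a user-supplied
        distance or similarity function (illustrative interface only)."""

        def __init__(self, data, measure, similarity=False):
            # data: (n, d) array of points
            # measure: callable(a, b) -> float
            # similarity=True means larger values count as "closer"
            # (e.g. correlation); otherwise smaller is closer (a distance)
            self.data = np.asarray(data)
            self.measure = measure
            self.similarity = similarity

        def query(self, x, k=1):
            # Evaluate the measure against every stored point and keep the
            # k best; O(n) measure evaluations per query, no tree needed.
            scores = np.array([self.measure(x, row) for row in self.data])
            order = np.argsort(-scores if self.similarity else scores)[:k]
            return scores[order], order

so that, for example, a correlation-based lookup would be

    def correlation(a, b):
        a = a - a.mean()
        b = b - b.mean()
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    data = np.random.randn(100, 5)
    knn = BruteForceKNN(data, correlation, similarity=True)
    scores, indices = knn.query(np.random.randn(5), k=3)

The harder question is which measures admit anything faster than this.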

kd-trees can only work for distance measures that have certain special
properties (in particular, you have to be able to bound them based on
coordinate differences). This is just fine for all the Minkowski
p-norms - Euclidean distance, Manhattan distance, and maximum
coordinate difference - and in fact the current implementation already
supports all of these.

I don't think that correlation can be made into such a distance
measure - the neighborhoods are the wrong shape. In fact the basic
space is projective (n-1)-space rather than affine n-space, so I think
you're going to need a very different algorithm. If you make a metric
space out of it - define d(A,B) to be the angle between A and B - then
cover trees can serve as a spatial data structure for nearest-neighbor
search. Cover trees may be worth implementing, as they're a very
generic data structure, suitable for (among other things)
low-dimensional data in high-dimensional spaces.
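
For the Minkowski norms, the norm is just a parameter to the query, so
(assuming the kd-tree lives at scipy.spatial.KDTree and that query takes
a Minkowski p parameter, as in the current implementation) the three
common cases look like:

    import numpy as np
    from scipy.spatial import KDTree  # assumed location of the kd-tree

    data = np.random.randn(1000, 3)
    tree = KDTree(data)
    x = np.zeros(3)

    # p selects the Minkowski norm used for the distance computation:
    d2, i2 = tree.query(x, k=5, p=2)           # Euclidean distance
    d1, i1 = tree.query(x, k=5, p=1)           # Manhattan distance
    dinf, iinf = tree.query(x, k=5, p=np.inf)  # max coordinate difference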
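
To make the angle metric concrete: for correlation it is just the
arccosine of the centered, normalized dot product - a sketch, with the
centering step specific to correlation:

    import numpy as np

    def angle_distance(a, b):
        # Angle between A and B after centering: arccos of the correlation
        # coefficient.  Centered, unit-normalized vectors live on a sphere,
        # and the angle is the geodesic distance there, so this is a metric.
        a = a - a.mean()
        b = b - b.mean()
        c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(c, -1.0, 1.0))

This is the kind of function you could hand to the exhaustive-search
class sketched above, or eventually to a cover tree.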

Anne