I have written up basic nearest neighbor algorithm. It does a brute force search so it will be slower than kdtrees as the number of points gets large. It should however work well for high dimensional data. I have also added the option for user defined distance measures. The user can set a default "p". "p" has the same functionality if it is a float. "p" can also be a function that computes a distance matrix or the measure can be selected using the strings: "Manhattan", "Euclidean", or "Correlation".
https://pyvision.svn.sourceforge.net/svnroot/pyvision/trunk/src/pyvision/vector/knn.py The interface is similar to Anne's code and in many cases can be used as a drop in replacement. I have posted the code to my own project because I have a short term need and I do not have access to the scipy repository. Feel free to include the code with scipy under the scipy license. I did find a typo your documentation. typo "trie -> tree" - ... kd-tree is a binary trie, each of whose ... Also I found the use of k in the documentation some what confusing as it is the dimensionality of the data points in the kd-tree and the number of neighbors for k-nearest neighbors. >> I believe that with mean subtracted and unit length vectors, a >> Euclidean knn algorithm will produces the same result as if the >> vectors were compared using correlation. I am not sure if kd-trees >> will perform well on the normalized vectors as they have a very >> specific geometry. If my math checks out it may be worth adding >> Pearson's correlation as a default option or as a separate class. > > Actually it's probably easier if the user just does the > prenormalization. I agree. I think we should keep your class as-is and maybe create a class that wraps the kdtree and handles the normalization for correlation. I would also like to look at cover trees, however that will have to wait a couple months until I have more time. Dave --- http://www.cs.colostate.edu/~bolme _______________________________________________ Numpy-discussion mailing list [email protected] http://projects.scipy.org/mailman/listinfo/numpy-discussion
