2008/10/9 David Bolme <[EMAIL PROTECTED]>:
> I have written up a basic nearest neighbor algorithm. It does a brute
> force search, so it will be slower than kd-trees as the number of
> points gets large. It should, however, work well for high-dimensional
> data. I have also added the option for user-defined distance measures.
> The user can set a default "p". "p" behaves as the usual Minkowski
> exponent if it is a float. "p" can also be a function that computes a
> distance matrix, or the measure can be selected using the strings
> "Manhattan", "Euclidean", or "Correlation".
>
> https://pyvision.svn.sourceforge.net/svnroot/pyvision/trunk/src/pyvision/vector/knn.py
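The flexible "p" parameter David describes could be sketched roughly as below. This is a minimal illustration of the idea, not the actual knn.py code; the function names and the exact correlation dissimilarity are my own assumptions.

```python
import numpy as np

def _minkowski_matrix(data, queries, p):
    # Pairwise Minkowski-p distances between each query and each data point.
    diffs = np.abs(queries[:, None, :] - data[None, :, :])
    if np.isinf(p):
        return diffs.max(axis=-1)
    return (diffs ** p).sum(axis=-1) ** (1.0 / p)

def _correlation_matrix(data, queries):
    # 1 - Pearson correlation as a dissimilarity (an assumed convention).
    d = data - data.mean(axis=1, keepdims=True)
    q = queries - queries.mean(axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    return 1.0 - q @ d.T

def brute_force_knn(data, queries, k, p=2.0):
    """Return (distances, indices) of the k nearest data points per query.

    `p` may be a float (Minkowski exponent), a callable producing a
    distance matrix, or one of "Manhattan", "Euclidean", "Correlation".
    """
    data = np.asarray(data, dtype=float)
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    if callable(p):
        dist = p(data, queries)
    elif p == "Manhattan":
        dist = _minkowski_matrix(data, queries, 1.0)
    elif p == "Euclidean":
        dist = _minkowski_matrix(data, queries, 2.0)
    elif p == "Correlation":
        dist = _correlation_matrix(data, queries)
    else:
        dist = _minkowski_matrix(data, queries, float(p))
    # Brute force: sort the full distance matrix and keep the k smallest.
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx
```

Because the whole distance matrix is computed at once, the cost is O(n) distance evaluations per query, which is why this loses to a kd-tree for large low-dimensional data sets but holds up well in high dimensions.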
This is interesting. I would point out, though, that if you want a
Minkowski norm, it may be more efficient (that is, faster) to use the
new compiled kd-tree implementation with leafsize set to the size of
your data. This is written in compiled code and uses short-circuit
distance evaluation, and may be much faster for high-dimensional
problems.

Given that, this should perhaps go in with other generic metric space
code. I have a functional implementation of ball trees (though I don't
know how efficient they are), and am looking into implementing cover
trees.

> The interface is similar to Anne's code and in many cases can be used
> as a drop-in replacement. I have posted the code to my own project
> because I have a short-term need and I do not have access to the scipy
> repository. Feel free to include the code with scipy under the scipy
> license.
>
> I did find a typo in your documentation.
> typo "trie" -> "tree" - ... kd-tree is a binary trie, each of whose ...

That's not actually a typo: a trie is a tree in which all the data is
stored at leaf nodes. Basic kd-trees use the nodes themselves to define
splitting planes; you can actually construct one with no extra storage
at all, just by choosing an appropriate order for your array of points.
This implementation chooses splitting planes that may not pass through
any point, so all the points get stored in leaves.

> Also, I found the use of k in the documentation somewhat confusing, as
> it is both the dimensionality of the data points in the kd-tree and
> the number of neighbors for k-nearest neighbors.

That's a good point. I changed the dimension of the kd-tree to m
throughout.

Anne

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
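The leafsize trick Anne describes can be sketched as follows, assuming a current SciPy with `scipy.spatial.cKDTree` (the compiled implementation discussed in the thread): setting leafsize to the number of points produces a single leaf, so the query degenerates to a brute-force scan, but one done in compiled code with short-circuit distance evaluation.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 64))   # 1000 points in 64 dimensions
query = rng.standard_normal(64)

# leafsize >= number of points => the tree is a single leaf, i.e. a
# pure brute-force scan, executed in compiled code.
tree = cKDTree(data, leafsize=len(data))
dist, idx = tree.query(query, k=5, p=2)  # 5 nearest Euclidean neighbors
```

With the default leafsize the same call builds a real tree; the point of this variant is only to reuse the fast compiled distance loop when the dimensionality is too high for tree pruning to pay off.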