Hi all, I'm in the process of writing a formal proposal for GSoC, but I want to test the waters on my idea first so I know what to focus my application on.
At my college we use The Elements of Statistical Learning heavily, and we sometimes delve into (even) more advanced techniques when they are needed. I have seen on JIRA that there is interest in using kernels (based on this ticket: https://issues.apache.org/jira/browse/MAHOUT-597), though there they are not used for localization as they are in my proposal.

My work would consist of two parts.

First, add a kernel smoothing implementation for the current implementations of k-NN. This is useful for assigning weights to the different points in the neighborhood (depending on the point features), which makes a k-NN classification much less prone to wiggling from one class to the other.

Second, to expand k-NN, implement Locality-Sensitive Hashing (LSH), an algorithm for solving the (approximate/exact) near neighbor search problem in high-dimensional spaces. LSH can also serve as a form of dimension reduction, and it is a good fit for situations where you have high dimensions and still want accurate results.

I will also have to integrate these small improvements to k-NN into Hadoop jobs.

Thanks, and I hope to hear from you.

Federico

--
Federico Brubacher
@fbru02
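
P.S. To make the kernel smoothing idea a bit more concrete, here is a minimal sketch in Python. It is for illustration only and is not Mahout code: the function names (gaussian_kernel, kernel_knn_classify), the choice of a Gaussian kernel, and the bandwidth parameter are all my own assumptions. Each of the k nearest neighbors votes with a weight that decays with its distance to the query point, instead of every neighbor counting equally.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly from 1 (at distance 0) toward 0."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_knn_classify(query, points, labels, k=3, bandwidth=1.0):
    """Classify `query` by a kernel-weighted vote among its k nearest points."""
    # Distance from the query to every training point, nearest first.
    neighbors = sorted(
        (euclidean(query, p), label) for p, label in zip(points, labels)
    )[:k]
    # Accumulate kernel weights per class instead of counting raw votes.
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += gaussian_kernel(distance, bandwidth)
    return max(votes, key=votes.get)


# Toy data (hypothetical): one close "a" point and two distant "b" points.
points = [(0.0, 0.0), (3.0, 0.0), (3.2, 0.0)]
labels = ["a", "b", "b"]
prediction = kernel_knn_classify((0.5, 0.0), points, labels, k=3, bandwidth=1.0)
```

In this toy example an unweighted majority vote with k=3 would pick "b" (two votes to one), while the kernel-weighted vote picks "a" because the single "a" point is much closer to the query. With a very large bandwidth the weights become nearly equal and the result falls back to the plain majority vote.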
