2014-03-13 23:15 GMT+01:00 Robert Layton <[email protected]>:
> Thanks Gael. My thinking was to implement "Basic LSH with basic data
> structures" and then spend some of the time working on seeing if moderate
> improvements (i.e. a more complex data structure) can deliver benefits. This
> way, we get the key deliverable, and spend some time trying to see if we can
> do better.
>
> I'd also like to see scalability added to the evaluation criteria!
The problem is that, having looked at the ANN literature in the past, my gut feeling is that basic LSH is mostly worthless for practical applications. I would rather not have a method in scikit-learn that does not perform well enough (in terms of speed-up vs. exact brute-force queries) to be useful on real data.

To decide, I think the best way to proceed would be to have an evaluation / prototyping stage in the project: first hack an implementation of the basic methods (at least random projections, and maybe others) in a gist outside of the scikit-learn codebase, without worrying about documentation and API compliance, then benchmark / profile it on representative datasets with various statistical structures and compare the outcome to alternative methods such as those implemented in FLANN and Annoy.

Actually, there already exist several Python implementations of vanilla LSH:

https://pypi.python.org/pypi/lshash
http://nearpy.io/
https://github.com/yahoo/Optimal-LSH/blob/master/lsh.py

It would be interesting to benchmark them all on the same datasets (with various dimensionalities, ranks and sparsity patterns). Note that Radim already had a look at the yahoo LSH implementation and found it impractical to use:

http://radimrehurek.com/2013/12/performance-shootout-of-nearest-neighbours-contestants/

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
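For illustration, the kind of throwaway prototype suggested above might look roughly like the sketch below. It assumes random-hyperplane (sign) hashing for cosine similarity as the "random projections" variant, uses only NumPy, and reports recall@k and wall-clock speed-up against an exact brute-force baseline. All names (`RandomProjectionLSH`, `brute_force_cosine`, `benchmark`, `normalize`) are hypothetical and do not correspond to scikit-learn, lshash, NearPy, FLANN or Annoy APIs.

```python
import time

import numpy as np


def normalize(X):
    # Scale rows to unit length so cosine similarity reduces to a dot product.
    X = np.atleast_2d(np.asarray(X, dtype=np.float64))
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)


class RandomProjectionLSH:
    """Hypothetical minimal LSH index with random-hyperplane (sign) hashing.

    Each of `n_tables` tables keys a point by the signs of its projections
    onto `n_bits` random Gaussian directions, packed into one integer.
    """

    def __init__(self, n_bits=8, n_tables=4, seed=0):
        self.n_bits = n_bits
        self.n_tables = n_tables
        self.rng = np.random.default_rng(seed)

    def _hash(self, X, planes):
        bits = (X @ planes.T) > 0                   # one sign bit per hyperplane
        weights = 1 << np.arange(self.n_bits, dtype=np.int64)
        return bits.astype(np.int64) @ weights      # pack bits into an int key

    def fit(self, X):
        self.X = normalize(X)
        n_features = self.X.shape[1]
        self.planes = [self.rng.standard_normal((self.n_bits, n_features))
                       for _ in range(self.n_tables)]
        self.tables = []
        for planes in self.planes:
            table = {}
            for i, key in enumerate(self._hash(self.X, planes)):
                table.setdefault(int(key), []).append(i)
            self.tables.append(table)
        return self

    def query(self, q, k=10):
        q = normalize(q)[0]
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            key = int(self._hash(q[None, :], planes)[0])
            candidates.update(table.get(key, []))   # union of matching buckets
        if not candidates:
            return np.array([], dtype=np.int64)
        cand = np.fromiter(candidates, dtype=np.int64)
        sims = self.X[cand] @ q                     # exact re-ranking of candidates
        return cand[np.argsort(-sims)[:k]]


def brute_force_cosine(Xn, qn, k=10):
    # Exact baseline: Xn and qn are assumed to be already unit-normalized.
    return np.argsort(-(Xn @ qn))[:k]


def benchmark(X, queries, k=10, **lsh_params):
    index = RandomProjectionLSH(**lsh_params).fit(X)
    Xn, Qn = normalize(X), normalize(queries)

    t0 = time.perf_counter()
    approx = [index.query(q, k) for q in queries]
    t_lsh = time.perf_counter() - t0

    t0 = time.perf_counter()
    exact = [brute_force_cosine(Xn, qn, k) for qn in Qn]
    t_exact = time.perf_counter() - t0

    # Recall@k: fraction of the true k nearest neighbours that LSH returned.
    recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(approx, exact)])
    return {"recall": float(recall), "speedup": t_exact / t_lsh}
```

Sweeping `n_bits` and `n_tables` over datasets with different dimensionalities, ranks and sparsity patterns would give the recall vs. speed-up trade-off curves needed for the comparison. Note that this sketch answers queries one at a time in a Python loop while the brute-force baseline is a single vectorized NumPy product per query, so the measured speed-up mixes algorithmic and implementation overhead.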
