Hi Daniel,

You are suggesting that current numpy arrays and implementation of a binary
search can be used to create the functionality of an ordered map
successfully, right? With this structure, trees for LSH forest can be
built. Or we can go for a Cython implementation as well.

Gaël, What is the status of the evaluation of the cost to benefit ratio for
this? I would appreciate it if we can decide on storing structure for the
LSH implementation quickly as I have to make a sound plan for this project.
[?]

Best Regards,
Maheshakya.


On Tue, Mar 11, 2014 at 1:47 PM, Daniel Vainsencher <
[email protected]> wrote:

> My 0.02 cents; not a scikit learn contributor but have implemented this
> before.
>
> First, its 138 citations on an important but simple datastructure/query
> optimization: replace hashed maps with ordered maps for the range
> queries, and use those to get the right number of candidates, instead of
> too many or too few depending on local density. The problem of variable
> density that it solves was pretty serious in my application, and seems
> pretty generic.
>
> The main risk is in trying too hard to optimize the ordered map for
> efficiency and compactness. A Judy array would probably be ideal, but
> its C/C++ code with a patent in the background, and another B-Tree might
> be a hassle as a dependency or maintenance.
>
> Fortunately, sorted arrays and binary search are already very efficient
> ordered maps (as long as batched construction is acceptable), and
> something more specialized can be swapped in later if justified. This
> avoids the main risk completely for this GSOC at low cost.
>
> Other than that, the construction code is the same as in LSH, and the
> query code is novel but well localized. Maybe test code for some of the
> other NN search methods can be reused.
>
> Daniel Vainsencher
>
> On 03/11/2014 04:17 AM, Gael Varoquaux wrote:
> > On Sun, Mar 09, 2014 at 08:46:32PM +1100, Robert Layton wrote:
> >> That said, you'll need to make the case to the community -- the Forest
> >> LSH is a variation of a key algorithm. LSH by itself is definitely
> >> worthy of inclusion in scikit-learn, and it looks like Forest LSH would
> >> be sufficiently well-used (Google tells me 138 citations), but I think
> >> we need the community's decision.
> >
> > 138 citations on Google scholar isn't that much given the current
> > standards of inclusion in scikit-learn. I have probably missed the
> > discussion, but can someone make a compelling case for Forest LSH in a
> > few lines and evaluate the cost to benefit ratio of implementing it
> > (factoring in risk analysis and maintenance cost).
> >
> > Cheers,
> >
> > Gaël
> >
> >
> ------------------------------------------------------------------------------
> > Learn Graph Databases - Download FREE O'Reilly Book
> > "Graph Databases" is the definitive new guide to graph databases and
> their
> > applications. Written by three acclaimed leaders in the field,
> > this first edition is now available. Download your free book today!
> > http://p.sf.net/sfu/13534_NeoTech
> > _______________________________________________
> > Scikit-learn-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
>
>
>
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka

<<330.png>>

------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to