Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148

A few high-level comments/questions:

* Should this go into the `feature` package as a feature estimator/transformer? That is where other dimensionality-reduction techniques have gone, and I'm not sure we should create a new package for this.
* Could you please point me to a specific section of a specific paper that documents the approaches used here? AFAICT, this patch implements something different from most of the approximate-nearest-neighbors-via-LSH algorithms found in papers. For instance, the method in section 2 [here](http://cseweb.ucsd.edu/~dasgupta/254-embeddings/lawrence.pdf), as well as the method on Wikipedia [here](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#LSH_algorithm_for_nearest_neighbor_search), differs from the implementation in this PR. The spark package [`spark-neighbors`](https://github.com/sethah/spark-neighbors) also employs those approaches. I'm not an expert in LSH, so I was just hoping for some clarification.
* The implementation of the `RandomProjections` class actually follows the "2-stable" (or, more generically, "p-stable") LSH algorithm, and not the "Random Projection" algorithm in the paper that is referenced. At the very least, we should clarify this. Potentially, we should think of a better name.

@karlhigley Would you mind taking a look at the patch, or providing your input on the comments?
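For reference, the distinction the last bullet draws can be sketched as follows. This is a minimal, self-contained illustration of the two hash families being discussed, not code from this PR; the function names are mine, and the vector/RNG handling is deliberately simplified (plain Python lists, stdlib `random`):

```python
import math
import random

def make_sign_hash(dim, seed=0):
    """Sign-based random-projection hash (SimHash-style):
    h(x) = 1 if a . x >= 0 else 0, with a drawn i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return lambda x: 1 if sum(ai * xi for ai, xi in zip(a, x)) >= 0.0 else 0

def make_pstable_hash(dim, bucket_width, seed=0):
    """"2-stable" (p-stable, p=2) LSH bucket hash:
    h(x) = floor((a . x + b) / w), with a drawn i.i.d. N(0, 1)
    and offset b ~ Uniform[0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, bucket_width)
    def h(x):
        proj = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((proj + b) / bucket_width)
    return h

# The sign hash emits one bit per projection (angle-sensitive); the
# p-stable hash quantizes the projection into width-w buckets
# (Euclidean-distance-sensitive). In practice several such hashes are
# concatenated per table, and multiple tables are used.
h_sign = make_sign_hash(dim=3, seed=7)
h_bucket = make_pstable_hash(dim=3, bucket_width=4.0, seed=7)
print(h_sign([1.0, 2.0, 3.0]), h_bucket([1.0, 2.0, 3.0]))
```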