[jira] [Commented] (SPARK-19771) Support OR-AND amplification in Locality Sensitive Hashing (LSH)

2017-03-02 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893508#comment-15893508 ] Yun Ni commented on SPARK-19771: [~merlin] What you are suggesting is to hash each AND hash vector into a

[jira] [Comment Edited] (SPARK-19771) Support OR-AND amplification in Locality Sensitive Hashing (LSH)

2017-03-02 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893097#comment-15893097 ] Yun Ni edited comment on SPARK-19771 at 3/2/17 9:55 PM: [~merlin] (1) The

[jira] [Commented] (SPARK-19771) Support OR-AND amplification in Locality Sensitive Hashing (LSH)

2017-03-02 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893097#comment-15893097 ] Yun Ni commented on SPARK-19771: [~merlin] (1) The computation cost is NumHashFunctions because we go

[jira] [Created] (SPARK-19771) Support OR-AND amplification in Locality Sensitive Hashing (LSH)

2017-02-28 Thread Yun Ni (JIRA)
Yun Ni created SPARK-19771: Summary: Support OR-AND amplification in Locality Sensitive Hashing (LSH) Key: SPARK-19771 URL: https://issues.apache.org/jira/browse/SPARK-19771 Project: Spark Issue

[jira] [Comment Edited] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH

2017-02-21 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874956#comment-15874956 ] Yun Ni edited comment on SPARK-18454 at 2/21/17 9:39 PM: [~josephkb] [~sethah]

[jira] [Commented] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH

2017-02-20 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874956#comment-15874956 ] Yun Ni commented on SPARK-18454: [~josephkb] [~sethah] [~mlnick] I opened a gDoc for discussions. Feel

[jira] [Resolved] (SPARK-18286) Add Scala/Java/Python examples for MinHash and RandomProjection

2017-02-16 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni resolved SPARK-18286. Resolution: Fixed Fix Version/s: 2.2.0 > Add Scala/Java/Python examples for MinHash and

[jira] [Commented] (SPARK-18392) LSH API, algorithm, and documentation follow-ups

2017-02-14 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867366#comment-15867366 ] Yun Ni commented on SPARK-18392: I agree with Seth. We need to first finish SPARK-18080 and SPARK-18450

[jira] [Comment Edited] (SPARK-18392) LSH API, algorithm, and documentation follow-ups

2017-01-20 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832487#comment-15832487 ] Yun Ni edited comment on SPARK-18392 at 1/20/17 10:29 PM: Yes, comparing if the

[jira] [Commented] (SPARK-18392) LSH API, algorithm, and documentation follow-ups

2017-01-20 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832487#comment-15832487 ] Yun Ni commented on SPARK-18392: Yes, comparing if the hash signature equals is faster than computing the

[jira] [Comment Edited] (SPARK-18392) LSH API, algorithm, and documentation follow-ups

2017-01-20 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832404#comment-15832404 ] Yun Ni edited comment on SPARK-18392 at 1/20/17 9:15 PM: Hi David, Thanks for

[jira] [Commented] (SPARK-18392) LSH API, algorithm, and documentation follow-ups

2017-01-20 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832404#comment-15832404 ] Yun Ni commented on SPARK-18392: Hi David, Thanks for the question. I did not group the records by their

[jira] [Updated] (SPARK-18454) Changes to fix Nearest Neighbor Search for LSH

2016-11-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18454: Description: We all agree to do the following improvement to Multi-Probe NN Search: (1) Use approxQuantile

[jira] [Updated] (SPARK-18454) Changes to fix Nearest Neighbor Search for LSH

2016-11-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18454: Description: We all agree to do the following improvement to Multi-Probe NN Search: (1) Use approxQuantile

[jira] [Updated] (SPARK-18454) Changes to fix Nearest Neighbor Search for LSH

2016-11-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18454: Summary: Changes to fix Nearest Neighbor Search for LSH (was: Changes to fix Multi-Probe Nearest Neighbor

[jira] [Commented] (SPARK-18392) LSH API, algorithm, and documentation follow-ups

2016-11-15 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668372#comment-15668372 ] Yun Ni commented on SPARK-18392: Re-org the ticket dependencies for a little bit. PTAL. > LSH API,

[jira] [Created] (SPARK-18454) Changes to fix Multi-Probe Nearest Neighbor Search for LSH

2016-11-15 Thread Yun Ni (JIRA)
Yun Ni created SPARK-18454: Summary: Changes to fix Multi-Probe Nearest Neighbor Search for LSH Key: SPARK-18454 URL: https://issues.apache.org/jira/browse/SPARK-18454 Project: Spark Issue Type:

[jira] [Created] (SPARK-18450) Add AND-amplification to Locality Sensitive Hashing

2016-11-15 Thread Yun Ni (JIRA)
Yun Ni created SPARK-18450: Summary: Add AND-amplification to Locality Sensitive Hashing Key: SPARK-18450 URL: https://issues.apache.org/jira/browse/SPARK-18450 Project: Spark Issue Type:

[jira] [Updated] (SPARK-18408) API Improvements for LSH

2016-11-13 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18408: Description: As the first improvements to current LSH Implementations, we are planning to do the

[jira] [Updated] (SPARK-18408) API Improvements for LSH

2016-11-10 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18408: Description: As the first improvements to current LSH Implementations, we are planning to do the

[jira] [Updated] (SPARK-18408) API Improvements for LSH

2016-11-10 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18408: Description: As the first improvements to current LSH Implementations, we are planning to do the

[jira] [Created] (SPARK-18408) API Improvements for LSH

2016-11-10 Thread Yun Ni (JIRA)
Yun Ni created SPARK-18408: Summary: API Improvements for LSH Key: SPARK-18408 URL: https://issues.apache.org/jira/browse/SPARK-18408 Project: Spark Issue Type: Improvement Reporter:

[jira] [Commented] (SPARK-18392) LSH API, algorithm, and documentation follow-ups

2016-11-10 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654822#comment-15654822 ] Yun Ni commented on SPARK-18392: I think your summary is very good in general. For class names, I think

[jira] [Comment Edited] (SPARK-18334) MinHash should use binary hash distance

2016-11-07 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645506#comment-15645506 ] Yun Ni edited comment on SPARK-18334 at 11/7/16 9:25 PM: [~sethah] [~josephkb]

[jira] [Comment Edited] (SPARK-18334) MinHash should use binary hash distance

2016-11-07 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645506#comment-15645506 ] Yun Ni edited comment on SPARK-18334 at 11/7/16 9:25 PM: [~sethah] [~josephkb] I

[jira] [Commented] (SPARK-18334) MinHash should use binary hash distance

2016-11-07 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645506#comment-15645506 ] Yun Ni commented on SPARK-18334: [~sethah] [~josephkb] > MinHash should use binary hash distance >

[jira] [Created] (SPARK-18334) MinHash should use binary hash distance

2016-11-07 Thread Yun Ni (JIRA)
Yun Ni created SPARK-18334: Summary: MinHash should use binary hash distance Key: SPARK-18334 URL: https://issues.apache.org/jira/browse/SPARK-18334 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-18081) Locality Sensitive Hashing (LSH) User Guide

2016-11-04 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637493#comment-15637493 ] Yun Ni commented on SPARK-18081: This is super helpful. Thanks! > Locality Sensitive Hashing (LSH) User

[jira] [Commented] (SPARK-18081) Locality Sensitive Hashing (LSH) User Guide

2016-11-04 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637102#comment-15637102 ] Yun Ni commented on SPARK-18081: Sorry, I was really overloaded this week. I will try my best to send a

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-10-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581416#comment-15581416 ] Yun Ni commented on SPARK-5992: Yes, I have implemented cosine, jaccard, euclidean and hamming distance.

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-10-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581412#comment-15581412 ] Yun Ni commented on SPARK-5992: Yes, I do have comparison with full scan. Usually kNN with 1 hash function

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-10-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581345#comment-15581345 ] Yun Ni commented on SPARK-5992: Some tests on the WEX Open Dataset, with steps and results:

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-09-19 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502493#comment-15502493 ] Yun Ni commented on SPARK-5992: Hi Joseph, I have made an initial PR based on the design doc:

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-09-09 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478135#comment-15478135 ] Yun Ni commented on SPARK-5992: Thank you very much for reviewing it, Joseph! I will work on the first

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-08-31 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453980#comment-15453980 ] Yun Ni commented on SPARK-5992: Thanks very much for reviewing, Joseph! Based on your comments, I have

[jira] [Comment Edited] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-08-22 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432186#comment-15432186 ] Yun Ni edited comment on SPARK-5992 at 8/23/16 5:50 AM: Hi, We are engineers from

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-08-22 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432186#comment-15432186 ] Yun Ni commented on SPARK-5992: Hi, We are engineers from Uber. Here is our design doc for LSH: