Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66659 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66659/consoleFull)**
for PR 15148 at commit
[`efe323c`](https://github.com/apache/spark/commit/e
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
Related to the docs, some more comments defining terminology would be
useful for non-experts:
* OR-amplification
* probing buckets
* false positives/negatives (w.r.t. finding nearest ne
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66395/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
e
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66395 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66395/consoleFull)**
for PR 15148 at commit
[`df19886`](https://github.com/apache/spark/commit/
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@jkbradley Take your time for the code review. :) I will be working on the
open dataset testing at the same time.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66395 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66395/consoleFull)**
for PR 15148 at commit
[`df19886`](https://github.com/apache/spark/commit/d
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66323/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66323 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66323/consoleFull)**
for PR 15148 at commit
[`3487bcc`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66323 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66323/consoleFull)**
for PR 15148 at commit
[`3487bcc`](https://github.com/apache/spark/commit/3
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66322/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66322 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66322/consoleFull)**
for PR 15148 at commit
[`eced98d`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66322 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66322/consoleFull)**
for PR 15148 at commit
[`eced98d`](https://github.com/apache/spark/commit/e
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66306/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66306 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66306/consoleFull)**
for PR 15148 at commit
[`69efc84`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66305/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66305 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66305/consoleFull)**
for PR 15148 at commit
[`ccd98f7`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66306 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66306/consoleFull)**
for PR 15148 at commit
[`69efc84`](https://github.com/apache/spark/commit/6
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66305 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66305/consoleFull)**
for PR 15148 at commit
[`ccd98f7`](https://github.com/apache/spark/commit/c
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
Starting a review...
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66068/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66068 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66068/consoleFull)**
for PR 15148 at commit
[`f82f3fe`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66068 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66068/consoleFull)**
for PR 15148 at commit
[`f82f3fe`](https://github.com/apache/spark/commit/f
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@jkbradley I see. Thanks Joseph!
---
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
> Our use case is mainly using similarity join to find fraud trips. I think
I can change the NN-search to only single-probing NN search of dataframe if you
think it's fine. What do you think?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66065/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66065 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66065/consoleFull)**
for PR 15148 at commit
[`8f04ee8`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66065 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66065/consoleFull)**
for PR 15148 at commit
[`8f04ee8`](https://github.com/apache/spark/commit/8
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66061/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66061 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66061/consoleFull)**
for PR 15148 at commit
[`f805658`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66061 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66061/consoleFull)**
for PR 15148 at commit
[`f805658`](https://github.com/apache/spark/commit/f
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66055/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66055 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66055/consoleFull)**
for PR 15148 at commit
[`b79ebbd`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66057/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66057 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66057/consoleFull)**
for PR 15148 at commit
[`7936315`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66057 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66057/consoleFull)**
for PR 15148 at commit
[`7936315`](https://github.com/apache/spark/commit/7
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66055 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66055/consoleFull)**
for PR 15148 at commit
[`b79ebbd`](https://github.com/apache/spark/commit/b
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66054 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66054/consoleFull)**
for PR 15148 at commit
[`396ad60`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66054/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66054 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66054/consoleFull)**
for PR 15148 at commit
[`396ad60`](https://github.com/apache/spark/commit/3
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @MLnick @jkbradley
Thanks for the code review. I made some changes based on your comments.
- I agree it's better to align the input types to vector in internal
implementation. Spa
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66051/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66051 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66051/consoleFull)**
for PR 15148 at commit
[`a1c344b`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #66051 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66051/consoleFull)**
for PR 15148 at commit
[`a1c344b`](https://github.com/apache/spark/commit/a
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/15148
Yes ideally it's nice to be able to support multiple input types. Though I
lean towards Vector as the most appropriate "unified" interface. Somewhere
there is a TODO about supporting e.g. `Array[Doub
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
* Do we want to use the subpackage ```spark.ml.feature.lsh``` or just put
the classes under ```spark.ml.feature```? This would be the first division of
```feature```. I'd prefer not using subpac
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65924/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15148
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #65924 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65924/consoleFull)**
for PR 15148 at commit
[`0fad3ef`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15148
**[Test build #65924 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65924/consoleFull)**
for PR 15148 at commit
[`0fad3ef`](https://github.com/apache/spark/commit/0
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/15148
ok to test
---
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/15148
At a high level I like the idea here and the work that's gone into a
unified interface. A few comments:
Data types
I'm not that keen on mixing up the input data types between `Vector
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Thanks @karlhigley All of your comments are very helpful. I made some
changes to make it work. :)
---
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @sethah
- My understanding is that h(x) = floor((g1 dot x) / w) is one hash function, as
in the wiki.
- In bullet point 6 of "Approach found on Wikipedia and here and here", we
have a pr
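
For readers following the thread, the single hash function quoted above, h(x) = floor((g1 dot x) / w), can be sketched as below. This is only an illustration of that formula, not the code in the PR: the names `bucket_hash` and `random_direction`, the Gaussian choice for the direction vector, and the seed parameter are all assumptions made for the example.

```python
import math
import random


def random_direction(dim, seed=0):
    """Hypothetical helper: draw a direction vector g with Gaussian entries (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def bucket_hash(g, x, w):
    """Compute h(x) = floor((g dot x) / w) for one direction g and bucket width w."""
    if w <= 0:
        raise ValueError("bucket width w must be positive")
    if len(g) != len(x):
        raise ValueError("g and x must have the same dimension")
    dot = sum(gi * xi for gi, xi in zip(g, x))
    return math.floor(dot / w)


# Points whose projections onto g fall in the same width-w interval share a bucket.
g = [1.0, 0.0]
same_bucket = bucket_hash(g, [5.0, 7.0], 2.0) == bucket_hash(g, [4.2, -3.0], 2.0)
```

With a larger `w`, more points collide in each bucket; with a smaller `w`, fewer do. Combining several such functions is where the amplification questions raised elsewhere in the thread come in.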
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
Thanks for your clarifications. I still don't see where the algorithm used
in this patch comes from. Here is my summary of how the approach here
differs from the approach found on Wikipedia and
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @sethah,
Thanks for the comments.
- I agree. I have moved `lsh` package to be under `feature`
- In "Similarity search in high dimensions via hashing", there is an
algorithm in the
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
A few high-level comments/questions:
* Should this go into the `feature` package as a feature
estimator/transformer? That is where other dimensionality reduction techniques
have gone and I'm
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/15148
@Yunni Thanks for working on this.
---
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @sethah, I have updated the reference in the PR and scaladoc for LSH.
---
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
@Yunni Could you provide the specific reference paper this patch is based
on? Also, it might be nice to put the reference in the code somewhere, e.g. the
scaladoc for LSH/Random Projections. Thanks!
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Thanks very much for reviewing @viirya I made some changes based on your
comments. PTAL.
---