[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
Thanks for the discussion, everyone! I will take a look at the JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148
  
Phew, done!  https://issues.apache.org/jira/browse/SPARK-18392



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148
  
Good points: Array of Vectors sounds good to me.

There has been a lot of discussion.  I'm going to try to summarize things 
in a follow-up JIRA, which I'll link here shortly.  LSH turned out to be a much 
messier area than I expected; thanks a lot to everyone for all of the post-hoc 
reviews and discussions!



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15148
  
If we were to use a matrix for the output, then when we do 
`approxSimilarityJoin` we would want to explode the output column by matrix 
rows, assuming the matrix structure was:


| --- g1(x) --- |
| --- g2(x) --- |
|      ...      |
| --- gL(x) --- |


This is probably possible, but might be a bit awkward? `Array[Vector]` 
might make it a bit easier.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread MLnick

Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/15148
  
> This is very common in academic research and literature, but it may not 
be in industry. I'm fine with not considering it for now.

Ok makes sense - for the `transform` case if users are looking to directly 
use the hash sigs as lower-dim representation, they can always set `L=1` and 
`d` (assuming we do AND + OR later) to get just one "vector" output.

For the public vals - sorry if I wasn't clear. I meant we should probably 
not expose them until the API is fully baked. But yes I see that they are 
useful to expose once we're happy with the API. I just don't love the idea of 
changing things later (and throwing errors and whatnot) if we can avoid it - I 
think we saw similar issues with e.g. NaiveBayes now.

> What about outputting a Matrix instead of an Array of Vectors? That will 
make it easy to change in the future, without us having weird Vectors of length 
1.

Matrix can work - I don't think `Array[Vector]` is an issue either. I seem 
to recall a comment above that Matrix was a bit less easy to work with 
(exploding indices and so on). I don't see a big difference between an Lx1 
matrix and an L-length Array of 1-d vectors in practical terms. So, I'm ok with 
either approach.

I'll check the JIRA - sorry I missed the links.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148
  
@MLnick  I agree with most of your comments.  A few responses:

> In terms of transform - I disagree somewhat that the main use case is 
"dimensionality reduction". Perhaps there are common examples of using the hash 
signatures as a lower-dim representation as a feature in some model (e.g. in a 
similar way to say a PCA transform), but I haven't seen that.

This is very common in academic research and literature, but it may not be 
in industry.  I'm fine with not considering it for now.

>  I also don't see why randUnitVectors or randCoefficients needs to be 
public

You mentioned people using LSH outside of Spark for serving.  In order to 
do that, we will need to expose randUnitVectors and randCoefficients so that 
users can compute hash values for query points.  That said, I'm fine with 
making those private for now and preventing this use case for 1 release while 
we stabilize the API.

> One issue I have is that currently we would output a 1 x L set of hash 
values. But it actually should be L x 1 i.e. a set of signatures of length 1. I 
guess we can leave it as is, but document what the output actually is.

What about outputting a Matrix instead of an Array of Vectors?  That will 
make it easy to change in the future, without us having weird Vectors of length 
1.

> Finally, my understanding was results from some performance testing would 
be posted. I don't believe we've seen this yet.

You can see some results linked from the JIRA.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread MLnick

Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/15148
  
Oh and for naming - I'm ok with the current ones actually. However we could 
think about changing to `ScalarRandomProjectionLSH` (a term mentioned in 
@karlhigley's package), as later we will have `SignRandomProjectionLSH` for 
cosine distance; and `MinHashLSH`, etc - just to make it clear what the class 
is doing. (perhaps later we have some other random projection algorithm that 
conflicts etc).

We could name according to the estimated metric such as `EuclideanLSH` or 
so on, but if we want to support say Euclidean and Manhattan distance at some 
point that becomes problematic. So perhaps best not to?



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-09 Thread MLnick

Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/15148
  
I tend to agree that the terminology used here is a little confusing, and 
doesn't seem to match up with the "general" terminology (I use that term 
loosely however).

### Terminology

In my dealings with LSH, I too have tended to come across the version that 
@sethah mentions (and @karlhigley's package, and others such as 
https://github.com/marufaytekin/lsh-spark, implement). That is, each input 
vector is hashed into `L` "tables" of hash signatures of "length" or 
"dimension" `d`. Each hash signature is created by concatenating the result of 
applying `d` "hash functions".

I agree what's effectively implemented here is `L = outputDim` and `d=1`. 
What I find a bit troubling is that it is done "implicitly", as part of the 
`hashDistance` function. Without knowing that is what is happening, it is not 
clear to a new user - coming from other common LSH implementations - that 
`outputDim` is not the "number of hash functions" or "length of the hash 
signatures" but actually the "number of hash tables".

### Transform semantics

In terms of `transform` - I disagree somewhat that the main use case is 
"dimensionality reduction". Perhaps there are common examples of using the hash 
signatures as a lower-dim representation as a feature in some model (e.g. in a 
similar way to say a PCA transform), but I haven't seen that. In my view, the 
real use case is the approximate nearest neighbour search.

I'll give a concrete example for the `transform` output. Let's say I want 
to export recommendation model factor vectors (from ALS), or Word2Vec vectors, 
etc, to a real-time scoring system. I have many items, so I'd like to use LSH 
to make my scoring feasible. I do this by effectively doing a real-time version 
of OR-amplification. I store the hash tables (`L` tables of `d` hash 
signatures) with my vectors. When doing "similar items" for a given item, I 
retrieve the hash sigs of the query item, and use these to filter down the 
candidate item set for my scoring. This is in fact something I'm working on in 
a demo project currently. So if we will support the OR/AND combo, then it will 
be very important to output the full `L x d` set of hash sigs in `transform`.
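
To make the serving scenario above concrete, here is a minimal pure-Python sketch. It is not Spark code, and the names `build_index` and `candidates` as well as the toy signatures are hypothetical. It assumes each item already has `L` stored signatures of length `d`, and treats a match of the whole signature in any one table as a candidate (OR over tables).

```python
from collections import defaultdict


def build_index(signatures):
    """Map (table index, signature) -> set of item ids.

    signatures: {item_id: [sig_0, ..., sig_{L-1}]}, each sig a tuple of d ints.
    """
    index = defaultdict(set)
    for item_id, sigs in signatures.items():
        for table, sig in enumerate(sigs):
            index[(table, sig)].add(item_id)
    return index


def candidates(index, signatures, query_id):
    """Items sharing a full signature with the query in at least one table."""
    found = set()
    for table, sig in enumerate(signatures[query_id]):
        found |= index.get((table, sig), set())
    found.discard(query_id)  # the query item trivially matches itself
    return found


# Toy, hand-written signatures with L = 2 tables and d = 2 hashes per table.
signatures = {
    "a": [(1, 5), (7, 2)],
    "b": [(1, 5), (0, 0)],  # collides with "a" in table 0
    "c": [(9, 9), (7, 2)],  # collides with "a" in table 1
    "d": [(1, 4), (7, 3)],  # only partial matches with "a": no collision
}
index = build_index(signatures)
print(sorted(candidates(index, signatures, "a")))
```

Only the surviving candidates would then be scored exactly, which is why the full `L x d` output matters for this use case.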

### Proposal

My recommendation is: 

1. future proof the API by returning `Array[Vector]` in `transform` (as 
mentioned above by others);
2. we need to update the docs / user guide to make it really clear what the 
implementation is doing;
3. I think we need to make it clear that the implied `d` value here is `1` 
- we can mention that AND amplification will be implemented later and perhaps 
even link to a JIRA.
4. rename `outputDim` to something like `numHashTables`.
5. when we add AND-amp, we can add the parameter `hashSignatureLength` or 
`numHashFunctions`.
6. make as much private as possible to avoid being stuck with any 
implementation detail in future releases (e.g. I also don't see why 
`randUnitVectors` or `randCoefficients` needs to be public).

One issue I have is that currently we would output a `1 x L` set of hash 
values. But it actually should be `L x 1` i.e. a set of signatures of length 
`1`. I guess we can leave it as is, but document what the output actually is.

I believe we should support OR/AND in future. If so, then to me many things 
need to change - `hashFunction`, `hashDistance` etc will need to be refactored. 
Most of the implementation is private/protected so I think it will be ok. Let's 
just ensure we're not left with an API that we can't change in future. Setting 
`L` and `d=1` must then yield the same result as current impl to avoid a 
behavior change (I guess this will be ok since current default for `L` is `1`, 
and we can make the default for `d` when added also `1`).

Finally, my understanding was results from some performance testing would 
be posted. I don't believe we've seen this yet.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-08 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148
  
@sethah 

> What is the intended use of the output column generated by transform? As 
an alternative set of features with decreased dimensionality?

I agree it's mainly for dimensionality reduction, though these LSH 
functions are not ideal for that.  (E.g., most people doing dimensionality 
reduction would probably want to use random projections without bucketing.)

@karlhigley 

I agree with your description of different dimensionalities and agree we 
may just have to pick some terminology out of many choices.  I'm fairly 
ambivalent about what terminology we choose, though it would be great for it to 
match whatever references we cite.  (And maybe we do need another reference 
cited for describing OR vs AND amplification and "dimensions.")

@Yunni 

* Have you seen "HyperplaneProjection" used in literature?
* I'll respond about the hashDistance in 
[https://github.com/apache/spark/pull/15800]
* Let's not implement both types of amplification just yet.  Let's either:
  * Fix the API so we can add them in the future, or
  * Make LSH private for now so that we can fix its API for 2.2.




[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
@jkbradley I agree with most of your comments above. And I would like to 
suggest the following:
 - I would recommend a more intuitive name like `HyperplaneProjection` 
instead of `PStableHashing` if we adopt the LSH function @sethah suggested.
 - `x.toDense.values.zip(y.toDense.values).count(pair => pair._1 == 
pair._2).toDouble / x.size` is AND-amplification. I think we should use 
OR-amplification here. I have already made a pull request to fix the issue in 
#15800.
 - I think for MinHash, multi-probing NN Search is either single probing or 
full scan.
 - Here is my reference for Multi-probing: 
http://www.cs.princeton.edu/cass/papers/mplsh_vldb07.pdf

@sethah @karlhigley Now I see your LSH function for Euclidean distance is 
the AND-amplification of what I have implemented. 
- Do you have any reference for compound AND/OR-amplification? As far as I can 
see, it does not always work without assumptions on the distance threshold and 
sensitivity, for example, `(0.6, 0.4)` => `(0.426, 0.098)` for `L = 4, d = 4`, 
and `(0.8, 0.2)` => `(0.678, 0.000)` for `L = 10, d = 10`
- For the schema of `transform()`, I think we either add a generic type for 
the output column in LSH class or change the output type to `Array[Vector]`. I 
would recommend the latter way because (1) it's very easy to explode the array 
to get what @sethah suggested (2) The type of output column still needs to be 
spark sql compatible, which is not so generic.
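
The two numeric examples in the first bullet can be checked with a short Python sketch. The function names are made up for illustration, and the code assumes all base hash functions are independent. The `approx_threshold` helper uses the common rule of thumb that the amplified S-curve rises steeply around `(1/L)^(1/d)`; base probabilities below that value are pushed down and those above it are pushed up, which would explain why `(0.6, 0.4)` shrinks on both sides for `L = 4, d = 4`.

```python
def and_then_or(p, d, L):
    """AND over d independent hashes, then OR over L independent tables."""
    return 1.0 - (1.0 - p ** d) ** L


def approx_threshold(d, L):
    """Rule-of-thumb base probability where the S-curve rises steeply."""
    return (1.0 / L) ** (1.0 / d)


for (p1, p2), d, L in [((0.6, 0.4), 4, 4), ((0.8, 0.2), 10, 10)]:
    amplified = (and_then_or(p1, d, L), and_then_or(p2, d, L))
    print((p1, p2), "->", amplified, "threshold ~", approx_threshold(d, L))
```

The printed values agree with the figures quoted above once truncated to three decimals.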



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread karlhigley

Github user karlhigley commented on the issue:

https://github.com/apache/spark/pull/15148
  
@jkbradley: "Multi-probe" seems like a standard term, and I think this is 
the [original paper](http://www.cs.princeton.edu/cass/papers/mplsh_vldb07.pdf) 
that coined it.

> Terminology: For LSH, "dimensionality" = "number of hash functions" and 
is relevant only for amplification. Do you agree? I have yet to see a hash 
function used for LSH which does not have a discrete set.

I confess that I'm a little confused about what you mean by the above. There are 
several relevant dimensionalities: the dimensionality of the input points 
(`x`), the dimensionality of the computed hashes (i.e. the results of applying 
`g(x)`), and the number of hash tables computed (i.e. how many `g(x)` functions 
are applied), which is the dimensionality of OR-amplification (in a sense).

After wrestling with inconsistent terminology for a while, what I settled 
on for spark-neighbors was to refer to `g(x)` as a hash function, the outputs 
of `g(x)` as hashes, the sub-elements of `g(x)` -- `h1(x)` etc. -- as whatever 
made sense for the particular method (e.g. `permutations` for Minhash), and the 
output of each of the L `g(x)` functions as a hash table. While that 
terminology isn't necessarily standard, it helped me identify the common 
concepts across LSH methods clearly enough to build some abstractions around 
them.

Using those terms, the dimensionality of the `g(x)` hash functions and the 
hashes they produce is equivalent to the number of `h(x)` sub-elements they 
contain. I thought of applying OR-amplification as producing multiple hash 
tables by using multiple `g(x)` functions, with a collision in any one hash 
table producing a pair of candidate neighbors.

Does that make any more (or less) sense? 
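
A small Python sketch may help pin the terminology down. It is not taken from spark-neighbors or from this PR: `make_hash_tables` and the bucket width `w` are hypothetical, and the sub-hash `h(x) = floor((x . r) / w)` is the 2-stable form mentioned elsewhere in this thread. Each `g` is a "hash function", its output tuple is a "hash" of dimensionality `d`, and there are `L` of them, one per "hash table".

```python
import math
import random


def make_hash_tables(input_dim, d, L, w=4.0, seed=0):
    """Return L compound hash functions g_1..g_L, each made of d sub-hashes."""
    rng = random.Random(seed)
    # tables[l][j] is the Gaussian random vector r used by sub-hash h_{l,j}.
    tables = [[[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
               for _ in range(d)]
              for _ in range(L)]

    def g(l, x):
        # Hash of x in table l: a tuple of d integers, one per sub-hash.
        return tuple(math.floor(sum(xi * ri for xi, ri in zip(x, r)) / w)
                     for r in tables[l])

    return [lambda x, l=l: g(l, x) for l in range(L)]


gs = make_hash_tables(input_dim=3, d=2, L=4)
x = [1.0, 2.0, 3.0]
hashes = [g(x) for g in gs]  # L hashes, each of dimensionality d
print(hashes)
```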



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread karlhigley

Github user karlhigley commented on the issue:

https://github.com/apache/spark/pull/15148
  
@sethah: Your description of the combination of AND and OR amplification 
from the literature matches my understanding, and the combination of the two is 
what I was aiming for in spark-neighbors. I also concur with your assessment of 
the potential performance impacts of OR-amplification without first applying 
AND-amplification, in terms of both precision/recall and runtime.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15148
  
I was using L to refer to the number of compound hash functions, but you're 
right that in my explanation L was the "OR" parameter and d was the "AND" 
parameter.

Thinking more about it, this is a tough question. What is the intended use 
of the output column generated by transform? As an alternative set of features 
with decreased dimensionality?

When/if we use the AND/OR amplification, we could go a couple of different 
routes. Let's say for d = 3 and L = 3 we could first apply our hashing scheme 
to the input to obtain:

| features | g1 | g2 | g3 |
|---|---|---|---|
| [12.5609584702036... | [112.0,1.0,12.0] | [1.0,120.0,16.0] | [102.0,1.0,14.0] |
| ... | ... | ... | ... |

Then we generate `g1(q), g2(q), g3(q)` where q is the query point and we 
would select all points where `g1(q) == g1(x_i) OR g2(q) == g2(x_i) OR ...`. In 
spark-neighbors, the output dataframe instead has `L * N` rows, where N is the 
number of rows in the input dataframe. Then you can 
join on the hashed column plus a "table identifier" (the index l in range [1, 
L]). Still, this makes a temporary dataframe within the near-neighbors or 
approx-join algos, and I'm not sure the output schema of `transform` needs to 
have all `L` hashed values. We could store `randUnitVectors: 
Array[Array[Vector]]` and for transform output the hashed value for only the 
first sequence of random vectors, but that seems a bit strange to me. Thoughts?
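
The "explode, then join on hash plus table identifier" idea can be sketched in plain Python as follows. This is only an illustration of the technique, not the spark-neighbors or Spark implementation; `explode`, `approx_join_candidates`, and the toy rows are hypothetical, and the table index is 0-based here.

```python
from collections import defaultdict


def explode(rows):
    """Turn N rows of (id, [g1, ..., gL]) into L * N rows of ((table, hash), id)."""
    return [((table, h), row_id)
            for row_id, hashes in rows
            for table, h in enumerate(hashes)]


def approx_join_candidates(left, right):
    """Equi-join exploded rows on (table, hash); a set dedupes repeated pairs."""
    buckets = defaultdict(list)
    for key, right_id in explode(right):
        buckets[key].append(right_id)
    pairs = set()
    for key, left_id in explode(left):
        for right_id in buckets.get(key, []):
            pairs.add((left_id, right_id))
    return pairs


left = [("x1", [(112, 1, 12), (1, 120, 16), (102, 1, 14)])]
right = [
    ("y1", [(112, 1, 12), (3, 3, 3), (4, 4, 4)]),      # matches x1 in table 0
    ("y2", [(0, 0, 0), (1, 120, 16), (102, 1, 14)]),   # matches in tables 1 and 2
    ("y3", [(1, 120, 16), (5, 5, 5), (6, 6, 6)]),      # same hash, wrong table
]
print(sorted(approx_join_candidates(left, right)))
```

Including the table index in the join key is what keeps a hash from one table from matching the same value in a different table.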




[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148
  
> The current implementation is equivalent to the L = 1 case always, and 
outputDim corresponds to d.

That is true if you're talking about comparing hash values.  But for approx 
similarity and nearest neighbors, this is doing d = 1 and L = outputDim (i.e., 
OR amplification).  (Did you swap accidentally?)  Definitely need to clarify in 
the docs.

I'm not too worried about making ```randUnitVectors``` private.  We can 
always deprecate it and have it throw an exception when it is not applicable.

I'm more worried about the schema for transform().  Do you think we should 
go ahead and output a Matrix so we can support AND and OR in the future?



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15148
  
So I'll try to summarize the AND/OR amplification and how I think it fits 
into the current API right now. LSH relies on a single hashing function `h(x)` 
which is (R, cR, p1, p2)-sensitive, which just means it meets certain properties 
needed for LSH. In the case of the 2-stable method, `h(x) = floor((x dot r) / w)`, 
which maps `Vector[Double] => Int`. p1 and p2 correspond to "good" and "bad" 
collision probabilities respectively. To decrease the probability of a bad 
collision we can use AND-amplification by creating a new, compound hash 
function `g(x) = [h1(x), h2(x), ..., hd(x)]` where the `h_i(x)` correspond to 
different random vectors `r`. Now we only consider two vectors x and y to 
collide if `g(x) == g(y)` (i.e. standard vector equality). This makes the 
probability of both types of collisions decrease to `(p1^d, p2^d)`. For a 
hypothetical (0.8, 0.2)-sensitive family this goes to `(0.4096, 0.0016)` for 
d = 4, making the false-positive rate very low, but meaning we also miss a lot 
of good candidates. To mitigate this we can further apply OR-amplification by 
generating not one compound hash function g(x) but `L` compound functions:


g1(x) = [h11(x), ..., h1d(x)]
g2(x) = [h21(x), ..., h2d(x)]
gL(x)  = [hL1(x), ..., hLd(x)]


Then we convert the original probabilities to `(1 - (1 - p1^d)^L, 1 - (1 - 
p2^d)^L)` and in our example `(0.8, 0.2) => (0.8785, 0.0064)` for L = 4, d = 4.
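
As a quick check of this arithmetic, here is a minimal Python sketch. The function name `and_then_or` is made up for illustration and is not part of the Spark API, and the formula assumes the `d` sub-hashes in each table and the `L` tables are all independent.

```python
def and_then_or(p: float, d: int, L: int) -> float:
    """Collision probability after AND over d hashes, then OR over L tables.

    Assumes d * L independent base hashes, each colliding with probability p.
    """
    p_and = p ** d                    # all d sub-hashes of one g(x) must match
    return 1.0 - (1.0 - p_and) ** L   # at least one of the L tables must match


if __name__ == "__main__":
    for p in (0.8, 0.2):
        print(p, "->", round(and_then_or(p, d=4, L=4), 4))
```

With `d = 4` and `L = 4` this gives roughly 0.8785 for the "good" probability and 0.0064 for the "bad" one.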

The current implementation is equivalent to the `L = 1` case always, and 
`outputDim` corresponds to `d`. The concern I have with the RandomProjection 
API right now is that if we extend to offer arbitrary `L` then our models do 
not store just a d-dimensional array of random vectors but more like a `L x d` 
matrix of random vectors. And we would have `hashFunctions` instead of 
`hashFunction` (though this is still private). One question I have is - why do 
we expose `randUnitVectors` at all? I feel it leaves us more room for changes 
in the future if we do not expose it, especially considering the points I just 
made. There may be some reason to expose it that I haven't thought of though. 
What do we think about changing it to private?

I like the idea of changing `outputDim` to something related to 
OR-amplification a lot. I think minhash is done properly right now but the 
`hashDistance` measure doesn't make sense as already discussed. Right now, I'd 
like to focus on making sure we don't corner ourselves with the API since 
internal algo details and documentation can always be changed later.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148

It sounds like discussions are converging, but I want to confirm a few
things + make a few additions.

### Amplification

Is this agreed?
* Approx neighbors and similarity are doing OR-amplification when comparing
hash values, as described in the [Wikipedia
article](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Stable_distributions).
This is computing an amplified hash function *implicitly*.
* transform() is not doing amplification. It outputs the value of a
collection of hash functions, rather than aggregating them to do amplification.
* This is my main question: Is amplification ever done explicitly, and
when would you ever need that?

Adding combined AND and OR amplification in the future sounds good to me.
My main question right now is whether we need to adjust the API before the 2.1
release. I don't see a need to, but please comment if you see an issue with
the current API.
* One possibility: We could rename outputDim to something specific to
OR-amplification.

Terminology: For LSH, "dimensionality" = "number of hash functions" and is
relevant only for amplification. Do you agree? I have yet to see a hash
function used for LSH which does not have a discrete set.

### Random Projection

I agree this should be renamed to something like "PStableHashing." My
apologies for not doing enough background research to disambiguate.

### MinHash

I think this is implemented correctly, according to the reference given in
the linked Wikipedia article.
* [This
reference](https://github.com/apache/spark/blob/8f0ea011a7294679ec4275b2fef349ef45b6eb81/mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala#L36)
to perfect hash functions may be misleading. I'd prefer to remove it.

### hashDistance

Rethinking this, I am unsure about what function we should use. Currently,
hashDistance is only used by approxNearestNeighbors. Since
approxNearestNeighbors sorts by hashDistance, using a soft measure might be
better than what we currently have:
* MinHash
  * Currently: Uses OR-amplification for single probing, and something odd
for multiple probing
  * Best option for approxNearestNeighbors: [this Wikipedia
section](https://en.wikipedia.org/wiki/MinHash#Variant_with_many_hash_functions),
which is equivalent to OR-amplification when using single probing. I.e.,
replace [this line of
code](https://github.com/apache/spark/blob/8f0ea011a7294679ec4275b2fef349ef45b6eb81/mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala#L79)
with: ```x.toDense.values.zip(y.toDense.values).count(pair => pair._1 ==
pair._2).toDouble / x.size```
* RandomProjection
  * Currently: Uses OR-amplification for single probing, and something
reasonable for multiple probing
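
For illustration, the "soft" measure suggested for MinHash can be sketched in Python as the fraction of positions where two hash vectors agree. The function names are hypothetical, and turning the fraction into a distance via `1 - fraction` is an assumption made here so that smaller values sort first; the comment above only gives the matching fraction itself.

```python
def matching_fraction(x, y):
    """Fraction of positions where two equal-length hash vectors agree."""
    if len(x) != len(y) or not x:
        raise ValueError("hash vectors must be non-empty and of equal length")
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def soft_hash_distance(x, y):
    """Assumed distance form: 0.0 when all hashes match, 1.0 when none do."""
    return 1.0 - matching_fraction(x, y)


print(matching_fraction([1, 2, 3, 4], [1, 2, 0, 4]))
print(soft_hash_distance([1, 2, 3, 4], [1, 2, 0, 4]))
```

Unlike a pure "any hash matches" test, this value distinguishes points that agree on many hashes from points that agree on only one.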

@Yunni What is the best resource you have for single vs multiple probing?
I'm wondering now if they are uncommon terms and should be renamed.


[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
@sethah Yes, that's why `outputDim` is introduced for users to trade off 
between false negative rate and running time.
During my tests, LSH without amplification can be (0.5, 0.5)-sensitive or 
even worse depending on the input distribution. Even in that case, `outputDim = 4` 
or `outputDim = 5` already gives very good accuracy. And the number of rows 
being scanned should be proportional to  `outputDim * averageBucketSize`.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-07 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15148
  
Ok, I'm looking more closely at this algorithm versus the literature. I 
agree that there is a lot of inconsistent terminology which is probably leading 
to some of the confusion here. 

Most or all of the LSH algorithms in the literature describe a process 
which applies a composition of AND and OR amplification. @karlhigley This is 
what the package spark-neighbors does as well, correct? AND amplification is 
applied by generating hash functions `g(x) = (h1(x), h2(x), ..., hd(x))` which 
are concatenations of several of the vanilla locality sensitive hashing 
functions. These algorithms only compare `g(x) == g(y)` for near-neighbor 
candidacy. Still, they then apply OR amplification by using `L` of these 
hashing functions and accepting a point as a candidate if any of the `g_i for i 
= 1 to L` hash functions fall into the same bucket as the query point. 

In this patch we only apply OR amplification by generating a single `g(x) = 
(h1(x), h2(x), ..., hd(x))` and we consider candidates if any of the `h_i for i 
= 1 to d` match. For a `(p1, p2)`-sensitive hashing family, this OR 
amplification transforms it into a `(1 - (1 - p1)^d, 1 - (1 - p2)^d)` family, 
where p1 is the probability of a "good" collision and p2 is the probability of 
a "bad" collision. Consider a `(0.8, 0.2)` hash family where we apply OR 
amplification with a dimension `d = 10`. We will transform this into a 
`(0.9999999, 0.893)`-sensitive family. Basically, we amplify both the good and 
the bad collisions. If instead we implement the composition of **AND then OR** 
amplification as in the literature (for example, a 4-way AND followed by a 
4-way OR), we transform a `(0.8, 0.2)`-sensitive family into a 
`(0.8785, 0.0064)` family. In this way, we amplify the "good" collision and 
dampen the "bad" collision probabilities. If this is correct, then I think the 
current implementation will end up selecting most of the points as candidates, 
which may hurt runtime performance. 
[This reference](http://web.stanford.edu/class/cs345a/slides/05-LSH.pdf) sums 
it up nicely IMO. 
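
The arithmetic can be checked with a short sketch. The helper names are hypothetical, and `r = 4`, `b = 4` for the AND-then-OR case are assumptions: the comment does not state them, but they reproduce the AND-then-OR figures it quotes.

```python
def and_amplify(p: float, r: int) -> float:
    """All r independent hashes must collide."""
    return p ** r


def or_amplify(p: float, b: int) -> float:
    """At least one of b independent hashes must collide."""
    return 1.0 - (1.0 - p) ** b


def and_then_or(p: float, r: int, b: int) -> float:
    """r-way AND inside each of b tables, then OR across the tables."""
    return or_amplify(and_amplify(p, r), b)


p1, p2 = 0.8, 0.2

# OR only, d = 10: both probabilities move toward 1.
print(or_amplify(p1, 10), or_amplify(p2, 10))

# AND then OR with r = 4, b = 4 (assumed): p1 stays high, p2 collapses.
print(and_then_or(p1, 4, 4), and_then_or(p2, 4, 4))
```

OR alone pushes the "bad" collision probability from 0.2 to roughly 0.89, whereas the composition keeps it below 0.01 while the "good" probability stays near 0.88.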

I will look into testing this out more concretely.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-06 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
@sethah I think you are right. OR-amplification is only applied inside NN 
search and similarity join through `hashDistance` and `explode`. `transform` 
itself does not apply amplifications.

Sorry I missed this. I will clarify it in the user guide, and I would be happy 
to see the PR you send to fix the documentation. @jkbradley @MLnick 



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-06 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15148
  
@karlhigley Thanks for your detailed response. From the amplification 
section on 
[Wikipedia](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Amplification),
 it is pretty clear to me that this implementation is not doing OR/AND 
amplification. `outputDim` is just the number of concatenated random hash 
functions (`k` in the wiki article). 

For now we can clarify some of this a bit better in the documentation, and 
perhaps in the future we can extend this implementation to use optional AND/OR 
amplification. I can work on a PR for it this week, unless there are any 
objections. @jkbradley @Yunni @MLnick ?



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-05 Thread karlhigley

Github user karlhigley commented on the issue:

https://github.com/apache/spark/pull/15148
  
@sethah: I think you're right that there's a discrepancy here, and I'm 
embarrassed that I didn't see it when I first reviewed the PR. On a reread of 
the source and your comment above, it looks like the LSH models in this PR use 
a single hash function to compute a single hash table, which doesn't match my 
understanding of OR-amplification.  For OR-amplification, multiple hash 
functions would be applied to compute multiple hash tables, and points placed 
in the same bucket in any hash table would be considered candidate neighbors.
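
To make the multiple-table picture concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the function names, the bucketed random projection used for each component, and the parameters `k`, `w`, and the number of tables are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def make_compound_hash(dim, k, w, rng):
    """Build one compound hash g(x) = (h1(x), ..., hk(x)).

    Each component is a bucketed random projection floor((a . x + b) / w);
    this choice of family is an assumption made for illustration.
    """
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    offsets = [rng.uniform(0.0, w) for _ in range(k)]

    def g(x):
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, x)) + b) / w)
            for a, b in zip(vectors, offsets)
        )

    return g


def build_tables(points, hashes):
    """One hash table per compound hash: bucket key -> set of point indices."""
    tables = []
    for g in hashes:
        table = defaultdict(set)
        for idx, x in enumerate(points):
            table[g(x)].add(idx)
        tables.append(table)
    return tables


def candidates(query, hashes, tables):
    """OR-amplification: a point is a candidate if it shares a bucket
    with the query in ANY of the tables."""
    found = set()
    for g, table in zip(hashes, tables):
        found |= table.get(g(query), set())
    return found


rng = random.Random(0)
points = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(200)]
hashes = [make_compound_hash(dim=3, k=2, w=4.0, rng=rng) for _ in range(5)]
tables = build_tables(points, hashes)

one_table = candidates(points[0], hashes[:1], tables[:1])
all_tables = candidates(points[0], hashes, tables)
print(len(one_table), len(all_tables))
```

Adding tables can only grow the candidate set, which is the OR step; the tuple key inside each table is the AND step, since every component must match for two points to share a bucket.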

From the 
[comments](https://github.com/apache/spark/pull/15148/files#diff-e3391977ca23d69ff7201c8bdcd88437R36),
 it looks like the discrepancy might be due to some confusion between the 
number of hash functions applied and the dimensionality of the hash functions. 
This is a subtle point that I was confused about too, and it took me quite a 
while to work it out because different authors use the term "hash function" to 
refer to different things at different levels of abstraction. In one sense (at 
a lower level), a random projection is made up of many component hash 
functions, but in another sense (at a higher level) a random projection 
represents a single hash function for the purposes of OR-amplification.

Given that the PR has already been merged, I concur that the best way 
forward is to adjust the comments and documentation. That probably involves 
changing the references to OR-amplification to simply refer to the 
dimensionality of the hash function.

On the other issue you mentioned regarding mismatches between what's 
implemented and the linked documents, I think some of that confusion also stems 
from inconsistent terminology in the source material. LSH based on p-stable 
distributions (for Euclidean distance) does involve random projections, 
although the authors don't directly say so in the paper. There's a somewhat 
similar LSH method for cosine distance that's sometimes referred to as "sign 
random projection" (though the authors of the paper don't use that term 
either). Sign random projection is what the "Random Projection" section of the 
Wikipedia page is referring to; what's implemented here looks like LSH based 
on p-stable distributions. Maybe one way to clarify would be to name the models 
after the distance measures they're intended to approximate, and provide 
explanations of the methods they use in the comments?
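
For readers trying to tell the two methods apart, here is a minimal sketch of each hash function as it is usually described. The function names and the parameters `a`, `b`, and `w` are illustrative assumptions, and this is not the code in the PR.

```python
import math
import random


def dot(a, x):
    return sum(a_i * x_i for a_i, x_i in zip(a, x))


def pstable_hash(x, a, b, w):
    """Bucketed random projection (Euclidean distance).

    a is drawn from a 2-stable (Gaussian) distribution, b is uniform in
    [0, w), and w is the bucket width.
    """
    return math.floor((dot(a, x) + b) / w)


def sign_random_projection_hash(x, a):
    """Sign random projection (cosine distance): keep only the side of the
    random hyperplane that x falls on."""
    return 1 if dot(a, x) >= 0 else 0


rng = random.Random(42)
a = [rng.gauss(0.0, 1.0) for _ in range(2)]
b = rng.uniform(0.0, 4.0)
x = [1.0, 2.0]
scaled = [10.0 * v for v in x]

print(pstable_hash(x, a, b, 4.0), pstable_hash(scaled, a, b, 4.0))
print(sign_random_projection_hash(x, a), sign_random_projection_hash(scaled, a))
```

The sign hash depends only on the direction of `x`, so scaling a vector by a positive constant never changes it, while the bucketed hash depends on magnitude as well. That difference is why the two approximate different distance measures.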









[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-11-05 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15148
  
I apologize for coming late to this, but I am taking a look at some of the 
documentation now. For the `RandomProjection` class there are two links: one to 
the Wikipedia entry on stable distributions and one to a survey paper. The 
Wikipedia link points to the "stable distributions" section even though the 
article also has a section on random projections, which is supposedly the 
algorithm used here. The paper has a "Random Projection" section as well, but 
neither of the Random Projection methods in the links matches the code here. I 
expressed this concern before. The approach in the `RandomProjection` class 
does not match either the "Random Projection" method or the "P-Stable 
distribution" methods that I find in the literature. 

I summarized this in a comment way up towards the top. If this method is 
some well-accepted hybrid of the two, fine, but I think the references would 
leave users quite confused. I think it's nice to have certainty about the 
practical effectiveness of this method since it has already been deployed in 
industry, so my main concern is really just documentation. Right now, we're 
linking to sources which describe distinctly different algorithms than what we 
have implemented. Thoughts? 

For convenience, some references: 
* http://cseweb.ucsd.edu/~dasgupta/254-embeddings/lawrence.pdf
* 
https://en.wikipedia.org/wiki/Locality-sensitive_hashing#LSH_algorithm_for_nearest_neighbor_search
* https://people.csail.mit.edu/indyk/p117-andoni.pdf



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-28 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
Awesome! Thanks Joseph and thanks everyone else for reviewing this!



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-28 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148
  
This LGTM now.  Any other comments from other reviewers?  I'll merge this, 
but we can follow up as needed.

Thanks very much @Yunni for the PR and everyone else for helping to review!

Merging with master



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-28 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67721/consoleFull)**
 for PR 15148 at commit 
[`3570845`](https://github.com/apache/spark/commit/35708458a0ee156c097ca604efeafaa37d3c8a6d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-28 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67721/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-28 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-28 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67721 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67721/consoleFull)**
 for PR 15148 at commit 
[`3570845`](https://github.com/apache/spark/commit/35708458a0ee156c097ca604efeafaa37d3c8a6d).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67688/consoleFull)**
 for PR 15148 at commit 
[`97e1238`](https://github.com/apache/spark/commit/97e1238ddf14938539237facf354e0ce4fc4ed1c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67688/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67688/consoleFull)**
 for PR 15148 at commit 
[`97e1238`](https://github.com/apache/spark/commit/97e1238ddf14938539237facf354e0ce4fc4ed1c).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67683/
Test FAILed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67683/consoleFull)**
 for PR 15148 at commit 
[`6cda936`](https://github.com/apache/spark/commit/6cda936cf2c14f3e4c0e164b0d688fd4c8996b5d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67683 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67683/consoleFull)**
 for PR 15148 at commit 
[`6cda936`](https://github.com/apache/spark/commit/6cda936cf2c14f3e4c0e164b0d688fd4c8996b5d).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15148
  
Only 2 comments remain, I believe:
* I'd still like to remove the default outputCol value
* Discussion about approxNearestNeighbors internals (in comments above)



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67676/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67676 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67676/consoleFull)**
 for PR 15148 at commit 
[`9a3704c`](https://github.com/apache/spark/commit/9a3704c6252c842c750c8cf98b0271ab51e3d44e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67676 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67676/consoleFull)**
 for PR 15148 at commit 
[`9a3704c`](https://github.com/apache/spark/commit/9a3704c6252c842c750c8cf98b0271ab51e3d44e).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67668/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67668 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67668/consoleFull)**
 for PR 15148 at commit 
[`9bb3fd6`](https://github.com/apache/spark/commit/9bb3fd607519d245f72afedf95def63e0e7400a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67665/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67665 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67665/consoleFull)**
 for PR 15148 at commit 
[`20a9ebf`](https://github.com/apache/spark/commit/20a9ebf03d9bd1d32ea46454352a2ae5500ad5ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67668 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67668/consoleFull)**
 for PR 15148 at commit 
[`9bb3fd6`](https://github.com/apache/spark/commit/9bb3fd607519d245f72afedf95def63e0e7400a7).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67665 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67665/consoleFull)**
 for PR 15148 at commit 
[`20a9ebf`](https://github.com/apache/spark/commit/20a9ebf03d9bd1d32ea46454352a2ae5500ad5ea).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-26 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67609/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-26 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-26 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67609 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67609/consoleFull)**
 for PR 15148 at commit 
[`1c4b9fb`](https://github.com/apache/spark/commit/1c4b9fb6821d5f86037a5f55976a72e85cb2440b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-26 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
Thanks @jkbradley. I have made several changes to the unit tests. Please let 
me know if I missed anything.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-26 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67609 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67609/consoleFull)**
 for PR 15148 at commit 
[`1c4b9fb`](https://github.com/apache/spark/commit/1c4b9fb6821d5f86037a5f55976a72e85cb2440b).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67401/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67401 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67401/consoleFull)**
 for PR 15148 at commit 
[`e14f73e`](https://github.com/apache/spark/commit/e14f73e8a49d409e09a6ed541d4b40f07dc81013).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67401/consoleFull)**
 for PR 15148 at commit 
[`e14f73e`](https://github.com/apache/spark/commit/e14f73e8a49d409e09a6ed541d4b40f07dc81013).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67398/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67398 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67398/consoleFull)**
 for PR 15148 at commit 
[`cad4ecb`](https://github.com/apache/spark/commit/cad4ecb3cea47e16b9c1073d30d8fd57bc397621).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
Thanks @jkbradley. I have removed BitSampling and SignRandomProjection from 
this PR; they will go into a follow-up PR.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-22 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67398/consoleFull)**
 for PR 15148 at commit 
[`cad4ecb`](https://github.com/apache/spark/commit/cad4ecb3cea47e16b9c1073d30d8fd57bc397621).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67055/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67055 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67055/consoleFull)**
 for PR 15148 at commit 
[`66d553a`](https://github.com/apache/spark/commit/66d553a4e2bd8c219c09e17db11962cd49114a24).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67055 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67055/consoleFull)**
 for PR 15148 at commit 
[`66d553a`](https://github.com/apache/spark/commit/66d553a4e2bd8c219c09e17db11962cd49114a24).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66914/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66914 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66914/consoleFull)**
 for PR 15148 at commit 
[`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66914 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66914/consoleFull)**
 for PR 15148 at commit 
[`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread MLnick

Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/15148
  
jenkins retest this please



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread Yunni

Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/15148
  
I have no idea how to fix this MiMa test failure. Could anyone give me a clue?
```
java.lang.ArrayIndexOutOfBoundsException: 1660
at com.typesafe.tools.mima.core.BufferReader.nextByte(BufferReader.scala:33)
at com.typesafe.tools.mima.core.ClassfileParser$ConstantPool.<init>(ClassfileParser.scala:91)
at com.typesafe.tools.mima.core.ClassfileParser.parseAll(ClassfileParser.scala:67)
at com.typesafe.tools.mima.core.ClassfileParser.parse(ClassfileParser.scala:59)
at com.typesafe.tools.mima.core.ClassInfo.ensureLoaded(ClassInfo.scala:86)
at com.typesafe.tools.mima.core.ClassInfo.methods(ClassInfo.scala:101)
at com.typesafe.tools.mima.core.ClassInfo$$anonfun$lookupClassMethods$2.apply(ClassInfo.scala:123)
at com.typesafe.tools.mima.core.ClassInfo$$anonfun$lookupClassMethods$2.apply(ClassInfo.scala:123)
```



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66873 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66873/consoleFull)**
 for PR 15148 at commit 
[`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66873/
Test FAILed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66873 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66873/consoleFull)**
 for PR 15148 at commit 
[`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66800/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66800 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66800/consoleFull)**
 for PR 15148 at commit 
[`1b63173`](https://github.com/apache/spark/commit/1b6317396629b9f290a279dd735923c0fc8efd89).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BitSampling(override val uid: String) extends LSH[BitSamplingModel]`



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66800 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66800/consoleFull)**
 for PR 15148 at commit 
[`1b63173`](https://github.com/apache/spark/commit/1b6317396629b9f290a279dd735923c0fc8efd89).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66774/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66774 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)**
 for PR 15148 at commit 
[`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66774 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)**
 for PR 15148 at commit 
[`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66717/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66717/consoleFull)** for PR 15148 at commit [`2c95e5c`](https://github.com/apache/spark/commit/2c95e5c1d89e2db0350b5d8667e2ae8d293df7a9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MinHash(override val uid: String) extends LSH[MinHashModel] with HasSeed`



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66717/consoleFull)** for PR 15148 at commit [`2c95e5c`](https://github.com/apache/spark/commit/2c95e5c1d89e2db0350b5d8667e2ae8d293df7a9).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66677/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66677/consoleFull)** for PR 15148 at commit [`40d1f1b`](https://github.com/apache/spark/commit/40d1f1b077232a8feeb2dd66d9b846ded1839e63).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66677/consoleFull)** for PR 15148 at commit [`40d1f1b`](https://github.com/apache/spark/commit/40d1f1b077232a8feeb2dd66d9b846ded1839e63).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/4/
Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #4 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/4/consoleFull)** for PR 15148 at commit [`142d8e9`](https://github.com/apache/spark/commit/142d8e96f7c7e5ef80b3fe11ada1be9cd499bc8a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #4 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/4/consoleFull)** for PR 15148 at commit [`142d8e9`](https://github.com/apache/spark/commit/142d8e96f7c7e5ef80b3fe11ada1be9cd499bc8a).



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66659/
Test FAILed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66659/consoleFull)** for PR 15148 at commit [`efe323c`](https://github.com/apache/spark/commit/efe323cd69b87cea6a19d39be0e480e9322b5fe5).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MinHash(override val uid: String) extends LSH[MinHashModel] with MinHashParams`
  * `class RandomProjection(override val uid: String) extends LSH[RandomProjectionModel]`


