[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

2018-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22032


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

2018-08-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22032#discussion_r208642160
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
 
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff --

Ah, I see; let me try that instead. I didn't know you could put a type
annotation on a `_` placeholder argument.





[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

2018-08-08 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22032#discussion_r208637385
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
 
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff --

Not really: you had `hashFunction(_)`, which doesn't work.
`hashFunction(_: Vector)` or `hashFunction _` should work instead.
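
For illustration, the forms discussed here can be sketched outside Spark. This is only a minimal sketch, not code from the PR: `Vec`, `register`, and the object name are hypothetical stand-ins (`register` mimics an API that accepts an untyped `AnyRef`, as the two-argument `udf` overload does), and the body of `hashFunction` is made up. It shows only the forms reported to work; the `hashFunction(_)` failure under Scala 2.11 is taken from the thread and is not reproduced here.

```scala
object MethodToFunctionSketch {
  // Hypothetical stand-in for org.apache.spark.ml.linalg.Vector.
  type Vec = Seq[Double]

  // A method, as hashFunction is after this PR (the body is invented for the example).
  def hashFunction(v: Vec): Array[Vec] = Array(v.map(_ * 2.0))

  // Hypothetical API that takes an untyped function object and casts it back.
  def register(f: AnyRef): Vec => Array[Vec] = f.asInstanceOf[Vec => Array[Vec]]

  // Form used in the diff: an explicit lambda with a typed parameter.
  val viaLambda: Vec => Array[Vec] = register({ v: Vec => hashFunction(v) })

  // Placeholder syntax with a type ascription on `_`.
  val viaTypedPlaceholder: Vec => Array[Vec] = register(hashFunction(_: Vec))

  // Eta-expansion: convert the method itself into a function value.
  val viaEta: Vec => Array[Vec] = register(hashFunction _)

  def main(args: Array[String]): Unit = {
    val input: Vec = Seq(1.0, 2.0)
    // All three function values delegate to the same method.
    val results = Seq(viaLambda, viaTypedPlaceholder, viaEta).map(f => f(input).toSeq)
    println(results.forall(_ == Seq(Seq(2.0, 4.0))))
  }
}
```

All three produce a `Function1` that delegates to the method, so the choice between them is mostly a matter of style and of what a given compiler version accepts at the call site.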





[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

2018-08-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22032#discussion_r208608273
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
 
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff --

Yeah, that's what I had. It didn't compile in Scala 2.11 for some reason 
(see the "fails to build" build above). This seemed to work though.





[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

2018-08-08 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22032#discussion_r208599920
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
 
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff --

nit: why not `hashFunction _`?





[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

2018-08-07 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22032#discussion_r208425588
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -75,7 +75,7 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
* The hash function of LSH, mapping an input feature vector to multiple hash vectors.
* @return The mapping of LSH function.
*/
-  protected[ml] val hashFunction: Vector => Array[Vector]
+  protected[ml] def hashFunction(elems: Vector): Array[Vector]
--- End diff --

This change does appear to resolve the issue by avoiding whatever is
happening in these two cases. That seems reasonable, since the problem
appears to lie with Scala + Java 8 rather than with Spark.

However, I do wonder whether MiMa will allow this change. It is still
exposed as a function in the bytecode, so maybe. If not, it will be a tough
call whether this experimental class is allowed to change.
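
To make the shape of the change concrete, here is a minimal sketch of a function-valued field next to the equivalent method. It is not the Spark code: `Vec`, the class names, and the method bodies are hypothetical, chosen only so the example is self-contained.

```scala
object ValVersusDefSketch {
  // Hypothetical stand-in for org.apache.spark.ml.linalg.Vector.
  type Vec = Seq[Double]

  // Before: an abstract function-valued field; each instance stores a Function1 object.
  abstract class FieldStyle {
    val hashFunction: Vec => Array[Vec]
  }

  // After: an abstract method; no function object is stored on the instance.
  abstract class MethodStyle {
    def hashFunction(elems: Vec): Array[Vec]
  }

  class FieldImpl extends FieldStyle {
    val hashFunction: Vec => Array[Vec] = v => Array(v.reverse)
  }

  class MethodImpl extends MethodStyle {
    def hashFunction(elems: Vec): Array[Vec] = Array(elems.reverse)
  }

  // Direct calls are written the same way against either declaration.
  def callField(m: FieldStyle, v: Vec): Array[Vec] = m.hashFunction(v)
  def callMethod(m: MethodStyle, v: Vec): Array[Vec] = m.hashFunction(v)

  // Used as a value: the field already is a function, the method must be converted.
  def fieldAsFunction(m: FieldStyle): Vec => Array[Vec] = m.hashFunction
  def methodAsFunction(m: MethodStyle): Vec => Array[Vec] = m.hashFunction _

  def main(args: Array[String]): Unit = {
    val v: Vec = Seq(1.0, 2.0, 3.0)
    println(callField(new FieldImpl, v).toSeq == callMethod(new MethodImpl, v).toSeq)
  }
}
```

Source that calls `hashFunction(v)` directly is unaffected, while code that passes `hashFunction` around as a value needs a conversion, which is why the `udf(...)` call site changes in this PR. The two declarations also compile to different JVM method signatures (a no-argument accessor returning `Function1` versus a one-argument method); whether MiMa accepts that difference for this class is not something the sketch can show.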





[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

2018-08-07 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/22032

[SPARK-25047][ML] Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

## What changes were proposed in this pull request?

Convert two function fields in ML classes to simple functions to avoid an
odd SerializedLambda deserialization problem.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-25047

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22032


commit 9fa90804f7216898d31cc83d477a39686df40bde
Author: Sean Owen 
Date:   2018-08-08T00:31:28Z

Convert two function fields in ML classes to simple functions to avoid odd 
SerializedLambda deserialization problem



