[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22032

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22032#discussion_r208642160

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
   override def transform(dataset: Dataset[_]): DataFrame = {
     transformSchema(dataset.schema, logging = true)
-    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+    val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff ---

Ah, I see; let me try that instead. I didn't know you could put a type on a `_` placeholder arg.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22032#discussion_r208637385

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
   override def transform(dataset: Dataset[_]): DataFrame = {
     transformSchema(dataset.schema, logging = true)
-    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+    val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff ---

Not really: you had `hashFunction(_)`, which doesn't work. `hashFunction(_: Vector)` or `hashFunction _` should work instead.
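[Editor's note] The distinction the reviewers are drawing can be shown with a small self-contained sketch in plain Scala. `Double` stands in for Spark's `Vector` type so no Spark dependency is needed, and all names here are hypothetical, not from the PR:

```scala
object EtaExpansionDemo {
  // Hypothetical stand-in for LSHModel.hashFunction: a method (def), not a function value.
  def hashFunction(v: Double): Array[Double] = Array(v, v * 2)

  // Writing just `hashFunction(_)` with no expected type can fail to compile,
  // because the compiler cannot infer the placeholder's type. These forms work:
  val typedPlaceholder: Double => Array[Double] = hashFunction(_: Double)
  val etaExpanded: Double => Array[Double] = hashFunction _
  val explicitLambda: Double => Array[Double] = { v: Double => hashFunction(v) } // the form the PR uses
}
```

All three produce equivalent `Double => Array[Double]` function values; they differ only in how the method is converted to a function.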
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22032#discussion_r208608273

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema, logging = true)
-    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+    val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff ---

Yeah, that's what I had. It didn't compile in Scala 2.11 for some reason (see the "fails to build" result above). This seemed to work, though.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22032#discussion_r208599920

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema, logging = true)
-    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
+    val transformUDF = udf({ v: Vector => hashFunction(v) },
--- End diff ---

nit: why not `hashFunction _`?
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22032#discussion_r208425588

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -75,7 +75,7 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
  * The hash function of LSH, mapping an input feature vector to multiple hash vectors.
  * @return The mapping of LSH function.
  */
-  protected[ml] val hashFunction: Vector => Array[Vector]
+  protected[ml] def hashFunction(elems: Vector): Array[Vector]
--- End diff ---

This change does appear to resolve the issue by avoiding whatever is happening in these two cases. This is at least reasonable, as it appears to be something to do with Scala + Java 8 rather than Spark itself. However, I do wonder whether MiMa will allow this change. It is still exposed as a function in the bytecode, so maybe. If not, it will be a tough call whether this experimental class is allowed to change.
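[Editor's note] A minimal sketch of the val-to-def change under discussion, in plain Scala. `Double` stands in for `org.apache.spark.ml.linalg.Vector` so the snippet is self-contained, and the class names are hypothetical:

```scala
// Before: a function-valued field. Its value is a lambda object, which is what
// ends up in the serialized model state and is the kind of thing that can fail
// a SerializedLambda-to-Function1 cast on deserialization.
abstract class ModelBefore extends Serializable {
  protected val hashFunction: Double => Array[Double]
}

// After: a plain method. It compiles to an ordinary JVM method on the class,
// so no separate lambda object is part of the serialized state.
abstract class ModelAfter extends Serializable {
  protected def hashFunction(elems: Double): Array[Double]
}

class ConcreteAfter extends ModelAfter {
  override protected def hashFunction(elems: Double): Array[Double] =
    Array(elems, elems + 1)

  // Call sites can still treat the method as a function where one is needed,
  // e.g. via `hashFunction _` or `{ v: Double => hashFunction(v) }`.
  def hash(v: Double): Array[Double] = hashFunction(v)
}
```

As the comment above notes, the method is still exposed as a function-like member in the bytecode, which is why MiMa might still accept the change.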
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/22032
[SPARK-25047][ML] Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

## What changes were proposed in this pull request?

Convert two function fields in ML classes to simple functions to avoid an odd SerializedLambda deserialization problem.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srowen/spark SPARK-25047
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22032.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #22032

commit 9fa90804f7216898d31cc83d477a39686df40bde
Author: Sean Owen
Date: 2018-08-08T00:31:28Z
Convert two function fields in ML classes to simple functions to avoid odd SerializedLambda deserialization problem