[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...
Github user jkbradley closed the pull request at: https://github.com/apache/spark/pull/21274 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/21274#discussion_r187234165 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] ( dataset.schema($(idCol)).dataType match { case _: LongType => uncastPredictions +case _: IntegerType => + uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType)) --- End diff -- Shouldn't it be ` case _: IntegerType => + uncastPredictions.withColumn($(idCol), col($(idCol)).cast(IntegerType)) ` Otherwise it is not necessary for casting. right? Because prediction already has id as Long type and dataset has id as IntegerType. So, we need to cast prediction.id to IntegerType. right? Correct me if I am wrong. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21274#discussion_r187159250 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] ( dataset.schema($(idCol)).dataType match { case _: LongType => uncastPredictions +case _: IntegerType => + uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType)) case otherType => - uncastPredictions.select(col($(idCol)).cast(otherType).alias($(idCol))) + throw new IllegalArgumentException(s"PowerIterationClustering had an unexpected error: " + +s"ID col was found to be of type $otherType, despite initial schema checks. Please " + --- End diff -- nit: ${otherType.simpleString} --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21274#discussion_r186986006 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -232,7 +232,7 @@ class PowerIterationClustering private[clustering] ( case _: LongType => uncastPredictions case otherType => --- End diff -- Why not directly use `case intType: IntType: ` so that make a stronger restriction ? Or do we need to support other types besides int/long ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/21274 [SPARK-24213][ML] Fix for Int id type for PowerIterationClustering in spark.ml ## What changes were proposed in this pull request? PIC in spark.ml has tests for "id" type IntegerType, but those tests did not fully check the result. This patch: * fixes the unit test to break for the existing implementation * fixes the implementation ## How was this patch tested? Existing unit tests, with fixes You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkbradley/spark SPARK-24213 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21274.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21274 commit cd02df5a995ac7daa2e76dcc13c54b987265b6e4 Author: Joseph K. BradleyDate: 2018-05-08T22:02:33Z fix for Int id type for PowerIterationClustering in spark.ml --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org