[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-10 Thread jkbradley
Github user jkbradley closed the pull request at:

https://github.com/apache/spark/pull/21274


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-09 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21274#discussion_r187234165
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
 ---
@@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] (
   dataset.schema($(idCol)).dataType match {
 case _: LongType =>
   uncastPredictions
+case _: IntegerType =>
+  uncastPredictions.withColumn($(idCol), 
col($(idCol)).cast(LongType))
--- End diff --

Shouldn't it be 
` case _: IntegerType =>
+  uncastPredictions.withColumn($(idCol), 
col($(idCol)).cast(IntegerType))
`
Otherwise it is not necessary for casting. right? Because prediction 
already has id as Long type and dataset has id as IntegerType. So, we need to 
cast prediction.id to IntegerType. right?
Correct me if I am wrong.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21274#discussion_r187159250
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
 ---
@@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] (
   dataset.schema($(idCol)).dataType match {
 case _: LongType =>
   uncastPredictions
+case _: IntegerType =>
+  uncastPredictions.withColumn($(idCol), 
col($(idCol)).cast(LongType))
 case otherType =>
-  
uncastPredictions.select(col($(idCol)).cast(otherType).alias($(idCol)))
+  throw new IllegalArgumentException(s"PowerIterationClustering 
had an unexpected error: " +
+s"ID col was found to be of type $otherType, despite initial 
schema checks.  Please " +
--- End diff --

nit: ${otherType.simpleString}


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21274#discussion_r186986006
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
 ---
@@ -232,7 +232,7 @@ class PowerIterationClustering private[clustering] (
 case _: LongType =>
   uncastPredictions
 case otherType =>
--- End diff --

Why not directly use
`case intType: IntType: ` so that make a stronger restriction ?
Or do we need to support other types besides int/long ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-08 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/21274

[SPARK-24213][ML] Fix for Int id type for PowerIterationClustering in 
spark.ml

## What changes were proposed in this pull request?

PIC in spark.ml has tests for "id" type IntegerType, but those tests did 
not fully check the result.  This patch:
* fixes the unit test to break for the existing implementation
* fixes the implementation

## How was this patch tested?

Existing unit tests, with fixes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark SPARK-24213

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21274


commit cd02df5a995ac7daa2e76dcc13c54b987265b6e4
Author: Joseph K. Bradley 
Date:   2018-05-08T22:02:33Z

fix for Int id type for PowerIterationClustering in spark.ml




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org