GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21081
[SPARK-23975][ML] Allow Clustering to take Arrays of Double as input features

## What changes were proposed in this pull request?

- Accept multiple possible input types in validateAndTransformSchema() and computeCost() when checking the column type
- Add an if statement in transform() to support array type as featuresCol
- Add a case statement in fit() when selecting columns from the dataset

These changes will be applied to KMeans first, then to the other clustering methods.

## How was this patch tested?

A unit test is added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ludatabricks/spark-1 SPARK-23975

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21081

----
commit ed890d35ff1e9edbe2a557f68732835b3e911906
Author: Lu WANG <lu.wang@...>
Date:   2018-04-16T17:32:02Z

    add Array input support for KMeans

commit badb0cc5ca6ca69bb8e8fc0fce5ea05a4100bca0
Author: Lu WANG <lu.wang@...>
Date:   2018-04-16T17:49:00Z

    remove redundent code

----
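
The changes described in this pull request come down to accepting more than one column type for featuresCol and normalizing it to a Vector before clustering. The following is a minimal sketch of that idea, not the code from the patch: the object name `FeaturesColumnSketch` and the helper `featuresAsVector` are hypothetical, and converting with a UDF is an assumption about one way to do it.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{ArrayType, DoubleType}

// Hypothetical helper, for illustration only; not part of the Spark API.
object FeaturesColumnSketch {

  // Spark passes an array<double> column to a Scala UDF as Seq[Double].
  private val arrayToVector = udf { (values: Seq[Double]) =>
    Vectors.dense(values.toArray): Vector
  }

  /**
   * Returns a Column that yields an ML Vector whether the named column
   * already holds vectors or holds arrays of doubles.
   */
  def featuresAsVector(dataset: DataFrame, featuresCol: String): Column = {
    dataset.schema(featuresCol).dataType match {
      // Column is already a Vector column: use it unchanged.
      case VectorType => col(featuresCol)
      // Column is array<double>: convert each row to a dense Vector.
      case ArrayType(DoubleType, _) => arrayToVector(col(featuresCol))
      // Any other type is rejected, mirroring a schema validation step.
      case other =>
        throw new IllegalArgumentException(
          s"Column $featuresCol must be a Vector or an array of doubles, but was $other")
    }
  }
}
```

With a DataFrame `df` that has an array-of-double column named `features`, `df.select(FeaturesColumnSketch.featuresAsVector(df, "features").as("features"))` would give a Vector column that existing clustering code can consume. Dispatching on the schema's data type keeps the check and the conversion in one place, which is why a pattern match is used here.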