Derrick Burns created SPARK-5405:
------------------------------------

             Summary: Spark clusterer should support high dimensional data
                 Key: SPARK-5405
                 URL: https://issues.apache.org/jira/browse/SPARK-5405
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
    Affects Versions: 1.2.0
            Reporter: Derrick Burns

The MLlib clusterer works well for low-dimensional data (fewer than 200 dimensions). However, its running time grows linearly with the number of dimensions, so for practical purposes it is not very useful for high-dimensional data.

Depending on the data type, one can embed high-dimensional data into a lower-dimensional space in a distance-preserving way. The Spark clusterer should support such embeddings.
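
To illustrate what a distance-preserving embedding could look like, here is a minimal sketch of one well-known option, a Gaussian random projection, which approximately preserves Euclidean distances. This is only an assumption about the kind of embedding meant: the issue does not name a technique. The helper names (`make_projection`, `project`, `euclidean`) are hypothetical and are not part of MLlib, and the sketch is plain single-machine Python rather than Spark code.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of N(0, 1) entries scaled by 1/sqrt(out_dim).

    With this scaling, squared Euclidean distances are preserved in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, point):
    """Map one high-dimensional point into the lower-dimensional space."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    in_dim, out_dim = 1000, 200  # illustrative sizes, not taken from the issue
    matrix = make_projection(in_dim, out_dim, seed=1)

    rng = random.Random(2)
    a = [rng.random() for _ in range(in_dim)]
    b = [rng.random() for _ in range(in_dim)]

    original = euclidean(a, b)
    embedded = euclidean(project(matrix, a), project(matrix, b))
    # The ratio should be close to 1 when out_dim is large enough.
    print("distance ratio after projection:", embedded / original)
```

The projected points could then be handed to the clusterer in place of the originals, so that each distance computation costs time proportional to `out_dim` rather than `in_dim`. Whether Euclidean distance is the right one to preserve depends on the data type, as the description notes.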