GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/3192
[FLINK-1731][ml] Add KMeans clustering (Lloyd's algorithm)

This is a break-off from https://github.com/apache/flink/pull/757 that adds Lloyd's algorithm first. I will follow this up with initialization schemes in the PR linked above.

To address a few comments from the previous PR: we cannot use `DataSet[LabeledVector]` instead of `DataSet[Seq[LabeledVector]]`, because the model here is of type `Seq[LabeledVector]` and the semantics of the pipeline require it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachingoel0101/flink kmeans

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3192.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3192

----
commit 598f1ea9b4a0e1daf1f151c8b69c88bf83224f71
Author: Peter Schrott <peter.schrot...@gmail.com>
Date:   2015-07-29T22:44:54Z

    [FLINK-1731][ml]Added KMeans algorithm to ML library

commit d70c46e71e152b374c9b3f23c9d0bd006bf503ff
Author: Florian Goessler <m...@floriangoessler.de>
Date:   2015-07-29T22:50:22Z

    [FLINK-1731][ml]Added unit tests for KMeans algorithm
----
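For readers unfamiliar with Lloyd's algorithm, here is a minimal, framework-free sketch in plain Scala. It is not the code from this PR and does not use Flink's `DataSet` API. It assumes Euclidean distance, a fixed number of iterations, and caller-supplied initial centroids. The names `LloydSketch`, `Centroid`, `step`, and `fit` are hypothetical; `Centroid` merely plays the role that a `LabeledVector` plays in the PR, with the label acting as the cluster id. The sketch illustrates the typing argument above: one trained model value is the whole set of centroids (a `Seq`), not a single labeled vector.

```scala
object LloydSketch {
  // Hypothetical stand-in for a labeled centroid: `label` is the cluster id.
  final case class Centroid(label: Double, coords: Vector[Double])

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Assignment step: the label of the centroid closest to point p.
  def nearest(p: Vector[Double], model: Seq[Centroid]): Double =
    model.minBy(c => sqDist(p, c.coords)).label

  // One Lloyd iteration: assign every point to its nearest centroid,
  // then move each centroid to the mean of the points assigned to it.
  def step(points: Seq[Vector[Double]], model: Seq[Centroid]): Seq[Centroid] = {
    val clusters = points.groupBy(p => nearest(p, model))
    model.map { c =>
      clusters.get(c.label) match {
        case Some(members) =>
          val sums = members.foldLeft(Vector.fill(c.coords.length)(0.0)) { (acc, p) =>
            acc.zip(p).map { case (x, y) => x + y }
          }
          Centroid(c.label, sums.map(_ / members.size))
        case None => c // an empty cluster keeps its previous position
      }
    }
  }

  // The "model" is the entire Seq of centroids after a fixed number of iterations.
  def fit(points: Seq[Vector[Double]], init: Seq[Centroid], iterations: Int): Seq[Centroid] =
    (1 to iterations).foldLeft(init)((model, _) => step(points, model))

  def main(args: Array[String]): Unit = {
    val points = Seq(Vector(0.0, 0.0), Vector(0.0, 1.0), Vector(10.0, 10.0), Vector(10.0, 11.0))
    val init = Seq(Centroid(0.0, Vector(1.0, 1.0)), Centroid(1.0, Vector(9.0, 9.0)))
    // Prints Centroid(0.0,Vector(0.0, 0.5)) and Centroid(1.0,Vector(10.0, 10.5))
    fit(points, init, 5).foreach(println)
  }
}
```

Because `fit` returns a single `Seq[Centroid]`, a distributed version would naturally carry it as one element of a `DataSet[Seq[...]]`, which matches the reasoning given for `DataSet[Seq[LabeledVector]]`.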