[ https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017828#comment-17017828 ]
Gilles Sadowski commented on MATH-1509:
---------------------------------------

bq. I'd like to contribute the code to Apache Commons Math

Thanks, and welcome.

bq. I have created a pull request [...] for reference only.

What do you mean by "for reference only"?

> Implement the MiniBatchKMeansClusterer
> --------------------------------------
>
>                 Key: MATH-1509
>                 URL: https://issues.apache.org/jira/browse/MATH-1509
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Chen Tao
>            Priority: Major
>         Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm that uses a subset of the
> points to initialize the cluster centers and mini-batches in the training
> iterations.
> It can cluster millions of data points in a few seconds, and its results
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute
> the code to Apache Commons Math, in Java of course.
> My implementation is based on Apache Commons Math3 and follows Python's
> sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data: performance
> improves significantly and the results differ little from KMeans++, but they
> differ considerably on sparse data.
>
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
> !compare.png!
>
> I have created a pull request at
> [https://github.com/apache/commons-math/pull/117], for reference only.
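
For readers unfamiliar with the algorithm named in the issue, the mini-batch update can be sketched roughly as below. This is an illustrative sketch only, not the code from the pull request and not an existing Commons Math API: the class name `MiniBatchKMeansSketch`, the method names `fit` and `nearest`, and all parameters are hypothetical. It also assumes a simplified initialization (k distinct points drawn at random), whereas the issue describes initializing the centers from a subset of the points, and it runs a fixed number of iterations with no convergence check.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; names and parameters are hypothetical.
class MiniBatchKMeansSketch {

    /** Index of the center closest to x (squared Euclidean distance). */
    static int nearest(double[][] centers, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Returns k centers after a fixed number of mini-batch updates. */
    static double[][] fit(double[][] points, int k, int batchSize,
                          int iterations, Random rng) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("need 1 <= k <= number of points");
        }
        // Simplified initialization (assumption): k distinct points chosen
        // at random through a partial Fisher-Yates shuffle of the indices.
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rng.nextInt(n - c);
            int tmp = idx[c];
            idx[c] = idx[j];
            idx[j] = tmp;
            centers[c] = points[idx[c]].clone();
        }

        int[] counts = new int[k];          // points seen so far per center
        int[] batch = new int[batchSize];   // indices of the sampled points
        int[] assigned = new int[batchSize];
        for (int it = 0; it < iterations; it++) {
            // Step 1: sample a mini-batch and assign each point to its
            // nearest center, using the centers as they are before the update.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(n);
                assigned[b] = nearest(centers, points[batch[b]]);
            }
            // Step 2: move each center toward its assigned points with a
            // per-center learning rate of 1 / (points seen so far).
            for (int b = 0; b < batchSize; b++) {
                double[] x = points[batch[b]];
                int c = assigned[b];
                counts[c]++;
                double eta = 1.0 / counts[c];
                for (int d = 0; d < x.length; d++) {
                    centers[c][d] += eta * (x[d] - centers[c][d]);
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        // Synthetic data: two Gaussian blobs around (0, 0) and (10, 10).
        double[][] points = new double[1000][2];
        for (int i = 0; i < points.length; i++) {
            double offset = (i % 2 == 0) ? 0.0 : 10.0;
            points[i][0] = offset + rng.nextGaussian();
            points[i][1] = offset + rng.nextGaussian();
        }
        System.out.println(Arrays.deepToString(fit(points, 2, 32, 100, rng)));
    }
}
```

Each iteration touches only `batchSize` points rather than the whole data set, which is where the speed-up described in the issue would come from. Because the batches are random samples, the resulting centers can differ from those of a full KMeans run.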