[ 
https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017828#comment-17017828
 ] 

Gilles Sadowski commented on MATH-1509:
---------------------------------------

bq. I'd like to contribute the code to Apache Commons Math

Thanks, and welcome.

bq. I have created a pull request [...] for reference only.

What do you mean by "for reference only"?

> Implement the MiniBatchKMeansClusterer
> --------------------------------------
>
>                 Key: MATH-1509
>                 URL: https://issues.apache.org/jira/browse/MATH-1509
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Chen Tao
>            Priority: Major
>         Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm
> that uses a subset of the points to initialize the cluster centers and
> mini-batches in the training iterations.
>  It can cluster millions of data points in a few seconds, and its results
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute
> the code to Apache Commons Math, in Java of course.
> My implementation is based on Apache Commons Math3 and modeled on Python's
> sklearn.cluster.MiniBatchKMeans.
> Through testing, I found that it works well on dense data, with a significant
> performance improvement and results that differ little from KMeans++, but the
> results differ considerably on sparse data.
>  
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
>   !compare.png!
>  
> I have created a pull request on 
> [https://github.com/apache/commons-math/pull/117], for reference only.
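
For readers unfamiliar with the technique described in the issue, here is a
minimal sketch of a mini-batch k-means loop in plain Java. It is not the code
from the pull request and does not use the Commons Math API: the class name
MiniBatchKMeansSketch, the method names and all parameters are hypothetical.
It also makes two simplifying assumptions: the centers are initialized from k
distinct randomly chosen points (rather than from a k-means++ pass over a
subset of the data), and each center is updated with a per-center learning
rate of 1/count, which is the update commonly described for mini-batch k-means.

```java
import java.util.Random;

/** Hypothetical illustration of mini-batch k-means; not the code from the pull request. */
public class MiniBatchKMeansSketch {

    /** Returns the index of the center closest to the point (squared Euclidean distance). */
    public static int nearest(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int d = 0; d < point.length; d++) {
                double diff = centers[c][d] - point[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Runs a fixed number of mini-batch iterations and returns the k centers. Assumes k <= data.length. */
    public static double[][] cluster(double[][] data, int k, int batchSize, int iterations, long seed) {
        Random rng = new Random(seed);
        int dim = data[0].length;

        // Simplified initialization (assumption): k distinct points chosen by a partial shuffle.
        int[] idx = new int[data.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rng.nextInt(idx.length - c);
            int tmp = idx[c];
            idx[c] = idx[j];
            idx[j] = tmp;
            centers[c] = data[idx[c]].clone();
        }

        long[] counts = new long[k];        // number of points each center has absorbed so far
        int[] batch = new int[batchSize];   // indices of the points sampled for this iteration
        int[] assigned = new int[batchSize];

        for (int it = 0; it < iterations; it++) {
            // Sample a mini-batch and cache the nearest center for each sampled point.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(data.length);
                assigned[b] = nearest(centers, data[batch[b]]);
            }
            // Move each assigned center toward its point with a step of 1 / count.
            for (int b = 0; b < batchSize; b++) {
                int c = assigned[b];
                counts[c]++;
                double eta = 1.0 / counts[c];
                double[] x = data[batch[b]];
                for (int d = 0; d < dim; d++) {
                    centers[c][d] += eta * (x[d] - centers[c][d]);
                }
            }
        }
        return centers;
    }
}
```

Because each iteration touches only batchSize points instead of the full data
set, the cost per iteration is independent of the number of points, which is
where the speed-up over a full k-means pass comes from.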



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
