Hi all,

While I understand there are project decision threads going on in the ML, I'd like to suggest an improvement to the seeding procedure of the CM k-means++ implementation. Currently the sum of squared distances is recomputed on every iteration while the initial centers are being seeded. This is redundant: the sum can be computed once and then updated within the loop as the per-point minimum distances change.
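A minimal sketch of the idea follows. It assumes points are plain `double[]` arrays compared by squared Euclidean distance. The class `KMeansPlusPlusSeeding` and its method are hypothetical names for illustration only; this is neither the code in the patch nor the Commons Math API. The running sum `distSqSum` covers the not-yet-chosen points. When a point becomes a center, its contribution is subtracted. When a point's minimum distance shrinks, only the difference is subtracted, so no separate summation pass is needed per iteration.

```java
import java.util.Random;

// Hypothetical sketch: k-means++ seeding with an incrementally maintained
// sum of squared distances (not the Commons Math implementation).
final class KMeansPlusPlusSeeding {
    private KMeansPlusPlusSeeding() {}

    // Squared Euclidean distance between two points of equal dimension.
    static double distSq(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // Returns the indices of k distinct points chosen as initial centers.
    public static int[] chooseInitialCenters(double[][] points, int k, Random random) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("need 1 <= k <= points.length");
        }
        boolean[] taken = new boolean[n];
        int[] centers = new int[k];
        double[] minDistSq = new double[n];

        // First center is chosen uniformly at random.
        int first = random.nextInt(n);
        centers[0] = first;
        taken[first] = true;

        // The sum is computed once, here, instead of on every iteration.
        double distSqSum = 0.0;
        for (int i = 0; i < n; i++) {
            if (!taken[i]) {
                minDistSq[i] = distSq(points[i], points[first]);
                distSqSum += minDistSq[i];
            }
        }

        for (int c = 1; c < k; c++) {
            // Sample a point with probability proportional to minDistSq.
            int next = -1;
            if (distSqSum > 0.0) {
                double r = random.nextDouble() * distSqSum;
                double acc = 0.0;
                for (int i = 0; i < n; i++) {
                    if (!taken[i]) {
                        acc += minDistSq[i];
                        if (acc > r) {
                            next = i;
                            break;
                        }
                    }
                }
            }
            // Fallback (all remaining distances zero, or rounding): last free point.
            if (next == -1) {
                for (int i = n - 1; i >= 0; i--) {
                    if (!taken[i]) {
                        next = i;
                        break;
                    }
                }
            }
            centers[c] = next;
            taken[next] = true;

            // The new center no longer contributes to the sum.
            distSqSum -= minDistSq[next];
            minDistSq[next] = 0.0;

            // Update minimum distances and adjust the sum by the decrease only.
            for (int i = 0; i < n; i++) {
                if (!taken[i]) {
                    double d = distSq(points[i], points[next]);
                    if (d < minDistSq[i]) {
                        distSqSum -= minDistSq[i] - d;
                        minDistSq[i] = d;
                    }
                }
            }
        }
        return centers;
    }
}
```

For example, `KMeansPlusPlusSeeding.chooseInitialCenters(new double[][] {{0, 0}, {0, 0}, {10, 10}, {10, 10}}, 2, new Random(7))` always returns one index from each of the two coincident pairs, because a duplicate of the first center has zero weight. The saving is one O(n) pass per seeding step. The trade-off is that a long chain of subtractions can accumulate floating-point rounding in `distSqSum`, which is why the sketch keeps a fallback when sampling runs off the end.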
The JIRA issue referenced in the subject explains the optimization, and I've attached a patch with the suggested fix. I'd be glad to hear any comments or reviews.

Best regards,
Artem Barger