[Scikit-learn-general] MiniBatchKMeans doesn't really re-run its algorithm 'n_init' times

Stefano Lattarini Fri, 26 Apr 2013 01:14:04 -0700

Hello scikit-learn developers.

I've noticed a somewhat unexpected difference between the behaviour
of the KMeans class and the MiniBatchKMeans class.


When the 'n_init' argument is given, I'd expect both of these classes
to run the corresponding algorithm (Lloyd and mini-batch k-means,
respectively) 'n_init' times on the data to be fitted, each time with
a different initialization, and then select the result which gives the
smallest inertia.

However, while this expectation is met by the KMeans class, it's not
really met by the MiniBatchKMeans class: the latter only executes
the *initialization* of centroids 'n_init' times, then selects the
initialization that gives the smallest inertia and runs the mini-batch
k-means algorithm only once, with that initialization.

This different behaviour is not made apparent in the documentation,
either.
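
To make the expectation concrete, here is a minimal sketch of the
behaviour I'd expect from 'n_init', emulated from outside the class by
fitting several independent MiniBatchKMeans instances and keeping the
one with the smallest inertia.  The helper name
best_of_n_minibatch_runs, the seeding scheme and the synthetic data are
made up for illustration; this is not how the library does it
internally.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def best_of_n_minibatch_runs(X, n_clusters, n_runs=10, seed=0):
    """Run mini-batch k-means n_runs times in full; keep the lowest-inertia fit.

    Hypothetical helper, not part of scikit-learn.
    """
    best_model = None
    inertias = []
    for i in range(n_runs):
        # n_init=1: one initialization per run, so each iteration of this
        # loop is a complete, independent execution of the algorithm.
        model = MiniBatchKMeans(n_clusters=n_clusters, n_init=1,
                                random_state=seed + i)
        model.fit(X)
        inertias.append(model.inertia_)
        if best_model is None or model.inertia_ < best_model.inertia_:
            best_model = model
    return best_model, inertias


# Three synthetic 2-D blobs, purely for illustration.
rng = np.random.RandomState(42)
X = np.vstack([rng.randn(200, 2) + offset
               for offset in ([0, 0], [6, 6], [0, 6])])

best, inertias = best_of_n_minibatch_runs(X, n_clusters=3, n_runs=5)
print(inertias)
print(best.inertia_)
```

The cost is roughly n_runs times that of a single fit, which may well
be why MiniBatchKMeans avoids doing this by default.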

So, my question is: is this a bug, or is it intended behaviour?  And
if it's intended behaviour, should the documentation be adjusted to
reference it explicitly?

Best regards (and BTW, thank you for your great software!),
  Stefano

