(I am not on this list so please CC.)
Hi,
The MiniBatchKMeans implementation in sklearn/cluster/k_means_.py crashes
rather ungracefully at line 860 with the following traceback:
Init 1/3 with method: k-means++
/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py:1146: RuntimeWarning: init_size=300 should be larger than k=886. Setting it to 3*k
  init_size=init_size)
Inertia for init 1/3: 4950187.500000
Init 2/3 with method: k-means++
Inertia for init 2/3: 4464646.283333
Init 3/3 with method: k-means++
Inertia for init 3/3: 4941442.166667
Minibatch iteration 1/786200: mean batch inertia: 95868.474986, ewa inertia: 95868.474986
Minibatch iteration 2/786200: mean batch inertia: 99673.433750, ewa inertia: 95869.442923
Minibatch iteration 3/786200: mean batch inertia: 99292.147983, ewa inertia: 95870.313618
Minibatch iteration 4/786200: mean batch inertia: 97593.331241, ewa inertia: 95870.751934
Minibatch iteration 5/786200: mean batch inertia: 97558.089367, ewa inertia: 95871.181173
Minibatch iteration 6/786200: mean batch inertia: 95642.533019, ewa inertia: 95871.123007
Minibatch iteration 7/786200: mean batch inertia: 93952.687664, ewa inertia: 95870.634980
Minibatch iteration 8/786200: mean batch inertia: 92603.303128, ewa inertia: 95869.803809
Minibatch iteration 9/786200: mean batch inertia: 94457.049382, ewa inertia: 95869.444421
[MiniBatchKMeans] Reassigning 100 cluster centers.
Traceback (most recent call last):
  File "sift.py", line 140, in <module>
    mbk.fit(random_subset)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 1190, in fit
    verbose=self.verbose)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 860, in _mini_batch_step
    centers[to_reassign] = X[new_centers]
ValueError: array is not broadcastable to correct shape
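The failing line writes rows of X into the centers selected by to_reassign, so NumPy needs X[new_centers] to have exactly as many rows as to_reassign selects. The sketch below uses made-up data (it is not the sklearn code) and assumes to_reassign is a boolean mask over the centers; it reproduces the same kind of ValueError when the two counts disagree, and shows that drawing exactly mask.sum() replacement rows avoids it. Whether a count mismatch is really what happens inside _mini_batch_step here is an assumption the traceback does not confirm. Newer NumPy releases word this error as a "shape mismatch" rather than "array is not broadcastable to correct shape".

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))        # hypothetical data: 50 points, 4 features
centers = rng.normal(size=(10, 4))  # hypothetical 10 cluster centers
counts = np.array([0, 5, 1, 9, 0, 7, 3, 0, 8, 6])  # hypothetical per-center counts

# Boolean mask of centers to reassign (here centers 0, 2, 4 and 7).
to_reassign = counts <= 1
n_reassign = int(to_reassign.sum())

# Drawing one replacement row too few: the right-hand side has shape (3, 4)
# while centers[to_reassign] has shape (4, 4), so the assignment fails.
bad_idx = rng.choice(len(X), size=n_reassign - 1, replace=False)
try:
    centers[to_reassign] = X[bad_idx]
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# Drawing exactly n_reassign replacement rows makes the shapes agree.
good_idx = rng.choice(len(X), size=n_reassign, replace=False)
centers[to_reassign] = X[good_idx]
```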
This is sklearn version 0.15-git and Python 2.7.3.
It appears to crash only when I use a large number of data points together
with a relatively large K (I use K = sqrt(number of data points)). In this
case, I have around 30k data points. At the moment I'm running the same code
with a smaller K without problems, but I would like to use a larger K at some
point if I can. Any ideas?
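The K choice and the init_size warning above can be sketched as plain arithmetic. The helper names choose_k and effective_init_size are hypothetical, and the "replace init_size with 3*k when it is smaller than k" rule is read off the RuntimeWarning in the log rather than taken from the sklearn source (whether init_size == k also triggers it is not shown). Under K = sqrt(n), 30k points gives K = 173, whereas the k=886 in the log corresponds to somewhere between 784,996 and 786,768 points.

```python
import math


def choose_k(n_samples):
    """K = sqrt(number of data points), rounded down (hypothetical helper)."""
    return math.isqrt(n_samples)


def effective_init_size(init_size, k):
    """Mirror the RuntimeWarning in the log (assumed rule, hypothetical helper).

    If init_size is smaller than k, it is replaced with 3 * k; otherwise it
    is kept as given.
    """
    if init_size < k:
        return 3 * k
    return init_size


for n in (30000, 784996, 786768):
    k = choose_k(n)
    print(n, k, effective_init_size(300, k))
```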
Thanks,
Douwe
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general