Hi Andy,
I ran it a number of times. Every once in a while it does finish the
clustering successfully, but often it fails with the error I forwarded.
Anyway, for my purposes, I found that removing the init='random' argument
from the KMeans instantiation solves the problem. With the default k-means++
initialization it always runs to completion.
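One plausible explanation for the init='random' behavior, which is not confirmed anywhere in this thread, goes like this. Random seeding picks k rows of X at random. Bag-of-words rows built from URLs often repeat, so two initial centers can be identical. One of those clusters then ends up empty. The failing line in `_centers` in the traceback quoted below looks like the code that refills an empty cluster with a far-away point.

The textbook k-means++ (D^2) rule picks each new center with probability proportional to its squared distance to the nearest chosen center. Under that rule, an exact duplicate of a chosen center has weight zero and is never picked. The pure-Python sketch below compares the two rules on hypothetical toy rows. It is not scikit-learn's implementation, which differs in details.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_seeds(rows, k, rng):
    """init='random'-style seeding: k distinct row indices, chosen uniformly.

    Distinct indices can still hold identical vectors, so two centers may coincide.
    """
    return [rows[i] for i in rng.sample(range(len(rows)), k)]


def kmeanspp_seeds(rows, k, rng):
    """Textbook k-means++ (D^2) seeding, not scikit-learn's exact code.

    The first center is uniform; each later center is drawn with probability
    proportional to its squared distance to the nearest chosen center, so a row
    identical to a chosen center has weight 0 and cannot be drawn.
    Assumes the data contains at least k distinct rows.
    """
    centers = [rows[rng.randrange(len(rows))]]
    while len(centers) < k:
        weights = [min(squared_distance(r, c) for c in centers) for r in rows]
        centers.append(rng.choices(rows, weights=weights, k=1)[0])
    return centers


def has_duplicate_centers(centers):
    """True if two of the chosen centers are the same vector."""
    return len(set(map(tuple, centers))) < len(centers)


# Hypothetical toy "bag-of-words" rows; several URLs share identical counts.
rows = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [3, 3]]

rng = random.Random(0)
trials = 1000
random_dupes = sum(has_duplicate_centers(random_seeds(rows, 3, rng)) for _ in range(trials))
pp_dupes = sum(has_duplicate_centers(kmeanspp_seeds(rows, 3, rng)) for _ in range(trials))
print("random init, runs with duplicate centers:", random_dupes)
print("k-means++, runs with duplicate centers:", pp_dupes)
```

Under these assumptions the first count should be well above zero. The second is always 0, because a row with zero weight can never be drawn.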

Thanks,
Phani

On 24 May 2012 17:37, Andreas Mueller <[email protected]> wrote:

>  Hi Phani.
> Are you sure the behavior is non-deterministic?
> I am not sure what comes out of the vectorizer,
> but my guess would be that X is a sparse matrix, which
> KMeans doesn't handle.
> Could you check that, please?
> Cheers,
> Andy
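The check Andy asks for can be sketched as follows, assuming NumPy and SciPy are installed. `scipy.sparse.issparse` tells you whether X is sparse, and `.toarray()` gives a dense NumPy copy. The toy matrix is a hypothetical stand-in for the vectorizer output.

Indexing one row of a sparse matrix returns another 1 x n sparse matrix rather than a NumPy array. That would fit the `ValueError: setting an array element with a sequence` raised at `centers[center_id] = X[...]` in the traceback quoted below, though that link is only a guess.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-in for the output of CountVectorizer.fit_transform.
X = sp.csr_matrix(np.array([[1, 0, 2], [0, 0, 1], [1, 0, 2]]))

print(sp.issparse(X))    # True: X is a SciPy sparse matrix
row = X[0]               # indexing gives a 1 x 3 sparse matrix, not a 1-D ndarray
print(sp.issparse(row), row.shape)

# Workaround for code that expects dense input: convert explicitly.
# This stores every zero, so it is only sensible for small vocabularies.
X_dense = X.toarray()
print(type(X_dense), X_dense.shape)
```

Passing `X.toarray()` to `km.fit` sidesteps any sparse-input problem, at the cost of memory proportional to documents times vocabulary size.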
>
>
> On 05/24/2012 06:19 PM, Phani Vadrevu wrote:
>
> Hi all,
>      I am trying to run some basic clustering code.
>
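> from time import time
> from sklearn.cluster import KMeans
> from sklearn.feature_extraction.text import CountVectorizer
>
> t0 = time()
> vectorizer = CountVectorizer(preprocessor=preprocessor, token_pattern=u'/\w+/')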
> # url_list is a list of strings
> X = vectorizer.fit_transform(url_list)
> print "feature extraction done in %f s"%(time() - t0)
> t0 = time()
> km = KMeans(init='random', max_iter=100,verbose=1,n_init=1)
> km.fit(X)
> print "clustering done in %f s"%(time() - t0)
>
> It runs sometimes, but mostly it ends with the following:
>
> feature extraction done in 0.003542 s
> Initialization complete
> Traceback (most recent call last):
>   File "cluster.py", line 42, in <module>
>     km.fit(X)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 735, in fit
>     n_jobs=self.n_jobs)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 265, in k_means
>     x_squared_norms=x_squared_norms, random_state=random_state)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 380, in _kmeans_single
>     centers = _centers(X, labels, k, distances)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 507, in _centers
>     centers[center_id] = X[far_from_centers[reallocated_idx]]
> ValueError: setting an array element with a sequence.
>
>  What could be wrong here?
>
>  Thanks,
> Phani Vadrevu
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
