Hi all,
I am trying to run some basic clustering code:

vectorizer = CountVectorizer(preprocessor=preprocessor, token_pattern=u'/\w+/')
# url_list is a list of strings
X = vectorizer.fit_transform(url_list)
print "feature extraction done in %f s"%(time() - t0)
t0 = time()
km = KMeans(init='random', max_iter=100,verbose=1,n_init=1)
km.fit(X)
print "clustering done in %f s"%(time() - t0)

It runs sometimes, but most of the time it fails with the following:

feature extraction done in 0.003542 s
Initialization complete
Traceback (most recent call last):
  File "cluster.py", line 42, in <module>
    km.fit(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 735, in fit
    n_jobs=self.n_jobs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 265, in k_means
    x_squared_norms=x_squared_norms, random_state=random_state)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 380, in _kmeans_single
    centers = _centers(X, labels, k, distances)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 507, in _centers
    centers[center_id] = X[far_from_centers[reallocated_idx]]
ValueError: setting an array element with a sequence.

What could be wrong here?
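
In case it helps narrow things down, here is a small stand-alone sketch of what the token pattern above picks out of a URL. It uses re.findall from the standard library as a stand-in for the vectorizer's tokenizer, which is an assumption on my part, and the helper name and sample URLs are made up for illustration rather than taken from url_list.

```python
import re

# Same pattern string as the one passed to CountVectorizer above. In Python
# the slashes are literal characters, not regex delimiters.
TOKEN_PATTERN = re.compile(r'/\w+/')


def tokens(url):
    """Return the non-overlapping matches of TOKEN_PATTERN in url.

    Hypothetical helper: assumes the vectorizer tokenizes by calling
    findall with the compiled pattern.
    """
    return TOKEN_PATTERN.findall(url)


# Hypothetical URLs, not taken from url_list.
sample_urls = [
    "http://example.com/foo/bar/baz",
    "http://example.com/index.html",
]

for url in sample_urls:
    print("%s -> %r" % (url, tokens(url)))
```

Because matches cannot overlap and each one consumes its closing slash, only some path segments are captured, and a URL with no slash-delimited segment yields no tokens at all. If the vectorizer behaves the same way, such a URL would end up as an all-zero row in X.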

Thanks,
Phani Vadrevu
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
