Re: [Numpy-discussion] MemoryError : with scipy.spatial.distance

Gael Varoquaux Wed, 04 Apr 2012 22:33:58 -0700

On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote:
> Thanks Chris. So I guess the question becomes how can I efficiently
> cluster 1 million x,y coordinates.


Did you try scikit-learn's implementation of DBSCAN?
http://scikit-learn.org/stable/modules/clustering.html#dbscan
I am not sure that it scales to a million points, but it's worth trying.
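
For reference, a minimal sketch of what trying DBSCAN could look like. The
data here is a small synthetic stand-in for the real x,y coordinates, and the
eps and min_samples values are assumptions chosen to suit that toy data, not
recommendations. Memory use is likely to grow with the number of neighbours
found within eps, so it is probably safer to test on a subsample first.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical stand-in data: three tight 2-D blobs, 200 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack(
    [c + 0.3 * rng.standard_normal((200, 2)) for c in blob_centers]
)

# eps and min_samples are assumed values that fit this toy data only.
db = DBSCAN(eps=1.0, min_samples=5).fit(points)
dbscan_labels = db.labels_  # one label per point; -1 marks noise

# Count clusters, ignoring the noise label if it is present.
n_clusters = len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0)
print("clusters found:", n_clusters)
```

Unlike KMeans, DBSCAN does not need the number of clusters up front, but the
result depends strongly on eps.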

Alternatively, the best way to cluster massive datasets is to use the
mini-batch implementation of KMeans:
http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means
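
A rough sketch of the mini-batch approach, again on synthetic stand-in data.
The number of clusters, the batch size, and the random seed below are all
assumptions for illustration; with real data the number of clusters has to be
chosen by you.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in data: three 2-D blobs, 2000 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
xy = np.vstack([c + rng.standard_normal((2000, 2)) for c in blob_centers])

# n_clusters and batch_size are assumed values, to be tuned for real data.
mbk = MiniBatchKMeans(n_clusters=3, batch_size=1024, n_init=3, random_state=0)
kmeans_labels = mbk.fit_predict(xy)  # cluster index for each point

print(mbk.cluster_centers_)  # one (x, y) centroid per cluster
```

Each update step only looks at one batch of points, and no pairwise distance
matrix is built, which is what caused the MemoryError with
scipy.spatial.distance in the first place.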

Hope this helps,

Gael
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
