Hi,
I am new to Spark and need to use its clustering functionality to process a large dataset.

There are between 50k and 1M objects to cluster, but the optimal number of clusters is unknown. We cannot even estimate a range; all we know is that there are N objects.

Previously, on smaller datasets, I used R with a package that applies the Calinski-Harabasz index to automatically determine the number of clusters. But with this amount of data, R simply breaks.

So I wonder: has Spark implemented any algorithms to automatically determine the number of clusters?
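In case it helps clarify what I am after: the fallback I am considering is a brute-force sweep over candidate values of k, scoring each run. Below is a minimal sketch of that idea, assuming Spark's ml API with ClusteringEvaluator (which, as far as I know, computes the silhouette score rather than Calinski-Harabasz). The input path, the candidate k values, and the pre-built "features" column are placeholders, not part of an actual pipeline.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object ClusterNumberSweep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("cluster-number-sweep").getOrCreate()

    // Placeholder: assumes a DataFrame with a "features" vector column
    // has already been prepared (e.g. via VectorAssembler).
    val data = spark.read.parquet("/path/to/features.parquet")

    // ClusteringEvaluator (Spark >= 2.3) computes the silhouette score,
    // not Calinski-Harabasz; higher is better.
    val evaluator = new ClusteringEvaluator()

    // Brute-force sweep over candidate cluster counts.
    val scores = Seq(10, 50, 100, 500, 1000).map { k =>
      val model = new KMeans().setK(k).setSeed(1L).fit(data)
      val silhouette = evaluator.evaluate(model.transform(data))
      println(s"k=$k silhouette=$silhouette")
      (k, silhouette)
    }

    val (bestK, bestScore) = scores.maxBy(_._2)
    println(s"Best k = $bestK (silhouette = $bestScore)")
    spark.stop()
  }
}

But on 50k to 1M objects, fitting k-means once per candidate k is expensive, which is why a built-in method for choosing k would be preferable.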

Many thanks!

--
Ziqi Zhang
Research Associate
Department of Computer Science
University of Sheffield

