Hi,
I am new to Spark and need to use its clustering functionality to process a large dataset.

There are between 50k and 1M objects to cluster, but the optimal number of clusters is unknown. We cannot even estimate a range; all we know is that there are N objects.

Previously, on smaller datasets, I used R with a package that applies the Calinski-Harabasz index to automatically determine the number of clusters. But with this amount of data, R simply breaks.

So I wonder: has Spark implemented any algorithms to automatically determine the number of clusters?
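In case it helps clarify what I am after: the fallback I am considering is a brute-force sweep over candidate values of k, scoring each run. Below is a minimal sketch of that idea, assuming Spark's ml API with ClusteringEvaluator (which, as far as I know, computes the silhouette score rather than Calinski-Harabasz). The input path, the candidate k values, and the pre-built "features" column are placeholders, not part of an actual pipeline.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object ClusterNumberSweep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("cluster-number-sweep").getOrCreate()

    // Placeholder: assumes a DataFrame with a "features" vector column
    // has already been prepared (e.g. via VectorAssembler).
    val data = spark.read.parquet("/path/to/features.parquet")

    // ClusteringEvaluator (Spark >= 2.3) computes the silhouette score,
    // not Calinski-Harabasz; higher is better.
    val evaluator = new ClusteringEvaluator()

    // Brute-force sweep over candidate cluster counts.
    val scores = Seq(10, 50, 100, 500, 1000).map { k =>
      val model = new KMeans().setK(k).setSeed(1L).fit(data)
      val silhouette = evaluator.evaluate(model.transform(data))
      println(s"k=$k silhouette=$silhouette")
      (k, silhouette)
    }

    val (bestK, bestScore) = scores.maxBy(_._2)
    println(s"Best k = $bestK (silhouette = $bestScore)")
    spark.stop()
  }
}

But on 50k to 1M objects, fitting k-means once per candidate k is expensive, which is why a built-in method for choosing k would be preferable.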

Many thanks!

--
Ziqi Zhang
Research Associate
Department of Computer Science
University of Sheffield

