Hi,
I am new to Spark and need to use its clustering functionality to
process a large dataset.
There are between 50k and 1 million objects to cluster. However, the
problem is that the optimal number of clusters is unknown. We cannot
even estimate a range; all we know is that there are N objects.
Previously, on smaller datasets, I used R and an R package implementing
the Calinski-Harabasz index to determine the number of clusters
automatically. But with this amount of data, R simply breaks.
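For readers unfamiliar with it, the Calinski-Harabasz index scores a given clustering as the ratio of between-cluster to within-cluster dispersion, normalised by the degrees of freedom; one then picks the number of clusters k that maximises the score. A minimal pure-Python sketch (illustrative only, not Spark code):

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score for a clustering.

    points: list of numeric tuples (all the same dimension)
    labels: cluster id for each point
    Higher is better; requires at least 2 clusters.
    """
    n = len(points)
    dims = len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    # Overall centroid of the whole dataset.
    overall = [sum(p[d] for p in points) / n for d in range(dims)]

    between = 0.0  # between-cluster sum of squares (weighted by size)
    within = 0.0   # within-cluster sum of squares
    for members in clusters.values():
        m = len(members)
        centroid = [sum(p[d] for p in members) / m for d in range(dims)]
        between += m * sum((centroid[d] - overall[d]) ** 2
                           for d in range(dims))
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dims))
                      for p in members)

    # CH = (B / (k - 1)) / (W / (n - k))
    return (between / (k - 1)) / (within / (n - k))

# Two tight, well-separated clusters score higher than an arbitrary split.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = calinski_harabasz(pts, [0, 0, 0, 1, 1, 1])
bad = calinski_harabasz(pts, [0, 1, 0, 1, 0, 1])
```

In practice one would compute the score for each candidate k (e.g. after running MLlib's KMeans) and keep the k with the maximum score.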
So I wonder: has Spark implemented any algorithm to automatically
determine the number of clusters?
Many thanks!!
--
Ziqi Zhang
Research Associate
Department of Computer Science
University of Sheffield
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org