Hello,

Ralf B wrote:
Are there R packages that allow for dynamic clustering, i.e. where the
number of clusters is not predefined? I have a list of numbers that
falls into either 2 clusters or just 1. Here is an example of one that
should be clustered into two clusters:

two <- c(1,2,3,2,3,1,2,3,400,300,400)

and here one that only contains one cluster and would therefore not
need to be clustered at all.

one <- c(400,402,405, 401,410,415, 407,412)

Given a sufficiently large amount of data, a statistical test or an
effect size should be able to determine whether it makes sense to
divide a data set, i.e. whether there are two groups that differ
enough. I am not familiar with the underlying techniques in kmeans,
but I know that it blindly divides both data sets based on the
predefined number of clusters. Are there any more sophisticated
methods that allow me to determine the number of clusters in a data
set based on statistical tests or effect sizes?

Caveat: I have very little experience with clustering methods, but maybe this could get you started:

http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

If you only want to make 2 clusters when the means of the data are an order of magnitude apart or more, that's easy enough to do without a statistical test.
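
As a rough sketch of that idea in base R: sort the values, split at the largest gap, and keep the split only if the two group means differ by at least a chosen factor. The function name split_if_far_apart and the default ratio of 10 are my own inventions for illustration, and the ratio check assumes positive data.

```r
# Hypothetical helper (not from any package): split at the largest gap in the
# sorted values, and keep the split only if the group means differ by at
# least `ratio`-fold. Assumes positive values.
split_if_far_apart <- function(x, ratio = 10) {
  if (length(x) < 2) return(rep(1L, length(x)))
  s <- sort(x)
  i <- which.max(diff(s))          # position of the largest gap
  cut <- (s[i] + s[i + 1]) / 2     # midpoint of that gap
  lo <- mean(x[x <= cut])
  hi <- mean(x[x > cut])
  if (lo > 0 && hi / lo >= ratio) {
    ifelse(x > cut, 2L, 1L)        # two clusters: label each value
  } else {
    rep(1L, length(x))             # means too close: one cluster
  }
}

two <- c(1,2,3,2,3,1,2,3,400,300,400)
one <- c(400,402,405,401,410,415,407,412)

split_if_far_apart(two)  # labels the three large values as cluster 2
split_if_far_apart(one)  # all values stay in cluster 1
```

This is a heuristic, not a statistical test, so the threshold is something you would have to pick to suit your data.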

For your examples above, I naively tried some functions in the mclust package, which I've never used before:

mclustModel(one, (mclustBIC(one, G=1:2)))$G # gives 1
mclustModel(two, (mclustBIC(two, G=1:2)))$G # gives 2

You'll have to decide for yourself whether this is appropriate for your data... or whether I'm even using these functions correctly. :)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
