Re: [R] cluster size

Tal Galili Fri, 11 Dec 2009 10:57:28 -0800

Hello Karuna,
Christian's answer was great and very detailed.

Another approach you might want to try is the K-medoids algorithm
instead of K-means.
It is available through the pam function (from the cluster package).
See more here:
http://stat.ethz.ch/R-manual/R-patched/library/cluster/html/pam.html
(I used Manhattan distances.)
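
A minimal sketch of what such a pam call might look like. The built-in iris data is only a stand-in for your own data frame, and k = 5 is taken from the original question; both are assumptions for illustration. As far as I know, pam chooses its starting medoids deterministically rather than at random, so repeated runs on the same data should return the same partition.

```r
library(cluster)  # provides pam()

# Illustrative data only: the iris measurements stand in for the
# poster's data frame, which is not available here.
x <- iris[, 1:4]

# K-medoids with 5 clusters, using Manhattan distances between rows.
fit1 <- pam(x, k = 5, metric = "manhattan")
fit2 <- pam(x, k = 5, metric = "manhattan")

# Cluster sizes of the first fit.
print(table(fit1$clustering))

# Check whether a rerun gives exactly the same assignment.
print(identical(fit1$clustering, fit2$clustering))
```

The medoids themselves are actual rows of the data (see fit1$medoids), which can make the clusters easier to interpret than k-means centroids.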


When I once faced a similar situation, I found the pam solution to be quite
robust.
The only thing to note is that needing such a solution says something
about how noisy your original data are. That should sometimes prompt
more thought about the number of clusters you are using, and whether
your clusters are sensible.

Best of luck,
Tal




----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com/ (English)
----------------------------------------------------------------------------------------------




On Fri, Dec 11, 2009 at 5:47 PM, Christian Hennig <chr...@stats.ucl.ac.uk> wrote:

> Dear Ms Karunambigai,
>
> the kmeans algorithm depends on random initialisation.
> There are two basic strategies that can be applied in order to make your
> results reproducible:
> 1) Fix the random number generator by means of set.seed (see ?set.seed)
> before you run kmeans. The problem with this is that your solution can only
> be reproduced using the same random seed; it technically still is random.
> 2) Specify fixed initial centers, using the centers argument in kmeans.
> (Sensible initial centers may be obtained by running hclust using Ward's
> method, obtaining the desired number of clusters using cutree, and computing
> the centers of the resulting clusters; sorry that I don't have the time right
> now to explain how to do that precisely; the help pages and hopefully some
> understanding of what is going on may help you further.)
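
The two strategies above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive recipe: the scaled iris data stands in for the original columns of as1, the seed value 123 is arbitrary, and "ward.D2" is the name current versions of R use for Ward's criterion in hclust (older versions used "ward").

```r
# Illustrative data only; stands in for the 14 columns selected from as1.
x <- scale(iris[, 1:4])

# Strategy 1: fix the seed immediately before each kmeans call.
set.seed(123)
km_a <- kmeans(x, centers = 5)
set.seed(123)
km_b <- kmeans(x, centers = 5)
print(identical(km_a$cluster, km_b$cluster))

# Strategy 2: derive fixed starting centers from Ward clustering.
hc <- hclust(dist(x), method = "ward.D2")
groups <- cutree(hc, k = 5)  # cut the tree into 5 groups

# Column means within each group: a 5 x p matrix of initial centers.
init <- apply(x, 2, function(col) tapply(col, groups, mean))

km_c <- kmeans(x, centers = init)
km_d <- kmeans(x, centers = init)
print(identical(km_c$cluster, km_d$cluster))
print(table(km_c$cluster))
```

With the second strategy no random numbers are involved, so the result does not depend on any seed, only on the data and the chosen linkage method.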
>
> An alternative strategy that will not absolutely guarantee reproducibility
> but will make your results more stable is to use kmeansruns in package fpc,
> which is a wrapper that runs kmeans several times and gives you the best
> solution found. That should reproduce its outcome with higher probability
> (though not precisely 1).
> I don't know whether the default value runs=100 is sufficient to give a
> stable solution for your data, but increasing the runs parameter may help.
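
A rough sketch of the repeated-runs idea. The first part uses the nstart argument of base kmeans, which is a swapped-in analogue and not kmeansruns itself; the second part calls kmeansruns only if fpc is installed. The iris data, the seeds, iter.max = 50 and krange = 5 (to fix the number of clusters at five) are all assumptions for illustration.

```r
# Illustrative data only; stands in for the original data.
x <- scale(iris[, 1:4])

# Base R analogue: nstart = 100 tries 100 random starts and keeps the
# solution with the smallest total within-cluster sum of squares.
set.seed(1)
km1 <- kmeans(x, centers = 5, nstart = 100, iter.max = 50)
set.seed(2)
km2 <- kmeans(x, centers = 5, nstart = 100, iter.max = 50)

# Cluster labels may be permuted between runs, so compare the sorted
# cluster sizes and the objective value rather than the raw labels.
print(sort(as.vector(table(km1$cluster))))
print(sort(as.vector(table(km2$cluster))))
print(c(km1$tot.withinss, km2$tot.withinss))

# The fpc wrapper, run only when the package is available.
if (requireNamespace("fpc", quietly = TRUE)) {
  kr <- fpc::kmeansruns(x, krange = 5, runs = 100)
  print(table(kr$cluster))
}
```

If the two objective values still differ between seeds, that suggests the number of restarts is too small for the data, and raising nstart (or runs) may help.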
>
> Cheers,
> Christian
>
>
> On Fri, 11 Dec 2009, karuna m wrote:
>
>> Hi r-help,
>> I am doing kmeans clustering with the stats package. I tried a five-cluster
>> solution using:
>> kcl1 <- kmeans(as1[, c("contlife", "somlife", "agglife", "sexlife",
>>                        "rellife", "hordlife", "doutlife", "symtlife",
>>                        "washlife", "chcklife", "rptlife", "countlife",
>>                        "coltlife", "ordlife")],
>>                5, iter.max = 10, nstart = 1,
>>                algorithm = "Hartigan-Wong")
>> table(kcl1$cluster)
>> Every time I run this, I get five clusters of different sizes. For example,
>> the first run gave cluster sizes
>> table(kcl1$cluster)
>>   1   2   3   4   5
>> 140  72 105  98 112
>> and the second run gave cluster sizes
>> table(kcl1$cluster)
>>   1   2   3   4   5
>>  91 149 106  76 105
>> and so on.
>> I would like to know whether there is any function that gives the same
>> cluster sizes every time we run kmeans clustering.
>> Thanks in advance.
>> regards,
>> Ms.Karunambigai M
>> PhD Scholar
>> Dept. of Biostatistics
>> NIMHANS
>> Bangalore
>> India
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
