By the way, I should mention that my data contain missing values. That is why I am using clara (or possibly pam): kmeans, which is very good at dealing with large datasets, cannot handle missing values.
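A minimal sketch of the point above, assuming the 'cluster' package is installed. The toy matrix `x` and the placement of the NA cells are hypothetical, chosen so that no row has more than one missing value and every pair of rows shares at least one observed column.

```r
library(cluster)  # provides clara() and pam()

set.seed(1)
# Hypothetical toy data: 200 rows, 3 numeric columns, two loose groups
x <- rbind(matrix(rnorm(300, mean = 0), ncol = 3),
           matrix(rnorm(300, mean = 4), ncol = 3))

# Put exactly one NA into each of 20 rows, cycling through the columns
na_rows <- seq(5, 200, by = 10)
na_cols <- rep(1:3, length.out = length(na_rows))
x[cbind(na_rows, na_cols)] <- NA

# kmeans() stops with an error when x contains NA; capture it instead of failing
km <- tryCatch(kmeans(x, centers = 2), error = function(e) e)
inherits(km, "error")

# clara() and pam() accept NA in a numeric matrix
cl <- clara(x, k = 2)
pm <- pam(x, k = 2)

# Cross-tabulate the two partitions (cluster labels need not line up)
table(clara = cl$clustering, pam = pm$clustering)
```

With missing values present, the dissimilarity between two rows can only be based on the columns observed in both, so a hand-written distance that treats NA differently may well assign a few observations to another medoid. The help pages `?clara` and `?pam` describe how the installed version handles this; recent versions of 'cluster' also appear to document a `correct.d` argument of clara() that concerns the distance computation with NAs, which is worth checking.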
Behnam.

________________________________________
From: David L Carlson <dcarl...@tamu.edu>
Sent: 21 February 2016 17:55
To: Sarah Goslee; ABABAEI, Behnam
Cc: r-help@r-project.org
Subject: RE: [R] Why CLARA clustering method does not give the same classes as when I do clustering manually?

I do not think this is quite true. When the medoids are not specified, pam/clara looks for a good initial set (build phase) and then finds a local minimum of the objective function (swap phase). Both pam/clara and kmeans can find local minima that are not the global minimum. If the build phase involves any random element, two runs could produce different results. If not, then the original order of the data determines the final result, but the final result is not necessarily the best one possible (assuming the order of the data is irrelevant to the analysis so we are not looking at observations taken along a line in time or space). That is why kmeans includes an argument to run the algorithm multiple times and pick the best result.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
Sent: Friday, February 19, 2016 1:47 PM
To: ABABAEI, Behnam
Cc: r-help@r-project.org
Subject: Re: [R] Why CLARA clustering method does not give the same classes as when I do clustering manually?

clara() is a version of pam() adapted to use large datasets. pam() uses the entire dataset, and should give results identical to your manual procedure, or nearly so. clara() works on subsets of the data, so it may give a slightly different result each time you run it. The default parameters for clara() are very small, so you can get substantially different results from run to run on a large dataset if you don't change them.
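The points in the two replies above can be sketched as follows, assuming the 'cluster' package and complete data (no NA). The data `x`, the helper `nearest_medoid()`, and the particular values of `samples`, `sampsize`, and `nstart` are illustrative choices, not recommendations from the thread.

```r
library(cluster)

set.seed(42)
# Hypothetical complete data: 2000 rows, 2 columns, three groups
x <- rbind(matrix(rnorm(1400, mean = 0), ncol = 2),
           matrix(rnorm(1400, mean = 3), ncol = 2),
           matrix(rnorm(1200, mean = 6), ncol = 2))

# Defaults are small: samples = 5, sampsize = min(n, 40 + 2 * k)
fit_default <- clara(x, k = 3)

# More and larger subsamples, closer in spirit to pam() on the full data
fit_large <- clara(x, k = 3, samples = 50, sampsize = 500)

# Objective (average dissimilarity to the closest medoid) for both fits
c(default = fit_default$objective, large = fit_large$objective)

# Manual assignment: Euclidean distance from every row to every medoid,
# then pick the medoid with the smallest distance
nearest_medoid <- function(x, medoids) {
  d <- sapply(seq_len(nrow(medoids)), function(j) {
    sqrt(rowSums(sweep(x, 2, medoids[j, ])^2))
  })
  max.col(-d, ties.method = "first")
}

manual <- nearest_medoid(x, fit_large$medoids)
mean(manual == fit_large$clustering)  # share of rows where both agree

# kmeans(): several random starts, the best solution is kept
km <- kmeans(x, centers = 3, nstart = 25)
```

On complete, unstandardized data with the default Euclidean metric, the manual assignment and `$clustering` would be expected to agree apart from ties, so a persistent 1-2 percent mismatch points towards standardization, the metric, or the treatment of missing values. As far as the help page `?clara` states, clara() draws its subsamples with its own built-in generator unless `rngR = TRUE`, so repeated calls with identical arguments may return identical results; this is worth verifying for the installed version.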
Sarah

On Fri, Feb 19, 2016 at 6:30 AM, ABABAEI, Behnam <behnam.abab...@limagrain.com> wrote:
> Hi,
>
> I am using CLARA (in 'cluster' package). This method is supposed to assign
> each observation to the closest 'medoid'. But when I calculate the distance
> of medoids and observations manually and assign them manually, the results
> are slightly different (1-2 percent of occurrence probability). Does anyone
> know how clara calculates dissimilarities and why I get different clustering
> results?
>
> Behnam.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.